On Wed, 2008-12-31 at 13:44 -0600, Josh Paetzel wrote:
> then echo $i >> /tmp/unknown.txt

Yep! Thanks.

> While this approach will probably work, it's bound to be fairly slow.

Yep. I guess I was imagining it would be used for initial "discovery",
i.e., find the paths to back up, then hard-code what you find. But I guess
it probably wouldn't be useful if you wanted to discover non-packaged
files during every backup.

The initial "find" takes a while, as does the first "dpkg --search".
But after that, the package database probably fits in the kernel's
filesystem buffers: searches done after the first one are significantly
faster.

Florin's approach also sounds fine (let's call it the "reverse map") and
might be faster. dpkg is slower than I anticipated. My approach should
be conservative enough in that it will catch anything not directly
represented in the package database. Excluding /var, /proc, /sys and
such is a good idea. Let's try again. This version ignores everything
except /etc on my Ubuntu 8.10 laptop... took about 10 minutes to run.
Meh.

-------------------------8<-------------------------
#!/bin/sh

# find_unpackaged.sh
#
# Find unpackaged (unknown to dpkg) files/directories.
#
# Discarding standard error isn't very elegant.
#
# (C)2008 Adam Monsen
# License: GNU General Public License version 3 (or later)

find / \
    -path '/.*' -prune -o \
    -path '/bin*' -prune -o \
    -path '/boot*' -prune -o \
    -path '/cdrom*' -prune -o \
    -path '/dev*' -prune -o \
    -path '/home*' -prune -o \
    -path '/initrd*' -prune -o \
    -path '/lib*' -prune -o \
    -path '/lost+found*' -prune -o \
    -path '/media*' -prune -o \
    -path '/mnt*' -prune -o \
    -path '/opt*' -prune -o \
    -path '/proc*' -prune -o \
    -path '/root*' -prune -o \
    -path '/sbin*' -prune -o \
    -path '/srv*' -prune -o \
    -path '/sys*' -prune -o \
    -path '/tmp*' -prune -o \
    -path '/usr*' -prune -o \
    -path '/var*' -prune -o \
    -path '/vmlinuz*' -prune -o \
    -print > /tmp/allpaths.txt \
    2>/dev/null

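# start with an empty scratch file so reruns don't accumulate
: > /tmp/unknown.txt.TMP

# read line by line so paths containing spaces stay intact
while IFS= read -r i
do
    # show progress with dots
    printf .
    if ! dpkg --search "$i" > /dev/null 2>&1
    then printf '%s\n' "$i" >> /tmp/unknown.txt.TMP
    fi
done < /tmp/allpaths.txt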

# remove any duplicate paths from the list
sort -u /tmp/unknown.txt.TMP > /tmp/unknown.txt
------------------------->8-------------------------
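A side note on the prune patterns: -path takes a shell glob, so '/lib*'
also prunes anything else at the top level whose name starts with "lib",
not just /lib. Here is a minimal sketch of the same prune idiom with an
exact path instead. The function name list_pruned and its arguments are
made up for illustration.

```shell
#!/bin/sh
# list_pruned ROOT NAME
# Hypothetical helper: lists everything under ROOT except the directory
# ROOT/NAME and its contents.
list_pruned() {
    root=$1
    name=$2
    # -prune stops find from descending into the matched directory;
    # -o -print then prints every path that was not pruned
    find "$root" -path "$root/$name" -prune -o -print
}
```

For example, "list_pruned /some/tree lib" skips /some/tree/lib but still
lists /some/tree/libexec, which a '/some/tree/lib*' pattern would also
prune.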

To get any real performance we're probably going to need something more
sophisticated than a shell script.
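
That said, the "reverse map" idea might get most of the way there with
standard tools. Below is a rough sketch, assuming "reverse map" means
collecting every path the packages claim first and then comparing that
list against what is on disk, instead of calling dpkg once per path. It
reads the per-package file lists that dpkg keeps in
/var/lib/dpkg/info/*.list. The function name list_unpackaged and its two
arguments are invented for this example, and it assumes mktemp is
available and that no path contains a newline.

```shell
#!/bin/sh
# list_unpackaged INFO_DIR ROOT
# Hypothetical helper: prints paths under ROOT that appear in none of the
# INFO_DIR/*.list files (on Debian/Ubuntu, INFO_DIR is /var/lib/dpkg/info).
list_unpackaged() {
    info_dir=$1
    root=$2
    known=$(mktemp) || return 1
    found=$(mktemp) || { rm -f "$known"; return 1; }

    # every path any package claims, one per line, sorted for comm
    cat "$info_dir"/*.list 2>/dev/null | LC_ALL=C sort -u > "$known"

    # every path that actually exists under ROOT
    find "$root" 2>/dev/null | LC_ALL=C sort -u > "$found"

    # lines only in $found, i.e. on disk but not in any package list
    LC_ALL=C comm -23 "$found" "$known"

    rm -f "$known" "$found"
}
```

Something like "list_unpackaged /var/lib/dpkg/info /etc > /tmp/unknown.txt"
would then replace the find and the dpkg loop. I'd expect it to be much
quicker since each list is read only once, but the result may not match
"dpkg --search" exactly (diversions, for instance, are not taken into
account here).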

But it looks like it did work... /tmp/unknown.txt contains over 1,000
lines, including /etc/hosts, /etc/aliases, /etc/timezone, and other
files that should probably be backed up.

-- 
Adam Monsen