author | Tom Ryder <tom@sanctum.geek.nz> | 2017-08-04 14:28:05 +1200 |
---|---|---|
committer | Tom Ryder <tom@sanctum.geek.nz> | 2017-08-04 14:28:05 +1200 |
commit | 817fc93a2ff1c6d2ac1a783d09e2ec1587a13ce0 | |
tree | ebbcbb70b1b0d99de3182645395bad6b510e9727 /README.markdown | |
parent | Return push result from wanted sub | |
Add some questions and answers
Diffstat (limited to 'README.markdown')
-rw-r--r-- | README.markdown | 38 |
1 files changed, 38 insertions, 0 deletions
```diff
diff --git a/README.markdown b/README.markdown
index 6d025cf..0105747 100644
--- a/README.markdown
+++ b/README.markdown
@@ -20,6 +20,44 @@ You can define a `PREFIX` to install it elsewhere:
 
     $ make install PREFIX="$HOME"/.local
 
+Q&A
+---
+
+### Why is this faster than just hashing every file?
+
+It checks the size of each file first, and only ends up hashing them if they're
+the same size but have different devices and/or inode numbers (i.e. they're not
+hard links). Hashing is an expensive last resort, and in many situations this
+won't end up running a single hash comparison.
+
+### I keep getting `.git` metadata files listed in excludes.
+
+Filter them out by paragraph block. If you have a POSIX-fearing `awk`, you
+could do something like this:
+
+    $ checkem /dir | awk 'BEGIN{RS="";ORS="\n\n"} !/\/.git/'
+
+### How could I make it even quicker?
+
+Run it on a fast disk, mostly. For large directories or large files, it will
+probably be I/O bound in most circumstances.
+
+If you end up hashing a lot of files because their sizes are the same, and
+you're not worried about SHA256 technically being broken in practice, SHA1 is a
+tiny bit faster:
+
+    $ CHECKEM_ALG=sha1 checkem /dir
+
+Theoretically, you could read only the first *n* bytes of each hash-needing
+file and hash those with some suitable inexpensive function *f*, and just
+compare those before resorting to checking the entire file with a safe hash
+function *g*.
+
+You'd need to decide on suitable values for *n*, *f*, and *g* in such a case;
+it might be useful for very large sets of files that will almost certainly
+differ in the first *n* bytes. If there's interest in this at all, I'll write
+it in as optional behaviour.
+
 License
 -------
 
```
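The Q&A in this commit describes a staged comparison. Files are first grouped by size, hard links are left out, and hashing happens only as a last resort. It also floats an optional cheap hash *f* over the first *n* bytes, run before the full hash *g*. The Python below is a minimal sketch of that idea under stated assumptions, not checkem's own implementation. The function names are hypothetical, MD5 merely stands in for *f*, and SHA-256 stands in for *g*; the `alg` parameter only loosely mirrors `CHECKEM_ALG`. Each (device, inode) pair is simply counted once, which is an assumption about how hard links should be treated.

```python
import hashlib
import os
import sys
from collections import defaultdict


def hash_file(path, alg="sha256", limit=None, chunk_size=65536):
    """Return the hex digest of a file, or of only its first `limit` bytes."""
    h = hashlib.new(alg)
    remaining = limit
    with open(path, "rb") as fh:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = fh.read(want)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def group_by(paths, key):
    """Bucket paths by key(path) and keep only buckets with two or more members."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[key(path)].append(path)
    return [bucket for bucket in buckets.values() if len(bucket) > 1]


def find_duplicates(root, alg="sha256", prefix_bytes=None, prefix_alg="md5"):
    """Return lists of paths under root whose contents appear identical."""
    # Collect regular files, counting each (device, inode) pair only once so
    # hard links to the same data are never hashed against each other
    # (an assumption of this sketch, not necessarily checkem's behaviour).
    seen = set()
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            if (st.st_dev, st.st_ino) in seen:
                continue
            seen.add((st.st_dev, st.st_ino))
            files.append(path)

    # Cheap check first: a file whose size is unique cannot have a duplicate.
    groups = group_by(files, os.path.getsize)

    # Optional pre-filter: hash only the first prefix_bytes with prefix_alg (f).
    if prefix_bytes is not None:
        def prefix_key(path):
            return hash_file(path, prefix_alg, limit=prefix_bytes)
        groups = [sub for group in groups for sub in group_by(group, prefix_key)]

    # Last resort: hash the whole file with the safer algorithm (g).
    def full_key(path):
        return hash_file(path, alg)
    return [sub for group in groups for sub in group_by(group, full_key)]


if __name__ == "__main__":
    # Print each duplicate group as a blank-line-separated block.
    for root in sys.argv[1:]:
        for group in find_duplicates(root, prefix_bytes=4096):
            print("\n".join(sorted(group)), end="\n\n")
```

Run it as `python3 dupes_sketch.py /dir` (a hypothetical filename). Each group prints as a blank-line-separated block, which is the record shape that the `awk` paragraph-mode filter (`RS=""`) in the README relies on. With `prefix_bytes` unset, the sketch behaves like the size-then-hash strategy alone. Setting it only pays off when many same-size files differ early, as the README notes.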