Add some questions and answers

author: Tom Ryder <tom@sanctum.geek.nz> 2017-08-04 14:28:05 +1200
committer: Tom Ryder <tom@sanctum.geek.nz> 2017-08-04 14:28:05 +1200
commit: 817fc93a2ff1c6d2ac1a783d09e2ec1587a13ce0 (patch)
tree: ebbcbb70b1b0d99de3182645395bad6b510e9727 /README.markdown
parent: Return push result from wanted sub (diff)
download: checkem-817fc93a2ff1c6d2ac1a783d09e2ec1587a13ce0.tar.gz
checkem-817fc93a2ff1c6d2ac1a783d09e2ec1587a13ce0.zip
1 files changed, 38 insertions, 0 deletions
diff --git a/README.markdown b/README.markdown
index 6d025cf..0105747 100644
--- a/README.markdown
+++ b/README.markdown
@@ -20,6 +20,44 @@ You can define a `PREFIX` to install it elsewhere:
 
     $ make install PREFIX="$HOME"/.local
 
+Q&A
+---
+
+### Why is this faster than just hashing every file?
+
+It checks the size of each file first, and only ends up hashing them if they're
+the same size but have different devices and/or inode numbers (i.e. they're not
+hard links). Hashing is an expensive last resort, and in many situations this
+won't end up running a single hash comparison.
+
+### I keep getting `.git` metadata files listed as duplicates.
+
+Filter them out by paragraph block. If you have a POSIX-fearing `awk`, you
+could do something like this:
+
+    $ checkem /dir | awk 'BEGIN{RS="";ORS="\n\n"} !/\/\.git/'
+
+### How could I make it even quicker?
+
+Run it on a fast disk, mostly. For large directories or large files, it will
+probably be I/O bound in most circumstances.
+
+If you end up hashing a lot of files because their sizes are the same, and
+you're not worried about SHA-1 technically being broken in practice, it's a
+tiny bit faster:
+
+    $ CHECKEM_ALG=sha1 checkem /dir
+
+Theoretically, you could read only the first *n* bytes of each hash-needing
+file and hash those with some suitable inexpensive function *f*, and just
+compare those before resorting to checking the entire file with a safe hash
+function *g*.
+
+You'd need to decide on suitable values for *n*, *f*, and *g* in such a case;
+it might be useful for very large sets of files that will almost certainly
+differ in the first *n* bytes. If there's interest in this at all, I'll write
+it in as optional behaviour.
+
 License
 -------
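
The staged comparison described in the Q&A above (size first, then device and inode, then an optional cheap hash *f* over the first *n* bytes, then a full hash *g*) can be sketched roughly as follows. This is an illustrative Python sketch, not checkem's own code: the names `find_duplicates`, `prefix_len`, `prefix_alg` and `full_alg` are hypothetical, the defaults are arbitrary, and keeping a single path per hard-linked inode is an assumption about the desired behaviour.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path, alg, limit=None, chunk=1 << 16):
    """Hash a file with hashlib algorithm `alg`, reading at most `limit` bytes if given."""
    h = hashlib.new(alg)
    remaining = limit
    with open(path, "rb") as fh:
        while remaining is None or remaining > 0:
            want = chunk if remaining is None else min(chunk, remaining)
            block = fh.read(want)
            if not block:
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()


def _split(paths, key):
    """Group paths by key(path) and keep only groups with two or more members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [grp for grp in groups.values() if len(grp) > 1]


def find_duplicates(root, prefix_len=4096, prefix_alg="sha1", full_alg="sha256"):
    """Return lists of paths under `root` whose contents appear identical.

    prefix_len, prefix_alg and full_alg play the roles of n, f and g.
    """
    # Stage 1: one stat() per regular file; no file contents are read yet.
    stats = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                stats[path] = os.stat(path)

    result = []
    for same_size in _split(stats, lambda p: stats[p].st_size):
        # Stage 2: hard links share (device, inode), so keep one name per inode
        # (assumption: hard links are not worth hashing against each other).
        by_inode = {}
        for p in sorted(same_size):
            by_inode.setdefault((stats[p].st_dev, stats[p].st_ino), p)
        candidates = list(by_inode.values())
        if len(candidates) < 2:
            continue
        # Stage 3: cheap hash f over only the first n bytes.
        for same_prefix in _split(candidates, lambda p: _digest(p, prefix_alg, prefix_len)):
            # Stage 4: full-file hash g, only for files that survived stage 3.
            result.extend(_split(same_prefix, lambda p: _digest(p, full_alg)))
    return [sorted(group) for group in result]
```

Calling `find_duplicates("/dir")` would return a list of groups, each a sorted list of paths. Files with a unique size are never opened, and a file that differs from its size-mates within the first `prefix_len` bytes is read only that far.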