This request is for the following:
Add hashing (xxHash) as an option
- Add to database (optional index)
- Add to GUI column display (optional)
- Hashing in a separate thread, perhaps with a user-configurable process priority setting
- This hashing would in turn immediately add to ES's arsenal of "duplicate" comparison options
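To make the "optional column plus optional index" idea concrete, here is a minimal sketch. It uses SQLite purely as a stand-in: the ES database is not assumed to be SQLite, and the table name `files`, the column names, and the helper functions are all hypothetical.

```python
import sqlite3


def open_index(with_hash_index: bool = True) -> sqlite3.Connection:
    """Create an in-memory file index with an optional (nullable) hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " hash TEXT"  # NULL until the file has been hashed
        ")"
    )
    if with_hash_index:
        # Partial index: only rows that already have a hash take up index space.
        conn.execute(
            "CREATE INDEX idx_files_hash ON files(hash) WHERE hash IS NOT NULL"
        )
    return conn


def store_hash(conn: sqlite3.Connection, path: str, digest: str) -> None:
    """Record a computed digest for a file that is already in the index."""
    conn.execute("UPDATE files SET hash = ? WHERE path = ?", (digest, path))


def files_with_hash(conn: sqlite3.Connection, digest: str) -> list:
    """Return all paths whose stored digest equals `digest`, sorted by path."""
    rows = conn.execute(
        "SELECT path FROM files WHERE hash = ? ORDER BY path", (digest,)
    )
    return [row[0] for row in rows]
```

Because the column is nullable, files that have not been hashed yet cost nothing beyond the empty field, and the index can be skipped entirely by passing `with_hash_index=False`.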
The xxHash algorithm is common in other file copy/verify tools (FastCopy, etc.), and its performance is far greater than that of the larger SHA algorithms. Those could still be offered as an optional, user-selectable choice, but that would add far greater complexity, database storage, processing overhead, etc.
This writeup is just one of many that outline the pretty shocking performance of the xxHash algorithm compared to the common SHA-based hashes, when weighing raw speed against cryptographic benefits (xxHash is a non-cryptographic hash):
https://cyan4973.github.io/xxHash/
Enabling this, perhaps as a new "utilize ES hashes, IF AVAILABLE" option for my duplicate file searches, would no doubt give more accurate results. Once the hashes are in the db, it is obviously viable to add basic comparison logic along the lines of "if all matches have computed hashes in db", to ensure apples-to-apples comparisons for duplicates.
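The "IF AVAILABLE" rule described above could look roughly like the sketch below. It assumes duplicate candidates are first grouped by size, which is only one of the comparison modes ES offers; the `Entry` type and `find_duplicates` function are hypothetical names, not anything from ES.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    path: str
    size: int
    digest: Optional[str]  # None means the file has not been hashed yet


def find_duplicates(
    entries: List[Entry], use_hashes_if_available: bool = True
) -> List[List[str]]:
    """Group duplicate candidates by size, refining by hash when possible."""
    by_size = defaultdict(list)
    for entry in entries:
        by_size[entry.size].append(entry)

    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        all_hashed = all(e.digest is not None for e in candidates)
        if use_hashes_if_available and all_hashed:
            # Apples-to-apples: every candidate has a stored hash, so compare on it.
            by_digest = defaultdict(list)
            for e in candidates:
                by_digest[e.digest].append(e.path)
            groups.extend(sorted(p) for p in by_digest.values() if len(p) > 1)
        else:
            # At least one hash is missing: fall back to the size-only match.
            groups.append(sorted(e.path for e in candidates))
    return sorted(groups)
```

The fallback branch is the design choice that matters here: a group is never narrowed using a partial set of hashes, so a file that simply has not been hashed yet is not silently dropped from the results.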
Keeping the hashing in an always-separate thread that defaults to low process priority and low I/O priority will help minimize overhead as much as possible. I believe many users (myself included) who already use ES for all file searching would gladly allow a LOW, IDLE-TIME-ONLY hashing thread to run once, for a considerable time, to perform the initial hashing of all files already present and scanned by ES. The ongoing LOW, IDLE-TIME-ONLY hashing of newly added files is something I would initially monitor to determine average completion times, and to see whether the low priority makes hashes available soon enough for my needs or whether I would need to increase the hashing priority - as would any user interested in this feature.
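A rough sketch of such an idle-time-only hashing thread follows. Several things here are assumptions: the `is_idle` callback stands in for whatever idle detection ES would use, the `IdleHasher` class and its attributes are hypothetical, and the third-party Python `xxhash` package is used only if it happens to be installed (otherwise `hashlib.blake2b` is swapped in as a placeholder, which is not xxHash). OS-level CPU and I/O priority settings are deliberately left out.

```python
import hashlib
import queue
import threading
import time

try:
    import xxhash  # third-party package, assumed installed; not in the stdlib

    def new_hasher():
        return xxhash.xxh64()
except ImportError:
    def new_hasher():
        # Placeholder only: BLAKE2b truncated to 64 bits, NOT xxHash.
        return hashlib.blake2b(digest_size=8)


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so memory use stays flat."""
    hasher = new_hasher()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


class IdleHasher(threading.Thread):
    """Background thread that hashes queued paths only while is_idle() is true."""

    def __init__(self, is_idle, pause_seconds=0.05):
        super().__init__(daemon=True)
        self.pending = queue.Queue()  # paths waiting to be hashed
        self.results = {}             # path -> hex digest, or None on read error
        self._idle_check = is_idle
        self._pause = pause_seconds
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            if not self._idle_check():
                time.sleep(self._pause)  # system is busy: back off and re-check
                continue
            try:
                path = self.pending.get(timeout=self._pause)
            except queue.Empty:
                continue
            try:
                self.results[path] = hash_file(path)
            except OSError:
                self.results[path] = None  # unreadable or vanished file
            finally:
                self.pending.task_done()

    def stop(self):
        self._stop_requested.set()
```

Newly indexed files would simply be pushed onto `pending`, so the initial full pass and the ongoing hashing of new files share the same path. Raising the priority, as discussed above, would then amount to loosening the `is_idle` check or shortening the pause.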
Hope this description/detail helps and that you'll consider this as a future feature of ES. Thanks!
-C