checksums calculated from primary data streams only?

Off-topic posts of interest to the "Everything" community.
jimspoon

What if two files have identical primary data streams, but one of them also has an alternate data stream (ADS)? Will a checksum generator show the same checksum for both files? I made a copy of a file and then added an ADS to one of the two copies. NirSoft HashMyFiles still shows the same checksum for each file, so I am guessing that HashMyFiles, at least, generates checksums based only on the data in the primary data stream. I don't know whether any checksum generators take ADS into account.

So if you see the same checksum for two files, their primary streams are almost certainly identical, but they may have different ADS. To identify files that are apparently identical but have different ADS, you'll have to look at other properties, such as the file modification date. Adding an ADS won't change the displayed file size (which shows the size of the primary stream only), but it will change the file modification date.
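A minimal Python sketch of the experiment above, assuming NTFS on Windows, where a path such as `copy.txt:note` opens an alternate data stream named `note` (the file and stream names are hypothetical). It hashes only what a plain `open()` returns, i.e. the primary stream, so both digests come out equal. On a non-NTFS file system the `:note` path simply creates a separate file, which leaves the primary-stream comparison unchanged.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """SHA-256 of the bytes a plain open() returns (on NTFS: the primary, unnamed stream)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def run_experiment(folder: str) -> tuple[str, str]:
    """Copy a file, add an ADS to the copy, and hash both files."""
    original = os.path.join(folder, "original.txt")
    copy = os.path.join(folder, "copy.txt")
    with open(original, "wb") as f:
        f.write(b"same primary data\n")
    shutil.copyfile(original, copy)
    # On NTFS, "copy.txt:note" is an alternate data stream of copy.txt
    # (hypothetical stream name). On other file systems it is a separate file.
    with open(copy + ":note", "wb") as f:
        f.write(b"extra data stored in an ADS\n")
    return sha256_of(original), sha256_of(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        a, b = run_experiment(folder)
        print(a == b)  # True: the ADS does not affect either digest
```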
horst.epp

No, tools that write an ADS can preserve the modification date.
At least, I do.
For example, a tagging system that changed the original file modification date would be almost useless.
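How a given tool keeps the date is not stated here; one plausible approach, sketched in Python as an assumption rather than as any plugin's actual code, is to record the file's timestamps, write the stream, and then set the timestamps back with `os.utime`. The stream name `tags` and the file name are hypothetical.

```python
import os
import tempfile


def write_stream_preserving_mtime(path: str, stream: str, data: bytes) -> None:
    """Write `data` to `path:stream`, then restore the file's access/modification times.

    Sketch only: an assumed approach, not the code of any particular tagging tool.
    """
    before = os.stat(path)
    # On NTFS this opens (or creates) the named stream; the primary data is untouched.
    with open(f"{path}:{stream}", "wb") as f:
        f.write(data)
    # Writing the stream may bump the file's LastWriteTime; put the old values back.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "photo.jpg")
        with open(path, "wb") as f:
            f.write(b"primary data")
        os.utime(path, (1_600_000_000, 1_600_000_000))  # a fixed, known timestamp
        write_stream_preserving_mtime(path, "tags", b"holiday;beach\n")
        print(os.stat(path).st_mtime_ns == 1_600_000_000 * 10**9)
```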
therube

The way I look at it is, a file is a file, & an ADS is an ADS.

And as most copy programs & other OSes are not ADS-aware...


I've posted about ADS (do a search), but don't recall offhand in what manner it applied...


And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.
horst.epp

therube wrote: Mon Oct 31, 2022 6:12 pm And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.
I disagree.
It is, for example, the main criterion for comparing backup versions with their sources.
therube

That may be - if you are to assume that content is the same - in which case date/time is just fine.
But the whole reason for a backup is to ensure that content is the same.

And content need not be the same, even if date & time are (i.e. data corruption or whatnot).

When Voidhash runs, it does not touch the directory's date/time. So if you were to assume that two directory structures are the same because they have the same date/time, well, they may not be (one may have the hash file in it & the other not - but date/time will not tell you that).

Many backup/sync programs do use date/time to determine "diff", but doing so does not ensure "exactness".
So there is a trade-off between the speed of date/time checks and the cost of performing some sort of actual comparison (hash/content checks) on sets of files.

And plenty do backup based on date/time (living with the impression that media does not go bad, or that files have not been modified - silently...).
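The trade-off can be shown with a small Python sketch (the function names are illustrative): a metadata check that looks only at size and modification time, as many backup/sync tools do, versus a full byte-by-byte comparison with `filecmp.cmp(..., shallow=False)`. The demo simulates silent corruption by changing one byte while keeping size and timestamp identical, which only the content check catches.

```python
import filecmp
import os
import tempfile


def same_by_metadata(a: str, b: str) -> bool:
    """Fast check: equal size and equal modification time."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns


def same_by_content(a: str, b: str) -> bool:
    """Slow but exact: compare every byte of both files."""
    return filecmp.cmp(a, b, shallow=False)


def make_file(path: str, data: bytes, mtime_s: int) -> None:
    """Create a file with the given content and a fixed modification time."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (mtime_s, mtime_s))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "source.dat")
        backup = os.path.join(folder, "backup.dat")
        make_file(source, b"important data", 1_600_000_000)
        # Same size and timestamp, one byte changed: a stand-in for silent corruption.
        make_file(backup, b"importent data", 1_600_000_000)
        print(same_by_metadata(source, backup))  # True
        print(same_by_content(source, backup))   # False
```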
raccoon

ADS can, and in my opinion should, be read in from the NTFS $MFT (fast) and treated as distinct objects (like files and folders). But Everything only collects attributes/properties/metadata, such as checksum hashes, on distinct objects, and ADS are currently treated only as a property or metadata, not as distinct objects.

In the future I would like to see Everything support alternate data streams as distinct objects. I would also like to see archive (zip;rar;7z) contents indexed as objects.
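To illustrate what treating ADS as distinct objects could look like, here is a small Python sketch that turns raw stream names into one object per stream. It assumes the names have already been obtained, for example from the Windows `FindFirstStreamW`/`FindNextStreamW` functions, which report them in the form `:streamname:$streamtype` (with `::$DATA` for the primary stream). It does not read the $MFT and is not how Everything works internally; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamObject:
    file_path: str
    stream_name: str  # "" for the primary (unnamed) data stream
    stream_type: str  # e.g. "$DATA"

    @property
    def display_path(self) -> str:
        """Path as it would appear in a listing: file.txt or file.txt:stream."""
        if not self.stream_name:
            return self.file_path
        return f"{self.file_path}:{self.stream_name}"


def parse_stream_names(file_path: str, raw_names: list[str]) -> list[StreamObject]:
    """Convert names of the form ':name:$TYPE' into StreamObject instances."""
    objects = []
    for raw in raw_names:
        if not raw.startswith(":"):
            raise ValueError(f"unexpected stream name: {raw!r}")
        # "::$DATA" -> ("", "$DATA"); ":tags:$DATA" -> ("tags", "$DATA")
        name, sep, stream_type = raw[1:].rpartition(":")
        if not sep:
            raise ValueError(f"unexpected stream name: {raw!r}")
        objects.append(StreamObject(file_path, name, stream_type))
    return objects


if __name__ == "__main__":
    names = ["::$DATA", ":tags:$DATA", ":Zone.Identifier:$DATA"]
    for obj in parse_stream_names(r"C:\Docs\report.pdf", names):
        print(obj.display_path, obj.stream_type)
```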
jimspoon

therube wrote: Mon Oct 31, 2022 6:12 pm And as far as date goes, it certainly cannot be used as any sort of equivalency - except for date itself.
What I meant was, if you have two files with identical hashes, but different modification dates, that could be a sign that the files are not in fact really identical, e.g. that an ADS was added to one of them. But Everything of course provides more direct ways to determine this.
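That heuristic can be sketched in Python: group files by a hash of their primary data and flag groups whose modification times differ. Since a tool may write an ADS without changing the date, and copies can have different dates for unrelated reasons, a flagged group is only a hint worth a closer look, not proof that an ADS was added. The function names and demo files are illustrative.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path: str) -> str:
    """Hash of the primary data (what a plain open() reads)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_hash_different_dates(paths: list[str]) -> list[list[str]]:
    """Return groups of files with identical hashes but differing modification times."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(path)
    flagged = []
    for members in groups.values():
        mtimes = {os.stat(p).st_mtime_ns for p in members}
        if len(members) > 1 and len(mtimes) > 1:
            flagged.append(sorted(members))
    return flagged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for name, data, mtime in [("a.txt", b"same", 1_600_000_000),
                                  ("b.txt", b"same", 1_700_000_000),
                                  ("c.txt", b"other", 1_600_000_000)]:
            path = os.path.join(folder, name)
            with open(path, "wb") as f:
                f.write(data)
            os.utime(path, (mtime, mtime))
            paths.append(path)
        for group in same_hash_different_dates(paths):
            print("check:", [os.path.basename(p) for p in group])
```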
jimspoon

horst.epp wrote: Mon Oct 31, 2022 6:10 pm No, tools that write an ADS can preserve the modification date.
At least, I do.
For example, a tagging system that changed the original file modification date would be almost useless.
That's good to know. I guess it depends entirely on the tool being used. When I used 7-Zip to add an ADS to a file, the file's modification date was changed. As an experiment, I just used PowerShell's Set-Content cmdlet to add an ADS to a file, and the file's LastWriteTime (shown by the Get-ChildItem command) was changed. What tool do you use to write an ADS?
jimspoon

raccoon wrote: Mon Oct 31, 2022 7:29 pm ADS can, and in my opinion should, be read in from the NTFS $MFT (fast) and treated as distinct objects (like files and folders). But Everything only collects attributes/properties/metadata, such as checksum hashes, on distinct objects, and ADS are currently treated only as a property or metadata, not as distinct objects.

In the future I would like to see Everything support alternate data streams as distinct objects. I would also like to see archive (zip;rar;7z) contents indexed as objects.
I'd like to see that too! The Files file manager ( https://github.com/files-community/Files ) gives you the option to view ADS alongside their containing files in the same directory listing, and so does the V File Viewer. The 7-Zip File Manager lets you navigate from a file down to a listing of its ADS.

I think the best solution would let us (optionally) view ADS as distinct objects AND let metadata contained in the ADS be viewed in columns for the primary stream.
void
Developer

I will consider adding properties to calculate the checksum of the alternate data streams, and of the data + alternate data streams.

Thank you for the suggestion.
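A "data + alternate data streams" checksum could be approximated outside Everything with a Python sketch like the one below. It assumes the caller already knows the stream names (a hypothetical `stream_names` parameter; discovering them on Windows needs an API such as `FindFirstStreamW`), and it frames each stream with its name and length before hashing, a design choice of this sketch so that moving bytes from one stream into another changes the result. It is not Everything's implementation.

```python
import hashlib
import os
import tempfile


def hash_with_streams(path: str, stream_names: list[str]) -> str:
    """SHA-256 over the primary stream plus the named streams, in sorted order."""
    h = hashlib.sha256()
    for name in [""] + sorted(stream_names):  # "" stands for the primary stream
        target = path if name == "" else f"{path}:{name}"
        with open(target, "rb") as f:
            data = f.read()
        # Frame each stream: name, NUL separator, 8-byte length, then the bytes.
        h.update(name.encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        a = os.path.join(folder, "a.txt")
        b = os.path.join(folder, "b.txt")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"same primary data\n")
        with open(b + ":tags", "wb") as f:  # on NTFS, an ADS of b.txt
            f.write(b"red;blue\n")
        print(hash_with_streams(a, []) == hash_with_streams(b, []))        # True
        print(hash_with_streams(a, []) == hash_with_streams(b, ["tags"]))  # False
```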
horst.epp

jimspoon wrote: Tue Nov 01, 2022 12:35 am
horst.epp wrote: Mon Oct 31, 2022 6:10 pm No, tools that write an ADS can preserve the modification date.
At least, I do.
For example, a tagging system that changed the original file modification date would be almost useless.
That's good to know. I guess it depends entirely on the tool being used. When I used 7-Zip to add an ADS to a file, the file's modification date was changed. As an experiment, I just used PowerShell's Set-Content cmdlet to add an ADS to a file, and the file's LastWriteTime (shown by the Get-ChildItem command) was changed. What tool do you use to write an ADS?
A plugin in Total Commander and a script in XYplorer.
This lets me have tags available through the file system.
These tags are also indexed by Everything and can be searched quickly from both file managers.