How to find many names (partial match of different names) of music files on disk?

General discussion related to "Everything".
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

How to find 1000 names (partial match of different names) of music files on disk?
I tried a very, very long regex, but it failed.
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by therube »

Explain?

In any case, I'd think a filelist would be the thing to use.
If you have a list of names (partial names), you can copy & paste that into a filelist & ...
void
Developer
Posts: 16680
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by void »

Please give an example of the filenames and what regex patterns you have tried.
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

But this regular expression is more than 10000000 characters long and consists entirely of partial filenames.
Example:
(filename1|filename2|filename10000)
I want to find all similar filenames: files that are identical but have slightly different names.
void
Developer
Posts: 16680
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by void »

Thank you for bringing this issue to my attention.

regex in Everything is currently limited to 65536 characters.

I will increase this to 1073741824 characters for the next alpha update.
(link size of 4)



For now, you'll have to break your search into 64k chunks and perform multiple searches.

Searching with DFA lists (really long OR lists) is on my TODO list.
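One way to prepare the 64k chunks mentioned above is to build the alternations with a small script. The following is only a sketch: the helper name chunk_alternations and the sample names are hypothetical, the escaping is Python's re.escape (which may not match what Everything expects, for example for spaces), and it assumes the limit counts characters as stated in the post. A single name longer than the limit would still produce an oversized pattern.

```python
import re

REGEX_LIMIT = 65536  # current limit quoted in the post above, in characters


def chunk_alternations(names, limit=REGEX_LIMIT):
    """Split partial names into several (a|b|c) patterns, each within limit."""
    patterns = []
    current = []
    current_len = 2  # the surrounding parentheses
    for name in names:
        escaped = re.escape(name)  # Python-style escaping; an assumption for Everything
        extra = len(escaped) + (1 if current else 0)  # +1 for the "|" separator
        if current and current_len + extra > limit:
            # The next name would overflow the chunk, so close it and start a new one.
            patterns.append("(" + "|".join(current) + ")")
            current, current_len = [], 2
            extra = len(escaped)
        current.append(escaped)
        current_len += extra
    if current:
        patterns.append("(" + "|".join(current) + ")")
    return patterns


# Hypothetical list of 10000 partial names, shaped like the example above.
names = [f"filename{i}" for i in range(1, 10001)]
patterns = chunk_alternations(names)
print(len(patterns), max(len(p) for p in patterns))
```

Each resulting pattern would then be used in its own search.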
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

This is unworkable anyway, because I don't know how to match the file names, e.g.
Artist - File name (Some second artist remix, etc.).
OR
ID-Artist - File name (Some second artist remix, etc.).
Besides, you would have to specify whether the parenthesis comes with or without a space or other data. This is too complicated for me; it is only possible for advanced experts who know how to do it.
I have been agonizing over this for many years without success.
Some tracks exist in both low and high bitrate, but some exist only in low bitrate, so I don't want to delete those. The copies have similar names, and I can't tell which ones are the correct ones. This will probably never work.
void
Developer
Posts: 16680
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by void »

Matching filenames will be difficult if they differ greatly.

You might need to look for duplicated acoustic fingerprints and then sort by bitrate.
Everything doesn't have an Acoustic fingerprint property yet.
I've put this on my TODO list.
(There might be a third party property system that does this already?)



For now, please try the following search:

add-column:regmatch1;audio-bitrate regex:(?:\d*-?)([^-]*" - "[^(]*)" "\( dupe:regmatch1

Check the text shown in the Regular Expression Match 1 column.
It may require some adjusting, but it should show the Artist - File name.

From there you can do a secondary sort on the bitrate:

add-column:regmatch1;audio-bitrate regex:(?:\d*-?)([^-]*" - "[^(]*)" "\( dupe:regmatch1 sort:regmatch1;audio-bitrate

This will show files with duplicated Artist - File name in groups with the highest bitrate shown at the top.
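To see what that regex captures before running it in Everything, the same grouping can be tried on a few names in Python. This is a rough sketch, not Everything's implementation: it assumes the double quotes in the search only protect the spaces (so they are dropped here), it uses Python's re engine, and the filenames and bitrates are made up for illustration.

```python
import re
from collections import defaultdict

# The regex from the search above with the double quotes removed
# (assumption: in Everything they only protect the spaces).
PATTERN = re.compile(r"(?:\d*-?)([^-]* - [^(]*) \(")

# Hypothetical (filename, bitrate in kbps) pairs.
files = [
    ("Artist - File name (Some remix).mp3", 96),
    ("1023-Artist - File name (Some remix).mp3", 320),
    ("Other - Song (Edit).mp3", 96),
]

# Group files by the captured "Artist - File name" text.
groups = defaultdict(list)
for name, bitrate in files:
    match = PATTERN.search(name)
    if match:
        groups[match.group(1)].append((bitrate, name))

# Keep only keys seen more than once, highest bitrate first in each group.
duplicates = {
    key: sorted(items, reverse=True)
    for key, items in groups.items()
    if len(items) > 1
}
print(duplicates)
```

If a name produces no match here, it is likely to be missing from the Everything results as well, which may help when adjusting the pattern.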
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

I turned on Regex and typed these expressions, but it shows 0 objects.

I thought you could get around this by partially matching just the title and the name of a remix artist, e.g.


title(artist
e.g. Chinese/English/Russian name(DJ P
Or with a space next to the name
I will be searching by the single name of a well-known artist


And likewise for any other name. Important: it must support Unicode!
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by therube »

Debugger wrote: I want to find all similar filenames, which are identical files but have a slightly different name.
Explain?

Identical in what way? Both file size & content are exactly the same?
Then compare hashes.

Same "music" (same musical data), but possibly different tags?
hashmedia.bat might do it.
(Might have to adjust it, outputting results to individual files, then comparing those files for dups.)
Or there are duplicate file finders that can deal with various types of audio dups. (AllDup is one.)
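For the first case (size and content exactly the same), comparing hashes can be sketched in a few lines of Python. This is a minimal illustration, not hashmedia.bat or AllDup: the function names are hypothetical, SHA-256 is just one possible hash, and it only finds byte-for-byte identical files, so copies with different tags or bitrates will not be grouped.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_files(root):
    """Return lists of paths under root whose content is byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Small demonstration with throwaway files (made-up names and content).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.mp3").write_bytes(b"same audio")
    (Path(tmp) / "b copy.mp3").write_bytes(b"same audio")
    (Path(tmp) / "c.mp3").write_bytes(b"other audio")
    groups = identical_files(tmp)
    found = sorted(p.name for p in groups[0])
print(found)
```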
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

E.g.
Artist 1 - Track name (Artist 2 remix/mix etc.). The text in the brackets may contain different additional names, so the differences may be in the brackets, but Artist 2 is always the same.
Files may be repeated, with some of better quality and some worse, e.g. 96kb and 320kb,
but some exist only at 96kb.
So is there no tool in the 21st century that will do this?
People have come up with many tools (a big list), but each one always shows 0 objects, zero duplicates. Even one different letter in the name means that nothing is found. This is a failure in all the tools in the whole world...
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by therube »

Explain the file names?
Real names would help.

What would be the search you're using or looking to use to find said file names?

A Bit Rate (Property) column can be added.
(Needs Everything 1.5.)
ChrisGreaves
Posts: 684
Joined: Wed Jan 05, 2022 9:29 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by ChrisGreaves »

therube wrote: Mon Mar 25, 2024 5:50 pm Identical in what way? Both file size & content are exactly the same?
I have thought about this for years. Off and On. Especially in terms of content.
Along the way I have learned a bit about metadata and the "packets" of sound in an MP3 file.

MUSIC is an auditory experience, and the pressure waves are converted to electrical signals in a human brain.

If, like me, you have a jukebox playing 24/7 then you will abhor duplicates.
If I trim the 2 minutes of applause from the start of an opera, is the MP3 different from another track as considered by me? No.
If I trim the 7 minutes of applause from the end of an opera, is the MP3 different from another track as considered by me? No.
I have a method for locating a packet of sound in the middle of one MP3 and searching for that same packet in a different MP3; that usually tells me that one track fathered the other.
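The post does not describe that method, so the following is only a guess at the general idea: take a block of bytes from the middle of one file and look for it anywhere in the other. The function name and the 4096-byte probe size are assumptions, it ignores MP3 frame boundaries, and it can only work when the audio data was copied or trimmed rather than re-encoded.

```python
import random


def shares_middle_chunk(data_a: bytes, data_b: bytes, size: int = 4096) -> bool:
    """Return True if a chunk from the middle of data_a occurs anywhere in data_b."""
    if len(data_a) < size:
        return False
    start = (len(data_a) - size) // 2
    probe = data_a[start:start + size]
    return probe in data_b


# Synthetic stand-ins for file contents (not real MP3 data).
rng = random.Random(0)
body = bytes(rng.getrandbits(8) for _ in range(20000))
other = bytes(rng.getrandbits(8) for _ in range(20000))
applause = b"\x00" * 5000

full_track = applause + body  # track with leading applause
trimmed_track = body          # same track with the applause removed

print(shares_middle_chunk(trimmed_track, full_track))  # trimmed copy of the same audio
print(shares_middle_chunk(trimmed_track, other))       # unrelated data
```

With real files, the byte strings would come from something like Path(...).read_bytes().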

But what about "Grand Chœur Dialogué - Gigout "? My opinion is that you can never have too many copies of this track.
Especially if they are versions by different performers.

And right now I am rather fond of "The Beatles - Golden Slumbers-Carry That Weight-The End" and have several copies of it in my jukebox. But in six months' time those copies will be considered duplicates by me, and I would like to weed them out.

So I think that the definition of DUPLICATE, especially for audio files, is dependent on time, as well as content.

Today weeding out duplicates by name, size, and total content (SHA256 ???) is effective and removes the obvious weeds, but since a weed is a plant that is growing where it is not wanted, even "Jingle bells" is a weed if it occurs only once in 19,657 items.
Cheers, Chris
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

Example: the main artist is the same, the song title is the same, and the remix artist is the same; the two files differ only in the name.
I tried all the popular tools, and no tool on the market EVER finds even 1 duplicate!
So I perpetually have duplicates!
Even a space makes a difference, even a page address in a file makes a difference, as does any one character or more. Even one extra letter in a duplicate makes a difference! That's why every tool fails and shows 0 objects found.
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by therube »

"Jethro Tull - Thick As A Brick.mp3"
"Jethro Tull - Thick As A Brick (live).mp3"
void
Developer
Posts: 16680
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by void »

With those filenames as an example, you could do the following:

addcolumn:regmatch1 regex:^(.*?)(\(.*)?\.[^.]*$ dupe:regmatch1

Both these files will show in the results with the regmatch1 column showing "Jethro Tull - Thick As A Brick"
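The capture can be checked outside Everything with Python's re module. This is a sketch under the assumption that Everything's regex engine behaves the same way for this pattern. In Python, the second name keeps a trailing space in group 1; whether dupe:regmatch1 ignores that space is not confirmed in this thread. The TRIMMED variant with \s* is a hypothetical adjustment, not part of the original search.

```python
import re

# The regex from the search above.
ORIGINAL = re.compile(r"^(.*?)(\(.*)?\.[^.]*$")
# Hypothetical variant: \s* swallows any spaces before the bracket.
TRIMMED = re.compile(r"^(.*?)\s*(\(.*)?\.[^.]*$")

names = [
    "Jethro Tull - Thick As A Brick.mp3",
    "Jethro Tull - Thick As A Brick (live).mp3",
]

for name in names:
    original_key = ORIGINAL.match(name).group(1)
    trimmed_key = TRIMMED.match(name).group(1)
    # repr() makes a trailing space visible.
    print(repr(original_key), repr(trimmed_key))
```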
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

This doesn't work properly, it doesn't find duplicates by specific folders, and it doesn't exclude artist 2 in brackets.

Artist and title are not enough information, as a title can have countless remixes by other artists of a song.

I don't have a single song with a single title, but they all have other artists remixing those songs.

Example:
Artist - Title (Artist remix 1)
Artist - Title (Artist Remix 2)
Artist - Title (Artist Remix 3)
...
Artist - Title (Artist Remix20)
void
Developer
Posts: 16680
Joined: Fri Oct 16, 2009 11:31 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by void »

Please provide filenames.
I cannot guess your filenames.


Debugger wrote: This doesn't work properly, it doesn't find duplicates by specific folders
What happens?
What results are shown?
What results are missing?
To limit your results to folders only, include the following in your search:

folder:



Debugger wrote: and it doesn't exclude artist 2 in brackets.
What is artist 2 in brackets?


Debugger wrote: Artist and title are not enough information, as a title can have countless remixes by other artists of a song.

I don't have a single song with a single title, but they all have other artists remixing those songs.

Example:
Artist - Title (Artist remix 1)
Artist - Title (Artist Remix 2)
Artist - Title (Artist Remix 3)
...
Artist - Title (Artist Remix20)
What are you trying to extract here? Just the artist?



Extract just the artist and find duplicates, from folder names:

folder: addcolumn:regmatch1 regex:^(.*?)\s-\s dupe:regmatch1




Extract the artist and title and find duplicates, from folder names:

folder: addcolumn:regmatch1 regex:^(.*?)\s\( dupe:regmatch1
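What the two regexes capture can be previewed in Python on the placeholder names from the quoted example. This is only an illustration with Python's re module, assuming Everything's engine treats these patterns the same way.

```python
import re

ARTIST = re.compile(r"^(.*?)\s-\s")       # text before the first " - "
ARTIST_TITLE = re.compile(r"^(.*?)\s\(")  # text before the first " ("

names = [
    "Artist - Title (Artist remix 1)",
    "Artist - Title (Artist Remix20)",
]

artist_keys = [ARTIST.match(name).group(1) for name in names]
title_keys = [ARTIST_TITLE.match(name).group(1) for name in names]
print(artist_keys)
print(title_keys)
```

Both names produce the same key with either regex, so different remixes of one title would be grouped together as duplicates; the remix artist in the brackets is not part of the key.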
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

I have multiple file paths. But it is supposed to compare only partial matches in file name strings e.g.



G:\1023-artis - title (artist2
h:\artis - title (artist2
Bad Romance (Starsmith Remix)
Bad Romance (Grum Remix)

By2 - 爱丫(Mix泽仔 Etro Mix国语).mp3
G:\【1独家】\By2 - 爱丫(D九 Electro Mix国女).mp3
S:\By2 - 爱丫(j衍 Electro Mix 女).mp3
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: How to find many names (partial match of different names) of music files on disk?

Post by Debugger »

I have now found this solution.

I list all the names from multiple paths, one per line, and first extract the title plus what follows the opening parenthesis. Then I select the duplicates and paste the result into Everything, together with a path where there might be duplicates, and it finds them.

Find: ‘title(artist’

([^\(\s]+)\(([^)]+?)(?:\s|$)


Find Duplicate:

^(.*)[\r\n]*(\r?\n\1)+$

Extract Options: Display Matched Strings Only

Found: 526 Items 8+GB
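The same two steps can also be sketched in Python. This is not the text-editor workflow described above: it reuses the 'title(artist' regex as posted but replaces the "Find Duplicate" regex with a plain dictionary grouping. The paths are made up, and since group 2 stops at the first whitespace, a remix artist such as "DJ P" is reduced to "DJ" in the key.

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath

# The 'title(artist' regex from the post above.
KEY = re.compile(r"([^\(\s]+)\(([^)]+?)(?:\s|$)")

# Hypothetical paths on several drives.
paths = [
    r"G:\one\By2 - 爱丫(DJ P Electro Mix).mp3",
    r"S:\By2 - 爱丫(DJ P Mix).mp3",
    r"H:\By2 - 爱丫(Other Remix).mp3",
]

groups = defaultdict(list)
for raw in paths:
    name = PureWindowsPath(raw).name  # compare file names only, not folders
    match = KEY.search(name)
    if match:
        groups[f"{match.group(1)}({match.group(2)}"].append(raw)

# Keys that occur in more than one file are duplicate candidates.
duplicates = {key: found for key, found in groups.items() if len(found) > 1}
print(duplicates)
```

Note that the regex only matches when the bracket follows the title without a space, so a name like "Bad Romance (Grum Remix)" produces no key.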
incans
Posts: 4
Joined: Sun Nov 03, 2024 3:48 pm

Re: How to find many names (partial match of different names) of music files on disk?

Post by incans »

Pretty certain I have seen this question crop up in another forum? It seems to me what you are looking for is fuzzy matching on file names. There are such tools around (e.g. dupeGuru) but I can't claim to have used them. I have a feeling in that other thread you said you had tried several of these tools and not had good results.
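As a rough idea of what fuzzy matching on file names means, here is a small sketch using Python's standard difflib module. It is not how dupeGuru works internally; the function name similar_pairs and the 0.85 threshold are arbitrary assumptions, and comparing every pair becomes slow for large collections.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Names taken from examples earlier in the thread.
names = [
    "Jethro Tull - Thick As A Brick",
    "Jethro Tull - Thick As A Brick (live)",
    "Bad Romance (Grum Remix)",
]
pairs = similar_pairs(names)
print(pairs)
```

The threshold needs tuning per collection: set it too low and different remixes of the same title are reported as matches, too high and renamed copies are missed.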