soundex

Post by **avi** » Fri Jun 21, 2024 9:41 am

1. This option is very useful! Could you please add a setting to make it the default?
2. Could you please add support for letters in the Hebrew language?

Thanks!

Post by **void** » Fri Jun 21, 2024 10:38 am

avi wrote:
1. This option is very useful! Could you please add a setting to make it the default?

I will look into an option to make this easier.

For now, please try creating the following soundex filter:

1. In Everything, from the Search menu, click Add to filters...
2. Change the Name to: Soundex
3. Change the Search to: soundex:$param:
4. Click OK.

Filters can be activated from the Search menu, the Filter bar (View -> Filters), by right-clicking the status bar, or with a filter macro or filter keyboard shortcut.

The limitation is that you can only have one search term when this filter is active (this search modifier was never really designed to be used with multiple terms).

soundex:
https://en.wikipedia.org/wiki/Soundex

avi wrote:
2. Could you please add support for letters in the Hebrew language?

This is outside the scope of soundex.
Only A-Z is supported.
I will look into other algorithms that support Hebrew.
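For readers unfamiliar with the algorithm, here is a minimal Python sketch of classic American Soundex as described in the Wikipedia article linked above. It is an illustration only, not Everything's own implementation, whose handling of edge cases may differ. Note how every character outside A-Z is simply dropped, which is why a Hebrew word produces no code at all.

```python
# Letter-to-digit table for classic American Soundex.
# Vowels (and y) have no code; h and w are skipped entirely.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or "" if name has no A-Z letters."""
    # Only ASCII a-z take part; everything else (including Hebrew) is ignored.
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            # h and w do not separate letters that share a code.
            continue
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
            if len(result) == 4:
                break
        # A vowel resets `previous`, so a repeated code after a vowel is kept.
        previous = code
    # Pad short codes with zeros, e.g. "L" -> "L000".
    return result.ljust(4, "0")


if __name__ == "__main__":
    for word in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(word, soundex(word))
```

With this sketch, "Robert" and "Rupert" both give R163, "Ashcraft" gives A261 and "Pfister" gives P236, while "שולחן" gives an empty string.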

Thank you for the suggestions.

Post by **avi** » Sun Jun 23, 2024 4:31 pm

Thank you for your detailed reply!

void wrote:
I will look into other algorithms that support Hebrew.

If possible, an option to make it the default (perhaps a setting under "Advanced"?) would be great!
This is especially necessary in Hebrew, because some people write the letters "י" and "ו" and others omit them. For example, some write "ביאור" and others "באור"; some write "שלחן" and others "שולחן"; and there are many more. (You can search for "ב*אור" and "ש*לחן", but because there are so many such words, a default option would be more helpful.)
I also use a program called "Fluent Search" a lot, with Everything chosen as its search engine, so such a default setting in Everything would be useful there too.

I would also like to take this opportunity to personally thank you for your software; I use it a lot!

Post by **therube** » Wed Jun 26, 2024 8:00 pm

I wonder whether something on the Diacritics side could work?
Or even a filter that filtered out vowels?
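therube's two ideas could be combined into a crude matching key for avi's examples. The Python sketch below (the names `hebrew_key` and `matches` are hypothetical, and this is not an Everything feature) strips combining marks such as niqqud via Unicode decomposition, then drops yod (י) and vav (ו), the letters avi says some writers omit. Both the query and the filename are reduced the same way before comparing. It deliberately over-matches: ו is also a consonant and the prefix "and", so some unrelated words will collide.

```python
import unicodedata

# Yod (U+05D9) and vav (U+05D5): the letters some writers include and others omit.
OPTIONAL_LETTERS = {"\u05d9", "\u05d5"}


def hebrew_key(text: str) -> str:
    """Hypothetical matching key: no niqqud, no yod or vav."""
    # NFD splits precomposed characters so diacritics become separate marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (category "Mn"), which covers Hebrew niqqud.
    no_marks = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Drop the optional letters so spelling variants collapse together.
    return "".join(c for c in no_marks if c not in OPTIONAL_LETTERS)


def matches(query: str, filename: str) -> bool:
    """True if the query's key occurs anywhere in the filename's key."""
    return hebrew_key(query) in hebrew_key(filename)


if __name__ == "__main__":
    print(matches("שלחן", "שולחן ערוך.pdf"))
```

Here "ביאור" and "באור" both reduce to "באר", and "שולחן" and "שלחן" both reduce to "שלחן".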

Post by **meteorquake** » Wed Jun 26, 2024 9:04 pm

The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned, which in essence simply counts how many of the Find string's letter pairs are present in the Target. I'm sure there may be better ones, as I haven't researched what algorithms exist, but I've found it works very well.
d
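A minimal Python sketch of the letter-pair count described above, assuming the score is the fraction of the Find string's adjacent letter pairs that also occur somewhere in the Target. The exact scoring isn't given in the thread, so the function names and the case-folding are assumptions. Because it only compares characters, it works unchanged on Hebrew.

```python
def letter_pairs(text: str) -> list[str]:
    """Adjacent two-character pairs, case-folded: "Abc" -> ["ab", "bc"]."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def pair_score(find: str, target: str) -> float:
    """Fraction of the Find string's letter pairs present in the Target (0.0 to 1.0)."""
    find_pairs = letter_pairs(find)
    if not find_pairs:
        return 0.0
    # A set makes each membership test cheap; the Target's pair order is irrelevant.
    target_pairs = set(letter_pairs(target))
    hits = sum(1 for pair in find_pairs if pair in target_pairs)
    return hits / len(find_pairs)


if __name__ == "__main__":
    print(pair_score("sondex", "soundex"))
    print(pair_score("באור", "ביאור"))
```

With this scoring, "sondex" against "soundex" finds 4 of its 5 pairs (0.8), and "באור" against "ביאור" finds 2 of 3.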

Post by **ChrisGreaves** » Thu Jun 27, 2024 10:30 am

meteorquake wrote: Wed Jun 26, 2024 9:04 pm
The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned that in essence simply counts up the Find's letter pairs present in the Target; ...

Hi meteorquake. I agree that a language-independent algorithm would be good.
In particular, a mobile algorithm (which a language-independent algorithm satisfies) is good.

I am thinking of a spell-checker that uses symbol pairs to match likely suggestions for corrections. A bonus would/should be the ability to include longer strings, for example phrases built from hyphenated or space-separated strings.
Cheers, Chris
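One way a symbol-pair spell-checker could rank suggestions is sketched below in Python. It swaps in the Dice coefficient over sets of pairs (a common symmetric variant, not something proposed in the thread) and treats hyphens and runs of spaces as a single space so hyphenated or space-separated phrases compare equally. The `cutoff` and `limit` parameters are hypothetical tuning knobs.

```python
def bigrams(text: str) -> set[str]:
    """Set of adjacent character pairs after folding case, hyphens and extra spaces."""
    s = " ".join(text.lower().replace("-", " ").split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * shared pairs / total pairs, from 0.0 to 1.0."""
    pa, pb = bigrams(a), bigrams(b)
    if not pa or not pb:
        return 0.0
    return 2 * len(pa & pb) / (len(pa) + len(pb))


def suggest(word: str, dictionary: list[str], limit: int = 3,
            cutoff: float = 0.5) -> list[str]:
    """Up to `limit` dictionary entries scoring at least `cutoff`, best first."""
    scored = [(dice(word, entry), entry) for entry in dictionary]
    scored = [(score, entry) for score, entry in scored if score >= cutoff]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry for _, entry in scored[:limit]]


if __name__ == "__main__":
    print(suggest("helo", ["hello", "help", "world"]))
```

Here "helo" shares three pairs with "hello" (score about 0.86) and two with "help" (about 0.67), so the suggestions come back as "hello", then "help".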

Post by **meteorquake** » Thu Jun 27, 2024 2:41 pm

It should work all the same. However, if you're doing that and looking for total matching, you could speed it up by only searching words whose length is within a certain percentage of the Find phrase's length. For example, you wouldn't expect "hello" to match a 12-letter sequence. You'd just be looking at character symbols, and you might or might not want to ignore everything else completely (so "It was highly-priced" would become "itwashighlypriced") or treat each run of non-characters as a single space.
d
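The two speed-ups just described might look like this in Python: a normaliser that either drops every non-alphanumeric character or collapses each run of them into one space, and a length check that skips candidates whose length falls outside a chosen fraction of the Find phrase. The function names and the 25% default `tolerance` are arbitrary assumptions; `str.isalnum` is used so that non-Latin scripts survive normalisation.

```python
def normalize(text: str, keep_spaces: bool = False) -> str:
    """Lower-case text, keeping letters/digits; other runs vanish or become one space."""
    out = []
    for ch in text.lower():
        if ch.isalnum():
            out.append(ch)
        elif keep_spaces and out and out[-1] != " ":
            # Collapse any run of punctuation or whitespace into a single space.
            out.append(" ")
    return "".join(out).strip()


def within_length(find: str, candidate: str, tolerance: float = 0.25) -> bool:
    """True if candidate's length is within `tolerance` (a fraction) of find's length."""
    return abs(len(candidate) - len(find)) <= tolerance * len(find)


def candidates(find: str, names: list[str], tolerance: float = 0.25) -> list[str]:
    """Names worth scoring at all: normalized length close to the normalized Find."""
    key = normalize(find)
    return [n for n in names if within_length(key, normalize(n), tolerance)]


if __name__ == "__main__":
    print(normalize("It was highly-priced"))
    print(normalize("It was highly-priced", keep_spaces=True))
    print(candidates("hello", ["Helo", "hello world!!", "HELLO"]))
```

Only names that pass the cheap length check would then go on to the more expensive similarity scoring.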

Post by **ChrisGreaves** » Thu Jun 27, 2024 7:20 pm

meteorquake wrote: Thu Jun 27, 2024 2:41 pm
... only searching words whose length is within a certain % of the Find phrase.

Quite so! And those values would be among the parameters of my application.
Thanks, Chris

Post by **void** » Thu Jun 27, 2024 10:42 pm

Sounds like you want Levenshtein distance.
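For reference, Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a standard two-row dynamic-programming version in Python, given as an illustration only, not as the implementation planned for Everything:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # Distance is symmetric, so make b the shorter string to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the previous prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("באור", "ביאור"))
```

The classic "kitten" to "sitting" example needs 3 edits, and avi's "באור" to "ביאור" needs 1 (a single inserted letter).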

I have plans to add support for this.
However, it will require sorting results by relevance.
There are too many unwanted results without a relevance sort.
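To illustrate what a relevance sort involves, here is a Python sketch built on the standard library's `difflib.get_close_matches`, which scores every candidate, drops those below a cutoff and returns the best first. Note that it uses `SequenceMatcher`'s similarity ratio rather than Levenshtein distance, so it stands in for the idea only; `rank_by_relevance`, `limit` and `cutoff` are illustrative names and values.

```python
import difflib


def rank_by_relevance(query: str, names: list[str], limit: int = 5,
                      cutoff: float = 0.6) -> list[str]:
    """Best-matching names first; anything scoring below `cutoff` is dropped."""
    # Compare case-insensitively, but remember the original spelling of each name.
    lowered = {}
    for name in names:
        lowered.setdefault(name.lower(), name)
    hits = difflib.get_close_matches(query.lower(), list(lowered),
                                     n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    print(rank_by_relevance("soundex", ["readme.md", "sound.mp3", "Soundex.txt"]))
```

Without the cutoff and the ordering, every weak partial match would be listed with equal prominence, which is the flood of unwanted results described above.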

Post by **ChrisGreaves** » Fri Jun 28, 2024 10:18 am

void wrote: Thu Jun 27, 2024 10:42 pm
Sounds like you want Levenshtein distance.

Thank you, Void.
The section "Upper and lower bounds" suggests a way to do a preliminary filter of potential matches. Where a bound is given as, e.g., "It is at most the length of the longer string", one could sort the strings by ascending length and treat the longest strings last, as they have a lower probability of producing a match.
I haven't thought about that in detail; just tucked it away for now.
Cheers, Chris
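The bounds from that section can serve as a cheap pre-filter before any distance is computed. Since the Levenshtein distance is never smaller than the difference in the two lengths, a name whose length differs from the query by more than the allowed number of edits can be discarded without further work. The Python sketch below uses hypothetical helper names and orders survivors by length difference rather than raw length, a small variation on the ascending-length idea; the upper bound is included for completeness only.

```python
def lower_bound(a: str, b: str) -> int:
    """Levenshtein distance is at least the difference in lengths."""
    return abs(len(a) - len(b))


def upper_bound(a: str, b: str) -> int:
    """...and at most the length of the longer string."""
    return max(len(a), len(b))


def prefilter(query: str, names: list[str], max_edits: int) -> list[str]:
    """Keep names that could be within max_edits edits; closest lengths first."""
    survivors = [n for n in names if lower_bound(query, n) <= max_edits]
    # Names nearest in length are the most promising, so they are tried first.
    survivors.sort(key=lambda n: (lower_bound(query, n), n))
    return survivors


if __name__ == "__main__":
    print(prefilter("hello", ["hello world", "helo", "hi", "hello"], 2))
```

Only the survivors would then need a full distance calculation.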

Post by **meteorquake** » Sat Jun 29, 2024 4:56 pm

Looking at that article, you would need to solve word rearrangement.
For example, "Edinburgh Tasks" should be viewed as almost matching "Tasks Edinburgh".
One of the reasons I adopted a simple word pair count for my own searching is that rearranged blocks come out with a score close to the original, as do small drops, changes and insertions. The price you pay is that you can get some surprises included, but I've always thought it's better to have a few false inclusions than to have some good ones not showing.
There may be some very optimised processes for tackling block rearrangement, although if a routine is made too intricate it runs the risk of being slower than desirable... possibly not an issue for 500,000 filenames with something written in Assembly.
d
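One common, simple way to make word order irrelevant (a different technique from the pair count described above, shown as a sketch only) is to sort the words of both strings before comparing them, so "Edinburgh Tasks" and "Tasks Edinburgh" reduce to the same key. The function names are hypothetical. It only handles whole-word reordering, not typos within words, so in practice it would be combined with a fuzzy score.

```python
def token_sort_key(text: str) -> str:
    """Lower-case the text, split on whitespace, and rejoin the words in sorted order."""
    return " ".join(sorted(text.lower().split()))


def same_words_any_order(a: str, b: str) -> bool:
    """True if a and b contain the same words, ignoring order, case and spacing."""
    return token_sort_key(a) == token_sort_key(b)


if __name__ == "__main__":
    print(token_sort_key("Tasks Edinburgh"))
    print(same_words_any_order("Edinburgh Tasks", "Tasks Edinburgh"))
```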

voidtools forum
