Use LZ4 instead of BZ2 to compress the saved database
Using LZ4 would balance speed and space savings.
At present, it may take more than 10 seconds to compress (BZIP2) and write the database (Everything.db) when the program is closed.
Re: Use LZ4 instead of BZ2 to compress the saved database
Please do not use database compression.
The compression saving is minimal and the extra CPU usage is expensive.
It is only useful if your drive is extremely slow (< 1 MB/s) and you have plenty of spare CPU.
I will consider LZ4, thank you for the suggestion.
Re: Use LZ4 instead of BZ2 to compress the saved database
Zstandard is also an algorithm that is fast to compress and, importantly, to decompress. It was designed by the same author as LZ4, Yann Collet.
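For a concrete sense of how simple the Zstandard API is, here is a minimal C sketch (my own illustration, not Everything's code) that compresses an in-memory buffer with the one-shot API from zstd.h; the sample input and compression level are assumptions:
Code: Select all
/* One-shot Zstandard compression sketch. Build: cc demo.c -lzstd */
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

int main(void)
{
    const char src[] = "imagine a chunk of database bytes here";
    size_t src_size = sizeof(src);

    /* ZSTD_compressBound() gives the worst-case compressed size. */
    size_t dst_cap = ZSTD_compressBound(src_size);
    void *dst = malloc(dst_cap);
    if (!dst) return 1;

    /* Level 1 favours speed; levels go up to 22 for more compression. */
    size_t written = ZSTD_compress(dst, dst_cap, src, src_size, 1);
    if (ZSTD_isError(written)) {
        fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(written));
        free(dst);
        return 1;
    }
    printf("%zu -> %zu bytes\n", src_size, written);
    free(dst);
    return 0;
}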
Re: Use LZ4 instead of BZ2 to compress the saved database
Isn't the database loaded from the HDD/SSD into RAM, where it is decompressed (if it is compressed)?
If yes, does that mean that if the database is not compressed, Everything will start faster after a Windows restart?
Re: Use LZ4 instead of BZ2 to compress the saved database
No, the database is decompressed as it is read from disk.
Everything will always read the database from disk with a 64 KB buffer, so Everything reads 64 KB chunks at a time.
bz2 has its own buffers: it decompresses from the 64 KB read buffer into its own buffer, which is 900 KB.
The Everything database is already compressed without bz2.
bz2 just adds another layer of compression.
The "Compress Database" option enables or disables this extra bz2 compression layer.
Enabling bz2 compression will:
Make loading slightly slower on SSDs.
Severely reduce the saving performance of Everything.
The performance difference when loading compressed vs. uncompressed is minimal.
The saving performance is severely reduced when enabling compression (high CPU usage).
Everything will report the database load and save timings to the debug console.
So you can check which option works best for you.
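To illustrate the buffering scheme described above - a 64 KB read buffer feeding a bz2 decompressor with its own larger output buffer - here is a rough C sketch using libbzip2's streaming API. It is only an illustration of the pattern, not Everything's actual code; the file handling and buffer sizes are assumptions based on the description above:
Code: Select all
/* Chunked bz2 decompression sketch. Build: cc demo.c -lbz2 */
#include <stdio.h>
#include <bzlib.h>

#define READ_CHUNK (64 * 1024)      /* 64 KB read buffer, as described above */

int main(void)
{
    static char inbuf[READ_CHUNK];
    static char outbuf[900 * 1024]; /* bz2's own ~900 KB output buffer */

    FILE *f = fopen("Everything.db", "rb");
    if (!f) return 1;

    bz_stream strm = {0};           /* NULL allocators => default malloc/free */
    if (BZ2_bzDecompressInit(&strm, 0, 0) != BZ_OK) { fclose(f); return 1; }

    int ret = BZ_OK;
    while (ret == BZ_OK) {
        /* Read the next 64 KB chunk from disk. */
        strm.next_in  = inbuf;
        strm.avail_in = (unsigned)fread(inbuf, 1, sizeof(inbuf), f);
        if (strm.avail_in == 0) break;          /* EOF or read error */

        /* Drain this chunk through the decompressor. */
        while (strm.avail_in > 0 && ret == BZ_OK) {
            strm.next_out  = outbuf;
            strm.avail_out = sizeof(outbuf);
            ret = BZ2_bzDecompress(&strm);      /* BZ_STREAM_END when done */
            /* ... parse the decompressed bytes in outbuf here ... */
        }
    }

    BZ2_bzDecompressEnd(&strm);
    fclose(f);
    return ret == BZ_STREAM_END ? 0 : 1;
}
The 900 KB figure matches bzip2's maximum block size (level 9 uses 900k blocks), which is why the decompressor needs such a large working buffer.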
Re: Use LZ4 instead of BZ2 to compress the saved database
Maybe you can benchmark a 100 MB or larger database? It takes several seconds to save, even tens of seconds if BZ2 compression is enabled.
void wrote: ↑Mon Jun 29, 2020 3:11 am
No, the database is uncompressed as it is read from disk.
Here are some of my test results:
This is with write buffering enabled in Windows, so the save speed is not accurate.
Code: Select all
        Save (s)  Load (s)  Size (MB)
BZ2     1.09      0.47      3.22 (63% compression)
Normal  0.07      1.83      8.66
The database also takes up a discernible amount of disk space, which some fast algorithms could reduce.
Re: Use LZ4 instead of BZ2 to compress the saved database
What entries should we look for?
Timing debug information is always in blue text.
SSD Normal (uncompressed):
Code: Select all
db_save_local 85181 folders, 812077 files
saved db: 0.277260 seconds
Everything.db size on disk: 46,358,004 bytes
Code: Select all
loaded 85180 folders, 812058 files, in 1.497799 seconds
SSD Compressed:
Code: Select all
db_save_local 85180 folders, 812059 files
saved db: 5.362448 seconds
Code: Select all
loaded 85180 folders, 812058 files, in 3.791765 seconds
SSD Normal (uncompressed) Write Cache Disabled:
Code: Select all
db_save_local 85180 folders, 812059 files
saved db: 0.803708 seconds
HDD Normal (uncompressed):
Code: Select all
db_save_local 228829 folders, 800000 files
saved db: 0.613437 seconds
Code: Select all
loaded 228829 folders, 800000 files, in 2.482928 seconds
HDD Compressed:
Code: Select all
db_save_local 228829 folders, 800000 files
saved db: 6.409400 seconds
Code: Select all
loaded 228829 folders, 800000 files, in 2.717218 seconds
TL;DR - rough tests for about 1 million files (seconds):
Code: Select all
          Uncompressed  Compressed
SSD Load  1.49          3.79
SSD Save  0.27          5.36
HDD Load  2.48          2.71
HDD Save  0.61          6.40
Re: Use LZ4 instead of BZ2 to compress the saved database
(Oh, I mentioned - though I haven't tried it yet - "clearing cache": https://freefilesync.org/forum/viewtopi ... 420#p25052.)
(Void's results, at least with this .db, seem to fly in the face of Google/Mozilla's [supposed] reasoning for lz4'ing everything under the sun.
Also interesting is the rather negligible difference between SSD and HDD, with the HDD even quicker in the one instance.
[One day, I'll have an SSD, maybe.]
Oh, and Mozilla's implementation of lz4, while it follows the spec, isn't "standard" [also think .jar] to the majority of the lz4 [zip] related tools out there.)
Re: Use LZ4 instead of BZ2 to compress the saved database
Something like LZ4 or snappy would give moderate size savings while the performance impact is pretty much negligible. Both should be faster than any SSD on the market, so things will still be I/O bound, not CPU bound (unlike bz2).
Either that, or just remove compression altogether; bz2 compression never makes things faster.
To give you an idea:
Code: Select all
1938301944 Everything.db
1559762640 Everything.db.lz4
1533548730 Everything.db.snappy
LZ4 (default fast mode) is roughly 80.5% of the uncompressed size; snappy (via the snzip tool) is roughly 79.1%.
For my DB, that is - YMMV.
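For reference, producing a .lz4 file like the ones above from C takes only a few lines with lz4frame.h, and the output is interoperable with the lz4 command-line tool. A rough sketch of my own, assuming the input fits in memory (a multi-GB database like the one above would use the streaming LZ4F_compressUpdate() API instead):
Code: Select all
/* One-shot LZ4 frame compression sketch. Build: cc demo.c -llz4 */
#include <stdio.h>
#include <stdlib.h>
#include <lz4frame.h>

int main(void)
{
    const char src[] = "pretend this is Everything.db content";
    size_t src_size = sizeof(src);

    /* Worst-case frame size; NULL preferences = defaults (fast mode). */
    size_t dst_cap = LZ4F_compressFrameBound(src_size, NULL);
    void *dst = malloc(dst_cap);
    if (!dst) return 1;

    size_t n = LZ4F_compressFrame(dst, dst_cap, src, src_size, NULL);
    if (LZ4F_isError(n)) {
        fprintf(stderr, "lz4: %s\n", LZ4F_getErrorName(n));
        free(dst);
        return 1;
    }
    printf("%zu -> %zu bytes\n", src_size, n);  /* dst now holds a .lz4 frame */
    free(dst);
    return 0;
}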
Re: Use LZ4 instead of BZ2 to compress the saved database
For reference: snappy, snzip.
("Download snzip-1.0.4.tar.gz from https://bintray.com/kubo/generic/snzip, uncompress and untar it, and run configure."
So in order to snzip, you [first] need a tar & a gz .
Which came first, the chicken or the egg.)
(Is there anything Google doesn't have its hand in?)
(FWIW: Speeding up Redis with compression.)
("Download snzip-1.0.4.tar.gz from https://bintray.com/kubo/generic/snzip, uncompress and untar it, and run configure."
So in order to snzip, you [first] need a tar & a gz .
Which came first, the chicken or the egg.)
(Is there anything Google doesn't have its' hand in?)
(FWIW: Speeding up Redis with compression.)
Re: Use LZ4 instead of BZ2 to compress the saved database
Thanks for the feedback.
I would like to remove the bz2 option. The compression option will likely be removed from the UI and I'll keep the ini option.
A simple Huffman encode might be enough: ~80% of the total size; saving performance was about the same, and loading performance was about 1.5 times slower.
Huffman test results for 4 million files:
uncompressed load: 1.573047 seconds
compressed load: 2.034786 seconds
uncompressed save: 2.142087 seconds
compressed save: 2.463001 seconds
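As a sanity check on that ~80% figure: a byte-frequency histogram gives the order-0 entropy of a file, which is a lower bound on what a simple per-byte Huffman encode can achieve. A small sketch of my own (purely illustrative, not Everything's code):
Code: Select all
/* Order-0 entropy estimate: a bound on a per-byte Huffman encode.
   Build: cc demo.c -lm */
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    unsigned long long freq[256] = {0}, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) { freq[c]++; total++; }
    fclose(f);
    if (total == 0) return 1;

    /* Sum -log2(p) over every byte actually seen. */
    double bits = 0.0;
    for (int i = 0; i < 256; i++) {
        if (!freq[i]) continue;
        double p = (double)freq[i] / (double)total;
        bits += (double)freq[i] * -log2(p);
    }
    printf("%.1f%% of original size (order-0 entropy bound)\n",
           100.0 * (bits / 8.0) / (double)total);
    return 0;
}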
Re: Use LZ4 instead of BZ2 to compress the saved database
That sounds... a bit slow.
Did you try the others, lz4 in particular? Standing on the shoulders of giants and all that. It's essentially one self-contained C file + xxhash.c (and lz4.h); it builds fine with GCC, Clang, and MSVC. And the BSD-2-Clause license won't cause any legal troubles.
Compared to integrating snappy, it's a breeze.
https://github.com/lz4/lz4/blob/dev/lib/README.md
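To show how small that surface area is, here is a minimal round-trip sketch against the block API in lz4.h (my own illustration, not a proposed patch for Everything):
Code: Select all
/* Minimal LZ4 block API round trip. Build: cc demo.c lz4.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

int main(void)
{
    const char src[] = "aaaaaaaaaaaaaaaa some repetitive sample data";
    const int src_size = (int)sizeof(src);

    /* Worst-case compressed size for this input. */
    const int bound = LZ4_compressBound(src_size);
    char *comp = malloc((size_t)bound);
    char *back = malloc((size_t)src_size);
    if (!comp || !back) return 1;

    const int csize = LZ4_compress_default(src, comp, src_size, bound);
    if (csize <= 0) return 1;

    /* _safe rejects malformed input instead of overrunning the buffer. */
    const int dsize = LZ4_decompress_safe(comp, back, csize, src_size);
    if (dsize != src_size || memcmp(src, back, (size_t)src_size) != 0) return 1;

    printf("%d -> %d -> %d bytes\n", src_size, csize, dsize);
    free(comp);
    free(back);
    return 0;
}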
Re: Use LZ4 instead of BZ2 to compress the saved database
Trying it is simple: https://github.com/lz4/lz4/releases.
4 million files:
>lz4.exe -1 Everything.db
Compressed 135471523 bytes into 95093012 bytes ==> 70.19%
>lz4.exe -9 Everything.db
Compressed 135471523 bytes into 82277721 bytes ==> 60.73%
>lz4.exe -d Everything.db.lz4
...
I think it is fast and worth considering, although the space saving is not great. It may be most useful for ordinary users with a large number of files who want to use less disk space.
I got 45.1 MB (34.95%) using 7z standard, and 54.8 MB (~42.48%) using BZIP2 in about 32 seconds; 'lz4 -9' takes 7 seconds on an older PC. In comparison, 'lz4 -1' is less than 3 seconds.
If the device has low-speed storage and a high-speed CPU, a slower algorithm is worth choosing, but I don't think this case is common.
Re: Use LZ4 instead of BZ2 to compress the saved database
Users with a large number of files will, most of the time, have plenty of disk space.
yfdyh000 wrote: ↑Mon Jul 13, 2020 9:55 am
Trying it is simple: https://github.com/lz4/lz4/releases.
I see the whole discussion about compressing the database and saving some CPU cycles as almost unnecessary on today's hardware.