Use LZ4 instead of BZ2 to compress the saved database

Have a suggestion for "Everything"? Please post it here.
yfdyh000
Posts: 12
Joined: Wed Sep 07, 2016 1:44 am

Use LZ4 instead of BZ2 to compress the saved database

Post by yfdyh000 »

Using LZ4 would balance speed and space savings.

At present, it may take more than 10 seconds to compress (BZIP2) and write the database (Everything.db) when the program is closed.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by void »

Please do not use database compression.

The compression is minimal and the extra CPU usage is expensive.

It is only useful if your drive is extremely slow (< 1MBps) and you have plenty of CPU usage available.

I will consider LZ4, thank you for the suggestion.
Marco77
Posts: 4
Joined: Mon Nov 14, 2016 5:58 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by Marco77 »

Zstandard is also an algorithm which is fast to compress and, importantly, to decompress. It was co-designed by the same author as LZ4, Yann Collet.
vsub
Posts: 474
Joined: Sat Nov 12, 2011 11:51 am

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by vsub »

void wrote: Mon Jun 22, 2020 1:29 am Please do not use database compression.

The compression is minimal and the extra CPU usage is expensive.

It is only useful if your drive is extremely slow (< 1MBps) and you have plenty of CPU usage available.

I will consider LZ4, thank you for the suggestion.
Isn't the database loaded from the HDD/SSD into RAM, where it is decompressed if it is compressed?
If so, does that mean Everything will start faster after a Windows restart if the database is not compressed?
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by void »

No, the database is decompressed as it is read from disk.

Everything will always read the database from disk with a 64 KB buffer, so Everything will read 64 KB chunks at a time.
bz2 has its own buffers; it decompresses from the 64 KB read buffer into its own buffer, which is 900 KB.

The Everything database is already compressed without bz2.
bz2 just adds another layer of compression.
The "Compress Database" option enables or disables this extra bz2 compression layer.

Enabling bz2 compression will:
Make loading slightly slower for SSDs.
Severely reduce the saving performance of Everything.

The performance difference with loading compressed vs uncompressed is minimal.
The saving performance is severely reduced when enabling compression (high CPU usage).

Everything will report the database load and save timings to the debug console.
So you can check which option works best for you.
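
As a rough illustration of the chunked read described above, here is a minimal Python sketch. It is not Everything's actual loader; the function name `load_bz2_stream` and the sample data are made up for the example. It feeds 64 KB reads into an incremental bz2 decompressor, which keeps its own internal block buffer (up to 900 KB at bzip2's highest level).

```python
import bz2
import io

READ_CHUNK = 64 * 1024  # 64 KB read buffer, the size mentioned in the post


def load_bz2_stream(src, chunk_size=READ_CHUNK):
    """Decompress a single bz2 stream chunk by chunk (hypothetical helper)."""
    decompressor = bz2.BZ2Decompressor()
    out = bytearray()
    while not decompressor.eof:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # source exhausted
        # The decompressor buffers partial blocks internally between calls.
        out += decompressor.decompress(chunk)
    return bytes(out)


if __name__ == "__main__":
    # Made-up stand-in for a database: repetitive path-like records.
    raw = b"".join(
        b"C:\\data\\folder%05d\\file%05d.txt\n" % (i % 500, i)
        for i in range(20000)
    )
    packed = bz2.compress(raw, compresslevel=9)  # level 9 = 900 KB blocks
    restored = load_bz2_stream(io.BytesIO(packed))
    print(len(raw), len(packed), restored == raw)
```

The point of the sketch is only that decompression happens while reading, so the whole compressed file never has to sit in memory first.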
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by NotNull »

void wrote: Mon Jun 29, 2020 3:11 am Everything will report the database load and save timings to the debug console.
What entries should we look for?
yfdyh000
Posts: 12
Joined: Wed Sep 07, 2016 1:44 am

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by yfdyh000 »

void wrote: Mon Jun 29, 2020 3:11 am No, the database is decompressed as it is read from disk.

Everything will always read the database from disk with a 64 KB buffer, so Everything will read 64 KB chunks at a time.
bz2 has its own buffers; it decompresses from the 64 KB read buffer into its own buffer, which is 900 KB.

The Everything database is already compressed without bz2.
bz2 just adds another layer of compression.
The "Compress Database" option enables or disables this extra bz2 compression layer.

Enabling bz2 compression will typically make Everything load slightly faster, as there is less I/O.
However, saving performance is severely reduced.

The performance difference with loading compressed vs uncompressed is minimal.
The saving performance is severely reduced when enabling compression (high CPU usage).

Everything will report the database load and save timings to the debug console.
So you can check which option works best for you.

Here are some of my test results:
         Save (secs)  Load (secs)  Size (MB)
BZ2      1.09         0.47         3.22 (63% compression)
Normal   0.07         1.83         8.66
This is with write buffering enabled in Windows, so the save speed is not accurate.
Maybe you can benchmark a database of 100 MB or larger? Saving it takes several seconds, or even tens of seconds if BZ2 compression is enabled.
It also takes up a noticeable amount of disk space, which some fast algorithms could reduce.
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by void »

What entries should we look for?
Timing debug information is always in blue text.

SSD Normal (uncompressed):

db_save_local 85181 folders, 812077 files
saved db: 0.277260 seconds
-This is not accurate as write cache is enabled; see the write-cache-disabled test below for a better idea of performance.

Everything.db size on disk: 46,358,004 bytes

loaded 85180 folders, 812058 files, in 1.497799 seconds
-Load timings from a fresh boot

SSD Compressed:

db_save_local 85180 folders, 812059 files
saved db: 5.362448 seconds
Everything.db size on disk: 20,452,358 bytes

loaded 85180 folders, 812058 files, in 3.791765 seconds
-Load timings from a fresh boot

SSD Normal (uncompressed) Write Cache Disabled:

db_save_local 85180 folders, 812059 files
saved db: 0.803708 seconds

HDD Normal (uncompressed):

db_save_local 228829 folders, 800000 files
saved db: 0.613437 seconds
Everything.db size on disk: 38,289,196 bytes

loaded 228829 folders, 800000 files, in 2.482928 seconds
-Load timings from a fresh boot

HDD Compressed:

db_save_local 228829 folders, 800000 files
saved db: 6.409400 seconds
Everything.db size on disk: 15,518,383 bytes

loaded 228829 folders, 800000 files, in 2.717218 seconds
-Load timings from a fresh boot

TL;DR: rough tests for about 1 million files:

               Uncompressed (s)  Compressed (s)
SSD Load       1.49              3.79
SSD Save       0.27              5.36
HDD Load       2.48              2.71
HDD Save       0.61              6.40
Results will vary for your hardware.
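
To put the timings above in relative terms, the short Python sketch below divides each compressed figure by its uncompressed counterpart. The numbers are copied from the debug output in this post; the names `RESULTS` and `ratios` are made up for the example. Note that the SSD save baseline was measured with write cache enabled, which the post flags as inaccurate, so the SSD save ratio is probably overstated.

```python
# Figures copied from the debug-console output quoted in this post.
# Each tuple is (uncompressed, compressed); times in seconds, sizes in bytes.
RESULTS = {
    "SSD": {
        "load": (1.497799, 3.791765),
        "save": (0.277260, 5.362448),  # baseline measured with write cache on
        "size": (46_358_004, 20_452_358),
    },
    "HDD": {
        "load": (2.482928, 2.717218),
        "save": (0.613437, 6.409400),
        "size": (38_289_196, 15_518_383),
    },
}


def ratios(results):
    """Return compressed/uncompressed ratios per drive and metric."""
    return {
        drive: {
            metric: compressed / uncompressed
            for metric, (uncompressed, compressed) in rows.items()
        }
        for drive, rows in results.items()
    }


if __name__ == "__main__":
    for drive, row in ratios(RESULTS).items():
        print(
            f"{drive}: load x{row['load']:.2f}, save x{row['save']:.2f}, "
            f"size {row['size']:.0%} of uncompressed"
        )
```

On these figures, saving is roughly 10 to 20 times slower with bz2, while the file shrinks to a little over 40% of its uncompressed size.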
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by therube »

(Oh, I mentioned - though haven't tried yet, "clearing cache", https://freefilesync.org/forum/viewtopi ... 420#p25052.)

(void's results, at least with this .db, seem to fly in the face of Google/Mozilla's [supposed] reasoning for lz4'ing everything under the sun.
Also interesting is the rather negligible difference between SSD/HDD, with the HDD even quicker in one instance.
[One day, I'll have a SSD, maybe.]
Oh, & Mozilla's implementation of lz4, while it follows the spec, isn't "standard" [also think, .jar] to the majority of the lz4 [zip] related tools out there.)
dolos
Posts: 8
Joined: Wed Jul 17, 2019 3:39 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by dolos »

Something like LZ4 or snappy would give moderate size savings while the performance impact is pretty much negligible. Both should be faster than any SSD on the market, so things will still be I/O bound not CPU bound (unlike bz2).

Either that, or just remove compression altogether; bz2 compression never makes things faster.

To give you an idea:

1938301944 Everything.db
1559762640 Everything.db.lz4
1533548730 Everything.db.snappy
LZ4 (default fast mode) is roughly 80.5% of the uncompressed size. Snappy (via the snzip tool) is roughly 79.1%.
For my DB that is, YMMV
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by therube »

For reference: snappy, snzip.


("Download snzip-1.0.4.tar.gz from https://bintray.com/kubo/generic/snzip, uncompress and untar it, and run configure."
So in order to snzip, you [first] need a tar & a gz ;-).
Which came first, the chicken or the egg.)

(Is there anything Google doesn't have its hand in?)

(FWIW: Speeding up Redis with compression.)
void
Developer
Posts: 16683
Joined: Fri Oct 16, 2009 11:31 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by void »

Thanks for the feedback.

I would like to remove the bz2 option. The compression option will likely be removed from the UI and I'll keep the ini option.

A simple Huffman encoding might be enough: ~80% of total size, saving performance was about the same, and loading performance was about 1.5 times slower.

Huffman test results from 4 million files:

uncompressed load: 1.573047 seconds
compressed load: 2.034786 seconds

uncompressed save: 2.142087 seconds
compressed save: 2.463001 seconds
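
For readers unfamiliar with the technique, here is a minimal byte-level Huffman sketch in Python. It is an illustration only, not the encoder tested above: the function names are made up, and it works on bit strings for readability rather than packing bits, so it says nothing about real-world speed.

```python
import heapq
from collections import Counter


def build_codes(data: bytes) -> dict:
    """Return a {byte value: bit string} Huffman code table for data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate input: only one distinct byte
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    """Concatenate the code of every byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Walk the bit string, emitting a byte whenever a full code is matched."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    sample = b"C:\\Windows\\System32\\drivers\\etc\\hosts\n" * 1000
    table = build_codes(sample)
    bits = encode(sample, table)
    print(f"{len(bits) / (8 * len(sample)):.1%} of original size")
    print(decode(bits, table) == sample)
```

Huffman coding only shortens frequent bytes and does not remove repeated strings, which fits the modest ~80% figure reported above.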
dolos
Posts: 8
Joined: Wed Jul 17, 2019 3:39 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by dolos »

void wrote: Fri Jul 10, 2020 7:29 am A simple Huffman encoding might be enough: ~80% of total size, saving performance was about the same, and loading performance was about 1.5 times slower.
uncompressed load: 1.573047 seconds
compressed load: 2.034786 seconds
That sounds... a bit slow.

Did you try the others, lz4 in particular? Standing on the shoulders of giants and all that. It's essentially one self-contained C file + xxhash.c (and lz4.h), and it builds fine in gcc, clang, and msvc. And the BSD-2-Clause license won't cause any legal troubles.
Compared to integrating snappy, it's a breeze :D
https://github.com/lz4/lz4/blob/dev/lib/README.md
yfdyh000
Posts: 12
Joined: Wed Sep 07, 2016 1:44 am

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by yfdyh000 »

https://github.com/lz4/lz4/releases, trying it is simple.

4 million files:
>lz4.exe -1 Everything.db
Compressed 135471523 bytes into 95093012 bytes ==> 70.19%
>lz4.exe -9 Everything.db
Compressed 135471523 bytes into 82277721 bytes ==> 60.73%
>lz4.exe -d Everything.db.lz4
...

I think it is fast and worth considering, although the space saving is not great. It may suit typical users who have a large number of files and want to use less space.
I got 45.1MB (34.95%) using 7z standard. BZIP2 gave 54.8MB (~42.48%) in about 32 seconds, and 'lz4 -9' took 7 seconds, on an older PC. In comparison, 'lz4 -1' took less than 3 seconds.
If a device has slow storage and a fast CPU, a slower algorithm may be worth letting the user choose, but I don't think this is common.
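
For anyone who wants to repeat this kind of comparison without extra tools, here is a rough Python sketch using only standard-library codecs. LZ4 is not in Python's standard library, so zlib level 1 stands in as the "fast" codec and lzma as the "slow but small" one. These are assumptions for illustration and no substitute for measuring lz4.exe itself. The sample data is synthetic, not a real Everything.db.

```python
import bz2
import lzma
import time
import zlib

# name -> (compress function, decompress function); all standard library.
CODECS = {
    "zlib -1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes):
    """Return (name, size ratio, compress secs, decompress secs) per codec."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} failed to round-trip")
        rows.append((name, len(packed) / len(data), mid - start, end - mid))
    return rows


if __name__ == "__main__":
    # Synthetic path-like records; replace with the bytes of a real file
    # to get numbers that mean something for your own database.
    sample = b"".join(
        b"D:\\projects\\repo%04d\\src\\module%06d.c\n" % (i % 300, i)
        for i in range(50000)
    )
    for name, ratio, c_secs, d_secs in benchmark(sample):
        print(f"{name:8} {ratio:6.1%}  compress {c_secs:.3f}s  "
              f"decompress {d_secs:.3f}s")
```

Timings from a script like this depend heavily on the data and the machine, so treat them as relative only.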
horst.epp
Posts: 1443
Joined: Fri Apr 04, 2014 3:24 pm

Re: Use LZ4 instead of BZ2 to compress the saved database

Post by horst.epp »

yfdyh000 wrote: Mon Jul 13, 2020 9:55 am https://github.com/lz4/lz4/releases, trying it is simple.

4 million files:
>lz4.exe -1 Everything.db
Compressed 135471523 bytes into 95093012 bytes ==> 70.19%
>lz4.exe -9 Everything.db
Compressed 135471523 bytes into 82277721 bytes ==> 60.73%
>lz4.exe -d Everything.db.lz4
...

I think it is fast and worth considering, although the space saving is not great. It may suit typical users who have a large number of files and want to use less space.
I got 45.1MB (34.95%) using 7z standard. BZIP2 gave 54.8MB (~42.48%) in about 32 seconds, and 'lz4 -9' took 7 seconds, on an older PC. In comparison, 'lz4 -1' took less than 3 seconds.
If a device has slow storage and a fast CPU, a slower algorithm may be worth letting the user choose, but I don't think this is common.
Users with a large number of files will usually have plenty of disk space.
I see the whole discussion about compressing the database and saving some CPU cycles
as almost unnecessary on today's hardware.