Comments on Wardrobe strength: Better support for short strings in Neo4j
Tobias, updated 2024-03-09T17:26:03.264+01:00

Eks Dev, 2011-03-08T21:46:10.268+01:00
Static Huffman compression at the character level can be made to work like a charm on small strings. One needs to enable periodic updating of the coding tables to keep up with the data distribution, and to reserve one symbol for escaping in order to cover symbols not found in the "preloaded static code tree". Making the Huffman code canonical is really easy, and makes the CPU hit from compression unnoticeable.

Tobias, 2011-03-02T00:48:46.091+01:00
@Daniel: At these small sizes a conventional compression algorithm, such as gzip, is not going to give much, if any, improvement, and the overhead is going to eat up too much space. What could be interesting is to look at general character frequencies and use variable-length encodings (such as some Huffman code) to encode common characters in fewer bits. But when I tried that out, the added complexity in the code was not worth it, especially since the gain wasn't that big and it made it much harder to look at a string and judge whether it would fit as a short string or not.

For strings that are slightly longer, and stored in the DynamicStringStore, it would be more interesting to look at using a conventional compression algorithm. As you point out, text compresses really well.
I think the first step, however, is going to be to use some character encoding (probably UTF-8) instead of just storing the raw 16-bit Java characters in the DynamicStringStore.

Dan Hackney, 2011-03-02T00:25:31.024+01:00
Would gzipping the short strings make sense? I don't know much (read: anything) about gzip's constant overhead, but I know that text compresses unreasonably well and that zlib is really fast. You're probably not going to be able to squeeze many more characters out of 64 bits, but every bit is precious, right? ;)
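
The overhead question raised in the thread can be checked with a rough sketch. The Python below is not Neo4j code; the function name `encoded_sizes` and the constant `SHORT_STRING_BUDGET_BYTES` are made up for illustration, and the 8-byte budget is simply the "64 bits" mentioned above. It measures a string as 16-bit code units, as UTF-8, and inside the gzip and zlib containers.

```python
import gzip
import zlib

# Hypothetical name: the 64 bits mentioned in the thread, expressed in bytes.
SHORT_STRING_BUDGET_BYTES = 8


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` under several encodings."""
    utf8 = text.encode("utf-8")
    return {
        # Two bytes per UTF-16 code unit, i.e. one 16-bit Java char each.
        "utf16": len(text.encode("utf-16-be")),
        # One byte per character for plain ASCII text.
        "utf8": len(utf8),
        # gzip container: 10-byte header + DEFLATE data + 8-byte trailer.
        "gzip": len(gzip.compress(utf8)),
        # zlib container: 2-byte header + DEFLATE data + 4-byte checksum.
        "zlib": len(zlib.compress(utf8)),
    }


if __name__ == "__main__":
    for sample in ["neo4j", "short", "KNOWS"]:
        sizes = encoded_sizes(sample)
        fits = {name: n <= SHORT_STRING_BUDGET_BYTES for name, n in sizes.items()}
        print(sample, sizes, fits)
```

For a five-character ASCII string such as "neo4j", the 16-bit form takes 10 bytes and UTF-8 takes 5. The gzip container alone carries 18 bytes of header and trailer, so it can never fit in 8 bytes, which is consistent with the point that general-purpose compression only becomes interesting for the longer strings.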