
A related thing that I realised a few days ago about compression algorithms:

  <!doctype html><meta charset=utf-8>
  <!DOCTYPE html><meta charset="utf-8">
Compress these with gzip, and the first is smaller than the second (56 and 58 bytes): lowercase doctype because you’re using very few uppercase letters in your document (on slightly larger samples it tends to save a byte or two), and omit the quotes as unnecessary. On larger documents there will be some places where you need quotes around attribute values, but it’s still worth omitting them when you can.

LZMA, similar: 60 and 63 bytes.

But then compress these with Brotli, and it’s the other way around by a larger margin, 29 and 19 bytes, because Brotli ships a dictionary primed on arbitrary web content. And so it becomes a popularity contest, and an inferior but vastly more popular technique compresses better.
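If you want to reproduce the comparison, here is a minimal sketch in Python. Some assumptions of mine, not anything the numbers above depend on: each sample gets a trailing newline, gzip and LZMA run with the standard library defaults, and Brotli is only measured if the third-party `brotli` package happens to be installed. The name `compressed_sizes` is made up for illustration. Exact byte counts depend on container format and settings, so treat the output as indicative only.

```python
import gzip
import lzma

# The two snippets from the comment; the trailing newline is an assumption.
SAMPLES = [
    b"<!doctype html><meta charset=utf-8>\n",
    b'<!DOCTYPE html><meta charset="utf-8">\n',
]


def compressed_sizes(samples):
    """Return {codec name: [compressed size in bytes for each sample]}."""
    codecs = {
        "gzip": gzip.compress,
        "lzma": lzma.compress,  # default container is .xz
    }
    try:
        import brotli  # third-party and optional; skipped when missing

        codecs["brotli"] = brotli.compress
    except ImportError:
        pass
    return {
        name: [len(compress(sample)) for sample in samples]
        for name, compress in codecs.items()
    }


if __name__ == "__main__":
    for name, sizes in compressed_sizes(SAMPLES).items():
        print(name, sizes)
```

Neither sample contains a repeated run long enough for gzip to find a match, so its result mostly tracks raw length, whereas Brotli's built-in dictionary can match whole common strings before it has seen any of the input.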

In the case of #008000/green/#00ff00/#0f0/lime/#ffff00/#ff0/yellow, the dictionary doesn’t look to be tainted, so traditional length and repetition wisdom still applies.


