Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

    I’d certainly seen this at Google, where they’re 
    pretty good at that kind of thing. But at Scalyr, 
    we settled on a much more brute-force approach: a 
    linear scan through the log
Art of Computer Programming mentions this methodology, and the importance of learning about it. There are entire sections of "Tape Algorithms" that maximize the "brute force linear scan" of tape drives, complete with diagrams on how a hypothetical tape drives work in the situation.

Few "fancy algorithms" books actually get into the low level stuff. But Knuth knows: modern computers are extremely fast at linear data, and accessing data linearly (as well as learning algorithms that access data linearly) is a good idea.

I'll bet you that a B-tree approach would be fastest actually. B-Trees are usually good at balancing "massive linear performance" with "theoretical asymptotic complexity". IIRC, there is some research into this area: (look up cache sensitive search trees)



Why would you need a B-tree when you only append to the end of the logs?


    We use some special tricks for searches that are executed 
    frequently, e.g. as part of a dashboard. (We’ll describe 
    this in a future article.) 
And...

    (You might wonder why we store log messages in this
     4K-paged, metadata-and-text format, rather than
     working with raw log files directly. There are many 
     reasons, which boil down to the fact that internally,
     the Scalyr log engine looks more like a distributed 
     database than a file system. Text searches are often 
     combined with database-style filters on parsed log 
     fields; we may be searching many thousands of logs at 
     once; and simple text files are not a good fit for our 
     transactional, replicated, distributed data 
     management.)
It sounds like they're doing more than just "appending to the end of the log". If you're going to make an index of any kind, the index will likely be fastest with some sort of B-Tree.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: