
tl;dr: PostgreSQL stores small values directly in the table, but stores large values outside the table (in an associated TOAST table), leaving the table itself with only a reference to the external storage. In either case, the table stays small and performance is fast. However, if you store data that is just under the size threshold for being moved out of the table, the table gets large and performance degrades. Thus, the tl;dr of the tl;dr: small and large TEXT values are faster than medium-sized TEXT values.
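
A rough way to see this for yourself (a minimal PostgreSQL sketch; the table, names, and sizes here are made up, and STORAGE EXTERNAL disables compression so that value size alone decides placement):

    -- Hypothetical demo table.
    CREATE TABLE t_demo (id int, val text);
    ALTER TABLE t_demo ALTER COLUMN val SET STORAGE EXTERNAL;

    INSERT INTO t_demo SELECT 1, repeat('x', 100);     -- small: stored inline
    INSERT INTO t_demo SELECT 2, repeat('x', 1900);    -- medium: inline, bloats the heap
    INSERT INTO t_demo SELECT 3, repeat('x', 100000);  -- large: moved to the TOAST table

    SELECT pg_size_pretty(pg_relation_size('t_demo'));  -- main heap only
    SELECT pg_size_pretty(pg_table_size('t_demo'));     -- heap plus TOAST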


'tis why some newer DB storage systems support variable-sized pages.

https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf


...for some definition of faster


Objectively slower than large/small strings. I don't think it requires that much nuance to understand the point.


*IF* your workload ignores the value of the string. That's the nuance.

From the article:

> We executed a query against all three tables to search for a string within the text value...This time, the results are more consistent with what we would expect. [Large is slower than medium.]

---

I.e., this isn't a "bug" ... this is a performance tradeoff. Transitioning values from inline storage to a separate table has performance positives and negatives. This article highlights a performance positive of TOAST once the toast_tuple_target boundary is crossed.
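
For what it's worth, that boundary is itself tunable per table since PostgreSQL 11; a hedged sketch (t_demo and 256 are arbitrary examples, and the setting only affects tuples stored after the change):

    ALTER TABLE t_demo SET (toast_tuple_target = 256);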


This is a somewhat-artificial case, though. In most cases I've seen in practice, even when you're using the value of the string, you're also filtering by some non-text columns, where the compactness of the TOASTed table allows you to do the filtering step on the left side of the implicit TOAST-join much faster (and keep more of the left table hot in memory); so that even despite needing to implicitly join in the TOAST records, you're only joining them in for the filtered rowset — usually a win overall.
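
A hypothetical query shape for the pattern described above (table and column names invented): the filter runs against the compact column, and the TOASTed text is detoasted only for the rows that survive it.

    CREATE INDEX ON documents (created_at);

    SELECT body                                  -- out-of-line text, fetched per match
    FROM documents
    WHERE created_at > now() - interval '1 day'; -- filter on the compact column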


In most cases I've seen in practice, you're querying against an index.

This drastically reduces the impact of unused fields compared to a full table scan.
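
Sketch of that, with an invented schema: a covering index can let the query complete as an index-only scan, never touching the wide heap tuples that hold the text.

    CREATE INDEX docs_author_idx ON documents (author_id) INCLUDE (created_at);

    SELECT created_at FROM documents WHERE author_id = 42;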


Only faster for queries that don’t care about the value of the text, and do a full table scan instead of reading from the index


Or when you use the index, but still have to do a lot of IO.


Microsoft SQL Server does this as well with [n]varchar(max): any value that won't fit within the 8 KB data page is moved off to separate LOB data pages.
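
A T-SQL sketch (table name invented; the sp_tableoption call just makes the off-row behavior explicit rather than size-dependent):

    CREATE TABLE dbo.docs (id int PRIMARY KEY, body nvarchar(max));
    -- Optionally force max types off-row even when they would fit in-row:
    EXEC sp_tableoption 'dbo.docs', 'large value types out of row', 1;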

tl;dr: CPU cache sizes are small and text is large. Using numeric fields will always be faster: "hackyhacky" (10 bytes) is equal to 1.25 Int64s (8 bytes each).



