
tl;dr: PostgreSQL stores small values directly in the table, but stores large values outside the table (in an associated TOAST table), leaving the table itself with only a reference to the external storage. In either case, the table stays small and performance is fast. However, if you store data that is just under the size threshold for being moved out of the table, the table gets large and performance degrades. Thus, the tl;dr of the tl;dr: small and large TEXT values are faster than medium-sized TEXT values.
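
A rough way to see this for yourself (a minimal PostgreSQL sketch; the table, names, and sizes here are made up, and STORAGE EXTERNAL disables compression so that value size alone decides placement):

    -- Hypothetical demo table.
    CREATE TABLE t_demo (id int, val text);
    ALTER TABLE t_demo ALTER COLUMN val SET STORAGE EXTERNAL;

    INSERT INTO t_demo SELECT 1, repeat('x', 100);     -- small: stored inline
    INSERT INTO t_demo SELECT 2, repeat('x', 1900);    -- medium: inline, bloats the heap
    INSERT INTO t_demo SELECT 3, repeat('x', 100000);  -- large: moved to the TOAST table

    SELECT pg_size_pretty(pg_relation_size('t_demo'));  -- main heap only
    SELECT pg_size_pretty(pg_table_size('t_demo'));     -- heap plus TOAST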


'tis why some newer DB storage systems support variable-sized pages.

https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf


...for some definition of faster


Objectively slower than large/small strings. I don't think it requires that much nuance to understand the point.


*IF* your workload ignores the value of the string. That's the nuance.

From the article:

> We executed a query against all three tables to search for a string within the text value...This time, the results are more consistent with what we would expect. [Large is slower than medium.]

---

I.e., this isn't a "bug" ... this is a performance tradeoff. Transitioning values from inline storage to a separate table has performance positives and negatives. This article highlights a performance positive of TOAST once the toast_tuple_target boundary is crossed.
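
For what it's worth, that boundary is itself tunable per table since PostgreSQL 11; a hedged sketch (t_demo and 256 are arbitrary examples, and the setting only affects tuples stored after the change):

    ALTER TABLE t_demo SET (toast_tuple_target = 256);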


This is a somewhat-artificial case, though. In most cases I've seen in practice, even when you're using the value of the string, you're also filtering by some non-text columns, where the compactness of the TOASTed table allows you to do the filtering step on the left side of the implicit TOAST-join much faster (and keep more of the left table hot in memory); so that even despite needing to implicitly join in the TOAST records, you're only joining them in for the filtered rowset — usually a win overall.
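
A hypothetical query shape for the pattern described above (table and column names invented): the filter runs against the compact column, and the TOASTed text is detoasted only for the rows that survive it.

    CREATE INDEX ON documents (created_at);

    SELECT body                                  -- out-of-line text, fetched per match
    FROM documents
    WHERE created_at > now() - interval '1 day'; -- filter on the compact column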


In most cases I've seen in practice, you're querying against an index.

This drastically reduces the impact of unused fields compared to a full table scan.
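
Sketch of that, with an invented schema: a covering index can let the query complete as an index-only scan, never touching the wide heap tuples that hold the text.

    CREATE INDEX docs_author_idx ON documents (author_id) INCLUDE (created_at);

    SELECT created_at FROM documents WHERE author_id = 42;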


Only faster for queries that don’t care about the value of the text, and do a full table scan instead of reading from the index


Or when you use the index, but still have to do a lot of IO.


Microsoft SQL Server does this as well with [n]varchar(max): any value that won't fit within the 8 KB data page is moved off to separate LOB data pages.
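
A T-SQL sketch (table name invented; the sp_tableoption call just makes the off-row behavior explicit rather than size-dependent):

    CREATE TABLE dbo.docs (id int PRIMARY KEY, body nvarchar(max));
    -- Optionally force max types off-row even when they would fit in-row:
    EXEC sp_tableoption 'dbo.docs', 'large value types out of row', 1;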

tl;dr: CPU cache sizes are small and text is large. Using numeric fields will always be faster: "hackyhacky" (10 bytes) is equal to 1.25 Int64s (8 bytes each).



