Here's the real reason: this dev is a jerk who thinks postgres users get too much sleep.
My experience on 8.4, within the last 3 years: a query that runs hundreds of times per second, with an average run time in milliseconds and a max query time of 0.3 seconds, suddenly starts taking 100 to 1000 times as long to run. Six hours of debugging later, starting at 3 in the morning when systems started failing, we figured out that some magic in the query planner had tripped and swapped the query plan for something at least two orders of magnitude worse. No indices had changed. No schemas had changed. Data grows by maybe 30k rows per day, which is very reasonable given the table sizes and the 128GB of RAM dedicated to pg.
Of course, there's no way to specify the query plan. Instead, we ended up fucking with configs until the query plan swapped back.
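(To give a flavor of what "fucking with configs" means in practice: Postgres won't let you pin a plan, but it does expose coarse planner knobs you can set per session or per role. The specific settings and values below are illustrative, not what we actually changed:)

    -- Illustrative planner knobs; which one flips the plan back is trial and error.
    SET random_page_cost = 1.1;   -- make index scans look cheaper (default is 4.0)
    SET enable_seqscan = off;     -- strongly discourage sequential scans
    SET enable_nestloop = off;    -- strongly discourage nested-loop joins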
That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.
> That's why people like locking query plans. Not necessarily to control the best case, but to control the average / worst case.
This. I've had exactly the same experience as you.
Nobody wants to put hints in their queries. But when your web site is down because one day the freakin' query planner decided, all by itself, that it was time for a change, and some of your worst-case queries are taking 2 minutes to return, I don't want to be fudging about with statistics, trying to understand the internal mind of the planner to convince it to return to sanity. I just want to make it use a plan that I know won't lose me my job.
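(For the record, "fudging about with statistics" means things like this; the table and column names are made up, the commands are real Postgres:)

    -- Raise the sampling target on a column the planner keeps misjudging,
    -- then re-gather statistics and hope the plan comes back.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;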
The kind of attitude in this article reminds me - sadly - of the BS that used to come from the MySQL devs: "No, you don't need transactions! Your system is broken if it uses transactions!" Of course it was BS, and the minute they supported transactions they were all over how good it was.
For the record, that form of statement ("no, you don't need X") cuts both ways, so whether or not I agree that PostgreSQL needs query hints, this argument-by-comparison bothers me. You could use the same argument for "No, you don't need the ability to violate referential integrity! Your system is broken if it violates declared foreign key constraints!", which is to me a very correct (and "very PostgreSQL") statement to make about how horrible it is that MySQL /added a feature/ that turns off foreign key constraint checks in a way that leaves your data in an inconsistent state, due to the same kind of pressure from DBAs who weren't really thinking about the long-term consequences. Put differently: you stripped away all of the reasoning and are now comparing the lexical syntax of a conclusion, and claiming that this means the two situations are alike is actively misleading to your readers and yourself.
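(For anyone who hasn't seen it, the MySQL feature in question is a one-line session switch; a minimal sketch:)

    -- MySQL: stop enforcing declared foreign keys for this session.
    SET foreign_key_checks = 0;
    -- Rows that violate the constraints can now be written.
    SET foreign_key_checks = 1;  -- re-enabling does not re-validate existing rows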
Absolutely. It is the nature of the profession that DBAs are conservative. If you have something that works, you want it to continue to work. So if it is working well enough, don't change it!
The optimizer is great. Wonderful. It found plans that are working well for me. Yay!
Now what? The potential upside of it finding an even better plan is minimal. I'm satisfied with its performance as is. The potential downside of it deciding that a worse plan is better is huge. Let me go and tell it to not change its mind!
Relational databases already have enough "fall over with no previous sign of problems" failure modes. (The hard one has to do with locking: if a particular fine-grained lock is operating at 98% of capacity, it shows no sign of problems, but at 101% of capacity the database falls over.) There is no need to add more.
Update: I've actually run across people recommending a workaround for this problem: turn off automatic statistics gathering. That way you know that the query optimizer won't automatically change what it is doing. I bet I know what this particular developer thinks of that strategy!
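(For reference, that workaround boils down to one line in postgresql.conf; turning off autovacuum also turns off auto-analyze, so statistics only change when someone runs ANALYZE by hand. Not an endorsement:)

    # postgresql.conf: freeze the planner's inputs, inherit the bloat problem
    autovacuum = off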
In several cases, I've used ORDER BY (id + 0) DESC (where id is the primary key) just to break the planner's very strong bias toward backwards-scanning the primary key index until it fills the LIMIT window. That's perfectly fine if you're looking at the end of a time series, but if you're actually looking 200k records back, even with everything cached in memory, that's a hell of a hit.
The case that first triggered that one was when the query planner switched from a constant-time lookup to that index scan, taking a large update from tens of seconds to 12 hours.
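(A sketch of the trick, with a made-up table name; wrapping the key in an expression means the primary key index no longer matches the sort, so the planner has to pick another plan:)

    -- The plan the planner loves: walk the pkey index backwards until LIMIT fills.
    SELECT * FROM events ORDER BY id DESC LIMIT 100;
    -- id + 0 is an expression, not the indexed column, so that plan is off the table.
    SELECT * FROM events ORDER BY id + 0 DESC LIMIT 100;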
Yea, I have seen Oracle do basically the same thing, and our solution is to stick in a stored outline (effectively locking the plan). If you own the code (i.e. it is not a 3rd-party COTS package), you can also use hints to control queries. I would hate to be without hints in Oracle.
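(For those who haven't used them, Oracle hints are comments embedded in the statement itself; the table and index names here are made up:)

    -- Force the optimizer to use a particular index rather than its own pick.
    SELECT /*+ INDEX(o orders_cust_idx) */ *
    FROM   orders o
    WHERE  o.customer_id = :cust_id;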
Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least, that's what it says in the documentation. Whether it works or not, I will find out if we ever get a system upgraded to 11g!
> Oracle 11g apparently can spot when a query plan changes and the new one is much slower than the old one - at least, that's what it says in the documentation. Whether it works or not, I will find out if we ever get a system upgraded to 11g!
I'm surprised this isn't already the default.
I guess the next step for postgres is to collect statistics on plans as well as data.