Hacker News new | past | comments | ask | show | jobs | submit
> database engine sets up the join

Every major database uses nested loop, hash, or sort-merge joins. If the data is small enough to fetch in a second query, I don't see any scenario where it would make a query spill to disk when it otherwise wouldn't have (except those pesky OR's).

Most JOIN issues can be resolved by composing using sub-queries/CTE's to defer a join to a smaller intermediary result, and the only time this generally can't be done is if you need a predicate on the joined table.

> you've asked the database engine to sort things

I've only found this to be true when there's an OR condition where the single query ends up doing a bitmap against every row, in which case a UNION will be faster since it's just appending two already-index-ordered streams.

> Same with UNION vs IN

A union is almost always going to be worse - most query planners cannot optimize across queries in a UNION, so each query is going to have separate ops vs. a single op with the IN. The only case I've seen this be true is for multi-column predicates with an OR clause against a composite index. I just tested the former in both Postgres and MSSQL and the UNION query cost was 2x the IN clause.

> maybe it decides to do an index scan which takes longer

This has not been my experience unless the table statistics are bad.