I'm trying to figure out what is ISO/ANSI SQL:1999 approved usage for the "group by" statement in terms of the...
SQL:1999 E051-04 feature, which states that in reference to the "group by clause" that the "GROUP BY can contain columns not in
The SQL92 standard required that any column referred to in the "group by" must also be specified in the "select" list (among other restrictions).
Oracle 9i documents that they now support feature "E051-04" (as does Transact-SQL, e.g.).
DB2 7 had stated that the argument of "group by" must be a column name found in the "select" (a SQL92 requirement). DB2 8 relaxed "group by" support in allowing the more generic "grouping-expression" to be the argument of "group by" (but didn't state this in terms of support for feature E051-04).
This all sounds theoretical I'm afraid, but it's practical for me in helping to decide what kind of "group by" statements are more likely to be portable (because of support for the SQL:1999 "core" support).
The issue is more complicated than it needs to be, in my opinion, because the wording for feature E051-04 is itself so generic and vague (it only says that the "group by" can contain columns Not in the 'select' list - but they don't state where they can come from (the original "from" table? but if so, is that table still around after the intermediate, subset table has been built?).
Thanks for any light you can shed on this.
What a great question. Sincerely. Your interest in how things work, or should work, or might work, is commendable. In particular, your objective of portability shows good insight into what makes a database application a good database application or an albatross.
So this may not be a good time to mention MySQL's quirky support of the GROUP BY clause. MySQL allows so-called "hidden fields" in a grouping SQL statement, allowing you to omit columns from the GROUP BY but keep them in the SELECT list. They explain how it works, and the reason why this syntax is allowed, on the documentation's 12.9.3 GROUP BY with Hidden Fields page, which includes the warning:
Do not use this feature if the columns you omit from the GROUP BY part are not unique in the group! You will get unpredictable results.
This warning used to be in bold, so they've toned it down. Note the exasperation in the exclamation mark.
But you're asking about the reverse situation, extra columns in the GROUP BY that aren't in the SELECT. Is it okay to GROUP BY a,b,c but omit c from the SELECT list?
My advice may not be worth much, as it comes of experience, not theory. (If you want standards, you want Joe Celko. My knowledge of standards is nowhere near his.)
My advice is to write your SQL so that it will work in as many databases as you're ever going to target. Make sure it conforms to at least one of the published standards, and uses nothing deprecated in a later one. If in doubt, use the Mimer Validator.
So, is it okay to write a GROUP BY on columns that aren't in the SELECT? If you get different results in different databases, and if portability matters, then obviously, no. Is there a way around this? Yes, but that's a different question.