MySQL Bad Idea #384

MySQL is a database of compromise. Compromise between running a production-ready relational database and being popular with all sorts of hackers – mostly the ones that don’t really like SQL. And because they don’t really like SQL, they choose MySQL, as MySQL is very forgiving. It is just as forgiving as their favourite language PHP, which forgives their mistakes involving escaping and quoting through funny things like “magic quotes”. Not only is MySQL forgiving, it allows you to write “wrong” SQL and still does something with it. Here’s what I mean by “wrong” SQL: In MySQL, you can legally execute the following statement:


SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

The statement was taken from here: https://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html So what does this statement even mean? What will be returned in the c.name projection? MAX(c.name)? ANY(c.name)? FIRST(c.name)? NULL? 42? According to the documentation, ANY(c.name) would best describe what’s going on. This peculiar syntax is probably quite clever for those few that really know when this is useful. When they know exactly, that o.custid and c.name have a 1:1 correlation, and they can speed things up a little by avoiding writing things like MAX(c.name), or by adding c.name to the GROUP BY clause (“yes, saved yet another 8 characters”). But the bulk of newbie MySQL users will be confused by this.

First, they will be confused because they don’t get the c.name they’d expect.
Secondly, they will eventually switch over to another database that gets these things right, and be frustrated all over again, over the funny syntax errors, such as ORA-00979 not a GROUP BY expression

So please,

MySQL users: stop using this non-feature. It will only cause pain and suffering, even if you know how/why it works. SQL’s GROUP BY was not meant to work that way.
MySQL: Deprecate this non-feature.

9 thoughts on “MySQL Bad Idea #384”

What will be the ideal SQL for the solution?

SELECT o.custid, c.name, MAX(o.payment)
FROM customers AS c
LEFT OUTER JOIN orders AS o ON (custid = c.custid)

Or using MAX as in

SELECT o.custid, MAX(c.name), MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;

As mentioned in the post, the ideal SQL statement would be any of these:

-- If there is no real 1:1 correlation between o.custid and c.name
-- (i.e. there are different c.name values per o.custid), then write:
SELECT o.custid, MAX(c.name), MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

-- If there is a 1:1 correlation between o.custid and c.name
-- (i.e. there is exactly 1 c.name value per o.custid), then write:
SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid, c.name;

Thanks – now I understand why there are so many questions on SO about Group By on Oracle.

lukaseder says:

August 6, 2012 at 08:14

Yes, its crazy. Just copy-paste the MySQL documentation link, add a “don’t do it that way” and a link to this post, and your Stack Overflow reputation will be sky-high in no time!

Loading...

Reply

MySQL 5.7 DMR5 and above enables ONLY_FULL_GROUP_BY by default – which has also been enhanced to respect functional dependencies. There is a good overview on this change here:

http://rpbouman.blogspot.ca/2014/09/mysql-575-group-by-respects-functional.html

So the short answer it that it’s fixed :) The previous behavior can also be re-enabled if users require it.

lukaseder says:

November 12, 2014 at 22:50

That’s wonderful news! That way, it should be more and more painful in the future, to re-enable all these lenient “features”.

Thanks for the update

Loading...

Reply

Like this:

Published by lukaseder

9 thoughts on “MySQL Bad Idea #384”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.