10 More Common Mistakes Java Developers Make when Writing SQL

Posted on August 12, 2013April 7, 2021 by lukaseder

I was positively surprised to see how popular my recent listing about 10 Common Mistakes Java Developers Make when Writing SQL was, both on my own blog and on my syndication partner DZone. The popularity shows a couple of things:

How important SQL is to the professional Java world.
How common it is to forget about some basic SQL things.
How well SQL-centric libraries such as jOOQ or MyBatis are responding to market needs, by embracing SQL. An amusing fact is that users have even mentioned my blog post on SLICK’s mailing list. SLICK is a non-SQL-centric database access library in Scala. Like LINQ (and LINQ-to-SQL) it focuses on language integration, not on SQL code generation.

jOOQ is the best way to write SQL in Java

Anyway, the common mistakes I listed previously are far from complete, so I will treat you to a sequel of 10 subtly less common, yet equally interesting mistakes Java developers make when writing SQL.

1. Not using PreparedStatements

Interestingly, this mistake or misbelief still surfaces blogs, forums and mailing lists many years after the appearance of JDBC, even if it is about a very simple thing to remember and to understand. It appears that some developers refrain from using PreparedStatements for any of these reasons:

They don’t know about PreparedStatements
They think that PreparedStatements are slower
They think that writing a PreparedStatement takes more effort

First off, let’s bust the above myths. In 96% of the cases, you’re better off writing a PreparedStatement rather than a static statement. Why? For simple reasons:

You can omit syntax errors originating from bad string concatenation when inlining bind values.
You can omit SQL injection vulnerabilities from bad string concatenation when inlining bind values.
You can avoid edge-cases when inlining more “sophisticated” data types, such as TIMESTAMP, binary data, and others.
You can keep open PreparedStatements around for a while, reusing them with new bind values instead of closing them immediately (useful in Postgres, for instance).
You can make use of adaptive cursor sharing (Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.

Convinced? Yes. Note, there are some rare cases when you actually want to inline bind values in order to give your database’s cost-based optimiser some heads-up about what kind of data is really going to be affected by the query. Typically, this results in “constant” predicates such as:

DELETED = 1
STATUS = 42

But it shouldn’t result in “variable” predicates such as:

FIRST_NAME LIKE “Jon%”
AMOUNT > 19.95

Note that modern databases implement bind-variable peeking. Hence, by default, you might as well use bind values for all your query parameters. Note also that higher-level APIs such as JPA CriteriaQuery or jOOQ will help you generate PreparedStatements and bind values very easily and transparently when writing embedded JPQL or embedded SQL. More background info:

Caveats of bind value peeking: An interesting blog post by Oracle Guru Tanel Poder on the subject
Cursor sharing. An interesting Stack Overflow question.

The Cure: By default, always use PreparedStatements instead of static statements. By default, never inline bind values into your SQL.

2. Returning too many columns

This mistake is quite frequent and can lead to very bad effects both in your database’s execution plan and in your Java application. Let’s look at the second effect first: Bad effects on the Java application: If you’re selecting * (star) or a “default” set of 50 columns, which you’re reusing among various DAOs, you’re transferring lots of data from the database into a JDBC ResultSet. Even if you’re not reading the data from the ResultSet, it has been transferred over the wire and loaded into your memory by the JDBC driver. That’s quite a waste of IO and memory if you know that you’re only going to need 2-3 of those columns. This was obvious, but beware also of… Bad effects on the database execution plan: These effects may actually be much worse than the effects on the Java application. Sophisticated databases perform a lot of SQL transformation when calculating the best execution plan for your query. It may well be that some parts of your query can be “transformed away”, knowing that they won’t contribute to the projection (SELECT clause) or to the filtering predicates. I’ve recently blogged about this in the context of schema meta data: How schema meta data impacts Oracle query transformations Now, this is quite a beast. Think about a sophisticated SELECT that will join two views:


SELECT *
FROM   customer_view c
JOIN   order_view o
  ON   c.cust_id = o.cust_id

Each of the views that are joined to the above joined table reference might again join data from dozens of tables, such as CUSTOMER_ADDRESS, ORDER_HISTORY, ORDER_SETTLEMENT, etc. Given the SELECT * projection, your database has no choice but to fully perform the loading of all those joined tables, when in fact, the only thing that you were interested in was this:


SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c
JOIN   order_view o
  ON   c.cust_id = o.cust_id

A good database will transform your SQL in a way that most of the “hidden” joins can be removed, which results in much less IO and memory consumption within the database. The Cure: Never execute SELECT *. Never reuse the same projection for various queries. Always try to reduce the projection to the data that you really need. Note that this can be quite hard to achieve with ORMs.

3. Thinking that JOIN is a SELECT clause

This isn’t a mistake with a lot of impact on performance or SQL correctness, but nevertheless, SQL developers should be aware of the fact that the JOIN clause is not part of the SELECT statement per se. The SQL standard 1992 defines a table reference as such:

6.3 <table reference>

<table reference> ::=
    <table name> [ [ AS ] <correlation name>
      [ <left paren> <derived column list> <right paren> ] ]
  | <derived table> [ AS ] <correlation name>
      [ <left paren> <derived column list> <right paren> ]
  | <joined table>

The FROM clause and also the joined table can then make use of such table references:

7.4 <from clause>

<from clause> ::= 
    FROM <table reference> [ { <comma> <table reference> }... ]

7.5 <joined table>

<joined table> ::=
    <cross join>
  | <qualified join>
  | <left paren> <joined table> <right paren>

<cross join> ::=
    <table reference> CROSS JOIN <table reference>

<qualified join> ::=
    <table reference> [ NATURAL ] [ <join type> ] JOIN
      <table reference> [ <join specification> ]

Relational databases are very table-centric. Many operations are performed on physical, joined or derived tables in one way or another. To write SQL effectively, it is important to understand that the SELECT .. FROM clause expects a comma-separated list of table references in whatever form they may be provided. Depending on the complexity of the table reference, some databases also accept sophisticated table references in other statements, such as INSERT, UPDATE, DELETE, MERGE. See Oracle’s manuals for instance, explaining how to create updatable views. The Cure: Always think of your FROM clause to expect a table reference as a whole. If you write a JOIN clause, think of this JOIN clause to be part of a complex table reference:


SELECT c.first_name, c.last_name, o.amount
FROM

    customer_view c
      JOIN order_view o
      ON c.cust_id = o.cust_id

4. Using pre-ANSI 92 JOIN syntax

Now that we’ve clarified how table references work (see the previous point), it should become a bit more obvious that the pre-ANSI 92 JOIN syntax should be avoided at all costs. To execution plans, it usually makes no difference if you specify join predicates in the JOIN .. ON clause or in the WHERE clause. But from a readability and maintenance perspective, using the WHERE clause for both filtering predicates and join predicates is a major quagmire. Consider this simple example:


SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c,
       order_view o
WHERE  o.amount > 100
AND    c.cust_id = o.cust_id
AND    c.language = 'en'

Can you spot the join predicate? What if we joined dozens of tables? This gets much worse when applying proprietary syntaxes for outer join, such as Oracle’s (+) syntax. The Cure: Always use the ANSI 92 JOIN syntax. Never put JOIN predicates into the WHERE clause. There is absolutely no advantage to using the pre-ANSI 92 JOIN syntax.

5. Forgetting to escape input to the LIKE predicate

The SQL standard 1992 specifies the like predicate as such:

8.5 <like predicate>

<like predicate> ::=
    <match value> [ NOT ] LIKE <pattern>
      [ ESCAPE <escape character> ]

The ESCAPE keyword should be used almost always when allowing for user input to be used in your SQL queries. While it may be rare that the percent sign (%) is actually supposed to be part of the data, the underscore (_) might well be:


SELECT *
FROM   t
WHERE  t.x LIKE 'some!_prefix%' ESCAPE '!'

The Cure: Always think of proper escaping when using the LIKE predicate.

6. Thinking that NOT (A IN (X, Y)) is the boolean inverse of A IN (X, Y)

This one is subtle but very important with respect to NULLs! Let’s review what A IN (X, Y) really means:

                  A IN (X, Y)
is the same as    A = ANY (X, Y)
is the same as    A = X OR A = Y

When at the same time, NOT (A IN (X, Y)) really means:

                  NOT (A IN (X, Y))
is the same as    A NOT IN (X, Y)
is the same as    A != ANY (X, Y)
is the same as    A != X AND A != Y

That looks like the boolean inverse of the previous predicate, but it isn’t! If any of X or Y is NULL, the NOT IN predicate will result in UNKNOWN whereas the IN predicate might still return a boolean value. Or in other words, when A IN (X, Y) yields TRUE or FALSE, NOT(A IN (X, Y)) may still yield UNKNOWN instead of FALSE or TRUE. Note, that this is also true if the right-hand side of the IN predicate is a subquery. Don’t believe it? See this SQL Fiddle for yourself. It shows that the following query yields no result:


SELECT 1
WHERE     1 IN (NULL)
UNION ALL
SELECT 2
WHERE NOT(1 IN (NULL))

More details can be seen in my previous blog post on that subject, which also shows some SQL dialect incompatibilities in that area. The Cure: Beware of the NOT IN predicate when nullable columns are involved!

7. Thinking that NOT (A IS NULL) is the same as A IS NOT NULL

Right, so we remembered that SQL implements three-valued logic when it comes to handling NULL. That’s why we can use the NULL predicate to check for NULL values. Right? Right. But even the NULL predicate is subtle. Beware that the two following predicates are only equivalent for row value expressions of degree 1:

                   NOT (A IS NULL)
is not the same as A IS NOT NULL

If A is a row value expression with a degree of more than 1, then the truth table is transformed such that:

A IS NULL yields true only if all values in A are NULL
NOT(A IS NULL) yields false only if all values in A are NULL
A IS NOT NULL yields true only if all values in A are NOT NULL
NOT(A IS NOT NULL) yields false only if all values in A are NOT NULL

See more details in my previous blog post on that subject. The Cure: When using row value expressions, beware of the NULL predicate, which might not work as expected.

8. Not using row value expressions where they are supported

Row value expressions are an awesome SQL feature. When SQL is a very table-centric language, tables are also very row-centric. Row value expressions let you describe complex predicates much more easily, by creating local ad-hoc rows that can be compared with other rows of the same degree and row type. A simple example is to query customers for first names and last names at the same time.


SELECT c.address
FROM   customer c,
WHERE (c.first_name, c.last_name) = (?, ?)

As can be seen, this syntax is slightly more concise than the equivalent syntax where each column from the predicate’s left-hand side is compared with the corresponding column on the right-hand side. This is particularly true if many independent predicates are combined with AND. Using row value expressions allows you to combine correlated predicates into one. This is most useful for join expressions on composite foreign keys:


SELECT c.first_name, c.last_name, a.street
FROM   customer c
JOIN   address a
  ON  (c.id, c.tenant_id) = (a.id, a.tenant_id)

Unfortunately, not all databases support row value expressions in the same way. But the SQL standard had defined them already in 1992, and if you use them, sophisticated databases like Oracle or Postgres can use them for calculating better execution plans. This is explained on the popular Use The Index, Luke page. The Cure: Use row value expressions whenever you can. They will make your SQL more concise and possibly even faster.

9. Not defining enough constraints

So, I’m going to cite Tom Kyte and Use The Index, Luke again. You cannot have enough constraints in your meta data. First off, constraints help you keep your data from corrupting, which is already very useful. But to me, more importantly, constraints will help the database perform SQL transformations, as the database can decide that

Some values are equivalent
Some clauses are redundant
Some clauses are “void” (i.e. they will not return any values)

Some developers may think that constraints are slow. The opposite is the case, unless you insert lots and lots of data, in case of which you can either disable constraints for a large operation, or use a temporary “load table” without constraints, transferring data offline to the real table. The Cure: Define as many constraints as you can. They will help your database to perform better when querying.

10. Thinking that 50ms is fast query execution

The NoSQL hype is still ongoing, and many companies still think they’re Twitter or Facebook in dire need of faster, more scalable solutions, escaping ACID and relational models to scale horizontally. Some may succeed (e.g. Twitter or Facebook), others may run into this:

BNELF1GCUAExynU — Found here: https://twitter.com/codinghorror/status/347070841059692545

For the others who are forced (or chose) to stick with proven relational databases, don’t be tricked into thinking that modern databases are slow. They’re hyper fast. In fact, they’re so fast, they can parse your 20kb query text, calculate 2000-line execution plans, and actually execute that monster in less than a millisecond, if you and your DBA get along well and tune your database to the max. They may be slow because of your application misusing a popular ORM, or because that ORM won’t be able to produce fast SQL for your complex querying logic. In that case, you may want to chose a more SQL-centric API like JDBC, jOOQ or MyBatis that will let you get back in control of your SQL. So, don’t think that a query execution of 50ms is fast or even acceptable. It’s not. If you get these speeds at development time, make sure you investigate execution plans. Those monsters might explode in production, where you have more complex contexts and data.

Conclusion

SQL is a lot of fun, but also very subtle in various ways. It’s not easy to get it right as my previous blog post about 10 common mistakes has shown. But SQL can be mastered and it’s worth the trouble. Data is your most valuable asset. Treat data with respect and write better SQL.

Published by lukaseder

I made jOOQ View all posts by lukaseder

55 thoughts on “10 More Common Mistakes Java Developers Make when Writing SQL”

anjan bacchu says:

September 3, 2013 at 14:05

thanks for the list. more people should read them

Loading...

Reply
1. lukaseder says:
  
  September 3, 2013 at 16:23
  
  Feel free to promote it by linking to it from your blog, or from a social network…
  
  Loading...
  
  Reply
Oleksandr Alesinskyy says:

November 13, 2013 at 14:56

“Using pre-ANSI JOIN syntax” – there are good reasons to use it, especially with Oracle.

It is well known that the Oracle optimizer does not treat ANSI joins well enough (see e.g. https://mail.google.com/mail/?ui=2&shva=1#search/ANSI+join/13890b73971c9405, you may go directly to the Conclusion section at the page bottom).

Even worse, ANSI outer joins in Oracle sometimes result in the incorrect answers (see e.g. http://jonathanlewis.wordpress.com/2013/10/18/deferrable-ri/).

Yes, the situation improves with each new version of Oracle, but the problem still exists even in the newest ones.

On the side note – it is disputable which syntax is more readable. As for me the”classical” Oracle syntax (while less powerful) is much easier to read and understand.

Loading...

Reply
1. lukaseder says:
  
  November 13, 2013 at 15:07
  
  It is well known that the Oracle optimizer does not treat ANSI joins well enough (see e.g. https://mail.google.com/mail/?ui=2&shva=1#search/ANSI+join/13890b73971c9405, you may go directly to the Conclusion section at the page bottom).
  
  Hmm, that link doesn’t work for me, unless you give me your GMail password ;-)
  
  Even worse, ANSI outer joins in Oracle sometimes result in the incorrect answers (see e.g. http://jonathanlewis.wordpress.com/2013/10/18/deferrable-ri/).
  
  Seriously!? Funny bug. Well, I’ve been browsing through Oracle minor release notes before. It’s crazy when you think that you’d trust your most sensitive banking data to that monster, when you see the thousands of fixes they ship with a minor release.
  
  But anyway, it generally works really well. I’ve hardly actually encountered a bug, much less than with other databases…
  
  On the side note – it is disputable which syntax is more readable. As for me the”classical” Oracle syntax (while less powerful) is much easier to read and understand.
  
  Even when joining around 20 tables?
  
  Loading...
  
  Reply
  1. Oleksandr Alesinskyy says:
    
    November 13, 2013 at 15:13
    
    Sorry, copied a link from the wrong tab, the correct one is http://jonathanlewis.wordpress.com/2012/07/16/ansi-outer-2/.
    
    Honestly speaking, never joined 20 tables in my 30+ years of a professional software development (and 19 of them in the relational DB area, mostly – but not exclusively – Oracle) . Not sure is even 8.
    
    Anyhow, I dislike an ANSI syntax – prefer to have all conditions in a WHERE clause and not distributed all over the statement. But this is a matter of taste (save cases when you have no choice as ANSI syntax is more powerful).
    
    Loading...
    
    Reply
    1. lukaseder says:
      
      November 13, 2013 at 15:18
      
      Thanks for the update. That certainly looks like a very interesting blog to follow!
      
      Well, we did join 20 tables in our extremely normalised E-Banking database schema. It wasn’t necessarily a joy, every day. But it worked well once we’ve transformed Oracle-JOIN to ANSI-JOIN. I agree that the previous syntax can be OK with less complex queries, but with our monsters (e.g. 4 out of 20 joined tables were derived tables with unions in them, sigh) ANSI JOINs were simply much more expressive or “just expressive enough” :-)
      
      Loading...
      
      Reply
      1. Oleksandr Alesinskyy says:
        
        November 13, 2013 at 15:23
        
        OK, I would rather go for WITH or views – to factor out some logic.
        
        Funny enough, for that 19 years 8 years where spent in a development of banking systems (as well pretty normalized).
        
        Loading...
        
        Reply
        
        lukaseder says:
        
        November 13, 2013 at 15:26
        
        I can’t cite a reference, but I’ve observed common table expressions to have a non-negligible, negative impact on execution plans, themselves. But you’re right. Factoring out SQL to views is usually the best choice. But you know. Large legacy system and all that, no one daring to touch the code :-)
        
        I’m sure, yours must’ve been slightly better designed or slightly less complex.
        
        Loading...
        
        Reply
Oleksandr Alesinskyy says:

November 13, 2013 at 15:05

“Thinking that NOT (A IN (X, Y)) is the boolean inverse of A IN (X, Y)” – your comment is plainly incorrect. There is no way (save an implementation bug in the RDBMS) that this is not true (if you release that not UNKNOWN is . UNKNOWN).

It is simply impossible that “in other words, when A IN (X, Y) yields TRUE or FALSE, NOT(A IN (X, Y)) may still yield UNKNOWN instead of FALSE or TRUE.”.
You, likely mixed-up NOT (A IN (X,Y)) with A NOT IN (X,Y). These 2 clauses are not equivalent, former is the exact opposite to A IN (X,Y) and latter is not. E.g. ‘A’ IN (‘A’, NULL) shall yield TRUE, NOT (‘A’ IN (‘A’, NULL)) shall yield FALSE, ‘A’ NOT IN (‘A’, NULL) shall yield UNKNOWN.

If some RDBMS behaves differently – file a bug.

Loading...

Reply
1. lukaseder says:
  
  November 13, 2013 at 15:12
  The SQL1992 standard specifies in 8.4 <in predicate>:
```
3) The expression

      RVC NOT IN IPV

   is equivalent to

      NOT ( RVC IN IPV )
```
  Try it yourself: http://sqlfiddle.com/#!4/d41d8/20619. In fact, the problem is related to the fact that these are the same:
```
A IN (B, C)
A = B OR A = C
```
  whereas these are the same, too
```
A NOT IN (B, C)
A != B AND A != C
```
  Substitute UNKNOWN for B and you’ll see how the first expression yields TRUE, whereas the second yields UNKNOWN
  
  Loading...
  Reply
  1. Oleksandr Alesinskyy says:
    
    November 13, 2013 at 15:20
    
    Seems so – but I would rate it as a bug.
    
    Loading...
    
    Reply
    1. lukaseder says:
      
      November 13, 2013 at 15:24
      
      SQL NULLs is certainly not at all intuitive. But it’s been specified this way for the last 20 (or more) years, and MySQL and Sybase ASE are the only databases I’m aware of that got it wrong (or intuitively right, if you prefer):
      
      https://blog.jooq.org/2012/01/27/sql-incompatibilities-not-in-and-null-values/
      
      Loading...
      
      Reply
      1. Oleksandr Alesinskyy says:
        
        November 13, 2013 at 15:43
        
        It is rather intuitive if you are accustomed with the three-stat logic.
        It’s easy “not TRUE -> FALSE”, “not FALSE -> TRUE”, “not UNKNOWN -> UNKNOWN” (or “not NULL -> NULL, if you prefer).
        
        The problem is that the discussed implementation of “not (A in (…))” violates this logic by performing an illegal (from my point of view) transformation. Namely, it transforms to “A not in (…)”. Former shall firstly evaluate “A in (…)” (that mean to check if at least one member inside parenthesis is A) and then apply NOT to this evaluation results, latter should verify that each member inside parenthesis is not A). Semantics clearly differs.
        
        Loading...
        
        Reply
        
        lukaseder says:
        
        November 13, 2013 at 15:46
        
        Yes, that’s what I meant to say, actually.
        
        Loading...
        
        Reply
        
        Oleksandr Alesinskyy says:
        
        November 13, 2013 at 15:54
        
        Yes, so this behavior should be considered as a bug in the RDBMS-es that deliver it. IT is nice that you spotted it is wide-spreading. If I remember correctly, older version of Oracle behaved correctly (that mean have distinguished NOT (A IN (…)) form A NOT IN (…), but I have no chance to verify this statement as I do not have an access to them any more.
        
        Loading...
        
        Reply
BBI says:

December 5, 2013 at 08:46

For prepared statements, which send a prepare request to the DB, then work with a returned query identifier, I think these should be replaced by simple parameterized SQL, as supplied through the “execute immediate” command in Oracle, or the “sp_executesql” stored procedure in SQL Server.

I don’t know what the Java world is like at the moment, but working with the ADO.NET interface allows such queries quite for some time now: You build your SQL string like “select id, name, creationdate from mytable where name=@searchname and datetime >= @startdatetime” (replace the “@” MS SQL Server named parameters with “:” prefixed Oracle parameters or question marks if you like). Then you set the binding parameters, and the database detects the already executed query (if already executed) by SQL string equality.

No need for an extra round trip to prepare, no keeping track of SQL handles in the application, in best case, one query with a plan for all applications. Just same SQL string = same query plan! Even if you create new connections and commands, or send the same statement from different applications.

I tried and compared to classic prepared statements, and saw no difference in performance, although sending a full SQL string takes more IO than sending a handle.

Also, one can avoid faulty implementations of prepared statements, such as sending the prepare request before every single SQL query, killing performance with double round-trips and by forcing the database to recompile on every request, maybe even leaving many open cursors (not sure about that).

Loading...

Reply
1. lukaseder says:
  
  December 5, 2013 at 10:33
  
  That’s interesting insight. I actually don’t have production quality experience with SQL Server, but I’ve also heard that preparing statements may not be the most efficient way to do things there. I’m actually not even sure if you can do the things you’ve described for ADO.NET with JDBC.
  
  Anyway, the Oracle database distinguishes between soft and hard parses. Sending a full SQL string for a previously cached cursor tends to yield almost no overhead in a “regular” setup.
  
  Loading...
  
  Reply
2. Oleksandr Alesinskyy says:
  
  December 5, 2013 at 13:28
  
  Hi BBI,
  
  Unfortunately you mixed to different concepts in your post – namely, server-side prepared statements and JDBC prepared statements. The JDBC specification does not state that java.sql.PreparedStatement shall be implemented using a server-side prepared statement, it is up to the driver implementer.
  
  And in reality many drivers do not use them, them most notable, MySQL (for a very simple reason – MySQL does not support server-side prepared statements).
  
  Moreover, even a usage of a server/side prepared statement does not necessarily involve an additional round-trip. The reason is that an execution of the PARSE command may delayed to the first execution of the statement, so they may be sent to a server in a single shot (Oracle drivers do so, moreover, under some circumstances they perform PARSE, EXECUTE and FETCH in a single round-trip).
  
  Loading...
  
  Reply
  1. lukaseder says:
    
    December 5, 2013 at 13:33
    
    Man, what does MySQL support?? :-) Really? That is news to me. Crazy. So bind values are in fact inlined by the MySQL JDBC driver before sending the statement across the wire?
    
    Loading...
    
    Reply
    1. Oleksandr Alesinskyy says:
      
      December 5, 2013 at 13:55
      
      Sorry, my bad. It (MySQL) does support server-side prepared statement. But the MySQL JDBC documentation is written in a very confusing manner. See http://dev.mysql.com/doc/refman/5.6/en/connector-j-reference-implementation-notes.html.
      “PreparedStatements are implemented by the driver, as MySQL does not have a prepared statement feature.
      …
      Starting with version 3.1.0 MySQL Connector/J, server-side prepared statements and binary-encoded result sets are used when the server supports them. ”
      
      That said – I do not use MySQL.
      
      Loading...
      
      Reply
      1. lukaseder says:
        
        December 5, 2013 at 15:09
        
        Thanks for the clarification! Yes, there are lots of good reasons for not using MySQL. Check out the MySQL stored procedure nightmare as explained by Bill Karwin here on Quora:
        http://www.quora.com/MySQL/What-are-the-reasons-not-to-use-stored-procedures-vs-using-those-We-are-running-mySql-using-JPA/answer/Bill-Karwin
        
        Loading...
        
        Reply
        
        Oleksandr Alesinskyy says:
        
        December 5, 2013 at 16:01
        
        Honestly speaking, only the first item from the Bill’s list makes any sense for me. Other are either applicable for most of databases or irrelevant.
        
        Loading...
        
        Reply
        
        lukaseder says:
        
        December 5, 2013 at 16:09
        
        Really? I know that many people appreciate PL/SQL’s modularity and object-orientation (second point). With SQLDeveloper or Toad, you can easily debug PL/SQL, which really isn’t irrelevant if your developing a core-banking system in PL/SQL (third point). SQLDeveloper and Toad are awesome IDEs (fourth point). PL/SQL’s standard libs are very extensive (fifth point).
        
        So, maybe you just haven’t done any “hard-core” stored procedure programming before? :-)
        
        Loading...
        
        Reply
        
        Oleksandr Alesinskyy says:
        
        December 5, 2013 at 16:23
        
        “developing a core-banking system in PL/SQL” – this i exactly that I have done for 8 years :)
        
        And I agree with everything you written about Oracle and PL/SQL. But you missed that I have mentioned not ALL databases but MOST databases. How many other databases support all these capabilities? For example, packages?
        
        As well some of your points I do not consider as important – e.g. debugging of stored procedures.
        
        As for IDEs – there is e.g. a TOAD for MySQL (freeware). Do not know how good it is – but it exists.
        
        Loading...
        
        Reply
        
        lukaseder says:
        
        December 5, 2013 at 19:00
        
        Point taken :-) Indeed, packages I’ve only seen in Oracle, even if they’re specified (in a non-Oracle way) in the SQL standard – where they’re called “modules”. I suspect that SQL Server has a lot of good tooling too, and MySQL also has the SQL Workbench, which isn’t too bad. I don’t think that you can debug through MySQL stored procedures though.
        
        Loading...
        
        Reply
  2. BBI says:
    
    December 8, 2013 at 01:30
    
    What I meant is that these statements are fully server side, while classic prepared statements are still at least half client side (the client has to keep a handle to the prepared query, and the prepared statement and query plan will be lost when the handle gets lost or is unknown in another application or part of the same application). Some drivers use query caching to reduce this problem, but, internally, still keep track of the SQL handle.
    
    Parameterized queries without prepared statements are sent just like you might send Ad-hoc queries with inlined SQL parameter values, but have, in contrary, SQL (with placeholders) and parameter values separated. So the SQL string can be identified when the same query comes again, maybe many times, only with different parameter values. The (minor) downside is that any difference in SQL, such as capitalization or spaces, will result in a different query plan, which can mostly be avoided.
    
    Loading...
    
    Reply
hourim says:

December 29, 2013 at 09:45

I would like to share with you a real life situation where not using Prepare Statement while calling stored PL/SQL procedure was damaging the library cache of the shared pool so that the unique solution was sometimes to stop and restart the database

http://hourim.wordpress.com/2011/06/16/bind-variable-shared-pool-and-cursor-sharing-parameter/

Best regards
Mohamed Houri
http://www.hourim.wordpress.com

Loading...

Reply
1. lukaseder says:
  
  December 29, 2013 at 10:20
  Thanks for sharing! You restarted the database? What about
```
alter system flush shared_pool;
```
  But anyway, these are both extreme measures. The only solution is to write better SQL, with bind values.
  
  Loading...
  Reply
  1. hourim says:
    
    December 29, 2013 at 14:33
    
    When I was handled this issue to trouble shoot the DBA told me that even flushing the shared_pool was not possible (if my memory serves me well). The only thing that bring the database to its normal state was a stop/restart.
    
    Best regards
    
    Loading...
    
    Reply
    1. lukaseder says:
      
      December 29, 2013 at 19:35
      
      Alright. Wow, you must’ve really killed the DB with your queries, then :-)
      
      Loading...
      
      Reply
Brendan Furey says:

July 22, 2016 at 10:43

I agree with the main point 2, but as for ‘A good database will transform your SQL in a way that most of the “hidden” joins can be removed, which results in much less IO and memory consumption within the database’, I don’t think that’s true. While Oracle, for example, will take the select list into account in its optimization, it can’t just remove the joins as they might affect the result set.

Joining complex views is actually a really bad anti-pattern, as I discuss here: http://aprogrammerwrites.eu/?p=931

Loading...

Reply
1. al0 says:
  
  July 22, 2016 at 11:45
  
  It depends – if database may identify that joins do not affect an result set (and for it appropriate not null/FK constraints should be in place) it may (and in many cases will) eliminate unnecessary joins.
  
  Loading...
  
  Reply
2. lukaseder says:
  
  July 22, 2016 at 23:05
  
  There it is. The “anti pattern” word. I’ll tell my client who nests views 5 levels deep and joins dozens of them at each level without the database complaining the least bit ;)
  
  Perhaps, the anti pattern you had in mind is “SELECT *” at the top-level query
  
  Loading...
  
  Reply
Brendan Furey says:

July 22, 2016 at 16:52

You’re right! – in some cases it’s possible although it’s not something I’ve seen happen in practice – I just verified it in Oracle 11.2

Loading...

Reply
1. lukaseder says:
  
  July 22, 2016 at 23:09
  
  I’m really not sure what you mean. JOIN elimination is a thing in 11gR2. It already was before. There are, however, and very unfortunately, 1-2 bugs related to ANSI JOIN. See this Tweet’s discussion for examples:
  https://twitter.com/lukaseder/status/732948399826325504
  
  Loading...
  
  Reply
  1. Brendan Furey says:
    
    July 23, 2016 at 02:13
    
    I mean “I know. I just checked it on my current database. But it doesn’t happen often in my personal experience “.
    
    Loading...
    
    Reply
    1. lukaseder says:
      
      July 23, 2016 at 08:13
      
      Well, I’m really curious about concrete examples. Because it happens very often in my personal experience.
      
      Loading...
      
      Reply
Michael Eriksson says:

October 27, 2017 at 20:00

Most of this is not in anyway specific to use with Java, but is more a matter of poor developers. (Notwithstanding that a frame-work can remove choices, e.g. whether a “SELECT *” can be avoided.)

I have seen a number of SQL and/or PL/SQL developers who are weak on prepared statements (but PL/SQL takes care of much of that automatically); “*” are quite common, e.g. “SELECT * INTO record” where only several of the fields are actually used later; most of the colleagues in my current project do not use ANSI-joins and several of them have been skeptical to my use, claiming that ANSI-joins are hard to understand (I do not agree, obviously!); only yesterday I had to explain to a colleague that the reason that one view with “a = ‘A'” delivered different results from another view with “a NOT IN (‘B’, ‘C’, ‘D’)” was unlikely to be related to NULL values; etc.

(Where A/B/C/D are fictional representations of the actual values; except for NULL no other values were present.)

Loading...

Reply
1. lukaseder says:
  
  October 28, 2017 at 18:21
  You’re right. As many others have pointed out, this isn’t Java specific. We just happen to be a blog for Java developers, at least historically. So, there :)
  
  I agree, the “SELECT * INTO record” thing is a bit of a sore spot in the PL/SQL language. I wish PL/SQL allowed for ad-hoc structural record type creation using INTO, just like using FOR. I do this quite a few times as a workaround:
```
FOR rec IN (SELECT a, b FROM tab) LOOP
  -- I now have that record, even if this loop iterates only once.
END LOOP;
```
  Regarding ANSI joins, ask your friends to run a FULL OUTER JOIN
  
  Loading...
  Reply
earlb71 says:

June 7, 2018 at 12:49

Been a database programmer for 20 years, some things listed are very subjective and at least one is totally wrong. Don’t believe everything in this blog, go and look them up.

Loading...

Reply
1. lukaseder says:
  
  June 7, 2018 at 13:10
  
  Thanks for your comment. Would you mind being a bit more specific?
  
  Loading...
  
  Reply
JD says:

November 23, 2018 at 19:27

For number 8 I was having a heck of a time finding out how to use row value expressions (RVE) when it came to using Oracle’s JDBC 12.c. I tried executing queries like:
SELECT * FROM TABLE WHERE (C1, C2) = (V1,V2)
and was always met with errors.

The only references to RVE I could find were relating to JOOQ, which I don’t have the authority to implement in my applications. Anyway, I found a single reference to RVE in an Oracle community discussion and found the correct implementation syntax to be:
SELECT * FROM TABLE WHERE (C1, C2) = ((V1,V2))

If this is common knowledge to those versed in Oracle please link some of the documentation because I couldn’t find anything when reading the developer guides.

Hope this helps!

Loading...

Reply
1. lukaseder says:
  
  November 26, 2018 at 10:30
  
  See, the very moderate licensing of jOOQ would have already saved you all this trouble and you would have gotten your money back in time not wasted to funky syntax! :) (Every manager will understand this quite authoritative reasoning)
  
  Anyway. Next time, you could translate your SQL to Oracle on our website, for free: https://www.jooq.org/translate
  
  The documentation is here. It seems quite clear once you read it, but it is not at all obvious that it exists, nor why it should be needed. I haven’t gotten a reasonable answer from anyone at Oracle as to why this syntax is even necessary:
  https://docs.oracle.com/database/121/SQLRF/conditions002.htm#SQLRF52110
  
  Loading...
  
  Reply

10 More Common Mistakes Java Developers Make when Writing SQL

1. Not using PreparedStatements

2. Returning too many columns

3. Thinking that JOIN is a SELECT clause

4. Using pre-ANSI 92 JOIN syntax

5. Forgetting to escape input to the LIKE predicate

6. Thinking that NOT (A IN (X, Y)) is the boolean inverse of A IN (X, Y)

7. Thinking that NOT (A IS NULL) is the same as A IS NOT NULL

8. Not using row value expressions where they are supported

9. Not defining enough constraints

10. Thinking that 50ms is fast query execution

Conclusion

Like this:

Published by lukaseder

55 thoughts on “10 More Common Mistakes Java Developers Make when Writing SQL”

Leave a ReplyCancel reply

1. Not using PreparedStatements

2. Returning too many columns

3. Thinking that JOIN is a SELECT clause

4. Using pre-ANSI 92 JOIN syntax

5. Forgetting to escape input to the LIKE predicate

6. Thinking that NOT (A IN (X, Y)) is the boolean inverse of A IN (X, Y)

7. Thinking that NOT (A IS NULL) is the same as A IS NOT NULL

8. Not using row value expressions where they are supported

9. Not defining enough constraints

10. Thinking that 50ms is fast query execution

Conclusion

Like this:

Published by lukaseder

55 thoughts on “10 More Common Mistakes Java Developers Make when Writing SQL”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.