CUBRID: A Lesser-Known Korean OSS Database Gem

While RedHat and Google have been dumping MySQL for MariaDB, there’s actually a third, much lesser-known option for MySQL-oriented database folks in the RDBMS market: CUBRID.

One of CUBRID’s main goals is also to lure MySQL users away from Oracle by offering many equivalent syntax elements that are available in either the MySQL or Oracle databases. But this gem from the far east, which is backed by the Korean Naver Corp. offers a lot more. It is one of the few Open Source RDBMS that

  • is also an ORDBMS (only PostgreSQL can compete with this)
  • implements the SQL standard MERGE statement (HSQLDB and Firebird do as well, Derby is currently implementing it)
  • implements a wide variety of window functions (only PostgreSQL competes, again)
  • implements a wide variety of MySQL’s proprietary SQL extensions
  • implements Oracle’s awesome CONNECT BY syntax (no other database does that)

Popularity-wise, there’s surely much to catch up with, comparing with MySQL and MariaDB:

Reproduced with permission by DB-Engines.com
Reproduced with permission by DB-Engines.com

But the next time you’re looking at a replacement for MySQL, why not also consider CUBRID?

jOOQ Newsletter September 17, 2013

Subscribe to this newsletter here.

SQL for calculations

SQL can be used for heavy calculations. This doesn’t mean that it has to, of course. Many Java-oriented software architects are reluctant to allow for business logic entering their database. DBA tend to disagree and promote complex logic in database views or stored procedures.

The pros and cons of both sides can be seen in this popular discussion on reddit. Our take on this discussion is a rather pragmatic one, saying: Neither approach is “better”. But not knowing about awesome, often vendor-specific SQL features is a pity. For example, here are two approaches to calculating a running total with Oracle SQL:

What is your experience with SQL for calculations? Reply to this e-mail and tell us your story.

SQL Performance Explained

SQL Performance ExplainedWhen running calculations in the database using SQL, it is always good to know your way around indexing and performance tuning. As jOOQ is a very SQL-centric environment, we would like to promote a very good book about SQL Performance.

SQL databases have become incredibly fast in parsing and executing even very complex SQL statements. But to many SQL developers, writing performing SQL is still a mystery. SQL Performance Explained by Markus Winandcovers 90% of what any SQL developer should know in very simple terms. A must-read for all SQL beginners and pros! Available in English, German, and French.

PostgreSQL 9.3

PostgreSQL 9.3 has been released! This is great news as a wonderful database has just gotten better. Apart from support for materialised views and updatable views, PostgreSQL now also supports the SQL standard LATERAL JOIN, which will soon be supported in jOOQ as well.

More about what’s new in PostgreSQL 9.3 can be seen here.

Google and MariaDB

After RedHat’s alleged switching from MySQL to MariaDB in RHEL, Google now also announces that they will start to migrate away from Oracle’s popular Open Source database. These migrations will further strengthen MariaDB’s position and communities, as large players in the data market will stop contributing to MySQL and contribute to MariaDB instead.

Read more about this migration here.

The Myth About Slow SQL JOIN Operations

In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code amounted up to 5k lines of code, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed in way under 50ms, no matter how we queried it (see “10 more common mistakes” about the speed of queries). Of course, this performance was only achieved after lots of fine-tuning, load-testing and benchmarking. But it worked. Our Oracle database never let us down on these things.

Nonetheless, many SQL users think that JOIN operations are slow. Why? Perhaps because they are / used to be, in MySQL? I’m currently reading this interesting book by Markus Winand. The book is called SQL Performance Explained. He’s also the author of Use-The-Index-Luke.com where you can get free insight into his book. I still recommend reading the whole book, though. Even SQL old-timers and SQL nerds like me will find 1-2 novel, very interesting approaches, some of which will be incorporated into jOOQ very soon!

In particular, consider this page which explains very well how Hash JOIN operations work:
http://use-the-index-luke.com/sql/join/hash-join-partial-objects

MySQL Bad Idea #573

This is MySQL’s Bad Idea #573 (after #384, which I’ve blogged about before) I’ve just had a terrible experience with a bug report from the jOOQ User Group, related to escaping of backslashes in string literals in MySQL. First, I thought to myself, whatever. SQL doesn’t escape backslashes. The only escape character within a string literal according to the early SQL standards is the quote as in quote quote. Citing from SQL-1992 (slightly simplified):

<character string literal> ::=
    <quote> [ <character representation>... ] <quote>

<character representation> ::=
    <nonquote character>
  | <quote symbol><nonquote character> ::= !! See the Syntax Rules.
<quote symbol> ::= <quote><quote>

Alright? Crystal clear. There’s no escaping other than <quote><quote>

The only time when you will need to be able to escape something is with the LIKE predicate, in case you want to escape % and _ symbols. You can then use the ESCAPE clause:

<like predicate> ::=
    <match value> [ NOT ] LIKE <pattern>
    [ ESCAPE <escape character> ]

I have now learned, that MySQL (and of course MariaDB) unlike any other database also support quoting with backslashes, similar to Java and other languages. That’s not a problem per se, although from a cross-vendor compatibility perspective, it’s quite nasty. But then, I’ve discovered there is a flag called NO_BACKSLASH_ESCAPES:

This just reminds me of PHP’s horrible magic quotes. In fact, combine arbitrary configurations of PHP and MySQL on your server and good luck to you of ever getting string literals right. Sigh.

SAP’s Hilarious SQL Whitepaper(s)

While looking for some authoritative information about Sybase SQL Anywhere 12’s TOP .. START AT clause, I stumbled upon this hilarious white paper here, which I do not want to keep from you:
http://www.sybase.com/files/White_Papers/Sybase_Top_10_Features_In_SQL_Anywhere_12.pdf

I will take advantage of “fair use policy” and cite parts from section 7:

Feature number 7: improved support for DaffySQL syntax

If I told you that RowGenerator.row_num contains the values 1 through 255, what would you say this query returned?

[Query example]

Give up? OK, how about this one?

[Query example]

Still stumped? If I told you they both returned exactly the same result set as the following query, what would you say?

[Query example]

Yes, the LIMIT clause is new to SQL Anywhere 12, exactly the same as TOP START AT except it uses zero as the starting point for numbering rows instead of 1.

An “offset”, get it?

As in “Here’s ten dollars, let me count it for you: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.”

Why implement LIMIT? And why include it in a list of cool features?

Because there are a lot of MySQL users out there who don’t have TOP START AT, and they’ve written zillions of queries using LIMIT, and they’d like to migrate their apps to SQL Anywhere without rewriting everything. And PostgreSQL users too… welcome aboard!

Migrating to SQL Anywhere is definitely cool.

So be cool and migrate to SQL Anywhere already! :-) I’m now going through the rest of this fun document.

10 Common Mistakes Java Developers Make when Writing SQL

This article is part of a series. You might also like:

Java developers mix object-oriented thinking with imperative thinking, depending on their levels of:

  • Skill (anyone can code imperatively)
  • Dogma (some use the “Pattern-Pattern”, i.e. the pattern of applying patterns everywhere and giving them names)
  • Mood (true OO is more clumsy to write than imperative code. At first)

But when Java developers write SQL, everything changes. SQL is a declarative language that has nothing to do with either object-oriented or imperative thinking. It is very easy to express a query in SQL. It is not so easy to express it optimally or correctly. Not only do developers need to re-think their programming paradigm, they also need to think in terms of set theory.

Here are common mistakes that a Java developer makes when writing SQL through JDBC or jOOQ (in no particular order). For 10 More Common Mistakes, see this article here.

jOOQ is the best way to write SQL in Java

Here are common mistakes that a Java developer makes when writing SQL (in no particular order):

1. Forgetting about NULL

Misunderstanding NULL is probably the biggest mistake a Java developer can make when writing SQL. This is also (but not exclusively) due to the fact that NULL is also called UNKNOWN. If it were only called UNKNOWN, it would be easier to understand. Another reason is that JDBC maps SQL NULL to Java null when fetching data or when binding variables. This may lead to thinking that NULL = NULL (SQL) would behave the same way as null == null (Java)

One of the crazier examples of misunderstanding NULL is when NULL predicates are used with row value expressions.

Another, subtle problem appears when misunderstanding the meaning of NULL in NOT IN anti-joins.

The Cure:

Train yourself. There’s nothing but explicitly thinking about NULL, every time you write SQL:

  • Is this predicate correct with respect to NULL?
  • Does NULL affect the result of this function?

2. Processing data in Java memory

Few Java developers know SQL very well. The occasional JOIN, the odd UNION, fine. But window functions? Grouping sets? A lot of Java developers load SQL data into memory, transform the data into some appropriate collection type, execute nasty maths on that collection with verbose loop structures (at least, before Java 8’s Collection improvements).

But some SQL databases support advanced (and SQL standard!) OLAP features that tend to perform a lot better and are much easier to write. A (non-standard) example is Oracle’s awesome MODEL clause. Just let the database do the processing and fetch only the results into Java memory. Because after all some very smart guys have optimised these expensive products. So in fact, by moving OLAP to the database, you gain two things:

  • Simplicity. It’s probably easier to write correctly in SQL than in Java
  • Performance. The database will probably be faster than your algorithm. And more importantly, you don’t have to transmit millions of records over the wire.

The Cure:

Every time you implement a data-centric algorithm in Java, ask yourself: Is there a way to let the database perform that work for me?

3. Using UNION instead of UNION ALL

It’s a shame that UNION ALL needs an extra keyword compared to UNION. It would be much better if the SQL standard had been defined to support:

  • UNION (allowing duplicates)
  • UNION DISTINCT (removing duplicates)

Not only is the removal of duplicates rarely needed (or sometimes even wrong), it is also quite slow for large result sets with many columns, as the two subselects need to be ordered, and each tuple needs to be compared with its subsequent tuple.

Note that even if the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any database implements these less useful set operations.

The Cure:

Every time you write a UNION, think if you actually wanted to write UNION ALL.

4. Using JDBC Pagination to paginate large results

Most databases support some way of paginating ordered results through LIMIT .. OFFSET, TOP .. START AT, OFFSET .. FETCH clauses. In the absence of support for these clauses, there is still the possibility for ROWNUM (Oracle) or ROW_NUMBER() OVER() filtering (DB2, SQL Server 2008 and less), which is much faster than pagination in memory. This is specifically true for large offsets!

The Cure:

Just use those clauses, or a tool (such as jOOQ) that can simulate those clauses for you.

5. Joining data in Java memory

From early days of SQL, some developers still have an uneasy feeling when expressing JOINs in their SQL. There is an inherent fear of JOIN being slow. This can be true if a cost-based optimiser chooses to perform a nested loop, possibly loading complete tables into database memory, before creating a joined table source. But that happens rarely. With appropriate predicates, constraints and indexes, MERGE JOIN and HASH JOIN operations are extremely fast. It’s all about the correct metadata (I cannot cite Tom Kyte often enough for this). Nonetheless, there are probably still quite a few Java developers who will load two tables from separate queries into maps and join them in Java memory in one way or another.

The Cure:

If you’re selecting from various tables in various steps, think again to see if you cannot express your query in a single statement.

6. Using DISTINCT or UNION to remove duplicates from an accidental cartesian product

With heavy joining, one can lose track of all the relations that are playing a role in a SQL statement. Specifically, if multi-column foreign key relationships are involved, it is possible to forget to add the relevant predicates in JOIN .. ON clauses. This might result in duplicate records, but maybe only in exceptional cases. Some developers may then choose to use DISTINCT to remove those duplicates again. This is wrong in three ways:

  • It (may) solve the symptoms but not the problem. It may as well not solve the symptoms in edge-cases.
  • It is slow for large result sets with many columns. DISTINCT performs an ORDER BY operation to remove duplicates.
  • It is slow for large cartesian products, which will still load lots of data into memory

The Cure:

As a rule of thumb, when you get unwanted duplicates, always review your JOIN predicates. There’s probably a subtle cartesian product in there somewhere.

7. Not using the MERGE statement

This isn’t really a mistake, but probably some lack of knowledge or some fear towards the powerful MERGE statement. Some databases know other forms of UPSERT statements, e.g. MySQL’s ON DUPLICATE KEY UPDATE clause. But MERGE is really so powerful, most importantly in databases that heavily extend the SQL standard, such as SQL Server.

The Cure:

If you’re UPSERTING by chaining INSERT and UPDATE or by chaining SELECT .. FOR UPDATE and then INSERT or UPDATE, think again. Apart from risking race conditions, you might be able to express a simpler MERGE statement.

8. Using aggregate functions instead of window functions

Before the introduction of window functions, the only means to aggregate data in SQL was by using a GROUP BY clause along with aggregate functions in the projection. This works well in many cases, and if aggregation data needed to be enriched with regular data, the grouped query can be pushed down into a joined subquery.

But SQL:2003 defined window functions, which are implemented by many popular database vendors. Window functions can aggregate data on result sets that are not grouped. In fact, each window function supports its own, independent PARTITION BY clause, which is an awesome tool for reporting.

Using window functions will:

  • Lead to more readable SQL (less dedicated GROUP BY clauses in subqueries)
  • Improve performance, as a RDBMS is likely to optimise window functions more easily

The Cure:

When you write a GROUP BY clause in a subquery, think again if this cannot be done with a window function.

9. Using in-memory sorting for sort indirections

The SQL ORDER BY clause supports many types of expressions, including CASE statements, which can be very useful for sort indirections. You should probably never sort data in Java memory because you think that

  • SQL sorting is too slow
  • SQL sorting cannot do it

The Cure:

If you sort any SQL data in memory, think again if you cannot push sorting into your database. This goes along well with pushing pagination into the database.

10. Inserting lots of records one by one

JDBC knows batching, and you should use it. Do not INSERT thousands of records one by one, re-creating a new PreparedStatement every time. If all of your records go to the same table, create a batch INSERT statement with a single SQL statement and multiple bind value sets. Depending on your database and database configuration, you may need to commit after a certain amount of inserted records, in order to keep the UNDO log slim.

The Cure:

Always batch-insert large sets of data.

Some interesting books

Some very interesting books on similar topics are

jOOQ is the best way to write SQL in Java

Liked this article?

You might also like these follow-up articles

10 Things in SQL Server Which Don’t Work as Expected

So far, I have been blogging about curious RDBMS caveats mostly related to Oracle and MySQL databases. Some examples:

But there are also other databases, which have 1-2 things that do not work as expected. For example, SQL Server. Here’s an interesting blog post, showing 10 things in SQL Server which don’t work as expected:
http://tech.pro/tutorial/1419/10-things-in-sql-server-which-don-t-work-as-expected