CUBRID: A Lesser-Known Korean OSS Database Gem

While RedHat and Google have been dumping MySQL for MariaDB, there’s actually a third, much lesser-known option for MySQL-oriented database folks in the RDBMS market: CUBRID.

One of CUBRID’s main goals is also to lure MySQL users away from Oracle, by offering many syntax elements equivalent to those available in the MySQL or Oracle databases. But this gem from the Far East, backed by the Korean Naver Corp., offers a lot more. It is one of the few Open Source RDBMSs that

  • is also an ORDBMS (only PostgreSQL can compete with this)
  • implements the SQL standard MERGE statement (HSQLDB and Firebird do as well, Derby is currently implementing it)
  • implements a wide variety of window functions (only PostgreSQL competes, again)
  • implements a wide variety of MySQL’s proprietary SQL extensions
  • implements Oracle’s awesome CONNECT BY syntax (no other database does that; see the sketch after this list)
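For those unfamiliar with it, here is a minimal sketch of what Oracle-style CONNECT BY looks like, using a hypothetical employees table (not taken from the CUBRID documentation):

-- Walk an organisation hierarchy from the top-level manager downwards
SELECT employee_id, manager_id, LEVEL
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;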

Popularity-wise, there’s surely much catching up to do compared with MySQL and MariaDB:

Reproduced with permission by DB-Engines.com

But the next time you’re looking at a replacement for MySQL, why not also consider CUBRID?

jOOQ Newsletter October 28, 2013

Subscribe to the newsletter here.

jOOQ 3.2 Licensing Update

Three weeks ago, we announced and released jOOQ 3.2 under the terms of a new dual-licensing model, offering jOOQ’s support for commercial databases under a commercial jOOQ license.

The jOOQ community has reacted intensely on the user group, and most of the feedback has been very positive. The licensing change has been received as a good sign for jOOQ’s long-term future, even if some aspects of it were rightfully challenged.

One important such aspect is the jOOQ license 1.0 (Oct 1) phrasing of the term “distribution”. This has been rectified in the new jOOQ license 1.1 (Oct 28), which explicitly grants our licensees the right to distribute and embed jOOQ with their end-user applications (see §6.2). This was a misunderstanding on our part and was never intended as a limitation for our customers. As this updated license grants new rights to our customers, it takes effect immediately for existing jOOQ 3.2 customers as well.

Do also note that we have published an FAQ for licensing questions you may have.

jOOQ 3.3 Outlook

The upcoming release 3.3 (scheduled for early 2014) will include a number of exciting new features, among which:

  • Support for MS Access, available in the jOOQ Professional Edition
  • Support for keyset paging (see SQL zone, below)
  • SQL transformation SPI example implementations showing how to perform row-level security and shared-schema multi-tenancy with jOOQ 3.2+
  • A new Open Source contribution platform, allowing the community to easily share SPI implementations and other sorts of plugins not maintained by Data Geekery

And much more. See the GitHub 3.3.0 milestone for an overview of feature candidates, or participate in the user group to discuss your ideas!

Upcoming Events

Some of you may have noticed with regret that the MEDIT Symposium in beautiful Sicily has been cancelled. It would have been a great place to learn about the latest activities in the Open Source world. But we are more than happy to announce that these events are confirmed. Meet Lukas and jOOQ and lots of other good stuff at:

Stay informed about 2014 events on www.jooq.org/news.

SQL Zone – keyset paging

In recent newsletters, we’ve been talking about NoSQL. But let’s get back to SQL and how to stay in control of your SQL. As mentioned before, jOOQ 3.3 will provide native support for keyset paging, also referred to by some as the “seek method”. Our recent blog post explains how this works, and why it allows you to perform constant-time paging even in very large data sets. This is particularly useful when implementing Twitter- or Facebook-like lazy loading of more and more data!

SQL Zone – useful SQL tricks

We also would like to advertise two SQL tricks that come in handy every once in a while:

SQL Zone – MySQL parser / jOOQ code generator

Our October 10 newsletter announced a new cooperation between GUDU Soft and Data Geekery: SQL2jOOQ. Development of the MySQL-to-jOOQ code transformation has progressed quite a bit. This promising tool will soon help you migrate large legacy applications with tons of hand-written, string-based SQL to jOOQ.
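To give an idea of the kind of transformation such a tool performs, here is a rough before/after sketch. The table, column, and variable names are made up for illustration, and the “after” part is only an approximation of what SQL2jOOQ might generate, not its actual output:

// Assuming a JDBC Connection "connection", a jOOQ Configuration
// "configuration", and the generated PLAYERS table.

// Before: hand-written, string-based JDBC SQL
try (PreparedStatement stmt = connection.prepareStatement(
        "SELECT first_name, last_name FROM players WHERE game_id = ?")) {
    stmt.setInt(1, 42);
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next())
            System.out.println(rs.getString(1) + " " + rs.getString(2));
    }
}

// After: the equivalent, typesafe jOOQ query
for (Record2<String, String> player : DSL.using(configuration)
        .select(PLAYERS.FIRST_NAME, PLAYERS.LAST_NAME)
        .from(PLAYERS)
        .where(PLAYERS.GAME_ID.eq(42))
        .fetch()) {
    System.out.println(player.value1() + " " + player.value2());
}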

Top 5 Useful Hidden Eclipse Features

Eclipse is a beast. A device whose mystery is only exceeded by its power. Some would call it a continuum transfunctioner. Others would call it a transmogrifier. Yes, it is so huge, it takes years to master. And then, your manager shows up and tells you: We’re using NetBeans now.

Just kidding. No one uses NetBeans, except for Adam Bien. So let’s have a look at 5 very useful, but hidden Eclipse features.

1: My favourite feature: Favourites

Everyone uses libraries with static methods. Since Java 5, we can static-import them, so we no longer have to write things like

SomeVeryImportantUtility.split(string1, string2);

But who is going to static-import all those methods manually in every class referencing them? No one. Because you can define your favourite types and members in the preferences:

Preferences > Favorites

And then, just start typing and look for auto-completion:

Auto-completion

The auto-completion will also generate the necessary static import. Very useful when using DSLs with many functions, for instance. Obviously, you will want to decide carefully which libraries are really your favourites and, within those libraries, which types. As you’re on the jOOQ blog (or a syndication thereof), let me give you a hint: always favourite org.jooq.impl.DSL.
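As a small illustration (a sketch; the column names are hypothetical), once the favourites-aware auto-completion has generated the static import of org.jooq.impl.DSL, you can reference its functions without the class prefix:

import static org.jooq.impl.DSL.*;

import org.jooq.Field;

public class FavouritesExample {
    public void fields() {
        // No more DSL.concat(...), DSL.field(...), DSL.val(...), DSL.count(...)
        Field<String> fullName = concat(field("first_name"), val(" "), field("last_name"));
        Field<Integer> numberOfPlayers = count();
    }
}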

2: The awesome block selection mode

I’ve recently blogged about it here. This is so awesome, it merits being mentioned one more time.

Block selection

Notice the highlighted button, the sixth from the left. That’s the awesome “Block Selection Mode” (Alt-Shift-A on Windows, or Ctrl+3 and then typing block selection). It lets you write opening quotes on each selected line at the same position. This is so useful when you have to edit large amounts of almost identical lines.

3: The EGit staging view

Apparently, not every EGit user is aware of this view. As a matter of fact, to me, this view is the most important reason why I’m NOT using the shell commands. Check out this beauty:

Staging view

OK, not really a beauty. It reminds me of this comic strip by Eric Burke. But we’re in transmogrifier land, and the staging view is filled with changes waiting to be added / committed / pushed once you synchronize a repository with its origin. I can now decide on a line-by-line basis which changes I want to add to the index (note that I haven’t added the main method). This results in staged (added) changes and unstaged (not yet added) changes. As always in EGit, I can now either directly commit, commit+push, amend+commit, or amend+commit+push in one go. Try doing that on the command line.

After this first commit, I can again stage/add the main method in a separate commit. I guess, behind the scenes, this might be implemented using stashing or some other sort of local cache. I don’t care, this is beautiful!

I could probably do much more with this beauty, but that would fill an entire book. (I’m waiting for such a book, @niborst, if you’re reading this.)

If you didn’t understand any of the above Git talk, try this popular guide, or join me with…

4: Type filters

Yes, because Eclipse’s out-of-the-box autocompletion is nonsense. Yes, it is! No one really wants to call hashCode() or wait() or notifyAll() on an API. Ever. Actually, nowadays, hardly anyone even wants to call wait() or notify() even on a lock object, except if they’re writing the JDK’s concurrency libraries. But luckily someone else is doing that for free, and they’re certainly using vi or emacs or a hex editor, so they don’t mind Object methods.

So then, why is my autocompletion popup filled with this nonsense? Let’s create a class…

So many options? Really?

Wow. I thought I had only one method? I don’t mind equals(), although the few times I actually type equals() I can type it out. So let’s remove that stuff:

Preferences > Type filters

Thanks, Eclipse, for mentioning awt, too. I wish there were an option to remove awt from the JDK and from our collective memory entirely. But at least, adding awt to the type filters stops you from having to choose between importing java.util.List (99.9%) and java.awt.List (8.3%). The rounding error is due to the number of times you had previously chosen the wrong List accidentally and wondered why you couldn’t assign an ArrayList. Thanks again, awt. Do also note how my chameleon Windows 8 has changed window colours while taking screenshots. Tiles, what’s next? Anyway, back to Eclipse, let’s try auto-completing again…

There can be only one

Better? Better!

Now…

5: Formatter tags

After having spent all that time with block selection, carefully formatting our SQL… bam, in comes the intern and/or styleguide-nazi and auto-formats all your beautiful source files into huge one-liners. But not for much longer. Protect your code with easy-to-type formatter tags:

Preferences > Formatting Options > Off/On Tags

Remember to use something concise to protect your carefully crafted SQL, as you might have to type these tags once or twice:

Now protected
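Here is a minimal sketch of what such protected code might look like, assuming you kept Eclipse’s suggested default tags @formatter:off and @formatter:on (the query itself is just an illustration):

// @formatter:off
String sql =
    "SELECT first_name, last_name, score "
  + "FROM players "
  + "WHERE game_id = ? "
  + "ORDER BY score DESC";
// @formatter:on

Everything between the two tags keeps its hand-crafted alignment, while the rest of the file is still auto-formatted.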

No one touches that code again. Except the intern who forgot to apply your formatter settings, and/or the styleguide-nazi who refuses to. And the other intern, who uses NetBeans. Just kidding. No one uses NetBeans.

Again, these tags work wonderfully with DSLs, which are hard to auto-format.

More to come

Eclipse is an awesome beast. Every year, new versions are stacked with new features that we Java developers get for free! You can give back to Eclipse. While the foundation (backed by IBM) might not rely heavily on donations, your best option is to report bugs and feature requests here:

https://bugs.eclipse.org/bugs

… or if you’re brave, to sign the CLA and start contributing.

There is always room for improvement to this awesome developer device, adding more material and candidates for my next top 5 list.

Faster SQL Pagination with jOOQ Using the Seek Method

Last week, I blogged about why it is important to stay in control of your SQL, as writing good SQL helps keep your operations costs down. This is true in many ways, and today, we’re going to look into another way to write good, high-performing SQL: using the “Seek Method”.

Slow OFFSET

In order to understand the Seek Method, let’s first understand what problem it solves: SQL OFFSET clauses are slow. They’re slow for a simple reason. In order to reach a high offset from a result set, all previous records have to be skipped and counted. While a query with no OFFSET can be very fast (using MySQL syntax):

SELECT first_name, last_name, score
FROM players
WHERE game_id = 42
ORDER BY score DESC
LIMIT 10;

Skipping to page number 10’000 will be much slower:

SELECT first_name, last_name, score
FROM players
WHERE game_id = 42
ORDER BY score DESC
LIMIT 10
OFFSET 100000;

Even if the tuple (game_id, score) is indexed, we’ll have to actually traverse the whole index in order to count how many records we’ve already skipped. While this problem can be somewhat lessened by a trick, joining players to a derived table (sketched below), there is an alternative, much faster approach to tackling paging: the Seek Method.
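For completeness, here is a rough sketch of that derived-table trick, assuming player_id is the primary key of players. It only reduces the cost of the skip (the counting happens on a narrow index scan), it does not remove it:

SELECT p.first_name, p.last_name, p.score
FROM players p
JOIN (
  -- The skipping happens on a narrow index scan only
  SELECT player_id
  FROM players
  WHERE game_id = 42
  ORDER BY score DESC
  LIMIT 10
  OFFSET 100000
) AS ids ON p.player_id = ids.player_id
ORDER BY p.score DESC;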

The Seek Method

While it is not quite clear who originally invented the Seek Method (some also call it “keyset paging”), a very prominent advocate for it is Markus Winand. He describes the Seek Method on his blog (and in his book):

http://use-the-index-luke.com/sql/partial-results/fetch-next-page

Essentially, the Seek Method does not skip a fixed number of records specified by an OFFSET; instead, it skips everything up to and including the last record previously fetched. Think about paging on Google. From a usability point of view, you hardly ever skip exactly 100’000 records. You mostly want to skip to the next page and then again, to the next page, i.e. just past the last record / search result previously fetched. Take the following top 10 players (fake names generated with a name generator):

first_name | last_name | score
------------------------------
Mary       | Paige     |  1098
Tracey     | Howard    |  1087
Jasmine    | Butler    |  1053
Zoe        | Piper     |  1002
Leonard    | Peters    |   983
Jonathan   | Hart      |   978
Adam       | Morrison  |   976
Amanda     | Gibson    |   967
Alison     | Wright    |   958
Jack       | Harris    |   949

The above are the first 10 players ordered by score. This can be achieved quite quickly using LIMIT 10 only. Now, when skipping to the next page, you can either just use an OFFSET 10 clause, or you skip all players with a score of 949 or higher:

SELECT first_name, last_name, score
FROM players
WHERE game_id = 42
-- Let's call this the "seek predicate"
AND score < 949
ORDER BY score DESC
LIMIT 10;

This will then give you the players on the next page:

first_name | last_name | score
------------------------------
William    | Fraser    |   947
Claire     | King      |   945
Jessica    | McDonald  |   932
...        | ...       |   ...

Note that the previous query assumes that the score is unique within the players table, which is unlikely, of course. If William Fraser also had 949 points, just like Jack Harris, the last player on the first page, he would be “lost between pages”. It is thus important to create an unambiguous ORDER BY clause and “seek predicate” by adding an additional unique column:

SELECT player_id, first_name, last_name, score
FROM players
WHERE game_id = 42
-- assuming 15 is Jack Harris's player_id
AND (score, player_id) < (949, 15)
ORDER BY score DESC, player_id DESC
LIMIT 10;

Now, the “seek predicate” depends on the ORDER BY clause. Here are a couple of possible, alternative configurations:

-- "consistent" ASC and DESC correspond to > and <
AND (score, player_id) > (949, 15)
ORDER BY score ASC, player_id ASC

-- "mixed" ASC and DESC complicate things a bit
AND ((score < 949)
  OR (score = 949 AND player_id > 15))
ORDER BY score DESC, player_id ASC

-- The above might be further performance-tweaked
AND (score <= 949)
AND ((score < 949)
  OR (score = 949 AND player_id > 15))
ORDER BY score DESC, player_id ASC

If columns in the ORDER BY clause are nullable, NULLS FIRST and NULLS LAST might apply and further complicate the “seek predicate”.
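For instance, here is a sketch of such a predicate, assuming score is nullable, the previous page ended with (949, 15), and a PostgreSQL / Oracle-style NULLS LAST clause:

-- With a nullable score, NULL rows sort after all non-NULL scores
AND (score IS NULL
  OR score < 949
  OR (score = 949 AND player_id > 15))
ORDER BY score DESC NULLS LAST, player_id ASC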

How is this better than OFFSET?

The Seek Method allows for avoiding expensive “skip-and-count” operations, replacing them with a simple range scan on an index that might cover the “seek predicate”. Since you’re applying ORDER BY on the columns of the “seek predicate” anyway, you might have already chosen to index them appropriately.
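As an example, an index like the following (a sketch; the exact definition and column order are a judgement call and depend on your database) would allow the seek queries from above to be answered with a range scan:

CREATE INDEX players_game_score_idx
ON players (game_id, score, player_id);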

While the Seek Method doesn’t improve queries for low page numbers, fetching higher page numbers is significantly faster as proven in this nice benchmark:

Reproduced from use-the-index-luke.com with permission by Markus Winand

More interesting feedback on the subject can be found in this reddit.com thread, in which even Tom Kyte himself added a couple of remarks.

A side effect of the Seek Method

A side effect of the Seek Method is that the paging is more “stable”. When you’re about to display page 2 and a new player has reached page 1 in the meantime, or if any player is removed entirely, you will still display the same players on page 2. In other words, when using the Seek Method, there is no guarantee that the first player on page 2 has rank 11.

This may or may not be desired. It might be irrelevant on page 10’000, though.

jOOQ 3.3 support for the Seek Method

The upcoming jOOQ 3.3 (due for late 2013) will include support for the Seek Method on a SQL DSL API level. In addition to jOOQ’s existing LIMIT .. OFFSET support, a “seek predicate” can then be specified through the synthetic SEEK clause (similar to jOOQ’s synthetic DIVIDE BY clause):

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // (!)
   .limit(10)
   .fetch();

Instead of explicitly phrasing the “seek predicate”, just pass the last record from the previous query, and jOOQ will make sure that all records up to and including this record are skipped, given the ORDER BY clause.
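For the example above, the rendered SQL would look roughly like this (a sketch; the exact rendering of the “seek predicate” depends on the SQL dialect):

SELECT player_id, first_name, last_name, score
FROM players
WHERE game_id = 42
AND (score < 949
  OR (score = 949 AND player_id > 15))
ORDER BY score DESC, player_id ASC
LIMIT 10;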

The jOOQ version appears much more readable than the actual SQL rendered, because the “seek predicate” is closer to the ORDER BY clause, where it belongs. Also, jOOQ’s usual row value typesafety is applied here, helping you get the degree / arity and data types of your SEEK clause right. In the above example, the following method calls would not compile in Java:

// Not enough arguments in seek()
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949)

// Wrong argument types in seek()
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, "abc")

Get to work with the Seek Method

With native API support for a SEEK clause, you can get back in control of your SQL and implement high-performing SQL quite easily. Early adopters can already play around with the current state of jOOQ’s 3.3.0 Open Source Edition, which is available on GitHub.

And even if you don’t use jOOQ, give the Seek Method a try. You may just have a much faster application afterwards!

Why Staying in Control of Your SQL is so Important

Lots of blog posts and research papers are written about the topics of scaling up and scaling out. This interesting blog post, for instance, sheds some light on the two strategies with respect to physical maintenance costs, such as cooling and electricity consumption. Certainly non-negligible aspects for very large systems.

But before solving problems at a very big scale, consider much simpler SQL tuning mechanisms. Very often, when you have a bottleneck in your application, it is at the database layer. This fact is used by many NoSQL evangelists to promote their products, claiming that scaling out is much easier with NoSQL databases. This might be true, but ask yourself: Do you need a system that works under heavy load? Or is your bottleneck a performance bottleneck?

In other words: Do you have a 5’000’000-concurrent-users problem? Or do you have a request-takes-more-than-3-seconds problem? Because if you suffer from the latter, you probably need to scale neither out nor up. Your “traditional” architecture is probably quite fine, but your database / SQL queries aren’t. The popular use-the-index-luke.com website features an interesting article about the two top performance problems caused by ORM tools. They are:

  • The infamous N+1 selects problem
  • The hardly-known Index-Only Scan

Both problems result from the fact that popular ORMs generate SQL code in a way that can hardly be influenced by developers. People often claim that tools like Hibernate generate better SQL than the average developer. This is true, but only to the extent that the average developer might never care to actually learn to write better SQL. Which in turn leads to the above problems.
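To illustrate the first of these problems, here is a sketch of the kind of SQL an ORM might emit when lazily loading a to-many association (the author / book schema is made up for illustration):

-- 1 query to load the parent rows
SELECT id, name FROM author;

-- ... followed by N more queries, one per author, to load their books
SELECT id, title FROM book WHERE author_id = 1;
SELECT id, title FROM book WHERE author_id = 2;
SELECT id, title FROM book WHERE author_id = 3;
-- ... and so on, instead of a single JOIN or IN (...) query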

Hibernate is very good at generating the 70% of your application’s boring CRUD SQL. At the same time, Hibernate never claimed to be a replacement for SQL, as you will have to resort to native SQL in the remaining 30% of the cases. Should you then use Hibernate’s native SQL API? Or an alternative like jOOQ?

The important thing is to get back in control of your SQL when performance matters. Know your indexes. Know your database meta data. And use a tool that allows you to write precisely the SQL statement you want. Learning better SQL will help you save lots of money on operations costs as you might not need to scale out nor up.

Recursive SQL for Data Normalisation

Recursive SQL can be awesome, although a bit hard to read in its SQL standard beauty. Let’s assume you have some aggregated data with dates and a number of events per date:

|                           DATE | COUNT |
|--------------------------------|-------|
| October, 01 2013 00:00:00+0000 |     2 |
| October, 02 2013 00:00:00+0000 |     1 |
| October, 03 2013 00:00:00+0000 |     3 |
| October, 04 2013 00:00:00+0000 |     4 |
| October, 05 2013 00:00:00+0000 |     2 |
| October, 06 2013 00:00:00+0000 |     0 |
| October, 07 2013 00:00:00+0000 |     2 |

Now let’s assume you want to normalise or “unaggregate” this data, generating “COUNT” records per date. The desired output is this:

|                           DATE | EVENT_NUMBER |
|--------------------------------|--------------|
| October, 01 2013 00:00:00+0000 |            1 |
| October, 01 2013 00:00:00+0000 |            2 |
| October, 02 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            2 |
| October, 03 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            1 |
| October, 04 2013 00:00:00+0000 |            2 |
| October, 04 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            4 |
| October, 05 2013 00:00:00+0000 |            1 |
| October, 05 2013 00:00:00+0000 |            2 |
| October, 07 2013 00:00:00+0000 |            1 |
| October, 07 2013 00:00:00+0000 |            2 |

As you may have noticed, there are no records for those dates with zero events (October 06). With recursive SQL, this is rather simple to achieve.

with recursive

-- Data could also be a regular table containing
-- the actual data
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
),

-- This is the recursive common table expression
-- It starts with all data where count > 0
-- ... and then recurses by subtracting one
recurse(date, count) as (
  select date, count
  from data
  where count > 0
  union all
  select date, count - 1
  from recurse
  where count > 1
)
select date, count event_number from recurse
order by date asc, event_number asc;

See also this SQLFiddle to see the above CTE in action.

Incredibly, Oracle’s CONNECT BY clause doesn’t seem to be an option here. I challenge you to find a better solution, though! For instance, this beautiful solution that works with PostgreSQL:

with recursive
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
)
select date, generate_series(1, count) event_number
from data
where count > 0
order by date asc, event_number asc;

Who Says “I Want to Contribute” ?

In recent posts, I’ve been a bit critical of Open Source. But doing Open Source is a lot of fun. The publicity of feedback is thrilling and encouraging. And then, there’s the “I want to contribute” phenomenon. Those guys who are so thrilled with your wicked product that will change the world that they want to help. They want to participate. They want to contribute. They want instructions on how to do it. And then… the big void. You’ll never hear from them again. Even if you ask them a couple of times, through private messages, etc. Even if you show them a couple of easy tickets that they could resolve first. Blank.

Who are they? Where did they come from? Where do they go? Are they real? Or robots?

Whoever they are, I find them amusing and refreshing. But now, back to work. Someone has to actually write the Open Source code.