How to use SQL PIVOT to Compare Two Tables in Your Database


This can happen ever so easily. You adapt a table by adding a new column:

ALTER TABLE payments ADD code NUMBER(3);

You go on, implementing your business logic – absolutely no problem. But then, later on (perhaps in production), some batch job fails because it makes some strong assumptions about data types. Namely, it assumes that the two tables payments and payments_archive are of the same row type:

CREATE TABLE payments
  (
    id         NUMBER(18) NOT NULL,
    account_id NUMBER(18) NOT NULL,
    value_date DATE,
    amount     NUMBER(25, 2) NOT NULL
  );

CREATE TABLE payments_archive
  (
    id         NUMBER(18) NOT NULL,
    account_id NUMBER(18) NOT NULL,
    value_date DATE,
    amount     NUMBER(25, 2) NOT NULL
  );

Because the two tables are of the same row type, you can simply move rows from one to the other, e.g. using a query like this one:

INSERT INTO payments_archive
SELECT * FROM payments
WHERE value_date < SYSDATE - 30;

(Not that using SELECT * in such an INSERT statement is a good idea in general – it’s actually a bad idea – but you get the point.)
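A more robust variant lists the columns explicitly on both sides, so that a newly added column cannot silently break the statement. A minimal sketch, based on the table definitions above:

INSERT INTO payments_archive (id, account_id, value_date, amount)
SELECT id, account_id, value_date, amount
FROM payments
WHERE value_date < SYSDATE - 30;

With SELECT *, however, the newly added CODE column sneaks into the select list.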

What you’re getting now is this:

ORA-00913: too many values

The fix is obvious. But the poor soul who has to apply it is probably not you – it’s someone else, who first has to figure out, among possibly hundreds of columns, which ones don’t match. Here’s how (in Oracle):

Use PIVOT to compare two tables!

You could of course not use PIVOT at all and simply select all columns of either table from the dictionary views:

SELECT 
  table_name,
  column_name
FROM all_tab_cols
WHERE table_name LIKE 'PAYMENTS%';

This will produce the following result:

TABLE_NAME         COLUMN_NAME                  
------------------ ---------------
PAYMENTS           ID                             
PAYMENTS           ACCOUNT_ID                     
PAYMENTS           VALUE_DATE                     
PAYMENTS           AMOUNT                         
PAYMENTS           CODE                           
PAYMENTS_ARCHIVE   ID                             
PAYMENTS_ARCHIVE   ACCOUNT_ID                     
PAYMENTS_ARCHIVE   VALUE_DATE                     
PAYMENTS_ARCHIVE   AMOUNT    

Not very readable. You could of course use set operations instead, applying INTERSECT and MINUS (EXCEPT, in the SQL standard) to filter out the matching values.
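A minimal sketch of that approach (using Oracle’s MINUS):

-- Columns present in PAYMENTS, but missing from PAYMENTS_ARCHIVE.
-- Run the same query the other way around for the full picture.
SELECT column_name
FROM all_tab_cols
WHERE table_name = 'PAYMENTS'
MINUS
SELECT column_name
FROM all_tab_cols
WHERE table_name = 'PAYMENTS_ARCHIVE';

But much better: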

SELECT *
FROM (
  SELECT 
    table_name,
    column_name
  FROM all_tab_cols
  WHERE table_name LIKE 'PAYMENTS%'
) 
PIVOT ( 
  COUNT(*) AS cnt
  FOR (table_name) 
  IN (
    'PAYMENTS' AS payments, 
    'PAYMENTS_ARCHIVE' AS payments_archive 
  ) 
) t;

And the above now produces:

COLUMN_NAME  PAYMENTS_CNT PAYMENTS_ARCHIVE_CNT
------------ ------------ --------------------
CODE                    1                    0 
ACCOUNT_ID              1                    1 
ID                      1                    1 
VALUE_DATE              1                    1 
AMOUNT                  1                    1 

It is now very easy to identify the column that is missing from the PAYMENTS_ARCHIVE table. As you can see, the original query produced one row per column AND per table. We took that result and pivoted it “FOR” the table name, such that we now get only one row per column.

How to read PIVOT?

It’s easy. Comments are inline:

SELECT *

-- This is the table that we're pivoting. Note that
-- we select only the minimum to prevent side-effects
FROM (
  SELECT 
    table_name,
    column_name
  FROM all_tab_cols
  WHERE table_name LIKE 'PAYMENTS%'
) 

-- PIVOT is a keyword that is applied to the above
-- table. It generates a new table, similar to JOIN
PIVOT (

  -- This is the aggregated value that we want to
  -- produce for each pivoted value
  COUNT(*) AS cnt

  -- This is the source of the values that we want to
  -- pivot
  FOR (table_name) 

  -- These are the values that we accept as pivot
  -- columns. The column names are produced from
  -- these values, concatenated with the corresponding
  -- aggregate alias
  IN (
    'PAYMENTS' AS payments, 
    'PAYMENTS_ARCHIVE' AS payments_archive 
  ) 
) t;

That’s it. Not so hard, was it?

The nice thing about this syntax is that we can generate as many additional columns as we want, very easily:

SELECT *
FROM (
  SELECT 
    table_name,
    column_name,
    CAST(data_type AS varchar(9)) data_type
  FROM all_tab_cols
  WHERE table_name LIKE 'PAYMENTS%'
) 
PIVOT ( 
  COUNT(*) AS cnt,
  MAX(data_type) AS type -- new function here
  FOR (table_name) 
  IN (
    'PAYMENTS' AS p, 
    'PAYMENTS_ARCHIVE' AS a
  ) 
) t;
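For the record, the “additional erroneous DDL” could have been something like this – a hypothetical statement that is merely consistent with the output below:

-- hypothetical: the archive's VALUE_DATE was given a different type
ALTER TABLE payments_archive MODIFY value_date TIMESTAMP;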

… producing (after additional erroneous DDL) …

COLUMN_NAME      P_CNT P_TYPE      A_CNT A_TYPE
----------- ---------- ------ ---------- ------
CODE                 1 NUMBER          0
ACCOUNT_ID           1 NUMBER          1 NUMBER 
ID                   1 NUMBER          1 NUMBER 
VALUE_DATE           1 DATE            1 TIMESTAMP
AMOUNT               1 NUMBER          1 NUMBER

This way, we can discover even more discrepancies between the two row types. In the above example, we’ve used MAX() because we have to provide an aggregation function, even if each pivoted column corresponds to exactly one row in our example – but that doesn’t have to be the case.

What if I’m not using Oracle?

SQL Server also supports PIVOT, but other databases don’t. You can always emulate PIVOT using GROUP BY and CASE. The following statement is equivalent to the previous one:

SELECT
  t.column_name,
  COUNT(CASE table_name 
        WHEN 'PAYMENTS' THEN 1 END) p_cnt,
  MAX  (CASE table_name 
        WHEN 'PAYMENTS' THEN data_type END) p_type,
  COUNT(CASE table_name 
        WHEN 'PAYMENTS_ARCHIVE' THEN 1 END) a_cnt,
  MAX  (CASE table_name 
        WHEN 'PAYMENTS_ARCHIVE' THEN data_type END) a_type
FROM (
  SELECT 
    table_name,
    column_name,
    data_type
  FROM all_tab_cols
  WHERE table_name LIKE 'PAYMENTS%'
) t
GROUP BY
  t.column_name;

This query will now produce the same result on all the other databases as well.

Isn’t that… ?

Yes, it is! The above usage of aggregate functions in combination with CASE can be “shortened” (as in: fewer keywords) even more, using the SQL standard FILTER clause, which we’ve blogged about recently.

So, in PostgreSQL, you could write the following query:

SELECT
  t.column_name,
  COUNT(table_name) 
    FILTER (WHERE table_name = 'PAYMENTS') p_cnt,
  MAX  (data_type) 
    FILTER (WHERE table_name = 'PAYMENTS') p_type,
  COUNT(table_name) 
    FILTER (WHERE table_name = 'PAYMENTS_ARCHIVE') a_cnt,
  MAX  (data_type) 
    FILTER (WHERE table_name = 'PAYMENTS_ARCHIVE') a_type
FROM (
  SELECT 
    table_name,
    column_name,
    data_type
  FROM information_schema.columns
  WHERE table_name LIKE 'payments%'
) t
GROUP BY
  t.column_name;

Further reading

Excited? Yes. There are more awesome SQL features in various databases. Read on about:

How to Extract a Date Part in SQL


The Modern SQL Twitter account (by Markus Winand) published a hint about how to extract a date part in SQL.
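The hint revolves around the SQL standard EXTRACT() function, which looks like this:

EXTRACT(YEAR FROM CURRENT_DATE)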

Is it true? Yes it is, in the SQL standard and in a variety of standards-compliant databases. But let’s check what jOOQ does when you run the following program on all 18 currently supported RDBMS:

import static org.jooq.impl.DSL.currentDate;
import static org.jooq.impl.DSL.extract;
import static org.jooq.impl.DSL.using;

import java.util.stream.Stream;

import org.jooq.DatePart;
import org.jooq.SQLDialect;

public class Extract {
    public static void main(String[] args) {
        // Get all distinct SQLDialect families
        Stream
        .of(SQLDialect.values())
        .map(SQLDialect::family)
        .distinct()
        .forEach(family -> {
            System.out.println();
            System.out.println(family);

            // Get all supported date parts
            Stream
            .of(DatePart.values())

            // For each family / part, get the
            // EXTRACT() function
            .map(part -> extract(currentDate(), part))
            .forEach(expr -> {
                System.out.println(
                    using(family).render(expr)
                );
            });
        });
    }
}

The output is:

Open Source databases

DEFAULT
extract(year from current_date())
extract(month from current_date())
extract(day from current_date())
extract(hour from current_date())
extract(minute from current_date())
extract(second from current_date())

CUBRID
extract(year from current_date())
extract(month from current_date())
extract(day from current_date())
extract(hour from current_date())
extract(minute from current_date())
extract(second from current_date())

DERBY
year(current_date)
month(current_date)
day(current_date)
hour(current_date)
minute(current_date)
second(current_date)

FIREBIRD
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from current_date)
extract(minute from current_date)
extract(second from current_date)

H2
extract(year from current_date())
extract(month from current_date())
extract(day from current_date())
extract(hour from current_date())
extract(minute from current_date())
extract(second from current_date())

HSQLDB
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from current_date)
extract(minute from current_date)
extract(second from current_date)

MARIADB
extract(year from current_date())
extract(month from current_date())
extract(day from current_date())
extract(hour from current_date())
extract(minute from current_date())
extract(second from current_date())

MYSQL
extract(year from current_date())
extract(month from current_date())
extract(day from current_date())
extract(hour from current_date())
extract(minute from current_date())
extract(second from current_date())

POSTGRES
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from current_date)
extract(minute from current_date)
extract(second from current_date)

SQLITE
strftime('%Y', current_date)
strftime('%m', current_date)
strftime('%d', current_date)
strftime('%H', current_date)
strftime('%M', current_date)
strftime('%S', current_date)

Commercial databases

ACCESS
datepart('yyyy', date())
datepart('m', date())
datepart('d', date())
datepart('h', date())
datepart('n', date())
datepart('s', date())

ASE
datepart(yy, current_date())
datepart(mm, current_date())
datepart(dd, current_date())
datepart(hh, current_date())
datepart(mi, current_date())
datepart(ss, current_date())

DB2
year(current_date)
month(current_date)
day(current_date)
hour(current_date)
minute(current_date)
second(current_date)

HANA
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from current_date)
extract(minute from current_date)
extract(second from current_date)

INFORMIX
year(current year to day)
month(current year to day)
day(current year to day)
current year to day::datetime hour to hour::char(2)::int
current year to day::datetime minute to minute::char(2)::int
current year to day::datetime second to second::char(2)::int

INGRES
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from current_date)
extract(minute from current_date)
extract(second from current_date)

ORACLE (in jOOQ 3.5)
to_char(trunc(sysdate), 'YYYY')
to_char(trunc(sysdate), 'MM')
to_char(trunc(sysdate), 'DD')
to_char(trunc(sysdate), 'HH24')
to_char(trunc(sysdate), 'MI')
to_char(trunc(sysdate), 'SS')

ORACLE (in jOOQ 3.6)
extract(year from current_date)
extract(month from current_date)
extract(day from current_date)
extract(hour from cast(current_date as timestamp))
extract(minute from cast(current_date as timestamp))
extract(second from cast(current_date as timestamp))

SQLSERVER
datepart(yy, convert(date, current_timestamp))
datepart(mm, convert(date, current_timestamp))
datepart(dd, convert(date, current_timestamp))
datepart(hh, convert(date, current_timestamp))
datepart(mi, convert(date, current_timestamp))
datepart(ss, convert(date, current_timestamp))

SYBASE
datepart(yy, current date)
datepart(mm, current date)
datepart(dd, current date)
datepart(hh, current date)
datepart(mi, current date)
datepart(ss, current date)

Yes. The standard… If only it were implemented thoroughly…

jOOQ vs. Slick – Pros and Cons of Each Approach


Every framework introduces a new compromise. A compromise that is introduced because the framework makes some assumptions about how you’d like to interact with your software infrastructure.

An example of where this compromise has struck users recently is the discussion “Are Slick queries generally isomorphic to the SQL queries?“. And, of course, the answer is: No. What appears to be a simple Slick query:

val salesJoin = sales 
      join purchasers 
      join products 
      join suppliers on {
  case (((sale, purchaser), product), supplier) =>
    sale.productId === product.id &&
    sale.purchaserId === purchaser.id &&
    product.supplierId === supplier.id
}

… turns into a rather large monster with tons of derived tables that are totally unnecessary, given the original query (formatting is mine):

select x2.x3, x4.x5, x2.x6, x2.x7 
from (
    select x8.x9 as x10, 
           x8.x11 as x12, 
           x8.x13 as x14, 
           x8.x15 as x7, 
           x8.x16 as x17, 
           x8.x18 as x3, 
           x8.x19 as x20, 
           x21.x22 as x23, 
           x21.x24 as x25, 
           x21.x26 as x6 
    from (
        select x27.x28 as x9,
               x27.x29 as x11, 
               x27.x30 as x13, 
               x27.x31 as x15, 
               x32.x33 as x16, 
               x32.x34 as x18, 
               x32.x35 as x19 
        from (
            select x36."id" as x28, 
                   x36."purchaser_id" as x29, 
                   x36."product_id" as x30, 
                   x36."total" as x31 
            from "sale" x36
        ) x27 
        inner join (
            select x37."id" as x33, 
                   x37."name" as x34, 
                   x37."address" as x35 
	    from "purchaser" x37
        ) x32 
        on 1=1
    ) x8 
    inner join (
        select x38."id" as x22, 
               x38."supplier_id" as x24, 
               x38."name" as x26 
        from "product" x38
    ) x21
    on 1=1
) x2 
inner join (
    select x39."id" as x40, 
           x39."name" as x5, 
           x39."address" as x41 
    from "supplier" x39
) x4 
on ((x2.x14 = x2.x23) 
and (x2.x12 = x2.x17)) 
and (x2.x25 = x4.x40) 
where x2.x7 >= ?

Christopher Vogt, a former Slick maintainer and still an actively involved member of the Slick community, explains the above in the following words:

This means that Slick relies on your database’s query optimizer to be able to execute the sql query that Slick produced efficiently. Currently that is not always the case in MySQL

One of the main ideas behind Slick, according to Christopher, is:

Slick is not a DSL that allows you to build exactly specified SQL strings. Slick’s Scala query translation allows for re-use and composition and using Scala as the language to write your queries. It does not allow you to predict the exact sql query, only the semantics and the rough structure.

Slick vs. jOOQ

Since Christopher later on also compared Slick with jOOQ, I allowed myself to chime in and to add my two cents:

From a high level (without actual Slick experience) I’d say that Slick and jOOQ embrace compositionality equally well. I’ve seen crazy queries of several 100s of lines of [jOOQ] SQL in customer code, composed over several methods. You can do that with both APIs.

On the other hand, as Chris said: Slick has a focus on Scala collections, jOOQ on SQL tables.

  • From a conceptual perspective (= in theory), this focus shouldn’t matter.
  • From a type safety perspective, Scala collections are easier to type-check than SQL tables and queries because SQL as a language itself is rather hard to type-check given that the semantics of various of the advanced SQL clauses alter type configurations rather implicitly (e.g. outer joins, grouping sets, pivot clauses, unions, group by, etc.).
  • From a practical perspective, SQL itself is only an approximation of the original relational theories and has attained a life of its own. This may or may not matter to you.

I guess in the end it really boils down to whether you want to reason about Scala collections (queries are better integrated / more idiomatic with your client code) or about SQL tables (queries are better integrated / more idiomatic with your database).

At this point, I’d like to add another two cents to the discussion. Customers don’t buy the product that you’re selling. They never do. In the case of Hibernate, customers and users were hoping to be able to forget SQL forever. The opposite is true. As Gavin King himself (the creator of Hibernate) had told me:

[Screenshot: a quote by Gavin King]

Because customers and users had never listened to Gavin (and to other ORM creators), we now have what many call the object-relational impedance mismatch. A lot of unjustified criticism has been expressed against Hibernate and JPA, APIs which are simply too popular for the limited scope they really cover.

With Slick (or C#’s LINQ, for that matter), a similar mismatch is impeding integrations, if users abuse these tools for what they believe to be a replacement for SQL. Slick does a great job at modelling the relational model directly in the Scala language. This is wonderful if you want to reason about relations just like you reason about collections. But it is not a SQL API. To illustrate how difficult it is to overcome these limitations, you can browse the issue tracker or user group to learn about:

We’ll simply call this:

The Functional-Relational Impedance Mismatch

SQL is much more

Markus Winand (the author of the popular SQL Performance Explained) has recently published a very good presentation about “modern SQL”, an idea that we fully embrace at jOOQ.

We believe that APIs that have been trying to hide the SQL language from general purpose languages like Java, Scala, C# are missing out on a lot of the very nice features that can add tremendous value to your application. jOOQ is an API that fully embraces the SQL language, with all its awesome features (and with all its quirks). You obviously may or may not agree with that.

We’ll leave this article open-ended, hoping you’ll chime in to discuss the benefits and caveats of each approach – of staying close to Scala vs. staying close to SQL.

As a small teaser, however, I’d like to announce a follow-up article showing that there is no such thing as an object-relational impedance mismatch. You (and your ORM) are just not using SQL correctly. Stay tuned!

Thou Shalt Not Name Thy Method “Equals”


(unless you really override Object.equals(), of course).

I’ve stumbled upon a rather curious Stack Overflow question by user Frank:

Why does Java’s Area#equals method not override Object#equals?

Interestingly, there is an Area.equals(Area) method which really takes an Area argument, instead of an Object argument as declared in Object.equals(). This leads to rather nasty behaviour, as discovered by Frank:

import static org.junit.Assert.assertTrue;

@org.junit.Test
public void testEquals() {
    java.awt.geom.Area a = new java.awt.geom.Area();
    java.awt.geom.Area b = new java.awt.geom.Area();
    assertTrue(a.equals(b)); // -> true: Area.equals(Area) is called

    java.lang.Object o = b;
    assertTrue(a.equals(o)); // -> false: Object.equals(Object) is
                             //    called, and this assertion fails!
}

Technically, it is correct for AWT’s Area to have been implemented this way (given that hashCode() isn’t implemented either). But considering the way Java resolves overloaded methods, and the way programmers read code like the above, it is really a terrible idea to overload the equals method.

No static equals, either

These rules also hold true for static equals() methods, such as Apache Commons Lang‘s

ObjectUtils.equals(Object o1, Object o2)

The confusion here arises from the fact that you cannot static-import this equals method:

import static org.apache.commons.lang.ObjectUtils.equals;

When you now type the following:

equals(obj1, obj2);

You will get a compiler error:

The method equals(Object) in the type Object is not applicable for the arguments (…, …)

The reason for this is that methods that are in the scope of the current class and its super types will always shadow anything that you import this way. The following doesn’t work either:

import static org.apache.commons.lang.ObjectUtils.defaultIfNull;

public class Test {
  void test() {
    defaultIfNull(null, null);
    // ^^ compilation error here
  }

  void defaultIfNull() {
  }
}

Details in this Stack Overflow question.

Conclusion

The conclusion is simple: never overload any of the methods declared in Object (overriding is fine, of course). This includes:

  • clone()
  • equals()
  • finalize()
  • getClass()
  • hashCode()
  • notify()
  • notifyAll()
  • toString()
  • wait()

Of course, it would be great if those methods weren’t declared in Object in the first place, but that ship sailed 20 years ago.

jOOQ Tuesdays: Raoul-Gabriel Urma Explores What Java 8 and English have in Common


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


We have the pleasure of talking to Raoul-Gabriel Urma in this second edition, who will be telling us a little about what Java 8 and English have in common.

Hi Raoul – You’re about to publish a very interesting PhD regarding programming language evolution. What is it about?

Natural languages such as English and Spanish have evolved over the years. However, in general that evolution takes centuries. Consequently, society has had time to adapt.

Programming languages share a similar characteristic: they evolve. However, unlike natural languages, they evolve almost every year! This puts greater pressure on developers and existing code bases. For example, new features are introduced, idioms change, and sometimes different language versions are backwards incompatible.

As part of my dissertation, I argue that developers need machine support focused on locating sources of backward incompatibilities, and I contribute various techniques for both statically and dynamically typed languages.

I’m also co-organising a workshop on the topic as part of ECOOP if that’s of interest! http://2015.ecoop.org/track/PLE-2015-papers

Very interesting! Programming languages seem to be a favourite topic of yours. You’ve written an impressive book with great reviews on Amazon (4.7 out of 5!). What do your readers learn from it?

I’ve written a book called Java 8 in Action together with Mario Fusco and Alan Mycroft. We wanted to write a book which teaches how to benefit from the various new Java 8 features. However, we also wanted to write a book that the Java community will still be reading in five or ten years. This is why the book covers many topics on top of lambda expressions and the Streams API such as testing & debugging, refactoring, enhanced concurrency, functional programming concepts and even a bit of Scala!

And what was your biggest learning from writing the book?

It always takes more time than you think 😉

Being (blog) writers ourselves, we can certainly relate to that!

Java 8 with lambdas and streams is inevitable for all of us. From your earlier work experience at Google, Oracle, eBay, and Goldman Sachs, what do you expect will be the biggest challenges for adopters in the industry when upgrading to Java 8 and functional programming?

Java 8 makes Java sexy again. Unfortunately, a big part of software engineering is to deal with legacy frameworks and codebases, which may impede the adoption of Java 8 for various companies. Nonetheless, several recent surveys show that Java 8’s adoption is going strong.

From the various conference talks, discussions and training courses I’ve given so far, it is clear that the Java community is excited about using lambdas and the Streams API. Hopefully this means that more people will embrace the functional-style in the next 5 years :-).

At Data Geekery with jOOQ, we’ve been working to integrate SQL – a very powerful external domain-specific language – into Java, (ab)using the Java language to the extreme. From a Java / Java 8 perspective, what is your take on domain-specific languages in general, and on SQL in particular?

I think jOOQ is a fantastic project! I’m a believer in writing code that matches the business problem’s domain. It helps readability and maintenance, and ultimately it’s useful for productivity.

You’ve also been busy co-founding startups next to writing books and PhDs. What is it that you love about your various jobs?

I love contributing to people’s lives and knowledge!

<PLUG>
If you are interested in Java 8 training, check out the course I’ve put together with Richard Warburton and James Gough: http://java8training.com
</PLUG>

10 SQL Articles Everyone Must Read


We’ve been blogging about Java and SQL for a while now, on the jOOQ blog. Over the years, while researching interesting blog topics, we’ve discovered a lot of SQL gems in the blogosphere that have inspired our work and our passion for SQL.

Today, we’re presenting to you a list of 10 articles that we think you should absolutely read. At the end of the list, you will agree that either:

  • SQL is awesome
  • SQL is crazy

… or probably both. Here goes, in no particular order:

1. Joe Celko: “Divided We Stand: The SQL of Relational Division”

Relational division is a very powerful concept in relational algebra. It answers questions like:

Give me all the students that have completed a given set of courses

Unfortunately, division doesn’t have any direct equivalent in SQL. We wish there were a clause like

TABLE_A
  DIVIDE BY TABLE_B 
  ON [ some predicate ]

Nonetheless, you can express a division in SQL in various ways.
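The classic encoding uses a double NOT EXISTS. A sketch with made-up table names, answering the students/courses question from above:

-- Students for whom there exists no required course ...
SELECT *
FROM students s
WHERE NOT EXISTS (
  SELECT 1
  FROM required_courses c

  -- ... that they have not completed
  WHERE NOT EXISTS (
    SELECT 1
    FROM completions x
    WHERE x.student_id = s.id
    AND x.course_id = c.id
  )
);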

Read Joe’s: “Divided We Stand: The SQL of Relational Division

2. Alex Bolenok: “Happy New Year!”

Alex Bolenok (a.k.a. Quassnoi) blogs about various interesting SQL-related things, but one of his top contributions every year is his “happy new year” series. Alex paints “beautiful” (beauty is in the eye of the beholder), and certainly impressive, pictures into your SQL console.

Read Alex’s: “Christmas tree in SQL

3. Markus Winand: “Clustering Data: The Second Power of Indexing”

Markus Winand is the author of the popular book SQL Performance Explained, parts of which you can also read on his blog “Use The Index Luke“. There is an incredible amount of very useful knowledge both in the book and on that page, but one of the most revealing and neat SQL tricks is to know about “covering indexes”, “clustering indexes”, or “index only scans”.
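To give a rough idea of an “index only scan” – a sketch reusing the payments table from the first article: if an index contains all the columns a query needs, the table itself never has to be accessed.

CREATE INDEX payments_account_date
ON payments (account_id, value_date);

-- This query can be answered from the index alone:
SELECT account_id, value_date
FROM payments
WHERE account_id = 42;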

Read Markus’s: “Clustering Data: The Second Power of Indexing

4. Dimitri Fontaine: “Understanding Window Functions”

There was SQL before window functions and SQL after window functions

Window functions are some of the most powerful and underused features of SQL. They’re available in all commercial databases, in PostgreSQL, and soon also in Firebird 3.0. We’ve blogged about window functions a couple of times ourselves, but one of the best summaries and explanations about what they really are and how they work has been written by Dimitri Fontaine.
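As a tiny appetiser – a sketch, once again using the payments table from the first article – a window function computes an aggregate without collapsing the rows it aggregates:

SELECT
  account_id,
  amount,

  -- the average amount per account, repeated on every row
  AVG(amount) OVER (PARTITION BY account_id) avg_amount
FROM payments;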

Read Dimitri’s: “Understanding Window Functions

5. Lukas Eder: “10 Common Mistakes Java Developers Make when Writing SQL”

A bit of advertising for our own writing. We’ve collected 10 of the most common mistakes that Java developers make when writing SQL. These mistakes are actually not even specific to Java developers, they could happen to any developer. This article has generated so much traction on our blog, there must be some great truth in it.

Read Lukas’s: “10 Common Mistakes Java Developers Make when Writing SQL

6. András Gábor’s “Techniques for Pagination in SQL”

Up until recently, offset pagination has been rather difficult to implement in commercial databases like Oracle, SQL Server, DB2, and Sybase, which didn’t feature an equivalent of MySQL’s / PostgreSQL’s LIMIT .. OFFSET clause. Pagination could be emulated, however, and there are a lot of techniques for doing that. Picking the right technique is essential for performance. If you’re using Oracle 11g or earlier, you can filter on ROWNUM, as sketched below.
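A common sketch of that technique, fetching rows 11 to 20 of our payments table:

SELECT *
FROM (
  -- ROWNUM is assigned after the inner ORDER BY
  SELECT t.*, ROWNUM rn
  FROM (
    SELECT *
    FROM payments
    ORDER BY id
  ) t
  WHERE ROWNUM <= 20
)
WHERE rn > 10;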

Read András’s “Techniques for Pagination in SQL

On a side-note, you should probably consider not using OFFSET at all. For details about the NO OFFSET movement, read…

7. Markus Winand: “We need tool support for keyset pagination”

If you think about OFFSET pagination, it’s actually a rather dumb thing from a technical perspective, and a useless thing from a business perspective. Here’s why.

From a technical perspective…

… you need to apply filtering, grouping, and ordering on a vast amount of data, skipping and throwing away all the rows that appear before the offset, until you reach the first row of interest. That is a lot of wasted resources, given that…

From a business perspective…

… perhaps pages 1-3 are interesting, but there is absolutely no point in offering users the ability to navigate to page 1337. After a certain offset, the offset has lost all business meaning. You might as well display random, unordered data samples – the user wouldn’t notice. Probably, when you reach a high page number in Google search results, this is exactly what happens: random stuff.

Or on reddit. There, you get random stuff already on the first page – such as this popular display of a Stabilized head on green vine snake.

Much better than offset pagination is keyset pagination (which we’ve blogged about as well).
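The gist of it, as a sketch: instead of skipping N rows, you seek past the last row of the previous page, using a predicate on an indexed column:

SELECT *
FROM payments

-- :last_seen_id is the id of the last row on the previous page
WHERE id > :last_seen_id
ORDER BY id
FETCH FIRST 10 ROWS ONLY; -- LIMIT 10 in MySQL / PostgreSQL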

Read Markus’s “We need tool support for keyset pagination


8. Josh Berkus “Tag All The Things”

Implementing tagging in a relational database can be a beast from a performance perspective. Should you normalise (one-to-many)? Should you normalise heavily (many-to-many)? Should you use nested collections / arrays / or even JSON data structures?

Josh has written a very interesting write-up on the performance of heavy tagging in PostgreSQL, showing that normalisation isn’t always the best choice.
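For reference, the heavily normalised (many-to-many) variant looks something like this – a sketch with made-up names:

CREATE TABLE posts (
  id    INTEGER PRIMARY KEY,
  title VARCHAR(100)
);

CREATE TABLE tags (
  id   INTEGER PRIMARY KEY,
  name VARCHAR(100)
);

-- The association table implementing the many-to-many relationship
CREATE TABLE post_tags (
  post_id INTEGER REFERENCES posts (id),
  tag_id  INTEGER REFERENCES tags (id),
  PRIMARY KEY (post_id, tag_id)
);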

Read Josh’s “Tag All The Things

9. Alex Bolenok’s “10 things in SQL Server (which don’t work as expected)”

This is again Alex’s (Quassnoi’s) work: a very interesting set of things that happen inside of SQL Server, which you might not have expected if you’re used to other databases. Whether you’re using SQL Server or not, this is a must-read to re-raise awareness of the subtle little differences between SQL implementations.

Read Alex’s “10 things in SQL Server (which don’t work as expected)

10. Aaron Bertrand: “Best approaches for running totals”

Running totals are a very typical use-case for SQL-based reporting. A running total is something that every project manager using Excel intuitively knows how to do: just drag-and-drop that sweet, sweet formula across your spreadsheet, and done.


How to do the same in SQL? There are again tons of ways. Aaron Bertrand has summarised various solutions for SQL Server 2012.
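On databases with window functions, a running total is a one-liner. A sketch, using the payments table once more:

SELECT
  value_date,
  amount,

  -- the sum of all amounts up to and including the current row
  SUM(amount) OVER (
    ORDER BY value_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) running_total
FROM payments;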

Read Aaron’s “Best approaches for running totals

Many other articles

There are, of course, many other very good articles providing deep insight into useful SQL tricks. If you find you’ve encountered an article that would nicely complement this list, please leave a link and description in the comments section. Future readers will appreciate the additional insight.

“NoSQL Injection” – What 40000 Unsecured MongoDB Databases Mean for our Industry


The news is all over reddit:

Major security alert as 40,000 MongoDB databases left unsecured on the internet

Security is a feature that is often neglected until it’s too late. And when it’s too late, it is often hard to bake it into a well-established architecture without major refactoring efforts.

Every system and thus also every database is always vulnerable. Most databases, however, do offer a significant amount of features to implement a security layer – and MongoDB is no different from any other DBMS here. So, how could this massive security hole happen?

Security is a cultural thing. Either a company has security in its DNA, or it doesn’t. The same is true for scalability, user experience, or any other aspect of software engineering. I’ve worked for companies at completely opposite ends of the security-awareness spectrum. Some (in the e-banking field) were ultra-paranoid, implementing thorough security checks in around 7 layers of the application. Others were rather lenient, with management focusing much more on marketing than anything else. Without any empirical evidence, however, there seemed to be a certain correlation between a company’s security-awareness and how backend-oriented that company was, e-banking being a very backend-oriented business.

Backend developers are more security aware

This is an over-generalisation and probably doesn’t do justice to many excellent frontend developers out there, but security is where the data is. Where the algorithms are. Where people reason about constraints, workflows, batch jobs, accounting, money, … algorithms. These folks focus on all the users. On the system. And they want to protect it. On the flip side, they might neglect usability.

There is little security-awareness where the user experience is. Where people reason about layout, formatting, usability, style, … user interfaces. These folks focus on single users. On their experience. And they want to make things easy for the user.

(and again, the same is true for scalability)

It is no coincidence that backend technology evolves extremely slowly. Java: 20 years and we’ve just finally gotten lambdas. SQL: 30 years and we still don’t have easy ways to reuse code.

At the same time, frontend technology evolves at the “speed of reddit”. The next hype is just 100 karma away, and we’ll throw all the previous tech out of the window, just to be part of the game.

Clearly, security is something that has to be reasoned about way too thoroughly for it to survive in the fast-paced frontend world.

What does MongoDB have to do with it?

The current event isn’t actually directly related to MongoDB (you could probably find just as many unprotected MySQL instances out there). But it strongly correlates with MongoDB’s sales and marketing strategies. MongoDB has done very aggressive and successful marketing in the past, claiming that the reign of the RDBMS is over – just as much as the reign of the RDBMS had been declared over before, when the astonishing object databases surfaced on this planet. Well, we all know where object and XML databases went.


This time, the anti-RDBMS marketing resonated mostly with frontend developers, obviously, because JSON is their favourite data representation format, and MongoDB promised to be able to store data directly from the DOM into the DB. Not only did this mean “the end of the DBA” for some software vendors, but many vendors also hoped that they could omit operations, and perhaps even backend development. What obviously worked well for prototyping and simple applications doesn’t scale well to applications with sensitive data.

The Solution

The solution is obvious. Homogeneity kills your business. You should hire a variety of different types of personnel: skilled frontend developers, backend developers, operations people, DBAs, and security experts. You should make them all work together, hear each of their opinions, have them review each other’s code, and learn from each other. Because each one of them has a strong focus and interest on an entirely different, yet equally important aspect of your application.

Do not neglect any of these aspects. Because if you do, and if it’s security, and if you lose sensitive customer data – well, you’re not going to stay in business. You’ll be sued in court.

Got hooked on the security topic?

Continue reading about …