Various Meanings of SQL’s PARTITION BY Syntax

For SQL beginners, there's a bit of an esoteric syntax named PARTITION BY, which appears all over the place in SQL. It always has a similar meaning, though in quite different contexts. The meaning is similar to that of GROUP BY, namely to group/partition data sets by some grouping/partitioning criteria. For example, when querying the …
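To make the GROUP BY comparison concrete, here is a minimal sketch. It assumes the `film` table of the Sakila database (with its `film_id`, `title`, and `rating` columns), which may not be the example the full article uses.

```sql
-- GROUP BY collapses each group into a single row: one row per rating.
SELECT rating, COUNT(*) AS films
FROM film
GROUP BY rating;

-- PARTITION BY in a window function keeps every film row and adds
-- the size of that row's partition (all films with the same rating).
SELECT film_id,
       title,
       rating,
       COUNT(*) OVER (PARTITION BY rating) AS films_with_same_rating
FROM film;
```

Both queries group rows by the same criterion. The difference is that the aggregate version returns one row per group, while the window version returns all input rows.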

Calculating Pagination Metadata Without Extra Roundtrips in SQL

When paginating results in SQL, we use standard SQL OFFSET .. FETCH or a vendor-specific version of it, such as LIMIT .. OFFSET. For example:

SELECT first_name, last_name
FROM actor
ORDER BY actor_id
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

As always, we're using the Sakila database for this example. This is rather …
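One way pagination metadata might be obtained in the same roundtrip is with window functions, which are evaluated before OFFSET .. FETCH is applied. The following is only a sketch of that idea, not necessarily the approach taken in the full article. The column aliases `total_rows` and `row_num` are made up for illustration.

```sql
SELECT first_name,
       last_name,
       -- Total number of rows before pagination is applied
       COUNT(*) OVER () AS total_rows,
       -- Absolute position of each row in the unpaginated result
       ROW_NUMBER() OVER (ORDER BY actor_id) AS row_num
FROM actor
ORDER BY actor_id
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
```

Because the window functions see the complete result set, every returned row carries the overall row count, so no separate COUNT(*) query is needed. In dialects without OFFSET .. FETCH, the same SELECT list should work with LIMIT .. OFFSET.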

Implementing a generic REDUCE aggregate function with SQL

So, @rotnroll666 nerd sniped me again. Apparently, the Neo4j Cypher query language supports arbitrary reductions, just like any functional collection API, oh say, the JDK Stream API:

Stream.of(2, 4, 3, 1, 6, 5)
      .reduce((i, j) -> i * j)
      .ifPresent(System.out::println); // Prints 720

SQL doesn't have this, yet it would be very useful to be …

Using DISTINCT ON in Non-PostgreSQL Databases

A nice little gem in PostgreSQL's SQL syntax is the DISTINCT ON clause, which is as powerful as it is esoteric. In a previous post, we've blogged about some caveats to think of when DISTINCT and ORDER BY are used together. The bigger picture can be seen in our article about the logical order of …

Using IGNORE NULLS With SQL Window Functions to Fill Gaps

I found a very interesting SQL question on Twitter recently: https://twitter.com/vikkiarul/status/1120669222672261120

Rephrasing the question: We have a set of sparse data points:

+------------+-------+
| VALUE_DATE | VALUE |
+------------+-------+
| 2019-01-01 |   100 |
| 2019-01-02 |   120 |
| 2019-01-05 |   125 |
| 2019-01-06 |   128 |
| 2019-01-10 |   130 |
+------------+-------+

…

How to Calculate a Cumulative Percentage in SQL

A fun report to write is to calculate a cumulative percentage. For example, when querying the Sakila database, we might want to calculate the percentage of our total revenue at any given date. The result might look like this:

[Chart not included in this excerpt]

Notice the beautifully generated data. Or as raw data:

payment_date |amount  |percentage
-------------|--------|----------
2005-05-24   |29.92   |0.04
…
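A cumulative percentage of this kind can be sketched with two window functions: a running SUM divided by the grand total. The query below is an illustration only. It assumes Sakila's `payment` table with `payment_date` and `amount` columns, and it is not necessarily the query from the full article.

```sql
SELECT payment_date,
       amount,
       -- Running total up to this date, divided by the grand total
       ROUND(100.0 * SUM(amount) OVER (ORDER BY payment_date)
                   / SUM(amount) OVER (), 2) AS percentage
FROM (
  -- Revenue per calendar day
  SELECT CAST(payment_date AS DATE) AS payment_date,
         SUM(amount) AS amount
  FROM payment
  GROUP BY CAST(payment_date AS DATE)
) p
ORDER BY payment_date;
```

The window with ORDER BY accumulates revenue date by date, while the empty OVER () spans all rows and yields the total revenue used as the denominator.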

How to Emulate PERCENTILE_DISC in MySQL and Other RDBMS

In my previous article, I showed what the very useful percentile functions (also known as inverse distribution functions) can be used for. Unfortunately, these functions are not ubiquitously available in SQL dialects. As of jOOQ 3.11, they are known to work in these dialects:

Dialect         | As aggregate function | As window function
----------------|-----------------------|-------------------
MariaDB 10.3.3  | No                    | Yes
Oracle 18c      | Yes                   | Yes
PostgreSQL 11   | Yes                   | No
SQL Server 2017 | No                    | Yes
Teradata 16     | Yes                   | No

…

Writing Custom Aggregate Functions in SQL Just Like a Java 8 Stream Collector

All SQL databases support the standard aggregate functions COUNT(), SUM(), AVG(), MIN(), MAX(). Some databases support other aggregate functions, like EVERY(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP(), VAR_SAMP(), ARRAY_AGG(), and STRING_AGG(). But what if you want to roll your own?

Java 8 Stream Collector

When using Java 8 streams, we can easily roll our own aggregate function (i.e. a …

How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

SQL is a verbose language, and window functions are among its most verbose features. In a Stack Overflow question that I've encountered recently, someone asked how to calculate the difference between the first and the last value in a time series for any given day:

Input

volume  tstamp
---------------------------
29011   2012-12-28 09:00:00
28701   2012-12-28 10:00:00
…

Find the Next Non-NULL Row in a Series With SQL

I recently stumbled across this fun SQL question on Reddit. The question was looking at a time series of data points where some events happened. For each event, we have the start time and the end time:

timestamp            start  end
-----------------------------------
2018-09-03 07:00:00  1      null
2018-09-03 08:00:00  null   null
2018-09-03 09:00:00  null   null
2018-09-03 10:00:00  …