jOOQ Tuesdays: Gerald Sangudi and Keshav Murthy Reveal the Secrets of N1QL (SQL on JSON)

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Gerald Sangudi and Keshav Murthy, creators of the the N1QL query language, CouchBase’s SQL-based JSON querying language.

Hi Gerald and Keshav – It’s great to have two people on the jOOQ Tuesdays interview at once! How did you guys meet?

Keshav: Gerald interviewed me for my the job at Couchbase.

Gerald: Best decision we ever made.

You’re both working at CouchBase, one of the leading “JSON databases” in the market. What drove you towards JSON?

Keshav: I worked for IBM, specifically on the Informix database. In 2013, Websphere teams needed JSON support. I was skeptical of JSON for data management except for managing metadata.  Years before, I had successfully opposed implementing XML and XQuery inside Informix. Then I went to a NoSQL conference where I saw enterprises using JSON in real-world applications.  I saw enterprises like eBay, Cisco developing their enterprise applications on NoSQL and JSON.

RDBMS customers had been asking for ALTER TABLE ONLINE when they needed to add new columns. They typically would have additional unused  columns which they’d rename them when needed.  Obviously, this wasn’t an optimal or a good solution because you could run out these columns, types or structure wasn’t flexible.

For years, customers had been ALTER TABLE while application were online.  I realized, JSON was a good way to provide it. That’s when we added JSON as a type, added sharding, and started to extend SQL to support JSON.  A good relational database can deliver lot of value JSON.  So, we developed a mechanism to exploit JSON to provide schema flexibility on relational tables and we received a patent for it: http://bit.ly/2og1uXe.

At Couchbase, data is stored as JSON and N1QL was designed as SQL for JSON.  So, we keep calm and JSON.

Gerald: I am not at Couchbase any more, but I was there for four years. Couchbase has its roots in memcached and CouchDB, so Couchbase was already storing and managing JSON data before N1QL and before we arrived. N1QL was developed to enable our users to query and manipulate their JSON data.

You’re both working on, or have worked on N1QL, CouchBase’s innovative SQL-esque query language for JSON. Has the SQL language come full circle? Why is the SQL syntax such a great fit for you?

Keshav: Don Chamberlin, co-inventor of SQL said this. During XQuery discussion, he argued against using SQL for manipulating XML. Now, with N1QL and SQL++ on JSON, he sees enormous possibilities for SQL to manipulate JSON effectively.  So, you could say, it has come a full circle.

From the Application point of view, requirement for SQL to support complex data models has been there.  SQL-99 added the structured type into the language, but didn’t recognize need for schema flexibility.

What Gerald has very nicely is to inherit SQL and  extend not just the language but the
underlying boolean logic.  

SQL has three valued logic: TRUE, FALSE, NULL.  N1QL has 4-valued logic: TRUE, FALSE, NULL and MISSING.  SQL has the select-join-project operations. N1QL has those and adds NEST and UNNEST operations for handling arrays.  Once we have the logic and operations, the type system, rest of the expressions for handling nested objects and arrays can be added.

There is another important change in N1QL compared to SQL.  SQL  is the query language to manipulate data.  N1QL can discover the document metadata (names, structure and types) dynamically and operate on it.

Gerald: SQL is the greatest and most successful query language of all time. Our job was to enable our users to query data that is sometimes different from what standard SQL expects. We hope N1QL does that.

As an aside, the “N1” in N1QL stands for non-first normal form. SQL geeks like you Lukas may know that “non-first” is one primary difference between JSON data and relational data. The other primary differences are schema and uniformity.

Will your relational competition steal features from N1QL? Or will you steal more features from them?

Keshav: I do hope relational databases steal features from N1QL. Having common approaches solve problems makes it easier for customers to choose the right database for the right problem. I do hope they choose Couchbase more often than RDBMS!

Gerald always says, we don’t differ from SQL unless there is a good reason. In that sense, we’ve taken lot of the features from relational databases already.  In addition, we learn from successful models in relational databases for things like index design, query optimization, security and monitoring.  We stand on shoulders of giants.

Gerald: We hope that both relational and non-relational vendors steal features from each other. There is a collaborative effort on something called SQL++, which is a superset of both SQL and N1QL. We hope SQL++ is the convergence point. One lesson from the success of SQL is that standards are great for both users and vendors.

CouchBase doesn’t have what you call a “static schema”. The biggest advantage of a static schema for the database optimiser is the many ways such a schema can be used to predict performance and choose optimal execution plans. How does optimisation work in a “schema-less” database?

Keshav: Actually, static schema gives you the structure, but not the data distribution, which is a major factor for calculating the execution cost. N1QL uses the information within the query and available indexes to glean the structure and decide on the plan. For example, if you have a query with a predicate:  WHERE state = “CA” and zipcode = “94040”, and there is an index on either state, zipcode or both, we’d assume these key-values exist in the document and push down the predicates to index scan.  We given further details on an article on DZone:  https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-optimization

Separately, we do have a mechanism to INFER the schema by sampling.  Right now, we show the inferred schema so users can understand the structure.  We also use the inferred schema to make query editing easier with hints and validations within the workbench.  We do have plans to use this, collect additional statistics to improve decisions in the optimizer.

Gerald: The N1QL optimizer is one of the joys of working on N1QL. As Keshav said, most of the SQL optimization techniques carry over to N1QL. The data may not have a static schema, but the query has an implicit schema, and the indexes have implicit schemas. That is, the query has predicates and other characteristics, and each index has keys and other characteristics. That is enough to keep an optimizer busy.

Working for a newer database vendor, you don’t have as much legacy, so you can innovate faster and more freely. What will be the next big thing in the database market?

Keshav: Our metric for innovation is the progress customer trying to make in their business. So, we innovate within the constraint of a customer job.  That helps us to innovate and measure its success from customer point of view.

We see customers deploying NoSQL databases for newer patterns like systems of engagement.

When you plan your vacation, you search a lot before you buy.  Search for places, hotels, airlines, things-to-do.  Then you compare costs, ratings, before you make the final purchase. For a travel company, these require significant database infrastructure to support high number of queries with low latencies and at a very low cost.

We see customers deploying NoSQL databases to support lot of the customer engagement and information requirement use cases.  RDBMS is still used as system of record for buy-sell-cancel-checkin-etc operations but will integrate with system of engagement databases to enhance customer experience.

Doing this effectively requires innovations in every area: data platform designs, scale out, query processing, index designs and manageability.

Gerald: I read Jeff Bezos’ latest annual letter to shareholders.  He says that the things that don’t change are more important than the things that do change. Databases should store your data reliably and give you answers and updates quickly. If database vendors continue to do that, we’ll be ok.

Using Kotlin’s Apply Function for Dynamic SQL with jOOQ

It was hard to limit ourselves to 10 Nice Examples of Writing SQL in Kotlin With jOOQ, recently, because the Kotlin language has many nice little features that really help a lot when working with Java libraries. We’ve talked about the nice with() stdlib function, which allows to “import” a namespace for a local scope or closure:

with (AUTHOR) {
    ctx.select(FIRST_NAME, LAST_NAME)
       .from(AUTHOR)
       .where(ID.lt(5))
       .orderBy(ID)
       .fetch {
           println("${it[FIRST_NAME]} ${it[LAST_NAME]}")
       }
}

In the above example, the AUTHOR table is made available as the this reference in the closure following the with function, which works exactly like JavaScript’s with(). Everything in AUTHOR is available, without dereferencing it from AUTHOR.

Apply is very similar

A very similar feature is made available through apply(), although with different syntactic implications. Check out this Stack Overflow question for some details about with() vs. apply() in Kotlin.

When using jOOQ, apply() is most useful for dynamic SQL. Imagine you have local variables indicating whether some parts of a query should be added to the query:

val filtering = true;
val joining = true;

These boolean variables would be evaluated dynamically, of course. filtering specifies whether a dynamic filter / where clause is needed, whereas joining specifies whether an additional JOIN is required.

So, the following query will select authors, and:

  • if “filtering”, we’re selecting only author ID = 1
  • if “joining”, we’ll join the books table and count the number of books per author

Both of these predicates are independent. Enter the game: apply():

ctx.select(
      a.FIRST_NAME, 
      a.LAST_NAME, 
      if (joining) count() else value(""))
   .from(a)
   .apply { if (filtering) where(a.ID.eq(1)) }
   .apply { if (joining) join(b).on(a.ID.eq(b.AUTHOR_ID)) }
   .apply { if (joining) groupBy(a.FIRST_NAME, a.LAST_NAME) }
   .orderBy(a.ID)
   .fetch {
       println(it[a.FIRST_NAME] + " " + 
               it[a.LAST_NAME] +
               (if (joining) " " + it[count()] else ""))
   }

That’s neat! See, the jOOQ API doesn’t specify any apply() method / function, yet you can chain the apply() function to the jOOQ API as if it were natively supported.

Like with(), apply() makes a reference available to a closure as this, so it doesn’t have to be referenced explicitly anymore. Which means, we can write neat things like

   .apply { if (filtering) where(a.ID.eq(1)) }

Where a where() clause is added only if we’re filtering!

Of course, jOOQ (or any other query builder) lends itself to this kind of dynamic SQL, and it can be done in Java too:
https://www.jooq.org/doc/latest/manual/sql-building/dynamic-sql

But the Kotlin-specific fluent integration using apply() is exceptionally neat. Well done, Kotlin!

Side-note

This only works because the jOOQ DSL API of jOOQ 3.x is mutable and every operation returns the same this reference as was kindly pointed out by Ilya Ryzhenkov

In the future (e.g. version 4.0), we’re planning on making the jOOQ API more immutable – mutability is a historic legacy (although, often, it’s the desired behaviour for a query builder).

More nice Kotlin/jOOQ tricks in this article here.

How to Use SQL INTERSECT to Work Around SQL’s NULL Logic

ANOTHER SQL Post this week? I got nerd-sniped:

Oooooh, challenge accepted!

So, let’s assume we have a table T with columns (A, B, C) like this:

WITH t(a, b, c) AS (
  SELECT 'a', 'b', null FROM dual UNION ALL
  SELECT 'a', null, 'c' FROM dual UNION ALL
  SELECT 'a', 'b', 'c'  FROM dual
)
SELECT * FROM t

As expected, this yields:

A       B       C
-----------------
a       b
a               c
a       b       c

Truly exciting.

Now we want to find all those rows that “match” either ('a', 'b', NULL) or ('a', NULL, 'b'). Clearly, this should produce the first two rows, right?

A       B       C
-----------------
a       b
a               c

Yes. Now the canonical solution would be to tediously write out the entire predicate as such:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a = 'a' AND b = 'b' AND c IS NULL)
OR (a = 'a' AND b IS NULL AND c = 'c')

That’s really boring. Sure, we could have factored out the first, common predicate:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE a = 'a' AND (
     (b = 'b' AND c IS NULL)
  OR (b IS NULL AND c = 'c')
)

That’s certainly better from a performance perspective, but Rafael had a nifty idea. Let’s use row value expressions (tuples) in our predicates:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a, b, c) IN (('a', 'b', NULL), ('a', NULL, 'c'))

Unfortunately this doesn’t yield any results, because nothing is equal to NULL in SQL (not even NULL itself). The above query is the same as this one:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a = 'a' AND b = 'b' AND c = NULL /* oops */)
OR (a = 'a' AND b = NULL /* oops */ AND c = 'c')

D’oh.

Solutions

The lame one

The canonical solution then would be a really lame (but perfectly valid) one. Encode NULL to be some “impossible” string value. Rafael suggested yolo. Fair enough.

This works:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a, NVL(b, 'yolo'), NVL(c, 'yolo')) 
  IN (('a', 'b', 'yolo'), ('a', 'yolo', 'c'))

All we have to do now is to always remember the term yolo which means “NULL, but not NULL thank you SQL”

The hipster one

But wait! SQL is painstakingly inconsistent when it comes to NULL. See, NULL really means UNKNOWN in three-valued logic, and this means, we never know if SQL abides to its own rules.

Come in INTERSECT. Like UNION or EXCEPT (MINUS) in Oracle, as well as SELECT DISTINCT, these set operations handle two NULL values as NOT DISTINCT. Yes, they’re not equal but also not distinct. Whatever. Just remember: That’s how it is 🙂

So, we can write this hipster solution to Rafael’s problem:

WITH t(a, b, c) AS (...)
SELECT *
FROM t
WHERE EXISTS (
  SELECT a, b, c FROM dual
  INTERSECT (
    SELECT 'a', 'b', null FROM dual
    UNION ALL
    SELECT 'a', null, 'c' FROM dual
  )
)

We create an intersection of the tuple (a, b, c), the left side of Rafael’s IN predicate, and the desired values on the right side of the IN predicate, and we’re done.

Clearly less tedious than writing the original predicates, right? (We won’t look into performance this time)

Cheers, and a happy weekend.

How to Find Redundant Indexes in SQL

The following two indexes are redundant in most SQL databases:

CREATE INDEX i_actor_1 ON actor (last_name);
CREATE INDEX i_actor_2 ON actor (last_name, first_name);

It is usually safe to drop the first index, because all queries that query the LAST_NAME column only can still profit from the second index I_ACTOR_2. The reason being that LAST_NAME is the first column of the composite index I_ACTOR_2 (it would be a different story, if it weren’t the first column).

Note: It is usually safe to drop the first index, because the benefits probably outweigh the cost:

Benefits of dropping

Costs of dropping

  • Querying a composite index can be slightly slower as can be seen in the below benchmark

Let’s see the costs of dropping the index below for Oracle, PostgreSQL, and SQL Server in this particular case (beware as always when interpreting benchmarks, they heavily depend on a lot of context, especially data size!)

Oracle

Preparation:

CREATE TABLE t (
  a NUMBER(10) NOT NULL,
  b NUMBER(10) NOT NULL
);

INSERT INTO t (a, b)
SELECT level, level
FROM dual
CONNECT BY level <= 100000;

CREATE INDEX i1 ON t(a);
CREATE INDEX i2 ON t(a, b);

EXEC dbms_stats.gather_table_stats('TEST', 'T');

Benchmark:

SET SERVEROUTPUT ON
CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 2000;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT /*+INDEX(t i1)*/ * FROM t WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, 1, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT /*+INDEX(t i2)*/ * FROM t WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, 2, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio 
    FROM results
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio);
  END LOOP;
END;
/

DROP TABLE results;

The result being:

Run 1, Statement 1 : 1.4797
Run 1, Statement 2 : 1.45545

Run 2, Statement 1 : 1.1997
Run 2, Statement 2 : 1.01121

Run 3, Statement 1 : 1.13606
Run 3, Statement 2 : 1

Run 4, Statement 1 : 1.13455
Run 4, Statement 2 : 1.00242

Run 5, Statement 1 : 1.13303
Run 5, Statement 2 : 1.00606

Some notes on benchmarks here.

The fastest query execution in the above result yields 1, the other executions are multiples of 1. Yes, there’s a 10% difference in this case, so as you can see. The benefits (faster insertions) certainly should outweight the cost (slower queries), so, don’t apply this advice in a read-heavy / write-rarely database.

PostgreSQL

A similar difference can be seen in a PostgreSQL benchmark. No hints can be used to choose indexes, so we’re simply creating two tables:

CREATE TABLE t1 (
  a INT NOT NULL,
  b INT NOT NULL
);
CREATE TABLE t2 (
  a INT NOT NULL,
  b INT NOT NULL
);

INSERT INTO t1 (a, b)
SELECT s, s
FROM generate_series(1, 100000) AS s(s);

INSERT INTO t2 (a, b)
SELECT s, s
FROM generate_series(1, 100000) AS s(s);

CREATE INDEX i1 ON t1(a);
CREATE INDEX i2 ON t2(a, b);

ANALYZE t1;
ANALYZE t2;

Benchmark:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT * FROM t1 WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, 
      (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT * FROM t2 WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, 
      (clock_timestamp() - v_ts); 
    RAISE INFO '';
  END LOOP;
END$$;

Result:

INFO:  Run 1, Statement 1: 00:00:00.071891
INFO:  Run 1, Statement 2: 00:00:00.080833

INFO:  Run 2, Statement 1: 00:00:00.076329
INFO:  Run 2, Statement 2: 00:00:00.079772

INFO:  Run 3, Statement 1: 00:00:00.073137
INFO:  Run 3, Statement 2: 00:00:00.079483

INFO:  Run 4, Statement 1: 00:00:00.073456
INFO:  Run 4, Statement 2: 00:00:00.081508

INFO:  Run 5, Statement 1: 00:00:00.077148
INFO:  Run 5, Statement 2: 00:00:00.083535

SQL Server

Preparation:

CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL
);

WITH s(s) AS (
  SELECT 1
  UNION ALL
  SELECT s + 1 FROM s WHERE s < 100
)
INSERT INTO t
SELECT TOP 100000 
  row_number() over(ORDER BY (SELECT 1)), 
  row_number() over(ORDER BY (select 1)) 
FROM s AS s1, s AS s2, s AS s3;

CREATE INDEX i1 ON t(a);
CREATE INDEX i2 ON t(a, b);

UPDATE STATISTICS t;

Benchmark:

DECLARE @ts DATETIME;
DECLARE @repeat INT = 2000;
DECLARE @r INT;
DECLARE @i INT;
DECLARE @dummy VARCHAR;

DECLARE @s1 CURSOR;
DECLARE @s2 CURSOR;

DECLARE @results TABLE (
  run     INT,
  stmt    INT,
  elapsed DECIMAL
);

SET @r = 0;
WHILE @r < 5
BEGIN
  SET @r = @r + 1

  SET @s1 = CURSOR FOR 
    SELECT b FROM t WITH (INDEX (i1)) WHERE a = 1;

  SET @s2 = CURSOR FOR 
    SELECT b FROM t WITH (INDEX (i2)) WHERE a = 1;

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s1;
    FETCH NEXT FROM @s1 INTO @dummy;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s1 INTO @dummy;
    END;

    CLOSE @s1;
  END;

  DEALLOCATE @s1;
  INSERT INTO @results 
    VALUES (@r, 1, DATEDIFF(ms, @ts, current_timestamp));

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s2;
    FETCH NEXT FROM @s2 INTO @dummy;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s2 INTO @dummy;
    END;

    CLOSE @s2;
  END;

  DEALLOCATE @s2;
  INSERT INTO @results 
    VALUES (@r, 2, DATEDIFF(ms, @ts, current_timestamp));
END;

SELECT 'Run ' + CAST(run AS VARCHAR) + 
  ', Statement ' + CAST(stmt AS VARCHAR) + ': ' + 
  CAST(CAST(elapsed / MIN(elapsed) OVER() AS DECIMAL(10, 5)) AS VARCHAR)
FROM @results;

Result:

Run 1, Statement 1: 1.22368
Run 1, Statement 1: 1.09211

Run 2, Statement 1: 1.05263
Run 2, Statement 1: 1.09211

Run 3, Statement 1: 1.00000
Run 3, Statement 1: 1.05263

Run 4, Statement 1: 1.05263
Run 4, Statement 1: 1.00000

Run 5, Statement 1: 1.09211
Run 5, Statement 1: 1.05263

As can be seen, predictably, in all databases the smaller non-composite index is slightly faster for this type of query than the composite index. In this particular benchmark, this is specifically true because the composite index acts as a covering index.

Yet both indexes can be used for the query in a reasonable way, so if disk space / insertion speed is an issue, the redundant single-column index can be dropped.

How to find such indexes

The following query will help you detect such indexes in Oracle, PostgreSQL, and SQL Server:

Oracle

WITH indexes AS (
  SELECT
    i.owner,
    i.index_name,
    i.table_name,
    listagg(c.column_name, ', ')
      WITHIN GROUP (ORDER BY c.column_position)
      AS columns
  FROM all_indexes i
  JOIN all_ind_columns c
    ON i.owner = c.index_owner
    AND i.index_name = c.index_name
  GROUP BY i.owner, i.table_name, i.index_name, i.leaf_blocks
)
SELECT
  i.owner,
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.owner = j.owner
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns || ',%'

Result:

TABLE_NAME   delete index   columns   existing index   columns
-------------------------------------------------------------------------
T            I1             A         I2               A, B

In short, it lists all the indexes whose columns are a prefix of another index’s columns

PostgreSQL

Get ready for a really nifty query. Here’s how to discover redundant indexes in PostgreSQL, which unfortunately doesn’t seem to have an easy, out-of-the-box dictionary view to discover index columns:

WITH indexes AS (
  SELECT 
    tnsp.nspname AS schema_name,
    trel.relname AS table_name,
    irel.relname AS index_name,
    string_agg(a.attname, ', ' ORDER BY c.ordinality) AS columns
  FROM pg_index AS i
  JOIN pg_class AS trel ON trel.oid = i.indrelid
  JOIN pg_namespace AS tnsp ON trel.relnamespace = tnsp.oid
  JOIN pg_class AS irel ON irel.oid = i.indexrelid
  JOIN pg_attribute AS a ON trel.oid = a.attrelid
  JOIN LATERAL unnest(i.indkey) 
    WITH ORDINALITY AS c(colnum, ordinality)
      ON a.attnum = c.colnum
  GROUP BY i, tnsp.nspname, trel.relname, irel.relname
)
SELECT
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.schema_name = j.schema_name
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns || ',%';

This is a really nice case of lateral unnesting with ordinality, which you should definitely add to your PostgreSQL tool chain.

SQL Server

Now, SQL Server doesn’t have a nice STRING_AGG function (yet), but we can work around this using STUFF and XML to get the same query.

Of course, there are other solutions using recursive SQL, but I’m too lazy to translate the simple string pattern-matching approach to something recursive.

WITH 
  i AS (
    SELECT 
	  s.name AS schema_name,
      t.name AS table_name,
      i.name AS index_name,
      c.name AS column_name,
      ic.index_column_id
    FROM sys.indexes i 
    JOIN sys.index_columns ic 
      ON i.object_id = ic.object_id 
      AND i.index_id = ic.index_id 
    JOIN sys.columns c 
      ON ic.object_id = c.object_id 
      AND ic.column_id = c.column_id 
    JOIN sys.tables t 
      ON i.object_id = t.object_id 
	JOIN sys.schemas s
	  ON t.schema_id = s.schema_id
  ),
  indexes AS (
    SELECT
	  schema_name,
      table_name,
      index_name,
      STUFF((
        SELECT ',' + j.column_name 
        FROM i j
        WHERE i.table_name = j.table_name 
        AND i.index_name = j.index_name 
        FOR XML PATH('') -- Yay, XML in SQL!
      ), 1, 1, '') columns
    FROM i
    GROUP BY schema_name, table_name, index_name
  )
SELECT
  i.schema_name,
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.schema_name = j.schema_name
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns + ',%';

Conclusion

Now go run the above query on your production database and… Very carefully and reasonably think about whether you really want to drop those indexes 😉