A Guide to SQL Naming Conventions

One of Java’s big strengths, in my opinion, is the fact that most naming conventions have been established by the creators of the language. For example:

  • Class names are in PascalCase
  • Member names are in camelCase
  • Constants are in SNAKE_CASE

If someone does not adhere to these conventions, the resulting code quickly looks non-idiomatic.

What about SQL?

SQL is different. While some people claim UPPER CASE IS FASTEST:

Others do not agree on the “correct” case:

There seems to be a tendency towards writing identifiers in lower case, with no agreement on the case of keywords. Also, in most dialects, people prefer snake_case for identifiers, although in SQL Server, people seem to prefer PascalCase or camelCase.

That’s for style. And I’d love to hear your opinion on style and naming conventions in the comments!

What about naming conventions?

In many languages, naming conventions (of identifiers) is not really relevant, because the way the language designs namespacing, there is relatively little risk for conflict. In SQL, this is a bit different. Most SQL databases support only a 3-4 layered set of namespaces:

  1. Catalog
  2. Schema
  3. Table (or procedure, type)
  4. Column (or parameter, attribute)

Some dialect dependent caveats:

  • While SQL Server supports both catalog AND schema, most dialects only support one of them
  • MySQL treats the catalog (“database”) as the schema
  • Oracle supports a package namespace for procedures, between schema and procedure

In any case, there is no such concept as package (“schema”) hierarchies as there is in languages like Java, which makes namespacing in SQL quite tricky. A problem that can easily happen when writing stored procedures:

FUNCTION get_name (id NUMBER) IS
  result NUMBER;
BEGIN
  SELECT name
  INTO result
  FROM customer
  WHERE id = id; -- Ehm...

  RETURN result;
END;

As can be seen above, both the CUSTOMER.ID column as well as the GET_NAME.ID parameter could be resolved by the unqualified ID expression. This is easy to work around, but a tedious problem to think of all the time.

Another example is when joining tables, which probably have duplicate column names:

SELECT *
FROM customer c
JOIN address a ON c.id = a.customer_id

This query might produce two ambiguous ID columns: CUSTOMER.ID and ADDRESS.ID. In the SQL language, it is mostly easy to distinguish between them by qualifying them. But in clients (e.g. Java), they are less easy to qualify properly. If we put the query in a view, it gets even trickier.

“Hungarian notation”

Hence, SQL and the procedural languages are a rare case where some type of Hungarian notation could be useful. Unlike with hungarian notation itself, where the data type is encoded in the name, in this case, we might encode some other piece of information in the name. Here’s a list of rules I’ve found very useful in the past:

1. Prefixing objects by semantic type

Tables, views, and other “tabular things” may quickly conflict with each other. Especially in Oracle, where one does not simply create a schema because of all the security hassles this produces (schemas and users are kinda the same thing, which is nuts of course. A schema should exist completely independently from a user), it may be useful to encode a schema in the object name:

  • C_TABLE: The C_ prefix denotes “customer data”, e.g. as opposed to:
  • S_TABLE: The S_ prefix denotes “system data” or “master data”
  • L_TABLE: The L_ prefix denotes “log data”
  • V_VIEW: The V_ prefix denotes a view
  • P_PARAMETER: The P_ prefix denotes a procedure or function parameter
  • L_VARIABLE: The L_ prefix denotes a local variable

Besides, when using views for security and access control, one might have additional prefixes or suffixes to denote the style of view:

  • _R: The _R suffix denotes read only views
  • _W: The _W suffix denotes writeable (updatable) views

This list is obviously incomplete. I’m undecided whether this is necessarily a good thing in general. For example, should packages, procedures, sequences, constraints be prefixed as well? Often, they do not lead to ambiguities in namespace resolution. But sometimes they do. The importance, as always, is to be consistent with a ruleset. So, once this practice is embraced, it should be applied everywhere.

2. Singular or plural table names

Who cares. Just pick one and use it consistently.

3. Establishing standard aliasing

Another technique that I’ve found very useful in the past is a standard approach to aliasing things. We need to alias tables all the time, e.g. in queries like this:

SELECT *
FROM customer c
JOIN address a ON c.id = a.customer_id

But what if we have to join ACCOUNT as well? We already used A for ADDRESS, so we cannot reuse A. But if we don’t re-use the same aliases in every query, the queries start to be a bit confusing to read.

We could just not use aliases and always fully qualify all identifiers:

SELECT *
FROM customer
JOIN address ON customer.id = address.customer_id

But that quickly turns out to be verbose, especially with longer table names, so also not very readable. The standard approach to aliasing things I’ve found very useful is to use this simple algorithm that produces 4 letter aliases for every table. Given the Sakila database, we could establish:

PREFIX TABLE NAME
ACTO ACTOR
ADDR ADDRESS
CATE CATEGORY
CITY CITY
COUN COUNTRY
CUST CUSTOMER
FILM FILM
FIAC FILM_ACTOR
FICA FILM_CATEGORY
FITE FILM_TEXT
INVE INVENTORY
LANG LANGUAGE
PAYM PAYMENT
RENT RENTAL
STAF STAFF
STOR STORE

The algorithm to shorten a table name is simple:

  • If the name does not contain an underscore, take the four first letters, e.g CUSTOMER becomes CUST
  • If the name contains 1 underscore, take the first two letters of each word, e.g. FILM_ACTOR becomes FIAC
  • If the name contains 2 underscores, take the first two letters of the first word, and the first letter of the other words, e.g. FILM_CATEGORY_DETAILS becomes FICD
  • If the name contains 3 or more underscores, take the first letter of each word
  • If a new abbreviation causes a conflict with the existing ones, make a pragmatic choice

This technique worked well for large-ish schemas with 500+ tables. You’d think that abbreviations like FICD are meaningless, and indeed, they are, at first. But once you start writing a ton of SQL against this schema, you start “learning” the abbreviations, and they become meaningful.

What’s more, you can use these abbreviations everywhere, not just when writing joins:

SELECT 
  cust.first_name,
  cust.last_name,
  addr.city
FROM customer cust
JOIN address addr ON cust.id = addr.customer_id

But also when aliasing columns in views or derived tables:

SELECT 
  cust.first_name AS cust_first_name,
  cust.last_name AS cust_last_name,
  addr.city AS addr_city
FROM customer cust
JOIN address addr ON cust.id = addr.customer_id

This becomes invaluable when your queries become more complex (say, 20-30 joins) and you start projecting tons of columns in a library of views that select from other views that select from other views. It’s easy to keep consistent, and you can also easily recognise things like:

  • What table a given column originates from
  • If that column has an index you can use (on a query against the view!)
  • If two columns that look the same (e.g. FIRST_NAME) really are the same

I think that if you work with views extensively (I’ve worked with schemas of 1000+ views in the past), then such a naming convention is almost mandatory.

Conclusion

There isn’t really a “correct” way to name things in any language, including SQL. But given the limitations of some SQL dialects, or the fact that after joining, two names may easily conflict, I’ve found the above two tools very useful in the past: 1) Prefixing identifiers with a hint about their object types, 2) Establishing a standard for aliasing tables, and always alias column names accordingly.

When you’re using a code generator like jOOQ’s, the generated column names on views will already include the table name as a prefix, so you can easily “see” what you’re querying.

I’m curious about your own naming conventions, looking forward to your comments in the comment section!

The dreaded DefaultAbstractHelperImpl

A while ago, we have published this fun game we like to call Spring API Bingo. It is a tribute and flattery to Spring’s immense creativeness when forming meaningful class names like

  • FactoryAdvisorAdapterHandlerLoader
  • ContainerPreTranslatorInfoDisposable
  • BeanFactoryDestinationResolver
  • LocalPersistenceManagerFactoryBean

Two of the above classes actually exist. Can you spot them? If no, play Spring API Bingo!

Clearly, the Spring API suffers from having…

To name things

There are only two hard problems in computer science. Cache invalidation, naming things, and off-by-one errors

– Tim Bray quoting Phil Karlton

There are a couple of these prefixes or suffixes that are just hard to get rid of in Java software. Consider this recent discussion on Twitter, that inevitably lead to an (very) interesting discussion:

Yes, the Impl suffix is an interesting topic. Why do we have it, and why do we keep naming things that way?

Specification vs. body

Java is a quirky language. At the time it was invented, object orientation was a hot topic. But procedural languages had interesting features as well. One very interesting language at the time was Ada (and also PL/SQL, which was largely derived from Ada). Ada (like PL/SQL) reasonably organises procedures and functions in packages, which come in two flavours: specification and body. From the wikipedia example:

-- Specification
package Example is
  procedure Print_and_Increment (j: in out Number);
end Example;

-- Body
package body Example is
 
  procedure Print_and_Increment (j: in out Number) is
  begin
    -- [...]
  end Print_and_Increment;
 
begin
  -- [...]
end Example;

You always have to do this, and the two things are named exactly the same: Example. And they’re stored in two different files called Example.ads (ad for Ada and s for specification) and Example.adb (b for body). PL/SQL followed suit and names package files Example.pks and Example.pkb with pk for Package.

Java went a different way mainly because of polymorphism and because of the way classes work:

  • Classes are both specification AND body in one
  • Interfaces cannot be named the same as their implementing classes (mostly, because there are many implementations, of course)

In particular, classes can be a hybrid of spec-only, with a partial body (when they’re abstract), and full spec and body (when they’re concrete).

How this translates to naming in Java

Not everyone appreciates clean separation of specs and body, and this can certainly be debated. But when you’re in that Ada-esque mind set, then you probably want one interface for every class, at least wherever API is exposed. We’re doing the same for jOOQ, where we have established the following policy to name things:

*Impl

All implementations (bodies) that are in a 1:1 relationship with a corresponding interface are suffixed Impl. If ever possible, we try to keep those implementations package-private and thus sealed in the org.jooq.impl package. Examples are:

This strict naming scheme makes it immediately clear, which one is the interface (and thus public API), and which one is the implementation. We wish Java were more like Ada with this respect, but we have polymorphism, which is great, and…

Abstract*

… and it leads to reusing code in base classes. As we all know, common base classes should (almost) always be abstract. Simply because they’re most often incomplete implementations (bodies) of their corresponding specification. Thus, we have a lot of partial implementations that are also in a 1:1 relationship with a corresponding interface, and we prefix them with Abstract. Most often, these partial implementations are also package-private and sealed in the org.jooq.impl package. Examples are:

In particular, ResultQuery is an interface that extends Query, and thus AbstractResultQuery is a partial implementation that extends the AbstractQuery, which is also a partial implementation.

Having partial implementations makes perfect sense in our API, because our API is an internal DSL (Domain-Specific Language) and thus has thousands of methods that are always the same, no matter what the concrete Field really does – e.g. Substring

Default*

We do everything API related with interfaces. This has proven highly effective already in popular Java SE APIs, such as:

  • Collections
  • Streams
  • JDBC
  • DOM

We also do everything SPI (Service Provider Interface) related with interfaces. There is one essential difference between APIs and SPIs in terms of API evolution:

  • APIs are consumed by users, hardly implemented
  • SPIs are implemented by users, hardly consumed

If you’re not developing the JDK (and thus don’t have completely mad backwards-compatibility rules), you’re probably mostly safe adding new methods to API interfaces. In fact, we do so in every minor release as we do not expect anyone to implement our DSL (who’d want to implement Field‘s 286 methods, or DSL‘s 677 methods. That’s mad!)

But SPIs are different. Whenever you provide your user with SPIs, such as anything suffixed *Listener or *Provider, you can’t just simply add new methods to them – at least not prior to Java 8, as that would break implementations, and there are many of them.

Well. We still do it, because we don’t have those JDK backwards-compatibility rules. We have more relaxed ones. But we suggest our users do not implement the interfaces directly themselves, but extend a Default implementation instead, which is empty. For instance ExecuteListener and the corresponding DefaultExecuteListener:

public interface ExecuteListener {
    void start(ExecuteContext ctx);
    void renderStart(ExecuteContext ctx);
    // [...]
}

public class DefaultExecuteListener
implements ExecuteListener {

    @Override
    public void start(ExecuteContext ctx) {}

    @Override
    public void renderStart(ExecuteContext ctx) {}

    // [...]
}

So, Default* is a prefix that is commonly used to provide a single public implementation that API consumers can use and instantiate, or SPI implementors can extend – without risking backwards-compatibility issues. It’s pretty much a workaround for Java 6 / 7’s lack of interface default methods, which is why the prefix naming is even more appropriate.

Java 8 Version of this rule

In fact, this practice makes it evident that a “good” rule to specify Java-8 compatible SPIs is to use interfaces and to make all methods default with an empty body. If jOOQ didn’t support Java 6, we’d probably specify our ExecuteListener like this:

public interface ExecuteListener {
    default void start(ExecuteContext ctx) {}
    default void renderStart(ExecuteContext ctx) {}
    // [...]
}

*Utils or *Helper

OK, so here’s one for the mock/testing/coverage experts and aficionados out there.

It’s TOTALLY OK to have a “dump” for all sorts of static utility methods.

Please. Don’t be “that guy”! :-)

So, there are various techniques of identifying utility classes. Ideally, you take a naming convention and then stick to it. E.g. *Utils.

From our perspective, ideally you’d even just dump all utility methods that are not stricly bound to a very specific domain in a single class, because frankly, when did you last appreciate having to go through millions of classes to find that utility method? Never. We have org.jooq.impl.Utils. Why? Because it’ll allow you to do:

import static org.jooq.impl.Utils.*;

This then almost feels as if you had something like “top-level functions” throughout your application. “global” functions. Which we think is a nice thing. And we totally don’t buy the “we can’t mock this” argument, so don’t even try starting a discussion

Discussion

… or, in fact, let’s do start a discussion. What are your techniques, and why? Here are a couple of reactions to Tom Bujok’s original Tweet, to help get you started:

Let’s go ;-)