10 Tips on How to be a Great Programmer

I was recently asked in an interview about my opinion on how to be a great programmer. That’s an interesting question, and I think we can all be great programmers, regardless of our talent, if we follow a couple of rules that – I believe – should be common sense. In fact, these rules don’t all apply to programmers only, but to any professional.

Not everything in this list is meant to be taken entirely serious, some things are just my opinion and yours may vary, and not all descriptions of programmer personalities match real-world situations I may have encountered, so if in doubt, please do not take offense. I didn’t mean you 🙂

Here they are:

1. Learn how to ask questions

There are essentially these types of programmers asking questions:

  • The perfectionist: Especially when asking a question about some open source tool, they may have already debugged through the code and found the real cause of their problem. But even if not, the perfectionist will write an introduction to the problem, steps to reproduce, potentially a suggested workaround, and as I said, potentially a suggested fix. In fact, the perfectionist doesn’t have questions. Only answers.
  • The chatterbox: This person will not really ask questions. Rather, they arrange their thoughts publicly and occasionally put a rhetorical question mark here and there. What may appear to be a question is really just a stream of thoughts and if you wait with the answer, they may either find the answer themselves, or ask the real question in email number 37. Or not. Or maybe, if we try it this way. You know what? It turned out that the requirement is completely wrong, I solved it with some other technique. Oh, actually, I changed libraries entirely. Hehe. No more questions.
  • The slacker: Here’s the code. What’s wrong? Halp plz.
  • The manager: For this type, time is money. The questions must be short and the answers ASAP. There’s something ironic about this approach, because by keeping the questions short (read: incomplete, not concise), most often, a lot of important details are missing, which only leads to requests for details. Then, the manager (naturally being disappointed because the reply is not an answer but a new question) will send yet again a short message and demand an answer with even more urgency. This can go back and forth for quite a while. In the end, it may take 1-2 weeks until what could’ve been the answer will actually be answered.
  • The complainer: This person doesn’t ask questions. They complain. Until the question goes away. Perhaps. If it doesn’t all the better. More reasons to complain.

By now, it should be clear that a well prepared question (concise, simple, short, yet with enough details) will yield much better answers. If you know what exactly you want to learn with your question, chances are, you’ll get exactly what you wanted.

2. Learn how to avoid asking questions

In fact, mostly, it’s better to try to avoid asking questions. Perhaps you can figure it out yourself? Not always, of course. Many things, you simply cannot know and by asking a domain expert, you’ll find the quickest and most efficient path to success. But often, trying something yourself has these benefits:

  • You learn it the “hard way” which is the way that sticks to our memory much better – we’ll remember what we’ve learned
  • It’s more rewarding to figure out stuff yourself
  • You don’t create “noise”. Remember the chatterbox? Unless the person you’re asking the question(s) is routinely answering questions (and thus postponing their answer), they may not see through your stream of thoughts and try to answer each individual incomplete “question”. That doesn’t help anyone.
  • By postponing a question (for a while, at least), you can gather more relevant information that you can provide to someone who might be able to answer your question. Think of the “perfectionist”. Spend more time looking for details first, then answer the question.
  • You’ll get better at asking questions by training to ask good questions. And that takes a bit more time.

3. Don’t leave broken windows

There was a very interesting article on reddit, recently about not leaving broken windows. The essence of the article is to never compromise on quality. To never become a slacker. To never leave … broken windows. Here’s a quote from the article

When we take certain shortcuts to deliver something in the shortest period of time, and our code reflects how careless we’ve been, developers coming after us (from the same team, a future team, and even ourselves!), will reach an important conclusion: it’s not important to pay enough attention to the code we produce. Little by little our application will start deteriorating in what it becomes an unstoppable process.

This isn’t about being a perfectionist, in fact. Sometimes, fixing broken windows can be postponed (and must be, of course). But often times, by allowing windows to break and stay broken, we’re introducing a situation where no one is happy. Not us the programmers, not our customers, not our users, not our project managers. This is an attitude thing and thus at the core of being professional. How did Benjamin Franklin say?

The bitterness of poor quality remains long after the sweetness of low price is forgotten

That’s true for everything. In this case, “low price” is the quick win we might get by implementing something in a sloppy way.

4. Software ought to be deterministic. Aim for it!

In an ideal world, everything in software is “deterministic”. We’d all be functional programmers, writing side-effect free, pure functions. Like String.contains(). No matter how many times you execute the following:

assertTrue("abcde".contains("bc"));

… the result is always the same, expected result. The universe and all it’s statefulness will have absolutely no impact on this computation. It’s deterministic.

We can do that, too in our own programs, not just in standard libraries. We can try to write side-effect free, deterministic modules for as often as possible. This isn’t really a matter of what technology we choose. Deterministic programming can be done in any language – even if functional languages have more tools to prevent accidental side effects through more sophisticated type systems. But the example I’ve shown is a Java example. Object orientation permits determinism. Heck, procedural languages like PL/SQL allow for determinism. It’s a requirement for a function to be deterministic, if it is to be used in an index:

CREATE INDEX upper_first_name ON customer (upper (first_name));
-- Deterministic function here: -----------^^^^^^^^^^^^^^^^^^

So again, this is a matter of discipline. You could see a side-effectful procedure / method / “function” as a “broken window”. Perhaps it was easier to maintain a side-effect, hoping that eventually, it can be removed. But that’s usually a lie. The price will be paid later, when non-determinism suddenly strikes. And it will.

5. Expect the unexpected

That previous link is key. Murphy’s Law is something we programmers should observe all the time. Everything can break. And it will. Come on, as software engineers, we know it will break. Because our world is not deterministic, neither are the business requirements that we’re implementing. We can implement tip #4 (determinism) only until it is no longer possible. From then on, we’ll inevitably enter the world of non-determinism (the “real world”), where stuff will go wrong. So, aim for it. Expect the unexpected. Train your inner pessimist to foresee all kinds of things.

How to write that pessimistic code in a concise manner is another story, of course. And how to distinguish from things that will fail (and need to be dealt with) from things that might fail (and don’t need to be dealt with), that takes some practice 🙂

6. Never cargo cult. Never follow dogma. Always embrace: “It depends”

Everything you’ve been taught is potentially wrong. Including (or probably: especially) when someone really popular says it. Here’s a nice quote:

Our profession is full of hypocrisy. We like to think of ourselves as mathematicians, where only the purest of ideas persist, and they’re necessarily correct.

That’s wrong. Our profession is built on top of mathematics, but unless you go into the funky world of category theory or relational algebra (and even then, I doubt everything is “correct”), you’re in the pragmatic world of real-world business requirements. And that’s, quite frankly, very far from perfect. Let’s look at some of the most popular programming languages out there:

  • C
  • Java
  • SQL

Do you really think that these languages resemble mathematics in the least bit? If so, let’s discuss segmentation faults, Java generics, or SQL three-valued logic. These languages are platforms built by pragmatists. There is some really cool theoretical background in all of them, but ultimately, they had to do the job.

The same is true for everything built on top of languages. Libraries. Frameworks. Heck, even architectures. Design patterns. Nothing is right or wrong. Everything is a tool that was designed for some context. Think of the tool in its context. Never think of the tool as a standalone raison d’être. We’re not doing art for art’s sake.

So, say no to unquestioned:

  • XML
  • JSON
  • Functional programming
  • Object oriented programming
  • Design patterns
  • Microservices
  • Three tier architectures
  • DDD
  • TDD
  • In fact: *DD

All of these are nice tools given a certain context. They don’t always hold true. By staying curious and thinking out of the box, you’ll be a better programmer and know when to use which one of these tools.

7. Do it

True. There are luminaries out there who outperform everyone.

But most programmers are simply “good”. Or they have the potential to be “good”. How to be a good programmer? By doing it. Great software wasn’t written in a day, and popular people aren’t the only heroes of our time. I’ve met many great programmers no one knows about, because they lead private lives, solve private problems of small companies.

But great programmers all have one thing in common: They just do it. They practice. They work every day to get better and better.

Want to get better at SQL? Do it! Try to write a SQL statement with some new features in them, every day. Use window functions. Grouping sets. Recursion. Partitioned outer join. The MODEL and/or MATCH_RECOGNIZE clauses. It doesn’t have to ship to production every time, but the practice will be worth it.

8. Focus on one subject (in the long run)

There might be only very few “good” full stack developers out there. Most full stack developers will rather be mediocre at everything. Sure, a small team might need only a few of them and they can cover a lot of business logic to quickly bootstrap a piece of new software. But everything will be quite sloppy and “kinda works”. Perhaps that’s good enough for the minimum viable product phase, but in the long run, there will be more complex problems that a full stack developer will not have the time to properly analyse (or foresee!).

Focus on one subject mostly, and get really good at that. A specialist will always be in demand as long as that specialist’s niche exists, and many niches will outlive us all (hello COBOL or SQL). So, do your career a favour and do something really good, rather than many things “just ok”.

9. Play

While you should mostly focus on one subject, you shouldn’t forget the other subjects completely. You may never be really really really good at all of SQL, scaling up, scaling out, low level performance, CSS (who’s good at that anyway!?), object orientation, requirements engineering, architecture, etc. all at once (see tip #8). It’s just not possible.

But you should at least understand the essence of each of these. You should understand when SQL is the right choice (and when it isn’t). When low level performance tuning is important (and when it isn’t). How CSS works in principle. The advantage of object orientation, FP, etc.

You should spend some time to play with these (and many more) concepts, technologies to better understand why they’re important. To know when to apply them, and then perhaps to find an expert to actually execute the work.

By playing around with new paradigms and technologies, you’ll open up your mind to entirely different ways of thinking, and chances are, you’ll be able to use that in your every day work in one way or another.

10. Keep it simple, stupid

Albert Einstein said:

Everything should be made as simple as possible, but no simpler

No one is able to handle huge complexity. Not in software, not in any other aspect of life. Complexity is the killer of good software and thus simplicity is the enabler. Easy to understand. Hard to implement. Simplicity is something that takes a lot of time and practice to recognise and produce. There are many rules you can follow, of course.

One of the simplest rules is to have methods/functions with only a few parameters. Let’s look at that. Surely, the aforementioned String.contains() method qualifies. We can write "abcde".contains("bcd") and without reading any documentation, everyone will immediately understand what this does and why. The method does one thing and one thing only. There are no complicated context/settings/other arguments that can be passed to the method. There are no “special cases”, nor any caveats.

Again, producing simplicity in a library is much easier than doing that in business logic. Can we achieve it? Perhaps. By practicing. By refactoring. But like great software, simplicity is not built in a day.

(Protip: Conway’s Law applies. It is utterly impossible to write good, simple software in an environment where the business is super complex. Either, you like complexity and ugly legacy, or you better get out of that business).

The Open-Closed Principle is Often Not What You Think it Is

jOOQ is a library that loves making everything internal final and package private. We have tons of classes like these:

final class Concat extends AbstractFunction<String> {
    // ...
}

The class implements the semantics of SQL string concatenation. Clearly, you shouldn’t need to tamper with it (or even know about it), because it is “protected” behind the corresponding public API in the DSL class:

// You can see this:
public class DSL {

    // You can see this but not override it:
    public static Field<String> concat(Field<?>... fields) {

        // But you cannot do this, yourself:
        return new Concat(nullSafe(fields));
    }
}

Now, in the past decades, there have been a lot of software design movements that were contrary to the concept of encapsulation in some ways. The driving powers of that were:

A fun to read example of “slightly” (i.e. completely) exaggerated advocacy of extreme application of object orientation is Yegor Bugayenko’s blog:

http://www.yegor256.com

Through exaggeration, he makes some really interesting points that make you think. Of course, you have to be able to accept the hyperboles as non-facts. Not everyone can do that, so don’t get angry reading 😉

Let’s look at the open-closed principle

The open-closed principle claims, according to Wikipedia:

In object-oriented programming, the open/closed principle states “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”; that is, such an entity can allow its behaviour to be extended without modifying its source code.

This is a very desireable aspect of some software entities. For instance, it is always true for an SPI (Service Provider Interface), by design, of course. Let’s read the Wikipedia definition of an SPI:

Service Provider Interface (SPI) is an API intended to be implemented or extended by a third party. It can be used to enable framework extension and replaceable components

Perfect. For instance, a jOOQ Converter is a SPI. We’ve just published a recent post about how to use the Converter API in a strategy pattern style with lambdas – the strategy pattern works really well with SPIs.

In fact, the strategy pattern isn’t even strictly an object oriented feature, you can get it for free in functional programming without giving it a fancy name. It’s just any ordinary higher order function.

Another fine example of what could be considered an SPI is an Iterable. While Iterable subtypes like List are more often used as APIs (user is the consumer) rather than SPIs (user is the implementor), the Iterable API itself is more of a way of providing the functionality required to run code inside of a foreach loop. For instance, jOOQ’s ResultQuery implements Iterable, which allows it to be used in a foreach loop:

for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)) { // Automatic execution, fetching
 
    doThingsWithRecord(rec);
}

So, clearly, it can be said that:

  • Iterable follows the open-closed principle as it models an entity that is open for extension (I can produce my own iterable semantics), but closed for modification (I won’t ever modify the Java compiler and/or the foreach loop semantics
  • The Liskov substitution principle is also followed trivially, as the foreach loop doesn’t care at all about how I implement my Iterable, as long as it behaves like one (providing an Iterator)

That was easy

But when does it not apply?

In a lot of situations. For instance, jOOQ is in many ways not designed for object oriented extension. You simply should not:

  • Mock the concat() function.
    You might be tempted to do so, as you might think that you need to unit test everything, including third party libraries, and then you have to mock out the string concatenation feature inside of your database. But it doesn’t work. The DSL.concat() method is static, and the implementation hidden. No way you could replace it with ordinary means (there are some dirty tricks).

    But hold on for a second. Why are you even doing this? Aren’t integration tests the better way here? Do you really have time (and want to spend it) on replacing entire complex implementations with your mocks? I don’t think so. That hardly ever works

  • Modify the concatenation behaviour for some use-case.
    While you may think that sometimes, you’d just like to tweak an implementation a little bit to get a quick win, that is certainly not the intent of the authors of the open-closed principle or the Lishkov substitution principle. We as API designers don’t want you to extend all of our functionality. As simple as that. Why? Because we want you to get in touch with us to help us improve our software for everyone, rather than you tweaking something for a quick win.

Let this sink in – especially the latter.

The premise that everything should be object oriented and everything should be extensible is wrong. Object orientation (and all the philosophies connected to it) are a tool. They’re a very powerful tool, for instance, when we as API/SPI designers want to allow users to extend our software. (mostly through SPIs). And we spend a lot of time thinking about really good, generic, useful, powerful SPIs that solve 99% of all extensibility problems in a way that we can control and keep backwards compatible. For some examples, check out these blog posts:

And sometimes, yes, we did not foresee a justified request for extensibility. Nothing is perfect. You have a feature request, and you cannot implement it right away. Then you start exploring. You look into ways how you can inject some behaviour into jOOQ. And as we Java developers like object orientation, we’re looking into writing subclasses to override existing behaviour. That’s what we were taught. That’s what we’re doing all the time. That’s what the combination of the open-closed principle and the Liskov substitution principle suggest.

Let me shock you for a moment.

Haskell (and many other languages) doesn’t support subtype polymorphism

Yes. There are entire ecosystems out there, that don’t have the luxury of bikeshedding the fact that if a class cannot be (easily) extended through subtype polymorphism and overriding of methods, it must be ill-designed. An entire ecosystem that never worries about something being final, and thus “closed for extension” (through subtype polymorphism).

Alternative definitions

Given the historic context, both principles are very interesting things. But their object-oriented context is something we should free our minds of. Here’s a better definition:

  • open-closed principle:
    Systems should strive for openness for extension, but not at any price. Some parts of a system / module / perhaps class should be open for extension. Those parts should be very well designed and kept very backwards compatible. And the vendor of those parts should listen to its consumers to better identify the required extension points. Consumers on the other hand shouldn’t blindly assume that everything can be extended. If they’re extending (through unexpected subtype polymorphism) random parts, then they’re hacking in the same way as if they would be actually modifying the system / parts. There’s no more benefit to extending.
  • Liskov substitution principle:
    Subtype polymorphism is just a tool, and in 2017, we have long started understanding that it’s a very wrong tool for many things. The composition over inheritance concept has shown that we’ve regretted the subtype polymorphism hype from the 90s. So, forget about your mocks through subtype overriding. Start looking for alternative interpretations of this principle. I like Jessica Kerr’s finding:

    Therefore, the Liskov Substition Principle says, “Don’t surprise people.”

    That’s a much better credo to follow, than the one that is strictly related to an aspect of object orientation and in particular to subtype polymorphism.

Conclusion

Yes. Package private, final classes mean, you cannot extend them. The open-closed principle is “violated”. Because that part of the system was not designed for you to know about (it’s encapsulated).

Sometimes, you think that if just you could override such an entity, you might get a quick win and inject your desired behaviour into a third party library / entity / class / module / system. My claim here is that: Mostly, you’ll deeply regret your desire for a quick win later on. You shouldn’t argue about open-closed or Liskov substitution. These principles simply don’t apply here. They do not at all, in particular, apply to badly designed legacy software. Once software is “badly designed”, no principles will help you.

Instead, do get in touch with the vendor if you run into a bump. There’s always an interesting idea for a great new feature hidden in such a limitation. And for the time being, accept that your overriding of what was not meant to be overridden is just the same thing as actually modifying that entity. You’re patching the library. Let’s do that and move on.

A Nice API Design Gem: Strategy Pattern With Lambdas

With Java 8 lambdas being available to us as a programming tool, there is a “new” and elegant way of constructing objects. I put “new” in quotes, because it’s not new. It used to be called the strategy pattern, but as I’ve written on this blog before, many GoF patterns will no longer be implemented in their classic OO way, now that we have lambdas.

A simple example from jOOQ

jOOQ knows a simple type called Converter. It’s a simple SPI, which allows users to implement custom data types and inject data type conversion into jOOQ’s type system. The interface looks like this:

public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

Users will have to implement 4 methods:

  • Conversion from a database (JDBC) type T to the user type U
  • Conversion from the user type U to the database (JDBC) type T
  • Two methods providing a Class reference, to work around generic type erasure

Now, an implementation that converts hex strings (database) to integers (user type):

public class HexConverter implements Converter<String, Integer> {

    @Override
    public Integer from(String hexString) {
        return hexString == null 
            ? null 
            : Integer.parseInt(hexString, 16);
    }

    @Override
    public String to(Integer number) {
        return number == null 
            ? null 
            : Integer.toHexString(number);
    }

    @Override
    public Class<String> fromType() {
        return String.class;
    }

    @Override
    public Class<Integer> toType() {
        return Integer.class;
    }
}

That wasn’t difficult to write, but it’s quite boring to write this much boilerplate:

  • Why do we need to give this class a name?
  • Why do we need to override methods?
  • Why do we need to handle nulls ourselves?

Now, we could write some object oriented libraries, e.g. abstract base classes that take care at least of the fromType() and toType() methods, but much better: The API designer can provide a “constructor API”, which allows users to provide “strategies”, which is just a fancy name for “function”. One function (i.e. lambda) for each of the four methods. For example:

public interface Converter<T, U> {
    ...

    static <T, U> Converter<T, U> of(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return new Converter<T, U>() { ... boring code here ... }
    }

    static <T, U> Converter<T, U> ofNullable(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return of(
            fromType,
            toType,

            // Boring null handling code here
            t -> t == null ? null : from.apply(t),
            u -> u == null ? null : to.apply(u)
        );
    }
}

From now on, we can easily write converters in a functional way. For example, our HexConverter would become:

Converter<String, Integer> converter =
Converter.ofNullable(
    String.class,
    Integer.class,
    s -> Integer.parseInt(s, 16),
    Integer::toHexString
);

Wow! This is really nice, isn’t it? This is the pure essence of what it means to write a Converter. No more overriding, null handling, type juggling, just the bidirectional conversion logic.

Other examples

A more famous example is the JDK 8 Collector.of() constructor, without which it would be much more tedious to implement a collector. For example, if we want to find the second largest element in a stream… easy!

for (int i : Stream.of(1, 8, 3, 5, 6, 2, 4, 7)
                   .collect(Collector.of(
    () -> new int[] { Integer.MIN_VALUE, Integer.MIN_VALUE },
    (a, t) -> {
        if (a[0] < t) {
            a[1] = a[0];
            a[0] = t;
        }
        else if (a[1] < t)
            a[1] = t;
    },
    (a1, a2) -> {
        throw new UnsupportedOperationException(
            "Say no to parallel streams");
    }
)))
    System.out.println(i);

Run this, and you get:

8
7

Bonus exercise: Make the collector parallel capable by implementing the combiner correctly. In a sequential-only scenario, we don’t need it (until we do, of course…).

Conclusion

The concrete examples are nice examples of API usage, but the key message is this:

If you have an interface of the form:

interface MyInterface {
    void myMethod1();
    String myMethod2();
    void myMethod3(String value);
    String myMethod4(String value);
}

Then, just add a convenience constructor to the interface, accepting Java 8 functional interfaces like this:

// You write this boring stuff
interface MyInterface {
    static MyInterface of(
        Runnable function1,
        Supplier<String> function2,
        Consumer<String> function3,
        Function<String, String> function4
    ) {
        return new MyInterface() {
            @Override
            public void myMethod1() {
                function1.run();
            }

            @Override
            public String myMethod2() {
                return function2.get();
            }

            @Override
            public void myMethod3(String value) {
                function3.accept(value);
            }

            @Override
            public String myMethod4(String value) {
                return function4.apply(value);
            }
        }
    }
}

As an API designer, you write this boilerplate only once. And your users can then easily write things like these:

// Your users write this awesome stuff
MyInterface.of(
    () -> { ... },
    () -> "hello",
    v -> { ... },
    v -> "world"
);

Easy! And your users will love you forever for this.

Should I Implement the Arcane Iterator.remove() Method? Yes You (Probably) Should

An interesting question was asked on reddit’s /r/java recently:

Should Iterators be used to modify a custom Collection?

Paraphrasing the question: The author wondered whether a custom java.util.Iterator that is returned from a mutable Collection.iterator() method should implement the weird Iterator.remove() method.

A totally understandable question.

What does Iterator.remove() do?

Few people ever use this method at all. For instance, if you want to implement a generic way to remove null values from an arbitrary Collection, this would be the most generic approach:

Collection<Integer> collection =
Stream.of(1, 2, null, 3, 4, null, 5, 6)
      .collect(Collectors.toCollection(ArrayList::new));

System.out.println(collection);

Iterator<Integer> it = collection.iterator();
while (it.hasNext())
    if (it.next() == null)
        it.remove();

System.out.println(collection);

The above program will print:

[1, 2, null, 3, 4, null, 5, 6]
[1, 2, 3, 4, 5, 6]

Somehow, this API usage does feel dirty. An Iterator seems to be useful to … well … iterate its backing collection. It’s really weird that it also allows for modifying it. It’s even weirder that it only offers removal. E.g. we cannot add a new element before or after the current element of iteration, or replace it.

Luckily, Java 8 provides us with a much better method on the Collection API directly, namely Collection.removeIf(Predicate).

The above iteration code can be re-written as such:

collection.removeIf(Objects::isNull);

OK, now should I implement remove() on my own iterators?

Yes, you should – if your custom collection is mutable. For a very simple reason. Check out the default implementation of Collection.removeIf():

default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        if (filter.test(each.next())) {
            each.remove();
            removed = true;
        }
    }
    return removed;
}

As I said. The most generic way to remove specific elements from a Collection is precisely to go by its Iterator.remove() method and that’s precisely what the JDK does. Subtypes like ArrayList may of course override this implementation because there’s a more performant alternative, but in general, if you write your own custom, modifiable collection, you should implement this method.

And enjoy the ride into Java’s peculiar, historic caveats for which we all love the language.

What we Need is Standardised Non-OSS Licenses

If you’ve followed the recent (fake) news, you’ve probably already heard it. Oracle is “massively ramping up audits of Java customers it claims are in breach of its licences”

After a quick check on the source (The Register), here’s a more realistic, probably more accurate version of that headline:

Oracle is thinking about auditing 1-2 companies that massively ran the commercial Java extensions in production without paying

There, fixed. Also:

But there is a deeper problem to this discussion

Of course, all sorts of (ex) Red Hat and or Pivotal employees quickly jumped to the conclusion of the kind: Hey this wouldn’t happen with us – the good guys – the OSS guys.

For example:

That’s not surprising, of course. What’s also not surprising is that people who are already strongly opinionated will see their opinions reinforced. Another random example:

If you want more examples, just search Twitter for the article URL. There are tons of reactions.

The latter case is not very interesting. The former, however, is. Aleksey Shipilëv obviously has a good point.

use products with unambiguous licenses, like OSS

… and, of course, he’s not right at all. 🙂 There are some very ambiguous licenses in the OSS field, including many copyleft licenses. Take, for instance, LGPL 2.1, which is a very long license, and contains ridiculuous things like:

If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

(emphasis mine). ten lines of code. What’s a line? Everything between two \n characters? On Windows, does a line have to end in \r\n for this clause to be applicable? What if I remove formatting and have 10000 character lines? Such functions aren’t small, but certainly less than 10 lines. Right? RIGHT?

Hmm…

Not to mention that this single ambiguity (there are more) infects the entire rest of the license text, because it introduces unrestricted use in a rather restrictive library. Think that’s nuts? Go check Hibernate’s license. Most of it (and thus YOUR application, if you patched Hibernate) is affected.

Licensing = restricting

At the end of the day, pretty much every license will restrict rights in some way (except for the public domain “license”). The problem with commercial licenses, however, is that they’re very unique, whereas OSS licenses are usually always the same (mostly some [X]GPL or ASL, MIT, BSD). In other words, OSS licenses are standardised and thus: pretty well understood. And thus: Much less risky.

That’s not the case with commercial licenses. Take the jOOQ license for instance. As of the end of 2016, it’s 23 pages strong (including the annex containing pricing). What does the license mean to our customers? Here’s a TL;DR version (obviously, if in doubt: the actual license will apply, not this TL;DR version):

  • Developer workstations need a timely limited or perpetual license
  • All server workstations are licensed for free, perpetually
  • Object code may be distributed and sublicensed
  • Source code may be used (e.g. for maintenance), but not distributed

And, of course, there are different price plans, but those aren’t really part of the license. So, jOOQ feels like Open Source: source code is shipped, may be used for documentation purpose, may be patched, recompiled, but not distributed, i.e. it isn’t free as in freedom (of course not, it would be the end of our business).

But what does it mean that the source code may be used? The license explicitly allows “modification”, but what does that mean? Are you also allowed to document such modification, just not ship it? E.g. in a public GitHub issue? Such that other users who are affected may profit from your fix?

If in doubt, the best way forward is to ask the vendor. In our case, we’re very open minded and quick to answer – and also quick to improve the license when it is not clear.

In Oracle’s case, a bit less. Of course, because Oracle is a huge company, and who are you even going to ask? Who will take the time to answer an individual question? It’s simply not possible.

The solution: Standardised commercial licenses

There aren’t too many business models with software. First off, there are a few different categories of software, e.g.:

  • SaaS: This is still the wild west. But essentially, you don’t license the software, you rent an access point.
  • Servers: Databases, programming environments, operating systems, they all fall into this category. These are systems that run your software (and/or data).
  • Libraries: Things like jOOQ, Hibernate. These are programs that are embedded in other programs (e.g. SaaS or Servers)
  • Tools: Things like IntelliJ, JRebel. These are programs to create and manipulate data, but they aren’t needed to run it. They can be easily removed.

Each category works entirely differently. For instance, copyleft doesn’t really affect SaaS and tools categories (unless you want to protect your trade secrets, of course), whereas it’s a killer for libraries.

SaaS, libraries and tools are usually per seat licenses, whereas servers are usually per core licenses – i.e. whatever scales better for both the vendor and customer.

This is an extremely simplified overview of commercial licensing, but imagine: What if all vendors in each one of the above categories could just pick a couple of yes/no answers to a standardised set of questions (e.g. what may be distributed? what may be modified? what may be run?), and they could pick only well understood standard wording of these concepts, then everything would be much clearer.

Back to the original Oracle auditing story

In the linked article, Oracle allegedly starts auditing Java users. Because the OracleJDK obviously isn’t “free” (as in freedom), but partially, it is “free” (as in beer) because there are a variety of use-cases where you don’t pay. However, there are some features that are “commercial” (i.e. non-free-as-in-beer), such as JMC and the Flight Recorder.

The interesting thing is that both of these features (and some others) ship with the “free” (as in beer) OracleJDK, but they’re part of the “COMMERCIAL FEATURES” (legal yelling) and those features must even be documented in YOUR LICENSE using this notice, such that YOUR end users may also not use them for free:

Use of the Commercial Features for any commercial or production purpose requires a separate license from Oracle. “Commercial Features” means those features identified Table 1-1 (Commercial Features In Java SE Product Editions) of the Java SE documentation accessible at http://www.oracle.com/technetwork/java/javase/documentation/index.html

(Did you know that? If you’re using OracleJDK in your application, you have to embed the above in your own EULA).

But do note, outside of these cryptic licenses, I’ve found several references to

Java Mission Control is available free of charge for development

E.g. here: http://download.oracle.com/technology/products/missioncontrol/updatesites/base/5.2.0/eclipse

Of course, this has absolutely no legal value, it might have been true at some time but now outdated. But that’s how I remember it. I can use Java Mission Control for free for development (not for productive use). Now, we’re back to this discussion. What’s productive use?

  • Can I profile a simple test program for free? Probably yes.
  • Can I profile my entire program (e.g. jOOQ) for free? Probably yes.
  • Can I run the profile in an CI environment to detect regressions for free? Hmmm.

And how is that understanding of “free” encoded in the actual license?

Standardised wording

Oracle has a long tradition of giving away software for free-as-in-beer to developers. Back in the days (before OSS, when there was only Oracle and IBM), that was a cunning move, because the money is not in development. It’s in operations. So, if developers get top notch software for free, they become evangelists. They’ll love the products, and convince the end users.

But again. Who are developers? When do they stop developing and start operating? When they test? When they ship?

We’ll never know for sure – as every vendor writes their own, unique license.

What we need is a standardised set of well understood commercial licenses, just like the OSS folks have their standardised set of well understood OSS licenses. For our industry as a whole, this would be of immense value, because the little fish (like ourselves), we could compete much better with the big ones without having to give away all of our IP for free under the terms of an OSS license. Our customers would no longer run into any legal issues. All risks from weird license texts would be removed.

And hopefully, this would put pressure on the big ones. And prevent articles like the one from the Register.

Do You Really Have to Name Everything in Software?

This is one of software engineering’s oldest battles. No, I’m not talking about where to put curly braces, or whether to use tabs or spaces. I mean the eternal battle between nominal typing and structural typing.

This article is inspired by a very vocal blogger who eloquently reminds us to …

[…] Please Avoid Functional Vomit

Read the full article here:
https://dzone.com/articles/using-java-8-please-avoid-functional-vomit

What’s the post really about?

It is about naming things. As we all know:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Now, for some reason, there is a group of people who wants constant pain and suffering by explicitly naming everything, including rather abstract concepts and algorithmic components, such as compound predicates. Those people like nominal typing and all the features that are derived from it. What is nominal typing (as opposed to structural typing)?

Structural typing

SQL is a good example to study the two worlds. When you write SQL statements, you’re creating structural row types all the time. For instance, when you write:

SELECT first_name, last_name
FROM customer

… what you’re really doing is you’re creating a new rowtype of the structure (in pseudo-SQL):

TYPE (
  first_name VARCHAR,
  last_name VARCHAR
)

The type has the following properties:

  • It is a tuple or record (as always in SQL)
  • It contains two attributes or columns
  • Those two attributes / columns are called first_name and last_name
  • Their types is VARCHAR

This is a structural type, because the SQL statement that produces the type only declares the type’s structure implicitly, by producing a set of column expressions.

In Java, we know lambda expressions, which are (incomplete) structural types, such as:

// A type that can check for i to be even
i -> i % 2 == 0

Nominal typing

Nominal typing takes things one step further. In SQL, nominal typing is perfectly possible as well, for instance, in the above statement, we selected from a well-known table by name customer. Nominal typing assigns a name to a structural type (and possibly stores the type somewhere, for reuse).

If we want to name our (first_name, last_name) type, we could do things like:

-- By using a derived table:
SELECT *
FROM (
  SELECT first_name, last_name
  FROM customer
) AS people

-- By using a common table expression:
WITH people AS (
  SELECT first_name, last_name
  FROM customer
)
SELECT *
FROM people

-- By using a view
CREATE VIEW people AS
SELECT first_name, last_name
FROM customer

In all cases, we’ve assigned the name people to the structural type (first_name, last_name). The only difference being the scope for which the name (and the corresponding content) is defined.

In Java, we can only use lambda expressions, once we assign them to a typed name, either by using an assignment, or by passing the expression to a method that takes a named type argument:

// Naming the lambda expression itself
Predicate<Integer> p = i -> i % 2 == 0

// Passing the lambda expression to a method
Stream.of(1, 2, 3)
      .filter(i -> i % 2 == 0);

Back to the article

The article claims that giving a name to things is always better. For instance, the author proposes giving a name to what we would commonly refer to as a “predicate”:

//original, less clear code
if(barrier.value() > LIMIT && barrier.value() > 0){
//extracted out to helper function. More code, more clear
if(barrierHasPositiveLimitBreach()){

So, the author thinks that extracting a rather trivial predicate into an external function is better because a future reader of such code will better understand what’s going on. At least in the article’s opinion. Let’s refute this claim for the sake of the argument:

  • The proposed name is verbose and requires quite some thinking.
  • What does breach mean?
  • Is breach the same as >= or the same as >?
  • Is LIMIT a constant? From where?
  • Where is barrier? Who owns it?
  • What does the verb “has” mean, here? Does it depend on something outside of barrier? E.g. some shared state?
  • What happens if there’s a negative limit?

By naming the predicate (remember, naming things is hard), the OP has added several layers of cognitive complexity to the reader, while quite possibly introducing subtle bugs, because probably both LIMIT and barrier should be function arguments, rather than global (im)mutable state that is assumed to be there, by the function.

The name introduced several concepts (“to have a breach”, “positive limit”, “breach”) that are not well defined and need some deciphering. How do we decipher it? Probably by looking inside the function and reading the actual code. So what do we gain? Better reuse, perhaps? But is this really reusable?

Finally, there is a (very slight) risk of introducing a performance penalty by the additional indirection. If we translate this to SQL, we could have written a stored function and then queried:

SELECT *
FROM orders -- Just an assumption here
WHERE barrier_has_positive_limit_breach(orders.barrier)

If this was some really complicated business logic depending on a huge number of things, perhaps extracting the function might’ve been worthwile. But in this particular case, is it really better than:

SELECT *
FROM orders
WHERE barrier > :limit AND barrier > 0

or even

SELECT *
FROM orders
WHERE barrier > GREATEST(:limit, 0)

Conclusion

There are some people in our industry who constantly want to see the world in black and white. As soon as they’ve had one small success story (e.g. reusing a very common predicate 4-5 times by extracting it into a function), they conclude with a general rule of this approach being always superior.

They struggle with the notion of “it depends”. Nominal typing and structural typing are both very interesting concepts. Structural typing is extremely powerful, whereas nominal typing helps us humans keep track of complexity. In SQL, we’ve always liked to structure our huge SQL statements, e.g. in nameable views. Likewise, Java programmers structure their code in nameable classes and methods.

But it should be immediately clear to anyone reading the linked article that the author seems to like hyperboles and probably wasn’t really serious, given the silly example he came up with. The message he’s conveying is wrong, because it claims that naming things is always better. It’s not true.

Be pragmatic. Name things where it really helps. Don’t name things where it doesn’t. Or as Leon Bambrick amended Phil Karlton’s quote:

There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors

Here’s my advice to you, dear nominal typing loving blogger. There’s are only two ways of typing: nominal typing and structural typing. And it depends typing.

SQL, Streams, For Comprehension… It’s All the Same

Recently, at Devoxx, I’ve seen this beautiful slide in a talk by Kevlin Henney

In his talk, he was displaying a variety of approaches to solve the FizzBuzz “problem”, including a couple of very elegant solutions in completely declarative approaches and languages.

In this particular slide, Kevlin used a notation that is derived from maths. The set builder notation. Here’s an example from Wikipedia:

even-numbers

The example reads: For all n in (the set of all integer numbers), take those for which there exists () another integer k, for which the following equation is satisfied: n = 2k.

Or in plain English: All even integers. (because for even integers, there exists another integer that is half the even integer)

Beautiful, eh? In imperative programming, we’d probably do something like this instead:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i++)
    even.add(i * 2);

Or this:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i = i + 2)
    even.add(i);

But there are several problems with the imperative approach:

  • We have to realistically start somewhere
  • We have to realistically end somewhere
  • We have to store all values in an intermediate collection

Sure, those aren’t severe limitations in every day use-cases, because we’re probably solving a real world problem where we don’t actually need an infinite number of even integers, and storing them in an intermediate collection doesn’t consume all of our memory, but still, the declarative, mathematical approach is much leaner, because we can still answer those questions about where to start and where to end later, and we never need to materialise any intermediate collection before we make those final decisions.

For instance, we can declare X to be that set, and then declare Y to be a set that is derived from X, and finally materialise Z, which is a very tiny set derived from Y. For this, we may have never needed to materialise all the (even) integers.

How this compares to SQL

Kevlin made a cunning comparison. Of course, all functional programming aficionados will immediately recognise that languages like Scala have something called a “for comprehension”, which models precisely the mathematical set-builder notation.

Java 8 now has the Streams API, which allows us, to some extent, model something similar (although not as powerful). But Kevlin didn’t use those “modern” languages. He used SQL as a comparison. That “arcane” declarative programming language that has been around forever, and that we love so much. Yes, here’s how we can declare all the even numbers in SQL:

SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

If optimisers were perfect, this semi-self-join between the two references of the integers “table” could be optimised perfectly. In most databases, we’d probably manually transform the above notation to this equivalent one:

SELECT n
FROM integers
WHERE MOD(n, 2) = 0

Yes, indeed. The set-builder notation and the SQL language are very similar beasts. The former prefers using mathematical symbols for brevity and conciseness, the latter prefers using English words to connect the different operators, but it’s the same thing. And if you squint hard enough, you’ll see that Java 8 Streams, for instance, are also pretty much the same thing:

everything-is-a-table

I’ve blogged about this recently where all the Java 8 Streams operations are compared to their SQL clause counterparts:
https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-equivalents-in-java-8-streams

How is this better?

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.

In the imperative approach, we have to refactor constantly, to avoid the overhead of

  • Producing too many integers
  • Storing too many integers (or storing them at all)

In truly declarative languages like SQL, we’re just describing “even integers” with an expression, possibly assigning the expression a name:

CREATE VIEW even_integers AS
SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE k = 2 * n
)

So, when we actually use and materialise the even integers, e.g. positive integers less than 100, the optimiser can optimise away the double access to the integer table and produce only the exact number of values that we’re requesting (without materialising them in intermediate collections):

SELECT n
FROM even_integers
WHERE n BETWEEN 0 AND 100

Conclusion

Thinking in terms of sets, in terms of declaring sets, has always been our dream as software engineers. The approach is extremely compelling and elegant. We can delegate a lot of boring algorithmic work to the implementation engine of the declarative programming language. In the case of SQL, it would be a SQL database optimiser, which figures out a great lot of optimisations that we might not have thought of.

The above example is trivial. We can perfectly live in a world where we manually iterate over a local integer variable that goes from 0 to 100:

for (int i = 0; i <= 100; i++)
  doSomething(i);

But stuff gets hairy quite quickly. Compare Mario Fusco‘s famous tweet’s two versions of the same algorithm:

This also applies to SQL, and what’s even better in SQL than with Streams: The SQL statement is a declarative expression tree, not a formally ordered set of stream pipeline operations. The optimiser can freely reorder / transform the expression tree into something that it thinks is more optimal. This isn’t just a promise. This works in modern SQL databases every day, for very complex queries, which you can write in a matter of seconds, rather than hours.

Stay tuned for a short series of blog posts on the jOOQ blog illustrating what modern cost-based optimisation can do for you, when you’re using the SQL language.