Plagiarism is Copyright Infringement AND Poor Form

Please repost / reblog / spread the word, should you have been the victim of a similar act of plagiarism or copyright infringement!

With my blog getting increasingly popular, I’m more and more facing the problem of plagiarism. Plagiarism is bad for a variety of reasons:

  • It hurts the original author’s SEO, as content starts getting less relevant when duplicated verbatim across the net.
  • It is very poor form and just a plain embarrassment to the offender.
  • It will inevitably come back to haunt you. Right, Mr Guttenberg?

Why do people engage in plagiarism when there is Fair Use? Why do people pretend to have authored something themselves that they have in fact stolen? Why do people obscure their sources?

I am going to take plagiarism very seriously and not tolerate it. With Google and Google Analytics, it is very easy to detect plagiarism. I’ve recently had an article removed from a popular Indian website, which seems to heavily engage in plagiarism: TechGig.com

ITeye.com is another platform, from China, whose members ruthlessly engage in plagiarism as well. Yes, Google also ships with Google Translate, another great tool to detect plagiarism. Beware, offenders! I will be going after you. And if you make money with my content, I am more than happy to collect some of that, or to have your domain challenged with your registrar! All top-level domains are eventually subject to the DMCA, as ICANN is an American-dominated organisation. You don’t want to risk such action just because a couple of geeks on your platform cannot control themselves! And if your platform itself is the offender, then be sure that you will be shut down very soon!

Here’s a letter I wrote to CSDN.net (registrar) and ITeye.com. I am licensing this letter as public domain, for you to reuse against your own offenders. Take any parts you may need.

To whom it may concern,

I have found your contact through a whois lookup, as ITeye themselves fail to respond to my recent enquiry. I am continuing to notice that a couple of ITeye bloggers and curators copy and translate articles off my blog http://blog.jooq.org, which is a promotional blog for my database product jOOQ.

In particular, these posts here:

… were in fact copied from my popular blog post here:

… which was syndicated with my express permission to DZone and JCG:

Plagiarism and copyright infringement is a non-trivial offence in many countries, including Switzerland from where I am operating. I urge ITeye to

  • give full author attribution to myself for all blog posts that ITeye writers copy and / or translate verbatim
  • have such authors link to my *original* blog post (not to syndications thereof)
  • have such authors keep promotional links in place
  • or to remove such blog posts immediately

I am taking this infringement very seriously, as the above displays go beyond what is known as “Fair Use”. I am sure ITeye understand that it is of utmost priority for a platform such as ITeye to comply with such laws. I am also sure that CSDN, the registrar, will be able to take the appropriate actions, should ITeye fail to comply, in which case I will need to invoke the American DMCA.

Please do not engage in plagiarism. Please, critically review your writers’ works and actively block all suspicious content.

Sincerely,

Lukas Eder

What Will be Oracle’s Next Big Acquisition?

Now THIS is an interesting Quora question. Citing:

What will be the next big acquisition by Oracle?

What will be the next acquisition made by Oracle that could be compared (as a strategic decision, not necessarily by value) to Oracle’s Sun Microsystems acquisition?

From my perspective, clearly, Oracle will buy jOOQ from Data Geekery GmbH, in order to finally closely integrate their two most valuable assets:

  • The Oracle Database
  • The JVM and Java

But maybe, they’ll just buy another airline ;-)

What is your bet? Comment below, or answer the question here:
http://www.quora.com/Oracle-company/What-will-be-the-next-big-acquisition-by-Oracle

The Myth About Slow SQL JOIN Operations

In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code amounted to some 5k lines of code, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed way under 50 ms, no matter how we queried it (see “10 more common mistakes” about the speed of queries). Of course, this performance was only achieved after lots of fine-tuning, load-testing and benchmarking. But it worked. Our Oracle database never let us down on these things.

Nonetheless, many SQL users think that JOIN operations are slow. Why? Perhaps because they are / used to be, in MySQL? I’m currently reading this interesting book by Markus Winand. The book is called SQL Performance Explained. He’s also the author of Use-The-Index-Luke.com where you can get free insight into his book. I still recommend reading the whole book, though. Even SQL old-timers and SQL nerds like me will find 1-2 novel, very interesting approaches, some of which will be incorporated into jOOQ very soon!

In particular, consider this page which explains very well how Hash JOIN operations work:
http://use-the-index-luke.com/sql/join/hash-join-partial-objects
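As a rough illustration of what such a hash join does, here is a toy Java sketch (not the database’s actual implementation, and all data is made up): the smaller input is hashed by the join key in a build phase, and the larger input then probes that hash table, one cheap lookup per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {

    // Joins orders to customers on customerId: build phase on the
    // smaller relation (customers), probe phase on the larger (orders).
    // Each order is { orderId, customerId }.
    static List<String> join(Map<Integer, String> customers,
                             List<int[]> orders) {
        List<String> result = new ArrayList<>();

        // Probe: one hash lookup per row of the larger input
        for (int[] order : orders) {
            String name = customers.get(order[1]);
            if (name != null)
                result.add("order " + order[0] + " -> " + name);
        }

        return result;
    }

    public static void main(String[] args) {
        // Build: the smaller relation, hashed by the join key
        Map<Integer, String> customers = new HashMap<>();
        customers.put(1, "Alice");
        customers.put(2, "Bob");

        List<int[]> orders = Arrays.asList(
            new int[] { 100, 1 },
            new int[] { 101, 2 },
            new int[] { 102, 3 }  // no matching customer
        );

        System.out.println(join(customers, orders));
    }
}
```

The cost is roughly linear in the size of both inputs, which is why hash joins shine on large, unindexed equi-joins.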

10 Subtle Best Practices when Coding Java

This is a list of 10 best practices that are more subtle than your average Josh Bloch Effective Java rule. While Josh Bloch’s list is very easy to learn and concerns everyday situations, this list here contains less common situations involving API / SPI design that may have a big effect nonetheless.

I have encountered these things while writing and maintaining jOOQ, an internal DSL modelling SQL in Java. Being an internal DSL, jOOQ challenges Java compilers and generics to the max, combining generics, varargs and overloading in a way that Josh Bloch probably wouldn’t recommend for the “average API”.

jOOQ is the best way to write SQL in Java

Let me share with you 10 Subtle Best Practices When Coding Java:

1. Remember C++ destructors

Remember C++ destructors? No? Then you might be lucky as you never had to debug through any code leaving memory leaks due to allocated memory not having been freed after an object was removed. Thanks Sun/Oracle for implementing garbage collection!

But nonetheless, destructors have an interesting trait to them. It often makes sense to free memory in the inverse order of allocation. Keep this in mind in Java as well, when you’re operating with destructor-like semantics:

  • When using @Before and @After JUnit annotations
  • When allocating, freeing JDBC resources
  • When calling super methods

There are various other use cases. Here’s a concrete example showing how you might implement some event listener SPI:

@Override
public void beforeEvent(EventContext e) {
    super.beforeEvent(e);
    // Super code before my code
}

@Override
public void afterEvent(EventContext e) {
    // Super code after my code
    super.afterEvent(e);
}
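The same inverse-order idea can be demonstrated in isolation. Here is a toy sketch (not jOOQ code): resources are remembered in allocation order and released LIFO, just as you would close a JDBC ResultSet, Statement, and Connection in the reverse of the order you acquired them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class InverseOrderCleanup {

    // Destructor-like pattern: remember resources in allocation
    // order on a stack, then release them in inverse order, like a
    // C++ stack unwind (or JDBC: ResultSet, Statement, Connection)
    static List<String> allocateAndFree(List<String> resources) {
        Deque<String> allocated = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        for (String r : resources) {
            log.add("open " + r);
            allocated.push(r);                   // remember allocation order
        }

        while (!allocated.isEmpty())
            log.add("close " + allocated.pop()); // free in inverse order

        return log;
    }

    public static void main(String[] args) {
        System.out.println(allocateAndFree(
            java.util.Arrays.asList("Connection", "Statement", "ResultSet")));
    }
}
```

Note that Java 7’s try-with-resources applies exactly this discipline automatically, closing resources in the reverse order of their declaration.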

Another good example showing why this can be important is the infamous Dining Philosophers problem. More info about the dining philosophers can be seen in this awesome post:
http://adit.io/posts/2013-05-11-The-Dining-Philosophers-Problem-With-Ron-Swanson.html

The Rule: Whenever you implement logic using before/after, allocate/free, take/return semantics, think about whether the after/free/return operation should perform stuff in the inverse order.

2. Don’t trust your early SPI evolution judgement

Providing an SPI to your consumers is an easy way to allow them to inject custom behaviour into your library / code. Beware, though, that your SPI evolution judgement may trick you into thinking that you’re (not) going to need that additional parameter. True, no functionality should be added early. But once you’ve published your SPI and once you’ve decided following semantic versioning, you’ll really regret having added a silly, one-argument method to your SPI when you realise that you might need another argument in some cases:

interface EventListener {
    // Bad
    void message(String message);
}

What if you also need a message ID and a message source? API evolution will prevent you from easily adding that parameter to the above type. Granted, with Java 8, you could add a defender (default) method, to “defend” your bad early design decision:

interface EventListener {
    // Bad
    default void message(String message) {
        message(message, null, null);
    }
    // Better?
    void message(
        String message,
        Integer id,
        MessageSource source
    );
}

Note, unfortunately, the defender method cannot be made final.

But much better than polluting your SPI with dozens of methods is to use a context object (or argument object) just for this purpose:

interface MessageContext {
    String message();
    Integer id();
    MessageSource source();
}

interface EventListener {
    // Awesome!
    void message(MessageContext context);
}

You can evolve the MessageContext API much more easily than the EventListener SPI as fewer users will have implemented it.

The Rule: Whenever you specify an SPI, consider using context / parameter objects instead of writing methods with a fixed amount of parameters.

Remark: It is often a good idea to also communicate results through a dedicated MessageResult type, which can be constructed through a builder API. This will add even more SPI evolution flexibility to your SPI.
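Such a MessageResult with a builder might be sketched as follows (all names here are made up for illustration, this is not a jOOQ type):

```java
public class MessageResult {
    private final boolean handled;
    private final String reply;

    private MessageResult(boolean handled, String reply) {
        this.handled = handled;
        this.reply = reply;
    }

    public boolean handled() { return handled; }
    public String reply()    { return reply; }

    // Builder API: new result properties can be added later
    // without breaking existing SPI implementations
    public static class Builder {
        private boolean handled;
        private String reply;

        public Builder handled(boolean h) { this.handled = h; return this; }
        public Builder reply(String r)    { this.reply = r;   return this; }

        public MessageResult build() {
            return new MessageResult(handled, reply);
        }
    }
}
```

An SPI implementation would then return `new MessageResult.Builder().handled(true).reply("OK").build()`, and future result properties only touch the builder, not every implementor.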

3. Avoid returning anonymous, local, or inner classes

Swing programmers probably have a couple of keyboard shortcuts to generate the code for their hundreds of anonymous classes. In many cases, creating them is nice as you can locally adhere to an interface, without going through the “hassle” of thinking about a full SPI subtype lifecycle.

But you should not use anonymous, local, or inner classes too often for a simple reason: They keep a reference to the outer instance. And they will drag that outer instance to wherever they go, e.g. to some scope outside of your local class if you’re not careful. This can be a major source of memory leaks, as your whole object graph will suddenly become entangled in subtle ways.

The Rule: Whenever you write an anonymous, local or inner class, check if you can make it static or even a regular top-level class. Avoid returning anonymous, local or inner class instances from methods to the outside scope.

Remark: There has been some clever practice around double-curly braces for simple object instantiation:

new HashMap<String, String>() {{
    put("1", "a");
    put("2", "b");
}}

This leverages Java’s instance initializer as specified by the JLS §8.6. Looks nice (maybe a bit weird), but is really a bad idea. What would otherwise be a completely independent HashMap instance now keeps a reference to the outer instance, whatever that just happens to be. Besides, you’ll create an additional class for the class loader to manage.
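If all you need is a pre-filled map, one harmless alternative is a plain static factory method, which avoids both the captured outer instance and the extra class:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Maps {

    // Plain initialisation: no anonymous subclass, no captured
    // outer instance, and the result is even unmodifiable
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("1", "a");
        map.put("2", "b");
        return Collections.unmodifiableMap(map);
    }
}
```

A few extra lines, but the resulting map is a completely independent object.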

4. Start writing SAMs now!

Java 8 is knocking on the door. And with Java 8 come lambdas, whether you like them or not. Your API consumers may like them, though, and you better make sure that they can make use of them as often as possible. Hence, unless your API accepts simple “scalar” types such as int, long, String, Date, let your API accept SAMs as often as possible.

What’s a SAM? A SAM is a Single Abstract Method [Type]. Also known as a functional interface, soon to be annotated with the @FunctionalInterface annotation. This goes well with rule number 2, where EventListener is in fact a SAM. The best SAMs are those with single arguments, as they will further simplify writing of a lambda. Imagine writing

listeners.add(c -> System.out.println(c.message()));

Instead of

listeners.add(new EventListener() {
    @Override
    public void message(MessageContext c) {
        System.out.println(c.message());
    }
});

Imagine XML processing through jOOX, which features a couple of SAMs:

$(document)
    // Find elements with an ID
    .find(c -> $(c).id() != null)
    // Find their child elements
    .children(c -> $(c).tag().equals("order"))
    // Print all matches
    .each(c -> System.out.println($(c)));

The Rule: Be nice with your API consumers and write SAMs / Functional interfaces already now.

Remarks: A couple of interesting blog posts about Java 8 Lambdas and improved Collections API can be seen here:

5. Avoid returning null from API methods

I’ve blogged about Java’s NULLs once or twice. I’ve also blogged about Java 8’s introduction of Optional. These are interesting topics both from an academic and from a practical point of view.

While NULLs and NullPointerExceptions will probably stay a major pain in Java for a while, you can still design your API in a way that users will not run into any issues. Try to avoid returning null from API methods whenever possible. Your API consumers should be able to chain methods whenever applicable:

initialise(someArgument).calculate(data).dispatch();

In the above snippet, none of the methods should ever return null. In fact, using null’s semantics (the absence of a value) should be rather exceptional in general. In libraries like jQuery (or jOOX, a Java port thereof), nulls are completely avoided as you’re always operating on iterable objects. Whether you match something or not is irrelevant to the next method call.

Nulls also often arise because of lazy initialisation. In many cases, lazy initialisation can be avoided without any significant performance impact. In fact, lazy initialisation should be used carefully, and only if large data structures are involved.

The Rule: Avoid returning nulls from methods whenever possible. Use null only for the “uninitialised” or “absent” semantics.

6. Never return null arrays or lists from API methods

While there are some cases where returning nulls from methods is OK, there is absolutely no use case for returning null arrays or null collections! Let’s consider the hideous java.io.File.list() method. It returns:

An array of strings naming the files and directories in the directory denoted by this abstract pathname. The array will be empty if the directory is empty. Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs.

Hence, the correct way to deal with this method is

File directory = // ...

if (directory.isDirectory()) {
    String[] list = directory.list();

    if (list != null) {
        for (String file : list) {
            // ...
        }
    }
}

Was that null check really necessary? Most I/O operations produce IOExceptions, but this one returns null. Null cannot hold any error message indicating why the I/O error occurred. So this is wrong in three ways:

  • Null does not help in finding the error
  • Null does not allow you to distinguish I/O errors from the File instance not being a directory
  • Everyone will keep forgetting about null, here

In collection contexts, the notion of “absence” is best implemented by empty arrays or collections. Having an “absent” array or collection is hardly ever useful, except again, for lazy initialisation.
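As a minimal sketch of that principle (a hypothetical helper, not a JDK API): wrap the nullable array and always hand back a list, possibly empty, so callers can iterate unconditionally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SafeListing {

    // Hypothetical wrapper around e.g. File.list():
    // "no result" is an empty list, never null
    static List<String> names(String[] raw) {
        if (raw == null)
            return Collections.emptyList();  // absence == empty

        List<String> result = new ArrayList<>();
        for (String s : raw)
            result.add(s);
        return result;
    }
}
```

With such a wrapper, the for-each loop from the File.list() example above needs no null check at all.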

The Rule: Arrays or Collections should never be null.

7. Avoid state, be functional

What’s nice about HTTP is the fact that it is stateless. All relevant state is transferred in each request and in each response. This is essential to the naming of REST: Representational State Transfer. This is awesome when done in Java as well. Think of it in terms of rule number 2 when methods receive stateful parameter objects. Things can be so much simpler if state is transferred in such objects, rather than manipulated from the outside. Take JDBC, for instance. The following example fetches a cursor from a stored procedure:

CallableStatement s =
  connection.prepareCall("{ ? = ... }");

// Verbose manipulation of statement state:
s.registerOutParameter(1, cursor);
s.setString(2, "abc");
s.execute();
ResultSet rs = (ResultSet) s.getObject(1);

// Verbose manipulation of result set state:
rs.next();
rs.next();

These are the things that make JDBC such an awkward API to deal with. Each object is incredibly stateful and hard to manipulate. Concretely, there are two major issues:

  • It is very hard to correctly deal with stateful APIs in multi-threaded environments
  • It is very hard to make stateful resources globally available, as the state is not documented

State is like a box of chocolates

Theatrical poster for Forrest Gump, Copyright © 1994 by Paramount Pictures. All Rights Reserved. It is believed that the above usage fulfils what is known as Fair Use

The Rule: Implement more of a functional style. Pass state through method arguments. Manipulate less object state.
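A hypothetical, more functional counterpart to the JDBC snippet above (all names are made up, and the database call is a stand-in): state travels only through arguments and return values, so nothing is mutated from the outside.

```java
import java.util.Arrays;
import java.util.List;

public class FunctionalStyle {

    // Immutable parameter object: all input state goes in...
    static final class Query {
        final String procedure;
        final String argument;

        Query(String procedure, String argument) {
            this.procedure = procedure;
            this.argument = argument;
        }
    }

    // ...and results come out as a value, with no mutable
    // statement / result set state in between
    static List<String> execute(Query query) {
        // (stand-in for the actual database call)
        return Arrays.asList(
            query.procedure + "(" + query.argument + ") -> row1",
            query.procedure + "(" + query.argument + ") -> row2");
    }

    public static void main(String[] args) {
        for (String row : execute(new Query("my_proc", "abc")))
            System.out.println(row);
    }
}
```

Because neither Query nor the returned list is manipulated in place, the whole exchange is trivially safe to share across threads.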

8. Short-circuit equals()

This is a low-hanging fruit. In large object graphs, you can gain significantly in terms of performance, if all your objects’ equals() methods dirt-cheaply compare for identity first:

@Override
public boolean equals(Object other) {
    if (this == other) return true;

    // Rest of equality logic...
}

Note, other short-circuit checks may involve null checks, which should be there as well:

@Override
public boolean equals(Object other) {
    if (this == other) return true;
    if (other == null) return false;

    // Rest of equality logic...
}
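Putting both short-circuits together with a type check, a complete equals() might look like this (the Point class and its fields are hypothetical, purely for illustration):

```java
public class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;     // cheap identity check first
        if (other == null) return false;    // null short-circuit
        if (getClass() != other.getClass())
            return false;                   // type short-circuit

        Point that = (Point) other;
        return x == that.x && y == that.y;  // actual equality logic
    }

    @Override
    public int hashCode() {
        return 31 * x + y;                  // keep consistent with equals()
    }
}
```

In a large object graph, the identity check at the top pays for itself many times over.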

The Rule: Short-circuit all your equals() methods to gain performance.

9. Try to make methods final by default

Some will disagree on this, as making things final by default is quite the opposite of what Java developers are used to. But if you’re in full control of all source code, there’s absolutely nothing wrong with making methods final by default, because:

  • If you do need to override a method (do you really?), you can still remove the final keyword
  • You will never accidentally override any method anymore

This specifically applies to static methods, where “overriding” (actually, shadowing) hardly ever makes sense. I’ve recently come across a very bad example of shadowing static methods in Apache Tika. Consider:

TikaInputStream extends TaggedInputStream and shadows its static get() method with quite a different implementation.

Unlike regular methods, static methods don’t override each other, as the call-site binds a static method invocation at compile-time. If you’re unlucky, you might just get the wrong method accidentally.
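A minimal demonstration of that compile-time binding (the classes here are hypothetical, not Tika’s):

```java
public class Shadowing {

    static class Parent {
        static String get() { return "parent"; }
    }

    static class Child extends Parent {
        // Shadows, does not override, Parent.get()
        static String get() { return "child"; }
    }

    public static void main(String[] args) {
        // Bound at compile time to the referenced type,
        // never dispatched on a runtime instance:
        System.out.println(Parent.get()); // parent
        System.out.println(Child.get());  // child
    }
}
```

Even the discouraged instance-style call `parentTypedReference.get()` would yield "parent" for a Child instance, because the compiler binds against the declared type. Had the methods been final (or simply not redeclared), this trap could not exist.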

The Rule: If you’re in full control of your API, try making as many methods as possible final by default.

10. Avoid the method(T…) signature

There’s nothing wrong with the occasional “accept-all” varargs method that accepts an Object... argument:

void acceptAll(Object... all);

Writing such a method brings a little JavaScript feeling to the Java ecosystem. Of course, you probably want to restrict the actual type to something more confined in a real-world situation, e.g. String.... And because you don’t want to confine too much, you might think it is a good idea to replace Object by a generic T:

<T> void acceptAll(T... all);

But it’s not. T can always be inferred to Object. In fact, you might as well just not use generics with the above methods. More importantly, you may think that you can overload the above method, but you cannot:

<T> void acceptAll(T... all);
<T> void acceptAll(String message, T... all);

This looks as though you could optionally pass a String message to the method. But what happens to this call here?

acceptAll("Message", 123, "abc");

The compiler will infer <? extends Serializable & Comparable<?>> for T, which makes the call ambiguous!

So, whenever you have an “accept-all” signature (even if it is generic), you will never again be able to typesafely overload it. API consumers may just be lucky enough to “accidentally” have the compiler choose the “right”, most specific method. But they may as well be tricked into using the “accept-all” method, or they may not be able to call any method at all.
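If you control the API, one escape hatch is to give each signature a distinct name, so the compiler never has to pick between varargs overloads (the method names here are made up for illustration):

```java
public class Messages {

    // Distinct names instead of overloads: no ambiguity possible
    @SafeVarargs
    static <T> String acceptAll(T... all) {
        return "all:" + all.length;
    }

    @SafeVarargs
    static <T> String acceptAllWithMessage(String message, T... all) {
        return message + ":" + all.length;
    }

    public static void main(String[] args) {
        // Unambiguous: "Message" is clearly part of the varargs here...
        System.out.println(acceptAll("Message", 123, "abc"));
        // ...and clearly the message parameter here
        System.out.println(acceptAllWithMessage("Message", 123, "abc"));
    }
}
```

The call sites now state their intent explicitly, where the overloaded pair left it to type inference.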

The Rule: Avoid “accept-all” signatures if you can. And if you cannot, never overload such a method.

Conclusion

Java is a beast. Unlike other, fancier languages, it has evolved slowly to what it is today. And that’s probably a good thing, because even at Java’s speed of development, there are hundreds of caveats that can only be mastered through years of experience.

jOOQ is the best way to write SQL in Java

Stay tuned for more top 10 lists on the subject!

Do You Put Trust in Vendors’ NoSQL Promises?

Disruptive times lead to disruptive technologies. But moreover, they lead to new buzzwords and thus new “value propositions” for an unsettled clientele who will buy into any promise these vendors make. If you believe those expensive reports that you can buy from companies like Gartner, you will think:

Cloud Computing

You need to run your business in the cloud

“You need to run your business in the cloud”. The advent of what marketers have come to call “The Cloud” (formerly known as “Web 2.0”, or “The Internet”) has started to transform the ways some people think about data. Many vendors trust in “Big Data” becoming the next “Big Paradigm” of software engineering. “Big Data” seems to cry for new data storage technologies, as advertised in this article:
http://www.ncs-london.com/blog/Five-Benefits-of-a-noSQL-approach-to-DBA-management

Consider section 4:

4: An end to DBAs

Despite the many manageability enhancements claimed by RDBMS vendors over the years, high-end RDBMS systems can be upheld only with the help of lavish, highly trained DBAs. DBAs are intimately involved in the design, installation, and on-going tweaking of high-end RDBMS systems.

“lavish, highly trained DBAs”. This article assumes that moving away from SQL will make maintaining and managing large amounts of data a piece of cake that your average junior PHP script kiddie can handle.

“Blimey, Dear Rupert! Our customer fancies going to The Cloud with his 20 million transactions per day.”

“Good Lord, Rodney, I believe you are right. We ought to get Clive, our lovely intern youngster on the job. He only costs us a penny a minute.”

People often forget that an average iPhone with 64 GB of memory would have been considered quite “Big Data” 13 years ago, when the average Nokia 3310 could only store 100 phone numbers! 13 years changed a lot in terms of volume, but not that much in terms of software technology. Since then, software engineering has slowly transformed to manage ever larger sets of data, but few companies really need to scale into dimensions to which an RDBMS cannot scale. In fact, some database vendors actually went to “The Cloud” with their RDBMS themselves, such as Google App Engine’s Cloud SQL. And what about Twitter? Well, they’re actually using a sharded MySQL database, which serves them perfectly fine. Just as Instagram uses a sharded PostgreSQL database. Pinterest, another provider of a large “Cloud” application, uses sharded MySQL (along with Solr, Memcache and Redis).

RDBMS might not be the optimal choice for some data models (hierarchical, unstructured, document-oriented models). But they are extremely powerful when it comes to manipulating relational data. SQL itself has evolved quite a bit since Nokia 3310 times, when 64 GB of memory was “Big Data”. The differences between Oracle 8i and Oracle 12c are amazing. The same applies to the differences between SQL Server 2000 and SQL Server 2012. Moreover, you can employ the very same DBA for the Oracle 12c job as you could for the 8i job back in the days.

While we are certainly living in exciting times where new technologies lead to new ways of thinking (and vice-versa), we should be sceptical of vendors who promise that we will migrate to the next paradigm within a blink of an eye. Your data might endure longer than the new technology you use to store it.

We should be sceptical of vendors who claim that highly trained maintenance people are the main problem we need to solve. They promise you to cut down on DBA and licensing costs, while hardly anyone even knows about future “NoDBA” and “NoLicensing” costs.

Do you put trust in those vendors’ NoSQL promises? Or do you believe SQL will catch up, and the old elephants can be taught new tricks? I’m curious to hear your thoughts and experiences.

Apache Derby About to Adopt the Awesome SQL:2003 MERGE Statement

Apache Derby is one of three popular embeddable Java databases (apart from H2 and HSQLDB). It is very SQL and JDBC standards-compliant, but maybe a bit behind on developments of more advanced SQL features. Around 6 years after its first submission, there has recently been some action on the Apache Derby DERBY-3155 ticket. Rick Hillegas has attached a first, promising draft of the MERGE statement specification, which can be seen here:
https://issues.apache.org/jira/secure/attachment/12597795/MergeStatement.html

Among all 14 of the SQL databases supported by jOOQ, Derby would thus be the 8th to support the SQL:2003 MERGE statement, lining up with:

Other databases support proprietary versions of MERGE:

… or other forms of UPSERT:

Visit DERBY-3155 and show the maintainers some love for implementing this awesome and powerful SQL statement!

Java EE 7: JSRs That Make You Powerful

Tori Wieldt of Oracle has released an overview of all the goodies that are included in Java EE 7:
https://blogs.oracle.com/java/entry/java_ee_7_the_details

… with a couple of video presentations:
http://www.youtube.com/playlist?list=PL74xrT3oGQfCCLFJ2HCTR_iN5hV4penDz