I will take advantage of “fair use policy” and cite parts from section 7:
Feature number 7: improved support for DaffySQL syntax
If I told you that RowGenerator.row_num contains the values 1 through 255, what would you say this query returned?
Give up? OK, how about this one?
Still stumped? If I told you they both returned exactly the same result set as the following query, what would you say?
Yes, the LIMIT clause is new to SQL Anywhere 12, exactly the same as TOP START AT except it uses zero as the starting point for numbering rows instead of 1.
An “offset”, get it?
As in “Here’s ten dollars, let me count it for you: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.”
Why implement LIMIT? And why include it in a list of cool features?
Because there are a lot of MySQL users out there who don’t have TOP START AT, and they’ve written zillions of queries using LIMIT, and they’d like to migrate their apps to SQL Anywhere without rewriting everything. And PostgreSQL users too… welcome aboard!
Migrating to SQL Anywhere is definitely cool.
So be cool and migrate to SQL Anywhere already! :-) I’m now going through the rest of this fun document.
It just struck me like lightning. I just realised one (surely not the only) very important reason, why Stack Overflow always winds up at least once in the top 10 search results on Google for virtually any programming-related search. The Announcer Badge. When you share a link to a Stack Overflow question or answer using the “share” button, you get a personalised link including your own user ID. If that link generates 25 clicks from distinct IPs, you’ll get an Announcer Badge as a reward for linking to Stack Overflow. This badge has been awarded more than 25k times. Note, there’s also the Booster Badge (awarded around 800 times) and Publicist Badge (awarded around 300 times).
Stack Overflow uses a lot of techniques following the Gamification principle. But this particular badge is really clever. Trick your users into creating highly relevant links from arbitrary websites towards Stack Overflow helping to add relevancy to the platform and to its various questions / answers.
Java developers mix object-oriented thinking with imperative thinking, depending on their levels of:
Skill (anyone can code imperatively)
Dogma (some use the “Pattern-Pattern”, i.e. the pattern of applying patterns everywhere and giving them names)
Mood (true OO is more clumsy to write than imperative code. At first)
But when Java developers write SQL, everything changes. SQL is a declarative language that has nothing to do with either object-oriented or imperative thinking. It is very easy to express a query in SQL. It is not so easy to express it optimally or correctly. Not only do developers need to re-think their programming paradigm, they also need to think in terms of set theory.
Here are common mistakes that a Java developer makes when writing SQL through JDBC or jOOQ (in no particular order). For 10 More Common Mistakes, see this article here.
Here are common mistakes that a Java developer makes when writing SQL (in no particular order):
1. Forgetting about NULL
Misunderstanding NULL is probably the biggest mistake a Java developer can make when writing SQL. This is also (but not exclusively) due to the fact that NULL is also called UNKNOWN. If it were only called UNKNOWN, it would be easier to understand. Another reason is that JDBC maps SQL NULL to Java null when fetching data or when binding variables. This may lead to thinking that NULL = NULL (SQL) would behave the same way as null == null (Java)
Train yourself. There’s nothing but explicitly thinking about NULL, every time you write SQL:
Is this predicate correct with respect to NULL?
Does NULL affect the result of this function?
2. Processing data in Java memory
Few Java developers know SQL very well. The occasional JOIN, the odd UNION, fine. But window functions? Grouping sets? A lot of Java developers load SQL data into memory, transform the data into some appropriate collection type, execute nasty maths on that collection with verbose loop structures (at least, before Java 8’s Collection improvements).
But some SQL databases support advanced (and SQL standard!) OLAP features that tend to perform a lot better and are much easier to write. A (non-standard) example is Oracle’s awesome MODEL clause. Just let the database do the processing and fetch only the results into Java memory. Because after all some very smart guys have optimised these expensive products. So in fact, by moving OLAP to the database, you gain two things:
Simplicity. It’s probably easier to write correctly in SQL than in Java
Performance. The database will probably be faster than your algorithm. And more importantly, you don’t have to transmit millions of records over the wire.
Every time you implement a data-centric algorithm in Java, ask yourself: Is there a way to let the database perform that work for me?
3. Using UNION instead of UNION ALL
It’s a shame that UNION ALL needs an extra keyword compared to UNION. It would be much better if the SQL standard had been defined to support:
UNION (allowing duplicates)
UNION DISTINCT (removing duplicates)
Not only is the removal of duplicates rarely needed (or sometimes even wrong), it is also quite slow for large result sets with many columns, as the two subselects need to be ordered, and each tuple needs to be compared with its subsequent tuple.
Note that even if the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any database implements these less useful set operations.
Every time you write a UNION, think if you actually wanted to write UNION ALL.
4. Using JDBC Pagination to paginate large results
Most databases support some way of paginating ordered results through LIMIT .. OFFSET, TOP .. START AT, OFFSET .. FETCH clauses. In the absence of support for these clauses, there is still the possibility for ROWNUM (Oracle) or ROW_NUMBER() OVER() filtering (DB2, SQL Server 2008 and less), which is much faster than pagination in memory. This is specifically true for large offsets!
Just use those clauses, or a tool (such as jOOQ) that can simulate those clauses for you.
5. Joining data in Java memory
From early days of SQL, some developers still have an uneasy feeling when expressing JOINs in their SQL. There is an inherent fear of JOIN being slow. This can be true if a cost-based optimiser chooses to perform a nested loop, possibly loading complete tables into database memory, before creating a joined table source. But that happens rarely. With appropriate predicates, constraints and indexes, MERGE JOIN and HASH JOIN operations are extremely fast. It’s all about the correct metadata (I cannot cite Tom Kyte often enough for this). Nonetheless, there are probably still quite a few Java developers who will load two tables from separate queries into maps and join them in Java memory in one way or another.
If you’re selecting from various tables in various steps, think again to see if you cannot express your query in a single statement.
6. Using DISTINCT or UNION to remove duplicates from an accidental cartesian product
With heavy joining, one can lose track of all the relations that are playing a role in a SQL statement. Specifically, if multi-column foreign key relationships are involved, it is possible to forget to add the relevant predicates in JOIN .. ON clauses. This might result in duplicate records, but maybe only in exceptional cases. Some developers may then choose to use DISTINCT to remove those duplicates again. This is wrong in three ways:
It (may) solve the symptoms but not the problem. It may as well not solve the symptoms in edge-cases.
It is slow for large result sets with many columns. DISTINCT performs an ORDER BY operation to remove duplicates.
It is slow for large cartesian products, which will still load lots of data into memory
As a rule of thumb, when you get unwanted duplicates, always review your JOIN predicates. There’s probably a subtle cartesian product in there somewhere.
7. Not using the MERGE statement
This isn’t really a mistake, but probably some lack of knowledge or some fear towards the powerful MERGE statement. Some databases know other forms of UPSERT statements, e.g. MySQL’s ON DUPLICATE KEY UPDATE clause. But MERGE is really so powerful, most importantly in databases that heavily extend the SQL standard, such as SQL Server.
If you’re UPSERTING by chaining INSERT and UPDATE or by chaining SELECT .. FOR UPDATE and then INSERT or UPDATE, think again. Apart from risking race conditions, you might be able to express a simpler MERGE statement.
8. Using aggregate functions instead of window functions
Before the introduction of window functions, the only means to aggregate data in SQL was by using a GROUP BY clause along with aggregate functions in the projection. This works well in many cases, and if aggregation data needed to be enriched with regular data, the grouped query can be pushed down into a joined subquery.
But SQL:2003 defined window functions, which are implemented by many popular database vendors. Window functions can aggregate data on result sets that are not grouped. In fact, each window function supports its own, independent PARTITION BY clause, which is an awesome tool for reporting.
Using window functions will:
Lead to more readable SQL (less dedicated GROUP BY clauses in subqueries)
Improve performance, as a RDBMS is likely to optimise window functions more easily
When you write a GROUP BY clause in a subquery, think again if this cannot be done with a window function.
9. Using in-memory sorting for sort indirections
The SQL ORDER BY clause supports many types of expressions, including CASE statements, which can be very useful for sort indirections. You should probably never sort data in Java memory because you think that
SQL sorting is too slow
SQL sorting cannot do it
If you sort any SQL data in memory, think again if you cannot push sorting into your database. This goes along well with pushing pagination into the database.
10. Inserting lots of records one by one
JDBC knows batching, and you should use it. Do not INSERT thousands of records one by one, re-creating a new PreparedStatement every time. If all of your records go to the same table, create a batch INSERT statement with a single SQL statement and multiple bind value sets. Depending on your database and database configuration, you may need to commit after a certain amount of inserted records, in order to keep the UNDO log slim.
As an Open Source developer, I’m used to trying to support as many reasonable things for my users as possible. However, this has never included support for Java 5, which itself is hardly supported by popular Java vendors anymore. Hence jOOQ requires Java 6 or more to compile and run.
There is now an interesting initiative by Kohsuke Kawaguchi, the creator of the Jenkins CI server. In a letter, he’s attempting to get other Open Source projects and developers on board with him to drop support for Java 5. While this change is rather trivial and marginal for most Open Source projects, it’s a major change for a continuous integration server such as Jenkins. With his permission, I’m citing his letter about why Java 5 should no longer be supported by Jenkins CI. If you’re an Open Source developer and you want to drop or have already dropped support for Java 5, then join this initiative:
We are putting the stake on the ground: our releases after Sep 30th 2013 will start requiring Java 6 as the minimum runtime environment.
We are delivering this message to our users to give them a fair notice. To make this more effective, we are building a coalition of OSS projects. We’ll put up a simple website to advertise this, and encourage people to spread the news. Our collective project names and logos will help spread the message.
We are developers of an OSS project. To help our users use our software, we’ve been refraining from requiring Java 6 as the minimum runtime so far. But we think we did that long enough. It’s time to move on.
Most Java VM vendors no longer support Java 5. People shouldn’t be using it.
There’s no viable open-source Java 5 implementation.
We can’t use increasing number of libraries that require newer Java, translating into more development effort, less features and fixes.
It’s adding to the integration test cost. We run more tests for Java 5, when increasingly smaller number of developers actually have Java 5.
Newer Java runtime has more features. More collection APIs, NIO improvements, console access, XML support, compiler API, annotation processors, and scripting language interface.
1.50 class file format comes with split verifier, resulting in faster classloading.
Putting our collective weight behind this will help us reach more users. Picking this fight individually is harder.
If this is successful, it’ll make it easier for us to move on to newer Java runtimes in the future versions.
Java5 was released in 2004, nearly a decade ago. Its public support has ended in 2009.
Even IBM is ending its support for Java 5 on the server side at Sep 30th, 2013.
I’ve just come across an interesting SQL question here on CodeRanch. How to generate date ranges in SQL, given any input date. The question didn’t specify the SQL dialect, so let’s choose Oracle SQL, which features the awesome CONNECT BY clause. The requirements specify that given any input date:
Date ranges span 12 months
The first date range starts at the input date
The last date range includes today
Subsequent date ranges repeat the same date
For example, using 2010-06-10 we would get:
2013-06-10 2014-06-10 <-- This includes today, 2013-07-24
It’s actually very simple:
add_months(input, (level - 1) * 12) start_date,
add_months(input, level * 12) end_date
-- Set the input date here
SELECT DATE '2010-06-10' input
add_months(input, (level - 1) * 12) < sysdate
No matter what programming language we choose, we programmers all share one big misery: Having to deal with calendars. We all get it wrong dozens of times in our working lives. And when we think we finally understood calenders, we get taught better. While some take the time to delve into the misteries of calendars, others are faced with this situation:
Google “programming” and “calendars”, and you’ll find lots of interesting discussions, such as:
The question is about a silly NullPointerException and its answer is straightforward. Now the JRebel link at the bottom:
“I agree. Here’s the link” – Whatever. Completely off-topic, no? It looks as though CodeRanch had a spam filtering problem which was exploited by a decent company selling decent products. Why would they do that? Well, I understand that being able to put your link on one of the Top 20 Java Websites can be good for search engine optimisation. But making it look like spam will probably start pissing off potential customers, no?
When blogging about DSLs, I’m mostly blogging about internal domain-specific languages, because jOOQ is a good example for internal DSLs. But external domain-specific languages are an interesting topic as well. As an external DSL does not depend on any host language (e.g. Java), it can be much richer in syntax and expressivity. On the other hand, you need tools to parse, compile and maybe even interpret / run them.
Here’s a very interesting blog post showing how to create an external DSL with Xtext, Eclipse’s DSL tool chain. It creates a simple DSL to describe video game control flows and shows how Eclipse Xtext immediately provides an editor with all sorts of IDE features for that custom DSL: