Writing tests that use an actual database is hard.
Period.
Now that this has been established, let’s have a look at a blog post by Marco Behler, in which he elaborates on various options when testing database code, with respect to transactionality. Testing database transactions is even harder than just testing database code. Marco lists a couple of options how to tweak these tests to make them “easier” to write.
One of the options is:
3. Set the flush-mode to FlushMode.ALWAYS for your tests
(note, this is about testing Hibernate code).
Marco puts this option in parentheses, because he’s not 100% convinced if it’s a good idea to test different behaviour from the productive one. Here’s our take on that:
Stop Unit Testing Database Code
By the time you start thinking about tweaking your test setup to achieve “simpler” transaction behaviour (or worse, use mocks for your database), you’re pretty much doomed. You start creating an alternative system that heavily deviates from your productive system. This essentially means:
that results from tests against your test system have (almost) no meaning
that your tests will not cover some of the most complex aspects of your productive system
that you will start spending way too much time on tweaking tests rather than implementing useful business logic
Instead, focus on writing integration tests that test your business logic on a very high level, i.e. on a “service layer” level, if your architecture has such a thing. If you’re using EJB, this would probably be on a session bean level. If you’re using REST or SOAP, this would be on a REST or SOAP service level. If you’re using HTTP (the retro name for REST), it would be on an HTTP service level.
Here are the advantages of this approach:
You might not get 100% coverage, but you will get the 80% coverage for those parts that really matter (with 20% of the effort). Your UI (or external system) doesn’t call the database in the quirkiest of forms either. It calls your services. Why would you test anything other than your services?
Your tests are very easy to maintain. You have a public API to your UI (or external system). It’s formal and easy to understand. It’s a black box with well-defined input and output parameters, which makes it easy to read / write tests
Databases are stateful. Obviously
What you have to do is let go of this idea that your database will ever participate in a “unit” (as in “unit” test). Units are pretty stateless, and thus it is very easy to write mutually independent unit tests for functional algorithms, for instance.
Databases couldn’t be any less stateless. The whole idea of a database is to manage state. And that’s very complicated and completely opposite to what any unit test can ever model. Many of the most meaningful state transitions span several database interactions, or even transactions, or maybe even services. For instance, it may be important that the CREATE_USER service invocation be immediately followed by an invocation of CHANGE_PASSWORD. You can only integration-test that on a service layer. Don’t believe it? What if CREATE_USER depends on an external LDAP system? Or complex security logic in stored procedures? Your integration test’s got that covered.
Takeaway
Writing tests that use an actual database is hard.
Yes. That won’t change. But your perception may. Don’t try to tweak things around this fact. Create a well-known test database. Reset it between tests. And write integration tests on a very high level. The 20/80 cost/benefit ratio will leave you no better choice.
Stay tuned for another blog post on this blog about how we integration-test the jOOQ API against 16 actual RDBMS
There are a variety of ways how jOOQ and Flyway could interact with each other in various development setups. In this tutorial we’re going to show just one variant of such framework team play – a variant that we find particularly compelling for most use cases.
The general philosophy and workflow behind the following approach can be summarised as this:
1. Database increment
2. Database migration
3. Code re-generation
4. Development
The four steps above can be repeated time and again, every time you need to modify something in your database. More concretely, let’s consider:
1. Database increment – You need a new column in your database, so you write the necessary DDL in a Flyway script
2. Database migration – This Flyway script is now part of your deliverable, which you can share with all developers who can migrate their databases with it, the next time they check out your change
3. Code re-generation – Once the database is migrated, you regenerate all jOOQ artefacts (see code generation), locally
4. Development – You continue developing your business logic, writing code against the udpated, generated database schema
0.1. Maven Project Configuration – Properties
The following properties are defined in our pom.xml, to be able to reuse them between plugin configurations:
While jOOQ and Flyway could be used in standalone migration scripts, in this tutorial, we’ll be using Maven for the standard project setup. You will also find the source code of this tutorial on GitHub, and the full pom.xml file here.
These are the dependencies that we’re using in our Maven configuration:
<!-- We'll add the latest version of jOOQ
and our JDBC driver - in this case H2 -->
<dependency>
<groupId>org.jooq</groupId>
<artifactId>jooq</artifactId>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.4.177</version>
</dependency>
<!-- For improved logging, we'll be using
log4j via slf4j to see what's going
on during migration and code generation -->
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.5</version>
</dependency>
<!-- To esnure our code is working, we're
using JUnit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
0.3. Maven Project Configuration – Plugins
After the dependencies, let’s simply add the Flyway and jOOQ Maven plugins like so. The Flyway plugin:
<plugin>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-maven-plugin</artifactId>
<version>3.0</version>
<!-- Note that we're executing the Flyway
plugin in the "generate-sources" phase -->
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>migrate</goal>
</goals>
</execution>
</executions>
<!-- Note that we need to prefix the db/migration
path with filesystem: to prevent Flyway
from looking for our migration scripts
only on the classpath -->
<configuration>
<url>${db.url}</url>
<user>${db.username}</user>
<locations>
<location>filesystem:src/main/resources/db/migration</location>
</locations>
</configuration>
</plugin>
The above Flyway Maven plugin configuration will read and execute all database migration scripts from src/main/resources/db/migrationprior to compiling Java source code. While the official Flyway documentation suggests that migrations be done in the compile phase, the jOOQ code generator relies on such migrations having been done prior to code generation.
<plugin>
<groupId>org.jooq</groupId>
<artifactId>jooq-codegen-maven</artifactId>
<version>${org.jooq.version}</version>
<!-- The jOOQ code generation plugin is also
executed in the generate-sources phase,
prior to compilation -->
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>generate</goal>
</goals>
</execution>
</executions>
<!-- This is a minimal working configuration.
See the manual's section about the code
generator for more details -->
<configuration>
<jdbc>
<url>${db.url}</url>
<user>${db.username}</user>
</jdbc>
<generator>
<database>
<includes>.*</includes>
<inputSchema>FLYWAY_TEST</inputSchema>
</database>
<target>
<packageName>org.jooq.example.flyway.db.h2</packageName>
<directory>target/generated-sources/jooq-h2</directory>
</target>
</generator>
</configuration>
</plugin>
This configuration will now read the FLYWAY_TEST schema and reverse-engineer it into the target/generated-sources/jooq-h2 directory, and within that, into the org.jooq.example.flyway.db.h2 package.
1. Database increments
Now, when we start developing our database. For that, we’ll create database increment scripts, which we put into thesrc/main/resources/db/migration directory, as previously configured for the Flyway plugin. We’ll add these files:
V1__initialise_database.sql
V2__create_author_table.sql
V3__create_book_table_and_records.sql
These three scripts model our schema versions 1-3 (note the capital V!). Here are the scripts’ contents
-- V1__initialise_database.sql
DROP SCHEMA flyway_test IF EXISTS;
CREATE SCHEMA flyway_test;
-- V2__create_author_table.sql
CREATE SEQUENCE flyway_test.s_author_id START WITH 1;
CREATE TABLE flyway_test.author (
id INT NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50) NOT NULL,
date_of_birth DATE,
year_of_birth INT,
address VARCHAR(50),
CONSTRAINT pk_t_author PRIMARY KEY (ID)
);
-- V3__create_book_table_and_records.sql
CREATE TABLE flyway_test.book (
id INT NOT NULL,
author_id INT NOT NULL,
title VARCHAR(400) NOT NULL,
CONSTRAINT pk_t_book PRIMARY KEY (id),
CONSTRAINT fk_t_book_author_id FOREIGN KEY (author_id) REFERENCES flyway_test.author(id)
);
INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'George', 'Orwell', '1903-06-25', 1903, null);
INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'Paulo', 'Coelho', '1947-08-24', 1947, null);
INSERT INTO flyway_test.book VALUES (1, 1, '1984');
INSERT INTO flyway_test.book VALUES (2, 1, 'Animal Farm');
INSERT INTO flyway_test.book VALUES (3, 2, 'O Alquimista');
INSERT INTO flyway_test.book VALUES (4, 2, 'Brida');
2. Database migration and 3. Code regeneration
The above three scripts are picked up by Flyway and executed in the order of the versions. This can be seen very simply by executing:
mvn clean install
And then observing the log output from Flyway…
[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 3 migrations (execution time 00:00.004s)
[INFO] Creating Metadata table: "PUBLIC"."schema_version"
[INFO] Current version of schema "PUBLIC": <>
[INFO] Migrating schema "PUBLIC" to version 1
[INFO] Migrating schema "PUBLIC" to version 2
[INFO] Migrating schema "PUBLIC" to version 3
[INFO] Successfully applied 3 migrations to schema "PUBLIC" (execution time 00:00.073s).
Note that all of the previous steps are executed automatically, every time someone adds new migration scripts to the Maven module. For instance, a team member might have committed a new migration script, you check it out, rebuild and get the latest jOOQ-generated sources for your own development or integration-test database.
Now, that these steps are done, you can proceed writing your database queries. Imagine the following test case
If you run the mvn clean install again, the above integration test will now compile and pass!
Reiterate
The power of this approach becomes clear once you start performing database modifications this way. Let’s assume that the French guy on our team prefers to have things his way (no offense intended ;-) ):
-- V4__le_french.sql
ALTER TABLE flyway_test.book
ALTER COLUMN title RENAME TO le_titre;
They check it in, you check out the new database migration script, run
mvn clean install
And then observe the log output:
[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 4 migrations (execution time 00:00.005s)
[INFO] Current version of schema "PUBLIC": 3
[INFO] Migrating schema "PUBLIC" to version 4
[INFO] Successfully applied 1 migration to schema "PUBLIC" (execution time 00:00.016s).
When we go back to our Java integration test, we can immediately see that the TITLE column is still being referenced, but it no longer exists:
public class AfterMigrationTest {
@Test
public void testQueryingAfterMigration() throws Exception {
try (Connection c = DriverManager.getConnection("jdbc:h2:~/flyway-test", "sa", "")) {
Result<?> result =
DSL.using(c)
.select(
AUTHOR.FIRST_NAME,
AUTHOR.LAST_NAME,
BOOK.ID,
BOOK.TITLE
// ^^^^^ This column no longer exists.
// We'll have to rename it to LE_TITRE
)
.from(AUTHOR)
.join(BOOK)
.on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
.orderBy(BOOK.ID.asc())
.fetch();
assertEquals(4, result.size());
assertEquals(asList(1, 2, 3, 4), result.getValues(BOOK.ID));
}
}
}
Conclusion
This tutorial shows very easily how you can build a rock-solid development process using Flyway and jOOQ to prevent SQL-related errors very early in your development lifecycle – immediately at compile time, rather than in production!
Infrequent SQL developers often get confused about when to put parentheses and/or aliases on derived tables. There has been this recent Reddit discussion about the subject, where user Elmhurstlol was wondering why they needed to provide an alias to the derived table (the subselect with the UNION) in the following query:
SELECT AVG(price) AS AVG_PRICE
FROM (
SELECT price from product a JOIN pc b
ON a.model=b.model AND maker='A'
UNION ALL
SELECT price from product a JOIN laptop c
ON a.model=c.model and maker='A'
) hello
The question really was about why the "hello" table alias was necessary, because often it seems not to be required.
Here’s what the SQL standard states
If in doubt, it is often useful to consider the SQL standard about the rationale behind some syntax elements. In this case, let’s consider the freely available SQL 1992 standard text (for simplicity), and see how it specifies table references:
The essence of the above syntax specification is this:
A derived table MUST always be aliased
The AS keyword is optional, for improved readability
The parentheses MUST always be put around subqueries
Following these rules, you’ll be pretty safe in most SQL dialects. Here are some deviations to the above, though:
Some dialects allow for unaliased derived tables. However, this is still a bad idea, because you will not be able to unambiguously qualify a column from such a derived table
Takeaway
Always provide a meaningful alias to your derived tables. As simple as that.
Most databases that support default values on their column DDL, it is also possible to actually alter that default. An Oracle example:
CREATE TABLE t (
val NUMBER(7) DEFAULT 1 NOT NULL
);
-- Oops, wrong default, let us change it
ALTER TABLE t MODIFY val DEFAULT -1;
-- Now that is better
Unfortunately, this isn’t possible in SQL Server, where the DEFAULT column property is really a constraint, and probably a constraint whose name you don’t know because it was system generated.
But luckily, jOOQ 3.4 now supports DDL and can abstract this information away from you by generating the following Transact-SQL program:
DECLARE @constraint NVARCHAR(max);
DECLARE @command NVARCHAR(max);
SELECT @constraint = name
FROM sys.default_constraints
WHERE parent_object_id = object_id('t')
AND parent_column_id = columnproperty(
object_id('t'), 'val', 'ColumnId');
IF @constraint IS NOT NULL
BEGIN
SET @command = 'ALTER TABLE t DROP CONSTRAINT '
+ @constraint;
EXECUTE sp_executeSQL @command
SET @command = 'ALTER TABLE t ADD CONSTRAINT '
+ @constraint + ' DEFAULT -1 FOR val';
EXECUTE sp_executeSQL @command
END
ELSE
BEGIN
SET @command = 'ALTER TABLE t ADD DEFAULT -1 FOR val';
EXECUTE sp_executeSQL @command
END
This program will either drop and create a new constraint with the same name, or create an entirely new constraint with a system-generated name.
With jOOQ, you can execute this statement as such:
Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, method references, default methods, the Streams API, and other great stuff. You’ll find the source code on GitHub.
The Best Java 8 Resources – Your Weekend is Booked
We’re obviously not the only ones writing about Java 8. Ever since this great language update’s go live, there had been blogs all around the world appearing with great content and different perspectives on the subject. In this edition of the Java 8 Friday series, we’d like to summarise some of the best content that has been going on on that subject.
1. Brian Goetz’s Answers on Stack Overflow
Brian Goetz was the spec lead for JSR 335. Together with his Expert Group team, he has worked very hard to help Java 8 succeed. However, now that JSR 335 has shipped, his work is far from being over. Brian has had the courtesy of giving authoritative answers to questions from the Java community on Stack Overflow. Here are some of the most interesting questions:
Thumbs up to this great community effort. It cannot get any better than hearing authoritative answers from the spec lead himself.
2. Baeldung.com’s Collection of Java 8 Resources
This list of resources wouldn’t be complete without the very useful list of Java 8 resources (mostly authoritative links to specifications) from the guys over at Baeldung.com. Here is:
Yes, we’ve worked hard to bring you the latest from our experience when integrating jOOQ with Java 8. Here are some of our most popular articles from the recent months:
As part of the ZeroTurnaround content marketing strategy, ZeroTurnaround has launched RebelLabs quite a while ago where various writers publish interesting articles around the topic of Java, which aren’t necessarily related to JRebel and other ZT products. There is some great Java 8 related content having been published over there. Here are our favourite gems:
This blog series we found particularly fun to read. Benji Weber really thinks outside of the box and does some crazy things with default methods, method references and all that. Things that Java developers could only dream of, so far. Here are:
Edwin Dalorzo from Informatech has been treating us with a variety of well-founded comparisons between Java 8 and .NET. This is particularly interesting when comparing Streams with LINQ. Here are some of his best writings:
No, it is missing many other, very interesting blog series. Do you have a series to share? We’re more than happy to update this post, just let us know (in the comments section)
So, we’ve all heard of “flatmap that sh**”, right? The new, functional, hipster way of saying
You’re doing it wrong
Since we don’t really care about dogma and functional religion that much, let’s just start trolling a little (it’s so much easier anyway), with Scala code. Check out the following piece of useful Scala code, which does compile:
def C = Seq(1, 2, 3)
def Windows = 0
def WAT = (C:\Windows) /* OK, we also need this */ (_)
Awesome, right? I leave it up to you to figure out what the above piece of code does.
Don’t believe it? Or don’t care about the puzzle? Here’s the solution on ScalaFiddle.
We’re super excited to announce that our CEO and Head of R&D Lukas will be heading to San Francisco this fall to talk about jOOQ at JavaOne™! This is not just great for Data Geekery and jOOQ, but also for the whole Java / SQL ecosystem, as we believe that the Java / SQL integration deserves much more focus at conferences, where buzzwords like Big Data and NoSQL dominate the agenda disproportionally.
From our perspective, the JVM is the best platform for general purpose languages, whereas SQL is the best tool for database interaction – with Oracle SQL being a leader in the industry. So…
2014 will be a great year for Java and SQL
Prior to JavaOne™, we have also been talking at the awesome 33rd Degree and GeekOut conferences, the latter having been hosted by our friends over at ZeroTurnaround who have launched XRebel, a very promising tool to help you find rogue SQL statements in your application. Stay tuned as we’ll be trying out XRebel to compare jOOQ with Hibernate on our blog, soon.
— Petri Kainulainen (@petrikainulaine) June 5, 2014
Thanks for the shouts, guys! You make the jOOQ experience rock!
Upcoming License Improvements
From our recent negotiations with site license leads, we’ve come to two conclusions that will benefit all of the jOOQ Professional and jOOQ Enterprise customer base.
When you buy a car or a TV, you probably don’t run to the supplier every time you encounter a small defect that prevents you from fully enjoying your product. You’ll fix it yourself. We want to do the same in the future. As we trust our customers, and as we already ship our sources, we will soon allow you to implement urgent fixes to jOOQ yourselves, as we believe that this will improve the jOOQ experience for everyone and add further value to your own experience.
We understand the requirements of purchasing departments in large organisations. Often, it is easier to purchase a site license from a supplier rather than going through the hassles of adapting workstation-based subscriptions all the time. To respond to this need, we’ll soon publish a discounted, tiered pricing model for large-volume purchases of our perpetual licenses.
Both of these improvements will be deployed to all of our customer base in the beginning of July.
Do you already have any questions regarding what will change / improve? Do not hesitate to contact us.
Community Zone – The jOOQ aficionados have been active!
The jOOQ community has been very active again in the last month. We’re happy to point out these editor’s picks from our radar:
Vlad Mihalcea is a very active blogger on the subject of Hibernate integrations, transaction mangagement and connection pooling performance. We’re looking forward to his future blog posts about how to integrate ORMs with SQL/jOOQ, e.g. by applying emerging architecture patterns such as CQRS. One of his most recent, very interesting blog posts deals precisely with that subject.
Micha Kops has been blogging about a variety of Java tool integrations and has now published this comprehensive and very useful jOOQ tutorial. It is great to see fresh opinions from people just getting to know the platform and blogging about it.
SQL Zone – More common SQL mistakes
Our popular blog series “Top 10 mistakes Java developers make when writing SQL”has been enhanced with yet another must-read article for the Java/SQL community:
SQL Zone – Don’t roll your own OFFSET pagination emulation
One of the great reasons why you should use jOOQ is the fact that jOOQ abstracts away all the hard parts of your SQL dialect. If you’re using Oracle (prior to 12c), SQL Server (prior to 2012), or DB2, you might need to emulate what other databases know as OFFSET pagination. While most people get the simple use-cases right, we’ve tried to outline all of the other issues that may arise when you try to do it yourself in our blog post:
A must-read for all SQL transformation aficionados.
Feedback zone
You’ve read to the end of this newsletter, that’s great! Did you like it? What did we do great? What can we improve? What other subjects would you like us to cover?
We’d love to hear from you, so if you want to reach out to us, just drop a message to contact@datageekery.com. Looking forward to hearing from you!
After deep consideration with our lawyers, we would like to follow suit with Oracle and provide you with the following legal disclaimer about our jOOQ-related conference talks, as presented at the awesome GeekOut conference in Tallinn. Please do read them and take them seriously.
Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.
But we haven’t done a top 10 mistakes list with Java 8 yet! For today’s occasion (it’s Friday the 13th), we’ll catch up with what will go wrong in YOUR application when you’re working with Java 8. (it won’t happen to us, as we’re stuck with Java 6 for another while)
1. Accidentally reusing streams
Wanna bet, this will happen to everyone at least once. Like the existing “streams” (e.g. InputStream), you can consume streams only once. The following code won’t work:
IntStream stream = IntStream.of(1, 2);
stream.forEach(System.out::println);
// That was fun! Let's do it again!
stream.forEach(System.out::println);
You’ll get a
java.lang.IllegalStateException:
stream has already been operated upon or closed
So be careful when consuming your stream. It can be done only once
2. Accidentally creating “infinite” streams
You can create infinite streams quite easily without noticing. Take the following example:
// Will run indefinitely
IntStream.iterate(0, i -> i + 1)
.forEach(System.out::println);
The whole point of streams is the fact that they can be infinite, if you design them to be. The only problem is, that you might not have wanted that. So, be sure to always put proper limits:
// That's better
IntStream.iterate(0, i -> i + 1)
.limit(10)
.forEach(System.out::println);
We can’t say this enough. You WILL eventually create an infinite stream, accidentally. Take the following stream, for instance:
IntStream.iterate(0, i -> ( i + 1 ) % 2)
.distinct()
.limit(10)
.forEach(System.out::println);
So…
we generate alternating 0’s and 1’s
then we keep only distinct values, i.e. a single 0 and a single 1
then we limit the stream to a size of 10
then we consume it
Well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. It might expect more than that. So it’ll forever consume new values from the stream, and the limit(10) will never be reached. Tough luck, your application stalls.
We really need to insist that you might accidentally try to consume an infinite stream. Let’s assume you believe that the distinct() operation should be performed in parallel. You might be writing this:
IntStream.iterate(0, i -> ( i + 1 ) % 2)
.parallel()
.distinct()
.limit(10)
.forEach(System.out::println);
Now, we’ve already seen that this will turn forever. But previously, at least, you only consumed one CPU on your machine. Now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. That’s pretty bad. You can probably hard-reboot your server / development machine after that. Have a last look at what my laptop looked like prior to exploding:
If I were a laptop, this is how I’d like to go.
5. Mixing up the order of operations
So, why did we insist on your definitely accidentally creating infinite streams? It’s simple. Because you may just accidentally do it. The above stream can be perfectly consumed if you switch the order of limit() and distinct():
IntStream.iterate(0, i -> ( i + 1 ) % 2)
.limit(10)
.distinct()
.forEach(System.out::println);
This now yields:
0
1
Why? Because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1).
Of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). No one really wants 10 random values, and only then reduce them to be distinct.
If you’re coming from a SQL background, you might not expect such differences. Take SQL Server 2012, for instance. The following two SQL statements are the same:
-- Using TOP
SELECT DISTINCT TOP 10 *
FROM i
ORDER BY ..
-- Using FETCH
SELECT *
FROM i
ORDER BY ..
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY
So, as a SQL person, you might not be as aware of the importance of the order of streams operations.
6. Mixing up the order of operations (again)
Speaking of SQL, if you’re a MySQL or PostgreSQL person, you might be used to the LIMIT .. OFFSET clause. SQL is full of subtle quirks, and this is one of them. The OFFSET clause is applied FIRST, as suggested in SQL Server 2012’s (i.e. the SQL:2008 standard’s) syntax.
If you translate MySQL / PostgreSQL’s dialect directly to streams, you’ll probably get it wrong:
IntStream.iterate(0, i -> i + 1)
.limit(10) // LIMIT
.skip(5) // OFFSET
.forEach(System.out::println);
The above yields
5
6
7
8
9
Yes. It doesn’t continue after 9, because the limit() is now applied first, producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). Not what you may have intended.
BEWARE of the LIMIT .. OFFSET vs. "OFFSET .. LIMIT" trap!
The above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. Unfortunately, you’ve again made mistake #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. Lazily, though, but logically containing all sub-paths. Now, the filter will correctly filter out paths whose names start with a dot “.”. E.g. .git or .idea will not be part of the resulting stream. But these paths will be: .\.git\refs, or .\.idea\libraries. Not what you intended.
While that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories.
I guess you’ll have to resort to good old JDK 1.0 File.list() again. The good news is, FilenameFilter and FileFilter are both functional interfaces.
8. Modifying the backing collection of a stream
While you’re iterating a List, you must not modify that same list in the iteration body. That was true before Java 8, but it might become more tricky with Java 8 streams. Consider the following list from 0..9:
// Of course, we create this list using streams:
List<Integer> list =
IntStream.range(0, 10)
.boxed()
.collect(toCollection(ArrayList::new));
Now, let’s assume that we want to remove each element while consuming it:
list.stream()
// remove(Object), not remove(int)!
.peek(list::remove)
.forEach(System.out::println);
Interestingly enough, this will work for some of the elements! The output you might get is this one:
If we introspect the list after catching that exception, there’s a funny finding. We’ll get:
[1, 3, 5, 7, 9]
Heh, it “worked” for all the odd numbers. Is this a bug? No, it looks like a feature. If you’re delving into the JDK code, you’ll find this comment in ArrayList.ArraListSpliterator:
/*
* If ArrayLists were immutable, or structurally immutable (no
* adds, removes, etc), we could implement their spliterators
* with Arrays.spliterator. Instead we detect as much
* interference during traversal as practical without
* sacrificing much performance. We rely primarily on
* modCounts. These are not guaranteed to detect concurrency
* violations, and are sometimes overly conservative about
* within-thread interference, but detect enough problems to
* be worthwhile in practice. To carry this out, we (1) lazily
* initialize fence and expectedModCount until the latest
* point that we need to commit to the state we are checking
* against; thus improving precision. (This doesn't apply to
* SubLists, that create spliterators with current non-lazy
* values). (2) We perform only a single
* ConcurrentModificationException check at the end of forEach
* (the most performance-sensitive method). When using forEach
* (as opposed to iterators), we can normally only detect
* interference after actions, not before. Further
* CME-triggering checks apply to all other possible
* violations of assumptions for example null or too-small
* elementData array given its size(), that could only have
* occurred due to interference. This allows the inner loop
* of forEach to run without any further checks, and
* simplifies lambda-resolution. While this does entail a
* number of checks, note that in the common case of
* list.stream().forEach(a), no checks or other computation
* occur anywhere other than inside forEach itself. The other
* less-often-used methods cannot take advantage of most of
* these streamlinings.
*/
Now, check out what happens when we tell the stream to produce sorted() results:
This will now produce the following, “expected” output
0
1
2
3
4
5
6
7
8
9
And the list after stream consumption? It is empty:
[]
So, all elements are consumed, and removed correctly. The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state. It is now “safe” to remove elements from the list!
Well… can we really? Let’s proceed with parallel(), sorted() removal:
Eek. We didn’t remove all elements!? Free beers (and jOOQ stickers) go to anyone who solves this streams puzzler!
This all appears quite random and subtle, we can only suggest that you never actually do modify a backing collection while consuming a stream. It just doesn’t work.
9. Forgetting to actually consume the stream
What do you think the following stream does?
IntStream.range(1, 5)
.peek(System.out::println)
.peek(i -> {
if (i == 5)
throw new RuntimeException("bang");
});
When you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. But that’s not correct. It won’t do anything. The stream just sits there, never having been consumed.
As with any fluent API or DSL, you might actually forget to call the “terminal” operation. This might be particularly true when you use peek(), as peek() is an aweful lot similar to forEach().
This can happen with jOOQ just the same, when you forget to call execute() or fetch():
All concurrent systems can run into deadlocks, if you don’t properly synchronise things. While finding a real-world example isn’t obvious, finding a forced example is. The following parallel() stream is guaranteed to run into a deadlock:
With streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. Few of these bugs can be prevented, except through practice and staying focused. You have to think about how to order your operations. You have to think about whether your streams may be infinite.
Streams (and lambdas) are a very powerful tool. But a tool which we need to get a hang of, first.