Should I Implement the Arcane Iterator.remove() Method? Yes You (Probably) Should


An interesting question was asked on reddit’s /r/java recently:

Should Iterators be used to modify a custom Collection?

Paraphrasing the question: The author wondered whether a custom java.util.Iterator that is returned from a mutable Collection.iterator() method should implement the weird Iterator.remove() method.

A totally understandable question.

What does Iterator.remove() do?

Few people ever use this method at all. For instance, if you want to implement a generic way to remove null values from an arbitrary Collection, this would be the most generic approach:

Collection<Integer> collection =
Stream.of(1, 2, null, 3, 4, null, 5, 6)
      .collect(Collectors.toCollection(ArrayList::new));

System.out.println(collection);

Iterator<Integer> it = collection.iterator();
while (it.hasNext())
    if (it.next() == null)
        it.remove();

System.out.println(collection);

The above program will print:

[1, 2, null, 3, 4, null, 5, 6]
[1, 2, 3, 4, 5, 6]

Somehow, this API usage does feel dirty. An Iterator seems to be useful to … well … iterate its backing collection. It’s really weird that it also allows for modifying it. It’s even weirder that it only offers removal. E.g. we cannot add a new element before or after the current element of iteration, or replace it.

Luckily, Java 8 provides us with a much better method on the Collection API directly, namely Collection.removeIf(Predicate).

The above iteration code can be re-written as such:

collection.removeIf(Objects::isNull);

OK, now should I implement remove() on my own iterators?

Yes, you should – if your custom collection is mutable. For a very simple reason. Check out the default implementation of Collection.removeIf():

default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        if (filter.test(each.next())) {
            each.remove();
            removed = true;
        }
    }
    return removed;
}

As I said. The most generic way to remove specific elements from a Collection is precisely to go by its Iterator.remove() method and that’s precisely what the JDK does. Subtypes like ArrayList may of course override this implementation because there’s a more performant alternative, but in general, if you write your own custom, modifiable collection, you should implement this method.

And enjoy the ride into Java’s peculiar, historic caveats for which we all love the language.

Arcane magic with the SQL:2003 MERGE statement


Every now and then, we feel awkward about having to distinguish INSERT from UPDATE for any of the following reasons:

  • We have to issue at least two statements
  • We have to think about performance
  • We have to think about race conditions
  • We have to choose between [UPDATE; IF UPDATE_COUNT = 0 THEN INSERT] and [INSERT; IF EXCEPTION THEN UPDATE]
  • We have to do those statements once per updated / inserted record

All in all, this is a big source of error and frustration. When at the same time, it could’ve been so easy with the SQL MERGE statement!

A typical situation for MERGE

Among many other use-cases, the MERGE statement may come in handy when handling many-to-many relationships. Let’s say we have this schema:

CREATE TABLE documents (
  id NUMBER(7) NOT NULL,
  CONSTRAINT docu_id PRIMARY KEY (id)
);

CREATE TABLE persons (
  id NUMBER(7) NOT NULL,
  CONSTRAINT pers_id PRIMARY KEY (id)
);

CREATE TABLE document_person (
  docu_id NUMBER(7) NOT NULL,
  pers_id NUMBER(7) NOT NULL,
  flag NUMBER(1) NULL,

  CONSTRAINT docu_pers_pk PRIMARY KEY (docu_id, pers_id),
  CONSTRAINT docu_pers_fk_docu
    FOREIGN KEY (docu_id) REFERENCES documents(id),
  CONSTRAINT docu_pers_fk_pers
    FOREIGN KEY (pers_id) REFERENCES persons(id)
);

The above tables are used to model which person has read (flag=1) / deleted (flag=2) what document. To make things simple, the “document_person” entity is usually OUTER JOINed to “documents”, such that the presence or absence of a “document-person” record may have the same semantics: “flag IS NULL” means the document is unread.

Now when you want to mark a document as read, you have to decide whether you INSERT a new “document_person”, or whether to UPDATE the existing one. Same with deletion. Same with marking all documents as read, or deleting all documents.

Use MERGE instead

You can do it all in one statement! Let’s say, you want to INSERT/UPDATE one record, in order to mark one document as read for a person:

-- The target table
MERGE INTO document_person dst

-- The data source. In this case, just a dummy record
USING (
  SELECT :docu_id as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM DUAL
) src

-- The merge condition (if true, then update, else insert)
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The update action
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag

-- The insert action
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

This looks quite similar, yet incredibly more verbose than MySQL’s INSERT .. ON DUPLICATE KEY UPDATE statement, which is a bit more concise.

Taking it to the extreme

But you can go further! As I said previously, you may also want to mark ALL documents as read, for a given person. No problem with MERGE. The following statement does the same as the previous one, if you specify :docu_id. If you leave it null, it will just mark all documents as :flag:

MERGE INTO document_person dst

-- The data source is now all "documents" (or just :docu_id) left outer
-- joined with the "document_person" mapping
USING (
  SELECT d.id     as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM documents d
  LEFT OUTER JOIN document_person d_p
  ON d.id = d_p.docu_id AND d_p.pers_id = :pers_id
  -- If :docu_id is set, select only that document
  WHERE (:docu_id IS NOT NULL AND d.id = :docu_id)
  -- Otherwise, select all documents
     OR (:docu_id IS NULL)
) src

-- If the mapping already exists, update. Else, insert
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The rest stays the same
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

MERGE support in jOOQ

MERGE is also fully supported in jOOQ. See the manual for more details (scroll to the bottom):

http://www.jooq.org/manual/JOOQ/Query/

Happy merging! 🙂