Tech Pro: Next Generation Social Content Syndication


Some established content syndicators and discussion platforms include DZone and JCG, my two blog syndication partners. For a content producer like me, such platforms are very interesting, as 10% of their traffic eventually reaches my blog. But there’s room for improvement with these syndicators, as their appearance is not up to date with the latest HTML5 / Frontend developments. DZone uses what appears to be a home-grown editing tool chain, whereas JCG is simply a very large wordpress blog.

Other platforms that are highly relevant but not really up to date with the latest frontend developments include Oracle’s OTN, O’Reilly’s On Java, and The Server Side

TECH.PRO

TECH.PRO. The Tech.Pro logo is a trademark of SIOPCO

Here’s a new-comer, which you should definitely keep an eye on: Tech.Pro, combining a very lean, responsive, intuitive and modern web UI with high-quality articles and social networking. The platform feels fully integrated, something that one wants to participate in.

Not only their website, but also their newsletter is nicely done, with cute images accompanying every post.

Tech.Pro is certainly something to keep an eye on in the near future!

nerd-table-cool-table

Heavyweights Martin Odersky, Erik Meijer and Roland Kuhn Team up for a Coursera Course


Erik Meijer (famously known for LINQ, lots of other .NET goodies, and tie-dye shirts of timeless beauty) teams up with Typesafe‘s Martin Odersky (Scala Language) and Roland Kuhn (Akka) to bring you a 7-week-course on the Principles of Reactive Programming, starting on November 4, 2013.

This cooperation of sharp minds can only mean good things, far beyond the scope of this concrete course. Sign up for the Coursera course here:
https://www.coursera.org/course/reactive

Curious what Erik Meijer is up to these days? Check out his new company Applied Duality, Inc’s website:
http://www.applied-duality.com

The slogan is quite cheeky:

After being stuck for > 40 years in a first-order flat model of rigid tabular data locked up in a closed world, it is time for a developer revolution. It is time to embrace monads and duality to lift the database industry to the next level.

It seems like every great scientist and every great salesman is trying to lift the database industry to “the next level”, these days. Exciting times for database professionals!

Awesome Tweets About Recent Blog Post


When writing blog posts, most people are striving to create great content. Creating great content is very hard. Most often, content is niche content, irrelevant content, overlooked content, boring content, advertising content.

But every now and then, great content is created. Often by accident or by luck. How to recognise great content? By checking twitter. Thanks, jOOQ community for being there and sharing your humour and insights with the public. You’re making jOOQ what it is. Here’s some jOOQ community humour:

awesome-blog-reactions

Column Stores: Teaching an Old Elephant New Tricks


Prof. Michael Stonebraker is a controversial visionary, who is known for nothing less than Ingres, Postgres, Vertica, Streambase, Illustra, VoltDB, SciDB, besides being a renowned MIT professor. My recent blog post about Stonebraker’s talk at the EPFL (host university to Prof. Martin Odersky, creator of the Scala Language and Co-Founder of Typesafe) has triggered a very interesting discussion on reddit.

While Stonebraker is very sure about his obviously biased claims that “The Traditional RDBMS Wisdom is All Wrong”, the bottom line of the reddit discussion included:

Interesting insight on SQL Server’s enhancement can be seen in this blog post by Microsoft’s Nicolas Bruno, who challenges the fact that column stores cannot be implemented by “traditional RDBMS”. As Nicolas Bruno stated, an “Old Elephant” can be taught new tricks. “Traditional RDBMS” have proven to adapt to long-term trends in the database industry. Their success isn’t based around the fact that they are mainly fast, or particularly well-designed to respond to niche problem domains. Their success is mainly based on the fact that they are designed according to Codd’s 12 Rules, and thus to be extremely flexible in how they separate data interfacing (SQL) from data storage.

A lot of additional insight and ongoing links can be found in these blog posts by Daniel Lemire, where he had challenged Stonebraker’s similar claims already four years ago:

Silly Metrics: The Most Used Java Keywords


Tell me…

  • Haven’t you ever wondered how many times you actually “synchronized” something?
  • Didn’t you worry about not using the “do {} while ()” loop structure often enough?
  • Are you an expert in applying “volatile”?
  • Do you “catch” more often than you “try”?
  • Is your program rather “true” or rather “false?
  • And how did that “goto” make it into your source code??

Here’s a bit of a distraction among all the other, rather informative posts I’ve written recently. An utterly useless ranking of the top Java keywords in jOOQ. I mean, after all, useful metrics can already be reviewed at ohloh, or collected with FindBugs and JArchitect

Now, you can figure it out. Here’s the ranking!

Keyword      Count
public       8127
return       6801
final        6608
import       5938
static       3903
new          3110
extends      2111
int          1822
throws       1756
void         1707
if           1661
this         1464
private      1347
class        1239
case         841
else         839
package      711
boolean      506
throw        495
for          421
long         404
true         384
byte         345
interface    337
false        332
protected    293
super        265
break        200
try          149
switch       146
implements   139
catch        127
default      112
instanceof   107
char         96
short        91
abstract     54
double       43
transient    42
finally      34
float        34
enum         25
while        23
continue     12
synchronized 8
volatile     6
do           1

Curious about your own Java keyword ranking? I’ve published the script to calculate these values on GitHub, under the ASL 2.0 license. Check out the sources here:

https://github.com/lukaseder/silly-metrics

Use it, and publish your own rankings! And feel free to provide pull requests to count keywords from other languages, or to calculate entirely different silly and useless metrics.

How to Speed Up Apache Xalan’s XPath Processor by Factor 10x


There has been a bit of an awkward bug in Apache Xalan for a while now, and that bug is XALANJ-2540. The effect of this bug is that an internal SPI configuration file is loaded by Xalan thousands of times per XPath expression evaluation, which can be measured easily as such:

this:

Element e = (Element)
  document.getElementsByTagName("SomeElementName")
          .item(0);
String result = ((Element) e).getTextContent();

Seems to be an incredible 100x faster than this:

// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();

// Negligible
XPath xpath = factory.newXPath();

// Negligible
XPathExpression expression =
  xpath.compile("//SomeElementName");

// Accounts for 70%
String result = (String) expression
  .evaluate(document, XPathConstants.STRING);

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to lookup the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on the ObjectFactory.class itself. When the access fails (by default), then the configuration is loaded from the xalan.jar file’s configuration file:

META-INF/service/org.apache.xml.dtm.DTMManager

Every time!:

A profiling session on Xalan

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=
  org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
  com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works, as this will allow to bypass the expensive work in lookUpFactoryClassName() if the factory class name is the default anyway:

// Code from c.s.o.a.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(
       String factoryId,
       String propertiesFilename,
       String fallbackClassName) {
  SecuritySupport ss = SecuritySupport
    .getInstance();

  try {
    String systemProp = ss
      .getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later

References

The above text is an extract from a Stack Overflow question and answer that I’ve contributed to the public a while ago. I’m posting it again, here on my blog, such that the community’s awareness for this rather heavy bug can be raised. Feel free to upvote on this ticket here, as every Sun/Oracle JDK on this planet is affected:

https://issues.apache.org/jira/browse/XALANJ-2540

Contributing a fix to Apache would be even better, of course…

MIT Prof. Michael Stonebraker: “The Traditional RDBMS Wisdom is All Wrong”


A very interesting talk about the future of DBMS was recently given at EPFL by MIT Professor and VoltDB Co-founder and CTO Michael Stonebraker, who also gave us Ingres and Postgres. In a bit less than one hour, he explains his views with respect to the three main pillars of database management systems:

  • OLAP / Data warehouses
  • OLTP
  • Other types of data stores

As a NewSQL vendor also actively involved with H-Store, he is of course heavily yet refreshingly biased towards traditional RDBMS storage models being obsolete (an interesting fact is that Oracle Labs representative Eric Sedlar also attended the talk. One might think that the talk was a slighly FUD-dy move against a VoltDB competitor). Unlike what has come to be known as the NoSQL movement, NewSQL relies on similar relational theory / set theory as “traditional SQL”, including support for ACID and structured data.

His claims mainly include that:

  • OLAP / data warehouses will migrate to column-based data stores within 10 years. The traditional row-based data storage approach is dead, as row-based storage will never match column-based storage’s performance increase by factor 100x.
  • For OLTP, the race for the best data storage designs has not yet been decided, but there is a clear indication of classic models being “plain wrong” (according to Stonebraker), as only 4% of wall-clock time is spent on useful data processing, while the rest is occupied with buffer pools, locking, latching, recovery.
Image from Stonebraker's presentation depicting the amount of "useful" work performed by any RDBMS

Image from Stonebraker’s presentation depicting the amount of “useful” work performed by any RDBMS

I specifically recommend the OLTP part of his talk, as it shows how various new techniques could heavily increase performance of traditional RDBMS already today:

  • Most OLTP systems can afford to buy the amount of memory needed to keep data off the disk. This will remove the need for a buffer pool.
  • Single-threading would get rid of the latching overhead. H-Store and VoltDB statically divide shared memory among the cores, for instance. This is very important as latching gets worse and worse with the increasing amount of cores we have, today.
  • Dynamic locking is not really implemented in any popular RDBMS, but the market is uncertain, which workaround best implements concurrency control. In his opinion, MVCC is not going to do the trick in the long run.
  • ACIDness is something that even Jeff Dean from Google admits to miss, once it’s gone, as eventual consistency does not really keep its promise.
  • In a cluster, active-active consistency management can increase log throughput by factor 3x, compared to active-passive logging. (active-active = transaction is run on every node, active-passive = transaction is run only on the master node, the log is sent to all slave nodes)
  • And also, very importantly, anti-caching is a good technique when the in-memory format matches the disk format, as traditional RDBMS spend a substantial amount of time converting disk data formats (blocks, sectors) into memory formats (actual data).

The essence of Stonebraker’s talk is that the “elephants” who currently dominate the market are too slow to react to all the NewSQL vendors’ innovations. It is a very exciting time for a database professional (some refer to them as data geeks) to enter the market and publish new findings.

Another interesting thing to note is that SQL (call it NewSQL, OldSQL) will remain a dominant language for querying DBMS, both for column-stores as for row-stores. This is a strong statement for tools like jOOQ, which embrace SQL as a first-class citizen among programming languages.

See the complete talk by Michael Stonebraker here:

See Stonebraker's Talk here: http://slideshot.epfl.ch/play/suri_stonebraker

See Stonebraker’s Talk here: http://slideshot.epfl.ch/play/suri_stonebraker

Further reading: