Using SQL to Calculate the Popularity (on Stack Overflow) of Derby, H2, and HSQLDB

Few people know about this very very awesome feature of the Stack Exchange platform. The Stack Exchange Data Explorer

se-data-explorer

To be found here:
http://data.stackexchange.com

As you may know, much of the Stack Exchange platform runs on SQL Server (interesting architecture details here: http://stackexchange.com/performance), and the team has had the courtesy of making a lot of data publicly available through a SQL web API. Here’s the schmema that you can query:

se-data-explorer-schema

Using a running total to calculate cumulative daily questions per tag

The amount of analytics possibilities with such a public schema are infinite. Today, we’ll look into a question that has been interesting for a lot of users in the past: What is the most “popular” Java in-memory database among Derby (also known as Java DB, which ships with the JDK), the popular test database H2 (see also our interview with Thomas Müller, its creator), or HSQLDB.

What we’d like to do is sum up the number of questions per database, up to any given date. This should give us one of those nice exponential curves that managers like so much.

Here’s the SQL query that we’ll run:

SELECT 
  d,
  SUM(h2)     OVER (ORDER BY d) AS h2,
  SUM(hsqldb) OVER (ORDER BY d) AS hsqldb,
  SUM(derby)  OVER (ORDER BY d) AS derby
FROM (
  SELECT 
    CAST(CreationDate AS DATE) AS d, 
    COUNT(CASE WHEN Tags LIKE '%<h2>%'     THEN 1 END) AS h2,
    COUNT(CASE WHEN Tags LIKE '%<hsqldb>%' THEN 1 END) AS hsqldb,
    COUNT(CASE WHEN Tags LIKE '%<derby>%'  THEN 1 END) AS derby
  FROM Posts
  GROUP BY CAST(CreationDate AS DATE)
) AS DailyPosts
ORDER BY d ASC

A short explanation:

The nested select "DailyPosts" creates a PIVOT table with the aggregated number of questions per database and date. We could have used the SQL Server PIVOT clause, if the Stack Exchange platform had stored tagging information in a normalised form, but the equivalent COUNT(CASE) expressions work just as nicely (see also our article about PostgreSQL’s aggregation FILTER clause for more inspiration).

Now, that we have the number of posts per tag and day, all that’s left to do is sum up those numbers from the first day to any given day. That is often also called a “running total”, which can be calculated very easily using the SUM() OVER() window function.

Now we’re done. You can run and play around with this query here:
http://data.stackexchange.com/stackoverflow/query/469392/java-in-memory-database-popularity-by-time

The raw result is not very interesting. It’s a lot of numbers and dates. But if we plot that result in a graph / chart, we’re getting this nice-looking curve here:

se-data-explorer-derby-h2-hsqldb

As we can see, all three databases are roughly equivalent in terms of “popularity”, although H2 seems to be catching up momentum while HSQLDB is on a slight decline.

(Obviously, this “popularity” is not representative of true market share. More questions might just mean that people struggle more with the technology, or – less skilled people are using it).

Have fun further exploring the Stack Exchange Data Explorer:
http://data.stackexchange.com

(all Stack Exchange Subscriber Content is licensed under the terms of the CC BY-SA 3.0 license. For more details, see: http://stackexchange.com/legal)

Further articles

Further articles that are interesting in the context of the displayed query:

How to Find the Longest Consecutive Series of Events in SQL

A very interesting problem that can be solved very easily with SQL is to find consecutive series of events in a time series. But what is a consecutive series of events in a time series?

Take Stack Overflow, for example. Stack Overflow has a cool reputation system that uses badges to reward certain behaviour. As a social website, they encourage users to visit the platform every day. As such, two distinct badges are awarded:

stackoverflow-visits

Informally, it is obvious what this means. You’ll have to log in on day 1. Then again on day 2. Then again (perhaps several times, it doesn’t matter) on day 3. Forgot to log in on day 4? Ooops. We’ll start counting again.

How to do this in SQL?

On this blog, every problem will find its solution in SQL. So does this. And in order to solve this problem, we’re going to use the awesome Stack Exchange Data Explorer, which exposes a lot of Stack Exchange’s publicly available usage information.

Note that we won’t query the consecutive days of visits, as this information is not made available publicly. Instead, let’s query the consecutive days of posts a user has made.

The backing database is SQL Server, so we can run the following statement:

SELECT DISTINCT CAST(CreationDate AS DATE) AS date
FROM Posts
WHERE OwnerUserId = ##UserId##
ORDER BY 1

… which, for my own UserId generates something like:

date          
----------
2010-11-26
2010-11-27
2010-11-29
2010-11-30
2010-12-01
2010-12-02
2010-12-03
2010-12-05
2010-12-06
2010-12-07
2010-12-08
2010-12-09
2010-12-13
2010-12-14
...
(769 rows)

(run the statement yourself, here)

As we can see in the data, there have been gaps in the very early days:

date          
--------------------------------------
2010-11-26
2010-11-27 <---- Gap here after 2 days

2010-11-29
2010-11-30
2010-12-01
2010-12-02
2010-12-03 <---- Gap here after 5 days

2010-12-05
2010-12-06
2010-12-07
2010-12-08
2010-12-09 <---- Gap here after 5 days

2010-12-13
2010-12-14
...

Visually, it is very easy to see how many days in a row there were posts without any gaps. But how to do it with SQL?

To simplify the problem, let’s “store” individual queries in common table expressions. The above query, we’ll call dates:

WITH 

  -- This table contains all the distinct date 
  -- instances in the data set
  dates(date) AS (
    SELECT DISTINCT CAST(CreationDate AS DATE)
    FROM Posts
    WHERE OwnerUserId = ##UserId##
  )
...

Now, the goal of the resulting query is to put all consecutive dates in the same group, such that we can aggregate over this group. The following query is what we want to write:

SELECT 
  COUNT(*) AS consecutiveDates,
  MIN(date) AS minDate,
  MAX(date) AS maxDate
FROM groups
GROUP BY grp -- This "grp" value will be explained later
ORDER BY 1 DESC, 2 DESC

We’d like to aggregate each group “grp” and count the number of dates in the group, as well as find the lowest and the highest date within each group.

Generating groups for consecutive dates

Let’s look at the data again, and to illustrate the idea, we’ll add consecutive row numbers, regardless of the gaps in dates:

row number   date          
--------------------------------
1            2010-11-26
2            2010-11-27

3            2010-11-29 <-- gap before this row
4            2010-11-30
5            2010-12-01
6            2010-12-02
7            2010-12-03

8            2010-12-05 <-- gap before this row

As you can see, regardless whether there is a gap between dates (two dates are not consecutive), their row numbers will still be consecutive. We can do this with the ROW_NUMBER() window function, very easily:

SELECT
  ROW_NUMBER() OVER (ORDER BY date) AS [row number],
  date
FROM dates

Now, let’s check out the following, interesting query:

WITH 

  -- This table contains all the distinct date 
  -- instances in the data set
  dates(date) AS (
    SELECT DISTINCT CAST(CreationDate AS DATE)
    FROM Posts
    WHERE OwnerUserId = ##UserId##
  ),
  
  -- Generate "groups" of dates by subtracting the
  -- date's row number (no gaps) from the date itself
  -- (with potential gaps). Whenever there is a gap,
  -- there will be a new group
  groups AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY date) AS rn,
      dateadd(day, -ROW_NUMBER() OVER (ORDER BY date), date) AS grp,
      date
    FROM dates
  )
SELECT *
FROM groups
ORDER BY rn

The above query yields:

rn  grp          date          
--- ----------   ----------
1   2010-11-25   2010-11-26
2   2010-11-25   2010-11-27

3   2010-11-26   2010-11-29
4   2010-11-26   2010-11-30
5   2010-11-26   2010-12-01
6   2010-11-26   2010-12-02
7   2010-11-26   2010-12-03

8   2010-11-27   2010-12-05
9   2010-11-27   2010-12-06
10  2010-11-27   2010-12-07
11  2010-11-27   2010-12-08
12  2010-11-27   2010-12-09

13  2010-11-30   2010-12-13
14  2010-11-30   2010-12-14

(run the statement yourself, here)

All we did is subtract the row number from the date to get a new date “grp“. The actual date obtained this way is irrelevant. It’s just an auxiliary value.

What we can guarantee, though, is that for consecutive dates, the value of grp will be the same because for all consecutive dates, the following two equations yield true:

date2 - date1 = 1 // difference in days between dates
rn2   - rn1   = 1 // difference in row numbers

Yet, for non-consecutive dates, while the difference in row numbers is still 1, the difference in days is no longer 1. The groups can now be seen easily:

rn  grp          date          
--- ----------   ----------
1   2010-11-25   2010-11-26
2   2010-11-25   2010-11-27

3   2010-11-26   2010-11-29
4   2010-11-26   2010-11-30
5   2010-11-26   2010-12-01
6   2010-11-26   2010-12-02
7   2010-11-26   2010-12-03

8   2010-11-27   2010-12-05
9   2010-11-27   2010-12-06
10  2010-11-27   2010-12-07
11  2010-11-27   2010-12-08
12  2010-11-27   2010-12-09

13  2010-11-30   2010-12-13
14  2010-11-30   2010-12-14

Thus, the complete query can now be seen here:

WITH 

  -- This table contains all the distinct date 
  -- instances in the data set
  dates(date) AS (
    SELECT DISTINCT CAST(CreationDate AS DATE)
    FROM Posts
    WHERE OwnerUserId = ##UserId##
  ),
  
  -- Generate "groups" of dates by subtracting the
  -- date's row number (no gaps) from the date itself
  -- (with potential gaps). Whenever there is a gap,
  -- there will be a new group
  groups AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY date) AS rn,
      dateadd(day, -ROW_NUMBER() OVER (ORDER BY date), date) AS grp,
      date
    FROM dates
  )
SELECT 
  COUNT(*) AS consecutiveDates,
  MIN(date) AS minDate,
  MAX(date) AS maxDate
FROM groups
GROUP BY grp
ORDER BY 1 DESC, 2 DESC

And it yields:

consecutiveDates minDate       maxDate       
---------------- ------------- ------------- 
14               2012-08-13    2012-08-26
14               2012-02-03    2012-02-16
10               2013-10-24    2013-11-02
10               2011-05-11    2011-05-20
9                2011-06-30    2011-07-08
7                2012-01-17    2012-01-23
7                2011-06-14    2011-06-20
6                2012-04-10    2012-04-15
6                2012-04-02    2012-04-07
6                2012-03-26    2012-03-31
6                2011-10-27    2011-11-01
6                2011-07-17    2011-07-22
6                2011-05-23    2011-05-28
...

(run the statement yourself, here)

Bonus query 1: Find consecutive weeks

The fact that we chose the granularity of days in the above query is a random choice. We simply took the timestamp from our time series and “collapsed” it to the desired granularity using a CAST function:

SELECT DISTINCT CAST(CreationDate AS DATE)

If we want to know the consecutive weeks, we’ll simply change that function to a different expression, e.g.

SELECT DISTINCT datepart(year, CreationDate) * 100 
              + datepart(week, CreationDate)

This new expression takes the year and the week and generates values like 201503 for week 03 in the year 2015. The rest of the statement remains exactly the same:

WITH 
  weeks(week) AS (
    SELECT DISTINCT datepart(year, CreationDate) * 100 
                  + datepart(week, CreationDate)
    FROM Posts
    WHERE OwnerUserId = ##UserId##
  ),
  groups AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY week) AS rn,
      dateadd(day, -ROW_NUMBER() OVER (ORDER BY week), week) AS grp,
      week
    FROM weeks
  )
SELECT 
  COUNT(*) AS consecutiveWeeks,
  MIN(week) AS minWeek,
  MAX(week) AS maxWeek
FROM groups
GROUP BY grp
ORDER BY 1 DESC, 2 DESC

And we’ll get the following result:

consecutiveWeeks minWeek maxWeek 
---------------- ------- ------- 
45               201401  201445  
29               201225  201253  
25               201114  201138  
23               201201  201223  
20               201333  201352  
16               201529  201544  
15               201305  201319  
12               201514  201525  
12               201142  201153  
9                201502  201510  
7                201447  201453  
7                201321  201327  
6                201048  201053  
4                201106  201109  
3                201329  201331  
3                201102  201104  
2                201301  201302  
2                201111  201112  
1                201512  201512  

(run the statement yourself, here)

Unsurprisingly, the consecutive weeks span much longer ranges, as I generally use Stack Overflow extensively.

Bonus query 2: Simplify the query using DENSE_RANK()

In a previous article, we’ve shown that SQL Trick: ROW_NUMBER() is to SELECT what DENSE_RANK() is to SELECT DISTINCT.

If we go back to our consecutive days example, we can rewrite the query to find the distinct dates AND the groups in one go, using DENSE_RANK():

WITH 
  groups(date, grp) AS (
    SELECT DISTINCT 
      CAST(CreationDate AS DATE),
      dateadd(day, 
        -DENSE_RANK() OVER (ORDER BY CAST(CreationDate AS DATE)), 
        CAST(CreationDate AS DATE)) AS grp
    FROM Posts
    WHERE OwnerUserId = ##UserId##
  )
SELECT 
  COUNT(*) AS consecutiveDates,
  MIN(date) AS minDate,
  MAX(date) AS maxDate
FROM groups
GROUP BY grp
ORDER BY 1 DESC, 2 DESC

(run the statement yourself, here)

If the above doesn’t make sense, I recommend reading our previous article here, which explains it:

https://blog.jooq.org/2013/10/09/sql-trick-row_number-is-to-select-what-dense_rank-is-to-select-distinct/

Further reading

The above has been one very useful example of using window functions (ROW_NUMBER()) in SQL. Learn more about window functions in any of the following articles:

Java 8 Friday: Language Design is Subtle

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Language Design is Subtle

It’s been a busy week for us. We have just migrated the jOOQ integration tests to Java 8 for two reasons:

  • We want to be sure that client code compiles with Java 8
  • We started to get bored of writing the same old loops over and over again

The trigger was a loop where we needed to transform a SQLDialect[] into another SQLDialect[] calling .family() on each array element. Consider:

Java 7

SQLDialect[] families = 
    new SQLDialect[dialects.length];
for (int i = 0; i < families.length; i++)
    families[i] = dialects[i].family();

Java 8

SQLDialect[] families = 
Stream.of(dialects)
      .map(d -> d.family())
      .toArray(SQLDialect[]::new);

OK, it turns out that the two solutions are equally verbose, even if the latter feels a bit more elegant. :-)

And this gets us straight into the next topic:

Backwards-compatibility

For backwards-compatibility reasons, arrays and the pre-existing Collections API have not been retrofitted to accommodate all the useful methods that Streams now have. In other words, an array doesn’t have a map() method, just as much as List doesn’t have such a method. Streams and Collections/arrays are orthogonal worlds. We can transform them into each other, but they don’t have a unified API.

This is fine in everyday work. We’ll get used to the Streams API and we’ll love it, no doubt. But because of Java being extremely serious about backwards compatibility, we will have to think about one or two things more deeply.

Recently, we have published a post about The Dark Side of Java 8. It was a bit of a rant, although a mild one in our opinion (and it was about time to place some criticism, after all the praise we’ve been giving Java 8 in our series, before ;-) ). First off, that post triggered a reaction by Edwin Dalorzo from our friends at Informatech. (Edwin has written this awesome post comparing LINQ and Java 8 Streams, before). The criticism in our article evolved around three main aspects:

  • Overloading getting more complicated (see also this compiler bug)
  • Limited support for method modifiers on default methods
  • Primitive type “API overloads” for streams and functional interfaces

A response by Brian Goetz

I then got a personal mail from no one less than Brian Goetz himself (!), who pointed out a couple of things to me that I had not yet thought about in this way:

I still think you’re focusing on the wrong thing. Its not really the syntax you don’t like; its the model — you don’t want “default methods”, you want traits, and the syntax is merely a reminder that you didn’t get the feature you wanted. (But you’d be even more confused about “why can’t they be final” if we dropped the “default” keyword!) But that’s blaming the messenger (where here, the keyword is the messenger.)

Its fair to say “this isn’t the model I wanted”. There were many possible paths in the forest, and it may well be the road not taken was equally good or better.

This is also what Edwin had concluded. Default methods were a necessary means to tackle all the new API needed to make Java 8 useful. If Iterator, Iterable, List, Collection, and all the other pre-existing interfaces had to be adapted to accommodate lambdas and Streams API interaction, the expert group would have needed to break an incredible amount of API. Conversely, without adding these additional utility methods (see the awesome new Map methods, for instance!), Java 8 would have been only half as good.

And that’s it.

Even if maybe, some more class building tools might have been useful, they were not in the center of focus for the expert group who already had a lot to do to get things right. The center of focus was to provide a means for API evolution. Or in Brian Goetz’s own words:

Reaching out to the community

It’s great that Brian Goetz reaches out to the community to help us get the right picture about Java 8. Instead of explaining rationales about expert group decisions in private messages, he then asked me to publicly re-ask my questions again on Stack Overflow (or lambda-dev), such that he can then publicly answer them. For increased publicity and greater community benefit, I chose Stack Overflow. Here are:

The amount of traction these two questions got in no time shows how important these things are to the community, so don’t miss reading through them!

“Uncool”? Maybe. But very stable!

Java may not have the “cool” aura that node.js has. You may think about JavaScript-the-language whatever you want (as long as it contains swear words), but from a platform marketing perspective, Java is being challenged for the first time in a long time – and being “uncool” and backwards-compatible doesn’t help keeping developers interested.

But let’s think long-term, instead of going with trends. Having such a great professional platform like the Java language, the JVM, the JDK, JEE, and much more, is invaluable. Because at the end of the day, the “uncool” backwards-compatibility can also be awesome. As mentioned initially, we have upgraded our integration tests to Java 8. Not a single compilation error, not a single bug. Using Eclipse’s BETA support for Java 8, I could easily transform anonymous classes into lambdas and write awesome things like these upcoming jOOQ 3.4 nested transactions (API not final yet):

ctx.transaction(c1 -> {
    DSL.using(c1)
       .insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
       .values(3, "Doe")
       .execute();

    // Implicit savepoint here
    try {
        DSL.using(c1).transaction(c2 -> {
            DSL.using(c2)
               .update(AUTHOR)
               .set(AUTHOR.FIRST_NAME, "John")
               .where(AUTHOR.ID.eq(3))
               .execute();

            // Rollback to savepoint
            throw new MyRuntimeException("No");
        });
    }

    catch (MyRuntimeException ignore) {}

    return 42;
});

So at the end of the day, Java is great. Java 8 is a tremendous improvement over previous versions, and with great people in the expert groups (and reaching out to the community on social media), I trust that Java 9 will be even better. In particular, I’m looking forward to learning about how these two projects evolve:

Although, again, I am really curious how they will pull these two improvements off from a backwards-compatibility perspective, and what caveats we’ll have to understand, afterwards. ;-)

Anyway, let’s hope the expert groups will continue to provide public feedback on Stack Overflow. Stay tuned for more awesome Java 8 content on this blog.

The Top 10 Productivity Booster Techs for Programmers

This is the list we’ve all been waiting for. The top 10 productivity booster techs for programmers that – once you’ve started using them – you can never do without them any longer.

Here it is:

1. Git

logo@2x Before, there were various version control systems. Better ones, worse ones. But somehow they all felt wrong in one way or another.

Came along Git (and GitHub, EGit). Once you’re using this miraculous tool, it’s hard to imagine that you’ll ever meet a better VCS again.

You’ve never used Git? Get started with this guide.

2. Stack Overflow

stackoverflow

No kidding. Have you ever googled for anything tech-related back in 2005? Or altavista’d something back in 2000? Or went to FidoNet in search for answers in 1995? It was horrible. The top results always consisted in boring forum discussions with lots of un-experts and script kiddies claiming wrong things.

These forums still exist, but they don’t turn up on page 1 of Google search results.

Today, any time you search for something, you’ll have 2-3 hits per top 10 from Stack Overflow. And chances are, you’ll look no further because those answers are 80% wonderful! That’s partially because of Stack Overflow’s cunning reputation system, but also partially because of Stack Overflow’s even more cunning SEO rewarding system. (I already got 98 announcer, 19 booster, and 5 publicist badges. Yay).

While Stack Overflow allows its more active user to pursue their vanity (see above ;-) ), all the other users without any accounts will continue to flock in, finding perfect answers and clicking on very relevant ads.

Thumbs up for Stack Overflow and their awesome business model.

3. Office 365

excel We’re a small startup. Keeping costs low is of the essence. With Office 365, we only pay around $120 per user for a full-fledged Office 2013 suite, integrated with Microsoft Onedrive, Sharepoint, Exchange, Access, and much more.

In other words, we get enterprise-quality office software for the price of what students used to pay, before.

And do note, Office 2013 is better than any other Microsoft (or Libre) Office suite before. While not a 100% Programmer thing, it’s still an awesome tool chain for a very competitive price.

4. IntelliJ

intellij

While Eclipse is great (and free), IntelliJ IDEA, and also phpStorm for those unfortunate enough to write PHP are just subtly better in almost every aspect of an IDE. You can try their free community edition any time, but beware, you probably won’t switch back. And then you probably won’t be able to evade the Ultimate edition for long ;-)

PHPStorm is the only way to survive working with PHP
PHPStorm is the only way to survive working with PHP

5. PostgreSQL

pg PostgreSQL claims to be the world’s most advanced Open Source database, and we think it’s also one of the most elegant, easy, standards-compliant databases. It is really the one database that makes working with SQL fun.

We believe that within a couple of years, there’s a real chance of PostgreSQL not only beating commercial databases in terms of syntax but also in terms of performance.

Any time you need a data storage system with a slight preference for SQL-based ones, just make PostgreSQL your default choice. You won’t be missing any feature in that database.

Let’s hear it for PostgreSQL.

6. Java

duke Java is almost 20 years old, but it’s still the #1 or #2 language on the TIOBE index (sharing ranks with C), for very good reasons:

  • It’s robust
  • It’s mature
  • It works everywhere (almost, really too bad it has never succeeded in the browser)
  • It runs on the best platform ever, the JVM
  • It is Open Source
  • It has millions of tools, libraries, extensions, and applications

While some languages may seem a bit more modern or sexy or geeky, Java has and will always rule them all in terms of popularity. It is a first choice and with Java 8, things have improved even more.

7. jOOQ

jooq-logo-black-100x80 Now, learning this from the jOOQ blog is really unexpected and a shocker, but we think that jOOQ fits right into this programmer’s must-have top-10 tool chain. Most jOOQ users out there have never returned back to pre-jOOQ tools, as they’ve found writing SQL in Java as simple as never before.

Given that we’ve had Java and PostgreSQL before, there’s only this one missing piece gluing the two together in the most sophisticated way.

And besides, no one wants to hack around with the JDBC API, these days, do they?

8. Less CSS

less When you try Less CSS for the first time, you’ll think that

Why isn’t CSS itself like this!?

And you’re right. It feels just like CSS the way it should have always been. All the things that you have always hated about CSS (repetitiveness, verbosity, complexity) are gone. And if you’re using phpStorm or some other JetBrains product (see above), you don’t even have to worry about compiling it to CSS.

As an old HTML-table lover who doesn’t care too much about HTML5, layout, and all that, using Less CSS makes me wonder if I should finally dare creating more fancy websites!

Never again without Less CSS.

9. jQuery

jqueryWhat Less CSS is for CSS, jQuery is for JavaScript. Heck, so many junior developers on Stack Overflow don’t even realise that jQuery is just a JavaScript library. They think it is the language, because we’ve grown to use it all over the place.

Yes, sometimes, jQuery can be overkill as is indicated by this slightly cynical website: http://vanilla-js.com

joox-logo-blackBut it helps so much abstracting all the DOM manipulation in a very fluent way. If only all libraries were written this way.

Do note that we’ve also published a similar library for Java, in case you’re interested in jQuery-style DOM XML manipulation. Along with Java 8’s new lambda expressions, manipulating the DOM becomes a piece of cake.

10. C8H10N4O2

764px-Caffeine.svgC8H10N4O2 (more commonly known as Caffeine) is probably the number one productivity booster for programmers.

Some may claim that there’s such a thing like the Ballmer Peak. That might be true, but the Caffeine Peak has been proven times and again.

Have Dilbert’s view on the matter:

http://dilbert.com/strips/comic/2006-10-19/

More productivity boosters

We’re certainly not the only ones believing that there is such a thing as a programmer-productivity-booster. Enjoy this alternative list by Troy Topnik here for more insight:

http://www.activestate.com/blog/2010/03/top-ten-list-productivity-boosters-programmers

#CTMMC, the New Hashtag for Code That Made Me Cry

Our recent article about Code That Made Me Cry was really well received and had quite a few readers, both on our blog and on our syndication partners:

In each of us programmers is a little geek with a little geek humour. This is reflected by funny comics like the one about

The only valid measurement of code quality: WTF/minute

Other platforms ridiculing bad code include

But one thing missing is a hashtag (and acronym) on twitter. It should be called #CTTMC for Code That Made Me Cry.

To promote this hashtag, we have created a website, where we will list our collective programmer pain. Behold:

http://www.ctmmc.net

The Announcer Badge on Stack Overflow

It just struck me like lightning. I just realised one (surely not the only) very important reason, why Stack Overflow always winds up at least once in the top 10 search results on Google for virtually any programming-related search. The Announcer Badge. When you share a link to a Stack Overflow question or answer using the “share” button, you get a personalised link including your own user ID. If that link generates 25 clicks from distinct IPs, you’ll get an Announcer Badge as a reward for linking to Stack Overflow. This badge has been awarded more than 25k times. Note, there’s also the Booster Badge (awarded around 800 times) and Publicist Badge (awarded around 300 times).

Stack Overflow uses a lot of techniques following the Gamification principle. But this particular badge is really clever. Trick your users into creating highly relevant links from arbitrary websites towards Stack Overflow helping to add relevancy to the platform and to its various questions / answers.

Well done, Stack Overflow!

jOOX answers many Stack Overflow questions

jOOX - a jQuery port to Java When you search for Stack Overflow questions regarding XML, DOM, XPath, JAXB, etc, you could very often answer them simply with an example involving jOOX. Take this question extract for example:

Goal

My goal is to achieve following from this ex xml file :

<root>
    <elemA>one</elemA>
    <elemA attribute1='first' attribute2='second'>two</elemA>
    <elemB>three</elemB>
    <elemA>four</elemA>
    <elemC>
        <elemB>five</elemB>
    </elemC>
</root>

to produce the following :

//root[1]/elemA[1]='one'
//root[1]/elemA[2]='two'
//root[1]/elemA[2][@attribute1='first']
//root[1]/elemA[2][@attribute2='second']
//root[1]/elemB[1]='three'
//root[1]/elemA[3]='four'
//root[1]/elemC[1]/elemB[1]='five'

jOOX answer

The above can be achieved quite simply with jOOX (compared to the other answers to the question involving XSLT, SAX, DOM, etc):

List<String> list = $(document).xpath("//*[not(*)]").map(new Mapper<String>() {
  public String map(Context context) {
    return $(context).xpath() + "='" + $(context).text() + "'";
  }
}});

This will produce

/root[1]/elemA[1]='one'
/root[1]/elemA[2]='two'
/root[1]/elemB[1]='three'
/root[1]/elemA[3]='four'
/root[1]/elemC[1]/elemB[1]='five'

It is an “almost” solution to the OP’s problem, jOOX does not (yet) support matching/mapping attributes. Hence, attributes will not produce any output. This will be implemented in the near future, though.

See the full answer and question here:

http://stackoverflow.com/questions/4746299/generate-get-xpath-from-xml-node-java/8943144#8943144