Monday, September 19, 2016

Painful Reminder of Java Date Nuances

I don't need to use java.util.Date much anymore these days, but recently chose to do so and was reminded of the pain of using the APIs associated with Java Date. In this post, I look at a couple of the somewhat surprising API expectations of the deprecated parameterized Date constructor that accepts six integers.

In 2016, Java developers are probably most likely to use Java 8's new Date/Time API if writing new code in Java SE 8 or are likely to use a third-party Java date/time library such as Joda-Time if using a version of Java prior to Java 8. I chose to use Date recently in a very simple Java-based tool that I wanted to be deliverable as a single Java source code file (easy to compile without a build tool) and to not depend on any libraries outside Java SE. The target deployment environment for this simple tool is Java SE 7, so the Java 8 Date/Time API was not an option.

One of the disadvantages of the Date constructor that accepts six integers is the differentiation between those six integers and ensuring that they're provided in the proper order. Even when the proper order is enforced, there are subtle surprises associated with specifying the month and year. Perhaps the easiest way to properly instantiate a Date object is either via SimpleDateFormat.parse(String) or via the not-deprecated Date(long) constructor accepting milliseconds since epoch zero.

My first code listing demonstrates instantiation of a Date representing "26 September 2016" with 0 hours, 0 minutes, and 0 seconds. This code listing uses a String to instantiate the Date instance via use of SimpleDateFormat.parse(String).

final SimpleDateFormat formatter = new SimpleDateFormat(DEFAULT_FORMAT);
final Date controlDate = formatter.parse(CONTROL_DATE_TIME_STR);
printDate("Control Date/Time", controlDate);

When the above is run, the printed results are as expected and the output date matches the string provided and parsed for the instance of Date.

=============================================================
= Control Date/Time -> Mon Sep 26 00:00:00 MDT 2016
=============================================================
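For readers who want to run the two non-deprecated approaches mentioned earlier, here is a minimal, self-contained sketch. The pattern and input string are assumptions, because the values of DEFAULT_FORMAT and CONTROL_DATE_TIME_STR are not shown in this post. The class name ControlDateSketch and the UTC time zone are also illustrative choices (the output above used the local MDT zone).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class ControlDateSketch {
    // Assumed values; the post does not show its DEFAULT_FORMAT
    // or CONTROL_DATE_TIME_STR constants.
    static final String ASSUMED_FORMAT = "yyyy-MM-dd HH:mm:ss";
    static final String ASSUMED_DATE_TIME_STR = "2016-09-26 00:00:00";

    /** Parses the text with the assumed pattern, interpreting it as UTC. */
    static Date parseUtc(final String text) {
        final SimpleDateFormat formatter = new SimpleDateFormat(ASSUMED_FORMAT);
        // Pin the zone so the result does not depend on the machine's default.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return formatter.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(final String[] args) {
        final Date parsed = parseUtc(ASSUMED_DATE_TIME_STR);
        // Date(long) accepts milliseconds since 1970-01-01T00:00:00Z.
        final Date fromMillis = new Date(parsed.getTime());
        System.out.println(parsed + " equals " + fromMillis + ": " + parsed.equals(fromMillis));
    }
}
```

Neither approach involves the year or month offsets discussed in the rest of this post, which is the main reason to prefer them.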

It can be tempting to use the Date constructors that accept integers to represent different "fields" of a Date instance, but these present the previously mentioned nuances.

The next code listing shows a very naive approach to invoking the Date constructor which accepts six integers representing these fields in this order: year, month, date, hour, minutes, seconds.

// This will NOT be the intended Date of 26 September 2016
// with 0 hours, 0 minutes, and 0 seconds because both the
// "month" and "year" parameters are NOT appropriate.
final Date naiveDate = new Date(2016, 9, 26, 0, 0, 0);
printDate("new Date(2016, 9, 26, 0, 0, 0)", naiveDate);

The output from running the above code has neither the same month (October rather than September) nor the same year (not 2016) as the "control" case shown earlier.

=============================================================
= new Date(2016, 9, 26, 0, 0, 0) -> Thu Oct 26 00:00:00 MDT 3916
=============================================================

The month was one later than we expected (October rather than September) because the month parameter is a zero-based parameter with January being represented by zero and September thus being represented by 8 instead of 9. One of the easiest ways to deal with the zero-based month and feature a more readable call to the Date constructor is to use the appropriate java.util.Calendar field for the month. The next example demonstrates doing this with Calendar.SEPTEMBER.

// This will NOT be the intended Date of 26 September 2016
// with 0 hours, 0 minutes, and 0 seconds because the
// "year" parameter is not correct.
final Date naiveDate = new Date(2016, Calendar.SEPTEMBER, 26, 0, 0, 0);
printDate("new Date(2016, Calendar.SEPTEMBER, 26, 0, 0, 0)", naiveDate);

The code snippet just listed fixes the month specification, but the year is still off, as the associated output shows.

=============================================================
= new Date(2016, Calendar.SEPTEMBER, 26, 0, 0, 0) -> Tue Sep 26 00:00:00 MDT 3916
=============================================================

The year is still 1900 years off (3916 instead of 2016). This is due to the decision to have the first integer parameter to the six-integer Date constructor be a year specified as the year less 1900. So, providing "2016" as that first argument specifies the year as 2016 + 1900 = 3916. To fix this, we need to instead provide 116 (2016-1900) as the first int parameter to the constructor. To make this more readable to the normal person who would find this surprising, I like to code it literally as 2016-1900 as shown in the next code listing.

final Date date = new Date(2016-1900, Calendar.SEPTEMBER, 26, 0, 0, 0);
printDate("new Date(2016-1900, Calendar.SEPTEMBER, 26, 0, 0, 0)", date);

With the zero-based month used and with the intended year expressed as that year less 1900, the Date is instantiated correctly as demonstrated in the next output listing.

=============================================================
= new Date(2016-1900, Calendar.SEPTEMBER, 26, 0, 0, 0) -> Mon Sep 26 00:00:00 MDT 2016
=============================================================

The Javadoc documentation for Date does describe these nuances, but this is a reminder that it's often better to have clear, understandable APIs that don't need nuances described in comments. The Javadoc for the Date(int, int, int, int, int, int) constructor does advertise that the year needs 1900 subtracted from it and that the months are represented by integers from 0 through 11. It also describes why this six-integer constructor is deprecated: "As of JDK version 1.1, replaced by Calendar.set(year + 1900, month, date, hrs, min, sec) or GregorianCalendar(year + 1900, month, date, hrs, min, sec)."

The similar six-integer GregorianCalendar(int, int, int, int, int, int) constructor is not deprecated and, while it still expects a zero-based month parameter, it does not expect one to subtract 1900 from the actual year when providing the year parameter. When the month is specified using the appropriate Calendar month constant, the API call is far more readable because 2016 can be passed for the year and Calendar.SEPTEMBER can be passed for the month.
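The difference between the two six-integer constructors can be sketched as follows. This is only an illustration; the class and method names are hypothetical, and both constructors interpret their arguments in the JVM's default time zone.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

final class CalendarVersusDateSketch {
    /** Builds 26 September 2016 via the non-deprecated GregorianCalendar constructor. */
    static Date viaGregorianCalendar() {
        // The year is the actual year; the month is still zero-based,
        // so the Calendar constant keeps the call readable.
        return new GregorianCalendar(2016, Calendar.SEPTEMBER, 26, 0, 0, 0).getTime();
    }

    /** Builds the same instant via the deprecated six-integer Date constructor. */
    @SuppressWarnings("deprecation")
    static Date viaDeprecatedConstructor() {
        // The year must be given as "year less 1900".
        return new Date(2016 - 1900, Calendar.SEPTEMBER, 26, 0, 0, 0);
    }

    public static void main(final String[] args) {
        System.out.println(viaGregorianCalendar());
        System.out.println(viaGregorianCalendar().equals(viaDeprecatedConstructor()));
    }
}
```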

I use the Date class directly so rarely now that I forget its nuances and must re-learn them when the rare occasion presents itself for me to use Date again. So, I am leaving these observations regarding Date for my future self.

  1. If using Java 8+, use the Java 8 Date/Time API.
  2. If using a version of Java prior to Java 8, use Joda-Time or other improved Java library.
  3. If unable to use Java 8 or third-party library, use Calendar instead of Date as much as possible and especially for instantiation.
  4. If using Date anyway, instantiate the Date using either the SimpleDateFormat.parse(String) approach or using Date(long) to instantiate the Date based on milliseconds since epoch zero.
  5. If using the Date constructors accepting multiple integers representing date/time components individually, use the appropriate Calendar month field to make API calls more readable and consider writing a simple builder to "wrap" the calls to the six-integer constructor.
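The builder idea in the last item could look something like the following sketch. LegacyDateBuilder and its method names are hypothetical (they are not part of the JDK or of the tool described in this post). The sketch assumes callers prefer to pass the actual year and a one-based month, leaving the offsets to the wrapper.

```java
import java.util.Date;

/** Hypothetical wrapper around the deprecated six-integer Date constructor. */
final class LegacyDateBuilder {
    private final int year;        // actual year, such as 2016
    private final int month;       // one-based: 1 = January, 12 = December
    private final int dayOfMonth;
    private int hour;
    private int minute;
    private int second;

    private LegacyDateBuilder(final int year, final int month, final int dayOfMonth) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("Month must be 1 through 12: " + month);
        }
        this.year = year;
        this.month = month;
        this.dayOfMonth = dayOfMonth;
    }

    static LegacyDateBuilder forDate(final int year, final int month, final int dayOfMonth) {
        return new LegacyDateBuilder(year, month, dayOfMonth);
    }

    LegacyDateBuilder atTime(final int hour, final int minute, final int second) {
        this.hour = hour;
        this.minute = minute;
        this.second = second;
        return this;
    }

    /** Applies the "year less 1900" and zero-based month rules in one place. */
    @SuppressWarnings("deprecation")
    Date build() {
        return new Date(year - 1900, month - 1, dayOfMonth, hour, minute, second);
    }
}
```

With this in place, a call such as LegacyDateBuilder.forDate(2016, 9, 26).build() reads the way most people would expect, and the surprising offsets live in a single commented method.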

We can learn a lot about what makes an API useful and easy to learn and what makes an API more difficult to learn from using other people's APIs. Hopefully these lessons learned will benefit us in writing our own APIs. The Date(int, int, int, int, int, int) constructor that was the focus of this post presents several issues that make for a less than optimal API. The multiple parameters of the same type make it easy to provide the parameters out of order, and the "not natural" rules related to providing year and month put an extra burden on the client developer to read the Javadoc to understand these not-so-obvious rules.

Tuesday, September 13, 2016

Apache NetBeans?

It's fairly common to have significant announcements related to the world of Java released in the days and weeks leading up to JavaOne. With that in mind, it's not surprising that we're seeing some significant Java-related announcements just prior to JavaOne 2016, which begins next week. One announcement is Mark Reinhold's Proposed schedule change for JDK 9 in which Reinhold proposes "a four-month extension of the JDK 9 schedule, moving the General Availability (GA) milestone to July 2017." Another major proposal, the subject of this post, is Oracle's proposal to "contribut[e] the NetBeans IDE as a new open-source project within the Apache Incubator."

The Apache NetBeans proposal is summarized on NetBeans.org, but additional details are available on Apache Software Foundation's Incubator Wiki page called NetBeansProposal. The NetBeansProposal Wiki page provides several details related to the benefits, costs, and risks associated with moving NetBeans to the Apache Software Foundation. Additional views on this proposal that summarize or interpret the proposal can be found in online resources such as Proposal has NetBeans moving to Apache Incubator, Oracle's NetBeans Headed to The Apache Software Foundation, Oracle no more - NetBeans is moving to Apache, Java founder James Gosling endorses Apache takeover of NetBeans Java IDE, An unexpected proposal: Oracle bids farewell to NetBeans, and Oracle Proposes NetBeans IDE as Apache Incubator Project. There are also two Reddit threads on this subject on the subreddits programming and java.

I've felt for some time that the open source projects I'm most willing to "take a chance on" and recommend to management and customers are those that have either strong corporate sponsorship or are affiliated with an established and successful umbrella organization such as Apache Software Foundation. Therefore, although I don't like to see NetBeans lose the corporate backing and investment of Oracle, the Apache Software Foundation does provide a home for NetBeans to continue being a successful project.

Like many software developers who have been working in this area for years, I've been using Apache Software Foundation projects for most of those years. The liberal Apache 2 license is welcoming and uncomplicated. The projects tend to be well run and well used. On occasion when projects are no longer active, the ASF is fairly timely in moving such projects to the Apache Attic. Projects associated with ASF tend to enjoy benefits often associated with open source such as multiple contributors including multiple reviewers and real-life "testers." Many of the ASF projects enjoy a large community with the accompanying benefits of a large community such as improved main site documentation as well as third-party supplemental documentation with blogs, books, and articles. Of course, NetBeans already enjoys much of this, so moving to ASF might be more of an approach to retain some of the advantages it already enjoys while at the same time potentially encouraging greater community collaboration.

The Apache Software Foundation projects I've used over the years seem to come from two different types of origins. Some of them have been associated with ASF from their beginning or almost their beginning while others were popular projects already when they were moved to the ASF. NetBeans falls in the latter category with other projects that I used before they went to ASF such as Groovy (from SpringSource/Pivotal) and Flex (from Adobe). It seems likely that Oracle has proposed donating NetBeans to Apache Software Foundation for the same reasons that Pivotal and Adobe donated Groovy and Flex respectively to Apache Software Foundation.

The examples just mentioned (Adobe|Flex, Pivotal|Groovy, and Oracle|NetBeans) are just a subset of examples that could be cited in which corporations who are the sponsors and dominant contributors have given away the open source project, typically with the intent to spend fewer resources managing that project. If NetBeans is able to enjoy significant community contributions, the disadvantages of reduced corporate sponsorship might be at least partially offset. Some of this, of course, depends on how much Oracle supports its employees in continuing to contribute to NetBeans.

When Oracle acquired Sun, many of us wondered about the future of GlassFish (Oracle had already acquired WebLogic from BEA) and NetBeans (Oracle already had a free, but not open source, Java IDE in JDeveloper). Oracle announced in 2013 that GlassFish 4.x would not be available as a commercial offering and would only continue as an unsupported Java EE reference implementation (though third-party support can be found for the "drop-in replacement" Payara Server). Although there are some advantages to this "developer-friendly" reference implementation in terms of trying new Java EE features and learning Java EE concepts, most Java EE developers I'm aware of who use an open source Java EE application server for production have moved to WildFly. Given this, I've been happy to see NetBeans moving along and being supported as well as it has for as many years as it has.

One potentially new prospect for NetBeans is being the basis for more specialized IDEs. Eclipse has long been the basis of specialized IDEs and development tool suites such as Spring Tool Suite (Spring IDE), Oracle Enterprise Pack for Eclipse, Adobe Flash Builder, Red Hat JBoss Developer Studio, and Zend Studio. Similarly, Android Studio is built on IntelliJ IDEA. Although there are already tools based on NetBeans (such as VisualVM), NetBeans's independence from Oracle may seem more attractive to some for future tools' development.

At the time of this writing, the NetBeansProposal Wiki page already lists 63 people in "the initial list of individual contributors" (including 26 contributors associated with Oracle). That, along with the extensive resources already available related to NetBeans, encourages me and makes me think that NetBeans could be a successful and thriving Apache Software Foundation project. I certainly prefer NetBeans's chances as an Apache Software Foundation project over its chances if it existed in a state similar to that placed upon GlassFish.

We Java developers are fortunate to have multiple very strong IDEs available for our use. It's in our best interest if they can each remain strong and viable as all the IDEs (and the developers who use them) benefit from the competition and from the innovation that talented developers working on these IDEs bring to our development experience. Each of the IDEs offers different advantages and has different strengths and I'm hoping that we can benefit from NetBeans's current strengths and future strengths for years to come.

Monday, September 12, 2016

More on Spooling Queries and Results in psql

In the recent blog post SPOOLing Queries with Results in psql, I looked briefly at some PostgreSQL database psql meta-commands and options that can be used to emulate Oracle database's SQL*Plus spooling behavior. In that post, I wrote, "I have not been able to figure out a way to ... have both the query and its results written to the file without needing to use \qecho." Fortunately, since that writing, a colleague pointed me to the psql option --log-file (or -L).

The PostgreSQL psql documentation states that the --log-file / -L option "write[s] all query output into file filename, in addition to the normal output destination." This handy single option prints both the query and its non-error results to the indicated file. For example, if I start psql with the command "psql -U postgres -L C:\output\albums.txt" and then run the query select * from albums;, the generated file C:\output\albums.txt appears like this:

********* QUERY **********
select * from albums;
**************************

           title           |     artist      | year 
---------------------------+-----------------+------
 Back in Black             | AC/DC           | 1980
 Slippery When Wet         | Bon Jovi        | 1986
 Third Stage               | Boston          | 1986
 Hysteria                  | Def Leppard     | 1987
 Some Great Reward         | Depeche Mode    | 1984
 Violator                  | Depeche Mode    | 1990
 Brothers in Arms          | Dire Straits    | 1985
 Rio                       | Duran Duran     | 1982
 Hotel California          | Eagles          | 1976
 Rumours                   | Fleetwood Mac   | 1977
 Kick                      | INXS            | 1987
 Appetite for Destruction  | Guns N' Roses   | 1987
 Thriller                  | Michael Jackson | 1982
 Welcome to the Real World | Mr. Mister      | 1985
 Never Mind                | Nirvana         | 1991
 Please                    | Pet Shop Boys   | 1986
 The Dark Side of the Moon | Pink Floyd      | 1973
 Look Sharp!               | Roxette         | 1988
 Songs from the Big Chair  | Tears for Fears | 1985
 Synchronicity             | The Police      | 1983
 Into the Gap              | Thompson Twins  | 1984
 The Joshua Tree           | U2              | 1987
 1984                      | Van Halen       | 1984
(23 rows)

One drawback when using -L is that any error messages are not written to the file that the queries and successful results are written to. The next screen snapshot demonstrates an error caused by querying from the column name rather than from the table name and the listing after the screen snapshot shows what appears in the output file.

********* QUERY **********
select * from artist;
**************************

The output file generated with psql's -L option shows the incorrect query, but the generated file does not include the error message that was shown in the psql terminal application ('ERROR: relation "artist" does not exist'). I don't know of any way to easily ensure that this error message is written to the same file that the query is written to. Redirection of standard output and standard error is a possibility, but then I'd need to redirect the error messages to a different file than the file to which the query and output are being written based on the filename provided with the -L option.

Saturday, September 10, 2016

AutoCommit in PostgreSQL's psql

One potential surprise for someone familiar with Oracle database's SQL*Plus when being introduced to PostgreSQL database's psql may be psql's default enabling of autocommit. This post provides an overview of psql's handling of autocommit and some related nuances.

By default, Oracle's SQL*Plus command-line tool does not automatically commit DML statements and the operator must explicitly commit these statements as part of a transaction (or exit from the session without rolling back). Because of this, developers and administrators familiar with using SQL*Plus to work with the Oracle database might be a bit surprised when the opposite is true for PostgreSQL and its psql command-line tool. Auto-commit is turned on by default in psql, meaning that every statement (including DML statements such as INSERT, UPDATE, and DELETE statements) is automatically committed once submitted.

One consequence of PostgreSQL's psql enabling autocommit by default is that COMMIT statements are unnecessary. When one tries to submit a commit; in psql with autocommit enabled, the WARNING-level message "there is no transaction in progress" is shown. This is demonstrated in the next screen snapshot.

The remainder of this post looks at how to turn off this automatic committing of all manipulation statements in psql.

One often cited approach to overriding psql's autocommit default is to explicitly begin a transaction with the BEGIN keyword and then psql won't commit until an explicit commit is provided. However, this can become a bit tedious over time and fortunately PostgreSQL's psql provides a convenient way of configuring psql to have autocommit disabled.

Before getting into the easy approach used to disable autocommit in psql, I'll point out here that one should not confuse psql's autocommit handling with the advice for ECPG (Embedded SQL in C). When using ECPG, the "SET AUTOCOMMIT" section of the PostgreSQL documentation on ECPG applies. Although this only applies to ECPG and does NOT apply to psql, it might be easy to not realize that, as one of the first responses to a Google search for "psql autocommit" is this ECPG-specific manual page. That ECPG-specific manual page states that the command looks like "SET AUTOCOMMIT { = | TO } { ON | OFF }" and adds, "By default, embedded SQL programs are not in autocommit mode, so COMMIT needs to be issued explicitly when desired." This is like Oracle's SQL*Plus and is not how psql behaves by default.

Fortunately, it's very easy to disable autocommit in psql. One merely needs to enter the following at the psql command prompt (AUTOCOMMIT is case sensitive and should be all uppercase):

      \set AUTOCOMMIT off

This simple command disables autocommit for the session. One can determine whether autocommit is enabled with a simple \echo meta-command like this (AUTOCOMMIT is case sensitive and all uppercase and prefixed with colon indicating it's a variable):

      \echo :AUTOCOMMIT

The next screen snapshot demonstrates the discussion so far. It uses an \echo to indicate the default nature of autocommit (on) and how use of \set AUTOCOMMIT allows it to be disabled (off).

If it's desired to "always" have autocommit disabled, the \set AUTOCOMMIT off meta-command can be added to one's local ~/.psqlrc file. For an even more global setting, this meta-command can be placed in a psqlrc file in the database's system config directory (which can be located using PostgreSQL operating system-level command pg_config --sysconfdir as shown in the next screen snapshot).

One last nuance to be wary of when using psql and dealing with autocommit is that show AUTOCOMMIT; is generally not useful. In PostgreSQL 9.5, as the next screen snapshot demonstrates, an error message makes it clear that it's not even available anymore.

Conclusion

Although autocommit is enabled by default in PostgreSQL database's psql command-line tool, it can be easily disabled using \set AUTOCOMMIT off explicitly in a session or via configuration in the personal ~/.psqlrc file or in the global system configuration psqlrc file.

Saturday, September 3, 2016

Running -XX:CompileCommand on Windows

The HotSpot JVM provides several command-line arguments related to Just In Time (JIT) compilation. In this post, I look at the steps needed to start applying the command-line flag -XX:CompileCommand to see the just-in-time compilation being performed on individual methods.

JIT Overview

Nikita Salnikov-Tarnovski's blog post Do you get Just-in-time compilation? provides a nice overview of the JIT compiler and why it's needed. The following is an excerpt of that description:

Welcome - HotSpot. The name derives from the ability of JVM to identify "hot spots" in your application's - chunks of bytecode that are frequently executed. They are then targeted for the extensive optimization and compilation into processor specific instructions. ... The component in JVM responsible for those optimizations is called Just in Time compiler (JIT). ... Rather than compiling all of your code, just in time, the Java HotSpot VM immediately runs the program using an interpreter, and analyzes the code as it runs to detect the critical hot spots in the program. Then it focuses the attention of a global native-code optimizer on the hot spots.

The IBM document JIT compiler overview also provides a concise high-level overview of the JIT and states the following:

In practice, methods are not compiled the first time they are called. For each method, the JVM maintains a call count, which is incremented every time the method is called. The JVM interprets a method until its call count exceeds a JIT compilation threshold. Therefore, often-used methods are compiled soon after the JVM has started, and less-used methods are compiled much later, or not at all. The JIT compilation threshold helps the JVM start quickly and still have improved performance. The threshold has been carefully selected to obtain an optimal balance between startup times and long term performance.

Identifying JIT-Compiled Methods

Because JIT compilation "kicks in" for a particular method only after it's been invoked and interpreted a number of times equal to that specified by -XX:CompileThreshold (10,000 for server JVM and 1,500 for client JVM), not all methods will be compiled by the JIT compiler. The HotSpot command-line option -XX:+PrintCompilation is useful for determining which methods have reached this threshold and have been compiled. Any method that has output displayed with this option is a compiled method for which compilation details can be gleaned using -XX:CompileCommand.

The following screen snapshot demonstrates using -XX:+PrintCompilation to identify JIT-compiled methods. None of the methods shown belong to the simple application itself. All of the methods that ran enough times to meet the threshold and go from being interpreted to being compiled just-in-time are "system" methods.

-XX:CompileCommand Depends on -XX:+UnlockDiagnosticVMOptions

One of the prerequisites for using -XX:CompileCommand to "print generated assembler code after compilation of the specified method" is to use -XX:+UnlockDiagnosticVMOptions to "unlock the options intended for diagnosing the JVM."

-XX:CompileCommand Depends on Disassembler Plugin

Another dependency required to run -XX:CompileCommand against a method to view "generated assembler code" created by the JIT compilation is inclusion of the disassembler plugin. Project Kenai contains a Basic Disassembler Plugin for HotSpot Downloads page that can be used to access these, but Project Kenai is closing. The online resource How to build hsdis-amd64.dll and hsdis-i386.dll on Windows details how to build the disassembler plugin for Windows. Lukas Stadler documents the need for the disassembler plugin and provides a link to a "Windows x86 precompiled binary" hsdis-i386.zip.

The easiest way I found to access a Windows-compatible disassembler plugin was to download it from the Free Code Manipulation Library (FCML) download page at http://fcml-lib.com/download.html. As of this writing, the latest version of download is fcml-1.1.1 (04.08.2015). The hsdis-1.1.1-win32-amd64.zip can be downloaded for "An externally loadable disassembler plugin for 64-bit Java VM" and additional options for download are available as shown in the next screen snapshot.

The next screen snapshot demonstrates the error one can expect to see if this disassembler plugin has not been downloaded and placed in the proper directory.

The error message states, "Could not load hsdis-amd64.dll; library not loadable; PrintAssembly is disabled". There is an hsdis-amd64.dll in the ZIP file hsdis-1.1.1-win32-amd64.zip available for download from FCML. Now, we just need to extract the hsdis-amd64.dll file from the ZIP file and copy it into the appropriate JRE directory.

The disassembler plugin DLL needs to be placed in either the jre/bin/server or the jre/bin/client directory associated with the JRE that is applied when you run the Java launcher (java). In my case, I know that my path is defined such that it gets Java executables, including the Java launcher, from a JRE based on what my JAVA_HOME environment variable is set to. The next screen snapshot shows which directory that is and I can see that I'll need to copy the disassembler plugin DLL into the JDK's "jre" directory rather than into a non-JDK "jre" directory.

Knowing that my Java launcher (java) is run out of the JDK's "jre" installation, I know that I need to copy the disassembler plugin DLL into the appropriate subdirectory under that. In my case, there is a "server" subdirectory and no "client" subdirectory, so I want to copy the disassembler plugin DLL into %JAVA_HOME%\jre\bin\server.

Seeing JIT Compiled Method's Generated Assembler Code

With the disassembler plugin DLL copied into my JRE's bin/server subdirectory, I am now able to include the command-line option -XX:CompileCommand=print with a specific method name to see that method's generated assembler code upon JIT compilation. In my case, because my own simple application doesn't have any methods that get interpreted enough times to trigger JIT, I'll monitor a "system" method instead. In this case, I specify the option "-XX:CompileCommand=print,java/lang/String.hashCode" to print out the generated assembler code for the String.hashCode() method. This is demonstrated in the next screen snapshot.

This screen snapshot includes several affirmations that we've got the necessary dependencies set appropriately to use -XX:CompileCommand. These affirmations include existence of the messages, "Loaded disassembler from..." and "Decoding compiled method...". The mere existence of much more output than before and the presence of assembler code are obvious verifications of successful use of -XX:CompileCommand to print a method's generated assembler code.
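To watch one of your own methods instead of a "system" method, the method needs to be invoked often enough to be compiled. The following is a minimal sketch of such a program; the class HotMethodSketch and its method sumOfSquares are hypothetical names, and whether the JVM compiles the method on its own or inlines it into its caller can vary by JVM version and settings.

```java
final class HotMethodSketch {
    /** Small arithmetic method intended to be invoked often enough to be JIT-compiled. */
    static long sumOfSquares(final int limit) {
        long total = 0;
        for (int i = 1; i <= limit; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(final String[] args) {
        long checksum = 0;
        // Invoke the method far more often than the compile thresholds discussed earlier.
        for (int call = 0; call < 200_000; call++) {
            checksum += sumOfSquares(100);
        }
        // Using the result keeps the loop from being discarded as dead code.
        System.out.println(checksum);
    }
}
```

Assuming the class is compiled in the default package, running it with "java -XX:+PrintCompilation HotMethodSketch" should list the method once it is compiled, and "java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,HotMethodSketch.sumOfSquares HotMethodSketch" follows the same pattern used above for String.hashCode().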

Deciphering Assembly Code

At this point, the real work begins. The printed generated assembler code can now be analyzed and methods can potentially be changed based on this analysis. This type of effort, of course, requires knowledge of the assembler syntax.

A Side Note on -XX:+PrintAssembly

I have not covered the option -XX:+PrintAssembly in this post because it is rarely as useful to see all generated assembly code at once as it is to see assembly code for specifically selected methods. I like how Martin Thompson articulates the issue, "[Using -XX:+PrintAssembly] can put you in the situation of not being able to see the forest for the trees."

Conclusion

The HotSpot JVM option -XX:CompileCommand is useful for affecting and monitoring the behavior of the Just-in-Time compiler. This post has shown how to apply the option in a Windows environment with the "print" command to see the generated assembler code for a method that had been interpreted enough times to be compiled into assembler code for quicker future access.

Friday, September 2, 2016

Even Good Code Comments Deteriorate

As I mentioned in a previous blog post, I've been working a bit with Infinispan recently. One of the things that I like about Infinispan's and PostgreSQL's documentation is that each product's documentation makes it very clear which version of the product the documentation applies to and they each make it easy to find other or more current versions of the documentation. For example, the PostgreSQL 9.5 documentation lists "This page in other versions" across the top with links to other versions of the same documentation. An Infinispan example is the Infinispan 8.1 documentation that not only includes the version in the URL, but also states in very obvious fashion, "This is written specifically for Infinispan 8.1." In the case of the Infinispan documentation, the versioned documentation helped me realize when some class-level Javadoc documentation was no longer correct.

The Javadoc-based class-level description for class org.infinispan.manager.DefaultCacheManager provides the type of core API class documentation that I like to see. It explains what types and number of instances of the class would commonly be expected and even provides a code-based "Sample usage" of the class. Unfortunately, this description applies more to the old interface CacheManager than to the current class DefaultCacheManager that formerly (but no longer) implemented the CacheManager interface. The following screen snapshot shows the 8.2 Javadoc documentation for DefaultCacheManager.

This Javadoc documentation for CacheManager appears to have been on the interface org.infinispan.manager.CacheManager since at least Infinispan 4 (first version of Infinispan because of its JBoss Cache heritage). The class org.infinispan.manager.DefaultCacheManager at that time implemented CacheManager and had almost the same Javadoc documentation as the interface it implemented as shown in the next screen snapshot.

Infinispan 5 introduced another CacheManager as a class (org.infinispan.cdi.CacheManager). Infinispan 6 removed that CacheManager class and deprecated the CacheManager interface with the comment, "This interface is only for backward compatibility with Infinispan 4.0.Final and it will be removed in a future version. Use EmbeddedCacheManager or CacheContainer wherever needed." There is no mention of a CacheManager (class or interface) in the Infinispan 7.0 Javadoc, but the Javadoc for the DefaultCacheManager class still references CacheManager as if it were an interface.

This is an example of how even good code comments can deteriorate over time as APIs and other documented constructs change. I have seen this effect in several code bases that I've worked on: even in-code comments that are accurate and helpful when they are first written can become less helpful or even misleading when things change and the in-code documentation doesn't change with them. This is more likely to happen when the documentation is repeated across multiple constructs such that when the code changes, the comments only get updated or removed in one of the places and not the others.
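One way to reduce the risk of duplicated Javadoc drifting apart is to describe a contract once on the interface and have implementations inherit or link to it rather than copy it. The following is only a minimal sketch of that idea; WidgetRegistry and DefaultWidgetRegistry are hypothetical types invented for illustration and are not part of Infinispan or any other library.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical registry of named widgets (illustration only).
 * The general contract is documented once, here on the interface.
 */
interface WidgetRegistry {
    /**
     * Returns the widget registered under the given name.
     *
     * @param name name the widget was registered under
     * @return the registered widget, or {@code null} if none exists
     */
    String lookup(String name);

    /**
     * Registers a widget under the given name, replacing any previous entry.
     *
     * @param name   name to register the widget under
     * @param widget widget to register
     */
    void register(String name, String widget);
}

/**
 * Map-backed {@link WidgetRegistry} (hypothetical). Only details specific
 * to this implementation belong here; the general contract is not repeated.
 */
class DefaultWidgetRegistry implements WidgetRegistry {
    private final Map<String, String> widgets = new HashMap<>();

    /** {@inheritDoc} */
    @Override
    public String lookup(final String name) {
        return widgets.get(name);
    }

    /** {@inheritDoc} */
    @Override
    public void register(final String name, final String widget) {
        widgets.put(name, widget);
    }
}
```

With this arrangement, a change to the interface's contract only needs to be documented in one place. It does not remove the need to review the class-level comment on the implementation whenever the type hierarchy changes, which is the kind of change that left the DefaultCacheManager description behind.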

I don't want to imply that I think Javadoc or other comments or in-code documentation should not be written because I definitely think that well-written in-code documentation can be highly useful. However, it's also true that comments are not something to be written once with the expectation that they won't need to be changed at some point. In-code documentation needs to be treated with the same care and attention as the code it describes. Otherwise, the in-code documentation that at one time accurately described a construct might actually end up confusing readers about the very construct it's expected to help describe.

Wednesday, August 31, 2016

Will StackOverflow Documentation Realize Its Lofty Goal?

StackOverflow.com has had a huge impact on software development. Although I agree with Fred Brooks that there is no silver bullet in software development, StackOverflow.com has certainly played a significant role in developers learning more quickly from others' experiences, being able to learn from many more people's experiences, and being more productive. StackOverflow.com allows developers to benefit from the advantages of social media through community sharing, community voting, and community improvement of the answers provided there. Now, the StackOverflow Documentation Tour, which offers a tour of the new (beta) StackOverflow Documentation, advertises, "Together, we can do for Documentation what we did for Q&A." That's a lofty goal given what StackOverflow and the development community have done for questions and answers.

I have always enjoyed learning from software development "cookbooks" or "recipe-based" books. These example-heavy books have helped me to learn programming languages, frameworks, and libraries more quickly and to apply them in meaningful ways more quickly. Like many developers, I often find the answers to more detailed questions or issues I encounter on StackOverflow.com. This is largely explained by the fact that StackOverflow has so many "contributors" from the worldwide software development community. That vast number of developers increases the probability of any particular issue or need having been encountered before by someone else. StackOverflow Documentation aims to pair the example-heavy style commonly found in cookbooks with the expertise of the entire community that StackOverflow currently enjoys.

My first impression upon reading about StackOverflow Documentation was that it would undoubtedly be a huge success because it does combine the best of the "cookbooks" with the best of StackOverflow. However, as I've thought about it a bit more, I've started to think of some potential hurdles that might prevent it from achieving the level of success of the question and answer portion of StackOverflow. I describe these thoughts briefly in this post.

Although the documentation for some languages, frameworks, libraries, and toolkits is not very good, some is actually pretty well written. For example, I have long thought highly of the Spring Framework documentation. It's well-written and mixes text and many examples. Another example of well-written documentation with plenty of examples is the Java Tutorials. My most common use of StackOverflow related to Spring and Java has been to get answers to specific issues I've run into or "corner cases" that aren't in the documentation or that I don't know how to look up in the documentation. Basic examples of how to do general things in Spring and in Java have rarely been what I have needed StackOverflow for. Many official documentation sites now allow comments and feedback from the community as well.

I rarely go directly to StackOverflow to ask a specific question. Rather, I typically use a search engine such as Google to type in my search and allow the search engine to point me to potential references. StackOverflow matches are often high on the returned list and I definitely favor returned results that reference StackOverflow over lesser known sites. One of the most disheartening experiences in searching the web to resolve a particularly difficult issue is to have a search engine return no matches or very few matches with no StackOverflow matches. Much of StackOverflow Documentation's success may hinge on its code examples doing well in the search engine algorithms and on developers learning to give it the same preference many of us give to StackOverflow today when choosing which search engine results to look at first.

Besides the advantages of social collaboration that StackOverflow Documentation enjoys, I think a significant advantage of StackOverflow Documentation for developers will be the providing of version information with the examples. The web and blogosphere are full of code-heavy examples, but many of these examples don't provide date or version information. Even when dates are provided, it's not always clear to which versions of a language, framework, toolkit, or library the examples apply. StackOverflow Documentation specifically supports providing version information and I think that will be extremely beneficial to users of the site. If a contributor associates the wrong version with an example, other community members will likely be quick to fix it. Even when an original contributor does not keep an example updated as versions change, the community will likely do so.

Another interesting characteristic of StackOverflow Documentation is the vendor affiliations. StackOverflow Documentation partners include Microsoft, Xamarin (now part of Microsoft), Dropbox, PubNub, and PayPal. I can envision these partnerships contributing to or taking away from the success of StackOverflow Documentation. If the partners do a good job of integrating their own documentation with the StackOverflow Documentation rather than just creating a case of developers needing to look in more places for "official" documentation, then it could lead to success. It will be interesting to see how community edits to partner-affiliated topics will be moderated, censored, and responded to by individuals associated with the partner organizations.

Another characteristic of StackOverflow Documentation that gives it a clear advantage over any alternative is its affiliation with the well-established question and answer portion of the site. Developers who needed to create a new account and learn an entirely different "rewards" system might be less likely to involve themselves. Because total reputation is shared between traditional Q&A StackOverflow and StackOverflow Documentation, a developer might be more likely to move between the two. This also allows a developer to reference his or her own entries in one of the forums from the other forum when appropriate. StackOverflow Documentation also offers its own specific badges, which might be the incentive experienced StackOverflow users need to be active in StackOverflow Documentation.

I expect that StackOverflow Documentation will be a helpful resource and likely a successful one. However, achieving the same level of prominence as its question and answer counterpart may be difficult as the need for community-managed documentation may not be as great as the need for community-managed questions and answers was when StackOverflow.com entered the scene. On the other hand, there were question and answer forums before StackOverflow (such as JavaRanch), but their existence did not prevent StackOverflow.com from becoming the go-to resource for many developers.