Monday, February 23, 2015

Joining Strings in JDK 8

JDK 8 introduced language features such as lambda expressions, streams, and even the new Date/Time API that will change the way we write Java applications. However, there are also several new APIs and features that might be less "game changing," but still bring greater convenience and expressiveness to the Java programming language. In this post, I look at one of these smaller features and examine the ability to easily concatenate multiple Strings in JDK 8.

Perhaps the easiest way to concatenate multiple Strings in JDK 8 is via two new static methods on the ubiquitous Java class String: join(CharSequence, CharSequence...) and join(CharSequence, Iterable). The next two code listings demonstrate how easy it is to apply these two String.join methods.

Using String.join(CharSequence, CharSequence...)
/**
 * Words associated with the blog at http://marxsoftware.blogspot.com/ in array.
 */
private final static String[] blogWords = {"Inspired", "by", "Actual", "Events"};

/**
 * Demonstrate joining multiple Strings using static String
 * "join" method that accepts a "delimiter" and a variable
 * number of Strings (or an array of Strings).
 */
private static void demonstrateStringJoiningArray()
{
   final String blogTitle = String.join(" ", blogWords);
   out.println("Blog Title: " + blogTitle);

   final String postTitle = String.join(" ", "Joining", "Strings", "in", "JDK", "8");
   out.println("Post Title: " + postTitle);
}
Using String.join(CharSequence, Iterable)
/**
 * Pieces of a Media Access Control (MAC) address.
 */
private final static List<String> macPieces;

static
{
   macPieces = new ArrayList<>();
   macPieces.add("01");
   macPieces.add("23");
   macPieces.add("45");
   macPieces.add("67");
   macPieces.add("89");
   macPieces.add("ab");
};

/**
 * Demonstrate joining multiple Strings using static String
 * "join" method that accepts a "delimiter" and an Iterable
 * on Strings.
 */
private static void demonstrateStringJoiningIterable()
{
   final String macAddress = String.join(":", macPieces);
   out.println("MAC Address: " + macAddress);
}

The output from running the two above code listings is:

Blog Title: Inspired by Actual Events
Post Title: Joining Strings in JDK 8
MAC Address: 01:23:45:67:89:ab

Using the two static String.join methods is an easy way to combine strings, but the StringJoiner class introduced with JDK 8 provides even more power and flexibility. The next code listing demonstrates instantiating a StringJoiner and passing it a specified delimiter (decimal point), prefix (opening parenthesis), and suffix (closing parenthesis).

Simple String Joiner Use
/**
 * Demonstrate joining multiple Strings using StringJoiner
 * with specified prefix, suffix, and delimiter.
 */
private static void demonstrateBasicStringJoiner()
{
   // StringJoiner instance with decimal point for delimiter, opening
   // parenthesis for prefix, and closing parenthesis for suffix.
   final StringJoiner joiner = new StringJoiner(".", "(", ")");
   joiner.add("216");
   joiner.add("58");
   joiner.add("216");
   joiner.add("206");
   final String ipAddress = joiner.toString();
   out.println("IP Address: " + ipAddress);
}

Running the above code prints the following string to standard output: "IP Address: (216.58.216.206)"

The StringJoiner is an especially attractive approach in the scenario where one is adding delimiting characters to a String being built up as part of some type of iteration with a StringBuilder. In such cases, it was often necessary to remove an extra character added to the end of that builder with the last iteration. StringJoiner is "smart enough" to only add the delimiters between strings being concatenated and not after the final one. The successive calls to add(CharSequence) methods look very similar to the StringBuilder/StringBuffer APIs.

The final JDK 8-introduced approach for joining Strings that I will cover in this post is use of stream-powered collections with a joining collector (reduction operation). This is demonstrated in the next code listing and its output is the same as the String.join approach used to print a MAC address via the String.join that accepted an Iterable as its second argument.

String Joining with a Collection's Stream
/**
 * Demonstrate joining Strings in a collection via that collection's
 * Stream and use of the Joining Collector.
 */
private static void demonstrateStringJoiningWithCollectionStream()
{
   final String macAddress =
      macPieces.stream().map(
         piece -> piece).collect(Collectors.joining(":"));
   out.println("MAC Address: " + macAddress);
}

If a developer wants the ability to provide a prefix and suffix to the joined string without having to make successive calls to add methods required to join Strings with StringJoiner, the Collectors.joining(CharSequence, CharSequence, CharSequence) method is a perfect fit. The next code example shows the IP address example from above used to demonstrate StringJoiner, but this time implemented with a collection, stream, and joining collector. The output is the same as the previous example without the need to specify add(CharSequence) for each String to be joined.

String Joining with Collection's Stream and Prefix and Suffix
/**
 * Demonstrate joining Strings in a collection via that collection's
 * Stream and use of a Joining Collector that with specified prefix 
 * and suffix.
 */
private static void demonstrateStringJoiningWithPrefixSuffixCollectionStream()
{
   final List<String> stringsToJoin = Arrays.asList("216", "58", "216", "206");
   final String ipAddress =
      stringsToJoin.stream().map(
         piece -> piece).collect(Collectors.joining(".", "(", ")"));
   out.println("IP Address: " + ipAddress);
}

This blog post has covered three of the approaches to joining Strings available with JDK 8:

  1. Static String.join methods
  2. Instance of StringJoiner
  3. Collection Stream with Joining Collector

Wednesday, February 18, 2015

Determining File Types in Java

Programmatically determining the type of a file can be surprisingly tricky and there have been many content-based file identification approaches proposed and implemented. There are several implementations available in Java for detecting file types and most of them are largely or solely based on files' extensions. This post looks at some of the most commonly available implementations of file type detection in Java.

Several approaches to identifying file types in Java are demonstrated in this post. Each approach is briefly described, illustrated with a code listing, and then associated with output that demonstrates how different common files are typed based on extensions. Some of the approaches are configurable, but all examples shown here use "default" mappings as provided out-of-the-box unless otherwise stated.

About the Examples

The screen snapshots shown in this post are of each listed code snippet run against certain subject files created to test the different implementations of file type detection in Java. Before covering these approaches and demonstrating the type each approach detects, I list the files under test and what they are named and what they really are.

File
Name
File
Extension
File
Type
Type Matches
Extension Convention?
actualXml.xml xml XML Yes
blogPostPDF   PDF No
blogPost.pdf pdf PDF Yes
blogPost.gif gif GIF Yes
blogPost.jpg jpg JPEG Yes
blogPost.png png PNG Yes
blogPostPDF.txt txt PDF No
blogPostPDF.xml xml PDF No
blogPostPNG.gif gif PNG No
blogPostPNG.jpg jpg PNG No
dustin.txt txt Text Yes
dustin.xml xml Text No
dustin   Text No

Files.probeContentType(Path) [JDK 7]

Java SE 7 introduced the highly utilitarian Files class and that class's Javadoc succinctly describes its use: "This class consists exclusively of static methods that operate on files, directories, or other types of files" and, "in most cases, the methods defined here will delegate to the associated file system provider to perform the file operations."

The java.nio.file.Files class provides the method probeContentType(Path) that "probes the content type of a file" through use of "the installed FileTypeDetector implementations" (the Javadoc also notes that "a given invocation of the Java virtual machine maintains a system-wide list of file type detectors").

/**
 * Identify file type of file with provided path and name
 * using JDK 7's Files.probeContentType(Path).
 *
 * @param fileName Name of file whose type is desired.
 * @return String representing identified type of file with provided name.
 */
public String identifyFileTypeUsingFilesProbeContentType(final String fileName)
{
   String fileType = "Undetermined";
   final File file = new File(fileName);
   try
   {
      fileType = Files.probeContentType(file.toPath());
   }
   catch (IOException ioException)
   {
      out.println(
           "ERROR: Unable to determine file type for " + fileName
              + " due to exception " + ioException);
   }
   return fileType;
}

When the above Files.probeContentType(Path)-based approach is executed against the set of files previously defined, the output appears as shown in the next screen snapshot.

The screen snapshot indicates that the default behavior for Files.probeContentType(Path) on my JVM seems to be tightly coupled to the file extension. The files with no extensions show "null" for file type and the other listed file types match the files' extensions rather than their actual content. For example, all three files with names starting with "dustin" are really the same single-sentence text file, but Files.probeContentType(Path) states that they are each a different type and the listed types are tightly correlated with the different file extensions for essentially the same text file.

MimetypesFileTypeMap.getContentType(String) [JDK 6]

The class MimetypesFileTypeMap was introduced with Java SE 6 to provide "data typing of files via their file extension" using "the .mime.types format." The class's Javadoc explains where in a given system the class looks for MIME types file entries. My example uses the ones that come out-of-the-box with my JDK 8 installation. The next code listing demonstrates use of javax.activation.MimetypesFileTypeMap.

/**
 * Identify file type of file with provided name using
 * JDK 6's MimetypesFileTypeMap.
 *
 * See Javadoc documentation for MimetypesFileTypeMap class
 * (http://docs.oracle.com/javase/8/docs/api/javax/activation/MimetypesFileTypeMap.html)
 * for details on how to configure mapping of file types or extensions.
 */
public String identifyFileTypeUsingMimetypesFileTypeMap(final String fileName)
{    
   final MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
   return fileTypeMap.getContentType(fileName);
}

The next screen snapshot demonstrates the output from running this example against the set of test files.

This output indicates that the MimetypesFileTypeMap approach returns the MIME type of application/octet-stream for several files including the XML files and the text files without a .txt suffix. We see also that, like the previously discussed approach, this approach in some cases uses the file's extension to determine the file type and so incorrectly reports the file's actual file type when that type is different than what its extension conventionally implies.

URLConnection.getContentType()

I will be covering three methods in URLConnection that support file type detection. The first is URLConnection.getContentType(), a method that "returns the value of the content-type header field." Use of this instance method is demonstrated in the next code listing and the output from running that code against the common test files is shown after the code listing.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.getContentType().
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGetContentType(final String fileName)
{
   String fileType = "Undetermined";
   try
   {
      final URL url = new URL("file://" + fileName);
      final URLConnection connection = url.openConnection();
      fileType = connection.getContentType();
   }
   catch (MalformedURLException badUrlEx)
   {
      out.println("ERROR: Bad URL - " + badUrlEx);
   }
   catch (IOException ioEx)
   {
      out.println("Cannot access URLConnection - " + ioEx);
   }
   return fileType;
}

The file detection approach using URLConnection.getContentType() is highly coupled to files' extensions rather than the actual file type. When there is no extension, the String returned is "content/unknown."

URLConnection.guessContentTypeFromName(String)

The second file detection approach provided by URLConnection that I'll cover here is its method guessContentTypeFromName(String). Use of this static method is demonstrated in the next code listing and associated output screen snapshot.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.guessContentTypeFromName(String).
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromName(final String fileName)
{
   return URLConnection.guessContentTypeFromName(fileName);
}

URLConnection's guessContentTypeFromName(String) approach to file detection shows "null" for files without file extensions and otherwise returns file type String representations that closely mirror the files' extensions. These results are very similar to those provided by the Files.probeContentType(Path) approach shown earlier with the one notable difference being that URLConnection's guessContentTypeFromName(String) approach identifies files with .xml extension as being of file type "application/xml" while Files.probeContentType(Path) identifies these same files' types as "text/xml".

URLConnection.guessContentTypeFromStream(InputStream)

The third approach I cover that is provided by URLConnection for file type detection is via the class's static method guessContentTypeFromStream(InputStream). A code listing employing this approach and associated output in a screen snapshot are shown next.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.guessContentTypeFromStream(InputStream).
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromStream(final String fileName)
{
   String fileType;
   try
   {
      fileType = URLConnection.guessContentTypeFromStream(new FileInputStream(new File(fileName)));
   }
   catch (IOException ex)
   {
      out.println("ERROR: Unable to process file type for " + fileName + " - " + ex);
      fileType = "null";
   }
   return fileType;
}

All the file types are null! The reason for this appears to be explained by the Javadoc for the InputStream parameter of the URLConnection.guessContentTypeFromStream(InputStream) method: "an input stream that supports marks." It turns out that the instances of FileInputStream in my examples do not support marks (their calls to markSupported() all return false).

Apache Tika

All of the examples of file detection covered in this post so far have been approaches provided by the JDK. There are third-party libraries that can also be used to detect file types in Java. One example is Apache Tika, a "content analysis toolkit" that "detects and extracts metadata and text from over a thousand different file types." In this post, I look at using Tika's facade class and its detect(String) method to detect file types. The instance method call is the same in the three examples I show, but the results are different because each instance of the Tika facade class is instantiated with a different Detector.

The instantiations of Tika instances with different Detectors is shown in the next code listing.

/** Instance of Tika facade class with default configuration. */
private final Tika defaultTika = new Tika();

/** Instance of Tika facade class with MimeTypes detector. */
private final Tika mimeTika = new Tika(new MimeTypes());
his is 
/** Instance of Tika facade class with Type detector. */
private final Tika typeTika = new Tika(new TypeDetector());

With these three instances of Tika instantiated with their respective Detectors, we can call the detect(String) method on each instance for the set of test files. The code for this is shown next.

/**
 * Identify file type of file with provided name using
 * Tika's default configuration.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingDefaultTika(final String fileName)
{
   return defaultTika.detect(fileName);
}

/**
 * Identify file type of file with provided name using
 * Tika's with a MimeTypes detector.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingMimeTypesTika(final String fileName)
{
   return mimeTika.detect(fileName);
}

/**
 * Identify file type of file with provided name using
 * Tika's with a Types detector.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingTypeDetectorTika(final String fileName)
{
   return typeTika.detect(fileName);
}

When the three above Tika detection examples are executed against the same set of files are used in the previous examples, the output appears as shown in the next screen snapshot.

We can see from the output that the default Tika detector reports file types similarly to some of the other approaches shown earlier in this post (very tightly tied to the file's extension). The other two demonstrated detectors state that the file type is application/octet-stream in most cases. Because I called the overloaded version of detect(-) that accepts a String, the file type detection is "based on known file name extensions."

If the overloaded detect(File) method is used instead of detect(String), the identified file type results are much better than the previous Tika examples and the previous JDK examples. In fact, the "fake" extensions don't fool the detectors as much and the default Tika detector is especially good in my examples at identifying the appropriate file type even when the extension is not the normal one associated with that file type. The code for using Tika.detect(File) and the associated output are shown next.

   /**
    * Identify file type of file with provided name using
    * Tika's default configuration.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingDefaultTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = defaultTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

   /**
    * Identify file type of file with provided name using
    * Tika's with a MimeTypes detector.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingMimeTypesTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = mimeTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

   /**
    * Identify file type of file with provided name using
    * Tika's with a Types detector.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingTypeDetectorTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = typeTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

Caveats and Customization

File type detection is not a trivial feat to pull off. The Java approaches for file detection demonstrated in this post provide basic approaches to file detection that are often highly dependent on a file name's extension. If files are named with conventional extensions that are recognized by the file detection approach, these approaches are typically sufficient. However, if unconventional file type extensions are used or the extensions are for files with types other than that conventionally associated with that extension, most of these approaches to file detection break down without customization. Fortunately, most of these approaches provide the ability to customize the mapping of file extensions to file types. The Tika approach using Tika.detect(File) was generally the most accurate in the examples shown in this post when the extensions were not the conventional ones for the particular file types.

Conclusion

There are numerous mechanisms available for simple file type detection in Java. This post reviewed some of the standard JDK approaches for file detection and some examples of using Tika for file detection.

Tuesday, February 17, 2015

Using JDK 8 Streams to Convert Between Collections of Wrapped Objects and Collections of Wrapper Objects

I have found Decorators and Adapters to be useful from time to time as I have worked with Java-based applications. These "wrappers" work well in a variety of situations and are fairly easy to understand and implement, but things can become a bit more tricky when a hierarchy of objects rather than a single object needs to be wrapped. In this blog post, I look at how Java 8 streams make it easier to convert between collections of objects and collections of objects that wrap those objects.

For this discussion, I'll apply two simple Java classes representing a Movie class and a class that "wraps" that class called MovieWrapper. The Movie class was used in my post on JDK 8 enhancements to Java collections. The Movie class and the class that wraps it are shown next.

Movie.java
package dustin.examples.jdk8.streams;

import java.util.Objects;

/**
 * Basic characteristics of a motion picture.
 *
 * @author Dustin
 */
public class Movie
{
   /** Title of movie. */
   private final String title;

   /** Year of movie's release. */
   private final int yearReleased;

   /** Movie genre. */
   private final Genre genre;

   /** MPAA Rating. */
   private final MpaaRating mpaaRating;

   /** imdb.com Rating. */
   private final int imdbTopRating;

   public Movie(final String newTitle, final int newYearReleased,
                final Genre newGenre, final MpaaRating newMpaaRating,
                final int newImdbTopRating)
   {
      this.title = newTitle;
      this.yearReleased = newYearReleased;
      this.genre = newGenre;
      this.mpaaRating = newMpaaRating;
      this.imdbTopRating = newImdbTopRating;
   }

   public String getTitle()
   {
      return this.title;
   }

   public int getYearReleased()
   {
      return this.yearReleased;
   }

   public Genre getGenre()
   {
      return this.genre;
   }

   public MpaaRating getMpaaRating()
   {
      return this.mpaaRating;
   }

   public int getImdbTopRating()
   {
      return this.imdbTopRating;
   }

   @Override
   public boolean equals(Object other)
   {
      if (!(other instanceof Movie))
      {
         return false;
      }
      final Movie otherMovie = (Movie) other;
      return   Objects.equals(this.title, otherMovie.title)
            && Objects.equals(this.yearReleased, otherMovie.yearReleased)
            && Objects.equals(this.genre, otherMovie.genre)
            && Objects.equals(this.mpaaRating, otherMovie.mpaaRating)
            && Objects.equals(this.imdbTopRating, otherMovie.imdbTopRating);
   }

   @Override
   public int hashCode()
   {
      return Objects.hash(this.title, this.yearReleased, this.genre, this.mpaaRating, this.imdbTopRating);
   }

   @Override
   public String toString()
   {
      return "Movie: " + this.title + " (" + this.yearReleased + "), " + this.genre + ", " + this.mpaaRating + ", "
            + this.imdbTopRating;
   }
}
MovieWrapper.java
package dustin.examples.jdk8.streams;

/**
 * Wraps a movie like a Decorator or Adapter might.
 * 
 * @author Dustin
 */
public class MovieWrapper
{
   private Movie wrappedMovie;

   public MovieWrapper(final Movie newMovie)
   {
      this.wrappedMovie = newMovie;
   }

   public Movie getWrappedMovie()
   {
      return this.wrappedMovie;
   }

   public void setWrappedMovie(final Movie newMovie)
   {
      this.wrappedMovie = newMovie;
   }

   public String getTitle()
   {
      return this.wrappedMovie.getTitle();
   }

   public int getYearReleased()
   {
      return this.wrappedMovie.getYearReleased();
   }

   public Genre getGenre()
   {
      return this.wrappedMovie.getGenre();
   }

   public MpaaRating getMpaaRating()
   {
      return this.wrappedMovie.getMpaaRating();
   }

   public int getImdbTopRating()
   {
      return this.wrappedMovie.getImdbTopRating();
   }

   @Override
   public String toString()
   {
      return this.wrappedMovie.toString();
   }
}

With the Movie and MovieWrapper classes defined above, I now look at converting a collection of one of these into a collection of the other. Before JDK 8, a typical approach to convert a collection of Movie objects into a collection of MovieWrapper objects would to iterate over the source collection of Movie objects and add each one to a new collection of MovieWrapper objects. This is demonstrated in the next code listing.

Converting Collection of Wrapped Object Into Collection of Wrapper Objects
// movies previously defined as Set<Movie>
final Set<MovieWrapper> wrappedMovies1 = new HashSet<>();
for (final Movie movie : movies)
{
   wrappedMovies1.add(new MovieWrapper(movie));
}

With JDK 8 streams, the operation above can now be implemented as shown in the next code listing.

Converting Collection of Wrapped Objects Into Collection of Wrapper Objects - JDK 8
// movies previously defined as Set<Movie>
final Set<MovieWrapper> wrappedMovies2 =
   movies.stream().map(movie -> new MovieWrapper(movie)).collect(Collectors.toSet());

Converting the other direction (from collection of wrapper objects to collection of wrapped objects) can be similarly compared to demonstrate how JDK 8 changes this. The next two code listings show the old way and the JDK 8 way.

Converting Collection of Wrapper Objects Into Collection of Wrapped Objects
final Set<Movie> newMovies1 = new HashSet();
for (final MovieWrapper wrappedMovie : wrappedMovies1)
{
   newMovies1.add(wrappedMovie.getWrappedMovie());
}
Converting Collection of Wrapper Objects Into Collection of Wrapped Objects - JDK 8
final Set<Movie> newMovies2 =
   wrappedMovies2.stream().map(MovieWrapper::getWrappedMovie).collect(Collectors.toSet());

Like some of the examples in my post Stream-Powered Collections Functionality in JDK 8, the examples in this post demonstrate the power of aggregate operations provided in JDK 8. The advantages of these aggregate operations over traditional iteration include greater conciseness in the code, arguably (perhaps eventually) greater readability, and the advantages of internal iteration (including easier potential streams-supported parallelization). A good example of using streams and more complex Functions to convert between collections of less cohesively related objects is shown in Transform object into another type with Java 8.

Friday, February 13, 2015

Writing Groovy's groovy.util.slurpersupport.GPathResult (XmlSlurper) Content as XML

In a previous blog post, I described using XmlNodePrinter to present XML parsed with XmlParser in a nice format to standard output, as a Java String, and in a new file. Because XmlNodePrinter works with groovy.util.Node instances, it works well with XmlParser, but doesn't work so well with XmlSlurper because XmlSlurper deals with instances of groovy.util.slurpersupport.GPathResult rather than instances of groovy.util.Node. This post looks at how groovy.xml.XmlUtil can be used to present GPathResult objects that results from slurping XML to standard output, as a Java String, and to a new file.

This post's first code listing demonstrates slurping XML with XmlSlurper and writing the slurped GPathResult to standard output using XmlUtil's serialize(GPathResult, OutputStream) method and the System.out handle.

slurpAndPrintXml.groovy : Writing XML to Standard Output
#!/usr/bin/env groovy

// slurpAndPrintXml.groovy
//
// Use Groovy's XmlSlurper to "slurp" provided XML file and use XmlUtil to write
// XML content out to standard output.

if (args.length < 1)
{
   println "USAGE: groovy slurpAndPrint.xml <xmlFile>"
}

String xmlFileName = args[0]

xml = new XmlSlurper().parse(xmlFileName)

import groovy.xml.XmlUtil
XmlUtil xmlUtil = new XmlUtil()
xmlUtil.serialize(xml, System.out)

The next code listing demonstrates use of XmlUtil's serialize(GPathResult) method to serialize the GPathResult to a Java String.

slurpXmlToString.groovy : Writing XML to Java String
#!/usr/bin/env groovy

// slurpXmlToString.groovy
//
// Use Groovy's XmlSlurper to "slurp" provided XML file and use XmlUtil to
// write the XML content to a String.

if (args.length < 1)
{
   println "USAGE: groovy slurpAndPrint.xml <xmlFile>"
}

String xmlFileName = args[0]

xml = new XmlSlurper().parse(xmlFileName)

import groovy.xml.XmlUtil
XmlUtil xmlUtil = new XmlUtil()
String xmlString = xmlUtil.serialize(xml)
println "String:\n${xmlString}"

The third code listing demonstrates use of XmlUtil's serialize(GPathResult, Writer) method to write the GPathResult representing the slurped XML to a file via a FileWriter instance.

slurpAndSaveXml.groovy : Writing XML to File
#!/usr/bin/env groovy

// slurpAndSaveXml.groovy
//
// Uses Groovy's XmlSlurper to "slurp" XML and then uses Groovy's XmlUtil
// to write slurped XML back out to a file with the provided name. The
// first argument this script expects is the path/name of the XML file to be
// slurped and the second argument expected by this script is the path/name of
// the file to which the XML should be saved/written.

if (args.length < 2)
{
   println "USAGE: groovy slurpAndSaveXml.groovy <xmlFile> <outputFile>"
}

String xmlFileName = args[0]
String outputFileName = args[1]
xml = new XmlSlurper().parse(xmlFileName)

import groovy.xml.XmlUtil
XmlUtil xmlUtil = new XmlUtil()
xmlUtil.serialize(xml, new FileWriter(new File(outputFileName)))

The examples in this post have demonstrated writing/serializing XML slurped into GPathResult objects through the use of XmlUtil methods. The XmlUtil class also provides methods for serializing the groovy.util.Node instances that XmlParser provides. The three methods accepting an instance of Node are similar to the three methods above for instances of GPathResult. Similarly, other methods of XmlUtil provide similar support for XML represented as instances of org.w3c.dom.Element, Java String, and groovy.lang.Writable.

The XmlNodePrinter class covered in my previous blog post can be used to serialize XmlParser's parsed XML represented as a Node. The XmlUtil class also can be used to serialize XmlParser's parsed XML represented as a Node but offers the additional advantage of being able to serialize XmlSlurper's slurped XML represented as a GPathResult.

Thursday, February 12, 2015

A JAXB Nuance: String Versus Enum from Enumerated Restricted XSD String

Although Java Architecture for XML Binding (JAXB) is fairly easy to use in nominal cases (especially since Java SE 6), it also presents numerous nuances. Some of the common nuances are due to the inability to exactly match (bind) XML Schema Definition (XSD) types to Java types. This post looks at one specific example of this that also demonstrates how different XSD constructs that enforce the same XML structure can lead to different Java types when the JAXB compiler generates the Java classes.

The next code listing, for Food.xsd, defines a schema for food types. The XSD mandates that valid XML will have a root element called "Food" with three nested elements "Vegetable", "Fruit", and "Dessert". Although the approach used to specify the "Vegetable" and "Dessert" elements is different than the approach used to specify the "Fruit" element, both approaches result in similar "valid XML." The "Vegetable" and "Dessert" elements are declared directly as elements of the prescribed simpleTypes defined later in the XSD. The "Fruit" element is defined via reference (ref=) to another defined element that consists of a simpleType.

Food.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dustin="http://marxsoftware.blogspot.com/foodxml"
           targetNamespace="http://marxsoftware.blogspot.com/foodxml"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

   <xs:element name="Food">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Vegetable" type="dustin:Vegetable" />
            <xs:element ref="dustin:Fruit" />
            <xs:element name="Dessert" type="dustin:Dessert" />
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <!--
        Direct simple type that restricts xs:string will become enum in
        JAXB-generated Java class.
   -->
   <xs:simpleType name="Vegetable">
      <xs:restriction base="xs:string">
         <xs:enumeration value="Carrot"/>
         <xs:enumeration value="Squash"/>
         <xs:enumeration value="Spinach"/>
         <xs:enumeration value="Celery"/>
      </xs:restriction>
   </xs:simpleType>

   <!--
        Simple type that restricts xs:string but is wrapped in xs:element
        (making it an Element rather than a SimpleType) will become Java
        String in JAXB-generated Java class for Elements that reference it.
   -->
   <xs:element name="Fruit">
      <xs:simpleType>
         <xs:restriction base="xs:string">
            <xs:enumeration value="Watermelon"/>
            <xs:enumeration value="Apple"/>
            <xs:enumeration value="Orange"/>
            <xs:enumeration value="Grape"/>
         </xs:restriction>
      </xs:simpleType>
   </xs:element>

   <!--
        Direct simple type that restricts xs:string will become enum in
        JAXB-generated Java class.        
   -->
   <xs:simpleType name="Dessert">
      <xs:restriction base="xs:string">
         <xs:enumeration value="Pie"/>
         <xs:enumeration value="Cake"/>
         <xs:enumeration value="Ice Cream"/>
      </xs:restriction>
   </xs:simpleType>

</xs:schema>

Although Vegetable and Dessert elements are defined in the schema differently than Fruit, the resulting valid XML is the same. A valid XML file is shown next in the code listing for food1.xml.

food1.xml
<?xml version="1.0" encoding="utf-8"?>
<Food xmlns="http://marxsoftware.blogspot.com/foodxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <Vegetable>Spinach</Vegetable>
   <Fruit>Watermelon</Fruit>
   <Dessert>Pie</Dessert>
</Food>

At this point, I'll use a simple Groovy script to validate the above XML against the above XSD. The code for this Groovy XML validation script (validateXmlAgainstXsd.groovy) is shown next.

validateXmlAgainstXsd.groovy
#!/usr/bin/env groovy

// validateXmlAgainstXsd.groovy
//
// Accepts paths/names of two files. The first is the XML file to be validated
// and the second is the XSD against which to validate that XML.

if (args.length < 2)
{
   println "USAGE: groovy validateXmlAgainstXsd.groovy <xmlFile> <xsdFile>"
   System.exit(-1)
}

String xml = args[0]
String xsd = args[1]

import javax.xml.validation.Schema
import javax.xml.validation.SchemaFactory
import javax.xml.validation.Validator

try
{
   SchemaFactory schemaFactory =
      SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI)
   Schema schema = schemaFactory.newSchema(new File(xsd))
   Validator validator = schema.newValidator()
   validator.validate(new javax.xml.transform.stream.StreamSource(xml))
}
catch (Exception exception)
{
   println "\nERROR: Unable to validate ${xml} against ${xsd} due to '${exception}'\n"
   System.exit(-1)
}
println "\nXML file ${xml} validated successfully against ${xsd}.\n"

The next screen snapshot demonstrates running the above Groovy XML validation script against food1.xml and Food.xsd.

The objective of this post so far has been to show how different approaches in an XSD can lead to the same XML being valid. Although these different XSD approaches prescribe the same valid XML, they lead to different Java class behavior when JAXB is used to generate classes based on the XSD. The next screen snapshot demonstrates running the JDK-provided JAXB xjc compiler against the Food.xsd to generate the Java classes.

The output from the JAXB generation shown above indicates that Java classes were created for the "Vegetable" and "Dessert" elements but not for the "Fruit" element. This is because "Vegetable" and "Dessert" were defined differently than "Fruit" in the XSD. The next code listing is for the Food.java class generated by the xjc compiler. From this we can see that the generated Food.java class references specific generated Java types for Vegetable and Dessert, but references simply a generic Java String for Fruit.

Food.java (generated by JAXB jxc compiler)
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a> 
// Any modifications to this file will be lost upon recompilation of the source schema. 
// Generated on: 2015.02.11 at 10:17:32 PM MST 
//


package com.blogspot.marxsoftware.foodxml;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlSchemaType;
import javax.xml.bind.annotation.XmlType;


/**
 * <p>Java class for anonymous complex type.
 * 
 * <p>The following schema fragment specifies the expected content contained within this class.
 * 
 * <pre>
 * <complexType>
 *   <complexContent>
 *     <restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
 *       <sequence>
 *         <element name="Vegetable" type="{http://marxsoftware.blogspot.com/foodxml}Vegetable"/>
 *         <element ref="{http://marxsoftware.blogspot.com/foodxml}Fruit"/>
 *         <element name="Dessert" type="{http://marxsoftware.blogspot.com/foodxml}Dessert"/>
 *       </sequence>
 *     </restriction>
 *   </complexContent>
 * </complexType>
 * </pre>
 * 
 * 
 */
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "vegetable",
    "fruit",
    "dessert"
})
@XmlRootElement(name = "Food")
public class Food {

    @XmlElement(name = "Vegetable", required = true)
    @XmlSchemaType(name = "string")
    protected Vegetable vegetable;
    @XmlElement(name = "Fruit", required = true)
    protected String fruit;
    @XmlElement(name = "Dessert", required = true)
    @XmlSchemaType(name = "string")
    protected Dessert dessert;

    /**
     * Gets the value of the vegetable property.
     * 
     * @return
     *     possible object is
     *     {@link Vegetable }
     *     
     */
    public Vegetable getVegetable() {
        return vegetable;
    }

    /**
     * Sets the value of the vegetable property.
     * 
     * @param value
     *     allowed object is
     *     {@link Vegetable }
     *     
     */
    public void setVegetable(Vegetable value) {
        this.vegetable = value;
    }

    /**
     * Gets the value of the fruit property.
     * 
     * @return
     *     possible object is
     *     {@link String }
     *     
     */
    public String getFruit() {
        return fruit;
    }

    /**
     * Sets the value of the fruit property.
     * 
     * @param value
     *     allowed object is
     *     {@link String }
     *     
     */
    public void setFruit(String value) {
        this.fruit = value;
    }

    /**
     * Gets the value of the dessert property.
     * 
     * @return
     *     possible object is
     *     {@link Dessert }
     *     
     */
    public Dessert getDessert() {
        return dessert;
    }

    /**
     * Sets the value of the dessert property.
     * 
     * @param value
     *     allowed object is
     *     {@link Dessert }
     *     
     */
    public void setDessert(Dessert value) {
        this.dessert = value;
    }

}

The advantage of having specific Vegetable and Dessert classes is the additional type safety they bring as compared to a general Java String. Both Vegetable.java and Dessert.java are actually enums because they come from enumerated values in the XSD. The two generated enums are shown in the next two code listings.

Vegetable.java (generated with JAXB xjc compiler)
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a> 
// Any modifications to this file will be lost upon recompilation of the source schema. 
// Generated on: 2015.02.11 at 10:17:32 PM MST 
//


package com.blogspot.marxsoftware.foodxml;

import javax.xml.bind.annotation.XmlEnum;
import javax.xml.bind.annotation.XmlEnumValue;
import javax.xml.bind.annotation.XmlType;


/**
 * <p>Java class for Vegetable.
 * 
 * <p>The following schema fragment specifies the expected content contained within this class.
 * <p>
 * <pre>
 * <simpleType name="Vegetable">
 *   <restriction base="{http://www.w3.org/2001/XMLSchema}string">
 *     <enumeration value="Carrot"/>
 *     <enumeration value="Squash"/>
 *     <enumeration value="Spinach"/>
 *     <enumeration value="Celery"/>
 *   </restriction>
 * </simpleType>
 * </pre>
 * 
 */
@XmlType(name = "Vegetable")
@XmlEnum
public enum Vegetable {

    @XmlEnumValue("Carrot")
    CARROT("Carrot"),
    @XmlEnumValue("Squash")
    SQUASH("Squash"),
    @XmlEnumValue("Spinach")
    SPINACH("Spinach"),
    @XmlEnumValue("Celery")
    CELERY("Celery");
    private final String value;

    Vegetable(String v) {
        value = v;
    }

    public String value() {
        return value;
    }

    public static Vegetable fromValue(String v) {
        for (Vegetable c: Vegetable.values()) {
            if (c.value.equals(v)) {
                return c;
            }
        }
        throw new IllegalArgumentException(v);
    }

}
Dessert.java (generated with JAXB xjc compiler)
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a> 
// Any modifications to this file will be lost upon recompilation of the source schema. 
// Generated on: 2015.02.11 at 10:17:32 PM MST 
//


package com.blogspot.marxsoftware.foodxml;

import javax.xml.bind.annotation.XmlEnum;
import javax.xml.bind.annotation.XmlEnumValue;
import javax.xml.bind.annotation.XmlType;


/**
 * <p>Java class for Dessert.
 * 
 * <p>The following schema fragment specifies the expected content contained within this class.
 * <p>
 * <pre>
 * <simpleType name="Dessert">
 *   <restriction base="{http://www.w3.org/2001/XMLSchema}string">
 *     <enumeration value="Pie"/>
 *     <enumeration value="Cake"/>
 *     <enumeration value="Ice Cream"/>
 *   </restriction>
 * </simpleType>
 * </pre>
 * 
 */
@XmlType(name = "Dessert")
@XmlEnum
public enum Dessert {

    @XmlEnumValue("Pie")
    PIE("Pie"),
    @XmlEnumValue("Cake")
    CAKE("Cake"),
    @XmlEnumValue("Ice Cream")
    ICE_CREAM("Ice Cream");
    private final String value;

    Dessert(String v) {
        value = v;
    }

    public String value() {
        return value;
    }

    public static Dessert fromValue(String v) {
        for (Dessert c: Dessert.values()) {
            if (c.value.equals(v)) {
                return c;
            }
        }
        throw new IllegalArgumentException(v);
    }

}

Having enums generated for the XML elements ensures that only valid values for those elements can be represented in Java.

Conclusion

JAXB makes it relatively easy to map Java to XML, but because there is not a one-to-one mapping between Java and XML types, there can be some cases where the generated Java type for a particular XSD prescribed element is not obvious. This post has shown how two different approaches to building an XSD to enforce the same basic XML structure can lead to very different results in the Java classes generated with the JAXB xjc compiler. In the example shown in this post, declaring elements in the XSD directly on simpleTypes restricting XSD's string to a specific set of enumerated values is preferable to declaring elements as references to other elements wrapping a simpleType of restricted string enumerated values because of the type safety that is achieved when enums are generated rather than use of general Java Strings.

Monday, February 9, 2015

Writing Groovy's groovy.util.Node (XmlParser) Content as XML

Groovy's XmlParser makes it easy to parse an XML file, XML input stream, or XML string using one its overloaded parse methods (or parseText in the case of the String). The XML content parsed with any of these methods is made available as a groovy.util.Node instance. This blog post describes how to make the return trip and write the content of the Node back to XML in alternative formats such as a File or a String.

Groovy's MarkupBuilder provides a convenient approach for generating XML from Groovy code. For example, I demonstrated writing XML based on SQL query results in the post GroovySql and MarkupBuilder: SQL-to-XML. However, when one wishes to write/serialize XML from a Groovy Node, an easy and appropriate approach is to use XmlNodePrinter as demonstrated in Updating XML with XmlParser.

The next code listing, parseAndPrintXml.groovy demonstrates use of XmlParser to parse XML from a provided file and use of XmlNodePrinter to write that Node parsed from the file to standard output as XML.

parseAndPrintXml.groovy : Writing XML to Standard Output
#!/usr/bin/env groovy

// parseAndPrintXml.groovy
//
// Uses Groovy's XmlParser to parse provided XML file and uses Groovy's
// XmlNodePrinter to print the contents of the Node parsed from the XML with
// XmlParser to standard output.

if (args.length < 1)
{
   println "USAGE: groovy parseAndPrintXml.groovy <XMLFile>"
   System.exit(-1)
}

XmlParser xmlParser = new XmlParser()
Node xml = xmlParser.parse(new File(args[0]))
XmlNodePrinter nodePrinter = new XmlNodePrinter(preserveWhitespace:true)
nodePrinter.print(xml)

Putting aside the comments and code for checking command line arguments, there are really 4 lines (lines 15-18) in the above code listing of significance to this discussion. These four lines demonstrate instantiating an XmlParser (line 15), using the instance of XmlParser to "parse" a File instance based on a provided argument file name (line 16), instantiating an XmlNodePrinter (line 17), and using that XmlNodePrinter instance to "print" the parsed XML to standard output (line 18).

Although writing XML to standard output can be useful for a user to review or to redirect output to another script or tool, there are times when it is more useful to have access to the parsed XML as a String. The next code listing is just a bit more involved than the last one and demonstrates use of XmlNodePrinter to write the parsed XML contained in an Node instance as a Java String.

parseXmlToString.groovy : Writing XML to Java String
#!/usr/bin/env groovy

// parseXmlToString.groovy
//
// Uses Groovy's XmlParser to parse provided XML file and uses Groovy's
// XmlNodePrinter to write the contents of the Node parsed from the XML with
// XmlParser into a Java String.

if (args.length < 1)
{
   println "USAGE: groovy parseXmlToString.groovy <XMLFile>"
   System.exit(-1)
}

XmlParser xmlParser = new XmlParser()
Node xml = xmlParser.parse(new File(args[0]))
StringWriter stringWriter = new StringWriter()
XmlNodePrinter nodePrinter = new XmlNodePrinter(new PrintWriter(stringWriter))
nodePrinter.setPreserveWhitespace(true)
nodePrinter.print(xml)
String xmlString = stringWriter.toString()
println "XML as String:\n${xmlString}"

As the just-shown code listing demonstrates, one can instantiate an instance of XmlNodePrinter that writes to a PrintWriter that was instantiated with a StringWriter. This StringWriter ultimately makes the XML available as a Java String.

Writing XML from a groovy.util.Node to a File is very similar to writing it to a String with a FileWriter used instead of a StringWriter. This is demonstrated in the next code listing.

parseAndSaveXml.groovy : Write XML to File
#!/usr/bin/env groovy

// parseAndSaveXml.groovy
//
// Uses Groovy's XmlParser to parse provided XML file and uses Groovy's
// XmlNodePrinter to write the contents of the Node parsed from the XML with
// XmlParser to file with provided name.

if (args.length < 2)
{
   println "USAGE: groovy parseAndSaveXml.groovy <sourceXMLFile> <targetXMLFile>"
   System.exit(-1)
}

XmlParser xmlParser = new XmlParser()
Node xml = xmlParser.parse(new File(args[0]))
FileWriter fileWriter = new FileWriter(args[1])
XmlNodePrinter nodePrinter = new XmlNodePrinter(new PrintWriter(fileWriter))
nodePrinter.setPreserveWhitespace(true)
nodePrinter.print(xml)

I don't show it in this post, but the value of being able to write a Node back out as XML often comes after modifying that Node instance. Updating XML with XmlParser demonstrates the type of functionality that can be performed on a Node before serializing the modified instance back out.

Monday, February 2, 2015

Recent (Late January 2015) Java Posts of Interest

Java is approaching its 20th birthday (initial version was announced on 23 May 1995 at SunWorld) and continues to enjoy advances in terms of the language and platform and in terms of developers' understanding how to use the language and platform most effectively. In this post, I look at some of the recent blog posts on Java that I have found to be particularly useful or interesting.

Last Publicly Available Java 7 Critical Patch Update

Lucian Constantin's Critical Java updates fix 19 vulnerabilities, disable SSL 3.0 states that because "future Java 7 security patches will not be publicly available," users "should migrate to Java 8." Constantin's article summarizes some specific details on the security patches associated with this Critical Patch Update.

Cleaning Up Those Pesky javac lint Warnings

Without careful attention to warnings generated with javac lint, the number of these warnings can become so large on long-term projects that it can be difficult to even notice when new javac lint warnings appear. It can also be daunting to address these when there is an overwhelming number of the warnings present. Joseph D. Darcy's post Advice on removing javac lint warnings describes a detailed approach for working off javac lint warnings in an orderly manner and how to resolve different types of these warnings. The steps of the outlined approach and the approaches to use with different types of warnings are based on Darcy's experiences "fixing and reviewing fixes of javac lint warnings in the JDK 9 code base ... over the last twelve months or so."

Advanced Java Debugging Tips

Tal Weiss's 5 Advanced Java Debugging Techniques Every Developer Should Know About presents ideas with detailed descriptions for improving debugging of Java-based applications. These include improving jstack output by use of explicit thread names; detecting deadlock programatically; using BTrace to get the value of any variable from any point in the code from a live JVM, without attaching a debugger or redeploying code;" and building native JVM agents.

Conclusion

Although Java has been around for almost 20 years, the the platform and the language continue to evolve and advance and developers' insight into using these tools advances with them. The three posts highlighted here are examples of that.