Saturday, May 19, 2018

Predicate::not Coming to Java

Jim Laskey's recent message "RFR: CSR - JDK-8203428 Predicate::not" on the OpenJDK core-libs-dev mailing list calls out JDK Bug JDK-8203428 ["Predicate::not"]. The "Summary" of JDK-8203428 states, "Introduce a new static method Predicate::not which will allow developers to negate predicate lambdas trivially." It is currently assigned to JDK 11.

The "Problem" section of JDK-8203428 provides a succinct description of the issue that Predicate::not addresses:

The requirement for predicate negation occurs frequently since predicates are defined antipodal to a positive selection; isNull, isEmpty, isBlank.

Presently there is no easy way to negate a predicate lambda without first wrapping in a Predicate Object.

There is a highly illustrative example of how this would work in the JDK-8203428 write-up. The "Problem" section of JDK-8203428 provides code that demonstrates how "predicate negation" would be performed today and the "Solution" section provides code demonstrating how the same functionality could be implemented with the proposed static method Predicate::not.
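As a rough illustration of the difference (this is a minimal sketch, not the code from the JDK-8203428 write-up), the following contrasts the cast-and-negate approach available today with the proposed static method. It assumes the method lands as `Predicate.not(Predicate)` and therefore only compiles on a JDK that includes it; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; not taken from the bug report.
final class PredicateNotDemo {

    // Today: the method reference must first be given a Predicate type
    // (here via a cast) before negate() can be called on it.
    static List<String> nonEmptyWithNegate(List<String> values) {
        return values.stream()
                .filter(((Predicate<String>) String::isEmpty).negate())
                .collect(Collectors.toList());
    }

    // Proposed: the static method accepts the method reference directly.
    static List<String> nonEmptyWithNot(List<String> values) {
        return values.stream()
                .filter(Predicate.not(String::isEmpty))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("alpha", "", "beta");
        System.out.println(nonEmptyWithNegate(values));
        System.out.println(nonEmptyWithNot(values));
    }
}
```

With a static import of `Predicate.not`, the filter call shortens further to `filter(not(String::isEmpty))`.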

There are some other interesting messages in this mailing list thread. A Brian Goetz message in the thread states that "we did discover that default methods on [functional interfaces] combined with subtyping of [functional interfaces] caused trouble. But static methods are fine." A Remi Forax message in the thread states that "stackoverflow has already decided that Predicate.not was the right method." A Sundararajan Athijegannathan message in the thread points out that "not(String::isEmpty) reads almost like !str.isEmpty()".

The addition of static function not(Predicate<T>) to Predicate is a small thing, but should improve the fluency of many lines of Java code.

Monday, May 14, 2018

Three New JEPs Targeted for JDK 11

Three new JEPs were targeted for JDK 11 a week ago today (7 May 2018). In three separate messages on the jdk-dev mailing list, Mark Reinhold made the following announcements:

JEP 324: Key Agreement with Curve25519 and Curve448

The "Summary" section of JEP 324 ("Key Agreement with Curve25519 and Curve448") states, "Implement key agreement using Curve25519 and Curve448 as described in RFC 7748." The Curve25519 entry on Wikipedia has an opening paragraph that makes it clear why this particular elliptic curve is well-suited as an addition to the JDK. It states that "Curve25519 is an elliptic curve offering 128 bits of security" that is "designed for use with the elliptic curve Diffie–Hellman (ECDH) key agreement scheme and is one of the fastest ECC curves and is not covered by any known patents." It adds that "the reference implementation is public domain software."

D. J. Bernstein provides a more specific and approachable summary of the value of Curve25519: "Given a user's 32-byte secret key, Curve25519 computes the user's 32-byte public key. Given the user's 32-byte secret key and another user's 32-byte public key, Curve25519 computes a 32-byte secret shared by the two users. This secret can then be used to authenticate and encrypt messages between the two users."

RFC 7748 ("Elliptic Curves for Security") is a memo provided by the Internet Research Task Force (IRTF) that "specifies two elliptic curves [curve25519 and curve448] over prime fields that offer a high level of practical security in cryptographic applications, including Transport Layer Security (TLS)" and that are "intended to operate at the ~128-bit and ~224-bit security level, respectively, and are generated deterministically based on a list of required properties."

The "primary goal" of JEP 324 is to provide "an API and an implementation for [the RFC 7748] standard," and two additional goals are also spelled out. One of the additional goals is to provide "a platform-independent, all-Java implementation with better performance than the existing ECC (native C) code at the same security strength."

Current plans for JEP 324 only involve an RFC implementation in the SunEC elliptic curve cryptography provider. However, with an API provided ("XDH"), it seems possible for other ECC providers to implement RFC 7748 as desired. JEP 324 currently adds this important note: "This new library will be in an internal JDK package, and will only be used by new crypto algorithms."
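To make the key agreement described above more concrete, here is a minimal sketch of two parties deriving the same shared secret. It assumes the API is exposed through the standard `KeyPairGenerator` and `KeyAgreement` classes under the algorithm name "XDH" with a `NamedParameterSpec` of "X25519", which is how it appears in JDK 11 builds that include JEP 324. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.spec.NamedParameterSpec;
import java.util.Arrays;
import javax.crypto.KeyAgreement;

// Hypothetical demo class illustrating X25519 key agreement.
final class XdhKeyAgreementDemo {

    // Generates a key pair on Curve25519.
    static KeyPair newX25519KeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("XDH");
        generator.initialize(new NamedParameterSpec("X25519"));
        return generator.generateKeyPair();
    }

    // Combines our private key with the peer's public key.
    static byte[] sharedSecret(KeyPair own, PublicKey peer) throws GeneralSecurityException {
        KeyAgreement agreement = KeyAgreement.getInstance("XDH");
        agreement.init(own.getPrivate());
        agreement.doPhase(peer, true);
        return agreement.generateSecret();
    }

    // Demo convenience: true if both sides derive the same 32-byte secret.
    static boolean bothSidesAgree() {
        try {
            KeyPair alice = newX25519KeyPair();
            KeyPair bob = newX25519KeyPair();
            byte[] aliceSecret = sharedSecret(alice, bob.getPublic());
            byte[] bobSecret = sharedSecret(bob, alice.getPublic());
            return aliceSecret.length == 32 && Arrays.equals(aliceSecret, bobSecret);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Both sides agree: " + bothSidesAgree());
    }
}
```

In practice the raw secret would normally be passed through a key derivation function before being used as an encryption key; that step is omitted here.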

JEP 327: Unicode 10

JEP 327 ("Unicode 10") provides a straightforward "Summary": "Upgrade existing platform APIs to support version 10.0 of the Unicode Standard." The WikiBooks entry "Unicode/Versions" lists each major version of Unicode from Unicode 1.0 through Unicode 12.0. Unicode 10 was released last summer and Unicode 11 is planned for next month.

An early history/mapping of Java versions to Unicode versions is provided in "Unicode Versions Supported in Java-History," which shows Unicode 1.1.5 associated with JDK 1.0 through Unicode 6.0 associated with JDK 7. Unicode 6.2.0 was supported by JDK 8, Unicode 8.0.0 was supported by JDK 9 and JDK 10, so JDK 11 will add support for both Unicode 9.0 and Unicode 10.0.

JEP 327 explicitly lists "four related Unicode specifications" (Unicode Technical Standards) that "will not be implemented" as part of JEP 327. These are UTS #10 ("Unicode Collation Algorithm"), UTS #39 ("Unicode Security Mechanisms"), UTS #46 ("Unicode IDNA Compatibility Processing"), and UTS #51 ("Unicode Emoji").

JEP 328: Flight Recorder

JEP 328 ("Flight Recorder") aims to "provide a low-overhead data collection framework for troubleshooting Java applications and the HotSpot JVM." In its "Motivation" section, this JEP states, "Flight Recorder records events originating from applications, the JVM and the OS. Events are stored in a single file that can be attached to bug reports and examined by support engineers, allowing after-the-fact analysis of issues in the period leading up to a problem. Tools can use an API to extract information from recording files."
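For a sense of what an application-originated event might look like, the following is a minimal sketch that assumes the event API lives in the `jdk.jfr` package (`Event`, `@Name`, `@Label`), as it does in JDK 11 and later. The event name, its fields, and the surrounding class are hypothetical. When no recording is active, committing the event is effectively a no-op.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical demo class showing a custom Flight Recorder event.
final class JfrEventDemo {

    // A custom event; its fields become columns in the recording.
    @Name("demo.SumComputed")
    @Label("Sum Computed")
    static class SumComputed extends Event {
        @Label("Upper Bound")
        int upperBound;

        @Label("Result")
        int result;
    }

    // Sums 1..n and records the work as a single timed event.
    static int timedSum(int n) {
        SumComputed event = new SumComputed();
        event.begin();
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        event.upperBound = n;
        event.result = sum;
        event.commit(); // ends the event and writes it if a recording is active
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(timedSum(100));
    }
}
```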

The current Java Mission Control page describes Java Flight Recorder and its relationship to Java Mission Control:

Java Flight Recorder and Java Mission Control together create a complete tool chain to continuously collect low level and detailed runtime information enabling after-the-fact incident analysis. Java Flight Recorder is a profiling and event collection framework built into the Oracle JDK. It allows Java administrators and developers to gather detailed low level information about how the Java Virtual Machine (JVM) and the Java application are behaving. Java Mission Control is an advanced set of tools that enables efficient and detailed analysis of the extensive of data collected by Java Flight Recorder. The tool chain enables developers and administrators to collect and analyze data from Java applications running locally or deployed in production environments.

This JEP is another step in achieving the announcement made last September that "Oracle will also open source commercial features such as Java Flight Recorder previously only available in the Oracle JDK." Marcus Hirt announced just this month that Java Mission Control has been open sourced with repositories available at http://hg.openjdk.java.net/jmc.

Summary

The three JEPs highlighted in this post bring the number of JEPs currently associated with JDK 11 to a total of eight.

Saturday, May 12, 2018

Java's @Serial Annotation

The JDK may be getting another standard (predefined) annotation with JDK 11: @Serial. JDK-8202385 ["Annotation to mark serial-related fields and methods"] aims to add "some kind of 'SerialRelated' annotation to facilitate automated checking of the declarations of serial fields and methods." The idea is to better indicate to a developer when a serialization-related field or method is misspelled, similar to the way that "the java.lang.Override annotation type is used to signal the compiler should verify the method is in fact overridden."

Joe Darcy recently requested review of the "webrev" (proposed code addition). This provides a peek at what the new @Serial might look like. The current proposal is for this annotation definition to reside in the java.io package, to be targeted at specific methods or fields, and to have SOURCE retention.

The Javadoc comments for the proposed definition of @Serial currently provide significant detail on how to use this annotation. This Javadoc also explicitly specifies which methods and fields are anticipated to be annotated with @Serial: writeObject(), readObject(), readObjectNoData(), writeReplace(), readResolve(), ObjectStreamField[], and serialVersionUID.
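The sketch below shows roughly how such an annotation might be applied. Because the annotation is only proposed, the code declares a hypothetical local stand-in named `Serial` that mirrors the shape described above (SOURCE retention, targeted at methods and fields); it is not the webrev's definition, and the `Temperature` class is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class SerialAnnotationDemo {

    // Hypothetical stand-in for the proposed annotation.
    @Retention(RetentionPolicy.SOURCE)
    @Target({ElementType.METHOD, ElementType.FIELD})
    @interface Serial {}

    static final class Temperature implements Serializable {
        @Serial
        private static final long serialVersionUID = 1L;

        private final double celsius;
        private transient double fahrenheit; // derived, so not serialized

        Temperature(double celsius) {
            this.celsius = celsius;
            this.fahrenheit = celsius * 9 / 5 + 32;
        }

        double fahrenheit() {
            return fahrenheit;
        }

        // A misspelled name or wrong signature here would silently be ignored
        // by serialization; the annotation marks the intent so a lint check
        // could catch such mistakes.
        @Serial
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            fahrenheit = celsius * 9 / 5 + 32;
        }
    }

    // Serializes and deserializes a value in memory.
    static Temperature roundTrip(Temperature original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(100.0)).fahrenheit());
    }
}
```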

The proposed @Serial annotation will be checked when the javac "serial" lint check is executed. This is described in Darcy's e-mail request for review:

The proposed java.io.Serial annotation type is intended to be used along with an augmented implementation of javac's "serial" lint check; that work will be done separately as part of JDK-8202056: "Expand serial warning to check for bad overloads of serial-related methods".

It's interesting to note that the name of this annotation is not necessarily finalized, though it seems likely to stick. Darcy's e-mail message points out that alternate names such as @Serialize and @SerialRelated could also be used.

An interesting distinction is that the @Serial annotation cannot or should not be used with certain methods and certain fields of the Externalizable interface (extends Serializable) because those methods and fields are not used in Externalizable. More details on this distinction are available in the core-libs-dev messages 053060, 053061, 053064, and 053067.

The @Serial annotation is not officially scheduled for JDK 11 as of this writing, but it appears likely that it could be available in time for the JDK 11 release given the recent progress of JDK-8202385. Besides the potential usefulness of this annotation to those implementing custom serialization, this annotation's definition will provide another example of how any custom annotation can be documented to allow it to be used correctly.

Tuesday, May 1, 2018

New Methods on Java String with JDK 11

It appears likely that Java's String class will be gaining some new methods with JDK 11, expected to be released in September 2018.

| BUG # | BUG TITLE | NEW String METHOD | DESCRIPTION |
|---|---|---|---|
| JDK-8200425 | String::lines | lines() | "String instance method that uses a specialized Spliterator to lazily provide lines from the source string." |
| JDK-8200378 | String::strip, String::stripLeading, String::stripTrailing | strip() | "Unicode-aware" evolution of trim() |
| | | stripLeading() | "removal of Unicode white space from the beginning" |
| | | stripTrailing() | "removal of Unicode white space from the ... end" |
| JDK-8200437 | String::isBlank | isBlank() | "instance method that returns true if the string is empty or contains only white space" |

Evidence of the progress that has been made related to these methods can be found in messages requesting "compatibility and specification reviews" (CSR) on the core-libs-dev mailing list.

A common characteristic of four of these five new methods is that they use a different (newer) definition of "whitespace" than did old methods such as String.trim(). Bug JDK-8200373 ["String::trim JavaDoc should clarify meaning of space"] even addresses this for the String.trim() method (mailing list review request):

The current JavaDoc for String::trim does not make it clear which definition of "space" is being used in the code. With additional trimming methods coming in the near future that use a different definition of space, clarification is imperative. String::trim uses the definition of space as any codepoint that is less than or equal to the space character codepoint ([\u0020].) Newer trimming methods will use the definition of (white) space as any codepoint that returns true when passed to the Character::isWhitespace predicate.
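The practical effect of the two definitions can be seen with a character such as U+2003 (EM SPACE), which lies above U+0020 but is reported as white space by Character.isWhitespace. The following minimal sketch assumes the proposed methods land with the names listed in the table above (it compiles only on a JDK that includes them); the class and helper names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting trim() with the proposed methods.
final class StringWhitespaceDemo {

    // U+2003 EM SPACE on both sides: trim() leaves it, strip() removes it.
    static final String PADDED = "\u2003hello\u2003";

    // Collects the lazily produced lines into a list.
    static List<String> linesOf(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(PADDED.trim().length());          // still includes both EM SPACEs
        System.out.println(PADDED.strip().length());         // EM SPACEs removed
        System.out.println(PADDED.stripLeading().length());  // only the leading one removed
        System.out.println(PADDED.stripTrailing().length()); // only the trailing one removed
        System.out.println("\u2003 \t".isBlank());           // only white space
        System.out.println(linesOf("a\nb\r\nc"));            // split on line terminators
    }
}
```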

The method isWhitespace(char) was added to Character with JDK 1.1, but the method isWhitespace(int) was not introduced to the Character class until JDK 1.5. The latter method (the one accepting a parameter of type int) was added to support supplementary characters. The Javadoc comments for the Character class define supplementary characters (typically modeled with an int-based "code point") versus BMP characters (typically modeled with a single char):

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values ... A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. ... The methods that only accept a char value cannot support supplementary characters. ... The methods that accept an int value support all Unicode characters, including supplementary characters.

The above quote underscores the significance of a "code point," which is defined for the Java context as "a value that can be used in a coded character set". Four of the five proposed new methods for String in JDK 11 rely heavily on the concept embodied in Character.isWhitespace(int) to determine how to "trim" a given string or to determine whether a given string is "blank."
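A small sketch may help show why the int-based overloads matter. It uses U+10400 (DESERET CAPITAL LETTER LONG I), a supplementary character, to contrast char-based and code-point-based inspection; the class and helper names are hypothetical.

```java
// Hypothetical demo class contrasting char values with int code points.
final class CodePointDemo {

    // A supplementary character, outside the Basic Multilingual Plane.
    static final int DESERET_LONG_I = 0x10400;

    // Builds a String holding exactly one code point.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Counts white space by code point, using Character.isWhitespace(int).
    static long countWhitespace(String text) {
        return text.codePoints().filter(Character::isWhitespace).count();
    }

    public static void main(String[] args) {
        String letter = asString(DESERET_LONG_I);
        // One code point, but two UTF-16 char values (a surrogate pair).
        System.out.println(letter.length());
        System.out.println(letter.codePointCount(0, letter.length()));
        // The char overload sees only a surrogate; the int overload sees the letter.
        System.out.println(Character.isLetter(letter.charAt(0)));
        System.out.println(Character.isLetter(letter.codePointAt(0)));
        System.out.println(countWhitespace("a \t\u2003b"));
    }
}
```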

Speaking of Unicode, JEP 327 ["Unicode 10"] has been proposed to be added to JDK 11 as well. As that JEP states, its intent is to "upgrade existing platform APIs to support version 10.0 of the Unicode Standard."

Conclusion

The new methods on String currently proposed for JDK 11 provide a more consistent approach to handling white space in strings that can better handle internationalization, provide methods for trimming whitespace only at the beginning of the string or at the end of the string, and provide a method especially intended for coming raw string literals.

Monday, April 30, 2018

Faster Repeated Access to Java Class Names Coming to Java?

Claes Redestad has posted the message "RFR: 8187123: (reflect) Class#getCanonicalName and Class#getSimpleName is a part of performance issue" on the core-libs-dev mailing list in which he requests review of a proposed change "to enable caching of getCanonicalName and getSimpleName, repeated calls of which has been reported to be a performance bottleneck." He adds that "the caching improves performance of these methods by up to 20x."

An obvious solution to the performance issue might have been to add the name of the class as a field to the Class class definition, but Redestad points out in the associated bug JDK-8187123 that "we should avoid adding more fields to java.lang.Class." Instead, this bug was addressed by the idea to "piggy back off other reflection information that is cached in ReflectionData."

ReflectionData is a nested (private static) class defined within the Class class. The Class class's reference to ReflectionData is defined as:

    private volatile transient SoftReference<ReflectionData<T>> reflectionData;

The Class instance holds a soft reference (java.lang.ref.SoftReference) to the instance of nested class ReflectionData. The class-level Javadoc for SoftReference states that a soft reference is "cleared at the discretion of the garbage collector in response to memory demand" and that a soft reference is "most often used to implement memory-sensitive caches." This seems like a nice solution to balance performance and memory concerns.

The mailing list message references a link to the proposed changes to Class.java. Reviewing those changes, one can quickly see how the proposed code changes add three new Strings to the attributes contained in a ReflectionData instance to represent canonical name, simple name, and type name. Of course, the three methods that provide access to those details [getCanonicalName(), getSimpleName(), and getTypeName()] are changed to use these values.
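The general pattern of caching a derived name behind a soft reference can be sketched as follows. This is not the proposed change to Class.java; `SoftCachedName` and its nested `Names` holder are hypothetical stand-ins for Class and ReflectionData, and the name computation is deliberately simplified.

```java
import java.lang.ref.SoftReference;

// Hypothetical illustration of a memory-sensitive cache of derived names.
final class SoftCachedName {

    // Holder for lazily computed values, analogous in spirit to ReflectionData.
    private static final class Names {
        String simpleName;
    }

    private final Class<?> type;

    // The garbage collector may clear this under memory pressure.
    private volatile SoftReference<Names> names;

    SoftCachedName(Class<?> type) {
        this.type = type;
    }

    // Returns the cached holder, creating a new one if it was never set or was cleared.
    private Names names() {
        SoftReference<Names> ref = names;
        Names cached = (ref != null) ? ref.get() : null;
        if (cached == null) {
            cached = new Names();
            names = new SoftReference<>(cached);
        }
        return cached;
    }

    // Computes the name on first use and reuses it afterwards. Racing threads
    // may compute it more than once, but they all arrive at the same value.
    String simpleName() {
        Names cached = names();
        String name = cached.simpleName;
        if (name == null) {
            name = computeSimpleName();
            cached.simpleName = name;
        }
        return name;
    }

    // Simplified stand-in for the real, more expensive derivation.
    private String computeSimpleName() {
        String full = type.getName();
        int cut = Math.max(full.lastIndexOf('.'), full.lastIndexOf('$'));
        return full.substring(cut + 1);
    }

    public static void main(String[] args) {
        SoftCachedName cachedName = new SoftCachedName(java.util.Map.Entry.class);
        System.out.println(cachedName.simpleName());
        System.out.println(cachedName.simpleName()); // served from the cache
    }
}
```

If the collector clears the reference, the only cost is recomputing the name on the next call, which is the trade-off between performance and memory that a soft reference offers.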

As of this writing, JDK-8187123 has not been associated with a particular Java release.

Saturday, April 21, 2018

Recent Java Developments - Late April 2018

There have been several recent developments in the Java-sphere this week and I summarize some of them in this post.

The End of JavaOne as We Know It

In the post "JavaOne Event Expands with More Tracks, Languages and Communities – and New Name," Stephen Chin writes, "The JavaOne conference is expanding to create a new, bigger event that’s inclusive to more languages, technologies and developer communities." He adds that it has been renamed to "Oracle Code One" and that this year's edition (the "inaugural year of Oracle Code One") will be held in San Francisco's Moscone West in late October (October 22-25, 2018).

GraalVM: "Run Programs Faster Anywhere"

In the 17 April 2018 post "Announcing GraalVM: Run Programs Faster Anywhere," Thomas Wuerthinger and the GraalVM Team "present the first production-ready release" of "a universal virtual machine designed for a polyglot world" called GraalVM 1.0. GraalVM Community Edition (CE) is open source and is hosted on GitHub. The main GraalVM page describes it as "a universal virtual machine for running applications written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin, and LLVM-based languages such as C and C++."

JavaScript and the JVM-based languages are recommended for production use of GraalVM 1.0 with improved support advertised for other languages in the near future. The GraalVM Downloads page provides for downloads of either the Community Edition (from GitHub) or the Enterprise Edition (EE, from Oracle Technology Network).

Mission Control Project in OpenJDK

Marcus Hirt has proposed "the creation of the Mission Control Project" on the OpenJDK announce mailing list. This seems like a logical step in the effort discussed in Mark Reinhold's message "Accelerating the JDK release cadence" to "open-source the commercial features in order to make the OpenJDK builds more attractive to developers and to reduce the differences between those builds and the Oracle JDK" with the "ultimate goal" of making "OpenJDK and Oracle JDK builds completely interchangeable."

Flight Recorder in OpenJDK

Speaking of commercial features of the Oracle JDK being brought into the OpenJDK, JEP 328 ("Flight Recorder") had some interesting news this month with Markus Gronlund's hotspot-dev mailing list announcement of the availability of "a preview of a large part of the source code for JEP 328 : Flight Recorder."

JEP 321: HTTP Client (Standard) Targeted for JDK 11

As announced late last month, JEP 321 ["HTTP Client (Standard)"] has been targeted for JDK 11.

Significant Progress on Switch Expressions (and Improving Switch Statements)

There has been significant progress in the OpenJDK mailing lists' high-level design of switch expressions that includes enhancements to the existing switch statements since my original post on switch expressions. I have summarized some of the latest discussion (particularly that in a Brian Goetz post) in a recent blog post called "Enhancing Java switch Statement with Introduction of switch Expression."

Should I Return A Collection or Stream?

There's an interesting thread "Should I return a Collection or a Stream?" on the Java sub-reddit that is based on an interesting July 2017 discussion on StackOverflow related to whether it's most appropriate to return a Collection or a Stream in a particular case.

Friday, April 20, 2018

Enhancing Java switch Statement with Introduction of switch Expression

In late December of last year, I posted "Switch Expressions Coming to Java?" Since then, there has been significant discussion, expressed differences of opinion, and now a coalescence of general agreement regarding the future of switch expressions in Java. I have tried to capture some of the major developments related to switch expressions as comments on my December blog post. However, I felt like this week's Brian Goetz message titled "[switch] Further unification on switch" on the amber-spec-observers mailing list warranted a new blog post on Java switch expressions.

Goetz opens his message with a reminder that the end game is not Java switch expressions. Instead, Goetz points out that "switch expressions are supposed to just be an uncontroversial waypoint on the way to the real goal, which is a more expressive and flexible switch construct that works in a wider variety of situations, including supporting patterns, being less hostile to null, use as either an expression or a statement, etc."

Goetz also points out that "switch does come with a lot of baggage" and he points out that "this baggage has produced the predictable distractions in the discussion." Goetz states that "the worst possible outcome ... would be to invent a new construct that is similar to, but not quite the same as switch ... without being a 100% replacement for today's quirky switch." Given that concern, the original proposed switch expression syntax is being discarded because it was leading the discussion toward this "worst possible outcome."

The new switch unification proposal (dubbed "Unification Attempt #2" [UA2]) proposes "that _all_ switches can support either old-style (colon) or new-style (arrow) case labels -- but must stick to one kind of case label in a given switch." This means that a given switch's case labels must all use either the "colon" syntax we're used to today with switch statements or the newly proposed "arrow" syntax, but cannot use both within the same switch.

There are reasons a developer might choose one form over the other ("colon" versus "arrow"). Goetz highlights some advantages of the "arrow" syntax associated with switch's current proposal: "in the all-arrow form, all of the things people hate about switch -- the need to say break, the risk of fallthrough, and the questionable scoping -- all go away."

Goetz, in text, presents how the "structural properties" of the various "switch forms" drive "control flow and scoping rules." This is shown in the following table.

| | STATEMENT ("Nonlocal control flow _out_ of a switch [continue to an enclosing loop, break with label, return]") | EXPRESSION (Totality: return a value) |
|---|---|---|
| COLON (Enables Fall-through) | switch we know and "love", but enhanced | break returns a value like return |
| ARROW (Prevents Fall-through) | "Syntactic shorthand" for Statement/Colon (above) plus: "obviates the annoyance of 'break'"; "implicitly prevents fallthrough of all forms"; "avoids the confusion of current switch scoping" | Arrow (->) points to returned value |

Goetz summarizes what the above table shows with the statement "the colon form gives you the old control flow and the arrow form gives you the new. And either can be used as a statement, or an expression. And no one will be confused by mixing." He also specifically describes the structure in the lower left corner of the table above (switch statement with "arrow" syntax): "Switch statements now come in a simpler (arrow) flavor, where there is no fallthrough, no weird scoping, and no need to say break most of the time. Many switches can be rewritten this way, and this form can even be taught first."
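To make the forms easier to compare, here is a minimal sketch of the same decision written three ways. The arrow syntax, including the comma-separated case labels, is written as it was eventually finalized in Java 14, so the block compiles only on Java 14 or later; the details were still under discussion when Goetz's message was posted, and the enum and method names are hypothetical. The colon form used as an expression is omitted because its value-returning syntax was not yet settled.

```java
// Hypothetical demo class comparing switch forms.
final class SwitchFormsDemo {

    enum Day { SATURDAY, SUNDAY, MONDAY, FRIDAY }

    // Colon form as a statement: break is needed to prevent fallthrough.
    static String colonStatement(Day day) {
        String kind;
        switch (day) {
            case SATURDAY:
            case SUNDAY:
                kind = "weekend";
                break;
            default:
                kind = "weekday";
                break;
        }
        return kind;
    }

    // Arrow form as a statement: no break and no fallthrough.
    static String arrowStatement(Day day) {
        String kind;
        switch (day) {
            case SATURDAY, SUNDAY -> kind = "weekend";
            default -> kind = "weekday";
        }
        return kind;
    }

    // Arrow form as an expression: each arrow points to the resulting value.
    static String arrowExpression(Day day) {
        return switch (day) {
            case SATURDAY, SUNDAY -> "weekend";
            default -> "weekday";
        };
    }

    public static void main(String[] args) {
        for (Day day : Day.values()) {
            System.out.println(day + ": " + colonStatement(day)
                    + " / " + arrowStatement(day)
                    + " / " + arrowExpression(day));
        }
    }
}
```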

Goetz concludes his post with this promising summary:

The result is one switch construct, with modern and legacy flavors, which supports either expressions or statements. You can immediately look at the middle of a switch and tell (by arrow vs colon) whether it has the legacy control flow or not.

The overall response so far to the proposed "Unification Attempt #2" has been overwhelmingly positive, but not without the expected lingering concerns. Gavin Bierman summarizes this proposal by saying "it's really all about enhancement as opposed to a new construct" and states, "Writing revised spec as we speak - be ready!"