
Amazon DJL - a new DL framework for Java

Developers who wanted to explore neural networks and deep learning using the JVM, and especially Java, had little choice so far. Those who wanted to focus exclusively on Java could not get around DL4J until now. If it had to be the JVM, but not necessarily Java, the MXNet Scala Frontend was also an option. Finally, if a little Python didn’t scare you, you could try a hybrid solution, combining TensorFlow and Java just like we already explained in previous articles.

But now a new competitor is entering the still very limited Java Deep Learning landscape: DJL, the Deep Java Library, a Java framework for Deep Learning created by Amazon. While the other tech heavyweights already offer their own “in-house” frameworks (Google: TensorFlow, Facebook: PyTorch, Microsoft: CNTK), Amazon previously lacked a solution of its own, although the company is already quite active in the AI market with products such as Alexa or the Amazon Cloud APIs for both end users and developers. With DJL, Amazon now closes this gap in its portfolio and at the same time offers a new alternative for Java programmers, who have been neglected by the deep learning industry.

Since it is a framework developed and published by Amazon, it is very likely not just a passing fancy, but will hopefully be supported and improved continuously. Although it is an Apache 2.0 licensed open source project, the development and support of DJL is of course not fueled by pure altruism. For Amazon, the world’s largest cloud provider, an “in-house” framework for the usually compute-hungry deep learning tasks is also strategically important.

The set-up of DJL

As a Deep Learning API, DJL itself is framework-independent. This means that in the future you should be able to use different Deep Learning frameworks (most of which are natively written in C/C++/CUDA) to perform the actual computations. However, the only fully functional implementation at the moment is based on the C API of MXNet. It enables Java users to create and train completely new models “from scratch”, unlike the official MXNet Java API, which only allows the “running” (inference) of pre-trained models. DJL is therefore a complete and fully functional deep learning framework for the MXNet engine.
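
The split between an engine-independent API and an exchangeable engine can be pictured with a small plain-Java sketch. It does not use DJL at all: the names TensorEngine, PlainJavaEngine, register and lookup are hypothetical and only illustrate the idea that user code talks to an interface while a backend performs the computation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class EngineSketch {

    /** Hypothetical engine contract; user code depends only on this interface. */
    interface TensorEngine {
        String name();

        float[] add(float[] a, float[] b);
    }

    /** Stand-in backend; a real engine would delegate to native C/C++/CUDA code. */
    static final class PlainJavaEngine implements TensorEngine {
        @Override
        public String name() {
            return "plain-java";
        }

        @Override
        public float[] add(float[] a, float[] b) {
            if (a.length != b.length) {
                throw new IllegalArgumentException("Length mismatch");
            }
            float[] result = new float[a.length];
            for (int i = 0; i < a.length; i++) {
                result[i] = a[i] + b[i];
            }
            return result;
        }
    }

    // Registry of available engines, keyed by name.
    private static final Map<String, TensorEngine> ENGINES = new HashMap<>();

    static void register(TensorEngine engine) {
        ENGINES.put(engine.name(), engine);
    }

    static TensorEngine lookup(String name) {
        TensorEngine engine = ENGINES.get(name);
        if (engine == null) {
            throw new IllegalStateException("No engine registered as " + name);
        }
        return engine;
    }

    public static void main(String[] args) {
        register(new PlainJavaEngine());
        // The calling code never mentions the concrete backend class.
        TensorEngine engine = lookup("plain-java");
        float[] sum = engine.add(new float[] {1f, 2f}, new float[] {3f, 4f});
        System.out.println(Arrays.toString(sum)); // [4.0, 6.0]
    }
}
```

Swapping the backend in such a design means registering a different implementation; the calling code stays unchanged.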

However, DJL only supports the MXNet eager mode, i.e. the imperative execution of computational steps that is also standard in TensorFlow 2, and not the MXNet computational graph, the likes of which TensorFlow 1.x users know well. This makes the development and debugging of models much easier, but comes at a slight expense of performance.
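
The difference between the two execution styles can be shown in a few lines of plain Java. This is only a conceptual sketch under simplifying assumptions (scalars instead of tensors); the class EagerVsGraph and its Node interface are hypothetical and are not part of DJL or MXNet.

```java
import java.util.Map;

public class EagerVsGraph {

    /** Eager style: every line runs immediately, so a debugger can inspect each value. */
    static double eager(double x, double w, double b) {
        double product = x * w;
        double sum = product + b;
        return Math.max(0.0, sum); // ReLU
    }

    /** Graph style: a node only describes a computation; nothing runs until eval is called. */
    interface Node {
        double eval(Map<String, Double> inputs);
    }

    static Node input(String name) {
        return inputs -> inputs.get(name);
    }

    static Node constant(double value) {
        return inputs -> value;
    }

    static Node mul(Node a, Node b) {
        return inputs -> a.eval(inputs) * b.eval(inputs);
    }

    static Node add(Node a, Node b) {
        return inputs -> a.eval(inputs) + b.eval(inputs);
    }

    static Node relu(Node a) {
        return inputs -> Math.max(0.0, a.eval(inputs));
    }

    public static void main(String[] args) {
        // Eager: the result exists as soon as the call returns.
        System.out.println(eager(2.0, 3.0, -1.0)); // 5.0

        // Graph: first build the description, then feed in data.
        Node graph = relu(add(mul(input("x"), constant(3.0)), constant(-1.0)));
        System.out.println(graph.eval(Map.of("x", 2.0))); // 5.0
    }
}
```

In the eager variant a breakpoint on any line shows a concrete intermediate value. In the graph variant the intermediate values only exist while eval runs, which is what makes graph-based models harder to debug, while a real engine gets the chance to optimize the whole graph before running it.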

Support for TensorFlow, Torch and DL4J as engines is also planned for the future. It remains to be seen to what extent this framework-agnostic strategy proves useful and practicable.

DJL modules and dependencies

DJL consists of a number of modules, all of which can be included as Gradle / Maven dependencies. The following dependencies are required (if you want to use the MXNet Engine):

  • ai.djl:api: The core DJL API
  • ai.djl.mxnet:mxnet-engine: The implementation of the DJL API with MXNet. As soon as other framework implementations become available, this could be replaced by alternatives.
  • ai.djl.mxnet:mxnet-native-auto: This dependency contains the native code for MXNet. There are several alternatives; the -auto variant downloads the one matching the machine the code is running on (GPU or CPU). If you need a specific variant, there are other libraries that allow you to specify exactly which dependency to use. The -auto variant is recommended for development. For deployments, you should use the documentation to determine the appropriate variant manually.

Additionally, there are several dependencies that facilitate the work with ready-made network architectures, models and data sets:

  • ai.djl:basicdataset: Publicly available standard ML datasets like (of course) MNIST.
  • ai.djl:model-zoo: Fully trained standard models

DJL Classes and Interfaces

Since DJL is still brand-new, there is not much to find in these modules yet. The most important classes or interfaces that you should be familiar with when using DJL are the following:

  • ai.djl.ndarray.NDArray: This is the essential class in DJL. It represents a multidimensional array (tensor) that is managed by the underlying engine. It offers a large number of mathematical functions, ranging from simple operations like addition and multiplication to special operations like softmax. If NDArrays are used to train neural networks, the gradients are calculated and managed automatically. But you can also use NDArrays in other settings where fast, accelerated tensor operations are needed. The naming is intentionally based on the ndarray known from Python (NumPy). As far as possible, names and signatures of methods are the same.
  • ai.djl.ndarray.types.Shape: This class defines the shape (number and size of dimensions) of an NDArray. It is important because one of the trickiest tasks when programming your own networks is usually making sure that the shape of input and output of successive operations match.
  • ai.djl.ndarray.NDManager: This class is basically the entry point for an engine like MXNet. With an NDManager you can create NDArray objects, which are the foundation for all calculations in DJL.
  • ai.djl.nn.Block: This is a component of a neural network, often called a layer in other frameworks. Blocks can themselves contain blocks, which in turn contain blocks, and so on. This is probably why the somewhat more general term was chosen instead of “layer”. Each neural network is itself a single block at the highest level. There are various subclasses that make it easier to create your own blocks.
  • ai.djl.ndarray.NDList: As the name suggests, this is a list of NDArrays. This auxiliary class is necessary as more complex architectures may need several inputs for some steps or may generate several outputs. Therefore, the most important function of a Block, forward, has NDLists instead of NDArrays as input parameter and output value to handle such calculations.
  • ai.djl.Model: Model refers to the entirety of a model, i.e. the architecture of a neural network (= its blocks) and the learned (or yet to be learned) parameters.
  • ai.djl.training.Trainer: A Trainer trains a model, i.e. it performs backpropagation, the actual learning process in deep learning. You interact with this class when training a model.
  • ai.djl.inference.Predictor: A Predictor is the counterpart to a Trainer, if you want to use a Model that has already been trained. It allows inference on a neural net.
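
How nested blocks, list-valued inputs and outputs, and matching shapes fit together can be sketched in plain Java. The sketch below deliberately does not use DJL: SketchBlock, Linear and Sequential are hypothetical stand-ins, plain float arrays take the place of NDArrays, and a List takes the place of an NDList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockSketch {

    /** Hypothetical stand-in for a block: a list of arrays in, a list of arrays out. */
    interface SketchBlock {
        List<float[]> forward(List<float[]> inputs);
    }

    /** A dense layer y = W x that checks the input length against its weight matrix. */
    static final class Linear implements SketchBlock {
        private final float[][] weights; // shape (outputs, inputs)

        Linear(float[][] weights) {
            this.weights = weights;
        }

        @Override
        public List<float[]> forward(List<float[]> inputs) {
            float[] x = inputs.get(0);
            if (x.length != weights[0].length) {
                throw new IllegalArgumentException(
                        "Shape mismatch: expected " + weights[0].length + " inputs, got " + x.length);
            }
            float[] y = new float[weights.length];
            for (int i = 0; i < weights.length; i++) {
                for (int j = 0; j < x.length; j++) {
                    y[i] += weights[i][j] * x[j];
                }
            }
            return List.of(y);
        }
    }

    /** A block that contains other blocks and feeds each output into the next child. */
    static final class Sequential implements SketchBlock {
        private final List<SketchBlock> children = new ArrayList<>();

        Sequential add(SketchBlock child) {
            children.add(child);
            return this;
        }

        @Override
        public List<float[]> forward(List<float[]> inputs) {
            List<float[]> current = inputs;
            for (SketchBlock child : children) {
                current = child.forward(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        SketchBlock net = new Sequential()
                .add(new Linear(new float[][] {{1f, 0f, 1f}, {0f, 2f, 0f}})) // 3 inputs -> 2 outputs
                .add(new Sequential()                                        // a block inside a block
                        .add(new Linear(new float[][] {{1f, 1f}})));         // 2 inputs -> 1 output
        float[] out = net.forward(List.of(new float[] {1f, 2f, 3f})).get(0);
        System.out.println(Arrays.toString(out)); // [8.0]
    }
}
```

The whole network is a single block at the top level, and the shape check in Linear fails early if the output of one block does not fit the input of the next.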

In addition, there are a number of important auxiliary classes and interfaces such as Dataset, Translator, Record and Batch that help to get data in and out of the model in a reasonable way. We’ll explain how to do this in detail – step by step – using code examples in a tutorial series over the next weeks.

Our experience with DJL

At DIVISIO we are already big fans of DJL, as it gives us the opportunity to implement completely new architectures and insights from new papers directly on the JVM. It offers automatic, GPU-accelerated gradient calculation for your own layers (blocks). This is unfortunately still missing in the main competitor DL4J, where you might have to implement the backward pass (the derivative) of new layers yourself. However, the DL4J team is working hard on this functionality, so this feature should be introduced in DL4J in the foreseeable future.
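
What automatic gradient calculation saves you can be illustrated without any framework. The following plain-Java sketch is a toy, scalar version of reverse-mode differentiation. The class TinyAutograd and its Value type are hypothetical and unrelated to how DJL or MXNet implement this; they only show that the backward pass can be derived from the recorded forward operations instead of being written by hand.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyAutograd {

    /** A scalar that remembers its inputs and the local derivative with respect to each. */
    static final class Value {
        final double data;
        double grad; // d(output)/d(this), filled in by backward()
        private final List<Value> parents = new ArrayList<>();
        private final List<Double> localGrads = new ArrayList<>();

        Value(double data) {
            this.data = data;
        }

        Value add(Value other) {
            Value out = new Value(data + other.data);
            out.record(this, 1.0);  // d(a + b)/da = 1
            out.record(other, 1.0); // d(a + b)/db = 1
            return out;
        }

        Value mul(Value other) {
            Value out = new Value(data * other.data);
            out.record(this, other.data); // d(a * b)/da = b
            out.record(other, data);      // d(a * b)/db = a
            return out;
        }

        private void record(Value parent, double localGrad) {
            parents.add(parent);
            localGrads.add(localGrad);
        }

        /** Starts backpropagation from this value, treating it as the final output. */
        void backward() {
            propagate(1.0);
        }

        // Chain rule: pass the upstream gradient times the local derivative to each input.
        // Walking every path separately is simple but inefficient; fine for a toy example.
        private void propagate(double upstream) {
            grad += upstream;
            for (int i = 0; i < parents.size(); i++) {
                parents.get(i).propagate(upstream * localGrads.get(i));
            }
        }
    }

    public static void main(String[] args) {
        Value x = new Value(3.0);
        Value y = x.mul(x).add(x); // y = x^2 + x
        y.backward();
        System.out.println(y.data); // 12.0
        System.out.println(x.grad); // 7.0, i.e. 2x + 1 at x = 3
    }
}
```

The author of a new operation only supplies the forward computation and its local derivatives once; every model built from these operations then gets its gradients for free.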

Pro: Faster implementation and improved troubleshooting

Because DJL is completely Java-based, it eliminates the complexity of Python/Java hybrid projects. DJL’s imperative approach allows even complex models to be debugged like normal Java code, e.g. with IntelliJ. This speeds up work and debugging enormously and simplifies the development of more complex control structures in neural networks. During our initial project experience with DJL we have found that the source code and Javadoc documentation are already impressively advanced for such a “young” project. A big plus is the helpful and friendly development team, which reacts extremely quickly to bug reports and feature requests.

Con: (Still) missing ready-made components

A drawback so far is that some high-level components that are a given in other frameworks are still missing in DJL. For our daily work this is less of a hindrance, since our current projects usually require the implementation of custom architectures anyway. Therefore, the ability to develop completely new components quickly and easily is more important to us than a wide range of ready-made functionality. But for deep learning beginners it can be a bit tricky to navigate at the moment, e.g. if a layer type, which other frameworks of course offer, is still missing in DJL. Also the selection of finished, pre-trained models is still very limited. Here DL4J is clearly ahead.

Because DJL is a relatively new project, there is of course not much material online beyond the documentation of the DJL team itself. StackOverflow is not yet of any help with DJL issues, as the community is still evolving. But this will probably change quickly in the next few months.

Finally, as DJL is still in an early alpha stage, one should be prepared to deal with a higher number of bugs than with a mature framework.

Conclusion: Deep Learning Framework with potential – not only for specialists

If you’re already familiar with Java and Deep Learning, DJL is a good choice for you. The understandable source code, the good Javadoc and the helpful development team make it easy to get started quickly. If you’re still a Deep Learning beginner, you’ll find it a bit harder to adapt, as the broad support through online tutorials, StackOverflow answers and sample projects is still missing. Nevertheless, DJL is already an excellent framework that has the chance to become one of the leading Deep Learning frameworks on the JVM.

To help beginners find their way around and to explain some aspects in detail, we’ll present a step-by-step approach to the classic Deep Learning example in DJL, the classification of handwritten digits based on the MNIST dataset, in an upcoming article.