
Deep Java Learning - NDManager & NDArray

After our first presentation of Amazon’s new Deep Learning framework for Java, DJL, we now want to introduce the basics of Deep Learning in Java with DJL step by step in a series of beginner posts. This is not about quickly copying code snippets, but about really understanding the framework and the concepts.

If you can’t wait, you can already find plenty of complete examples in DJL’s GitHub repository, both as Java projects and as interactive Jupyter notebooks.

However, we will go a little deeper and start with the two most essential interfaces of the DJL API: ai.djl.ndarray.NDManager and ai.djl.ndarray.NDArray. Both are interfaces that are implemented at runtime by one of the underlying engines. For the time being, this will mostly be Apache MXNet, but implementations based on TensorFlow and PyTorch are already in the works.

Getting started with the API: creating an NDManager

The NDManager takes care of managing data on a device - often the GPU. Access to this data is given in the form of NDArray instances. If one trains a new DJL model or uses an existing one, the NDManager is created by the corresponding auxiliary classes. If you want to access it directly for test purposes or “non-Deep Learning” applications, you can simply create it as follows:

NDManager manager = NDManager.newBaseManager();
NDManager managerOnCPU = NDManager.newBaseManager(Device.cpu());

In the first variant, DJL selects a so-called device on which the operations are executed - usually the first available GPU, or the CPU if no GPU is usable. If you want to select a specific device manually, you use the second variant.
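If a machine has several GPUs, the manager can also be pinned to a particular one via its index - a minimal sketch, assuming a second GPU actually exists:

NDManager managerOnGpu1 = NDManager.newBaseManager(Device.gpu(1)); // GPU with index 1; assumes at least two GPUs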

The most important class of the DJL API: NDArray

If you want to perform calculations, you have to put the values you want to calculate with into NDArrays. To create a new NDArray, you need an NDManager. It then places the data on its device outside the Java heap and manages the memory required for it:

NDArray pi        = manager.create((float)Math.PI);
NDArray e         = manager.create(Math.E);
NDArray one       = manager.create((byte)1);
NDArray theAnswer = manager.create(42);
NDArray big       = manager.create(Long.MAX_VALUE);
NDArray isTrue    = manager.create(true);

The simplest way to create an NDArray is to wrap a single value in an NDArray. This can be a Java primitive or a subclass of Number, such as Integer or Float. Unlike, for example, java.util.List, NDArray is not generic, so we cannot tell from the type what data is stored. So while you can create a List<Float>, there is no NDArray<Float>. To find out the type of the data stored in an NDArray, there is the method NDArray.getDataType(). These are the data types of the NDArrays created above:

System.out.println(pi.getDataType());        //float32
System.out.println(e.getDataType());         //float64
System.out.println(one.getDataType());       //int8
System.out.println(theAnswer.getDataType()); //int32
System.out.println(big.getDataType());       //int64
System.out.println(isTrue.getDataType());    //boolean

The possible data types of an NDArray can be found in the enum ai.djl.ndarray.types.DataType. Most NDArray data types correspond 1:1 to a Java primitive:

  • float → DataType.FLOAT32
  • double → DataType.FLOAT64
  • byte → DataType.INT8
  • int → DataType.INT32
  • long → DataType.INT64
  • boolean → DataType.BOOLEAN

The data type of the created NDArray thus depends on the Java data type passed to the create method. However, there are also two data types that have no Java equivalent: UINT8 (an unsigned byte) and FLOAT16 (a half-precision float: less precise, but it saves memory, which can sometimes be scarce on graphics cards). To create NDArrays of these types, one must first create an array of another type and then convert the data type manually:

NDArray pi16 = pi.toType(DataType.FLOAT16, true);

The second parameter, copy, specifies whether the existing NDArray is modified or whether a new copy is returned and the old NDArray is retained.

Other ways to create NDArrays

There are a number of other ways to create an NDArray. Practically all of them are member functions of the NDManager. The most important method is - as above - the create method. However, it accepts not only single values, but also arrays of Java primitives and Number instances. Very often you will create NDArrays from one- or two-dimensional float[] or int[] arrays, as in the following sketch.
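A minimal example (the values are arbitrary): a two-dimensional float[][] becomes a matrix-shaped NDArray, and a flat int[] becomes a vector:

float[][] values = {{1f, 2f, 3f},
                    {4f, 5f, 6f}};
NDArray matrix = manager.create(values);                // shape (2, 3), data type float32
NDArray vector = manager.create(new int[]{1, 2, 3, 4}); // shape (4), data type int32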

In addition, there are the methods NDManager.arange and NDManager.linspace, with which one can create sequences of numbers as NDArrays, e.g. 0, 1, 2, 3 or 0.0, -0.1, -0.2, -0.3. The start value, end value and step size can be set. This is very useful to quickly create some test data, but also, for example, to create offsets for input data in very small calculation steps in a neural network.
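A quick sketch of both methods, with example values of our own choosing:

// 0, 1, 2, 3 - integer sequence, default step size 1, end value exclusive
NDArray counting = manager.arange(0, 4);
// 0.0, -0.1, -0.2, -0.3 - float sequence with step size -0.1
NDArray negatives = manager.arange(0f, -0.4f, -0.1f);
// 0.0, 0.25, 0.5, 0.75, 1.0 - five evenly spaced values between 0 and 1
NDArray spaced = manager.linspace(0f, 1f, 5);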

With NDManager.ones and NDManager.zeros you can create NDArrays of any size, filled with ones or zeros. Finally, the methods with which one creates NDArrays filled with random numbers are very important in practice. With NDManager.randomNormal, NDManager.randomUniform and NDManager.randomMultinomial one can generate random numbers with the corresponding probability distributions. This is especially important for neural networks, because they have to be randomly initialised before they can be trained.
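For instance (shapes and parameters chosen arbitrarily; Shape lives in ai.djl.ndarray.types):

NDArray ones    = manager.ones(new Shape(2, 3));               // 2x3 matrix filled with 1.0
NDArray zeros   = manager.zeros(new Shape(5));                 // vector of five 0.0
NDArray normal  = manager.randomNormal(new Shape(2, 2));       // samples from a standard normal distribution
NDArray uniform = manager.randomUniform(0f, 1f, new Shape(4)); // uniform samples between 0 and 1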

Calculations on NDArrays

Now that we have packaged data so that DJL can work with it, we can also perform mathematical operations:

System.out.println(pi.sin().getFloat()); //-8.742278E-8

All calculations are now performed natively on the device of the underlying NDManager. When calculating a single value, this is of course neither exciting nor useful. Shuttling a single value back and forth between GPU and JVM takes longer than simply calculating everything in Java. It becomes exciting when we have a lot to calculate at once. For testing, we generate 100 million random numbers:

float[] random = new float[1000 * 1000 * 100];
Random rand = new Random();
for (int i = 0; i < random.length; ++i) {
    random[i] = rand.nextFloat();
}

Now we calculate the sine of each of these numbers in Java:

float[] sines1 = new float[random.length];
for (int i = 0; i < random.length; ++i) {
    sines1[i] = (float)Math.sin(random[i]);
}

On one of our work laptops this takes about 3s. Now we perform the same calculation using DJL on the GPU:

NDArray randOnGpu = manager.create(random);
float[] sines2 = randOnGpu.sin().toFloatArray();

This takes about 500ms, so it is six times as fast. As a rule, calculations with DJL beat plain Java by an even larger factor. The main time sink in our example is the transfer to and from the graphics card. If one stays on the GPU and performs many operations in succession, the relative gain compared to an unaccelerated solution becomes greater and greater.
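To illustrate this, here is a small sketch (reusing the random array from above): only the final scalar crosses back into the JVM, while all intermediate arrays stay in the device’s native memory:

NDArray x = manager.create(random);    // one transfer to the device
NDArray result = x.sin().mul(x).sum(); // several chained operations, all executed on the device
System.out.println(result.getFloat()); // one transfer back: a single float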

The shape of NDArrays - Shape

What makes NDArrays attractive compared to normal arrays is not only the higher speed, but also the much more readable code. All operations are executed “vectorised”, that is, on all elements at once. With an operation like sin() this is easy to imagine, because a sine needs only one input - the operation is simply repeated on every element of the array.

It gets exciting with operations where NDArrays are combined, e.g. with a simple addition (the result is always given in the comment above the call):

// I. 4
manager.create(2).add(manager.create(2));
// II. [10, 12, 14, 16, 18, 20, 22, 24]
manager.arange(0, 8).add(manager.arange(10, 18));
// III. [ 2,  3,  4,  5,  6,  7,  8,  9]
manager.arange(0, 8).add(manager.create(2));
// IV.
// [[ 100, 1001],
//  [ 102, 1003],
//  [ 104, 1005],
//  [ 106, 1007]]
manager.arange(0, 8).reshape(4, 2)
    .add(manager.create(new int[]{100, 1000}));

The first example is unsurprising: 2 + 2 = 4. The second is more interesting: you can simply add two arrays with one call, and the elements are added pairwise (this corresponds to a vector addition). The third example is even more interesting: it shows that the NDArrays do not necessarily have to have the same size. If you add a single value, it is added to every element of the first NDArray. It gets really exciting in example IV. Here we see a new, important operation on arrays: reshape. If you omit it in this example, the code crashes, because an array of eight elements and an array of two cannot be combined directly. But what does reshape do, and how does the result come about?

So far we have learned that an NDArray has a data type (e.g. FLOAT32) and a size (the number of elements in the array). But an NDArray also has a shape. The shape determines how arithmetic operations that combine arrays must handle the array. In the example above, the number series [0, 1, ... , 7] is given a new shape by reshape. It is no longer a series of numbers (a vector), but a series of series (a matrix). The call reshape(4, 2) means that the existing series is to be divided into four pieces of length two. For this to work, the resulting shape must have the same size as the original one. Since 4 * 2 = 8, this is no problem here. And since the “end” of the NDArray now consists of rows of length two, another row of length two can be added to it. There is only one such row, but it is used for every row of the matrix. This behaviour is called broadcasting and is an essential feature of all deep learning frameworks.

If you don’t know what shape an NDArray has, you can always find out with getShape. The reshaping of NDArrays and the correct combination of NDArrays of different shapes is one of the most important and trickiest tasks in programming Deep Learning systems. For the budding Java Deep Learning expert, it is important to know the broadcasting behaviour of a number of important operations like add, sub, mul, dot, matMul etc. in order to effectively and elegantly translate formulas and pseudocode into a chain of NDArray operations.
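For example:

NDArray vector = manager.arange(0, 8);
System.out.println(vector.getShape()); // (8)
NDArray matrix = vector.reshape(4, 2);
System.out.println(matrix.getShape()); // (4, 2)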

The memory management of NDArrays

Now that we know how to create and use NDArrays, all that remains is to clean up after ourselves when the work is done. As mentioned earlier, NDArrays are placeholders for data on a Device used by the NDManager. The memory of this device cannot be managed by the garbage collector of the JVM, however, so we have to take care of it ourselves. Each individual NDArray must be closed again (as with streams, via .close()), so that the underlying native memory, e.g. on the GPU, becomes available again.

This could of course be done for each array individually, preferably with try-finally blocks. However, operations on NDArrays also create new NDArrays in the native memory of the Device. If we add two arrays, a new one is created for the result. (The exceptions are special in-place variants of operations, such as addi, that modify the NDArray directly; they carry the suffix -i for “in place”.) Closing all these intermediate results can quickly become tedious. But fortunately there is a simple solution: all these NDArrays are linked to an NDManager, through which they were created directly or indirectly. And NDManager itself also implements AutoCloseable! Closing the manager in turn closes all NDArrays “descended” from it, so one can easily free all that memory with a single operation, as sketched below.
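A minimal sketch of this pattern using try-with-resources:

try (NDManager manager = NDManager.newBaseManager()) {
    NDArray a = manager.create(new float[]{1f, 2f, 3f});
    NDArray b = a.add(a); // intermediate result, also attached to manager
    System.out.println(b);
} // closing the manager frees a, b and all other NDArrays descended from it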

But what if you don’t want to close all NDArrays, but only those that have been created, for example, during an intermediate calculation? This is also quite simple: With NDManager.newSubManager() you can create a “submanager” that behaves like the original manager but does not “inherit” its NDArrays. With this submanager one can now perform calculations, and then close only the submanager. The original manager and its arrays are then retained.
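A sketch of this pattern (the calculation here is just a stand-in for some intermediate computation):

float meanSine;
try (NDManager sub = manager.newSubManager()) {
    NDArray seq = sub.arange(0f, 10f, 0.1f); // attached to the sub-manager
    meanSine = seq.sin().mean().getFloat();  // copy the plain Java result out before closing
} // closes seq and all intermediates; manager and its NDArrays live on
System.out.println(meanSine);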

Conclusion

In this introduction we have seen how to use the most basic classes of the DJL API: NDManager and NDArray. In the following post, we will take the next step towards Deep Learning and load data for our first example in such a way that it can be used by DJL for training. To do this, we will have to create, fill and transform NDArrays, and perform the first calculations so that our data can also be “digested” by a neural network.