Deep Java Learning - NDManager & NDArray
After our first presentation of Amazon’s new Deep Learning framework for Java, DJL, we now want to introduce the basics of Deep Learning in Java with DJL step by step in a series of beginner posts. The goal is not to quickly copy code snippets, but to really understand the framework and its concepts.
If you can’t wait, you can already find many complete examples in DJL’s GitHub repository, both as Java projects and as interactive Jupyter notebooks.
However, we will go a little deeper and start with the two most essential interfaces of the DJL API: ai.djl.ndarray.NDManager and ai.djl.ndarray.NDArray. Both are interfaces that are implemented at runtime by one of the underlying engines. For the time being, this will mostly be Apache MXNet, but implementations based on TensorFlow and PyTorch are already in the works.
Getting started with the API: creating an NDManager
The NDManager takes care of managing data on a device - often the GPU. Access to this data is given in the form of NDArray instances. If you train a new DJL model or use an existing one, the NDManager is created by the corresponding helper classes. If you want to access it directly for testing or for “non-Deep Learning” applications, you can simply create it as follows:
NDManager manager = NDManager.newBaseManager();
NDManager managerOnCPU = NDManager.newBaseManager(Device.cpu());
In the first variant, DJL selects a so-called device on which the operations are executed - usually the first available GPU, or the CPU if no GPU is usable. If you want to select a specific device manually, use the second variant.
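For example, to pin the manager to a machine’s second GPU - a small sketch; Device.gpu(int) selects a GPU by its index:
NDManager managerOnGpu1 = NDManager.newBaseManager(Device.gpu(1));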
The most important class of the DJL API: NDArray
If you want to perform calculations, you have to put the values you want to calculate with into NDArrays. To create a new NDArray, you need an NDManager. This then places the data on its device outside the Java heap and manages the memory required for it:
NDArray pi = manager.create((float)Math.PI);
NDArray e = manager.create(Math.E);
NDArray one = manager.create((byte)1);
NDArray theAnswer = manager.create(42);
NDArray big = manager.create(Long.MAX_VALUE);
NDArray isTrue = manager.create(true);
The simplest way to create an NDArray is to wrap a single value in it. This can be a Java primitive or a class that implements Number, such as Integer or Float. Unlike, for example, java.util.List, NDArray is not generic, so we cannot tell from the type what data is stored. So while you can create a List<Float>, there is no NDArray<Float>. To find out the type of the data stored in an NDArray, there is the method NDArray.getDataType(). These are the data types of the NDArrays created above:
System.out.println(pi.getDataType()); //float32
System.out.println(e.getDataType()); //float64
System.out.println(one.getDataType()); //int8
System.out.println(theAnswer.getDataType()); //int32
System.out.println(big.getDataType()); //int64
System.out.println(isTrue.getDataType()); //boolean
The possible data types of an NDArray can be found in the enum ai.djl.ndarray.types.DataType. Most NDArray data types correspond 1:1 to a Java primitive:
float → DataType.FLOAT32
double → DataType.FLOAT64
byte → DataType.INT8
int → DataType.INT32
long → DataType.INT64
boolean → DataType.BOOLEAN
The data type of the created NDArray thus depends on the Java data type passed to the create method. However, there are also two data types that have no Java equivalent: UINT8 (an unsigned byte) and FLOAT16 (a float value with lower precision - less precise, but it saves memory, which can sometimes be scarce on graphics cards). To create NDArrays of these types, one must first create an array of another type and then convert the data type manually:
NDArray pi16 = pi.toType(DataType.FLOAT16, true);
The second parameter, copy, specifies whether the existing NDArray is modified or whether a new copy is returned and the old NDArray is retained.
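Continuing the example, we can check both data types - with copy set to true, the original array keeps its type:
System.out.println(pi.getDataType()); //float32
System.out.println(pi16.getDataType()); //float16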
Other ways to create NDArrays
There are a number of other ways to create an NDArray. Practically all of them are member functions of the NDManager. The most important one is - as above - the create method. However, it accepts not only single values, but also arrays of Java primitives and Number instances. Very often you will create NDArrays from one- or two-dimensional float[] or int[] arrays.
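A small sketch of both variants:
// a vector from a one-dimensional array
NDArray vector = manager.create(new float[]{1f, 2f, 3f, 4f});
// a 2x2 matrix from a two-dimensional array
NDArray matrix = manager.create(new float[][]{{1f, 2f}, {3f, 4f}});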
In addition, there are the methods NDManager.arange and NDManager.linspace, with which one can create sequences of numbers as NDArrays, e.g. 0, 1, 2, 3 or 0.0, -0.1, -0.2, -0.3. The start value, end value and step size can be set. This is very useful for quickly creating some test data, but also, for example, for creating offsets for input data in very small calculation steps in a neural network.
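A sketch producing exactly those two sequences (up to floating-point rounding):
// [0, 1, 2, 3] - the end value is exclusive
NDArray sequence = manager.arange(0, 4);
// [0.0, -0.1, -0.2, -0.3] - four evenly spaced values from 0.0 to -0.3
NDArray steps = manager.linspace(0.0f, -0.3f, 4);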
With NDManager.ones and NDManager.zeros you can create NDArrays of any size, filled with ones or zeros. Finally, the methods that create NDArrays filled with random numbers are very important in practice. With NDManager.randomNormal, NDManager.randomUniform and NDManager.randomMultinomial you can generate random numbers with the corresponding probability distributions. This is especially important for neural networks, because they have to be randomly initialised before they can be trained.
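A small sketch of these methods (Shape comes from ai.djl.ndarray.types.Shape and describes the desired dimensions):
// a 2x3 matrix of ones and a vector of four zeros
NDArray ones = manager.ones(new Shape(2, 3));
NDArray zeros = manager.zeros(new Shape(4));
// standard normally distributed values, e.g. for weight initialisation
NDArray weights = manager.randomNormal(new Shape(2, 3));
// uniformly distributed values between 0 and 1
NDArray noise = manager.randomUniform(0f, 1f, new Shape(2, 3));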
Calculations on NDArrays
Now that we have packaged data so that DJL can work with it, we can also perform mathematical operations:
System.out.println(pi.sin().getFloat()); //-8.742278E-8
All calculations are performed natively on the device of the underlying NDManager. When calculating a single value, this is of course neither exciting nor useful: the round trip between GPU and JVM costs more time than simply calculating everything in Java. It becomes interesting when we have a lot to calculate at once. For testing, we generate 100 million random numbers:
float[] random = new float[1000 * 1000 * 100];
Random rand = new Random();
for (int i = 0; i < random.length; ++i) {
    random[i] = rand.nextFloat();
}
Now we calculate the sine of each of these numbers in Java:
float[] sines1 = new float[random.length];
for (int i = 0; i < random.length; ++i) {
    sines1[i] = (float)Math.sin(random[i]);
}
On one of our work laptops this takes about 3s. Now we perform the same calculation using DJL on the GPU:
NDArray randOnGpu = manager.create(random);
float[] sines2 = randOnGpu.sin().toFloatArray();
This takes about 500ms, so it is six times as fast. As a rule, calculations with DJL are faster than plain Java by an even larger factor. The main time eater in our example is the transfer to and from the graphics card. If you stay on the GPU and perform many operations in succession, the relative gain compared to an unaccelerated solution keeps growing.
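As a sketch of this effect: if we chain several operations, the intermediate results stay on the GPU, and only the final scalar is transferred back to the JVM:
// sin() and pow() run entirely on the device; only the mean value crosses over
float meanOfSquaredSines = randOnGpu.sin().pow(2).mean().getFloat();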
The shape of NDArrays - Shape
What matters when using NDArrays compared to normal arrays is not only the higher speed, but also the much more readable code. All operations are executed “vectorised”, that is, on all elements at once. With an operation like sin() this is easy to imagine, because a sine needs only one input - the operation is simply repeated on every element of the array.
It gets exciting with operations that combine NDArrays, e.g. a simple addition (the result is always given in the comment above the call):
// I. 4
manager.create(2).add(manager.create(2));
// II. [10, 12, 14, 16, 18, 20, 22, 24]
manager.arange(0, 8).add(manager.arange(10, 18));
// III. [ 2, 3, 4, 5, 6, 7, 8, 9]
manager.arange(0, 8).add(manager.create(2));
// IV.
// [[ 100, 1001],
// [ 102, 1003],
// [ 104, 1005],
// [ 106, 1007],
// ]
manager.arange(0, 8).reshape(4, 2)
    .add(manager.create(new int[]{100, 1000}));
The first example is unsurprising: 2 + 2 = 4. The second is more interesting: you can add two arrays with a single call, and the elements are added pairwise (this corresponds to a vector addition). The third example is even more interesting: it shows that the NDArrays do not necessarily have to have the same size. If you add a single value, it is added to every element of the first NDArray. It gets really exciting in example IV. Here we see a new, important operation on arrays: reshape. If you omit it in this example, the code crashes. But what does reshape do, and how does the result come about?
So far we have learned that an NDArray has a data type (e.g. FLOAT32) and a size (the number of elements in the array). But an NDArray also has a shape. The shape determines how arithmetic operations that combine arrays must handle them. In the example above, the number sequence [0, 1, ..., 7] is given a new shape by reshape. It is no longer a sequence of numbers (a vector), but a sequence of sequences (a matrix). The call reshape(4, 2) means that the existing sequence is to be divided into four pieces of length two. For this to work, the new shape must have the same total size as the original one; since 4 * 2 = 8, this is no problem here. And because the innermost rows of the NDArray now have length two, another row of length two can be added to them. There is only one such row on the right-hand side, but it is reused for every row of the matrix. This behaviour is called broadcasting and is an essential feature of all Deep Learning frameworks.
If you don’t know what shape an NDArray has, you can always find out with getShape. Reshaping NDArrays and correctly combining NDArrays of different shapes is one of the most important and trickiest tasks in programming Deep Learning systems. For the budding Java Deep Learning expert, it is important to know the broadcasting behaviour of the most important operations like add, sub, mul, dot, matMul and so on, in order to effectively and elegantly translate formulas and pseudocode into a chain of NDArray operations.
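As a further illustration of broadcasting, a small sketch that combines a column of shape (3, 1) with a row of shape (3) into a 3x3 matrix:
NDArray column = manager.arange(0, 3).reshape(3, 1); // shape (3, 1)
NDArray row = manager.arange(0, 3); // shape (3)
// [[0, 1, 2],
//  [1, 2, 3],
//  [2, 3, 4]]
NDArray grid = column.add(row);
System.out.println(grid.getShape()); // (3, 3)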
The memory management of NDArrays
Now that we know how to create and use NDArrays, all that remains is to clean up after ourselves when the work is done. As mentioned earlier, NDArrays are placeholders for data on a Device used by the NDManager. The memory of this device cannot be managed by the memory management of the JVM, so we have to take care of it ourselves. Every NDArray we create must be closed again (as with streams, via .close()), so that the underlying native memory, e.g. on the GPU, becomes available again.
This could of course be done for each array individually, preferably with try…finally blocks. However, operations on NDArrays also create new NDArrays in the native memory of the Device: if we add two arrays, a new one is created for the result. (Exceptions are the special in-place variants of operations that change an NDArray directly, like addi; they carry the suffix -i for “in place”.) Closing all these intermediate results can quickly become tedious. Fortunately, there is a simple solution: all these NDArrays are linked to the NDManager through which they were created, directly or indirectly. And NDManager itself also implements AutoCloseable! Closing the manager in turn closes all NDArrays “descended” from it, so you can easily free all the memory with a single operation.
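A minimal sketch of this pattern using try-with-resources:
// closing the manager frees every NDArray created from it
try (NDManager manager = NDManager.newBaseManager()) {
    NDArray angles = manager.linspace(0f, 3.14f, 1000);
    NDArray sines = angles.sin(); // intermediate result, also owned by the manager
    System.out.println(sines.sum().getFloat());
} // all native memory is released here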
But what if you don’t want to close all NDArrays, but only those that were created, for example, during an intermediate calculation? This is also quite simple: with NDManager.newSubManager() you can create a “sub-manager” that behaves like the original manager but does not “inherit” its NDArrays. With this sub-manager you can now perform calculations and then close only the sub-manager. The original manager and its arrays are retained.
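A sketch of the pattern - following the rule above, arrays created directly or indirectly through the sub-manager belong to it and are closed with it:
try (NDManager sub = manager.newSubManager()) {
    NDArray temp = sub.create(new float[]{1f, 2f, 3f});
    NDArray result = temp.mul(temp).sum(); // intermediate results owned by sub
    System.out.println(result.getFloat()); // 14.0
} // only the sub-manager's arrays are closed; `manager` and its arrays live on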
Conclusion
In this introduction we have seen how to use the most basic classes of the DJL API: NDManager and NDArray. In the next post, we will take the next step towards Deep Learning and load the data for our first example in such a way that DJL can use it for training. To do this, we will have to create, fill and transform NDArrays, as well as perform the first calculations so that our data can be “digested” by a neural network.