MLOps: Establishing and Operating an AI System
With Machine Learning Operations (MLOps) we ensure that data is efficiently and strategically integrated into business processes through regular and automated training, thus contributing to increased revenue. The challenge is to establish and maintain these automated processes.
This means adapting frictionless deployment to test and production systems (Continuous Integration/Continuous Delivery) to the new requirements that ML brings to enterprise software development.
We would like to use concrete examples to show you what Machine Learning Operations means and which steps are necessary to implement it. Basically, it’s about the life cycle of an entire AI system: from the collection of existing data, to the correct use and training of algorithms, to the constant “data feeding” for the continuous optimisation of the system, all in compliance with corporate governance.
In this series of posts, we will dedicate ourselves to the individual steps around Machine Learning Operations and will go through many scenarios that will introduce you to different software and their importance within MLOps.
The questions we ask ourselves at the beginning of a project and which serve as the basis for the topic series on MLOps are the following:
- where does the data come from that we want to use to achieve a certain goal?
- how do we version the data or create a selection of data to achieve our goal?
- how do we carry out training on the data?
- how do we control and ensure the quality of the system created from the data?
- how do we deploy or integrate the trained model into our system?
For all these steps there are different starting points and dependencies. What amount of data do we have to reckon with and how big is the resulting model? Is the architecture fixed and can established tools be used? Do we trigger the training automatically? Do we have a specific target (supervised training) or should our trained algorithm be able to recognise correlations and patterns itself (self-supervised training)?
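To make the versioning question a little more tangible, here is a minimal sketch of one possible approach: a manifest of content hashes that pins exactly which files a model was trained on. The small manifest could be kept in Git even when the data itself is too large for it. The function names `build_manifest` and `manifest_id` and the file `train.csv` are our own illustrative assumptions, not part of any specific tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def manifest_id(manifest):
    """Derive one short identifier for the whole data set version."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

# Demonstration with a throwaway directory and invented file content.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "train.csv"
    data_file.write_text("temperature,chirps\n15,80\n")
    demo_manifest = build_manifest(tmp)
    before = manifest_id(demo_manifest)

    data_file.write_text("temperature,chirps\n15,80\n18,101\n")
    after = manifest_id(build_manifest(tmp))

print("data set changed:", before != after)
```

Storing the identifier next to each trained model makes it possible to tell later which state of the data produced it.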
Data is as important as programmed source code.
Why is data so important? Data offers enormous potential for research and development and is therefore an essential resource for many business processes, even in the world of big data, where it is constantly being re-analysed.
At the beginning of establishing an AI system, everything is exploratory: we take a close look at the data from which our model is to learn. All machine learning operations begin with the optimisation of the available data, because the data is decisive for the quality of the AI model. It is therefore just as important as the human-written source code, which describes all the functions of a program and its execution.
The AI’s training algorithm should independently recognise correlations between the input data and the output information and learn from them. In the two-dimensional example of cricket chirping as a function of temperature, you can read up on how the available data, prior knowledge and the result relate to one another.
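As a minimal sketch of what "learning the relationship" means in the cricket example, the following fits a straight line by ordinary least squares. The temperature and chirp values are invented for illustration and are not real measurements; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented observations: temperature in degrees Celsius, chirps per minute.
temperature_c = [15, 18, 21, 24, 27]
chirps_per_minute = [80, 101, 119, 141, 159]

slope, intercept = fit_line(temperature_c, chirps_per_minute)
predicted = slope * 20 + intercept

print(f"chirps/min = {slope:.2f} * temperature + {intercept:.2f}")
print(f"prediction at 20 degrees: {predicted:.1f}")
```

The two fitted numbers are the entire "model" here: everything the algorithm has learned from the observations is contained in the slope and the intercept.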
With the knowledge captured in a trained neural network, we can, for example, make predictions, assign images to a category, and handle many other tasks in which an AI system is meant to support the company.
When we talk about the model learning, we mean that the algorithm uses the available data sets (observations) to draw conclusions or gain insights. Bad data that is useless for our goal leads to underfitting: poor results with which, in the worst case, we can do nothing. The game then starts all over again: we need more data. But even supposedly useless data can produce surprising results. We will therefore take a closer look at the importance of raw data and the data pipeline in another article from the MLOps series.
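One hedged way to notice such a situation early is to compare the trained model against a trivial baseline that always predicts the mean of the training targets. If the model is not clearly better than this baseline even on its own training data, it is underfitting or the data carries little usable signal. The function names and the 10 % margin below are assumptions chosen for illustration; the numbers are invented.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def looks_underfitted(model_predictions, targets, min_improvement=0.10):
    """True if the model beats the mean baseline by less than min_improvement."""
    baseline = sum(targets) / len(targets)
    baseline_error = mean_squared_error([baseline] * len(targets), targets)
    if baseline_error == 0:
        return False  # constant targets: there is nothing to learn
    model_error = mean_squared_error(model_predictions, targets)
    improvement = 1 - model_error / baseline_error
    return improvement < min_improvement

# Invented training targets and two sets of model outputs on the same inputs.
targets = [80, 101, 119, 141, 159]
useful_model = [80.4, 100.2, 120.0, 139.8, 159.6]
useless_model = [120, 120, 120, 120, 120]

useful_flag = looks_underfitted(useful_model, targets)
useless_flag = looks_underfitted(useless_model, targets)
print("useful model underfits:", useful_flag)
print("useless model underfits:", useless_flag)
```

The `improvement` value computed here is the coefficient of determination (R²) on the training data, so the check simply asks whether R² stays below the chosen margin.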
Continuous Delivery and Version Control
We determine how “well” our model has been trained based on the validation/test data and the selected procedure. If the result is satisfactory, a functioning practical application can be ensured by delivering the model, e.g. as a ready-to-use Java Enterprise component. In parallel, the training data is expanded and the model is trained again on the basis of this functioning version to ensure continuous development and optimisation.
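To make the role of validation/test data concrete, here is a minimal sketch that assumes a simple random holdout split. The function names, the 20 % test share, the fixed seed and the toy data are our assumptions for illustration, not a prescribed procedure.

```python
import random

def split_holdout(samples, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and return (training_samples, test_samples)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, labelled_samples):
    """Share of (input, label) pairs for which predict(input) equals the label."""
    correct = sum(1 for x, label in labelled_samples if predict(x) == label)
    return correct / len(labelled_samples)

# Invented toy data: the label says whether the number is at least 5.
samples = [(x, x >= 5) for x in range(10)]
train_set, test_set = split_holdout(samples)

# Stand-in for a trained model; a real one would be fitted on train_set only.
def toy_model(x):
    return x >= 5

test_accuracy = accuracy(toy_model, test_set)
print("test accuracy:", test_accuracy)
```

The fixed seed matters in an automated setting: it keeps the split identical between runs, so a change in the measured quality can be attributed to the model or the data rather than to a different split.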
This process is repeated in a continuous loop so that the data is subjected to quality assurance at short intervals. Automated test runs on a CI system such as Jenkins let you review the build after each change to the source code or the underlying data and then provide feedback on the result. If the automated test criteria are passed, deployments of AI/ML software can be fully automated, just as with traditional software.
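Such automated test criteria can be sketched as a small quality gate that compares measured metrics against minimum values. The metric names, thresholds and values below are hypothetical; in a real job they would come from the evaluation step of the pipeline.

```python
def quality_gate(metrics, thresholds):
    """Return a list of failure messages; an empty list means the gate passed."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} is below required {minimum:.3f}")
    return failures

def gate_exit_code(metrics, thresholds):
    """Print every failure and return 0 (pass) or 1 (fail) for the CI system."""
    failures = quality_gate(metrics, thresholds)
    for failure in failures:
        print("QUALITY GATE FAILED -", failure)
    return 1 if failures else 0

# Hypothetical numbers for illustration only.
example_thresholds = {"accuracy": 0.90, "recall": 0.85}
passing_metrics = {"accuracy": 0.93, "recall": 0.88}
failing_metrics = {"accuracy": 0.93, "recall": 0.80}

passing_code = gate_exit_code(passing_metrics, example_thresholds)
failing_code = gate_exit_code(failing_metrics, example_thresholds)
print("exit codes:", passing_code, failing_code)
```

When a script like this ends with `sys.exit(gate_exit_code(...))`, a shell step in a CI job fails on the non-zero exit status, which stops the pipeline before the deployment stage.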
Practical MLOps with Standard Tools
In the following articles, we would like to provide a practical, efficient view of MLOps that does not require a whole new set of tools, but instead uses, as far as possible, the tools that have been established and proven over the years in software development. In this way, the introduction of MLOps in existing teams can be accelerated considerably, because they can fall back on existing know-how and infrastructure. In the coming months we will show you:
- how you can use standard CI systems like Jenkins for training, quality control and deployment of ML systems
- when Git is sufficient for versioning training data and when you need other ways to archive data
- how to manage trained ML models with a standard repository manager like Nexus
- how to use established build systems like Maven or Gradle to integrate ML models into software
- how to prepare a Java server environment for the use of GPU-accelerated neural networks
In the next article, we would therefore like to start with the area of production. Later in the series, we will also describe the more complicated part: Research and Development (R&D) and Continuous Training. Although R&D takes place before production in practice, production is easier to understand first. In our daily work we often find that in many use cases it is enough to use an existing model (e.g. an open-source one) rather than to train our own. In this case, the question of proper integration into production becomes immediately relevant, while R&D and automated training are skipped for the time being.
Stay tuned!