
The term machine learning is currently omnipresent. Whether in the context of Industry 4.0, e-commerce or marketing, machine learning is applicable wherever data is available. What is actually meant by the term, however, is often unclear. The following article gives an overview of what is really behind it.
The renowned expert Robert Schapire describes machine learning as an automated process for optimizing and improving future activities based on historical data. This also makes clear why the whole phenomenon is called “machine learning”: the computer, or more precisely the software running on it, is trained automatically on existing data and later applies the acquired knowledge to new data and the associated processes. This is also the essential difference from earlier analytical methods, which rely on manual modelling, with the resulting models then implemented as a “static” solution. If the process conditions change, such a static model quickly produces erroneous results. Machine learning reduces this type of error. With each new data set, more knowledge is acquired and the model keeps improving, without a human having to rework it by hand. For this reason, machine learning is regarded as a subfield of artificial intelligence. Humans are left with the task of selecting the relevant data and algorithms, setting their parameters and monitoring the process.
There are four types of machine learning:
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
Supervised Learning means that models are trained on labelled data. Labelled data contains both independent variables (features) and a dependent variable (the label). The label is used during model training so that the trained model can predict the label for new, unlabelled data. Example algorithms are Support Vector Machines, linear models trained with Stochastic Gradient Descent, Naive Bayes classification, Random Forests and (supervised) Neural Networks.
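To make the idea of training on labelled data more concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier written with the Python standard library only. The function names, the toy data and the feature meanings are assumptions made for illustration; in practice one would normally use an established library rather than this hand-written version.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels):
    """Estimate per-class priors, feature means and variances from labelled data."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)

    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(features), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one unlabelled row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=log_posterior)


# Hypothetical toy data: two features per record plus a known label.
X_train = [[150, 50], [155, 55], [160, 58], [180, 85], [185, 90], [178, 80]]
y_train = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X_train, y_train)
prediction_small = predict(model, [152, 52])   # new record without a label
prediction_large = predict(model, [182, 88])   # new record without a label
```

The labels are only needed in `fit_gaussian_nb`; `predict` receives features alone, which mirrors the split between training data and new data described above.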
Unsupervised Learning assumes that there is no labelled training data and that the outcome is not known in advance. In contrast to Supervised Learning, there is no label (dependent variable). The algorithm processes the data in a purely exploratory way and finds structure on its own, for example by grouping similar records (clustering). Selected algorithms include K-Means Clustering, Hierarchical Clustering and Affinity Propagation for clustering, and Principal Component Analysis for dimensionality reduction.
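As an illustration of clustering without labels, the following sketch implements a very small K-Means loop in plain Python. The data points are made up, and the initialization (simply taking the first k points as starting centroids) is a simplifying assumption; production implementations usually choose the starting centroids more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters without using any labels."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical toy data: two visibly separated groups of 2D points.
points = [[1.0, 1.0], [9.0, 9.0], [1.5, 2.0], [8.5, 9.5], [1.0, 0.5], [9.5, 8.0]]
assignments, centroids = k_means(points, k=2)
```

The cluster numbers in `assignments` carry no meaning of their own; they only say which points belong together, and interpreting the groups remains a human task.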
Semi-supervised Learning is a mixture of the two types already mentioned and is often used when only a small part of the data is labelled. The aim of this learning approach is to assign labels to the unlabelled data. A widely used method is label propagation. It works similarly to clustering: the data is grouped by similarity, and within each group the unlabelled data points receive the same label as the labelled data points.
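The sketch below shows one simplified, graph-based variant of label propagation in plain Python. It is not a reference implementation: the similarity function (an RBF kernel), the `gamma` value, the number of iterations and the toy data are all assumptions chosen to keep the example short.

```python
import math


def propagate_labels(points, labels, classes, gamma=1.0, iterations=50):
    """Spread known labels to unlabelled points (label None) via similarity."""
    n = len(points)
    # RBF similarity: nearby points influence each other more strongly.
    weights = [
        [math.exp(-gamma * math.dist(points[i], points[j]) ** 2) for j in range(n)]
        for i in range(n)
    ]
    # Every point carries one score per class; unlabelled points start at zero.
    scores = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(scores[i])  # known labels stay fixed
                continue
            total = sum(weights[i])
            updated.append([
                sum(weights[i][j] * scores[j][k] for j in range(n)) / total
                for k in range(len(classes))
            ])
        scores = updated
    # Each point receives the class with the highest score.
    return [
        classes[max(range(len(classes)), key=lambda k: row[k])] for row in scores
    ]


# Hypothetical toy data: only one point per group carries a label.
points = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
labels = ["a", None, None, "b", None, None]
propagated = propagate_labels(points, labels, classes=["a", "b"])
```

Only two of the six points are labelled at the start; the remaining four inherit the label of the labelled point they are most similar to, directly or via their neighbours.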
Reinforcement Learning means that the algorithm learns a policy for reaching a certain goal state in a defined environment. By means of defined actions it moves from one state to the next until the goal is reached, guided by the rewards it receives along the way. Such problems are typically modelled as a Markov Decision Process.
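A minimal sketch of this idea is tabular Q-learning on a tiny corridor environment. The environment, the reward of 1 for reaching the goal, the learning rate and the discount factor are assumptions for illustration, and actions are picked uniformly at random during training, which Q-learning tolerates because it is an off-policy method.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Environment: apply an action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error, starting each episode in state 0."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))          # random exploration
            next_state, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: in every non-goal state take the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

The resulting `policy` lists one action per non-goal state; in this corridor the learned behaviour is to move right in every state, since that is the shortest way to the goal.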
The two most widely used types of machine learning are Supervised Learning and Unsupervised Learning. Both are used in a wide range of application areas, including image analysis, image and face recognition, voice control, speech recognition, chatbots and sentiment analysis.
If you have any questions about using machine learning algorithms to optimize your processes, please do not hesitate to contact us.