The buzzword “Machine Learning” is currently ubiquitous. Whether in Industry 4.0, e-commerce, or marketing, machine learning can in principle be applied wherever data is available. What the term actually means, however, is often unclear. This article gives an overview of what lies behind it.
Renowned expert Robert Schapire describes machine learning as an automated process for optimizing or improving future activities on the basis of historical data. This description makes clear why it is called “machine learning”:
The computer, or rather the digital process running on it, is trained automatically on existing data. The acquired knowledge is then applied to new data and related processes. This is the essential difference from traditional analytical methods, in which models are primarily built by hand and then deployed as “static” solutions.
If process conditions change, such a static model can quickly produce erroneous results. Machine learning reduces this type of error: the model can be retrained or updated with each new dataset, so it keeps improving as more data becomes available. This happens largely without manual intervention, which is why machine learning is also considered a subfield of artificial intelligence. The human role is limited to selecting the relevant data and algorithms, parameterizing them, and monitoring the process. Four types of machine learning are commonly distinguished:
Supervised learning involves training models on labeled data. Labeled data contains both the independent variables (features) and the dependent variable (label). The label is part of the training so that the trained model can later predict the label for new, unlabeled data. Example algorithms include Support Vector Machines, linear models trained with Stochastic Gradient Descent, Naive Bayes Classification, Random Forests, and (supervised) Neural Networks.
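The train-then-predict cycle described above can be sketched with a small Gaussian Naive Bayes classifier written in plain Python. This is a minimal illustration, not a production implementation: the function names `fit_gaussian_nb` and `predict`, as well as the toy height/weight data and its labels, are assumptions made up for this example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(features, labels):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(features), means, variances)
    return model

def predict(model, row):
    """Return the label with the highest log-posterior for one unlabeled row."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical labeled training data: features are [height_cm, weight_kg].
X_train = [[150, 50], [155, 55], [160, 58], [180, 85], [185, 90], [190, 95]]
y_train = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X_train, y_train)

# New data arrives without labels; the trained model predicts them.
prediction_small = predict(model, [152, 52])
prediction_large = predict(model, [188, 92])
print(prediction_small, prediction_large)
```

With this toy data the script prints `small large`. In practice one would more likely rely on an established library than on a hand-written classifier; the sketch only makes the roles of features, labels, training, and prediction visible.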
Unsupervised learning assumes that there is no labeled training data and that the outcome is not known in advance. Unlike supervised learning, there is no label (dependent variable). The algorithm explores the data on its own and looks for structure in it, for example by grouping similar data points (clustering) or by reducing the number of dimensions. Selected algorithms include K-Means Clustering, Hierarchical Clustering, and Affinity Propagation for clustering, and Principal Component Analysis for dimensionality reduction.
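As an illustration of clustering without labels, the following is a minimal sketch of K-Means in plain Python. The function name `k_means`, its parameters, and the toy points are assumptions for this example; initial centroids are drawn at random from the data with a fixed seed, which is only one of several possible initialization strategies.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100, seed=0):
    """Group points into k clusters; returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, the clustering has converged
        assignment = new_assignment
        # Update step: every centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical unlabeled data with two visually separate groups.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]]
assignment, centroids = k_means(points, k=2)
print(assignment)
```

The first three points end up in one cluster and the last three in the other. Which cluster receives index 0 and which index 1 depends on the random initialization, which reflects that the cluster numbers carry no meaning of their own.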
Semi-supervised learning is a hybrid of the two previously mentioned types and is often used when only a small portion of the data is labeled. The goal of this approach is to assign labels to the unlabeled data. A widespread method here is Label Propagation. It is similar in spirit to clustering: labels spread from the labeled data points to similar unlabeled ones, so that unlabeled points in the same dense region end up with the same label as the labeled points there.
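The idea can be sketched in plain Python. The version below follows one common formulation of Label Propagation, which spreads labels over a similarity graph rather than forming explicit clusters: each point repeatedly takes over the weighted label distribution of its neighbors, while the known labels are held fixed. The function name `label_propagation`, the `sigma` parameter of the similarity kernel, and the toy data are assumptions for this example.

```python
import math

def label_propagation(points, labels, sigma=1.0, iterations=200):
    """Fill in missing labels (None) by spreading known labels to similar points."""
    classes = sorted({label for label in labels if label is not None})
    n = len(points)
    # Pairwise similarity: close points get weights near 1, distant points near 0.
    weights = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                  / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    # Normalize each row so it sums to 1 (a transition matrix over the graph).
    transition = [[w / sum(row) for w in row] for row in weights]

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in classes]

    uniform = [1.0 / len(classes)] * len(classes)
    dist = [one_hot(l) if l is not None else list(uniform) for l in labels]
    for _ in range(iterations):
        # Propagate: each point averages the label distributions of its neighbors.
        spread = [
            [sum(transition[i][j] * dist[j][c] for j in range(n))
             for c in range(len(classes))]
            for i in range(n)
        ]
        # Clamp: points with a known label keep that label.
        dist = [one_hot(l) if l is not None else row
                for l, row in zip(labels, spread)]
    return [classes[max(range(len(classes)), key=lambda c: row[c])] for row in dist]

# Hypothetical data: only one point per group carries a label.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.3]]
labels = ["a", None, None, "b", None, None]

completed = label_propagation(points, labels)
print(completed)
```

With this toy data the script prints `['a', 'a', 'a', 'b', 'b', 'b']`: the two labeled points are enough to label their respective neighborhoods.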
Reinforcement learning means that the algorithm learns a strategy (policy) to reach a specific goal (goal state) in a defined environment. It moves from one state to the next by choosing among predefined actions until the goal is reached, and it receives rewards along the way that indicate how good its choices were. Such problems are usually formalized as a Markov Decision Process.
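To make states, actions, and policy concrete, here is a minimal sketch of tabular Q-learning, one widely used reinforcement learning algorithm that the text above does not name explicitly. The environment is a made-up corridor of five states in which the agent starts at state 0 and must reach state 4; the reward scheme and the values of `alpha`, `gamma`, and `epsilon` are assumptions chosen for illustration.

```python
import random

N_STATES = 5           # states 0..4 form a corridor; state 4 is the goal state
ACTIONS = (-1, +1)     # predefined actions: step left or step right
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic transition; reward 1 only when the goal state is entered."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                best = max(q[state])             # exploit: best known action,
                a = rng.choice(                  # ties broken at random
                    [i for i, v in enumerate(q[state]) if v == best])
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
# The learned policy picks the highest-valued action in every non-goal state.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q_table[s][i])]
          for s in range(GOAL)]
print(policy)
```

The script prints `[1, 1, 1, 1]`, meaning the learned policy is to step right in every state, which is the shortest route to the goal. The discount factor `gamma` is what makes shorter routes more valuable than longer ones.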
The two most commonly used types of machine learning are supervised learning and unsupervised learning. Both are used in a wide range of applications, including image analysis, image recognition, voice control, speech recognition, chatbots, facial recognition, and sentiment analysis.