November 12, 2021
Picking an algorithm for a machine learning project can be a confusing business. Most data scientists will tell you that there is not always a perfect answer to the question “What algorithm or model should I use?” In this series, we break down the important details that go into making that decision.
In this article, we will look at the big-picture topics of supervised learning and unsupervised learning. Choosing between them is the first hurdle in picking an algorithm, because the algorithms available to you depend entirely on the type of problem you are trying to solve.
Supervised learning is often what people think of when they think of machine learning. It applies when you have a known target variable and you are trying to predict it. Think about trying to predict how much a home will sell for, how many items you need to stock on the shelves to match demand, or whether an incoming email is spam. In each case, you are predicting either a value (price, number of items) or a category (spam/not spam).
Supervised learning needs to be trained on labeled data, that is, data for which the outcome is already known. If you want a model to predict the sale price of a house, you need data on what houses previously sold for. To predict sale volumes, you need past sale volumes, and to predict whether an email is spam, you need a whole lot of emails that are already marked as spam or not spam. Supervised learning takes this training data and uses algorithms to predict the target based on past patterns. Some examples of supervised learning algorithms in the Coleman AI suite are Linear Learner, XGBoost, Decision Tree, Random Forest, Extra Trees, Multilayer Perceptron, and DeepAR. Each of these algorithms has its strengths and weaknesses on particular problems, but all of them need to be trained on previously labeled data.
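To make the idea of training on labeled data concrete, here is a minimal sketch in plain Python. It is not Coleman's Linear Learner or any Coleman API; it is a hand-written least-squares line fit, and the house sizes and prices are made-up numbers used only for illustration.

```python
# Illustrative only: made-up labeled data, not taken from any real dataset.
# Each pair is (house size in square feet, known sale price in dollars).
labeled_sales = [
    (1000, 200_000),
    (1500, 275_000),
    (2000, 350_000),
    (2500, 425_000),
]

def fit_line(pairs):
    """Ordinary least-squares fit of price = slope * size + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, size):
    """Apply the fitted line to a house the model has not seen."""
    slope, intercept = model
    return slope * size + intercept

# "Training" uses the known prices; prediction is for a new, unlabeled house.
model = fit_line(labeled_sales)
estimate = predict(model, 1800)
print(f"Estimated price for 1,800 sq ft: ${estimate:,.0f}")
```

The key point is that `fit_line` can only work because every row already carries its answer. Without the known sale prices there would be nothing to fit against.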
One of our retail customers, operating in a penny-profit business and deeply concerned with margins and properly priced gasoline, had a problem to solve. They need to price their gasoline down to the penny, because at large scale that penny means millions. This client uses Coleman AI to predict margins and utility expenses for each of their stores using supervised algorithms. They use profit/loss statements, regional information, and other data from the data lake to predict a specific value, and that is supervised learning. You can hear their story here.
Unsupervised learning, on the other hand, does not need labeled data. Unsupervised algorithms make inferences from whatever data they are given, often sorting it into natural groups. If you had pictures of different zoo animals and didn’t tell the computer which one was which, it could use clustering methods to separate the tigers from the lions despite not knowing what a tiger or a lion is. That is unsupervised learning.
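As a rough sketch of how clustering can group data without labels, the following plain-Python example runs a basic k-means loop. Everything in it is an assumption made for illustration: the two numeric features (a "stripe score" and a "mane score" that some earlier step is imagined to have extracted from each picture) and the values themselves are invented, and k-means is just one common clustering method, not necessarily the one any particular product uses.

```python
import math

# Illustrative only: hypothetical (stripe_score, mane_score) pairs, one per picture.
# Note that no row says "tiger" or "lion".
pictures = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.70),
]

def k_means(points, k, iterations=10):
    """Basic k-means: assign points to the nearest center, then move the centers."""
    # Simplistic deterministic start: pick k points spread evenly through the list.
    centers = [points[i * len(points) // k] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centers

assignments, centers = k_means(pictures, k=2)
for picture, group in zip(pictures, assignments):
    print(f"{picture} -> group {group}")
```

The algorithm only reports "group 0" and "group 1". Deciding that one group is tigers and the other is lions is left to a person looking at the results.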
Here is an example of a supervised learning model in Infor Coleman. After the data is imported, the green boxes perform some of the crucial supervised learning tasks: the supervision! The target label is identified, along with the features that will be used to predict it. The data is split into training and test sets before the model is trained. Then Coleman can test, score, and evaluate the model’s performance on the remaining data. The drag-and-drop interface even lets you easily test and compare different models and their respective performance.
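The same sequence of steps (identify the target and features, split the data, train, then score and evaluate on the held-out rows) can be sketched outside of Coleman in plain Python. This is not Coleman's interface or API. The emails, the two features (number of links and number of exclamation marks), and the nearest-centroid classifier are all assumptions chosen to keep the example short.

```python
import math
import random

# Illustrative only: made-up emails described by two hypothetical features,
# (number of links, number of exclamation marks), plus a known label.
emails = [
    ((0, 1), "not spam"), ((1, 0), "not spam"), ((2, 1), "not spam"),
    ((1, 2), "not spam"), ((0, 0), "not spam"),
    ((9, 8), "spam"), ((8, 10), "spam"), ((10, 9), "spam"),
    ((8, 8), "spam"), ((9, 10), "spam"),
]

# 1. Features and target: each row above is (features, label).

# 2. Split the rows into a training set (70%) and a test set (30%).
rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = emails[:]
rng.shuffle(shuffled)
split_at = len(shuffled) * 7 // 10
train_rows, test_rows = shuffled[:split_at], shuffled[split_at:]

# 3. Train: compute the average feature vector (centroid) for each label.
def train(rows):
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(group) for axis in zip(*group))
        for label, group in grouped.items()
    }

# 4. Score: predict the label whose centroid is closest to the email.
def predict(model, features):
    return min(model, key=lambda label: math.dist(features, model[label]))

model = train(train_rows)

# 5. Evaluate on the held-out rows the model never saw during training.
correct = sum(predict(model, features) == label for features, label in test_rows)
accuracy = correct / len(test_rows)
print(f"Test accuracy: {accuracy:.0%} on {len(test_rows)} held-out emails")
```

The toy data is deliberately well separated, so the score here says nothing about real spam filtering. The point is the order of operations: the model is evaluated only on rows it was not trained on.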
We will continue this blog series in the coming weeks to further understand the world of machine learning.
Infor Coleman ML is part of the Infor technology platform. If you would like to learn more about how Coleman can benefit your business and about the industry-specific machine learning models Infor can deploy, please don’t hesitate to contact us.