Six Jars of Machine Learning || CODING SNAP ||

Six Jars of Machine Learning

In this emerging era of Machine Learning and Artificial Intelligence, many budding developers are trying to switch to ML or aspire to become machine learning engineers. But when someone with no technical background tries to learn the concepts and models, they often get stuck and feel demotivated.

The mathematics involved also plays a major part in that demotivation.

Here, I want to break Machine Learning down into six categories. Whenever you implement a model or algorithm, you can map your approach onto these six categories, which makes it much easier to avoid getting stuck or confused.

These six categories can be thought of as the six jars of machine learning, each holding plenty of cookies.

As shown in the picture, the six jars are:

Data
Task
Model
Loss
Learning
Evaluation

Let's demystify them one by one:

Data

Objective: Collect and curate the data.

Data is the core of any ML/AI algorithm. It must be supplied in a form that the algorithm understands. The main job of an ML/AI algorithm is to uncover the valuable information hidden in the structure of the data, so we need to examine the data carefully and put it to use.

If the data is in a form the algorithm cannot interpret, the algorithm will produce incorrect, misleading insights. This can lead to a loss of revenue for the project or the company.

Important points to remember about the data:

All data should be encoded as numbers
The data is typically high-dimensional
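
To make the "encoded as numbers" point concrete, here is a minimal sketch of one common encoding, one-hot encoding, written in plain Python. The `colors` column and the `one_hot_encode` helper are hypothetical names invented for this illustration; real projects usually rely on a library for this step.

```python
def one_hot_encode(values):
    """Turn a list of category labels into numeric one-hot vectors."""
    # Sort the distinct labels so the column order is reproducible
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)   # one slot per category
        row[index[value]] = 1         # mark the slot of this label
        encoded.append(row)
    return categories, encoded


# Hypothetical raw column containing text labels
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
print(encoded)
```

Each new category adds one more column, which is one reason real datasets tend to become high-dimensional quickly.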

Now the question arises: where do we get the data?

We can get it in the following ways:

Web Scraping
Open repositories, e.g. Google AI, Wikidata, UCI, government data (data.gov.in)
Amazon Mechanical Turk, Data Turk
For business-specific data, generate it on your own using tools.

Task

Objective: Define the task to be performed on the data.

Once we have the data, we need to pre-process it and work out which tasks it can support. Identifying the necessary dimensions and getting rid of irrelevant data should be the first step.

Categorizing the task:

Examples: Classification, Regression, Prediction, Clustering, Analysis, Pattern Tracking, Decision-making, Anomaly Detection, Spam Detection, etc.

Model

Objective: Represent the task as a mathematical function.

Typically, for every case there exists a function f(x) that relates an input x to the output y, that is, y = f(x). But it is unknown!

If we knew this function, there would be no need to design or apply any algorithm; we could hardcode it directly and get exact results. This function can be called the true relation.

Since we can't find that true relation between the inputs and the outputs, in machine learning we define an approximate function whose output is very close to the true output.
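
As a rough illustration of the idea, the sketch below pretends the true relation is y = 2x + 1 (an assumption made purely for the demo; in practice it is unknown) and approximates it with a straight-line model. The names `true_relation` and `model` and the parameter values `w=1.9`, `b=1.2` are hypothetical.

```python
def true_relation(x):
    # Assumed only for this demo: in a real problem we never see this
    # function, only the outputs it produces.
    return 2.0 * x + 1.0


def model(x, w, b):
    # Our chosen approximation: a line with adjustable parameters w and b
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [true_relation(x) for x in xs]

# Hand-picked parameters that are close to, but not equal to, the truth
predictions = [model(x, w=1.9, b=1.2) for x in xs]

for x, y, y_hat in zip(xs, ys, predictions):
    print(f"x={x:.1f}  true={y:.2f}  predicted={y_hat:.2f}")
```

The predictions are close to the true outputs but not identical; how close is exactly what the next jar, Loss, measures.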

Loss

Objective: Calculate the Loss

To calculate the loss, we use a loss function. There are several loss functions, each of which estimates the difference between the model's predicted value and the actual true output.

Hence, a loss function estimates how good or bad the model is, and also helps determine which model is best suited for the task.

A few examples of loss functions:

Mean Squared Error (MSE)
Cross-Entropy Loss
KL Divergence etc.
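
The first two losses in the list can be sketched in a few lines of plain Python. This is a minimal illustration with made-up numbers, not a production implementation; the cross-entropy shown is the binary form, and the `eps` clipping value is an assumption added to avoid taking log(0).

```python
import math


def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between truth and prediction
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true holds 0/1 labels, p_pred holds predicted probabilities of class 1
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up values for illustration
mse = mean_squared_error([1.0, 3.0, 5.0], [1.0, 2.0, 7.0])
bce = binary_cross_entropy([1, 0], [0.5, 0.5])
print(f"MSE = {mse:.4f}")
print(f"BCE = {bce:.4f}")
```

A lower value means the predictions are closer to the true outputs, so the same function can be used to compare two candidate models on the same data.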

Learning

Objective: Implement the learning algorithm

This is the stage where no human intervention is involved; it is entirely the machine's space, where it adjusts the model's parameters through mathematical procedures so that the loss is minimized.

It can be formulated as an optimization problem, and many optimization solvers are available.

Examples of Learning Algorithms:

Gradient Descent
Adagrad
RMSProp
Backpropagation Through Time (BPTT)
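
As a sketch of the first algorithm in the list, the code below runs plain gradient descent to fit a line w*x + b by minimizing the mean squared error. The data points, the `learning_rate` of 0.05, and the `steps` count of 2000 are assumptions chosen for this toy example, not recommended defaults.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point at the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move the parameters a small step in the downhill direction
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed for the demo)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Nobody tells the loop what w and b should be; it only follows the gradient of the loss, which is what "the machine's space" means here.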

Evaluation

Objective: Estimate Accuracy

Evaluation means estimating the degree of correctness by comparing our model's predictions with the actual results. Note that evaluation is always done on a separate set of data that was not used for training, generally known as the test data.

We can use evaluation metrics to measure the performance of our model.

Examples of Evaluation Metrics:

(Top-K) Accuracy
Precision
Recall
F1 Score etc.
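
For a binary classifier, the metrics above can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up test labels and predictions; `classification_metrics` is a hypothetical helper written for this illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up test-set labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The labels passed in should come from the test data, never from the examples the model was trained on.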

So, these are the six jars of Machine Learning. Whenever you build a model, map your work onto these six jars/stages; it will help you analyze and formulate your approach.

If you want to get started with Machine Learning algorithms, please check this:
Detailed Implementation of Linear Algorithm explaining the six jars properly.
