The most intuitive way to understand machine learning is to be a machine.
1. So what is ‘learning’?
Try to think of it this way: one day you woke up, and you were a machine.
As this machine, you have been given a mission: classifying spam emails.
Out of thousands of emails, you must pick out only the spam.
The first and easiest method you could come up with is to hunt down
emails containing certain words in their titles. Let’s try it.
Question 1
You are a machine that operates by following this command.
Command: Select the emails which include any of the following words.
{“discount”, “deal”, “free”}
Result:
As you can see, there are some problems.
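The command in Question 1 can be sketched in a few lines of Python. This is only an illustration: it assumes the inbox holds the four email titles listed in Question 3, and the names `SPAM_WORDS`, `title_words`, and `select_spam` are made up for this sketch.

```python
import re

# The word list from the command in Question 1.
SPAM_WORDS = {"discount", "deal", "free"}

# Assumption: the inbox holds the four titles listed in Question 3.
titles = {
    1: "Online Casino 800% return Free entrance limited time!",
    2: "50% OFF every products overnight shipping LAST Deal!",
    3: "Modifications of sales report related to Korea Deal",
    4: "Just 24 Hours Left to Get 90% Off Premium Upgrade!",
}


def title_words(title):
    """Lower-case the title and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def select_spam(titles, spam_words):
    """Return the numbers of the emails whose title contains any listed word."""
    return [number for number, title in titles.items()
            if title_words(title) & spam_words]


selected = select_spam(titles, SPAM_WORDS)
print(selected)  # [1, 2, 3]
```

Under that assumption, email (3) is selected only because its title contains “Deal”, while email (4) is missed because none of the three words appears in it.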
Now let’s try a more advanced way.
Question 2
You are a machine that operates by following these commands.
Command 1: Select the emails which include any of the following words.
{“discount”, “free”, “premium”}
Command 2: If you find “Deal” in the title, then follow these orders:
I. De-select the emails which also include any of the following words.
{“Business”, “Report”, “Sales”}
II. Select only the emails which don’t satisfy the condition “I”.
Result:
But do you really think these two commands will classify thousands more emails correctly as well? Realistically, no.
For instance, if spam email (1) changes “Free” to “for nothing”, it slips past this spam filter.
Another email, titled “Modifications WRT Korea Deal”, will be classified as spam, because it includes “Deal” but none of the words listed in Command 2-I.
In the real world, those who send spam keep researching and developing their methods so that they can go on advertising their goods, keeping pace with updates to spam filtering systems.
Therefore, it doesn’t seem realistic to solve the spam classification problem by adding ever more commands with countless conditions.
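To make the weakness concrete, here is a rough Python sketch of the two commands from Question 2, run on the two problem cases described above. The function names are hypothetical, the four titles are assumed to be the ones listed in Question 3, and the “for nothing” title is an invented rewording of email (1).

```python
import re

COMMAND_1_WORDS = {"discount", "free", "premium"}   # Command 1
DEAL_EXCEPTIONS = {"business", "report", "sales"}   # Command 2-I


def title_words(title):
    """Lower-case the title and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def is_spam(title):
    """Apply Command 1, then Command 2, to a single email title."""
    found = title_words(title)
    if found & COMMAND_1_WORDS:
        return True
    if "deal" in found:
        # Spam only if none of the business-related words is present.
        return not (found & DEAL_EXCEPTIONS)
    return False


# Assumption: the same four titles as in Question 3.
inbox = [
    "Online Casino 800% return Free entrance limited time!",
    "50% OFF every products overnight shipping LAST Deal!",
    "Modifications of sales report related to Korea Deal",
    "Just 24 Hours Left to Get 90% Off Premium Upgrade!",
]
print([is_spam(title) for title in inbox])  # [True, True, False, True]

# Hypothetical rewording of email (1): "Free" replaced by "for nothing".
evasive = "Online Casino 800% return for nothing entrance limited time!"
print(is_spam(evasive))  # False: the spam gets through

# A legitimate title that lacks every word in Command 2-I.
print(is_spam("Modifications WRT Korea Deal"))  # True: wrongly flagged
```

Each new failure would need another hand-written condition, which is exactly the maintenance burden described above.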
His immediate superior has to assign his work by giving him step-by-step instructions. When he encounters a situation that differs even slightly from the guidelines he was given, he runs to his superior and complains, “I don’t know how to do this, please solve it for me!”
Then the superior may think,
‘Why can’t he do this simple job by himself!’
The head of department pats him on the shoulder and says,
“I totally believe in you, Mr. Yun.”
He always manages to achieve results before the deadline, without
being told how to do things step-by-step, as was the case with Mr.
Yun. We call this ability ‘flexibility’ or ‘applicability.’
How could Mr. Yun solve this problem by himself? Not by making a step-by-step list and modifying or adding conditions whenever an error occurs.
Let’s stop thinking like a machine for now, and try the following question.
Question 3
As a human, use your own judgment to deal with the following case.
Find the spam emails in the [Data] section, and write their numbers in
the [Label] section.
Data
(1) Online Casino 800% return Free entrance limited time!
(2) 50% OFF every products overnight shipping LAST Deal!
(3) Modifications of sales report related to Korea Deal
(4) Just 24 Hours Left to Get 90% Off Premium Upgrade!
Label
(There is no grading here, because you have just written the solutions yourself.)
What you just did is called “labeling”: you attached a label to each
item. More precisely, you marked which of the four email titles (the
data) are spam and which are not.
Index | Email title | Label*
1 | Online Casino 800% return Free entrance limited time! | 1
2 | 50% OFF every products overnight shipping LAST Deal! | 1
3 | Modifications of sales report related to Korea Deal | 0
4 | Just 24 Hours Left to Get 90% Off Premium Upgrade! | 1
* 1: spam, 0: not spam
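In code, the labeled table above might be held as a list of (title, label) pairs. This is just one possible representation, and the variable names are chosen for illustration.

```python
# Each entry pairs an email title (the data) with its label (1: spam, 0: not spam).
labeled_emails = [
    ("Online Casino 800% return Free entrance limited time!", 1),
    ("50% OFF every products overnight shipping LAST Deal!", 1),
    ("Modifications of sales report related to Korea Deal", 0),
    ("Just 24 Hours Left to Get 90% Off Premium Upgrade!", 1),
]

# Split the pairs into the data and the labels.
data = [title for title, _ in labeled_emails]
labels = [label for _, label in labeled_emails]

# The numbers written in the [Label] section: the emails marked as spam.
spam_numbers = [number for number, label in enumerate(labels, start=1)
                if label == 1]
print(spam_numbers)  # [1, 2, 4]
```

Keeping the data and the labels in separate lists makes it easy to show only the titles to whoever is predicting and to hold the answers back for checking.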
Let me tell you a story. Emily is solving problems in her Korean
learning workbook. If she doesn’t have the solution, she cannot know
whether her answers are correct or not. Only when she has the solution
can she grade herself by comparing her ‘predicted answers’ to the
‘correct answers’ listed in the solutions.
When her answer is correct, she happily continues marking her
work; when her answer is wrong, however, she thinks about why she may
have got it wrong and adjusts her initial thinking in relation to that
part. This is her algorithm...
Emily’s learning algorithm
1. Hide the solutions.
2. Solve the problems in her notebook. (Prediction stage)
3. Compare her answers to the correct answers, as per the solution page.
4. If an answer is wrong, think about why and adjust her thinking.
5. Repeat this process.
This is the basic notion of learning.
If it’s right, keep going; if it’s wrong, adjust toward the right
answer.
Now let’s compare this process to what a machine does. As an example,
you have four pieces of labeled data.
When you put this labeled data into a ‘learning machine’, the machine
executes the following process.
Machine’s learning algorithm
1. Keep the labels (solution) away.
2. Make predictions based on data.
3. Compare the machine’s predictions to the correct answers on the label.
4. If a prediction is wrong, adjust toward the right answer.
5. Repeat this process.
Just like Emily, the machine learns by comparing its predictions to the labeled correct answers. In other words:
If it’s right, keep going; if it’s wrong, adjust toward the right
answer.
This is machine learning.
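As a toy illustration of this loop, the sketch below gives every word a weight, predicts spam when the weights in a title add up to more than zero, and nudges the weights whenever a prediction is wrong. This perceptron-style update rule is an assumption chosen for the example, not a method named in this lecture, and all names are hypothetical.

```python
import re

# The four labeled titles from the table above (1: spam, 0: not spam).
labeled_data = [
    ("Online Casino 800% return Free entrance limited time!", 1),
    ("50% OFF every products overnight shipping LAST Deal!", 1),
    ("Modifications of sales report related to Korea Deal", 0),
    ("Just 24 Hours Left to Get 90% Off Premium Upgrade!", 1),
]


def title_words(title):
    """Lower-case the title and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def predict(weights, bias, title):
    """Predict 1 (spam) if the bias plus the word weights is positive."""
    score = bias + sum(weights.get(word, 0) for word in title_words(title))
    return 1 if score > 0 else 0


def learn(data, max_rounds=20):
    """Repeat predict-compare-adjust until a full pass has no mistakes."""
    weights, bias = {}, 0
    for _ in range(max_rounds):                    # repeat the process
        mistakes = 0
        for title, label in data:
            guess = predict(weights, bias, title)  # the label stays hidden here
            error = label - guess                  # compare with the correct answer
            if error != 0:                         # wrong: adjust toward the answer
                mistakes += 1
                bias += error
                for word in title_words(title):
                    weights[word] = weights.get(word, 0) + error
        if mistakes == 0:                          # everything right: stop
            break
    return weights, bias


weights, bias = learn(labeled_data)
predictions = [predict(weights, bias, title) for title, _ in labeled_data]
print(predictions)  # [1, 1, 0, 1]
```

With only four examples the machine is close to memorizing the answers, so this shows the mechanics of the loop rather than a usable spam filter.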
Then, after running into many problems, you switched to the ‘learning’ method: troubleshooter Mr. Yun’s method.
There are countless other cases where machine learning is making an enormous difference.
Can you tell what ‘learning’ means now?
In this lecture, you successfully ‘labeled’ data and began to understand a bit of the learning process. In the following lecture, we’ll learn about the process of learning and prediction, and about the training set and the test set.
After the Chapter ‘1: Big picture at glance’, we’ll look into the algorithms of machine learning, using the same method as today: thinking like a machine and solving problems.