Building on the previous article, this article reviews some basic learning algorithms.
A learning algorithm works on a training data set (the number of training examples is generally denoted by m) and outputs a hypothesis (h).
y = h(x), where x is the input. The hypothesis h is derived from the training data set. For any new input x, we can predict the value y using the hypothesis h.
Linear Regression:
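The hypothesis for a simple linear regression learning algorithm with a single variable x will be
$$h_\theta(x) = \theta_0 + \theta_1 x$$
Our learning algorithm will find values of theta zero and theta one that are a good fit for the data, that is, values for which h(x) is close to y for every x in the data set. In order to do so, we define something called a cost function, which is essentially a squared error function. In the convention of the Coursera course listed in the references, it is
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$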
To find the correct values of theta zero and theta one, we have to minimize the cost function over theta zero and theta one.
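As a minimal sketch, the squared error cost for a straight-line hypothesis could be computed as follows. The 1/(2m) factor follows the convention of the Coursera course in the references, and the function names and toy data are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J = 1/(2m) * sum((h(x) - y)^2) over m examples."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit = cost(1.0, 2.0, xs, ys)  # the line passes through every point
poor_fit = cost(0.0, 0.0, xs, ys)     # predicts 0 for every input
```

The cost is zero for the line that passes through every point and grows as the predictions move away from the observed values, which is why minimizing it gives a good fit.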
Mathematically, this means that we are trying to find the straight line which best fits the provided training data. We can use the gradient descent algorithm to minimize our cost function:
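In this algorithm we iteratively compute new values of theta zero and theta one, updating both simultaneously.
Repeat until convergence (for j = 0 and j = 1) {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$
}
Here J is the cost function and alpha (α) is the learning rate of gradient descent.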
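The update loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: it uses the 1/(2m) squared error cost, runs for a fixed number of iterations instead of testing for convergence, and the default values of `alpha` and `iterations` are arbitrary choices that happen to work for the toy data.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = 1/(2m) * sum(error^2).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
```

Computing both gradients before assigning either parameter is what makes the update simultaneous. If alpha is too large the values can diverge, and if it is too small the loop needs many more iterations.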
In the case of multiple variables or features (multivariate linear regression), our functions change.
For example, the new hypothesis function will be:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
where we have n features or variables. Similarly, the cost function and the gradient descent updates incorporate all n features.
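A small sketch of the multivariate hypothesis for one example is shown here. The `predict` name and the parameter values are hypothetical, and a constant feature x0 = 1 is prepended so that theta zero is treated like every other parameter.

```python
def predict(theta, features):
    """h(x) = theta[0] + theta[1]*x1 + ... + theta[n]*xn for one example."""
    # Prepend x0 = 1 so the intercept theta[0] is handled like any other term.
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


theta = [1.0, 2.0, 3.0]                  # hypothetical parameters, n = 2 features
prediction = predict(theta, [4.0, 5.0])  # 1 + 2*4 + 3*5
```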
Feature scaling: To help gradient descent converge faster, we can scale all features to a similar range. Say feature x1 takes values from 0 to 1 and feature x2 takes values from 0 to 5000; we can then divide x2 by 5000 so that it also takes values in a range similar to x1.
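As a rough sketch, scaling a feature by its largest value looks like this. The second function, mean normalization, is a common variant that the text above does not describe and is included only for comparison; both function names and the sample values are made up.

```python
def scale_by_max(values):
    """Divide every value by the largest absolute value, giving results in [-1, 1]."""
    largest = max(abs(v) for v in values)  # assumes at least one non-zero value
    return [v / largest for v in values]


def mean_normalize(values):
    """Subtract the mean, then divide by the range (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)     # assumes the values are not all equal
    return [(v - mean) / spread for v in values]


x2 = [0.0, 1250.0, 2500.0, 5000.0]  # made-up values in the 0 to 5000 range
scaled = scale_by_max(x2)
normalized = mean_normalize(x2)
```

After scaling, x2 lies between 0 and 1 just like x1, so a single learning rate suits both parameters.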
Polynomial Regression:
We can also have hypothesis functions that use squared, cubic, or square root values of features. Here is an example for polynomial regression:
$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$$
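To illustrate, a polynomial hypothesis can be built by expanding a single feature into its powers and then treating those powers as ordinary features. The cubic degree, the function names, and the parameter values in this sketch are assumptions chosen for the example.

```python
def polynomial_features(x, degree=3):
    """Turn a single feature x into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]


def predict(theta, x, degree=3):
    """h(x) = theta[0] + theta[1]*x + theta[2]*x^2 + ... (still linear in theta)."""
    terms = [1.0] + polynomial_features(x, degree)
    return sum(t * term for t, term in zip(theta, terms))


theta = [1.0, 0.5, 2.0, -1.0]  # hypothetical parameters
value = predict(theta, 2.0)    # 1 + 0.5*2 + 2*4 - 1*8
```

Because the powers of x can differ by orders of magnitude, feature scaling matters even more here when gradient descent is used.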
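If the number of features is not very large, we can use the normal equation to compute theta directly, without iterating:
$$\theta = (X^T X)^{-1} X^T y$$
where X is the matrix of training inputs (one row per example, with a leading column of ones) and y is the vector of training outputs.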
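The normal equation computes theta as the solution of (X^T X) theta = X^T y. The sketch below solves that system with a hand-written Gaussian elimination so that it needs no external library; in practice a linear algebra package would normally be used instead. It assumes X^T X is invertible, and the function name and data are made up for illustration.

```python
def normal_equation(rows, ys):
    """Solve (X^T X) theta = X^T y, where each row of X is [1, x1, ..., xn]."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build A = X^T X and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | b].
    M = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (M[i][n] - known) / M[i][i]
    return theta


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
rows = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(rows, ys)
```

There is no learning rate to choose and no iteration, but the cost of solving the system grows quickly with the number of features, which is why this approach suits smaller feature counts.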
Logistic Regression:
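Logistic regression is, unlike what the name suggests, a classification algorithm. The predicted output of logistic regression is always between 0 and 1. The hypothesis function is based on the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$, applied to the same linear combination of features used in linear regression: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \dots + \theta_n x_n)$.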
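A minimal sketch of the logistic regression hypothesis is given below. The parameter values are hypothetical, and the 0.5 threshold used to turn the output into a class label is a common convention assumed here, not something stated in the text above.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), which maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(theta, features):
    """h(x) = g(theta[0] + theta[1]*x1 + ... + theta[n]*xn)."""
    z = theta[0] + sum(t * x for t, x in zip(theta[1:], features))
    return sigmoid(z)


def predict_class(theta, features, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(theta, features) >= threshold else 0


theta = [-3.0, 1.0]                         # hypothetical parameters
p_low = predict_probability(theta, [1.0])   # z = -2
p_mid = predict_probability(theta, [3.0])   # z = 0, so exactly 0.5
p_high = predict_probability(theta, [5.0])  # z = 2
```

The output can be read as the estimated probability that the input belongs to class 1, which is what makes the algorithm usable for classification.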
List of machine learning posts :
1. Nuts: Machine Learning -1 , Basics of machine learning
2. Machine Learning - 2 : Regression Learning Algorithms
References -
1. https://www.coursera.org/learn/machine-learning
2. http://cs229.stanford.edu