Friday, July 22, 2016

Machine Learning - 3 : Evaluating a Learning Algorithm


When a learning algorithm performs poorly, there are several promising things we can try, but first we need a way to evaluate it. Some strategies for evaluating a learning algorithm are as follows:

1. Splitting Data Set.

Let's say we split the data in some ratio (70:30). 70% of the data is used as the training set and the rest (30%) is held out as test data. If the data is not already randomly ordered, it is better to shuffle it before splitting.

We then compute the test set error, which is the error of the hypothesis obtained from the training data when it is evaluated on the test data.
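A minimal sketch of the split and the test set error, in plain Python. The helper names, the toy data and the stand-in hypothesis are assumptions made for illustration, and the squared error with a 1/2 factor is just one common choice of error measure.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples, then cut it into training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def squared_error(hypothesis, examples):
    """Average squared error (with the usual 1/2 factor) over (x, y) pairs."""
    total = sum((hypothesis(x) - y) ** 2 for x, y in examples)
    return total / (2 * len(examples))

# Hypothetical data that follows y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(data)

def hypothesis(x):
    # Stand-in for a hypothesis that was learned from train_set.
    return 2 * x + 1

test_error = squared_error(hypothesis, test_set)
```

The important point is that `test_error` is computed only on examples the hypothesis never saw during training.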

2. Model Selection

What degree of polynomial should we pick?

Split the data set into three parts: training data, cross validation data and test data (the ratio may be 60:20:20). Now we can define three types of error:

Training error
Cross Validation error
Test Data error

Now we can pick the model that gives us the lowest cross validation error. We can then use the test data to estimate the generalization error of the chosen hypothesis.
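As a rough sketch of this procedure, the code below fits polynomials of degree 1 to 4 on the training part, picks the degree with the lowest cross validation error, and only then looks at the test part. The data is made up (a noisy quadratic), the least-squares fit through a small linear solver is an assumption chosen to keep the example dependency-free, and the interleaved split is a deterministic stand-in for a shuffled 60:20:20 split.

```python
import math

def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_polynomial(examples, degree):
    """Least-squares fit; returns coefficients theta_0 .. theta_degree."""
    powers = range(degree + 1)
    a = [[sum(x ** (i + j) for x, _ in examples) for j in powers] for i in powers]
    b = [sum(y * x ** i for x, y in examples) for i in powers]
    return solve(a, b)

def predict(theta, x):
    return sum(t * x ** i for i, t in enumerate(theta))

def error(theta, examples):
    return sum((predict(theta, x) - y) ** 2 for x, y in examples) / (2 * len(examples))

# Hypothetical data: a quadratic plus a small deterministic wiggle as "noise".
data = []
for i in range(20):
    x = i / 2 - 5
    data.append((x, x * x - 2 * x + 1 + 0.3 * math.sin(7 * i)))

# 60:20:20 split by position (stand-in for shuffling and slicing).
train_set = [p for i, p in enumerate(data) if i % 5 < 3]
cv_set = [p for i, p in enumerate(data) if i % 5 == 3]
test_set = [p for i, p in enumerate(data) if i % 5 == 4]

# Model selection: compare candidate degrees on the cross validation set.
cv_errors = {}
for degree in range(1, 5):
    cv_errors[degree] = error(fit_polynomial(train_set, degree), cv_set)
best_degree = min(cv_errors, key=cv_errors.get)

# The test set is used once, to estimate the generalization error.
test_error = error(fit_polynomial(train_set, best_degree), test_set)
```

Because the degree was chosen using `cv_set`, the cross validation error is an optimistic estimate for the chosen model; `test_error` is the fairer number to report.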

3. Identifying if the learning algorithm is suffering from high bias or high variance.

Plot a graph of error vs degree of polynomial to visualize, or simply calculate the training error and the cross validation error. In case of high bias (under-fitting), both the training error and the cross validation error will be high. If the training error is low but the cross validation error is high, it means we have high variance and the model is over-fitting the data.

If we regularize the parameters, a large lambda will lead to high bias and a small lambda will lead to high variance.
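The rule of thumb above can be written as a tiny helper. This is only a sketch: the function name and the `acceptable_error` threshold are assumptions, and in practice what counts as a "high" error depends on the problem.

```python
def diagnose(train_error, cv_error, acceptable_error):
    """Classify a model as high bias, high variance or ok.

    acceptable_error is a hypothetical, problem-specific threshold.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the training data well: under-fitting.
        return "high bias"
    if cv_error > acceptable_error:
        # Fits the training data but does not generalize: over-fitting.
        return "high variance"
    return "ok"
```

For example, `diagnose(0.1, 4.0, 1.0)` reports high variance, which suggests trying a larger lambda or a lower polynomial degree.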


4. Learning curves

Plot a graph of errors (training error and cross validation error) vs the number of training examples (m). We can conclude the following from the graph:

- If the learning algorithm has high bias, adding more training examples will not help much.
- If the learning algorithm has high variance, adding more training examples is likely to help.
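Here is a small sketch of how the points of a learning curve could be computed. It assumes a straight-line model fitted in closed form and made-up quadratic data, so it shows the high bias case; the function names are illustrative only.

```python
def fit_line(examples):
    """Closed-form least-squares line; returns (intercept, slope)."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def error(theta, examples):
    intercept, slope = theta
    total = sum((intercept + slope * x - y) ** 2 for x, y in examples)
    return total / (2 * len(examples))

def learning_curve(train_set, cv_set, sizes):
    """For each m, train on the first m examples and record both errors."""
    curve = []
    for m in sizes:
        subset = train_set[:m]
        theta = fit_line(subset)
        curve.append((m, error(theta, subset), error(theta, cv_set)))
    return curve

# Hypothetical data from y = x^2, which a straight line cannot fit well.
train_set = [(x, x * x) for x in [0, 3, -2, 1, -3, 2, -1, 4]]
cv_set = [(x, x * x) for x in [-2.5, 0.5, 3.5]]
curve = learning_curve(train_set, cv_set, sizes=[2, 4, 6, 8])
```

Each tuple in `curve` is (m, training error, cross validation error). With only two examples the line fits them perfectly, and the training error grows as more examples are added, which is the typical shape of the training curve.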



Summary: here's what we can do to improve our learning algorithm:


1. Get more training examples (Helpful in case of high variance)

2. Try smaller set of features (Helpful in case of high variance)

3. Try getting additional features (Helpful in case of high bias)

4. Try adding polynomial features (Helpful in case of high bias)

5. Try decreasing lambda (Helpful in case of high bias)

6. Try increasing lambda (Helpful in case of high variance)


In case of support vector machines:

A larger C value (small lambda) will result in lower bias and higher variance. A smaller C value (large lambda) will result in higher bias and lower variance.

Larger (sigma^2) -> features vary more smoothly: higher bias, lower variance. Smaller (sigma^2) -> features vary less smoothly: lower bias, higher variance.

So in case of overfitting, decrease C and increase (sigma^2).
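To see why sigma^2 controls smoothness, here is a sketch of the Gaussian kernel commonly used with SVMs, f = exp(-||x - l||^2 / (2 * sigma^2)). The notes above do not spell out the kernel, so the formula and the names `x`, `landmark` and `sigma_squared` are assumptions on my part.

```python
import math

def gaussian_kernel(x, landmark, sigma_squared):
    """Similarity between x and a landmark; 1.0 when they coincide."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma_squared))

# Same pair of points, two different values of sigma^2.
wide = gaussian_kernel([0.0], [2.0], sigma_squared=4.0)
narrow = gaussian_kernel([0.0], [2.0], sigma_squared=0.5)
```

With the larger sigma^2 the similarity falls off slowly with distance (`wide` is larger than `narrow`), so the resulting features change more smoothly.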



References -

1. https://www.coursera.org/learn/machine-learning

Tuesday, July 12, 2016

Training Neural Networks : Notes

Steps to Train Neural Network 

1. Randomly initialize weights (theta)

2.  For any input/feature set, get the value obtained from hypothesis using forward propagation (Getting activation terms)

3. Compute cost function

4. Compute partial derivatives using backward propagation (Getting delta terms)

5. Gradient checking: Compare the derivative values obtained via backward propagation with the values obtained using numerical estimation. Once there is no significant difference, disable gradient checking before training, because the numerical estimate is slow to compute.

6. Use any optimization algorithm to minimize cost function.
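Step 5 can be sketched on its own, without a full network. The code below estimates the gradient numerically with a two-sided difference and compares it to an analytic gradient. The cost function here is a hypothetical stand-in (a sum of squares) for the network's cost, and `analytic_gradient` plays the role of the values that backward propagation would produce.

```python
def numerical_gradient(cost, theta, epsilon=1e-4):
    """Two-sided difference estimate of d cost / d theta_i for every i."""
    gradient = []
    for i in range(len(theta)):
        plus = theta[:]
        minus = theta[:]
        plus[i] += epsilon
        minus[i] -= epsilon
        gradient.append((cost(plus) - cost(minus)) / (2 * epsilon))
    return gradient

def cost(theta):
    # Hypothetical cost: sum of squared parameters.
    return sum(t ** 2 for t in theta)

def analytic_gradient(theta):
    # Exact derivative of the hypothetical cost above.
    return [2 * t for t in theta]

theta = [0.5, -1.5, 2.0]
estimated = numerical_gradient(cost, theta)
exact = analytic_gradient(theta)
max_difference = max(abs(e - a) for e, a in zip(estimated, exact))
gradients_agree = max_difference < 1e-6
```

The numerical estimate needs two cost evaluations per parameter, which is why it is used only as a check and switched off before the real training run.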





References -

1. https://www.coursera.org/learn/machine-learning

Sunday, July 10, 2016

Machine Learning - 2 : Regression Learning Algorithms

Building up on the previous article, we review some basic learning algorithms in this article.

A learning algorithm works on a training data set (the number of training examples is generally denoted by m) and outputs a hypothesis (h).

y = h(x), where 'x' is the input. The hypothesis h is derived from the training data set. For any new input 'x', we can predict the value 'y' using the hypothesis 'h'.

Linear Regression: 

The hypothesis for a (simple) linear regression learning algorithm with one variable x will be

h(x) = theta_0 + theta_1 * x

Our learning algorithm will find values of theta zero and theta one that are a good fit for the data, i.e. that give us values close to 'y' for every 'x' in the dataset. In order to do so, we define something called a cost function, which is essentially a squared error function.

To find the correct values of theta zero and theta one, we have to minimize the cost function over theta zero and theta one.
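For reference, the squared error cost function as it is usually written in the course listed in the references, where m is the number of training examples and (x^(i), y^(i)) is the i-th example. The 1/2 factor is a convention that simplifies the derivative; it does not change where the minimum is.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```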

Mathematically, that means that we are trying to find the straight line which best fits the provided training data. We can use the gradient descent algorithm to minimize our cost function:

In this we iteratively find new values of theta zero and theta one, updating both simultaneously.

Repeat until convergence (for j = 0 and j = 1) {
    theta_j := theta_j - alpha * (d / d theta_j) J(theta_0, theta_1)
}

Alpha = Learning rate of the gradient descent. 
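A minimal sketch of gradient descent for the one-variable case, assuming the squared error cost with the 1/(2m) factor. The learning rate, the iteration count and the toy data are arbitrary choices for illustration; a fixed number of iterations stands in for "repeat until convergence".

```python
def gradient_descent(examples, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(examples)
    for _ in range(iterations):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in examples]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, (x, _) in zip(errors, examples)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical data that follows y = 2x + 1 exactly.
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
theta0, theta1 = gradient_descent(examples)
```

If alpha is too large the updates overshoot and the cost grows instead of shrinking; if it is too small, many more iterations are needed.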

In case of multiple variables or features (multivariate linear regression), our functions will change.

For example, the new hypothesis function will be:

h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 + ... + theta_n * x_n

where we have n features or variables. Similarly, the cost function and the gradient descent update will incorporate all n features.

Feature scaling: In order to make gradient descent converge faster, we can scale all features to a similar range. So let's say feature x1 takes values from 0-1 and feature x2 takes values from 0-5000; then we can scale down x2 by a factor of 5000 so that it also takes values in a range similar to x1.
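A sketch of this kind of scaling, dividing every feature by its largest absolute value. The function name and the sample rows are made up, and this is only one of several possible scaling schemes (mean normalization is another common one).

```python
def scale_features(rows):
    """Divide each column by its largest absolute value.

    Returns the scaled rows and the scale factors, which are needed
    again to scale any new input before prediction.
    """
    columns = list(zip(*rows))
    scales = [max(abs(v) for v in column) or 1.0 for column in columns]
    scaled = [[v / s for v, s in zip(row, scales)] for row in rows]
    return scaled, scales

# Hypothetical rows: x1 in the range 0-1, x2 in the range 0-5000.
rows = [[0.2, 1000.0], [1.0, 5000.0], [0.5, 2500.0]]
scaled, scales = scale_features(rows)
```

After scaling, both columns of `scaled` lie between 0 and 1.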




Polynomial Regression: 

We can also have hypothesis functions in which we use squared, cubic or square root values of features. For example, a quadratic hypothesis in one feature has the form h(x) = theta_0 + theta_1 * x + theta_2 * x^2.


If the number of features is not very large, we can use the normal equation to compute theta directly, without iterating.
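For reference, the normal equation in its usual form. Here X is assumed to be the m x (n+1) matrix whose rows are the training examples (with a leading 1 for theta_0) and y is the vector of the m target values.

```latex
\theta = (X^{T} X)^{-1} X^{T} y
```

Inverting the (n+1) x (n+1) matrix is what becomes expensive when the number of features is very large, which is when gradient descent is the better choice.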




Logistic Regression:

Logistic regression is, unlike the name suggests, a classification algorithm. The predicted output from logistic regression is always between 0 and 1. The hypothesis function is based on the sigmoid function, g(z) = 1 / (1 + e^(-z)).
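A small sketch of the logistic regression hypothesis. The parameter values, the input and the 0.5 decision threshold are assumptions for illustration; the first entry of `x` is the constant 1 that pairs with theta_0.

```python
import math

def sigmoid(z):
    """Sigmoid function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h(x) = sigmoid(theta . x), read as the probability that y = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def classify(theta, x, threshold=0.5):
    return 1 if hypothesis(theta, x) >= threshold else 0

# Hypothetical parameters and input: theta . x = -1 + 2 * 1 = 1.
probability = hypothesis([-1.0, 2.0], [1.0, 1.0])
label = classify([-1.0, 2.0], [1.0, 1.0])
```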








List of machine learning posts :

1.  Nuts: Machine Learning -1 , Basics of machine learning
2.  Machine Learning - 2 : Regression Learning Algorithms




References -

1. https://www.coursera.org/learn/machine-learning
2. http://cs229.stanford.edu


Tuesday, July 5, 2016

Node.js - 1

Node.js = JavaScript Runtime

Built on top of Chrome's V8 JavaScript Engine.

And? And it also uses libuv. So while Node.js is single threaded from a developer's perspective, it's actually using libuv internally for threading, implementing a thread pool and managing file system events.

How does async programming work in Node.js?

Node.js applications act on events, i.e. the flow of the application is governed by events. Node.js manages this asynchronously via the event loop.

What's the event loop?
That explanation requires another post. The event loop essentially takes any pending tasks from the task queue / callback queue and puts them on the stack.

All functions to be executed by V8 are pushed onto a stack and popped when they have finished executing. The event loop waits for the stack to be empty, and when it is, it pushes a function from the callback queue onto the stack.
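The stack and queue interplay can be imitated with a toy model. This is written in Python only to stay consistent with the other examples on this blog; it is not how Node.js or libuv are implemented, and all names in it are made up. It has no timers or I/O, just a call stack and a callback queue.

```python
from collections import deque

log = []
call_stack = []
callback_queue = deque()

def run(function):
    """Push a function on the stack, execute it, pop it when it is done."""
    call_stack.append(function)
    function()
    call_stack.pop()

def main():
    log.append("main start")
    # Scheduling a callback only queues it; it does not run yet.
    callback_queue.append(lambda: log.append("callback 1"))
    callback_queue.append(lambda: log.append("callback 2"))
    log.append("main end")

run(main)

# Toy event loop: when the stack is empty, move one queued callback onto it.
while callback_queue:
    if not call_stack:
        run(callback_queue.popleft())
```

Both callbacks run only after `main` has finished and the stack is empty, which is why "main end" is logged before "callback 1".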

How event loop works is explained beautifully by Philip Roberts at https://youtu.be/8aGhZQkoFbQ


And check out the tool demoed in the talk above at latentflip.com/loupe

Machine Learning - 1

Machine Learning = Science of getting a computer (machine) to learn without being explicitly programmed


How do machines learn? What are the major types of machine learning problems?


Supervised Learning: 

Learning from supplied data that already contains the answers (hence supervised). In this case, the algorithm uses previous data to predict the outcome for new data. So we feed the "right answers" to the algorithm.

Using a supervised learning algorithm we can solve a regression or a classification problem.

Predicting house prices based on given sample data (let's say a mapping of area and actual price) will be a regression problem.

Regression = Predicting continuous value output.

Predicting whether a tumor is malignant or benign, based on a dataset of tumor sizes and their known types, will be a classification problem.

Classification = Predicting a discrete output (categorization into different classes)

In the above examples, we had only one feature or parameter based on which we were predicting outcomes. But there will be many parameters in real world machine learning problems.

Even for the house pricing example, we can add parameters like age of the house, distance to the nearest market etc.

So, essentially, in case of supervised learning the algorithm is explicitly told what the so-called right answer is.


Unsupervised Learning: 

In unsupervised learning problems, algorithms make sense of the data set on their own, i.e. they find out if there is any structure in the data set. The algorithm will segregate data into different clusters (hence the name clustering algorithm).

One example here will be grouping different types of people together given a set of DNA microarray data.



List of machine learning posts :

1.  Nuts: Machine Learning -1 , Basics of machine learning
2.  Machine Learning - 2 : Learning Algorithms



References -

1. https://www.coursera.org/learn/machine-learning
2. http://cs229.stanford.edu