Saturday, December 21, 2019

Characteristics of Computational Systems


1. Information Capacity governed by Shannon's Formula

Shannon's formula describes the amount of information that can be stored by a system of n states.

S=-\sum _{i}P_{i}\log {P_{i}}
source: wikipedia

Information is essentially an interpretation of a system's state. For most information systems designed by humans, all n states are equally likely, P_i = P_j = 1/n, so the formula becomes

S=-\sum _{i=1}^{n}{\frac {1}{n}}\log _{b}{\frac {1}{n}}=\log _{b}n

where b is the base of the logarithm; in most cases b = 2. So a system with just one state cannot store any information (\log _{2}1=0), and a system with two states can store \log _{2}2=1, or one bit of information.
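As a quick numerical check of the formula, here is a minimal JavaScript sketch. The names shannonEntropy and uniform are made up for this illustration and do not come from the course.

```javascript
// Shannon entropy of a discrete distribution, in units set by `base` (2 gives bits).
function shannonEntropy(probabilities, base = 2) {
  return -probabilities
    .filter((p) => p > 0) // states with zero probability contribute nothing
    .reduce((acc, p) => acc + p * (Math.log(p) / Math.log(base)), 0);
}

// Hypothetical helper: n equally likely states, each with probability 1/n.
function uniform(n) {
  return Array.from({ length: n }, () => 1 / n);
}

const oneState = shannonEntropy(uniform(1));
const twoStates = shannonEntropy(uniform(2));
const eightStates = shannonEntropy(uniform(8));
console.log(oneState, twoStates, eightStates);
```

For 1, 2 and 8 equally likely states this gives 0, 1 and 3 bits (up to floating-point rounding), matching log_2 n.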

2. Speed - Switching between states

3. Universality - Which tasks can be solved with a computational system?



Reference - Coursera



Tuesday, April 17, 2018

Introduction to NumPy & Scikit Learn

NumPy introduces an array type data structure in Python.

import numpy as np
a = np.array([1,2,3])

We can also create matrices and perform matrix operations using NumPy:

m1 = np.matrix('1 2; 3 4')
m2 = np.matrix('9 10; 11 12')
m1 * m2

We can also import CSV files as matrices.



Scikit-learn is a machine learning library. With it we can preprocess data, reduce redundant variables (dimensionality reduction), implement classification and regression models, and fine-tune those models.

Creating an ML Model:

1. To create a feature vector, we need to factorize the data so that all features have numeric values. This can be done using the factorize function in the pandas library.

2. The next step is to scale the feature vector. This can be done using the built-in scalers in sklearn, such as StandardScaler.

3. Once we have a scaled feature vector, we can apply dimensionality reduction, for example Principal Component Analysis (PCA).

4. Once we have the final set of features, we can divide the data into training data and test data. For this we can use the train_test_split function from sklearn.

5a) Let's say we now want to apply logistic regression to classify the data. We can use the built-in LogisticRegression class from sklearn and fit it on the training data. We can then verify the model's performance on the test data by comparing the model's predictions with the actual test labels.

5b) If we are instead interested in finding natural groupings in the data, we can apply k-means clustering.

For k-means we need to define the number of clusters (n_clusters), the number of times to run the algorithm with different seed values (n_init), the maximum number of iterations for a single run (max_iter), and a relative tolerance (tol) used to declare convergence.





Monday, April 16, 2018

Introduction to Pandas Library

The pandas library provides fast data cleaning, preparation and analysis.

It is built on top of NumPy, so it's easy to work with arrays (Series) and matrices (DataFrames).

DataFrames are indexable and are made up of a number of Series objects. Think of a DataFrame as similar to a database table or spreadsheet, with rows and columns.

A Series can also be thought of as a 1-d DataFrame, that is, a DataFrame with only one column.

Code Examples:


import numpy as np
import pandas as pd
s = pd.Series([1,2,4, 5, 6, 8])

s is a Series. Printing s gives us:

0 1
1 2
2 4
3 5
4 6
5 8
dtype: int64

A DataFrame can be defined as:

df = pd.DataFrame({'date' : ['2018-04-01', '2018-04-02', '2018-04-03'],
'price': [200, 380, 405]})

This is essentially a table with 2 columns, date and price, and 3 rows of values. We can also load a CSV file directly as a DataFrame using pd.read_csv.

Once the data is loaded, we can leverage the power of pandas. For the table above, let's try these:

1. Find entries where price is > 200

df[df['price'] > 200]

This returns the rows with prices 380 and 405.

2. Find the total sale value

df['price'].sum()

By default DataFrame rows are numbered 0 to n-1. We can also give them names. To fetch a row by name, we first need to set an index. Let's say we want to name each row by its date:

df = df.set_index(df['date'])

set_index returns a new, re-indexed DataFrame, which we assign back to the same variable. We can now query using the newly named index:

df.loc['2018-04-02'] returns the second row.










Tuesday, December 19, 2017

Object Functional Programming

Object functional programming can be considered a mix of object oriented programming and functional programming. While the standard OOP approach focuses on objects, and standard functional programming focuses on functions, object functional programming focuses on object transformations, which its proponents argue results in higher productivity and quality.
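One way to picture "object transformations" is an immutable object whose methods return new objects instead of changing state in place. The sketch below is only an illustration under that assumption; the Account class and its methods are hypothetical and are not taken from the referenced articles.

```javascript
// Hypothetical immutable value object; the class and its fields are illustrative only.
class Account {
  constructor(owner, balance) {
    this.owner = owner;
    this.balance = balance;
    Object.freeze(this); // instances cannot be mutated after construction
  }

  // Each transformation returns a new Account and leaves this one untouched.
  deposit(amount) {
    return new Account(this.owner, this.balance + amount);
  }

  withdraw(amount) {
    return new Account(this.owner, this.balance - amount);
  }
}

const opened = new Account('alice', 100);
const updated = opened.deposit(50).withdraw(30);
console.log(opened.balance, updated.balance);
```

The object still bundles data with behaviour, but each step is a pure transformation, so opened keeps its balance of 100 while updated holds 120.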



References

http://www.artima.com/weblogs/viewpost.jsp?thread=275983
https://academy.realm.io/posts/altconf-saul-mora-object-orientated-functional-programming/

Sunday, December 3, 2017

Transducing in JS

Say there is a list

var list = [3,4,7,8,11,14,15]

And we want to apply the following function to each element

function add1 (n) { return n+1;}

And then filter out only the odd numbers

function isOdd (n) { return n % 2 === 1; }

And then get sum of those numbers

function sum (total, n) { return total + n; }


This can be done as:

list
.map(add1)
.filter(isOdd)
.reduce(sum);

This can be collapsed into a single reduce call by using a transducer, which takes the sum function as its combiner.

The transducer composes mapReducer and filterReducer together.
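A minimal sketch of how this could look follows. The names mapReducer and filterReducer come from the text above, but their bodies and the compose helper are assumptions written for this illustration; they are not copied from the linked gist or from any library.

```javascript
const list = [3, 4, 7, 8, 11, 14, 15];
const add1 = (n) => n + 1;
const isOdd = (n) => n % 2 === 1;
const sum = (total, n) => total + n;

// Wraps a mapping function so it can sit in front of any combining reducer.
function mapReducer(mappingFn) {
  return (combineFn) => (acc, value) => combineFn(acc, mappingFn(value));
}

// Wraps a predicate; values that fail it are skipped and acc is returned unchanged.
function filterReducer(predicateFn) {
  return (combineFn) => (acc, value) =>
    predicateFn(value) ? combineFn(acc, value) : acc;
}

// Assumed helper: right-to-left composition, compose(f, g)(x) === f(g(x)).
function compose(...fns) {
  return (input) => fns.reduceRight((result, fn) => fn(result), input);
}

// Each value is mapped first, then filtered, then handed to the combiner.
const transducer = compose(mapReducer(add1), filterReducer(isOdd));

const total = list.reduce(transducer(sum), 0);
const chained = list.map(add1).filter(isOdd).reduce(sum);
console.log(total, chained);
```

Both versions produce 29 for this list. The transduced version walks the list once and builds no intermediate arrays.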



Further Reading

frontendmasters.com
Full code - https://gist.github.com/getify/7ba2b8f5a50116f3efa2849e4c6d1f79
http://jlongster.com/Transducers.js--A-JavaScript-Library-for-Transformation-of-Data




Proper Tail Calls, CPS, Trampolines

Proper Tail Calls (PTC) allow us to run a program with recursive calls without blowing up stack memory. A tail call is a function call made as the very last action of a function, so it can be executed without growing the stack.
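To make the difference concrete, here is a small sketch contrasting a recursive call that is not in tail position with one that is. The function names are hypothetical, and whether the stack actually stays flat depends on the engine supporting PTC.

```javascript
'use strict'; // proper tail calls are only specified for strict-mode code

// Not a tail call: the addition happens after the recursive call returns,
// so every level has to keep its stack frame alive.
function sumToNaive(n) {
  if (n === 0) return 0;
  return n + sumToNaive(n - 1);
}

// Tail call: the recursive call is the last thing the function does,
// so an engine with PTC can reuse the current frame.
function sumToTail(n, total = 0) {
  if (n === 0) return total;
  return sumToTail(n - 1, total + n);
}

console.log(sumToNaive(100), sumToTail(100));
```

Both calls return 5050. The input is kept small here because an engine without PTC grows the stack for both versions.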

In addition to proper tail calls, an engine can implement Tail Call Optimization (TCO) to improve the performance of tail-recursive functions.

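Continuation Passing Style (CPS), in which a function receives an extra callback (the continuation) and passes its result to that callback instead of returning it, can be used to put more complex recursion into tail-call form.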
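As an illustration, here is a factorial written in CPS. This is a sketch with a hypothetical function name, factorialCPS; the pending multiplication is moved into the continuation so that every call is in tail position.

```javascript
// `cont` is the continuation: what to do with the result once it is known.
function factorialCPS(n, cont = (result) => result) {
  if (n <= 1) return cont(1);
  // The pending "multiply by n" is captured in a new continuation,
  // which leaves the recursive call itself as a tail call.
  return factorialCPS(n - 1, (result) => cont(n * result));
}

console.log(factorialCPS(5));
```

factorialCPS(5) evaluates to 120. Without PTC support the stack still grows, and the chain of continuation closures uses heap memory, so CPS changes where the bookkeeping lives rather than removing it.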

With a trampoline, functions are called one at a time in a loop rather than nested inside each other, which works around the lack of PTC support. A trampoline expects a continuation function and keeps executing whatever function it gets back, stopping when it receives a plain value instead.
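A minimal trampoline can be sketched as follows. The helper name trampoline and the convention "a returned function means keep going" are assumptions for this example; a real implementation would need another signal if the final result could itself be a function.

```javascript
// Keeps calling returned functions in a loop until a non-function value appears.
function trampoline(fn) {
  return function trampolined(...args) {
    let result = fn(...args);
    while (typeof result === 'function') {
      result = result(); // each step runs on a fresh, shallow stack
    }
    return result;
  };
}

// Instead of recursing directly, each step returns a thunk describing the next step.
const sumTo = trampoline(function step(n, total = 0) {
  if (n === 0) return total;
  return () => step(n - 1, total + n);
});

console.log(sumTo(100000));
```

sumTo(100000) returns 5000050000 without deep recursion, because the stack never holds more than one step at a time.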

Further Reading
http://lucasfcosta.com/2017/05/08/All-About-Recursion-PTC-TCO-and-STC-in-JavaScript.html
frontendmasters.com
http://www.datchley.name/recursion-tail-calls-and-trampolines/

Saturday, December 2, 2017

Closures, Partial functions and Currying

A closure is when a function "remembers" the variables around it even when the function is executed elsewhere.

Any language that implements lexically scoped name binding with first class functions will have closures. A closure can be thought of as a record storing a function together with its environment.
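For instance, in the sketch below (makeCounter is a hypothetical name), the inner function keeps access to count after makeCounter has returned, and each call to makeCounter creates a separate environment.

```javascript
function makeCounter() {
  let count = 0; // lives in the environment captured by the inner function
  return function increment() {
    count += 1;
    return count;
  };
}

const counterA = makeCounter();
const counterB = makeCounter();

counterA();
const secondA = counterA(); // counterA has now been called twice
const firstB = counterB(); // counterB has its own, separate count
console.log(secondA, firstB);
```

Here secondA is 2 and firstB is 1, showing that each closure carries its own record of count.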

Partial application and currying are two different techniques for specializing a generalized function. Partial application supplies some arguments at one time and the rest at another time. Currying, on the other hand, supplies the arguments one at a time. Currying is a bit like automatically applying partial application at each level with a single argument.
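The difference can be sketched with two small helpers. Both partial and curry below are written for this illustration and are not a specific library's API; curry relies on fn.length, so it assumes a function with a fixed number of declared parameters.

```javascript
// Partial application: fix some arguments now, supply all the rest in one later call.
function partial(fn, ...presetArgs) {
  return (...laterArgs) => fn(...presetArgs, ...laterArgs);
}

// Currying: keep returning one-argument functions until `arity` arguments are collected.
function curry(fn, arity = fn.length) {
  return function curried(...args) {
    if (args.length >= arity) return fn(...args);
    return (nextArg) => curried(...args, nextArg);
  };
}

const add3 = (a, b, c) => a + b + c;

const addTo1 = partial(add3, 1); // waits for b and c together
const curriedAdd3 = curry(add3); // waits for a, then b, then c

console.log(addTo1(2, 3), curriedAdd3(1)(2)(3));
```

Both calls evaluate to 6. The partial version takes its remaining arguments in one call, while the curried version takes exactly one argument per call.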




Further Reading

frontendmasters.com