My clan

One amazing thing I find about fellow academics is the ease with which we connect.

Even when I meet them after decades, what we share goes beyond the technical. There is an openness that allows us to trust each other, joke about ourselves, gripe about students (yes, most of the people I know do that while admitting that they themselves were the same) and discuss health, family and society.

Regression, How to model

This year my linear algebra class is using regression to model real-world data. The data ranges from climate change to bank interest rates to chemical reactions.

A standard question is: given the data, how do we choose the model? Most of the data is in two variables, say (x, y), so here is the usual process.

First, plot the data using a scatter plot. This will give you an idea of whether to expect a linear or a nonlinear relationship between x and y.

Consider whether smoothing the data will help. If your graph looks like a noisy line or a noisy quadratic, a rolling average will make it smoother.
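As a rough illustration of the smoothing step, here is a minimal sketch of a rolling average in Python with NumPy. The function name `rolling_average`, the window size of 5 and the noisy test data are all assumptions made for this example, not part of the class material.

```python
import numpy as np

def rolling_average(y, window=5):
    """Smooth y with a simple moving average over `window` points."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely,
    # so the output has len(y) - window + 1 points.
    return np.convolve(y, kernel, mode="valid")

# Hypothetical noisy line: y = 2x + 1 plus random noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.size)
y_smooth = rolling_average(y, window=5)
```

Note that the smoothed series is shorter than the original, so the x values need to be trimmed to match before plotting.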

Decide on the model. Looking at the scatter plot, you should be able to tell whether the relationship between x and y is linear, quadratic, cubic and so on.

Write out your model, for example $y = a + b x + c x^2$. Each data point $(x_i, y_i)$, when plugged into the model, gives one linear equation in the parameters a, b, c. The collection of these linear equations can be written as the matrix equation $A \vec{x} = \vec{b}$ (note that here $\vec{x}$ contains the parameters as its elements, and $\vec{b}$ contains the observed y values).
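To make the matrix equation concrete, here is a small sketch of how $A$ and $\vec{b}$ could be assembled for the quadratic model in Python with NumPy. The four data points are made up for illustration.

```python
import numpy as np

# Hypothetical data points (x_i, y_i), invented for this example
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])

# Each row of A is [1, x_i, x_i^2], so that row i of A @ [a, b, c]
# equals a + b*x_i + c*x_i^2, the model evaluated at x_i.
A = np.column_stack([np.ones_like(x), x, x**2])

# The right-hand side is simply the vector of observed y values.
b = y
```

With more data points than parameters, $A$ has more rows than columns, which is why the system usually has no exact solution and a best fit is needed.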

Use the normal equation $A^TA \vec{x} = A^T \vec{b}$, or equivalently $\vec{x} = (A^T A)^{-1} A^T \vec{b}$ when $A^T A$ is invertible, to find the best fit parameter values ($\vec{x}$).
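A minimal sketch of this step in Python with NumPy follows. The helper name `fit_quadratic` and the test data are assumptions for illustration. The sketch solves the linear system $A^TA \vec{x} = A^T \vec{b}$ directly instead of computing the inverse, which gives the same answer and is generally the safer numerical choice.

```python
import numpy as np

def fit_quadratic(x, y):
    """Return best fit (a, b, c) for y = a + b x + c x^2 via the normal equation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with rows [1, x_i, x_i^2]
    A = np.column_stack([np.ones_like(x), x, x**2])
    # Solve (A^T A) p = A^T y for the parameter vector p
    return np.linalg.solve(A.T @ A, A.T @ y)

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi**2 for xi in x]
a, b, c = fit_quadratic(x, y)
```

Because this data lies exactly on a quadratic, the fit should recover the generating coefficients up to rounding error. With real, noisy data the parameters will only be approximate.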

The easiest way of solving the above equations is to use MATLAB or Mathematica, which have built-in functions for matrix manipulation. Most programming languages, such as C or Python, also have corresponding libraries. However, writing the code yourself is better for gaining skills and strengthening your foundations.

Please note that even if you find splines easy to use for interpolation, regression is a better choice for modeling, as the resulting equation is simpler.

Remember that if the parameters are instead found by using calculus to minimize the squared magnitude of the error vector, one gets the same best fit parameters. That method is known as “the least squares method”.
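For those who want to see why the two approaches agree, here is a short sketch of the calculus argument, using the same $A$, $\vec{x}$ and $\vec{b}$ as above. The symbol $E$ for the squared error is introduced here only for this derivation.

$$E(\vec{x}) = \lVert A\vec{x} - \vec{b} \rVert^2 = (A\vec{x} - \vec{b})^T (A\vec{x} - \vec{b})$$

Setting the gradient with respect to $\vec{x}$ to zero gives

$$\nabla E = 2 A^T (A\vec{x} - \vec{b}) = \vec{0} \quad \Longrightarrow \quad A^T A \vec{x} = A^T \vec{b},$$

which is exactly the normal equation.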

You may post your doubts below, and if you are at Ahmedabad University, catch me after class.

Dinosaurs

My colleague sometimes brought his five-year-old son to the lab. The kid was very much into dinosaurs, so he would generally arrange his toys on the table, bring out drawing pads and proceed to make amazing sketches.

I don’t know about you, but I thoroughly enjoy kidding with kids. So once I asked, “Will the T-rex eat the Stegosaurus?” The kid gravely replied, “T-rex lived in the Cretaceous while Stegosaurus lived in the Jurassic, so that is not possible.” I rightfully felt put in my place, no kidding.