
Saturday, June 14, 2014

Programs are programming languages

"When someone tells you, here is a new programming language, your first question shouldn't be,  'Well, gee how many characters does it take to invert a matrix?' , rather you should ask, 'If the language did not have matrices built-in, how would I go about implementing them?' "
    - Harold Abelson

This blog post is about one of the techniques we have for conquering complexity as programs evolve and grow: abstraction.

Let's start with the basics:
A variable is the simplest abstraction that a programmer can use.  What exactly is a variable? When you see a statement like

var rate_of_interest = 15;

what you have really done is give a name to a numerical quantity, so that you do not have to type that quantity everywhere you need it. It is a simple abstraction with a couple of uses. The first is that your program becomes much easier to read and understand because of the good variable name you have chosen. It's much more enlightening to read:

var final_amount = (principal * rate_of_interest * time_period)/100;

than:

var final_amount = (principal * 15 * 4)/100;

The second thing you have gained is the ability to modify your program in one place rather than in the 20 places where that value is used. That is the same advantage you gain when you follow the 'don't repeat yourself' principle when coding. But these advantages all emerge from one thing we normally don't even think about, because it's so common: we have just built an abstraction. It is the kind of abstraction where we don't care what something looks like, only about its behaviour. Nobody cares which memory location a variable is stored in when they declare it. Nobody has to. The same principle applies to all the primitives and the means of combination that a programming language natively provides.
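Here is a small sketch of the 'change it in one place' point. The helpers `simpleInterest` and `describeRate` are hypothetical, made up for this illustration; only `rate_of_interest` comes from the example above.

```javascript
// The rate is named once; everything below refers to the name, not the number.
var rate_of_interest = 15; // percent per year

// Hypothetical helper: simple interest on a principal over some years.
function simpleInterest(principal, time_period) {
    return (principal * rate_of_interest * time_period) / 100;
}

// Hypothetical helper: a second, unrelated use of the same quantity.
function describeRate() {
    return 'Current rate: ' + rate_of_interest + '%';
}

console.log(simpleInterest(1000, 4)); // 600
console.log(describeRate());          // Current rate: 15%
```

If the rate changes, only the first line has to change, and both helpers pick up the new value.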

The most interesting part about these simple primitives and their means of combination is that they can be used to create complex primitives, which can be given their own means of combination. These complex primitives can in turn be used as simple primitives in a much larger system.

The previous paragraph is mostly theory, so let's see some code. I use JS to illustrate what I mean.
JavaScript provides a means of combining things to create complex primitives in the form of JavaScript objects. So, if you wanted to store a student's information, you would create a JavaScript object that might look something like this:

var student = {
    name: 'Tyrion',
    age: 17,
    email: 'abc@xyz',
    major: 'history'
};

(In practice, you would have a class with a constructor that created one of these for you whenever you asked for a student object, but for the sake of brevity the example above is good enough.) You could have stored the same information in four separate variables, but of course you would never do that, because you know better. There is an abstraction here: you can meaningfully talk about 'student' objects when explaining something to another person (or to the computer). You would lose that if you stored the same information in variables like:

var student1name = 'Tyrion';
var student1age = 17;
var student1email = 'abc@xyz';
var student1major = 'history';

In a system that had to maintain the data of twenty students in a classroom, instead of having 20 'student objects' you would have 80 variables hanging around, with no easy way to tell them apart or use them.
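To make the contrast concrete, here is a rough sketch of the grouped version. `makeStudent` is a hypothetical factory standing in for the constructor mentioned above, and the second student is invented for the example.

```javascript
// Hypothetical factory: bundles four related values into one 'student' thing.
function makeStudent(name, age, email, major) {
    return { name: name, age: age, email: email, major: major };
}

// One array holds the whole classroom; each element is a single unit.
var students = [
    makeStudent('Tyrion', 17, 'abc@xyz', 'history'),
    makeStudent('Arya', 16, 'def@xyz', 'mathematics') // made-up second entry
];

// Because each student is one value, generic code can walk over all of them.
var names = students.map(function (s) { return s.name; });
console.log(names.join(', ')); // Tyrion, Arya
```

With loose variables like student1name and student2name there is nothing to loop over; with an array of objects, the twentieth student costs one more array element, not four more variable names.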

The next level of abstraction is what Object Oriented Programming is all about: when you look at an object from outside, all you should see are its public functions. This abstraction is one level higher than the ones we have seen so far in my examples. (This is one of the concepts that I took a long time to grasp when I was first introduced to object oriented programming.)

To illustrate the principle in code, suppose you have something like

var student = {
    name: 'Tyrion',
    age: 17,
    email: 'abc@xyz',
    major: 'history'
};

and an array of such student objects called 'students'. Now suppose you wanted to find the students majoring in history. You would write a loop such as:

var history_majors = [];
for (var i = 0, j = students.length; i < j; i++) {
    if (students[i].major === 'history') {
        history_majors.push(students[i]);
    }
}

Would you do something like this? Of course you wouldn't! You are better than that. Well, why not?
Suppose a crazy programmer decides tomorrow to change the way student objects are represented and makes all the field names in the "student" object begin with an uppercase letter. (It's a contrived example, I know, but bear with me.) Your code would no longer work, because its logic is intimately tied to the structure of the objects it uses. In other words, it relies on the fact that a field called 'major' exists inside the student object.

This violates the principle of information hiding completely. To fix it, student objects would expose methods that return the information you need. So, instead of saying,

students[i].major,

you would say,

students[i].getMajor()

where getMajor is a method that the student object gives you to use (part of its public API) and that returns the major of the object it is invoked on. This simple change is enormously important, because methods like these allow you to separate the way you represent data from the way you use that data. You can choose a specific representation and change it whenever you like, and all the code that goes through the methods will keep working. Also, note that the student object does not provide you with any means of combination. You cannot combine two student objects in any meaningful way. When you have things that can be combined meaningfully, like, say, a vector v1 and a vector v2, then the interface that a vector exposes would look something like:

v3 = v1.addVector(v2).

So, you don't have to know how the vector is added, only that v1 has a method that you can use, to add another vector to it, to obtain a new vector. Again, this is an example of information hiding and exposing a means of combination, and of using complex data objects as primitives in larger programs.
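One possible shape for such a vector, as a sketch only. `makeVector`, `getX` and `getY` are names invented here; `addVector` is the one method the text assumes. The coordinates live in a closure, so callers cannot depend on how they are stored.

```javascript
// Hypothetical 2D vector. x and y are captured by the closure and are not
// reachable as fields on the returned object.
function makeVector(x, y) {
    return {
        getX: function () { return x; },
        getY: function () { return y; },
        // The means of combination: returns a new vector and leaves both
        // operands unchanged.
        addVector: function (other) {
            return makeVector(x + other.getX(), y + other.getY());
        }
    };
}

var v1 = makeVector(1, 2);
var v2 = makeVector(3, 4);
var v3 = v1.addVector(v2);
console.log(v3.getX(), v3.getY()); // 4 6
```

Because addVector returns another vector, the result can be combined again, as in v1.addVector(v2).addVector(v1). That is what lets a compound object behave like a primitive in a larger program.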

You can now see a very specific pattern that we have developed to create and use abstractions well.
This pattern has a closure property: it can be applied to the abstractions it produces. In other words, the principle of separating data representation from its usage can be extended to whole modules. A module might consist of a hundred different things, each of which is a compound entity with a controlled public API that the other entities in the module use. The module itself then exposes some public APIs, and you make use of those in your programs to build more complex entities.
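A sketch of the same idea one level up, using a plain closure as the module. Everything here (`createStudent`, `makeClassroom`, `enroll`, `countMajoring`) is hypothetical. The student deliberately stores its major under an uppercase key, the 'crazy programmer' change from earlier, to show that code going through getMajor is unaffected.

```javascript
// Hypothetical student: the representation is private to this function.
function createStudent(name, major) {
    var data = { Name: name, Major: major }; // uppercase keys on purpose
    return {
        getName: function () { return data.Name; },
        getMajor: function () { return data.Major; }
    };
}

// Hypothetical module: the students array is private; only two functions
// are exposed as the module's public API.
function makeClassroom() {
    var students = [];
    return {
        enroll: function (name, major) {
            students.push(createStudent(name, major));
        },
        countMajoring: function (major) {
            return students.filter(function (s) {
                return s.getMajor() === major;
            }).length;
        }
    };
}

var classroom = makeClassroom();
classroom.enroll('Tyrion', 'history');
classroom.enroll('Arya', 'mathematics'); // made-up entries
classroom.enroll('Sam', 'history');
console.log(classroom.countMajoring('history')); // 2
```

The classroom uses students only through getMajor, and the rest of the program uses the classroom only through enroll and countMajoring, so either layer can change its internals without the other noticing.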

Inheritance and polymorphism in Object Oriented Programming build on this basic premise and add semantic rules for minimizing code duplication and for easy extensibility. A basic requirement when you use inheritance is that a child object can be passed to any expression or method call, or anywhere really, where a parent object is expected. This is a direct consequence of having isolated the representation of the data from its usage.
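A minimal sketch of substitution under inheritance, using ES2015 class syntax. `Person`, `Student` and `greet` are hypothetical names. `greet` is written against the parent's public API only, so a child object can be used wherever the parent is expected.

```javascript
// Hypothetical parent type with one public method.
class Person {
    constructor(name) { this._name = name; } // underscore: private by convention only
    describe() { return this._name; }
}

// Hypothetical child type: inherits the API and overrides describe().
class Student extends Person {
    constructor(name, major) {
        super(name);
        this._major = major;
    }
    describe() { return super.describe() + ' (' + this._major + ')'; }
}

// Depends only on describe(), not on how either class stores its data.
function greet(person) {
    return 'Hello, ' + person.describe();
}

console.log(greet(new Person('Jon')));                // Hello, Jon
console.log(greet(new Student('Tyrion', 'history'))); // Hello, Tyrion (history)
```

The same call produces different text depending on the object passed in, which is the polymorphism part; greet itself never had to change.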

It's important to structure your programs using these principles of information hiding because your program will eventually end up being used as a module or as a primitive in some larger system.

So, in a way, all programs are programming languages for the programs they are going to be used in.