Malaikannan The Deep Learning way of life

How to find a home with a kitchen you like using deep learning?

AI in clinical trials

The American Tamil Entrepreneurs Association invited me to give a talk on AI. Below is the video of the talk. It was a fun talk loaded with a lot of memes.

Learned index

Jeff Dean and co. came up with a seminal paper on whether indexes can be learned using neural networks. I gave a talk on this topic in the Saama Tech Talk series. I guess I am moving more towards talking than writing nowadays.

Natural language processing tools for Tamil

Language is a beautiful thing, and Tamil as a language is even more beautiful. We learn other languages such as English for commerce and communication, among other things. Languages that have a written form and a grammar have survived longer than languages without them. Technology is now playing the role that written form and grammar once played.

English, being the most widely spoken and written language in the world, benefits from a lot of technology and tools built to understand it better. Good systems for English exist both as open source and from companies like Google, Facebook, and Microsoft: speech to text, syntactic parsers, stemmers, lemmatizers, part-of-speech taggers, and so on. This is the result of years of data collection and research, the availability of computing power and, to a great extent, deep learning.

We have to develop most of these natural language understanding tools for Tamil. Some exist, but they are at a very early stage. A beautiful language is also a complicated one: English has 26 letters while Tamil has 247, and the grammar rules of the two languages differ. In most cases we (the tech community) are limited by the lack of available data for building these tools. Once built, the tools should be open sourced under the Apache 2.0 license so that they are free for the public to use. I have listed a few key activities that have to be performed for this to be successful:

  1. A website that lists the mission statement and project status. This can be as simple as GitHub Pages.
  2. An Android app for data collection, possibly gamified. For speech to text, for example, this could work the way Mozilla recently asked people to donate their voices to build an open source speech-to-text system for English. They open sourced both the code and the trained model.
  3. Data access. There are research libraries that have digitized a lot of books, dailies, newspapers, etc. If that data is made publicly available, it can spawn multiple research efforts, similar to what Project Gutenberg did. One low-hanging fruit on the algorithm side would be a vectorization engine for Tamil words.
  4. Community involvement. This effort will not succeed without community involvement, particularly from students and teachers for data collection.
  5. Tech community involvement to develop this as open source.
  6. Eminent linguistic advisors to advise on the correct approach for Tamil.
  7. Technology advisors to guide the right technology choices for solving these problems.
  8. Advisors to help with registering and managing this as a non-profit, and maybe even with seeking grants and sponsorship. This can start small, but to be taken seriously it has to be registered as a non-profit organization.
  9. Advisors who can help with media and press outreach.

Why use Activation Functions in Deep Learning?

Machine learning and deep learning are largely about using affine maps. An affine map is a function that can be expressed as

f(X) = WX + b

where W and X are matrices and b (the bias term) is a vector. Deep learning learns the parameters W and b. In deep learning you can stack multiple affine maps on top of one another, e.g.

  • f(X) = WX + b
  • g(X) = VX + d

If we stack one affine map on top of the other, then

  • f(g(X)) = W(VX + d) + b
  • f(g(X)) = WVX + Wd + b

WV is a matrix, and Wd and b are vectors.

Deep learning requires a lot of affine maps stacked on top of one another. But composing one affine map with another gives just another affine map, so stacking alone does not have the desired effect: it gives nothing more than a single affine map would. We are still left with a linear model, and in a classification problem a linear model cannot represent a non-linear decision boundary.
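To make the collapse concrete, here is a minimal NumPy sketch. The shapes (3 input features, 4 hidden units, 2 outputs, 5 examples) and the random values are my own assumptions for illustration; only the algebra comes from the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 3 input features, 4 hidden units, 2 outputs.
V = rng.normal(size=(4, 3))   # inner map g(X) = VX + d
d = rng.normal(size=(4, 1))
W = rng.normal(size=(2, 4))   # outer map f(X) = WX + b
b = rng.normal(size=(2, 1))

X = rng.normal(size=(3, 5))   # 5 examples, one per column

# Stack the two maps: f(g(X)) = W(VX + d) + b
stacked = W @ (V @ X + d) + b

# Collapse them into one affine map with weight WV and bias Wd + b
W_single = W @ V
b_single = W @ d + b
collapsed = W_single @ X + b_single

# True when both computations agree up to floating-point error
same = np.allclose(stacked, collapsed)
print(same)
```

However many such layers you stack, the same trick folds them into a single weight matrix and a single bias vector.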

How do we solve this? By introducing non-linearity between the affine maps (layers). The most commonly used non-linear functions are

  • Tanh
  • Sigmoid
  • ReLU

When there are so many non-linear functions, why use only the ones above? Because their derivatives are easy to compute, and deep learning algorithms learn through those derivatives (gradient-based training with backpropagation). These non-linear functions are called activation functions in the deep learning world.
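As a rough illustration of how cheap these derivatives are, the sketch below writes out the three functions with their closed-form derivatives in NumPy. The function names and the finite-difference helper are my own, not from any particular library.

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # 0 for z < 0 and 1 for z > 0; using 0 at z == 0 is a common convention
    return (z > 0).astype(float)

def numeric_grad(fn, z, eps=1e-6):
    # Central finite difference, used here only as a sanity check
    return (fn(z + eps) - fn(z - eps)) / (2 * eps)

# Sample points chosen away from 0, where ReLU is not differentiable
z = np.array([-2.0, -0.5, 0.5, 2.0])
for fn, grad in [(tanh, tanh_grad), (sigmoid, sigmoid_grad), (relu, relu_grad)]:
    print(fn.__name__, np.allclose(grad(z), numeric_grad(fn, z), atol=1e-6))
```

Each derivative reuses the function's own output or a simple comparison, which is part of why these functions are cheap to use during training.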

Thanks to Dr. Jacob Minz for suggesting that I add an explanation of the Universal Approximation Theorem. The theorem says that when you introduce a simple non-linearity between affine layers, a network with enough hidden units can approximate any continuous function on a bounded input range to any arbitrary degree (as close to that function as you want). If there is a pattern in the data, the neural network can in principle “learn” it, given enough computation and data.
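A tiny example of what a non-linearity between affine layers buys you is XOR, which no single affine map can fit. The sketch below is only an illustration: the weights are picked by hand rather than learned, and the variable names are my own.

```python
import numpy as np

# XOR inputs (one row per example) and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best single affine map in the least-squares sense: y ~ Xw + c
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef          # the same value for every input

# Two affine maps with a ReLU in between, weights chosen by hand
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])
b2 = 0.0

hidden = np.maximum(0.0, X @ W1 + b1)   # ReLU between the layers
relu_pred = hidden @ w2 + b2

print("affine only:", np.round(linear_pred, 3))
print("with ReLU:  ", relu_pred)
```

The affine-only fit cannot separate the two classes, while the two-layer version with a ReLU reproduces the XOR targets exactly.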

You can read more about activation functions on Wikipedia. Who writes better about neural networks than Chris Olah? Refer to his blog for further reading. Spandan Madan has also written a Quora answer on a similar topic.