Statistics, Probability and Information Theory


"We know the past but cannot control it; we control the future but cannot know it."

— Claude Shannon


Information theory measures the quantity of information. It consolidates calculus, probability, and statistics, which are the mathematical roots of deep learning. At its core, it is about how we represent information, whether as:

● Bits
● A few letters on a page
● Sound
● Visual signals

Throughout history, not only have humans evolved, but so has our way of representing information; the way we communicate today has changed significantly. To make a machine understand information, we must know how to describe it with a mathematical approach.

To see the significance of information theory, consider a language translator, which has two major components:

1. Encoder: takes input in the source language and converts it into a vector.
2. Decoder: takes the vector as input and translates it into the target language.

To follow semantic and syntactic rules during translation, the model must keep track of the objects in the input signal and the relationships between them.
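The encoder/decoder interface above can be sketched in miniature. This is purely an illustration of the data flow, not a neural model: a real translator learns its encoder and decoder, whereas here the "vector" is just a list of word indices and the decoder is a made-up word-for-word lookup table.

```python
# Toy sketch of the encoder -> vector -> decoder pipeline.
# SOURCE_VOCAB and TARGET_WORDS are invented for illustration;
# a real system would learn these mappings from data.

SOURCE_VOCAB = {"the": 0, "cat": 1, "sleeps": 2}
TARGET_WORDS = {0: "le", 1: "chat", 2: "dort"}  # toy English -> French table

def encode(sentence):
    """Encoder: turn the input sentence into a vector of indices."""
    return [SOURCE_VOCAB[w] for w in sentence.lower().split()]

def decode(vector):
    """Decoder: map the vector into target-language words."""
    return " ".join(TARGET_WORDS[i] for i in vector)

vec = encode("the cat sleeps")
print(vec)          # [0, 1, 2]
print(decode(vec))  # le chat dort
```

In a real seq2seq model, both steps are neural networks and the intermediate vector is a learned representation that must preserve the relationships the text mentions.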

This is where information theory plays an important role: information measures define our training objectives. And if we have too little information, the next question becomes: how much information is required?

A few widely used and advanced applications of deep learning and information theory:
1. Noise removal in communication signals
2. The theory of GANs
3. Natural language processing
4. Image captioning

Here at CTL we will cover essential concepts like probability distributions, marginal probability, conditional probability, Bayes' rule, and more; concepts from information theory such as entropy, cross entropy, mutual information, and KL divergence; and how probability and information theory correlate, which will help you create optimized deep learning models.
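To make the information-theory terms concrete, here is a minimal sketch of entropy, cross entropy, and KL divergence for discrete distributions, written in plain Python (the distributions `p` and `q` below are made-up examples):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i * log2(q_i); the usual classification loss."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p): extra bits paid for modeling p with q."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]   # true distribution: a fair coin
q = [0.9, 0.1]   # a mismatched model of that coin

print(entropy(p))           # 1.0 (one bit of uncertainty)
print(cross_entropy(p, q))
print(kl_divergence(p, q))  # > 0 whenever q differs from p
```

Minimizing cross entropy during training is equivalent to minimizing the KL divergence between the data distribution and the model, since H(p) is fixed by the data.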

Deep learning has two main areas:
- Computer vision (CNNs): image recognition, object detection, activity recognition, etc.
- Natural language processing: text classification, language translation, chatbots, etc.

CNNs take biological inspiration from the visual cortex. The visual cortex has small regions of cells that are sensitive to specific regions of the visual field; for example, some neurons fire when exposed to vertical edges and others when shown horizontal or diagonal edges.
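Those edge-sensitive cells can be mimicked with a small convolution filter. The sketch below hand-rolls a 3x3 convolution with a vertical-edge kernel (a Sobel-like filter chosen for illustration; a CNN learns such kernels from data):

```python
# Valid-mode 2D convolution (technically cross-correlation, as in CNNs),
# in plain Python. The kernel and image are toy examples.

def conv2d(image, kernel):
    h, w, k = len(image), len(image[0]), len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

# Bright left half, dark right half: one vertical edge in the middle.
image = [[1, 1, 0, 0]] * 4

print(conv2d(image, vertical_edge))  # strong response at the edge
```

A flat image produces all-zero responses with this kernel, which is exactly the "fires only on vertical edges" behavior the cortex analogy describes.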

A more detailed overview of what CNNs do: you take an image, pass it through a series of convolutional, non-linear, pooling (downsampling), and fully connected layers, and get an output.

Don't be intimidated by these layers: they are just feature extractors, and the network makes complex decisions by combining those simple features.
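The conv → non-linearity → pool → fully-connected pipeline above can be sketched on a toy 1-D signal. All weights and layer sizes here are invented for illustration; in a real CNN they are learned during training.

```python
# Minimal forward pass through the four layer types from the text.

def conv1d(signal, kernel):
    """Convolution layer: slide the kernel over the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Non-linear layer: zero out negative activations."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, size=2):
    """Pooling (downsampling) layer: keep the max of each window."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def dense(xs, weights, bias):
    """Fully connected layer: weighted sum of all features."""
    return sum(x * w for x, w in zip(xs, weights)) + bias

signal = [0, 0, 1, 1, 0, 0, 1, 1, 0]
edge_kernel = [1, -1]  # responds to changes (edges) in the signal

features = max_pool(relu(conv1d(signal, edge_kernel)))
score = dense(features, weights=[0.5] * len(features), bias=0.1)
print(features, score)
```

Each stage is simple on its own; stacking them is what lets the network build complex decisions out of simple features, as the text says.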