At nurture.ai’s NIPS paper implementation challenge, I implemented and validated the paper ‘Training Deep Networks without Learning Rates Through Coin Betting’ using PyTorch. (github) This paper caught my attention due to its promise to eliminate the learning rate hyper-parameter during model training. The paper says: In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates nor we make use of the assumed curvature of the objective function.
Introduction Learning rate (LR) is one of the most important hyperparameters to tune and holds the key to faster and more effective training of neural networks. Simply put, the LR decides how much of the loss gradient is applied to our current weights to move them in the direction of lower loss. new_weight = existing_weight - learning_rate * gradient The step is simple. But as research has shown, there is much that can be done to improve this step alone, and it has a profound influence on training.
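The update rule above can be sketched in a few lines of plain Python (a minimal illustration; the names `sgd_step`, `weights`, and `gradients` are my own, not from any framework):

```python
# One vanilla gradient descent step: move each weight against its gradient,
# scaled by the learning rate.
def sgd_step(weights, gradients, lr=0.01):
    return [w - lr * g for w, g in zip(weights, gradients)]

# Example: minimize f(w) = w^2, whose gradient is 2w.
w = [4.0]
for _ in range(100):
    g = [2 * wi for wi in w]
    w = sgd_step(w, g, lr=0.1)
# w[0] has shrunk towards the minimum at 0
```

With too large an `lr` the iterates overshoot and diverge; too small and convergence crawls, which is exactly why tuning this one number matters so much.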
Introduction Fast.ai’s 2017 batch kicked off on 30th Oct, and Jeremy Howard introduced us participants to the ResNet model in the very first lecture. I had used this model earlier in passing but got curious to dig into its architecture this time. (In fact, in one of my earlier client projects I had used Faster RCNN, which uses a ResNet variant under the hood.) ResNet was unleashed in 2015 by Kaiming He et al.
Introduction In this post, I explain the Network In Network paper by Min Lin, Qiang Chen, and Shuicheng Yan (2013). This paper was quite influential in that it had a new take on convolutional filter design, which inspired the Inception line of deep architectures from Google. Motivation Anyone getting introduced to convolutional networks first comes across this familiar arrangement of neurons designed by Yann LeCun decades ago: Fig. LeNet-5
These are my raw notes from Georgia Tech’s Intro to Computer Vision course taught by Aaron Bobick. I am putting them out mostly for my own easy reference, but also in case anyone else finds them useful. This is a text-heavy post. Lecture 1: Introduction Course Purpose: Build systems that can do image understanding. Computational Photography: About capturing light from a scene to record a photograph, and other related artifacts that showcase the scene Image Processing: Support the capture and display of a scene (input: image, output: modified image) Computer Vision: Interpret and analyze the scene (input: image, output: meaning).
Recently I spent 4 months at Flytbase on an interesting problem: detection and counting of oryxes, an endangered animal in the Middle East, in very high-resolution aerial images. Using high-resolution aerial images to train computer vision models poses unique challenges: Lack of sufficient training data: There are plenty of open training datasets out there, but almost all of them have images taken from human eye level. What makes aerial images unique is their top-down view of the objects.
As part of the Udacity Machine Learning Nanodegree program, I did my capstone in reinforcement learning. I conceptualized, designed, and executed the project. I simulated an environment in the V-REP framework wherein I set up a robotic arm tasked with picking up an object and putting it in a bin. I trained it using the Q-Learning technique. The repository with the project report and source code is here. You can see it training here in this video:
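The tabular Q-Learning update at the heart of such training can be sketched as follows (a generic illustration, not the project’s actual code; the state and action names are hypothetical):

```python
from collections import defaultdict

# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q[(next_state, a)] for a in actions)   # greedy value of s'
    td_target = reward + gamma * best_next                 # bootstrapped target
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q

Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q = q_update(Q, state='arm_at_rest', action='move_to_object', reward=1.0,
             next_state='arm_above_object', actions=['move_to_object', 'grasp'])
```

Repeating this update over many episodes lets the table converge towards the action values from which a pick-and-place policy can be read off greedily.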
(I wrote this piece as part of an assignment for Udacity’s AI Nanodegree program. The assignment was to summarize the AlphaGo paper in a page.) Introduction Go is a two-player, turn-taking, deterministic game of perfect information. Two main factors make Go very complex to solve: Go has an average branching factor ‘b’ of ~250 options per node (chess ~35) Go has an average depth ‘d’ of ~150 moves (chess ~80) These factors make the state space of Go (b^d) enormous to search end to end using traditional techniques.
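The scale of b^d can be checked with quick arithmetic (my own back-of-envelope calculation, not taken from the paper):

```python
import math

b, d = 250, 150            # Go's average branching factor and game depth
digits_go = d * math.log10(b)        # decimal digits in b^d
digits_chess = 80 * math.log10(35)   # same estimate for chess

print(round(digits_go))     # Go:    ~10^360 positions to search
print(round(digits_chess))  # chess: ~10^124 positions to search
```

So Go’s game tree is not just bigger than chess’s, it is larger by hundreds of orders of magnitude, which is why brute-force search alone was never going to work.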
Search algorithms help find the correct sequence of actions in a search space to reach a goal state. The sequence of actions might be: Sequence in which cities are to be visited to travel from a source to a destination under a given cost function (shortest path, cheapest fare etc.) Sequence in which an agent should play moves in a game (chess, tic tac toe, pacman etc.) to win Sequence in which a robot arm should solder components on a PCB under a given cost function (e.
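A minimal breadth-first search illustrates finding such an action sequence (a generic sketch over a toy city map I made up for illustration):

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Return the shortest sequence of states from start to goal, or None."""
    frontier = deque([[start]])   # each entry is a partial path
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Toy map: travel from city A to city D
cities = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D']}
print(bfs_path(cities, 'A', 'D'))  # ['A', 'B', 'D']
```

Swapping the FIFO frontier for a priority queue ordered by path cost turns this into uniform-cost search, which handles the “cheapest fare” style of cost function mentioned above.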
Jan 2018: I am actively looking for a Deep Learning Engineer (or similar) role. Willing to relocate. You can check my credentials at https://www.linkedin.com/in/anandsaha/ I mostly blog about intuition and concepts in Deep Learning and Computer Vision. I am on a journey to explore these topics. I find them fascinating. I take notes as I read, listen, learn. These blog posts are extensions of my notes. These are mostly for me to read later.