Amlaan Bhoi (aam-LAAN bho-EE)
Seattle, WA

I am an applied scientist at Amazon in the WW Returns, ReCommerce & Sustainability organization working on computer vision and natural language processing systems in the Reverse Logistics domain.

I graduated with a Master of Science in Computer Science from the University of Illinois at Chicago with a focus on machine learning & computer vision, where I was fortunate to be advised by Xinhua Zhang. I was also an Intel AI Ambassador, sharing upcoming research on ML & computer vision. Before that, I graduated from Amity University with a Bachelor of Technology in Computer Science & Engineering in May 2017 with First Class honors, where I was advised by Sushil Kumar.

E-mail  |  Curriculum Vitae  |  Publications  |  LinkedIn  |  Github  |  Twitter

Research Interests

My research interests lie in computer vision, image processing, machine learning, optimization, and statistical learning theory. I'm also passionate about augmented reality & computational photography, and I dabble in self-supervised learning, distributed machine learning, and monocular depth estimation.

My thesis, titled Invariant Kernels for Few-shot Learning, generalized the idea of orbit embeddings (projections of embeddings into a low-dimensional feature space) for invariance modeling to patch-based image classification in the few-shot learning setting. In simple terms, it guaranteed strong invariance to data augmentations/perturbations of images, such as rotation and translation, in few-shot learning.

News

  • Joined Amazon as an Applied Scientist in the WW Returns, ReCommerce & Sustainability organization.
  • Awarded the Outstanding Thesis Award at UIC in Fall 2019; the only Master's thesis in a cohort of five to receive the title.
  • Joined CCC Information Services as a Sr. Data Scientist, Computer Vision.
  • Continued my internship at CCC Information Services through the Fall 2018, Winter 2018, and Spring 2019 semesters.
  • Featured on Intel's website discussing my work, computer vision problems, and how Intel can help. You can check it out here.
  • Presented poster on Tiramisu DenseNet Architecture for Precise Segmentation at Intel AI Booth at CVPR 2018.
  • Joined as an R&D Intern, Computer Vision in the Photo Analytics and Machine Learning group at CCC Information Services.
  • Joined Intel AI Ambassador Program.
  • Mentioned in UCSC newsletter for developing a low-poly VR application (Google Pixel 2 + DayDream) at CruzHacks 2017.
  • Awarded Best Microsoft Hack at HackHarvard 2017.
  • Awarded Best Technical Innovation at Amity University Convocation 2017.
Publications

Monocular Depth Estimation: A Survey
Amlaan Bhoi
arXiv Preprint, 2019
Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and detection. The problem can be framed as: given a single RGB image as input, predict a dense depth map for each pixel. This problem is worsened by the fact that most scenes have large texture and structural variations, object occlusions, and rich geometric detailing. All these factors contribute to difficulty in accurate depth estimation. In this paper, we review five papers that attempt to solve the depth estimation problem with various techniques including supervised, weakly-supervised, and unsupervised learning techniques. We then compare these papers and examine the improvements made over one another. Finally, we explore potential improvements that could help solve this problem more accurately.
Spatio-temporal Action Recognition: A Survey
Amlaan Bhoi
arXiv Preprint, 2019
The task of action recognition or action detection involves analyzing videos and determining what action or motion is being performed. The primary subject of these videos are predominantly humans performing some action. However, this requirement can be relaxed to generalize over other subjects such as animals or robots. The applications can range from anywhere between human-computer interaction to automated video editing proposals. When we consider spatiotemporal action recognition, we deal with action localization. This task not only involves determining what action is being performed but also when and where it is being performed in said video. This paper aims to survey the plethora of approaches and algorithms attempted to solve this task, give a comprehensive comparison between them, explore various datasets available for the problem, and determine the most promising approaches.
A Comprehensive Comparison between Neural Style Transfer and Universal Style Transfer
Somshubra Majumdar, Amlaan Bhoi, Ganesh Jagadeesan
arXiv Preprint, 2018
arxiv | code
Style transfer aims to transfer arbitrary visual styles to content images. We explore algorithms adapted from two papers that try to solve the problem of style transfer while generalizing on unseen styles or compromised visual quality. The majority of the improvements focus on optimizing the algorithm for real-time style transfer while adapting to new styles with considerably fewer resources and constraints. We compare these strategies and evaluate how they measure up in producing visually appealing images. We explore two approaches to style transfer: neural style transfer with improvements and universal style transfer. We also make a comparison between the different images produced and how they can be qualitatively measured.
Various Approaches to Aspect-based Sentiment Analysis
Amlaan Bhoi, Sandeep Joshi
arXiv Preprint, 2018
arxiv | code
The problem of aspect-based sentiment analysis deals with classifying sentiments (negative, neutral, positive) for a given aspect in a sentence. A traditional sentiment classification task involves treating the entire sentence as a text document and classifying sentiments based on all the words. Suppose we have a sentence such as "the acceleration of this car is fast, but the reliability is horrible". This can be a difficult sentence because it has two aspects with conflicting sentiments about the same entity. Considering machine learning techniques (or deep learning), how do we encode the information that we are interested in one aspect and its sentiment but not the other? We explore various pre-processing steps, features, and methods used to facilitate solving this task.

Projects

Iris: Speech to Code Converter
Amlaan Bhoi, Shubadra Govindan, Sandeep Joshi, Debojit Kaushik
HackHarvard, 2018
devpost | code
A semantic speech to code generator.
  • Trained an intent classification model in Microsoft LUIS to recognize 15+ commands.
  • Implemented a message-passing protocol using RabbitMQ to communicate between backend scripts, ElectronJS, and a Visual Studio Code extension.
  • Wrote Python API wrappers for Microsoft LUIS and Google Cloud Speech API.
Conditional Random Fields for Structured Output Prediction
Amlaan Bhoi, Somshubra Majumdar, Ganesh Jagadeesan
Advanced Machine Learning, Spring 2018
code | report
An optical character recognition system to detect letters and words using conditional random fields.
  • Implemented linear-chain Conditional Random Fields from scratch to detect characters on UPenn OCR dataset.
  • Implemented the Viterbi algorithm for decoding and forward-backward message passing between nodes, computed the log probabilities and gradients, and used an L-BFGS solver to reach convergence.
  • Achieved 84% letter-wise accuracy with dynamic programming implementation.
  • Wrote a PETSc/Tao version to run on the ACER cluster in parallel using MPI.
  • Implemented SGD with Nesterov momentum, AMSGrad, and Adam with MCMC for CRFs to compare with the L-BFGS implementation, and plotted comparison charts over different λ values.
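The Viterbi decoding step in this project can be sketched in a few lines of NumPy. Here `unary` and `pairwise` are hypothetical arrays of emission and transition log-potentials, not the project's actual feature functions:

```python
import numpy as np

def viterbi(unary, pairwise):
    """Most likely label sequence for a linear-chain CRF.

    unary:    (T, K) per-position label scores (log-potentials)
    pairwise: (K, K) transition scores between adjacent labels
    """
    T, K = unary.shape
    score = unary[0].copy()            # best score ending in each label
    back = np.zeros((T, K), dtype=int)  # back-pointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + pairwise   # (prev label, current label)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    # follow back-pointers from the best final label
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Dynamic programming keeps decoding at O(T·K²) instead of enumerating all K^T sequences, which is what makes the letter-wise accuracy above tractable to compute per word.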
MS Apriori: Rule Mining with Multiple Minimum Supports
Amlaan Bhoi, Sandeep Joshi
Data Mining & Text Mining, Spring 2018
An association rule mining (unsupervised learning) algorithm with multiple minimum supports. It can be used for product recommendations based on historical data.
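As a rough illustration of the multiple-minimum-supports idea (not the full MS Apriori candidate-generation procedure), frequent 1- and 2-itemsets might be found like this; `mis` and `default_mis` are assumed parameter names:

```python
from itertools import combinations

def ms_frequent(transactions, mis, default_mis):
    """Frequent 1- and 2-itemsets under per-item minimum supports (MIS).

    transactions: list of sets of items
    mis:          dict item -> minimum support fraction for that item
    default_mis:  MIS used for items not in the dict
    """
    n = len(transactions)
    sup = lambda itemset: sum(itemset <= t for t in transactions) / n
    m = lambda i: mis.get(i, default_mis)

    items = {i for t in transactions for i in t}
    f1 = {i for i in items if sup({i}) >= m(i)}
    # an itemset's threshold is the lowest MIS among its members,
    # which lets rare-but-important items still form rules
    f2 = {frozenset(p) for p in combinations(sorted(f1), 2)
          if sup(set(p)) >= min(m(i) for i in p)}
    return f1, f2
```

The point of per-item supports is that a single global threshold either drowns out rare items or floods the output with trivial rules; MS Apriori sidesteps that trade-off.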
Alethea: Data science, visualization, and analysis
Somshubra Majumdar, Amlaan Bhoi, Debojit Kaushik, Christopher Alphones
Introduction to Data Science, Spring 2018
code | demo
An ETL pipeline, visualization, classical ML prediction, and ML & DL sentiment analysis application on publicly available Chicago and Yelp data.
  • Performed data discovery, integration, and visualization on Chicago datasets using Pandas, Numpy, and React Recharts.
  • Achieved 81.9% sentiment analysis accuracy using Multiplicative LSTMs on Yelp Reviews dataset.
  • Achieved 91.3% accuracy predicting types of robberies occurring in Chicago during the Summer of 2018 based on previous crime and weather datasets.
LifeGuard: Action Recognition of Drowning while Swimming
Sudipta Swarnakar, Amlaan Bhoi, Chetan Velivela
HackHarvard, 2017
We trained a 3D Convolutional Neural Network on Microsoft Azure to detect drowning people in swimming pools, and created the bounding boxes for our train, test, and validation sets.
Sandeep Joshi, Amlaan Bhoi, Debojit Kaushik
Virtual and Augmented Reality, Fall 2017
project page | code | video
An ARKit iOS application utilizing Google Maps and Mapbox APIs to show nearby attractions in Augmented Reality with support for visualizing the distance, detailed description of places, an AR walking guide to destinations, support for saving favorite places, and more.
AutoColor: Color Segmentation using Clustering
Amlaan Bhoi
Summer, 2017
A K-means clustering algorithm using OpenCV and scikit-learn that detects the K dominant colors in an image. Automatically picks K using the Silhouette Coefficient metric, with MiniBatchKMeans used for testing.
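A minimal sketch of the idea, assuming scikit-learn's MiniBatchKMeans and silhouette_score; the function and parameter names here are illustrative, not the project's actual code:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

def dominant_colors(pixels, k_range=range(2, 7), sample=2000, seed=0):
    """Pick K by silhouette score and return the K dominant colors.

    pixels: (N, 3) array of RGB values, e.g. image.reshape(-1, 3)
    """
    # silhouette is O(n^2), so score on a random subsample of pixels
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=min(sample, len(pixels)), replace=False)
    sampled = pixels[idx]

    best = None
    for k in k_range:
        km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=3).fit(sampled)
        score = silhouette_score(sampled, km.labels_)
        if best is None or score > best[0]:
            best = (score, km)
    return best[1].cluster_centers_   # one RGB row per dominant color
```

The silhouette score peaks when clusters are tight and well separated, which is why it works as an automatic chooser for K here.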

“There are many problems in vision where getting 50% of the solution takes one minute, getting to 90% can take you a day, getting to 99% may take you five years, and 99.99% may be not in your lifetime.” ~ Jitendra Malik
Design stolen from here & inspiration from here