Amlaan Bhoi

Seattle, WA

I am an Applied Scientist II at Amazon in the WW Returns, ReCommerce & Sustainability organization, where I lead, design, and research AI/ML systems in the Reverse Logistics (ReLo) ecosystem.

I graduated with a Master's in Computer Science from the University of Illinois at Chicago with a focus on Machine Learning & Computer Vision, where I was fortunate enough to be advised by Xinhua Zhang. I was also an Intel AI Ambassador, where I shared upcoming research on ML & computer vision. Before that, I graduated from Amity University with a Bachelor of Technology in Computer Science & Engineering in May 2017 with First Class honors, where I was advised by Sushil Kumar.

Research Interests

My current research interests lie in training and aligning large language models (LLMs) to follow instructions using Reinforcement Learning from Human Feedback (RLHF) & Reinforcement Learning from AI Feedback (RLAIF), hedge on malicious inputs, and provide accurate and useful completions. Broadly, I focus on natural language understanding, computer vision, machine learning, and statistical learning theory. I sometimes dabble in distributed machine learning and in multimodal learning that aligns text and vision signals.

My thesis, Invariant Kernels for Few-shot Learning, generalized the idea of orbit embeddings (projection of embeddings to a low-dimensional feature space) for invariance modeling to patch-based image classification in the few-shot learning setting. In simple terms, it guaranteed strong invariance to image augmentations and perturbations, such as rotation and translation, in few-shot learning.

News

Joined Amazon as an Applied Scientist in the WW Returns, ReCommerce & Sustainability organization.
Awarded the Outstanding Thesis Award at UIC during Fall 2019. It was the only Master's thesis in a cohort of five theses awarded this title.
Joined CCC Information Services as a Sr. Data Scientist, Computer Vision.
Continuing my internship at CCC Information Services for the Fall 2018, Winter 2018, and Spring 2019 semesters.
Featured on Intel's website discussing my work, computer vision problems, and how Intel can help. You can check it out here.
Presented a poster on the Tiramisu DenseNet Architecture for Precise Segmentation at the Intel AI Booth at CVPR 2018.
Joined as an R&D Intern, Computer Vision in the Photo Analytics and Machine Learning group at CCC Information Services.
Joined Intel AI Ambassador Program.
Mentioned in the UCSC newsletter for developing a low-poly VR application (Google Pixel 2 + Daydream) at CruzHacks 2017.
Awarded Best Microsoft Hack at HackHarvard 2017.
Awarded Best Technical Innovation at Amity University Convocation 2017.

Papers

	Monocular Depth Estimation: A Survey
	Amlaan Bhoi
	arXiv Preprint, 2019
	arxiv
	Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and detection. The problem can be framed as: given a single RGB image as input, predict a dense depth map for each pixel. This problem is worsened by the fact that most scenes have large texture and structural variations, object occlusions, and rich geometric detailing. All these factors contribute to difficulty in accurate depth estimation. In this paper, we review five papers that attempt to solve the depth estimation problem with various techniques including supervised, weakly-supervised, and unsupervised learning techniques. We then compare these papers and understand the improvements made over one another. Finally, we explore potential improvements that can aid to better solve this problem.

	Spatio-temporal Action Recognition: A Survey
	Amlaan Bhoi
	arXiv Preprint, 2019
	arxiv
	The task of action recognition or action detection involves analyzing videos and determining what action or motion is being performed. The primary subject of these videos are predominantly humans performing some action. However, this requirement can be relaxed to generalize over other subjects such as animals or robots. The applications can range from anywhere between human-computer interaction to automated video editing proposals. When we consider spatiotemporal action recognition, we deal with action localization. This task not only involves determining what action is being performed but also when and where it is being performed in said video. This paper aims to survey the plethora of approaches and algorithms attempted to solve this task, give a comprehensive comparison between them, explore various datasets available for the problem, and determine the most promising approaches.

	A Comprehensive Comparison between Neural Style Transfer and Universal Style Transfer
	Somshubra Majumdar, Amlaan Bhoi, Ganesh Jagadeesan
	arXiv Preprint, 2018
	arxiv | code
	Style transfer aims to transfer arbitrary visual styles to content images. We explore algorithms adapted from two papers that try to solve the problem of style transfer while generalizing on unseen styles or compromised visual quality. Majority of the improvements made focus on optimizing the algorithm for real-time style transfer while adapting to new styles with considerably less resources and constraints. We compare these strategies and compare how they measure up to produce visually appealing images. We explore two approaches to style transfer: neural style transfer with improvements and universal style transfer. We also make a comparison between the different images produced and how they can be qualitatively measured.

	Various Approaches to Aspect-based Sentiment Analysis
	Amlaan Bhoi, Sandeep Joshi
	arXiv Preprint, 2018
	arxiv | code
	The problem of aspect-based sentiment analysis deals with classifying sentiments (negative, neutral, positive) for a given aspect in a sentence. A traditional sentiment classification task involves treating the entire sentence as a text document and classifying sentiments based on all the words. Let us assume, we have a sentence such as "the acceleration of this car is fast, but the reliability is horrible". This can be a difficult sentence because it has two aspects with conflicting sentiments about the same entity. Considering machine learning techniques (or deep learning), how do we encode the information that we are interested in one aspect and its sentiment but not the other? Let us explore various pre-processing steps, features, and methods used to facilitate in solving this task.

Projects

	Iris: Speech to Code Converter
	Amlaan Bhoi, Shubadra Govindan, Sandeep Joshi, Debojit Kaushik
	HackHarvard, 2018
	devpost | code
	A semantic speech-to-code generator. Trained an intent classification model in Microsoft LUIS to recognize 15+ commands. Implemented a message passing protocol using RabbitMQ to communicate between backend scripts, ElectronJS, and a Visual Studio Code extension. Wrote Python API wrappers for Microsoft LUIS and the Google Cloud Speech API.

	Conditional Random Fields for Structured Output Prediction
	Amlaan Bhoi, Somshubra Majumdar, Ganesh Jagadeesan
	Advanced Machine Learning, Spring 2018
	code | report
	An optical character recognition system to detect letters and words using conditional random fields. Implemented linear-chain Conditional Random Fields from scratch to detect characters on the UPenn OCR dataset. Implemented the Viterbi algorithm for forward-backward message passing between nodes, calculated the log probabilities and gradients, and used an LBFGS solver to reach convergence. Achieved 84% letter-wise accuracy with the dynamic programming implementation. Wrote a PETSc/Tao version to run on the ACER cluster in parallel using MPI. Implemented SGD with Nesterov Momentum, AMSGrad, and Adam with MCMC for CRFs to compare with the LBFGS implementation and plot comparison charts for different λ values.

	MS Apriori: Rule Mining with Multiple Minimum Supports
	Amlaan Bhoi, Sandeep Joshi
	Data Mining & Text Mining, Spring 2018
	code
	An association rule mining (unsupervised learning) algorithm with multiple minimum supports. This algorithm can be used for product recommendations based on historical data.

	Alethea: Data science, visualization, and analysis
	Somshubra Majumdar, Amlaan Bhoi, Debojit Kaushik, Christopher Alphones
	Introduction to Data Science, Spring 2018
	code | demo
	An ETL pipeline, visualization, classical ML prediction, and ML & DL sentiment analysis application on publicly available Chicago and Yelp data. Performed data discovery, integration, and visualization on Chicago datasets using Pandas, NumPy, and React Recharts. Achieved 81.9% sentiment analysis accuracy using Multiplicative LSTMs on the Yelp Reviews dataset. Achieved 91.3% accuracy predicting types of robberies occurring in Chicago for the Summer of 2018 based on previous crime and weather datasets.

	LifeGuard: Action Recognition of Drowning while Swimming
	Sudipta Swarnakar, Amlaan Bhoi, Chetan Velivela
	HackHarvard, 2017
	devpost
	We trained a 3D Convolutional Neural Network model on Microsoft Azure to detect drowning people in swimming pools. We also created the bounding boxes for our train, test, and validation sets.

	ARYouThereYet
	Sandeep Joshi, Amlaan Bhoi, Debojit Kaushik
	Virtual and Augmented Reality, Fall 2017
	project page | code | video
	An ARKit iOS application utilizing Google Maps and Mapbox APIs to show nearby attractions in Augmented Reality with support for visualizing the distance, detailed descriptions of places, an AR walking guide to destinations, support for saving favorite places, and more.

	AutoColor: Color Segmentation using Clustering
	Amlaan Bhoi
	Summer, 2017
	code
	A K-means clustering algorithm using OpenCV and Scikit-Learn that detects the K dominant colors in an image. Automatically picks K using the Silhouette Coefficient metric and uses MiniBatchKMeans for testing.
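
To illustrate the idea behind AutoColor (choosing the number of dominant colors K by silhouette score), here is a minimal, dependency-free sketch. It is not the project's code: the project used OpenCV and Scikit-Learn, while this stand-in uses plain Python, a hand-written K-means with a deterministic farthest-point initialization (an assumption made here for reproducibility), and hypothetical function names (`init_centers`, `kmeans`, `silhouette`, `pick_k`).

```python
from math import dist

def init_centers(points, k):
    # Farthest-point initialization (an assumption of this sketch):
    # start at the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=10):
    centers = init_centers(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, assign(points, centers)

def silhouette(points, labels):
    # Mean silhouette coefficient: (b - a) / max(a, b) per point, where
    # a = mean distance to its own cluster and b = mean distance to the
    # nearest other cluster. Singleton clusters score 0.
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def pick_k(points, candidates=range(2, 6)):
    # Run K-means for each candidate K and keep the best-scoring one.
    scored = []
    for k in candidates:
        centers, labels = kmeans(points, k)
        scored.append((silhouette(points, labels), k, centers))
    _, best_k, best_centers = max(scored, key=lambda t: t[0])
    return best_k, best_centers

# Toy "image": RGB pixels scattered around red, green, and blue.
pixels = [
    (250, 10, 10), (255, 0, 0), (245, 5, 5), (250, 0, 10),
    (10, 250, 10), (0, 255, 0), (5, 245, 5), (10, 250, 0),
    (10, 10, 250), (0, 0, 255), (5, 5, 245), (0, 10, 250),
]
best_k, dominant_colors = pick_k(pixels)
print(best_k, dominant_colors)
```

On a real image, the pixel list would come from flattening the image array, and a library implementation such as Scikit-Learn's `MiniBatchKMeans` with `silhouette_score` would be far faster than this quadratic-time silhouette loop.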

“There are many problems in vision where getting 50% of the solution takes one minute, getting to 90% can take you a day, getting to 99% may take you five years, and 99.99% may be not in your lifetime.” ~ Jitendra Malik