Research Interests
My primary research lies in evaluating agentic systems on generation quality, search retrieval,
tool use, and responsible AI. When conversations extend beyond a single turn or involve
complex, multi-step tool use with error handling, metrics such as correctness, completeness,
and relevancy do not suffice. My work extends beyond conversational quality to compound metrics
such as user utility, user-perceived defects, and multi-tool planner evaluation. My second, complementary
focus is minimizing latency and cost by intelligently routing complex tasks to frontier models while
reducing escalation rates and maintaining high-quality, accurate generations.
My graduate thesis, titled Invariant
Kernels for Few-shot Learning, generalized the idea of orbit embeddings (projections of embeddings
into a low-dimensional feature space) for invariance modeling to patch-based image classification in the
few-shot learning setting. In simple terms, it guaranteed strong invariance to image
augmentations and perturbations such as rotation and translation.
News
- Joined Human-centered AI working on scaling agentic evaluations for Apple Media Products
- Joined the Seller Experience Innovation org working on evaluating Seller Assistant and generative
AI-enabled
Listings
- Joined Amazon as an Applied Scientist in the Returns & Recommerce org working on multi-modal fraud
detection, large-scale insights generation, and LLMs for conversational returns
- Awarded the Outstanding Thesis
Award at UIC in Fall 2019; the only Master's thesis in a cohort of five to receive this
title
- Joined CCC as a Senior R&D Engineer working on image classification and segmentation
- Continuing internship at CCC for the Fall 2018, Winter 2018, and Spring 2019 semesters
- Featured on Intel's
website discussing my work, computer vision problems, and how Intel can help.
- Presented poster on Tiramisu DenseNet
Architecture for Precise Segmentation at Intel AI Booth at CVPR 2018.
- Joined as an R&D Engineer Intern at CCC Intelligent Solutions.
- Joined Intel AI Ambassador Program.
- Mentioned in UCSC newsletter for
developing a low-poly VR application (Google Pixel 2 + Daydream) at CruzHacks 2017.
- Awarded Best Microsoft Hack at HackHarvard
2017.
- Awarded Best Technical Innovation at Amity University Convocation 2017.
Monocular Depth Estimation: A Survey
Amlaan Bhoi
arXiv Preprint, 2019
arxiv
Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating
depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and
detection. The problem can be framed as: given a single RGB image as input, predict a dense depth map for
each pixel. This problem is worsened by the fact that most scenes have large texture and structural
variations, object occlusions, and rich geometric detailing. All these factors contribute to difficulty in
accurate depth estimation. In this paper, we review five papers that attempt to solve the depth estimation
problem with techniques spanning supervised, weakly-supervised, and unsupervised learning. We then
compare these papers and examine the improvements each makes over the others. Finally,
we explore potential improvements that could help solve this problem more effectively.
Spatio-temporal Action Recognition: A Survey
Amlaan Bhoi
arXiv Preprint, 2019
arxiv
The task of action recognition or action detection involves analyzing videos and determining what action
or motion is being performed. The primary subjects of these videos are predominantly humans performing some
action, though this requirement can be relaxed to generalize to other subjects such as animals or
robots. Applications range from human-computer interaction to automated video
editing proposals. When we consider spatio-temporal action recognition, we deal with action localization:
the task involves determining not only what action is being performed but also when and where it is
performed in the video. This paper surveys the many approaches and algorithms proposed to
solve this task, gives a comprehensive comparison between them, explores the various datasets available for the
problem, and identifies the most promising approaches.
A Comprehensive Comparison between Neural Style Transfer and Universal Style Transfer
Somshubra Majumdar,
Amlaan Bhoi,
Ganesh Jagadeesan
arXiv Preprint, 2018
arxiv | code
Style transfer aims to transfer arbitrary visual styles to content images. We explore algorithms adapted
from two papers that try to solve the style transfer problem while generalizing to unseen styles without
compromising visual quality. The majority of the improvements focus on optimizing the algorithm for
real-time style transfer while adapting to new styles with considerably fewer resources and constraints. We
compare these strategies and evaluate how well each produces visually appealing images. We explore
two approaches to style transfer: neural style transfer with improvements, and universal style
transfer. We also compare the different images produced and discuss how they can be
qualitatively measured.
Various Approaches to Aspect-based Sentiment Analysis
Amlaan Bhoi,
Sandeep Joshi
arXiv Preprint, 2018
arxiv | code
The problem of aspect-based sentiment analysis deals with classifying sentiments (negative, neutral,
positive) for a given aspect in a sentence. A traditional sentiment classification task treats
the entire sentence as a text document and classifies sentiment based on all the words. Consider
a sentence such as "the acceleration of this car is fast, but the reliability is
horrible". This is a difficult sentence because it has two aspects with conflicting sentiments about
the same entity. Using machine learning (or deep learning) techniques, how do we encode
that we are interested in one aspect and its sentiment but not the other? We explore
various pre-processing steps, features, and methods used to solve this task.
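One simple way to encode the aspect of interest (a minimal illustration, not the exact features used in this work) is to restrict the classifier's input to a context window around the aspect term:

```python
def aspect_window_tokens(sentence, aspect, window=5):
    """Return tokens within `window` positions of the aspect term.

    A minimal pre-processing step for aspect-based sentiment analysis:
    limiting features to the aspect's local context down-weights
    opinions about other aspects elsewhere in the sentence.
    """
    tokens = sentence.lower().replace(",", "").split()
    idx = tokens.index(aspect.lower())
    lo, hi = max(0, idx - window), idx + window + 1
    return tokens[lo:hi]
```

On the example sentence above, a window around "acceleration" keeps "fast" but drops "horrible", while a window around "reliability" keeps "horrible"; the window size trades context coverage against cross-aspect noise.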
Iris: Speech to Code Converter
Amlaan Bhoi,
Shubadra Govindan,
Sandeep Joshi,
Debojit Kaushik
HackHarvard, 2017
devpost | code
A semantic speech-to-code generator.
- Trained an intent classification model in Microsoft LUIS to recognize 15+ commands.
- Implemented a message passing protocol using RabbitMQ to communicate between backend scripts, ElectronJS,
and a Visual Studio Code extension.
- Wrote Python API wrappers for Microsoft LUIS and Google Cloud Speech API.
Conditional Random Fields for Structured Output Prediction
Amlaan Bhoi,
Somshubra Majumdar,
Ganesh Jagadeesan
Advanced Machine Learning, Spring 2018
code | report
An optical character recognition system that detects letters and words using conditional random fields.
- Implemented linear-chain Conditional Random Fields from scratch to detect characters on the UPenn OCR
dataset.
- Implemented the Viterbi algorithm for forward-backward message passing between nodes, calculated the
log probabilities and gradients, and used an LBFGS solver to reach convergence.
- Achieved 84% letter-wise accuracy with a dynamic programming implementation.
- Wrote a PETSc/Tao version to run on the ACER cluster in parallel
using MPI.
- Implemented SGD with Nesterov momentum, AMSGrad, and Adam with MCMC for CRFs to compare with the LBFGS
implementation, and plotted comparison charts for different λ values.
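The Viterbi decoding step described above can be sketched as follows (a minimal log-space sketch with dense score arrays, not the project's actual implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Most likely label sequence for a linear-chain CRF.

    emissions: (T, L) array of per-position label scores (log-space).
    transitions: (L, L) array where transitions[i, j] scores label i -> j.
    Returns the highest-scoring label sequence as a list of indices.
    """
    T, L = emissions.shape
    score = emissions[0].copy()              # best score ending in each label
    backpointers = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        # candidate[i, j] = best score of a path taking step i -> j at position t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # follow backpointers from the best final label
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backpointers[t, best[-1]]))
    return best[::-1]
```

Dynamic programming keeps decoding at O(T·L²) instead of enumerating all L^T label sequences, which is what makes the letter-wise accuracy above tractable to compute on full words.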
MS Apriori: Rule Mining with Multiple Minimum Supports
Amlaan Bhoi, Sandeep Joshi
Data Mining & Text Mining, Spring 2018
code
An association rule mining (unsupervised learning) algorithm with multiple minimum supports. The algorithm
can be used for product recommendations based on historical data.
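The core idea, letting each item carry its own minimum item support (MIS) instead of one global threshold, can be illustrated with the algorithm's first pass (a simplified sketch; the full MS-Apriori candidate generation over MIS-sorted items is omitted):

```python
from collections import Counter

def frequent_items(transactions, mis, default_mis=1.0):
    """First pass of MS-Apriori: keep items whose support meets their
    own minimum item support, unlike classic Apriori's single threshold.

    transactions: list of item lists; mis: dict item -> MIS fraction.
    """
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items()
            if c / n >= mis.get(item, default_mis)}
```

Per-item thresholds let rare but important items (e.g. expensive products) survive pruning without flooding the output with rules about frequent staples.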
Alethea: Data science, visualization, and analysis
Somshubra Majumdar,
Amlaan Bhoi,
Debojit Kaushik,
Christopher Alphones
Introduction to Data Science, Spring 2018
code | demo
An ETL pipeline, visualization, classical ML prediction, and ML & DL sentiment analysis application on
publicly available Chicago and Yelp data.
- Performed data discovery, integration, and visualization on Chicago datasets using Pandas, NumPy, and React Recharts.
- Achieved 81.9% sentiment analysis accuracy using multiplicative LSTMs on the Yelp Reviews dataset.
- Achieved 91.3% accuracy predicting types of robberies occurring in Chicago in the summer of 2018
based on previous crime and weather datasets.
LifeGuard: Action Recognition of Drowning while Swimming
Sudipta Swarnakar,
Amlaan Bhoi,
Chetan Velivela
HackHarvard, 2017
devpost
We trained a 3D Convolutional Neural Network on Microsoft Azure to detect drowning people in
swimming pools. We also annotated bounding boxes for our training, test, and validation sets.
ARYouThereYet
Sandeep Joshi, Amlaan Bhoi, Debojit Kaushik
Virtual and Augmented Reality, Fall 2017
project page | code | video
An ARKit iOS application that uses the Google Maps and Mapbox APIs to show nearby attractions in augmented
reality, with support for visualizing distances, detailed place descriptions, an AR walking guide to
destinations, saving favorite places, and more.
AutoColor: Color Segmentation using Clustering
Amlaan Bhoi
Summer, 2017
code
A K-means clustering algorithm using OpenCV and Scikit-Learn that detects the K dominant colors in an
image. It automatically picks K using the silhouette coefficient metric and uses MiniBatchKMeans
for testing.
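The K-selection loop might look like this (a minimal sketch with an assumed function name and parameters, not the project's actual code):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

def pick_k(pixels, k_range=range(2, 7), seed=0):
    """Pick the number of dominant colors by maximizing the silhouette
    coefficient over candidate values of K.

    pixels: (N, 3) float array of RGB values (e.g. a reshaped image).
    Returns (best_k, labels) for the highest-scoring clustering.
    """
    best = None
    for k in k_range:
        km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=10)
        labels = km.fit_predict(pixels)
        score = silhouette_score(pixels, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

The silhouette coefficient rewards tight, well-separated clusters, so it peaks when K matches the number of visually distinct colors rather than over- or under-segmenting the palette.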
“Machine learning systems don't understand the world—they optimize signals. Large language
models only appear intelligent because they predict patterns humans find meaningful.”
~ GPT-4o
Design from here & inspiration from here