Niluthpol Chowdhury Mithun

I am a Senior Computer Vision Scientist at the Center for Vision Technologies, SRI International, in Princeton, USA. I work on solving real-world problems using computer vision and machine learning.

I received my Ph.D. from the University of California, Riverside, under the supervision of Prof. Amit K. Roy-Chowdhury. Previously, I completed my Bachelor's and Master's degrees at Bangladesh University of Engineering and Technology.

I worked at Samsung R&D Institute Bangladesh as a Software Engineer from 2011 to 2014. I spent Summer 2018 at SRI Center for Vision Technologies and Summer 2017 at Bosch Research Center, Pittsburgh, as a Research Intern.

Email / CV / Google Scholar / LinkedIn


 Research

I am interested in computer vision, machine learning, and multimedia. My research focuses mainly on weakly supervised learning, semantic scene understanding, vision-based localization, and multimodal data analysis.





 Selected Publications





   Unsupervised Domain Adaptation for Semantic Segmentation with Pseudo Label Self-Refinement
   X. Zhao, Niluthpol Chowdhury Mithun, A. Rajvanshi, H. Chiu, S. Samarasekera
   IEEE Winter Conference on Applications of Computer Vision (WACV), 2024






   C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
   N. Karim, Niluthpol Chowdhury Mithun, A. Rajvanshi, H. Chiu, S. Samarasekera, N. Rahnavard
   IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023






   Cross-View Visual Geo-Localization for Outdoor Augmented Reality
   Niluthpol Chowdhury Mithun, K. Minhas, H. Chiu, T. Oskiper, M. Sizintsev, S. Samarasekera, R. Kumar
   IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR), 2023






   Text-based Temporal Localization of Novel Events
   Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K Roy-Chowdhury
   European Conference on Computer Vision (ECCV), 2022




  
   Striking the Right Balance: Recall Loss for Semantic Segmentation
    Junjiao Tian, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Zsolt Kira
   International Conference on Robotics and Automation (ICRA), 2022
   [Code]





   
    SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
    M. Irshad, Niluthpol Chowdhury Mithun, Z. Seymour, H. Chiu, S. Samarasekera, R. Kumar
    International Conference on Pattern Recognition (ICPR), 2022




  
   Text-based Localization of Moments in a Video Corpus
    Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K Roy-Chowdhury
   IEEE Transactions on Image Processing (TIP), 2021




 
   Long-Range Augmented Reality with Dynamic Occlusion Rendering
    Mikhail Sizintsev, Niluthpol Chowdhury Mithun, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
   IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021





   MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
   Zachary Seymour, Kowshik Thopalli, Niluthpol C Mithun, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
   International Conference on Robotics and Automation (ICRA), 2021





   RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
    Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
    ACM International Conference on Multimedia (ACM MM), 2020
   [BEST Paper Candidate]  [GRAL (Ground RGB to Aerial LIDAR) Dataset]





   Weakly Supervised Video Moment Retrieval from Text Queries
    Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury
   IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
   [Code]

 




   Joint Embeddings with Multimodal Cues for Video-Text Retrieval
   Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury
   International Journal of Multimedia Information Retrieval (IJMIR), 2019





 
   Construction of Diverse Image Datasets from Web Collections with Limited Labeling
   Niluthpol Chowdhury Mithun, Rameswar Panda, Amit K. Roy-Chowdhury
   IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2019





   Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
    Niluthpol Chowdhury Mithun, Juncheng B Li, Florian Metze, Amit K. Roy-Chowdhury
   ACM International Conference on Multimedia Retrieval (ICMR), 2018
   [Winner of BEST Paper Award] [Code]




  
   Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
    Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos Papalexakis, Amit K. Roy-Chowdhury
   ACM International Conference on Multimedia (ACM MM), 2018




  
   ODDS: Real-Time Object Detection using Depth Sensors on Embedded GPUs
   Niluthpol Chowdhury Mithun, Sirajum Munir, Karen Guo, Charles Shelton
    ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN), 2018
   [Video]



   Learning Long-Term Invariant Features for Vision-based Localization
   Niluthpol Chowdhury Mithun, Cody Simons, Robert Casey, Stefan Hilligardt, Amit K. Roy-Chowdhury
   IEEE Winter Conference on Applications of Computer Vision (WACV), 2018






   Diversity-aware Multi Video Summarization
   Rameswar Panda, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury
   IEEE Transactions on Image Processing (TIP), 2017
   [Tour20 video summarization dataset]





   Generating Diverse Image Datasets with Limited Labeling
   Niluthpol Chowdhury Mithun, Rameswar Panda, Amit K. Roy-Chowdhury
   ACM Multimedia Conference (ACM MM), 2016
   [DivNet dataset]





   Detection and Classification of Vehicles from Video using Multiple Time-Spatial Images
   Niluthpol Chowdhury Mithun, Nafi Ur Rashid, S. M. Mahbubur Rahman
   IEEE Transactions on Intelligent Transportation Systems (TITS), 2012
   [EBVT dataset]