Ayan Kumar Bhunia

I completed my Doctor of Philosophy (PhD) in 2022, focusing on Computer Vision and Deep Learning, at the SketchX Lab, Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, England, United Kingdom, under the supervision of Prof. Yi-Zhe Song and Prof. Tao (Tony) Xiang.

Prior to that, I worked as a full-time research assistant at the Institute for Media Innovation (IMI) Lab of Nanyang Technological University (NTU), Singapore.

Currently, I am working as a Senior Research Scientist (Computer Vision) at iSIZE, a London-based deep-tech company specializing in deep learning for video delivery.

Top-venue Publications (March 2023): 17x CVPR, 3x ICCV, 3x ECCV, 1x SIGGRAPH Asia.

Google Scholar  /  GitHub  /  LinkedIn  /  DBLP

Recent Updates

  • New!! [March 2023]: Our paper What Can Human Sketches Do for Object Detection? (CVPR'23) has been selected as one of 12 award candidates out of 9155 submissions and 2360 accepted papers at CVPR 2023.
  • New!! [March 2023]: Received an endorsement from the Royal Academy of Engineering through the peer-review route for the UK Global Talent Visa.
  • New!! [March 2023]: Seven papers accepted at CVPR 2023 (arXiv/Code/Demo coming soon!!).
  • New!! [Oct 2022]: Defended my PhD thesis before Prof. Stella Yu and Prof. Adrian Hilton, with no corrections.
  • [July 2022]: Two papers accepted at ECCV 2022.
  • [March 2022]: Four papers accepted at CVPR 2022.
  • [July 2021]: Three papers accepted at ICCV 2021.
  • [June 2021]: Gave a talk on 'Beyond Supervised Sketch Representation Learning' (YouTube).
  • [March 2021]: Four papers accepted at CVPR 2021.
  • [Aug 2020]: One paper accepted at SIGGRAPH Asia 2020. Check out the online demo.
  • [Aug 2020]: One paper accepted at BMVC 2020 as an oral presentation.
  • [July 2020]: One paper accepted at ECCV 2020.
  • [March 2020]: One paper accepted at CVPR 2020 as an oral presentation.
  • Notes: If you are interested in a potential research collaboration, feel free to contact me by email or LinkedIn. In particular, I would be happy to collaborate with self-motivated and enthusiastic undergraduate or postgraduate students who intend to pursue an MS/PhD in the future.

    Research Interests

    My research is broadly centred on Computer Vision and Deep Learning. Within computer vision, I have explored three main topics.

    a) Sketch for Visual Understanding: Hand-drawn sketches inherently carry the cognitive potential of human intelligence, which makes them applicable to a wide range of visual understanding tasks. During my PhD, my research centred on how sketches can be leveraged to address different visual understanding problems. For instance, I extended traditional sketch-based image retrieval (SBIR) to an on-the-fly setup, where the system starts retrieving as soon as the user starts drawing. I have also explored annotation-efficient learning in low-resource data scenarios, including a semi-supervised framework for cross-modal instance-level retrieval and self-supervised learning on sparse image data such as sketches and handwriting. With the recent proliferation of touch-screen devices, sketching has become a promising medium for interacting with digital systems, owing to the fine-grained, personalised control it offers. Going forward, I plan to explore how 2D sketches can facilitate creative image generation and editing with 3D perception.

    b) Document Image Analysis and Text Recognition: Over the last few years, I have worked on various problems in document analysis and recognition. Representative works include MetaHTR (a writer-adaptive Handwritten Text Recognition system), Unifying Handwriting and Scene Text Recognition, Unsupervised Document Image Binarization, and Script Identification.

    c) Image/Video Restoration: At my current company, iSIZE, I develop state-of-the-art image/video restoration solutions. During my tenure at iSIZE, I designed and developed from the ground up the deep model behind the product BitClear, which provides a low-cost neural solution for removing compression artefacts from video. BitClear won the NAB Show Product of the Year (2022) award in the AI/ML category. NAB Show is the largest show for media, entertainment and technology.

    Moreover, from an industry perspective, the subtopics I have explored so far are: (i) Cross-modal Image Retrieval (ii) Generative Adversarial Networks (iii) Meta-Learning (iv) Self-supervised Learning (v) Image/Video Denoising (vi) Text Recognition (vii) Image-to-Sequence Generation (viii) RL for Vision (ix) Semi-supervised Learning (x) Fine-grained Visual Recognition (xi) Knowledge Distillation (xii) Incremental Learning.

    Selected Publications

    2023
    Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

    Ayan Kumar Bhunia, Subhadeep Koley, Amandeep Kumar, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    Picture that Sketch: Photorealistic Image Generation from Abstract Sketches

    Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

    Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

    Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    What Can Human Sketches Do for Object Detection?

    Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!) [Top 12 Award Candidates]

    Abstract / Code / arXiv / BibTex

    CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

    Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    Data-Free Sketch-Based Image Retrieval

    Abhra Chaudhuri, Ayan Kumar Bhunia, Yi-Zhe Song, Anjan Dutta.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (New!)

    Abstract / Code / arXiv / BibTex

    2022
    FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

    Pinaki Nath Chowdhury, Aneeshan Sain, Yulia Gryaditskaya, Ayan Kumar Bhunia, Tao Xiang, Yi-Zhe Song.
    European Conference on Computer Vision (ECCV), 2022

    Abstract / Code / arXiv / BibTex

    Adaptive Fine-Grained Sketch-Based Image Retrieval

    Ayan Kumar Bhunia, Aneeshan Sain, Parth Hiren Shah, Animesh Gupta, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    European Conference on Computer Vision (ECCV), 2022

    Abstract / Code / arXiv / BibTex

    Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

    Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

    Abstract / Code / arXiv / Marktechpost Blog / BibTex

    Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

    Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

    Abstract / Code / arXiv / BibTex

    Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

    Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Aneeshan Sain, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

    Abstract / Code / arXiv / BibTex

    Sketch3T: Test-time Training for Zero-Shot SBIR

    Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

    Abstract / Code / arXiv / BibTex

    2021
    Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation

    Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Towards the Unseen: Iterative Text Recognition by Distilling from Errors

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

    Ayan Kumar Bhunia, Aneeshan Sain, Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / Code / arXiv / BibTex

    More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / Code / arXiv / BibTex

    StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

    Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / arXiv / BibTex

    MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

    Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / arXiv / BibTex

    2020
    Pixelor: A Competitive Sketching AI Agent. So you think you can beat me?

    Ayan Kumar Bhunia*, Ayan Das*, Umar Riaz Muhammad*, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song.
    SIGGRAPH Asia, 2020.

    Abstract / Code / arXiv / BibTex / Try Online Demo (*equal contribution)

    Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

    Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    British Machine Vision Conference (BMVC), 2020.

    Abstract / arXiv / BibTex (Oral Presentation)

    Fine-grained visual classification via progressive multi-granularity training of jigsaw patches

    Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, Jun Guo.
    European Conference on Computer Vision (ECCV), 2020.

    Abstract / Code / arXiv / BibTex

    Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

    Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

    Abstract / Code / arXiv / BibTex (Oral Presentation)

    2019
    Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning

    Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

    Abstract / Code / arXiv / BibTex

    Improving Document Binarization via Adversarial Noise-Texture Augmentation

    Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy.
    IEEE International Conference on Image Processing (ICIP), 2019

    Abstract / Code / arXiv / BibTex (Top 10% Papers)

    A Deep One-Shot Network for Query-based Logo Retrieval

    Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal.
    Pattern Recognition (PR), 2019

    Abstract / Code / Third Party Implementation / arXiv / BibTex

    User Constrained Thumbnail Generation Using Adaptive Convolutions

    Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy.
    International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

    Abstract / Code / arXiv / BibTex (Oral Presentation)

    Texture Synthesis Guided Deep Hashing for Texture Image Retrieval

    Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy.
    IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

    Abstract / arXiv / BibTex / Video Presentation

    Script identification in natural scene image and video frames using an attention based Convolutional-LSTM network

    Ankan Kumar Bhunia, Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Partha Pratim Roy, Umapada Pal.
    Pattern Recognition (PR), 2019

    Abstract / Code / arXiv / BibTex



    Template credits: Dr. Jon Barron