Ayan Kumar Bhunia

I am a Doctor of Philosophy (PhD) student focusing on Computer Vision and Deep Learning at the SketchX Lab of the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, England, United Kingdom. My primary supervisor is Prof. Yi-Zhe Song, and my co-supervisors are Prof. Tao (Tony) Xiang and Dr. Yongxin Yang.

Prior to that, I worked as a full-time research assistant at the Institute for Media Innovation (IMI) Lab of Nanyang Technological University (NTU), Singapore.

Currently, I am working as a Research Scientist (Computer Vision) at iSIZE, a London-based deep-tech company specializing in deep learning for video delivery.

Top-venue conference publications (as of March 2022): 10x CVPR, 3x ICCV, 1x ECCV, 1x SIGGRAPH Asia.

Google Scholar  /  GitHub  /  LinkedIn  /  DBLP

Recent Updates

  • New!! [March 2022]: Four papers were accepted to CVPR 2022.
  • [July 2021]: Three papers were accepted to ICCV 2021.
  • [June 2021]: Talk on 'Beyond Supervised Sketch Representation Learning' (YouTube).
  • [March 2021]: Four papers were accepted to CVPR 2021.
  • [Aug 2020]: One paper was accepted to SIGGRAPH Asia 2020. Check the Online Demo.
  • [Aug 2020]: One paper was accepted to BMVC 2020 for oral presentation.
  • [July 2020]: One paper was accepted to ECCV 2020.
  • [March 2020]: One paper was accepted to CVPR 2020 for oral presentation.
  • Notes: If you are interested in a potential research collaboration, feel free to contact me by Email or LinkedIn. Most importantly, I would be happy to collaborate with self-motivated and enthusiastic undergraduate or postgraduate students who intend to pursue an MS/Ph.D. in the future.

    Research Interests

    My research is broadly centered on Computer Vision and Deep Learning. Within computer vision, I have explored three specific topics.

    a) Sketch for Visual Understanding: Hand-drawn sketches by nature inherit the cognitive potential of human intelligence, which makes them useful for a range of visual understanding tasks. My PhD research centres on how sketches can be leveraged to address different visual understanding problems. For instance, I have extended traditional sketch-based image retrieval (SBIR) to an on-the-fly setup in which the system starts retrieving as soon as the user starts drawing. In addition, I have explored annotation-efficient learning under low-resource data scenarios, including a semi-supervised framework for cross-modal instance-level retrieval and self-supervised learning on sparse image data such as sketches and handwriting. With the recent proliferation of touch-screen devices, sketch is a promising medium for interacting with digital systems because it offers fine-grained, personalized control. I also look forward to exploring how 2D sketches can facilitate creative image generation and editing with 3D perception.

    b) Document Image Analysis and Text Recognition: Over the last few years, I have worked on various problems in Document Analysis and Recognition. Representative works include MetaHTR (a writer-adaptive Handwritten Text Recognition system), Unifying Handwriting and Scene Text Recognition, Unsupervised Document Image Binarization, and Script Identification.

    c) Image/Video Restoration: At my current company, iSIZE, I work on developing state-of-the-art Image/Video Restoration solutions. During my tenure at iSIZE, I have designed and developed the deep model for the product BitClear, which provides a low-cost neural solution for removing compressed-video artifacts. BitClear won the NAB Product of the Year (2022) award in the AI/ML category. The NAB Show is the largest show for media, entertainment and technology.

    Moreover, from an industry perspective, I have so far explored the following subtopics: (i) Cross-modal Image Retrieval (ii) Generative Adversarial Networks (iii) Meta-Learning (iv) Self-supervised Learning (v) Image/Video Denoising (vi) Text Recognition (vii) Image to Sequence Generation (viii) RL for Vision (ix) Semi-supervised Learning (x) Fine-grained Visual Recognition (xi) Knowledge Distillation (xii) Incremental Learning.

    Selected Publications

    2022
    Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

    Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (New!)

    Abstract / Code / arXiv / Marktechpost Blog / BibTex

    Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

    Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (New!)

    Abstract / Code / arXiv / BibTex

    Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

    Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Aneeshan Sain, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (New!)

    Abstract / Code / arXiv / BibTex

    Sketch3T: Test-time Training for Zero-Shot SBIR

    Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (New!)

    Abstract / Code / arXiv / BibTex

    2021
    Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation

    Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Towards the Unseen: Iterative Text Recognition by Distilling from Errors

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

    Ayan Kumar Bhunia, Aneeshan Sain, Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Yi-Zhe Song.
    IEEE International Conference on Computer Vision (ICCV), 2021

    Abstract / arXiv / BibTex

    Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / Code / arXiv / BibTex

    More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

    Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / Code / arXiv / BibTex

    StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

    Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / arXiv / BibTex

    MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

    Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

    Abstract / arXiv / BibTex

    2020
    Pixelor: A Competitive Sketching AI Agent. So you think you can beat me?

    Ayan Kumar Bhunia*, Ayan Das*, Umar Riaz Muhammad*, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song.
    SIGGRAPH Asia, 2020.

    Abstract / Code / arXiv / BibTex / Try Online Demo (*equal contribution)

    Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

    Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song.
    British Machine Vision Conference (BMVC), 2020.

    Abstract / arXiv / BibTex (Oral Presentation)

    Fine-grained visual classification via progressive multi-granularity training of jigsaw patches

    Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, Jun Guo.
    European Conference on Computer Vision (ECCV), 2020.

    Abstract / Code / arXiv / BibTex

    Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

    Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

    Abstract / Code / arXiv / BibTex (Oral Presentation)

    2019
    Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning

    Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

    Abstract / Code / arXiv / BibTex

    Improving Document Binarization via Adversarial Noise-Texture Augmentation

    Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy.
    IEEE International Conference on Image Processing (ICIP), 2019

    Abstract / Code / arXiv / BibTex (Top 10% Papers)

    A Deep One-Shot Network for Query-based Logo Retrieval

    Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal
    Pattern Recognition (PR), 2019

    Abstract / Code / Third Party Implementation / arXiv / BibTex

    User Constrained Thumbnail Generation Using Adaptive Convolutions

    Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy
    International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

    Abstract / Code / arXiv / BibTex (Oral Presentation)

    Texture Synthesis Guided Deep Hashing for Texture Image Retrieval

    Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy
    IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

    Abstract / arXiv / BibTex / Video Presentation

    Script identification in natural scene image and video frames using an attention based Convolutional-LSTM network

    Ankan Kumar Bhunia, Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Partha Pratim Roy, Umapada Pal
    Pattern Recognition (PR), 2019

    Abstract / Code / arXiv / BibTex



    Template credits: Dr. Jon Barron