Kevin Qinghong Lin | Researcher Profile | Sotabase

Career

· Postdoctoral Researcher, University of Oxford · 2025–

· Research Intern, Microsoft · 2024–2025

· Research Intern, Meta · 2023–2024

· Research Intern, Tencent · 2021–2022

Publications (69)

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

International Conference on Machine Learning · 2023

1,055 cited

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

arXiv.org · 2023

840 cited

End-to-End Human Pose and Mesh Reconstruction with Transformers

Computer Vision and Pattern Recognition · 2020

735 cited

GIT: A Generative Image-to-text Transformer for Vision and Language

Trans. Mach. Learn. Res. · 2022

716 cited

Deep learning of binary hash codes for fast image retrieval

2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) · 2015

595 cited

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

arXiv.org · 2023

506 cited

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

International Conference on Learning Representations · 2023

418 cited

Mesh Graphormer

IEEE International Conference on Computer Vision · 2021

382 cited

Text2Motion: from natural language instructions to feasible plans

Autonomous Robots · 2023

367 cited

Adversarial Ranking for Language Generation

Neural Information Processing Systems · 2017

347 cited

Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks

Computer Vision and Pattern Recognition · 2016

342 cited

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

Computer Vision and Pattern Recognition · 2021

304 cited

Supervised Learning of Semantics-preserving Hash via Deep Convolutional Neural Networks

IEEE Transactions on Pattern Analysis and Machine Intelligence

260 cited

Aligning Large Multi-Modal Model with Robust Instruction Tuning

arXiv.org · 2023

257 cited

Egocentric Video-Language Pretraining

Neural Information Processing Systems · 2022

251 cited

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling

arXiv.org · 2021

240 cited

UniVTG: Towards Unified Video-Language Temporal Grounding

IEEE International Conference on Computer Vision · 2023

194 cited

ReCo: Region-Controlled Text-to-Image Generation

Computer Vision and Pattern Recognition · 2022

193 cited

DisCo: Disentangled Control for Realistic Human Dance Generation

Computer Vision and Pattern Recognition · 2023

137 cited

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

IEEE International Conference on Computer Vision · 2023

137 cited
