Sotabase
Career
· Postdoctoral Researcher, University of Oxford, 2025–
· Research Intern, Microsoft, 2024–2025
· Research Intern, Meta, 2023–2024
· Research Intern, Tencent, 2021–2022
Publications (69)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
International Conference on Machine Learning · 2023
1,055 citations
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
arXiv.org · 2023
840 citations
End-to-End Human Pose and Mesh Reconstruction with Transformers
Computer Vision and Pattern Recognition · 2020
735 citations
GIT: A Generative Image-to-text Transformer for Vision and Language
Trans. Mach. Learn. Res. · 2022
716 citations
Deep learning of binary hash codes for fast image retrieval
2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) · 2015
595 citations
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
arXiv.org · 2023
506 citations
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
International Conference on Learning Representations · 2023
418 citations
Mesh Graphormer
IEEE International Conference on Computer Vision · 2021
382 citations
Text2Motion: from natural language instructions to feasible plans
Autonomous Robots · 2023
367 citations
Adversarial Ranking for Language Generation
Neural Information Processing Systems · 2017
347 citations
Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks
Computer Vision and Pattern Recognition · 2016
342 citations
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition · 2021
304 citations
Supervised Learning of Semantics-preserving Hash via Deep Convolutional Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
260 citations
Aligning Large Multi-Modal Model with Robust Instruction Tuning
arXiv.org · 2023
257 citations
Egocentric Video-Language Pretraining
Neural Information Processing Systems · 2022
251 citations
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
arXiv.org · 2021
240 citations
UniVTG: Towards Unified Video-Language Temporal Grounding
IEEE International Conference on Computer Vision · 2023
194 citations
ReCo: Region-Controlled Text-to-Image Generation
Computer Vision and Pattern Recognition · 2022
193 citations
DisCo: Disentangled Control for Realistic Human Dance Generation
Computer Vision and Pattern Recognition · 2023
137 citations
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision · 2023
137 citations
Kevin Qinghong Lin | Researcher Profile | Sotabase