About Me

I am a tenure-track Associate Professor at the Institute of Intelligent Complex Systems (IICS), Fudan University, and a Researcher at Shanghai AI Lab.

My research focuses on AI for Science and LLMs for Scientific Discovery. I am primarily interested in developing foundational generative models for biomolecular structure prediction and design, proteomics, and AIVC, while also expanding into Agentic Science for autonomous scientific reasoning.

Previously, I was a Researcher on the Knowledge and Language Team at Microsoft Research, working with JJ Liu and Jianfeng Gao. I received my PhD from TTI-Chicago, a philanthropically endowed computer science research institute located on the University of Chicago campus, where I was advised by Prof. Jinbo Xu. Prior to that, I received my B.S. degree from the School of Mathematics at Fudan University.

Preprints

IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
The IntFold Team, Leon Qiao, Wayne Bai, He Yan, Gary Liu, Nova Xi, Xiang Zhang, Siqi Sun#
arXiv 2025

Accurate de novo sequencing of the modified proteome with OmniNovo
Yuhan Chen, Shang Qu, Zhiqiang Gao, Yuejin Yang, Xiang Zhang, Sheng Xu, Xinjie Mao, Liujia Qian, Jiaqi Wei, Zijie Qiu, Chenyu You, Lei Bai, Ning Ding#, Tiannan Guo#, Bowen Zhou#, Siqi Sun#
arXiv 2025 (Under Review)

OriGene: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery
Zhongyue Zhang*, Zijie Qiu*, Yingcheng Wu*, Shuya Li*, Dingyan Wang, Zhuomin Zhou, Duo An, Yuhan Chen, Yu Li, Yongbo Wang, Chubin Ou, Zichen Wang, Jack Xiaoyu Chen, Bo Zhang, Yusong Hu, Wenxin Zhang, Zhijian Wei, Runze Ma, Qingwu Liu, Bo Dong, Yuexi He, Qiantai Feng, Lei Bai#, Qiang Gao#, Siqi Sun#, Shuangjia Zheng#
bioRxiv 2025 (Under Review)

MassNet: billion-scale AI-friendly mass spectral corpus enables robust de novo peptide sequencing
A Jun*, Xiang Zhang*, Xiaofan Zhang*, Jiaqi Wei*, Te Zhang, Yamin Deng, Pu Liu, Zongxiang Nie, Yi Chen, Nanqing Dong, Zhiqiang Gao#, Siqi Sun#, Tiannan Guo#
bioRxiv 2025 (Under Review)

Fitness aligned structural modeling enables scalable virtual screening with AuroBind
Zhongyue Zhang*, Jiahua Rao*, Jie Zhong, Weiqiang Bai, Dongxue Wang, Shaobo Ning, Lifeng Qiao, Sheng Xu, Runze Ma, Will Hua, Siqi Sun#, Jian Zhang#, Shuangjia Zheng#
arXiv 2025 (Under Review)

Selected Publications

* indicates co-first author, # indicates co-corresponding author

Journal Papers

Benchmarking all-atom biomolecular structure prediction with FoldBench
Sheng Xu*, Qiantai Feng*, Lifeng Qiao, Hao Wu, Tao Shen, Yu Cheng#, Shuangjia Zheng#, Siqi Sun#
Nature Communications, 2025

Cryo-EM reveals mechanisms of natural RNA multivalency
Liu Wang*, Jiahao Xie*, Tao Gong*, Hao Wu*, Yifan Tu*, Xin Peng*, Sitong Shang*, Xinyu Jia, Haiyun Ma, Jian Zou, Sheng Xu, Xin Zheng, Dong Zhang, Yang Liu, Chong Zhang, Yongbo Luo, Zirui Huang, Bin Shao, Binwu Ying, Yu Cheng, Siqi Sun#, Xuedong Zhou#, Zhaoming Su#
Science, 2025

Fast, sensitive detection of protein homologs using deep dense retrieval
Liang Hong*, Zhihang Hu*, Siqi Sun*,#, Xiangru Tang, Jiuming Wang, Qingxiong Tan, Liangzhen Zheng, Sheng Wang, Sheng Xu, Irwin King, Mark Gerstein#, Yu Li#
Nature Biotechnology, 2025

π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing
Xiang Zhang*, Tianze Ling*, Zhi Jin*, Sheng Xu*, Zhiqiang Gao, Boyan Sun, Zijie Qiu, Jiaqi Wei, Nanqing Dong, Guangshuai Wang, Guibin Wang, Leyuan Li, Muhammad Abdul-Mageed, Laks V.S. Lakshmanan, Fuchu He, Wanli Ouyang#, Cheng Chang#, Siqi Sun#
Nature Communications, 2025

Accurate RNA 3D structure prediction using a language model-based deep learning approach
Tao Shen*, Zhihang Hu*, Siqi Sun*,#, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Mark Gerstein, Yu Li#
Nature Methods, 2024

Accurate prediction of antibody function and structure using bio-inspired antibody language model
Hongtai Jing*, Zhengtao Gao, Sheng Xu, Tao Shen, Zhangzhi Peng, Shwai He, Tao You, Shuang Ye#, Wei Lin#, Siqi Sun#
Briefings in Bioinformatics, 2024

Conference Papers

Bidirectional Representations Augmented Autoregressive Biological Sequence Generation: Application in De Novo Peptide Sequencing
Xiang Zhang*, Jiaqi Wei*, Zijie Qiu, Sheng Xu, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun#
NeurIPS 2025

Retrieval is Not Enough: Enhancing RAG through Test-Time Critique and Optimization
Jiaqi Wei*, Hao Zhou*, Xiang Zhang*, Di Zhang, Zijie Qiu, Noah Wei, Jinzhe Li, Wanli Ouyang, Siqi Sun#
NeurIPS 2025

Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
Zijie Qiu*, Jiaqi Wei*, Xiang Zhang*, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun#
ICML 2025

Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing
Xiang Zhang*, Jiaqi Wei*, Zijie Qiu, Sheng Xu, Nanqing Dong, Zhiqiang Gao, Siqi Sun#
ICML 2025

PriFold: Biological Priors Improve RNA Secondary Structure Predictions
Chenchen Yang*, Hao Wu*, Tao Shen, Kai Zou, Siqi Sun#
AAAI 2025

MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions
Le Zhang*, Jiayang Chen, Tao Shen, Yu Li, Siqi Sun#
NeurIPS 2024

ContraNovo: a contrastive learning approach to enhance de novo peptide sequencing
Zhi Jin*, Sheng Xu*, Xiang Zhang*, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang#, Siqi Sun#
AAAI 2024

CrossBind: Collaborative cross-modal identification of protein nucleic-acid-binding residues
Linglin Jing*, Sheng Xu*, Yifan Wang*, Yuzhe Zhou, Tao Shen, Zhigang Ji, Hui Fang, Zhen Li, Siqi Sun#
AAAI 2024

RetGen: A joint framework for retrieval and grounded text generation modeling
Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan
AAAI 2022

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun*, Yen-Chun Chen*, Linjie Li, Shuohang Wang, Yuwei Fang, Jingjing Liu
NAACL 2021

Contrastive Distillation on Intermediate Representations for Language Model Compression
Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
EMNLP 2020

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan
ACL 2020 (System Demo)

FreeLB: Enhanced Adversarial Training for Language Understanding
Chen Zhu*, Yu Cheng*, Zhe Gan*, Siqi Sun*, Tom Goldstein, Jingjing Liu
ICLR 2020 (Spotlight)

Patient Knowledge Distillation for BERT Model Compression
Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
EMNLP 2019

Hierarchical graph network for multi-hop question answering
Yuwei Fang, Siqi Sun, Zhe Gan, Rohit Pillai, Shuohang Wang, Jingjing Liu
EMNLP 2019

Earlier Publications

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Sheng Wang*, Siqi Sun*, Zhen Li, Renyu Zhang, Jinbo Xu
PLOS Computational Biology, 2017

Learning scale-free networks by dynamic node specific degree prior
Qingming Tang*, Siqi Sun*, Jinbo Xu
ICML 2015

Adaptive Variable Clustering in Gaussian Graphical Models
Siqi Sun*, Yuancheng Zhu*, Jinbo Xu
AISTATS 2014

An iterative network partition algorithm for accurate identification of dense network modules
Siqi Sun*, Xinran Dong*, Yao Fu, Weidong Tian
Nucleic Acids Research, 2012

Recruitment

I am actively recruiting self-motivated PhD students (applications for Fall 2026 are closed), Research Interns, and Full-time Research Scientists at Shanghai AI Lab and Fudan University. Postdocs and Research Assistants are also welcome.

Our research group focuses on AI for Science (e.g., biomolecular structure prediction, drug discovery) and Large Language Models.

Candidates with backgrounds in Computer Science, Mathematics, Physics, Biology, or Chemistry are all highly valued. If you are interested, please email me your CV and transcripts.

Misc

In my spare time, I enjoy rock music, video games, and trading card games.
