















Engineering - Software Engineering























Pointer tagging; nested entity recognition; relation extraction; joint learning; drug repositioning





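(1) For nested entity recognition in the biomedical domain, we build CPT (Cascade Pointer Tagging), a method based on cascade pointer tagging. Cascade pointer tagging overcomes the inability of sequence-labeling approaches to recognize nested entities. In addition, this thesis uses entity descriptions as prior knowledge, introducing entity type information into the recognition process, which yields better results. Compared with the baseline methods, CPT achieves the highest F1 score on both nested and non-nested entity recognition.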

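(2) To address the large number of overlapping relations in biomedical literature, we build TPT (Two-time Pointer Tagging), a relation extraction method that performs joint learning through two rounds of pointer tagging. Unlike pipeline-based methods, it does not suffer from error propagation, neglect of the interaction between subtasks, or redundant information, and it also handles overlapping relations in the biomedical domain. This thesis recasts relation triple extraction as a function mapping from head entities to tail entities, which strengthens the dependencies within a triple's internal structure, and adds a bias term to the loss function to alleviate label imbalance. Compared with the baseline methods on two public biomedical corpora, DDI and CPI, the method improves precision and markedly improves recall, achieving the highest F1 score on both corpora.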
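The pointer-tagging idea behind (1) and (2) can be illustrated with a minimal decoding sketch. The abstract does not specify the neural model, the tag layout, or the decoding rule, so everything below is an assumption made for illustration: each entity type (and each head-entity/relation pair) is given its own pair of 0/1 start and end pointer vectors, and every start pointer is paired with the nearest end pointer at or after it. The example sentences, the helper names `decode_spans` and `text`, and the relation label `effect` are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# (start pointers, end pointers): one 0/1 flag per token. Assumed layout.
Tags = Tuple[List[int], List[int]]
Span = Tuple[int, int]  # inclusive token indices


def decode_spans(tags: Tags) -> List[Span]:
    """Pair every start pointer with the nearest end pointer at or after it."""
    starts, ends = tags
    spans = []
    for i, is_start in enumerate(starts):
        if not is_start:
            continue
        for j in range(i, len(ends)):
            if ends[j]:
                spans.append((i, j))
                break
    return spans


def text(tokens: List[str], span: Span) -> str:
    """Join the tokens covered by an inclusive span."""
    i, j = span
    return " ".join(tokens[i:j + 1])


# (1) Nested entities: a separate pointer pair per entity type lets an
# inner mention and an outer mention coexist on the same tokens.
tokens = ["human", "IL-2", "receptor", "gene"]
entity_tags: Dict[str, Tags] = {
    "protein": ([0, 1, 0, 0], [0, 0, 1, 0]),
    "DNA": ([1, 0, 0, 0], [0, 0, 0, 1]),
}
entities = [
    (etype, text(tokens, span))
    for etype, tags in entity_tags.items()
    for span in decode_spans(tags)
]

# (2) Overlapping relations: the first pass tags head entities; the second
# pass tags tail entities separately for each head and each relation type.
sentence = ["drugA", "increases", "the", "effect", "of", "drugB", "and", "drugC"]
head_tags: Tags = ([1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0])
tail_tags: Dict[Span, Dict[str, Tags]] = {
    (0, 0): {"effect": ([0, 0, 0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 1, 0, 1])},
}
triples = []
for head in decode_spans(head_tags):
    for relation, tags in tail_tags.get(head, {}).items():
        for tail in decode_spans(tags):
            triples.append((text(sentence, head), relation, text(sentence, tail)))

print(entities)
print(triples)
```

In this toy setting the inner mention "IL-2 receptor" and the outer mention "human IL-2 receptor gene" are both recovered because they live in different pointer layers, and one head entity yields two triples that share it. A single tag sequence with one label per token cannot express either case directly.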














