WO2021120528A1 - Automated report interpretation method and system - Google Patents

Publication number
WO2021120528A1
Authority
WO
WIPO (PCT)
Prior art keywords
report
data
database
value
interpretation method
Prior art date
Application number
PCT/CN2020/092902
Other languages
English (en)
French (fr)
Inventor
梁萌萌
余伟师
谢欣
Original Assignee
苏州赛美科基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-20
Filing date
2020-05-28
Publication date
2021-06-24
Application filed by 苏州赛美科基因科技有限公司

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids


Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Public Health (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Epidemiology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

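An automated report interpretation method and system. The method comprises: obtaining the evidence data sources of a bioinformatics analysis; calculating scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A and the value that represents a benign result as value B; sorting the loci of the bioinformatics analysis in order according to their calculated scores; screening pathogenic loci according to the industry gold standard; matching the pathogenic locus data, the patient phenotype data, and the related data in a local relational database and importing them into a template; and adding a conclusive description to the template to obtain a complete report. The core data returned by a search is displayed in multiple dimensions together with its surrounding information, related data is integrated to the greatest possible extent, and the bioinformatics analysis report is made simple and easy to read.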

Description

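Automated report interpretation method and system
Technical Field
The present invention belongs to the field of bioinformatics testing and relates to an automated report interpretation method and system.
Background Art
With the rapid development of sequencing technology and the continuing fall in its cost, more and more patients follow their doctors' advice and undergo molecular diagnostic testing, the most popular form of which is next-generation sequencing. However, as is well known, neither the raw result files produced by sequencing nor the output files that bioinformatics engineers produce by analyzing, filtering, and annotating the raw results with various algorithms can give doctors a direct reference. Professional medical interpreters must process the data further to produce a clear, readable final report that supports clinical decisions. While writing the report, the interpreter must query various public databases to re-screen the thousands of loci output by the bioinformatics pipeline and grade the variants at the selected loci according to the industry gold standard, classifying them as pathogenic, likely pathogenic, or of uncertain clinical significance. Finally, the interpreter must complete the report in the document format prescribed by the doctor.
Although the mainstream public databases all provide web pages for information retrieval, the databases are poorly linked to one another and form pronounced information silos, so interpreters must keep switching between query pages instead of obtaining a multi-dimensional view of the complete data from a single query. In addition, when manually screening thousands of loci, interpreters currently lack an automated mechanism for ranking the pathogenic loci of a specific disease, so this step consumes considerable time. Furthermore, when the report is produced, the standardization of its wording and the visual quality of its layout are also important factors in overall interpretation speed.
Summary of the Invention
This application provides an automated report interpretation method and system that displays the core data returned by a search together with its surrounding information in multiple dimensions, integrates related data to the greatest possible extent, and makes the bioinformatics analysis report simple and easy to read.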
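To achieve the above technical objective, this application adopts the following technical solution. An automated report interpretation method comprises:
obtaining the evidence data sources of the bioinformatics analysis;
calculating scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A, and defining the value that represents a benign result as value B;
sorting the loci of the bioinformatics analysis in order according to their calculated scores;
screening pathogenic loci according to the industry gold standard;
matching the pathogenic locus data, the patient phenotype data, and the related data in a local relational database, and importing them into a template, wherein the related data includes gene function data, phenotype description data, and rating evidence;
adding a conclusive description to the template to obtain a complete report.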
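As an improved technical solution of this application, A in value A is a number and B in value B is a number; the loci of the bioinformatics analysis are sorted by numerical magnitude according to their calculated scores.
As an improved technical solution of this application, the method further comprises generating JSON-format report data from the complete report and storing the JSON-format report in a historical report database.
As an improved technical solution of this application, the local relational database includes an OMIM database, a CHPO database, an HGMD database, and a historical report database; within the local relational database, these four databases follow an entity-relationship (ER) diagram model, are linked by gene-phenotype relationships, and form a multi-dimensional data system.
As an improved technical solution of this application, the method further comprises combining the JSON-format report data generated from the complete report with HTML text and generating a PDF report.
As an improved technical solution of this application, the scores of the evidence data sources are calculated as a weighted average.
As an improved technical solution of this application, the scores of the evidence data sources are calculated with a logistic regression algorithm.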
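Another objective of this application is to provide an automated report interpretation system, comprising:
an intelligent analysis module for obtaining the evidence data source files of the bioinformatics analysis, computing a weighted average of the data in the result file, and sorting the loci in the calculation result by degree of pathogenicity;
a report writing module for obtaining the calculation result of the intelligent analysis module, the patient phenotype data, and the data in the local relational database, and for entering the conclusive description text;
a generation module that receives the data from the report writing module, combines it with the HTML text from the template editing module, and generates a PDF report.
According to another embodiment of this application, a storage medium is provided, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
According to another embodiment of this application, an electronic device is provided, comprising a memory and a processor, characterized in that a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.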
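Beneficial Effects
By computing a weighted average over multiple sources of pathogenicity evidence with a logistic regression algorithm, the method speeds up the screening of pathogenic loci and makes that screening semi-automatic. It can also draw on continuously accumulated historical data so that the accuracy of the ranking keeps improving, which increases the interpreter's confidence in judging a test result positive and improves efficiency.
By converting HTML to PDF and using HTML style editing, the layout and styling of interpretation reports are managed centrally, the time spent editing reports is reduced, and the consistency of report pages is improved. Interpreters need only attend to the content of a report rather than its style, which can save them about 30% of their time.
The interpretation data written in the report can be stored effectively in the database, so that it can later be searched and consulted in a structured way.
By linking and integrating the databases, the association structure of genes, phenotypes, and diseases created during data integration effectively eliminates the information silos between data sources and spares interpreters the unnecessary repeated queries they would otherwise make to obtain information related to the core query results, saving them time.
It should be understood that all combinations of the foregoing concepts and the additional concepts described in more detail below may be regarded as part of the subject matter of this disclosure, provided that such concepts are not mutually contradictory.
The foregoing and other aspects, embodiments, and features of the teachings of this application can be understood more fully from the following description taken together with the drawings. Other additional aspects of this application, such as features and/or beneficial effects of exemplary embodiments, will be apparent from the description below or learned through practice of specific embodiments according to the teachings of this application.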
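Brief Description of the Drawings
FIG. 1 is a schematic diagram of the overall structure of the automated report interpretation method of this application.
FIG. 2 is the ER diagram adopted by the local relational database.
Detailed Description
To make the technical content of this application easier to understand, specific embodiments are described below with reference to the accompanying drawings.
The embodiments of this disclosure are not necessarily intended to include all aspects of this application. It should be understood that the concepts and embodiments introduced above, and those described in more detail below, can be implemented in any of many ways, because the concepts and embodiments disclosed in this application are not limited to any particular implementation. In addition, some aspects disclosed in this application may be used alone or in any suitable combination with other aspects disclosed in this application.
In designing the technical solution, this application needs to reduce the drawbacks of information silos in the report production workflow by displaying the core data returned by a search together with its surrounding information in multiple dimensions and integrating related data to the greatest possible extent. It also needs to create a simple automated ranking model for pathogenic loci that fits the company's interpretation framework, to speed up locus screening. When the final PDF report is produced, the problem of inconsistent report styles needs to be solved and central management strengthened, so that interpreters can focus on report content rather than layout while writing.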
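Embodiment 1
The method embodiment provided in Embodiment 1 of this application can be executed in the cloud or on a local server cluster. The local server cluster may include one or more processors (which may include, but are not limited to, processing devices of the x86 or ARM architecture) and a memory for storing data. Optionally, the local server cluster may also include a transmission device for communication functions and input/output devices.
The memory can be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the automated report interpretation method in the embodiments of this application. By running the computer programs stored in the memory, the processor executes various functional applications and data processing, that is, implements the above method.
The memory may include high-speed random access memory, and data redundancy is achieved through RAID 1 or RAID 5 disk arrays to ensure data security.
The transmission device is used to receive or send data over a network. A specific example of the network is a wireless network provided by the communication provider of the local server cluster. In one example, the transmission device includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet.
This embodiment provides an automated report interpretation method that runs on the above local server cluster or network architecture. With reference to FIG. 1, the method includes the following steps: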
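Obtain the evidence data sources of the bioinformatics analysis, as listed in Table 1.
Table 1. Example evidence data sources in a bioinformatics analysis result file
Evidence type                Data source
Functional prediction        Polyphen2-HVAR
Evolutionary conservation    LRT
Functional prediction        SIFT
Evolutionary conservation    phastCons100way
Evolutionary conservation    GERP++
Domain                       Gene
Population frequency         gnomAD
Domain                       dbNSFP Interpro
Functional prediction        MutationTaster2
Historical rating            Company historical data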
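Calculate scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A and the value that represents a benign result as value B. A in value A is a number, such as 1.0; B in value B is a number, such as 0.0.
Screen pathogenic loci according to the industry gold standard. Here the industry gold standard may be the ACMG Standards and Guidelines for the Classification of Genetic Variants (《ACMG遗传变异分类标准与指南》).
Match the pathogenic locus data, the patient phenotype data, and the related data in the local relational database, and import them into the template. The related data includes gene function data, phenotype description data, rating evidence, and so on, which effectively eliminates the information silos between data sources.
Add a conclusive description to the template to obtain a complete report.
The local relational database includes an OMIM database, a CHPO database, an HGMD database, and a historical report database. Within the local relational database, these four databases follow an entity-relationship (ER) diagram model, are linked by gene-phenotype relationships, and form a multi-dimensional data system. The local relational database is generated in advance and then continuously updated, so the local relational model in this embodiment is a continuously updated model.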
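Generate JSON-format report data from the complete report and store the JSON-format report in the historical report database; at the same time, combine the JSON-format report data with HTML text and generate a PDF report.
Through the above steps, the association structure of genes, phenotypes, and diseases created during data integration effectively eliminates the information silos between data sources; computing a weighted average over multiple sources of pathogenicity evidence with a logistic regression algorithm speeds up the screening of pathogenic loci; and HTML style editing handles the layout and styling of interpretation reports, reducing editing time. Together these address the problems described in the background: poor linkage between databases, which forms pronounced information silos and forces interpreters to keep switching between query pages instead of obtaining a multi-dimensional view of the complete data from one query; the lack of an automated ranking mechanism for the pathogenic loci of a specific disease when interpreters manually screen thousands of loci, which makes that step time-consuming; and, when the report is produced, the standardization of its wording and the visual quality of its layout.
Preferably, the scores of the evidence data sources are calculated as a weighted average using a logistic regression algorithm, to speed up the screening of pathogenic loci. The loci of the bioinformatics analysis are sorted by numerical magnitude according to their calculated scores.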
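The weighted-average scoring and sorting steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Locus` type, the `WEIGHTS` values, and the assumption that every source score has already been normalized to the range 0.0 (benign) to 1.0 (pathogenic) are all hypothetical, since the source does not publish weights or a normalization scheme.

```python
from dataclasses import dataclass

# Hypothetical weights per evidence source; the source text gives no values.
WEIGHTS = {"SIFT": 2.0, "Polyphen2-HVAR": 2.0, "GERP++": 1.0, "gnomAD": 1.0}


@dataclass
class Locus:
    name: str
    # Source name -> score, assumed pre-normalized so that
    # 1.0 means pathogenic (value A) and 0.0 means benign (value B).
    evidence: dict


def weighted_score(locus, weights=WEIGHTS):
    """Weighted average over the evidence sources present for this locus."""
    pairs = [(weights[s], v) for s, v in locus.evidence.items() if s in weights]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return 0.0  # no usable evidence: treat as benign in this sketch
    return sum(w * v for w, v in pairs) / total


def rank_loci(loci):
    """Sort loci from highest to lowest score."""
    return sorted(loci, key=weighted_score, reverse=True)


loci = [
    Locus("a", {"SIFT": 0.0, "gnomAD": 0.0}),
    Locus("b", {"SIFT": 1.0, "gnomAD": 1.0}),
    Locus("c", {"SIFT": 1.0, "GERP++": 0.0}),
]
ranked = rank_loci(loci)
```

Dividing by the sum of the weights actually present keeps a locus with missing sources on the same 0.0 to 1.0 scale as fully annotated loci, which is one reasonable way to keep the ranking comparable.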
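Embodiment 2
This embodiment also provides an automated report interpretation system, which implements the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
An automated report interpretation system, comprising:
an intelligent analysis module for obtaining the evidence data source files of the bioinformatics analysis, computing a weighted average of the data in the result file, and sorting the loci in the calculation result by degree of pathogenicity;
a report writing module for obtaining the calculation result of the intelligent analysis module, the patient phenotype data, and the data in the local relational database, and for entering the conclusive description text; in practice, the calculation result, the patient phenotype data, and the database data that are obtained are matched one to one;
a generation module that receives the data from the report writing module, combines it with the HTML text from the template editing module, and generates a PDF report.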
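Optionally, the intelligent analysis module includes:
a receiving unit for obtaining the evidence data source files of the bioinformatics analysis;
a calculation unit for computing a weighted average of the data in the result file;
a sorting unit for sorting the loci in the weighted-average result by degree of pathogenicity.
Optionally, the report writing module includes:
a receiving unit for obtaining the calculation result of the intelligent analysis module, the patient phenotype data, and the data in the local relational database, and matching them one to one;
a description unit for writing the conclusive description of the data.
Optionally, the generation module includes:
a receiving unit for receiving the data from the report writing module;
an integration unit for combining the data obtained by the receiving unit with the HTML text from the template editing module;
a report generation unit for generating a PDF report from the combined text.
Optionally, the report generation unit is provided with the wkhtmltopdf tool.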
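Embodiment 3
An automated report interpretation method includes the following steps:
After the bioinformatics analysis result file is obtained, it is first imported into the intelligent analysis module, which computes a weighted average of the scores of the evidence data sources in the file and then sorts the loci by pathogenicity according to the result, where a score of 1.0 represents pathogenic and a score of 0.0 represents benign. A logistic regression algorithm is used to compute the weighted average over the evidence data sources in Table 1 above, and the loci are then ranked from highest to lowest pathogenicity.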
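Based on this result, the interpreter can make the final screening of pathogenic loci with reference to the industry gold standard. Every imported file and the interpreter's screening result are also fed into the continual learning of the module's model, to keep improving the accuracy of later scoring and ranking.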
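One way to read the combination of logistic regression and continual learning described here is sketched below, using only the standard library. It is an assumption-laden illustration: the source does not specify the feature encoding, the training procedure, or how screening results are fed back, so the two-feature vectors, the batch gradient descent in `fit`, and the `learn_from_screening` helper are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Score in (0, 1): near 1.0 reads as pathogenic, near 0.0 as benign."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit(samples, labels, epochs=2000, lr=0.5):
    """Plain batch gradient descent on the logistic loss."""
    n, m = len(samples[0]), len(samples)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = predict((weights, bias), x) - y
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical history: two normalized evidence scores per locus, with the
# interpreter's final decision as the label (1 = kept as pathogenic).
history_x = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
history_y = [1, 1, 0, 0]
model = fit(history_x, history_y)


def learn_from_screening(new_x, new_y):
    """Append newly screened loci to the history and refit the model."""
    history_x.extend(new_x)
    history_y.extend(new_y)
    return fit(history_x, history_y)
```

Refitting on the full history after each screening session is the simplest form of continual learning; an incremental update would be an alternative when the history grows large.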
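Once the screening result is confirmed, the data is imported into the report writing module. Imported with it are the patient phenotype data and the related and historical data that were fetched from multiple public databases and integrated in the local relational database, including but not limited to gene function, phenotype descriptions, and rating evidence. The local relational database is generated in advance and continuously updated. It loads data from the public databases through REST API interfaces and tab-separated (TSV) or comma-separated (CSV) files and links the data by gene-phenotype relationships to form a multi-dimensional data system. The database is built on the ER diagram shown in FIG. 2, in which 1:m denotes a one-to-many relationship and m:1 denotes a many-to-one relationship.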
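A gene-phenotype association of this kind might be sketched as below. FIG. 2 is not reproduced in the text, so the three-table schema, the column names, and the `load_tsv` and `phenotypes_for` helpers are assumptions, and SQLite stands in for the production database purely so that the example is self-contained. The rows are placeholders, not real database content.

```python
import csv
import io
import sqlite3

# Hypothetical schema: gene 1:m gene_phenotype m:1 phenotype.
SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT UNIQUE NOT NULL);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, term TEXT UNIQUE NOT NULL);
CREATE TABLE gene_phenotype (
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    phenotype_id INTEGER NOT NULL REFERENCES phenotype(phenotype_id),
    source TEXT NOT NULL,
    PRIMARY KEY (gene_id, phenotype_id, source)
);
"""


def load_tsv(conn, text, source):
    """Load tab-separated rows with 'gene' and 'phenotype' columns."""
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        conn.execute("INSERT OR IGNORE INTO gene(symbol) VALUES (?)", (row["gene"],))
        conn.execute("INSERT OR IGNORE INTO phenotype(term) VALUES (?)", (row["phenotype"],))
        conn.execute(
            "INSERT OR IGNORE INTO gene_phenotype "
            "SELECT g.gene_id, p.phenotype_id, ? FROM gene g, phenotype p "
            "WHERE g.symbol = ? AND p.term = ?",
            (source, row["gene"], row["phenotype"]),
        )


def phenotypes_for(conn, symbol):
    """One query returns every phenotype linked to a gene, across sources."""
    return conn.execute(
        "SELECT p.term, gp.source FROM gene g "
        "JOIN gene_phenotype gp ON gp.gene_id = g.gene_id "
        "JOIN phenotype p ON p.phenotype_id = gp.phenotype_id "
        "WHERE g.symbol = ? ORDER BY p.term, gp.source",
        (symbol,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows labeled with two of the source names used in the text.
load_tsv(conn, "gene\tphenotype\nGENE1\tseizure\nGENE1\tataxia\n", source="OMIM")
load_tsv(conn, "gene\tphenotype\nGENE1\tseizure\n", source="HGMD")
```

Keeping the source name on the link table lets a single query show which databases support each gene-phenotype pair, which is the kind of one-query, multi-source view the text contrasts with switching between separate web pages.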
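The interpreter combines the automatically obtained data above and enters the conclusive description text in the report writing module. The module then generates report data in JSON format (without styling) for the final composition of the report and saves it in the historical report database. JSON report data is easy to extend, so it stays compatible with reports from different templates as report content continues to be refined. Because the JSON reports are stored in a PostgreSQL relational database, PostgreSQL's ability to process JSON data makes later search and review of historical data convenient. The JSON reports are stored without any styling, which decouples page content from layout as far as possible and makes re-import easy when a report template is updated.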
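A style-free JSON report of the kind described might look like the sketch below. The field names, the `schema_version` key, and the sample values are hypothetical; the source states only that the report is JSON, carries no styling, and is stored in PostgreSQL.

```python
import json


def build_report(patient_id, phenotypes, loci, conclusion):
    """Assemble report content only; no fonts, colors, or layout hints."""
    return {
        "schema_version": 1,  # hypothetical: lets later templates detect old reports
        "patient_id": patient_id,
        "phenotypes": list(phenotypes),
        "loci": [dict(locus) for locus in loci],
        "conclusion": conclusion,
    }


def serialize(report):
    """Serialize with stable key order; keep non-ASCII text readable."""
    return json.dumps(report, ensure_ascii=False, sort_keys=True)


report = build_report(
    patient_id="P-0001",  # placeholder identifier
    phenotypes=["seizure"],
    loci=[{"gene": "GENE1", "score": 0.93, "rating": "likely pathogenic"}],
    conclusion="Placeholder conclusive description entered by the interpreter.",
)
payload = serialize(report)
```

In PostgreSQL, a payload like this could be stored in a `json` or `jsonb` column, which is what allows the structured search over historical reports that the text mentions.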
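The JSON report data is combined with styled HTML text designed in advance in the template editing module. The HTML templates are centrally controlled and generated in advance. Template styles are handled with CSS, and a modified template, once distributed, applies to multiple reports edited by multiple people.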
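The merge of style-free JSON content with a centrally managed template can be illustrated with the standard library alone. The template markup, the CSS rules, and the JSON field names here are hypothetical stand-ins for whatever the template editing module actually produces.

```python
import html
import json
from string import Template

# Hypothetical centrally managed template and stylesheet.
TEMPLATE = Template(
    "<html><head><style>$css</style></head><body>"
    "<h1>Report for $patient_id</h1>"
    "<table>$rows</table>"
    '<p class="conclusion">$conclusion</p>'
    "</body></html>"
)
CSS = "h1 { font-size: 18pt; } .conclusion { margin-top: 1em; }"


def render(report_json, template=TEMPLATE, css=CSS):
    """Fill the styled template with escaped content from the JSON report."""
    data = json.loads(report_json)
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(locus["gene"]), html.escape(locus["rating"])
        )
        for locus in data["loci"]
    )
    return template.substitute(
        css=css,
        patient_id=html.escape(data["patient_id"]),
        rows=rows,
        conclusion=html.escape(data["conclusion"]),
    )


page = render(json.dumps({
    "patient_id": "P-0001",
    "loci": [{"gene": "GENE1", "rating": "likely pathogenic"}],
    "conclusion": "Score a < b, see table.",
}))
```

Because the stylesheet is passed in separately from the content, replacing `CSS` or `TEMPLATE` restyles every stored report on the next render without touching the JSON.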
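The open-source wkhtmltopdf tool generates the final PDF report: the JSON content and the HTML template are merged and rendered as the final PDF. In this process, the report writer no longer needs to attend to the report's page layout.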
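Invoking wkhtmltopdf from a program might look like the following sketch. The wrapper functions and file names are hypothetical; the command line follows the tool's basic `wkhtmltopdf [options] <input> <output>` form, and the page size chosen here is an assumption, since the source does not say which options are used.

```python
import shutil
import subprocess
from pathlib import Path


def wkhtmltopdf_command(html_path, pdf_path, page_size="A4"):
    """Build the argument list without running anything."""
    return ["wkhtmltopdf", "--page-size", page_size, str(html_path), str(pdf_path)]


def html_to_pdf(html_path, pdf_path):
    """Convert a rendered HTML file to PDF; requires wkhtmltopdf on PATH."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
    return Path(pdf_path)


command = wkhtmltopdf_command("report.html", "report.pdf")
```

Passing the arguments as a list rather than a shell string avoids quoting problems when report file names contain spaces.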
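Embodiment 4
An embodiment of the present invention also provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
obtaining the evidence data sources of the bioinformatics analysis;
calculating scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A, and defining the value that represents a benign result as value B;
sorting the loci of the bioinformatics analysis in order according to their calculated scores;
screening pathogenic loci according to the industry gold standard;
matching the pathogenic locus data, the patient phenotype data, and the related data in the local relational database, and importing them into a template, wherein the related data includes gene function data, phenotype description data, and rating evidence;
adding a conclusive description to the template to obtain a complete report.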
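Embodiment 5
An embodiment of the present invention also provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
obtaining the evidence data sources of the bioinformatics analysis;
calculating scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A, and defining the value that represents a benign result as value B;
sorting the loci of the bioinformatics analysis in order according to their calculated scores;
screening pathogenic loci according to the industry gold standard;
matching the pathogenic locus data, the patient phenotype data, and the related data in the local relational database, and importing them into a template, wherein the related data includes gene function data, phenotype description data, and rating evidence;
adding a conclusive description to the template to obtain a complete report.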
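Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be executed in an order different from the one given here, or the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the principles of the present invention shall fall within its scope of protection.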

Claims (10)

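  1. An automated report interpretation method, characterized by comprising:
    obtaining the evidence data sources of a bioinformatics analysis;
    calculating scores for the evidence data sources, defining the value in the calculation result that represents pathogenicity as value A, and defining the value that represents a benign result as value B;
    sorting the loci of the bioinformatics analysis in order according to their calculated scores;
    screening pathogenic loci according to the industry gold standard;
    matching the pathogenic locus data, the patient phenotype data, and the related data in a local relational database, and importing them into a template, wherein the related data includes gene function data, phenotype description data, and rating evidence;
    adding a conclusive description to the template to obtain a complete report.
  2. The automated report interpretation method according to claim 1, characterized in that A in value A is a number; B in value B is a number; and the loci of the bioinformatics analysis are sorted by numerical magnitude according to their calculated scores.
  3. The automated report interpretation method according to claim 1, characterized by further comprising generating JSON-format report data from the complete report and storing the JSON-format report in a historical report database.
  4. The automated report interpretation method according to claim 1, characterized in that the local relational database includes an OMIM database, a CHPO database, an HGMD database, and a historical report database; within the local relational database, the OMIM database, CHPO database, HGMD database, and historical report database follow an ER diagram model, are linked by gene-phenotype relationships, and form a multi-dimensional data system.
  5. The automated report interpretation method according to claim 1, characterized by further comprising combining the JSON-format report data generated from the complete report with HTML text and generating a PDF report.
  6. The automated report interpretation method according to claim 1, characterized in that the scores of the evidence data sources are calculated as a weighted average.
  7. The automated report interpretation method according to claim 1, characterized in that the scores of the evidence data sources are calculated with a logistic regression algorithm.
  8. An automated report interpretation system, characterized by comprising:
    an intelligent analysis module for obtaining the evidence data source files of a bioinformatics analysis, computing a weighted average of the data in the result file, and sorting the loci in the calculation result by degree of pathogenicity;
    a report writing module for obtaining the calculation result of the intelligent analysis module, the patient phenotype data, and the data in a local relational database, and for entering the conclusive description text;
    a generation module that receives the data from the report writing module, combines it with the HTML text from a template editing module, and generates a PDF report.
  9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 7.
  10. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 7.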
PCT/CN2020/092902 2019-12-20 2020-05-28 Automated report interpretation method and system WO2021120528A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911328539.9A CN111161824A (zh) 2019-12-20 2019-12-20 Automated report interpretation method and system
CN201911328539.9 2019-12-20

Publications (1)

Publication Number Publication Date
WO2021120528A1 (zh) 2021-06-24

Family

ID=70557611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092902 WO2021120528A1 (zh) 2019-12-20 2020-05-28 Automated report interpretation method and system

Country Status (2)

Country Link
CN (1) CN111161824A (zh)
WO (1) WO2021120528A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161824A (zh) * 2019-12-20 2020-05-15 苏州赛美科基因科技有限公司 Automated report interpretation method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
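CN105512508A (zh) * 2014-09-22 2016-04-20 深圳华大基因研究院 Method and device for automatically generating genetic testing reports
CN109086571A (zh) * 2018-08-03 2018-12-25 国家卫生计生委科学技术研究所 Method and system for intelligent interpretation and reporting of genetic variants in monogenic diseases
CN109817299A (zh) * 2019-02-14 2019-05-28 北京安智因生物技术有限公司 Method and system for automated generation of disease-related genetic testing reports
CN110544508A (zh) * 2019-07-29 2019-12-06 北京荣之联科技股份有限公司 Method, device, and electronic equipment for analyzing genes of monogenic hereditary diseases
CN111161824A (zh) * 2019-12-20 2020-05-15 苏州赛美科基因科技有限公司 Automated report interpretation method and system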

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
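CN101183371B (zh) * 2007-12-12 2010-06-09 中兴通讯股份有限公司 Method and reporting system for rapidly completing large-scale data processing
CN108039193A (zh) * 2017-11-17 2018-05-15 哈尔滨工大服务机器人有限公司 Method and device for automatically generating physical examination reports
CN109686439B (zh) * 2018-12-04 2020-08-28 东莞博奥木华基因科技有限公司 Data analysis method, system, and storage medium for genetic disease gene testing
CN109754856B (zh) * 2018-12-07 2021-06-22 荣联科技集团股份有限公司 Method, device, and electronic equipment for automatically generating genetic testing reports
CN109859831A (zh) * 2018-12-19 2019-06-07 海南一龄医疗产业发展有限公司 Medical information management system
CN109739869B (zh) * 2018-12-29 2021-04-06 北京航天数据股份有限公司 Method and system for generating model run reports
CN110428127B (zh) * 2019-06-19 2022-04-15 深圳壹账通智能科技有限公司 Automated analysis method, user equipment, storage medium, and device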

Also Published As

Publication number Publication date
CN111161824A (zh) 2020-05-15

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20902415; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20902415; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/01/2023))
