TWI817104B - Annotation system for genetic test reports related to toxic chemical substances - Google Patents

Annotation system for genetic test reports related to toxic chemical substances

Info

Publication number
TWI817104B
Authority
TW
Taiwan
Prior art keywords
gene
toxic chemical
module
chemical substances
database
Prior art date
Application number
TW110113076A
Other languages
Chinese (zh)
Other versions
TW202240460A (en)
Inventor
林琥 沈
Original Assignee
國立臺灣師範大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立臺灣師範大學
Priority to TW110113076A
Publication of TW202240460A
Application granted
Publication of TWI817104B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

An annotation system for genetic test reports related to toxic chemical substances is disclosed. The system comprises a graphic text recognition module, a report content receiving module, a noun extraction module, a database module, a text definition module, a correlation determination module, and an annotation output module. By drawing on existing academic or experimental data, the system can present relationships between toxic chemical substances and genes, and related information, that the genetic test report itself does not list. The owner of the report can thus understand the relevant toxic chemical substances and genes more clearly, free of the confusion caused by professional terminology. The invention can also unify the genetic test reports issued by different testing organizations, giving readers a comprehensive understanding.

Description

Annotation system for genetic test reports related to toxic chemical substances

The present invention relates to an annotation system, in particular to an annotation system for genetic test reports related to toxic chemical substances.

Genetic testing analyzes chromosome structure, DNA sequences, DNA variant sites, or gene expression levels to give test subjects and medical personnel information for evaluating diseases, constitutions, or personal traits related to heredity. The testing organization usually closes a case by providing a genetic test report together with an explanation from testing staff. Depending on the purpose, traditional genetic testing falls into the following types: premarital genetic testing, fetal or newborn genetic testing, predictive genetic testing, and diagnostic genetic testing. Each of these has specific targets and has therefore formed its own market. As genetic testing technology matures and costs fall, the purposes and targets of genetic testing have expanded. For example, testing for toxic chemical substances in the body together with related genes is a market that has emerged in recent years. Because the function or mutation of certain genes is affected by excessive doses of specific toxic chemical substances, a testing institution can detect certain toxic chemical substances in the body, test the related gene sequences, and inform the subject of the health and disease issues that need attention.

Although testing for toxic chemical substances in the body and related genes is well intentioned and highly predictive, and helps improve public health and quality of life, its adoption has been poor. The analysis identifies three reasons. First, the public has an insufficient understanding of the effects of toxins and the diseases they may cause, and so overlooks the importance of the test. Second, test reports contain too many biochemical terms, so it is hard for the staff explaining a report to make the subject understand its specific content. Third, testing institutions follow different standards for the biochemical terms used in their reports (possibly influenced by the settings of the equipment used), so subjects cannot form an integrated understanding of reports from different institutions, and it is hard to cross-reference a test report with other medical reports (such as a physical examination report) as a basis for improving health.

In the prior art, some earlier applications touch on the aforementioned problems. For example, Mainland China Patent Application No. CNA-111564178A proposes a method, apparatus, device, and storage medium for generating a gene polymorphism analysis report, and Mainland China Patent Application No. CNA-111627509A discloses a method, apparatus, device, and storage medium for generating a viral gene test report. Although both relate to the generation of gene-related reports, neither normalizes the report content, and neither can output annotations to strengthen the explanation of the report.

Therefore, the present invention was developed and proposed to solve the problems faced in promoting existing testing for toxic chemical substances and related genes, so that the testing can be adopted more widely for the benefit of the public.

This paragraph extracts and compiles certain features of the present invention; other features are disclosed in subsequent paragraphs. It is intended to cover various modifications and similar arrangements within the spirit and scope of the appended claims.

To solve the aforementioned problems, the present invention discloses an annotation system for genetic test reports related to toxic chemical substances. The system is installed on a server host and comprises:

  • a report content receiving module, which receives a file of a genetic test report related to toxic chemical substances;
  • a noun extraction module, which removes the Chinese parts from the genetic test report file and extracts the remaining plurality of strings;
  • a database module, which includes a toxic chemical substance database and a gene database. The toxic chemical substance database relationally stores, for each of a plurality of toxic chemical substances, its Mesh ID in the Medical Subject Headings (MeSH) of the U.S. National Library of Medicine, its plurality of names and chemical formula, the names of the plurality of diseases it causes, and the Chinese translation of its Pharm Action (pharmacological action) in the MeSH Descriptor Data. The gene database relationally stores, for each of a plurality of genes, the gene name defined by the U.S. National Center for Biotechnology Information (NCBI), its plurality of aliases, its plurality of expressions and site codes, its Chinese annotation, and the names of the plurality of diseases caused by defects in the gene;
  • a text definition module, which compares the strings against the database module to find the Mesh ID or gene name corresponding to each string;
  • a correlation determination module, which pairs each Mesh ID found by the text definition module with each gene name found to form correlation comparison groups, and excludes every group in which fewer than 3 disease names are common to the Mesh ID and the gene name; and
  • an annotation output module, which, for each correlation comparison group retained by the correlation determination module, lists and outputs the Chinese translation of the Pharm Action corresponding to the Mesh ID, the gene name, and the disease names common to the Mesh ID and the gene name, as the annotation of the genetic test report file.

The annotation system may further comprise a graphic text recognition module which, after receiving a scanned image file of a paper genetic test report related to toxic chemical substances, recognizes and extracts the text in the scanned image to compose the genetic test report file.

Preferably, the strings may contain spaces.

According to the present invention, the names of a toxic chemical substance include its common English names, the name defined by the nomenclature rules of the International Union of Pure and Applied Chemistry (IUPAC), the CAS number, the ChEMBL ID defined by the European Molecular Biology Laboratory (EMBL), the ChEBI ID defined by the Chemical Entities of Biological Interest (ChEBI) database, and the PubChem CID defined by the PubChem database of small organic molecules and their biological activities.

According to the present invention, disease names are those defined in the 10th revision of the International Classification of Diseases (ICD-10) as published by the Ministry of Health and Welfare.

According to the present invention, the expressions of a gene include the HGNC ID defined by the HUGO Gene Nomenclature Committee, the Ensembl ID defined by the Ensembl project of the European Molecular Biology Laboratory, the OMIM ID defined by the Online Mendelian Inheritance in Man (OMIM) database, the UniProt ID defined by the UniProt Knowledgebase, the gene ID defined by the Entrez Gene database, and the location defined on Genome Reference Consortium Human Build 38 of the U.S. National Center for Biotechnology Information.

Because the system draws on existing academic or experimental data, it can present relationships between toxic chemical substances and genes, and related information, that the genetic test report does not list. The owner of the report can thus understand the relevant toxic chemical substances and genes more clearly, free of the confusion caused by professional terminology. The present invention can also unify the genetic test reports issued by different testing organizations, giving readers a comprehensive understanding.

The present invention is described in more detail with reference to the following embodiments.

Please refer to Figure 1, a schematic diagram of the architecture of an annotation system for genetic test reports related to toxic chemical substances (hereinafter "the system") according to an embodiment of the present invention. The system can be installed in a server host 10 and operates on the hardware architecture of the server host 10. That hardware architecture differs little from a general server architecture and may include a central processing unit, memory, storage devices (such as hard disks), input/output units, and so on. Although this hardware is not shown in Figure 1, it is an architecture that those skilled in the server field will understand. In addition, one important piece of hardware in the server host 10 is the network communication interface 110, the assembly of hardware and firmware (sometimes also including software running on the operating system) through which the server host 10 connects to external hardware, such as a computer 1, over a network 2. It may include hardware such as network cards, connection cables, and wireless communication modules.

The modules of the present invention described below are the technical elements of the system, which run by using or cooperating with the equipment of the existing server host 10 described above. They may therefore be software, comprising specific program code and data, that runs on at least part of the hardware architecture under the operating system (for example, the program code and related data files are stored in the storage device, loaded temporarily into memory by the operating system, and dynamically called and executed by the central processing unit). Alternatively, the modules may be purpose-built hardware, such as an application-specific integrated circuit (ASIC) or an add-in card, that performs the functions assigned to the modules. Furthermore, these technical elements may be partly software and partly hardware, integrated according to the needs of the product designer; all such arrangements fall within the technical scope claimed by this patent.

According to the present invention, the system comprises a graphic text recognition module 210, a report content receiving module 220, a noun extraction module 230, a database module 240, a text definition module 250, a correlation determination module 260, and an annotation output module 270. The architecture and functions of these modules, and how they operate together, are described in detail below.

The graphic text recognition module 210 receives a scanned image file of a paper genetic test report related to toxic chemical substances, then recognizes and extracts the text in the scanned image (including Chinese, foreign-language text, numbers, and symbols) to compose a genetic test report file. The graphic text recognition module 210 can receive the scanned image file from the computer 1 through the network 2 and the network communication interface 110. In other words, the user scans the genetic test report into an image file, the graphic text recognition module 210 converts it into a usable text file, and the text file is passed to the report content receiving module 220 for processing. If the genetic test report is already a data file, such as a text file, that contains the necessary information, the report content receiving module 220 can obtain it directly from the computer 1 through the network 2 and the network communication interface 110.

The noun extraction module 230 removes the Chinese parts from the genetic test report file and extracts the remaining plurality of strings. Its logic can be explained with Figure 2, which shows part of a genetic test report related to toxic chemical substances. The noun extraction module 230 identifies which parts of Figure 2 are Chinese and removes them; the remaining text forms a number of strings, such as "A288888888", "73/11/31", "110/1/20", "Melamine", "Di(2-ethylhexyl) phthalate", "Arsenic / Creatinine", and "M1", all marked with a gray background. Note that besides foreign-language text (for historical reasons, almost all foreign-language text in Taiwanese genetic test reports is English) and numbers, these strings may also contain spaces and symbols (such as the space and parentheses in Di(2-ethylhexyl) phthalate). The full-width colon ":" follows Chinese text and is a full-width glyph like the Chinese characters, so the noun extraction module 230 treats it as part of the Chinese text and does not extract it. The extracted strings are used by the other modules.

The database module 240 includes a toxic chemical substance database 241 and a gene database 242. The toxic chemical substance database 241 is a relational database: each stored datum is associated with one or more other data of the same or different attributes, which facilitates data processing. According to the present invention, the toxic chemical substance database 241 relationally stores, for each of a plurality of toxic chemical substances, its Mesh ID in the Medical Subject Headings (MeSH) of the U.S. National Library of Medicine, its plurality of names and chemical formula, the names of the plurality of diseases it causes, and the Chinese translation of its Pharm Action (pharmacological action) in the MeSH Descriptor Data.

For a better understanding, see Figure 3, which tabulates the fields of relational data in the toxic chemical substance database 241 for one toxic chemical substance (in this embodiment DEHP, di(2-ethylhexyl) phthalate, a plasticizer). Other toxic chemical substances have the same fields. In Figure 3, "D004051" in the top field is the Mesh ID of DEHP. The second field contains the names and chemical formula (C24H38O4) of DEHP. In principle, the toxic chemical substance database 241 should record every name of DEHP, including every name or number uniformly defined by academic or government organizations. In practice, for historical reasons, the genetic testing industry uses only a few standardized identifiers. The names of a toxic chemical substance therefore need only include its common English names (such as DEHP, BEHP, bis(2-ethylhexyl) phthalate, and Di(2-ethylhexyl)phthalate), the name defined by the nomenclature rules of the International Union of Pure and Applied Chemistry (bis(2-ethylhexyl) benzene-1,2-dicarboxylate), the CAS number of the Chemical Abstracts Service (117-81-7), the ChEMBL ID defined by the European Molecular Biology Laboratory (CHEMBL1242017), the ChEBI ID defined by the Chemical Entities of Biological Interest database (ChEBI11747), and the PubChem CID defined by the PubChem database (8343). Any two names, and the names and the chemical formula, are separated by a semicolon ";" to facilitate searching.

The third field lists the names of several diseases caused by DEHP, likewise separated by semicolons. Because one disease may have many names, both scientific and colloquial, disease names according to the present invention are those defined in the 10th revision of the International Classification of Diseases as published by the Ministry of Health and Welfare, so that they are precisely defined for the subsequent correlation calculation. The disease names related to DEHP can be obtained from multiple databases under the U.S. National Center for Biotechnology Information. Finally, the bottom field shows the Chinese translation of DEHP's Pharm Action in the MeSH Descriptor Data. Figure 4 shows a search result taken from the descriptor data for DEHP in the classification database of the U.S. National Center for Biotechnology Information. The Pharm Action field (marked by an ellipse) shows the term for the broadest group of related chemical substances to which DEHP belongs, which translates to "plasticizers". Listing "plasticizers" rather than "DEHP" does more to make people notice and understand the hazard.
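One way to picture a record of the toxic chemical substance database 241 is the Python sketch below. The class `ToxicChemicalRecord`, its field names, and the helper `split_field` are hypothetical; the DEHP identifiers come from the description, while the two disease names are placeholders because the text does not reproduce the contents of Figure 3.

```python
from dataclasses import dataclass


def split_field(raw: str) -> tuple[str, ...]:
    """Split a semicolon-separated field and drop empty entries."""
    return tuple(part.strip() for part in raw.split(";") if part.strip())


@dataclass(frozen=True)
class ToxicChemicalRecord:
    mesh_id: str                 # top field: MeSH identifier
    names: tuple[str, ...]       # second field: names, IDs, chemical formula
    diseases: frozenset[str]     # third field: ICD-10 disease names
    pharm_action_zh: str         # bottom field: Chinese Pharm Action name


DEHP = ToxicChemicalRecord(
    mesh_id="D004051",
    names=split_field(
        "DEHP; BEHP; bis(2-ethylhexyl) phthalate; Di(2-ethylhexyl)phthalate; "
        "117-81-7; CHEMBL1242017; C24H38O4"
    ),
    # Placeholder disease names, for illustration only.
    diseases=frozenset(split_field("disease X; disease Y")),
    pharm_action_zh="塑化劑",  # "plasticizers"
)
print(DEHP.names)
```

Storing the diseases as a set anticipates the later step in which disease names of a chemical and a gene are intersected.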

The gene database 242 is also a relational database. In this embodiment, the gene database 242 relationally stores, for each of a plurality of genes, the gene name defined by the U.S. National Center for Biotechnology Information (NCBI), its plurality of aliases, its plurality of expressions and site codes, its Chinese annotation, and the names of the plurality of diseases caused by defects in the gene. For a better understanding, see Figure 5, which tabulates the fields of relational data in the gene database 242 for one gene (BRCA1 in this embodiment). In Figure 5, the top field shows the gene name defined by the NCBI. The second field contains the other aliases of BRCA1, such as BRCC1, BROVCA1, and FANCS.

According to the present invention, the expressions of BRCA1 listed in the third field should include at least the following gene annotation names or symbols used by the genetic testing industry: the HGNC ID defined by the HUGO Gene Nomenclature Committee (such as HGNC:1100), the Ensembl ID defined by the Ensembl project of the European Molecular Biology Laboratory (such as ENSG0000001204), the OMIM ID defined by the Online Mendelian Inheritance in Man (OMIM) database (such as OMIM113705), the UniProt ID defined by the UniProt Knowledgebase (such as P38398), the gene ID defined by the Entrez Gene database (such as EG672), and the location defined on Genome Reference Consortium Human Build 38 of the NCBI (such as chr17:43,044,295-43,170,245). These are the ways genetic testing organizations (and their equipment) currently designate a specific gene. In addition, site codes such as MTC11, QW34TR3, and PY3466 are probe numbers that gene probes assign to different genes or gene groups, and they are also used to designate a specific gene. Together, the expressions and site codes form the full set of names equivalent to BRCA1, any of which can be used to refer to the gene. As before, the names and numbers in the second and third fields are separated by semicolons in the data structure.

The fourth field records in detail the Chinese annotations of BRCA1 gathered from databases and the literature, giving users of the system a deeper understanding of BRCA1. Finally, the bottom field shows the names of several diseases caused by defects in BRCA1. As in the toxic chemical substance database 241, the disease names in the gene database 242 should be those defined in the 10th revision of the International Classification of Diseases as published by the Ministry of Health and Welfare. Because the two databases define disease names consistently, their disease names can be compared directly to determine correlation.
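Because every alias, expression, and site code points to the same gene, the semicolon-separated fields lend themselves to a flat lookup index. The sketch below is one possible arrangement, not the system's actual storage layout: `normalize` and `build_gene_index` are hypothetical names, case-insensitive matching is an assumption, and only a few of the BRCA1 terms from the description are included.

```python
def normalize(term: str) -> str:
    """Assumption: matching ignores case and surrounding whitespace."""
    return term.strip().casefold()


def build_gene_index(rows: dict[str, str]) -> dict[str, str]:
    """Map every alias, expression, or site code to its NCBI gene name.

    rows maps a canonical gene name to one semicolon-separated string that
    holds all equivalent names for that gene.
    """
    index: dict[str, str] = {}
    for gene_name, raw_terms in rows.items():
        index[normalize(gene_name)] = gene_name
        for term in raw_terms.split(";"):
            if term.strip():
                index[normalize(term)] = gene_name
    return index


GENE_INDEX = build_gene_index(
    {"BRCA1": "BRCC1; BROVCA1; FANCS; HGNC:1100; P38398; MTC11"}
)
print(GENE_INDEX[normalize("FANCS")])
```

A flat dictionary makes each lookup a single hash probe, which suits reports that contain many candidate strings.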

The text definition module 250 compares the strings extracted by the noun extraction module 230 against the database module 240 to find the Mesh ID or gene name corresponding to each string. Take Figure 2 again as an example. When the text definition module 250 searches for "A288888888", which is a national ID number, no identical string exists in either database, so the string is discarded. Likewise, "73/11/31" is a date; it has no corresponding entry and is not used. By contrast, "Di(2-ethylhexyl) phthalate" is recorded in the toxic chemical substance database 241, so the text definition module 250 finds the corresponding Mesh ID, D004051. Gene-related strings are resolved in the same way to the gene name defined by the U.S. National Center for Biotechnology Information in the gene database 242; the details are not repeated here.
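The lookup described above amounts to classifying each extracted string as a chemical, a gene, or neither. The following Python sketch assumes two small in-memory indexes (`CHEMICAL_INDEX` and `GENE_INDEX`, both hypothetical and keyed by lower-cased terms) in place of the real databases 241 and 242; `define_strings` is likewise an illustrative name.

```python
# Hypothetical miniature indexes standing in for databases 241 and 242.
CHEMICAL_INDEX = {
    "di(2-ethylhexyl) phthalate": "D004051",
    "dehp": "D004051",
}
GENE_INDEX = {
    "brca1": "BRCA1",
    "fancs": "BRCA1",
}


def define_strings(strings: list[str]) -> tuple[list[str], list[str]]:
    """Return (Mesh IDs, gene names) found among the strings.

    Strings that match neither index, such as ID numbers and dates, are
    discarded. Duplicates are collapsed while first-seen order is kept.
    """
    mesh_ids: list[str] = []
    gene_names: list[str] = []
    for raw in strings:
        key = raw.strip().casefold()
        if key in CHEMICAL_INDEX:
            if CHEMICAL_INDEX[key] not in mesh_ids:
                mesh_ids.append(CHEMICAL_INDEX[key])
        elif key in GENE_INDEX:
            if GENE_INDEX[key] not in gene_names:
                gene_names.append(GENE_INDEX[key])
    return mesh_ids, gene_names


found = define_strings(
    ["A288888888", "73/11/31", "Di(2-ethylhexyl) phthalate", "DEHP", "FANCS"]
)
print(found)
```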

This system is developed to annotate existing genetic test reports related to toxic chemical substances, using data mining techniques, so that the report owner can understand the content of the report more completely and effectively. A toxic chemical substance and a gene must therefore be backed by sufficient evidence of a relationship before the system outputs annotation data for them. To that end, each toxic chemical substance found by the text definition module 250 is checked against every gene found, and each gene against every toxic chemical substance, to determine whether the two are "related". The correlation determination module 260 performs this task. According to the present invention, the correlation determination module 260 pairs each Mesh ID found by the text definition module 250 with each gene name found to form a correlation comparison group, and excludes every correlation comparison group in which fewer than 3 of the disease names corresponding to the Mesh ID and to the gene name are the same.

For a better understanding, see Figure 6, which shows two toxic chemical substances and two genes together with the disease names associated with each. In Figure 6, toxic chemical substance A, toxic chemical substance B, gene 1 and gene 2 form four correlation comparison groups: (toxic chemical substance A, gene 1), (toxic chemical substance A, gene 2), (toxic chemical substance B, gene 1) and (toxic chemical substance B, gene 2). In the (toxic chemical substance A, gene 1) group, the two share 10 disease names: disease 1, disease 2, disease 4, disease 13, disease 14, disease 15, disease 16, disease 17, disease 18 and disease 19, marked with a gray background. In the (toxic chemical substance A, gene 2) group, the two share no disease names. In the (toxic chemical substance B, gene 1) group, the two share only one disease name, disease 16. In the (toxic chemical substance B, gene 2) group, the two share 5 disease names: disease 112, disease 113, disease 114, disease 115 and disease 116, marked with underlining. The (toxic chemical substance A, gene 2) and (toxic chemical substance B, gene 1) groups are therefore excluded, and the toxic chemical substances and genes in the two remaining groups are shown by existing data (journals, papers or databases) to be related, for example because the two interact to produce a specific disease. A genetic test report may state only that toxic chemical substance A is related to gene 1 and list the relevant information, while overlooking that toxic chemical substance B is also related to gene 2. In that case the system can supply the information linking toxic chemical substance B and gene 2, so that the owner of the genetic test report can broaden the relevant knowledge and understand how toxic chemical substance B and gene 2 affect him or her. Even information not listed in the genetic test report at all, for example where the report does not state that toxic chemical substance A and gene 1 both lead to disease 19, can be supplied by the system. The annotation output module 270 is the technical component that outputs the data found by the correlation determination module 260.
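The pairing and exclusion rule described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `retained_pairs`, the dictionary layout, and the extra diseases 3, 20, 111 and 117 are assumptions added so that the shared counts match the Figure 6 discussion (10, 0, 1 and 5).

```python
from itertools import product

MIN_SHARED_DISEASES = 3  # threshold stated in the description

def retained_pairs(chem_diseases, gene_diseases, min_shared=MIN_SHARED_DISEASES):
    """Pair every Mesh ID with every gene name and keep only the pairs
    that share at least `min_shared` disease names.

    Returns {(mesh_id, gene_name): sorted list of shared diseases}.
    """
    kept = {}
    for (mesh_id, c_dis), (gene, g_dis) in product(
        chem_diseases.items(), gene_diseases.items()
    ):
        shared = set(c_dis) & set(g_dis)
        if len(shared) >= min_shared:  # fewer than 3 shared names: excluded
            kept[(mesh_id, gene)] = sorted(shared)
    return kept

# Hypothetical data modeled on Figure 6; diseases are written as numbers.
# Diseases 3, 20, 111 and 117 are invented non-shared entries.
chem_diseases = {
    "chemical A": {1, 2, 3, 4, 13, 14, 15, 16, 17, 18, 19},
    "chemical B": {16, 111, 112, 113, 114, 115, 116},
}
gene_diseases = {
    "gene 1": {1, 2, 4, 13, 14, 15, 16, 17, 18, 19, 20},
    "gene 2": {112, 113, 114, 115, 116, 117},
}

kept = retained_pairs(chem_diseases, gene_diseases)
for pair, shared in kept.items():
    print(pair, shared)
```

With this data only the (chemical A, gene 1) and (chemical B, gene 2) pairs survive, which matches the two groups the description retains.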

For each correlation comparison group retained by the correlation determination module 260, the annotation output module 270 outputs the Chinese translation of the Pharm Action corresponding to the group's Mesh ID, the gene name, and the disease names shared by the Mesh ID and the gene name, as the annotation of the genetic test report file related to toxic chemical substances. The annotation may take the form of text, charts, graphics, or even an animated presentation, and its file format is not limited. Taking the data in Figure 6 as an example, the annotation for toxic chemical substance A and gene 1 can be "[Chinese translation for toxic chemical substance A]  [name of gene 1]  disease 1, disease 2, disease 4, disease 13, disease 14, disease 15, disease 16, disease 17, disease 18 and disease 19  annotation…". Of course, the annotation output module 270 can also add conversational wording to turn the annotation into readable prose, such as "[Chinese translation for toxic chemical substance A] is related to [name of gene 1]; both are highly associated with disease 1, disease 2, disease 4, disease 13, disease 14, disease 15, disease 16, disease 17, disease 18 and disease 19, which means that, according to the data in the report, you may be at risk of these diseases in the future. In addition, annotation…".
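The two output styles described above, a plain listing and a conversational sentence, might be produced along the following lines. This is a minimal sketch under assumptions: the function names, the tab separator and the exact wording are hypothetical and are not taken from the patent.

```python
def plain_annotation(pharm_action_name, gene_name, diseases):
    """Plain listing: translated Pharm Action, gene name, shared diseases."""
    return f"{pharm_action_name}\t{gene_name}\t{', '.join(diseases)}"

def narrative_annotation(pharm_action_name, gene_name, diseases):
    """Conversational form of the same three pieces of information."""
    return (
        f"{pharm_action_name} is related to {gene_name}; both are highly "
        f"associated with {', '.join(diseases)}."
    )

# Hypothetical inputs standing in for one retained correlation comparison group.
shared = ["disease 112", "disease 113", "disease 114"]
print(plain_annotation("Pharm Action of chemical B", "gene 2", shared))
print(narrative_annotation("Pharm Action of chemical B", "gene 2", shared))
```

Keeping the formatting separate from the correlation step means the same retained groups could also feed a chart or another presentation format, as the description allows.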

Note that if a subject has undergone genetic testing related to toxic chemical substances at different genetic testing units and obtained a separate report from each, he or she can input all of these reports into the system at the same time. The system then reconciles, in a single pass, the different proper nouns and identifiers used across the reports, so that the subject can fully understand the results of the genetic tests.
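One way to picture this unification is an alias table that maps every name or identifier a testing unit might use onto a single canonical key. The sketch below is an assumption-laden illustration: the alias strings, the canonical keys `MESH-A` and `GENE1`, and the functions `normalize` and `merge_reports` are all hypothetical placeholders, not real database entries or the system's actual interface.

```python
# Hypothetical alias table: several spellings map to one canonical key.
ALIASES = {
    "chem-a": "MESH-A",      # common name (placeholder)
    "000-00-0": "MESH-A",    # CAS-style number (placeholder)
    "gene1": "GENE1",        # official gene symbol (placeholder)
    "g1-alias": "GENE1",     # gene alias (placeholder)
}

def normalize(term):
    """Return the canonical key for a term, or None if it is unknown."""
    return ALIASES.get(term.strip().lower())

def merge_reports(*reports):
    """Collect the canonical keys mentioned in any of the given reports."""
    merged = set()
    for terms in reports:
        for term in terms:
            canonical = normalize(term)
            if canonical is not None:
                merged.add(canonical)
    return merged

report_from_unit_1 = ["Chem-A", "GENE1"]
report_from_unit_2 = ["000-00-0", "G1-alias", "unlisted term"]
print(sorted(merge_reports(report_from_unit_1, report_from_unit_2)))
```

Both reports resolve to the same two canonical keys even though they share no literal string, which is the effect the paragraph above describes.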

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make minor changes and refinements without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

1: Computer
2: Network
10: Server host
110: Network communication interface
210: Graphic text recognition module
220: Report content receiving module
230: Noun extraction module
240: Database module
241: Toxic chemical substance database
242: Gene database
250: Text definition module
260: Correlation determination module
270: Annotation output module

Figure 1 is a schematic diagram of the architecture of an annotation system for genetic test reports related to toxic chemical substances according to an embodiment of the present invention. Figure 2 shows part of a genetic test report related to toxic chemical substances. Figure 3 lists several fields of associated data related to DEHP in the toxic chemical substance database. Figure 4 shows search results taken from the descriptor data for DEHP in the classification database of the U.S. National Center for Biotechnology Information. Figure 5 lists several fields of associated data related to BRCA1 in the gene database. Figure 6 shows two toxic chemical substances and two genes together with the disease names associated with each.

Claims (5)

1. An annotation system for genetic test reports related to toxic chemical substances, installed on a server host, comprising: a report content receiving module, which receives a file of a genetic test report related to toxic chemical substances; a noun extraction module, which removes the Chinese portion from the genetic test report file related to toxic chemical substances and extracts the remaining plurality of strings; a database module, including: a toxic chemical substance database, which stores in association, for a plurality of toxic chemical substances, a Mesh ID in the Medical Subject Headings (MeSH) of the U.S. National Library of Medicine, a plurality of names and chemical expressions of each toxic chemical substance, a plurality of names of diseases caused by each toxic chemical substance, and a Chinese translation of the Pharm Action of each toxic chemical substance in the MeSH Descriptor Data; and a gene database, which stores in association, for a plurality of genes, the gene name defined by the U.S. National Center for Biotechnology Information, a plurality of aliases of each gene, a plurality of expressions and locus codes of each gene, a Chinese annotation of each gene, and a plurality of names of diseases caused by defects in each gene; a text definition module, which compares the strings against the database module to find the Mesh ID or gene name corresponding to each string; a correlation determination module, which pairs each Mesh ID found by the text definition module with each gene name found to form a correlation comparison group, and excludes each correlation comparison group in which fewer than 3 of the disease names corresponding to the Mesh ID and to the gene name are the same; and an annotation output module, which, for each correlation comparison group retained by the correlation determination module, outputs the Chinese translation of the Pharm Action corresponding to the Mesh ID of the group, the gene name, and the disease names shared by the Mesh ID and the gene name in the group, as the annotation of the genetic test report file related to toxic chemical substances.

2. The annotation system for genetic test reports related to toxic chemical substances as described in claim 1, further comprising a graphic text recognition module which, after receiving a scanned image file of a paper genetic test report related to toxic chemical substances, recognizes and extracts the text portion of the scanned image file to compose the genetic test report file related to toxic chemical substances.

3. The annotation system for genetic test reports related to toxic chemical substances as described in claim 1, wherein the strings contain spaces.

4. The annotation system for genetic test reports related to toxic chemical substances as described in claim 1, wherein the names of a toxic chemical substance include the common English name, the name defined by the nomenclature rules of the International Union of Pure and Applied Chemistry, the CAS number, the ChEMBL ID defined by the European Molecular Biology Laboratory, the ChEBI ID defined by the Chemical Entities of Biological Interest database, and the PubChem CID defined by the PubChem database of biological activities of small organic molecules.

5. The annotation system for genetic test reports related to toxic chemical substances as described in claim 1, wherein the disease names are defined by the International Classification of Diseases, 10th Revision, as published by the Ministry of Health and Welfare, and the expressions of a gene include the HGNC ID defined by the HUGO Gene Nomenclature Committee, the Ensembl ID defined by the Ensembl project of the European Molecular Biology Laboratory, the OMIM ID defined by the Online Mendelian Inheritance in Man (OMIM) database, the UniProt ID defined by the UniProt Knowledgebase, the gene number defined by the Entrez Gene database, and the location description defined by the Genome Reference Consortium Human Build 38 of the U.S. National Center for Biotechnology Information.
TW110113076A 2021-04-12 2021-04-12 Annotation system for genetic test reports related to toxic chemical substances TWI817104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110113076A TWI817104B (en) 2021-04-12 2021-04-12 Annotation system for genetic test reports related to toxic chemical substances

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110113076A TWI817104B (en) 2021-04-12 2021-04-12 Annotation system for genetic test reports related to toxic chemical substances

Publications (2)

Publication Number Publication Date
TW202240460A TW202240460A (en) 2022-10-16
TWI817104B true TWI817104B (en) 2023-10-01

Family

ID=85460460

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110113076A TWI817104B (en) 2021-04-12 2021-04-12 Annotation system for genetic test reports related to toxic chemical substances

Country Status (1)

Country Link
TW (1) TWI817104B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160068906A1 (en) * 2014-09-08 2016-03-10 Baby Genes, Inc. Method of screening newborns for gene variants
CN106227992A (en) * 2016-07-13 2016-12-14 为朔医学数据科技(北京)有限公司 A kind of recommendation method and system of therapeutic scheme
CN108920453A (en) * 2018-06-08 2018-11-30 医渡云(北京)技术有限公司 Data processing method, device, electronic equipment and computer-readable medium
CN111564178A (en) * 2020-04-15 2020-08-21 圣湘生物科技股份有限公司 Method, apparatus, device and storage medium for generating gene polymorphism analysis report
CN111627509A (en) * 2020-05-07 2020-09-04 圣湘生物科技股份有限公司 Method, device, equipment and storage medium for generating virus gene detection report

Also Published As

Publication number Publication date
TW202240460A (en) 2022-10-16
