CN110085290A - Method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration - Google Patents

Info

Publication number
CN110085290A
Authority
CN
China
Prior art keywords
text
molybdenum target
semantic
semantic tree
breast cancer
Legal status
Pending
Application number
CN201910256713.7A
Other languages
Chinese (zh)
Inventor
李继云
孙莉
黄鹏
顾莹莹
李凯华
乐嘉锦
Current Assignee
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date
2019-04-01
Filing date
2019-04-01
Publication date
2019-08-02
Application filed by Donghua University
Priority to CN201910256713.7A
Publication of CN110085290A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention relates to a method for establishing a semantic tree model of breast molybdenum-target (mammography) reports that supports heterogeneous information integration. The method comprises the following steps: forming a text normalization database for the textual descriptions of breast cancer mammographic findings; acquiring the textual description of breast cancer mammographic findings in real time and, based on the text normalization database, dividing it into phrases according to semantic information; obtaining the semantic constraints of each entity; and forming the semantic tree of the textual description. By constructing mammography semantic trees, the invention structures the complex textual information of breast cancer mammography reports written at different hospitals by different doctors, achieving semantics-based integration of heterogeneous information.

Description

Method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration
Technical field
The present invention relates to a method for establishing a semantic tree model of breast molybdenum-target (mammography) reports that supports heterogeneous information integration, and belongs to the field of medical text structuring.
Background art
With the rapid development of medical informatization, about 80% of hospitals have now completed their information-system construction, and electronic medical records have replaced paper records. However, a patient's diagnostic report is still an unstructured natural-language description of the region of interest, written from the doctor's knowledge and working experience, and such natural language cannot be directly recognized and processed by a computer.
Structured processing of text is key to the development of artificial intelligence in the medical field. Systems and resources developed abroad, such as MedLEE (Medical Language Extraction and Encoding System) and UMLS (Unified Medical Language System), are well developed, but because Chinese and English differ greatly in semantics and syntactic structure, they port poorly to Chinese medical text. Domestic research on structuring medical text started late; by drawing on existing foreign techniques it has made many breakthroughs, but research on structuring breast molybdenum-target (mammography) diagnostic imaging report text remains scarce.
Summary of the invention
The object of the present invention is to provide a method for the structured processing of breast molybdenum-target (mammography) diagnostic imaging report text.
To achieve the above object, the technical solution of the present invention is to provide a method for establishing a semantic tree model of breast molybdenum-target (mammography) reports that supports heterogeneous information integration, characterized by comprising the following steps:
Step 1: form, according to expert rules, a text normalization database for the textual descriptions of breast cancer mammographic findings. The database stores phrases related to these descriptions that conform to current, established medical terminology;
Step 2: acquire the textual description of breast cancer mammographic findings in real time; based on the text normalization database, divide the text into phrases according to semantic information and remove unneeded redundant information; extract the descriptions relevant to breast cancer diagnosis; and divide the range of each entity, where, using the classification results of the lesions, each lesion is treated as one entity;
Step 3: obtain the semantic constraints of each entity;
Step 4: form the semantic tree of the textual description obtained in step 2. The root node of the semantic tree is the entity, the internal nodes are the attributes of the entity, and the leaf nodes are the descriptions corresponding to each attribute.
Preferably, the method further comprises step 5: visualizing the semantic tree obtained in the previous step.
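Steps 1 to 4 fix the shape of the result: the entity is the root, its attributes are internal nodes, and the attribute descriptions are leaves. The Python sketch below models that shape; the class `SemanticTree`, its methods, and the sample lesion are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticTree:
    """One semantic tree per entity (lesion): the root is the entity, each
    internal node is an attribute, and each leaf is a description of it."""
    entity: str
    # attribute name -> leaf descriptions; a list because one attribute
    # may carry several descriptions
    attributes: dict = field(default_factory=dict)

    def add(self, attribute: str, description: str) -> None:
        """Attach a leaf description under the named attribute node."""
        self.attributes.setdefault(attribute, []).append(description)

    def to_dict(self) -> dict:
        """Nested view {entity: {attribute: [descriptions]}}."""
        return {self.entity: {a: list(d) for a, d in self.attributes.items()}}


tree = SemanticTree("mass")  # hypothetical sample lesion
tree.add("shape", "oval")
tree.add("margin", "clear")
tree.add("margin", "smooth")
print(tree.to_dict())
# {'mass': {'shape': ['oval'], 'margin': ['clear', 'smooth']}}
```

A list per attribute is used because a single attribute of one entity can carry several descriptions.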
By constructing mammography semantic trees, the present invention structures the complex textual information of breast cancer mammography reports from different hospitals and different doctors, achieving semantics-based integration of heterogeneous information.
Description of the drawings
Fig. 1 is a flowchart of semantic tree construction for Chinese mammographic findings text. The main process is as follows: input the mammography report text to be processed; segment the text into words; identify the main nodes of the semantic tree from text features and use their semantic constraints to find their leaf nodes; and attach the leaf nodes to the nodes of the semantic tree in input order, completing the scan that builds the semantic tree.
Fig. 2 is a word-segmentation sample of Chinese mammographic findings text: the segmentation result of one clause selected from the textual description of a breast cancer mammogram. The segmentation result shows that, ignoring omitted elements, the syntactic structure of a clause in Chinese mammographic findings text can be summarized as location + subject + predicate + descriptions of different attributes. This structure makes it possible to quickly find the category of each word.
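The clause pattern location + subject + predicate + attribute descriptions can be illustrated by a small sketch that splits an already-segmented, already-labelled clause around its subject. The function `split_clause`, the category labels, and the English-glossed tokens are assumptions made for illustration; real input would be segmented Chinese text.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    location: list
    subject: str
    predicate: list
    descriptions: list


def split_clause(tokens, categories):
    """Split one segmented clause using the pattern
    location + subject + predicate + attribute descriptions.
    `categories` holds one label per token; assumes the clause contains
    exactly one 'entity' token, which acts as the subject."""
    i = categories.index("entity")  # the subject anchors the clause
    tail = list(zip(tokens[i + 1:], categories[i + 1:]))
    return Clause(
        location=tokens[:i],  # everything before the subject
        subject=tokens[i],
        predicate=[t for t, c in tail if c == "predicate"],
        descriptions=[t for t, c in tail if c != "predicate"],
    )


# English glosses of a hypothetical, already-segmented Chinese clause
tokens = ["left breast", "upper outer quadrant", "mass", "presents as",
          "oval", "margin", "clear"]
categories = ["location", "location", "entity", "predicate",
              "value", "attribute", "value"]
print(split_clause(tokens, categories))
```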
Fig. 3 illustrates the construction of semantic constraints for the semantic tree of Chinese mammographic findings: on the basis of word segmentation, the descriptions associated with each entity are grouped according to their features. Each key word is assigned a category, mainly using the word's part-of-speech features and the words stored in the database built from expert rules; unneeded redundant words are discarded.
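A minimal sketch of the categorization in Fig. 3, assuming a hypothetical expert-rule lexicon that takes priority and a part-of-speech fallback using ICTCLAS-style tags (`n`, `v`, `a`, `m`, `d`). The lexicon entries, the fallback table, and the redundant-word list are invented placeholders; the patent does not publish its rules.

```python
# Hypothetical expert-rule lexicon: word -> category (English glosses)
LEXICON = {
    "mass": "entity", "calcification": "entity",
    "presents as": "predicate",
    "shape": "attribute", "margin": "attribute",
    "oval": "value", "clear": "value",
    "one": "quantifier", "clustered": "distribution",
}
# Hypothetical part-of-speech fallback for words missing from the lexicon
POS_FALLBACK = {"v": "predicate", "a": "value", "m": "quantifier"}
# Hypothetical redundant words that are always discarded
REDUNDANT = {"also", "visible", "see"}


def classify(tagged):
    """Assign a category to each (word, pos) pair from a segmenter.
    The lexicon wins; otherwise fall back on the POS tag; words that are
    redundant or cannot be classified are dropped."""
    result = []
    for word, pos in tagged:
        if word in REDUNDANT:
            continue
        category = LEXICON.get(word) or POS_FALLBACK.get(pos)
        if category is not None:
            result.append((word, category))
    return result


tagged = [("one", "m"), ("mass", "n"), ("also", "d"), ("presents as", "v"),
          ("oval", "a"), ("patient", "n"), ("margin", "n"), ("clear", "a")]
print(classify(tagged))
```

Unknown nouns such as "patient" are dropped here rather than guessed, which keeps the lexicon as the single source of truth for entities and attributes.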
Fig. 4 illustrates semantic tree construction for Chinese mammographic findings: the semantic subtree built from one clause of the textual description of a breast cancer mammogram. Using hierarchical nesting, entities are nested level by level into the levels that contain them; attributes keep being added to the current entity until the next entity is found, and attribute values that are absent are ignored. Every semantic tree obtained in this way is a nested structure.
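The nesting behaviour of Fig. 4 (keep attaching attributes to the current entity until the next entity appears, skip attributes without values) can be sketched as a single left-to-right scan over categorized words. This is one possible reading, not the patent's algorithm: treating the location words that precede an entity as its containing `path` is an assumption, and `build_trees` is a hypothetical name.

```python
def build_trees(classified):
    """Build nested semantic trees from (word, category) pairs in input order.
    A new entity closes the previous one; attributes keep attaching to the
    current entity until then; attributes that never receive a value are
    omitted. Location words seen before an entity become its nesting path."""
    trees, location, current, attribute = [], [], None, None
    for word, category in classified:
        if category == "location":
            location.append(word)
        elif category == "entity":
            current = {"path": location, "entity": word, "attributes": {}}
            trees.append(current)
            location, attribute = [], None
        elif current is None:
            continue  # nothing to attach to yet
        elif category == "attribute":
            attribute = word  # open an attribute node, no leaf yet
        elif category == "value":
            key = attribute or "description"
            current["attributes"].setdefault(key, []).append(word)
        elif category in ("quantifier", "distribution"):
            current["attributes"].setdefault(category, []).append(word)
        # predicates carry no leaf content and are skipped
    return trees


classified = [
    ("left breast", "location"), ("upper outer quadrant", "location"),
    ("mass", "entity"), ("shape", "attribute"), ("oval", "value"),
    ("margin", "attribute"),  # no value follows, so it is ignored
    ("density", "attribute"), ("high", "value"),
    ("right breast", "location"), ("calcification", "entity"),
    ("clustered", "distribution"),
]
for tree in build_trees(classified):
    print(tree)
```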
Specific embodiment
To make the present invention clearer and easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings.
The method provided by the invention for establishing a semantic tree model of breast molybdenum-target (mammography) reports that supports heterogeneous information integration builds semantic trees from the textual descriptions of mammograms as expressed in real-world medical settings. Building the semantic tree of Chinese mammographic findings mainly comprises the following steps: forming, according to expert rules, a database of textual descriptions of breast cancer mammographic findings; using prior knowledge and lesion features as the basis for dividing segments, segmenting the textual description of breast cancer mammographic findings into words; using these features, deriving the semantic constraints of the entities in the textual description; and, according to the semantic constraints, connecting related nodes of the semantic tree to form a complete semantic tree of the breast cancer mammogram.
Text normalization database for breast cancer mammographic findings
First, a database of descriptions of breast cancer mammographic findings is built according to expert rules. Because China has not yet established a unified specification for describing breast cancer medical images, the descriptions of mammographic findings written by specialists in different radiology departments must be analyzed. By investigating how different hospitals structure mammographic findings text and analyzing its content, phrases that conform to current, established medical terminology are obtained. In this way, the textual descriptions of mammographic findings are standardized into a unified standard description of breast cancer signs.
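Normalizing heterogeneous wording to standard phrases can be sketched as a longest-match substitution against a normalization table. The table below is a hypothetical stand-in for the expert-built database; its English entries only illustrate the mechanism.

```python
import re

# Hypothetical normalization table: variant wording -> standard phrase
NORMALIZATION = {
    "ill-defined margin": "indistinct margin",
    "unclear margin": "indistinct margin",
    "well-defined margin": "circumscribed margin",
    "lump": "mass",
}


def normalize(text, table=NORMALIZATION):
    """Replace every variant phrase in `text` with its standard form.

    Longer variants are tried first so that a short entry cannot pre-empt
    a longer one starting at the same position. No word boundaries are
    used, because segmented Chinese text has no spaces between words."""
    variants = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in variants))
    return pattern.sub(lambda match: table[match.group(0)], text)


print(normalize("lump with ill-defined margin"))  # mass with indistinct margin
```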
Word segmentation of breast cancer mammographic findings text
The input textual description of a mammogram is divided into phrases according to semantic information, unneeded redundant information is removed, and the important descriptions relevant to breast cancer diagnosis are extracted. A thesaurus is built for words that are close in meaning and belong to the same class of description; this ensures that near-synonyms are recognized and enhances the extensibility of the semantic tree. Observation and summary of imaging texts lead to the following conclusion: the text is centered on entities, the words before an entity describe its location, and the words after it describe its attribute features, which allows the range of each entity to be divided more efficiently. Because medical imaging descriptions use many nouns, few verbs, and often omit the subject, particular attention must be paid to distinguishing nouns and nominal phrases. According to the segmentation classification, words fall into the following 6 classes.
Category      Class number
Entity        1
Predicate     2
Attribute     3
Value         4
Quantifier    5
Distribution  6
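The six classes in the table map naturally onto an integer enumeration, and the near-synonym thesaurus described above can be inverted into a lookup from any surface form to its canonical word and class. `Category`, `THESAURUS`, and `lookup` are hypothetical names, and the thesaurus entries are illustrative only.

```python
from enum import IntEnum


class Category(IntEnum):
    """Word classes and their class numbers, as in the table above."""
    ENTITY = 1
    PREDICATE = 2
    ATTRIBUTE = 3
    VALUE = 4
    QUANTIFIER = 5
    DISTRIBUTION = 6


# Hypothetical thesaurus: canonical word -> (class, near-synonyms)
THESAURUS = {
    "mass": (Category.ENTITY, {"lump", "mass shadow"}),
    "calcification": (Category.ENTITY, {"calcified focus"}),
    "margin": (Category.ATTRIBUTE, {"edge", "border"}),
    "clustered": (Category.DISTRIBUTION, {"grouped"}),
}

# Invert once: every surface form -> (canonical word, class)
LOOKUP = {}
for canonical, (category, synonyms) in THESAURUS.items():
    for form in {canonical, *synonyms}:
        LOOKUP[form] = (canonical, category)


def lookup(word):
    """Return (canonical word, Category) for a known word, else None."""
    return LOOKUP.get(word)


print(lookup("border"))
```

Adding a new near-synonym only touches the thesaurus entry, which is the extensibility the text refers to.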
Semantic constraints of breast cancer mammographic findings text
Based on an early investigation of the text structure of mammographic findings, and using the lesion classification results, each lesion is treated as one entity. Entities are divided according to content and, according to their different characteristics, carry different semantic constraints, which appear on the semantic tree as leaf nodes. Textual descriptions of mammographic findings generally describe accompanying signs such as skin changes only when those signs occur, which is mostly in malignant lesions, so this category of features needs to be given extra weight. The final result is obtained by jointly considering the practical application scenario and the different features of lesions in mammography text descriptions.
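Per-entity semantic constraints can be pictured as a table of the attributes each entity type may carry, plus a set of signs that reports mention only when present. Everything in this sketch (the constraint table, the sign list, and `apply_constraints`) is an assumption for illustration; the patent does not list its actual constraints.

```python
# Hypothetical constraint table: entity type -> attributes it may carry
CONSTRAINTS = {
    "mass": {"shape", "margin", "density", "size"},
    "calcification": {"morphology", "distribution"},
}
# Hypothetical signs that reports tend to mention only when present
CONDITIONAL_SIGNS = {"skin thickening", "nipple retraction"}


def apply_constraints(entity, attributes):
    """Check an entity's attributes against its constraints.

    Returns (kept, rejected, signs): attributes allowed for this entity type,
    names of attributes that violate the constraints, and conditional signs
    that were mentioned and may deserve extra weight downstream."""
    allowed = CONSTRAINTS.get(entity, set())
    kept = {name: values for name, values in attributes.items() if name in allowed}
    signs = sorted(name for name in attributes if name in CONDITIONAL_SIGNS)
    rejected = sorted(name for name in attributes
                      if name not in allowed and name not in CONDITIONAL_SIGNS)
    return kept, rejected, signs


print(apply_constraints("calcification", {
    "morphology": ["fine pleomorphic"],
    "margin": ["clear"],
    "skin thickening": ["present"],
}))
```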
Semantic tree construction for breast cancer mammographic findings text
Through the semantic constraints, the semantic tree links different entities, each entity and its attributes, and each attribute and its values. Attention must be paid to the syntactic structure of the text and to the links between words in a sentence after segmentation: several entities may share the same attribute description, and a single entity may have several descriptions for the same attribute. In this process, it is necessary not only to consider the separators in the input mammography text, such as commas, full stops, and conjunctions, but also the relationships across context, so that the semantics remain complete and coherent.
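Two of the situations above, splitting at separators and one description shared by several entities, can be sketched as follows. The separator set, the sample sentence, and both function names are assumptions; conjunction handling and cross-clause context, which the text also requires, are left out of this sketch.

```python
import re

# Commas, full stops and semicolons (full-width Chinese and ASCII forms).
# The ASCII full stop is deliberately left out so decimals such as "1.5" survive.
SEPARATORS = re.compile(r"[，,。；;]")


def split_clauses(text):
    """Split a findings description into clauses at separator punctuation."""
    return [part.strip() for part in SEPARATORS.split(text) if part.strip()]


def share_attribute(trees, entities, attribute, description):
    """Attach one attribute description to every entity that shares it,
    e.g. 'both masses have clear margins'."""
    for entity in entities:
        trees.setdefault(entity, {}).setdefault(attribute, []).append(description)
    return trees


# Hypothetical sample: "A mass is seen in the upper outer quadrant of the
# left breast, margin clear; a calcification is seen in the right breast."
print(split_clauses("左乳外上象限见一肿块，边缘清晰；右乳见钙化灶。"))
print(share_attribute({}, ["mass 1", "mass 2"], "margin", "clear"))
```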
Visualization of the semantic tree of breast cancer mammographic findings text
Semantic tree visualization presents the result of structuring breast cancer mammographic findings text in a more intuitive way and readily shows how the findings text is classified. Because of its tree structure, a semantic tree is well suited to visualization, whereas traditional text output struggles to convey the structure and content of the tree clearly. A visualized semantic tree is also easy to search and browse by its different features.
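One lightweight way to visualize a semantic tree is to emit Graphviz DOT text, which the Graphviz `dot` tool can render (for example `dot -Tpng tree.dot -o tree.png`). The function `to_dot` and its input layout `{entity: {attribute: [descriptions]}}` are assumptions; the patent does not name a visualization tool.

```python
def to_dot(trees):
    """Render {entity: {attribute: [descriptions]}} as Graphviz DOT source."""
    lines = ["digraph semantic_tree {", "  node [shape=box];"]
    count = 0

    def node(label):
        """Declare a new node with a unique id and return that id."""
        nonlocal count
        node_id = f"n{count}"
        count += 1
        escaped = label.replace('"', '\\"')  # keep quotes inside labels valid
        lines.append(f'  {node_id} [label="{escaped}"];')
        return node_id

    for entity, attributes in trees.items():
        root = node(entity)  # root: the entity
        for attribute, descriptions in attributes.items():
            inner = node(attribute)  # internal node: the attribute
            lines.append(f"  {root} -> {inner};")
            for description in descriptions:
                leaf = node(description)  # leaf: one description
                lines.append(f"  {inner} -> {leaf};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"mass": {"shape": ["oval"], "margin": ["clear", "smooth"]}}))
```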

Claims (2)

1. A method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration, characterized by comprising the following steps:
Step 1: forming, according to expert rules, a text normalization database for the textual descriptions of breast cancer mammographic findings, the text normalization database storing phrases related to these descriptions that conform to current, established medical terminology;
Step 2: acquiring the textual description of breast cancer mammographic findings in real time; based on the text normalization database, dividing the text into phrases according to semantic information and removing unneeded redundant information; extracting the descriptions relevant to breast cancer diagnosis; and dividing the range of each entity, wherein, using the classification results of the lesions, each lesion is treated as one entity;
Step 3: obtaining the semantic constraints of each entity;
Step 4: forming the semantic tree of the textual description obtained in step 2, wherein the root node of the semantic tree is the entity, the internal nodes are the attributes of the entity, and the leaf nodes are the descriptions corresponding to each attribute.
2. The method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration according to claim 1, characterized by further comprising step 5:
visualizing the semantic tree obtained in the previous step.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910256713.7A CN110085290A (en) 2019-04-01 2019-04-01 Method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration

Publications (1)

Publication Number Publication Date
CN110085290A (en) 2019-08-02

Family

ID=67413908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910256713.7A Pending CN110085290A (en) 2019-04-01 2019-04-01 Method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration

Country Status (1)

Country Link
CN (1) CN110085290A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040078190A1 (en) * 2000-09-29 2004-04-22 Fass Daniel C Method and system for describing and identifying concepts in natural language text for information retrieval and processing
US7209923B1 (en) * 2006-01-23 2007-04-24 Cooper Richard G Organizing structured and unstructured database columns using corpus analysis and context modeling to extract knowledge from linguistic phrases in the database
US20090077113A1 (en) * 2005-05-12 2009-03-19 Kabire Fidaali Device and method for semantic analysis of documents by construction of n-ary semantic trees
CN102651055A (en) * 2012-04-11 2012-08-29 华中科技大学 Method and system for generating file based on medical image
CN107423289A (en) * 2017-07-19 2017-12-01 东华大学 A structured processing method for cross-type breast clinical tumor documents

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
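俞扬信: "A 3D model retrieval method based on semantic trees" (一种基于语义树的三维模型检索方法), Information Studies: Theory & Application (《情报理论与实践》) *
刘玉文 et al.: "Research on a measurement model for the semantic relatedness of multi-level concepts in medical ontologies" (一种医学本体多层概念语义关联度度量模型研究), Journal of Jiujiang University, Natural Science Edition (《九江学院学报(自然科学版)》) *
张晗 et al.: "Construction of a semantic-graph-based model for medical multi-document summarization" (基于语义图的医学多文档摘要提取模型构建), Library and Information Service (《图书情报工作》) *
文必龙 et al.: "A semantic description method for data elements" (一种数据元语义描述方法), Journal of Harbin University of Commerce, Natural Sciences Edition (《哈尔滨商业大学学报(自然科学版)》) *
李俊杰: "Structuring medical text information based on the maximum entropy principle" (基于最大熵原理的医疗文本信息结构化), Clinical Medical Engineering (《临床医学工程》) *
杜先懋 et al.: "Preliminary study on the application of structured reports in a picture archiving and communication system" (医学影像存储与传输系统中结构化报告的初步应用研究), Chinese Journal of Radiology (《中华放射学杂志》) *
田驰远 et al.: "A structured processing method for pathology reports based on dependency parsing" (基于依存句法分析的病理报告结构化处理方法), Journal of Computer Research and Development (《计算机研究与发展》) *
陈德华 et al.: "A structured processing method for pathological microscopy text data" (病理镜检文本数据的结构化处理方法), Computer and Modernization (《计算机与现代化》) *
黄文博 et al.: "A new semantic analysis method for textual medical records combining the PLSA model and a tree model" (一种融合PLSA模型和树模型的文本病历语义分析新方法), Journal of Jilin University, Science Edition (《吉林大学学报(理学版)》) *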

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765274A (en) * 2019-10-10 2020-02-07 东华大学 Method for automatically generating ultrasonic report by voice input thyroid ultrasonic abnormal description
CN110765274B (en) * 2019-10-10 2023-10-24 东华大学 Method for automatically generating ultrasonic report by voice input thyroid ultrasonic abnormal description
CN111429406A (en) * 2020-03-05 2020-07-17 北京深睿博联科技有限责任公司 Method and device for detecting breast X-ray image lesion by combining multi-view reasoning
CN111429406B (en) * 2020-03-05 2023-10-27 北京深睿博联科技有限责任公司 Mammary gland X-ray image lesion detection method and device combining multi-view reasoning

Similar Documents

Publication Publication Date Title
He et al. Pathvqa: 30000+ questions for medical visual question answering
JP7008772B2 (en) Automatic identification and extraction of medical conditions and facts from electronic medical records
US10929420B2 (en) Structured report data from a medical text report
CN109378053B (en) Knowledge graph construction method for medical image
CN109599185B (en) Disease data processing method and device, electronic equipment and computer readable medium
US9165116B2 (en) Patient data mining
CN112597774B (en) Chinese medical named entity recognition method, system, storage medium and equipment
US8155951B2 (en) Process for constructing a semantic knowledge base using a document corpus
JP5154832B2 (en) Document search system and document search method
US8935155B2 (en) Method for processing medical reports
US20160335403A1 (en) A context sensitive medical data entry system
CN106874643A (en) Method and system for automatically building a knowledge base based on word vectors to assist diagnosis and treatment
US20190057773A1 (en) Method and system for performing triage
CN107851121A (en) Identifying errors in medical data
CN109918672B (en) Structural processing method of thyroid ultrasound report based on tree structure
US9684647B2 (en) Domain-specific computational lexicon formation
CN106502982B (en) The structuring processing method of unstructured Chinese breast ultrasound text
JP2004157623A (en) Search system and search method
Hammami et al. Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach
AU2020407062A1 (en) Unsupervised taxonomy extraction from medical clinical trials
RU2720363C2 (en) Method for generating mathematical models of a patient using artificial intelligence techniques
CN110069639B (en) Method for constructing thyroid ultrasound field ontology
CN110085290A (en) Method for establishing a semantic tree model of breast molybdenum-target (mammography) reports supporting heterogeneous information integration
CN111460788A (en) Interactive reading method for CT/PET report
Jebadas et al. Histogram distance metric learning to diagnose breast cancer using semantic analysis and natural language interpretation methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination