CN109524068A - A disease symptom extraction method based on the AC automaton - Google Patents
A disease symptom extraction method based on the AC automaton
- Publication number
- CN109524068A (application CN201811201375.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- health record
- electronic health
- symptom
- automatic machine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/157—Transformation using dictionaries or tables
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Medicines Containing Plant Substances (AREA)
Abstract
The present invention provides a disease symptom extraction method based on the AC automaton. Step 1: build a dictionary tree (trie) from a symptom word dictionary. Step 2: construct the failure pointers, completing the AC automaton. Step 3: convert the electronic health record information to UTF-8 encoding. Step 4: match the symptom words in the electronic health record information using the AC automaton algorithm; if a symptom word is matched completely, mark and extract it, and continue reading the electronic health record information until the end is reached. Step 5: if one or more characters have been matched but the match cannot be completed, take the failure node by moving up from the parent node of the current position along the symptom dictionary tree, and return to step 4. The invention extracts symptom words from unstructured electronic health records effectively and quickly, which supports research on the automatic monitoring of adverse drug reactions and helps in designing and optimizing spontaneous reporting systems for adverse drug reactions.
Description
Technical field
The present invention relates to the technical field of symptom matching in unstructured medical texts such as
electronic health records, and in particular to a disease symptom extraction method for adverse drug reaction detection.
Background technique
Extracting the post-medication symptom information contained in a patient's unstructured electronic health
record is the basis for the automatic monitoring of adverse drug reactions.
The Aho-Corasick automaton algorithm (AC automaton algorithm for short) is derived from the dictionary tree
(trie) algorithm and is one of the main multi-pattern matching algorithms. It has linear worst-case time
complexity, is highly flexible, tolerates short patterns, and resists complexity attacks, which makes it
one of the online matching algorithms preferred by practitioners in related fields.
The AC automaton algorithm is mainly applied in pattern matching, intrusion detection, and fast Chinese
word segmentation. Its application to the extraction of disease symptom words, however, remains unexplored.
Based on an understanding of its advantages, the present invention adapts the AC automaton algorithm and
applies it to the extraction of disease symptom words.
Summary of the invention
The technical problem to be solved by the present invention is how to extract, effectively and quickly, the
symptom words caused by adverse drug reactions from long texts such as unstructured electronic health
record information.
To solve the above technical problem, the technical solution of the present invention provides a disease
symptom extraction method based on the AC automaton, characterized in that the method consists of the
following 5 steps:
Step 1: build a dictionary tree from the symptom word dictionary;
Step 2: construct the failure pointers, completing the AC automaton;
Step 3: convert the electronic health record information to UTF-8 encoding;
Step 4: use the AC automaton algorithm to match the symptom words caused by adverse drug reactions in the
electronic health record information; if a symptom word from the symptom word dictionary is matched
completely in the electronic health record information, mark and extract that symptom word, and continue
reading the electronic health record information until the end is reached;
Step 5: if one or more characters have been matched but the match cannot be completed, take the failure
node by moving up from the parent node of the current position along the dictionary tree, and return to
step 4.
Preferably, in step 1, when the dictionary tree is built, the ordered set of edges on the path from the
root node to any node represents the corresponding prefix of a symptom word in the symptom word dictionary.
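The prefix property described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the names `TrieNode`, `build_trie`, and `has_prefix` are hypothetical, and the four Chinese symptom words are an assumed sample dictionary chosen so that two pairs share a first character.

```python
class TrieNode:
    """One node of the dictionary tree; each outgoing edge is labeled with one character."""

    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if the path from the root spells a whole symptom word


def build_trie(words):
    """Insert every word character by character, reusing existing edges for shared prefixes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def has_prefix(root, prefix):
    """Return True if some dictionary word starts with `prefix`."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


# Assumed sample dictionary (hypothetical): two words start with one character, two with another.
symptoms = ["耳鸣", "耳塞", "头晕", "头痛"]
root = build_trie(symptoms)
```

Because the two pairs share their first character, the root of this sample tree has only two children, and each of those children has two children of its own.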
Preferably, the detailed process of step 2 is as follows: a pointer is set, initially pointing to the root
node of the symptom word dictionary tree; the electronic health record information is traversed from front
to back; for each character in the electronic health record information, if it is identical to the
character corresponding to the pointer in the symptom word dictionary, the pointer moves to the child node
for that character; matching continues in a loop until it fails, at which point matching continues in the
same way from the node that the failure pointer points to; when a terminal node is encountered, the
counter is incremented by 1.
In a big data setting, the method provided by the invention extracts the symptom words caused by adverse
drug reactions from unstructured electronic health records effectively and quickly, which supports
research on the automatic monitoring of adverse drug reactions and helps in designing and optimizing
spontaneous reporting systems for adverse drug reactions.
Detailed description of the invention
Fig. 1 is an example of a dictionary tree built from the symptom word dictionary;
Fig. 2 is an example of failure pointer construction.
Specific embodiment
The present invention is further described below with reference to a specific embodiment.
This embodiment provides a disease symptom extraction method based on the AC automaton. A dictionary tree is
first built from the symptom word dictionary, and the failure pointers are then constructed. Once the AC
automaton is complete and the character string is confirmed to be UTF-8 encoded, symptom words are matched.
The specific implementation process is as follows:
Step 1: build a dictionary tree from the symptom word dictionary. The ordered set of edges on the path from
the root node to any node represents the corresponding prefix of a symptom word in the dictionary. As shown
in Fig. 1, "tinnitus" and "earplug" share the common prefix "ear", and "dizziness" and "headache" share the
common prefix "head" (the shared prefix is the first character of the Chinese words).
Step 2: construct the failure pointers. A pointer is set, initially pointing to the root node of the
symptom dictionary tree, and the electronic health record information is traversed from front to back. For
each character in the electronic health record information, if it is identical to the character
corresponding to the pointer in the symptom dictionary, the pointer moves to the child node for that
character. Matching continues in a loop until it fails, at which point matching continues in the same way
from the node that the failure pointer points to; when a terminal node is encountered, the counter is
incremented by 1. As shown in Fig. 2, during matching the failure pointer jumps from "point ear" to
"tinnitus": it does not return to the origin and restart matching, but moves from position No. 4 to
position No. 3. The time complexity of the algorithm is therefore linear, and no matching is repeated.
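The text describes the failure pointers at a high level. One common way to build them, shown here as a hedged sketch of the textbook Aho-Corasick construction rather than the patent's own code, is a breadth-first pass over the dictionary tree. The function name `build_automaton`, the array names `goto`, `fail`, and `out`, and the two-word dictionary are all assumptions made for illustration.

```python
from collections import deque


def build_automaton(words):
    """Build the trie plus failure pointers; states are integers and 0 is the root."""
    goto = [{}]   # goto[s]: character -> child state
    fail = [0]    # fail[s]: state for the longest proper suffix of s that is also a prefix
    out = [[]]    # out[s]: dictionary words that end at state s
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Breadth-first pass: children of the root fail to the root; deeper nodes
    # reuse the failure pointer of their parent.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Words ending at the failure target also end at this node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


# Assumed sample dictionary (hypothetical): the second word starts with the last character of the first.
goto, fail, out = build_automaton(["头痛", "痛经"])
```

In this sample, the state reached after reading 头痛 gets a failure pointer to the state for 痛, so a following 经 can still complete 痛经 without rescanning any text.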
Step 3: convert the encoding format of the electronic health record information. Before symptom word
matching begins, the electronic health record information is converted to UTF-8 encoding. Because the
hexadecimal codes of Chinese characters under this encoding lie in the range 0800-FFFF, each Chinese
character is split into four hexadecimal digits, so that Chinese characters can be represented with
English alphanumeric characters.
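The conversion can be sketched as follows. This sketch reads the range 0800-FFFF as Unicode code points, which is the range that UTF-8 stores in three bytes; that reading is an assumption, as are the function names `to_hex_units` and `from_hex_units`. Each character becomes a four-digit hexadecimal string, and the mapping is reversible.

```python
def to_hex_units(text):
    """Represent each character as four hexadecimal digits of its code point."""
    units = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            # Characters outside the four-digit range are not covered by this sketch.
            raise ValueError(f"character {ch!r} does not fit in four hex digits")
        units.append(f"{code_point:04X}")
    return units


def from_hex_units(units):
    """Inverse of to_hex_units: turn four-digit hex strings back into text."""
    return "".join(chr(int(unit, 16)) for unit in units)


record = "头痛"                       # hypothetical record fragment
utf8_bytes = record.encode("utf-8")   # step 3: the record as UTF-8 bytes
units = to_hex_units(utf8_bytes.decode("utf-8"))
```

Each unit contains only the digits 0-9 and the letters A-F, so the automaton can in principle be driven over this representation instead of the raw Chinese characters.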
Step 4: match the character string. If a word from the symptom dictionary is matched completely in the
electronic health record information, mark and extract the word, and continue reading the electronic health
record information until the end is reached.
Step 5: if one or more characters have been matched but the match cannot be completed, take the failure
node by moving up from the parent node of the current position along the symptom dictionary tree, and
return to step 4.
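Steps 4 and 5 together form the scanning loop of the automaton. The sketch below is a self-contained illustration under stated assumptions: the helper names `build` and `extract`, the sample dictionary, and the sample record are hypothetical, and the construction is the textbook Aho-Corasick one rather than code taken from the patent.

```python
from collections import deque


def build(words):
    """Trie with failure pointers; states are integers and 0 is the root."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def extract(text, words):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = build(words)
    state, hits = 0, []
    for i, ch in enumerate(text):
        # Step 5: on a mismatch, follow failure pointers instead of restarting.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        # Step 4: every word ending at this state is a complete match.
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


symptoms = ["耳鸣", "耳塞", "头晕", "头痛"]   # assumed sample dictionary
record = "患者服药后出现头痛、耳鸣，伴头晕"     # assumed sample record
hits = extract(record, symptoms)
overlap = extract("头痛经", ["头痛", "痛经"])
```

The text is read once from front to back. The second call shows the effect of the failure pointer: the overlapping words 头痛 and 痛经 are both reported even though they share a character.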
The method provided in this embodiment uses the AC automaton to match symptom words in imported
unstructured electronic health record information and to complete their extraction. The method combines
the ability of the dictionary tree to handle multi-pattern matching of words (short texts) with the ability
of the KMP algorithm to handle single-pattern matching in long texts. As a combined multi-pattern matching
algorithm, it has linear worst-case time complexity, good efficiency, and high flexibility, and it resists
complexity attacks.
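The linear bound can be stated a little more precisely. The following is a hedged summary of the standard Aho-Corasick analysis, not a result taken from the patent. The symbols are notation introduced here: m is the total length of all symptom words in the dictionary, n is the length of the electronic health record in characters, and z is the number of matches reported. The bound assumes that looking up a child node takes constant time, for example with a hash map.

```latex
\[
T_{\text{build}} = O(m), \qquad
T_{\text{match}} = O(n + z), \qquad
T_{\text{total}} = O(m + n + z)
\]
```

By comparison, running a single-pattern matcher such as KMP once per dictionary word costs time proportional to n multiplied by the number of words, which is why the combined automaton suits large symptom dictionaries.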
The above is only a preferred embodiment of the present invention and does not limit the invention in any
form or in substance. It should be noted that those of ordinary skill in the art may make improvements and
additions without departing from the method of the invention, and these improvements and additions shall
also fall within the scope of protection of the invention. Minor changes, modifications, and equivalent
variations made by those skilled in the art using the technical content disclosed above, without departing
from the spirit and scope of the invention, are equivalent embodiments of the invention; likewise, any
equivalent change, modification, or evolution made to the above embodiment according to the substantive
technology of the invention still falls within the scope of the technical solution of the invention.
Claims (3)
1. A disease symptom extraction method based on the AC automaton, characterized in that the method consists
of the following 5 steps:
Step 1: build a dictionary tree from the symptom word dictionary;
Step 2: construct the failure pointers, completing the AC automaton;
Step 3: convert the electronic health record information to UTF-8 encoding;
Step 4: use the AC automaton algorithm to match the symptom words caused by adverse drug reactions in the
electronic health record information; if a symptom word from the symptom word dictionary is matched
completely in the electronic health record information, mark and extract that symptom word, and continue
reading the electronic health record information until the end is reached;
Step 5: if one or more characters have been matched but the match cannot be completed, take the failure
node by moving up from the parent node of the current position along the dictionary tree, and return to
step 4.
2. The disease symptom extraction method based on the AC automaton according to claim 1, characterized in
that: in step 1, when the dictionary tree is built, the ordered set of edges on the path from the root node
to any node represents the corresponding prefix of a symptom word in the symptom word dictionary.
3. The disease symptom extraction method based on the AC automaton according to claim 1, characterized in
that the detailed process of step 2 is: a pointer is set, initially pointing to the root node of the
dictionary tree; the electronic health record information is traversed from front to back; for each
character in the electronic health record information, if it is identical to the character corresponding
to the pointer in the symptom word dictionary, the pointer moves to the child node for that character;
matching continues in a loop until it fails, at which point matching continues in the same way from the
node that the failure pointer points to; when a terminal node is encountered, the counter is incremented
by 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811201375.9A CN109524068A (en) | 2018-10-16 | 2018-10-16 | A disease symptom extraction method based on the AC automaton |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811201375.9A CN109524068A (en) | 2018-10-16 | 2018-10-16 | A disease symptom extraction method based on the AC automaton |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109524068A true CN109524068A (en) | 2019-03-26 |
Family
ID=65770865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811201375.9A Pending CN109524068A (en) | 2018-10-16 | 2018-10-16 | A disease symptom extraction method based on the AC automaton |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109524068A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191103A (en) * | 2019-12-30 | 2020-05-22 | 河南拓普计算机网络工程有限公司 | Method, device and storage medium for identifying and analyzing enterprise subject information from internet |
CN111341458A (en) * | 2020-02-27 | 2020-06-26 | 国家卫生健康委科学技术研究所 | Single-gene disease name recommendation method and system based on multi-level structure similarity |
CN113555069A (en) * | 2021-07-22 | 2021-10-26 | 杭州叙简科技股份有限公司 | Chemical name retrieval and extraction method and device based on AC automaton |
CN114580414A (en) * | 2022-02-24 | 2022-06-03 | 医渡云(北京)技术有限公司 | Entity identification method and device based on AC automaton and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102193914A (en) * | 2011-05-26 | 2011-09-21 | 中国科学院计算技术研究所 | Computer aided translation method and system |
CN105183788A (en) * | 2015-08-20 | 2015-12-23 | 及时标讯网络信息技术(北京)有限公司 | Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree |
CN107392143A (en) * | 2017-07-20 | 2017-11-24 | 中国科学院软件研究所 | A kind of resume accurate Analysis method based on SVM text classifications |
CN108021569A (en) * | 2016-11-01 | 2018-05-11 | 中国移动通信有限公司研究院 | The structure of AC automatic machines and Chinese multi-model matching method and relevant apparatus |
CN105260354B (en) * | 2015-08-20 | 2018-08-21 | 及时标讯网络信息技术(北京)有限公司 | A kind of Chinese AC automatic machines working method based on keyword dictionary tree construction |
CN108628907A (en) * | 2017-03-24 | 2018-10-09 | 北京京东尚科信息技术有限公司 | A method of being used for the Trie tree multiple-fault diagnosis based on Aho-Corasick |
-
2018
- 2018-10-16 CN CN201811201375.9A patent/CN109524068A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102193914A (en) * | 2011-05-26 | 2011-09-21 | 中国科学院计算技术研究所 | Computer aided translation method and system |
CN105183788A (en) * | 2015-08-20 | 2015-12-23 | 及时标讯网络信息技术(北京)有限公司 | Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree |
CN105260354B (en) * | 2015-08-20 | 2018-08-21 | 及时标讯网络信息技术(北京)有限公司 | A kind of Chinese AC automatic machines working method based on keyword dictionary tree construction |
CN108021569A (en) * | 2016-11-01 | 2018-05-11 | 中国移动通信有限公司研究院 | The structure of AC automatic machines and Chinese multi-model matching method and relevant apparatus |
CN108628907A (en) * | 2017-03-24 | 2018-10-09 | 北京京东尚科信息技术有限公司 | A method of being used for the Trie tree multiple-fault diagnosis based on Aho-Corasick |
CN107392143A (en) * | 2017-07-20 | 2017-11-24 | 中国科学院软件研究所 | A kind of resume accurate Analysis method based on SVM text classifications |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191103A (en) * | 2019-12-30 | 2020-05-22 | 河南拓普计算机网络工程有限公司 | Method, device and storage medium for identifying and analyzing enterprise subject information from internet |
CN111191103B (en) * | 2019-12-30 | 2021-08-24 | 河南拓普计算机网络工程有限公司 | Method, device and storage medium for identifying and analyzing enterprise subject information from internet |
CN111341458A (en) * | 2020-02-27 | 2020-06-26 | 国家卫生健康委科学技术研究所 | Single-gene disease name recommendation method and system based on multi-level structure similarity |
CN111341458B (en) * | 2020-02-27 | 2020-11-03 | 国家卫生健康委科学技术研究所 | Single-gene disease name recommendation method and system based on multi-level structure similarity |
CN113555069A (en) * | 2021-07-22 | 2021-10-26 | 杭州叙简科技股份有限公司 | Chemical name retrieval and extraction method and device based on AC automaton |
CN114580414A (en) * | 2022-02-24 | 2022-06-03 | 医渡云(北京)技术有限公司 | Entity identification method and device based on AC automaton and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109524068A (en) | A disease symptom extraction method based on the AC automaton | |
Yu et al. | Self-chained image-language model for video localization and question answering | |
CN103838875B (en) | A kind of information acquisition system and its method based on Quick Response Code | |
CN113468888A (en) | Entity relation joint extraction method and device based on neural network | |
CN104598577B (en) | A kind of extracting method of Web page text | |
CN102185762B (en) | Method for recognizing, extracting user data sending behavior | |
CN105677710A (en) | Processing method and system of big data | |
CN102867049B (en) | Chinese PINYIN quick word segmentation method based on word search tree | |
CN107992211A (en) | A kind of Chinese character spelling wrong word correcting method based on CNN-LSTM | |
CN106095735A (en) | A kind of method plagiarized based on deep neural network detection academic documents | |
CN107729316A (en) | The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese | |
CN108647511A (en) | The password strength assessment method derived based on weak passwurd | |
CN113971404A (en) | Cultural relic security named entity identification method based on decoupling attention | |
CN111190873B (en) | Log mode extraction method and system for log training of cloud native system | |
CN105068889A (en) | Method for recovering completely deleted files in Ext3/Ext4 | |
CN104360988B (en) | The recognition methods of the coded system of Chinese character and device | |
CN107239520A (en) | A kind of universal forum context extraction method | |
CN104079450A (en) | Method and device for generating characteristic pattern set | |
CN105592087A (en) | DNP abnormity detection method based on vector machine learning | |
CN117056475A (en) | Knowledge graph-based intelligent manufacturing question-answering method, device and storage medium | |
CN116166768A (en) | Text knowledge extraction method and system based on rules | |
CN116776889A (en) | Guangdong rumor detection method based on graph convolution network and external knowledge embedding | |
CN106055542B (en) | A kind of text snippet automatic generation method and system based on temporal knowledge extraction | |
CN108021711A (en) | A kind of method of information processing | |
CN105975451A (en) | Processing system and method for DWG-format-file translation data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190326 |