CN112131862B - Traditional Chinese medicine medical record data processing method and device and electronic equipment

Info

Publication number
CN112131862B
Authority
CN
China
Prior art keywords
chinese medicine
traditional chinese
medical record
data
diagnosis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010700305.9A
Other languages
Chinese (zh)
Other versions
CN112131862A (en)
Inventor
杨阳
李园白
李萌
刘方舟
张一颖
王静
杜昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Information On Traditional Chinese Medicine Cacms
Original Assignee
Institute Of Information On Traditional Chinese Medicine Cacms
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Information On Traditional Chinese Medicine Cacms
Priority to CN202010700305.9A
Publication of CN112131862A
Application granted
Publication of CN112131862B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/90 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to alternative medicines, e.g. homeopathy or oriental medicines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Abstract

The embodiments of the invention disclose a method and a device for processing traditional Chinese medicine medical record data, and electronic equipment. They relate to the technical field of data processing and aim to improve the accuracy with which traditional Chinese medicine medical record data is divided into cases and visits (diagnosis orders). The processing method comprises the following steps: acquiring traditional Chinese medicine medical record data to be processed; and dividing the data to be processed based on a basic dictionary and a rule set to obtain the case content and visit content corresponding to the data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits. The method and the device are suitable for case and visit division of traditional Chinese medicine medical record data.

Description

Traditional Chinese medicine medical record data processing method and device and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for processing traditional Chinese medicine medical record data and electronic equipment.
Background
Traditional Chinese medicine has a history of thousands of years, and its large body of clinical medical records is one of the main carriers through which it has been handed down. These records capture practitioners' clinical syndrome-differentiation reasoning, their prescription and medication experience, and the like. Such regularities have important reference value for guiding clinical syndrome differentiation, diagnosis, and treatment. Processing medical record data and mining the implicit rules in it is therefore of great significance for promoting the further development of traditional Chinese medicine.
The premise of mining implicit rules is the accurate standardization of massive medical record data. Traditional Chinese medicine medical records are made public mainly as published journal literature and published medical record collections. A chapter of a document or book usually contains several different cases, a single natural paragraph may even contain several cases, and one case may comprise several visits. How to separate the cases from one another, and the visits from one another, is therefore one of the important problems in data processing. In the prior art, medical record data is divided using the word "case": all text following the first occurrence of "case" is treated as the content of a single case, and visits are not divided at all. Consequently, when one piece of medical record data contains two cases, the prior art treats the content of both as a single case, so the accuracy of case and visit division of traditional Chinese medicine medical record data is low.
Disclosure of Invention
The embodiments of the invention provide a method and a device for processing traditional Chinese medicine medical record data, and electronic equipment, which can improve the accuracy of case and visit division of traditional Chinese medicine medical record data. The technical scheme is as follows:
In one aspect, an embodiment of the invention provides a method for processing traditional Chinese medicine medical record data, including: acquiring traditional Chinese medicine medical record data to be processed; and dividing the data to be processed based on a basic dictionary and a rule set to obtain the case content and visit (diagnosis order) content corresponding to the data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits.
According to a specific implementation of the embodiment of the present application, before acquiring the traditional Chinese medicine medical record data to be processed, the method further includes: acquiring first traditional Chinese medicine medical record data; performing word segmentation on the first traditional Chinese medicine medical record data and extracting identification words related to cases and visits; and adding the extracted identification words to the basic dictionary to obtain an updated basic dictionary.
According to a specific implementation of the embodiment of the present application, adding the extracted identification words related to cases and visits to the basic dictionary to obtain an updated basic dictionary includes: extracting features of each identification word related to cases and visits; calculating a feature value for each identification word using an evaluation function; sorting the case-related identification words and the visit-related identification words, respectively, by feature value to obtain a sequence of the case-related and visit-related identification words of the first traditional Chinese medicine medical record data; and selecting from the sequence the identification words corresponding to a preset number of feature values and adding them to the basic dictionary to obtain an updated basic dictionary.
According to a specific implementation of the embodiment of the present application, before acquiring the traditional Chinese medicine medical record data to be processed, and after performing word segmentation on the first traditional Chinese medicine medical record data and extracting identification words related to cases and visits, the method further includes: obtaining a first seed rule based on the identification words related to cases and visits and on the first traditional Chinese medicine medical record data; performing rule learning based on the first seed rule and second traditional Chinese medicine medical record data to obtain a first rule set; and adding the first rule set to the rule set to obtain an updated rule set.
According to a specific implementation of the embodiment of the present application, after adding the first rule set to the rule set to obtain the updated rule set, the method further includes: optimizing the rule set using a maximum coverage optimization method to obtain an optimized rule set.
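The text names a maximum coverage optimization method without describing its procedure. As a hedged illustration only, the sketch below uses the well-known greedy maximum-coverage heuristic: it repeatedly keeps the rule that marks the most not-yet-covered boundaries. The rule names, the boundary identifiers, and the `max_rules` parameter are hypothetical and are not taken from the patent.

```python
def greedy_max_coverage(rule_hits, max_rules):
    """Pick up to max_rules rules that together cover the most boundaries.

    rule_hits maps a rule name to the set of boundary ids it detects.
    """
    covered, chosen = set(), []
    remaining = dict(rule_hits)
    while remaining and len(chosen) < max_rules:
        # Rule that adds the most boundaries not covered so far.
        best = max(remaining, key=lambda r: len(remaining[r] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining rule contributes anything new
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Hypothetical rules and the case/visit boundaries each one detects.
rule_hits = {
    "rule_a": {1, 2, 3},
    "rule_b": {3, 4},
    "rule_c": {1, 2},
    "rule_d": {5},
}
chosen, covered = greedy_max_coverage(rule_hits, max_rules=3)
print(chosen, sorted(covered))
```

In this toy input `rule_c` is dropped because everything it detects is already covered by `rule_a`, which is the kind of redundancy such an optimization step is presumably meant to remove.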
According to a specific implementation of the embodiment of the present application, before acquiring the traditional Chinese medicine medical record data to be processed, and after performing word segmentation on the first traditional Chinese medicine medical record data and extracting identification words related to cases and visits, the method further includes: performing rule learning based on the identification words related to cases and visits and on second traditional Chinese medicine medical record data to obtain derived identification words corresponding to those identification words; and adding the derived identification words to the basic dictionary to obtain an updated basic dictionary.
According to a specific implementation of the embodiment of the present application, dividing the traditional Chinese medicine medical record data to be processed based on the basic dictionary and the rule set includes: performing a first division of the data to be processed based on the basic dictionary to obtain two or more data segments corresponding to the data; and performing case division and visit division on the medical record data corresponding to the two or more data segments based on the rule set.
In another aspect, an embodiment of the invention provides a device for processing traditional Chinese medicine medical record data, including: an acquisition module, configured to acquire traditional Chinese medicine medical record data to be processed; and a dividing module, configured to divide the data to be processed based on a basic dictionary and a rule set to obtain the case content and visit content corresponding to the data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits.
In a further aspect, an embodiment of the present application provides an electronic device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged in a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to that code in order to: acquire traditional Chinese medicine medical record data to be processed; and divide the data to be processed based on a basic dictionary and a rule set to obtain the case content and visit content corresponding to the data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
In the embodiments of the invention, traditional Chinese medicine medical record data to be processed is acquired and divided based on a basic dictionary and a rule set to obtain the corresponding case content and visit content, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits. Because the data is divided not only by the basic dictionary of identification words but also by the rule set, the accuracy of case and visit division of traditional Chinese medicine medical record data is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to another embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to another embodiment of the present application;
Fig. 4 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to yet another embodiment of the present application;
Fig. 5 is a diagram of an embodiment in which PDF files are recognized via OCR, together with the corresponding results;
Fig. 6 is a schematic step diagram of another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device for processing traditional Chinese medicine medical record data according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
It should be understood that the embodiments described are only some of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to an embodiment of the present application. As shown in Fig. 1, the method of this embodiment may include:
Step 101, acquiring traditional Chinese medicine medical record data to be processed.
Traditional Chinese medicine is a discipline that studies human physiology and pathology and the diagnosis, prevention, and treatment of disease. A medical record, also called a medical case, is a continuous record of syndrome differentiation, establishment of the treatment method, prescription, and medication made while a doctor treats a disease. Traditional Chinese medicine medical record data may comprise data related to cases; a case comprises one or more visits, and each visit may record the patient's symptoms, the doctor's examination, the diagnosis, the medication, and the like. In one example, the patient's symptom may be dry mouth; the doctor's examination may cover the tongue, pulse, face, and the like; the doctor's diagnosis may include qi stagnation, liver soothing, qi regulation, and the like; and the medication may include 12 g of coptis chinensis, 10 g of ginger, and the like. In the medical record data, identification words related to cases include "cases", "medical records", and the like, and identification words related to visits include "diagnosis times", "first diagnosis", "second diagnosis", "three days later", "seven days later", and the like. Moreover, because traditional Chinese medicine has a long history, classical Chinese may be used when a medical record is written, so the identification words for cases and visits differ from those of modern Chinese. To suit this characteristic of traditional Chinese medicine medical records, the basic dictionary may include classical Chinese identification words for cases and visits.
In the traditional Chinese medicine medical record data to be processed, all data about cases and visits may lie in one natural paragraph; two or more cases may lie in one natural paragraph; or, even where a natural paragraph contains only one case, that case may include two or more visits, so that two or more visits lie in one natural paragraph.
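The case and visit structure described above can be pictured as a small nested data model. The following is a minimal sketch for illustration only; the class names `Case` and `Visit` and their field names are assumptions introduced here, and the sample values reuse the examples given in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Visit:
    """One visit within a case (hypothetical structure)."""
    marker: str                                   # identification word found in the text
    symptoms: List[str] = field(default_factory=list)
    examination: List[str] = field(default_factory=list)
    diagnosis: List[str] = field(default_factory=list)
    medication: List[str] = field(default_factory=list)


@dataclass
class Case:
    """One case, holding one or more visits (hypothetical structure)."""
    marker: str
    visits: List[Visit] = field(default_factory=list)


case = Case(marker="case", visits=[
    Visit(marker="first diagnosis",
          symptoms=["dry mouth"],
          examination=["tongue", "pulse", "face"],
          diagnosis=["qi stagnation"],
          medication=["coptis chinensis 12 g", "ginger 10 g"]),
    Visit(marker="three days later"),
])
print(len(case.visits), case.visits[0].medication)
```

Dividing a medical record then amounts to producing a list of such `Case` objects, each with its visits in order.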
Step 102, dividing the traditional Chinese medicine medical record data to be processed based on a basic dictionary and a rule set to obtain the case content and visit content corresponding to the data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and visits, and the rule set comprises rules related to traditional Chinese medicine cases and visits.
Identification words for traditional Chinese medicine cases may be "cases", "medical records", and the like, and identification words for visits may be "diagnosis times", "first diagnosis", "second diagnosis", "three days later", "seven days later", and the like. The rule set comprises a plurality of rules, which may include rules related to cases, according to which the medical record data can be divided into different cases.
In one example, a visit in a traditional Chinese medicine medical record may include information on the patient's symptoms, the doctor's examination, the diagnosis, the medication, and the like. A rule about visits can be established from this information, and the medical record data imported by the user is divided into visits according to the rule.
It is understood that when the medical record data contains only one case, it can still be divided using the case-related rules; the result is simply one case. Likewise, the medical record data can be divided into different visits according to the visit-related rules, and when the data contains only one visit, the result is simply one visit.
In one example, dividing the traditional Chinese medicine medical record data to be processed based on a basic dictionary and a rule set (step 102) includes:
Step 102a, performing a first division of the traditional Chinese medicine medical record data to be processed based on the basic dictionary to obtain two or more data segments corresponding to the data.
In the first division, the data to be processed is divided into cases and visits according to the case-related and visit-related identification words in the basic dictionary. In one example, when no obvious case-related identification word separates two or more cases, those cases may end up in a single segment while another case forms a segment of its own; similarly, two or more visits may end up in one segment.
In one example, visits can be divided on the basis of the result of the preliminary case division, so that two or more data segments are obtained.
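The first, dictionary-based division can be sketched as splitting the text at every identification word found in the basic dictionary. This is an illustrative sketch under stated assumptions: the English marker words, the function name `first_division`, and the sample record are hypothetical, and real medical records would be Chinese text with a much larger dictionary.

```python
import re

# Hypothetical basic dictionary: identification words for cases and visits.
CASE_WORDS = ["Case", "Medical record"]
VISIT_WORDS = ["First diagnosis", "Second diagnosis", "Three days later"]


def first_division(text, markers):
    """Split text into segments, each starting at a dictionary marker."""
    # Longest markers first, so longer phrases win over their prefixes.
    ordered = sorted(markers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts:
        return [text.strip()]
    segments = []
    # Keep any leading text that precedes the first marker.
    if starts[0] > 0 and text[:starts[0]].strip():
        segments.append(text[:starts[0]].strip())
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        segments.append(text[begin:end].strip())
    return segments


record = ("Case 1. Male, 45. First diagnosis: dry mouth. "
          "Second diagnosis: symptoms eased. "
          "Case 2. Female, 30. First diagnosis: insomnia.")
cases = first_division(record, CASE_WORDS)
visits = [first_division(c, VISIT_WORDS) for c in cases]
print(visits)
```

Cases are split first and visits are then split inside each case, mirroring the example in which visits are divided on the basis of the preliminary case division. Boundaries that carry no dictionary word are missed at this stage, which is why a rule-based second pass follows.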
Step 102b, performing case division and visit division on the traditional Chinese medicine medical record data corresponding to the two or more data segments based on the rule set.
The result of step 102a, that is, the medical record data corresponding to the two or more data segments, is divided into cases and visits again. Specifically, the case-related rules in the rule set are traversed to further divide the preliminarily divided cases obtained in step 102a, yielding secondarily divided cases. The visit-related rules in the rule set are then traversed to divide the content of each secondarily divided case (which includes one or more visits) into visits.
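The second pass can be sketched as traversing the case rules and then the visit rules over each segment from the first pass. The sketch below assumes, purely for illustration, that each rule is a regular expression marking where a new case or visit starts; the patterns, function names, and sample text are hypothetical.

```python
import re

# Hypothetical rule set: each rule is a regex marking the start of a unit.
CASE_RULES = [re.compile(r"\b[A-Z][a-z]+, (?:male|female), \d+ years old")]
VISIT_RULES = [re.compile(r"\b(?:\w+ days later|On \d{4}-\d{2}-\d{2})")]


def split_by_rules(segment, rules):
    """Traverse every rule and cut the segment at each match start."""
    cuts = {0}
    for rule in rules:
        cuts.update(m.start() for m in rule.finditer(segment))
    ordered = sorted(cuts) + [len(segment)]
    parts = [segment[a:b].strip() for a, b in zip(ordered, ordered[1:])]
    return [p for p in parts if p]


def second_division(segments):
    """Refine first-pass segments into cases, then each case into visits."""
    result = []
    for seg in segments:
        for case in split_by_rules(seg, CASE_RULES):
            result.append(split_by_rules(case, VISIT_RULES))
    return result


# One first-pass segment that still hides two cases and an extra visit.
first_pass = ["Zhang, male, 45 years old, dry mouth. Seven days later, improved. "
              "Li, female, 30 years old, insomnia."]
divided = second_division(first_pass)
print(divided)
```

A segment that matches no rule is returned unchanged as a single case with a single visit, consistent with the remark that data containing only one case or one visit simply yields one result.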
In this embodiment, traditional Chinese medicine medical record data to be processed is acquired and divided based on a basic dictionary and a rule set to obtain the corresponding case content and visit content, wherein the basic dictionary includes identification words related to traditional Chinese medicine cases and visits, and the rule set includes rules related to traditional Chinese medicine cases and visits. Because the data is divided using both the basic dictionary and the rule set, the accuracy of case and visit division is improved.
In one example, if the traditional Chinese medicine medical record data to be processed contains words or patterns that are not covered by the basic dictionary and the rule set, the system identifies the context and certain exact markers of the related words based on regular expression patterns (which are also part of the seed rules) in order to find new words to be extracted. Whether a new word is added to the basic dictionary is decided manually.
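The discovery of new words from context can be sketched with a single regular expression. This is a sketch only: the context pattern below (a short phrase ending in "diagnosis" that opens a sentence and is followed by a colon) and the names `CONTEXT_PATTERN` and `find_candidates` are assumptions, not patterns given in the patent.

```python
import re

# Hypothetical current basic dictionary.
base_dictionary = {"First diagnosis", "Second diagnosis"}

# Hypothetical context pattern: a phrase ending in "diagnosis" that opens
# a sentence and is immediately followed by a colon.
CONTEXT_PATTERN = re.compile(r"(?:^|(?<=[.;]\s))([A-Z][a-z]+ diagnosis)(?=:)")


def find_candidates(text, dictionary):
    """Return phrases that fit the context but are missing from the dictionary."""
    found = CONTEXT_PATTERN.findall(text)
    # Preserve first-seen order while dropping duplicates and known words.
    return [w for w in dict.fromkeys(found) if w not in dictionary]


text = ("First diagnosis: dry mouth. Third diagnosis: improved. "
        "Final diagnosis: recovered. Third diagnosis: stable.")
candidates = find_candidates(text, base_dictionary)
print(candidates)
```

The function only proposes candidates; it deliberately does not modify the dictionary, since the text leaves that decision to a human reviewer.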
Fig. 2 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to another embodiment of the present application. As shown in Fig. 2, the method of this embodiment is substantially the same as the method shown in Fig. 1, except that, before acquiring the traditional Chinese medicine medical record data to be processed (step 101), the method further includes:
Step 103, acquiring first traditional Chinese medicine medical record data.
The first traditional Chinese medicine medical record data may be preprocessed. In one example, content unrelated to cases and visits is deleted: where the medical record data was originally published in an electronic journal, the electronic data can be converted into text, and data unrelated to cases and visits, such as titles, abstracts, author names, and affiliations, can be deleted; patient information associated with the cases can also be deleted. In another example, erroneous punctuation in the medical record data may be corrected. In one example, the traditional Chinese medicine medical record data may be preprocessed manually.
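Although the text notes that preprocessing may be done manually, part of it can be automated. The sketch below shows one possible automation, assuming that metadata lines carry recognizable prefixes and that punctuation errors are limited to repeated marks and stray spaces; the prefixes and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical prefixes of lines unrelated to cases and visits.
METADATA_PREFIXES = ("Title:", "Abstract:", "Author:", "Affiliation:")


def preprocess(raw_text):
    """Drop metadata lines and normalize obviously wrong punctuation."""
    kept = [line.strip() for line in raw_text.splitlines()
            if line.strip() and not line.strip().startswith(METADATA_PREFIXES)]
    text = " ".join(kept)
    text = re.sub(r"([,.;:])\1+", r"\1", text)   # collapse repeated marks
    text = re.sub(r"\s+([,.;:])", r"\1", text)   # no space before a mark
    return text


raw = """Title: Clinical experience
Author: (name withheld)
Case 1 ,, dry mouth..
Second diagnosis : improved."""
clean = preprocess(raw)
print(clean)
```

Removing patient information is not attempted here, because the text does not say how such information is recognized.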
Step 104, performing word segmentation on the first traditional Chinese medicine medical record data and extracting identification words related to cases and visits.
Each sentence of the first medical record data, from beginning to end, is segmented into words. Which of the words are identification words related to cases and visits can be determined from experience, and those identification words are extracted. In one example, the case-related identification word "case" and the visit-related identification word "three days later" may be extracted. The extracted identification words related to cases and visits are the words corresponding to the first medical record data.
In one example, the first traditional Chinese medicine medical record data is segmented manually and the identification words related to cases and visits are extracted manually. This simplifies the generation of the basic dictionary, overcomes the unsuitability of conventional Chinese word segmentation methods for traditional Chinese medicine medical records, and helps improve the effectiveness and accuracy of case and visit division.
Step 105, adding the extracted identification words related to cases and visits to the basic dictionary to obtain an updated basic dictionary.
In one example, the basic dictionary already contains some identification words before the update; in another example, the basic dictionary is empty before the update, that is, it contains no identification words. In this embodiment, the identification words related to cases and visits extracted from the first medical record data are added to the basic dictionary to obtain an updated basic dictionary, so the number of identification words in the basic dictionary increases.
In order to improve the accuracy of case and visit division performed with the basic dictionary, adding the extracted identification words related to cases and visits to the basic dictionary to obtain an updated basic dictionary (step 105) includes:
Step 105a, extracting features of each identification word related to cases and visits.
Features are extracted for each identification word related to cases and visits that was extracted from the first traditional Chinese medicine medical record data.
During feature extraction, a Boolean logic model is used for information filtering: a series of feature variables with binary logic is given, and the variables are extracted from the document to describe its features. Specifically, the variables may be keywords, index words, and the like; they may be information such as the consultation time and the doctor's name; and they may be the number and order of words, stop words, punctuation marks, and the like in the text. In one example, the features of an identification word may be its frequency of occurrence, its position, and/or its word length.
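The two kinds of features mentioned above can be sketched as follows: a set of binary (Boolean) variables describing how a word appears in the document, and the numeric features of frequency, position, and word length. The specific variables and the function names are assumptions chosen for illustration; the patent does not list the exact variables.

```python
def boolean_features(word, document, keywords):
    """Describe one identification word with binary (True/False) variables."""
    paragraphs = [p for p in document.split("\n") if p.strip()]
    return {
        "is_keyword": word in keywords,
        "occurs_in_document": word in document,
        "opens_a_paragraph": any(p.lstrip().startswith(word) for p in paragraphs),
        "followed_by_punctuation": any(
            (word + mark) in document for mark in (":", ",", ".")),
    }


def numeric_features(word, document):
    """Frequency, first position, and word length of an identification word."""
    return {
        "frequency": document.count(word),
        "first_position": document.find(word),
        "length": len(word),
    }


doc = "Case 1: dry mouth.\nThree days later, improved.\nCase 2: insomnia."
case_flags = boolean_features("Case", doc, {"Case"})
later_flags = boolean_features("Three days later", doc, {"Case"})
case_numbers = numeric_features("Case", doc)
print(case_flags, case_numbers)
```

The numeric features are the inputs that an evaluation function would later weigh against each other.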
Step 105b, calculating a feature value corresponding to each identification word using an evaluation function.
The evaluation function is applied to each identification word in turn to obtain the feature value corresponding to that word.
In one example, an evaluation function (F) is constructed for the identification words related to cases and visits, and a feature value (also called a weight value) is calculated for each of them. The factors influencing the feature value are a frequency weighting factor (W), a position weighting factor (A), and a word length weighting factor (L). The frequency weighting factor reflects how often an identification word occurs in a medical record text and how often it occurs across the medical record set. The position weighting factor reflects where an identification word occurs in a medical record text; words at different positions are weighted differently. The word length weighting factor is calculated from the number of characters of the identification word.
The specific calculation process is as follows:
Let k be an identification word and d a medical record text.
① $f_{kd}$ is the frequency of occurrence of the identification word k in the medical record text d, and $n_{kd}$ is the frequency, within the medical record set, of medical record texts that contain the word k. The frequency weighting factor (W) is calculated as follows:
Figure RE-GDA0002763033590000081
② The position weight is denoted $Z_k$ and takes the value 1 (the word occurs at the beginning of the first or second paragraph of the text), 0.6 (the word occurs at a non-initial position in one of the other paragraphs of the text), or 0.2 (the word occurs at a non-initial position in the last paragraph of the text). $S_k$ is the frequency of occurrence of the identification word k at the corresponding position. The position weighting factor (A) for the word is calculated as:
Figure RE-GDA0002763033590000082
③ Let L be the number of characters in the identification word k. The word length weighting factor (L) is calculated as:
Figure RE-GDA0002763033590000083
Combining the above influence factors, the evaluation function (F) is: $F(k, d) = W(k, d) \times A(k, d) \times L(k, d)$. The expanded formula is:
Figure RE-GDA0002763033590000084
Step 105c: sort the identification words related to cases and the identification words related to visits by feature value, respectively, to obtain a sequence of words related to the cases and visits of the first traditional Chinese medicine medical record.
The calculated feature values of the identification words may be sorted in descending order or in ascending order. The identification words related to cases are sorted by feature value among themselves, and the identification words related to visits are likewise sorted by feature value among themselves.
Step 105d: select a predetermined number of words from the sequence and add them to the basic dictionary to obtain an updated basic dictionary.
The predetermined number may be determined according to the number of identification words. When the total number of identification words is small, the predetermined number may be small; when the total number is large, the predetermined number may be either large or small.
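The scoring, sorting, and selection of steps 105b to 105d can be sketched as follows. This is only an illustration under stated assumptions: the formulas for W, A, and L are given in figures that are not reproduced in this text, so the sketch takes the three factor values as already computed inputs, and all numbers, words, and function names are hypothetical. Only the product F = W × A × L is taken from the description. In practice the ranking would be applied separately to the case-related and visit-related words.

```python
def evaluation_value(w, a, l):
    """F(k, d) = W(k, d) * A(k, d) * L(k, d), as stated in the description."""
    return w * a * l


def select_top_words(factors, predetermined_number):
    """Rank identification words by F in descending order and keep the first N."""
    scored = {word: evaluation_value(*wal) for word, wal in factors.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:predetermined_number]


# Hypothetical (W, A, L) factor values for three candidate identification words.
factors = {
    "case": (0.9, 1.0, 0.5),
    "second visit": (0.7, 0.6, 1.0),
    "patient": (0.4, 0.2, 0.75),
}

base_dictionary = {"medical record"}  # hypothetical existing dictionary content
base_dictionary.update(select_top_words(factors, predetermined_number=2))
print(sorted(base_dictionary))
```

Because the factors are multiplied, a word that scores very low on any one factor (for example, position) ends up near the bottom of the ranking regardless of its other factors.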
In this embodiment, the first traditional Chinese medicine medical record data is obtained and segmented into words, the identification words related to cases and visits are extracted, and the extracted identification words are added to the basic dictionary to obtain an updated basic dictionary. This increases the number of identification words related to cases and visits in the dictionary, which improves the accuracy of dividing medical record data into cases and visits based on the basic dictionary, and also improves the efficiency and quality of that division.
Fig. 3 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to another embodiment of the present application. As shown in Fig. 3, the method of this embodiment is substantially the same as the method shown in Fig. 2. The difference is that, on the basis of the embodiment shown in Fig. 2, after the first traditional Chinese medicine medical record data is segmented and the identification words related to cases and visits are extracted (step 104), and before the medical record data to be processed is obtained (step 101), the method further includes:
Step 106: obtain a first seed rule based on the identification words related to cases and visits and the first traditional Chinese medicine medical record data.
Identification words related to cases and visits are extracted from the first traditional Chinese medicine medical record data, and the position of each extracted word in the first medical record data and its relationship to the surrounding text (for example, whether it appears at the front, in the middle, or at the back) are identified, thereby forming a seed rule. For example, if "case" is an identification word and it appears at the beginning of a case in the first traditional Chinese medicine medical record data, a seed rule of the form "case: _%" can be constructed, where "case" is the identification word, followed by ":", and "%" denotes the text content corresponding to the case. Medical record data of the form "case: _%" is then treated as one case. In another example, a seed rule of the form "% visit _%" can be constructed, where "visit" is the identification word and "%" denotes text content; in medical record data of the form "% visit _%", the content before the identification word and the content after it are each the text content of one visit.
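One way to make a seed rule such as "case: _%" executable is to translate it into a regular expression. The sketch below is an assumption about how this could be done, not the patent's implementation: it treats "%" as "any text", treats "_" purely as a separator, and the function name `seed_rule_to_regex` is hypothetical.

```python
import re


def seed_rule_to_regex(rule):
    """Translate a seed rule into a regex: '%' is any text, '_' is only a separator."""
    parts = []
    for token in re.split(r"(%)", rule.replace("_", "")):
        if token == "%":
            parts.append("(.+)")  # capture the text content
        elif token.strip():
            parts.append(re.escape(token.strip()) + r"\s*")  # literal identification word
    return re.compile("".join(parts))


pattern = seed_rule_to_regex("case: _%")
match = pattern.search("case: female, 48 years old")
print(match.group(1))
print(pattern.search("In this case, qi is supplemented"))
```

The second call returns no match because the text contains "case," rather than "case:", which shows why including the separator in the rule reduces false extractions.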
Step 107: perform rule learning based on the first seed rule and second traditional Chinese medicine medical record data to obtain a first rule set.
The second traditional Chinese medicine medical record data may be the same as or different from the first traditional Chinese medicine medical record data. Rule learning uses the RS_WHISK algorithm, whose procedure is as follows:
RS_WHISK(Reservoir)    // Reservoir denotes the unlabeled training set data
    RuleSet = SeedRuleSet    // assign the seed rules to the rule set
    Initialize the training set Training
    Repeat as required by the user
        Get a new set of training instances NewInst from Reservoir (the user tags the training instances)
        Training = Training + NewInst
        Delete rules that make errors on NewInst
        For each Inst in Training
            For each Tag of Inst
                If Tag is not covered by RuleSet
                    // generate a new seed rule
                    Rule = GROW_Rule(Inst, Tag, Training)
                Else
                    // extend the existing rule set to make it more accurate
                    RuleSet = EXTEND_RuleSet(RuleSet, Inst, Tag, Training)
                End if
        Prune RuleSet    // tidy up the rule set
The experimental process is basically divided into the following steps: (1) perform RS_WHISK rule learning starting from the seed rules that were initially identified and entered manually; (2) run a text segmentation test with the learned rule set on the selected training set; (3) manually check the text segmentation results, extract the positive text examples that were not covered, and perform RS_WHISK rule learning again on these new examples; (4) run a text segmentation test on a larger training set that contains the previous training set; (5) repeat steps (3) and (4) until the target experimental results and the rule set are basically stable.
WHISK starts with two parts: a set of untagged instances (the reserve, "Reservoir") and an empty training set. In each round of interleaved tagging and rule learning, a portion of the reserve instances is selected and presented to the user for annotation. For an output frame, the user adds a tag to the portion of the instance to be extracted (which becomes a slot value of the output frame). If an output frame has multiple slots, there should be multiple tags.
Instances tagged by the user are called training instances, and all training instances together constitute the training set; this distinguishes them from reserve instances. The tags in the training instances guide WHISK in creating rules and are also used to test the performance of existing rules. If a rule can be successfully applied to an instance (that is, the rule extracts information from the instance to form an output frame), the instance is said to be covered by the rule. If the extracted information is exactly the information tagged in the instance, the extraction is correct; otherwise, it is an erroneous extraction.
As the algorithm shows, in each cycle a portion of the reserve instances is selected and tagged manually by the user, and the newly tagged instances are added to the training set to form a new training set. Some instances in the new training set may then contradict existing rules and become counterexamples to them. In that case, the rules that have counterexamples are discarded so that new rules can be generated. WHISK selects from the training set an instance that is not covered by any existing rule; this instance becomes a "seed", and a new rule is generated from it. The generated rule covers the seed, that is, the information extracted by the rule is the tagged information in the seed instance.
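The two bookkeeping operations described above (discarding rules that have counterexamples, and finding uncovered instances to serve as seeds) can be sketched as follows. This is a simplified illustration, not the RS_WHISK algorithm itself: rules are represented as plain regular expressions with one capture group, each training instance is a hypothetical (text, tagged content) pair, and the rule-growing step is omitted.

```python
import re


def extract(rule, text):
    """Apply a regex rule; return the extracted text, or None if the rule does not apply."""
    match = re.search(rule, text)
    return match.group(1) if match else None


def update_rules(rules, training):
    """Drop rules with a counterexample, then list instances no remaining rule covers."""
    kept = [
        rule for rule in rules
        if all(extract(rule, text) in (None, tag) for text, tag in training)
    ]
    seeds = [
        (text, tag) for text, tag in training
        if not any(extract(rule, text) == tag for rule in kept)
    ]
    return kept, seeds


# Hypothetical rules and user-tagged training instances (text, tagged content).
rules = [r"case:\s*(.+)", r"case(.+)"]
training = [
    ("case: male, 60 years old", "male, 60 years old"),
    ("patient: female, 48 years old", "female, 48 years old"),
]
kept, seeds = update_rules(rules, training)
print(kept)
print(seeds)
```

Here the over-general rule `case(.+)` is discarded because it extracts text that differs from the tag, and the "patient:" instance is returned as a seed from which a new rule would be grown.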
Step 108: add the first rule set to a rule set to obtain an updated rule set.
The first rule set may contain one rule, or two or more rules. The rule set before updating may already contain rules related to cases and/or visits, in which case, after the rules in the first rule set are added, the updated rule set consists of the rules in the first rule set together with the rules already present. Alternatively, the rule set before updating may contain no rules related to cases and/or visits, in which case the updated rule set consists of the rules in the first rule set.
In this embodiment, a first seed rule is obtained based on the identification words related to cases and visits and the first traditional Chinese medicine medical record data; rule learning is performed based on the first seed rule and the second traditional Chinese medicine medical record data; and the resulting first rule set is added to the rule set to obtain an updated rule set. This increases the number of rules related to cases and visits in the rule set, which improves the accuracy, efficiency, and quality of dividing medical record data into cases and visits based on the rule set. In addition, rule learning makes it possible to train a rule set specific to traditional Chinese medicine medical records, adapted to their particular characteristics.
It is understood that steps 106-108 may be located between steps 104 and 105, or between steps 105 and 101.
In order to optimize the rules in the rule set, thereby reducing the amount of computation and improving the efficiency of dividing the medical records to be processed into cases and visits, the method of the present application, on the basis of the above embodiment, further includes, after adding the first rule set to the rule set to obtain the updated rule set: optimizing the rule set by using a maximum coverage optimization method to obtain an optimized rule set.
The maximum coverage optimization method may be a bottom-up sequential covering method: each time a rule is learned on the training set, the training samples covered by that rule are removed, and the process is repeated with the remaining training samples as the training set. Starting from the more specific rules, characters are gradually deleted to enlarge the coverage until a stopping condition is met, so that the rule set covers the positive examples to the greatest extent while avoiding the negative examples, forming a globally optimized rule set.
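The sequential covering loop described above can be sketched as a greedy procedure. This is a simplified illustration under stated assumptions: the candidate rules are given in advance as hypothetical predicates rather than being generalized character by character, and the function name `sequential_cover` is not from the patent.

```python
def sequential_cover(candidate_rules, positives, negatives):
    """Greedily pick rules that cover the most remaining positives and no negatives."""
    remaining = set(positives)
    selected = []
    while remaining:
        best_rule, best_covered = None, set()
        for name, predicate in candidate_rules.items():
            if any(predicate(text) for text in negatives):
                continue  # a rule that fires on a negative example is rejected
            covered = {text for text in remaining if predicate(text)}
            if len(covered) > len(best_covered):
                best_rule, best_covered = name, covered
        if best_rule is None:
            break  # no acceptable rule covers the remaining positives
        selected.append(best_rule)
        remaining -= best_covered  # remove covered samples, then repeat
    return selected, remaining


# Hypothetical candidate rules and examples.
candidate_rules = {
    "starts with 'case:'": lambda t: t.startswith("case:"),
    "starts with 'patient:'": lambda t: t.startswith("patient:"),
    "contains 'case'": lambda t: "case" in t,
}
positives = ["case: male, 60", "case: female, 48", "patient: male, 35"]
negatives = ["in this case, qi is supplemented"]

selected, uncovered = sequential_cover(candidate_rules, positives, negatives)
print(selected)
print(uncovered)
```

The over-general rule "contains 'case'" is never selected because it also matches the negative example, while the two more specific rules together cover all positive examples.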
Fig. 4 is a schematic flow chart of a method for processing traditional Chinese medicine medical record data according to another embodiment of the present application. As shown in Fig. 4, the method of this embodiment is substantially the same as the method shown in Fig. 2. The difference is that, on the basis of the embodiment shown in Fig. 2, after the first traditional Chinese medicine medical record data is segmented and the identification words related to cases and visits are extracted (step 104), and before the medical record data to be processed is obtained (step 101), the method further includes:
Step 109: perform rule learning based on the identification words related to cases and visits and the second traditional Chinese medicine medical record data to obtain derived identification words corresponding to the identification words related to cases and visits.
For the rule learning, refer to step 107. For example, if the identification word for a case is "case", the derived identification words corresponding to it may be other words that introduce a case, such as "patient".
Step 110: add the derived identification words to the basic dictionary to obtain an updated basic dictionary.
Before updating, the basic dictionary may contain "case". After rule learning, the derived identification words corresponding to "case", such as "patient", are obtained and then added to the basic dictionary.
In one example, before the derived identification words are added to the basic dictionary, they may be displayed so that the user can select, based on experience, which derived identification words should be added. Specifically, the user may select according to the number of occurrences and the frequency of the derived identification words in existing medical records, and the identification words with a large number of occurrences and/or a high frequency are added to the basic dictionary.
In this embodiment, rule learning is performed based on the identification words related to cases and visits and the second traditional Chinese medicine medical record data to obtain the derived identification words corresponding to those identification words, and the derived identification words are added to the basic dictionary to obtain an updated basic dictionary.
It is understood that steps 109 and 110 may be located between steps 104 and 105, or between steps 105 and 101.
The processing method described in the above embodiments is explained below by way of a specific embodiment.
S1: Traditional Chinese medicine medical records of different file types are recognized by optical character recognition (OCR) to obtain the corresponding text content; the file types mainly include PDF files, Excel files, and JPG files. Fig. 5 illustrates an embodiment in which a PDF file is recognized by OCR, together with the result obtained. The recognized text is cleaned to remove invalid text, interfering text, and/or punctuation so as to avoid interference; specifically, text irrelevant to the medical record data, such as the article title, authors, affiliations, and abstract, may be removed.
S2: All the medical record data is divided into three parts: the first part is used for character string marking, the second part as a training set, and the third part as a test set.
S3: The first part of the medical record data is segmented into words; the identification words related to cases and visits in the medical record text are manually marked and extracted; and the separators between identification words are unified to form an initial basic dictionary. The positions at which the extracted identification words appear in the first part of the medical record data, and their relationships to the preceding and following text, are also marked. For example, the positions may be divided into three positional relationships (front, middle, and back), and the positional relationship between an identification word and its preceding and following text is used as a seed rule. In this embodiment, the seed rule corresponding to "case" may be "case: _%", which indicates that the identification word is "case:" and that the text after it is the content of the case. Similarly, the seed rule corresponding to a visit identification word may be "_% visit _%", which means that the identification word related to visits is "visit" and that the text before it and the text after it each correspond to the content of one visit.
S4: Rule learning is performed on the second part of the medical record data. The rule learning algorithm uses seed rules together with the maximum coverage optimization method: manually determined seed rules are trained on the training set, and the rule set is then optimized according to the maximum coverage optimization method.
The rule learning comprises two stages. (1) Pure extraction rule training: rules are trained on a manually marked rule training set and on a rule training set generated from the previously accumulated words and character strings, respectively; the rule set generated by this training is then used for segmentation feedback training. (2) Information extraction feedback training: to refine the rule set obtained in the first stage, the degree of generalization and specialization of the rule set is readjusted based on the feedback from information extraction (such as segmentation accuracy and error rate). For example, the initial vocabulary contains "case:", but after rule learning, further segmentation words such as "patient:" are derived.
In one embodiment, there are two items of raw data. ① case: female, 48 years old; chief complaint: cough, fever; prescription: ginseng 10 g, astragalus root 10 g. In this case, qi is supplemented to resist exogenous pathogens. ② cases: male, 60 years old. The chest CT results are as follows, and the medication is as follows.
In the first step, a seed rule is manually identified as "case _%": content is extracted whenever the characters of "case" are followed by any characters.
The raw data is extracted according to the seed rule, yielding two cases: "case: female, 48 years old, chief complaint" (correct extraction) and "In this case, qi is supplemented to resist exogenous pathogens." (false extraction). The second item of raw data fails to be split off as a case.
In the second step, the system passes the divided cases to a staff review interface for manual judgment. The manually corrected rule is that content is extracted only when it has the form "case:" followed by text. The system uses the RS_WHISK algorithm to learn rules from the patterns in the raw data and recommends a new rule beginning with "%:" (that is, any characters, followed by ":", a gender word, and a number). When this rule is manually judged to be correct, the set of words treated as "case" identification words is expanded accordingly.
S5: The third part of the medical record data is used to verify whether the basic dictionary and rule set generated in the above steps can accurately divide cases and visits.
The above steps are repeated, and an updated basic dictionary and an updated rule set are obtained each time.
After the above steps have been repeated a number of times, the basic dictionary contains a large number of identification words related to cases and visits, and the rule set contains a large number of rules related to cases and visits. The basic dictionary and rule set at that point are used to perform case segmentation and visit segmentation on traditional Chinese medicine medical record data imported by the user, automatically dividing the imported medical record data into a plurality of cases and dividing each case into a plurality of visits.
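The final segmentation into cases and then visits can be illustrated with a minimal dictionary-driven sketch. The dictionary entries, the convention that an identification word is followed by ":", and the function names are all assumptions for illustration; a real system would combine the learned dictionary with the learned rule set. The sketch relies on `re.split` splitting on zero-width matches, which requires Python 3.7 or later.

```python
import re

# Hypothetical dictionary entries; a real dictionary would hold the learned identification words.
CASE_WORDS = ["Case", "Patient"]
VISIT_WORDS = ["First visit", "Second visit", "Third visit"]


def split_on_words(text, words):
    """Split text into segments, each starting at an identification word followed by ':'."""
    pattern = r"(?=(?:%s):)" % "|".join(re.escape(w) for w in words)
    return [seg.strip() for seg in re.split(pattern, text) if seg.strip()]


def segment_record(text):
    """Return, for each case, its header segment followed by its visit segments."""
    return [split_on_words(case, VISIT_WORDS) for case in split_on_words(text, CASE_WORDS)]


record = (
    "Case: male, 60. First visit: cough. Second visit: improved. "
    "Patient: female, 48. First visit: fever."
)
segments = segment_record(record)
for case in segments:
    print(case)
```

The two-level split mirrors the description: the text is first cut at case identification words, and each resulting case is then cut at visit identification words.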
FIG. 6 is a schematic step diagram of another embodiment of the present application.
Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 7, the data processing apparatus includes: an acquisition module 11 and a dividing module 12, wherein,
an obtaining module 11, configured to obtain traditional Chinese medicine medical record data to be processed;
the dividing module 12 is configured to divide the to-be-processed medical record data based on a basic dictionary and a rule set, so as to obtain case content and visit content corresponding to the medical record data, where the basic dictionary includes identification words related to a medical case and a visit, and the rule set includes rules related to the medical case and the visit.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
The device of this embodiment acquires the traditional Chinese medicine medical record data to be processed and divides it based on a basic dictionary and a rule set to obtain the case content and visit content corresponding to the medical record data, where the basic dictionary includes identification words related to traditional Chinese medicine cases and visits, and the rule set includes rules related to traditional Chinese medicine cases and visits. Because the division relies not only on the basic dictionary but also on the rule set, the accuracy of dividing the medical record data into cases and visits is improved.
Fig. 8 is a schematic structural diagram of an embodiment of an electronic device of the present invention, which can implement the process of the embodiment shown in fig. 1 of the present invention, and as shown in fig. 8, the electronic device may include: the device comprises a shell 41, a processor 42, a memory 43, a circuit board 44 and a power circuit 45, wherein the circuit board 44 is arranged inside a space enclosed by the shell 41, and the processor 42 and the memory 43 are arranged on the circuit board 44; a power supply circuit 45 for supplying power to each circuit or device of the electronic apparatus; the memory 43 is used for storing executable program code; the processor 42 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 43, and is configured to execute the following steps of the method for processing medical record data of traditional Chinese medicine:
acquiring traditional Chinese medicine medical record data to be processed;
dividing the to-be-processed traditional Chinese medicine case data based on a basic dictionary and a rule set to obtain case content and diagnosis order content corresponding to the traditional Chinese medicine case data, wherein the basic dictionary comprises identification words related to the traditional Chinese medicine case and the diagnosis order, and the rule set comprises rules related to the traditional Chinese medicine case and the diagnosis order.
The specific execution process of the above steps by the processor 42 and the steps further executed by the processor 42 by running the executable program code may refer to the description of the embodiment shown in fig. 1 of the present invention, and are not described herein again.
The electronic device exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services, comprising a processor, hard disk, memory, system bus, and the like. A server is similar in architecture to a general-purpose computer but, because it must provide highly reliable services, has higher requirements for processing capacity, stability, reliability, security, scalability, manageability, and so on.
(5) Other electronic devices with data interaction functions.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
For convenience of description, the above devices are described separately in terms of functional division into various units/modules. Of course, the functionality of the units/modules may be implemented in one or more software and/or hardware implementations of the invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A data processing method of a traditional Chinese medicine medical record is characterized by comprising the following steps:
acquiring traditional Chinese medicine medical record data to be processed;
dividing the to-be-processed traditional Chinese medicine case data based on a basic dictionary and a rule set to obtain case content and diagnosis order content corresponding to the traditional Chinese medicine case data, wherein the basic dictionary comprises identification words related to traditional Chinese medicine cases and diagnosis orders, and the rule set comprises rules related to the traditional Chinese medicine cases and the diagnosis orders;
before acquiring the data of the traditional Chinese medicine medical record to be processed, the method further comprises the following steps:
acquiring first traditional Chinese medicine medical record data, wherein acquiring the first traditional Chinese medicine medical record data comprises deleting content in the data that is irrelevant to cases and diagnosis times: the medical record data is first published in an electronic journal; the electronic data is converted into text; data irrelevant to cases and diagnosis times, such as titles, abstracts, author names, and affiliations, is deleted; patient information related to the cases is deleted; and erroneous punctuation marks in the medical record data are corrected;
performing word segmentation on the first traditional Chinese medicine medical record data and extracting identification words related to cases and diagnosis times;
adding the extracted identification words related to the cases and the diagnosis times into a basic dictionary to obtain an updated basic dictionary;
adding the extracted identification words related to the cases and the diagnoses into a basic dictionary to obtain an updated basic dictionary, wherein the step of adding the extracted identification words related to the cases and the diagnoses comprises the following steps:
extracting the characteristics of the identification words related to the cases and the diagnosis times;
calculating a characteristic value corresponding to each identification word by using an evaluation function;
sequencing each identification word related to the case and the diagnosis order according to the magnitude of the characteristic value respectively to obtain a sequence of words related to the first traditional Chinese medical record and the diagnosis order;
selecting identification words corresponding to a predetermined number of characteristic values from the sequence, and adding the identification words into a basic dictionary to obtain an updated basic dictionary;
wherein, the characteristic extraction is carried out on the identification words related to the cases and the diagnosis times, and the characteristic extraction comprises the following steps:
performing feature extraction on the identification words related to the cases and the diagnosis times that were extracted from the first traditional Chinese medicine medical record data;
when the characteristic extraction is carried out, a Boolean logic model is utilized to carry out information filtering, namely a series of characteristic variables with binary logic are given, and the variables are extracted from the document and are used for describing the characteristics of the document, including keywords, index words, treatment time, doctor names, word numbers, stop words, punctuation marks, frequency or frequency, positions and/or word lengths;
wherein, the calculating the feature value corresponding to each identification word by using the evaluation function comprises:
constructing an evaluation function F for each identification word related to the case and the diagnosis, and calculating a characteristic value or weight; the influence factors influencing the characteristic values are: a frequency weighting factor W, a position weighting factor A and a word length weighting factor L; firstly, frequency weighting factors refer to the frequency of occurrence of a certain identification word in a medical case text and the frequency of occurrence of the identification word in a medical case set; position weighting factor refers to the position of a certain identification word in a medical case text, and words in different positions are weighted; the word length weighting factor is calculated by the number of words of the identified word;
the specific calculation process is as follows:
k is an identification word and d is a medical case text;
(1) f_kd is the frequency of occurrence of the identification word k in the medical case text d, and n_kd is the frequency of occurrence, in the medical case set, of medical case texts containing the word k; the frequency weighting factor W is calculated as follows:
Figure FDA0003297721330000021
(2) the position weight is denoted Z_k, with value 1 for a word appearing at the beginning of the first or second natural segment of the text, 0.6 for a word appearing at a non-beginning position of the other natural segments of the text, and 0.2 for a word appearing at a non-beginning position of the last natural segment of the text; S_k is the frequency of occurrence of the identification word k at the corresponding position; the position weighting factor (A) for that word is calculated as:
Figure FDA0003297721330000022
(3) the number of words of the identification word k is denoted L, and the word length weighting factor (L) is calculated as:
Figure FDA0003297721330000023
the formula of the evaluation function (F) is: F(k, d) = W(k, d) × A(k, d) × L(k, d), and the expanded formula is:
Figure FDA0003297721330000024
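For illustration only, the product structure F(k, d) = W(k, d) × A(k, d) × L(k, d) and the subsequent selection of the highest-scoring identification words can be sketched as below. The exact formulas for W, A and L are given in the figures referenced above and are not reproduced in this text, so the three factor functions here are hypothetical stand-ins (a TF-IDF-style frequency term, a sum of Z_k × S_k over positions, and a logarithmic length term). The names `n_docs`, `position_counts`, `select_top_words` and `update_dictionary` are likewise assumptions introduced for the sketch; only the position weights 1, 0.6 and 0.2 and the product form come from the claim.

```python
import math
from typing import Dict, List, Set

# Position weights Z_k as stated in the claim; the key names are hypothetical labels.
POSITION_WEIGHTS = {"opening": 1.0, "middle": 0.6, "closing": 0.2}


def frequency_factor(f_kd: int, n_kd: int, n_docs: int) -> float:
    """Stand-in for W(k, d): a TF-IDF-style term (assumption, not the patent's formula)."""
    return f_kd * math.log(1 + n_docs / n_kd)


def position_factor(position_counts: Dict[str, int]) -> float:
    """Stand-in for A(k, d): sum of Z_k * S_k over the positions where the word occurs."""
    return sum(POSITION_WEIGHTS[pos] * s_k for pos, s_k in position_counts.items())


def length_factor(word: str) -> float:
    """Stand-in for L(k, d): grows slowly with word length in characters (assumption)."""
    return math.log2(1 + len(word))


def evaluate(word: str, f_kd: int, n_kd: int, n_docs: int,
             position_counts: Dict[str, int]) -> float:
    """F(k, d) = W(k, d) * A(k, d) * L(k, d), the product form given in the claim."""
    return (frequency_factor(f_kd, n_kd, n_docs)
            * position_factor(position_counts)
            * length_factor(word))


def select_top_words(scores: Dict[str, float], count: int) -> List[str]:
    """Sort identification words by feature value and keep the first `count`."""
    return sorted(scores, key=scores.get, reverse=True)[:count]


def update_dictionary(basic_dictionary: Set[str], scores: Dict[str, float],
                      count: int) -> Set[str]:
    """Return the basic dictionary extended with the top-scoring identification words."""
    return basic_dictionary | set(select_top_words(scores, count))
```

Under these assumptions, a word that is frequent in one text, rare across the set, and placed at the opening of a text receives the largest feature value, which matches the intent described for the three weighting factors.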
2. The processing method of claim 1, wherein before acquiring the medical record data to be processed, and after segmenting the first medical record data and extracting the identification words related to cases and diagnosis times, the method further comprises:
obtaining a first seed rule based on the identification words related to the cases and the diagnosis times and the first traditional Chinese medicine medical record data;
performing rule learning based on the first seed rule and the second traditional Chinese medicine medical record data to obtain a first rule set;
and adding the first rule set into a rule set to obtain an updated rule set.
3. The processing method of claim 2, wherein after adding the first rule set to a rule set to obtain an updated rule set, the method further comprises:
and optimizing the rule set by using a maximum coverage optimization method to obtain an optimized rule set.
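The claim does not specify how the maximum coverage optimization is carried out. One common approach is the greedy approximation for the maximum coverage problem, sketched below as an assumption rather than as the claimed method: each rule is represented by the set of record positions it matches, and rules are picked one at a time by how many not-yet-covered positions they add. The names `rule_matches` and `budget` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_max_coverage(rule_matches: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` rules, each time taking the rule with the largest new coverage."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(rule_matches)  # copy so the caller's mapping is not modified
    while remaining and len(chosen) < budget:
        # Rule whose matches add the most uncovered positions (ties: first inserted).
        best = max(remaining, key=lambda rule: len(remaining[rule] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining rule is redundant with the chosen ones
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen
```

A rule whose matches are already covered by previously chosen rules is never selected, so the optimized rule set drops redundant rules while keeping the coverage of the original set as far as the budget allows.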
4. The processing method of claim 1, wherein before acquiring the medical record data to be processed, and after segmenting the first medical record data and extracting the identification words related to cases and diagnosis times, the method further comprises:
performing rule learning based on the identification words related to the cases and the times of diagnosis and the second traditional Chinese medicine medical record data to obtain derivative identification words corresponding to the identification words related to the cases and the times of diagnosis;
and adding the derived identification words into the basic dictionary to obtain an updated basic dictionary.
5. The processing method according to any one of claims 1 to 4, wherein dividing the traditional Chinese medicine medical record data to be processed based on a basic dictionary and a rule set comprises:
based on a basic dictionary, carrying out first division on the to-be-processed traditional Chinese medicine medical record data to obtain more than two data sections corresponding to the to-be-processed traditional Chinese medicine medical record data;
and performing case division and diagnosis number division on the traditional Chinese medicine medical record data corresponding to the more than two data sections based on a rule set.
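The two-stage division in claim 5 can be pictured with a minimal sketch, under the following assumptions: the basic dictionary is a short list of hypothetical identification words ("病案" for a case heading; "初诊", "二诊", "三诊" for the first, second and third visit), and the rule set is reduced to two hypothetical regular expressions. A real dictionary and rule set would be the learned ones described in the preceding claims.

```python
import re
from typing import Dict, List

# Hypothetical basic dictionary of identification words.
BASIC_DICTIONARY = ["病案", "初诊", "二诊", "三诊"]

# Hypothetical rule set: one rule marks a case heading, one marks a visit record.
CASE_RULE = re.compile(r"^病案")
VISIT_RULE = re.compile(r"^(初诊|[二三]诊)")


def first_division(text: str, dictionary: List[str]) -> List[str]:
    """Stage 1: cut the text into data sections in front of every dictionary word."""
    pattern = "(?=" + "|".join(re.escape(word) for word in dictionary) + ")"
    return [section for section in re.split(pattern, text) if section]


def divide_cases(sections: List[str]) -> List[Dict[str, List[str]]]:
    """Stage 2: group data sections into cases and, within each case, into visits."""
    cases: List[Dict[str, List[str]]] = []
    for section in sections:
        if CASE_RULE.match(section) or not cases:
            cases.append({"header": [], "visits": []})
        if VISIT_RULE.match(section):
            cases[-1]["visits"].append(section)
        else:
            cases[-1]["header"].append(section)
    return cases


record = "病案一 张某,男。初诊:头痛。二诊:头痛减轻。病案二 李某,女。初诊:咳嗽。"
cases = divide_cases(first_division(record, BASIC_DICTIONARY))
```

Splitting on a zero-width lookahead keeps each identification word at the start of its own data section, so the rules in the second stage only need to inspect the beginning of each section.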
6. A traditional Chinese medicine medical record data processing device, characterized in that the device is used for implementing the traditional Chinese medicine medical record data processing method of claim 1.
7. An electronic device, characterized in that the electronic device comprises: a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged in a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is used for supplying power to each circuit or device of the electronic device; the memory is used for storing executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the traditional Chinese medicine medical record data processing method of claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700305.9A CN112131862B (en) 2020-07-20 2020-07-20 Traditional Chinese medicine medical record data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112131862A CN112131862A (en) 2020-12-25
CN112131862B true CN112131862B (en) 2021-12-03

Family

ID=73850564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010700305.9A Active CN112131862B (en) 2020-07-20 2020-07-20 Traditional Chinese medicine medical record data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112131862B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844723A (en) * 2017-02-10 2017-06-13 厦门大学 medical knowledge base construction method based on question answering system
CN107229609A (en) * 2016-03-25 2017-10-03 佳能株式会社 Method and apparatus for splitting text
CN108257676A (en) * 2016-12-28 2018-07-06 北京搜狗科技发展有限公司 A kind of processing method, device and the equipment of case information
CN108389606A (en) * 2018-05-08 2018-08-10 灵玖中科软件(北京)有限公司 A kind of the data quality control system and its control method of electronic medical record homepage
CN109215797A (en) * 2018-09-05 2019-01-15 山东管理学院 Chinese medicine case non-categorical Relation extraction method and system based on extension correlation rule
CN110162784A (en) * 2019-04-19 2019-08-23 平安科技(深圳)有限公司 Entity recognition method, device, equipment and the storage medium of Chinese case history

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170214701A1 (en) * 2016-01-24 2017-07-27 Syed Kamran Hasan Computer security based on artificial intelligence
CN108549639A (en) * 2018-04-20 2018-09-18 山东管理学院 Based on the modified Chinese medicine case name recognition methods of multiple features template and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
中医医案数据挖掘技术研究;张煜斌;《中国优秀硕士学位论文全文数据库信息科技辑》;20091215(第12期);第I138-542页 *

Also Published As

Publication number Publication date
CN112131862A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN109670179B (en) Medical record text named entity identification method based on iterative expansion convolutional neural network
CN111414393B (en) Semantic similar case retrieval method and equipment based on medical knowledge graph
CN110377755A (en) Reasonable medication knowledge map construction method based on medicine specification
CN109522546A (en) Entity recognition method is named based on context-sensitive medicine
CN109920540A (en) Construction method, device and the computer equipment of assisting in diagnosis and treatment decision system
JP7464800B2 (en) METHOD AND SYSTEM FOR RECOGNITION OF MEDICAL EVENTS UNDER SMALL SAMPLE WEAKLY LABELING CONDITIONS - Patent application
Wang et al. A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records
CN110427486B (en) Body condition text classification method, device and equipment
CN111048167A (en) Hierarchical case structuring method and system
CN112885478B (en) Medical document retrieval method, medical document retrieval device, electronic device and storage medium
CN113111162A (en) Department recommendation method and device, electronic equipment and storage medium
CN116910172B (en) Follow-up table generation method and system based on artificial intelligence
CN112541066A (en) Text-structured-based medical and technical report detection method and related equipment
Adduru et al. Towards Dataset Creation And Establishing Baselines for Sentence-level Neural Clinical Paraphrase Generation and Simplification.
CN113658720A (en) Method, apparatus, electronic device and storage medium for matching diagnostic name and ICD code
CN112071431B (en) Clinical path automatic generation method and system based on deep learning and knowledge graph
Wang et al. Research on named entity recognition of doctor-patient question answering community based on bilstm-crf model
CN112131862B (en) Traditional Chinese medicine medical record data processing method and device and electronic equipment
CN112635072A (en) ICU (intensive care unit) similar case retrieval method and system based on similarity calculation and storage medium
CN116578704A (en) Text emotion classification method, device, equipment and computer readable medium
CN116911300A (en) Language model pre-training method, entity recognition method and device
CN111178047A (en) Ancient medical record prescription extraction method based on hierarchical sequence labeling
CN115631823A (en) Similar case recommendation method and system
CN114927180A (en) Medical record structuring method and device and storage medium
CN114722825A (en) Label generation method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant