CN111797629A - Medical text data processing method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111797629A
Authority
CN
China
Prior art keywords
recognition model
labeling result
model
character
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010583894.7A
Other languages
Chinese (zh)
Other versions
CN111797629B (en
Inventor
许水琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Medical and Healthcare Management Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd filed Critical Ping An Medical and Healthcare Management Co Ltd
Priority to CN202010583894.7A priority Critical patent/CN111797629B/en
Publication of CN111797629A publication Critical patent/CN111797629A/en
Application granted granted Critical
Publication of CN111797629B publication Critical patent/CN111797629B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a method and a device for processing medical text data, computer equipment and a storage medium. The method comprises the following steps: acquiring medical text data; inputting the data into a first recognition model, a second recognition model and a third recognition model respectively; predicting, through the first recognition model, the second recognition model and the third recognition model respectively, a first labeling result, a second labeling result and a third labeling result corresponding to each character in the medical text data; judging whether the first labeling result, the second labeling result and the third labeling result are the same; when the labeling results are the same, determining the first labeling result as the labeling result corresponding to the character; and extracting the named entities in the medical text data according to the labeling result and performing payment measurement and calculation processing. By requiring the predictions of multiple models to be consistent, the application improves the accuracy of model prediction and therefore the accuracy of named entity recognition. The scheme can be applied in the field of smart healthcare, thereby promoting the construction of smart cities.

Description

Medical text data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for processing medical text data, a computer device, and a storage medium.
Background
The traditional payment measurement and calculation process mainly comprises the following steps: manually collecting historical data, namely the first-page information of inpatient medical records and the expense details from the different medical institutions in the implementation area over the last three years; manually storing the data in an Excel table; manually analyzing, screening and reprocessing the Excel data; and manually screening the payment data to calculate the related index data, predicting future payment standards and generating the corresponding measurement and calculation results. This conventional approach has a number of drawbacks: 1. the procedure is complicated and the results lag considerably; 2. it occupies manpower and material resources; 3. manual operation is error-prone, and different people use different, non-uniform calculation methods, so the measurement and calculation results are inaccurate; 4. the work is hard to reuse, resulting in a large amount of repetitive labor.
Automated payment measurement and calculation tools, such as tools based on DRG payment, are therefore emerging. A DRG-based payment measurement and calculation tool needs to accurately identify the named entities included in the medical text data, such as hospital names, regions and departments; at present the recognition accuracy is low, which hinders payment measurement and calculation.
Disclosure of Invention
The present application mainly aims to provide a method, an apparatus, a computer device, and a storage medium for processing medical text data, and aims to overcome the defect that named entities included in medical text data cannot be accurately identified at present.
In order to achieve the above object, the present application provides a method for processing medical text data, comprising the following steps:
acquiring medical text data;
inputting the medical text data into a first recognition model, a second recognition model and a third recognition model respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; wherein the label with the highest first probability is used as the first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as the second labeling result of the character predicted by the second recognition model, and the label with the highest third probability is used as the third labeling result of the character predicted by the third recognition model;
respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same;
if they are the same, determining the first labeling result as the labeling result corresponding to the character;
and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
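The agreement rule in the steps above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the application: it assumes each model has already produced, for every character, a table mapping each label to its probability, and the names `top_label` and `consensus_labels` are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed input format: one table per character, mapping label -> probability.
CharProbs = Dict[str, float]


def top_label(probs: CharProbs) -> str:
    """Return the label with the highest predicted probability."""
    return max(probs, key=probs.get)


def consensus_labels(
    first: List[CharProbs],
    second: List[CharProbs],
    third: List[CharProbs],
) -> List[Optional[str]]:
    """Label a character only when all three models predict the same label.

    None marks a character on which the models disagree, so that a
    separate rule can decide it.
    """
    results: List[Optional[str]] = []
    for p1, p2, p3 in zip(first, second, third):
        l1, l2, l3 = top_label(p1), top_label(p2), top_label(p3)
        results.append(l1 if l1 == l2 == l3 else None)
    return results
```

Characters returned as `None` are exactly those for which the three labeling results are not all the same.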
Further, after the step of respectively determining whether the first labeling result, the second labeling result, and the third labeling result corresponding to each character are the same, the method includes:
if they are not the same, calculating the total probability that the character is predicted as the third labeling result according to the first probability with which the first recognition model predicts the character as the third labeling result, the second probability with which the second recognition model predicts the character as the third labeling result, the third probability with which the third recognition model predicts the character as the third labeling result, and preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model;
judging whether the total probability is greater than a threshold value, if so, taking the third labeling result as a labeling result corresponding to the character;
and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
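One way to read the disagreement branch above is as a weighted vote on the third labeling result. The text does not give the exact formula for the total probability, so the weighted sum below is an assumption, as are the function name `weighted_fallback` and the example numbers.

```python
from typing import Dict, Optional, Sequence


def weighted_fallback(
    probs: Sequence[Dict[str, float]],
    weights: Sequence[float],
    threshold: float,
) -> Optional[str]:
    """Decide one character on which the three models disagree.

    probs holds the label -> probability tables of the first, second and
    third models for that character. The total probability is ASSUMED to
    be the weighted sum of the probabilities each model assigns to the
    third model's top label.
    """
    third_label = max(probs[2], key=probs[2].get)
    total = sum(w * p.get(third_label, 0.0) for w, p in zip(weights, probs))
    # Accept the third labeling result only above the threshold.
    return third_label if total > threshold else None


example = [
    {"B": 0.4, "I": 0.6},  # first model prefers I
    {"B": 0.7, "I": 0.3},  # second model prefers B
    {"B": 0.8, "I": 0.2},  # third model prefers B
]
decision = weighted_fallback(example, weights=(0.3, 0.3, 0.4), threshold=0.5)
```

With these illustrative numbers the total for label B is 0.3 × 0.4 + 0.3 × 0.7 + 0.4 × 0.8 = 0.65, which exceeds the threshold of 0.5, so the third labeling result is kept.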
Further, before the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method comprises:
sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
and calculating the ratio of the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model, and determining the preset weight corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
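The weight determination described above can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: accuracy is counted per character, and "according to the ratio" is read as normalizing the three accuracies so that the weights sum to 1. The function names are hypothetical.

```python
from typing import List, Sequence


def accuracy(
    predicted: Sequence[Sequence[str]],
    gold: Sequence[Sequence[str]],
) -> float:
    """Fraction of characters whose predicted label equals the correct one."""
    total = 0
    correct = 0
    for pred_seq, gold_seq in zip(predicted, gold):
        for p, g in zip(pred_seq, gold_seq):
            total += 1
            correct += int(p == g)
    return correct / total if total else 0.0


def weights_from_accuracies(accuracies: Sequence[float]) -> List[float]:
    """Turn the accuracy ratio into weights that sum to 1 (assumed reading)."""
    s = sum(accuracies)
    if s == 0:
        # No model got anything right: fall back to equal weights.
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / s for a in accuracies]
```

Under this reading, a model that is twice as accurate on the medical field data set as another receives twice its weight.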
Further, before the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method comprises:
training a BiLSTM-CRF model based on the public data set to obtain the first recognition model, training the BiLSTM-CRF model based on the medical field data set to obtain the second recognition model, and training the BiLSTM-CRF model based on the public data set and the medical field data set to obtain the third recognition model;
randomly selecting two models from the first recognition model, the second recognition model and the third recognition model, and sequentially selecting one piece of unlabeled target data from an unlabeled data set and inputting it into the two selected models for prediction to obtain the predicted labeling results of the two models;
and if the predicted labeling results of the two models are the same, adding the predicted labeling result to the unlabeled target data, and inputting the target data into the remaining, unselected model for iterative training.
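The pseudo-labeling scheme above might look like the following sketch. The `Tagger` interface (`predict`, `train_on`) is hypothetical and stands in for whatever the trained BiLSTM-CRF models expose; choosing the two models once per round rather than once per sample is also an assumption, since the text does not say which is intended.

```python
import random
from typing import Iterable, List, Protocol, Sequence


class Tagger(Protocol):
    """Hypothetical model interface, not an API named in the source."""

    def predict(self, text: str) -> List[str]: ...

    def train_on(self, text: str, labels: List[str]) -> None: ...


def pseudo_label_round(
    models: Sequence[Tagger],
    unlabeled: Iterable[str],
    rng: random.Random,
) -> int:
    """Run one round; return how many samples were pseudo-labeled."""
    if len(models) != 3:
        raise ValueError("expected exactly three models")
    order = [0, 1, 2]
    rng.shuffle(order)  # the first two indices are the randomly selected pair
    first, second, held_out = (models[i] for i in order)
    used = 0
    for text in unlabeled:
        labels = first.predict(text)
        # Only samples on which the selected pair agrees are trusted.
        if labels == second.predict(text):
            held_out.train_on(text, labels)
            used += 1
    return used
```

Repeating the round with a fresh random choice lets each model, in turn, learn from samples on which the other two agree.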
Further, before the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method comprises:
acquiring a preset target text; wherein the target text is text data of a medical field;
adding each sample in the public data set into the target text respectively to generate a corresponding public data training text, and inputting all the generated public data training texts into the BiLSTM-CRF model in sequence for training to obtain the first recognition model;
adding each sample in the medical field data set into the target text respectively to generate a corresponding medical data training text, and inputting all the generated medical data training texts into the BiLSTM-CRF model in sequence for training to obtain the second recognition model;
and iteratively selecting one sample from the public data set and one sample from the medical field data set, adding the two samples to the target text together to generate a corresponding target data training text, and inputting all the generated target data training texts into the BiLSTM-CRF model in sequence for training to obtain the third recognition model.
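The construction of the three kinds of training text can be pictured with the sketch below. The text does not specify how a sample is "added" to the target text, so appending is an assumption, as are the representation of a text as (character, label) pairs, the pairing of samples by position, and the function names.

```python
from typing import List, Tuple

# Assumed representation: a labeled text is a list of (character, label) pairs.
Labeled = List[Tuple[str, str]]


def build_single_source_texts(
    target: Labeled, samples: List[Labeled]
) -> List[Labeled]:
    """One training text per sample: the target text with the sample appended."""
    return [target + sample for sample in samples]


def build_mixed_texts(
    target: Labeled, public: List[Labeled], medical: List[Labeled]
) -> List[Labeled]:
    """One training text per pair of a public sample and a medical sample."""
    return [target + p + m for p, m in zip(public, medical)]
```

Texts from `build_single_source_texts` would feed the first or second recognition model depending on the data set passed in, and texts from `build_mixed_texts` would feed the third.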
The present application also provides a processing apparatus for medical text data, including:
a first acquisition unit configured to acquire medical text data;
a first input unit configured to input the medical text data into a first recognition model, a second recognition model and a third recognition model respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
a prediction unit configured to predict a first probability that each character in the medical text data corresponds to each label through the first recognition model; predict a second probability that each character in the medical text data corresponds to each label through the second recognition model; and predict a third probability that each character in the medical text data corresponds to each label through the third recognition model; wherein the label with the highest first probability is used as the first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as the second labeling result of the character predicted by the second recognition model, and the label with the highest third probability is used as the third labeling result of the character predicted by the third recognition model;
the judging unit is used for respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same or not;
a first determining unit, configured to determine the first labeling result as a labeling result corresponding to the character if the first labeling result, the second labeling result, and the third labeling result are the same;
and the first processing unit is used for extracting the named entity in the medical text data according to the labeling result and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
Further, the processing device of the medical text data further comprises:
a first calculation unit configured to, if the labeling results are not the same, calculate the total probability that the character is predicted as the third labeling result according to the first probability with which the first recognition model predicts the character as the third labeling result, the second probability with which the second recognition model predicts the character as the third labeling result, the third probability with which the third recognition model predicts the character as the third labeling result, and preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model;
a second determining unit, configured to determine whether the total probability is greater than a threshold, and if so, take the third labeling result as a labeling result corresponding to the character;
and the second processing unit is used for extracting the named entity in the medical text data according to the labeling result and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
Further, the processing device of the medical text data further comprises:
the second input unit is used for sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
the second calculation unit is used for respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
and the third calculating unit is used for calculating the ratio of the accuracy rates of the prediction results of the first recognition model, the second recognition model and the third recognition model and determining the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
Further, the processing device of medical text data further comprises:
a pre-training unit configured to train a BiLSTM-CRF model based on the public data set to obtain the first recognition model, train the BiLSTM-CRF model based on the medical field data set to obtain the second recognition model, and train the BiLSTM-CRF model based on the public data set and the medical field data set to obtain the third recognition model;
a selection unit configured to randomly select two models from the first recognition model, the second recognition model and the third recognition model, and sequentially select one piece of unlabeled target data from an unlabeled data set and input it into the two selected models for prediction to obtain the predicted labeling results of the two models;
and an iterative training unit configured to, if the predicted labeling results of the two models are the same, add the predicted labeling result to the unlabeled target data and input the target data into the remaining, unselected model for iterative training.
Further, the processing device of medical text data further comprises:
the second acquisition unit is used for acquiring a preset target text; wherein the target text is text data of a medical field;
a first training unit configured to add each sample in the public data set into the target text respectively to generate a corresponding public data training text, and input all the generated public data training texts into the BiLSTM-CRF model in sequence for training to obtain the first recognition model;
a second training unit configured to add each sample in the medical field data set into the target text respectively to generate a corresponding medical data training text, and input all the generated medical data training texts into the BiLSTM-CRF model in sequence for training to obtain the second recognition model;
and a third training unit configured to iteratively select one sample from the public data set and one sample from the medical field data set, add the two samples to the target text together to generate a corresponding target data training text, and input all the generated target data training texts into the BiLSTM-CRF model in sequence for training to obtain the third recognition model.
The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the above.
The application provides a medical text data processing method, a medical text data processing device, computer equipment and a storage medium, wherein the medical text data processing method comprises the following steps: acquiring medical text data; inputting the medical text data into a first recognition model, a second recognition model and a third recognition model respectively, the three models being trained on different training samples; predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same; when the labeling results are the same, determining the first labeling result as the labeling result corresponding to the character; and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing. By requiring the predictions of the plurality of models to be consistent, the application improves the accuracy of model prediction and thus the accuracy of named entity recognition, so that the payment measuring and calculating tool can measure and calculate accurately.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a method for processing medical text data according to an embodiment of the present application;
fig. 2 is a block diagram of a processing apparatus for medical text data according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for processing medical text data, including the following steps:
step S1, acquiring medical text data;
step S2, inputting the medical text data into a first recognition model, a second recognition model, and a third recognition model, respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
step S3, predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; wherein the label with the highest first probability is used as the first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as the second labeling result of the character predicted by the second recognition model, the label with the highest third probability is used as the third labeling result of the character predicted by the third recognition model, and the labels are B, I, E, O and S;
step S4, respectively determining whether the first labeling result, the second labeling result, and the third labeling result corresponding to each character are the same;
step S5, if they are the same, determining the first labeling result as the labeling result corresponding to the character;
and step S6, extracting the named entity in the medical text data according to the labeling result, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
In this embodiment, the method is applied to the smart healthcare scenario of a smart city, so as to promote the construction of the smart city. In particular, the method can be applied to the medical informatization scenario of digital healthcare. In the data acquisition stage of payment measurement and calculation in current medical scenarios, the acquired data is usually the medical text data of each medical institution, typically the first-page information of medical records and the expense details; this data contains a large amount of named entity information, such as medical institution names, department names, names of attending physicians, locations of medical institutions and the drug names in the expenses. When the DRG payment measurement and calculation tool performs payment measurement and calculation based on the medical text data, each named entity in the medical text data needs to be identified for classification processing. Therefore, after the medical text data is input into the system, named entity recognition is performed first.
Specifically, as described in the above step S1, the medical text data may be obtained from electronic systems of various medical institutions, and is a text file in which a large amount of medical information is recorded.
As described in step S2, three usable recognition models, namely the first recognition model, the second recognition model and the third recognition model, are trained in advance; all three are obtained by training a BiLSTM-CRF model and differ only in the training samples used, and because the training samples differ, the prediction results of the resulting recognition models also differ.
The public data set is a large, publicly available collection of data with named entity labels; its data volume is large, its sources are wide and it is easy to acquire. The first recognition model, obtained by training the BiLSTM-CRF model on the public data set, is therefore more robust because of the larger volume of training samples.
Because the same word can have different meanings in different fields, named entities need to be labeled specifically for each field to obtain training samples. The medical field data set is a data set whose named entities have been labeled specifically for the medical field; it is highly domain-specific but small in volume. The second recognition model, obtained by training the BiLSTM-CRF model on the medical field data set, therefore has strong domain-specific capability for named entity recognition in the medical field but is less robust.
The third recognition model is obtained by training the BiLSTM-CRF model on both the public data set and the medical field data set. Because its training samples come from both data sets, the third recognition model is both robust and strong in domain-specific recognition, which improves the generalization capability of the model.
In the present embodiment, the medical text data is input into the first recognition model, the second recognition model and the third recognition model respectively for result prediction. It can be understood that the result predicted by each of the three recognition models is the probability that each character in the medical text data corresponds to each label, and the label with the highest probability is taken as the label of the character; the labels are B, I, E, O and S, where B represents the beginning of an entity, I the inside of an entity, E the end of an entity, O a non-entity character and S a single-character entity. For example, if a medical text reads "cef 25 yuan per box", its characters may be labeled in sequence as head-B, spore-I, 2-I, 5-I, yuan-E, per-O, box-O, where "head" and "spore" are literal glosses of the two characters of the drug name; the characters from label B to label E are then combined into a whole, which is the named entity extracted from the text. Single-character named entities are uncommon in the medical field, so a single-character entity labeled S may be left unextracted in this embodiment.
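Combining the characters between a B label and the following E label can be sketched as below. This is an illustrative helper with a hypothetical name; following the embodiment, characters labeled S are not extracted, and dropping a span that is interrupted before its E label is an added assumption.

```python
from typing import List


def extract_entities(chars: str, labels: List[str]) -> List[str]:
    """Collect spans that open with B, continue with I and close with E."""
    entities: List[str] = []
    current: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B":
            current = [ch]  # open a new span
        elif label == "I" and current:
            current.append(ch)
        elif label == "E" and current:
            current.append(ch)
            entities.append("".join(current))
            current = []
        else:
            # O, S, or an I/E without an opening B: discard any open span.
            current = []
    return entities
```

For a seven-character text labeled B, I, I, I, E, O, O, as in the example above, the helper returns a single entity made of the first five characters.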
The first recognition model, the second recognition model and the third recognition model all integrate the same word embedding model, for example the widely used word2vec model, so as to construct word vectors for the medical text data.
As described in step S3, the first recognition model, the second recognition model and the third recognition model are different from each other and have different attention dimensions in the medical text data, so that the predicted results may be different from each other.
As described in the foregoing steps S4-S5, whether the first annotation result, the second annotation result, and the third annotation result corresponding to each character are the same or not is respectively determined, if the predicted results are consistent, the predicted result is determined to be correct, and any one of the first annotation result, the second annotation result, and the third annotation result is taken as the annotation result corresponding to the character; if the prediction results are different, the prediction results have deviation, and the accuracy is not high.
In this embodiment, the three recognition models each predict a result, and the voting-consistency principle is used to express the confidence of the prediction result. This improves the reliability of the model prediction result, improves the recognition of named entities in the text data, and improves the generalization capability of model recognition.
Finally, as described in step S6, the named entities in the medical text data can be extracted according to the labeling result; further, the extracted named entities are classified and input into the corresponding areas of a payment calculation tool for subsequent processing. The named entity extraction method of this embodiment improves extraction accuracy and thereby facilitates the statistics of the subsequent payment calculation. Specifically, in this embodiment, the payment calculation processing based on DRG payment includes:
importing the named entities in the medical text data; creating a code matching task to perform code matching; if code matching succeeds, creating a quality control task to perform quality control; if quality control succeeds, creating a grouping task to perform grouping; if grouping succeeds, creating a cutting task to perform cutting; if cutting succeeds, creating a calculation task to perform the payment calculation; and if the calculation succeeds, creating a simulation task to perform simulation. The DRG-payment-based payment calculation tool provides users with a fast and intelligent calculation service, and the system mainly pursues the following aims: simplicity, adaptability, and scalability. In practical application, a user only needs to import the relevant data and click the buttons for the above processes; the processes then flow automatically, and code matching, quality control, grouping, calculation and analysis are completed automatically, which is simple and convenient and avoids a large amount of repetitive labor.
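The chain of tasks described above, in which each task is created only if the previous one succeeded, can be sketched as follows. This is an illustrative sketch only: `run_tasks`, `always_ok` and `STAGES` are hypothetical names, and the stage bodies are placeholders, since the application does not disclose how the payment calculation tool implements each task.

```python
from typing import Callable, Dict, List, Tuple

Task = Callable[[Dict], bool]  # a task returns True on success


def run_tasks(data: Dict, tasks: List[Tuple[str, Task]]) -> List[str]:
    """Run the tasks in order and stop at the first one that fails."""
    completed: List[str] = []
    for name, task in tasks:
        if not task(data):
            break  # later tasks are never created
        completed.append(name)
    return completed


def always_ok(data: Dict) -> bool:
    """Placeholder task body that always reports success."""
    return True


STAGE_NAMES = ["code matching", "quality control", "grouping",
               "cutting", "calculation", "simulation"]
STAGES: List[Tuple[str, Task]] = [(name, always_ok) for name in STAGE_NAMES]

print(run_tasks({"entities": []}, STAGES))
```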
In an embodiment, the named entity, the first recognition model, the second recognition model, and the third recognition model extracted from the medical text data may be stored in a block chain. The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In an embodiment, when the first annotation result, the second annotation result, and the third annotation result are different, it may be that some models predict inaccurately, and the other models predict accurately; therefore, when the first annotation result, the second annotation result, and the third annotation result are different, the annotation result corresponding to the character can be further determined as follows.
After the step S4 of determining whether the first labeling result, the second labeling result, and the third labeling result corresponding to each character are the same, the method includes:
step S5a, if they are not the same, calculating the total probability that the character is predicted as the third labeling result according to the first probability with which the first recognition model predicts the character as the third labeling result, the second probability with which the second recognition model predicts the character as the third labeling result, the third probability with which the third recognition model predicts the character as the third labeling result, and the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model;
step S5b, judging whether the total probability is greater than a threshold value, if so, taking the third labeling result as a labeling result corresponding to the character;
and S5c, extracting the named entity in the medical text data according to the labeling result, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
In this embodiment, since the training samples adopted by the third recognition model are the public data set and the medical field data set, the accuracy of the model recognition can be correspondingly improved, and the accuracy of the prediction result of the third recognition model in the three recognition models is the highest. Therefore, the third labeling result predicted by the third recognition model can be used as a to-be-selected labeling result, and the other two recognition models also have probabilities of correspondingly predicting the character as the third labeling result; therefore, the probabilities that the characters are predicted to be the third labeling results by the three recognition models can be weighted to obtain the total probability that the characters are the third labeling results predicted by the three recognition models. It is understood that the preset weight used in the above-mentioned weighting calculation is preset during the model training.
After the total probability that the character is the third labeling result has been obtained, whether the total probability is greater than a threshold is judged. If it is greater than the threshold, the confidence is high, so the third labeling result can be used as the labeling result corresponding to the character. If it is not greater than the threshold, the confidence is low; in this case, the label ranked second in probability among the prediction results of the third recognition model can be selected as the candidate labeling result, and the step of calculating the total probability is performed again to obtain the labeling result corresponding to the character.
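The consistency check and the weighted fallback of steps S5a and S5b can be sketched in Python as follows. The names `resolve_label` and `argmax_label`, the probability values, and the threshold of 0.5 are assumptions made for illustration; the application does not fix a threshold. The sketch also assumes that, when the second-ranked candidate fails as well, the remaining labels of the third model are tried in descending order of probability, which goes slightly beyond what the text states.

```python
from typing import Dict, Optional, Sequence


def argmax_label(probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def resolve_label(p1: Dict[str, float], p2: Dict[str, float], p3: Dict[str, float],
                  weights: Sequence[float], threshold: float) -> Optional[str]:
    """Pick the label of one character from the three models' probabilities."""
    first, second, third = argmax_label(p1), argmax_label(p2), argmax_label(p3)
    if first == second == third:
        return first  # all three models agree
    # Disagreement: test the third model's labels, most probable first.
    for candidate in sorted(p3, key=p3.get, reverse=True):
        total = (weights[0] * p1[candidate]
                 + weights[1] * p2[candidate]
                 + weights[2] * p3[candidate])
        if total > threshold:
            return candidate
    return None  # no candidate reached the threshold


# Hypothetical per-label probabilities for a single character.
p1 = {"B": 0.10, "I": 0.60, "E": 0.10, "O": 0.10, "S": 0.10}
p2 = {"B": 0.50, "I": 0.30, "E": 0.10, "O": 0.05, "S": 0.05}
p3 = {"B": 0.20, "I": 0.70, "E": 0.05, "O": 0.03, "S": 0.02}
weights = [0.30, 0.33, 0.37]

print(resolve_label(p1, p2, p3, weights, threshold=0.5))
```

Here the models disagree (I, B, I), and the total probability for label I is 0.30 × 0.60 + 0.33 × 0.30 + 0.37 × 0.70 = 0.538, which exceeds the assumed threshold.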
In an embodiment, before the step S2 of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method includes:
a. sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
b. respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
in this embodiment, since the three recognition models have different recognition accuracy rates, sample data in a known medical field data set may be input into the first recognition model, the second recognition model, and the third recognition model to predict a result, and whether the predicted result is consistent with a correct labeling result or not may be determined; and determining the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the same number of the prediction results as the correct labeling results and the total number of the sample data.
c. And calculating the ratio of the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model, and determining the preset weight corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
The ratio of the preset weights of the prediction results of the first recognition model, the second recognition model and the third recognition model equals the ratio of the accuracy rates of their prediction results. For example, if the accuracy rates of the prediction results of the first recognition model, the second recognition model and the third recognition model are 0.7, 0.75 and 0.85 respectively, the ratio of the accuracy rates is 0.7:0.75:0.85; the ratio of the preset weights is therefore also 0.7:0.75:0.85, and after normalization so that the weights sum to 1, the final preset weights of the prediction results of the first recognition model, the second recognition model and the third recognition model are approximately 0.30, 0.33 and 0.37, respectively.
The prediction results of the first recognition model, the second recognition model and the third recognition model differ in accuracy, and it can be understood that the higher the accuracy of a model, the greater the weight of its prediction result.
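Steps a to c can be illustrated with a short Python sketch. The function names `accuracy` and `weights_from_accuracies` are hypothetical, and dividing each accuracy by the sum of the accuracies is an assumption that is consistent with the example values given in the text.

```python
from typing import List, Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that equal the correct labeling result."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def weights_from_accuracies(accuracies: Sequence[float]) -> List[float]:
    """Scale the accuracies so that they keep their ratio and sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


weights = weights_from_accuracies([0.7, 0.75, 0.85])
print([round(w, 2) for w in weights])
```

Rounded to two decimal places, the result matches the weights of the example in the text.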
In an embodiment, before the step S2 of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method includes:
S21, training a BiLSTM-CRF model based on the public data set to obtain the first recognition model, training the BiLSTM-CRF model based on the medical field data set to obtain the second recognition model, and training the BiLSTM-CRF model based on the public data set and the medical field data set to obtain the third recognition model;
S22, randomly selecting two models from the first recognition model, the second recognition model and the third recognition model, sequentially selecting one piece of unlabeled target data from an unlabeled data set, and inputting the unlabeled target data into the two selected models for prediction to obtain the predicted labeling results of the two models;
and S23, if the predicted labeling results of the two models are the same, adding the predicted labeling result to the unlabeled target data, and inputting the unlabeled target data into the remaining, unselected model for iterative training.
In this embodiment, in order to continue training the first recognition model, the second recognition model and the third recognition model and to make their prediction results for the same text data consistent, after the three models have been obtained by training, two of them are randomly selected, and one piece of unlabeled target data is selected in turn from an unlabeled data set (i.e., an unknown data set to which no labels have been added) and input into the two selected models for prediction, yielding the predicted labeling results of the two models. When the predicted labeling results of the two models are the same, the confidence of the prediction is high; the predicted labeling result is then added to the selected unlabeled target data, which is input into the remaining, unselected model for iterative training. Training ends when the unlabeled target data in the unlabeled data set is no longer updated. After this training, the prediction results of the first recognition model, the second recognition model and the third recognition model for the same text data can be made consistent. Moreover, in this training mode the confidence of the model is expressed by the voting consistency of the three models, which improves the reliability of the models and the training effect; at the same time, adding an unlabeled data set to model training increases the volume of training data and improves the training effect.
Preferably, after the three models have labeled the named entities in the medical text data, the medical text data may further be used as a training sample for iterative training of the three models. The training method of this embodiment trains on a partly unlabeled data set (i.e., an unknown data set), which is an innovative semi-supervised training method that increases the volume of training data; meanwhile, iterative training based on the voting consistency of the three models improves the confidence of the models.
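One pass of the semi-supervised procedure of steps S22 and S23 can be sketched as follows. The `Tagger` interface, the `ToyTagger` stand-in and the function `tri_training_pass` are hypothetical and are not part of the application; a real implementation would wrap the trained BiLSTM-CRF models. The sketch assumes that the pair of models is re-drawn for every piece of unlabeled data, which the text leaves open.

```python
import random
from typing import List, Protocol, Sequence, Tuple


class Tagger(Protocol):
    """Assumed minimal interface of a recognition model."""
    def predict(self, text: str) -> List[str]: ...
    def train_on(self, text: str, labels: List[str]) -> None: ...


class ToyTagger:
    """Stand-in model that always predicts one fixed label per character."""
    def __init__(self, label: str) -> None:
        self.label = label
        self.seen: List[Tuple[str, List[str]]] = []  # pseudo-labeled samples received

    def predict(self, text: str) -> List[str]:
        return [self.label] * len(text)

    def train_on(self, text: str, labels: List[str]) -> None:
        self.seen.append((text, labels))


def tri_training_pass(models: Sequence[Tagger], unlabeled: Sequence[str],
                      rng: random.Random) -> int:
    """Run one pass over the unlabeled set; return the number of samples used."""
    used = 0
    for text in unlabeled:
        i, j = rng.sample(range(3), 2)  # two randomly selected models
        k = 3 - i - j                   # index of the unselected model
        predicted = models[i].predict(text)
        if predicted == models[j].predict(text):
            # The two models agree: train the third on the pseudo-labeled sample.
            models[k].train_on(text, predicted)
            used += 1
    return used


models = [ToyTagger("O"), ToyTagger("O"), ToyTagger("O")]
print(tri_training_pass(models, ["ab", "cd"], random.Random(0)))
```

In practice the pass would be repeated until no further unlabeled data is added, in line with the stopping condition described above.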
In an embodiment, before the step S2 of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively, the method includes:
s201, acquiring a preset target text; wherein the target text is text data of a medical field;
S202, adding each sample in the public data set into the target text respectively, generating a corresponding public data training text for each, and inputting all the generated public data training texts into the BiLSTM-CRF model in sequence for training to obtain the first recognition model;
S203, adding each sample in the medical field data set into the target text respectively, generating a corresponding medical data training text for each, and inputting all the generated medical data training texts into the BiLSTM-CRF model in sequence for training to obtain the second recognition model;
and S204, iteratively selecting one sample from each of the public data set and the medical field data set, adding the two samples into the target text together, generating a corresponding target data training text, and inputting all the generated target data training texts into the BiLSTM-CRF model in sequence for training to obtain the third recognition model.
In this embodiment, when the first recognition model, the second recognition model and the third recognition model are trained, in order to further improve the labeling accuracy of the models on the medical text data, the training samples of the first recognition model, the second recognition model and the third recognition model are respectively added to the text data of one medical field, and then the text data of the medical field added with the training samples is input into the BiLSTM-CRF model for iterative training to obtain corresponding models; due to the fact that the characteristics of the training samples in the text data of the medical field are mixed in the training process, the model obtained through training has stronger generalization capability in the follow-up prediction of the medical text data, and the model prediction effect is improved.
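The construction of the three sets of training texts in steps S202 to S204 can be sketched as follows. The application does not specify how a sample is "added" to the target text, so this sketch makes several assumptions: a text is a pair of character and label lists, adding means appending, and the public and medical samples for the third model are paired in order. All names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[List[str], List[str]]  # (characters, labels)


def add_to_target(target: Sample, *samples: Sample) -> Sample:
    """Append one or more labeled samples to a copy of the target text."""
    chars, labels = list(target[0]), list(target[1])
    for sample_chars, sample_labels in samples:
        chars += sample_chars
        labels += sample_labels
    return chars, labels


def build_training_texts(target: Sample, public: List[Sample], medical: List[Sample]):
    """Return the training texts for the first, second and third models."""
    first = [add_to_target(target, s) for s in public]
    second = [add_to_target(target, s) for s in medical]
    # One public and one medical sample are added together (pairing is assumed).
    third = [add_to_target(target, p, m) for p, m in zip(public, medical)]
    return first, second, third


target = (["t"], ["O"])
public = [(["p1"], ["B"]), (["p2"], ["E"])]
medical = [(["m1"], ["S"])]
first, second, third = build_training_texts(target, public, medical)
print(len(first), len(second), len(third))
```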
Referring to fig. 2, an embodiment of the present application further provides a processing apparatus for medical text data, including:
a first acquisition unit 10 for acquiring medical text data;
a first input unit 20 for inputting the medical text data into a first recognition model, a second recognition model and a third recognition model, respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
the prediction unit 30 is used for predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; wherein the label with the highest first probability is used as a first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as a second labeling result of the character predicted by the second recognition model, and the label with the highest third probability is used as a third labeling result of the character predicted by the third recognition model;
a determining unit 40, configured to determine whether the first labeling result, the second labeling result, and the third labeling result corresponding to each character are the same;
a first determining unit 50, configured to determine the first labeling result as a labeling result corresponding to the character if the first labeling result, the second labeling result, and the third labeling result are the same;
the first processing unit 60 is configured to extract a named entity in the medical text data according to the labeling result, and input the named entity into a payment calculation tool for payment calculation processing.
In one embodiment, the processing device of medical text data further comprises:
a first calculating unit, configured to, if the first labeling result, the second labeling result and the third labeling result are not the same, calculate the total probability that the character is predicted as the third labeling result according to the first probability with which the first recognition model predicts the character as the third labeling result, the second probability with which the second recognition model predicts the character as the third labeling result, the third probability with which the third recognition model predicts the character as the third labeling result, and the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model;
a second determining unit, configured to determine whether the total probability is greater than a threshold, and if so, take the third labeling result as a labeling result corresponding to the character;
and the second processing unit is used for extracting the named entity in the medical text data according to the labeling result and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
In one embodiment, the processing device of medical text data further comprises:
the second input unit is used for sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
the second calculation unit is used for respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
and the third calculating unit is used for calculating the ratio of the accuracy rates of the prediction results of the first recognition model, the second recognition model and the third recognition model and determining the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
In one embodiment, the apparatus for processing medical text data further includes:
the pre-training unit is used for training a BiLSTM-CRF model based on the public data set to obtain the first recognition model, training the BiLSTM-CRF model based on the medical field data set to obtain the second recognition model, and training the BiLSTM-CRF model based on the public data set and the medical field data set to obtain the third recognition model;
the selection unit is used for randomly selecting two models from the first recognition model, the second recognition model and the third recognition model, sequentially selecting one piece of unlabeled target data from an unlabeled data set, and inputting the unlabeled target data into the two selected models for prediction to obtain the predicted labeling results of the two models;
and the iterative training unit is used for adding the corresponding prediction labeling result to the label-free target data and inputting the label-free target data to a third unselected model for iterative training if the prediction labeling results corresponding to the two models are the same.
In one embodiment, the apparatus for processing medical text data further includes:
the second acquisition unit is used for acquiring a preset target text; wherein the target text is text data of a medical field;
the first training unit is used for adding each sample in the public data set into the target text respectively, generating a corresponding public data training text for each, and inputting all the generated public data training texts into the BiLSTM-CRF model in sequence for training to obtain the first recognition model;
the second training unit is used for adding each sample in the medical field data set into the target text respectively, generating a corresponding medical data training text for each, and inputting all the generated medical data training texts into the BiLSTM-CRF model in sequence for training to obtain the second recognition model;
and the third training unit is used for iteratively selecting one sample from each of the public data set and the medical field data set, adding the two samples into the target text together, generating a corresponding target data training text, and inputting all the generated target data training texts into the BiLSTM-CRF model in sequence for training to obtain the third recognition model.
In this embodiment, please refer to the method described in the above embodiment for specific implementation of each unit in the above apparatus embodiment, which is not described herein again.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing medical text data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of processing medical text data.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
An embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a method of processing medical text data. It is to be understood that the computer-readable storage medium in the present embodiment may be a volatile-readable storage medium or a non-volatile-readable storage medium.
In summary, the method, the apparatus, the computer device and the storage medium for processing medical text data provided in the embodiments of the present application include: acquiring medical text data; inputting the medical text data into a first recognition model, a second recognition model and a third recognition model respectively, wherein the training samples of the three models are different; predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same; when the labeling results are the same, determining the first labeling result as the labeling result corresponding to the character; and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment calculation tool for payment calculation processing. In the present application, the accuracy of model prediction is improved through the prediction consistency of a plurality of models, so that the accuracy of named entity recognition is improved and the payment calculation tool can perform accurate calculation.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
The above description is only for the preferred embodiment of the present application and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A method for processing medical text data, comprising the steps of:
acquiring medical text data;
inputting the medical text data into a first recognition model, a second recognition model and a third recognition model respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
predicting a first probability that each character in the medical text data corresponds to each label through the first recognition model; predicting a second probability that each character in the medical text data corresponds to each label through the second recognition model; predicting a third probability that each character in the medical text data corresponds to each label through the third recognition model; wherein the label with the highest first probability is used as a first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as a second labeling result of the character predicted by the second recognition model, and the label with the highest third probability is used as a third labeling result of the character predicted by the third recognition model;
respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same;
if they are the same, determining the first labeling result as the labeling result corresponding to the character;
and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
2. The method for processing medical text data according to claim 1, wherein the step of determining whether the first labeling result, the second labeling result, and the third labeling result corresponding to each character are the same is followed by:
if not, calculating the total probability of the character being predicted as the third labeling result according to the first probability of the first recognition model predicting the character as the third labeling result, the second probability of the second recognition model predicting the character as the third labeling result, the third probability of the third recognition model predicting the character as the third labeling result, and preset weights corresponding to the predicted results of the first recognition model, the second recognition model and the third recognition model;
judging whether the total probability is greater than a threshold value, if so, taking the third labeling result as a labeling result corresponding to the character;
and according to the labeling result, extracting the named entity in the medical text data, and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
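One possible reading of the total-probability step is a weighted sum, sketched below. The claim only says the total probability is calculated "according to" the three probabilities and the preset weights, so the weighted sum is an assumption, as are the function name `resolve_disagreement` and the choice to return `None` when the threshold is not exceeded (the claim does not say what happens in that case).

```python
from typing import Dict, Optional, Sequence


def resolve_disagreement(
    p1: Dict[str, float],
    p2: Dict[str, float],
    p3: Dict[str, float],
    weights: Sequence[float],
    threshold: float,
) -> Optional[str]:
    """Decide whether to accept the third model's label for one character.

    p1, p2, p3 map each label to the probability predicted by the first,
    second and third recognition model respectively.
    """
    # The third labeling result is the third model's most probable label.
    third_label = max(p3, key=p3.get)
    # Assumed combination rule: weighted sum of the probability that each
    # model assigns to the third labeling result.
    total = (
        weights[0] * p1.get(third_label, 0.0)
        + weights[1] * p2.get(third_label, 0.0)
        + weights[2] * p3.get(third_label, 0.0)
    )
    return third_label if total > threshold else None
```

For example, with probabilities 0.4, 0.3 and 0.9 for the third labeling result and weights 0.2, 0.3 and 0.5, the total is 0.08 + 0.09 + 0.45 = 0.62, which exceeds a threshold of 0.5.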
3. The method for processing medical text data according to claim 2, wherein the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively is preceded by:
sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a predicted labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
and calculating the ratio of the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model, and determining the preset weight corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
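The weight derivation above can be sketched as follows. The claim states only that the weights are determined "according to the ratio" of the accuracies; normalizing the accuracies so that the weights sum to 1 is an assumption made here for illustration, and the names `accuracy` and `weights_from_accuracies` are hypothetical.

```python
from typing import List, Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labeling results that match the correct ones."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def weights_from_accuracies(accuracies: Sequence[float]) -> List[float]:
    """Turn per-model accuracies into weights that keep the same ratio.

    Assumed rule: divide each accuracy by the sum of all accuracies.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one model must have non-zero accuracy")
    return [a / total for a in accuracies]
```

With accuracies of 0.75, 0.5 and 1.0, the ratio 3 : 2 : 4 yields weights of 1/3, 2/9 and 4/9.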
4. The method for processing medical text data according to claim 1, wherein the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively is preceded by:
training a BiLSTM-CRF model based on the public data set to obtain the first recognition model, training the BiLSTM-CRF model based on the medical field data set to obtain the second recognition model, and training the BiLSTM-CRF model based on the public data set and the medical field data set to obtain the third recognition model;
randomly selecting two models from the first recognition model, the second recognition model and the third recognition model, and sequentially selecting one piece of unlabeled target data from an unlabeled data set and inputting it into the two selected models for prediction to obtain the predicted labeling results of the two models;
and if the predicted labeling results of the two models are the same, adding the predicted labeling result to the unlabeled target data, and inputting the resulting labeled target data into the remaining model that was not selected for iterative training.
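The selection-and-agreement loop above can be sketched as follows. The `Tagger` interface, its `predict` and `train_on` methods, and the function `co_label_round` are hypothetical names introduced only for this illustration; the claim does not specify how the models are invoked or how a single sample is fed into iterative training.

```python
import random
from typing import List, Protocol, Sequence


class Tagger(Protocol):
    """Assumed minimal interface of a recognition model."""

    def predict(self, text: str) -> List[str]: ...

    def train_on(self, text: str, labels: List[str]) -> None: ...


def co_label_round(
    models: Sequence[Tagger],
    unlabeled: Sequence[str],
    rng: random.Random,
) -> int:
    """Run one pass over the unlabeled data set.

    Two of the three models are picked at random; whenever their predicted
    labeling results for a sample are identical, the sample is labeled with
    that result and passed to the remaining model for training.
    Returns the number of samples handed to the remaining model.
    """
    if len(models) != 3:
        raise ValueError("exactly three models are expected")
    i, j = rng.sample(range(3), 2)
    k = 3 - i - j  # index of the model that was not selected
    added = 0
    for text in unlabeled:
        labels_i = models[i].predict(text)
        labels_j = models[j].predict(text)
        if labels_i == labels_j:
            models[k].train_on(text, labels_i)
            added += 1
    return added
```

Samples on which the two selected models disagree are simply skipped in this sketch.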
5. The method for processing medical text data according to claim 1, wherein the step of inputting the medical text data into the first recognition model, the second recognition model and the third recognition model respectively is preceded by:
acquiring a preset target text; wherein the target text is text data of a medical field;
adding each sample in the public data set into the target text respectively, generating a corresponding public data training text, and inputting all the generated public data training texts into the BiLSTM-CRF model in sequence for training to obtain the first recognition model;
adding each sample in the medical field data set into the target text respectively, generating a corresponding medical data training text, and inputting all the generated medical data training texts into the BiLSTM-CRF model in sequence for training to obtain the second recognition model;
and iteratively selecting one sample from each of the public data set and the medical field data set, adding the two samples to the target text together, generating a corresponding target data training text, and sequentially inputting all the generated target data training texts into the BiLSTM-CRF model for training to obtain the third recognition model.
6. An apparatus for processing medical text data, comprising:
a first acquisition unit configured to acquire medical text data;
the first input unit is used for inputting the medical text data into a first recognition model, a second recognition model and a third recognition model respectively; the first recognition model is obtained by training a BiLSTM-CRF model based on a public data set, the second recognition model is obtained by training a BiLSTM-CRF model based on a medical field data set, and the third recognition model is obtained by training the BiLSTM-CRF model based on the public data set and the medical field data set;
the prediction unit is used for predicting, through the first recognition model, a first probability that each character in the medical text data corresponds to each label; predicting, through the second recognition model, a second probability that each character in the medical text data corresponds to each label; predicting, through the third recognition model, a third probability that each character in the medical text data corresponds to each label; wherein the label with the highest first probability is used as the first labeling result of the character predicted by the first recognition model, the label with the highest second probability is used as the second labeling result of the character predicted by the second recognition model, and the label with the highest third probability is used as the third labeling result of the character predicted by the third recognition model;
the judging unit is used for respectively judging whether the first labeling result, the second labeling result and the third labeling result corresponding to each character are the same or not;
a first determining unit, configured to determine the first labeling result as a labeling result corresponding to the character if the first labeling result, the second labeling result, and the third labeling result are the same;
and the first processing unit is used for extracting the named entity in the medical text data according to the labeling result and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
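The extraction performed by the first processing unit can be illustrated with a short sketch. The claims do not state which tagging scheme the labels follow; the code below assumes the common BIO scheme (`B-TYPE` opens an entity, `I-TYPE` continues it, `O` is outside any entity), and the function name `extract_entities` and the `DRUG` type in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_entities(text: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from per-character BIO labels."""
    if len(text) != len(labels):
        raise ValueError("one label per character is required")
    entities: List[Tuple[str, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, label in enumerate(labels):
        if label.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if start is not None:
                entities.append((text[start:i], etype))
            start, etype = i, label[2:]
        elif label.startswith("I-") and start is not None and label[2:] == etype:
            continue  # the current entity keeps growing
        else:
            # "O", or an "I-" label that does not continue the open entity.
            if start is not None:
                entities.append((text[start:i], etype))
            start, etype = None, None
    if start is not None:
        entities.append((text[start:], etype))
    return entities
```

For instance, labeling the last four characters of "患者服用阿司匹林" as `B-DRUG`, `I-DRUG`, `I-DRUG`, `I-DRUG` yields the single entity "阿司匹林" of type `DRUG`, which would then be passed to the downstream tool.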
7. The apparatus for processing medical text data according to claim 6, further comprising:
a first calculating unit, configured to, if the first labeling result, the second labeling result and the third labeling result are not all the same, calculate the total probability that the character is predicted as the third labeling result according to the first probability with which the first recognition model predicts the character as the third labeling result, the second probability with which the second recognition model predicts the character as the third labeling result, the third probability with which the third recognition model predicts the character as the third labeling result, and the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model;
a second determining unit, configured to determine whether the total probability is greater than a threshold, and if so, take the third labeling result as a labeling result corresponding to the character;
and the second processing unit is used for extracting the named entity in the medical text data according to the labeling result and inputting the named entity into a payment measuring and calculating tool for payment measuring and calculating processing.
8. The apparatus for processing medical text data according to claim 7, further comprising:
the second input unit is used for sequentially inputting the sample data in the medical field data set into the first recognition model, the second recognition model and the third recognition model for prediction to obtain a labeling result corresponding to each sample data; wherein the sample data comprises a correct labeling result;
the second calculation unit is used for respectively calculating the accuracy of the prediction results of the first recognition model, the second recognition model and the third recognition model according to the predicted labeling results corresponding to all the sample data and the correct labeling result of the sample data;
and the third calculating unit is used for calculating the ratio of the accuracy rates of the prediction results of the first recognition model, the second recognition model and the third recognition model and determining the preset weights corresponding to the prediction results of the first recognition model, the second recognition model and the third recognition model according to the ratio.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583894.7A CN111797629B (en) 2020-06-23 2020-06-23 Method and device for processing medical text data, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010583894.7A CN111797629B (en) 2020-06-23 2020-06-23 Method and device for processing medical text data, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111797629A true CN111797629A (en) 2020-10-20
CN111797629B CN111797629B (en) 2022-07-29

Family

ID=72804547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583894.7A Active CN111797629B (en) 2020-06-23 2020-06-23 Method and device for processing medical text data, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111797629B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330011A (en) * 2017-06-14 2017-11-07 北京神州泰岳软件股份有限公司 The recognition methods of the name entity of many strategy fusions and device
WO2019071661A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic apparatus, medical text entity name identification method, system, and storage medium
CN108959252A (en) * 2018-06-28 2018-12-07 中国人民解放军国防科技大学 Semi-supervised Chinese named entity recognition method based on deep learning
WO2020119075A1 (en) * 2018-12-10 2020-06-18 平安科技(深圳)有限公司 General text information extraction method and apparatus, computer device and storage medium
CN109902307A (en) * 2019-03-15 2019-06-18 北京金山数字娱乐科技有限公司 Name the training method and device of entity recognition method, Named Entity Extraction Model
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN110704633A (en) * 2019-09-04 2020-01-17 平安科技(深圳)有限公司 Named entity recognition method and device, computer equipment and storage medium
CN110825827A (en) * 2019-11-13 2020-02-21 北京明略软件系统有限公司 Entity relationship recognition model training method and device and entity relationship recognition method and device
CN111274820A (en) * 2020-02-20 2020-06-12 齐鲁工业大学 Intelligent medical named entity identification method and device based on neural network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112420205A (en) * 2020-12-08 2021-02-26 医惠科技有限公司 Entity recognition model generation method and device and computer readable storage medium
CN113139072A (en) * 2021-04-20 2021-07-20 苏州挚途科技有限公司 Data labeling method and device and electronic equipment
CN113241138A (en) * 2021-06-21 2021-08-10 中国平安人寿保险股份有限公司 Medical event information extraction method and device, computer equipment and storage medium
CN113241138B (en) * 2021-06-21 2022-06-17 中国平安人寿保险股份有限公司 Medical event information extraction method and device, computer equipment and storage medium
CN113724819A (en) * 2021-08-31 2021-11-30 平安国际智慧城市科技股份有限公司 Training method, device, equipment and medium for medical named entity recognition model
CN113724819B (en) * 2021-08-31 2024-04-26 平安国际智慧城市科技股份有限公司 Training method, device, equipment and medium for medical named entity recognition model
CN114169338A (en) * 2022-02-10 2022-03-11 北京智源人工智能研究院 Medical named entity identification method and device and electronic equipment
CN117093920A (en) * 2023-10-20 2023-11-21 四川互慧软件有限公司 User DRGs grouping method
CN117093920B (en) * 2023-10-20 2024-01-23 四川互慧软件有限公司 User DRGs grouping method

Also Published As

Publication number Publication date
CN111797629B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111797629B (en) Method and device for processing medical text data, computer equipment and storage medium
CN109992664B (en) Dispute focus label classification method and device, computer equipment and storage medium
CN112860841B (en) Text emotion analysis method, device, equipment and storage medium
CN109783785B (en) Method and device for generating experiment detection report and computer equipment
CN111651992A (en) Named entity labeling method and device, computer equipment and storage medium
CN110797101B (en) Medical data processing method, medical data processing device, readable storage medium and computer equipment
CN111767707A (en) Method, device, equipment and storage medium for detecting Rayleigh case
CN110688853B (en) Sequence labeling method and device, computer equipment and storage medium
CN111460290B (en) Information recommendation method, device, equipment and storage medium
CN112101550B (en) Triage fusion model training method, triage device, triage equipment and medium
CN111368175B (en) Event extraction method and system and entity classification model
CN112633002A (en) Sample labeling method, model training method, named entity recognition method and device
CN111552811B (en) Method, device, computer equipment and storage medium for information completion in knowledge graph
CN115409111A (en) Training method of named entity recognition model and named entity recognition method
CN113707296B (en) Medical scheme data processing method, device, equipment and storage medium
CN113627159B (en) Training data determining method, device, medium and product of error correction model
CN113626591A (en) Electronic medical record data quality evaluation method based on text classification
CN116503031B (en) Personnel similarity calculation method, device, equipment and medium based on resume analysis
CN111552810B (en) Entity extraction and classification method, entity extraction and classification device, computer equipment and storage medium
CN112036151A (en) Method and device for constructing gene disease relation knowledge base and computer equipment
CN115545035B (en) Text entity recognition model and construction method, device and application thereof
CN116860964A (en) User portrait analysis method, device and server based on medical management label
CN113282837B (en) Event analysis method, device, computer equipment and storage medium
CN114840642A (en) Event extraction method, device, equipment and storage medium
CN114238597A (en) Information extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant