CN113591458A

CN113591458A - Medical term processing method, device, equipment and storage medium based on neural network

Info

Publication number: CN113591458A
Application number: CN202110865296.3A
Authority: CN
Inventors: 张颖
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2021-11-02
Anticipated expiration: 2041-07-29
Also published as: CN113591458B

Abstract

The application relates to the field of data processing, and in particular to a medical term processing method, device, equipment and storage medium based on a neural network. The method comprises the following steps: determining a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library; acquiring a plurality of first candidate standard medical terms of the medical term to be processed according to the first similarity; inputting the first candidate standard medical terms and the medical term to be processed into a preset medical term standardization model to obtain the target probability corresponding to each first candidate standard medical term; inputting the medical term to be processed into a preset term number prediction model to obtain the number of standard medical terms; and determining the standard medical terms corresponding to the medical term to be processed according to the number of standard medical terms and the target probabilities. The method improves the accuracy of medical term standardization. The present application also relates to the field of blockchain, in which the above-mentioned medical term standardization model may be stored.

Description

Medical term processing method, device, equipment and storage medium based on neural network

Technical Field

The present application relates to the field of data processing, and in particular, to a medical term processing method, apparatus, device, and storage medium based on a neural network.

Background

Medical terms are the specialized vocabulary of the medical field, used to refer to the various things, phenomena, characteristics, relationships, processes, etc. in that field. These medical terms are essential components of clinical information systems and convey medical information. However, for the same diagnosis, operation, medicine, examination, assay, symptom, etc., medical staff often use hundreds of different expressions, which makes subsequent work such as analyzing medical record data very difficult. The medical term standardization task converts these differing expressions of an original medical term into the corresponding standard medical term.

At present, the task of standardizing medical terms is mainly realized based on a rule matching method and a single matching model, but only one-to-one conversion between original medical terms and standard medical terms can be realized based on the rule matching method and the single matching model, so that the method is not suitable for a scene that the original medical terms correspond to a plurality of standard medical terms, and the accuracy of standardizing medical terms is low. Therefore, how to improve the accuracy of the standardization of medical terms is a problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a medical term processing method, a medical term processing device, medical term processing equipment and a storage medium based on a neural network, and aims to improve the accuracy of medical term standardization.

In a first aspect, an embodiment of the present application provides a medical term processing method based on a neural network, including:

acquiring medical terms to be processed, and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library;

acquiring a plurality of first candidate standard medical terms of the medical term to be processed from the preset standard medical term library according to the first similarity;

inputting the first candidate standard medical term and the medical term to be processed into a preset medical term standardization model to obtain a target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed;

inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed;

determining a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of standard medical terms and the target probability.

In a second aspect, embodiments of the present application further provide a medical term processing apparatus, including:

a determining module, configured to acquire a medical term to be processed and determine a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library;

an obtaining module, configured to obtain, according to the first similarity, a plurality of first candidate standard medical terms of the medical term to be processed from the preset standard medical term library;

a probability prediction module, configured to input the first candidate standard medical term and the to-be-processed medical term into a preset medical term standardization model, and obtain a target probability that each of the first candidate standard medical terms is a standard medical term corresponding to the to-be-processed medical term;

a number prediction module, configured to input the medical term to be processed into a preset term number prediction model to obtain the number of standard medical terms of the medical term to be processed;

the determining module is further configured to determine a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of standard medical terms and the target probability.

In a third aspect, the present application further provides a computer device, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the neural network-based medical term processing method as described above.

In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the neural network-based medical term processing method as described above.

The embodiments of the application provide a medical term processing method, device, equipment and storage medium based on a neural network. The method obtains a plurality of first candidate standard medical terms of the medical term to be processed from a preset standard medical term library according to a first similarity between the medical term to be processed and each standard medical term in the library. It then inputs the first candidate standard medical terms and the medical term to be processed into a preset medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed, and inputs the medical term to be processed into a preset term number prediction model to determine the number of standard medical terms of the medical term to be processed. Finally, it determines the standard medical terms corresponding to the medical term to be processed according to the number of standard medical terms and the target probability corresponding to each first candidate standard medical term. The method therefore not only realizes one-to-one conversion between original medical terms and standard medical terms, but also adapts to the scenario in which an original medical term corresponds to a plurality of standard medical terms, which greatly improves the accuracy of medical term standardization.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic flow chart diagram of a neural network-based medical term processing method provided by an embodiment of the present application;

FIG. 2 is a schematic flow chart diagram of another neural network-based medical term processing method provided by an embodiment of the present application;

FIG. 3 is a schematic block diagram of a medical term processing apparatus provided by an embodiment of the present application;

FIG. 4 is a schematic block diagram of another medical term processing apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic block diagram of sub-modules of the medical term processing apparatus of FIG. 4;

FIG. 6 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further described with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

The embodiment of the application provides a medical term processing method, a medical term processing device, medical term processing equipment and a storage medium based on a neural network. The medical term processing method can be applied to terminal equipment or a server, the terminal equipment can be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device and the like, and the server can be a single server or a server cluster consisting of a plurality of servers.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a neural network-based medical term processing method according to an embodiment of the present application.

As shown in fig. 1, the medical term processing method includes steps S101 to S105.

Step S101, obtaining medical terms to be processed, and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library.

Illustratively, the medical term to be processed is input by a user or sent by a terminal device. The medical term to be processed may be a clinical medical term, for example, a medical term actually used by a doctor in a medical order and/or a medical record, or a medical term used in clinical medical information statistics. It is understood that the medical terms to be processed may vary from person to person, and that different medical practitioners may use different expressions for the same medical information. The medical information may be, for example, disease information, symptom information, examination and test information, or the like. The preset standard medical term library includes a plurality of standard medical terms, and may include the standard medical terms of the International Classification of Diseases (ICD).

For example, determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library may include: determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library based on a preset similarity algorithm. The preset similarity algorithm may be set based on the actual situation, which is not specifically limited in this embodiment. For example, the similarity algorithm may be a cosine similarity algorithm, a Pearson correlation coefficient algorithm, a Jaccard similarity coefficient algorithm, a Tanimoto coefficient algorithm, a log-likelihood similarity algorithm, or a term frequency-inverse document frequency (TF-IDF) algorithm. The similarity algorithm allows the similarity between the medical term to be processed and each standard medical term to be calculated accurately.
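As a minimal sketch of two of the similarity algorithms listed above, the following computes a Jaccard coefficient and a cosine similarity between two terms. Treating each term as a bag of characters is an assumption made only for illustration, and the function names are hypothetical; the embodiment does not specify how terms are represented.

```python
from collections import Counter
import math


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient over the sets of characters in each term."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over character-count vectors of each term."""
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[ch] * count_b[ch] for ch in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_similarities(term: str, library: list[str], sim=jaccard_similarity) -> dict[str, float]:
    """First similarity between the term and every standard term in the library."""
    return {standard: sim(term, standard) for standard in library}
```

Any of the other listed algorithms could be passed as `sim` in the same way, provided it takes two terms and returns a score.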

Step S102, a plurality of first candidate standard medical terms of the medical terms to be processed are obtained from a preset standard medical term library according to the first similarity.

Illustratively, the standard medical terms are sorted according to the first similarity between the medical term to be processed and each standard medical term, so as to obtain a standard medical term queue; and the standard medical terms whose position in the standard medical term queue is before a preset ranking order are taken as the first candidate standard medical terms of the medical term to be processed. The preset ranking order may be set based on the actual situation, which is not specifically limited in this embodiment. Sorting the standard medical terms by similarity makes it convenient to acquire the top-ranked standard medical terms as candidates.

For example, the higher the first similarity between a standard medical term and the medical term to be processed, the nearer the front of the standard medical term queue that standard medical term is placed; the lower the first similarity, the nearer the back it is placed. For example, if the preset ranking order is 6, the first 5 standard medical terms in the standard medical term queue are used as the first candidate standard medical terms of the medical term to be processed.

Illustratively, each standard medical term whose first similarity is greater than or equal to a preset similarity is determined as a first candidate standard medical term of the medical term to be processed. The preset similarity may be set based on the actual situation, which is not specifically limited in this embodiment. For example, if the preset similarity is 80% and the first similarities between the medical term to be processed and the standard medical term A, the standard medical term B, the standard medical term C, the standard medical term D, and the standard medical term E are 90%, 70%, 85%, 50%, and 92%, respectively, the standard medical term A, the standard medical term C, and the standard medical term E are determined as the first candidate standard medical terms of the medical term to be processed.
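The two candidate selection strategies described for step S102 can be sketched as follows, using the similarity values from the example above. The function names are hypothetical and the code is only an illustration of the ranking and threshold rules.

```python
def candidates_by_rank(similarities: dict[str, float], preset_order: int) -> list[str]:
    """Keep the terms ranked before the preset ranking order (order 6 keeps the first 5)."""
    queue = sorted(similarities, key=similarities.get, reverse=True)
    return queue[: preset_order - 1]


def candidates_by_threshold(similarities: dict[str, float], preset_similarity: float) -> list[str]:
    """Keep the terms whose first similarity is at least the preset similarity."""
    return [term for term, sim in similarities.items() if sim >= preset_similarity]


# First similarities taken from the example in the text.
similarities = {"A": 0.90, "B": 0.70, "C": 0.85, "D": 0.50, "E": 0.92}
ranked = candidates_by_rank(similarities, preset_order=6)
thresholded = candidates_by_threshold(similarities, preset_similarity=0.80)
```

With a preset similarity of 80%, the threshold rule keeps A, C and E, matching the example.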

Step S103, inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model, and obtaining target probabilities that the first candidate standard medical terms are the standard medical terms corresponding to the medical terms to be processed.

The preset medical term standardization model is obtained by performing iterative training on a BERT-based binary classification model in advance based on a positive sample data set and a negative sample data set, the positive sample data in the positive sample data set comprises an original medical term and one or more standard medical terms corresponding to the original medical term, the negative sample data in the negative sample data set comprises the original medical term and standard medical terms not corresponding to the original medical term, and the difference value of the number of samples between the positive sample data set and the negative sample data set is smaller than or equal to a preset threshold value.

Exemplarily, the medical term to be processed is combined with each first candidate standard medical term to obtain a plurality of model input data; and the model input data is input into the medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed. For example, if the first candidate standard medical terms of the medical term a to be processed include the standard medical term A, the standard medical term C and the standard medical term E, three pieces of model input data, [a, A], [a, C] and [a, E], can be formed. Inputting [a, A], [a, C] and [a, E] into the medical term standardization model respectively yields target probabilities of 0.85, 0.7 and 0.45 that the standard medical term A, the standard medical term C and the standard medical term E, respectively, are the standard medical term corresponding to the medical term a to be processed.
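The pairing of the medical term to be processed with each candidate can be sketched as below. The `stub_model` lookup is a hypothetical stand-in for the trained BERT-based binary classification model, used only so that the sketch runs; its scores are the ones given in the example.

```python
from typing import Callable


def target_probabilities(term: str, candidates: list[str],
                         model: Callable[[str, str], float]) -> dict[str, float]:
    """Pair the term with each candidate and score every pair with the model."""
    model_inputs = [(term, candidate) for candidate in candidates]
    return {candidate: model(t, candidate) for t, candidate in model_inputs}


# Hypothetical stand-in for the trained model: a fixed lookup table.
_stub_scores = {("a", "A"): 0.85, ("a", "C"): 0.7, ("a", "E"): 0.45}


def stub_model(term: str, candidate: str) -> float:
    return _stub_scores[(term, candidate)]


probabilities = target_probabilities("a", ["A", "C", "E"], stub_model)
```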

Step S104, inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed.

The preset term number prediction model is obtained by performing iterative training on the neural network model based on a target sample data set, where the sample data in the target sample data set may include original medical terms and standard medical term numbers corresponding to the original medical terms.

For example, if the medical term to be processed is "laryngeal carcinoma" and the standard medical term corresponding to it is "laryngeal carcinoma", the number of standard medical terms for "laryngeal carcinoma" is 1. As another example, if the medical term to be processed is "right pleural thickening and calcification" and the corresponding standard medical terms include "pleural calcification" and "pleural hypertrophy", the number of standard medical terms for "right pleural thickening and calcification" is 2.

Step S105, determining a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of the standard medical terms and the target probability.

Illustratively, each first candidate standard medical term is sorted according to the target probability corresponding to each first candidate standard medical term, so as to obtain a candidate standard medical term queue; acquiring a target arrangement sequence corresponding to the number of standard medical terms; and determining a first candidate standard medical term with the arrangement order in the candidate standard medical term queue being positioned before the target arrangement order as the standard medical term corresponding to the medical term to be processed.

For example, if the target probabilities that the standard medical term A, the standard medical term C, and the standard medical term E are the standard medical term corresponding to the medical term to be processed are 0.85, 0.7, and 0.45, respectively, the generated candidate standard medical term queue is [A, C, E]. If the number of standard medical terms is 1, the target arrangement order is 2, and the standard medical term A, which is located before target arrangement order 2, is determined as the standard medical term corresponding to the medical term to be processed. If the number of standard medical terms is 2, the target arrangement order is 3, and the standard medical term A and the standard medical term C, which are located before target arrangement order 3, are determined as the standard medical terms corresponding to the medical term to be processed.
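The selection rule of step S105 can be sketched as follows, using the probabilities from the example. The function name is hypothetical; the target arrangement order is taken to be the number of standard medical terms plus one, as the example implies.

```python
def select_standard_terms(probabilities: dict[str, float], term_count: int) -> list[str]:
    """Return the candidates ranked before the target arrangement order."""
    # Candidate queue sorted by descending target probability.
    queue = sorted(probabilities, key=probabilities.get, reverse=True)
    # A term count of 1 gives target order 2, a count of 2 gives target order 3.
    target_order = term_count + 1
    return queue[: target_order - 1]


probabilities = {"A": 0.85, "C": 0.7, "E": 0.45}
one_term = select_standard_terms(probabilities, term_count=1)
two_terms = select_standard_terms(probabilities, term_count=2)
```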

In the medical term processing method provided by the above embodiment, a plurality of first candidate standard medical terms of the medical term to be processed are obtained from the preset standard medical term library according to the first similarity between the medical term to be processed and each standard medical term in the library. The first candidate standard medical terms and the medical term to be processed are then input into the preset medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed, and the medical term to be processed is input into the preset term number prediction model to determine the number of standard medical terms of the medical term to be processed. Finally, the standard medical terms corresponding to the medical term to be processed are determined according to the number of standard medical terms and the target probability corresponding to each first candidate standard medical term. The method not only realizes one-to-one conversion between original medical terms and standard medical terms, but also suits the scenario in which an original medical term corresponds to a plurality of standard medical terms, which greatly improves the accuracy of medical term standardization.

Referring to fig. 2, fig. 2 is a schematic flow chart of another neural network-based medical term processing method according to an embodiment of the present application.

As shown in fig. 2, the medical term processing method includes steps S201 to S209.

Step S201, acquiring a positive sample data set.

The positive sample data in the positive sample data set includes an original medical term and the standard medical terms corresponding to the original medical term; an original medical term may correspond to one standard medical term or to several.

Step S202, generating a negative sample data set according to the positive sample data set and a preset standard medical term library.

The negative sample data in the negative sample data set includes an original medical term and a standard medical term that does not correspond to the original medical term, a difference value between the number of samples in the target positive sample data set and the number of samples in the negative sample data set is not greater than a preset threshold, and the preset threshold may be set based on an actual situation, which is not specifically limited in this embodiment. For example, the preset threshold is 5.

In one embodiment, a second similarity between the original medical term in the positive sample data and each standard medical term in the preset standard medical term library is determined; a plurality of second candidate standard medical terms corresponding to the original medical term are acquired from the preset standard medical term library according to the second similarity between the original medical term and each standard medical term; the standard medical terms corresponding to the original medical term are removed from the plurality of second candidate standard medical terms to obtain a non-standard medical term set corresponding to the original medical term; and a negative sample data set is generated according to each original medical term and its corresponding non-standard medical term set. The non-standard medical term set includes a plurality of standard medical terms that do not correspond to the original medical term. Because the negative sample data set is generated from the positive sample data set, the negative sample data does not need to be labeled manually, which makes sample data generation more convenient.

For example, a piece of positive example sample data is [ b, c1, c2, c3], that is, the standard medical term corresponding to the original medical term b is the standard medical term c1, the standard medical term c2 and the standard medical term c3, and through the similarity between the original medical term b and each standard medical term, the obtained plurality of second candidate standard medical terms corresponding to the original medical term b are [ c1, c2, c3, c4, c5], then [ c1, c2, c3] is removed, the set of non-standard medical terms corresponding to the original medical term b can be obtained as [ c4, c5], and therefore, negative example sample data [ b, c4] and [ b, c5] can be generated.
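The negative sample generation described above can be sketched as follows. The `candidates_for` callable is a hypothetical placeholder for the second-similarity candidate retrieval, and the dictionary layout of the positive samples is an assumption made for illustration.

```python
from typing import Callable


def build_negative_samples(
    positives: dict[str, list[str]],
    candidates_for: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """Pair each original term with the retrieved candidates that are not its standard terms."""
    negatives = []
    for original, standards in positives.items():
        second_candidates = candidates_for(original)
        # Remove the true standard terms to get the non-standard term set.
        non_standard = [c for c in second_candidates if c not in standards]
        negatives.extend((original, c) for c in non_standard)
    return negatives


# Data from the example: b maps to c1, c2, c3 and retrieval returns c1..c5.
positives = {"b": ["c1", "c2", "c3"]}
negatives = build_negative_samples(positives, lambda term: ["c1", "c2", "c3", "c4", "c5"])
```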

Step S203, expanding the positive sample data set to obtain a target positive sample data set.

Exemplarily, a first sample number of the positive sample data set and a second sample number of the negative sample data set are determined; the expansion magnification of the positive sample data set is determined according to the second sample number and the first sample number; and the positive sample data set is expanded according to the expansion magnification to obtain a target positive sample data set. The difference in the number of samples between the target positive sample data set and the target negative sample data set is not greater than a preset threshold, and the preset threshold may be set based on the actual situation, which is not specifically limited in this embodiment; for example, the preset threshold is 5. Expanding the positive sample data set by the expansion magnification balances the number of samples in the target positive sample data set and the target negative sample data set, which can improve the accuracy of the model during subsequent model training.

For example, the expansion magnification of the positive sample data set may be determined from the second sample number and the first sample number as follows: the ratio of the second sample number to the first sample number is determined as the expansion magnification of the positive sample data set. For example, if the positive sample data set includes 50 pieces of positive sample data and the negative sample data set includes 500 pieces of negative sample data, the expansion magnification of the positive sample data set is 500/50 = 10.

For example, the expansion magnification of the positive sample data set may alternatively be determined as follows: the difference between the second sample number and the first sample number is determined to obtain a sample number difference, and the ratio of the sample number difference to the first sample number is determined as the expansion magnification of the positive sample data set. For example, if the positive sample data set includes 50 pieces of positive sample data and the negative sample data set includes 500 pieces of negative sample data, the sample number difference is 450, and the expansion magnification of the positive sample data set is 450/50 = 9.

For example, the positive sample data set may be expanded according to the expansion magnification to obtain the target positive sample data set as follows: the positive sample data set is copied according to the expansion magnification to obtain newly added positive sample data, and the newly added positive sample data is merged to obtain the target positive sample data set. For example, if the expansion magnification is 10 and the positive sample data set includes 50 pieces of positive sample data, the 50 pieces of positive sample data are copied 10 times, and the positive sample data obtained from the 10 copies is merged to obtain the target positive sample data set.

For example, the positive sample data set may alternatively be expanded according to the expansion magnification as follows: a preset proportion of the positive sample data is randomly sampled from the positive sample data set, repeatedly, until the number of sampling rounds reaches the expansion magnification, to obtain newly added positive sample data; and the newly added positive sample data is merged with the positive sample data set to obtain the target positive sample data set. For example, if the expansion magnification is 10 and the positive sample data set includes 50 pieces of positive sample data, 90% (45 pieces) of the positive sample data is randomly sampled from the 50 pieces each time, for a total of 10 rounds, giving 450 pieces of newly added positive sample data; the 450 pieces of newly added positive sample data are then merged with the positive sample data set to obtain a target positive sample data set of 500 pieces of positive sample data.
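The two ways of computing the expansion magnification and the two expansion procedures can be sketched as follows. The function names and the fixed random seed are assumptions for illustration, and integer division is used because the examples in the text divide evenly.

```python
import random


def magnification_by_ratio(first_count: int, second_count: int) -> int:
    """Ratio of negative to positive sample counts, e.g. 500/50 = 10."""
    return second_count // first_count


def magnification_by_difference(first_count: int, second_count: int) -> int:
    """Ratio of the count difference to the positive count, e.g. 450/50 = 9."""
    return (second_count - first_count) // first_count


def expand_by_copying(positives: list, magnification: int) -> list:
    """Copy the positive set `magnification` times and merge the copies."""
    return positives * magnification


def expand_by_sampling(positives: list, magnification: int, proportion: float, seed: int = 0) -> list:
    """Sample a fixed proportion `magnification` times and merge with the original set."""
    rng = random.Random(seed)
    per_round = round(len(positives) * proportion)
    added = []
    for _ in range(magnification):
        added.extend(rng.sample(positives, per_round))
    return positives + added


positives = list(range(50))  # placeholder items standing in for 50 positive samples
copied = expand_by_copying(positives, magnification_by_ratio(50, 500))
sampled = expand_by_sampling(positives, magnification=10, proportion=0.9)
```

Both procedures bring the 50 positive samples of the example up to 500, matching the 500 negative samples.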

Step S204, performing iterative training on the BERT-based binary classification model according to the target positive sample data set and the target negative sample data set to obtain a medical term standardization model.

Illustratively, target sample data is alternately selected from the target positive sample data set and the target negative sample data set, and the target sample data is input into the BERT-based binary classification model to obtain a first output probability that the target sample data is a positive sample and a second output probability that the target sample data is a negative sample, wherein the sample type of the target sample data is either a negative sample or a positive sample; a model loss value is determined according to the first output probability and the second output probability; and if the model loss value is greater than a preset first loss value, the model parameters of the BERT-based binary classification model are updated, and target sample data continues to be selected to train the updated BERT-based binary classification model until the model converges, yielding the medical term standardization model. The preset first loss value may be set based on the actual situation, which is not specifically limited in this embodiment.

For example, the model loss value may be determined from the first output probability that the target sample data is a positive sample and the second output probability that the target sample data is a negative sample as follows: a preset loss function is acquired, and the model loss value is determined from the first output probability and the second output probability based on the preset loss function. The preset loss function may be set based on the actual situation; for example, the preset loss function may be a negative log-likelihood function.
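As a minimal sketch of a negative log-likelihood loss over the two output probabilities, the following takes the probability assigned to the true sample type and returns its negative logarithm. The function names and the small clamping constant are assumptions for illustration; the embodiment does not fix the exact form of the loss function.

```python
import math


def negative_log_likelihood(p_positive: float, p_negative: float, is_positive: bool) -> float:
    """Negative log of the probability the model assigned to the true sample type."""
    p_true = p_positive if is_positive else p_negative
    # Clamp to avoid log(0); the constant is an illustrative assumption.
    return -math.log(max(p_true, 1e-12))


def needs_update(loss: float, first_loss_value: float) -> bool:
    """Parameters are updated while the loss exceeds the preset first loss value."""
    return loss > first_loss_value
```

A confident correct prediction gives a loss near zero, while the same probabilities on a sample of the opposite type give a much larger loss.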

In one embodiment, the number of standard medical terms in each piece of positive sample data in the target positive sample data set is counted; the original medical term in each piece of positive sample data is combined with its corresponding number of standard medical terms to obtain a target sample data set; sample data is acquired from the target sample data set, wherein the sample data comprises an original medical term and the true value of its number of standard medical terms; the original medical term in the sample data is input into a preset neural network model to obtain a predicted value of the number of standard medical terms; a model loss value is determined according to the true value and the predicted value, and whether the preset neural network model has converged is determined according to the model loss value; and if the preset neural network model has not converged, the parameters of the preset neural network model are updated and training of the updated model continues until it converges, yielding the term number prediction model.

For example, the loss value may be determined from the true value and the predicted value, and convergence of the preset neural network model may be judged from the loss value, as follows: the squared difference between the true value and the predicted value is determined and taken as the loss value of the preset neural network model; if the loss value is greater than or equal to a preset second loss value, the preset neural network model is determined not to have converged, and if the loss value is less than the preset second loss value, the preset neural network model is determined to have converged. The preset second loss value may be set based on the actual situation, which is not specifically limited in this embodiment.
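The squared-difference loss and the threshold comparison described above can be sketched as follows. The function names and the example threshold are hypothetical and serve only to illustrate the comparison against the preset second loss value.

```python
def squared_error(true_count: float, predicted_count: float) -> float:
    """Squared difference between the true and predicted number of standard terms."""
    return (true_count - predicted_count) ** 2


def has_converged(loss: float, second_loss_threshold: float) -> bool:
    """The model is treated as converged only when the loss is strictly below the threshold."""
    return loss < second_loss_threshold
```

For instance, with a true value of 2 and a predicted value of 1.5 the loss is 0.25, which would count as not converged against a threshold of 0.25 because the comparison is strict.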

Step S205, acquiring the medical term to be processed, and determining a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library.

For example, determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library may include: determining the similarity between the medical term to be processed and each standard medical term in the preset standard medical term library based on a preset similarity algorithm. The preset similarity algorithm may be set based on the actual situation, which is not specifically limited in this embodiment. For example, the similarity algorithm may include: a cosine similarity algorithm, a Pearson correlation coefficient algorithm, a Jaccard similarity coefficient algorithm, a Tanimoto coefficient algorithm, a log-likelihood similarity algorithm, or a term frequency-inverse document frequency (TF-IDF) algorithm.
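As one possible illustration, the Jaccard similarity coefficient from the list above could be applied at the character level. Treating each term as a set of characters is an assumption made here for simplicity; the application does not specify how terms are tokenized, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def jaccard_similarity(term_a: str, term_b: str) -> float:
    """Jaccard coefficient over the character sets of two terms."""
    set_a, set_b = set(term_a), set(term_b)
    if not set_a and not set_b:
        return 1.0  # two empty terms are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def first_similarities(term: str, library: Iterable[str]) -> Dict[str, float]:
    """Similarity between the term to be processed and every standard term."""
    return {standard: jaccard_similarity(term, standard) for standard in library}
```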

Step S206, a plurality of first candidate standard medical terms of the medical terms to be processed are obtained from a preset standard medical term library according to the first similarity.

Illustratively, the standard medical terms are sorted according to the first similarity between the medical term to be processed and each standard medical term to obtain a standard medical term queue, and the standard medical terms ranked before a preset rank in the queue are taken as the first candidate standard medical terms of the medical term to be processed. The preset rank may be set based on actual conditions, which is not specifically limited in this embodiment.
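The sorting and truncation can be sketched as below. This sketch assumes the queue is sorted in descending order of similarity and that the first `top_k` entries are kept; the exact boundary convention of the preset rank is not stated in the text, and the names are hypothetical.

```python
from typing import Dict, List


def select_candidates(similarities: Dict[str, float], top_k: int) -> List[str]:
    """Sort standard terms by descending similarity and keep the first top_k."""
    queue = sorted(similarities, key=similarities.get, reverse=True)
    return queue[:top_k]
```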

Step S207, inputting the first candidate standard medical terms and the medical terms to be processed into the medical term standardization model, and obtaining the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

Exemplarily, the medical term to be processed is combined with each first candidate standard medical term to obtain a plurality of model input data, and each model input data is input into the medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.
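A minimal sketch of this pairing step follows. The trained medical term standardization model is represented by a caller-supplied function `score_pair`, which is a hypothetical stand-in; how the pair is actually encoded for the BERT-based model is not shown here.

```python
from typing import Callable, Dict, List, Tuple


def build_model_inputs(term: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """Pair the term to be processed with each first candidate standard term."""
    return [(term, candidate) for candidate in candidates]


def target_probabilities(
    term: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],  # stand-in for the trained model
) -> Dict[str, float]:
    """Score every (term, candidate) pair and index the result by candidate."""
    return {
        candidate: score_pair(original, candidate)
        for original, candidate in build_model_inputs(term, candidates)
    }
```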

Step S208, inputting the medical term to be processed into a preset term number prediction model to obtain the number of standard medical terms of the medical term to be processed.

The preset term number prediction model is obtained by performing iterative training on the neural network model based on a target sample data set, where the sample data in the target sample data set may include medical terms to be processed and standard medical term numbers corresponding to the medical terms to be processed.

Step S209, determining a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of standard medical terms and the target probability.
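One plausible reading of step S209, offered here as an assumption rather than as the method fixed by the application, is to keep the candidates with the highest target probabilities, as many as the predicted number of standard medical terms. The function name is hypothetical.

```python
from typing import Dict, List


def select_standard_terms(target_probs: Dict[str, float], term_count: int) -> List[str]:
    """Keep the term_count candidates with the highest target probability."""
    ranked = sorted(target_probs, key=target_probs.get, reverse=True)
    return ranked[: max(term_count, 0)]
```

With a predicted number of 1 this reduces to a one-to-one mapping; with a larger number the same original term is mapped to several standard terms.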

In the medical term processing method provided by the above embodiment, the BERT-based binary classification model is iteratively trained on a positive sample data set and a negative sample data set with balanced sample numbers to obtain an accurate medical term standardization model. When a medical term to be processed is standardized, the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed can then be accurately determined based on the medical term standardization model, and the number of standard medical terms of the medical term to be processed is determined with the term number prediction model. Finally, the standard medical term corresponding to the medical term to be processed can be accurately determined according to the target probabilities and the number of standard medical terms. The method therefore not only supports one-to-one conversion between original medical terms and standard medical terms, but is also suitable for scenarios in which one original medical term corresponds to a plurality of standard medical terms, thereby improving the accuracy of medical term standardization.

Referring to fig. 3, fig. 3 is a schematic block diagram of a medical term processing apparatus according to an embodiment of the present application.

As shown in fig. 3, the medical term processing apparatus 300 includes:

a determining module 310, configured to obtain a medical term to be processed, and determine a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library;

an obtaining module 320, configured to obtain, according to the first similarity, a plurality of first candidate standard medical terms of the medical term to be processed from the preset standard medical term library;

a probability prediction module 330, configured to input the first candidate standard medical term and the to-be-processed medical term into a preset medical term standardization model, so as to obtain a target probability that each of the first candidate standard medical terms is a standard medical term corresponding to the to-be-processed medical term;

the number prediction module 340 is configured to input the medical term to be processed into a preset term number prediction model, so as to obtain a standard medical term number of the medical term to be processed;

the determining module 310 is further configured to determine a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of standard medical terms and the target probability.

In an embodiment, the probability prediction module 330 is further configured to:

combining the medical term to be processed with each of the first candidate standard medical terms respectively to obtain a plurality of model input data;

and sequentially inputting each model input data into the medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

Referring to fig. 4, fig. 4 is a schematic block diagram of another medical term processing apparatus provided in the embodiments of the present application.

As shown in fig. 4, the medical term processing apparatus 400 includes:

an obtaining module 410, configured to obtain a positive example sample data set, where the positive example sample data in the positive example sample data set includes an original medical term and a standard medical term corresponding to the original medical term;

a generating module 420, configured to generate a negative sample data set according to the positive sample data set and a preset standard medical term library;

the data processing module 430 is configured to perform expansion processing on the positive example sample data set to obtain a target positive example sample data set;

a training module 440, configured to perform iterative training on a BERT-based binary classification model according to the target positive sample data set and the target negative sample data set to obtain the medical term standardization model;

the obtaining module 410 is further configured to obtain a medical term to be processed, and determine a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library;

the obtaining module 410 is further configured to obtain, according to the first similarity, a plurality of first candidate standard medical terms of the medical term to be processed from the preset standard medical term library;

a probability prediction module 450, configured to input the first candidate standard medical term and the to-be-processed medical term into a preset medical term standardization model, so as to obtain a target probability that each of the first candidate standard medical terms is a standard medical term corresponding to the to-be-processed medical term;

the number prediction module 460 is configured to input the medical term to be processed into a preset term number prediction model to obtain a standard medical term number of the medical term to be processed;

the obtaining module 410 is further configured to determine a standard medical term corresponding to the medical term to be processed from the plurality of first candidate standard medical terms according to the number of standard medical terms and the target probability.

In one embodiment, as shown in fig. 5, the generating module 420 includes:

the determining submodule 421 is configured to determine a second similarity between the original medical term in the positive example sample data and each standard medical term in a preset standard medical term library;

the obtaining sub-module 422 is configured to obtain, according to the second similarity, a plurality of second candidate standard medical terms corresponding to the original medical term from the preset standard medical term library;

a removing sub-module 423, configured to remove a standard medical term corresponding to the original medical term from the plurality of second candidate standard medical terms, so as to obtain a set of non-standard medical terms corresponding to the original medical term;

the generating sub-module 424 is configured to generate a negative sample data set according to the non-standard medical term set corresponding to each original medical term and each original medical term.
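The work of sub-modules 421 to 424 can be sketched as follows. The sketch assumes the positive samples are given as a mapping from each original medical term to the set of its standard medical terms, and that candidate retrieval by second similarity is supplied as a function; both `positives` and `candidates_for` are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple


def generate_negative_samples(
    positives: Dict[str, Set[str]],
    candidates_for: Callable[[str], List[str]],  # second candidate standard terms
) -> List[Tuple[str, str]]:
    """Pair each original term with similar standard terms that are NOT its true match."""
    negatives: List[Tuple[str, str]] = []
    for original, true_standards in positives.items():
        for candidate in candidates_for(original):
            if candidate not in true_standards:  # remove the correct standard terms
                negatives.append((original, candidate))
    return negatives
```

Because the negative candidates are drawn from terms that are similar to the original term, the resulting negative samples are harder to distinguish from positives than randomly chosen ones would be.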

In one embodiment, the data processing module 430 is further configured to:

determining a first sample number of the positive sample data set and a second sample number of the negative sample data set;

determining the expansion magnification of the positive sample data set according to the second sample number and the first sample number;

and performing expansion processing on the positive sample data set according to the expansion magnification to obtain a target positive sample data set.

In one embodiment, the data processing module 430 is further configured to:

randomly sampling a preset proportion of positive sample data from the positive sample data set until the number of sampling rounds reaches the expansion magnification, to obtain newly added positive sample data;

and merging the newly added positive sample data with the positive sample data set to obtain a target positive sample data set.
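The expansion described above can be sketched as follows. The sketch assumes the expansion magnification is the integer ratio of the second sample number to the first sample number, and that each sampling round draws without replacement; neither detail is fixed by the text, and the function names, the `proportion` parameter, and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # (original medical term, standard medical term)


def expansion_magnification(n_positive: int, n_negative: int) -> int:
    """Assumed rule: integer ratio of negative to positive sample counts."""
    if n_positive <= 0:
        raise ValueError("the positive sample data set must not be empty")
    return n_negative // n_positive


def expand_positives(
    positives: List[Sample], magnification: int, proportion: float, seed: int = 0
) -> List[Sample]:
    """Sample `proportion` of the positives once per round, then merge with the originals."""
    rng = random.Random(seed)
    per_round = int(len(positives) * proportion)
    added: List[Sample] = []
    for _ in range(magnification):
        added.extend(rng.sample(positives, per_round))
    return positives + added
```

For example, 10 positive samples and 45 negative samples give a magnification of 4 under this rule, and a proportion of 0.5 then adds 20 samples for a target positive sample data set of 30.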

In one embodiment, the training module 440 is further configured to:

counting the number of standard medical terms of each piece of the positive example data in the target positive example data set;

combining the original medical terms in each piece of the positive example sample data with the corresponding number of the standard medical terms to obtain a target sample data set;

acquiring sample data from the target sample data set, wherein the sample data comprises an original medical term and the true value of its number of standard medical terms;

inputting the original medical terms in the sample data into a preset neural network model to obtain a predicted value of the number of standard medical terms;

determining a loss value according to the real value and the predicted value, and determining whether a preset neural network model converges according to the loss value;

and if the preset neural network model is not converged, updating the parameters of the preset neural network model, and continuing to train the updated preset neural network model until the term number prediction model is obtained through convergence.

It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus, the modules and the units described above may refer to the corresponding processes in the foregoing embodiment of the medical term processing method based on the neural network, and are not described herein again.

The apparatus provided by the above embodiments may be implemented in the form of a computer program, which can be run on a computer device as shown in fig. 6.

Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.

As shown in fig. 6, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.

The storage medium may store an operating system and a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any one of the neural network-based medical term processing methods.

The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.

The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

It should be understood that the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

Wherein, in an embodiment, the processor is configured to run a computer program stored in the memory to implement the steps of:

In an embodiment, the processor, in implementing inputting the first candidate standard medical term and the medical term to be processed into a preset medical term standardization model, is configured to implement:

and inputting the model input data into the medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

Wherein, in another embodiment, the processor is configured to run a computer program stored in the memory to implement the steps of:

acquiring a positive example sample data set, wherein the positive example sample data in the positive example sample data set comprises original medical terms and standard medical terms corresponding to the original medical terms;

generating a negative sample data set according to the positive sample data set and a preset standard medical term library;

expanding the sample data set of the positive example to obtain a target sample data set of the positive example;

performing iterative training on a BERT-based binary classification model according to the target positive sample data set and the negative sample data set to obtain the medical term standardization model;

In an embodiment, when the processor generates a negative sample data set according to the positive sample data set and a preset standard medical term library, the processor is configured to:

determining a second similarity between the original medical term in the positive sample data and each standard medical term in a preset standard medical term library;

according to the second similarity, a plurality of second candidate standard medical terms corresponding to the original medical terms are obtained from the preset standard medical term library;

removing standard medical terms corresponding to the original medical terms from the plurality of second candidate standard medical terms to obtain a non-standard medical term set corresponding to the original medical terms;

and generating a negative sample data set according to the non-standard medical term set corresponding to each original medical term and each original medical term.

In an embodiment, when the processor implements the expansion processing on the positive sample data set to obtain the target positive sample data set, the processor is configured to implement:

In an embodiment, when the processor implements the expansion processing on the positive sample data set according to the expansion magnification to obtain the target positive sample data set, the processor is configured to implement:

In an embodiment, after the processor performs the expansion processing on the positive sample data set to obtain the target positive sample data set, the processor is further configured to:

It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working process of the computer device described above may refer to the corresponding process in the foregoing embodiment of the medical term processing method based on a neural network, and will not be described herein again.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, where the computer program includes program instructions; for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the neural network-based medical term processing method of the present application.

The computer readable storage medium may be volatile or nonvolatile. The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

The blockchain referred to in the present application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A neural network-based medical term processing method, comprising:

2. The medical term processing method according to claim 1, wherein the inputting the first candidate standard medical term and the medical term to be processed into a preset medical term standardization model includes:

3. The method according to claim 1, wherein before the obtaining the medical term to be processed and determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library, the method further comprises:

expanding the positive sample data set to obtain a target positive sample data set, wherein the difference value of the number of samples between the target positive sample data set and the negative sample data set is not greater than a preset threshold value;

and performing iterative training on a BERT-based binary classification model according to the target positive sample data set and the negative sample data set to obtain the medical term standardization model.

4. The method according to claim 3, wherein the generating a negative example sample data set according to the positive example sample data set and a preset standard medical term library comprises:

5. The method according to claim 3, wherein the expanding the positive example sample data set to obtain a target positive example sample data set comprises:

6. The medical term processing method according to claim 5, wherein the performing the expansion processing on the positive example sample data set according to the expansion magnification to obtain a target positive example sample data set includes:

7. The method according to claim 3, wherein the expanding the positive example sample data set to obtain a target positive example sample data set further comprises:

8. A medical term processing apparatus, characterized in that the medical term processing apparatus comprises:

9. A computer device, characterized in that the computer device comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, carries out the steps of the neural network-based medical term processing method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, wherein the computer program, when being executed by a processor, carries out the steps of the neural network-based medical term processing method as claimed in any one of claims 1 to 7.