CN113591458B

CN113591458B - Medical term processing method, device, equipment and storage medium based on neural network

Info

Publication number: CN113591458B
Application number: CN202110865296.3A
Authority: CN
Inventors: 张颖
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2023-09-01
Anticipated expiration: 2041-07-29
Also published as: CN113591458A

Abstract

The application relates to the field of data processing, and in particular to a medical term processing method, device, equipment and storage medium based on a neural network. The method comprises the following steps: determining a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library; acquiring a plurality of first candidate standard medical terms of the medical term to be processed according to the first similarity; inputting the first candidate standard medical terms and the medical term to be processed into a preset medical term standardization model to obtain the target probability corresponding to each first candidate standard medical term; inputting the medical term to be processed into a preset term number prediction model to obtain the number of standard medical terms; and determining the standard medical terms corresponding to the medical term to be processed according to the number of standard medical terms and the target probabilities. The method improves the accuracy of medical term standardization. The application also relates to the field of blockchains, in which the above medical term standardization model can be stored.

Description

Medical term processing method, device, equipment and storage medium based on neural network

Technical Field

The present application relates to the field of data processing, and in particular, to a method, an apparatus, a device, and a storage medium for processing medical terms based on a neural network.

Background

Medical terms are the terms used in the medical field to refer to things, phenomena, characteristics, relationships, processes, and the like. They are essential components with which a clinical information system expresses medical information. However, for the same diagnosis, operation, medicine, examination, assay, symptom, and so on, medical staff often use hundreds or even thousands of different expressions, which makes subsequent analysis of medical record data very difficult. A medical term standardization task converts original medical terms expressed in these different ways into standard medical terms.

At present, the medical term standardization task is mainly implemented with rule matching methods and single matching models. However, these approaches can only convert an original medical term into one standard medical term. They are not suitable for scenarios in which one original medical term corresponds to a plurality of standard medical terms, so the accuracy of medical term standardization is low. Therefore, how to improve the accuracy of medical term standardization is a problem to be solved.

Disclosure of Invention

The embodiment of the application provides a medical term processing method, device, equipment and storage medium based on a neural network, aiming at improving the accuracy of medical term standardization.

In a first aspect, an embodiment of the present application provides a medical term processing method based on a neural network, including:

acquiring medical terms to be processed, and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library;

acquiring a plurality of first candidate standard medical terms of the medical terms to be processed from the preset standard medical term library according to the first similarity;

inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model to obtain target probability that each first candidate standard medical term is the standard medical term corresponding to the medical terms to be processed;

inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed;

and determining standard medical terms corresponding to the medical terms to be processed from a plurality of first candidate standard medical terms according to the number of the standard medical terms and the target probability.

In a second aspect, embodiments of the present application also provide a medical term processing apparatus, including:

the determining module is used for acquiring medical terms to be processed and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library;

the acquisition module is used for acquiring a plurality of first candidate standard medical terms of the medical terms to be processed from the preset standard medical term library according to the first similarity;

the probability prediction module is used for inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model to obtain target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed;

the number prediction module is used for inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed;

the determining module is further configured to determine, according to the number of standard medical terms and the target probability, a standard medical term corresponding to the medical term to be processed from a plurality of first candidate standard medical terms.

In a third aspect, embodiments of the present application also provide a computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the neural network based medical term processing method as described above.

In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of a neural network based medical term processing method as described above.

The embodiment of the application provides a medical term processing method, device, equipment and storage medium based on a neural network. In the method, a plurality of first candidate standard medical terms of the medical term to be processed are obtained from a preset standard medical term library according to the first similarity between the medical term to be processed and each standard medical term in the library. The first candidate standard medical terms and the medical term to be processed are then input into a preset medical term standardization model to obtain the target probability that each first candidate standard medical term is a standard medical term corresponding to the medical term to be processed. The medical term to be processed is also input into a preset term number prediction model to determine its number of standard medical terms. Finally, the standard medical terms corresponding to the medical term to be processed are determined according to the number of standard medical terms and the target probability of each first candidate standard medical term. In this way, not only one-to-one conversion but also one-to-many conversion between an original medical term and standard medical terms can be realized, which greatly improves the accuracy of medical term standardization.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a medical term processing method based on a neural network according to an embodiment of the present application;

FIG. 2 is a flow chart of another medical term processing method based on a neural network according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of a medical term processing apparatus provided by an embodiment of the present application;

FIG. 4 is a schematic block diagram of another medical term processing apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic block diagram of a sub-module of the medical term processing apparatus of FIG. 4;

FIG. 6 is a schematic block diagram of a computer device according to an embodiment of the present application.

The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.

The embodiment of the application provides a medical term processing method, device and equipment based on a neural network and a storage medium. The medical term processing method can be applied to terminal equipment or servers, the terminal equipment can be mobile phones, tablet computers, notebook computers, desktop computers, personal digital assistants, wearable equipment and the like, and the servers can be single servers or server clusters formed by a plurality of servers.

Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Referring to fig. 1, fig. 1 is a schematic flow chart of a medical term processing method based on a neural network according to an embodiment of the application.

As shown in fig. 1, the medical term processing method includes steps S101 to S105.

Step S101, obtaining medical terms to be processed, and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library.

Illustratively, the medical term to be processed may be input by a user or sent by a terminal device. The medical term to be processed may be a clinical medical term, for example a medical term actually used by a doctor in a doctor's advice and/or medical record, or a medical term used in the process of compiling clinical medical information statistics. It will be appreciated that the medical terms to be processed vary from person to person, and that different medical workers may use different expressions for the same medical information. The medical information may be, for example, disease information, symptom information, examination information, or the like. The preset standard medical term library comprises a plurality of standard medical terms, and may comprise the standard medical terms of the International Classification of Diseases (ICD).

For example, determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library may include: determining, based on a preset similarity algorithm, the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library. The preset similarity algorithm may be set based on the actual situation, which is not specifically limited in this embodiment. For example, the similarity algorithm may be a cosine similarity algorithm, a Pearson correlation coefficient algorithm, a Jaccard similarity coefficient algorithm, a Tanimoto coefficient algorithm, a log-likelihood similarity algorithm, or a term frequency-inverse document frequency (TF-IDF) algorithm. With such a similarity algorithm, the similarity between the medical term to be processed and a standard medical term can be calculated accurately.
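As a rough illustration of the similarity step, the sketch below scores a term against every entry of a term library with the Jaccard similarity coefficient, one of the algorithms listed above. The use of character bigrams as the compared sets, and the helper names `char_ngrams`, `jaccard_similarity` and `first_similarities`, are assumptions made for this example. The embodiment does not specify how terms are tokenized.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of `text` (assumed tokenization)."""
    text = text.strip().lower()
    if len(text) < n:
        # Terms shorter than n are treated as a single gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient: size of the intersection over size of the union."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def first_similarities(term: str, library: list[str]) -> dict[str, float]:
    """Score the term to be processed against every standard term in the library."""
    return {standard: jaccard_similarity(term, standard) for standard in library}
```

Any of the other listed algorithms could be substituted for `jaccard_similarity` without changing the surrounding steps, since only the resulting score per standard term is used afterwards.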

Step S102, acquiring a plurality of first candidate standard medical terms of the medical terms to be processed from a preset standard medical term library according to the first similarity.

Illustratively, the standard medical terms are sorted according to the first similarity between the medical term to be processed and each standard medical term to obtain a standard medical term queue, and the standard medical terms whose arrangement order in the queue precedes a preset arrangement order are used as the first candidate standard medical terms of the medical term to be processed. The preset arrangement order may be set based on the actual situation, which is not specifically limited in this embodiment. Ranking the standard medical terms by similarity makes it easy to take the top-ranked standard medical terms as candidates.

For example, the higher the first similarity between a standard medical term and the medical term to be processed, the earlier that standard medical term is placed in the standard medical term queue; the lower the first similarity, the later it is placed. If the preset arrangement order is 6, the first 5 standard medical terms in the standard medical term queue are taken as the first candidate standard medical terms of the medical term to be processed.

Alternatively, standard medical terms whose first similarity is greater than or equal to a preset similarity are determined as the first candidate standard medical terms of the medical term to be processed. The preset similarity may be set based on the actual situation, which is not specifically limited in this embodiment. For example, if the preset similarity is 80%, and the first similarities between the medical term to be processed and standard medical terms A, B, C, D and E are 90%, 70%, 85%, 50% and 92% respectively, then standard medical terms A, C and E are determined as the first candidate standard medical terms of the medical term to be processed.
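The two candidate selection strategies of step S102 can be sketched as follows. This is a minimal sketch: the function names `top_k_candidates` and `threshold_candidates` are hypothetical, and the similarity scores are assumed to be already available as a mapping from standard term to score.

```python
def top_k_candidates(similarities: dict[str, float], preset_rank: int) -> list[str]:
    """Keep the terms ranked before `preset_rank` in the similarity-sorted queue."""
    queue = sorted(similarities, key=similarities.get, reverse=True)
    return queue[:max(preset_rank - 1, 0)]


def threshold_candidates(similarities: dict[str, float], preset_similarity: float) -> list[str]:
    """Keep the terms whose similarity is at least `preset_similarity`."""
    return [term for term, score in similarities.items() if score >= preset_similarity]


# Scores taken from the worked example for standard terms A to E.
scores = {"A": 0.90, "B": 0.70, "C": 0.85, "D": 0.50, "E": 0.92}
by_threshold = threshold_candidates(scores, 0.80)  # terms A, C and E pass the 80% threshold
by_rank = top_k_candidates(scores, 6)              # a preset order of 6 keeps the first 5 terms
```

Ranking bounds the number of pairs passed to the downstream model, while thresholding bounds their quality. Which one fits better depends on the size of the term library, which the embodiment leaves open.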

Step S103, inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model to obtain target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

The preset medical term standardization model is obtained in advance by iteratively training a BERT-based binary classification model on a positive example sample data set and a negative example sample data set. Each piece of positive example sample data comprises an original medical term and the one or more standard medical terms corresponding to it. Each piece of negative example sample data comprises an original medical term and a standard medical term that does not correspond to it. The difference in the number of samples between the positive example sample data set and the negative example sample data set is smaller than or equal to a preset threshold.

Illustratively, the medical term to be processed is combined with each first candidate standard medical term to obtain a plurality of model input data, and the model input data are input into the medical term standardization model to obtain the target probability that each first candidate standard medical term is a standard medical term corresponding to the medical term to be processed. For example, if the first candidate standard medical terms of the medical term a to be processed include standard medical terms A, C and E, three model input data [a, A], [a, C] and [a, E] are obtained by combination. Inputting [a, A], [a, C] and [a, E] into the medical term standardization model respectively yields the target probabilities that standard medical terms A, C and E are a standard medical term corresponding to the medical term a to be processed, for example 0.85, 0.7 and 0.45.
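The pairing described in step S103 can be sketched as below. The trained BERT-based classifier is not reproduced here. Instead, `score_pair` is a hypothetical callable standing in for it, assumed to return the positive-class probability for one (term, candidate) pair.

```python
from typing import Callable


def build_model_inputs(term: str, candidates: list[str]) -> list[tuple[str, str]]:
    """Combine the term to be processed with each candidate into one model input."""
    return [(term, candidate) for candidate in candidates]


def target_probabilities(
    term: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
) -> dict[str, float]:
    """Map each candidate to the probability returned by the (assumed) scoring model."""
    return {
        candidate: score_pair(paired_term, candidate)
        for paired_term, candidate in build_model_inputs(term, candidates)
    }
```

Passing the scorer in as a parameter keeps the pairing logic independent of how the standardization model is actually loaded and served.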

Step S104, inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed.

The preset term number prediction model is obtained by performing iterative training on the neural network model based on a target sample data set, and sample data in the target sample data set can comprise original medical terms and standard medical term numbers corresponding to the original medical terms.

For example, if the medical term to be processed is "laryngeal malignancy", the corresponding standard medical term is "laryngeal carcinoma", so the number of standard medical terms of "laryngeal malignancy" is 1. As another example, if the medical term to be processed is "right pleural thickening and calcification", the corresponding standard medical terms are "pleural calcification" and "pleural hypertrophy", so the number of standard medical terms of "right pleural thickening and calcification" is 2.

Step S105, determining standard medical terms corresponding to the medical terms to be processed from a plurality of first candidate standard medical terms according to the number of the standard medical terms and the target probability.

Illustratively, the first candidate standard medical terms are sorted according to their target probabilities to obtain a candidate standard medical term queue; the target arrangement order corresponding to the number of standard medical terms is obtained; and the first candidate standard medical terms whose arrangement order in the queue precedes the target arrangement order are determined as the standard medical terms corresponding to the medical term to be processed.

For example, the target probabilities that standard medical terms A, C and E are a standard medical term corresponding to the medical term to be processed are 0.85, 0.7 and 0.45 respectively, so the generated candidate standard medical term queue is [A, C, E]. If the number of standard medical terms is 1, the target arrangement order is 2, and standard medical term A, which is located before arrangement order 2, is determined as the standard medical term corresponding to the medical term to be processed. If the number of standard medical terms is 2, the target arrangement order is 3, and standard medical terms A and C, which are located before arrangement order 3, are determined as the standard medical terms corresponding to the medical term to be processed.
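The final selection of step S105 amounts to keeping the highest-probability candidates, as many as the predicted term number. A minimal sketch follows. The function name `select_standard_terms` is hypothetical, and the target arrangement order is assumed to be the term number plus one, as in the worked example.

```python
def select_standard_terms(probabilities: dict[str, float], term_number: int) -> list[str]:
    """Return the candidates ranked before the target order in the probability queue."""
    queue = sorted(probabilities, key=probabilities.get, reverse=True)
    target_order = term_number + 1  # assumed relation: number 1 -> order 2, number 2 -> order 3
    return queue[:max(target_order - 1, 0)]


# Probabilities from the worked example.
candidate_probabilities = {"A": 0.85, "C": 0.7, "E": 0.45}
one_term = select_standard_terms(candidate_probabilities, 1)   # keeps A
two_terms = select_standard_terms(candidate_probabilities, 2)  # keeps A and C
```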

According to the medical term processing method provided by the embodiment, through the first similarity between the to-be-processed medical term and each standard medical term in the preset standard medical term library, a plurality of first candidate standard medical terms of the to-be-processed medical term are obtained from the preset standard medical term library, then the first candidate standard medical terms and the to-be-processed medical term are input into the preset medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the to-be-processed medical term, then the to-be-processed medical term is input into the preset term number prediction model, the number of standard medical terms of the to-be-processed medical term can be determined, and finally the standard medical term corresponding to the to-be-processed medical term is determined according to the number of standard medical terms and the target probability corresponding to each first candidate standard medical term.

Referring to fig. 2, fig. 2 is a flowchart of another medical term processing method based on a neural network according to an embodiment of the application.

As shown in fig. 2, the medical term processing method includes steps S201 to S209.

Step S201, a positive example sample data set is obtained.

The positive sample data in the positive sample data set includes an original medical term and a standard medical term corresponding to the original medical term, and the standard medical term corresponding to the original medical term may be one or a plurality of standard medical terms.

Step S202, generating a negative example sample data set according to the positive example sample data set and a preset standard medical term library.

The negative example sample data in the negative example sample data set includes an original medical term and a standard medical term that does not correspond to the original medical term, a difference value of the number of samples between the target positive example sample data set and the negative example sample data set is not greater than a preset threshold, the preset threshold may be set based on an actual situation, and this embodiment is not limited specifically. For example, the preset threshold is 5.

In an embodiment, a second similarity between the original medical term in the positive example sample data and each standard medical term in the library of preset standard medical terms is determined; acquiring a plurality of second candidate standard medical terms corresponding to the original medical term from a preset standard medical term library according to the second similarity between the original medical term and each standard medical term; removing standard medical terms corresponding to the original medical terms from the plurality of second candidate standard medical terms to obtain a non-standard medical term set corresponding to the original medical terms; a negative example sample dataset is generated from the non-standard medical term set corresponding to each original medical term and each original medical term. Wherein the non-standard medical term set includes a plurality of standard medical terms that do not correspond to the original medical term. The negative example sample data set can be generated through the positive example sample data set, artificial labeling of the negative example sample data is not needed, and the sample data generation convenience is improved.

For example, one piece of positive example sample data is [ b, c1, c2, c3], that is, standard medical terms corresponding to the original medical term b are standard medical terms c1, c2 and c3, a plurality of second candidate standard medical terms corresponding to the obtained original medical term b are [ c1, c2, c3, c4, c5] through similarity between the original medical term b and each standard medical term, and then the [ c1, c2, c3] is removed, so that a non-standard medical term set corresponding to the original medical term b is [ c4, c5] can be obtained, and thus, negative example sample data [ b, c4] and [ b, c5] can be generated.
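The negative example generation of step S202 can be sketched as follows. Here `candidates_for` is a hypothetical callable that returns the second candidate standard medical terms of an original term (for instance, by the similarity ranking used elsewhere in the method), and the positive samples are assumed to be held as a mapping from original term to its standard terms.

```python
from typing import Callable


def generate_negative_samples(
    positive_samples: dict[str, list[str]],
    candidates_for: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """Pair each original term with every candidate that is not one of its standard terms."""
    negatives = []
    for original, standard_terms in positive_samples.items():
        matched = set(standard_terms)
        for candidate in candidates_for(original):
            if candidate not in matched:
                negatives.append((original, candidate))
    return negatives


# Worked example: b maps to c1, c2, c3, and the candidates are c1 to c5.
positives = {"b": ["c1", "c2", "c3"]}
negatives = generate_negative_samples(positives, lambda term: ["c1", "c2", "c3", "c4", "c5"])
```

Because the candidates come from similarity ranking, the resulting negatives are terms that look close to the original but are not its standard form, which is what the classifier later has to tell apart.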

Step S203, performing expansion processing on the positive example sample data set to obtain a target positive example sample data set.

Illustratively, determining a first sample number of the positive example sample data set and a second sample number of the negative example sample data set; determining the expansion rate of the positive sample data set according to the second sample number and the first sample number; and performing expansion processing on the positive example sample data set according to the expansion multiplying power to obtain a target positive example sample data set. The difference value of the number of samples between the target positive example sample data set and the negative example sample data set is not greater than a preset threshold, and the preset threshold may be set based on practical situations, which is not particularly limited in this embodiment, for example, the preset threshold is 5. The positive example sample data set is expanded through the expansion rate, so that the number of samples of the target positive example sample data set and the negative example sample data set obtained through expansion is balanced, and the accuracy of the model can be improved during subsequent model training.

For example, the manner of determining the expansion ratio of the positive sample data set according to the second sample number and the first sample number may be: and determining the ratio of the second sample number to the first sample number as the expansion multiplying power of the positive sample data set. For example, if the positive sample data set includes 50 positive sample data and the negative sample data set includes 500 negative sample data, the expansion ratio of the positive sample data set is 500/50=10.

Alternatively, the expansion rate of the positive example sample data set may be determined from the second sample number and the first sample number as follows: the difference between the second sample number and the first sample number is determined to obtain a sample number difference, and the ratio of the sample number difference to the first sample number is determined as the expansion rate of the positive example sample data set. For example, if the positive example sample data set includes 50 pieces of positive example sample data and the negative example sample data set includes 500 pieces of negative example sample data, the sample number difference is 450, and the expansion rate of the positive example sample data set is 450/50=9.
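Both ways of computing the expansion rate reduce to one line of arithmetic each. The sketch below uses hypothetical function names and reproduces the two worked examples (50 positive and 500 negative samples).

```python
def ratio_expansion_rate(first_count: int, second_count: int) -> float:
    """Expansion rate as the ratio of negative to positive sample counts."""
    if first_count <= 0:
        raise ValueError("the positive example sample data set must not be empty")
    return second_count / first_count


def difference_expansion_rate(first_count: int, second_count: int) -> float:
    """Expansion rate as (negative count - positive count) / positive count."""
    if first_count <= 0:
        raise ValueError("the positive example sample data set must not be empty")
    return (second_count - first_count) / first_count


rate_by_ratio = ratio_expansion_rate(50, 500)            # 500 / 50
rate_by_difference = difference_expansion_rate(50, 500)  # 450 / 50
```

The two rates differ by exactly one. A plausible reading is that the ratio form counts the total number of copies of the positive set, while the difference form counts only the newly added copies, but the embodiment does not state this explicitly.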

For example, the expansion processing of the positive example sample data set according to the expansion rate may be performed as follows: the positive example sample data set is copied according to the expansion rate to obtain new positive example sample data, and the new positive example sample data are combined to obtain the target positive example sample data set. For example, if the expansion rate is 10 and the positive example sample data set includes 50 pieces of positive example sample data, the 50 pieces of positive example sample data are copied 10 times, and the positive example sample data obtained from the 10 copies are combined to obtain the target positive example sample data set.

Alternatively, the expansion processing of the positive example sample data set according to the expansion rate may be performed as follows: a preset proportion of the positive example sample data is randomly sampled from the positive example sample data set until the number of samplings reaches the expansion rate, to obtain newly added positive example sample data, and the newly added positive example sample data and the positive example sample data set are combined to obtain the target positive example sample data set. For example, if the expansion rate is 10 and the positive example sample data set includes 50 pieces of positive example sample data, 90% (45 pieces) of the 50 pieces of positive example sample data are randomly sampled each time, and sampling is performed 10 times in total to obtain 450 pieces of newly added positive example sample data. Finally, the 450 pieces of newly added positive example sample data and the positive example sample data set are combined to obtain a target positive example sample data set including 500 pieces of positive example sample data.
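The two expansion strategies can be sketched as below. The function names, the `seed` parameter, and the choice to sample without replacement within each round are assumptions of this example. For the copy strategy, the sketch assumes the target set consists of exactly `rate` copies of the original set.

```python
import random


def expand_by_copy(positives: list, rate: int) -> list:
    """Build the target set from `rate` copies of the positive samples."""
    return [sample for _ in range(rate) for sample in positives]


def expand_by_sampling(positives: list, rate: int, proportion: float, seed=None) -> list:
    """Sample `proportion` of the positives `rate` times and append them to the original set."""
    rng = random.Random(seed)
    per_round = round(len(positives) * proportion)
    added = []
    for _ in range(rate):
        # Sampling without replacement inside a round is an assumption of this sketch.
        added.extend(rng.sample(positives, per_round))
    return list(positives) + added


original = list(range(50))                                       # 50 positive samples
copied = expand_by_copy(original, 10)                            # 10 copies
sampled = expand_by_sampling(original, 10, 0.9, seed=0)          # 10 rounds of 45, plus the original 50
```

Both calls yield 500 samples for the figures used in the worked examples, which matches the 500 negative samples they are balanced against.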

Step S204, performing iterative training on the BERT-based binary classification model according to the target positive example sample data set and the negative example sample data set to obtain the medical term standardization model.

Illustratively, target sample data is selected alternately from the target positive example sample data set and the negative example sample data set and input into the BERT-based binary classification model to obtain a first output probability that the target sample data is a positive example sample and a second output probability that the target sample data is a negative example sample, where the sample type of the target sample data is either a positive example sample or a negative example sample. A model loss value is determined according to the first output probability and the second output probability. If the model loss value is larger than a preset first loss value, the model parameters of the BERT-based binary classification model are updated, and target sample data continues to be selected to train the updated model until the model converges, to obtain the medical term standardization model. The preset first loss value may be set based on the actual situation, which is not specifically limited in this embodiment.

Illustratively, the model loss value may be determined according to a first output probability that the target sample data is a positive example sample and a second output probability that the target sample data is a negative example sample by: and acquiring a preset loss function, and determining a model loss value according to the first output probability and the second output probability based on the preset loss function. The preset loss function may be set based on actual situations, for example, the preset loss function may be a negative log likelihood function.
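For a single sample, a negative log-likelihood loss over the two output probabilities can be sketched as follows. The function names and the small clamp `eps` (used only to avoid taking the logarithm of zero) are assumptions of this example. The embodiment names the loss function but gives no formula.

```python
import math


def nll_loss(p_positive: float, p_negative: float, is_positive: bool, eps: float = 1e-12) -> float:
    """Negative log of the probability the model assigned to the sample's true type."""
    p_true = p_positive if is_positive else p_negative
    return -math.log(max(p_true, eps))


def needs_update(loss: float, preset_first_loss: float) -> bool:
    """Parameters keep being updated while the loss exceeds the preset first loss value."""
    return loss > preset_first_loss


confident_right = nll_loss(0.9, 0.1, is_positive=True)   # small loss
confident_wrong = nll_loss(0.9, 0.1, is_positive=False)  # large loss
```

The loss is zero only when the model assigns probability 1 to the correct sample type, and grows without bound as that probability approaches zero.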

In one embodiment, the term number prediction model is trained as follows. The standard medical term number of each piece of positive example sample data in the target positive example sample data set is counted; the original medical term in each piece of positive example sample data is combined with its standard medical term number to obtain a target sample data set; sample data is obtained from the target sample data set, wherein the sample data comprises an original medical term and the true value of its standard medical term number; the original medical term in the sample data is input into a preset neural network model to obtain a predicted value of the standard medical term number; a model loss value is determined according to the true value and the predicted value, and whether the preset neural network model has converged is determined according to the model loss value; if the preset neural network model has not converged, its parameters are updated and training of the updated model continues until it converges, to obtain the term number prediction model.

The loss value may be determined according to the true value and the predicted value, and whether the preset neural network model has converged may be determined according to the loss value, as follows: the squared difference between the true value and the predicted value is determined and taken as the loss value of the preset neural network model; if the loss value is greater than or equal to a preset second loss value, it is determined that the preset neural network model has not converged, and if the loss value is less than the preset second loss value, it is determined that the preset neural network model has converged. The preset second loss value may be set based on the actual situation, which is not specifically limited in this embodiment.
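Illustratively, the squared-difference loss and the convergence check described above may be sketched as follows. The function names are hypothetical, and the threshold passed in stands for the preset second loss value.

```python
def squared_error(true_count: float, predicted_count: float) -> float:
    """Squared difference between the true and predicted number of standard terms."""
    return (true_count - predicted_count) ** 2


def has_converged(loss: float, second_loss_value: float) -> bool:
    """The model is treated as converged only when the loss is strictly below the threshold."""
    return loss < second_loss_value
```

For example, with a true value of 2 and a predicted value of 1.5 the loss is 0.25; with a preset second loss value of 0.25 the model is still treated as not converged, because the loss is not strictly less than the threshold.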

Step S205, obtaining medical terms to be processed, and determining a first similarity between the medical terms to be processed and each standard medical term in a preset standard medical term library.

For example, determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library may include: determining, based on a preset similarity algorithm, the similarity between the medical term to be processed and each standard medical term in the preset standard medical term library. The preset similarity algorithm may be set based on the actual situation, which is not specifically limited in this embodiment. For example, the similarity algorithm may include a cosine similarity algorithm, a Pearson correlation coefficient algorithm, a Jaccard similarity coefficient algorithm, a Tanimoto coefficient algorithm, a log-likelihood similarity algorithm, or a term frequency-inverse document frequency (TF-IDF) algorithm.
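Illustratively, a cosine similarity between two terms may be computed as sketched below. The representation of each term as a vector of character bigram counts is an assumption made for illustration; this embodiment does not specify how terms are vectorized, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count the character n-grams of a term (assumed vectorization)."""
    text = text.strip().lower()
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(term_a: str, term_b: str, n: int = 2) -> float:
    """Cosine of the angle between the n-gram count vectors of two terms."""
    vec_a, vec_b = char_ngrams(term_a, n), char_ngrams(term_b, n)
    # A Counter returns 0 for missing keys, so only shared n-grams contribute.
    dot = sum(count * vec_b[gram] for gram, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this sketch, "hypertension" scores higher against "hypertensive disease" than against "diabetes", because the former pair shares many more character bigrams.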

Step S206, acquiring a plurality of first candidate standard medical terms of the medical terms to be processed from a preset standard medical term library according to the first similarity.

Illustratively, the standard medical terms are sorted according to the first similarity between the medical term to be processed and each standard medical term to obtain a standard medical term queue; the standard medical terms ranked ahead of a preset rank in the standard medical term queue are taken as the first candidate standard medical terms of the medical term to be processed. The preset rank may be set based on the actual situation, which is not specifically limited in this embodiment.
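Illustratively, the selection of first candidate standard medical terms from the sorted queue may be sketched as follows. It is assumed, for illustration, that the queue is sorted in descending order of first similarity and that the preset rank is expressed as a count `k`; the function name is hypothetical.

```python
from typing import Dict, List


def top_k_candidates(similarities: Dict[str, float], k: int) -> List[str]:
    """Return the k standard medical terms with the highest first similarity."""
    # Build the queue: most similar standard medical term first.
    queue = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in queue[:k]]
```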

Step S207, inputting the first candidate standard medical terms and the medical terms to be processed into a medical term standardization model to obtain target probabilities that the first candidate standard medical terms are standard medical terms corresponding to the medical terms to be processed.

Illustratively, the medical term to be processed is combined with each first candidate standard medical term to obtain a plurality of model input data; each model input data is then input into the medical term standardization model to obtain the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.
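Illustratively, the combination and scoring steps may be sketched as follows. The medical term standardization model is represented here by an arbitrary callable that maps a pair of terms to a probability; this stand-in, and the function names, are assumptions made for illustration and do not reproduce the trained BERT-based model.

```python
from typing import Callable, Dict, List, Tuple


def build_model_inputs(term: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """Combine the term to be processed with each first candidate standard term."""
    return [(term, candidate) for candidate in candidates]


def score_candidates(
    term: str, candidates: List[str], model: Callable[[str, str], float]
) -> Dict[str, float]:
    """Feed each model input to the model and collect the target probabilities."""
    return {
        candidate: model(query, candidate)
        for query, candidate in build_model_inputs(term, candidates)
    }


# Usage with a toy stand-in model (exact match gives 1.0, anything else 0.0).
toy_model = lambda query, candidate: 1.0 if query == candidate else 0.0
probabilities = score_candidates("fever", ["fever", "cough"], toy_model)
```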

Step S208, inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed.

The preset term number prediction model is obtained by iteratively training a neural network model based on a target sample data set, and the sample data in the target sample data set may comprise original medical terms and the number of standard medical terms corresponding to each original medical term.

Step S209, determining standard medical terms corresponding to the medical terms to be processed from a plurality of first candidate standard medical terms according to the number of the standard medical terms and the target probability.
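Illustratively, step S209 may be sketched as follows. The rule shown here (round the predicted number of standard medical terms, clamp it to at least one and at most the number of candidates, then keep the candidates with the highest target probabilities) is an assumption made for illustration; the function name is hypothetical.

```python
from typing import Dict, List


def select_standard_terms(
    target_probabilities: Dict[str, float], predicted_count: float
) -> List[str]:
    """Keep the N most probable candidates, where N is the predicted term number."""
    # Assumed rule: round the prediction and keep it within [1, number of candidates].
    n = max(1, min(int(round(predicted_count)), len(target_probabilities)))
    ranked = sorted(
        target_probabilities.items(), key=lambda item: item[1], reverse=True
    )
    return [term for term, _ in ranked[:n]]
```

With target probabilities of 0.9, 0.7 and 0.2 for three candidates and a predicted number of 1.6, this sketch returns the two candidates scored 0.9 and 0.7.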

According to the medical term processing method provided by this embodiment, an accurate medical term standardization model can be obtained by iteratively training the BERT-based binary classification model on a positive example sample data set and a negative example sample data set with balanced sample numbers. When the medical term to be processed is standardized, the target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed can then be accurately determined based on the medical term standardization model, and the number of standard medical terms of the medical term to be processed is determined by the term number prediction model. Finally, the standard medical term corresponding to the medical term to be processed can be accurately determined according to the target probability of each first candidate standard medical term and the number of standard medical terms.

Referring to fig. 3, fig. 3 is a schematic block diagram of a medical term processing apparatus according to an embodiment of the present application.

As shown in fig. 3, the medical term processing apparatus 300 includes:

a determining module 310, configured to obtain a medical term to be processed, and determine a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library;

an obtaining module 320, configured to obtain, according to the first similarity, a plurality of first candidate standard medical terms of the medical terms to be processed from the preset standard medical term library;

the probability prediction module 330 is configured to input the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model, so as to obtain a target probability that each of the first candidate standard medical terms is a standard medical term corresponding to the medical terms to be processed;

the number prediction module 340 is configured to input the medical term to be processed into a preset term number prediction model to obtain a standard medical term number of the medical term to be processed;

the determining module 310 is further configured to determine, according to the number of standard medical terms and the target probability, a standard medical term corresponding to the medical term to be processed from a plurality of the first candidate standard medical terms.

In an embodiment, the probability prediction module 330 is further configured to:

combining the medical term to be processed with each first candidate standard medical term respectively to obtain a plurality of model input data;

and sequentially inputting each model input data into the medical term standardized model to obtain target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

Referring to fig. 4, fig. 4 is a schematic block diagram of another medical term processing apparatus according to an embodiment of the present application.

As shown in fig. 4, the medical term processing apparatus 400 includes:

an obtaining module 410, configured to obtain a positive sample data set, where positive sample data in the positive sample data set includes an original medical term and a standard medical term corresponding to the original medical term;

a generating module 420, configured to generate a negative example sample data set according to the positive example sample data set and a preset standard medical term library;

the data processing module 430 is configured to perform expansion processing on the positive sample data set to obtain a target positive sample data set;

a training module 440, configured to iteratively train the BERT-based binary classification model according to the target positive example sample data set and the negative example sample data set, to obtain the medical term standardization model;

The obtaining module 410 is further configured to obtain a medical term to be processed, and determine a first similarity between the medical term to be processed and each standard medical term in a preset standard medical term library;

the obtaining module 410 is further configured to obtain, according to the first similarity, a plurality of first candidate standard medical terms of the medical terms to be processed from the preset standard medical term library;

the probability prediction module 450 is configured to input the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model, so as to obtain a target probability that each first candidate standard medical term is a standard medical term corresponding to the medical terms to be processed;

the number prediction module 460 is configured to input the medical term to be processed into a preset term number prediction model to obtain a standard medical term number of the medical term to be processed;

the obtaining module 410 is further configured to determine, according to the number of standard medical terms and the target probability, a standard medical term corresponding to the medical term to be processed from a plurality of the first candidate standard medical terms.

In one embodiment, as shown in fig. 5, the generating module 420 includes:

A determining submodule 421 for determining a second similarity between the original medical term in the positive example sample data and each standard medical term in a preset standard medical term library;

an obtaining sub-module 422, configured to obtain, according to the second similarity, a plurality of second candidate standard medical terms corresponding to the original medical term from the preset standard medical term library;

a removing sub-module 423, configured to reject standard medical terms corresponding to the original medical terms from the plurality of second candidate standard medical terms, to obtain a non-standard medical term set corresponding to the original medical terms;

a generating sub-module 424 is configured to generate a negative example sample data set according to the non-standard medical term set corresponding to each of the original medical terms and each of the original medical terms.
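Illustratively, the negative example generation performed by the sub-modules above may be sketched as follows. It is assumed, for illustration, that the second candidate standard medical terms have already been retrieved by similarity and are supplied as a mapping from each original medical term to its candidates; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def generate_negative_samples(
    positive_samples: Sequence[Tuple[str, Sequence[str]]],
    candidates_by_term: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Pair each original term with similar standard terms that are not its own."""
    negatives: List[Tuple[str, str]] = []
    for original, standards in positive_samples:
        correct = set(standards)
        # Reject the standard terms that actually correspond to the original term.
        non_standard = [
            candidate
            for candidate in candidates_by_term.get(original, [])
            if candidate not in correct
        ]
        negatives.extend((original, candidate) for candidate in non_standard)
    return negatives
```

Because the rejected candidates are similar to the original term but are not its standard terms, they serve as hard negative examples for the binary classification model.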

In an embodiment, the data processing module 430 is further configured to:

determining a first sample number of the positive example sample data set and a second sample number of the negative example sample data set;

determining the expansion rate of the positive sample data set according to the second sample number and the first sample number;

and performing expansion processing on the positive example sample data set according to the expansion rate to obtain a target positive example sample data set.

In an embodiment, the data processing module 430 is further configured to:

randomly sampling a preset proportion of the positive example sample data from the positive example sample data set, repeating until the number of sampling rounds reaches the expansion rate, to obtain newly added positive example sample data;

and merging the newly added positive example sample data with the positive example sample data set to obtain a target positive example sample data set.
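Illustratively, the expansion processing may be sketched as follows. The formula used for the expansion rate (the integer quotient of the second sample number by the first sample number) is an assumption made for illustration, since this embodiment only states that the rate is determined from the two sample numbers; the function names and the fixed random seed are likewise hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expansion_rate(first_sample_number: int, second_sample_number: int) -> int:
    """Assumed rule: number of sampling rounds = negatives // positives."""
    if first_sample_number <= 0:
        raise ValueError("the positive example sample data set must not be empty")
    return second_sample_number // first_sample_number


def expand_positive_samples(
    positives: Sequence[T], rate: int, proportion: float, seed: int = 0
) -> List[T]:
    """Sample a fixed proportion of the positives `rate` times and merge the result."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    rng = random.Random(seed)
    pool = list(positives)
    per_round = int(len(pool) * proportion)
    newly_added: List[T] = []
    for _ in range(rate):
        # Each round draws without replacement from the original positive set.
        newly_added.extend(rng.sample(pool, per_round))
    return pool + newly_added
```

For example, with 4 positive samples, 12 negative samples and a preset proportion of 0.5, this sketch performs 3 sampling rounds of 2 samples each and yields a target positive example sample data set of 10 samples.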

In an embodiment, the training module 440 is further configured to:

counting the standard medical term number of each positive example sample data in the target positive example sample data set;

combining the original medical term in each positive example sample data with the corresponding number of standard medical terms to obtain a target sample data set;

obtaining sample data from the target sample data set, wherein the sample data comprises an original medical term and a true value of the number of standard medical terms;

inputting the original medical term in the sample data into a preset neural network model to obtain a predicted value of the number of standard medical terms;

determining a loss value according to the true value and the predicted value, and determining whether the preset neural network model has converged according to the loss value;

if the preset neural network model has not converged, updating the parameters of the preset neural network model, and continuing to train the updated preset neural network model until the model converges, thereby obtaining the term number prediction model.

It should be noted that, for convenience and brevity of description, specific working procedures of the above-described apparatus and modules and units may refer to corresponding procedures in the foregoing embodiments of the neural network-based medical term processing method, and are not repeated herein.

The apparatus provided by the above embodiments may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 6.

Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal.

As shown in fig. 6, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.

The storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any one of the neural network-based medical term processing methods.

The processor is used to provide computing and control capabilities to support the operation of the entire computer device.

The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Wherein in an embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:

In an embodiment, the processor is configured to, when implementing inputting the first candidate standard medical term and the medical term to be processed into a preset medical term normalization model, implement:

and inputting the model input data into the medical term standardized model to obtain target probability that each first candidate standard medical term is the standard medical term corresponding to the medical term to be processed.

Wherein in another embodiment the processor is configured to run a computer program stored in the memory to perform the steps of:

obtaining a positive example sample data set, wherein positive example sample data in the positive example sample data set comprises original medical terms and standard medical terms corresponding to the original medical terms;

generating a negative example sample data set according to the positive example sample data set and a preset standard medical term library;

performing expansion processing on the positive sample data set to obtain a target positive sample data set;

performing iterative training on the binary classification model based on BERT according to the target positive example sample data set and the negative example sample data set to obtain the medical term standardized model;

In an embodiment, the processor, when implementing generating the negative example sample data set from the positive example sample data set and a preset standard medical term library, is configured to implement:

determining a second similarity between the original medical term in the positive example sample data and each standard medical term in a preset standard medical term library;

acquiring a plurality of second candidate standard medical terms corresponding to the original medical terms from the preset standard medical term library according to the second similarity;

Removing standard medical terms corresponding to the original medical terms from a plurality of second candidate standard medical terms to obtain a non-standard medical term set corresponding to the original medical terms;

and generating a negative example sample data set according to the non-standard medical term set corresponding to each original medical term and each original medical term.

In an embodiment, when implementing expansion processing on the positive example sample data set to obtain a target positive example sample data set, the processor is configured to implement:

In an embodiment, when implementing expansion processing on the positive example sample data set according to the expansion rate to obtain a target positive example sample data set, the processor is configured to implement:

In an embodiment, after implementing expansion processing on the positive example sample data set to obtain a target positive example sample data set, the processor is further configured to implement:

It should be noted that, for convenience and brevity of description, specific working procedures of the above-described computer device may refer to corresponding procedures in the foregoing embodiments of the neural network-based medical term processing method, which are not described in detail herein.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the embodiments, or in some parts of the embodiments, of the present application.

Embodiments of the present application also provide a computer-readable storage medium having a computer program stored thereon, the computer program including program instructions; for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the neural network-based medical term processing method of the present application.

Wherein the computer readable storage medium may be volatile or nonvolatile. The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present application are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments. While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various equivalent changes and substitutions may readily be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims

1. A neural network-based medical term processing method, comprising:

inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model to obtain target probabilities that the first candidate standard medical terms are standard medical terms corresponding to the medical terms to be processed, wherein the preset medical term standardization model is obtained by carrying out iterative training on a BERT-based classification model based on a positive sample data set and a negative sample data set in advance, positive sample data in the positive sample data set comprises original medical terms and one or more standard medical terms corresponding to the original medical terms, negative sample data in the negative sample data set comprises original medical terms and standard medical terms not corresponding to the original medical terms, and the difference value of the number of samples between the positive sample data set and the negative sample data set is smaller than or equal to a preset threshold value;

Inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed, wherein the preset term number prediction model is obtained by performing iterative training on a neural network model based on a target sample data set, and the sample data in the target sample data set comprises original medical terms and standard medical term numbers corresponding to the original medical terms;

2. The medical term processing method according to claim 1, wherein the inputting the first candidate standard medical term and the medical term to be processed into a preset medical term normalization model includes:

3. The medical term processing method according to claim 1, wherein before the obtaining the medical term to be processed and determining the first similarity between the medical term to be processed and each standard medical term in the preset standard medical term library, further comprises:

performing expansion processing on the positive example sample data set to obtain a target positive example sample data set, wherein the difference value of the number of samples between the target positive example sample data set and the negative example sample data set is not greater than a preset threshold value;

and performing iterative training on the binary classification model based on BERT according to the target positive example sample data set and the negative example sample data set to obtain the medical term standardized model.

4. A medical term processing method according to claim 3, wherein said generating a negative example sample data set from said positive example sample data set and a library of preset standard medical terms comprises:

5. A medical term processing method according to claim 3, wherein said expanding the positive sample data set to obtain a target positive sample data set comprises:

6. The medical term processing method according to claim 5, wherein the expanding the positive sample data set according to the expansion ratio to obtain a target positive sample data set includes:

7. The medical term processing method according to claim 3, wherein after performing expansion processing on the positive sample data set to obtain a target positive sample data set, further comprising:

8. A medical term processing apparatus, characterized in that the medical term processing apparatus comprises:

the probability prediction module is used for inputting the first candidate standard medical terms and the medical terms to be processed into a preset medical term standardization model to obtain target probabilities that the first candidate standard medical terms are standard medical terms corresponding to the medical terms to be processed, wherein the preset medical term standardization model is obtained by carrying out iterative training on a BERT-based bi-classification model based on a positive sample data set and a negative sample data set in advance, positive sample data in the positive sample data set comprise original medical terms and one or more standard medical terms corresponding to the original medical terms, negative sample data in the negative sample data set comprise the original medical terms and standard medical terms not corresponding to the original medical terms, and a sample number difference value between the positive sample data set and the negative sample data set is smaller than or equal to a preset threshold value;

The number prediction module is used for inputting the medical terms to be processed into a preset term number prediction model to obtain the standard medical term number of the medical terms to be processed, wherein the preset term number prediction model is obtained by performing iterative training on a neural network model based on a target sample data set, and sample data in the target sample data set comprises original medical terms and standard medical term numbers corresponding to the original medical terms;

9. A computer device, characterized in that it comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the neural network based medical term processing method according to any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the neural network-based medical term processing method according to any one of claims 1 to 7.