CN114386528B - Model training method and device, computer equipment and storage medium - Google Patents

Model training method and device, computer equipment and storage medium

Info

Publication number
CN114386528B
CN114386528B
Authority
CN
China
Prior art keywords
sample data
drug
prescription
occurrence
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210057007.1A
Other languages
Chinese (zh)
Other versions
CN114386528A (en)
Inventor
赵越
徐卓扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210057007.1A
Publication of CN114386528A
Priority to PCT/CN2022/090760 (WO2023137924A1)
Application granted
Publication of CN114386528B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments provide a model training method and apparatus, a computer device, and a storage medium, in the technical field of artificial intelligence. The method comprises: acquiring a plurality of inquiry sample data and the corresponding prescription sample data; performing prediction processing on the patient sample data and dialogue sample data through a pre-training model to obtain prescription prediction data; calculating a first loss value from the plurality of drug prediction data; constructing a drug co-occurrence matrix from the plurality of drug sample data and calculating a second loss value; and training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model. By constructing a drug co-occurrence matrix from the prescription sample data and adding a drug co-occurrence loss based on that matrix, the method not only accounts for the correlations among multiple drugs but also avoids the growth in the number of classifiers as the number of drugs increases, thereby improving the training efficiency of the model.

Description

Model training method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a model training method and apparatus, a computer device, and a storage medium.
Background
In prescription recommendation tasks, there are often dependencies among the multiple drugs in a prescription; for example, antipyretics and cough suppressants frequently appear in the same prescription. Efficiently mining the correlations between drugs is therefore essential. Because a prescription usually contains more than one drug, prescription recommendation is often cast as a multi-label classification task; however, as the number of drugs grows, so does the number of classifiers in the prescription recommendation model, which hurts the training efficiency of the model.
Disclosure of Invention
The main purpose of the disclosed embodiments is to provide a model training method and apparatus, a computer device, and a storage medium that can improve the training efficiency of the model.
To achieve the above object, a first aspect of the embodiments of the present disclosure proposes a model training method for training a prescription recommendation model, including:
acquiring a plurality of inquiry sample data and the prescription sample data corresponding to each inquiry sample data; wherein each inquiry sample data includes patient sample data and dialogue sample data, and each prescription sample data includes a plurality of drug sample data;
performing prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data; wherein the prescription prediction data includes a plurality of drug prediction data;
calculating a first loss function of the pre-training model according to the plurality of drug prediction data to obtain a first loss value;
constructing a drug co-occurrence matrix according to the plurality of drug sample data;
calculating a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value;
training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model; wherein the prescription recommendation model is used for recommending prescriptions.
In some embodiments, the performing prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data includes:
performing standardization processing on the patient sample data according to a preset data format to obtain corresponding patient features;
performing a first encoding process on the patient features to obtain corresponding patient vectors;
performing a second encoding process on the dialogue sample data to obtain corresponding dialogue vectors;
and performing prediction processing according to the patient vector and the dialogue vector to obtain the prescription prediction data.
In some embodiments, the performing a second encoding process on the dialogue sample data to obtain a corresponding dialogue vector includes:
acquiring a preset hierarchical attention model, where the hierarchical attention model includes a word-level neural network and a sentence-level neural network;
performing word segmentation processing on the dialogue sample data to obtain word segmentation data;
encoding the word segmentation data to obtain word encoding vectors;
encoding the word encoding vectors through the word-level neural network to obtain sentence encoding vectors;
and encoding the sentence encoding vectors through the sentence-level neural network to obtain the dialogue vector.
In some embodiments, each drug sample data includes a plurality of preset drugs, and the performing prediction processing according to the patient vector and the dialogue vector to obtain prescription prediction data includes:
splicing the patient vector and the dialogue vector to obtain a spliced vector;
predicting the spliced vector through a fully connected layer to obtain the prescribing probability of each preset drug;
screening the preset drugs according to a preset threshold and the prescribing probabilities to obtain target drugs;
and taking the target drugs as the prescription prediction data.
In some embodiments, the constructing a drug co-occurrence matrix according to the plurality of drug sample data includes:
acquiring the co-occurrence relationships among the preset drugs in the drug sample data to obtain drug co-occurrence relationships;
constructing drug co-occurrence pairs of the preset drugs based on the drug co-occurrence relationships, and acquiring the corresponding drug co-occurrence counts;
normalizing the drug co-occurrence counts to obtain first co-occurrence values;
calculating the difference between the first co-occurrence values and the preset threshold to obtain second co-occurrence values;
and constructing the drug co-occurrence matrix according to the second co-occurrence values.
In some embodiments, the training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model includes:
taking the first loss value and the second loss value as back-propagated quantities and adjusting the model parameters of the pre-training model, so as to train the pre-training model and obtain the prescription recommendation model.
In some embodiments, the method further includes: acquiring actual inquiry data, where the actual inquiry data includes actual patient data and actual dialogue data;
and inputting the actual patient data and the actual dialogue data into the prescription recommendation model for prescription recommendation processing to obtain a recommended prescription; where the prescription recommendation model is trained according to the method of any one of the embodiments of the first aspect of the present application.
A second aspect of the embodiments of the present disclosure proposes a model training apparatus for training a prescription recommendation model, including:
a first acquisition module, configured to acquire a plurality of inquiry sample data and the prescription sample data corresponding to each inquiry sample data, where each inquiry sample data includes patient sample data and dialogue sample data, and each prescription sample data includes a plurality of drug sample data;
a prescription prediction module, configured to perform prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data, where the prescription prediction data includes a plurality of drug prediction data;
a first calculation module, configured to calculate a first loss function of the pre-training model according to the plurality of drug prediction data to obtain a first loss value;
a matrix construction module, configured to construct a drug co-occurrence matrix according to the plurality of drug sample data;
a second calculation module, configured to calculate a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value;
and a model training module, configured to train the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model, where the prescription recommendation model is used for recommending prescriptions.
A third aspect of the embodiments of the present disclosure proposes a computer device comprising a memory and a processor, where the memory stores a program that, when executed by the processor, causes the processor to perform the method according to any one of the embodiments of the first aspect of the present application.
A fourth aspect of the embodiments of the present disclosure proposes a storage medium, which is a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect of the present application.
According to the model training method and apparatus, the computer device, and the storage medium provided by the embodiments of the present disclosure, a plurality of inquiry sample data and the prescription sample data corresponding to each inquiry sample data are acquired, where each inquiry sample data includes patient sample data and dialogue sample data, and each prescription sample data includes a plurality of drug sample data; prediction processing is performed on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data, where the prescription prediction data includes a plurality of drug prediction data; a first loss function of the pre-training model is calculated according to the plurality of drug prediction data to obtain a first loss value; a drug co-occurrence matrix is constructed according to the plurality of drug sample data; a second loss function of the pre-training model is calculated according to the drug co-occurrence matrix to obtain a second loss value; and the pre-training model is trained according to the first loss value and the second loss value to obtain a prescription recommendation model for recommending prescriptions. In the embodiments of the present application, after the pre-training model makes predictions on the patient sample data and the dialogue sample data, the prescription sample data are introduced, a drug co-occurrence matrix is constructed from them, and a drug co-occurrence loss is added based on that matrix; this not only accounts for the correlations among multiple drugs but also avoids the growth in the number of classifiers as the number of drugs increases, thereby improving the training efficiency of the model.
Drawings
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of step S200 in FIG. 1;
FIG. 3 is a flowchart of step S230 in FIG. 2;
FIG. 4 is a flowchart of step S240 in FIG. 2;
FIG. 5 is a flowchart of step S400 in FIG. 1;
FIG. 6 is a block diagram of the module architecture of a model training apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the hardware structure of a computer device provided by an embodiment of the present disclosure.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
First, several terms involved in the present application are explained:
Artificial Intelligence (AI): a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Multi-label classification: means that one piece of data may have one or more labels; for example, a patient's physical examination report may carry multiple labels such as hypertension and hyperglycemia.
Binary classification: refers to a classification task with two classes, such as identifying whether a picture is of a cat. That is, a classifier is trained that takes a picture, represented by a feature vector x, as input and outputs whether it is a cat, represented by y = 0 or 1; binary classification assumes that each sample is assigned one and only one label, 0 or 1.
Softmax classifier: the generalization of the logistic regression classifier to multiple classes; it outputs probability values of belonging to the different classes.
Serial communication: in telecommunications and computer science, serial communication refers to a communication scheme in which data is transmitted one bit at a time over a computer bus or other data channel, the individual transfers being performed consecutively. Its counterpart is parallel communication, which transmits several bits at a time. Serial communication is used for long-range communication and most computer networks, where cabling and synchronization make parallel communication impractical. With improved signal integrity and propagation speed, serial communication buses are becoming increasingly popular, and even in short-range applications their advantages, such as freedom from clock skew and lower interconnect density, have begun to outweigh those of parallel buses despite the need for serializer elements.
Hierarchical Attention Network (HAN): a neural network for document classification. The model has two distinctive features: first, it has a hierarchical structure (words compose sentences, sentences compose documents) that mirrors the hierarchical structure of documents, so document representations are built by first constructing sentence representations and then aggregating them into a document representation; second, it applies attention mechanisms at two levels, word and sentence, enabling it to attend differentially to more and less important content when building the document representation.
One-hot encoding: also known as one-bit-effective encoding; it uses an N-bit state register to encode N states, each state having its own independent register bit, and only one bit is valid at any time.
Encoder: converts the input sequence into a fixed-length vector.
Decoder: converts the previously generated fixed-length vector back into an output sequence; the input sequence can be words, speech, images, or video, and the output sequence can be text or images.
Recurrent Neural Network (RNN): a class of recursive neural networks that takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain; bidirectional recurrent neural networks (Bi-RNN) and Long Short-Term Memory networks (LSTM) are common recurrent neural networks. Recurrent neural networks have memory, parameter sharing, and Turing completeness, which gives them certain advantages in learning the nonlinear characteristics of sequences. They are applied in natural language processing (NLP) tasks such as speech recognition, language modeling, and machine translation, and are also used for various time-series predictions. Recurrent neural networks that incorporate convolutional neural networks (CNN) can handle computer vision problems involving sequence inputs.
Attention mechanism: the attention mechanism can give a neural network the ability to concentrate on a subset of its inputs (or features) by selecting particular inputs, and it can be applied to input of any type regardless of its shape. In situations where computing power is limited, the attention mechanism is a resource-allocation scheme and the main means of solving the information-overload problem, allocating computing resources to more important tasks.
Sigmoid function: an early activation function, σ(x) = 1 / (1 + e^(−x)), which squashes activation values into the range between 0 and 1 and thereby introduces a nonlinear factor; 1 indicates a fully activated state, 0 a fully inactive state, and the other output values lie in between, indicating different degrees of activation.
Cross entropy: an important concept in Shannon's information theory, mainly used to measure the difference between two probability distributions. The performance of a language model is usually measured by cross entropy and perplexity. Cross entropy expresses the difficulty of recognizing text with the model, or, from a compression perspective, how many bits on average are needed to encode each word; perplexity expresses the average number of branches with which the model represents this text, and its reciprocal can be regarded as the average probability of each word. Smoothing refers to assigning a probability value to unobserved N-gram combinations to ensure that a word sequence can always obtain a probability value through the language model. Commonly used smoothing techniques include Good-Turing estimation, interpolation smoothing, Katz smoothing, and Kneser-Ney smoothing.
Data normalization: also called standardization, it limits the data to be processed to a certain range after applying some algorithm. Data normalization is basic work in data mining: different evaluation indicators often have different dimensions and dimensional units, which affects the results of data analysis, so normalization is needed to eliminate the dimensional influence between indicators and make the indicators comparable. The purpose of data normalization is to bring data from different sources to the same order of magnitude (a common reference frame) so that comparisons are meaningful. Normalization also makes subsequent data processing more convenient, with two major advantages: first, it accelerates gradient descent toward the optimal solution; second, it can improve accuracy.
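For instance, the two normalization schemes most often meant by this term can be sketched in a few lines of Python (the sample values are illustrative):

```python
import numpy as np

ages = np.array([22.0, 35.0, 60.0, 47.0])        # illustrative raw feature values

# min-max normalization: rescale into the [0, 1] range
min_max = (ages - ages.min()) / (ages.max() - ages.min())

# z-score standardization: zero mean, unit variance
z_score = (ages - ages.mean()) / ages.std()
```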
Co-occurrence matrix: a co-occurrence matrix counts the number of times classification labels appear together, and the counts can then be used to compute PMI values (the basic idea of the PMI algorithm is to measure the probability of two classification labels appearing together in a text: the higher the probability, the tighter the correlation and the higher the degree of association). The computation of co-occurrence matrices therefore plays an important role in data mining and analysis.
Sparse matrix: in a matrix, if the number of zero-valued elements far exceeds the number of non-zero elements and the non-zero elements are distributed irregularly, the matrix is called a sparse matrix; conversely, if the non-zero elements are in the majority, the matrix is called a dense matrix. The ratio of the number of non-zero elements to the total number of elements defines the density of the matrix.
The embodiments of the present application can acquire and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
The model training method provided by the embodiments of the present application can be applied to artificial intelligence. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly covers computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
In prescription recommendation tasks, there are often dependencies among the multiple drugs in a prescription; for example, antipyretics and cough suppressants frequently appear in the same prescription. Efficiently mining the correlations between drugs is therefore essential. Because a prescription usually contains more than one drug, prescription recommendation is often cast as a multi-label classification task; however, as the number of drugs grows, so does the number of classifiers in the prescription recommendation model, which hurts the training efficiency of the model.
For example, existing multi-label classification methods can be broadly divided into three strategies according to the strength of the correlation mining between drugs. The first-order strategy ignores the correlations between drugs entirely; for example, it decomposes prescription recommendation into multiple independent binary classification tasks. In scenarios with a large number of drugs, many classifiers need to be trained, which is complex, time-consuming, and labor-intensive. The second-order strategy considers only the correlations between pairs of drugs; for example, it constructs drug pairs and trains a classifier for each pair, so the number of classifiers is d(d−1)/2, where d is the number of drugs; as the number of drugs increases, the number of classifiers increases dramatically, giving high complexity. The higher-order strategy considers interactions among multiple drugs; for example, it trains multiple binary classifiers in which the output label of one classifier serves as the input of the next, so model performance is affected by the label order, training can only proceed serially, and computational efficiency is low.
In summary, existing models have high complexity and low computational efficiency. Their flexibility is also weak: whenever a new drug label is added, a classifier for the new label must be retrained, which seriously affects the training efficiency of the model.
In view of this, the present application provides a model training method and apparatus, a computer device, and a storage medium, which can improve the training efficiency of the model.
The model training method provided by the embodiments of the present disclosure relates to the technical field of artificial intelligence and the technical field of virtual reality. The method can be applied to a terminal, to a server, or to software running in a terminal or server. In some embodiments, the terminal may be a smartphone, tablet, notebook computer, desktop computer, smart watch, or the like; the server may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms; the software may be an application implementing the above method, but is not limited to the above forms.
Embodiments of the present disclosure are operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments of the present disclosure provide a model training method and apparatus, a computer device, and a storage medium. The following embodiments first describe the model training method, which is used for training a prescription recommendation model.
Referring to fig. 1, a training method of a model according to an embodiment of the first aspect of the present disclosure includes, but is not limited to, steps S100 to S600.
Step S100, acquiring a plurality of inquiry sample data, and acquiring prescription sample data corresponding to each inquiry sample data;
Step S200, predicting patient sample data and dialogue sample data through a pre-training model to obtain prescription prediction data;
Step S300, calculating a first loss function of the pre-training model according to a plurality of medicine prediction data to obtain a first loss value;
Step S400, constructing a drug co-occurrence matrix according to a plurality of drug sample data;
Step S500, calculating a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value;
Step S600, training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model.
In step S100 of some embodiments, a plurality of inquiry sample data are acquired, together with the prescription sample data corresponding to each inquiry sample data. The inquiry sample data are generated when a patient has an online consultation or visits a hospital; each inquiry sample data includes patient sample data and dialogue sample data. The patient sample data mainly include the patient's name, age, gender, and so on; the dialogue sample data are the dialogue content between the patient and the doctor during the visit; and the prescription sample data are the diagnosis result made by the doctor according to the patient sample data and the dialogue sample data, namely the prescription, which contains the drugs chosen according to the patient's condition.
In practical applications, the content of certain patient sample data may be: the patient's name is Zhang San, the patient's age is 22, and the patient's gender is male. The content of certain dialogue sample data may take the following form, and its corresponding prescription sample data includes drug A and drug B.
Patient: Hello doctor. My child kicked off the quilt while sleeping last night and started coughing the next day. I gave him wind-cold granules and he sweated, which seemed to help, but around 4 o'clock today the cough became very severe and there seems to be a lot of phlegm in his throat. What should I do?
Doctor: Monitor the child's body temperature to check whether it is elevated. If it is not, the child can take some medicine first; if it is, I recommend visiting a hospital for a routine blood test.
Patient: What medicine should he take?
Doctor: He can orally take some phlegm-reducing and cough-relieving medicines, such as drug A and drug B.
Each prescription sample data includes a plurality of drug sample data. When making a preliminary prescription prediction, the prediction is based on the patient sample data, such as the patient age and gender, and on certain key information in the dialogue data, such as the symptoms of coughing, heavy phlegm in the throat, and sweating, and drug names such as drug A and drug B. After the prescription prediction data are obtained, the first loss value and the second loss value are further calculated to correct the loss function of the pre-training model, so that the pre-training model is trained toward a new target according to the target loss value, yielding an optimized pre-training model, namely the prescription recommendation model.
In step S200 of some embodiments, the patient sample data and the dialogue sample data are subjected to prediction processing by a pre-training model to obtain prescription prediction data. The pre-training model is a pre-trained prediction model, such as a text classification model, which can make a preliminary prediction of the prescription to be issued according to the patient sample data and the dialogue data; the resulting prescription prediction data include a plurality of drug prediction data, such as drug C and drug D. Specifically, the pre-training model predicts the probability of each drug being prescribed according to the patient sample data and the dialogue sample data, and then uses the sigmoid activation function to map each drug's score into the [0, 1] range, yielding the prescription prediction data.
Specifically, for the x-th inquiry, the score of each drug, i.e., its probability of being prescribed, is recorded as s_x ∈ R^d, as shown in formula (1):

s_x = {y_1, y_2, ..., y_d},  y_i ∈ [0, 1]   (1)

where d is the number of all drugs and y_i is the score of the i-th drug. When a score exceeds the threshold, the drug is considered to be included in the prescription:

p_i = 1 if y_i > threshold, otherwise p_i = 0   (2)

so the final recommended prescription is recorded as P_x ∈ R^d:

P_x = {p_1, p_2, ..., p_d},  p_i ∈ {0, 1}   (3)
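As a minimal Python sketch of formulas (1)–(3) (the dimension d = 5, the score values, and the threshold 0.5 are illustrative assumptions, not values fixed by the text):

```python
import numpy as np

threshold = 0.5                                   # preset threshold (assumption)
s_x = np.array([0.91, 0.12, 0.55, 0.08, 0.73])    # scores y_i in [0, 1] for d = 5 drugs

# formulas (2)/(3): a drug enters the prescription when its score exceeds the threshold
P_x = (s_x > threshold).astype(int)
print(P_x)                                        # -> [1 0 1 0 1]
```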
In step S300 of some embodiments, a first loss function of the pre-training model is calculated from the plurality of drug prediction data, resulting in a first loss value. If the pre-training model of the embodiment of the present application is a hierarchical attention network model, a BCE (binary cross-entropy) loss function may be selected as the first loss function of the pre-training model, and evaluating it yields the first loss value. The specific calculation is shown in formula (4):

L_model = − ∑_{i=1}^{d} [ p_i·log(y_i) + (1 − p_i)·log(1 − y_i) ]   (4)

where L_model is the first loss value, d is the number of all drugs, p_i indicates whether the corresponding drug is included in the prescription (p_i = 0 means the drug is not included and p_i = 1 means it is), and y_i is the score of the i-th drug.
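A short Python sketch of this first loss (whether the sum over drugs is also averaged is a reduction convention and is left as a plain sum here):

```python
import numpy as np

def first_loss(y, p):
    """BCE of formula (4): y are predicted scores, p the 0/1 prescription labels."""
    eps = 1e-12                                   # guard against log(0)
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(p * np.log(y) + (1 - p) * np.log(1 - y))

y = np.array([0.91, 0.12, 0.55])                  # drug scores y_i
p = np.array([1.0, 0.0, 1.0])                     # labels p_i from the prescription sample
print(first_loss(y, p))
```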
In step S400 of some embodiments, a drug co-occurrence matrix is constructed from the plurality of drug sample data. The drug co-occurrence matrix records the number of co-occurrences between every two drugs, where the number of co-occurrences refers to, for example, the number of prescriptions in which drug E and drug F appear together.
In step S500 of some embodiments, a second loss function of the pre-training model is calculated from the drug co-occurrence matrix, resulting in a second loss value.
In step S600 of some embodiments, the pre-training model is trained according to the first loss value and the second loss value to obtain the prescription recommendation model. Specifically, the first loss value and the second loss value are used as back-propagated quantities, and the model parameters of the pre-training model are adjusted to train it, yielding the prescription recommendation model. The first loss function is combined with the second loss function to obtain the target loss function of the pre-training model, and the first loss value is combined with the second loss value to obtain the target loss value. Correcting the loss function in this way makes the pre-training model train toward a new target according to the target loss value; by adjusting the model parameters accordingly, a trained prescription recommendation model is obtained, which is used for recommending prescriptions.
In some embodiments, as shown in fig. 2, step S200 specifically includes, but is not limited to, step S210 to step S240.
Step S210, performing standardization processing on the patient sample data according to a preset data format to obtain corresponding patient features;
Step S220, performing a first encoding process on the patient features to obtain corresponding patient vectors;
Step S230, performing a second encoding process on the dialogue sample data to obtain corresponding dialogue vectors;
Step S240, performing prediction processing according to the patient vector and the dialogue vector to obtain prescription prediction data.
In step S210 of some embodiments, the patient sample data are standardized according to a preset data format, for example the patient gender and the patient age, to obtain the corresponding patient features. Specifically, for patient gender, a one-hot encoding method may be used, in which N states are encoded using an N-bit state register; each state has its own register bit, and at any time only one of the bits is valid, i.e., only one bit is 1 and the rest are 0. For example, the normalized data obtained by one-hot encoding the patient gender features "male" and "female" are: male = 10, female = 01. One-hot encoding a discrete feature expands its values into a Euclidean space, with each value of the discrete feature corresponding to a point in that space; this makes distance calculations between features more reasonable and improves the model training effect. In addition, because the patient ages collected during inquiries are not all in a uniform format, such as "32 years old", "thirty-two years old", or "age 32", the age formats need to be unified, for example by converting all patient ages to Arabic numerals such as "32".
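A minimal Python sketch of this preprocessing (the helper names are illustrative; spelled-out ages such as "thirty-two" would need an extra lookup table that is omitted here):

```python
import re

def encode_gender(gender):
    # one-hot encoding: male -> [1, 0], female -> [0, 1]
    return [1, 0] if gender == "male" else [0, 1]

def normalize_age(age_text):
    # unify formats such as "32 years old" or "age 32" to the Arabic numeral 32
    match = re.search(r"\d+", age_text)
    if match is None:
        raise ValueError(f"no numeric age found in {age_text!r}")
    return int(match.group())

print(encode_gender("male"), normalize_age("age 32"))   # [1, 0] 32
```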
In step S220 of some embodiments, during modeling, a first encoding process is performed on the patient features to obtain the corresponding patient vector.
In step S230 of some embodiments, during modeling, a second encoding process also needs to be performed on the dialogue sample data to obtain the corresponding dialogue vector. The first and second encoding processes are identical in that both transcode features or data into corresponding encoded vectors; "first" and "second" merely distinguish the different objects being encoded.
In step S240 of some embodiments, prediction processing is performed according to the patient vector and the dialogue vector, that is, a preliminary prescription prediction is made by the pre-training model, to obtain the prescription prediction data.
In some embodiments, as shown in fig. 3, step S230 specifically includes, but is not limited to, steps S231 to S235.
Step S231, acquiring a preset hierarchical attention model;
Step S232, performing word segmentation processing on the dialogue sample data to obtain word segmentation data;
Step S233, encoding the word segmentation data to obtain word encoding vectors;
Step S234, encoding the word encoding vectors through the word-level neural network to obtain sentence encoding vectors;
Step S235, encoding the sentence encoding vectors through the sentence-level neural network to obtain the dialogue vector.
In step S231 of some embodiments, a preset hierarchical attention model, also referred to as a hierarchical attention network model, is acquired. It includes a word-level neural network and a sentence-level neural network; the word-level neural network comprises a word sequence encoder and a word-level attention layer, and the sentence-level neural network comprises a sentence sequence encoder and a sentence-level attention layer.
In step S232 of some embodiments, word segmentation processing is performed on the dialogue sample data, so as to obtain word segmentation data.
In step S233 of some embodiments, the word segmentation data are encoded, specifically by an encoder, to obtain the word encoding vectors.
In step S234 of some embodiments, the word encoding vectors are input into the word-level neural network, and its word sequence encoder encodes them to obtain sentence encoding vectors. Specifically, the task of the word-level neural network is a classification task: each dialogue to be classified is regarded as divided into several sentences, and the word-level neural network processes each sentence. Not every word in a sentence is useful for the classification task; for example, in text sentiment classification, words such as "great" or "terrible" matter most. To let the recurrent neural network automatically place its "attention" on such words, an attention mechanism is introduced through the word-level attention layer: words important to the sentence meaning are extracted, and the representations of those informative words are aggregated to form the sentence vector, i.e., the sentence encoding vector.
In step S235 of some embodiments, the sentence encoding vectors obtained in step S234 are input into the sentence-level neural network, whose sentence sequence encoder encodes them to obtain the dialogue vector. The specific process is consistent with the encoding in the word-level neural network: to reward sentences that help classify the dialogue correctly, the attention mechanism of the sentence-level attention layer is applied again, a sentence-level context vector is introduced, and this vector is used to measure the importance of each sentence, yielding the final dialogue vector.
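A compact PyTorch sketch of this two-level structure, under assumptions: GRU encoders, the context-vector attention reduced to a single linear scoring layer, and arbitrary dimensions:

```python
import torch
import torch.nn as nn

class AttnEncoder(nn.Module):
    """One HAN level: a bidirectional GRU followed by a simplified additive attention."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hid_dim, 1)

    def forward(self, x):                                 # x: (batch, seq_len, in_dim)
        h, _ = self.gru(x)                                # (batch, seq_len, 2 * hid_dim)
        w = torch.softmax(self.score(torch.tanh(h)), dim=1)   # attention weights
        return (w * h).sum(dim=1)                         # weighted sum over the sequence

# word level: token embeddings of each sentence -> sentence encoding vector
word_encoder = AttnEncoder(in_dim=64, hid_dim=32)
# sentence level: sentence encoding vectors of one dialogue -> dialogue vector
sentence_encoder = AttnEncoder(in_dim=64, hid_dim=32)

sentences = [torch.randn(1, 10, 64) for _ in range(4)]    # 4 sentences of 10 tokens each
sentence_vecs = torch.stack([word_encoder(s) for s in sentences], dim=1)  # (1, 4, 64)
dialogue_vec = sentence_encoder(sentence_vecs)            # (1, 64)
```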
In some embodiments, as shown in fig. 4, step S240 specifically includes, but is not limited to, steps S241 to S244.
Step S241, splicing the patient vector and the dialogue vector to obtain a spliced vector;
Step S242, predicting the spliced vector through the fully connected layer to obtain the prescribing probability of each preset drug;
Step S243, screening the preset drugs according to a preset threshold and the prescribing probabilities to obtain target drugs;
Step S244, taking the target drugs as the prescription prediction data.
In step S241 of some embodiments, the patient vector and the dialogue vector are spliced by the splicing layer to obtain the spliced vector.
In step S242 of some embodiments, prediction processing is performed on the spliced vector through the fully connected layer; specifically, the spliced vector may be input into a classifier for classification processing to obtain the prescribing probability of each preset drug.
In step S243 of some embodiments, the preset drugs are screened according to the preset threshold and the prescribing probabilities to obtain the target drugs: a preset drug whose prescribing probability exceeds the preset threshold is considered a target drug, and one whose probability does not exceed the threshold is not. The preset drugs whose prescribing probability is greater than the preset threshold are thus screened out and taken as the target drugs.
In step S244 of some embodiments, the target drugs are taken as the prescription prediction data, i.e., the prescription, which includes a plurality of target drugs.
In practical applications, the embodiment of the present application selects the hierarchical attention network HAN because of the word-sentence-document hierarchical structure of the dialogue data. The specific operations are as follows. First, data preprocessing is performed: the patient gender is one-hot encoded and the patient age is standardized. These features are input into a fully connected layer to obtain the patient representation e_1; the dialogue information is input into the HAN, passing through the word-level RNN and attention and then the sentence-level RNN and attention to obtain the overall dialogue representation e_2. Finally, the patient representation e_1 and the dialogue representation e_2 are spliced and input into a fully connected layer to obtain each drug's predicted probability of being prescribed, and the sigmoid activation function maps each probability into the [0, 1] range, yielding the predicted prescription.
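A minimal sketch of this final step under assumed dimensions; the point is that one shared output layer scores all d drugs at once, rather than one classifier per drug:

```python
import torch
import torch.nn as nn

e1 = torch.randn(1, 16)            # patient representation (from the patient FC layer)
e2 = torch.randn(1, 64)            # dialogue representation (from the HAN)

d = 100                            # number of candidate drugs (assumption)
output_layer = nn.Linear(16 + 64, d)

scores = torch.sigmoid(output_layer(torch.cat([e1, e2], dim=1)))  # y_i in [0, 1]
prescription = (scores > 0.5).int()        # thresholding as in formula (3)
```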
In some embodiments, as shown in fig. 5, step S400 specifically includes, but is not limited to, steps S410 to S450.
Step S410, acquiring the co-occurrence relationships among the preset drugs in the drug sample data to obtain the drug co-occurrence relationships;
Step S420, constructing drug co-occurrence pairs of the preset drugs based on the drug co-occurrence relationships, and acquiring the corresponding drug co-occurrence counts;
Step S430, normalizing the drug co-occurrence counts to obtain first co-occurrence values;
Step S440, calculating the difference between the first co-occurrence values and a preset threshold to obtain second co-occurrence values;
Step S450, constructing the drug co-occurrence matrix according to the second co-occurrence values.
In step S410 of some embodiments, the drug sample data include a plurality of preset drugs, and the co-occurrence relationships among the preset drugs in the drug sample data are acquired to obtain the drug co-occurrence relationships, where a co-occurrence relationship indicates whether two preset drugs appear together in the same drug sample data.
In step S420 of some embodiments, drug co-occurrence pairs of the preset drugs are constructed based on the drug co-occurrence relationships, and the corresponding co-occurrence counts are acquired. For example, if the preset drugs include drug G and drug H and the two have a co-occurrence relationship, a co-occurrence pair of drug G and drug H is constructed and its co-occurrence count is acquired, e.g., a count of 1 if they co-occur once.
In step S430 of some embodiments, the drug co-occurrence counts are normalized, i.e., after processing they are limited to a certain range as required, such as the [0, 1] range.
In step S440 of some embodiments, a difference between the first co-occurrence value and the preset threshold is calculated to obtain a second co-occurrence value, that is, the first co-occurrence value is subtracted from the preset threshold to obtain the second co-occurrence value.
In step S450 of some embodiments, a drug co-occurrence matrix of the preset drug is constructed from the second co-occurrence values.
In practical applications, the process of constructing the drug co-occurrence matrix is as follows:
Suppose there are k diagnoses. For the k-th diagnosis, the corresponding prescription sample data form its training set, and the drug co-occurrence matrix A_co,k ∈ R^{d×d} is obtained from this training set by statistics, as shown in formula (5):

A_co,k[i, j] = normalize(cnt[i, j])   (5)

where cnt[i, j] is the number of times the i-th drug and the j-th drug co-occur in the prescriptions of the k-th diagnosis, and the entry for the i-th and j-th drugs in the drug co-occurrence matrix is obtained by normalizing this co-occurrence count.
For example, under the diagnosis of "fever and cough", suppose the doctor prescribed prescription 1 (containing an antipyretic patch and a cough medicine) 3 times and prescription 2 (containing a cough medicine and an anti-inflammatory drug) 1 time, and that the drug set consists of the three drugs antipyretic patch, cough medicine, and anti-inflammatory drug. The co-occurrence counts are then cnt[patch, cough] = 3, cnt[cough, anti-inflammatory] = 1, and cnt[patch, anti-inflammatory] = 0, giving, before normalization, the co-occurrence matrix shown in (6), with rows and columns ordered as (antipyretic patch, cough medicine, anti-inflammatory drug):

cnt = [[0, 3, 0],
       [3, 0, 1],
       [0, 1, 0]]   (6)
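A small Python sketch that reproduces these counts; dividing by the maximum count is an assumed normalization, since the text only states that the counts are normalized:

```python
from itertools import combinations
import numpy as np

def build_cooccurrence(prescriptions, drug_index):
    """Count pairwise co-occurrences over one diagnosis, then normalize (assumption: by max)."""
    d = len(drug_index)
    cnt = np.zeros((d, d))
    for drugs in prescriptions:
        for a, b in combinations(sorted(set(drugs)), 2):
            i, j = drug_index[a], drug_index[b]
            cnt[i, j] += 1
            cnt[j, i] += 1                       # keep the matrix symmetric
    return cnt / cnt.max() if cnt.max() > 0 else cnt

drug_index = {"antipyretic patch": 0, "cough medicine": 1, "anti-inflammatory": 2}
rx = [["antipyretic patch", "cough medicine"]] * 3 + [["cough medicine", "anti-inflammatory"]]
A_co = build_cooccurrence(rx, drug_index)
# raw counts: [[0,3,0],[3,0,1],[0,1,0]]; after dividing by 3: [[0,1,0],[1,0,1/3],[0,1/3,0]]
```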
based on the co-occurrence matrix, the embodiment of the application designs a second loss function L co, which is specifically shown in formula (7):
Wherein, Element/>The product of the scores of the ith medicine and the jth medicine is obtained by preliminary prediction of a pre-training model, and can also be regarded as the probability of paired occurrence of the ith medicine and the jth medicine. The design of L co ensures that the score of the drug pair with high co-occurrence times in the model is also high under the kth diagnosis, thereby ingeniously capturing the correlation among the drugs. For example, under the diagnosis of "fever and cough", the co-occurrence times of the antipyretic patch and the cough-relieving medicine are higher, and if the score of the antipyretic patch predicted by the pre-training model is 0.8 and the score of the cough-relieving medicine is 0.3, the product of the scores of the antipyretic patch and the cough-relieving medicine tends to be increased by L co, and the respective scores of the two tend to be increased.
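One way to realize L_co in code, under the sign and reduction conventions assumed in formula (7) above:

```python
import torch

def co_occurrence_loss(A_co, y):
    """Second loss: negative co-occurrence-weighted sum of score products y_i * y_j."""
    return -(A_co * torch.outer(y, y)).sum()

A_co = torch.tensor([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0 / 3.0],
                     [0.0, 1.0 / 3.0, 0.0]])     # normalized matrix from the example above
y = torch.tensor([0.8, 0.3, 0.1])                # predicted scores of the three drugs
print(co_occurrence_loss(A_co, y))               # decreases as correlated pairs score higher
```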
Therefore, the target loss function of the pre-training model is finally determined as the sum of the first loss function and the second loss function, as shown in formula (8):

L = L_model + L_co   (8)
In summary, in the embodiment of the present application, the drug co-occurrence matrix is constructed mainly from the prescription sample data, i.e., the diagnostic information, and the drug co-occurrence loss, i.e., the second loss function, is then added on top of the first loss function of the original classification task.
In practical applications, the specific training process of the prescription recommendation model is as follows. First, the patient information and dialogue information are input into a text classification model (for example, the hierarchical attention model HAN selected in the embodiment of the present application, whose specific structure is described in the embodiments above). Then, the drug co-occurrence matrix of each diagnosis is constructed, and the loss function of the model is reset to L, so that during training the model is optimized toward the goal of minimizing L; after optimization, the prescription recommendation model is obtained, which produces the final recommended prescription. In other words, although this scheme constructs a co-occurrence matrix of drug pairs, all drug pairs under the same diagnosis are considered simultaneously, i.e., the correlations between drugs are mined indirectly. Moreover, the embodiment of the present application only needs to add a loss function to the existing model, without adding classifiers, which improves training efficiency. In addition, the embodiment of the present application is suitable for any number of diagnoses: even if the number of diagnoses increases, the drug co-occurrence matrix under each diagnosis is extremely sparse and can be stored in sparse-matrix form without occupying too much storage space. It is also suitable for any number of drugs; no matter how many drugs there are, only one model is needed.
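A self-contained PyTorch sketch of training toward the joint objective of formula (8); the miniature model, random data, and batch-averaged form of L_co are placeholders for illustration, not the architecture described above:

```python
import torch
import torch.nn as nn

d = 3                                             # number of drugs (toy value)
model = nn.Sequential(nn.Linear(8, d), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()

x = torch.randn(16, 8)                            # stand-in for patient+dialogue vectors
labels = torch.randint(0, 2, (16, d)).float()     # prescription labels p_i
A_co = torch.tensor([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0 / 3.0],
                     [0.0, 1.0 / 3.0, 0.0]])      # per-diagnosis co-occurrence matrix

for _ in range(100):
    y = model(x)                                  # scores in [0, 1]
    l_model = bce(y, labels)                      # first loss, formula (4)
    l_co = -(A_co * (y.T @ y) / len(y)).sum()     # second loss, formula (7), batch-averaged
    loss = l_model + l_co                         # target loss, formula (8)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, the per-diagnosis matrix would be looked up for each sample's diagnosis and, being extremely sparse, could be kept in sparse-matrix storage as noted above.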
According to the training method of the model, a plurality of inquiry sample data and the prescription sample data corresponding to each inquiry sample data are acquired, wherein each inquiry sample data includes patient sample data and dialogue sample data, and each prescription sample data includes a plurality of drug sample data; the patient sample data and the dialogue sample data are subjected to prediction processing through a pre-training model to obtain prescription prediction data, wherein the prescription prediction data includes a plurality of drug prediction data; a first loss function of the pre-training model is calculated according to the plurality of drug prediction data to obtain a first loss value; a drug co-occurrence matrix is constructed according to the plurality of drug sample data; a second loss function of the pre-training model is calculated according to the drug co-occurrence matrix to obtain a second loss value; and the pre-training model is trained according to the first loss value and the second loss value to obtain a prescription recommendation model for recommending prescriptions. In the embodiment of the application, after the pre-training model makes predictions on the patient sample data and the dialogue sample data, the prescription sample data is introduced, the drug co-occurrence matrix is constructed from the prescription sample data, and the drug co-occurrence loss is added according to the drug co-occurrence matrix; the correlation among a plurality of drugs is thereby taken into account, the problem that the number of classifiers grows with the number of drugs in a prescription recommendation model is avoided, and the training efficiency of the model is improved.
Embodiments of the present disclosure also include, but are not limited to, the following steps: acquiring actual inquiry data, wherein the actual inquiry data includes actual patient data and actual dialogue data; and inputting the actual patient data and the actual dialogue data into the prescription recommendation model for prescription recommendation processing to obtain a recommended prescription; the prescription recommendation model is trained by the training method of the model according to the embodiment of the first aspect of the application.
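Purely as an illustration of this inference step (the 0.5 threshold and all names here are assumptions, not taken from the patent):

```python
import torch

def recommend_prescription(model, patient_data, dialogue_data, drug_names, threshold=0.5):
    """Return the drugs whose predicted prescribing probability exceeds the preset threshold."""
    with torch.no_grad():
        scores = model(patient_data, dialogue_data)
    return [name for name, s in zip(drug_names, scores.tolist()) if s > threshold]
```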
The embodiment of the disclosure also provides a training device for training a prescription recommendation model. As shown in fig. 6, the device can implement the training method of the model and includes: a first obtaining module 710, a prescription prediction module 720, a first calculating module 730, a matrix constructing module 740, a second calculating module 750, and a model training module 760. The first obtaining module 710 is configured to obtain a plurality of inquiry sample data and the prescription sample data corresponding to each inquiry sample data, wherein each inquiry sample data includes patient sample data and dialogue sample data, and each prescription sample data includes a plurality of drug sample data. The prescription prediction module 720 is configured to perform prediction processing on the patient sample data and the dialogue sample data through the pre-training model to obtain prescription prediction data, wherein the prescription prediction data includes a plurality of drug prediction data. The first calculating module 730 is configured to calculate a first loss function of the pre-training model according to the plurality of drug prediction data to obtain a first loss value. The matrix constructing module 740 is configured to construct a drug co-occurrence matrix according to the plurality of drug sample data. The second calculating module 750 is configured to calculate a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value. The model training module 760 is configured to perform training processing on the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model, wherein the prescription recommendation model is used for recommending prescriptions. It should be noted that the training device of the model in the embodiment of the present disclosure is used for executing the training method of the model in the above embodiment; its specific processing procedure is the same as that of the training method in the above embodiment and is not repeated here.
The disclosed embodiments also provide a computer device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of the embodiments of the first aspect of the application.
The hardware structure of the computer device is described in detail below with reference to fig. 7. The computer device includes: processor 810, memory 820, input/output interface 830, communication interface 840 and bus 850.
The processor 810 may be implemented by a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present disclosure;
The memory 820 may be implemented in the form of a Read-Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 820 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present disclosure are implemented through software or firmware, the relevant program codes are stored in the memory 820 and called by the processor 810 to execute the training method of the model of the embodiments of the present disclosure;
an input/output interface 830 for implementing information input and output;
the communication interface 840 is configured to implement communication interaction between the device and other devices, where communication may be implemented in a wired manner (e.g. USB, network cable, etc.) or in a wireless manner (e.g. mobile network, Wi-Fi, Bluetooth, etc.); and
Bus 850 transfers information between the various components of the device (e.g., processor 810, memory 820, input/output interface 830, and communication interface 840);
Wherein processor 810, memory 820, input/output interface 830, and communication interface 840 enable communication connections among each other within the device via bus 850.
The disclosed embodiments also provide a storage medium that is a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a training method of a model of the disclosed embodiments.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present disclosure are for more clearly describing the technical solutions of the embodiments of the present disclosure, and do not constitute a limitation on the technical solutions provided by the embodiments of the present disclosure, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present disclosure are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in figs. 1-5 do not limit the embodiments of the present disclosure, which may include more or fewer steps than illustrated, combine certain steps, or adopt different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, "at least one of a, b or c" may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be singular or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including multiple instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing a program, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Preferred embodiments of the disclosed embodiments are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the disclosed embodiments. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present disclosure shall fall within the scope of the claims of the embodiments of the present disclosure.

Claims (9)

1. A method of training a model, comprising:
Acquiring inquiry sample data and prescription sample data of the inquiry sample data; wherein each of the inquiry sample data includes patient sample data and dialogue sample data, and each of the prescription sample data includes a plurality of drug sample data;
performing prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data; wherein the prescription prediction data comprises a plurality of drug prediction data;
calculating a first loss function of the pre-training model according to the plurality of drug prediction data to obtain a first loss value;
Constructing a drug co-occurrence matrix according to the plurality of drug sample data;
Calculating a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value;
training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model; wherein the prescription recommendation model is used for recommending prescriptions;
Wherein each of the drug sample data includes a plurality of preset drugs, and the constructing a drug co-occurrence matrix according to the plurality of drug sample data includes:
obtaining the co-occurrence relationship among the preset medicines in the medicine sample data to obtain the medicine co-occurrence relationship;
Constructing a drug co-occurrence pair of the preset drug based on the drug co-occurrence relationship, and acquiring corresponding drug co-occurrence times;
Normalizing the medicine co-occurrence times to obtain a first co-occurrence value;
calculating the difference between the first co-occurrence value and a preset threshold value to obtain a second co-occurrence value;
constructing the drug co-occurrence matrix according to the second co-occurrence value; wherein, the drug co-occurrence matrix is:
$$C^{(k)} = \begin{pmatrix} \bar{c}^{(k)}_{11} & \cdots & \bar{c}^{(k)}_{1d} \\ \vdots & \ddots & \vdots \\ \bar{c}^{(k)}_{d1} & \cdots & \bar{c}^{(k)}_{dd} \end{pmatrix}$$

wherein d is the number of all drugs; there are K diagnoses in total, the kth training set is obtained for the kth diagnosis, and the drug co-occurrence matrix $C^{(k)}$ is obtained by statistics from the kth training set; $c^{(k)}_{ij}$ is the number of co-occurrences of the ith and jth drugs in the prescriptions for the kth diagnosis, $c^{(k)}_{ab}$ is the number of co-occurrences of the ath and bth drugs in the prescriptions for the kth diagnosis, and the second co-occurrence values $\bar{c}^{(k)}_{ij}$ are derived from these counts;
wherein the second loss function is:

$$L_{co} = -\sum_{i=1}^{d}\sum_{j=1}^{d} \bar{c}^{(k)}_{ij}\, p_{ij}$$

wherein the element $p_{ij}$ in the second loss function is the product of the scores of the ith drug and the jth drug, obtained by the preliminary prediction of the pre-training model.
2. The method of claim 1, wherein the performing prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data comprises:
Carrying out standardized processing on the patient sample data according to a preset data format to obtain corresponding patient characteristics;
Performing first coding processing on the patient characteristics to obtain corresponding patient vectors;
performing second coding processing on the dialogue sample data to obtain corresponding dialogue vectors;
and carrying out prediction processing according to the patient vector and the dialogue vector to obtain prescription prediction data.
3. The method according to claim 2, wherein the performing second coding processing on the dialogue sample data to obtain a corresponding dialogue vector comprises:
Acquiring a preset hierarchical attention model; the hierarchical attention model comprises a word hierarchy neural network and a sentence hierarchy neural network;
performing word segmentation processing on the dialogue sample data to obtain word segmentation data;
carrying out coding processing on the word segmentation data to obtain word coding vectors;
carrying out coding processing on the word coding vector through the word-level neural network to obtain a sentence coding vector;
And carrying out coding processing on the sentence coding vector through the sentence-level neural network to obtain a dialogue vector.
4. The method of claim 2, wherein the carrying out prediction processing according to the patient vector and the dialogue vector to obtain prescription prediction data comprises:
performing splicing processing on the patient vector and the dialogue vector to obtain a spliced vector;
performing prediction processing on the spliced vector through a fully connected layer to obtain the prescribing probability of each preset medicine;
screening the preset medicines according to the preset threshold and the prescribing probability to obtain a target medicine; and
taking the target medicine as the prescription prediction data.
5. The method of claim 1, wherein the training the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model comprises:
taking the first loss value and the second loss value as back-propagation quantities, and adjusting the model parameters of the pre-training model to train the pre-training model, so as to obtain the prescription recommendation model.
6. The method of any one of claims 1-5, further comprising:
Acquiring actual inquiry data; wherein the actual inquiry data comprises actual patient data and actual dialogue data;
and inputting the actual patient data and the actual dialogue data into the prescription recommendation model to perform prescription recommendation processing to obtain a recommended prescription.
7. A training apparatus for training a prescription recommendation model, comprising:
A first acquisition module: for acquiring a plurality of inquiry sample data and prescription sample data of each of the inquiry sample data; wherein each of the inquiry sample data includes patient sample data and dialogue sample data, and each of the prescription sample data includes a plurality of drug sample data;
a prescription prediction module: for performing prediction processing on the patient sample data and the dialogue sample data through a pre-training model to obtain prescription prediction data; wherein the prescription prediction data comprises a plurality of drug prediction data;
a first calculation module: for calculating a first loss function of the pre-training model according to the plurality of drug prediction data to obtain a first loss value;
a matrix construction module: for constructing a drug co-occurrence matrix according to the plurality of drug sample data;
a second calculation module: for calculating a second loss function of the pre-training model according to the drug co-occurrence matrix to obtain a second loss value; and
a model training module: for performing training processing on the pre-training model according to the first loss value and the second loss value to obtain a prescription recommendation model; wherein the prescription recommendation model is used for recommending prescriptions;
wherein each of the drug sample data comprises a plurality of preset drugs,
The constructing a drug co-occurrence matrix according to the plurality of drug sample data comprises:
obtaining the co-occurrence relationship among the preset medicines in the medicine sample data to obtain the medicine co-occurrence relationship;
Constructing a drug co-occurrence pair of the preset drug based on the drug co-occurrence relationship, and acquiring corresponding drug co-occurrence times;
Normalizing the medicine co-occurrence times to obtain a first co-occurrence value;
calculating the difference between the first co-occurrence value and a preset threshold value to obtain a second co-occurrence value;
constructing the drug co-occurrence matrix according to the second co-occurrence value; wherein, the drug co-occurrence matrix is:

$$C^{(k)} = \begin{pmatrix} \bar{c}^{(k)}_{11} & \cdots & \bar{c}^{(k)}_{1d} \\ \vdots & \ddots & \vdots \\ \bar{c}^{(k)}_{d1} & \cdots & \bar{c}^{(k)}_{dd} \end{pmatrix}$$

wherein d is the number of all drugs; there are K diagnoses in total, the kth training set is obtained for the kth diagnosis, and the drug co-occurrence matrix $C^{(k)}$ is obtained by statistics from the kth training set; $c^{(k)}_{ij}$ is the number of co-occurrences of the ith and jth drugs in the prescriptions for the kth diagnosis, $c^{(k)}_{ab}$ is the number of co-occurrences of the ath and bth drugs in the prescriptions for the kth diagnosis, and the second co-occurrence values $\bar{c}^{(k)}_{ij}$ are derived from these counts;

wherein the second loss function is:

$$L_{co} = -\sum_{i=1}^{d}\sum_{j=1}^{d} \bar{c}^{(k)}_{ij}\, p_{ij}$$

wherein the element $p_{ij}$ in the second loss function is the product of the scores of the ith drug and the jth drug, obtained by the preliminary prediction of the pre-training model.
8. A computer device, comprising a memory and a processor, wherein the memory stores a program which, when executed by the processor, causes the processor to perform:
The method of any one of claims 1 to 6.
9. A storage medium, being a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a computer, causes the computer to perform:
The method of any one of claims 1 to 6.
CN202210057007.1A 2022-01-18 2022-01-18 Model training method and device, computer equipment and storage medium Active CN114386528B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210057007.1A CN114386528B (en) 2022-01-18 2022-01-18 Model training method and device, computer equipment and storage medium
PCT/CN2022/090760 WO2023137924A1 (en) 2022-01-18 2022-04-29 Model training method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210057007.1A CN114386528B (en) 2022-01-18 2022-01-18 Model training method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114386528A CN114386528A (en) 2022-04-22
CN114386528B true CN114386528B (en) 2024-05-14

Family

ID=81203217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210057007.1A Active CN114386528B (en) 2022-01-18 2022-01-18 Model training method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114386528B (en)
WO (1) WO2023137924A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114386528B (en) * 2022-01-18 2024-05-14 平安科技(深圳)有限公司 Model training method and device, computer equipment and storage medium
CN114968412B (en) * 2022-06-20 2024-02-02 中国平安财产保险股份有限公司 Configuration file generation method, device, equipment and medium based on artificial intelligence
CN115458049B (en) * 2022-06-29 2023-07-25 四川大学 Method and device for predicting universal anti-citrullinated polypeptide antibody epitope based on bidirectional circulating neural network
CN117198513B (en) * 2023-11-07 2024-03-29 北京烔凡科技有限公司 Health guidance method and system for hypertension patient
CN117219294B (en) * 2023-11-09 2024-03-29 中国科学技术大学 Rare disease-oriented intelligent medicine recommendation method, device and medium
CN117415502B (en) * 2023-11-15 2024-07-26 广州飞数工业软件有限公司 Welding quality prediction model training method and monitoring method based on industrial Internet
CN117969737B (en) * 2024-04-02 2024-06-11 中国人民解放军联勤保障部队第九六四医院 Drug quality detection method for measuring bletilla striata polysaccharide

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647236A (en) * 2018-03-30 2018-10-12 山东管理学院 A kind of prescriptions of traditional Chinese medicine vector space model method and device based on Term co-occurrence
CN111191020A (en) * 2019-12-27 2020-05-22 江苏省人民医院(南京医科大学第一附属医院) Prescription recommendation method and system based on machine learning and knowledge graph
CN111753543A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Medicine recommendation method and device, electronic equipment and storage medium
CN112951362A (en) * 2021-02-23 2021-06-11 上海商汤智能科技有限公司 Medicine recommendation method, device, equipment and storage medium
CN113707264A (en) * 2021-08-31 2021-11-26 平安科技(深圳)有限公司 Medicine recommendation method, device, equipment and medium based on machine learning
CN113808693A (en) * 2021-09-10 2021-12-17 浙江科技学院 Medicine recommendation method based on graph neural network and attention mechanism
CN113889219A (en) * 2021-10-29 2022-01-04 华中科技大学 Drug recommendation method and system for chronic obstructive pulmonary disease

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210142173A1 (en) * 2019-11-12 2021-05-13 The Cleveland Clinic Foundation Network-based deep learning technology for target identification and drug repurposing
CN114386528B (en) * 2022-01-18 2024-05-14 平安科技(深圳)有限公司 Model training method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114386528A (en) 2022-04-22
WO2023137924A1 (en) 2023-07-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant