CN117668233A - Medical large model construction method, device, computer equipment and medium - Google Patents

Medical large model construction method, device, computer equipment and medium

Info

Publication number
CN117668233A
Authority
CN
China
Prior art keywords
model
medical
control
layer
direction control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311708696.9A
Other languages
Chinese (zh)
Inventor
刘磊
邱建华
刘伟华
马金民
李林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wisdom Eye Information Technology Co ltd
Original Assignee
Beijing Wisdom Eye Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wisdom Eye Information Technology Co ltd filed Critical Beijing Wisdom Eye Information Technology Co ltd
Priority to CN202311708696.9A priority Critical patent/CN117668233A/en
Publication of CN117668233A publication Critical patent/CN117668233A/en
Pending legal-status Critical Current


Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a medical large model construction method, a device, computer equipment and a medium, comprising the following steps: performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model; performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from a knowledge graph to be integrated into the large model; and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.

Description

Medical large model construction method, device, computer equipment and medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method and apparatus for constructing a medical large model, a computer device, and a medium.
Background
With the development of artificial intelligence technology, natural language processing technology has been widely applied, and large language models perform excellently on natural language processing tasks such as machine translation and text question answering.
In the process of realizing the invention, the inventor realized that the prior art has at least the following technical problem: an unavoidable problem with large language models is their lack of factual knowledge, which often leads them to generate false information and hallucinations, undermining the credibility of the model and making text classification and recognition inaccurate.
Disclosure of Invention
The embodiment of the invention provides a medical large model construction method, a medical large model construction device, computer equipment and a storage medium, so as to improve the accuracy of text classification.
In order to solve the above technical problems, an embodiment of the present application provides a medical large model construction method, including:
performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model;
performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the maximum probability value as the generation direction.
Optionally, the model control network comprises two multi-layer perceptron (MLP) layers and one Transformer encoder layer.
Optionally, the medical large model construction method further includes:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
and training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
Optionally, the parameters of the base model are frozen during the training process.
Optionally, performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model includes:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
In order to solve the above technical problem, an embodiment of the present application further provides a medical large model building device, including:
the direction control module is used for performing direction control on the content generated by the large model through a direction control model and determining the generation direction, wherein the direction control model is a text classification model;
the skeleton control module is used for performing medical information extraction and recognition with a skeleton control model and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and the network control module is used for integrating the generation direction and the fact knowledge into the large model network through a model control network and performing text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and the direction control module includes:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
Optionally, the medical large model construction device further includes:
the corpus analysis module is used for passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
the first recognition module is used for inputting the training corpus x into the basic model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
the second recognition module is used for passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
the output determining module is used for superposing the first output and the second output to obtain a target output;
and the iterative training module is used for training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
In order to solve the above technical problem, the embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the above medical large model building method are implemented when the processor executes the computer program.
In order to solve the above technical problem, the embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program implements the steps of the medical large model building method described above when executed by a processor.
According to the medical large model construction method, the device, the computer equipment and the storage medium provided by the embodiments of the invention, direction control is performed on the content generated by the large model through the direction control model and the generation direction is determined, wherein the direction control model is a text classification model; medical information extraction and recognition are performed with the skeleton control model, and fact knowledge from the knowledge graph to be integrated into the large model is constructed; and the generation direction and the fact knowledge are integrated into the large model network through the model control network, and text processing is performed to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a medical large model building method of the present application;
FIG. 3 is a diagram showing an example of the structure of the direction control model of the present application;
FIG. 4 is a diagram showing an example of the structure of an overall large language model of the present application;
FIG. 5 is a schematic structural view of one embodiment of a medical large model building apparatus according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the medical large model construction method provided in the embodiment of the present application is executed by a server, and accordingly, the medical large model construction device is disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements, and the terminal devices 101, 102 and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
Referring to fig. 2, fig. 2 shows a medical large model construction method according to an embodiment of the present invention. The method is described below in detail, taking its application to the server in fig. 1 as an example:
s201: and performing direction control on the large model generation content through a direction control model, and determining the generation direction, wherein the direction control model is a text classification model.
In a specific optional implementation manner, the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the largest probability value as the generation direction.
Specifically, the direction control module is used to identify the type of the user's input, determine the overall direction of large-model generation, and prevent the generation of irrelevant information. The direction control model in this embodiment is essentially a text classification model. RoBERTa is adopted as the base model, and a [CLS] symbol is added at the head of the input sentence to represent the overall information of the input. After passing through RoBERTa, [CLS] yields a final output vector C (which embodies the information of the whole sentence); the probability of each category is then obtained through a linear layer (which produces a vector whose dimension N equals the number of categories) and a softmax (which normalizes the vector so that the probabilities sum to 1), and the category with the highest probability is selected as the output. The model structure is shown in fig. 3. The categories are manually defined in advance according to the medical knowledge graph and the medical question-answer corpus, and are divided into the following categories: symptoms, concurrent symptoms, diagnostic examination items, medical subjects, disease names, disease profiles, disease causes, preventive measures, treatment cycles, treatment patterns, cure probabilities, disease-susceptible populations, medicines, foods, and the like.
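To make the structure concrete, the following is a minimal sketch of such a direction-control classifier in PyTorch. It assumes the Hugging Face transformers library and a public Chinese RoBERTa checkpoint ("hfl/chinese-roberta-wwm-ext"); the checkpoint name and the English category labels are illustrative assumptions, since the patent only specifies a RoBERTa base model, a linear layer and a softmax over the manually defined categories.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Manually defined categories (English renderings of the list above).
CATEGORIES = ["symptom", "concurrent symptom", "diagnostic examination item",
              "medical subject", "disease name", "disease profile", "disease cause",
              "preventive measure", "treatment cycle", "treatment pattern",
              "cure probability", "disease-susceptible population", "medicine", "food"]

class DirectionControlModel(nn.Module):
    def __init__(self, pretrained="hfl/chinese-roberta-wwm-ext", num_classes=len(CATEGORIES)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained)       # RoBERTa backbone
        self.linear = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]     # output vector C at the [CLS] position
        logits = self.linear(cls_vec)             # linear layer -> vector of dimension N
        return torch.softmax(logits, dim=-1)      # softmax -> per-category probabilities

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = DirectionControlModel()
enc = tokenizer("感冒有哪些症状?", return_tensors="pt")   # "What are the symptoms of a cold?"
probs = model(enc["input_ids"], enc["attention_mask"])
direction = CATEGORIES[int(probs.argmax(dim=-1))]          # highest-probability category
```

The tokenizer prepends the [CLS] mark to the input automatically, and the vector at that position serves as the sentence representation C described above.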
During the model training phase, a small amount of training data is manually annotated. For example, for "What are the symptoms of a cold?", the manually annotated category is "symptom", and the training target is to make the category predicted by the model consistent with the manually annotated category. In the model inference phase, any input is classified into one of the manually defined categories above: the model outputs the probability of each category, and the category with the highest probability is the final output. For similar categories such as "symptom" and "concurrent symptom", the one with the higher output probability is likewise selected.
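A hedged sketch of this training objective, reusing the classifier from the previous sketch: a few manually annotated (question, category) pairs and a negative log-likelihood loss so that the predicted category matches the manual label. The example data and hyperparameters are assumptions, not taken from the patent.

```python
labeled = [("感冒有哪些症状?", "symptom"),      # "What are the symptoms of a cold?"
           ("感冒吃什么药?", "medicine")]        # "What medicine should I take for a cold?"
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.NLLLoss()                           # expects log-probabilities

model.train()
for question, label in labeled:
    enc = tokenizer(question, return_tensors="pt")
    probs = model(enc["input_ids"], enc["attention_mask"])   # already softmax-normalized
    target = torch.tensor([CATEGORIES.index(label)])
    loss = loss_fn(torch.log(probs), target)     # push the predicted category toward the label
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```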
S202: perform medical information extraction and recognition with the skeleton control model, and construct the fact knowledge from the knowledge graph to be integrated into the large model.
Specifically, the skeleton control model is used to extract medical information from the question input by the user, mainly including entity recognition, attribute recognition and the like, covering the domain knowledge contained in the medical knowledge graph such as disease types, symptoms, medical subjects, medicines, cure probabilities and foods. After the entities and attributes in the user's question are extracted, the corresponding nodes are found in the medical knowledge graph, and the corresponding skeleton knowledge is then queried in the knowledge graph according to the classification result of the direction control model.
For example, when the user inputs "What are the symptoms of a cold?", the direction control model first identifies that the medical direction the user wants to ask about is "symptom"; next, the information extraction module extracts the entity "cold"; then "cold" is queried in the knowledge graph, and, based on the "symptom" direction identified by the direction control model, the corresponding symptoms are queried in the knowledge graph as the skeleton knowledge: ("cold", "symptom", "fever, cough, runny nose").
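The following sketch walks through this worked example. The in-memory dictionary stands in for the medical knowledge graph, and entity recognition is reduced to simple string matching purely for illustration; the real skeleton control model relies on trained entity and attribute recognition, which the patent does not detail.

```python
from typing import Optional, Tuple

# Tiny stand-in for the medical knowledge graph: (entity, relation) -> value.
KNOWLEDGE_GRAPH = {
    ("感冒", "symptom"): "发烧、咳嗽、流鼻涕",   # ("cold", "symptom", "fever, cough, runny nose")
}

def extract_entity(question: str) -> Optional[str]:
    # Stand-in for the entity/attribute recognition of the skeleton control model.
    for entity, _relation in KNOWLEDGE_GRAPH:
        if entity in question:
            return entity
    return None

def skeleton_knowledge(question: str, direction: str) -> Optional[Tuple[str, str, str]]:
    # Look up the triple for the extracted entity and the direction predicted by the
    # direction control model.
    entity = extract_entity(question)
    if entity is None:
        return None
    value = KNOWLEDGE_GRAPH.get((entity, direction))
    return (entity, direction, value) if value is not None else None

# skeleton_knowledge("感冒有哪些症状?", "symptom")
# -> ("感冒", "symptom", "发烧、咳嗽、流鼻涕")
```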
S203: integrate the generation direction and the fact knowledge into the large model network through the model control network, and perform text processing to obtain a text processing result.
Specifically, the model control network further controls the behavior of the entire neural network by operating on the input conditions of the neural network blocks. ChatGLM-6B is used as the base model. Because the base model has 6 billion parameters, it is difficult to train, so the base model is frozen during training and its parameters are not updated, which reduces training time and cost.
A new control network layer is added to introduce the fact knowledge of the knowledge graph into the base model and control what the base model generates. The fact knowledge C (obtained by the skeleton control module querying the knowledge graph) and the input X pass through the control network layer to generate an output vector y2, which is added to the output vector y1 of the base model to form the final output vector y. In this way the fact knowledge is introduced into the output, the model output better conforms to the facts, and the effect of controlled generation is achieved. The overall control network structure is shown in fig. 4.
Optionally, the model control network includes two multi-layer perceptron (MLP) layers and one Transformer encoder layer. The MLP layers enable the model to capture more feature information, and the Transformer encoder layer converts the input sequence information into a set of representation vectors via a self-attention mechanism.
The base model is frozen because its parameter count is too large, amounting to billions of parameters, which makes training time-consuming, costly and difficult. The control network is added to introduce the fact knowledge of the knowledge graph into the base model output and to control what the model generates.
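A possible PyTorch sketch of this control layer is shown below: a first MLP that projects the knowledge-triple representation before it is added to the input, a Transformer encoder layer, and a second MLP that produces the control output y2. The hidden size, activation functions and attention-head count are assumptions the patent does not specify.

```python
import torch.nn as nn

class ControlLayer(nn.Module):
    def __init__(self, hidden_size=4096, nhead=8):
        super().__init__()
        # First MLP Z(., Θz1): projects the embedded knowledge triple c.
        self.mlp1 = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                                  nn.Linear(hidden_size, hidden_size))
        self.encoder = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=nhead,
                                                  batch_first=True)
        # Second MLP Z(., Θz2): produces the control-layer output y2.
        self.mlp2 = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                                  nn.Linear(hidden_size, hidden_size))

    def forward(self, x, c):
        # x and c are assumed to be embedded to the same shape (batch, seq_len, hidden_size).
        x_prime = x + self.mlp1(c)                 # x' = x + Z(c, Θz1)
        return self.mlp2(self.encoder(x_prime))    # y2 = Z(Encoder(x'), Θz2)
```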
In a specific optional embodiment, the medical large model construction method further includes:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ), where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
Optionally, the parameters of the base model are frozen during the training process.
For the input question "What are the symptoms of a cold?", the knowledge triple ("cold", "symptom", "fever, cough, runny nose") is obtained from the knowledge graph through the direction control module and the skeleton control module, and the input question and the triple are represented as x and c, respectively.
(1) Base model:
The input x passes through the base model to produce an output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters (note that these parameters are frozen during training and are not updated), F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input.
(2) Control layer:
The triplet c is first passed through the first MLP layer and added to the input x to obtain a new input x′:
x′ = x + Z(c, Θz1)
where Z(·, Θz1) represents the function of the first MLP layer with respect to the parameter Θz1, and c represents the knowledge triplet.
x′ then goes through the Transformer encoder layer and the second MLP layer to obtain the output y2 of the control layer:
y2 = Z(Encoder(x′), Θz2)
where Encoder(·) represents the Transformer encoder layer, and Z(·, Θz2) represents the function of the second MLP layer with respect to the parameter Θz2.
(3) Adding the base model output and the control layer output vectors to obtain the model's final output vector ypred:
ypred = y1 + y2
(4) The training target is as follows: make the model output ypred close to the true value ytrue:
ypred ≈ ytrue
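Putting steps (1) to (4) together, a minimal sketch of the combined forward pass and training target could look as follows, reusing the ControlLayer sketched above. How the input x and the triplet c are embedded to the base model's hidden size, and the use of mean squared error as the loss, are assumptions made only for illustration; the patent does not name a specific loss function.

```python
import torch

def combined_forward(base_model, control_layer, x_emb, c_emb):
    with torch.no_grad():                 # the base model parameters Θ stay frozen
        y1 = base_model(x_emb)            # (1) y1 = F(x; Θ)
    y2 = control_layer(x_emb, c_emb)      # (2) control-layer output y2
    return y1 + y2                        # (3) ypred = y1 + y2

def training_loss(y_pred, y_true):
    # (4) training target: bring ypred close to ytrue; only the control-layer
    # parameters receive gradients, since the base model output is detached.
    return torch.nn.functional.mse_loss(y_pred, y_true)
```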
In a specific optional implementation manner, performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model includes the following step:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
In this embodiment, direction control is performed on the content generated by the large model through the direction control model and the generation direction is determined, wherein the direction control model is a text classification model; medical information extraction and recognition are performed with the skeleton control model, and the fact knowledge from the knowledge graph to be integrated into the large model is constructed; and the generation direction and the fact knowledge are integrated into the large model network through the model control network, and text processing is performed to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Fig. 5 shows a schematic block diagram of a medical large model construction apparatus in one-to-one correspondence with the medical large model construction method of the above embodiment. As shown in fig. 5, the medical large model construction apparatus includes a direction control module 31, a skeleton control module 32, and a network control module 33. The functional modules are described in detail as follows:
the direction control module 31 is configured to perform direction control on the content generated by the large model through a direction control model and determine the generation direction, where the direction control model is a text classification model;
the skeleton control module 32 is configured to perform medical information extraction and recognition with a skeleton control model and construct the fact knowledge from the knowledge graph to be integrated into the large model;
the network control module 33 is configured to integrate the generation direction and the fact knowledge into the large model network through the model control network and perform text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and the direction control module 31 includes:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
Optionally, the medical large model construction device further includes:
the corpus analysis module is used for passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
the first recognition module is used for inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ), where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
the second recognition module is used for passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
the output determining module is used for superposing the first output and the second output to obtain a target output;
and the iterative training module is used for training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
For specific limitations on the medical large model construction apparatus, reference may be made to the above limitations on the medical large model construction method, and no further description is given here. The respective modules in the above-described medical large model construction apparatus may be realized in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 6, fig. 6 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having the components memory 41, processor 42 and network interface 43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuits, ASICs), field-programmable gate arrays (Field-Programmable Gate Arrays, FPGAs), digital signal processors (Digital Signal Processors, DSPs), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various types of application software installed on the computer device 4, such as the program code for medical large model construction. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, such as the program code for medical large model construction.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely, a computer-readable storage medium storing a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the medical large model construction method described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, but not all, embodiments of the present application. The preferred embodiments of the present application are given in the drawings, but this does not limit the patent scope of the present application. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (10)

1. A medical large model construction method, characterized by comprising:
performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model;
performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result.
2. The medical large model construction method according to claim 1, wherein the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the maximum probability value as the generation direction.
3. The medical large model construction method according to claim 2, wherein the model control network comprises two multi-layer perceptron (MLP) layers and one Transformer encoder layer.
4. The medical large model construction method according to claim 3, characterized in that the medical large model construction method further comprises:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
and training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
5. The medical large model construction method according to claim 4, wherein the parameters of the base model are frozen during training.
6. The medical large model construction method according to any one of claims 1 to 5, wherein performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model comprises:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
7. A medical large model construction apparatus, characterized in that the medical large model construction apparatus comprises:
the direction control module, which is used for performing direction control on the content generated by the large model through a direction control model and determining the generation direction, wherein the direction control model is a text classification model;
the skeleton control module, which is used for performing medical information extraction and recognition with a skeleton control model and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and the network control module, which is used for integrating the generation direction and the fact knowledge into the large model network through a model control network and performing text processing to obtain a text processing result.
8. The medical large model construction apparatus according to claim 7, wherein the direction control model adopts an improved Bert model as its base model, and the direction control module comprises:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the medical large model construction method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the medical large model construction method according to any one of claims 1 to 6.
CN202311708696.9A 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium Pending CN117668233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311708696.9A CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311708696.9A CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN117668233A true CN117668233A (en) 2024-03-08

Family

ID=90078617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311708696.9A Pending CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN117668233A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination