CN117668233A - Medical large model construction method, device, computer equipment and medium - Google Patents

Medical large model construction method, device, computer equipment and medium

Info

Publication number
CN117668233A
Authority
CN
China
Prior art keywords
model
medical
control
layer
direction control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311708696.9A
Other languages
Chinese (zh)
Inventor
刘磊
邱建华
刘伟华
马金民
李林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wisdom Eye Information Technology Co ltd
Original Assignee
Beijing Wisdom Eye Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wisdom Eye Information Technology Co ltd filed Critical Beijing Wisdom Eye Information Technology Co ltd
Priority to CN202311708696.9A priority Critical patent/CN117668233A/en
Publication of CN117668233A publication Critical patent/CN117668233A/en
Pending legal-status Critical Current


Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a medical large model construction method, a device, computer equipment and a medium, comprising the following steps: performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model; performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from a knowledge graph to be integrated into the large model; and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.

Description

Medical large model construction method, device, computer equipment and medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method and apparatus for constructing a medical large model, a computer device, and a medium.
Background
With the development of artificial intelligence technology, natural language processing technology has been widely applied, and large language models perform excellently on natural language processing tasks such as machine translation and text question answering.
In the process of realizing the invention, the inventor realized that the prior art has at least the following technical problem: an unavoidable problem with large language models is their lack of factual knowledge, which often leads them to generate false information and hallucinations, undermining the credibility of the model and making text classification and recognition inaccurate.
Disclosure of Invention
The embodiment of the invention provides a medical large model construction method, a medical large model construction device, computer equipment and a storage medium, so as to improve the accuracy of text classification.
In order to solve the above technical problems, an embodiment of the present application provides a medical large model construction method, including:
performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model;
performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the maximum probability value as the generation direction.
Optionally, the model control network comprises two multi-layer perceptron (MLP) layers and one Transformer encoder layer.
Optionally, the medical large model construction method further includes:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
and training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
Optionally, the parameters of the base model are frozen during the training process.
Optionally, performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model includes:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
In order to solve the above technical problem, an embodiment of the present application further provides a medical large model building device, including:
the direction control module is used for performing direction control on the content generated by the large model through a direction control model and determining the generation direction, wherein the direction control model is a text classification model;
the skeleton control module is used for performing medical information extraction and recognition with a skeleton control model and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and the network control module is used for integrating the generation direction and the fact knowledge into the large model network through a model control network and performing text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and the direction control module includes:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
Optionally, the medical large model construction device further includes:
the corpus analysis module is used for passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
the first recognition module is used for inputting the training corpus x into the basic model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
the second recognition module is used for passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
the output determining module is used for superposing the first output and the second output to obtain a target output;
and the iterative training module is used for training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
In order to solve the above technical problem, the embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the above medical large model building method are implemented when the processor executes the computer program.
In order to solve the above technical problem, the embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program implements the steps of the medical large model building method described above when executed by a processor.
According to the medical large model construction method, the device, the computer equipment and the storage medium provided by the embodiments of the invention, direction control is performed on the content generated by the large model through the direction control model and the generation direction is determined, wherein the direction control model is a text classification model; medical information extraction and recognition are performed with the skeleton control model, and fact knowledge from the knowledge graph to be integrated into the large model is constructed; and the generation direction and the fact knowledge are integrated into the large model network through the model control network, and text processing is performed to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a medical large model building method of the present application;
FIG. 3 is a diagram showing an example of the structure of the direction control model of the present application;
FIG. 4 is a diagram showing an example of the structure of an overall large language model of the present application;
FIG. 5 is a schematic structural view of one embodiment of a medical large model building apparatus according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the medical large model construction method provided in the embodiment of the present application is executed by a server, and accordingly, the medical large model construction device is disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements, and the terminal devices 101, 102 and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
Referring to fig. 2, fig. 2 shows a medical large model construction method according to an embodiment of the present invention. The method is described below in detail, taking its application to the server in fig. 1 as an example:
s201: and performing direction control on the large model generation content through a direction control model, and determining the generation direction, wherein the direction control model is a text classification model.
In a specific optional implementation manner, the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the largest probability value as the generation direction.
Specifically, the direction control module is used to identify the type of the user's input, determine the overall direction of large-model generation, and prevent the generation of irrelevant information. The direction control model in this embodiment is essentially a text classification model. RoBERTa is adopted as the base model, and a [CLS] symbol is added at the head of the input sentence to represent the overall information of the input. After passing through RoBERTa, [CLS] yields a final output vector C (which embodies the information of the whole sentence); the probability of each category is then obtained through a linear layer (which produces a vector whose dimension N equals the number of categories) and a softmax (which normalizes the vector so that the probabilities sum to 1), and the category with the highest probability is selected as the output. The model structure is shown in fig. 3. The categories are manually defined in advance according to the medical knowledge graph and the medical question-answer corpus, and are divided into the following categories: symptoms, concurrent symptoms, diagnostic examination items, medical subjects, disease names, disease profiles, disease causes, preventive measures, treatment cycles, treatment patterns, cure probabilities, disease-susceptible populations, medicines, foods, and the like.
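To make the structure concrete, the following is a minimal sketch of such a direction-control classifier in PyTorch. It assumes the Hugging Face transformers library and a public Chinese RoBERTa checkpoint ("hfl/chinese-roberta-wwm-ext"); the checkpoint name and the English category labels are illustrative assumptions, since the patent only specifies a RoBERTa base model, a linear layer and a softmax over the manually defined categories.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Manually defined categories (English renderings of the list above).
CATEGORIES = ["symptom", "concurrent symptom", "diagnostic examination item",
              "medical subject", "disease name", "disease profile", "disease cause",
              "preventive measure", "treatment cycle", "treatment pattern",
              "cure probability", "disease-susceptible population", "medicine", "food"]

class DirectionControlModel(nn.Module):
    def __init__(self, pretrained="hfl/chinese-roberta-wwm-ext", num_classes=len(CATEGORIES)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained)       # RoBERTa backbone
        self.linear = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]     # output vector C at the [CLS] position
        logits = self.linear(cls_vec)             # linear layer -> vector of dimension N
        return torch.softmax(logits, dim=-1)      # softmax -> per-category probabilities

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = DirectionControlModel()
enc = tokenizer("感冒有哪些症状?", return_tensors="pt")   # "What are the symptoms of a cold?"
probs = model(enc["input_ids"], enc["attention_mask"])
direction = CATEGORIES[int(probs.argmax(dim=-1))]          # highest-probability category
```

The tokenizer prepends the [CLS] mark to the input automatically, and the vector at that position serves as the sentence representation C described above.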
During the model training phase, a small amount of training data is manually annotated. For example, for "What are the symptoms of a cold?", the manually annotated category is "symptom", and the training target is to make the category predicted by the model consistent with the manually annotated category. In the model inference phase, any input is classified into one of the manually defined categories above: the model outputs the probability of each category, and the category with the highest probability is the final output. For similar categories such as "symptom" and "concurrent symptom", the one with the higher output probability is likewise selected.
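A hedged sketch of this training objective, reusing the classifier from the previous sketch: a few manually annotated (question, category) pairs and a negative log-likelihood loss so that the predicted category matches the manual label. The example data and hyperparameters are assumptions, not taken from the patent.

```python
labeled = [("感冒有哪些症状?", "symptom"),      # "What are the symptoms of a cold?"
           ("感冒吃什么药?", "medicine")]        # "What medicine should I take for a cold?"
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.NLLLoss()                           # expects log-probabilities

model.train()
for question, label in labeled:
    enc = tokenizer(question, return_tensors="pt")
    probs = model(enc["input_ids"], enc["attention_mask"])   # already softmax-normalized
    target = torch.tensor([CATEGORIES.index(label)])
    loss = loss_fn(torch.log(probs), target)     # push the predicted category toward the label
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```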
S202: perform medical information extraction and recognition with the skeleton control model, and construct the fact knowledge from the knowledge graph to be integrated into the large model.
Specifically, the skeleton control model is used to extract medical information from the question input by the user, mainly including entity recognition, attribute recognition and the like, covering the domain knowledge contained in the medical knowledge graph such as disease types, symptoms, medical subjects, medicines, cure probabilities and foods. After the entities and attributes in the user's question are extracted, the corresponding nodes are found in the medical knowledge graph, and the corresponding skeleton knowledge is then queried in the knowledge graph according to the classification result of the direction control model.
For example, when the user inputs "What are the symptoms of a cold?", the direction control model first identifies that the medical direction the user wants to ask about is "symptom"; next, the information extraction module extracts the entity "cold"; then "cold" is queried in the knowledge graph, and, based on the "symptom" direction identified by the direction control model, the corresponding symptoms are queried in the knowledge graph as the skeleton knowledge: ("cold", "symptom", "fever, cough, runny nose").
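The following sketch walks through this worked example. The in-memory dictionary stands in for the medical knowledge graph, and entity recognition is reduced to simple string matching purely for illustration; the real skeleton control model relies on trained entity and attribute recognition, which the patent does not detail.

```python
from typing import Optional, Tuple

# Tiny stand-in for the medical knowledge graph: (entity, relation) -> value.
KNOWLEDGE_GRAPH = {
    ("感冒", "symptom"): "发烧、咳嗽、流鼻涕",   # ("cold", "symptom", "fever, cough, runny nose")
}

def extract_entity(question: str) -> Optional[str]:
    # Stand-in for the entity/attribute recognition of the skeleton control model.
    for entity, _relation in KNOWLEDGE_GRAPH:
        if entity in question:
            return entity
    return None

def skeleton_knowledge(question: str, direction: str) -> Optional[Tuple[str, str, str]]:
    # Look up the triple for the extracted entity and the direction predicted by the
    # direction control model.
    entity = extract_entity(question)
    if entity is None:
        return None
    value = KNOWLEDGE_GRAPH.get((entity, direction))
    return (entity, direction, value) if value is not None else None

# skeleton_knowledge("感冒有哪些症状?", "symptom")
# -> ("感冒", "symptom", "发烧、咳嗽、流鼻涕")
```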
S203: integrate the generation direction and the fact knowledge into the large model network through the model control network, and perform text processing to obtain a text processing result.
Specifically, the model control network further controls the behavior of the entire neural network by operating on the input conditions of the neural network blocks. ChatGLM-6B is used as the base model. Because the base model has 6 billion parameters, it is difficult to train, so the base model is frozen during training and its parameters are not updated, which reduces training time and cost.
A new control network layer is added to introduce the fact knowledge of the knowledge graph into the base model and control what the base model generates. The fact knowledge C (obtained by the skeleton control module querying the knowledge graph) and the input X pass through the control network layer to generate an output vector y2, which is added to the output vector y1 of the base model to form the final output vector y. In this way the fact knowledge is introduced into the output, the model output better conforms to the facts, and the effect of controlled generation is achieved. The overall control network structure is shown in fig. 4.
Optionally, the model control network includes two multi-layer perceptron (MLP) layers and one Transformer encoder layer. The MLP layers enable the model to capture more feature information, and the Transformer encoder layer converts the input sequence information into a set of representation vectors via a self-attention mechanism.
The base model is frozen because its parameter count is too large, amounting to billions of parameters, which makes training time-consuming, costly and difficult. The control network is added to introduce the fact knowledge of the knowledge graph into the base model output and to control what the model generates.
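A possible PyTorch sketch of this control layer is shown below: a first MLP that projects the knowledge-triple representation before it is added to the input, a Transformer encoder layer, and a second MLP that produces the control output y2. The hidden size, activation functions and attention-head count are assumptions the patent does not specify.

```python
import torch.nn as nn

class ControlLayer(nn.Module):
    def __init__(self, hidden_size=4096, nhead=8):
        super().__init__()
        # First MLP Z(., Θz1): projects the embedded knowledge triple c.
        self.mlp1 = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                                  nn.Linear(hidden_size, hidden_size))
        self.encoder = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=nhead,
                                                  batch_first=True)
        # Second MLP Z(., Θz2): produces the control-layer output y2.
        self.mlp2 = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                                  nn.Linear(hidden_size, hidden_size))

    def forward(self, x, c):
        # x and c are assumed to be embedded to the same shape (batch, seq_len, hidden_size).
        x_prime = x + self.mlp1(c)                 # x' = x + Z(c, Θz1)
        return self.mlp2(self.encoder(x_prime))    # y2 = Z(Encoder(x'), Θz2)
```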
In a specific optional embodiment, the medical large model construction method further includes:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ), where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
Optionally, the parameters of the base model are frozen during the training process.
For the input question "What are the symptoms of a cold?", the knowledge triple ("cold", "symptom", "fever, cough, runny nose") is obtained from the knowledge graph through the direction control module and the skeleton control module, and the input question and the triple are represented as x and c, respectively.
(1) Base model:
The input x passes through the base model to produce an output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters (note that these parameters are frozen during training and are not updated), F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input.
(2) Control layer:
The triplet c is first passed through the first MLP layer and added to the input x to obtain a new input x′:
x′ = x + Z(c, Θz1)
where Z(·, Θz1) represents the function of the first MLP layer with respect to the parameter Θz1, and c represents the knowledge triplet.
x′ then goes through the Transformer encoder layer and the second MLP layer to obtain the output y2 of the control layer:
y2 = Z(Encoder(x′), Θz2)
where Encoder(·) represents the Transformer encoder layer, and Z(·, Θz2) represents the function of the second MLP layer with respect to the parameter Θz2.
(3) Adding the base model output and the control layer output vectors to obtain the model's final output vector ypred:
ypred = y1 + y2
(4) The training target is as follows: make the model output ypred close to the true value ytrue:
ypred ≈ ytrue
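Putting steps (1) to (4) together, a minimal sketch of the combined forward pass and training target could look as follows, reusing the ControlLayer sketched above. How the input x and the triplet c are embedded to the base model's hidden size, and the use of mean squared error as the loss, are assumptions made only for illustration; the patent does not name a specific loss function.

```python
import torch

def combined_forward(base_model, control_layer, x_emb, c_emb):
    with torch.no_grad():                 # the base model parameters Θ stay frozen
        y1 = base_model(x_emb)            # (1) y1 = F(x; Θ)
    y2 = control_layer(x_emb, c_emb)      # (2) control-layer output y2
    return y1 + y2                        # (3) ypred = y1 + y2

def training_loss(y_pred, y_true):
    # (4) training target: bring ypred close to ytrue; only the control-layer
    # parameters receive gradients, since the base model output is detached.
    return torch.nn.functional.mse_loss(y_pred, y_true)
```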
In a specific optional implementation manner, performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model includes the following step:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
In this embodiment, direction control is performed on the content generated by the large model through the direction control model and the generation direction is determined, wherein the direction control model is a text classification model; medical information extraction and recognition are performed with the skeleton control model, and the fact knowledge from the knowledge graph to be integrated into the large model is constructed; and the generation direction and the fact knowledge are integrated into the large model network through the model control network, and text processing is performed to obtain a text processing result. Because the fact knowledge contained in the knowledge graph is integrated into the large language model, the accuracy of what the large language model generates is improved, and the accuracy of text recognition and classification is improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Fig. 5 shows a schematic block diagram of a medical large model construction apparatus in one-to-one correspondence with the medical large model construction method of the above embodiment. As shown in fig. 5, the medical large model construction apparatus includes a direction control module 31, a skeleton control module 32, and a network control module 33. The functional modules are described in detail as follows:
the direction control module 31 is configured to perform direction control on the content generated by the large model through a direction control model and determine the generation direction, where the direction control model is a text classification model;
the skeleton control module 32 is configured to perform medical information extraction and recognition with a skeleton control model and construct the fact knowledge from the knowledge graph to be integrated into the large model;
the network control module 33 is configured to integrate the generation direction and the fact knowledge into the large model network through the model control network and perform text processing to obtain a text processing result.
Optionally, the direction control model adopts an improved Bert model as its base model, and the direction control module 31 includes:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
Optionally, the medical large model construction device further includes:
the corpus analysis module is used for passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
the first recognition module is used for inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ), where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
the second recognition module is used for passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
the output determining module is used for superposing the first output and the second output to obtain a target output;
and the iterative training module is used for training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
For specific limitations on the medical large model construction apparatus, reference may be made to the above limitations on the medical large model construction method, and no further description is given here. The respective modules in the above-described medical large model construction apparatus may be realized in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 6, fig. 6 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having the components memory 41, processor 42 and network interface 43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuits, ASICs), field-programmable gate arrays (Field-Programmable Gate Arrays, FPGAs), digital signal processors (Digital Signal Processors, DSPs), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various types of application software installed on the computer device 4, such as the program code for medical large model construction. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, such as the program code for medical large model construction.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely, a computer-readable storage medium storing a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the medical large model construction method described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, but not all, embodiments of the present application. The preferred embodiments of the present application are given in the drawings, but this does not limit the patent scope of the present application. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (10)

1. A medical large model construction method, characterized by comprising:
performing direction control on the content generated by the large model through a direction control model and determining a generation direction, wherein the direction control model is a text classification model;
performing medical information extraction and recognition with a skeleton control model, and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and integrating the generation direction and the fact knowledge into the large model network through a model control network, and performing text processing to obtain a text processing result.
2. The medical large model construction method according to claim 1, wherein the direction control model adopts an improved Bert model as its base model, and performing direction control on the content generated by the large model through the direction control model and determining the generation direction includes:
adding a preset mark at the head of each input sentence to represent the overall information;
passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each category, and taking the category with the maximum probability value as the generation direction.
3. The medical large model construction method according to claim 2, wherein the model control network comprises two multi-layer perceptron (MLP) layers and one Transformer encoder layer.
4. The medical large model construction method according to claim 3, characterized in that the medical large model construction method further comprises:
passing the training corpus through the direction control model and the skeleton control model to obtain an input corpus x and a triplet c;
inputting the corpus x into the base model to obtain a first output y1:
y1 = F(x; Θ)
where Θ represents the base model parameters, F(·; Θ) represents the function of the base model with respect to the parameters Θ, and x represents the input;
passing the triplet c through the first multi-layer perceptron (MLP) layer and superposing the result with the input corpus x to obtain a new input x', and passing the new input x' through the second MLP layer to obtain a second output y2;
superposing the first output and the second output to obtain a target output;
and training the direction control model, the skeleton control model and the model control network based on the real value corresponding to the training corpus and the target output.
5. The medical large model construction method according to claim 4, wherein the parameters of the base model are frozen during training.
6. The medical large model construction method according to any one of claims 1 to 5, wherein performing medical information extraction and recognition with the skeleton control model and constructing the fact knowledge from the knowledge graph to be integrated into the large model comprises:
performing skeleton control on the content generated by the large model by using the structured medical knowledge nodes or relations contained in the knowledge graph, and determining the fact knowledge.
7. A medical large model construction apparatus, characterized in that the medical large model construction apparatus comprises:
the direction control module, which is used for performing direction control on the content generated by the large model through a direction control model and determining the generation direction, wherein the direction control model is a text classification model;
the skeleton control module, which is used for performing medical information extraction and recognition with a skeleton control model and constructing fact knowledge from the knowledge graph to be integrated into the large model;
and the network control module, which is used for integrating the generation direction and the fact knowledge into the large model network through a model control network and performing text processing to obtain a text processing result.
8. The medical large model construction apparatus according to claim 7, wherein the direction control model adopts an improved Bert model as its base model, and the direction control module comprises:
the overall identification unit, which is used for adding a preset mark at the head of each input sentence to represent the overall information;
the sentence characterization unit, which is used for passing each piece of overall information through the improved Bert model and outputting a sentence vector representing the overall information;
and the class determining unit, which is used for linearizing and normalizing the sentence vector with a linear layer and a normalization layer to obtain the probability that the sentence vector belongs to each class, and taking the class with the largest probability value as the generation direction.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the medical large model construction method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the medical large model construction method according to any one of claims 1 to 6.
CN202311708696.9A 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium Pending CN117668233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311708696.9A CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311708696.9A CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN117668233A true CN117668233A (en) 2024-03-08

Family

ID=90078617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311708696.9A Pending CN117668233A (en) 2023-12-13 2023-12-13 Medical large model construction method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN117668233A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination