CN116975241B - Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model - Google Patents


Info

Publication number
CN116975241B
Authority
CN
China
Prior art keywords
liver cancer
data
question
answer
language model
Prior art date
Legal status
Active
Application number
CN202311216697.1A
Other languages
Chinese (zh)
Other versions
CN116975241A (en)
Inventor
李亚
李晓龙
戴青云
Current Assignee
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202311216697.1A
Publication of CN116975241A
Application granted
Publication of CN116975241B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a liver cancer auxiliary diagnosis and question-answering method, system and medium based on a large language model. A liver cancer knowledge data set is acquired through a medical data platform; the liver cancer knowledge data set is imported into a question-answer conversion model, and the knowledge data are converted based on preset question-answer templates to obtain a liver cancer instruction question-answer data set; a large language model is constructed, the liver cancer instruction question-answer data set is imported into it for training, and the large language model is fine-tuned with the LoRA method; a reward model is generated based on the fine-tuned large language model, trained on a comparison data set, and used to generate a corresponding reward function; and preference optimization is performed on the large language model according to the reward model. The invention can accurately assist in answering patients' questions, provide practical medical advice, improve patients' understanding of liver cancer, and greatly improve the efficiency of doctor-patient communication.

Description

Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model
Technical Field
The invention relates to the field of large language models, and in particular to a liver cancer auxiliary diagnosis and question-answering method, system and medium based on a large language model.
Background
Liver cancer is a malignant tumor. Its high incidence is related to various factors such as personal lifestyle, environmental pollution and chronic liver disease, which makes the diagnosis and prevention of liver cancer very important.
With the continuous development of artificial intelligence technology, question-answering systems based on large language models have become a research hotspot in the medical field. A liver cancer question-answering system is an AI-based medical application that can provide patients and doctors with rapid and accurate information about liver cancer, help patients better understand its etiology, symptoms, diagnosis and treatment, and provide doctors with auxiliary diagnosis and treatment support.
The research and development of liver cancer auxiliary diagnosis and question-answering systems is of great significance for improving the early diagnosis rate, treatment outcomes and survival rate of liver cancer. Such systems can also support the rational allocation of medical resources and the wider availability of medical services, helping to relieve shortages of medical resources and uneven access to care. Their research and application therefore have broad development prospects and social value. However, existing methods and systems for liver cancer question answering have not been well applied in practice: their methods for acquiring and analyzing relevant data are simplistic, so they struggle to provide effective assistance.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a liver cancer auxiliary diagnosis and question-answering method, system and medium based on a large language model.
The first aspect of the invention provides a liver cancer auxiliary diagnosis and question-answering method based on a large language model, which comprises the following steps:
acquiring a liver cancer knowledge data set through a medical data platform;
importing the liver cancer knowledge data set into a question-answer conversion model, and converting knowledge data based on a preset question-answer template to obtain a liver cancer instruction question-answer data set;
constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on a LoRA method;
generating a reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating a corresponding reward function;
and performing preference optimization on the large language model according to the reward model.
In this scheme, acquiring the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, inpatient service manual data of a target medical institution, liver cancer imaging report data, and popular-science liver cancer knowledge data;
performing entity, relation and attribute extraction on the inpatient service manual data to form an inpatient service knowledge graph;
extracting liver cancer imaging description information and imaging diagnosis information from the liver cancer imaging report data;
obtaining a staging result through manual labeling according to the liver cancer imaging description information and the imaging diagnosis information;
wherein the liver cancer knowledge data set comprises the inpatient service knowledge graph, the liver cancer imaging description information, the imaging diagnosis information, the staging results, and the popular-science liver cancer knowledge data.
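The step of turning extracted entities, relations and attributes into a knowledge graph, and later into text that can be fed to a prompt, can be sketched as below. This is a minimal illustration only, assuming the graph is stored as (head, relation, tail) triples and verbalized as one sentence per triple; the `Triple` class, the `verbalize` function and the example facts (modeled on the sample dialogue later in this description) are hypothetical and not part of the patented method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: an entity, a relation phrase, and an attribute or target entity."""
    head: str
    relation: str
    tail: str


def verbalize(triples: list[Triple]) -> str:
    """Render each triple as one sentence so the graph can be pasted into a prompt."""
    return " ".join(f"{t.head} {t.relation} {t.tail}." for t in triples)


# Hypothetical facts modeled on the sample dialogue in this description.
inpatient_graph = [
    Triple("Hospitalization fee payment", "is handled at",
           "the inpatient cashier on the 1st floor of Building 5"),
    Triple("Meal arrangement", "is handled at",
           "the dietary department on the 1st floor of Building 5"),
]

print(verbalize(inpatient_graph))
```

Storing facts as triples keeps the graph easy to query, while the verbalized form is what a text-only question-answer conversion model can consume.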
In this scheme, importing the liver cancer knowledge data set into the question-answer conversion model and converting the knowledge data based on preset question-answer templates to obtain the liver cancer instruction question-answer data set specifically comprises:
integrating the inpatient service knowledge graph and the popular-science liver cancer knowledge data into first training data based on a large language model;
importing the first training data into a ChatGPT-based question-answer conversion model, performing question-answer simulation according to a first preset question-answer prompt template, and generating first question-answer data;
integrating the liver cancer imaging description information, the imaging diagnosis information and the staging results into second training data;
importing the second training data into the question-answer conversion model, performing question-answer simulation based on a second preset question-answer prompt template, and generating second question-answer data;
and integrating the first question-answer data and the second question-answer data to form the liver cancer instruction question-answer data set.
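A minimal sketch of the final integration step follows. It assumes each generated question-answer item is stored in the instruction/input/output/history layout used by the sample data in this description, and that "integration" means concatenating the two sets and dropping exact duplicates. The function names `to_record`, `merge_datasets` and `to_json`, and the deduplication rule, are illustrative assumptions.

```python
import json


def to_record(question, answer, context="", history=None):
    """Build one training record in the instruction/input/output/history layout."""
    return {
        "instruction": question,
        "input": context,
        "output": answer,
        "history": history if history is not None else [],
    }


def merge_datasets(first_qa, second_qa):
    """Concatenate the first and second question-answer data, keeping the first
    occurrence of any record with identical instruction, input and output."""
    seen = set()
    merged = []
    for record in first_qa + second_qa:
        key = (record["instruction"], record["input"], record["output"])
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


def to_json(records):
    """Serialize the merged data set; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```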
In this scheme, constructing the large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprises:
constructing the large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers on the liver cancer instruction question-answer data set and updating the corresponding parameters;
and importing the newly added network layers into the large language model.
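The LoRA step (frozen pre-trained weights plus newly added trainable low-rank layers that are later merged back into the model) can be illustrated on a single linear layer. The sketch below assumes the standard LoRA formulation h = W0·x + (α/r)·B·A·x with B initialized to zero; it uses plain Python lists instead of a deep-learning framework, and the names `LoRALinear`, `rank` and `alpha` are illustrative. Fine-tuning ChatGLM-6B in practice would rely on a framework implementation rather than this toy.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W0 (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, w0, rank, alpha, seed=0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # pre-trained weights: frozen, never updated
        # A (rank x d_in) gets small random values; B (d_out x rank) starts at zero,
        # so before training the layer behaves exactly like the frozen layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """h = W0 x + (alpha / r) * B (A x); only A and B would receive gradients."""
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the adapter into one matrix W0 + (alpha / r) B A for inference."""
        ba = matmul(self.b, self.a)
        return [[w + self.scale * u for w, u in zip(rw, ru)]
                for rw, ru in zip(self.w0, ba)]
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly; `merged_weight` corresponds to importing the trained adapter back into the model so inference costs no extra computation.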
In this scheme, generating the reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating a corresponding reward function specifically comprises:
training the reward model on a preset comparison data set, wherein the loss function of the reward model is:
$$\mathrm{loss}(\theta) = -\mathbb{E}_{(x,\,y_w,\,y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
where $r_\theta(x, y)$ is the scalar output of the reward model with parameters $\theta$ for prompt $x$ and output $y$; $y_w$ is the output preferred over $y_l$; $D$ is the comparison data set; $\sigma$ is the sigmoid function used in neural networks; and $\mathrm{loss}(\theta)$ is the value of the loss function;
performing reinforcement training on the large language model based on a preset reward function, wherein the reward function is:
$$\mathrm{objective}(\phi) = \mathbb{E}_{(x,\,y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log \frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right]$$
where $\beta \log\left(\pi_\phi^{RL}(y \mid x) / \pi^{SFT}(y \mid x)\right)$ is the KL penalty term; $\pi_\phi^{RL}$ is the RL policy model being learned; $\pi^{SFT}$ is the LoRA-fine-tuned large language model; $r_\theta(x, y)$ is the scalar output of the reward model with parameters $\theta$ (trained on the comparison data set $D$) for prompt $x$ and output $y$; $\mathrm{objective}$ is the reward function; $\phi$ denotes the parameters being optimized; $D_{\pi_\phi^{RL}}$ is the reinforcement learning data set; $\beta$ is a preset correction coefficient; and $\mathbb{E}$ denotes the expectation.
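Both formulas in the two steps above reduce to a few lines of arithmetic. The sketch below computes the pairwise reward-model loss from already-computed reward scores and the KL-penalized reward from per-sample log-probabilities; it assumes the scores and log-probabilities are supplied by the models, and the function names are illustrative. Averaging over the supplied pairs is an empirical estimate of the expectation over $D$.

```python
import math


def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def reward_model_loss(pairs):
    """loss(theta) over comparison pairs given as (r_theta(x, y_w), r_theta(x, y_l)):
    the mean of -log sigmoid(r_w - r_l)."""
    if not pairs:
        raise ValueError("need at least one comparison pair")
    return sum(neg_log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


def kl_penalized_reward(score, logp_rl, logp_sft, beta):
    """r_theta(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x)), using log-probabilities."""
    return score - beta * (logp_rl - logp_sft)
```

The loss falls as the preferred output's score rises above the rejected one's, and the KL term lowers the reward whenever the RL policy assigns an answer more probability than the LoRA-fine-tuned model does, discouraging drift away from it.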
In this scheme, performing preference optimization on the large language model according to the reward model specifically comprises:
acquiring a preset test data set;
performing question-answer tests on the large language model based on the preset test data set, analyzing the output and calculating a reward score for each test answer based on the reward model and the reward function, and optimizing the parameters of the large language model and the reward model based on the calculation results;
and iteratively optimizing the large language model and the reward model on the preset test data set until a preset number of iterations is reached.
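The iterative test-and-update procedure above can be outlined as a fixed-count loop. This is only a skeleton, assuming hypothetical callables `generate` (the large language model), `reward` (the reward model and reward function) and `update` (whatever optimizer adjusts the two models); none of these names come from the source, and the actual parameter-update method is not specified here.

```python
from statistics import mean
from typing import Callable


def preference_optimization(
    test_questions: list[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    update: Callable[[list[tuple[str, str, float]]], None],
    num_iterations: int,
) -> list[float]:
    """Run num_iterations test-and-update rounds; return the mean reward of each round."""
    history = []
    for _ in range(num_iterations):
        scored = []
        for question in test_questions:
            answer = generate(question)  # large language model answers the test question
            scored.append((question, answer, reward(question, answer)))  # reward score
        history.append(mean(score for _, _, score in scored))
        update(scored)  # hypothetical hook: adjust LLM and reward-model parameters
    return history
```

Returning the per-round mean reward gives a simple way to monitor whether the optimization is improving before the preset iteration count is reached.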
The second aspect of the present invention further provides a liver cancer auxiliary diagnosis and question-answering system based on a large language model. The system comprises a memory and a processor; the memory stores a liver cancer auxiliary diagnosis and question-answering program based on a large language model, and when executed by the processor, the program implements the following steps:
Acquiring a liver cancer knowledge data set through a medical data platform;
importing the liver cancer knowledge data set into a question-answer conversion model, and converting knowledge data based on a preset question-answer template to obtain a liver cancer instruction question-answer data set;
constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on a LoRA method;
generating a reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating a corresponding reward function;
and performing preference optimization on the large language model according to the reward model.
In this scheme, acquiring the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, inpatient service manual data of a target medical institution, liver cancer imaging report data, and popular-science liver cancer knowledge data;
performing entity, relation and attribute extraction on the inpatient service manual data to form an inpatient service knowledge graph;
extracting liver cancer imaging description information and imaging diagnosis information from the liver cancer imaging report data;
obtaining a staging result through manual labeling according to the liver cancer imaging description information and the imaging diagnosis information;
wherein the liver cancer knowledge data set comprises the inpatient service knowledge graph, the liver cancer imaging description information, the imaging diagnosis information, the staging results, and the popular-science liver cancer knowledge data.
In this scheme, constructing the large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprises:
constructing the large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers on the liver cancer instruction question-answer data set and updating the corresponding parameters;
and importing the newly added network layers into the large language model.
The third aspect of the present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium includes a liver cancer auxiliary diagnosis and question-answering program based on a large language model, and when the liver cancer auxiliary diagnosis and question-answering program based on the large language model is executed by a processor, the steps of the liver cancer auxiliary diagnosis and question-answering method based on the large language model described in any one of the above are implemented.
The scheme of the invention achieves the following beneficial effects:
It can effectively guide patients through admission and discharge and explain inpatient notices. Newly admitted patients can be shown the entire hospitalization procedure and its precautions, including how to complete admission, the medical examinations required after arriving on the ward, medication precautions, dietary contraindications, and so on. At the same time, the hospital's services and facilities can be introduced to patients, helping them adapt to the environment and the treatment process. Discharged patients can be given detailed discharge guidance covering postoperative rehabilitation, correct medication, dietary contraindications and other advice. This improves the efficiency of doctor-patient communication and greatly improves the overall efficiency of the medical institution.
It can provide patients with knowledge related to liver cancer: introducing the basic concepts of liver cancer and the treatments adopted for liver cancer at different stages, answering the questions and addressing the concerns of patients and their families, and providing psychological support and advice, thereby realizing targeted consultation services for liver cancer.
It can give an imaging diagnosis or a liver cancer staging result from a liver cancer imaging description. Liver cancer imaging examinations such as CT yield liver cancer image data, from which a doctor writes an imaging description; the system then gives an imaging diagnosis or staging result based on that description, improving the efficiency of assisting patient consultation and answering.
Drawings
FIG. 1 shows a flowchart of a liver cancer auxiliary diagnosis and question-answering method based on a large language model of the invention;
FIG. 2 shows a flowchart of the liver cancer knowledge data set acquisition of the present invention;
FIG. 3 illustrates a large language model optimization flow chart of the present invention;
FIG. 4 shows a block diagram of a liver cancer auxiliary diagnosis and question-answering system based on a large language model of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
FIG. 1 shows a flowchart of a liver cancer auxiliary diagnosis and question-answering method based on a large language model.
As shown in fig. 1, the first aspect of the present invention provides a liver cancer auxiliary diagnosis and question-answering method based on a large language model, comprising:
S102, acquiring a liver cancer knowledge data set through a medical data platform;
S104, importing the liver cancer knowledge data set into a question-answer conversion model, and converting the knowledge data based on preset question-answer templates to obtain a liver cancer instruction question-answer data set;
S106, constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method;
S108, generating a reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating a corresponding reward function;
and S110, performing preference optimization on the large language model according to the reward model.
According to the embodiment of the invention, acquiring the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, inpatient service manual data of a target medical institution, liver cancer imaging report data, and popular-science liver cancer knowledge data;
performing entity, relation and attribute extraction on the inpatient service manual data to form an inpatient service knowledge graph;
extracting liver cancer imaging description information and imaging diagnosis information from the liver cancer imaging report data;
obtaining a staging result through manual labeling according to the liver cancer imaging description information and the imaging diagnosis information;
wherein the liver cancer knowledge data set comprises the inpatient service knowledge graph, the liver cancer imaging description information, the imaging diagnosis information, the staging results, and the popular-science liver cancer knowledge data.
The manual labeling is performed by professional physicians, who annotate the staging results.
Fig. 2 shows a flowchart of the liver cancer knowledge data set acquisition of the present invention.
According to the embodiment of the invention, importing the liver cancer knowledge data set into the question-answer conversion model and converting the knowledge data based on preset question-answer templates to obtain the liver cancer instruction question-answer data set specifically comprises:
S202, integrating the inpatient service knowledge graph and the popular-science liver cancer knowledge data into first training data based on a large language model;
S204, importing the first training data into a ChatGPT-based question-answer conversion model, performing question-answer simulation according to a first preset question-answer prompt template, and generating first question-answer data;
S206, integrating the liver cancer imaging description information, the imaging diagnosis information and the staging results into second training data;
S208, importing the second training data into the question-answer conversion model, performing question-answer simulation based on a second preset question-answer prompt template, and generating second question-answer data;
and S210, integrating the first question-answer data and the second question-answer data to form the liver cancer instruction question-answer data set.
It should be noted that the first preset question-answer prompt template is used for the inpatient service knowledge graph and the popular-science liver cancer knowledge data, that is, it is applied to the first training data. Its content is as follows:
Single-turn question-answering prompt template:
Based on the following text, generate question-answer data between a patient and a doctor. Generate as many groups as possible, without repetition. The doctor gives the patient very patient and comprehensive answers in a gentle, caring tone, and responds to the patient's inquiries in a detailed and helpful way: {data}
Multi-turn dialogue prompt template:
Based on the text below, generate a dialogue in which a patient consults a doctor; the dialogue must be multi-turn. The doctor gives the patient very patient and comprehensive answers in a gentle, caring tone, and responds to the patient's inquiries in a detailed and helpful way: {data}
Here {data} represents the training data inserted into the template.
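Filling the first preset templates is plain string substitution. The sketch below assumes each text segment drawn from the first training data is inserted at the {data} placeholder; the constant names and `build_prompts`/`build_all_prompts` are hypothetical, and the subsequent ChatGPT request is deliberately left out rather than guessing at a client API.

```python
SINGLE_TURN_TEMPLATE = (
    "Based on the following text, generate question-answer data between a patient and a "
    "doctor. Generate as many groups as possible, without repetition. The doctor gives the "
    "patient very patient and comprehensive answers in a gentle, caring tone, and responds "
    "to the patient's inquiries in a detailed and helpful way: {data}"
)

MULTI_TURN_TEMPLATE = (
    "Based on the text below, generate a dialogue in which a patient consults a doctor; the "
    "dialogue must be multi-turn. The doctor gives the patient very patient and comprehensive "
    "answers in a gentle, caring tone, and responds to the patient's inquiries in a detailed "
    "and helpful way: {data}"
)


def build_prompts(texts, template):
    """Insert each text segment at {data}; braces inside the segments are not re-parsed."""
    return [template.format(data=text) for text in texts]


def build_all_prompts(texts):
    """Pair every segment with both the single-turn and multi-turn templates."""
    return build_prompts(texts, SINGLE_TURN_TEMPLATE) + build_prompts(texts, MULTI_TURN_TEMPLATE)
```

Using `str.format` with a keyword argument means only the template is parsed for placeholders, so source text that happens to contain braces is inserted verbatim.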
In addition, the second preset question-answer prompt template, used for the liver cancer imaging description information, imaging diagnosis information and staging results (the second training data), is as follows:
You are an experienced liver cancer doctor. Based on the liver cancer staging criteria and the patient's liver cancer imaging description, analyze which stage the patient's liver cancer is. The information you can extract from the liver cancer imaging description is: {info}; in your analysis, present this information as something you obtained by analyzing the liver cancer images. The values of two indicators, PS and liver function Child-Pugh, are known: the PS score is {fraction} and liver function Child-Pugh is {grade}, and the final staging result must be: {stage}. The analysis should be detailed and phrased the way a doctor speaks, in a gentle and caring tone, and the person you are talking to is {patient or other doctor}.
The staging criteria for liver cancer are:
CNLC stage Ia: performance status (PS) score 0-2, liver function Child-Pugh grade A or B, a single tumor with a diameter of 5 cm or less, and no vascular invasion or extrahepatic metastasis;
CNLC stage Ib: PS score 0-2, Child-Pugh grade A or B, a single tumor with a diameter greater than 5 cm, or 2-3 tumors with a maximum diameter of 3 cm or less, and no vascular invasion or extrahepatic metastasis;
CNLC stage IIa: PS score 0-2, Child-Pugh grade A or B, 2-3 tumors with a maximum diameter greater than 3 cm, and no vascular invasion or extrahepatic metastasis;
CNLC stage IIb: PS score 0-2, Child-Pugh grade A or B, 4 or more tumors of any diameter, and no vascular invasion or extrahepatic metastasis;
CNLC stage IIIa: PS score 0-2, Child-Pugh grade A or B, any tumor status, with vascular invasion but no extrahepatic metastasis;
CNLC stage IIIb: PS score 0-2, Child-Pugh grade A or B, any tumor status and any vascular invasion status, with extrahepatic metastasis;
CNLC stage IV: PS score 3-4, or Child-Pugh grade C, regardless of tumor status, vascular invasion or extrahepatic metastasis.
The liver cancer image of the patient is described as follows: { liver cancer image description }
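The CNLC criteria listed in the template above are deterministic enough to encode as rules, which could, for example, be used to sanity-check that a staging label supplied as {stage} is consistent with the PS score, Child-Pugh grade and tumor findings. This is a hypothetical helper, not part of the described method (which relies on manual labeling by physicians); it assumes tumor count, maximum diameter in centimeters, and the two invasion flags have already been extracted from the imaging description.

```python
def cnlc_stage(ps, child_pugh, n_tumors, max_diameter_cm,
               vascular_invasion, extrahepatic_metastasis):
    """Return the CNLC stage implied by the criteria in the second prompt template."""
    if not 0 <= ps <= 4:
        raise ValueError("PS score must be between 0 and 4")
    if child_pugh not in ("A", "B", "C"):
        raise ValueError("Child-Pugh grade must be 'A', 'B' or 'C'")
    if n_tumors < 1:
        raise ValueError("at least one tumor is expected")

    # Stage IV: PS 3-4 or Child-Pugh C, regardless of tumor findings.
    if ps >= 3 or child_pugh == "C":
        return "IV"
    # From here on PS is 0-2 and Child-Pugh is A or B.
    if extrahepatic_metastasis:
        return "IIIb"  # any tumor status, any vascular invasion status
    if vascular_invasion:
        return "IIIa"  # any tumor status, no extrahepatic metastasis
    if n_tumors == 1:
        return "Ia" if max_diameter_cm <= 5 else "Ib"
    if n_tumors <= 3:
        return "Ib" if max_diameter_cm <= 3 else "IIa"
    return "IIb"  # 4 or more tumors of any diameter
```

The checks run from the most severe stage downward, so each later branch can rely on the conditions already excluded above it.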
These prompt templates are combined with the collected data to query ChatGPT and generate high-quality question-answer data. After processing, the generated question-answer data are integrated into the liver cancer instruction question-answer data set, which is used to fine-tune the large language model ChatGLM. The data format is as follows:
[
{
"instruction": "Good, I see. Then what is the procedure for arranging meals and reporting to the ward?",
"input": "",
"output": "For meals, you only need to go to the dietary department on the 1st floor of Building 5. After you arrive at the ward, just give the nurse your name and hospitalization number, and you can then be settled in. If you have any questions about precautions during your hospitalization, our medical staff will answer them patiently.",
"history": [
["Hello doctor, I have been diagnosed and need to be hospitalized. How do I go through admission?", "Hello. According to our hospital's procedures, first obtain the admission notice from your doctor, then go to the inpatient cashier on the 1st floor of Building 5 to fill out the relevant forms and pay the hospitalization fee. Next, go to the dietary department on the 1st floor of Building 5 to arrange your meals, and finally report directly to the ward nurse station."],
["OK. Then where do I need to go to find the doctor?", "You can go directly to the outpatient hall and ask the staff where the doctor is. Before that, however, you need the admission notice issued by the hospital. Have you obtained this notice?"],
["Yes, I have received the admission notice.", "In that case, you can go directly to the inpatient cashier on the 1st floor of Building 5 to fill out the relevant forms and pay the hospitalization fee. If you have any questions about hospitalization costs, our cashier staff can provide you with a detailed statement of charges."]
]
},
{
"construction" is known that the score of PS is 1, liver function Child-Pugh is B-class, and the liver cancer of the patient is analyzed to belong to that stage according to the following liver cancer image description,
input is that the two lungs are seen as multiple capsules without lung texture transparent shadow. The inner segment of the middle lobe of the right lung and the lower tongue segment of the upper lobe of the left lung are marked by a dense streak. The two lungs are seen as multiple solid nodule shadows with a diameter of about 2-3mm. The basal segment behind the lower lobe of the double lung is seen as flocculent blurred shadow, yu Shuangce lung fields are increased, disturbed and blurred, and the distribution of the double lung bronchus vascular bundles is natural. The trachea and bronchi are smooth, the tube wall is smooth, and the stenosis, the dilatation or the pressure change are not seen. The bilateral pulmonary portal is not enlarged, the mediastinum is not occupied with lesions, and the lymphadenopathy is not enlarged. The heart is normal in size and morphology, and calcification spots are seen in the aorta and the coronary arteries. The large blood vessels are reinforced uniformly, and filling defects are not found. The chest has normal morphology and no abnormal change of chest wall. The bilateral armpits were not seen with enlarged lymph nodes. The rib and thoracic vertebrae are not clearly damaged by bone. A small amount of effusion was seen in the bilateral chest. Liver is normal in morphology and size, moderate in liver-lobe proportion and smooth in edge. The liver S7 is similar to a circular low-density range, the diameter is about 35mm, the non-uniform reinforcement is visible in the arterial period after reinforcement, the edge reinforcement is obvious, the slightly low-density range is visible in the center, the venous period reinforcement is slightly higher than the surrounding liver tissue, no abnormal density shadow is visible in the residual liver parenchyma, and the abnormal reinforcement range is not visible in the dual-period scanning liver after reinforcement. The portal trunk and branches develop normally. 
The intrahepatic and extrahepatic biliary tract is not dilated, the gallbladder is not large, nodular high density shadow is seen in the gallbladder, and the size is about 14mm multiplied by 12mm. The pancreas is normal in shape and size, the head of the pancreas is in punctiform calcification, and the pancreatic duct is not expanded. Spleen size, morphology and density were not abnormal, and collateral vessels were not seen in the spleen gate and the stomach periphery. Small punctiform compact shadows are seen in the left renal calyx. The double kidneys are normal in size, morphology and density, and no abnormal reinforcement is seen after reinforcement. No abnormality was seen in the peritoneum, and no enlargement of the lymph nodes was seen in the mesenteric area and after the peritoneum. The inner side of the duodenal descending segment is seen to contain an air sac shadow with a diameter of about 8 mm. The layers seen rise the wall of the colon-structural hepatic curved tube thicken. ",
The "output" esteem patient, i have carefully analyzed your situation from the liver cancer image description you provide. First, I want to emphasize that your liver cancer image description shows no extrahepatic metastasis, no vascular invasion, 1 tumor, tumor diameter less than 5cm. This information is important for us to judge your liver cancer stage.
Now let us analyze your liver cancer stage according to CNLC stage criteria. Your physical activity status (PS) score is 1, liver function Child-Pugh is grade B. In combination with your liver cancer image description, we can draw the following conclusions:
no extrahepatic transfer: meets the requirements of the Ia phase, the Ib phase, the IIa phase, the IIb phase and the IIIa phase of CNLC;
no vascular invasion: meets the requirements of the Ia phase, the Ib phase, the IIa phase and the IIb phase of CNLC;
1 tumor: meets the requirements of the CNLC phase Ia and the Ib;
tumor diameter is less than 5cm: meets the requirements of CNLC Ia phase.
By combining the information, the liver cancer stage is CNLC Ia stage. This means that your liver cancer is at an earlier stage and the prognosis of treatment may be better. Please feel confident, we will provide you with the most appropriate treatment plan to help you recover as soon as possible. If you have any questions or concerns, please consult me at any time, i struggle to answer you. ",
"history":[]
}
]
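Each record above pairs the current turn (instruction plus optional input) with earlier turns in "history". The following Python sketch shows one way such a record might be checked and flattened into a prompt/target pair for supervised fine-tuning. It is an assumption for illustration: the function names and the "[Round i] / Q: / A:" markers are hypothetical, and the exact prompt format expected by ChatGLM-6B should be taken from that model's own code.

```python
from typing import Any, Dict, List, Tuple


def validate_record(rec: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not follow the instruction/input/output/history layout."""
    for key in ("instruction", "input", "output", "history"):
        if key not in rec:
            raise ValueError(f"missing key: {key}")
    for turn in rec["history"]:
        if not (isinstance(turn, list) and len(turn) == 2):
            raise ValueError("each history turn must be a [question, answer] pair")


def to_prompt_target(rec: Dict[str, Any]) -> Tuple[str, str]:
    """Flatten one record into (prompt, target) text; the round markers are hypothetical."""
    validate_record(rec)
    # The optional "input" (e.g. an imaging description) is appended to the instruction.
    query = rec["instruction"] + ("\n" + rec["input"] if rec["input"] else "")
    parts: List[str] = []
    for i, (question, answer) in enumerate(rec["history"]):
        parts.append(f"[Round {i}]\nQ: {question}\nA: {answer}\n")
    parts.append(f"[Round {len(rec['history'])}]\nQ: {query}\nA: ")
    # The model is trained to produce rec["output"] after the prompt.
    return "".join(parts), rec["output"]
```

Keeping "history" as explicit [question, answer] pairs lets the same function handle both the multi-round hospitalization dialogue and the single-round staging record (empty history).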
Based on these preset templates, the invention can obtain high-quality question-answer data.
According to an embodiment of the invention, constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprise:
constructing a large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers on the liver cancer instruction question-answer data set and updating their parameters;
loading the newly added network layers into the large language model.
In the invention, LoRA-based fine-tuning is chosen because it greatly reduces the number of parameters trained during fine-tuning and thus improves fine-tuning efficiency.
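The parameter saving comes from keeping each pre-trained weight matrix W frozen and learning only a low-rank update B·A. The NumPy sketch below illustrates the idea for a single linear layer; the class name, the rank r, the scaling alpha and the initialization are assumptions chosen for illustration, not the invention's configuration.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                        # pre-trained weights, kept frozen
        self.A = rng.normal(0.0, 0.01, (r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # Fold the learned update into W so inference needs no extra layer.
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size
```

For a 4096 × 4096 projection with r = 8, full fine-tuning would update 16,777,216 values, whereas the LoRA matrices hold 2 × 8 × 4096 = 65,536, roughly 0.4% of that.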
According to an embodiment of the invention, generating a reward model based on the fine-tuned large language model, training the reward model on a comparison data set, and generating a corresponding reward function specifically comprise:
training the reward model on a preset comparison data set, wherein the loss function of the reward model is:
$$\mathrm{loss}(\theta) = -\mathbb{E}_{(x, y_w, y_l)\sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
where $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y; $y_w$ is the output preferred over $y_l$; D is the comparison data set; σ is the sigmoid function; and loss(θ) is the loss value;
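Given reward-model scores for the preferred and the rejected output of each comparison pair, the loss above reduces to a mean over pairs. The NumPy sketch below is illustrative and assumes the scores $r_\theta(x, y_w)$ and $r_\theta(x, y_l)$ have already been computed; the function names are hypothetical.

```python
import numpy as np


def log_sigmoid(z: np.ndarray) -> np.ndarray:
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def reward_model_loss(r_preferred: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise loss: -mean(log sigmoid(r_theta(x, y_w) - r_theta(x, y_l))) over the batch."""
    return float(-np.mean(log_sigmoid(r_preferred - r_rejected)))
```

When the two scores are equal the loss is log 2 ≈ 0.693; it falls toward 0 as the preferred output is scored increasingly higher than the rejected one, which is what pushes the reward model to rank outputs the way the comparison data do.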
performing reinforcement training on the large language model based on a preset reward function, wherein the reward function is:
$$\mathrm{objective}(\phi) = \mathbb{E}_{(x, y)\sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right]$$
where $\beta \log\left(\pi_\phi^{RL}(y \mid x) / \pi^{SFT}(y \mid x)\right)$ is the KL penalty term; $\pi_\phi^{RL}$ is the learned RL policy model; $\pi^{SFT}$ is the LoRA fine-tuned large language model; $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y, trained on the comparison data set D; objective is the reward function; φ denotes the parameters being optimized; $D_{\pi_\phi^{RL}}$ is the reinforcement learning data set; β is a preset correction coefficient; and $\mathbb{E}$ denotes the expectation.
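For a single sampled response y, the quantity inside the expectation is the reward-model score minus β times the log-probability ratio between the RL policy and the LoRA fine-tuned model. The sketch below assumes per-token log-probabilities of y under both models are available; the function names and the default β = 0.02 are hypothetical, not values from the invention.

```python
import numpy as np


def kl_penalized_reward(rm_score: float, logp_rl: np.ndarray, logp_sft: np.ndarray,
                        beta: float = 0.02) -> float:
    """r_theta(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x)) for one sampled response y.

    logp_rl / logp_sft hold per-token log-probabilities of y under the RL policy and the
    LoRA fine-tuned model; their sums are the sequence log-probabilities.
    """
    log_ratio = float(np.sum(logp_rl) - np.sum(logp_sft))
    return rm_score - beta * log_ratio


def objective_estimate(samples: list, beta: float = 0.02) -> float:
    """Monte Carlo estimate of the objective: mean penalized reward over sampled (score, logp_rl, logp_sft)."""
    return float(np.mean([kl_penalized_reward(s, a, b, beta) for s, a, b in samples]))
```

The penalty is zero when the policy assigns y the same probability as the fine-tuned model and grows as the policy drifts away from it, which keeps reinforcement training from over-exploiting the reward model.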
The large language model used in the invention is ChatGLM-6B, an open bilingual (Chinese-English) language model based on the General Language Model (GLM) framework with 6.2 billion parameters. ChatGLM-6B is optimized for Chinese question answering and dialogue, using techniques similar to those of ChatGPT. It is trained on about 1 trillion tokens of Chinese and English text and is at present one of the best-performing open-source large language models for Chinese. In addition, the invention uses Low-Rank Adaptation (LoRA) to fine-tune the large language model.
It should be noted that the preset comparison data are generated by a large language model (e.g., ChatGPT or ChatGLM-6B) imitating human preferences. By tuning the large language model with the reward model and the reward function, the invention can improve question-answering ability in the corresponding domain.
FIG. 3 shows a flow chart of large language model optimization according to the invention. According to an embodiment of the invention, performing preference optimization on the large language model according to the reward model specifically comprises:
S302, acquiring a preset test data set;
S304, performing question-answer tests on the large language model based on the preset test data set, analyzing the output of each test and calculating its reward score based on the reward model and the reward function, and optimizing the parameters of the large language model and the reward model based on the calculation results;
S306, iteratively optimizing the large language model and the reward model based on the preset test data set until a preset number of iterations is reached.
It should be noted that the preset test data set consists of question-answer data selected by users and can be used for user preference optimization.
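The loop structure of S302 to S306 can be illustrated with a toy model. In the sketch below, which is an assumption and not the invention's training procedure, the "large language model" is replaced by a softmax policy over a few candidate answers per test question, a fixed score matrix stands in for the reward model and reward function, and the reward model is held fixed rather than updated. Each iteration takes an exact gradient-ascent step on expected reward, and the loop stops after a preset number of iterations.

```python
import numpy as np


def softmax_rows(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def optimize_toy_policy(rewards: np.ndarray, num_iterations: int = 50, lr: float = 1.0) -> np.ndarray:
    """Expected-reward gradient ascent on a softmax policy over candidate answers.

    rewards[i, j]: reward score of candidate answer j to test question i (stand-in for
    the reward model plus reward function). Returns the final answer probabilities.
    """
    logits = np.zeros_like(rewards, dtype=float)
    for _ in range(num_iterations):            # S306: stop at the preset iteration count
        probs = softmax_rows(logits)            # S304: "answer" every test question
        expected = (probs * rewards).sum(axis=1, keepdims=True)
        # For a softmax policy, d E[r] / d logit_j = p_j * (r_j - E[r]).
        logits += lr * probs * (rewards - expected)
    return softmax_rows(logits)
```

In a real system the policy update would be a reinforcement-learning step (for example PPO) on the model parameters, but the control flow of testing, scoring and updating until the iteration budget is exhausted is the same.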
Through this fine-tuning, a dialogue model suited to liver cancer question answering can be obtained, which is well suited to medical application scenarios. Furthermore, the model construction and training method can be applied to question-answering scenarios for other diseases: it is practical and easy to migrate, since only the corresponding data and preset templates need to be changed, and it therefore has high application value.
According to an embodiment of the invention, the method further comprises:
acquiring newly added liver cancer knowledge data from the medical data platform within a preset time period;
acquiring a data set of actual patient questions and answers within the preset time period, and performing semantic analysis and entity word extraction on it to obtain entity vocabulary data;
counting the occurrence frequency of each entity word in the entity vocabulary data, and dividing the entity vocabulary data into high-frequency knowledge entities and low-frequency knowledge entities based on a preset frequency;
classifying the newly added liver cancer knowledge data based on the high-frequency and low-frequency knowledge entities to form high-value knowledge data and low-value knowledge data;
generating high-frequency training data and low-frequency training data from the high-value knowledge data and the low-value knowledge data, respectively;
extracting and integrating data from the high-frequency training data and the low-frequency training data according to a preset ratio to form initial training data;
generating a second reward model based on the initial training data and the large language model, and performing question-answer training and optimization on the large language model based on the second reward model and the initial training data.
It should be noted that, as healthcare and information technology become increasingly integrated, the volume of liver cancer knowledge data keeps growing. In the invention, new knowledge acquired within a certain period is accurately extracted according to a preset ratio and the above technical means to form more targeted training data, thereby reducing redundancy in the question-answer data and further improving question-answer quality within the domain.
The preset ratio is generally 8:2, 7:3, or the like, with the high-frequency training data taking the larger share.
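The frequency split and ratio-based mixing can be sketched in a few lines of Python. Everything here is illustrative: the function names are hypothetical, entity extraction is assumed to have been done already, "high-value" knowledge is approximated as any text that mentions a high-frequency entity by substring match, and the mixing samples without replacement at an 8:2 share by default.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def split_entities(entity_mentions: Iterable[str], min_freq: int) -> Tuple[Set[str], Set[str]]:
    """Split extracted entity words into high- and low-frequency sets by a preset frequency."""
    counts = Counter(entity_mentions)
    high = {e for e, c in counts.items() if c >= min_freq}
    return high, set(counts) - high


def classify_knowledge(docs: Sequence[str], high: Set[str]) -> Tuple[List[str], List[str]]:
    """High-value if the text mentions any high-frequency entity (assumed substring match)."""
    high_value = [d for d in docs if any(e in d for e in high)]
    low_value = [d for d in docs if not any(e in d for e in high)]
    return high_value, low_value


def mix_by_ratio(high_data: Sequence[str], low_data: Sequence[str], total: int,
                 high_share: float = 0.8, seed: int = 0) -> List[str]:
    """Draw `total` items, about high_share of them from the high-frequency training data."""
    rng = random.Random(seed)
    n_high = min(round(total * high_share), len(high_data))
    n_low = min(total - n_high, len(low_data))
    mixed = rng.sample(list(high_data), n_high) + rng.sample(list(low_data), n_low)
    rng.shuffle(mixed)
    return mixed
```

For example, with mentions `["AFP", "AFP", "TACE", "AFP", "TACE", "biopsy"]` and a preset frequency of 2, "AFP" and "TACE" are high-frequency entities and "biopsy" is low-frequency; mixing 10 items at `high_share=0.8` yields 8 high-frequency and 2 low-frequency items.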
FIG. 4 shows a block diagram of the liver cancer auxiliary diagnosis and question-answering system based on a large language model according to the invention.
The second aspect of the invention further provides a liver cancer auxiliary diagnosis and question-answering system 4 based on a large language model, comprising a memory 41 and a processor 42, wherein the memory stores a liver cancer auxiliary diagnosis and question-answering program based on a large language model which, when executed by the processor, implements the following steps:
collecting a liver cancer knowledge data set through a medical data platform;
importing the liver cancer knowledge data set into a question-answer conversion model, and converting the knowledge data based on preset question-answer templates to obtain a liver cancer instruction question-answer data set;
constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method;
generating a reward model based on the fine-tuned large language model, training the reward model on a comparison data set, and generating a corresponding reward function;
performing preference optimization on the large language model according to the reward model.
According to an embodiment of the invention, collecting the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, the hospitalization service manual data of a target medical institution, liver cancer imaging report data, and liver cancer popular-science data;
extracting entities, relations and attributes from the hospitalization service manual data to form a hospitalization service knowledge graph;
extracting liver cancer image description information and image diagnosis information from the liver cancer imaging report data;
obtaining staging results through manual labeling according to the liver cancer image description information and the image diagnosis information;
the liver cancer knowledge data set comprises the hospitalization service knowledge graph, the liver cancer image description information, the image diagnosis information, the staging results, and the liver cancer popular-science data.
The manual labeling is specifically a process in which specialist physicians manually label the staging results.
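The hospitalization service knowledge graph is built from extracted (entity, relation, attribute or entity) triples. The sketch below assumes extraction has already produced such triples and shows one simple way to index them by head entity and render an entity's facts as plain text, for instance as input text for a question-answer prompt template. The data structure, function names and example triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity or attribute value)
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: List[Triple]) -> Graph:
    """Index triples by head entity: head -> [(relation, tail), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def describe(graph: Graph, entity: str) -> str:
    """Render all facts about one entity as plain sentences (empty string if unknown)."""
    return " ".join(f"{entity} {relation} {tail}." for relation, tail in graph.get(entity, []))
```

Usage, with triples modelled on the sample dialogue: `build_graph([("Dietary department", "is located on", "the 1st floor of Building 5")])` followed by `describe(graph, "Dietary department")` gives "Dietary department is located on the 1st floor of Building 5."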
According to an embodiment of the invention, importing the liver cancer knowledge data set into the question-answer conversion model and converting the knowledge data based on preset question-answer templates to obtain the liver cancer instruction question-answer data set specifically comprises:
integrating the hospitalization service knowledge graph and the liver cancer popular-science data into first training data based on a large language model;
importing the first training data into a ChatGPT-based question-answer conversion model, and performing question-answer simulation according to a first preset question-answer prompt template to generate first question-answer data;
integrating the liver cancer image description information, the image diagnosis information and the staging results to form second training data;
importing the second training data into the question-answer conversion model, and performing question-answer simulation according to a second preset question-answer prompt template to generate second question-answer data;
integrating the first question-answer data and the second question-answer data to form the liver cancer instruction question-answer data set.
It should be noted that the first preset question-answer prompt template is the template used for the hospitalization knowledge graph and the liver cancer popular-science data, i.e., it is applied to the first training data. Its content is as follows:
Single-round question-answer prompt template:
Based on the following text, generate question-answer data between a patient and a doctor. Generate as many groups as possible, without repetition. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, and responds to the patient's inquiries in a more detailed and helpful way: {data}
Multi-round dialogue prompt template:
Based on the text below, generate a dialogue in which a patient consults a doctor; the dialogue must have multiple rounds. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, and responds to the patient's inquiries in a more detailed and helpful way: {data}
where {data} denotes the training data to be converted.
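Filling a template amounts to substituting a piece of the first training data for {data}. The sketch below is an assumption for illustration: it splits long knowledge text on paragraph boundaries into chunks of at most a hypothetical character limit (a single paragraph longer than the limit is kept whole) and produces one prompt per chunk; sending the prompts to ChatGPT is not shown. It uses `str.replace` rather than `str.format` because other templates in this section contain literal braces such as {patient or other doctor}.

```python
from typing import List

SINGLE_ROUND_TEMPLATE = (
    "Based on the following text, generate question-answer data between a patient and a doctor. "
    "Generate as many groups as possible, without repetition. The doctor gives the patient very "
    "patient and comprehensive answers in a gentle and caring tone, and responds to the patient's "
    "inquiries in a more detailed and helpful way: {data}"
)


def chunk_text(text: str, max_chars: int = 1500) -> List[str]:
    """Group paragraphs into chunks of at most max_chars; longer paragraphs stay whole."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n"):
        if current and len(current) + 1 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_prompts(template: str, text: str, max_chars: int = 1500) -> List[str]:
    # One prompt per chunk; only the literal "{data}" placeholder is substituted.
    return [template.replace("{data}", chunk) for chunk in chunk_text(text, max_chars)]
```

Chunking keeps each request within a manageable length while preserving paragraph boundaries, so that each generated question-answer group stays grounded in one coherent piece of the source text.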
In addition, the second preset question-answer prompt template, used for the liver cancer image description information, the image diagnosis information and the staging results (the second training data), is as follows:
You are an experienced liver cancer physician. According to the liver cancer staging criteria and the patient's liver cancer imaging description, analyze which stage the patient's liver cancer is. The information you can extract from the imaging description is: {info}; in your analysis, present this information as something you obtained by analyzing the liver cancer imaging description. The values of two indicators, PS and liver function Child-Pugh, are known: the PS score is {fraction} and the liver function Child-Pugh grade is {grade}, and the final staging result must be: {stage}. The analysis should be detailed, you should speak like a doctor in a gentle and caring tone, and the person you are talking to is {patient or other doctor}.
The staging criteria for liver cancer are:
CNLC stage Ia: performance status (PS) score of 0 to 2, liver function Child-Pugh grade A or B, a single tumor with a diameter of ≤ 5 cm, without vascular invasion or extrahepatic metastasis;
CNLC stage Ib: PS score of 0 to 2, liver function Child-Pugh grade A or B, a single tumor with a diameter of > 5 cm, or 2 to 3 tumors with a maximum diameter of ≤ 3 cm, without vascular invasion or extrahepatic metastasis;
CNLC stage IIa: PS score of 0 to 2, liver function Child-Pugh grade A or B, 2 to 3 tumors with a maximum diameter of > 3 cm, without vascular invasion or extrahepatic metastasis;
CNLC stage IIb: PS score of 0 to 2, liver function Child-Pugh grade A or B, 4 or more tumors of any diameter, without vascular invasion or extrahepatic metastasis;
CNLC stage IIIa: PS score of 0 to 2, liver function Child-Pugh grade A or B, any tumor status, with vascular invasion but without extrahepatic metastasis;
CNLC stage IIIb: PS score of 0 to 2, liver function Child-Pugh grade A or B, with extrahepatic metastasis, regardless of tumor status and vascular invasion;
CNLC stage IV: PS score of 3 to 4, or liver function Child-Pugh grade C, regardless of tumor status, vascular invasion and extrahepatic metastasis.
The patient's liver cancer imaging description is as follows: {liver cancer image description}
These prompt templates are used, together with the collected data, to query ChatGPT and generate high-quality question-answer data. After processing, the generated question-answer data are integrated into a liver cancer instruction question-answer data set, which is used to fine-tune the large language model ChatGLM. The data format is as follows:
[
{
"instruction": "OK, I understand. What do I need to do for the meal arrangements and the ward check-in process?",
"input": "",
"output": "For the meal arrangements, you only need to go to the dietary department on the 1st floor of Building 5. After you arrive at the ward, just give your name and hospitalization number to the nurse, and you can then settle in. If you have any questions about precautions during your hospitalization, our medical staff will answer them patiently.",
"history": [
["Hello doctor, I have been told I need to be hospitalized. How do I go through admission?", "Hello. According to our hospital's procedures, first obtain the admission notice from your doctor, then go to the hospitalization payment window on the 1st floor of Building 5 to fill out the relevant forms and pay the hospitalization fee. Next, go to the dietary department on the 1st floor of Building 5 to arrange your meals, and finally report directly to the nurse station of your ward."],
["OK, where do I need to go to find the doctor?", "You can go directly to the outpatient hall and ask the staff where the doctor is. Before that, however, you need to have been issued an admission notice. Have you received this notice?"],
["Yes, I have received the admission notice.", "In that case, you can go directly to the hospitalization payment window on the 1st floor of Building 5 to fill out the relevant forms and pay the hospitalization fee. If you have any questions about the cost of hospitalization, our billing staff can provide you with a detailed statement of charges."]
]
},
{
"instruction": "Given that the PS score is 1 and the liver function Child-Pugh grade is B, analyze which stage the patient's liver cancer belongs to according to the following liver cancer imaging description.",
"input": "Multiple cystic lucent shadows devoid of lung markings are seen in both lungs. Dense streaky shadows are seen in the medial segment of the right middle lobe and the inferior lingular segment of the left upper lobe. Multiple solid nodular shadows about 2-3 mm in diameter are seen in both lungs. Flocculent hazy shadows are seen in the posterior basal segments of both lower lobes; the lung markings in the remaining lung fields on both sides are increased, disordered and blurred, and the bronchovascular bundles of both lungs are naturally distributed. The trachea and bronchi are patent with smooth walls, and no stenosis, dilatation or compression is seen. The bilateral pulmonary hila are not enlarged, no space-occupying lesion is seen in the mediastinum, and no enlarged lymph nodes are seen. The heart is normal in size and shape, and calcified plaques are seen in the aorta and coronary arteries. The great vessels enhance uniformly, and no filling defects are found. The thorax is normal in shape, with no abnormal changes of the chest wall. No enlarged lymph nodes are seen in either axilla. No definite bone destruction is seen in the ribs or thoracic vertebrae. A small amount of pleural effusion is seen on both sides. The liver is normal in shape and size, with proportionate lobes and smooth margins. A roughly circular low-density area about 35 mm in diameter is seen in liver segment S7; on contrast-enhanced scanning it shows heterogeneous enhancement in the arterial phase with obvious rim enhancement and a slightly lower-density area in the center, and in the venous phase its enhancement is slightly higher than that of the surrounding liver tissue. No abnormal density is seen in the remaining liver parenchyma, and no other abnormally enhancing area is seen in the liver on dual-phase contrast-enhanced scanning. The main portal vein and its branches are normally opacified. The intrahepatic and extrahepatic bile ducts are not dilated, the gallbladder is not enlarged, and a nodular high-density shadow measuring about 14 mm × 12 mm is seen in the gallbladder. The pancreas is normal in shape and size, with punctate calcification in the pancreatic head and no dilatation of the pancreatic duct. No abnormality is seen in the size, shape or density of the spleen, and no collateral vessels are seen at the splenic hilum or around the stomach. Small punctate dense shadows are seen in the left renal calyx. Both kidneys are normal in size, shape and density, with no abnormal enhancement after contrast. No abnormality is seen in the peritoneum, and no enlarged lymph nodes are seen in the mesentery or retroperitoneum. An air-containing sac-like shadow about 8 mm in diameter is seen on the medial side of the descending duodenum. In the scanned slices, the wall of the hepatic flexure of the ascending colon appears thickened.",
"output": "Dear patient, I have carefully analyzed your condition based on the liver cancer imaging description you provided. First, I want to emphasize that your imaging description shows no extrahepatic metastasis, no vascular invasion, 1 tumor, and a tumor diameter of less than 5 cm. This information is important for determining your liver cancer stage.\nNow let us analyze your liver cancer stage according to the CNLC staging criteria. Your performance status (PS) score is 1, and your liver function Child-Pugh grade is B. Combined with your imaging description, we can draw the following conclusions:\nNo extrahepatic metastasis: meets the requirements of CNLC stages Ia, Ib, IIa, IIb and IIIa;\nNo vascular invasion: meets the requirements of CNLC stages Ia, Ib, IIa and IIb;\n1 tumor: meets the requirements of CNLC stages Ia and Ib;\nTumor diameter less than 5 cm: meets the requirements of CNLC stage Ia.\nTaking all of this information together, your liver cancer is CNLC stage Ia. This means your liver cancer is at a relatively early stage, and the prognosis with treatment may be better. Please be confident: we will provide you with the most appropriate treatment plan to help you recover as soon as possible. If you have any questions or concerns, please consult me at any time, and I will do my best to answer you.",
"history": []
}
]
Based on these preset templates, the invention can obtain high-quality question-answer data.
According to an embodiment of the invention, constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprise:
constructing a large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers on the liver cancer instruction question-answer data set and updating their parameters;
loading the newly added network layers into the large language model.
In the invention, LoRA-based fine-tuning is chosen because it greatly reduces the number of parameters trained during fine-tuning and thus improves fine-tuning efficiency.
According to an embodiment of the invention, generating a reward model based on the fine-tuned large language model, training the reward model on a comparison data set, and generating a corresponding reward function specifically comprise:
training the reward model on a preset comparison data set, wherein the loss function of the reward model is:
$$\mathrm{loss}(\theta) = -\mathbb{E}_{(x, y_w, y_l)\sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
where $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y; $y_w$ is the output preferred over $y_l$; D is the comparison data set; σ is the sigmoid function; and loss(θ) is the loss value;
performing reinforcement training on the large language model based on a preset reward function, wherein the reward function is:
$$\mathrm{objective}(\phi) = \mathbb{E}_{(x, y)\sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right]$$
where $\beta \log\left(\pi_\phi^{RL}(y \mid x) / \pi^{SFT}(y \mid x)\right)$ is the KL penalty term; $\pi_\phi^{RL}$ is the learned RL policy model; $\pi^{SFT}$ is the LoRA fine-tuned large language model; $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y, trained on the comparison data set D; objective is the reward function; φ denotes the parameters being optimized; $D_{\pi_\phi^{RL}}$ is the reinforcement learning data set; β is a preset correction coefficient; and $\mathbb{E}$ denotes the expectation.
The large language model used in the invention is ChatGLM-6B, an open bilingual (Chinese-English) language model based on the General Language Model (GLM) framework with 6.2 billion parameters. ChatGLM-6B is optimized for Chinese question answering and dialogue, using techniques similar to those of ChatGPT. It is trained on about 1 trillion tokens of Chinese and English text and is at present one of the best-performing open-source large language models for Chinese. In addition, the invention uses Low-Rank Adaptation (LoRA) to fine-tune the large language model.
It should be noted that the preset comparison data are generated by a large language model (e.g., ChatGPT or ChatGLM-6B) imitating human preferences. By tuning the large language model with the reward model and the reward function, the invention can improve question-answering ability in the corresponding domain.
According to an embodiment of the invention, performing preference optimization on the large language model according to the reward model specifically comprises:
acquiring a preset test data set;
performing question-answer tests on the large language model based on the preset test data set, analyzing the output of each test and calculating its reward score based on the reward model and the reward function, and optimizing the parameters of the large language model and the reward model based on the calculation results;
iteratively optimizing the large language model and the reward model based on the preset test data set until a preset number of iterations is reached.
It should be noted that the preset test data set consists of question-answer data selected by users and can be used for user preference optimization.
The dialogue model suitable for liver cancer questions and answers can be obtained through the fine training of the model, the medical application scene is strong, and further, the model construction and training method can be applied to the questions and answers application scenes of other medical diseases, the practicability is strong, the scheme migration is simple, only the corresponding data and the preset templates need to be changed, and the application value is high.
According to an embodiment of the present invention, further comprising:
acquiring newly-added liver cancer knowledge data from a medical data platform within a preset time period;
acquiring an actual question-answer data set of patients within the preset time period, and performing semantic analysis and entity word extraction on the actual question-answer data set to obtain entity vocabulary data;
counting the occurrence frequency of each entity word in the entity vocabulary data, and dividing the entity vocabulary data into high-frequency knowledge entities and low-frequency knowledge entities based on a preset frequency;
classifying the newly added liver cancer knowledge data based on the high-frequency knowledge entities and the low-frequency knowledge entities to form high-value knowledge data and low-value knowledge data;
generating high-frequency training data and low-frequency training data based on the high-value knowledge data and the low-value knowledge data, respectively;
extracting and integrating data from the high-frequency training data and the low-frequency training data according to a preset ratio to form initial training data;
and generating a second reward model based on the initial training data and the large language model, and training and optimizing the large language model on question-answer data based on the second reward model and the initial training data.
It should be noted that, as medicine and information technology become increasingly integrated, the amount of liver cancer knowledge data keeps growing. In the invention, new knowledge acquired within a certain period is extracted in a targeted way according to a preset ratio and the above technical means, forming more targeted training data; this reduces redundancy in the question-answer data and further improves question-answer quality within the field.
The preset ratio is generally 8:2, 7:3, or the like, with the high-frequency training data accounting for the larger share.
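The frequency-based split and ratio mixing described above can be sketched in Python. The sketch assumes that entity words have already been extracted from patient questions, that a knowledge item counts as high-value when it mentions at least one high-frequency entity (the text does not define the classification rule), and that `high_share=0.8` corresponds to the 8:2 ratio; all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def split_entities(mentions: Iterable[str], min_freq: int) -> Tuple[Set[str], Set[str]]:
    """Split entity words into high- and low-frequency sets at a preset frequency."""
    counts = Counter(mentions)
    high = {w for w, c in counts.items() if c >= min_freq}
    low = {w for w, c in counts.items() if c < min_freq}
    return high, low


def classify_knowledge(items: Iterable[str], high_entities: Set[str]) -> Tuple[List[str], List[str]]:
    """Assumed rule: an item is high-value if it mentions any high-frequency entity.

    Plain substring matching is used here as a simplification.
    """
    high_value: List[str] = []
    low_value: List[str] = []
    for item in items:
        if any(entity in item for entity in high_entities):
            high_value.append(item)
        else:
            low_value.append(item)
    return high_value, low_value


def mix_by_ratio(high_data: Sequence, low_data: Sequence, total: int,
                 high_share: float = 0.8, seed: int = 0) -> list:
    """Sample `total` items, `high_share` of them from the high-frequency pool."""
    rng = random.Random(seed)
    n_high = min(round(total * high_share), len(high_data))
    n_low = min(total - n_high, len(low_data))
    mixed = rng.sample(list(high_data), n_high) + rng.sample(list(low_data), n_low)
    rng.shuffle(mixed)  # interleave the two pools
    return mixed
```

With an 8:2 ratio and `total=10`, the mixed set contains 8 high-frequency and 2 low-frequency samples, provided both pools are large enough.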
The third aspect of the present invention further provides a computer-readable storage medium comprising a liver cancer auxiliary diagnosis and question-answering program based on a large language model which, when executed by a processor, implements the steps of the liver cancer auxiliary diagnosis and question-answering method based on a large language model according to any one of the above.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Alternatively, if the above integrated units of the present invention are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing is merely illustrative of the present invention, and the protection scope of the present invention is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A liver cancer auxiliary diagnosis and question-answering method based on a large language model is characterized by comprising the following steps:
acquiring a liver cancer knowledge data set through a medical data platform;
importing the liver cancer knowledge data set into a question-answer conversion model, and converting knowledge data based on a preset question-answer template to obtain a liver cancer instruction question-answer data set;
constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on a LoRA method;
generating a reward model based on the fine-tuned large language model, training the reward model according to a comparison data set, and generating a corresponding reward function;
optimizing the preferences of the large language model according to the reward model;
wherein the method further comprises:
acquiring newly added liver cancer knowledge data from a medical data platform within a preset time period;
acquiring an actual question-answer data set of patients within the preset time period, and performing semantic analysis and entity word extraction on the actual question-answer data set to obtain entity vocabulary data;
counting the occurrence frequency of each entity word in the entity vocabulary data, and dividing the entity vocabulary data into high-frequency knowledge entities and low-frequency knowledge entities based on a preset frequency;
classifying the newly added liver cancer knowledge data based on the high-frequency knowledge entities and the low-frequency knowledge entities to form high-value knowledge data and low-value knowledge data;
generating high-frequency training data and low-frequency training data based on the high-value knowledge data and the low-value knowledge data, respectively;
extracting and integrating data from the high-frequency training data and the low-frequency training data according to a preset ratio to form initial training data;
generating a second reward model based on the initial training data and the large language model, and training and optimizing the large language model on question-answer data based on the second reward model and the initial training data;
wherein acquiring the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, hospitalization service manual data of a target medical institution, liver cancer image report data, and liver cancer popular-science data;
extracting entities, relations, and attributes from the hospitalization service manual data to form a hospitalization service knowledge graph;
extracting liver cancer image description information and image diagnosis information from the liver cancer image report data;
obtaining a staging result through manual labeling according to the liver cancer image description information and the image diagnosis information;
wherein the liver cancer knowledge data set comprises the hospitalization service knowledge graph, the liver cancer image description information, the image diagnosis information, the staging result, and the liver cancer popular-science data;
wherein importing the liver cancer knowledge data set into the question-answer conversion model and converting the knowledge data based on the preset question-answer template to obtain the liver cancer instruction question-answer data set specifically comprises:
integrating the hospitalization service knowledge graph and the liver cancer popular-science data into first training data based on a large language model;
importing the first training data into a ChatGPT-based question-answer conversion model, performing question-answer simulation according to a first preset question-answer prompt template, and generating first question-answer data;
integrating the liver cancer image description information, the image diagnosis information, and the staging result to form second training data;
importing the second training data into the question-answer conversion model, performing question-answer simulation based on a second preset question-answer prompt template, and generating second question-answer data;
integrating the first question-answer data and the second question-answer data to form the liver cancer instruction question-answer data set;
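As an illustration of how knowledge-graph content might be turned into plain-text passages before it is inserted into a prompt template, the following sketch groups (entity, relation, value) triples by entity. The triple representation, the function name, and the passage layout are assumptions; the text only states that entities, relations, and attributes are extracted and integrated into the first training data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def triples_to_passages(triples: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Render (entity, relation/attribute, value) triples as one passage per entity."""
    facts: Dict[str, List[str]] = defaultdict(list)
    for entity, relation, value in triples:
        facts[entity].append(f"{relation}: {value}")
    # Dicts keep insertion order, so entities appear in first-seen order.
    return [f"{entity}. " + "; ".join(items) + "." for entity, items in facts.items()]
```

Each resulting passage could then serve as the text that a question-answer prompt template is filled with.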
wherein constructing the large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprises:
constructing a large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers based on the liver cancer instruction question-answer data set and updating the corresponding parameters;
integrating the newly added network layers into the large language model;
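LoRA keeps a pre-trained weight matrix W frozen and learns only a low-rank update, so an adapted layer computes W x + (alpha / r) B A x, with B initialized to zero so that training starts from the unchanged model. The following pure-Python sketch illustrates that idea on a single small linear layer. The rank, alpha, and initialization values are assumptions (the text gives none), and a real ChatGLM-6B fine-tune would attach such adapters to selected weight matrices through a deep-learning framework rather than hand-written matrix code.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weights; never updated
        # A (rank x d_in) gets a small random init and B (d_out x rank) starts at
        # zero, so the adapted layer initially reproduces the frozen layer exactly.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self) -> int:
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a d_out x d_in layer, the trainable count is r(d_in + d_out) instead of d_in * d_out, which is the reason LoRA makes fine-tuning a 6B-parameter model tractable.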
wherein generating the reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating the corresponding reward function specifically comprises:
training a reward model based on a preset comparison data set, wherein the loss function of the reward model is:
$$\operatorname{loss}(\theta) = -\mathbb{E}_{(x,\, y_w,\, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
wherein $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y, where y is $y_w$ or $y_l$ and output $y_w$ is a better output than $y_l$; D is the comparison data set; σ is the sigmoid function used in neural networks; and loss(θ) is the loss function value;
performing reinforcement training on the large language model based on a preset reward function, wherein the reward function is expressed as:
$$\operatorname{objective}(\phi) = \mathbb{E}_{(x,\, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right]$$
wherein $\log\left(\pi_\phi^{RL}(y \mid x) / \pi^{SFT}(y \mid x)\right)$ is the KL penalty term; $\pi_\phi^{RL}$ is the RL policy model being learned; $\pi^{SFT}$ is the large language model after LoRA fine-tuning; $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y; D is the comparison data set; objective is the reward function; φ denotes the parameters being optimized; $D_{\pi_\phi^{RL}}$ is the reinforcement learning data set; β is a preset correction coefficient; and E denotes the expectation over which the reward is computed;
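The reward-model loss and the KL-penalized reward function described above can be computed numerically as sketched below, assuming they take the forms suggested by the symbol definitions: loss(θ) = -E[log σ(r_θ(x, y_w) - r_θ(x, y_l))] and objective(φ) = E[r_θ(x, y) - β log(π^RL(y|x) / π^SFT(y|x))]. The sketch assumes the reward scores and the sequence log-probabilities log π(y|x) have already been produced by the respective models (not shown), and it replaces each expectation E with a sample mean; the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def reward_model_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of -log sigmoid(r_w - r_l) over (r_theta(x, y_w), r_theta(x, y_l)) pairs."""
    pairs = list(pairs)
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


def kl_penalized_reward(reward: float, logp_rl: float, logp_sft: float, beta: float) -> float:
    """r_theta(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x)) for one sampled response."""
    return reward - beta * (logp_rl - logp_sft)


def rl_objective(samples: Iterable[Tuple[float, float, float]], beta: float) -> float:
    """Sample-mean estimate of the objective over (reward, logp_rl, logp_sft) triples."""
    samples = list(samples)
    return sum(kl_penalized_reward(r, lr, ls, beta) for r, lr, ls in samples) / len(samples)
```

When the preferred answer already scores higher (r_w > r_l) the loss approaches zero, and a policy that drifts away from the LoRA-tuned model (logp_rl > logp_sft on its own samples) has its reward reduced in proportion to β.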
wherein optimizing the preferences of the large language model according to the reward model specifically comprises:
acquiring a preset test data set;
performing a question-answer test on the large language model based on the preset test data set, performing output analysis and reward score calculation on each test answer based on the reward model and the reward function, and optimizing the parameters of the large language model and the reward model based on the calculation results;
iteratively optimizing the large language model and the reward model based on the preset test data set until a preset number of iterations is reached;
wherein the first preset question-answer prompt template is the template used for the hospitalization service knowledge graph and the liver cancer popular-science data, i.e., it is applied to the first training data, and its content is as follows:
Single-round question-answering prompt template:
Based on the following text, generate question-answer data between a patient and a doctor; generate as many groups as possible without repetition. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, responding to the patient's inquiries in a detailed and helpful manner: {data}
Multi-round dialogue prompt template:
Based on the following text, generate a dialogue in which a patient consults a doctor; the dialogue must have multiple rounds. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, responding to the patient's inquiries in a detailed and helpful manner: {data}
wherein {data} represents the training data filled into the template;
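Filling the {data} slot of these templates is simple string substitution; a sketch follows. The template strings are abbreviated English renderings, the function name is hypothetical, and the step of sending each prompt to ChatGPT is deliberately omitted because the text does not specify how the service is called.

```python
from typing import List, Sequence

SINGLE_ROUND_TEMPLATE = (
    "Based on the following text, generate question-answer data between a patient "
    "and a doctor; generate as many groups as possible without repetition. The doctor "
    "answers patiently and comprehensively in a gentle, caring tone: {data}"
)
MULTI_ROUND_TEMPLATE = (
    "Based on the following text, generate a multi-round dialogue in which a patient "
    "consults a doctor. The doctor answers patiently and comprehensively in a gentle, "
    "caring tone: {data}"
)


def build_prompts(passages: Sequence[str], template: str) -> List[str]:
    """Fill the {data} slot once per knowledge passage.

    str.replace is used instead of str.format so that any other braces in the
    template are left untouched rather than treated as format fields.
    """
    return [template.replace("{data}", passage) for passage in passages]
```

Choosing between the two templates per passage (for example, multi-round for procedural topics) is a design decision the text leaves open.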
in addition, the second preset question-answer prompt template, used for the liver cancer image description information, the image diagnosis information, and the staging result, is as follows:
You are an experienced liver cancer physician. Based on the liver cancer staging criteria and the patient's liver cancer image description, please analyze which stage the patient's liver cancer is in. The information you can extract from the liver cancer image description is: {information}, and in your analysis it should be presented as what you found by analyzing the liver cancer image description. The values of two indices, PS and liver function Child-Pugh, are known: the PS score is {fraction} and liver function Child-Pugh is {grade}, and the final staging result must be: {stage}. The analysis should be detailed, you should speak like a doctor with gentle and caring wording, and the person you are talking to may be {patient or other doctor};
The staging criteria for liver cancer are:
CNLC stage Ia: performance status (PS) score of 0-2, liver function Child-Pugh grade A or B, a single tumor ≤ 5 cm in diameter, no vascular invasion, and no extrahepatic metastasis;
CNLC stage Ib: PS score of 0-2, liver function Child-Pugh grade A or B, a single tumor > 5 cm in diameter, or 2-3 tumors with a maximum diameter ≤ 3 cm, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIa: PS score of 0-2, liver function Child-Pugh grade A or B, 2-3 tumors with a maximum diameter > 3 cm, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIb: PS score of 0-2, liver function Child-Pugh grade A or B, 4 or more tumors regardless of tumor diameter, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIIa: PS score of 0-2, liver function Child-Pugh grade A or B, any tumor status, with vascular invasion and without extrahepatic metastasis;
CNLC stage IIIb: PS score of 0-2, liver function Child-Pugh grade A or B, any tumor status, with or without vascular invasion, and with extrahepatic metastasis;
CNLC stage IV: PS score of 3-4, or liver function Child-Pugh grade C, regardless of tumor status, vascular invasion, or extrahepatic metastasis;
The liver cancer image description of the patient is: {liver cancer image description}
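In the text, the CNLC criteria are only supplied to ChatGPT as prompt text. The same rules can also be written as a deterministic function, which might be used, for instance, to cross-check the manually labeled {stage} value against the extracted findings; that use is a hypothetical suggestion. The sketch below encodes the rules exactly as listed above, is an illustration rather than a validated clinical implementation, and its field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LiverFindings:
    ps: int                        # performance status (PS) score, 0-4
    child_pugh: str                # liver function Child-Pugh grade: "A", "B" or "C"
    tumor_count: int               # number of intrahepatic tumors (>= 1)
    max_diameter_cm: float         # diameter of the largest tumor, in cm
    vascular_invasion: bool
    extrahepatic_metastasis: bool


def cnlc_stage(f: LiverFindings) -> str:
    """Return the CNLC stage implied by the criteria listed in the template."""
    if f.tumor_count < 1:
        raise ValueError("tumor_count must be at least 1")
    if f.ps >= 3 or f.child_pugh == "C":
        return "IV"                # regardless of tumor status, invasion, metastasis
    if f.extrahepatic_metastasis:
        return "IIIb"              # with or without vascular invasion
    if f.vascular_invasion:
        return "IIIa"
    if f.tumor_count >= 4:
        return "IIb"               # regardless of diameter
    if f.tumor_count == 1:
        return "Ia" if f.max_diameter_cm <= 5 else "Ib"
    # 2-3 tumors: Ib if the largest is <= 3 cm, otherwise IIa
    return "Ib" if f.max_diameter_cm <= 3 else "IIa"
```

For the sample record in the data format (PS 1, Child-Pugh grade B, one lesion of about 35 mm, no vascular invasion, no extrahepatic metastasis), `cnlc_stage(LiverFindings(1, "B", 1, 3.5, False, False))` returns `"Ia"`, matching the stage given in that record's output.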
These prompt templates, combined with the collected data, are used to query ChatGPT and generate high-quality question-answer data. After processing, the generated question-answer data are integrated into the liver cancer instruction question-answer data set, which is used to fine-tune the large language model ChatGLM. The data format is as follows:
[
{
"instruction": "OK, I see. Then what do I need to do for the meal procedure and the check-in process?",
"input": "",
"output": "For the meal procedure, you only need to go to the meal department on the 1st floor of Building 5. Once you reach the ward, just give the nurse your name and hospitalization number, and you can then be settled in. If you have any questions about precautions during your hospital stay, our medical staff will also answer them patiently one-on-one.",
"history":[
["Hello doctor, I have been told I need to be hospitalized. How do I go through admission?", "Hello. According to our hospital's regulations, you can obtain an admission notice from the doctor, then fill out the relevant form at the inpatient payment window on the 1st floor of Building 5 and pay the hospitalization deposit. Then go to the meal department on the 1st floor of Building 5 to complete the meal procedure, and finally check in directly at the ward nurse station."],
["OK, where do I need to go to find the doctor?", "You can go directly to the outpatient hall and ask the staff where the doctor is. Before that, though, you need to have had an admission notice issued. Have you obtained this notice?"],
["Yes, I have received the admission notice.", "That's great. You can go directly to the inpatient payment window on the 1st floor of Building 5 to fill out the relevant form and pay the hospitalization deposit. If you have any questions about hospitalization costs, the staff at the inpatient payment office can provide you with a detailed list of charges."]
]
},
{
"instruction": "Given that the PS score is 1 and liver function Child-Pugh is grade B, analyze which stage the patient's liver cancer belongs to according to the following liver cancer image description.",
"input": "Multiple cystic lucent shadows without lung markings are seen in both lungs; strip-like dense shadows are seen in the medial segment of the right middle lobe and the inferior lingular segment of the left upper lobe; multiple solid nodular shadows about 2-3 mm in diameter are seen in both lungs; flocculent, ill-defined shadows are seen in the basal segments of both lower lobes; the lung markings in the remaining bilateral lung fields are increased, disordered, and blurred, and the bronchovascular bundles of both lungs run naturally; the trachea and bronchi are patent with smooth walls, and no stenosis, dilatation, or compression is seen; the bilateral pulmonary hila are not enlarged, no mediastinal mass is seen, and there is no lymphadenopathy; the heart is normal in size and shape, and calcified plaques are seen in the aorta and coronary arteries; all great vessels enhance uniformly, with no filling defects; the thorax is normal in shape, with no abnormal change of the chest wall; no enlarged lymph nodes are seen in either axilla; no definite destruction of the ribs or thoracic vertebrae is seen; a small amount of effusion is seen in both pleural cavities; the liver is normal in shape and size, the lobes are in proportion, and the margins are smooth; a roughly round low-density lesion about 35 mm in diameter is seen in the liver, showing heterogeneous enhancement in the arterial phase after contrast, with obvious rim enhancement and a slightly lower-density area in the center, and enhancement slightly higher than the surrounding liver tissue in the venous phase; no abnormal density shadow is seen in the remaining liver parenchyma, and no abnormal enhancement is seen in the liver on dual-phase contrast-enhanced scanning; the main portal vein and its branches are normally opacified; the intrahepatic and extrahepatic bile ducts are not dilated; the gallbladder is not enlarged, and a nodular high-density shadow measuring about 14 mm × 12 mm is seen within it; the pancreas is normal in shape and size, with punctate calcification in the pancreatic head, and the pancreatic duct is not dilated; the spleen shows no abnormality in size, shape, or density, and no collateral vessels are seen at the splenic hilum or around the stomach; a small punctate dense shadow is seen in the left renal calyx; both kidneys are normal in size, shape, and density, with no abnormal enhancement after contrast; no abnormality is seen in the peritoneum, and no enlarged lymph nodes are seen in the mesenteric or retroperitoneal regions; an air-containing sac-like shadow about 8 mm in diameter is seen on the medial side of the descending duodenum; the wall of the hepatic flexure of the colon appears thickened in the scanned slices;",
"output": "Dear patient, I have carefully analyzed your situation based on the liver cancer image description you provided. First, I want to emphasize that your liver cancer image description shows no extrahepatic metastasis, no vascular invasion, 1 tumor, and a tumor diameter of less than 5 cm; this information is very important for determining your liver cancer stage.\nNow let us analyze your liver cancer stage according to the CNLC staging criteria. Your performance status (PS) score is 1 and your liver function Child-Pugh is grade B. Combining this with your liver cancer image description, we can draw the following conclusions:\nNo extrahepatic metastasis: meets the requirements of CNLC stages Ia, Ib, IIa, IIb, and IIIa;\nNo vascular invasion: meets the requirements of CNLC stages Ia, Ib, IIa, and IIb;\n1 tumor: meets the requirements of CNLC stages Ia and Ib;\nTumor diameter less than 5 cm: meets the requirements of CNLC stage Ia.\nCombining this information, your liver cancer is CNLC stage Ia. This means your liver cancer is at a relatively early stage, and the prognosis after treatment may be better. Please be confident: we can provide you with the most appropriate treatment plan and help you recover as soon as possible. If you have any questions or concerns, please ask me at any time, and I will do my best to answer.",
"history":[]
}
].
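The record format above (instruction, input, output, and a history of earlier [question, answer] turns) has to be flattened into prompt and target text before supervised fine-tuning. The sketch below shows one way to do that. The "[Round i] / Question: / Answer:" layout and the function names are assumptions for illustration; they are not claimed to be the exact format used by ChatGLM-6B's own fine-tuning scripts.

```python
import json
from typing import List, Tuple


def record_to_example(record: dict) -> Tuple[str, str]:
    """Flatten one instruction record into a (prompt, target) pair.

    Earlier turns from "history" are replayed as numbered rounds, and the current
    "instruction" (plus "input", if non-empty) becomes the final, unanswered round.
    """
    history = record.get("history", [])
    query = record["instruction"]
    if record.get("input"):
        query = f"{query}\n{record['input']}"
    parts = [f"[Round {i}]\nQuestion: {q}\nAnswer: {a}\n" for i, (q, a) in enumerate(history)]
    parts.append(f"[Round {len(history)}]\nQuestion: {query}\nAnswer: ")
    return "".join(parts), record["output"]


def load_examples(json_text: str) -> List[Tuple[str, str]]:
    """Parse a JSON array of records in the format above into training pairs."""
    return [record_to_example(r) for r in json.loads(json_text)]
```

During training, the loss would normally be computed only on the target tokens, so the replayed history conditions the answer without being learned as output.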
2. A liver cancer auxiliary diagnosis and question-answering system based on a large language model, characterized in that the system comprises a memory and a processor, wherein the memory stores a liver cancer auxiliary diagnosis and question-answering program based on a large language model, and the program, when executed by the processor, implements the following steps:
acquiring a liver cancer knowledge data set through a medical data platform;
importing the liver cancer knowledge data set into a question-answer conversion model, and converting knowledge data based on a preset question-answer template to obtain a liver cancer instruction question-answer data set;
constructing a large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on a LoRA method;
generating a reward model based on the fine-tuned large language model, training the reward model according to a comparison data set, and generating a corresponding reward function;
optimizing the preferences of the large language model according to the reward model;
wherein the steps further comprise:
acquiring newly-added liver cancer knowledge data from a medical data platform within a preset time period;
acquiring an actual question-answer data set of patients within the preset time period, and performing semantic analysis and entity word extraction on the actual question-answer data set to obtain entity vocabulary data;
counting the occurrence frequency of each entity word in the entity vocabulary data, and dividing the entity vocabulary data into high-frequency knowledge entities and low-frequency knowledge entities based on a preset frequency;
classifying the newly added liver cancer knowledge data based on the high-frequency knowledge entities and the low-frequency knowledge entities to form high-value knowledge data and low-value knowledge data;
generating high-frequency training data and low-frequency training data based on the high-value knowledge data and the low-value knowledge data, respectively;
extracting and integrating data from the high-frequency training data and the low-frequency training data according to a preset ratio to form initial training data;
generating a second reward model based on the initial training data and the large language model, and training and optimizing the large language model on question-answer data based on the second reward model and the initial training data;
wherein acquiring the liver cancer knowledge data set through the medical data platform specifically comprises:
acquiring, through the medical data platform, hospitalization service manual data of a target medical institution, liver cancer image report data, and liver cancer popular-science data;
extracting entities, relations, and attributes from the hospitalization service manual data to form a hospitalization service knowledge graph;
extracting liver cancer image description information and image diagnosis information from the liver cancer image report data;
obtaining a staging result through manual labeling according to the liver cancer image description information and the image diagnosis information;
wherein the liver cancer knowledge data set comprises the hospitalization service knowledge graph, the liver cancer image description information, the image diagnosis information, the staging result, and the liver cancer popular-science data;
wherein importing the liver cancer knowledge data set into the question-answer conversion model and converting the knowledge data based on the preset question-answer template to obtain the liver cancer instruction question-answer data set specifically comprises:
integrating the hospitalization service knowledge graph and the liver cancer popular-science data into first training data based on a large language model;
importing the first training data into a ChatGPT-based question-answer conversion model, performing question-answer simulation according to a first preset question-answer prompt template, and generating first question-answer data;
integrating the liver cancer image description information, the image diagnosis information, and the staging result to form second training data;
importing the second training data into the question-answer conversion model, performing question-answer simulation based on a second preset question-answer prompt template, and generating second question-answer data;
integrating the first question-answer data and the second question-answer data to form the liver cancer instruction question-answer data set;
wherein constructing the large language model, importing the liver cancer instruction question-answer data set into the large language model for training, and fine-tuning the large language model based on the LoRA method specifically comprises:
constructing a large language model based on ChatGLM-6B;
importing the liver cancer instruction question-answer data set into the large language model for pre-training;
freezing the pre-trained model weight parameters and generating newly added network layers in the LoRA manner;
training the newly added network layers based on the liver cancer instruction question-answer data set and updating the corresponding parameters;
integrating the newly added network layers into the large language model;
wherein generating the reward model based on the fine-tuned large language model, training the reward model according to the comparison data set, and generating the corresponding reward function specifically comprises:
training a reward model based on a preset comparison data set, wherein the loss function of the reward model is:
$$\operatorname{loss}(\theta) = -\mathbb{E}_{(x,\, y_w,\, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$
wherein $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y, where y is $y_w$ or $y_l$ and output $y_w$ is a better output than $y_l$; D is the comparison data set; σ is the sigmoid function used in neural networks; and loss(θ) is the loss function value;
performing reinforcement training on the large language model based on a preset reward function, wherein the reward function is expressed as:
$$\operatorname{objective}(\phi) = \mathbb{E}_{(x,\, y) \sim D_{\pi_\phi^{RL}}}\left[r_\theta(x, y) - \beta \log\frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right]$$
wherein $\log\left(\pi_\phi^{RL}(y \mid x) / \pi^{SFT}(y \mid x)\right)$ is the KL penalty term; $\pi_\phi^{RL}$ is the RL policy model being learned; $\pi^{SFT}$ is the large language model after LoRA fine-tuning; $r_\theta(x, y)$ is the scalar output of the reward model with parameters θ for prompt x and output y; D is the comparison data set; objective is the reward function; φ denotes the parameters being optimized; $D_{\pi_\phi^{RL}}$ is the reinforcement learning data set; β is a preset correction coefficient; and E denotes the expectation over which the reward is computed;
wherein optimizing the preferences of the large language model according to the reward model specifically comprises:
acquiring a preset test data set;
performing a question-answer test on the large language model based on the preset test data set, performing output analysis and reward score calculation on each test answer based on the reward model and the reward function, and optimizing the parameters of the large language model and the reward model based on the calculation results;
iteratively optimizing the large language model and the reward model based on the preset test data set until a preset number of iterations is reached;
wherein the first preset question-answer prompt template is the template used for the hospitalization service knowledge graph and the liver cancer popular-science data, i.e., it is applied to the first training data, and its content is as follows:
Single-round question-answering prompt template:
Based on the following text, generate question-answer data between a patient and a doctor; generate as many groups as possible without repetition. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, responding to the patient's inquiries in a detailed and helpful manner: {data}
Multi-round dialogue prompt template:
Based on the following text, generate a dialogue in which a patient consults a doctor; the dialogue must have multiple rounds. The doctor gives the patient very patient and comprehensive answers in a gentle and caring tone, responding to the patient's inquiries in a detailed and helpful manner: {data}
wherein {data} represents the training data filled into the template;
in addition, the second preset question-answer prompt template, used for the liver cancer image description information, the image diagnosis information, and the staging result, is as follows:
You are an experienced liver cancer physician. Based on the liver cancer staging criteria and the patient's liver cancer image description, please analyze which stage the patient's liver cancer is in. The information you can extract from the liver cancer image description is: {information}, and in your analysis it should be presented as what you found by analyzing the liver cancer image description. The values of two indices, PS and liver function Child-Pugh, are known: the PS score is {fraction} and liver function Child-Pugh is {grade}, and the final staging result must be: {stage}. The analysis should be detailed, you should speak like a doctor with gentle and caring wording, and the person you are talking to may be {patient or other doctor};
The staging criteria for liver cancer are:
CNLC stage Ia: performance status (PS) score of 0-2, liver function Child-Pugh grade A or B, a single tumor ≤ 5 cm in diameter, no vascular invasion, and no extrahepatic metastasis;
CNLC stage Ib: PS score of 0-2, liver function Child-Pugh grade A or B, a single tumor > 5 cm in diameter, or 2-3 tumors with a maximum diameter ≤ 3 cm, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIa: PS score of 0-2, liver function Child-Pugh grade A or B, 2-3 tumors with a maximum diameter > 3 cm, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIb: PS score of 0-2, liver function Child-Pugh grade A or B, 4 or more tumors regardless of tumor diameter, no vascular invasion, and no extrahepatic metastasis;
CNLC stage IIIa: PS score of 0-2, liver function Child-Pugh grade A or B, any tumor status, with vascular invasion and without extrahepatic metastasis;
CNLC stage IIIb: PS score of 0-2, liver function Child-Pugh grade A or B, any tumor status, with or without vascular invasion, and with extrahepatic metastasis;
CNLC stage IV: PS score of 3-4, or liver function Child-Pugh grade C, regardless of tumor status, vascular invasion, or extrahepatic metastasis;
The liver cancer image of the patient is described as follows: { liver cancer image description }
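The staging criteria above are deterministic rules, so the {stage} value that the template requires can be derived from structured findings. The sketch below is a hypothetical illustration, not the patent's implementation: the `Findings` fields, the function names and the abbreviated template (placeholders renamed to valid identifiers, criteria text omitted) are all assumptions. It checks the most advanced stage first, so each later branch only needs the conditions that distinguish it.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Structured inputs for CNLC staging (hypothetical field names)."""
    ps_score: int                  # performance status score, 0-4
    child_pugh: str                # liver function grade: "A", "B" or "C"
    tumor_count: int               # number of intrahepatic tumors
    max_diameter_cm: float         # diameter of the largest tumor, in cm
    vascular_invasion: bool
    extrahepatic_metastasis: bool


def cnlc_stage(f: Findings) -> str:
    """Apply the CNLC criteria, checking the most advanced stage first."""
    if not 0 <= f.ps_score <= 4:
        raise ValueError("ps_score must be between 0 and 4")
    if f.child_pugh not in ("A", "B", "C"):
        raise ValueError("child_pugh must be 'A', 'B' or 'C'")
    if f.tumor_count < 1:
        raise ValueError("tumor_count must be at least 1")

    # Stage IV: PS 3-4 or Child-Pugh C, whatever the tumor findings.
    if f.ps_score >= 3 or f.child_pugh == "C":
        return "IV"
    # Below this point PS is 0-2 and Child-Pugh is A or B.
    if f.extrahepatic_metastasis:
        return "IIIb"              # any tumor status, any vascular invasion
    if f.vascular_invasion:
        return "IIIa"              # vascular invasion, no extrahepatic spread
    if f.tumor_count == 1:
        return "Ia" if f.max_diameter_cm <= 5 else "Ib"
    if f.tumor_count <= 3:
        return "Ib" if f.max_diameter_cm <= 3 else "IIa"
    return "IIb"                   # 4 or more tumors, any diameter


# Abbreviated version of the second template: placeholder names are renamed
# to identifiers and the criteria text is omitted (assumptions of this sketch).
STAGING_TEMPLATE = (
    "You are an experienced liver cancer doctor. Using the CNLC staging "
    "criteria and the patient's liver cancer image description, analyze "
    "which stage the patient's liver cancer is. Information extracted from "
    "the image description: {information}. The PS score is {fraction}, "
    "liver function is Child-Pugh grade {grade}, and the final staging "
    "result must be: CNLC stage {stage}. Speak gently and warmly, like a "
    "doctor, to: {audience}.\n"
    "The patient's liver cancer image is described as follows: {description}"
)


def build_staging_prompt(f: Findings, information: str,
                         audience: str, description: str) -> str:
    """Fill the template, deriving {stage} from the findings."""
    return STAGING_TEMPLATE.format(
        information=information,
        fraction=f.ps_score,
        grade=f.child_pugh,
        stage=cnlc_stage(f),
        audience=audience,
        description=description,
    )
```

For the second sample record in the data format below (PS score 1, Child-Pugh grade B, a single tumor of about 35 mm, no vascular invasion or extrahepatic metastasis), `cnlc_stage(Findings(1, "B", 1, 3.5, False, False))` returns `"Ia"`, which matches the stage stated in that record's output.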
These prompt templates are combined with the collected data to query ChatGPT and generate high-quality question-answer data. After processing, the generated question-answer data are integrated into a liver cancer instruction question-answer dataset, which is used to fine-tune the large language model ChatGLM. The data format is as follows:
[
{
"instruction": "OK, I see. So what procedures do I need to complete, and how do I check in?",
"input": "",
"output": "For the meal procedures, you only need to go to the 1st floor of Building 5. After arriving at the ward, just give the nurse your name and hospitalization number, and then you can settle in. If you have any questions about precautions during your hospital stay, our medical staff will answer them patiently one-on-one.",
"history": [
["Hello doctor, I have been diagnosed and need to be hospitalized. How do I go through admission?", "Hello. According to our hospital's regulations, you can get the admission notice from the doctor, then go to the inpatient billing counter on the 1st floor of Building 5 to fill out the relevant form and pay the hospitalization deposit. Next, go to the catering department on the 1st floor of Building 5 to arrange your meals, and finally go directly to the ward nurse station to check in."],
["OK, so where do I need to go to find the doctor?", "You can go directly to the outpatient hall and ask the staff where the doctor is. Before that, though, you need to have the admission notice issued by the doctor. Have you obtained this notice?"],
["Yes, I have received the admission notice.", "That's great. You can go directly to the inpatient billing counter on the 1st floor of Building 5 to fill out the relevant form and pay the hospitalization deposit. If you have any questions about hospitalization costs, our billing office staff can provide you with a detailed statement of charges."]
]
},
{
"instruction": "It is known that the PS score is 1 and liver function Child-Pugh is grade B. According to the following liver cancer image description, analyze which stage the patient's liver cancer belongs to.",
"input": "Multiple cystic lucent shadows without lung markings are seen in both lungs; strip-like dense shadows are seen in the medial segment of the right middle lobe and the inferior lingular segment of the left upper lobe; multiple solid nodular shadows about 2-3 mm in diameter are seen in both lungs; flocculent blurred shadows are seen in the basal segments of both lower lobes; the remaining bilateral lung markings are increased, disordered and blurred, and the bronchovascular bundles of both lungs run naturally; the trachea and bronchi are patent with smooth walls, and no stenosis, dilatation or compression is seen; the bilateral pulmonary hila are not enlarged, no mass is seen in the mediastinum, and there is no lymphadenopathy; the heart is normal in size and shape, and calcified plaques are seen in the aorta and coronary arteries; all great vessels enhance uniformly, and no filling defects are found; the thorax is normal in shape with no abnormal changes of the chest wall; no enlarged lymph nodes are seen in the bilateral axillae; no clear destruction of the ribs or thoracic vertebrae is seen; a small amount of effusion is seen in the bilateral pleural cavities; the liver is normal in shape and size, the proportion of the liver lobes is appropriate, and the edges are smooth; a roughly round low-density area about 35 mm in diameter is seen in the liver, showing heterogeneous enhancement in the arterial phase after contrast, with obvious rim enhancement and a slightly low-density area in the center, and in the venous phase its enhancement is slightly higher than that of the surrounding liver tissue; no abnormal density shadow is seen in the remaining liver parenchyma, and no abnormal enhancement is seen in the liver on dual-phase contrast-enhanced scanning; the main portal vein and its branches are normally opacified; the intrahepatic and extrahepatic bile ducts are not dilated; the gallbladder is not enlarged, and a nodular high-density shadow about 14 mm × 12 mm is seen within it; the pancreas is normal in shape and size, with punctate calcification in the pancreatic head, and the pancreatic duct is not dilated; no abnormality is seen in the size, shape or density of the spleen, and no collateral vessels are found at the splenic hilum or around the stomach; small punctate dense shadows are seen in the left renal calyx; both kidneys are normal in size, shape and density, with no abnormal enhancement after contrast; no abnormality is seen in the peritoneum, and no enlarged lymph nodes are seen in the mesenteric area or retroperitoneum; a gas-containing saccular shadow about 8 mm in diameter is seen on the medial side of the descending duodenum; the wall of the hepatic flexure of the colon is thickened in the visualized slices.",
"output": "Dear patient, I have carefully analyzed your situation based on the liver cancer image description you provided. First, I want to emphasize that your liver cancer image description shows no extrahepatic metastasis, no vascular invasion, 1 tumor, and a tumor diameter of less than 5 cm; this information is very important for determining your liver cancer stage.\nNow let us analyze your liver cancer stage according to the CNLC staging criteria. Your performance status (PS) score is 1 and your liver function is Child-Pugh grade B; combined with your liver cancer image description, we can draw the following conclusions:\nNo extrahepatic metastasis: meets the requirements of CNLC stages Ia, Ib, IIa, IIb and IIIa;\nNo vascular invasion: meets the requirements of CNLC stages Ia, Ib, IIa and IIb;\n1 tumor: meets the requirements of CNLC stages Ia and Ib;\nTumor diameter less than 5 cm: meets the requirements of CNLC stage Ia.\nCombining this information, your liver cancer is CNLC stage Ia. This means it is at a relatively early stage, and the prognosis after treatment may be better. Please feel confident: we can provide the most appropriate treatment plan for you and help you recover as soon as possible. If you have any questions or concerns, please consult me at any time, and I will do my best to answer them.",
"history": []
}
]
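Records in this format still have to be turned into prompt/response pairs before fine-tuning. The following is a minimal sketch, assuming a round-based prompt layout loosely modeled on ChatGLM's multi-turn examples; the exact `[Round i]` labels, the function names and the exact-duplicate filtering (one possible reading of the "not repeating" requirement in the first template) are assumptions, not the patent's stated processing.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output", "history"}


def validate_record(rec: dict) -> None:
    """Raise ValueError if a record does not match the data format."""
    missing = REQUIRED_KEYS - rec.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for turn in rec["history"]:
        if not (isinstance(turn, list) and len(turn) == 2):
            raise ValueError("each history turn must be a [question, answer] pair")


def to_prompt_response(rec: dict) -> tuple:
    """Flatten earlier turns plus the current turn into (prompt, response)."""
    validate_record(rec)
    # The optional "input" field (e.g. an image description) follows the instruction.
    query = rec["instruction"] + ("\n" + rec["input"] if rec["input"] else "")
    prompt = ""
    for i, (question, answer) in enumerate(rec["history"]):
        prompt += f"[Round {i}]\nQuestion: {question}\nAnswer: {answer}\n"
    prompt += f"[Round {len(rec['history'])}]\nQuestion: {query}\nAnswer: "
    return prompt, rec["output"]


def load_dataset(text: str) -> list:
    """Parse a JSON array of records, convert each, and drop exact duplicates."""
    seen = set()
    pairs = []
    for rec in json.loads(text):
        pair = to_prompt_response(rec)
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

The model is then trained to produce the response given the prompt, so single-round records (empty `history`) and multi-round records share one code path.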
3. A computer-readable storage medium, wherein the computer-readable storage medium includes a large language model-based liver cancer auxiliary diagnosis and question-answering program, and the large language model-based liver cancer auxiliary diagnosis and question-answering program, when executed by a processor, implements the steps of the large language model-based liver cancer auxiliary diagnosis and question-answering method according to claim 1.
CN202311216697.1A 2023-09-20 2023-09-20 Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model Active CN116975241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311216697.1A CN116975241B (en) 2023-09-20 2023-09-20 Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model

Publications (2)

Publication Number Publication Date
CN116975241A (en) 2023-10-31
CN116975241B (en) 2024-01-09

Family

ID=88477005

Country Status (1)

Country Link
CN (1) CN116975241B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648397A (en) * 2023-11-07 2024-03-05 中译语通科技股份有限公司 Chapter event extraction method, system, equipment and storage medium
CN117272052B (en) * 2023-11-22 2024-02-09 北京壹永科技有限公司 Large language model training method, device, equipment and storage medium
CN117633252A (en) * 2023-12-14 2024-03-01 广州华微明天软件技术有限公司 Auxiliary retrieval method integrating knowledge graph and large language model
CN117649949B (en) * 2024-01-29 2024-04-30 浙江大学 Clinical thinking data generation system and method based on reinforcement learning
CN117708307B (en) * 2024-02-06 2024-05-14 西北工业大学 Method and device for fusing micro-tuning and Adapter of large language model
CN117809798B (en) * 2024-03-01 2024-04-26 金堂县第一人民医院 Verification report interpretation method, system, equipment and medium based on large model

Citations (5)

Publication number Priority date Publication date Assignee Title
KR20200063352A (en) * 2018-11-23 2020-06-05 박해유 Medical service provision system using chatbot based on deep learning technology
AU2020104254A4 (en) * 2020-04-23 2021-03-11 Xiamen University Healthcare question answering (qa) method and system based on contextualized language model and knowledge embedding
CN115658886A (en) * 2022-09-20 2023-01-31 广东技术师范大学 Intelligent liver cancer staging method, system and medium based on semantic text
CN116092699A (en) * 2021-11-05 2023-05-09 上海仰和华健人工智能科技有限公司 Cancer question-answer interaction method based on pre-training model
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model

Non-Patent Citations (1)

Title
Training language models to follow instructions with human feedback; Long Ouyang et al.; arXiv; pp. 1-68 *

Also Published As

Publication number Publication date
CN116975241A (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN116975241B (en) Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model
Zhang et al. ME‐Net: multi‐encoder net framework for brain tumor segmentation
Chen et al. Classification of lungs infected COVID-19 images based on inception-ResNet
CN110348541A (en) Optical fundus blood vessel image classification method, device, equipment and storage medium
CN109063710A (en) Based on the pyramidal 3D CNN nasopharyngeal carcinoma dividing method of Analysis On Multi-scale Features
CN110348515A (en) Image classification method, image classification model training method and device
CN107194158A (en) A kind of disease aided diagnosis method based on image recognition
CN109259784A (en) AI prediction technique, device, equipment and the storage medium of cerebral infarction
Deng et al. Speech-based diagnosis of autism spectrum condition by generative adversarial network representations
CN110070540A (en) Image generating method, device, computer equipment and storage medium
CN112635013B (en) Medical image information processing method and device, electronic equipment and storage medium
CN109859841A (en) Diagnosis for liver cancer system, method, equipment and medium neural network based
CN111860528A (en) Image segmentation model based on improved U-Net network and training method
CN116884559A (en) Language model-based image report generation method and system
Meng et al. Radiomics-enhanced deep multi-task learning for outcome prediction in head and neck cancer
Balsano et al. Artificial Intelligence and liver: Opportunities and barriers
Ruan et al. An efficient tongue segmentation model based on u-net framework
Feng et al. Automatic segmentation of thrombosed aortic dissection in post‐operative CT‐angiography images
CN114093512A (en) Survival prediction method based on multi-mode data and deep learning model
CN113129310A (en) Medical image segmentation system based on attention routing
CN116687438A (en) Method and device for identifying borborygmus
Moghaddasi et al. Comparing the efficiency of artificial neural network and gene expression programming in predicting coronary artery disease
Hamabe et al. Artificial intelligence‐based technology to make a three‐dimensional pelvic model for preoperative simulation of rectal cancer surgery using MRI
Huang et al. Dual-term loss function for shape-aware medical image segmentation
CN115908299A (en) Medical image-based life cycle prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant