CN117114112A - Vertical field data integration method, device, equipment and medium based on large model - Google Patents

Vertical field data integration method, device, equipment and medium based on large model

Info

Publication number
CN117114112A
CN117114112A CN202311329473.1A CN202311329473A
Authority
CN
China
Prior art keywords
model
knowledge
vertical domain
query
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311329473.1A
Other languages
Chinese (zh)
Other versions
CN117114112B (en)
Inventor
王伟
贾惠迪
邹克旭
郭东宸
常鹏慧
孙悦丽
朱珊娴
田启明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yingshi Ruida Technology Co ltd
Original Assignee
Beijing Yingshi Ruida Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yingshi Ruida Technology Co ltd filed Critical Beijing Yingshi Ruida Technology Co ltd
Priority to CN202311329473.1A priority Critical patent/CN117114112B/en
Publication of CN117114112A publication Critical patent/CN117114112A/en
Application granted granted Critical
Publication of CN117114112B publication Critical patent/CN117114112B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a vertical domain data integration method, device, equipment and medium based on large models, and relates to the technical field of data processing. The method comprises the following steps: receiving a query sentence input by a user through the vertical domain agent corresponding to each vertical domain; invoking a pre-trained intent large model, and identifying the query intention of the query sentence through the intent large model; in the vertical domain agent, according to the query intention, calling a preset external program interface to extract real-time data corresponding to the query intention, and/or calling a pre-trained extraction large model to extract vertical domain knowledge corresponding to the query intention; and calling a pre-trained digest large model, and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query sentence. The application can realize efficient integration of knowledge, data and logic in the vertical domain.

Description

Vertical field data integration method, device, equipment and medium based on large model
Technical Field
The application relates to the technical field of data processing, in particular to a vertical field data integration method, device, equipment and medium based on a large model.
Background
A vertical domain large model is a large language model trained and optimized for a specific domain or industry; it can be used to solve various problems in that domain with high accuracy and efficiency. Compared with a general-purpose language model, a vertical domain large model focuses more on the knowledge and skills of the specific domain, and therefore has higher domain expertise and practicality.
Currently, the main methods for combining vertical domains with large models are as follows. (1) Pre-training and fine-tuning: this is one of the most widely used methods at present. In this approach, the model is first pre-trained on a large generic corpus to learn the general features and structure of the language, and is then adapted to domain-specific tasks by fine-tuning on domain-specific data. (2) Data augmentation: data augmentation is a technique that generates new training samples by applying a series of random transformations and processing steps to the raw data. For large models in a vertical domain, various data augmentation techniques may be used to expand the training data set and thereby improve the generalization ability and robustness of the model.
However, the pre-training stage of the pre-training and fine-tuning method is in most cases unsupervised or weakly supervised, and the model learns mainly from large-scale generic data. The model may therefore lack domain-specific expertise and detail, which limits its accuracy and practicality in the vertical domain. In addition, knowledge transfer between domains may be affected by factors such as differences in data distribution and domain-specific language and rules, so the performance of the model in a specific domain remains limited.
Data augmentation methods, on the other hand, may make the model overly sensitive to certain specific data variations, thereby increasing the risk of overfitting. If the augmented data is too close to some samples in the training set, the model may perform well on similar data but poorly when faced with data from new, real-world scenarios.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a method, apparatus, device and medium for integrating vertical domain data based on a large model, so as to implement efficient integration of vertical domain knowledge, data and logic.
The embodiment of the application provides the following technical scheme: a vertical field data integration method based on a large model comprises the following steps:
receiving query sentences input by a user through vertical domain agents corresponding to each vertical domain, wherein the different vertical domains are respectively provided with the corresponding vertical domain agents;
invoking a pre-trained intent large model, and identifying the query intention of the query statement through the intent large model;
in the vertical domain agent, according to the query intention, a preset external program interface is called to extract real-time data corresponding to the query intention, and/or a pre-trained extraction large model is called to extract vertical domain knowledge corresponding to the query intention;
and calling a pre-trained digest large model, and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
According to one embodiment of the present application, in the vertical domain agent, according to the query intention, a preset external program interface is called to extract real-time data corresponding to the query intention, and/or a pre-trained extraction large model is called to extract vertical domain knowledge corresponding to the query intention, including:
setting a corresponding knowledge system DomainAgent for each knowledge system in the vertical domain agent according to the different knowledge systems included in the vertical domain;
determining the knowledge system DomainAgent corresponding to the query intention through the intent large model;
and in the determined knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
According to an embodiment of the present application, in the determined knowledge system DomainAgent, invoking the external program interface to extract real-time data corresponding to the query intention, and/or invoking the extraction large model to extract vertical domain knowledge corresponding to the query intention, includes:
setting a corresponding sub-knowledge system DomainAgent for each knowledge attribute in each knowledge system DomainAgent according to the different knowledge attributes included in the different knowledge systems;
determining the knowledge system DomainAgent corresponding to the query intention and the sub-knowledge system DomainAgent corresponding to that knowledge system DomainAgent through the intent large model;
and in the determined sub-knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
According to one embodiment of the present application, further comprising:
a data analysis port, a knowledge extraction port and a logic integration port are respectively arranged in each knowledge system DomainAgent and each sub-knowledge system DomainAgent, so that the preset external program interface is called through the data analysis port to extract real-time data corresponding to the query intention, the pre-trained extraction large model is called through the knowledge extraction port to extract the vertical domain knowledge corresponding to the query intention, and the pre-trained digest large model is called through the logic integration port to integrate the obtained real-time data and/or vertical domain knowledge into response data of the query sentence.
According to one embodiment of the present application, identifying the query intention of the query statement through the intent large model includes:
identifying the type of the query intention of the query statement through the intent large model, wherein the types of the query intention include real-time data query and vertical domain knowledge query.
According to one embodiment of the present application, further comprising:
and calling training data according to the received model training instruction, respectively training an intent large model to be trained, an extraction large model to be trained and a digest large model to be trained, and respectively obtaining the intent large model for identifying query intention, the extraction large model for extracting knowledge in the vertical field and the digest large model for data integration.
According to one embodiment of the present application, the intent large model to be trained, the extraction large model to be trained and the digest large model to be trained are large models based on the Transformer architecture.
The embodiment of the application also provides a vertical field data integration device based on the large model, which comprises the following steps:
the query receiving module is used for receiving query sentences input by a user through the vertical domain agents corresponding to each vertical domain, wherein the corresponding vertical domain agents are respectively arranged in different vertical domains;
the intention recognition module is used for calling a pre-trained large intent model and recognizing the query intention of the query statement through the large intent model;
the data acquisition module is used for calling a preset external program interface to extract real-time data corresponding to the query intention according to the query intention in the vertical domain agent, and/or calling a pre-trained extraction large model to extract vertical domain knowledge corresponding to the query intention;
and the data integration module is used for calling a pre-trained digest big model and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
The embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the vertical field data integration method based on the large model when executing the computer program.
The embodiment of the application also provides a computer readable storage medium, which stores a computer program for executing the large model-based vertical domain data integration method.
Compared with the prior art, the beneficial effects achieved by at least one technical scheme adopted by the embodiments of the present application include at least the following:
(1) Efficient integration: the intelligent language understanding capability of large language models is utilized to realize efficient integration of knowledge, data and logic in the vertical domain.
(2) Accuracy and reliability: the application accuracy and reliability of the model in the vertical domain are improved by combining domain-specific knowledge bases, data sources and logic rules.
(3) Automated processing: through pre-trained large language models, automated processing of knowledge, data and logic is realized, saving labor and time costs.
(4) Scalability: the method can be applied to different vertical domains, and can be extended to knowledge integration and application in different domains through customized domain knowledge bases and rules.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for integrating data in the vertical domain according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a vertical domain proxy framework in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of a vertical field data integration apparatus according to an embodiment of the application;
fig. 4 is a schematic structural view of the computer device of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become readily apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the application. The application may also be practiced or applied through other different specific embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
As shown in fig. 1, an embodiment of the present application provides a vertical domain data integration method based on a large model, including:
S101, receiving query sentences input by a user through vertical domain agents corresponding to each vertical domain, wherein the corresponding vertical domain agents are respectively arranged in different vertical domains;
A large model refers to a machine learning model with a large number of parameters and a complex structure. Such models can be applied to large-scale data and complex problems. Traditional machine learning models, such as logistic regression, decision trees and naive Bayes, are smaller in scale and can only process small amounts of data. Deep learning models can contain millions of parameters and process massive amounts of data, and ultra-large-scale deep learning models can even reach billions of parameters and require supercomputing resources for training. Large models have the following advantages: (1) strong ability to process large-scale data: a large model can process massive data, thereby improving the accuracy and generalization ability of the machine learning model; (2) strong ability to handle complex problems: large models have higher complexity and greater flexibility and can handle more complex problems; (3) higher accuracy and performance: a large model has more parameters and a more complex structure, and can express data distributions more accurately and learn more complex features, thereby improving the accuracy and performance of the model.
A vertical domain refers to a specific industry, field or market segment in which an enterprise or product focuses on a particular niche. Such enterprises or products often have in-depth industry knowledge and expertise, as well as customized services or products for a specific target audience.
In this embodiment, a vertical domain Agent refers to an intelligent program specially built for a specific task or problem domain in a specific vertical domain (such as medical, finance or travel), which provides targeted services, solutions or information by understanding and processing the expertise and data of that domain. It corresponds to a large collection that integrates multiple functional endpoints (ports) into a whole within one domain.
S102, calling a pre-trained intent large model (intent recognition large model), and recognizing the query intention of the query statement through the intent large model;
S103, in the vertical domain agent, according to the query intention, a preset external program interface is called to extract real-time data corresponding to the query intention, and/or a pre-trained extraction large model is called to extract vertical domain knowledge corresponding to the query intention;
S104, calling a pre-trained digest large model (understanding large model), and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
In one embodiment, in the vertical domain agent, according to the query intention, a preset external program interface is called to extract real-time data corresponding to the query intention, and/or a pre-trained extraction large model is called to extract vertical domain knowledge corresponding to the query intention, including:
setting a corresponding knowledge system DomainAgent for each knowledge system in the vertical domain agent according to the different knowledge systems included in the vertical domain; determining the knowledge system DomainAgent corresponding to the query intention through the intent large model; and in the determined knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
In this embodiment, a DomainAgent is a knowledge system subdivided under a vertical domain Agent. For example, if the vertical domain Agent is an atmospheric environment Agent, it can be divided into an ozone DomainAgent and a PM2.5 DomainAgent, which are constructed by dividing according to the specific knowledge systems in the domain.
In a further embodiment, the method further comprises: setting a corresponding sub-knowledge system DomainAgent for each knowledge attribute in each knowledge system DomainAgent according to the different knowledge attributes included in the different knowledge systems; determining the knowledge system DomainAgent corresponding to the query intention and the sub-knowledge system DomainAgent corresponding to that knowledge system DomainAgent through the intent large model; and in the determined sub-knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
When this embodiment is implemented, DomainAgents can be divided into multiple levels according to the complexity of the knowledge systems in different vertical domains, so that more accurate information can be obtained.
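A minimal sketch of such a multi-level division is given below. The DomainAgent class and its route method are illustrative assumptions about how the hierarchy might be organized, not a mandated structure; the example names follow the ecological environment / atmospheric environment / ozone / PM2.5 example used in this description.

```python
class DomainAgent:
    """A knowledge system node; it may hold endpoints or delegate to sub-knowledge systems."""
    def __init__(self, name, children=None, endpoints=None):
        self.name = name
        self.children = children or {}    # sub-knowledge system DomainAgents
        self.endpoints = endpoints or {}  # DataEndpoint / KnowledgeEndpoint / LogicEndpoint

    def route(self, path):
        # Walk the path chosen by the intent large model, e.g.
        # ["atmospheric environment", "PM2.5"], down to the most specific node.
        node = self
        for name in path:
            if name in node.children:
                node = node.children[name]
            else:
                break
        return node

# Example hierarchy under an ecological environment vertical domain Agent.
pm25 = DomainAgent("PM2.5")
ozone = DomainAgent("ozone")
atmosphere = DomainAgent("atmospheric environment",
                         children={"ozone": ozone, "PM2.5": pm25})
ecology = DomainAgent("ecological environment",
                      children={"atmospheric environment": atmosphere})

print(ecology.route(["atmospheric environment", "PM2.5"]).name)  # -> PM2.5
```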
In one embodiment, the method further comprises: a data analysis port, a knowledge extraction port and a logic integration port are respectively arranged in each knowledge system DomainAgent and each sub-knowledge system DomainAgent, so that the preset external program interface is called through the data analysis port to extract real-time data corresponding to the query intention, the pre-trained extraction large model is called through the knowledge extraction port to extract the vertical domain knowledge corresponding to the query intention, and the pre-trained digest large model is called through the logic integration port to integrate the obtained real-time data and/or vertical domain knowledge into response data of the query sentence.
In one embodiment, identifying the query intention of the query statement through the intent large model further comprises: identifying the type of the query intention of the query statement through the intent large model.
In this embodiment, the types of user intention include real-time data query and vertical domain knowledge query. The intent large model first judges which type the user intention belongs to. If the user intention is a real-time data query, real-time data corresponding to the user intention is extracted through a pre-designed external program interface; if the user intention is a vertical domain knowledge query, a pre-trained extraction large model for extracting vertical domain knowledge is called to extract the vertical domain knowledge corresponding to the user intention; and if the user intention involves both real-time data query and vertical domain knowledge, real-time data corresponding to the user intention is extracted through the pre-designed external program interface while the pre-trained extraction large model is called to extract the vertical domain knowledge corresponding to the user intention.
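As an illustration of this flow (steps S101 to S104 together with the three-way intention judgment), the following minimal Python sketch shows one possible way to wire the pieces together. The class names (IntentModel, ExtractionModel, DigestModel, VerticalDomainAgent) and the function fetch_realtime_data are hypothetical stand-ins for the pre-trained intent, extraction and digest large models and for the pre-designed external program interface; they are not part of the disclosed system.

```python
from dataclasses import dataclass
from typing import Optional

class IntentModel:
    """Hypothetical wrapper around the pre-trained intent large model."""
    def classify(self, query: str) -> dict:
        # Returns the query intention, including its type:
        # "realtime_data", "domain_knowledge" or "both".
        raise NotImplementedError

class ExtractionModel:
    """Hypothetical wrapper around the pre-trained extraction large model."""
    def extract(self, query: str) -> str:
        raise NotImplementedError

class DigestModel:
    """Hypothetical wrapper around the pre-trained digest large model."""
    def integrate(self, query: str, realtime: Optional[str], knowledge: Optional[str]) -> str:
        raise NotImplementedError

def fetch_realtime_data(intent: dict) -> str:
    """Placeholder for the preset external program interface (e.g. an internal API)."""
    raise NotImplementedError

@dataclass
class VerticalDomainAgent:
    intent_model: IntentModel
    extraction_model: ExtractionModel
    digest_model: DigestModel

    def answer(self, query: str) -> str:
        intent = self.intent_model.classify(query)                     # S102: intention recognition
        realtime = knowledge = None
        if intent["type"] in ("realtime_data", "both"):                # S103: real-time data
            realtime = fetch_realtime_data(intent)
        if intent["type"] in ("domain_knowledge", "both"):             # S103: domain knowledge
            knowledge = self.extraction_model.extract(query)
        return self.digest_model.integrate(query, realtime, knowledge)  # S104: integration
```

In this sketch the and/or of the intention judgment is expressed as two independent conditions, so a single query can trigger real-time data extraction, knowledge extraction, or both before the digest large model produces the response.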
In a specific implementation, the embodiment of the application mainly comprises the following steps:
Step one: training of the intent large model (intent recognition large model), the extraction large model (knowledge extraction large model) and the digest large model (understanding large model).
The intent large model is used to analyze the user's input text or speech to determine the user's query intention or purpose. By pre-training on a large amount of language data, the intent large model can learn rich language knowledge and semantic understanding capability. The procedure for constructing the intent large model is as follows: the model architecture is based on the Transformer. Representative dialogue data, user queries or text data covering different intentions, contexts and expressions are collected; the data may come from manual annotation, existing data sets or synthetic data. The data is fed into the model for training, and the model is optimized and iterated.
The extraction large model is used to extract structured information such as specific entities, relations and attributes from text. The extraction large model can be pre-trained on large-scale text data and has the capability of understanding and analyzing text. The procedure for constructing the extraction large model is as follows: the model architecture is based on the Transformer. Text data containing the target information, such as articles, news, comments and conversations, is collected, covering the particular content to be extracted and analyzed. The data is fed into the model for training, and the model is optimized and iterated.
The digest large model is used to summarize and consolidate the multiple results produced by the other large models, so that the finally output text is easy for humans to read. The digest large model can be trained on a large amount of high-quality corpora containing branch knowledge and summarized, generalized knowledge. The procedure for constructing the digest large model is as follows: the model architecture is based on the Transformer. Representative text data including summaries or abstracts, such as articles, documents, news and conversations, is collected. The data is fed into the model for training, and the model is optimized and iterated.
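As one possible realization of this training step, the sketch below fine-tunes a generic Transformer checkpoint for intent classification using the Hugging Face transformers and datasets libraries. The checkpoint name, label set and file path are placeholders; the extraction and digest large models would be trained analogously (for example as token-classification and sequence-to-sequence fine-tuning, respectively), and the actual models of the application need not use these libraries.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

BASE = "bert-base-chinese"                               # placeholder base checkpoint
LABELS = ["realtime_data", "domain_knowledge", "both"]   # assumed intention types

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=len(LABELS))

# Representative, annotated user queries, one JSON object per line, e.g.:
# {"text": "What is the PM2.5 concentration in city A today?", "label": 0}
dataset = load_dataset("json", data_files={"train": "intent_train.jsonl"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="intent_model",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

# Feed the data into the model for training, then optimize and iterate.
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```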
Step two: vertical domain Agents are built for different vertical domains. Each Agent contains several knowledge system DomainAgents and functional Endpoints. The functional Endpoints consist of a DataEndpoint (data analysis port), a KnowledgeEndpoint (knowledge extraction port) and a LogicEndpoint (logic integration port). The DataEndpoint is responsible for extracting real-time data and comprises a plurality of designed interfaces; the KnowledgeEndpoint is responsible for vertical domain knowledge extraction; and the LogicEndpoint is responsible for the analysis functions.
Taking the ecological environment domain as an example, a vertical domain Agent, the corresponding knowledge system DomainAgents and the functional Endpoints can be constructed as shown in FIG. 2.
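Purely for illustration, the endpoint structure of step two might be organized as in the following sketch. The HTTP call in DataEndpoint, the response schema, and the extraction_model/digest_model objects (assumed to expose simple extract()/integrate() methods) are assumptions standing in for the designed external interfaces and the pre-trained large models described above.

```python
import requests  # used only by the hypothetical external data interface

class DataEndpoint:
    """Data analysis port: wraps the preset external program interfaces."""
    def __init__(self, base_url):
        self.base_url = base_url  # placeholder for an internal air-quality service

    def query_pm25(self, city):
        resp = requests.get(f"{self.base_url}/pm25", params={"city": city}, timeout=10)
        resp.raise_for_status()
        return resp.json()["summary"]  # assumed response schema

class KnowledgeEndpoint:
    """Knowledge extraction port: calls the extraction large model."""
    def __init__(self, extraction_model):
        self.extraction_model = extraction_model

    def lookup(self, question):
        return self.extraction_model.extract(question)

class LogicEndpoint:
    """Logic integration port: calls the digest large model."""
    def __init__(self, digest_model):
        self.digest_model = digest_model

    def integrate(self, question, partial_results):
        return self.digest_model.integrate(question, partial_results)

class AtmosphericEnvironmentDomainAgent:
    """Knowledge system DomainAgent bundling the three functional endpoints."""
    def __init__(self, data_ep, knowledge_ep, logic_ep):
        self.data_endpoint = data_ep
        self.knowledge_endpoint = knowledge_ep
        self.logic_endpoint = logic_ep
```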
Step three: user question input and intention judgment. The user's input first enters the initial vertical domain Agent; the intent large model of the Agent judges which knowledge system DomainAgent to enter, and then judges whether to enter the next-level knowledge system DomainAgent or to invoke the functions of the Endpoints.
For example, the user asks: "What is the definition of fine particulate matter, and what is the PM2.5 concentration in city A today?" The system operation flow is as follows: first, the user's input enters the ecological environment Agent; the intent large model identifies that the question relates to the atmospheric field and enters the atmospheric environment Agent; the intent large model then recognizes that the question involves both a data query and expertise, and enters the DataEndpoint and the KnowledgeEndpoint.
Step four: specific function calls. After the intention is determined, the Endpoints identified as needed for the user's question are invoked.
The interface for querying the PM2.5 concentration of a given city is called in the DataEndpoint of the knowledge system, the KnowledgeEndpoint is called, and the relevant knowledge is extracted through the extraction large model. Each called Endpoint produces a corresponding result. The result returned by the DataEndpoint is "On 3 July 2023 the PM2.5 concentration in city A is 45 micrograms per cubic meter"; the result returned by the KnowledgeEndpoint is "Fine particulate matter is also known as fine particles (PM2.5). Fine particulate matter refers to particulate matter having an aerodynamic equivalent diameter of 2.5 microns or less in ambient air."
Step five: result integration. The results produced by the called Endpoints are input into the digest large model, which sorts, summarizes and semantically optimizes them and outputs a final result that is easy for humans to read.
The final result is: "Fine particulate matter is also called fine particles (PM2.5). Fine particulate matter refers to particulate matter having an aerodynamic equivalent diameter of 2.5 microns or less in ambient air. Today, the PM2.5 concentration in city A is 45 micrograms per cubic meter."
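To make step five concrete, the fragment below shows one way the two endpoint results of this example could be packed into a single prompt for the digest large model; the prompt wording and the helper function are illustrative assumptions, not the actual prompt used by the system.

```python
def build_digest_prompt(question, endpoint_results):
    """Assemble the intermediate endpoint results into a prompt for the digest large model."""
    numbered = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(endpoint_results))
    return (
        "User question: " + question + "\n"
        "Intermediate results:\n" + numbered + "\n"
        "Please merge, deduplicate and semantically polish the results above "
        "into one concise, readable answer."
    )

results = [
    "On 3 July 2023 the PM2.5 concentration in city A is 45 micrograms per cubic meter",
    "Fine particulate matter is also known as fine particles (PM2.5). It refers to "
    "particulate matter with an aerodynamic equivalent diameter of 2.5 microns or "
    "less in ambient air.",
]
prompt = build_digest_prompt(
    "What is the definition of fine particulate matter, and what is the PM2.5 "
    "concentration in city A today?",
    results,
)
print(prompt)  # this prompt would be fed to the digest large model for the final answer
```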
In the embodiment of the application, three large models are trained: the intent large model, the extraction large model and the digest large model, each responsible for a different function. The intent large model is responsible for intention recognition, the extraction large model is responsible for integrating vertical domain knowledge, and the digest large model is responsible for integrating results. The intent large model can accept a variety of input types, such as text, images and audio, and fuse them for intention classification, so it can better handle multi-modal scenarios and provide more accurate and comprehensive intention understanding. The extraction large model can handle entity extraction at multiple granularities, from the word level to the phrase level and even the sentence level, and can effectively capture cross-sentence relations and context information. The digest large model can integrate and generalize results from text, images and audio in a form that is easy for users to read.
On the basis of the trained large models, knowledge from multiple domains can be fused into a dialogue by constructing vertical domain Agents and functional Endpoints. This enables the large models to answer more complex and specialized questions and to provide more comprehensive and accurate information.
As shown in fig. 3, an embodiment of the present application further provides a vertical domain data integration apparatus 200 based on a large model, including:
a query receiving module 201, configured to receive, through a vertical domain agent corresponding to each vertical domain, a query sentence input by a user, where different vertical domains are respectively provided with a corresponding vertical domain agent;
the intention recognition module 202 is used for calling a pre-trained large intent model, and recognizing the query intention of the query statement through the large intent model;
the data obtaining module 203 is configured to invoke a preset external program interface to extract real-time data corresponding to the query intention according to the query intention in the vertical domain agent, and/or invoke a pre-trained extraction large model to extract vertical domain knowledge corresponding to the query intention;
and the data integration module 204 is configured to invoke a pre-trained digest big model, and integrate the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
In one embodiment, the data obtaining module 203 is further configured to set, in the vertical domain agent, a corresponding knowledge system DomainAgent for each knowledge system according to the different knowledge systems included in the vertical domain; determine the knowledge system DomainAgent corresponding to the query intention through the intent large model; and, in the determined knowledge system DomainAgent, call the external program interface to extract real-time data corresponding to the query intention, and/or call the extraction large model to extract vertical domain knowledge corresponding to the query intention.
In one embodiment, the data obtaining module 203 is further configured to set, in each knowledge system DomainAgent, a corresponding sub-knowledge system DomainAgent for each knowledge attribute according to the different knowledge attributes included in the different knowledge systems; determine the knowledge system DomainAgent corresponding to the query intention and the sub-knowledge system DomainAgent corresponding to that knowledge system DomainAgent through the intent large model; and, in the determined sub-knowledge system DomainAgent, call the external program interface to extract real-time data corresponding to the query intention, and/or call the extraction large model to extract vertical domain knowledge corresponding to the query intention.
In one embodiment, the data obtaining module 203 is further configured to set a data analysis port, a knowledge extraction port and a logic integration port in each knowledge system DomainAgent and each sub-knowledge system DomainAgent respectively, so as to call the preset external program interface through the data analysis port to extract real-time data corresponding to the query intention, call the pre-trained extraction large model through the knowledge extraction port to extract the vertical domain knowledge corresponding to the query intention, and call the pre-trained digest large model through the logic integration port to integrate the obtained real-time data and/or vertical domain knowledge into response data of the query statement.
In one embodiment, the intent recognition module 202 is further configured to recognize a type of query intent of the query statement via the intent big model, wherein the type of query intent includes a real-time data query and a vertical domain knowledge query.
In one embodiment, the vertical domain data integration apparatus 200 further includes a model training module, configured to invoke training data according to a received model training instruction, and train an intent big model to be trained, an extraction big model to be trained, and a digest big model to be trained, to respectively obtain the intent big model for identifying query intention, the extraction big model for extracting vertical domain knowledge, and the digest big model for data integration.
In one embodiment, a computer device is provided, as shown in fig. 4, including a memory 301, a processor 302, and a computer program stored on the memory and executable on the processor, which when executed implements any of the large model-based vertical domain data integration methods described above.
In particular, the computer device may be a computer terminal, a server or similar computing means.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program for executing any of the above-described large model-based vertical domain data integration methods.
In particular, computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable storage media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than what is shown or described, or they may be separately fabricated into individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated into a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present application should be included in the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. The vertical domain data integration method based on the large model is characterized by comprising the following steps of:
receiving query sentences input by a user through vertical domain agents corresponding to each vertical domain, wherein the different vertical domains are respectively provided with the corresponding vertical domain agents;
invoking a pre-trained intent large model, and identifying the query intention of the query statement through the intent large model;
in the vertical domain agent, according to the query intention, a preset external program interface is called to extract real-time data corresponding to the query intention, and/or a pre-trained extraction large model is called to extract vertical domain knowledge corresponding to the query intention;
and calling a pre-trained digest large model, and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
2. The vertical domain data integration method based on the large model according to claim 1, wherein in the vertical domain agent, according to the query intention, invoking a preset external program interface to extract real-time data corresponding to the query intention, and/or invoking a pre-trained extraction large model to extract vertical domain knowledge corresponding to the query intention, including:
setting a corresponding knowledge system DomainAgent for each knowledge system in the vertical domain agent according to the different knowledge systems included in the vertical domain;
determining the knowledge system DomainAgent corresponding to the query intention through the intent large model;
and in the determined knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
3. The method for integrating vertical domain data based on large model according to claim 2, wherein in the determined knowledge system DomainAgent, invoking the external program interface to extract real-time data corresponding to the query intention, and/or invoking the extraction large model to extract vertical domain knowledge corresponding to the query intention, comprises:
setting a corresponding sub-knowledge system DomainAgent for each knowledge attribute in each knowledge system DomainAgent according to the different knowledge attributes included in the different knowledge systems;
determining the knowledge system DomainAgent corresponding to the query intention and the sub-knowledge system DomainAgent corresponding to that knowledge system DomainAgent through the intent large model;
and in the determined sub-knowledge system DomainAgent, calling the external program interface to extract real-time data corresponding to the query intention, and/or calling the extraction large model to extract vertical domain knowledge corresponding to the query intention.
4. The large model based vertical domain data integration method of claim 3, further comprising:
a data analysis port, a knowledge extraction port and a logic integration port are respectively arranged in each knowledge system DomainAgent and each sub-knowledge system DomainAgent, so that the preset external program interface is called through the data analysis port to extract real-time data corresponding to the query intention, the pre-trained extraction large model is called through the knowledge extraction port to extract the vertical domain knowledge corresponding to the query intention, and the pre-trained digest large model is called through the logic integration port to integrate the obtained real-time data and/or vertical domain knowledge into response data of the query statement.
5. The large model-based vertical domain data integration method according to any one of claims 1 to 4, wherein identifying a query intent of the query statement by the intent large model comprises:
and identifying the type of the query intention of the query statement through the intent big model, wherein the type of the query intention comprises a real-time data query and a vertical domain knowledge query.
6. The large model-based vertical domain data integration method according to any one of claims 1 to 4, further comprising:
and calling training data according to the received model training instruction, respectively training an intent large model to be trained, an extraction large model to be trained and a digest large model to be trained, and respectively obtaining the intent large model for identifying query intention, the extraction large model for extracting knowledge in the vertical field and the digest large model for data integration.
7. The method of claim 6, wherein the intent large model to be trained, the extraction large model to be trained and the digest large model to be trained are large models based on the Transformer architecture.
8. A large model-based vertical domain data integration apparatus, comprising:
the query receiving module is used for receiving query sentences input by a user through the vertical domain agents corresponding to each vertical domain, wherein the corresponding vertical domain agents are respectively arranged in different vertical domains;
the intention recognition module is used for calling a pre-trained large intent model and recognizing the query intention of the query statement through the large intent model;
the data acquisition module is used for calling a preset external program interface to extract real-time data corresponding to the query intention according to the query intention in the vertical domain agent, and/or calling a pre-trained extraction large model to extract vertical domain knowledge corresponding to the query intention;
and the data integration module is used for calling a pre-trained digest big model and integrating the obtained real-time data and/or the obtained vertical domain knowledge into response data of the query statement.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the large model based vertical domain data integration method of any one of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the large model-based vertical domain data integration method of any one of claims 1 to 7.
CN202311329473.1A 2023-10-16 2023-10-16 Vertical field data integration method, device, equipment and medium based on large model Active CN117114112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311329473.1A CN117114112B (en) 2023-10-16 2023-10-16 Vertical field data integration method, device, equipment and medium based on large model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311329473.1A CN117114112B (en) 2023-10-16 2023-10-16 Vertical field data integration method, device, equipment and medium based on large model

Publications (2)

Publication Number Publication Date
CN117114112A true CN117114112A (en) 2023-11-24
CN117114112B CN117114112B (en) 2024-03-19

Family

ID=88809312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311329473.1A Active CN117114112B (en) 2023-10-16 2023-10-16 Vertical field data integration method, device, equipment and medium based on large model

Country Status (1)

Country Link
CN (1) CN117114112B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708347A (en) * 2023-12-14 2024-03-15 北京英视睿达科技股份有限公司 Method and system for outputting multi-mode result by large model based on API (application program interface) endpoint

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023022727A1 (en) * 2021-08-20 2023-02-23 Google Llc Prompt tuning using one or more machine-learned models
US20230111618A1 (en) * 2021-10-13 2023-04-13 Google Llc Distilling to a Target Device Based on Observed Query Patterns
CN116662633A (en) * 2022-12-23 2023-08-29 百度(中国)有限公司 Search method, model training method, device, electronic equipment and storage medium
CN116719899A (en) * 2023-05-25 2023-09-08 北京中科凡语科技有限公司 Domain knowledge updating system and method for large model
CN116805001A (en) * 2023-06-26 2023-09-26 城云科技(中国)有限公司 Intelligent question-answering system and method suitable for vertical field and application of intelligent question-answering system and method
CN116821307A (en) * 2023-08-21 2023-09-29 腾讯科技(深圳)有限公司 Content interaction method, device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023022727A1 (en) * 2021-08-20 2023-02-23 Google Llc Prompt tuning using one or more machine-learned models
US20230111618A1 (en) * 2021-10-13 2023-04-13 Google Llc Distilling to a Target Device Based on Observed Query Patterns
CN116662633A (en) * 2022-12-23 2023-08-29 百度(中国)有限公司 Search method, model training method, device, electronic equipment and storage medium
CN116719899A (en) * 2023-05-25 2023-09-08 北京中科凡语科技有限公司 Domain knowledge updating system and method for large model
CN116805001A (en) * 2023-06-26 2023-09-26 城云科技(中国)有限公司 Intelligent question-answering system and method suitable for vertical field and application of intelligent question-answering system and method
CN116821307A (en) * 2023-08-21 2023-09-29 腾讯科技(深圳)有限公司 Content interaction method, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Jiming; MENG Yalei; WAN Xiaoyu: "Cross-task dialogue system based on few-shot machine learning", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), no. 03 *
HU Xiao: "Research on Memory-based Dialogue System Technology", CNKI (China National Knowledge Infrastructure), pages 7-8 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708347A (en) * 2023-12-14 2024-03-15 北京英视睿达科技股份有限公司 Method and system for outputting multi-mode result by large model based on API (application program interface) endpoint

Also Published As

Publication number Publication date
CN117114112B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN110442718B (en) Statement processing method and device, server and storage medium
CN110168535B (en) Information processing method and terminal, computer storage medium
CN109992668B (en) Self-attention-based enterprise public opinion analysis method and device
US8452772B1 (en) Methods, systems, and articles of manufacture for addressing popular topics in a socials sphere
CN109408811B (en) Data processing method and server
CN117114112B (en) Vertical field data integration method, device, equipment and medium based on large model
CN117033571A (en) Knowledge question-answering system construction method and system
CN112036705A (en) Quality inspection result data acquisition method, device and equipment
CN115795030A (en) Text classification method and device, computer equipment and storage medium
CN109325238A (en) A kind of method of multiple entity sentiment analysis in long text
Satyanarayana et al. Sentimental Analysis on voice using AWS Comprehend
Mani et al. Hi, how can I help you?: Automating enterprise IT support help desks
Kalo et al. Knowlybert-hybrid query answering over language models and knowledge graphs
CN110377706B (en) Search sentence mining method and device based on deep learning
CN113392305A (en) Keyword extraction method and device, electronic equipment and computer storage medium
Kumari et al. OSEMN approach for real time data analysis
CN117033744A (en) Data query method and device, storage medium and electronic equipment
CN113869049B (en) Fact extraction method and device with legal attribute based on legal consultation problem
CN115048531A (en) Knowledge management method, device and system for urban physical examination knowledge
Chung et al. A question detection algorithm for text analysis
CN111209752A (en) Chinese extraction integrated unsupervised abstract method based on auxiliary information
CN116821696B (en) Training method, device, equipment and storage medium for form question-answer model
US11475875B2 (en) Method and system for implementing language neutral virtual assistant
Feng et al. Intelligent question answering system based on entrepreneurial incubation knowledge graph
Durugkar Analyzing Big Data Using Recent Machine Learning Techniques to Assist Consumers in Online Purchase Decision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant