CN116975192A - Large language model operation chain device and method based on tree structure - Google Patents

Large language model operation chain device and method based on tree structure

Info

Publication number
CN116975192A
Authority
CN
China
Prior art keywords
sub
model
language model
module
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310898633.8A
Other languages
Chinese (zh)
Inventor
郭红森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuheng Information Technology Co ltd
Original Assignee
Shanghai Shuheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuheng Information Technology Co ltd filed Critical Shanghai Shuheng Information Technology Co ltd
Priority to CN202310898633.8A
Publication of CN116975192A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/322Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention relates to a tree-structure-based large language model operation chain device and method. The device comprises: a hierarchical structure processing module, which decomposes a large-scale language model into a plurality of sub-models and organizes them according to a tree structure; a context modeling module, which extracts context information from input data and determines the sub-model path to be computed from the extracted context information; a computing resource allocation module, which receives the sub-model path determined by the context modeling module, computes the resource requirements of that path, and dynamically allocates computing resources to the computation; and a model parameter storage module, which stores the parameters of the large language model and of its sub-models. Compared with the prior art, the invention achieves efficient operation of a large-scale language model under limited computing resources, reduces wasted computing resources, maintains high prediction accuracy, and addresses the shortcomings of the prior art in computing resource requirements, context modeling, computation speed, and other aspects.

Description

Large language model operation chain device and method based on tree structure
[ technical field ]
The invention relates to the technical field of language processing, in particular to a large language model operation chain device and method based on a tree structure.
[ background Art ]
In the prior art, the operation of large-scale language models mainly depends on high-performance computing devices such as GPUs and TPUs. These devices support the complex parallel computation required for the high-speed multiply-add operations over a large-scale language model's weight matrices. However, the prior art mainly runs language models with a single-level structure and a fixed computing resource allocation scheme. This approach has the following problems and drawbacks:
(1) Computing resource requirements grow with model size: as the scale of a language model expands, its computing resource demand grows exponentially; once the model reaches a certain scale, traditional computing devices such as GPUs and TPUs may no longer meet the operational requirements; moreover, the scale of large-scale language models is expected to keep growing as the accuracy requirements of various natural language processing tasks increase.
(2) Fixed computing resource allocation: existing large-scale language model operation techniques generally adopt a fixed strategy when allocating computing resources; for language model applications under different scenarios and requirements, fixed allocation can lead to wasted resources and performance bottlenecks.
(3) Lack of context modeling: the prior art does not fully exploit context information when computing a large-scale language model; modeling the context information can effectively reduce the amount of computation and improve computation speed; in addition, for specific tasks or input data, context modeling helps filter out irrelevant information and improves the accuracy of inference results.
(4) High computational complexity: in the prior art, the operational complexity of a large-scale language model is high, because the multiply-add operations over the weight matrices cause the amount of computation to grow exponentially as depth and width increase; this poses challenges for computing resource demand and performance optimization.
In summary, the prior art cannot effectively solve the problem of large-scale language model inference speed under limited computing resources; providing a tree-structure-based large language model operation chain device that overcomes the shortcomings of the prior art in computing resource requirements, context modeling, computation speed, and other aspects is therefore of great significance.
[ summary of the invention ]
The invention aims to overcome the above drawbacks by providing a tree-structure-based large language model operation chain device that achieves efficient operation of a large-scale language model under limited computing resources, reduces wasted computing resources, maintains high prediction accuracy, offers good usability and extensibility, and addresses the shortcomings of the prior art in computing resource requirements, context modeling, computation speed, and other aspects.
The tree-structure-based large language model operation chain device comprises a hierarchical structure processing module, a context modeling module, a computing resource allocation module, and a model parameter storage module. The hierarchical structure processing module is used for decomposing a large-scale language model into a plurality of sub-models and organizing them according to a tree structure; the context modeling module is used for extracting context information from input data and determining the sub-model path to be computed from the extracted context information; the computing resource allocation module is used for receiving the sub-model path determined by the context modeling module, computing the resource requirements of that path, and dynamically allocating computing resources for the computation; and the model parameter storage module is used for storing the parameters of the large language model and of its sub-models.
Further, in the hierarchical structure processing module, each node of the tree structure corresponds to one sub-model, and the connections between nodes represent the associations between sub-models.
Further, in the context modeling module, the constraints imposed by the context information are used to reduce the amount of computation and to filter out irrelevant nodes.
Further, in the model parameter storage module, model parameters are updated through the training process and are used to calculate output results during the reasoning process.
The invention also provides a tree-structure-based large language model operation chain method, comprising the following steps: (1) input data is first sent to the context modeling module, which extracts context information and determines the sub-model path to be computed; (2) the computing resource allocation module dynamically allocates computing resources according to the resource requirements of the sub-model path, enabling parallel computation among the sub-models; (3) the input data and the resource allocation result are passed to the hierarchical structure processing module, which decomposes the large-scale language model into a plurality of sub-models organized in a tree structure, and the sub-model path is then computed efficiently; (4) after the computation is completed, the computation result is processed to form the predicted output result.
Further, in step (4), after the computation is completed, the parameter storage module is used to store the parameters of the large language model and of each of its sub-models, so that output results can be calculated during the reasoning process.
Further, in step (1), the context modeling module is designed to extract key information from the input data, and a pre-trained classifier or clustering algorithm determines the sub-model path to be computed from the extracted information.
Further, when a question to be answered is input, the context modeling module extracts useful information from the question and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the corresponding answer is generated from a knowledge base or other data sources.
Further, when speech exercise data to be corrected is input, the context modeling module extracts useful information from the data and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the score and feedback for the speech exercise are generated.
Further, when writing exercise text to be corrected is input, the context modeling module extracts useful information from the text and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the score and feedback for the writing exercise are generated.
Compared with the prior art, the invention has the following advantages:
(1) Reduced computing resource requirements: the invention decomposes the large-scale language model into a plurality of sub-models through the hierarchical structure processing module, which reduces the scale of each individual model and thus its computing resource demand; the invention can therefore still operate efficiently under limited computing resources and adapts to devices of different performance levels.
(2) Improved computation speed: the invention uses the context modeling module to constrain the sub-model computation path, reducing the amount of computation and filtering out irrelevant sub-models; the computing resource allocation module allocates resources intelligently according to the input data and the sub-models' resource requirements, making full use of computing resources and thereby improving computation speed.
(3) High accuracy maintained: the invention organizes the sub-models in a tree structure, reducing computational complexity while maintaining high accuracy; in addition, the context modeling module filters out irrelevant information according to the input data, improving the model's prediction accuracy.
(4) Flexible adaptation to different scenarios: the invention adopts a modular design in which each module can be flexibly adjusted and optimized, making it suitable for language model applications under different scenarios and requirements; for example, the tree hierarchy in the hierarchical structure processing module may be optimized for specific task requirements, or the policy of the computing resource allocation module may be adjusted to match computing device performance.
(5) Environmental friendliness: by reducing computing resource demand, the invention also reduces energy consumption, contributing to green computing and environmental protection.
(6) Ease of use and extension: the structure and working principle of the invention are clear and easy to understand and implement; at the same time, the modular design gives the invention good extensibility and generality, so it can be applied to natural language processing, machine learning, and other related fields.
In summary, the invention achieves efficient operation of a large-scale language model under limited computing resources, reduces wasted computing resources, maintains high prediction accuracy, and offers good usability and extensibility; it therefore has broad application value and can be applied to many natural language processing tasks, such as chatbots, real-time translation, and knowledge graph construction.
[ description of the drawings ]
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a schematic flow chart of the present invention.
[ detailed description of the preferred embodiments ]
The invention provides a large language model operation chain device and a large language model operation chain method based on a tree structure, which can effectively solve the problems and disadvantages in the aspects of calculation resource requirements, context modeling, calculation speed and the like in the prior art.
The invention is further described below with reference to the accompanying drawings:
As shown in FIG. 1, the invention mainly comprises a hierarchical structure processing module (module 1), a context modeling module (module 2), a computing resource allocation module (module 3), and a model parameter storage module (module 4). Specifically:
hierarchical processing module (module 1): the module is responsible for decomposing the large-scale language model into a plurality of sub-models and then organizing according to a tree structure. Each node of the tree structure corresponds to one sub-model, and the connection between the nodes represents the association between the sub-models. By decomposing and reorganizing the large-scale language model, the scale of the individual models is reduced, thereby reducing the computational resource requirements.
Context modeling module (module 2): this module extracts context information from the input data and determines the sub-model path to be computed based on the extracted context information. By exploiting the constraints of the context information, the amount of computation is reduced and irrelevant nodes are filtered out, which improves the accuracy of the prediction result.
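One way such path selection could work is sketched below, reusing the SubModelNode sketch above: children whose sub-models do not match the extracted context are pruned, so irrelevant nodes never enter the computation. The keyword-overlap score is a placeholder assumption standing in for whatever relevance measure an implementation actually uses.

```python
def select_path(node, context_keywords, threshold=0):
    """Walk the sub-model tree, keeping only children relevant to the context."""
    path = [node]
    for child in node.children:
        # Placeholder relevance score: keyword overlap with the sub-model id.
        score = sum(1 for kw in context_keywords if kw in child.submodel_id)
        if score > threshold:             # irrelevant nodes are filtered out here
            path.extend(select_path(child, context_keywords, threshold))
    return path
```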
Computing resource allocation module (module 3): this module receives the sub-model path determined by the context modeling module and dynamically allocates computing resources for the computation. Resources are intelligently adjusted and distributed according to the input data and the sub-models' resource requirements, making full use of the available computing resources and improving computation speed.
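A simplified allocation policy in this spirit is sketched below; the per-node cost estimate and the thread-pool sizing rule are assumptions for illustration, not the patent's prescribed strategy.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def estimate_cost(node):
    # Assumed stand-in for a real per-sub-model cost model (FLOPs, memory, ...).
    return 1 + len(node.children)

def allocate_and_run(path, run_submodel, max_workers=None):
    """Size a worker pool to the selected path's estimated cost and run the
    sub-models in parallel; run_submodel evaluates one sub-model."""
    limit = max_workers or os.cpu_count() or 1
    total_cost = sum(estimate_cost(n) for n in path)
    workers = max(1, min(limit, total_cost))   # dynamic allocation: scale pool to demand
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_submodel, path))
```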
Model parameter storage module (module 4): this component is responsible for storing the parameters of the large language model and of its sub-models. Model parameters are updated through the training process and are used to calculate output results during the reasoning process.
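The storage module could be as simple as a keyed parameter table written during training and read back during inference; the dict-backed store below is a sketch under that assumption.

```python
import numpy as np

class ParameterStore:
    """Holds the full model's parameters and per-sub-model parameter slices."""
    def __init__(self):
        self._params = {}

    def save(self, key, tensor):
        self._params[key] = np.asarray(tensor)   # written/updated during training

    def load(self, key):
        return self._params[key]                 # read back during inference

store = ParameterStore()
store.save("params/vocab", np.random.randn(1000, 64))
```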
The specific working steps and working principles of the invention are as follows:
(1) First, the input data is fed into a context modeling module (module 2), which extracts the context information and determines the sub-model paths to be calculated.
(2) The computing resource allocation module (module 3) dynamically allocates computing resources according to the sub-model paths.
(3) The input data and the calculation resource allocation result are transmitted to a hierarchical processing module (module 1) to perform efficient operation on the sub-model path.
(4) After the computation is completed, the computation result is processed (e.g., by applying an activation function, normalization, and the like) to form the predicted output result.
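Tying the four steps together, an end-to-end pass could be sketched as follows, reusing the illustrative helpers above; extract_context, the summation of sub-model outputs, and the softmax postprocessing are assumptions, and run_submodel is an assumed callable that evaluates one sub-model node and returns a logit vector.

```python
import numpy as np

def extract_context(text):
    # Assumed toy extractor: lower-cased tokens serve as context keywords.
    return set(text.lower().split())

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def run_chain(text, root, run_submodel):
    keywords = extract_context(text)                # step (1): context modeling
    path = select_path(root, keywords)              # step (1): sub-model path selection
    outputs = allocate_and_run(path, run_submodel)  # steps (2)-(3): allocate and compute
    logits = np.sum(outputs, axis=0)                # combine sub-model outputs
    return softmax(logits)                          # step (4): postprocess into a prediction
```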
Through this technical scheme, the invention realizes efficient operation of a large-scale language model while exploiting the advantages of the tree structure, so that the model maintains high accuracy while computational complexity is reduced. The invention is therefore applicable to natural language processing, machine learning, and other related fields.
The invention is further illustrated below in connection with specific examples:
example 1: intelligent question-answering system
The invention can be applied to construct an intelligent question-answering system that generates answers to questions posed by users. In this embodiment, the tree-structure-based large language model operation chain device is employed to realize efficient operation. The specific implementation process is as follows:
(1) The large-scale question-answering model is decomposed into a plurality of sub-models and organized according to a tree structure. Each sub-model represents a portion of the weights in the original model, such as vocabulary, grammar rules, domain knowledge, and the like.
(2) The context modeling module is designed to extract useful information, such as keywords, entities, and domain types, from the input question. A pre-trained classifier or clustering algorithm determines the sub-model path to be computed for the input question based on the extracted information (see the routing sketch after this list).
(3) The computing resource allocation module dynamically allocates computing resources according to the input data and the sub-models' resource requirements. According to task demands and computing device performance, parallel computation among different sub-models is realized in a multi-core processor environment.
(4) Parameters of the large language question-answering model and of its sub-models are stored, and are used in the reasoning process to calculate the output result.
(5) When a user inputs a question to be answered, the context modeling module extracts useful information from the question and determines the computational sub-model path. After the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the corresponding answer is generated from the system's knowledge base or other data sources.
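As an illustration of step (2), a lightweight router could map question features to a sub-model path; the keyword table, path labels, and fallback rule below are hypothetical, standing in for the pre-trained classifier or clustering algorithm.

```python
# Hypothetical keyword-to-sub-model routing table for the Q&A embodiment.
ROUTES = {
    "definition": ["root", "vocabulary"],
    "grammar":    ["root", "grammar"],
    "medical":    ["root", "domain"],
}

def route_question(question):
    tokens = question.lower().split()
    for keyword, path_ids in ROUTES.items():
        if keyword in tokens:
            return path_ids
    return ["root"]   # fall back to the root sub-model only

print(route_question("What is the definition of entropy?"))  # ['root', 'vocabulary']
```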
Therefore, applying the tree-structure-based large language model operation chain device to an intelligent question-answering system allows users' questions to be answered effectively under limited computing resources. In addition, this embodiment maintains high accuracy while reducing computational complexity and adopts a flexible computing resource allocation strategy, so it can be widely applied to various natural language processing scenarios.
Example 2: intelligent voice training scoring system
The invention can be applied to construct an intelligent speech training scoring system for evaluating and providing feedback on speech exercises submitted by a user. The specific implementation process is as follows:
(1) The large-scale speech scoring model is decomposed into a plurality of sub-models and organized according to a tree structure. Each sub-model represents a portion of the weights in the original model, such as pronunciation accuracy, speech speed, intonation, fluency, etc.
(2) The context modeling module is designed to extract key information, such as voice features, grammatical structure, and semantic information, from the input speech data. A pre-trained classifier or clustering algorithm determines the sub-model path to be computed from the extracted information.
(3) The computing resource allocation module dynamically allocates computing resources according to the input data and the sub-models' resource requirements. According to task demands and computing device performance, parallel computation among different sub-models is realized in a multi-core processor environment.
(4) Parameters of the large language speech scoring model and of its sub-models are stored, and are used in the reasoning process to calculate the output result.
(5) When the user inputs speech exercise data to be corrected, the context modeling module extracts useful information from the data and determines the computational sub-model path. After the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the score and feedback for the speech exercise are generated.
Therefore, with the intelligent speech training scoring system constructed by the invention, a user can obtain evaluation of and feedback on speech exercises in a short time, effectively improving learning and practice outcomes. Meanwhile, the invention achieves efficient operation under limited computing resources and reduces the system's running cost.
Example 3: athens writing correction system
The invention can be applied to the construction of an elegance (iles) writing modification system for evaluating and providing feedback on elegance writing exercises submitted by a user. The specific implementation process is as follows:
(1) The large-scale yawing writing correction model is decomposed into a plurality of sub-models and organized according to a tree structure. Each sub-model represents a portion of the weights in the original model, such as scoring criteria, grammar rules, vocabulary, etc.
(2) And the design context modeling module is used for extracting key information such as keywords, topics, entities and the like from the input Athens writing text. A pre-trained classifier or clustering algorithm determines the sub-model paths to be calculated according to the extracted information.
(3) The computing resource allocation module dynamically allocates computing resources according to the input data and the sub-model computing resource requirements. According to task demands and computing device performances, parallel computation among different sub-models is realized in the multi-core processor environment.
(4) The parameters of the large language elegance writing correction model and the sub-model parameters thereof are stored. Used in the reasoning process to calculate the output result.
(5) When the user enters the Athens writing exercise text to be modified, the context modeling module extracts useful information from the text and determines a computational submodel path. And after the computing resource allocation module dynamically allocates resources, computing a required submodel in the tree structure, and finally generating the score and feedback of the Athens writing exercise.
Therefore, by using the yawing writing correction system constructed by the invention, a user can obtain evaluation and feedback of the yawing writing exercise in a short time, so that the learning and exercise effects are effectively improved. Meanwhile, the invention realizes high-efficiency operation under the condition of limited computing resources, and reduces the running cost of the system.
The foregoing is merely a specific implementation of the embodiments of the present invention, but the protection scope of the embodiments of the present invention is not limited thereto; any changes or substitutions within the technical scope disclosed by the embodiments of the present invention shall be covered by their protection scope. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A tree-structure-based large language model operation chain device, comprising:
the hierarchical structure processing module is used for decomposing the large-scale language model into a plurality of sub-models and then organizing according to the tree structure;
the context modeling module is used for extracting context information from input data and determining a sub-model path to be calculated according to the extracted context information;
the computing resource allocation module is used for receiving the sub-model path determined by the context modeling module, computing resource requirements according to the sub-model path, and dynamically allocating computing resources for the computing process;
and the model parameter storage module is used for storing parameters of the large language model and sub-model parameters thereof.
2. The tree-structure-based large language model operation chain device according to claim 1, wherein: in the hierarchical structure processing module, each node of the tree structure corresponds to one sub-model, and the connection between the nodes represents the association between the sub-models.
3. The tree-structure-based large language model operation chain device according to claim 1, wherein: in the context modeling module, the calculation amount is reduced by utilizing the constraint of the context information, and irrelevant nodes are filtered.
4. The tree-structure-based large language model operation chain device according to claim 1, wherein: in the model parameter storage module, model parameters are updated through a training process and are used for calculating an output result in a reasoning process.
5. The tree structure-based large language model operation chain method is characterized by comprising the following steps of:
(1) Firstly, input data is sent to a context modeling module, context information is extracted, and a sub-model path required to be calculated is determined;
(2) The computing resource allocation module dynamically allocates computing resources according to the path computing resource requirements of the sub-models, and realizes parallel computing among the sub-models;
(3) The input data and the calculation resource allocation result are transmitted to a hierarchical structure processing module, the hierarchical structure processing module is utilized to decompose the large-scale language model into a plurality of sub-models, the sub-models are organized according to a tree structure, and then efficient operation is carried out on paths of the sub-models;
(4) And after the calculation is completed, processing the calculation result to form a prediction output result.
6. The tree-structure-based large language model operation chain method according to claim 5, wherein: in step (4), after the calculation is completed, the parameter storage module is used for storing the parameters of the large language model and the parameters of each of its sub-models, so as to calculate the output result in the reasoning process.
7. The tree-structure-based large language model operation chain method according to claim 5, wherein: in step (1), the context modeling module is designed to extract key information from the input data, and a pre-trained classifier or clustering algorithm determines the sub-model path to be calculated according to the extracted information.
8. The tree-structure-based large language model operation chain method according to claim 5, wherein: when a question to be answered is input, the context modeling module extracts useful information from the question and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the corresponding answer is generated from a knowledge base or other data sources.
9. The tree-structure-based large language model operation chain method according to claim 5, wherein: when speech exercise data to be corrected is input, the context modeling module extracts useful information from the data and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the score and feedback for the speech exercise are generated.
10. The tree-structure-based large language model operation chain method according to claim 5, wherein: when writing exercise text to be corrected is input, the context modeling module extracts useful information from the text and determines the computational sub-model path; after the computing resource allocation module dynamically allocates resources, the required sub-models are computed in the tree structure, and finally the score and feedback for the writing exercise are generated.
CN202310898633.8A 2023-07-21 2023-07-21 Large language model operation chain device and method based on tree structure Pending CN116975192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310898633.8A CN116975192A (en) 2023-07-21 2023-07-21 Large language model operation chain device and method based on tree structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310898633.8A CN116975192A (en) 2023-07-21 2023-07-21 Large language model operation chain device and method based on tree structure

Publications (1)

Publication Number Publication Date
CN116975192A (en) 2023-10-31

Family

ID=88484380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310898633.8A Pending CN116975192A (en) 2023-07-21 2023-07-21 Large language model operation chain device and method based on tree structure

Country Status (1)

Country Link
CN (1) CN116975192A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349034A (en) * 2023-12-05 2024-01-05 创意信息技术股份有限公司 Hierarchical loading method and device for large language model
CN117349034B (en) * 2023-12-05 2024-02-23 创意信息技术股份有限公司 Hierarchical loading method and device for large language model

Similar Documents

Publication Publication Date Title
WO2021047286A1 (en) Text processing model training method, and text processing method and apparatus
Liang Learning executable semantic parsers for natural language understanding
Pichotta et al. Using sentence-level LSTM language models for script inference
CN112988785A (en) SQL conversion method and system based on language model coding and multitask decoding
CN110287482B (en) Semi-automatic participle corpus labeling training device
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN116975192A (en) Large language model operation chain device and method based on tree structure
CN107662617A (en) Vehicle-mounted interactive controlling algorithm based on deep learning
CN112000770A (en) Intelligent question and answer oriented sentence-to-sentence matching method based on semantic feature map
CN110378489A (en) Representation of knowledge learning model based on the projection of entity hyperplane
WO2021139233A1 (en) Method and apparatus for generating data extension mixed strategy, and computer device
Zhao et al. Synchronously improving multi-user English translation ability by using AI
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
Khayut et al. Modeling of intelligent system thinking in complex adaptive systems
CN110297894A (en) A kind of Intelligent dialogue generation method based on auxiliary network
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
CN116932776A (en) Knowledge graph-based large model knowledge updating method and device
CN110888944A (en) Attention convolution neural network entity relation extraction method based on multiple convolution window sizes
CN111104508A (en) Method, system and medium for representing word bag model text based on fault-tolerant rough set
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
CN102270190B (en) Computer modeling and solving processing method of complex decision-making problem
Mok et al. Scaling understanding up to mental spaces
Li et al. Semantic understanding processing model based on machine learning
CN112905806B (en) Knowledge graph materialized view generator based on reinforcement learning and generation method
CN117056459B (en) Vector recall method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination