CN117971420A - Task processing, traffic task processing and task processing model training method - Google Patents

Task processing, traffic task processing and task processing model training method

Info

Publication number
CN117971420A
CN117971420A (application number CN202311839740.XA)
Authority
CN
China
Prior art keywords
processing
model
unit
task
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311839740.XA
Other languages
Chinese (zh)
Inventor
呼思乐
王逸群
张永岗
沈旭
叶杰平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Alibaba Cloud Feitian Information Technology Co ltd
Original Assignee
Hangzhou Alibaba Cloud Feitian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Alibaba Cloud Feitian Information Technology Co., Ltd.
Priority: CN202311839740.XA
Publication: CN117971420A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval; Database Structures and File System Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a task processing method, a traffic task processing method and a task processing model training method. The task processing method includes: acquiring data to be processed for a target task; and inputting the data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result. By determining the key processing units in the initial processing model based on the difference between the model processing result and the reverse model processing result, the processing units that determine the model's decision process are located, which improves the interpretability and transparency of the model and, in turn, the interpretability of the task processing result.

Description

Task processing, traffic task processing and task processing model training method
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a task processing method, a traffic task processing method and a task processing model training method.
Background
With the development of computer technology, artificial intelligence models have begun to shine, demonstrating remarkable capabilities in language understanding, generation, interaction and reasoning, and have been widely applied in fields such as dialogue, translation and code generation.
However, the decision-making and interpretation mechanisms of artificial intelligence models remain unclear. The ability to interpret an artificial intelligence model is critical to trust in and acceptance of the modeling process: understanding the model's reasoning process makes it easier for users and stakeholders to accept and trust its results. A task processing scheme with high interpretability is therefore needed.
Disclosure of Invention
In view of this, the embodiments of this specification provide a task processing method. One or more embodiments of this specification also relate to a traffic task processing method, a task processing model training method, a task processing device, a traffic task processing device, a task processing model training device, a computing device, a computer-readable storage medium and a computer program, so as to solve the technical drawbacks of the prior art.
According to a first aspect of embodiments of the present specification, there is provided a task processing method, including:
Acquiring data to be processed aiming at a target task;
Inputting the data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result.
According to a second aspect of embodiments of the present specification, there is provided a traffic task processing method, including:
Acquiring traffic data to be processed aiming at a target traffic task;
Inputting the traffic data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result.
According to a third aspect of the embodiments of this specification, there is provided a task processing model training method, applied to a cloud-side device, including:
responding to a model training request aiming at a task processing model, and acquiring a model processing result of an initial processing model and unit processing results of a plurality of processing units in the initial processing model;
for a first processing unit, fixing the unit processing results of the processing units other than the first processing unit, and reversely adjusting the unit processing result of the first processing unit to obtain a reverse model processing result output by the initial processing model, where the first processing unit is any one of the plurality of processing units;
determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result;
determining key processing units in the initial processing model according to the unit weights of the plurality of processing units;
and adjusting unit parameters of the key processing units to obtain a task processing model after training is completed.
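The localisation steps above (fix every other unit's processing result, reversely adjust one unit's result, and compare the normal and reverse model outputs) can be sketched in Python. This is a hypothetical toy: the "model" simply sums its unit outputs through a residual stream, and names such as `reverse_adjust` and `unit_weights` are illustrative assumptions, not the patent's actual implementation.

```python
# Toy sketch of key-unit localisation by reverse adjustment (illustrative only).

def forward_from_units(unit_outputs, residual):
    """Combine fixed unit processing results with the residual stream."""
    return residual + sum(unit_outputs)

def reverse_adjust(value):
    """Reversely adjust a unit's processing result (here: simple negation)."""
    return -value

def unit_weights(unit_outputs, residual):
    """Unit weight = |model result - reverse model result| per unit."""
    normal = forward_from_units(unit_outputs, residual)
    weights = []
    for i in range(len(unit_outputs)):
        patched = list(unit_outputs)      # fix all other units' results
        patched[i] = reverse_adjust(patched[i])
        reverse_result = forward_from_units(patched, residual)
        weights.append(abs(normal - reverse_result))
    return weights

outputs = [1, 25, 3]                      # toy per-unit processing results
weights = unit_weights(outputs, residual=10)
key_unit = max(range(len(weights)), key=weights.__getitem__)
print(weights, key_unit)                  # the unit with the largest weight is "key"
```

Under this toy setup, reversing the second unit perturbs the model output far more than the others, so it would be selected as the key processing unit for parameter adjustment.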
According to a fourth aspect of embodiments of the present specification, there is provided a task processing device including:
the first acquisition module is configured to acquire data to be processed for a target task;
The first input module is configured to input the data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result.
According to a fifth aspect of embodiments of the present specification, there is provided a traffic task processing device including:
The second acquisition module is configured to acquire traffic data to be processed aiming at a target traffic task;
The second input module is configured to input the traffic data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result.
According to a sixth aspect of the embodiments of this specification, there is provided a task processing model training apparatus, applied to a cloud-side device, including:
The third acquisition module is configured to respond to a model training request aiming at the task processing model and acquire model processing results of the initial processing model and unit processing results of a plurality of processing units in the initial processing model;
The first adjusting module is configured to fix unit processing results of processing units other than the first processing unit for the first processing unit, and reversely adjust the unit processing results of the first processing unit to obtain reverse model processing results output by the initial processing model, wherein the first processing unit is any one of a plurality of processing units;
A first determining module configured to determine a unit weight of the first processing unit according to the model processing result and the inverse model processing result;
A second determining module configured to determine key processing units in the initial processing model according to unit weights of the plurality of processing units;
and the second adjusting module is configured to adjust the unit parameters of the key processing unit to obtain a task processing model with completed training.
According to a seventh aspect of embodiments of the present specification, there is provided a computing device comprising:
A memory and a processor;
The memory is configured to store computer executable instructions that, when executed by the processor, implement the steps of the methods provided in the first, second or third aspects above.
According to an eighth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which when executed by a processor implement the steps of the method provided in the first or second or third aspects above.
According to a ninth aspect of embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the method provided in the first or second or third aspect described above.
The task processing method provided by an embodiment of this specification includes: acquiring data to be processed for a target task; and inputting the data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result. Because the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, the processing units that determine the model's decision process are located, which improves the interpretability and transparency of the model and, in turn, the interpretability of the task processing result. Moreover, the task processing model is obtained by adjusting the parameters of only the key processing units, without adjusting the unit parameters of all units of the initial processing model, which improves the training efficiency of the task processing model.
Drawings
FIG. 1 is an architecture diagram of a task processing system provided in one embodiment of the present description;
FIG. 2 is an architecture diagram of another task processing system provided by one embodiment of the present description;
FIG. 3 is a flow chart of a method of task processing provided in one embodiment of the present disclosure;
FIG. 4 is a training flowchart of a task processing model in a task processing method according to an embodiment of the present disclosure;
FIG. 5 is a process flow diagram of a task processing method provided by one embodiment of the present disclosure;
FIG. 6 is a process flow diagram of another task processing method provided by one embodiment of the present disclosure;
FIG. 7 is a flow chart of a traffic task processing method provided by one embodiment of the present disclosure;
FIG. 8 is a flow chart of a task processing model training method provided in one embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a task processing device according to one embodiment of the present disclosure;
FIG. 10 is a schematic view of a traffic task processing device according to one embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a task processing model training device according to an embodiment of the present disclosure;
FIG. 12 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; this specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
Furthermore, it should be noted that the user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) involved in one or more embodiments of this specification are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to grant or deny authorization.
In one or more embodiments of this specification, a large model refers to a deep learning model with large-scale parameters, typically ranging from hundreds of millions to hundreds of billions or even trillions of model parameters. A large model is also called a Foundation Model: it is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with more than a hundred million parameters that can adapt to a wide range of downstream tasks and has good generalization ability, such as a large language model (LLM, Large Language Model) or a multimodal pre-trained model.
In practical applications, a pre-trained large model can be adapted to different tasks by fine-tuning with a small number of samples. Large models are widely used in fields such as natural language processing (NLP, Natural Language Processing) and computer vision, in particular in computer-vision tasks such as visual question answering (VQA, Visual Question Answering), image description (IC) and image generation, and in natural-language-processing tasks such as text-based sentiment classification, text summarization and machine translation. The main application scenarios of large models include digital assistants, intelligent robots, search, online education, office software, e-commerce, intelligent design and so on.
First, terms related to one or more embodiments of the present specification will be explained.
Chain of thought: a chain of thought (CoT, Chain-of-Thought) is a series of logically related reasoning steps that together form a complete reasoning process. Chains of thought are commonly used in prompt learning for large models: the model's reasoning is decomposed into steps and displayed explicitly, prompting the large model to produce its reasoning process on a reasoning task before outputting the answer, which improves the accuracy of the large model on reasoning tasks.
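As a concrete illustration of the chain-of-thought format described above, the following sketch builds a prompt in which the reasoning steps are written out before the answer. The question, wording and step layout are invented examples, not taken from the patent.

```python
# Illustrative chain-of-thought (CoT) prompt construction (assumed format).
question = "A farm has 3 hens and each hen lays 2 eggs a day. How many eggs in 5 days?"

cot_prompt = (
    f"Question: {question}\n"
    "Let's think step by step:\n"
    "1. Eggs per day = 3 hens x 2 eggs = 6 eggs.\n"   # intermediate reasoning step
    "2. Eggs in 5 days = 6 x 5 = 30 eggs.\n"          # second reasoning step
    "Answer: 30"                                       # answer output last
)
print(cot_prompt)
```

Presenting the intermediate steps before the answer is what distinguishes a chain-of-thought prompt from a plain question-answer prompt.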
The advent of large models has attracted widespread attention, yet the decision-making and interpretation mechanisms of these models are still unclear, and the ability to interpret large models is critical to establishing trust and acceptance. Understanding a model's reasoning process makes it easier for users and stakeholders to accept and trust its results.
Interpretability is one of the key factors in assessing and improving model performance. By studying a model's prediction process in depth, researchers and developers can reveal and resolve possible biases or errors, which is critical to ensuring the fairness and accuracy of the model. For example, in fields such as security and regulation, the use of artificial intelligence is tightly regulated and algorithms are required to provide the reasons for their decisions; interpretability methods help meet regulatory and ethical requirements and ensure that the model's decision process is reliable and explainable.
On this basis, the embodiments of this specification aim to provide a preliminary exploration of model interpretability: studying the internal mechanism by which a model completes chain-of-thought reasoning, locating and interpreting the key processing units related to the chain-of-thought reasoning capability in the model, and verifying the behavior patterns of the key processing units when completing reasoning tasks. This lays a foundation for developing transparent, interpretable and reliable large models, helps improve trust in model outputs, supports correcting biases and errors in the model, meets regulatory and ethical requirements, and provides guidance and inspiration for future research and development.
Specifically, the embodiments of this specification provide a task processing method that acquires data to be processed for a target task and inputs the data to be processed into a task processing model to obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result. Because the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, the processing units that determine the model's decision process are located, which improves the interpretability and transparency of the model and, in turn, the interpretability of the task processing result. Moreover, the task processing model is obtained by adjusting the parameters of only the key processing units, without adjusting the unit parameters of all units of the initial processing model, which improves the training efficiency of the task processing model.
In this specification, a task processing method, a traffic task processing method, a task processing model training method, a task processing device, a traffic task processing device, a task processing model training device, a computing device and a computer-readable storage medium are provided, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 illustrates an architecture diagram of a task processing system provided in one embodiment of the present disclosure, where the task processing system may include a client 100 and a server 200;
A client 100, configured to send data to be processed for a target task to a server 200;
The server 200 is configured to input the data to be processed into a task processing model and obtain a task processing result output by the task processing model, where the task processing model is obtained by adjusting the parameters of a key processing unit in an initial processing model, and the key processing unit is determined according to the difference between a model processing result of the initial processing model and a reverse model processing result; and to send the task processing result to the client 100;
The client 100 is further configured to receive a task processing result sent by the server 200.
By applying the solution of this embodiment, the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, so the processing units that determine the model's decision process are located, which improves the interpretability and transparency of the model and, in turn, the interpretability of the task processing result. Moreover, because the task processing model is obtained by adjusting the parameters of only the key processing units, without adjusting the unit parameters of all units of the initial processing model, the training efficiency of the task processing model is improved.
Referring to fig. 2, fig. 2 illustrates an architecture diagram of another task processing system provided in one embodiment of the present disclosure, where the task processing system may include a plurality of clients 100 and a server 200, where the clients 100 may include an end-side device and the server 200 may include a cloud-side device. Communication connection can be established between the plurality of clients 100 through the server 200, in a task processing scenario, the server 200 is used to provide task processing services between the plurality of clients 100, and the plurality of clients 100 can respectively serve as a transmitting end or a receiving end, so that communication is realized through the server 200.
The user may interact with the server 200 through the client 100 to receive data transmitted by other clients 100, to transmit data to other clients 100, and so on. In the task processing scenario, the user may publish a data stream to the server 200 through the client 100, and the server 200 generates a task processing result according to the data stream and pushes it to the other clients with which communication has been established.
Wherein, the client 100 and the server 200 establish a connection through a network. The network provides a medium for a communication link between client 100 and server 200. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The data transmitted by the client 100 may need to be encoded, transcoded, compressed, etc. before being distributed to the server 200.
The client 100 may be a browser, an APP (Application), a web application such as an H5 (HTML5, HyperText Markup Language version 5) application, a light application (also called an applet, a lightweight application) or a cloud application, and may be developed based on a software development kit (SDK, Software Development Kit) of the corresponding service provided by the server 200, such as an SDK for real-time communication (RTC, Real Time Communication). The client 100 may be deployed in an electronic device and may need to run depending on the device or on some APP in the device. The electronic device may, for example, have a display screen and support information browsing, and may be a personal mobile terminal such as a mobile phone, a tablet computer or a personal computer. Various other types of applications are also commonly deployed in electronic devices, such as human-machine dialogue applications, model training applications, text processing applications, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients and social platform software.
The server 200 may include a server that provides various services, such as a server that provides communication services for multiple clients, a background training server that supports a model used on a client, or a server that processes data sent by a client. It should be noted that the server 200 may be implemented as a distributed server cluster formed by multiple servers, or as a single server. The server may also be a server of a distributed system or a server combined with a blockchain. The server may also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN, Content Delivery Network), big data and artificial intelligence platforms, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.
It should be noted that, the task processing method provided in the embodiments of the present disclosure is generally executed by the server, but in other embodiments of the present disclosure, the client may also have a similar function to the server, so as to execute the task processing method provided in the embodiments of the present disclosure. In other embodiments, the task processing method provided in the embodiments of the present disclosure may be performed by the client and the server together.
Referring to fig. 3, fig. 3 shows a flowchart of a task processing method according to an embodiment of the present disclosure, which specifically includes the following steps:
step 302: and acquiring data to be processed aiming at the target task.
In one or more embodiments of the present disclosure, task processing may be performed on to-be-processed data of a target task, so as to obtain a task processing result corresponding to the to-be-processed data.
Specifically, the target task may be a task in different scenarios, such as an inference task, a traffic task, a question-answering task or a retrieval task. The data to be processed may be in different formats, such as speech data, text data or video data, and may also be in different languages, such as English data or Chinese data.
In practical applications, there are various ways of obtaining the data to be processed for the target task, and the method is specifically selected according to the practical situation, which is not limited in any way in the embodiment of the present specification. In one possible implementation manner of the present disclosure, data to be processed for a target task sent by a user may be received. In another possible implementation manner of the present specification, the data to be processed for the target task may be read from other data acquisition devices or databases.
Step 304: inputting data to be processed into a task processing model to obtain a task processing result output by the task processing model, wherein the task processing model is obtained by carrying out parameter adjustment on a key processing unit in an initial processing model, and the key processing unit is obtained by carrying out difference between a model processing result of the initial processing model and a reverse model processing result.
By applying the solution of this embodiment, the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, so the processing units that determine the model's decision process are located, which improves the interpretability and transparency of the model and, in turn, the interpretability of the task processing result. Moreover, because the task processing model is obtained by adjusting the parameters of only the key processing units, without adjusting the unit parameters of all units of the initial processing model, the training efficiency of the task processing model is improved.
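The efficiency gain from adjusting only the key processing units can be sketched as a selective update: non-key unit parameters stay frozen while key-unit parameters take gradient steps. The parameter layout, toy gradient and learning rate below are illustrative assumptions, not the patent's training procedure.

```python
# Minimal sketch of tuning only the key processing units (illustrative only).
params = {"unit_0": 0.5, "unit_1": -1.2, "unit_2": 0.8}  # toy unit parameters
key_units = {"unit_1"}                                   # units located as "key"

def grad(value):
    """Toy gradient that pulls a parameter towards zero."""
    return 2.0 * value

lr = 0.1
for name in params:
    if name in key_units:                # only key-unit parameters are updated
        params[name] -= lr * grad(params[name])
    # non-key units are frozen: no update is applied

print(params)
```

In a real framework this corresponds to marking only the key units' parameters as trainable, so far fewer parameters participate in backpropagation than in full fine-tuning.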
In an alternative embodiment of the present disclosure, referring to fig. 4, fig. 4 shows a training flowchart of a task processing model in a task processing method according to an embodiment of the present disclosure, before the data to be processed is input into the task processing model to obtain a task processing result output by the task processing model, the method may further include the following steps 402 to 408:
Step 402: and obtaining a model processing result of the initial processing model and unit processing results of a plurality of processing units in the initial processing model.
Specifically, the initial processing model is an artificial intelligence model, which may be a large model or another pre-trained neural network model. The model processing result refers to the output obtained by the initial processing model processing the input data; for example, if the input data is the question "Where does the sun rise on Earth?", the model processing result is the reply "The sun rises in the east.". A processing unit is a unit that processes the model input data, including but not limited to an attention unit and a linear mapping unit. The unit processing result depends on the processing unit; for example, if the processing unit is an attention head, the unit processing result is an attention vector.
In practical applications, there are various ways of obtaining the model processing results of the initial processing model and the unit processing results of the plurality of processing units in the initial processing model, and the method is specifically selected according to the actual situation, which is not limited in any way in the embodiment of the present specification.
In one possible implementation manner of the present specification, the model processing results of the initial processing model and the unit processing results of a plurality of processing units in the initial processing model may be read from other data acquisition devices or databases.
In another possible implementation manner of the present disclosure, the model processing result and the unit processing results of the plurality of processing units may be generated in real time by the initial processing model, where the initial processing model includes a plurality of processing units, a residual unit, and a multi-layer neural network. In this case, obtaining the model processing result of the initial processing model and the unit processing results of the plurality of processing units in the initial processing model may include the following steps:
Obtaining fact data;
Processing the fact data through a plurality of processing units to obtain unit processing results of the plurality of processing units;
And mapping the unit processing result through the residual unit and the multi-layer neural network to obtain a model processing result of the initial processing model.
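The three steps above can be sketched in pure Python with toy stand-ins (the unit functions, residual combination, and `mlp` mapping below are hypothetical illustrations, not the actual model):

```python
# Toy sketch: run fact data through several processing units, cache each
# unit's processing result, then combine the results through a residual
# connection and map them to a model processing result. All functions and
# names here are illustrative stand-ins, not a real model.

def collect_unit_results(fact_embedding, units):
    """Run each processing unit and record its unit processing result."""
    return {name: unit(fact_embedding) for name, unit in units.items()}

def model_processing_result(fact_embedding, unit_results, mlp):
    """Residual unit: input plus the sum of unit results, then the MLP."""
    residual = fact_embedding + sum(unit_results.values())
    return mlp(residual)

# Hypothetical processing units (e.g. attention heads) as simple functions.
units = {
    "head_0.0": lambda x: 0.5 * x,
    "head_0.1": lambda x: 0.25 * x,
}
mlp = lambda x: 2 * x  # stand-in for the multi-layer neural network

fact_embedding = 1.0
unit_results = collect_unit_results(fact_embedding, units)
result = model_processing_result(fact_embedding, unit_results, mlp)
```

In a real model the cached unit results would be intermediate activations captured during the forward pass; caching them here is what later makes it possible to freeze or replace individual unit results.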
In particular, the fact data may include a question to be inferred and inference data for that question, and an answer conforming to objective facts may be obtained based on the fact data. The fact data may be in the format <question, reason, answer>. The inference data can be understood as a forward reasoning idea and a forward reasoning process. The fact data may also be referred to as reference data, because the model processing result and unit processing results derived from the fact data are compared against the reverse model processing result to determine the criticality of each processing unit.
It should be noted that, the manner of obtaining the fact data is various, and is specifically selected according to the actual situation, and the embodiment of the present disclosure is not limited in any way. In one possible implementation of the present description, fact data sent by a user may be received. In another possible implementation of the present description, the fact data may be read from other data acquisition devices or databases.
By applying the scheme of the embodiment of the specification, the fact data is acquired; the fact data is processed by the plurality of processing units to obtain their unit processing results; and the unit processing results are mapped by the residual unit and the multi-layer neural network to obtain the model processing result of the initial processing model. Because the unit processing results and the model processing result are generated from the fact data, failure to locate the key processing units due to inaccurate model processing results is avoided, ensuring that the key processing units can subsequently be determined accurately by result comparison.
In an alternative embodiment of the present specification, the fact data may be constructed using the in-context learning capability of a pre-trained language model to elicit the reasoning capability of the initial processing model; that is, the above-mentioned obtaining of the fact data may include the following steps:
Acquiring a problem to be inferred;
inputting the problem to be inferred and the inference prompt information into a pre-training language model to obtain inference data corresponding to the problem to be inferred;
and constructing fact data according to the questions to be inferred and the inference data.
Specifically, the pre-training language model may be a large model, or may be a neural network model of natural language that is trained based on a plurality of sample questions and inference data labels corresponding to the sample questions. The reasoning prompt information is used for guiding the pre-training language model to generate forward reasoning data. The reasoning prompt information comprises a reasoning prompt template for prompting tasks to be executed by the pre-training language model and also comprises a reasoning question-answer pair example for learning by the pre-training language model, wherein the reasoning question-answer pair example comprises a reasoning process and a reasoning answer.
It should be noted that, there are various ways of obtaining the problem to be inferred, and the method is specifically selected according to the actual situation, which is not limited in any way in the embodiment of the present disclosure. In one possible implementation manner of the present disclosure, a question to be inferred sent by a user may be received. In another possible implementation manner of the present specification, the problem to be inferred may be read from other data acquisition devices or databases.
Illustratively, assume that the question to be inferred is "What home entertainment device requires a cable? Answer choices: (a) a radio booth; (b) a substation; (c) a cabinet; (d) a television; (e) a desk". The reasoning prompt information and the question to be inferred are input into the pre-trained language model, and the inference data obtained for the question is "Of the choices, only a television needs a cable. Thus, the answer is (d)."
By applying the scheme of the embodiment of the specification, the problem to be inferred is obtained; inputting the problem to be inferred and the inference prompt information into a pre-training language model to obtain inference data corresponding to the problem to be inferred; according to the problem to be inferred and the reasoning data, the fact data is constructed, the learning capacity in the text of the pre-training language model is utilized, and the accuracy of the fact data is ensured.
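The prompt construction described above can be sketched as follows (the template wording, field names, and example content are hypothetical illustrations of a reasoning prompt with one reasoning question-answer pair example):

```python
# Sketch of building the reasoning prompt information: a reasoning prompt
# template plus few-shot reasoning question-answer pair examples for
# in-context learning. All wording below is illustrative.

def build_reasoning_prompt(question, examples, template):
    shots = "\n\n".join(
        "Question: {q}\nReasoning: {r}\nAnswer: {a}".format(
            q=ex["question"], r=ex["reasoning"], a=ex["answer"])
        for ex in examples
    )
    return f"{template}\n\n{shots}\n\nQuestion: {question}\nReasoning:"

examples = [{
    "question": "What home entertainment device requires a cable? "
                "(a) radio booth (b) substation (c) cabinet "
                "(d) television (e) desk",
    "reasoning": "Of the choices, only a television needs a cable.",
    "answer": "(d)",
}]
prompt = build_reasoning_prompt(
    "Where does the sun rise on the earth?",
    examples,
    "Answer the question step by step, giving your reasoning first.")
```

The pre-trained language model would then complete the trailing "Reasoning:" slot, yielding the inference data from which the fact data triple <question, reason, answer> is assembled.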
Step 404: and fixing unit processing results of processing units other than the first processing unit aiming at the first processing unit, and reversely adjusting the unit processing results of the first processing unit to obtain a reverse model processing result output by the initial processing model, wherein the first processing unit is any one of a plurality of processing units.
It should be noted that, since the plurality of processing units in the initial processing model form different inference paths, in order to determine whether each processing unit is a key processing unit, a path patching approach may be used to apply a causal intervention to the inference path where the first processing unit is located; that is, the unit processing result is reversely adjusted, and the disturbance propagates along that inference path to the output node, yielding the reverse model processing result. When the unit processing result of the first processing unit is reversely adjusted, in order to observe the influence of the first processing unit on the model output in isolation, the unit processing results of the processing units other than the first processing unit are fixed, so that the causal disturbance is applied only to the unit processing result of the first processing unit.
In practical applications, there are various ways of reversely adjusting the unit processing result of the first processing unit, and the method is specifically selected according to the practical situation, which is not limited in any way in the embodiment of the present disclosure. In one possible implementation manner of the present disclosure, noise may be added to a unit processing result to obtain a reverse unit processing result that has a significant difference from the unit processing result, and the reverse unit processing result is propagated through an initial processing model to obtain a reverse model processing result.
In another possible implementation manner of the present disclosure, counterfactual data may be used to reversely adjust the unit processing result of the first processing unit; that is, reversely adjusting the unit processing result of the first processing unit to obtain the reverse model processing result output by the initial processing model may include the following steps:
Obtaining counterfactual data;
Processing the counterfactual data through the first processing unit to obtain a reverse unit processing result of the first processing unit;
and replacing the unit processing result of the first processing unit with the reverse unit processing result, and performing propagation processing on the reverse unit processing result through the initial processing model to obtain a reverse model processing result.
Specifically, the counterfactual data includes the question to be inferred and reverse inference data for that question; an erroneous answer to the question can be obtained based on the counterfactual data. The reverse inference data can be understood as a reverse reasoning idea and a reverse reasoning process.
It should be noted that there are various ways of obtaining the counterfactual data, selected according to the actual situation, which the embodiment of the present disclosure does not limit in any way. In one possible implementation of the present description, counterfactual data sent by a user may be received. In another possible implementation, the counterfactual data may be read from other data acquisition devices or databases.
Referring to fig. 5, fig. 5 shows a process flow chart of a task processing method provided in one embodiment of the present specification. As shown in fig. 5, a line represents an inference path of the initial processing model, where the initial processing model includes a plurality of residual units, an output unit, a multi-layer neural network 1 (multi-layer perceptron, MLP), an attention head 1.0, an attention head 1.1, a multi-layer neural network 0, an attention head 0.0, and an attention head 0.1. Assume that it is currently being determined whether attention head 0.0 is a key processing unit. First, the embedding vector of the last token is input into the initial processing model, and the unit processing results of all processing units (multi-layer neural networks and attention heads) are collected. Then, the unit processing results of the other attention heads (e.g. attention head 1.0, attention head 1.1) on the fact data are frozen, and a causal disturbance is applied to attention head 0.0 by replacing its original unit processing result with the reverse unit processing result (derived from the counterfactual data); the disturbance effect propagates along the inference path shown by the solid line to the output unit, yielding the reverse model processing result. Finally, whether attention head 0.0 is a key processing unit is determined based on the model processing result (obtained from the fact data) and the reverse model processing result.
Note that, to ensure that the influence of attention head 0.0 is observed in isolation, the inference path shown by the solid line includes the forward path through the residual unit connections and the MLPs, but does not include the other attention heads.
By applying the scheme of the embodiment of the specification, the counterfactual data is obtained; the counterfactual data is processed by the first processing unit to obtain the reverse unit processing result of the first processing unit; the unit processing result of the first processing unit is replaced with the reverse unit processing result, and the reverse unit processing result is propagated through the initial processing model to obtain the reverse model processing result, thereby ensuring the accuracy of the reverse model processing result.
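The replace-and-propagate step described above can be sketched with the same kind of toy stand-ins (all names and values below are hypothetical, not the actual model):

```python
# Path-patching sketch (toy, pure Python): freeze every cached unit
# processing result from the fact-data run except the patched unit,
# substitute the reverse unit processing result derived from the
# counterfactual data, and propagate through the rest of the model to
# obtain the reverse model processing result.

def path_patch(fact_embedding, cached_results, patched_unit,
               reverse_unit_result, mlp):
    patched = dict(cached_results)               # frozen fact-run results
    patched[patched_unit] = reverse_unit_result  # causal disturbance
    residual = fact_embedding + sum(patched.values())
    return mlp(residual)

cached = {"head_0.0": 0.5, "head_0.1": 0.25}  # from the fact-data run
mlp = lambda x: 2 * x                         # stand-in output mapping
reverse_result = path_patch(1.0, cached, "head_0.0",
                            reverse_unit_result=-0.5, mlp=mlp)
```

Copying the cached dictionary before patching keeps the fact-run results intact, so the same cache can be reused when each remaining processing unit takes its turn as the first processing unit.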
In an alternative embodiment of the present specification, the counterfactual data may be generated based on the fact data; that is, the above-mentioned obtaining of the counterfactual data may include the following steps:
obtaining fact data, wherein the fact data comprises reasoning data;
and adjusting the inference data into replacement data irrelevant to the inference data to obtain the counterfactual data.
Illustratively, referring to the above question to be inferred, "What home entertainment device requires a cable? Answer choices: (a) a radio booth; (b) a substation; (c) a cabinet; (d) a television; (e) a desk", the inference data "Of the choices, only a television needs a cable. Thus, the answer is" is adjusted into the replacement data "Stubborn dolphins jump and play in the flashing blue ocean. The cat sleeps calmly on a comfortable mattress. The answer is", yielding the counterfactual data.
In practical application, before the inference data is adjusted into replacement data irrelevant to the inference data to obtain the counterfactual data, the replacement data must be obtained. There are various ways of obtaining the replacement data, selected according to the actual situation, which the embodiment of the present disclosure does not limit. In one possible implementation, replacement data sent by a user may be received. In another possible implementation, the replacement data may be read from other data acquisition devices or databases.
By applying the scheme of the embodiment of the specification, the fact data is acquired, where the fact data includes inference data; the inference data is adjusted into replacement data irrelevant to the inference data to obtain the counterfactual data, which ensures the accuracy of the counterfactual data and thereby the accuracy of the reverse model processing result.
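The counterfactual construction above amounts to keeping the question while swapping the forward inference data for irrelevant text; a minimal sketch (field names are hypothetical):

```python
# Counterfactual-construction sketch: keep the question to be inferred,
# replace the forward inference data with irrelevant replacement data.
# The dictionary fields below are an illustrative encoding of the
# <question, reason, answer> format.

def make_counterfactual(fact_data, replacement_text):
    counterfactual = dict(fact_data)          # copy, keep fact data intact
    counterfactual["reasoning"] = replacement_text
    return counterfactual

fact = {
    "question": "What home entertainment device requires a cable?",
    "reasoning": "Of the choices, only a television needs a cable.",
    "answer": "(d) television",
}
cf = make_counterfactual(
    fact,
    "Stubborn dolphins jump and play in the flashing blue ocean. "
    "The cat sleeps calmly on a comfortable mattress.")
```

Because only the reasoning field changes, any difference in the downstream model output can be attributed to the disturbed reasoning content rather than to a changed question.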
Step 406: and determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result.
Specifically, the unit weights are used to characterize the importance of the first processing unit in the initial processing model reasoning process. The first processing unit is subjected to causal disturbance to obtain a reverse model processing result, the model processing result and the reverse model processing result are further subjected to difference comparison, and if the difference between the reverse model processing result and the model processing result is larger, the unit weight of the first processing unit is larger, namely the first processing unit is important in the initial processing model reasoning process and is a key processing unit; if the difference between the reverse model processing result and the model processing result is smaller, the unit weight of the first processing unit is smaller, that is, the first processing unit is less important in the initial processing model reasoning process and is not a key processing unit.
In practical application, the mode of determining the unit weight of the first processing unit according to the model processing result and the inverse model processing result is various, and specifically, the mode is selected according to the practical situation, and the embodiment of the present disclosure does not limit the mode.
In one possible implementation manner of the present disclosure, the determining the unit weight of the first processing unit by observing the causal effect (causal effect) in the model processing result after the causal disturbance may include the following steps:
Analyzing the reverse model processing result, determining a first associated keyword in the reverse model processing result, analyzing the model processing result, and determining a second associated keyword in the model processing result, wherein the associated keyword is related to the problem to be inferred;
determining a first weight magnitude of the first processing unit according to the first associated keywords and the reverse model processing result, and determining a second weight magnitude of the first processing unit according to the second associated keywords and the model processing result;
the unit weight of the first processing unit is determined from the first weight magnitude and the second weight magnitude.
It should be noted that, since the word distribution in the model processing result involves a very large vocabulary, the resulting disturbance is very sparse. The embodiment of the specification therefore measures the disturbance effect by observing the change in probability of the associated keywords related to the question to be inferred, so as to rapidly locate the key processing units. The associated keywords can be understood as words of interest (WoI, Word-of-Interest), and the unit weights determined from them can be used to measure changes in the model's chain-of-thought reasoning ability.
In practical application, the weight magnitude may be determined by the following formula (1), and the rate of change from the second weight magnitude to the first weight magnitude may be used as the measure of the unit weight of the first processing unit; that is, the unit weight may be determined by the following formula (2):

t = P_gt / ΣP_cd (1)

s(h_i,j) = (t_1 - t_0) / t_0 (2)

where t represents the weight magnitude, P_gt represents the probability of the associated keyword, ΣP_cd represents the sum of the probabilities of the candidate words, h_i,j represents the first processing unit, s(h_i,j) represents the unit weight of the first processing unit, t_1 represents the first weight magnitude, and t_0 represents the second weight magnitude. If s(h_i,j) is negative, the relative confidence of the reverse model processing result is reduced, indicating that the reasoning performance of the initial processing model is impaired; the greater the drop, the greater the importance of the first processing unit.
By applying the scheme of the embodiment of the specification, analyzing the reverse model processing result, determining a first associated keyword in the reverse model processing result, analyzing the model processing result, and determining a second associated keyword in the model processing result, wherein the associated keyword is related to a problem to be inferred; determining a first weight magnitude of the first processing unit according to the first associated keywords and the reverse model processing result, and determining a second weight magnitude of the first processing unit according to the second associated keywords and the model processing result; and determining the unit weight of the first processing unit according to the first weight magnitude and the second weight magnitude, so that the disturbance effect is measured through probability change conditions of related keywords before and after causal disturbance, and the accuracy of the unit weight is ensured.
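A numeric sketch of the weight-magnitude and unit-weight computation referred to by formulas (1) and (2) above (all probability values are made up for illustration):

```python
# Unit-weight sketch: the weight magnitude t is the associated keyword's
# probability relative to the total probability mass of the candidate
# words; the unit weight is the rate of change of t after the causal
# disturbance. All probability values below are illustrative.

def weight_magnitude(p_keyword, candidate_probs):
    """Formula (1): t = P_gt / sum(P_cd)."""
    return p_keyword / sum(candidate_probs)

def unit_weight(t_patched, t_clean):
    """Formula (2): s = (t_1 - t_0) / t_0."""
    return (t_patched - t_clean) / t_clean

t0 = weight_magnitude(0.6, [0.6, 0.2, 0.2])  # second magnitude (clean run)
t1 = weight_magnitude(0.3, [0.3, 0.4, 0.3])  # first magnitude (patched run)
s = unit_weight(t1, t0)  # negative: reasoning performance impaired
```

Here the patched run halves the associated keyword's relative confidence, so the unit weight is negative, marking the perturbed unit as important to the reasoning process.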
In another possible implementation manner of the present disclosure, the determining the unit weight of the first processing unit according to the model processing result and the inverse model processing result may include the following steps:
Analyzing the reverse model processing result, and determining a first associated keyword in the reverse model processing result, wherein the associated keyword is related to the problem to be inferred;
inputting the first associated keywords into an initial processing model to obtain a prediction processing result output by the initial processing model;
and determining the unit weight of the first processing unit according to the model processing result and the prediction processing result.
It should be noted that, when the reverse model processing result is analyzed to determine the first associated keyword, the keyword at the last token position of the result sequence corresponding to the reverse model processing result may be used directly as the first associated keyword. Alternatively, the first associated keyword may be extracted directly from the reverse model processing result using a keyword locating function.
In practical application, because the same task is executed when the initial processing model generates the reverse model processing result and the model processing result, in order to avoid the influence of the task on the unit weight, after the first associated keyword is determined, the first associated keyword can be input into the initial processing model, and the initial processing model can freely generate the prediction processing result.
Further, when determining the unit weight of the first processing unit according to the model processing result and the prediction processing result, the model processing result and the prediction processing result can be directly compared, if the difference between the prediction processing result and the model processing result is larger, the unit weight of the first processing unit is larger, that is, the first processing unit is important in the initial processing model reasoning process and is a key processing unit; if the difference between the prediction processing result and the model processing result is smaller, the unit weight of the first processing unit is smaller, that is, the first processing unit is less important in the initial processing model reasoning process and is not a key processing unit. The cell weights of the first processing cell may also be generated using a pre-trained language model.
By applying the scheme of the embodiment of the specification, the reverse model processing result is analyzed to determine the first associated keyword, where the associated keyword is related to the question to be inferred; the first associated keyword is input into the initial processing model to obtain the prediction processing result output by the initial processing model; and the unit weight of the first processing unit is determined according to the model processing result and the prediction processing result. Because the prediction processing result is freely generated by the initial processing model rather than being limited to a particular task, this approach is more flexible.
In one possible implementation manner of the present disclosure, the determining the unit weight of the first processing unit according to the model processing result and the prediction processing result may include the following steps:
And inputting the weight generation prompt information, the model processing result and the prediction processing result into a pre-training language model to obtain the unit weight of the first processing unit.
Specifically, the weight generation prompt information is used to guide the pre-trained language model to generate the unit weight. For example, the weight generation prompt information may be "please score the degree of change between the model processing result and the prediction processing result, to obtain the unit weight".
By applying the scheme of the embodiment of the specification, the weight generation prompt information, the model processing result and the prediction processing result are input into the pre-training language model to obtain the unit weight of the first processing unit, so that the efficiency of obtaining the unit weight is improved.
Step 408: and determining key processing units in the initial processing model according to the unit weights of the plurality of processing units, and carrying out parameter adjustment on the key processing units to obtain the task processing model.
Specifically, the key processing unit refers to a processing unit, which is important or even indispensable to the initial processing model reasoning process, among the plurality of processing units. A critical processing unit may also be understood as a processing unit that has a decisive influence on the reasoning results of the initial processing model.
In practical applications, the manner of determining the key processing units in the initial processing model according to the unit weights of the plurality of processing units is various, and specifically, the method is selected according to the actual situation, which is not limited in any way in the embodiment of the present specification. In one possible implementation manner of the present disclosure, the unit weights of the plurality of processing units may be ordered, and the N processing units that are ordered first are determined as the key processing units. In another possible implementation manner of the present disclosure, a weight threshold may be obtained, and a processing unit with a unit weight greater than or equal to the weight threshold is determined as a critical processing unit. Wherein, N and the weight threshold value are set according to the actual situation.
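Both selection strategies described above (top-N ranking and thresholding) can be sketched as follows; the unit-weight values are illustrative and are treated here as importance scores, following the convention of this step:

```python
# Key-processing-unit selection sketch: rank units by unit weight and
# take the top N, or keep units whose weight meets a threshold. The
# weights, N, and threshold below are illustrative.

def top_n_key_units(unit_weights, n):
    ranked = sorted(unit_weights, key=unit_weights.get, reverse=True)
    return ranked[:n]

def threshold_key_units(unit_weights, threshold):
    return [u for u, w in unit_weights.items() if w >= threshold]

weights = {"head_0.0": 0.9, "head_0.1": 0.1,
           "head_1.0": 0.6, "head_1.1": 0.05}
top2 = top_n_key_units(weights, 2)
above = threshold_key_units(weights, 0.5)
```

Both strategies select the same two heads here; in practice N and the weight threshold are set according to the actual situation, as the text notes.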
By applying the scheme of the embodiment of the specification, the importance degree of the processing unit in the model decision process can be accurately determined by reversely adjusting the processing result of the unit and observing the change of the adjusted model processing result, so that the transparency and the trust degree of model output are improved, and the decision process of the model is ensured to be reliable and interpretable.
In an alternative embodiment of the present disclosure, after determining the key processing units in the initial processing model according to the unit weights of the plurality of processing units, the method may further include the following steps:
And carrying out criticality verification on the key processing unit, and determining a verification result of the key processing unit.
It should be noted that, the verification result refers to whether the key processing unit is the processing unit for determining the model decision process. After determining the critical processing units, the criticality (importance) of the critical processing units may be verified, while the non-criticality of other processing units may be confirmed, thereby ensuring the accuracy of the critical processing units. Meanwhile, the verification result of the key processing unit can be fed back to the user, so that the transparency of the model decision process is improved.
In practical application, the key processing unit is subjected to key verification, and various ways of determining the verification result of the key processing unit are selected according to practical situations, which is not limited in the embodiment of the present disclosure.
In one possible implementation manner of the present disclosure, the unit processing results of the key processing units may be fixed, and the unit processing results of the processing units other than the key processing units may be reversely adjusted to obtain a first verification result output by the initial processing model; if the first verification result differs little from the model processing result before the reverse adjustment, the key processing units are accurate key processing units. There are various ways of determining the difference between the first verification result and the model processing result, selected according to the actual situation, which the embodiment of the present specification does not limit in any way. In one possible implementation, the cosine similarity between the first verification result and the model processing result may be calculated; if the cosine similarity is greater than or equal to a preset verification threshold, the difference is determined to be small and the key processing unit is an accurate key processing unit, where the preset verification threshold is set according to the actual situation. In another possible implementation, the first verification result and the model processing result may be input into a difference comparison model to obtain the difference between the first verification result and the model processing result.
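The cosine-similarity check can be sketched as follows, assuming the two results are represented as vectors and that a higher similarity indicates a smaller difference (the vectors and threshold are illustrative):

```python
# Verification sketch: cosine similarity between the first verification
# result and the model processing result, compared against a preset
# verification threshold. Vectors and threshold are illustrative.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def passes_verification(first_verification, model_result, threshold=0.9):
    """Difference is small when similarity is at or above the threshold."""
    return cosine_similarity(first_verification, model_result) >= threshold

sim_same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
ok = passes_verification([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Identical vectors give similarity 1.0 and pass; orthogonal vectors give similarity 0 and fail, indicating that the frozen key processing units did not preserve the model's behavior.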
In another possible implementation manner of the present disclosure, the criticality of the key processing unit may be verified by a unit elimination (knockout) method, that is, the above-mentioned critical verification is performed on the key processing unit, and the determining the verification result of the key processing unit may include the following steps:
Screening a control processing unit from the plurality of processing units;
fixing unit processing results of processing units except the key processing units, and reversely adjusting the unit processing results of the key processing units to obtain reverse key processing results output by the initial processing model;
Fixing the unit processing results of the processing units except the control processing unit, and reversely adjusting the unit processing results of the control processing unit to obtain a reverse control processing result output by the initial processing model;
and determining the verification result of the key processing unit according to the reverse key processing result and the reverse comparison processing result.
Specifically, the number of control processing units randomly selected from the plurality of processing units may be one or more. If there is one control processing unit, it is a processing unit other than the key processing units. If there are several control processing units, they include at least one processing unit other than the key processing units. Preferably, the control processing units are all processing units other than the key processing units, and the number of control processing units matches the number of key processing units.
It should be noted that, the "fixing the unit processing results of the processing units other than the key processing unit" and reversely adjusting the unit processing results of the key processing unit to obtain the reverse key processing result output by the initial processing model; the implementation manner of fixing the unit processing results of the processing units other than the comparison processing unit and reversely adjusting the unit processing results of the comparison processing unit to obtain the reverse comparison processing result output by the initial processing model may refer to the implementation manner of fixing the unit processing results of the processing units other than the first processing unit and reversely adjusting the unit processing results of the first processing unit to obtain the reverse model processing result output by the initial processing model, which is not described in detail in the embodiments of the present specification.
In practical application, the unit processing results of the key processing units are reversely adjusted and, as a control group, the unit processing results of the control processing units are reversely adjusted; forward inference is then performed with the initial processing model to obtain the reverse key processing result and the reverse control processing result. If the accuracy of the reverse key processing result is reduced compared with the reverse control processing result, the reasoning capability of the initial processing model is degraded, and the key processing units are accurate key processing units.
It is worth noting that, after the key processing units are eliminated by the unit elimination method, it is found that the reasoning capability of the model is greatly reduced. Moreover, among the key processing units, some play a key role in judging the final answer, and some play a key role in synthesizing step-by-step thinking to obtain the answer, which corresponds to the two stages of the chain-of-thought reasoning process: first thinking step by step to obtain intermediate ideas, and then answering the question based on these ideas.
By applying the scheme of the embodiment of the specification, the unit processing results of the processing units other than the key processing unit are fixed, and the unit processing results of the key processing unit are reversely adjusted to obtain the reverse key processing result output by the initial processing model; the unit processing results of the processing units other than the control processing unit are fixed, and the unit processing results of the control processing unit are reversely adjusted to obtain the reverse control processing result output by the initial processing model; and the verification result of the key processing unit is determined according to the reverse key processing result and the reverse control processing result, thereby ensuring the accuracy of the verification result.
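The control-group check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `predict_with_reversed` is an assumed interface standing in for forward reasoning with certain units' results reversely adjusted, and the toy model and evaluation data are invented for the demonstration.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions matching the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def verify_key_units(predict_with_reversed, key_units, all_units, eval_set, seed=0):
    """Compare the accuracy drop from reversely adjusting the key units
    against a randomly drawn control group of the same size taken from the
    remaining units (the random selection described in the text)."""
    rng = random.Random(seed)
    non_key = [u for u in all_units if u not in key_units]
    control = rng.sample(non_key, k=min(len(key_units), len(non_key)))

    labels = [y for _, y in eval_set]
    acc_key = accuracy([predict_with_reversed(x, key_units) for x, _ in eval_set], labels)
    acc_ctrl = accuracy([predict_with_reversed(x, control) for x, _ in eval_set], labels)

    # The key units are verified when reversing them degrades accuracy
    # more than reversing the control group does.
    return acc_key < acc_ctrl, acc_key, acc_ctrl

# Toy model: unit 2 is genuinely key -- reversing it flips the answer.
def toy_predict(x, reversed_units):
    return -x if 2 in reversed_units else x

eval_set = [(1, 1), (2, 2), (3, 3)]
verified, acc_key, acc_ctrl = verify_key_units(toy_predict, [2], [0, 1, 2, 3], eval_set)
```

In this toy case, reversing the key unit drops accuracy to zero while the control group is unaffected, so the key unit is verified as accurate.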
Referring to fig. 6, fig. 6 is a flowchart illustrating a processing procedure of another task processing method according to an embodiment of the present disclosure. When the key processing units are verified by the unit rejection method, it is assumed that the reasoning capability of the initial processing model is not affected after the control processing units are rejected, but exhibits a significant degradation when the key processing units are rejected. In this case, it can be verified that the key processing units contribute to the reasoning task. As shown in fig. 6, in the unit rejection process, the unit processing result of each key processing unit under the fact data may be replaced with the reverse unit processing result corresponding to the counterfactual data. Then, the change in the answer predicted by the initial processing model before and after the unit rejection is observed and compared.
In an alternative embodiment of the present disclosure, after determining the key processing units in the initial processing model according to the unit weights of the plurality of processing units, the method may further include the following steps:
and adjusting unit parameters of the key processing unit to obtain an initial processing model after parameter adjustment.
After the key processing units in the initial processing model are determined, since the key processing units account for only part of the plurality of processing units in the initial processing model, adjusting only the unit parameters of the key processing units can improve the training speed of the initial processing model.
In practical applications, there are various ways of adjusting the unit parameters of the key processing unit, which are specifically selected according to the actual situation; the embodiments of the present disclosure are not limited in this regard. In one possible implementation manner of the present disclosure, a target unit parameter sent by a user may be obtained, and the unit parameter of the key processing unit is directly replaced with the target unit parameter to obtain the initial processing model after parameter adjustment. In another possible implementation manner of the present disclosure, a training sample set may be obtained, and a plurality of training samples in the training sample set are input into the initial processing model to obtain prediction results of the initial processing model. A loss value is calculated according to the real labels and the prediction results of the training samples, and the unit parameters of the key processing unit are adjusted based on the loss value until a preset stopping condition is reached, thereby obtaining the initial processing model after parameter adjustment, where the initial processing model after parameter adjustment may be understood as the initial processing model after training is completed.
By applying the scheme of the embodiment of the specification, the unit parameters of the key processing units are adjusted, the initial processing model with the adjusted parameters is obtained, and the training speed of the initial processing model is improved.
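The loss-based variant of key-unit parameter adjustment might be sketched as follows, assuming a toy linear model (output = sum of per-unit weight times input) in place of the initial processing model; only the key units' parameters are updated while every other unit parameter stays fixed.

```python
def finetune_key_units(unit_params, key_ids, samples, lr=0.1, steps=60):
    """Compute a squared-error loss on each training sample and update only
    the key units' parameters, leaving the remaining unit parameters fixed.
    The linear toy model is an illustrative stand-in, not the patented
    architecture."""
    params = dict(unit_params)
    for _ in range(steps):
        for x, y in samples:
            pred = sum(w * x for w in params.values())
            err = pred - y                      # gradient of (pred - y)^2 w.r.t. pred, up to factor 2
            for uid in key_ids:                 # adjust key units only
                params[uid] -= lr * 2 * err * x
    return params

# Only unit 1 (the key unit) moves; unit 0 keeps its original parameter.
tuned = finetune_key_units({0: 1.0, 1: 0.5}, key_ids=[1], samples=[(1.0, 2.0)])
```

Because the non-key parameter never enters the update loop, the parameter adjustment amount is smaller than full fine-tuning, which is the training-speed benefit the text describes.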
The following describes the task processing method provided in the present specification by taking an application of the task processing method in an intelligent traffic scenario as an example with reference to fig. 7. Fig. 7 shows a flow chart of a traffic task processing method according to an embodiment of the present disclosure, which specifically includes the following steps:
step 702: and acquiring traffic data to be processed aiming at the target traffic task.
Step 704: inputting the traffic data to be processed into a task processing model to obtain a task processing result output by the task processing model, wherein the task processing model is obtained by performing parameter adjustment on a key processing unit in an initial processing model, and the key processing unit is determined according to a difference between a model processing result of the initial processing model and a reverse model processing result.
It should be noted that, the implementation manners of the steps 702 to 704 may refer to the implementation manners of the steps 302 to 304, and the description of the embodiment of the present disclosure is omitted. The target traffic tasks include, but are not limited to, traffic route planning tasks, map planning tasks, and the like.
By applying the scheme of the embodiment of the specification, the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, so that the key processing units that determine the model decision process are identified, which improves the interpretability and transparency of the model and further improves the interpretability of the task processing result. In addition, the task processing model is obtained by adjusting the parameters of the key processing units only, without adjusting the unit parameters of all units of the initial processing model, which improves the training efficiency of the task processing model.
In an optional embodiment of the present disclosure, after the traffic data to be processed is input into the task processing model and the task processing result output by the task processing model is obtained, the method may include the following steps:
And receiving adjustment data sent by the user based on the task processing result, and adjusting model parameters of the task processing model according to the adjustment data.
It should be noted that, after the task processing result is obtained, the task processing result may be sent to the client, so that the client displays the task processing result to the user. There are various manners in which the client displays the task processing result to the user, which are specifically selected according to the actual situation; the embodiments of the present disclosure are not limited in this regard. In one possible implementation manner of the present disclosure, the task processing result may be directly displayed to the user. In another possible implementation manner of the present disclosure, the task processing result may be displayed to the user according to the display requirement information of the user. The display requirement information characterizes the user's requirement for viewing the task processing result, and includes, but is not limited to, displaying only the task processing result, displaying a storage path of the task processing result, or displaying both the data to be processed and the task processing result.
In practical applications, the user may not be satisfied with the task processing result, and at this time, adjustment data sent by the user based on the task processing result may be received, and model parameters of the task processing model may be adjusted according to the adjustment data. Wherein the adjustment data includes, but is not limited to, the adjusted task processing results. The process of adjusting the model parameters of the task processing model according to the adjustment data may refer to the training process of the task processing model, and will not be described in detail in the embodiment of the present specification.
By applying the scheme of the embodiment of the specification, the adjustment data sent by the user based on the task processing result is received, and the model parameters of the task processing model are adjusted according to the adjustment data, so that the task processing model can be updated based on the adjustment data fed back by the user, the accuracy of the task processing model is improved, and meanwhile, the user experience is improved.
Referring to fig. 8, fig. 8 shows a flowchart of a task processing model training method provided in an embodiment of the present disclosure, where an initial processing model training method is applied to cloud side equipment, and specifically includes the following steps:
Step 802: and responding to the model training request aiming at the task processing model, and acquiring the model processing result of the initial processing model and the unit processing results of a plurality of processing units in the initial processing model.
Step 804: and fixing unit processing results of processing units other than the first processing unit aiming at the first processing unit, and reversely adjusting the unit processing results of the first processing unit to obtain a reverse model processing result output by the initial processing model, wherein the first processing unit is any one of a plurality of processing units.
Step 806: and determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result.
Step 808: the key processing units in the initial processing model are determined based on the unit weights of the plurality of processing units.
Step 810: and adjusting unit parameters of the key processing units to obtain a task processing model after training is completed.
It should be noted that, the implementation manner of steps 802 to 810 may refer to the implementation manner of steps 402 to 408, and the description of the embodiment of the present disclosure is omitted.
By applying the scheme of the embodiment of the specification, the importance degree of the processing unit in the model decision process can be accurately determined by reversely adjusting the processing result of the unit and observing the change of the adjusted model processing result, so that the transparency and the trust degree of model output are improved, and the interpretability of the model decision process is improved. And after the key processing units are determined, the unit parameters of the key processing units are directly adjusted, so that the parameter adjustment amount is reduced, and the training efficiency of the task processing model is improved.
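Steps 802 to 808 can be illustrated with a minimal sketch: for each unit, the other units' results are fixed, that unit's result is reversely adjusted, the model is rerun, and the shift in the model processing result serves as the unit weight; the highest-weight units are taken as the key processing units. The sign-flip used here for "reverse adjustment" and the scalar toy model are assumptions for illustration only; the patent does not prescribe this concrete form.

```python
def rank_key_units(forward, unit_results, topk=1):
    """Score each unit by how far the model output moves when that unit's
    processing result is reversed while the others stay fixed, then return
    the top-scoring units as key processing units."""
    baseline = forward(unit_results)
    weights = {}
    for i in range(len(unit_results)):
        adjusted = list(unit_results)
        adjusted[i] = -adjusted[i]               # reverse adjustment of unit i
        weights[i] = abs(forward(adjusted) - baseline)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:topk], weights

# Toy model whose output leans heavily on unit 0.
key_units, unit_weights = rank_key_units(
    lambda rs: 3.0 * rs[0] + 0.1 * rs[1], [1.0, 1.0]
)
```

Reversing unit 0 moves the output far more than reversing unit 1, so unit 0 is identified as the key processing unit; step 810 would then adjust only that unit's parameters.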
Corresponding to the task processing method embodiment, the present disclosure further provides a task processing device embodiment, and fig. 9 shows a schematic structural diagram of a task processing device provided in one embodiment of the present disclosure. As shown in fig. 9, the apparatus includes:
a first obtaining module 902 configured to obtain data to be processed for a target task;
the first input module 904 is configured to input data to be processed into a task processing model, and obtain a task processing result output by the task processing model, where the task processing model is obtained by performing parameter adjustment on a key processing unit in the initial processing model, and the key processing unit is obtained by performing parameter adjustment on a difference between a model processing result of the initial processing model and a reverse model processing result.
Optionally, the apparatus further comprises: the task processing model training unit is configured to acquire a model processing result of the initial processing model and unit processing results of a plurality of processing units in the initial processing model; fixing unit processing results of processing units other than the first processing unit aiming at the first processing unit, and reversely adjusting the unit processing results of the first processing unit to obtain reverse model processing results output by an initial processing model, wherein the first processing unit is any one of a plurality of processing units; determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result; and determining key processing units in the initial processing model according to the unit weights of the plurality of processing units, and carrying out parameter adjustment on the key processing units to obtain the task processing model.
Optionally, the task processing model training unit is further configured to parse the reverse model processing result, determine a first associated keyword in the reverse model processing result, parse the model processing result, and determine a second associated keyword in the model processing result, where the associated keyword is related to the problem to be inferred; determining a first weight magnitude of the first processing unit according to the first associated keywords and the reverse model processing result, and determining a second weight magnitude of the first processing unit according to the second associated keywords and the model processing result; the unit weight of the first processing unit is determined from the first weight magnitude and the second weight magnitude.
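One hedged reading of the keyword-based weighting above is sketched below: keyword coverage of an output serves as the "weight magnitude", and the unit weight is the drop in coverage between the original model processing result and the reverse model processing result. The coverage measure, example texts, and keywords are illustrative assumptions, not the patent's definition.

```python
def keyword_coverage(output_text, keywords):
    """Fraction of the associated keywords appearing in a model output
    (a simple stand-in for the 'weight magnitude' in the text)."""
    if not keywords:
        return 0.0
    text = output_text.lower()
    return sum(1 for kw in keywords if kw.lower() in text) / len(keywords)

def unit_weight_from_keywords(reverse_output, model_output, keywords):
    """Unit weight from the first magnitude (coverage of the reverse model
    processing result) and the second magnitude (coverage of the original
    model processing result): a unit matters more when reversing it removes
    keyword coverage the original output had."""
    first = keyword_coverage(reverse_output, keywords)
    second = keyword_coverage(model_output, keywords)
    return second - first

weight = unit_weight_from_keywords(
    reverse_output="The sky is blue today.",
    model_output="60 / 15 = 4, so 4 buses leave per hour.",
    keywords=["bus", "hour"],
)
```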
Optionally, the task processing model training unit is further configured to parse the reverse model processing result and determine a first associated keyword in the reverse model processing result, where the associated keyword is related to the problem to be inferred; inputting the first associated keywords into an initial processing model to obtain a prediction processing result output by the initial processing model; and determining the unit weight of the first processing unit according to the model processing result and the prediction processing result.
Optionally, the task processing model training unit is further configured to input the weight generation prompt information, the model processing result and the prediction processing result into the pre-training language model, so as to obtain the unit weight of the first processing unit.
Optionally, the task processing model training unit is further configured to screen out a control processing unit from the plurality of processing units; fixing unit processing results of processing units except the key processing units, and reversely adjusting the unit processing results of the key processing units to obtain reverse key processing results output by the initial processing model; fixing the unit processing results of the processing units except the control processing unit, and reversely adjusting the unit processing results of the control processing unit to obtain a reverse control processing result output by the initial processing model; and determining the verification result of the key processing unit according to the reverse key processing result and the reverse comparison processing result.
Optionally, the task processing model training unit is further configured to acquire counterfactual data; process the counterfactual data through the first processing unit to obtain a reverse unit processing result of the first processing unit; and replace the unit processing result of the first processing unit with the reverse unit processing result, and perform propagation processing on the reverse unit processing result through the initial processing model to obtain the reverse model processing result.
Optionally, the task processing model training unit is further configured to acquire fact data, where the fact data comprises inference data; and adjust the inference data into replacement data irrelevant to the inference data to obtain the counterfactual data.
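The counterfactual construction just described might look like the following sketch, where the dictionary fields (`question`, `inference`, `answer`) and the replacement text are illustrative assumptions rather than the patent's data format: the question is kept, but the inference data is swapped for text irrelevant to it.

```python
def build_counterfactual(fact, replacement="The sky is blue and grass is green."):
    """Replace the inference data in the fact data with replacement text
    irrelevant to the question, yielding counterfactual data."""
    counterfact = dict(fact)
    counterfact["inference"] = replacement   # unrelated replacement data
    return counterfact

fact = {
    "question": "A bus leaves every 15 minutes; how many leave per hour?",
    "inference": "60 / 15 = 4, so four buses leave per hour.",
    "answer": "4",
}
counterfact = build_counterfactual(fact)
```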
Optionally, the initial processing model comprises a plurality of processing units, residual units, and a multi-layer neural network; the task processing model training unit is further configured to acquire fact data; processing the fact data through a plurality of processing units to obtain unit processing results of the plurality of processing units; and mapping the unit processing result through the residual unit and the multi-layer neural network to obtain a model processing result of the initial processing model.
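The forward pass through the processing units, residual unit, and multi-layer neural network described above can be sketched as follows, using scalar values and a simple additive combination rule as simplifying assumptions for illustration.

```python
def model_forward(fact_value, unit_fns, multilayer):
    """Each processing unit processes the fact data, a residual connection
    adds the input back to the combined unit results, and a multi-layer
    mapping produces the model processing result."""
    unit_results = [f(fact_value) for f in unit_fns]   # per-unit processing
    combined = sum(unit_results) + fact_value          # residual unit
    return multilayer(combined), unit_results          # multi-layer mapping

output, unit_results = model_forward(
    3.0,
    unit_fns=[lambda x: 2.0 * x, lambda x: -x],
    multilayer=lambda h: h * h,
)
```

The intermediate `unit_results` are exactly what the training unit fixes or reversely adjusts in the other optional embodiments.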
Optionally, the task processing model training unit is further configured to acquire a problem to be inferred; inputting the problem to be inferred and the inference prompt information into a pre-training language model to obtain inference data corresponding to the problem to be inferred; and constructing fact data according to the questions to be inferred and the inference data.
By applying the scheme of the embodiment of the specification, the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, so that the key processing units that determine the model decision process are identified, which improves the interpretability and transparency of the model and further improves the interpretability of the task processing result. In addition, the task processing model is obtained by adjusting the parameters of the key processing units only, without adjusting the unit parameters of all units of the initial processing model, which improves the training efficiency of the task processing model.
The above is a schematic solution of a task processing device of the present embodiment. It should be noted that, the technical solution of the task processing device and the technical solution of the task processing method belong to the same concept, and details of the technical solution of the task processing device, which are not described in detail, can be referred to the description of the technical solution of the task processing method.
Corresponding to the above-mentioned traffic task processing method embodiment, the present disclosure further provides a traffic task processing device embodiment, and fig. 10 shows a schematic structural diagram of a traffic task processing device provided in one embodiment of the present disclosure. As shown in fig. 10, the apparatus includes:
A second acquisition module 1002 configured to acquire traffic data to be processed for a target traffic task;
The second input module 1004 is configured to input the traffic data to be processed into a task processing model, and obtain a task processing result output by the task processing model, where the task processing model is obtained by performing parameter adjustment on a key processing unit in the initial processing model, and the key processing unit is determined according to a difference between a model processing result of the initial processing model and a reverse model processing result.
Optionally, the apparatus further comprises: and the receiving module is configured to receive adjustment data sent by a user based on the task processing result and adjust model parameters of the task processing model according to the adjustment data.
By applying the scheme of the embodiment of the specification, the key processing units in the initial processing model are determined based on the difference between the model processing result and the reverse model processing result, so that the key processing units that determine the model decision process are identified, which improves the interpretability and transparency of the model and further improves the interpretability of the task processing result. In addition, the task processing model is obtained by adjusting the parameters of the key processing units only, without adjusting the unit parameters of all units of the initial processing model, which improves the training efficiency of the task processing model.
The above is a schematic scheme of a traffic task processing device of the present embodiment. It should be noted that, the technical solution of the traffic task processing device and the technical solution of the traffic task processing method belong to the same concept, and details of the technical solution of the traffic task processing device, which are not described in detail, can be referred to the description of the technical solution of the traffic task processing method.
Corresponding to the task processing model training method embodiment, the present disclosure further provides a task processing model training device embodiment, and fig. 11 shows a schematic structural diagram of a task processing model training device provided in one embodiment of the present disclosure. As shown in fig. 11, the apparatus is applied to cloud-side equipment, and includes:
a third obtaining module 1102 configured to obtain a model processing result of the initial processing model and unit processing results of a plurality of processing units in the initial processing model in response to a model training request for the task processing model;
A first adjustment module 1104 configured to fix, for a first processing unit, unit processing results of processing units other than the first processing unit, and reversely adjust the unit processing results of the first processing unit to obtain a reverse model processing result output by the initial processing model, where the first processing unit is any one of the plurality of processing units;
A first determining module 1106 configured to determine a cell weight of the first processing cell based on the model processing result and the inverse model processing result;
A second determination module 1108 configured to determine key processing units in the initial processing model based on the unit weights of the plurality of processing units;
A second adjustment module 1110 configured to adjust unit parameters of the key processing units to obtain a trained task processing model.
By applying the scheme of the embodiment of the specification, the importance degree of the processing unit in the model decision process can be accurately determined by reversely adjusting the processing result of the unit and observing the change of the adjusted model processing result, so that the transparency and the trust degree of model output are improved, and the decision process of the model is ensured to be reliable and interpretable. And after the key processing units are determined, the unit parameters of the key processing units are directly adjusted, so that the parameter adjustment amount is reduced, and the training efficiency of the task processing model is improved.
The above is a schematic scheme of a task processing model training device of the present embodiment. It should be noted that, the technical solution of the task processing model training device and the technical solution of the task processing model training method belong to the same concept, and details of the technical solution of the task processing model training device which are not described in detail can be referred to the description of the technical solution of the task processing model training method.
FIG. 12 illustrates a block diagram of a computing device provided in one embodiment of the present description. The components of computing device 1200 include, but are not limited to, memory 1210 and processor 1220. Processor 1220 is coupled to memory 1210 by bus 1230 and database 1250 is used to store data.
The computing device 1200 also includes an access device 1240 that enables the computing device 1200 to communicate via one or more networks 1260. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the internet. The access device 1240 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) wireless interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present description, the above components of computing device 1200, as well as other components not shown in fig. 12, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 12 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1200 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smart phone), a wearable computing device (e.g., smart watch, smart glasses, etc.) or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). Computing device 1200 may also be a mobile or stationary server.
The processor 1220 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the task processing method or traffic task processing method or task processing model training method described above.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device belongs to the same concept as the technical solution of the task processing method, the traffic task processing method and the task processing model training method, and details of the technical solution of the computing device which are not described in detail can be referred to the description of the technical solution of the task processing method, the traffic task processing method or the task processing model training method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the task processing method or traffic task processing method or task processing model training method described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium belongs to the same concept as the technical solution of the task processing method, the traffic task processing method and the task processing model training method, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the task processing method, the traffic task processing method or the task processing model training method.
An embodiment of the present disclosure further provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the task processing method or the traffic task processing method or the task processing model training method described above.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program belongs to the same concept as the technical solution of the task processing method, the traffic task processing method and the task processing model training method, and details of the technical solution of the computer program which are not described in detail can be referred to the description of the technical solution of the task processing method, the traffic task processing method or the task processing model training method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be increased or decreased appropriately according to the requirements of legislation and patent practice; for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims (15)

1. A task processing method, comprising:
Acquiring data to be processed aiming at a target task;
And inputting the data to be processed into a task processing model to obtain a task processing result output by the task processing model, wherein the task processing model is obtained by carrying out parameter adjustment on a key processing unit in an initial processing model, and the key processing unit is determined according to a difference between the model processing result of the initial processing model and a reverse model processing result.
2. The method according to claim 1, wherein before the data to be processed is input into the task processing model to obtain the task processing result output by the task processing model, the method further comprises:
obtaining a model processing result of an initial processing model and unit processing results of a plurality of processing units in the initial processing model;
for a first processing unit, fixing the unit processing results of the processing units other than the first processing unit, and reversely adjusting the unit processing result of the first processing unit to obtain a reverse model processing result output by the initial processing model, wherein the first processing unit is any one of the plurality of processing units;
determining a unit weight of the first processing unit according to the model processing result and the reverse model processing result; and
determining a key processing unit in the initial processing model according to the unit weights of the plurality of processing units, and performing parameter adjustment on the key processing unit to obtain the task processing model.
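The selection procedure of claim 2 can be illustrated with a toy numeric sketch. This is an illustrative assumption only, not the patent's actual implementation: each processing unit contributes a scalar activation, the model result is their sum, and a unit's weight is how far the model result drifts when only that unit's result is reverse-adjusted while the others are held fixed. All names (`toy_model_output`, `unit_weight`, `select_key_units`) are hypothetical.

```python
def toy_model_output(unit_outputs):
    """Toy stand-in for propagating per-unit results to a model result."""
    return sum(unit_outputs)

def unit_weight(unit_outputs, index, reverse_value=0.0):
    """Fix every unit except `index`, reverse-adjust that unit's output
    (here: replace it with `reverse_value`), and measure how far the
    reverse model result drifts from the original model result."""
    baseline = toy_model_output(unit_outputs)
    perturbed = list(unit_outputs)
    perturbed[index] = reverse_value
    reverse_result = toy_model_output(perturbed)
    return abs(baseline - reverse_result)

def select_key_units(unit_outputs, top_k=1):
    """Rank units by their weights and keep the top_k as key units."""
    weights = [unit_weight(unit_outputs, i) for i in range(len(unit_outputs))]
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:top_k]

outputs = [0.1, 2.5, 0.3]        # per-unit activations of a 3-unit toy model
key_units = select_key_units(outputs)  # the highest-weight unit is selected
```

In a real model the "reverse adjustment" would intervene on an internal activation rather than replace a scalar, but the comparison of the perturbed result against the baseline is the same shape of computation.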
3. The method of claim 2, wherein determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result comprises:
parsing the reverse model processing result to determine a first associated keyword therein, and parsing the model processing result to determine a second associated keyword therein, wherein an associated keyword is related to a question to be inferred;
determining a first weight magnitude of the first processing unit according to the first associated keyword and the reverse model processing result, and determining a second weight magnitude of the first processing unit according to the second associated keyword and the model processing result; and
determining the unit weight of the first processing unit based on the first weight magnitude and the second weight magnitude.
4. The method of claim 2, wherein determining the unit weight of the first processing unit according to the model processing result and the reverse model processing result comprises:
parsing the reverse model processing result to determine a first associated keyword therein, wherein an associated keyword is related to a question to be inferred;
inputting the first associated keyword into the initial processing model to obtain a prediction processing result output by the initial processing model; and
determining the unit weight of the first processing unit according to the model processing result and the prediction processing result.
5. The method of claim 4, wherein determining the unit weight of the first processing unit according to the model processing result and the prediction processing result comprises:
inputting weight generation prompt information, the model processing result and the prediction processing result into a pre-trained language model to obtain the unit weight of the first processing unit.
6. The method of claim 2, wherein after determining the key processing unit in the initial processing model according to the unit weights of the plurality of processing units, the method further comprises:
screening a control processing unit from the plurality of processing units;
fixing the unit processing results of the processing units other than the key processing unit, and reversely adjusting the unit processing result of the key processing unit to obtain a reverse key processing result output by the initial processing model;
fixing the unit processing results of the processing units other than the control processing unit, and reversely adjusting the unit processing result of the control processing unit to obtain a reverse control processing result output by the initial processing model; and
determining a verification result of the key processing unit according to the reverse key processing result and the reverse control processing result.
7. The method of claim 2, wherein reversely adjusting the unit processing result of the first processing unit to obtain the reverse model processing result output by the initial processing model comprises:
obtaining counterfactual data;
processing the counterfactual data through the first processing unit to obtain a reverse unit processing result of the first processing unit; and
replacing the unit processing result of the first processing unit with the reverse unit processing result, and propagating the reverse unit processing result through the initial processing model to obtain the reverse model processing result.
8. The method of claim 7, wherein obtaining the counterfactual data comprises:
obtaining fact data, wherein the fact data comprises reasoning data; and
replacing the reasoning data with replacement data irrelevant to the reasoning data to obtain the counterfactual data.
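Claims 7 and 8 can be sketched together: build counterfactual data by swapping the reasoning content of a fact for unrelated filler, then splice a single unit's counterfactual output back into an otherwise fixed forward pass. Every name below (`FILLER`, `make_counterfactual`, `splice_unit_result`, the example fact string) is a hypothetical assumption for illustration.

```python
FILLER = "[UNRELATED]"  # replacement data carrying no information about the inference

def make_counterfactual(fact, reasoning_spans):
    """Replace each reasoning span in the fact text with filler that is
    irrelevant to the original reasoning (claim 8)."""
    counterfactual = fact
    for span in reasoning_spans:
        counterfactual = counterfactual.replace(span, FILLER)
    return counterfactual

def splice_unit_result(unit_results, index, counterfactual_result):
    """Keep every other unit's cached result fixed and substitute the
    counterfactual result for unit `index` before propagation (claim 7)."""
    spliced = list(unit_results)
    spliced[index] = counterfactual_result
    return spliced

fact = "Q: 2+3? Reasoning: add 2 and 3 to get 5. A: 5"
cf = make_counterfactual(fact, ["add 2 and 3 to get 5"])
fixed = splice_unit_result([1.0, 2.0, 3.0], 1, 9.0)  # only unit 1 is replaced
```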
9. The method of claim 2, wherein the initial processing model comprises the plurality of processing units, a residual unit, and a multi-layer neural network; and
obtaining the model processing result of the initial processing model and the unit processing results of the plurality of processing units in the initial processing model comprises:
obtaining fact data;
processing the fact data through the plurality of processing units to obtain the unit processing results of the plurality of processing units; and
mapping the unit processing results through the residual unit and the multi-layer neural network to obtain the model processing result of the initial processing model.
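The forward pass of claim 9 admits a minimal numeric sketch: several processing units transform the fact data in sequence, a residual connection adds the original input back, and a small linear map stands in for the multi-layer neural network. The arithmetic and all names are illustrative assumptions, not the patent's architecture.

```python
def run_units(x, units):
    """Apply each processing unit in sequence and record its result."""
    results = []
    for unit in units:
        x = unit(x)
        results.append(x)
    return results

def residual_mlp(unit_result, residual_input, weight=2.0, bias=1.0):
    """Residual add followed by a one-layer linear map (a stand-in for
    the residual unit and multi-layer neural network of claim 9)."""
    hidden = unit_result + residual_input  # residual connection
    return weight * hidden + bias          # toy "MLP" mapping

units = [lambda v: v * 2.0, lambda v: v + 3.0]  # two toy processing units
fact = 1.0
unit_results = run_units(fact, units)
model_result = residual_mlp(unit_results[-1], fact)
```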
10. The method of claim 9, wherein obtaining the fact data comprises:
acquiring a question to be inferred;
inputting the question to be inferred and inference prompt information into a pre-trained language model to obtain reasoning data corresponding to the question to be inferred; and
constructing the fact data according to the question to be inferred and the reasoning data.
11. A traffic task processing method, comprising:
acquiring traffic data to be processed for a target traffic task; and
inputting the traffic data to be processed into a task processing model to obtain a task processing result output by the task processing model, wherein the task processing model is obtained by performing parameter adjustment on a key processing unit in an initial processing model, and the key processing unit is determined according to a difference between a model processing result of the initial processing model and a reverse model processing result.
12. The method of claim 11, wherein after the traffic data to be processed is input into the task processing model to obtain the task processing result output by the task processing model, the method further comprises:
receiving adjustment data sent by a user based on the task processing result, and adjusting model parameters of the task processing model according to the adjustment data.
13. A task processing model training method, applied to a cloud-side device and comprising:
in response to a model training request for a task processing model, obtaining a model processing result of an initial processing model and unit processing results of a plurality of processing units in the initial processing model;
for a first processing unit, fixing the unit processing results of the processing units other than the first processing unit, and reversely adjusting the unit processing result of the first processing unit to obtain a reverse model processing result output by the initial processing model, wherein the first processing unit is any one of the plurality of processing units;
determining a unit weight of the first processing unit according to the model processing result and the reverse model processing result;
determining a key processing unit in the initial processing model according to the unit weights of the plurality of processing units; and
adjusting unit parameters of the key processing unit to obtain the trained task processing model.
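The final step of claim 13 — adjusting only the key processing unit's parameters — can be sketched as a masked parameter update in which every other unit stays frozen. The parameter layout, learning rate, and gradients below are hypothetical placeholders, not the patent's training scheme.

```python
def train_key_unit(params, key_unit, grads, lr=0.1):
    """Apply a gradient step only to the key unit's parameters,
    leaving the parameters of every other unit unchanged."""
    updated = {}
    for unit, weights in params.items():
        if unit == key_unit:
            updated[unit] = [w - lr * g for w, g in zip(weights, grads[unit])]
        else:
            updated[unit] = list(weights)  # frozen: copied as-is
    return updated

params = {"unit_a": [1.0, 2.0], "unit_b": [3.0, 4.0]}
grads = {"unit_a": [10.0, 10.0], "unit_b": [10.0, 10.0]}
new_params = train_key_unit(params, "unit_a", grads)  # only unit_a moves
```

Restricting the update to the key unit is what distinguishes this from full fine-tuning: the bulk of the initial processing model is left intact.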
14. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions that, when executed by the processor, implement the steps of the method of any one of claims 1 to 10, any one of claims 11 to 12, or claim 13.
15. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 10, any one of claims 11 to 12, or claim 13.
CN202311839740.XA 2023-12-27 2023-12-27 Task processing, traffic task processing and task processing model training method Pending CN117971420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311839740.XA CN117971420A (en) 2023-12-27 2023-12-27 Task processing, traffic task processing and task processing model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311839740.XA CN117971420A (en) 2023-12-27 2023-12-27 Task processing, traffic task processing and task processing model training method

Publications (1)

Publication Number Publication Date
CN117971420A true CN117971420A (en) 2024-05-03

Family

ID=90853750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311839740.XA Pending CN117971420A (en) 2023-12-27 2023-12-27 Task processing, traffic task processing and task processing model training method

Country Status (1)

Country Link
CN (1) CN117971420A (en)

Similar Documents

Publication Publication Date Title
CN111897941B (en) Dialogue generation method, network training method, device, storage medium and equipment
CN110188331B (en) Model training method, dialogue system evaluation method, device, equipment and storage medium
CN107846350B (en) Method, computer readable medium and system for context-aware network chat
US10311895B2 (en) Assessing the structural quality of conversations
WO2020155619A1 (en) Method and apparatus for chatting with machine with sentiment, computer device and storage medium
CN117332072B (en) Dialogue processing, voice abstract extraction and target dialogue model training method
CN116050405A (en) Text processing, question-answer text processing and text processing model training method
CN117521675A (en) Information processing method, device, equipment and storage medium based on large language model
CN116595154B (en) Task processing method and automatic question-answering method
CN117271745A (en) Information processing method and device, computing equipment and storage medium
CN114925681A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN114880991A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN117971420A (en) Task processing, traffic task processing and task processing model training method
CN111222533B (en) Deep learning visual question-answering method and system based on dependency tree
CN114020908A (en) Text classification method and device, computer readable storage medium and electronic equipment
CN114970494A (en) Comment generation method and device, electronic equipment and storage medium
CN117573842B (en) Document retrieval method and automatic question-answering method
CN116467500B (en) Data relation identification, automatic question-answer and query sentence generation method
CN116776870B (en) Intention recognition method, device, computer equipment and medium
CN117972222B (en) Enterprise information retrieval method and device based on artificial intelligence
CN118194985A (en) Task processing, legal task processing, task processing model training method, computing device, computer readable storage medium, and computer program product
CN118212460A (en) Image classification method, automatic question-answering method, image class feature fusion model training method and information processing method based on deep learning model
CN118227731A (en) Sample data construction method and question-answer model training method
CN118245587A (en) Model test method and model test device
Ali et al. Intelligent Agents in Educational Institutions: AEdBOT–A Chatbot for Administrative Assistance using Deep Learning Hybrid Model Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination