CN116542308A - Task processing method and related equipment thereof

Info

Publication number
CN116542308A
Authority
CN
China
Prior art keywords
prompt
task
information
processing result
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310365374.2A
Other languages
Chinese (zh)
Inventor
张檬
李良友
刘洋
王硕
李鹏
孙茂松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Huawei Technologies Co Ltd
Priority to CN202310365374.2A
Publication of CN116542308A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a task processing method and related equipment thereof, which can accurately obtain the processing result of the information to be processed in a task, thereby successfully processing the task that the user needs to complete and helping to improve user experience. The method includes the following steps: when the user needs to complete a first task, a first prompt from the user is first acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on first information input by the user. The target model then processes the first prompt to obtain the features of the first prompt, and processes these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model processes the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information. In this way, the target model successfully processes the first task that the user needs to complete.

Description

Task processing method and related equipment thereof
Technical Field
The embodiments of the present application relate to the technical field of artificial intelligence (AI), and in particular to a task processing method and related equipment thereof.
Background
With the rapid development of AI technology, large-scale language models are widely used to handle tasks in various fields. For the information to be processed in a certain task, the processing result of that information can be obtained through a large-scale language model, thereby completing the corresponding task.

In the related art, a certain task and the information to be processed in the task may be expressed as a prompt in natural language, and the prompt may be input to a large-scale language model. The large-scale language model then processes the prompt to obtain the features of the prompt, and further processes these features to obtain the processing result of the information to be processed in the task, which is equivalent to completing the task.

In the above process, the prompt serves as the input of the large-scale language model and is used to describe both the task and the information to be processed. While obtaining the processing result of the information, the large-scale language model needs to continually identify the task-related content in the prompt. Because the prompt is usually long, the model may misunderstand the task as another task, so that the finally obtained processing result is a wrong result.
Disclosure of Invention
The embodiments of the present application provide a task processing method and related equipment thereof, which can accurately obtain the processing result of the information to be processed in a task, thereby successfully processing the task that the user needs to complete and helping to improve user experience.

A first aspect of the embodiments of the present application provides a task processing method, which may be implemented by using a target model, and the method includes:
when the user needs to complete a first task, a first prompt from the user may first be acquired. The first prompt may include the first task and first information in the first task, so the first prompt may be used to indicate that the first task is to be executed on the first information input by the user.

After the first prompt is obtained, the first prompt may be input to the target model, and the target model may first perform feature extraction (e.g., decoding or encoding) on the first prompt to obtain the features of the first prompt. After the features of the first prompt are obtained, the target model may process these features, thereby obtaining and outputting a second prompt. The second prompt includes only the first task, so it is used only to describe the first task.

After the second prompt is obtained, the target model may further process the features of the first prompt together with the second prompt, thereby obtaining and outputting the processing result of executing the first task on the first information. In this way, the target model outputs both the second prompt and the processing result of executing the first task on the first information, so the first task that the user needs to complete is successfully processed.
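For illustration only, the following Python-style sketch shows how this two-stage flow could look in code; the target_model object and its encode/generate methods are hypothetical names, not interfaces defined by the application.

    # Illustrative sketch only: encode and generate are assumed helper methods.
    def run_first_task(target_model, first_prompt: str):
        # Step 1: feature extraction (decoding or encoding) on the first prompt.
        features = target_model.encode(first_prompt)
        # Step 2: derive the second prompt, which describes only the first task.
        second_prompt = target_model.generate(features, prefix="")
        # Step 3: combine the features of the first prompt with the second prompt
        # to obtain the processing result of executing the first task on the
        # first information contained in the first prompt.
        result = target_model.generate(features, prefix=second_prompt)
        return second_prompt, result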
From the above method it can be seen that, when the user needs to complete a first task, a first prompt from the user may first be acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt, and process these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model may process the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information, so that the first task the user needs to complete is successfully processed. In the foregoing process, the target model extracts the second prompt from the first prompt and uses the features of the first prompt in combination with the second prompt to obtain the processing result of executing the first task on the first information. Because the second prompt describes only the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt and thereby accurately control the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the following relationship exists between the first prompt and the second prompt: the first prompt and the second prompt may describe the first task in different expressions, or the first prompt and the second prompt may describe the first task in the same expression. In this way, the solution can be applied to various scenarios, for example, a scenario in which the first prompt and the second prompt describe the same content (the first task) in entirely different expressions, or a scenario in which they describe the same content in exactly the same expression, and so on.
In one possible implementation, the second prompt includes v characters, v is greater than or equal to 1, and obtaining the second prompt based on the features of the first prompt includes: decoding the 1st character of the second prompt to the i-th character of the second prompt based on the features of the first prompt to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1. In the foregoing implementation, the second prompt is set to include v characters. After the features of the first prompt are obtained, the target model may decode a preset character using the features of the first prompt, thereby obtaining the 1st character of the second prompt. The target model may then decode the 1st character of the second prompt using the features of the first prompt to obtain the 2nd character of the second prompt, then decode the 1st and 2nd characters of the second prompt to obtain the 3rd character, and so on. In this way, the target model can obtain and output the complete second prompt.
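A minimal sketch of this character-by-character decoding follows; decode_step is an assumed helper that returns the next character given the features of the first prompt and the characters decoded so far, and greedy decoding is assumed purely for illustration.

    # Hypothetical greedy decoding loop for the v characters of the second prompt.
    def decode_second_prompt(target_model, features, v: int, preset_char: str = "<s>"):
        chars = [preset_char]            # decoding the preset character yields character 1
        for _ in range(v):
            next_char = target_model.decode_step(features, chars)
            chars.append(next_char)
        return chars[1:]                 # characters 1 to v of the second prompt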
In one possible implementation, the processing result of executing the first task on the first information includes n characters, n is greater than or equal to 1, and obtaining the processing result of executing the first task on the first information based on the features of the first prompt and the second prompt includes: decoding the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the features of the first prompt to obtain the (j+1)-th character of the processing result, j = 1, ..., n-1; and decoding the second prompt based on the features of the first prompt to obtain the 1st character of the processing result of executing the first task on the first information. In the foregoing implementation, the processing result of executing the first task on the first information is set to include n characters. After the second prompt is obtained, the target model may decode the 1st to v-th characters of the second prompt using the features of the first prompt, thereby obtaining the 1st character of the processing result. The target model may then decode the 1st to v-th characters of the second prompt together with the 1st character of the processing result using the features of the first prompt, thereby obtaining the 2nd character of the processing result, then decode the second prompt together with the 1st and 2nd characters of the processing result to obtain the 3rd character, and so on, until it decodes the second prompt together with the 1st to (n-1)-th characters of the processing result to obtain the n-th character of the processing result. In this way, the target model can obtain and output the complete processing result of executing the first task on the first information.
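Continuing the sketch above, the result characters could be produced by the same kind of loop, now conditioned on the full second prompt as well; all names remain illustrative assumptions.

    # Hypothetical loop for the n characters of the processing result; each step is
    # conditioned on the features of the first prompt, the complete second prompt,
    # and the result characters generated so far.
    def decode_result(target_model, features, second_prompt_chars, n: int):
        result_chars = []
        for _ in range(n):
            context = list(second_prompt_chars) + result_chars
            next_char = target_model.decode_step(features, context)
            result_chars.append(next_char)
        return result_chars              # characters 1 to n of the processing result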
A second aspect of an embodiment of the present application provides a model training method, including: acquiring a first prompt, wherein the first prompt is used for indicating to execute a first task on first information input by a user; inputting the first prompt into a model to be trained to obtain a second prompt and a processing result of executing the first task on the first information, wherein the second prompt is used for indicating the first task, and the model to be trained is used for: decoding or encoding the first prompt to obtain the characteristics of the first prompt; acquiring a second prompt based on the characteristics of the first prompt; based on the characteristics of the first prompt and the second prompt, obtaining a processing result of executing a first task on the first information; based on the second prompt and a processing result of executing the first task on the first information, training the model to be trained to obtain a target model, wherein the target model is used for processing the first task.
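As a reading aid only, the following sketch shows one possible training step, under the assumption that a standard token-level cross-entropy loss over the concatenated second prompt and processing result is used with teacher forcing; the application does not fix a particular loss or framework, and the model signature below is hypothetical.

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, first_prompt_ids, target_ids):
        # target_ids: ground-truth second prompt followed by the ground-truth
        # processing result, as one token sequence (assumed layout).
        logits = model(first_prompt_ids, target_ids[:, :-1])   # teacher forcing
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target_ids[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()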
The target model obtained through training with the above method has task processing capability. When the user needs to complete a first task, a first prompt from the user may first be acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt, and process these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model may process the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information, so that the first task the user needs to complete is successfully processed. In the foregoing process, the target model extracts the second prompt from the first prompt and uses the features of the first prompt in combination with the second prompt to obtain the processing result of executing the first task on the first information. Because the second prompt describes only the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt and thereby accurately control the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the method further includes: acquiring a third prompt, wherein the third prompt is used for indicating to execute a second task on second information input by a user; inputting the third prompt into the model to be trained to obtain a fourth prompt and a processing result of executing the second task on the second information, wherein the fourth prompt is used for indicating the second task; based on the second prompt and a processing result of executing the first task on the first information, training the model to be trained, and obtaining the target model includes: training the model to be trained based on the second prompt, the processing result of executing the first task on the first information, the fourth prompt and the processing result of executing the second task on the second information to obtain a target model, wherein the target model is used for processing the first task, the second task and the third task.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation, the third prompt and the fourth prompt describe the second task in different ways, or the third prompt and the fourth prompt describe the second task in the same way.
In one possible implementation, the second prompt includes v characters, v is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the second prompt to the i-th character of the second prompt based on the features of the first prompt to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1.

In one possible implementation, the fourth prompt includes w characters, w is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the fourth prompt to the k-th character of the fourth prompt based on the features of the third prompt to obtain the (k+1)-th character of the fourth prompt, k = 1, ..., w-1.

In one possible implementation, the processing result of executing the first task on the first information includes n characters, n is greater than or equal to 1, and the model to be trained is used for decoding the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the features of the first prompt to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1.

In one possible implementation, the processing result of executing the second task on the second information includes m characters, m is greater than or equal to 1, and the model to be trained is used for decoding the fourth prompt and the 1st character to the p-th character of the processing result of executing the second task on the second information based on the features of the third prompt to obtain the (p+1)-th character of the processing result of executing the second task on the second information, p = 1, ..., m-1.
A third aspect of the embodiments of the present application provides a task processing device, where the device includes a target model, and the device includes: the acquisition module is used for acquiring a first prompt, wherein the first prompt is used for indicating to execute a first task on first information input by a user; the first processing module is used for decoding or encoding the first prompt to obtain the characteristics of the first prompt; the second processing module is used for acquiring a second prompt based on the characteristics of the first prompt, and the second prompt is used for indicating the first task; and the third processing module is used for acquiring a processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt.
From the above device it can be seen that, when the user needs to complete a first task, a first prompt from the user may first be acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt, and process these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model may process the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information, so that the first task the user needs to complete is successfully processed. In the foregoing process, the target model extracts the second prompt from the first prompt and uses the features of the first prompt in combination with the second prompt to obtain the processing result of executing the first task on the first information. Because the second prompt describes only the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt and thereby accurately control the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation, the second prompt includes v characters, v is greater than or equal to 1, and the second processing module is configured to decode the 1st character of the second prompt to the i-th character of the second prompt based on the features of the first prompt to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1.

In one possible implementation, the processing result of executing the first task on the first information includes n characters, n is greater than or equal to 1, and the third processing module is configured to decode the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the features of the first prompt to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1.
A fourth aspect of embodiments of the present application provides a model training apparatus, the apparatus comprising: the first acquisition module is used for acquiring a first prompt, and the first prompt is used for indicating to execute a first task on first information input by a user; the first processing module is used for inputting the first prompt into the model to be trained to obtain a second prompt and a processing result of executing the first task on the first information, the second prompt is used for indicating the first task, and the model to be trained is used for: decoding or encoding the first prompt to obtain the characteristics of the first prompt; acquiring a second prompt based on the characteristics of the first prompt; based on the characteristics of the first prompt and the second prompt, obtaining a processing result of executing a first task on the first information; the training module is used for training the model to be trained based on the second prompt and the processing result of executing the first task on the first information to obtain a target model, and the target model is used for processing the first task.
The target model obtained through training with the above apparatus has task processing capability. When the user needs to complete a first task, a first prompt from the user may first be acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt, and process these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model may process the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information, so that the first task the user needs to complete is successfully processed. In the foregoing process, the target model extracts the second prompt from the first prompt and uses the features of the first prompt in combination with the second prompt to obtain the processing result of executing the first task on the first information. Because the second prompt describes only the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt and thereby accurately control the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the apparatus further includes: the second acquisition module is used for acquiring a third prompt, and the third prompt is used for indicating to execute a second task on second information input by a user; the second processing module is used for inputting a third prompt into the model to be trained to obtain a fourth prompt and a processing result of executing a second task on the second information, and the fourth prompt is used for indicating the second task; the training module is used for training the model to be trained based on the second prompt, the processing result of executing the first task on the first information, the fourth prompt and the processing result of executing the second task on the second information to obtain a target model, wherein the target model is used for processing the first task, the second task and the third task.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation, the third prompt and the fourth prompt describe the second task in different ways, or the third prompt and the fourth prompt describe the second task in the same way.
In one possible implementation, the second prompt includes v characters, v is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the second prompt to the i-th character of the second prompt based on the features of the first prompt to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1.

In one possible implementation, the fourth prompt includes w characters, w is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the fourth prompt to the k-th character of the fourth prompt based on the features of the third prompt to obtain the (k+1)-th character of the fourth prompt, k = 1, ..., w-1.

In one possible implementation, the processing result of executing the first task on the first information includes n characters, n is greater than or equal to 1, and the model to be trained is used for decoding the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the features of the first prompt to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1.

In one possible implementation, the processing result of executing the second task on the second information includes m characters, m is greater than or equal to 1, and the model to be trained is used for decoding the fourth prompt and the 1st character to the p-th character of the processing result of executing the second task on the second information based on the features of the third prompt to obtain the (p+1)-th character of the processing result of executing the second task on the second information, p = 1, ..., m-1.
A fifth aspect of embodiments of the present application provides a task processing device, the device including a memory and a processor; the memory stores code, the processor being configured to execute the code, the task processing device performing the method as described in the first aspect or any one of the possible implementations of the first aspect when the code is executed.
A sixth aspect of embodiments of the present application provides a model training apparatus, the apparatus comprising a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the model training apparatus performs the method as described in the second aspect or any one of the possible implementations of the second aspect.
A seventh aspect of the embodiments of the present application provides a circuit system comprising a processing circuit configured to perform the method according to the first aspect, any one of the possible implementations of the first aspect, the second aspect or any one of the possible implementations of the second aspect.
An eighth aspect of the embodiments of the present application provides a chip system, which includes a processor, configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the method according to the first aspect, any one of the possible implementations of the first aspect, the second aspect, or any one of the possible implementations of the second aspect.
In one possible implementation, the processor is coupled to the memory through an interface.
In one possible implementation, the system on a chip further includes a memory having a computer program or computer instructions stored therein.
A ninth aspect of the embodiments of the present application provides a computer storage medium storing a computer program which, when executed by a computer, causes the computer to implement a method as described in the first aspect, any one of the possible implementations of the first aspect, the second aspect or any one of the possible implementations of the second aspect.
A tenth aspect of the embodiments of the present application provides a computer program product storing instructions that, when executed by a computer, cause the computer to carry out the method according to any one of the first aspect, the second aspect or any one of the possible implementations of the second aspect.
In the embodiments of the present application, when the user needs to complete a first task, a first prompt from the user may first be acquired and input to the target model, where the first prompt is used to indicate that the first task is to be executed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt, and process these features to derive a second prompt, which is used only to indicate the first task. Finally, the target model may process the features of the first prompt together with the second prompt, thereby obtaining the processing result of executing the first task on the first information, so that the first task the user needs to complete is successfully processed. In the foregoing process, the target model extracts the second prompt from the first prompt and uses the features of the first prompt in combination with the second prompt to obtain the processing result of executing the first task on the first information. Because the second prompt describes only the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt and thereby accurately control the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework;
FIG. 2a is a schematic structural diagram of a task processing system according to an embodiment of the present application;
FIG. 2b is another schematic structural diagram of a task processing system according to an embodiment of the present application;
FIG. 2c is a schematic diagram of related devices for task processing according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system 100 architecture according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a target model according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a task processing method according to an embodiment of the present application;
FIG. 6 is another schematic diagram of a target model according to an embodiment of the present application;
FIG. 7 is another schematic structural diagram of a target model according to an embodiment of the present application;
FIG. 8 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a task processing device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a model training device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a training device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a task processing method and related equipment thereof, which can accurately obtain the processing result of the information to be processed in a task, thereby successfully processing the task that the user needs to complete and helping to improve user experience.

The terms "first", "second", and the like in the description, the claims, and the drawings of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular sequence or chronological order. It should be understood that terms used in this way are interchangeable under appropriate circumstances; this is merely a manner of distinguishing objects of the same nature when describing the embodiments of the present application. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus.
With the rapid development of AI technology, large-scale language models are widely used to handle tasks in various fields. For the information to be processed in a certain task, the processing result of that information can be obtained through a large-scale language model, thereby completing the corresponding task.
In the related art, a certain task and the information to be processed in the task may be expressed as a prompt in natural language, and the prompt may be input to a large-scale language model. The large-scale language model then processes the prompt to obtain the features of the prompt, and further processes these features to obtain the processing result of the information to be processed in the task, which is equivalent to completing the task. For example, consider a translation task in which the information to be translated is a Chinese text. First, a prompt may be generated: "Translate the following from Chinese to English: {Chinese text}". After the prompt is input into the large-scale language model, the model can output the corresponding English text, thereby completing the translation task.
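For concreteness, the prompt in this example could be assembled as follows; the placeholder text is hypothetical and only illustrates the structure described above.

    # Illustrative assembly of the related-art prompt from the example above.
    chinese_text = "..."  # the Chinese text to be translated (placeholder)
    prompt = f"Translate the following from Chinese to English: {chinese_text}"
    # The prompt is then fed to the large-scale language model, which is expected
    # to output the English translation.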
In the above process, the prompt serves as the input of the large-scale language model and is used to describe both the task and the information to be processed. While obtaining the processing result of the information to be processed, the large-scale language model needs to continually identify the task-related content in the prompt in order to control the task processing direction. Because the prompt is usually long, the model may misunderstand the task as another task, that is, steer the task processing in the wrong direction, so that the finally obtained processing result is a wrong result, the task cannot be completed, and user experience is poor. Continuing the above example, the model may misunderstand the translation task as a summarization task based on the prompt; the processing applied to the Chinese text is then not translation but summarization, so that the final output is a summary of the Chinese text rather than the English text.
To solve the above-described problems, embodiments of the present application provide a task processing method that can be implemented in combination with artificial intelligence (artificial intelligence, AI) technology. AI technology is a technical discipline that utilizes digital computers or digital computer controlled machines to simulate, extend and extend human intelligence, and obtains optimal results by sensing environments, acquiring knowledge and using knowledge. In other words, artificial intelligence technology is a branch of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Data processing using artificial intelligence is a common application of artificial intelligence.
First, the overall workflow of the artificial intelligence system will be described. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an artificial intelligence main framework, which is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecological process of the system.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform. Communicating with the outside through the sensor; the computing power is provided by a smart chip (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform comprises a distributed computing framework, a network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection and interworking networks and the like. For example, the sensor and external communication obtains data that is provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields. They are the encapsulation of the overall artificial intelligence solution, turning intelligent information decision making into products and realizing practical applications. The main application fields include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
Next, several application scenarios of the present application are described.
Fig. 2a is a schematic structural diagram of a task processing system according to an embodiment of the present application, where the task processing system includes a user device and a data processing device. The user equipment comprises intelligent terminals such as a mobile phone, a personal computer or an information processing center. The user equipment is an initiating terminal of task processing, and is used as an initiating party of task processing request, and the user usually initiates the request through the user equipment.
The data processing device may be a device or a server having a data processing function, such as a cloud server, a web server, an application server, and a management server. The data processing equipment receives a task processing request from the intelligent terminal through the interactive interface, and performs task processing in the modes of machine learning, deep learning, searching, reasoning, decision making and the like through a memory for storing data and a processor link for data processing. The memory in the data processing device may be a generic term comprising a database storing the history data locally, either on the data processing device or on another network server.
In the task processing system shown in fig. 2a, the user device may receive an instruction from the user, for example, the user device may obtain a prompt input/selected by the user, and then initiate a request to the data processing device, so that the data processing device executes a task processing application for the prompt obtained by the user device, thereby obtaining a result corresponding to the prompt. For example, the user device may obtain a prompt entered by the user describing a task and information to be processed in the task. Then, the user equipment initiates a task processing request to the data processing equipment, so that the data processing equipment performs a series of processing on the prompt based on the task processing request, and a result corresponding to the prompt, namely a processing result of information to be processed in the task, is obtained.
In fig. 2a, a data processing device may perform the task processing method of the embodiments of the present application.
Fig. 2b is another schematic structural diagram of a task processing system provided in the embodiment of the present application, in fig. 2b, a user device directly serves as a data processing device, and the user device can directly obtain an input from a user and directly process the input by hardware of the user device, and a specific process is similar to that of fig. 2a, and reference is made to the above description and will not be repeated here.
In the task processing system shown in fig. 2b, the user device may obtain a prompt input by the user, where the prompt is used to describe a task and information to be processed in the task. And then, the user equipment performs a series of processing on the prompt, so that a result corresponding to the prompt, namely a processing result of the information to be processed in the task, is obtained.
In fig. 2b, the user equipment itself may perform the task processing method according to the embodiment of the present application.
Fig. 2c is a schematic diagram of a related device for task processing according to an embodiment of the present application.
The user device in fig. 2a and 2b may be the local device 301 or the local device 302 in fig. 2c, and the data processing device in fig. 2a may be the executing device 210 in fig. 2c, where the data storage system 250 may store data to be processed of the executing device 210, and the data storage system 250 may be integrated on the executing device 210, or may be disposed on a cloud or other network server.
The processors in FIG. 2a and FIG. 2b may perform data training/machine learning/deep learning through a neural network model or another model (e.g., a model based on a support vector machine), and execute task processing applications using the model finally obtained through data training or learning, thereby obtaining corresponding processing results.
Fig. 3 is a schematic diagram of a system 100 architecture provided in an embodiment of the present application, in fig. 3, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140, where the input data may include in an embodiment of the present application: each task to be scheduled, callable resources, and other parameters.
In the process of preprocessing input data by the execution device 110, or performing relevant processing (such as performing functional implementation of a neural network in the present application) such as calculation by the calculation module 111 of the execution device 110, the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing, or may store data, instructions, etc. obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule for different targets or different tasks, where the corresponding target model/rule may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result. Wherein the training data may be stored in database 130 and derived from training samples collected by data collection device 160.
In the case shown in FIG. 3, the user may manually give input data, which may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 3, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110. As shown in fig. 3, the neural network may be trained in accordance with the training device 120.
The embodiment of the application also provides a chip, which comprises the NPU. The chip may be provided in an execution device 110 as shown in fig. 3 for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 3 to complete the training work of the training device 120 and output the target model/rule.
The neural network processor NPU is mounted as a coprocessor onto a main central processing unit (CPU) (host CPU), and the host CPU distributes tasks. The core part of the NPU is an operation circuit, and a controller controls the operation circuit to extract data from a memory (a weight memory or an input memory) and perform operations.
In some implementations, the arithmetic circuitry includes a plurality of processing units (PEs) internally. In some implementations, the operational circuit is a two-dimensional systolic array. The arithmetic circuitry may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operational circuitry is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory and caches it on each PE in the operation circuit. The operation circuit fetches the matrix A data from the input memory, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in an accumulator.
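Purely as an illustration of the computation described above (not of the NPU interface itself), the matrix operation amounts to the following:

    # Tiny worked example of C = A x B, written with NumPy for illustration only.
    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])    # input matrix A
    B = np.array([[5, 6],
                  [7, 8]])    # weight matrix B
    C = A @ B                 # [[19, 22], [43, 50]]; each element is an accumulated sum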
The vector calculation unit may further process the output of the operation circuit, for example, vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. For example, the vector calculation unit may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector computation unit can store the vector of processed outputs to a unified buffer. For example, the vector calculation unit may apply a nonlinear function to an output of the arithmetic circuit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to an arithmetic circuit, for example for use in subsequent layers in a neural network.
The unified memory is used for storing input data and output data.
A direct memory access controller (DMAC) directly transfers the input data in the external memory to the input memory and/or the unified memory, stores the weight data in the external memory into the weight memory, and stores the data in the unified memory into the external memory.
And a bus interface unit (bus interface unit, BIU) for implementing interaction among the main CPU, the DMAC and the instruction fetch memory through a bus.
The instruction fetching memory (instruction fetch buffer) is connected with the controller and used for storing instructions used by the controller;
And the controller is configured to invoke the instructions cached in the instruction fetch memory, to control the working process of the operation accelerator.

Typically, the unified memory, the input memory, the weight memory, and the instruction fetch memory are all on-chip memories, and the external memory is a memory outside the NPU, which may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
Since the embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes xs and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f(W^T x) = f(W1·x1 + W2·x2 + ... + Wn·xn + b)

where s = 1, 2, ..., n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
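A minimal sketch of such a neural unit, assuming the sigmoid activation function mentioned above, is the following:

    # Single neural unit: weighted sum of the inputs xs plus the bias b,
    # passed through a sigmoid activation function f.
    import math

    def neural_unit(x, w, b):
        s = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-s))   # sigmoid activation

    # Example: neural_unit([0.5, -1.0], [0.8, 0.2], 0.1) is approximately 0.574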
The operation of each layer in a neural network can be described by the mathematical expression y = a(Wx + b). Physically, the operation of each layer in the neural network can be understood as completing the transformation from the input space to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors), including: 1. dimension raising/reduction; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are completed by Wx, operation 4 is completed by +b, and operation 5 is completed by a(). The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the collection of all individuals of that class of things. W is a weight vector, and each value in the vector represents the weight value of one neuron in that layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrix formed by the vectors W of the many layers). Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrix.
Since it is desirable that the output of the neural network be as close as possible to the value that is actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and then adjusting the weight vector according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted so that the prediction becomes lower, and the adjustment continues until the neural network can predict the actually desired target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the role of the loss function (loss function) or the objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, and the training of the neural network then becomes a process of reducing the loss as much as possible.
(2) Back propagation algorithm
In the training process, a neural network may use a back propagation (back propagation, BP) algorithm to correct the parameters of the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, an input signal is passed forward until an error loss is produced at the output, and the parameters of the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation process dominated by the error loss, and aims to obtain the parameters of the optimal neural network model, for example, the weight matrix.
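As a hedged illustration of the loss function and the back propagation update described above, the following is a toy single-neuron example with a squared-error loss; the names and the learning rate are assumptions and are not part of this application:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(xs, target, ws, b, lr=0.1):
    # forward pass: predicted value of the current network
    p = sigmoid(sum(w * x for w, x in zip(ws, xs)) + b)
    loss = (p - target) ** 2                    # squared-error loss
    # backward pass: propagate the error loss back to the weights and bias
    grad_z = 2.0 * (p - target) * p * (1.0 - p)
    new_ws = [w - lr * grad_z * x for w, x in zip(ws, xs)]
    new_b = b - lr * grad_z
    return loss, new_ws, new_b

loss, ws, b = train_step(xs=[0.5, -1.0], target=1.0, ws=[0.2, -0.1], b=0.0)
print(loss, ws, b)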
The method provided in the present application is described below from the training side of the neural network and the application side of the neural network.
The model training method provided by the embodiment of the application relates to data sequence processing, and can be particularly applied to methods such as data training, machine learning, deep learning and the like, and intelligent information modeling, extraction, preprocessing, training and the like of symbolizing and formalizing training data (for example, a first prompt and a third prompt in the model training method provided by the embodiment of the application) are performed, so that a trained neural network (for example, a target model in the model training method provided by the embodiment of the application) is finally obtained; in addition, the task processing method provided in the embodiment of the present application may use the trained neural network, and input data (for example, the first prompt in the task processing method provided in the embodiment of the present application) into the trained neural network, so as to obtain output data (for example, a processing result of executing the first task on the first information in the task processing method provided in the embodiment of the present application). It should be noted that, the model training method and the task processing method provided in the embodiments of the present application are inventions based on the same concept, and may be understood as two parts in a system or two stages of an overall process: such as a model training phase and a model application phase.
The task processing method provided in the embodiment of the present application may be implemented by a target model (which may also be referred to as a novel large-scale language model), and the structure of the target model is briefly introduced below. FIG. 4 is a schematic structural diagram of a target model provided in an embodiment of the present application. As shown in FIG. 4, the input end of the target model may be configured to receive a first prompt from the outside and a second prompt from the target model itself, where both prompts are used to describe a first task; the output end of the target model may output the second prompt to the outside, as well as a processing result of executing the first task on the first information in the first task. To facilitate understanding of the workflow of the target model shown in FIG. 4, the workflow is described below with reference to FIG. 5. FIG. 5 is a schematic flow chart of a task processing method according to an embodiment of the present application. As shown in FIG. 5, the method includes:
501. a first prompt is obtained, the first prompt being for indicating to perform a first task on first information entered by a user.
In this embodiment, when the user needs to complete the first task, the first prompt from the user may be acquired first, where the first prompt may include the first task and first information in the first task (may also be referred to as information to be processed in the first task), so that the first prompt may be used to indicate that the first task is executed on the first information input by the user.
Specifically, the first prompt may be obtained in a variety of ways:
(1) The first task and the first information in the first task that are directly input by the user are received, and the first task and the first information are converted into a first prompt by using a prompt template corresponding to the first task. The first prompt may then be used to indicate that the first task is to be performed on the first information input by the user.
For example, suppose the user needs to translate a Chinese text into an English text. The translation task t between Chinese text and English text input by the user may be received first, together with the information x to be translated input by the user, where x is a Chinese sentence meaning "Where did Xiao Ming go". After t and x are obtained, t and x are converted into a prompt P(t, x) by using the prompt template corresponding to t (a code sketch of this template-based conversion is given after option (2) below). P(t, x) may be presented as "Translate the following Chinese into English: {where did Xiao Ming go}", as "{where did Xiao Ming go} Translate the above from Chinese into English", or as "Chinese→English, {where did Xiao Ming go}", and so on. It can be seen that the prompt P(t, x) describes both the translation task t and the Chinese text x.
(2) The first prompt is received directly from the user, and the first prompt is generally composed of the first task and first information in the first task, so the first prompt can be used for indicating to execute the first task on the first information input by the user.
For example, assuming that the user needs to translate a Chinese text into an English text, the prompt P(t, x) entered by the user may be received directly. P(t, x) may be presented as "Translate the following Chinese into English: {where did Xiao Ming go}", as "{where did Xiao Ming go} Translate the above from Chinese into English", or as "Chinese→English, {where did Xiao Ming go}", and so on. The prompt P(t, x) describes the translation task t between Chinese text and English text as well as the information x to be translated, where x is a Chinese sentence meaning "Where did Xiao Ming go".
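The template-based conversion referenced in option (1) can be sketched as follows; the template strings, task keys, and function names are illustrative assumptions and are not defined in this application:

# Hypothetical prompt templates; each task t has its own template.
PROMPT_TEMPLATES = {
    "zh-en translation": "Translate the following Chinese into English: {x}",
    "summarization": "Summarize the following text: {x}",
}

def build_first_prompt(task, information):
    # converts the task t and the information x into the first prompt P(t, x)
    return PROMPT_TEMPLATES[task].format(x=information)

# the information x stands for the Chinese sentence meaning "Where did Xiao Ming go"
print(build_first_prompt("zh-en translation", "<Chinese sentence: Where did Xiao Ming go>"))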
502. Decoding or encoding the first prompt to obtain the characteristics of the first prompt.
After the first prompt is obtained, the first prompt can be input into a target model, and the target model can firstly extract the characteristics of the first prompt, so that the characteristics of the first prompt are obtained.
Specifically, the target model may obtain the features of the first prompt by:
(1) Assume that the target model includes only a decoder, where the input end of the decoder serves as the input end of the entire target model and the output end of the decoder serves as the output end of the entire target model. Then, after receiving the first prompt, the decoder may decode the first prompt to obtain the features of the first prompt. It should be noted that the first prompt includes u characters (u is a positive integer greater than or equal to 1), so the features of the first prompt also include the features of the u characters of the first prompt.
Still as in the above example, as shown in FIG. 6 (FIG. 6 is another schematic diagram of the target model provided in the embodiment of the present application), it is assumed that the target model includes a decoder, and the prompt P(t, x) is a character string including u characters, namely P_1, P_2, ..., P_u. After P(t, x) is obtained, the decoder may decode P(t, x) to obtain the features of the u characters of P(t, x).
(2) Assume that the target model includes an encoder and a decoder, where the input end of the encoder and the first input end of the decoder serve as the input end of the entire target model, the output end of the encoder is connected to the second input end of the decoder, and the output end of the decoder serves as the output end of the entire target model. Then, after receiving the first prompt, the encoder may encode the first prompt to obtain the features of the first prompt, and send the features of the first prompt to the decoder. It should be noted that the first prompt includes u characters, so the features of the first prompt also include the features of the u characters of the first prompt.
Still as in the above example, as shown in FIG. 7 (FIG. 7 is another schematic structural diagram of a target model provided in an embodiment of the present application), it is assumed that the target model includes an encoder and a decoder, and the prompt P(t, x) is a character string containing u characters, namely P_1, P_2, ..., P_u. After P(t, x) is obtained, the encoder may first encode P(t, x) to obtain the features of the u characters of P(t, x), and send the features of the u characters of P(t, x) to the decoder.
503. Based on the characteristics of the first prompt, a second prompt is acquired, the second prompt being used to indicate the first task.
After the features of the first prompt are obtained, the target model can perform a series of processing on the features of the first prompt, so as to obtain a second prompt, and output the second prompt to the outside, wherein the second prompt only comprises the first task, and therefore the second prompt is only used for describing the first task.
Specifically, the target model may obtain the second prompt by:
Assume that the second prompt contains v characters (v is a positive integer greater than or equal to 1). After obtaining the features of the first prompt, the decoder may first decode a preset character (which may also be referred to as a sequence start character and may be understood as a command character whose content is usually meaningless) using the features of the first prompt, thereby obtaining the 1st character of the second prompt. The decoder may then decode the 1st character of the second prompt using the features of the first prompt to obtain the 2nd character of the second prompt. The decoder may then decode the 1st character to the 2nd character of the second prompt using the features of the first prompt to obtain the 3rd character of the second prompt. In this way, the decoder can obtain and output the complete second prompt.
Still as in the above example, referring to FIG. 6 or FIG. 7, after the decoder obtains the features of the u characters of P(t, x), the decoder may use these features to decode the sequence start character (beginning of sequence, BOS) to obtain the 1st character Q_1 of the prompt Q(t). The decoder may then continue to use these features to decode the 1st character Q_1 of Q(t) to obtain the 2nd character Q_2 of Q(t). The decoder may then continue to use these features to decode the 1st character Q_1 and the 2nd character Q_2 of Q(t) to obtain the 3rd character Q_3 of Q(t), and so on, until the decoder finally obtains the 1st to v-th characters of Q(t), namely Q_1, Q_2, ..., Q_v. That is, Q(t) is obtained, and Q(t) is output to the outside.
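The autoregressive loop described in this step can be sketched as follows; the decode_step function, the BOS/EOS markers, and the toy replay decoder are illustrative assumptions rather than the actual implementation of the target model:

BOS, EOS = "<bos>", "<eos>"

def generate_second_prompt(decode_step, prompt_features, max_len=32):
    chars = []                                   # Q_1, Q_2, ..., Q_v
    while len(chars) < max_len:
        # decode the next character from the sequence start character and the
        # characters produced so far, conditioned on the features of P(t, x)
        next_char = decode_step(prompt_features, [BOS] + chars)
        if next_char == EOS:
            break
        chars.append(next_char)
    return chars

# Toy stand-in for the real decoder: it simply replays a fixed second prompt.
def toy_decode_step(prompt_features, context):
    target = list("Chinese -> English") + [EOS]
    return target[len(context) - 1]

features = ["feature placeholder"]               # features of the first prompt
print("".join(generate_second_prompt(toy_decode_step, features)))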
More specifically, there may be a variety of relationships between the first prompt and the second prompt:
(1) The first prompt and the second prompt may describe the first task with different expressions. Still as in the above example, suppose P(t, x) is presented as "Translate the following Chinese into English: {where did Xiao Ming go}", and Q(t) is presented as "text translation, Chinese to English". It can be seen that the two describe the translation task t between Chinese text and English text with different expressions.
(2) The first prompt and the second prompt may describe the first task with the same expression. Still as in the above example, suppose P(t, x) is presented as "Chinese→English, {where did Xiao Ming go}", and Q(t) is presented as "Chinese→English". It can be seen that the two describe the translation task t between Chinese text and English text with the same expression.
504. Acquiring a processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt.
After the second prompt is obtained, the target model can further process the characteristics of the first prompt and the second prompt, so that a processing result of executing the first task on the first information in the first task is obtained, and the processing result of executing the first task on the first information is output to the outside. Thus, the target model can output the second prompt and the processing result of executing the first task on the first information, so that the first task which is required to be completed by the user is successfully processed.
Specifically, the target model may acquire a processing result of performing the first task on the first information by:
the processing result of executing the first task on the first information is set to include n characters (n is a positive integer greater than or equal to 1). After obtaining the second hint, the decoder may decode the 1 st character through the v-th character of the second hint using the feature of the first hint, thereby obtaining the 1 st character of the processing result of executing the first task on the first information. Then, the decoder may decode the 1 st to v th characters of the second hint and the 1 st character of the processing result of performing the first task on the first information by using the feature of the first hint, thereby obtaining the 2 nd character of the processing result of performing the first task on the first information. The decoder may then decode the 1 st to v th characters of the second hint, and the 1 st to 2 nd characters of the processing result of the first task performed on the first information, using the characteristics of the first hint, to obtain the 3 rd character of the processing result of the first task performed on the first information. In this way, the decoder can obtain and output the complete processing result of executing the first task on the first information.
Still as in the above example, referring to FIG. 6 or FIG. 7, after the decoder obtains Q(t), the decoder may use the features of the u characters of P(t, x) to decode all the characters Q_1, Q_2, ..., Q_v of Q(t), thereby obtaining the 1st character y_1 of the processing result y of the information x. The decoder may then continue to use these features to decode all the characters Q_1, Q_2, ..., Q_v of Q(t) and the 1st character y_1 of y to obtain the 2nd character y_2 of y. The decoder may then continue to use these features to decode all the characters Q_1, Q_2, ..., Q_v of Q(t) and the 1st character y_1 to the 2nd character y_2 of y to obtain the 3rd character y_3 of y, and so on, until the decoder finally obtains the 1st to n-th characters of y, namely y_1, y_2, ..., y_n. That is, y is obtained, and y is output to the outside.
Since x is a Chinese sentence meaning "Where did Xiao Ming go", y may accordingly be presented as "Where did Xiao Ming go".
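Continuing the sketch given after step 503 (it reuses the BOS/EOS markers and the features variable defined there, and the toy step function is again an assumption), step 504 can be illustrated as follows:

def generate_result(decode_step, prompt_features, second_prompt, max_len=64):
    result = []                                  # y_1, y_2, ..., y_n
    while len(result) < max_len:
        # decode the second prompt plus the result characters produced so far,
        # still conditioned on the features of the first prompt P(t, x)
        next_char = decode_step(prompt_features, second_prompt + result)
        if next_char == EOS:
            break
        result.append(next_char)
    return "".join(result)

# Toy stand-in that replays the expected translation of the information x.
def toy_result_step(prompt_features, context):
    second_prompt = list("Chinese -> English")
    target = list("Where did Xiao Ming go") + [EOS]
    return target[len(context) - len(second_prompt)]

second_prompt = list("Chinese -> English")
print(generate_result(toy_result_step, features, second_prompt))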
Further, the target model provided by the embodiments of the present application may be compared with a model provided by the related art. It should be noted that, in the embodiment of the present application, for a certain task t and input x, an input prompt P(t, x) and an output prompt Q(t) may be defined, where these prompts are text sequences, that is, character strings containing a plurality of characters. The input prompt can be used as the input of the target model provided by the embodiment of the present application and as the input of the model provided by the related art, and the output prompt together with the original output y of the model provided by the related art can be used as the new output of the target model provided by the embodiment of the present application. The comparison is shown in Table 1:
TABLE 1
Scheme | Input | Output
Prior art | P(t, x) | y
Embodiment of the present application | P(t, x) | [Q(t), y]
Furthermore, in order to better understand the task processing method provided in the embodiments of the present application, the task processing method is further described below with reference to a plurality of application examples. These application examples are shown in Table 2:
TABLE 2
In this embodiment of the present application, when the user needs to complete the first task, a first prompt from the user may be acquired first, and the first prompt is input to the target model, where the first prompt is used to indicate that the first task is to be performed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt. Next, the target model may process the features of the first prompt to obtain a second prompt, which is only used to indicate the first task. Finally, the target model may process the features of the first prompt and the second prompt, thereby obtaining the processing result of executing the first task on the first information in the first task. The target model thus successfully processes the first task that the user needs to complete. In the foregoing process, the target model may extract the second prompt from the first prompt, and obtain the processing result of executing the first task on the first information by using the features of the first prompt in combination with the second prompt. Because the second prompt is only used to describe the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt, thereby accurately controlling the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
The foregoing is a detailed description of the task processing method provided in the embodiment of the present application, and the model training method provided in the embodiment of the present application will be described below. Fig. 8 is a schematic flow chart of a model training method according to an embodiment of the present application, as shown in fig. 8, where the method includes:
801. a first prompt is obtained, the first prompt being for indicating to perform a first task on first information entered by a user.
In this embodiment, when the model to be trained needs to be trained, a batch of training data may be acquired first, where the batch of training data may include a first prompt, where the first prompt is used to indicate that a first task is executed on first information input by a user. It should be noted that, the first real prompt corresponding to the first prompt is known (the first real prompt is only used to indicate the first task), and the real processing result of the first information is also known.
In one possible implementation, the batch of training data may also include a third prompt for instructing the execution of a second task on second information entered by the user (which may also be referred to as information to be processed in the second task). It should be noted that, the first task and the second task are different tasks, the second real prompt corresponding to the third prompt is known (the second real prompt is only used for indicating the second task), and the real processing result of the second information is also known.
802. Inputting the first prompt into a model to be trained to obtain a second prompt and a processing result of executing the first task on the first information, wherein the second prompt is used for indicating the first task, and the model to be trained is used for: decoding or encoding the first prompt to obtain the characteristics of the first prompt; acquiring a second prompt based on the characteristics of the first prompt; and acquiring a processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt.
After the first prompt is obtained, the first prompt can be input to the model to be trained. Then the model to be trained may decode or encode the first prompt, resulting in the features of the first prompt. The model to be trained may then obtain a second prompt based on the characteristics of the first prompt, the second prompt being used only to indicate the first task. Then, the model to be trained can acquire a processing result of executing the first task on the first information based on the features of the first prompt and the second prompt.
In one possible implementation, after the third prompt is obtained, the third prompt may be input to the model to be trained. Then the model to be trained may decode or encode the third hint, resulting in features of the third hint. The model to be trained may then obtain a fourth prompt based on the characteristics of the third prompt, the fourth prompt being used only to indicate the second task. Then, the model to be trained can acquire a processing result of executing the second task on the second information based on the characteristics of the third prompt and the fourth prompt.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation, the third prompt and the fourth prompt describe the second task in different ways, or the third prompt and the fourth prompt describe the second task in the same way.
In one possible implementation manner, the second prompt includes v characters, where v is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, so as to obtain the (i+1)-th character of the second prompt, where i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
In one possible implementation manner, the fourth prompt includes w characters, where w is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the fourth prompt to the k-th character of the fourth prompt based on the characteristics of the third prompt, so as to obtain the (k+1)-th character of the fourth prompt, where k = 1, ..., w-1, and the 1st character of the fourth prompt is obtained by decoding a preset character based on the characteristics of the third prompt.
In one possible implementation manner, the processing result of executing the first task on the first information includes n characters, where n is greater than or equal to 1, and the model to be trained is used for decoding, based on the characteristics of the first prompt, the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information, so as to obtain the (j+1)-th character of the processing result of executing the first task on the first information, where j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
In one possible implementation manner, the processing result of executing the second task on the second information includes m characters, where m is greater than or equal to 1, and the model to be trained is used for decoding, based on the characteristics of the third prompt, the fourth prompt and the 1st character to the p-th character of the processing result of executing the second task on the second information, so as to obtain the (p+1)-th character of the processing result of executing the second task on the second information, where p = 1, ..., m-1, and the 1st character of the processing result of executing the second task on the second information is obtained by decoding the fourth prompt based on the characteristics of the third prompt.
It should be noted that, for the description of step 802, reference may be made to the relevant description of steps 502 to 504 in the embodiment shown in fig. 5, which is not repeated here.
803. Based on the second prompt and a processing result of executing the first task on the first information, training the model to be trained to obtain a target model, wherein the target model is used for processing the first task.
After the second prompt output by the model to be trained and the processing result of executing the first task on the first information are obtained, a target loss can be constructed based on the second prompt, the first real prompt, the processing result of executing the first task on the first information, and the real processing result of the first information. The target loss can be used to indicate the difference between the second prompt and the first real prompt, and the difference between the processing result of executing the first task on the first information and the real processing result of the first information. Then, the parameters of the model to be trained can be updated based on the target loss, to obtain a model to be trained with updated parameters. The model to be trained with updated parameters is then further trained with the next batch of training data until the model training condition is met (for example, the target model converges), so as to obtain the target model in the embodiment shown in FIG. 5, and the obtained target model can be used for processing the first task.
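The application does not fix a concrete form for the target loss; one common choice consistent with the description above is a character-level cross-entropy over the concatenation of the first real prompt and the real processing result, sketched below under that assumption (char_probs is a hypothetical interface to the model to be trained):

import math

def target_loss(char_probs, real_prompt, real_result):
    # char_probs(prefix) is assumed to return a dict mapping each candidate
    # next character to the probability predicted by the model to be trained.
    target = list(real_prompt) + list(real_result)   # [Q*(t), y*]
    loss, prefix = 0.0, []
    for ch in target:
        loss += -math.log(char_probs(prefix)[ch])    # cross-entropy term
        prefix.append(ch)                            # teacher forcing
    return loss / len(target)

# toy model that predicts uniformly over lowercase letters and space
uniform = lambda prefix: {c: 1.0 / 27 for c in "abcdefghijklmnopqrstuvwxyz "}
print(target_loss(uniform, "chinese to english", "where did xiao ming go"))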
In one possible implementation manner, after the second prompt, the processing result of executing the first task on the first information, the fourth prompt, and the processing result of executing the second task on the second information output by the model to be trained are obtained, the target loss is obtained by computing, through a preset loss function, the second prompt, the first real prompt, the processing result of executing the first task on the first information, the real processing result of the first information, the fourth prompt, the second real prompt, the processing result of executing the second task on the second information, and the real processing result of the second information. The target loss can be used to indicate the difference between the second prompt and the first real prompt, the difference between the processing result of executing the first task on the first information and the real processing result of the first information, the difference between the fourth prompt and the second real prompt, and the difference between the processing result of executing the second task on the second information and the real processing result of the second information. Then, the parameters of the model to be trained can be updated based on the target loss, to obtain a model to be trained with updated parameters. The model to be trained with updated parameters can then be further trained with the next batch of training data until the model training condition is met, so as to obtain a target model. The obtained target model is used for processing a first task, a second task, and a third task, where the first task, the second task, and the third task are different tasks; for example, the first task is a text translation task, the second task is a text summarization task, and the third task is a question answering task.
It should be understood that this embodiment only schematically describes using training data describing the first task and the second task to train a target model capable of processing the first task, the second task, and the third task; neither the number of task types described by the training data nor the number of task types that the target model can process is limited. In practical applications, training data describing A different types of tasks may be used to train a target model capable of handling B different types of tasks, where the B types of tasks typically include the A types of tasks and B is greater than A.
The target model obtained through training in the embodiment of the present application has task processing capability. When the user needs to complete the first task, a first prompt from the user may be acquired first and input to the target model, where the first prompt is used to indicate that the first task is to be performed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt. Next, the target model may process the features of the first prompt to obtain a second prompt, which is only used to indicate the first task. Finally, the target model may process the features of the first prompt and the second prompt, thereby obtaining the processing result of executing the first task on the first information in the first task. The target model thus successfully processes the first task that the user needs to complete. In the foregoing process, the target model may extract the second prompt from the first prompt, and obtain the processing result of executing the first task on the first information by using the features of the first prompt in combination with the second prompt. Because the second prompt is only used to describe the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt, thereby accurately controlling the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
Further, in the embodiment of the present application, using training data describing the first task and the second task, a target model capable of processing the first task, the second task, and the third task may be trained. Therefore, the training mode provided by the embodiment of the application can enable the model to have certain task generalization capability, so that different requirements of users are met, and user experience is further improved.
The foregoing is a detailed description of the model training method provided in the embodiment of the present application, and the task processing device and the model training device provided in the embodiment of the present application will be described below. Fig. 9 is a schematic structural diagram of a task processing device according to an embodiment of the present application, as shown in fig. 9, where the device includes:
the acquiring module 901 is configured to acquire a first prompt, where the first prompt is used to instruct to execute a first task on first information input by a user;
a first processing module 902, configured to decode or encode the first hint to obtain a feature of the first hint;
the second processing module 903 is configured to obtain a second prompt based on the feature of the first prompt, where the second prompt is used to indicate the first task;
and the third processing module 904 is configured to obtain a processing result of executing the first task on the first information based on the feature of the first prompt and the second prompt.
In this embodiment of the present application, when the user needs to complete the first task, a first prompt from the user may be acquired first, and the first prompt is input to the target model, where the first prompt is used to indicate that the first task is to be performed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt. Next, the target model may process the features of the first prompt to obtain a second prompt, which is only used to indicate the first task. Finally, the target model may process the features of the first prompt and the second prompt, thereby obtaining the processing result of executing the first task on the first information in the first task. The target model thus successfully processes the first task that the user needs to complete. In the foregoing process, the target model may extract the second prompt from the first prompt, and obtain the processing result of executing the first task on the first information by using the features of the first prompt in combination with the second prompt. Because the second prompt is only used to describe the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt, thereby accurately controlling the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation manner, the second prompt includes v characters, where v is greater than or equal to 1, and the second processing module 903 is configured to decode, based on the features of the first prompt, the 1st character of the second prompt to the i-th character of the second prompt, to obtain the (i+1)-th character of the second prompt, where i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the features of the first prompt.
In one possible implementation manner, the processing result of executing the first task on the first information includes n characters, where n is greater than or equal to 1, and the third processing module 904 is configured to decode, based on the features of the first prompt, the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information, to obtain the (j+1)-th character of the processing result of executing the first task on the first information, where j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the features of the first prompt.
Fig. 10 is a schematic structural diagram of a model training device according to an embodiment of the present application, as shown in fig. 10, where the device includes:
a first obtaining module 1001, configured to obtain a first prompt, where the first prompt is used to instruct to perform a first task on first information input by a user;
the first processing module 1002 is configured to input a first prompt to a model to be trained, obtain a second prompt, and execute a processing result of a first task on the first information, where the second prompt is used to indicate the first task, and the model to be trained is configured to: decoding or encoding the first prompt to obtain the characteristics of the first prompt; acquiring a second prompt based on the characteristics of the first prompt; based on the characteristics of the first prompt and the second prompt, obtaining a processing result of executing a first task on the first information;
the training module 1003 is configured to train the model to be trained based on the second prompt and a processing result of executing the first task on the first information, to obtain a target model, where the target model is used to process the first task.
The target model obtained through training in the embodiment of the present application has task processing capability. When the user needs to complete the first task, a first prompt from the user may be acquired first and input to the target model, where the first prompt is used to indicate that the first task is to be performed on the first information input by the user. The target model may then process the first prompt to obtain the features of the first prompt. Next, the target model may process the features of the first prompt to obtain a second prompt, which is only used to indicate the first task. Finally, the target model may process the features of the first prompt and the second prompt, thereby obtaining the processing result of executing the first task on the first information in the first task. The target model thus successfully processes the first task that the user needs to complete. In the foregoing process, the target model may extract the second prompt from the first prompt, and obtain the processing result of executing the first task on the first information by using the features of the first prompt in combination with the second prompt. Because the second prompt is only used to describe the first task and is relatively short, the target model can, on the basis of the first prompt, accurately capture the content of the first task under the effective guidance of the second prompt, thereby accurately controlling the task processing direction. Therefore, the processing result of executing the first task on the first information finally obtained by the target model is a correct result, the first task that the user needs to complete can be successfully processed, and user experience is improved.
In one possible implementation, the apparatus further includes: the second acquisition module is used for acquiring a third prompt, and the third prompt is used for indicating to execute a second task on second information input by a user; the second processing module is used for inputting a third prompt into the model to be trained to obtain a fourth prompt and a processing result of executing a second task on the second information, and the fourth prompt is used for indicating the second task; the training module 1003 is configured to train the model to be trained based on the second prompt, the processing result of executing the first task on the first information, the fourth prompt, and the processing result of executing the second task on the second information, to obtain a target model, where the target model is used to process the first task, the second task, and the third task.
In one possible implementation, the first prompt and the second prompt describe the first task in different ways, or the first prompt and the second prompt describe the first task in the same way.
In one possible implementation, the third prompt and the fourth prompt describe the second task in different ways, or the third prompt and the fourth prompt describe the second task in the same way.
In one possible implementation manner, the second prompt includes v characters, where v is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, so as to obtain the (i+1)-th character of the second prompt, where i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
In one possible implementation manner, the fourth prompt includes w characters, where w is greater than or equal to 1, and the model to be trained is used for decoding the 1st character of the fourth prompt to the k-th character of the fourth prompt based on the characteristics of the third prompt, so as to obtain the (k+1)-th character of the fourth prompt, where k = 1, ..., w-1, and the 1st character of the fourth prompt is obtained by decoding a preset character based on the characteristics of the third prompt.
In one possible implementation manner, the processing result of executing the first task on the first information includes n characters, where n is greater than or equal to 1, and the model to be trained is used for decoding, based on the characteristics of the first prompt, the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information, so as to obtain the (j+1)-th character of the processing result of executing the first task on the first information, where j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
In one possible implementation manner, the processing result of executing the second task on the second information includes m characters, where m is greater than or equal to 1, and the model to be trained is used for decoding, based on the characteristics of the third prompt, the fourth prompt and the 1st character to the p-th character of the processing result of executing the second task on the second information, so as to obtain the (p+1)-th character of the processing result of executing the second task on the second information, where p = 1, ..., m-1, and the 1st character of the processing result of executing the second task on the second information is obtained by decoding the fourth prompt based on the characteristics of the third prompt.
It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned apparatus is based on the same concept as the method embodiment of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and specific content may refer to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
The embodiment of the application also relates to an execution device, and fig. 11 is a schematic structural diagram of the execution device provided in the embodiment of the application. As shown in fig. 11, the execution device 1100 may be embodied as a mobile phone, a tablet, a notebook, a smart wearable device, a server, etc., which is not limited herein. The task processing device described in the corresponding embodiment of fig. 9 may be disposed on the execution device 1100, so as to implement the function of task processing in the corresponding embodiment of fig. 5. Specifically, the execution apparatus 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103 and a memory 1104 (where the number of processors 1103 in the execution device 1100 may be one or more, one processor is exemplified in fig. 11), wherein the processor 1103 may comprise an application processor 11031 and a communication processor 11032. In some embodiments of the present application, the receiver 1101, transmitter 1102, processor 1103 and memory 1104 may be connected by a bus or other means.
The memory 1104 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1103. A portion of the memory 1104 may also include a non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1104 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the embodiments of the present application may be applied to the processor 1103 or implemented by the processor 1103. The processor 1103 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method described above may be completed by an integrated logic circuit of hardware in the processor 1103 or by instructions in the form of software. The processor 1103 may be a general purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor, or a microcontroller, and may further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1103 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104 and completes the steps of the foregoing method in combination with its hardware.
The receiver 1101 is operable to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1102 may be used to output numeric or character information through a first interface; the transmitter 1102 may also be configured to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 1102 may also include a display device such as a display screen.
In this embodiment of the present application, in an instance, the processor 1103 is configured to generate, by using the target model in the corresponding embodiment of fig. 5, a processing result of information to be processed in a certain task.
The embodiment of the present application also relates to a training device, and FIG. 12 is a schematic structural diagram of the training device provided in the embodiment of the present application. As shown in FIG. 12, the training device 1200 is implemented by one or more servers. The training device 1200 may vary considerably in configuration or performance, and may include one or more central processing units (central processing units, CPU) 1212 (for example, one or more processors), a memory 1232, and one or more storage media 1230 (for example, one or more mass storage devices) storing application programs 1242 or data 1244. The memory 1232 and the storage medium 1230 may be transitory or persistent storage. The program stored in the storage medium 1230 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Still further, the central processing unit 1212 may be configured to communicate with the storage medium 1230, and execute, on the training device 1200, the series of instruction operations in the storage medium 1230.
The training device 1200 may also include one or more power sources 1226, one or more wired or wireless network interfaces 1250, one or more input/output interfaces 1258, or one or more operating systems 1241, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Specifically, the training apparatus may perform the model training method in the corresponding embodiment of fig. 8.
The embodiments of the present application also relate to a computer storage medium in which a program for performing signal processing is stored, which when run on a computer causes the computer to perform the steps as performed by the aforementioned performing device or causes the computer to perform the steps as performed by the aforementioned training device.
Embodiments of the present application also relate to a computer program product storing instructions that, when executed by a computer, cause the computer to perform steps as performed by the aforementioned performing device or cause the computer to perform steps as performed by the aforementioned training device.
The execution device, training device or terminal device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the execution device to perform the data processing method described in the above embodiment, or to cause the chip in the training device to perform the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), etc.
Specifically, referring to FIG. 13, FIG. 13 is a schematic structural diagram of a chip provided in an embodiment of the present application. The chip may be represented as a neural network processor NPU 1300, and the NPU 1300 is mounted as a coprocessor on a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an arithmetic circuit 1303, and the controller 1304 controls the arithmetic circuit 1303 to extract matrix data from the memory and perform multiplication.
In some implementations, the arithmetic circuit 1303 includes a plurality of processing units (PEs) inside. In some implementations, the operation circuit 1303 is a two-dimensional systolic array. The arithmetic circuit 1303 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1303 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1302 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the data of matrix A from the input memory 1301, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator (accumulator) 1308.
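The data flow just described can be illustrated with a simple, purely software sketch; it only mimics the accumulation of partial results and is not the actual NPU microarchitecture:

def matmul_with_accumulator(A, B):
    # A is read from the input memory, B from the weight memory; partial
    # results are accumulated into C, which plays the role of the accumulator.
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul_with_accumulator([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# expected: [[19.0, 22.0], [43.0, 50.0]]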
The unified memory 1306 is used to store input data and output data. The weight data is directly transferred to the weight memory 1302 through the direct memory access controller (Direct Memory Access Controller, DMAC) 1305. The input data is also carried into the unified memory 1306 through the DMAC.
A bus interface unit (Bus Interface Unit, BIU) 1313 is used for interaction between the AXI bus, the DMAC, and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 1309.
The bus interface unit 1313 is used for the instruction fetch memory 1309 to obtain instructions from the external memory, and is also used for the DMAC 1305 to obtain the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1306 or to transfer weight data to the weight memory 1302 or to transfer input data to the input memory 1301.
The vector calculation unit 1307 includes a plurality of operation processing units, and performs further processing on the output of the operation circuit 1303 when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for network calculation of the non-convolutional/fully connected layers in the neural network, such as batch normalization (batch normalization), pixel-level summation, and up-sampling of a predicted label plane.
In some implementations, the vector calculation unit 1307 can store a processed output vector to the unified memory 1306. For example, the vector calculation unit 1307 may apply a linear function or a nonlinear function to the output of the operation circuit 1303, for example, performing linear interpolation on the predicted label plane extracted by the convolutional layer, or, for another example, accumulating a vector of values to generate an activation value. In some implementations, the vector calculation unit 1307 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1303, for example for use in subsequent layers of the neural network.
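As a hedged illustration of the element-wise post-processing performed by the vector calculation unit (the function below is an assumption for illustration, not the unit's real instruction set):

import math

def postprocess(output_row, bias):
    # vector addition followed by a nonlinear activation; the result can serve
    # as the activation input to a subsequent layer of the neural network
    summed = [v + b for v, b in zip(output_row, bias)]
    return [1.0 / (1.0 + math.exp(-v)) for v in summed]

print(postprocess([19.0, 22.0], [0.5, -0.5]))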
An instruction fetch memory (instruction fetch buffer) 1309 connected to the controller 1304 for storing instructions used by the controller 1304;
the unified memory 1306, the input memory 1301, the weight memory 1302, and the instruction fetch memory 1309 are all On-Chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection therebetween, and can be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a training device, or a network device, etc.) to perform the method described in the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device, a data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Claims (21)

1. A method for task processing, wherein the method is implemented by a target model, the method comprising:
acquiring a first prompt, wherein the first prompt is used for indicating that a first task is to be executed on first information input by a user;
decoding or encoding the first prompt to obtain characteristics of the first prompt;
acquiring a second prompt based on the characteristics of the first prompt, wherein the second prompt is used for indicating the first task;
and acquiring a processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt.
2. The method of claim 1, wherein the first prompt and the second prompt describe the first task in different ways.
3. The method of claim 1 or 2, wherein the second prompt comprises v characters, v ≥ 1, and the acquiring the second prompt based on the characteristics of the first prompt comprises:
decoding the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1, wherein the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
4. The method of claim 3, wherein the processing result of executing the first task on the first information comprises n characters, n ≥ 1, and the acquiring the processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt comprises:
decoding the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the characteristics of the first prompt, to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1, wherein the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
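For readers who want a concrete picture of the decoding procedure recited in claims 1 to 4, the following is a minimal, non-authoritative Python sketch. The `encode`, `decode_step`, `start_char`, and `end_char` names are assumptions introduced for illustration only and do not come from the patent; the real target model may expose a different interface. The sketch only illustrates the two-stage autoregressive generation: the second prompt (which describes the first task only) is decoded character by character from a preset start character, and the processing result is then decoded character by character conditioned on the characteristics of the first prompt and the already generated second prompt.

```python
# Hypothetical sketch of the two-stage decoding in claims 1-4.
# `encode` and `decode_step` are assumed interfaces, not the patent's API.
from typing import Callable, List, Tuple

def generate_two_stage(
    encode: Callable[[str], object],                   # first prompt -> characteristics of the first prompt
    decode_step: Callable[[object, List[str]], str],   # (characteristics, generated characters) -> next character
    first_prompt: str,
    start_char: str = "<s>",                           # the "preset character" that starts decoding
    end_char: str = "</s>",                            # assumed end-of-sequence marker
    max_len: int = 64,
) -> Tuple[str, str]:
    # Step 1: decode or encode the first prompt to obtain its characteristics.
    features = encode(first_prompt)

    # Step 2: decode the second prompt character by character.
    # Its 1st character is decoded from the preset start character; the (i+1)-th
    # character is decoded from characters 1..i of the second prompt.
    second_prompt: List[str] = []
    for _ in range(max_len):
        ch = decode_step(features, [start_char] + second_prompt)
        if ch == end_char:
            break
        second_prompt.append(ch)

    # Step 3: decode the processing result character by character, conditioned on
    # the characteristics of the first prompt and the full second prompt.
    result: List[str] = []
    for _ in range(max_len):
        ch = decode_step(features, [start_char] + second_prompt + result)
        if ch == end_char:
            break
        result.append(ch)

    return "".join(second_prompt), "".join(result)
```

In practice the two stages would typically share a single decoder and cache; a caller would supply `encode` and `decode_step` as thin wrappers around whatever encoder-decoder or decoder-only model plays the role of the target model.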
5. A method of model training, the method comprising:
acquiring a first prompt, wherein the first prompt is used for indicating that a first task is to be executed on first information input by a user;
inputting the first prompt to a model to be trained, to obtain a second prompt and a processing result of executing the first task on the first information, wherein the second prompt is used for indicating the first task, and the model to be trained is used for: decoding or encoding the first prompt to obtain characteristics of the first prompt; acquiring the second prompt based on the characteristics of the first prompt; and acquiring the processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt;
and training the model to be trained based on the second prompt and the processing result of executing the first task on the first information, to obtain a target model, wherein the target model is used for processing the first task.
6. The method of claim 5, wherein the method further comprises:
acquiring a third prompt, wherein the third prompt is used for indicating that a second task is to be executed on second information input by a user;
inputting the third prompt to the model to be trained, to obtain a fourth prompt and a processing result of executing the second task on the second information, wherein the fourth prompt is used for indicating the second task;
wherein the training the model to be trained based on the second prompt and the processing result of executing the first task on the first information to obtain a target model comprises:
training the model to be trained based on the second prompt, the processing result of executing the first task on the first information, the fourth prompt, and the processing result of executing the second task on the second information, to obtain a target model, wherein the target model is used for processing the first task, the second task, and the third task.
7. The method of claim 5 or 6, wherein the first prompt and the second prompt describe the first task in different ways.
8. The method according to any one of claims 5 to 7, wherein the second prompt comprises v characters, v ≥ 1, the model to be trained is configured to decode the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
9. The method of claim 8, wherein the processing result of executing the first task on the first information comprises n characters, n ≥ 1, the model to be trained is configured to decode the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the characteristics of the first prompt, to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
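As an illustration only of the training objective described in claims 5 to 9, the Python/PyTorch sketch below shows one plausible way to supervise both the second prompt and the processing result with a single cross-entropy loss over the concatenated target sequence. The `model` interface, the `bos_id` start token, and the tokenization convention are assumptions for the example, not details taken from the patent; any encoder-decoder or decoder-only language model could play the role of the model to be trained.

```python
# Hypothetical training step for the objective in claims 5-9: the model to be trained
# is supervised on "second prompt + processing result", conditioned on the first prompt.
import torch
import torch.nn as nn

def training_step(
    model: nn.Module,                   # assumed: model(src_ids, decoder_input_ids) -> (batch, tgt_len, vocab) logits
    loss_fn: nn.CrossEntropyLoss,
    optimizer: torch.optim.Optimizer,
    first_prompt_ids: torch.Tensor,     # (batch, src_len)  tokenized first prompt
    second_prompt_ids: torch.Tensor,    # (batch, p_len)    tokenized second prompt (task description only)
    result_ids: torch.Tensor,           # (batch, r_len)    tokenized processing result
    bos_id: int,                        # id of the preset start character
) -> float:
    # Target = second prompt followed by the processing result, so one autoregressive
    # cross-entropy loss teaches the model to first restate the task and then solve it.
    target = torch.cat([second_prompt_ids, result_ids], dim=1)       # (batch, p_len + r_len)

    bos = torch.full((target.size(0), 1), bos_id,
                     dtype=target.dtype, device=target.device)
    decoder_input = torch.cat([bos, target[:, :-1]], dim=1)          # teacher forcing, shifted right

    logits = model(first_prompt_ids, decoder_input)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), target.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the multi-task variant in claim 6, the same step could simply be run (or the losses summed) over batches built from the third prompt, the fourth prompt, and the processing result of the second task.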
10. A task processing device, wherein the device comprises a target model, and the device comprises:
an acquisition module, configured to acquire a first prompt, wherein the first prompt is used for indicating that a first task is to be executed on first information input by a user;
a first processing module, configured to decode or encode the first prompt to obtain characteristics of the first prompt;
a second processing module, configured to acquire a second prompt based on the characteristics of the first prompt, wherein the second prompt is used for indicating the first task;
and a third processing module, configured to acquire a processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt.
11. The apparatus of claim 10, wherein the first prompt and the second prompt describe the first task in different ways.
12. The apparatus of claim 10 or 11, wherein the second prompt comprises v characters, v ≥ 1, the second processing module is configured to decode the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
13. The apparatus of claim 12, wherein the processing result of executing the first task on the first information comprises n characters, n ≥ 1, the third processing module is configured to decode the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the characteristics of the first prompt, to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
14. A model training apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a first prompt, wherein the first prompt is used for indicating that a first task is to be executed on first information input by a user;
a first processing module, configured to input the first prompt to a model to be trained, to obtain a second prompt and a processing result of executing the first task on the first information, wherein the second prompt is used for indicating the first task, and the model to be trained is used for: decoding or encoding the first prompt to obtain characteristics of the first prompt; acquiring the second prompt based on the characteristics of the first prompt; and acquiring the processing result of executing the first task on the first information based on the characteristics of the first prompt and the second prompt;
and a training module, configured to train the model to be trained based on the second prompt and the processing result of executing the first task on the first information, to obtain a target model, wherein the target model is used for processing the first task.
15. The apparatus of claim 14, wherein the apparatus further comprises:
a second acquisition module, configured to acquire a third prompt, wherein the third prompt is used for indicating that a second task is to be executed on second information input by a user;
and a second processing module, configured to input the third prompt to the model to be trained, to obtain a fourth prompt and a processing result of executing the second task on the second information, wherein the fourth prompt is used for indicating the second task;
wherein the training module is configured to train the model to be trained based on the second prompt, the processing result of executing the first task on the first information, the fourth prompt, and the processing result of executing the second task on the second information, to obtain a target model, wherein the target model is used for processing the first task, the second task, and the third task.
16. The apparatus of claim 14 or 15, wherein the first prompt and the second prompt describe the first task in different ways.
17. The apparatus according to any one of claims 14 to 16, wherein the second prompt comprises v characters, v ≥ 1, the model to be trained is configured to decode the 1st character of the second prompt to the i-th character of the second prompt based on the characteristics of the first prompt, to obtain the (i+1)-th character of the second prompt, i = 1, ..., v-1, and the 1st character of the second prompt is obtained by decoding a preset character based on the characteristics of the first prompt.
18. The apparatus of claim 17, wherein the processing result of executing the first task on the first information comprises n characters, n ≥ 1, the model to be trained is configured to decode the second prompt and the 1st character to the j-th character of the processing result of executing the first task on the first information based on the characteristics of the first prompt, to obtain the (j+1)-th character of the processing result of executing the first task on the first information, j = 1, ..., n-1, and the 1st character of the processing result of executing the first task on the first information is obtained by decoding the second prompt based on the characteristics of the first prompt.
19. A task processing device, characterized in that the device comprises a memory and a processor, wherein the memory stores code and the processor is configured to execute the code; when the code is executed, the task processing device performs the method of any one of claims 1 to 9.
20. A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 9.
21. A computer program product, characterized in that it stores instructions that, when executed by a computer, cause the computer to implement the method of any one of claims 1 to 9.
CN202310365374.2A 2023-03-30 2023-03-30 Task processing method and related equipment thereof Pending CN116542308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310365374.2A CN116542308A (en) 2023-03-30 2023-03-30 Task processing method and related equipment thereof

Publications (1)

Publication Number Publication Date
CN116542308A (en) 2023-08-04

Family

ID=87451394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310365374.2A Pending CN116542308A (en) 2023-03-30 2023-03-30 Task processing method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN116542308A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination