CN117094362A - Task processing method and related device - Google Patents

Task processing method and related device

Info

Publication number
CN117094362A
CN117094362A (application CN202311358507.XA)
Authority
CN
China
Prior art keywords
task
feature
tasks
model
adapter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311358507.XA
Other languages
Chinese (zh)
Other versions
CN117094362B (en)
Inventor
辛毅
杜俊珑
鄢科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311358507.XA
Publication of CN117094362A
Application granted
Publication of CN117094362B
Legal status: Active
Anticipated expiration

Classifications

    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application disclose a task processing method and a related device in the field of artificial intelligence. A pre-training model within a multi-task processing model determines a target general feature from acquired data to be processed. An adapter within the multi-task processing model determines a target private feature for each of a plurality of tasks from the data to be processed, or from reference features generated while the pre-training model processes that data: a shared projection structure in the adapter extracts a reference general feature, a knowledge extraction structure corresponding to each task extracts that task's reference private feature based on the reference general feature, and the target private feature of each task is determined from its reference private feature. Each decoder in the multi-task processing model then determines the processing result of its corresponding task from that task's target private feature and the target general feature. In this way, the performance of the pre-trained model on the tasks is improved.

Description

Task processing method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a task processing method and a related device.
Background
Today, the pre-train-then-fine-tune paradigm has achieved significant success in many areas. However, as the parameter counts of pre-trained models grow and the number of downstream applications increases, the cost of separately fine-tuning all parameters of a pre-trained model for each downstream task becomes extremely high, demanding substantial computing power and storage resources.
Against this background, the Adapter was developed: a bottleneck structure with a very small number of learnable parameters that is inserted into the pre-trained model. When fine-tuning the pre-trained model for a downstream task, only the adapter's parameters need to be trained and adjusted, while the original parameters of the pre-trained model are kept unchanged; this can achieve results similar to, or even better than, fine-tuning all parameters of the pre-trained model.
However, when a pre-trained model is applied to several downstream tasks at once, the adapter-based fine-tuning schemes in the related art are generally not ideal, and the adapter-equipped pre-trained model performs poorly on those downstream tasks.
Disclosure of Invention
The embodiments of this application provide a task processing method and a related device, which improve the performance of a pre-trained model with an inserted adapter on downstream tasks.
The first aspect of the application provides a task processing method, which comprises the following steps:
acquiring data to be processed;
determining a target general feature from the data to be processed via a pre-training model in a multi-task processing model, the multi-task processing model being used to execute a plurality of tasks based on input data;
determining a target private feature for each of the plurality of tasks, via an adapter in the multi-task processing model, from the data to be processed or from reference features generated while the pre-training model processes the data to be processed; the adapter comprising a shared projection structure and a knowledge extraction structure corresponding to each of the plurality of tasks, wherein the shared projection structure is used to extract a reference general feature, each knowledge extraction structure is used to extract the reference private feature of its corresponding task based on the reference general feature, and the target private feature of each task is determined from that task's reference private feature;
determining, via each decoder in the multi-task processing model, the processing result of the task corresponding to that decoder from the task's target private feature and the target general feature; the multi-task processing model comprising decoders corresponding to the plurality of tasks.
A second aspect of the present application provides a task processing device, comprising:
a data acquisition module, configured to acquire data to be processed;
a first feature extraction module, configured to determine a target general feature from the data to be processed via a pre-training model in a multi-task processing model, the multi-task processing model being used to execute a plurality of tasks based on input data;
a second feature extraction module, configured to determine a target private feature for each of the plurality of tasks, via an adapter in the multi-task processing model, from the data to be processed or from reference features generated while the pre-training model processes the data to be processed; the adapter comprising a shared projection structure and a knowledge extraction structure corresponding to each of the plurality of tasks, wherein the shared projection structure is used to extract a reference general feature, each knowledge extraction structure is used to extract the reference private feature of its corresponding task based on the reference general feature, and the target private feature of each task is determined from that task's reference private feature;
a decoding module, configured to determine, via each decoder in the multi-task processing model, the processing result of the task corresponding to that decoder from the task's target private feature and the target general feature; the multi-task processing model comprising decoders corresponding to the plurality of tasks.
A third aspect of the application provides a computer apparatus comprising a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to execute the steps of the task processing method according to the first aspect described above according to the computer program.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program for executing the steps of the task processing method described in the first aspect.
A fifth aspect of the application provides a computer program product or computer program comprising computer instructions stored on a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps of the task processing method described in the first aspect.
From the above technical solutions, the embodiments of the present application have the following advantages:
According to the task processing method provided by the embodiments of this application, a pre-training model in a multi-task processing model, which executes a plurality of tasks based on input data, determines a target general feature from the acquired data to be processed. An adapter in the multi-task processing model then determines a target private feature for each of the plurality of tasks from the data to be processed, or from reference features generated while the pre-training model processes that data. The adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each task: the shared projection structure extracts a reference general feature, each knowledge extraction structure extracts its task's reference private feature based on the reference general feature, and each task's target private feature is determined from its reference private feature. Finally, each decoder in the multi-task processing model determines the processing result of its corresponding task from that task's target private feature and the target general feature; the model contains a decoder for each of the tasks.
The shared projection structure in the adapter can be trained on the training samples of all of the tasks, so information is exchanged across tasks during training and the shared projection structure gains a strong ability to learn the reference general feature. Each knowledge extraction structure in the adapter can then extract the reference private feature of a single task on the basis of the reference general feature extracted by the shared projection structure, gaining a strong ability to learn reference private features. When the multi-task processing model (that is, the pre-trained model with the adapter inserted) is applied to the plurality of tasks, the shared projection structure and knowledge extraction structures of the adapter allow it to better learn the feature representation under each task, improving the performance of the multi-task processing model on downstream tasks.
Drawings
FIG. 1a is a schematic diagram of a task-specific adapter provided by the related art;
FIG. 1b is a schematic diagram of a shared task adapter provided by the related art;
FIG. 1c is a schematic diagram of an adapter according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a scenario of a task processing method according to an embodiment of the present application;
FIG. 3 is a flowchart of a task processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-task processing model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a task processing scenario provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a task processing device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Today, the pre-train-then-fine-tune paradigm has achieved significant success in many areas. However, as the parameter counts of pre-trained models grow and the number of downstream applications increases, the cost of separately fine-tuning all parameters of a pre-trained model for each downstream task becomes extremely high, demanding substantial computing power and storage resources.
As an example, suppose a pre-trained model has 10 corresponding downstream tasks. If all model parameters of the pre-trained model are fine-tuned separately for each downstream task, the pre-trained model must be trained 10 times, and every training run adjusts all of its parameters, which requires extremely high cost in computing power and storage resources.
Against this background, the Adapter was developed: a bottleneck structure with a very small number of learnable parameters that is inserted into the pre-trained model. When fine-tuning the pre-trained model for a downstream task, only the adapter's parameters need to be trained and adjusted, while the original parameters of the pre-trained model are kept unchanged; this can achieve results similar to, or even better than, fine-tuning all parameters of the pre-trained model.
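As a sketch of this bottleneck idea (not the patent's implementation; the dimensions, ReLU nonlinearity, and residual connection are illustrative assumptions commonly used for adapters), the structure can be modeled as a low-rank transformation added to a frozen layer's output:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, w_down, w_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    h = np.maximum(0.0, x @ w_down)  # ReLU on the low-rank projection (assumed)
    return x + h @ w_up              # residual keeps the frozen model's output

d, r = 16, 4                             # hidden width d, bottleneck width r << d
w_down = 0.1 * rng.normal(size=(d, r))   # the only trainable parameters
w_up = 0.1 * rng.normal(size=(r, d))
x = rng.normal(size=(2, d))              # a batch of 2 hidden states
y = adapter(x, w_down, w_up)
print(y.shape)  # (2, 16): shape is preserved, so the adapter drops into any layer
```

Because the output shape matches the input shape, such a block can be inserted after any layer of the pre-trained model while the layer's own weights stay frozen.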
Referring to FIG. 1a, a schematic diagram of a task-specific adapter is provided in the related art.
In one scheme, the pre-trained model may be fine-tuned using task-specific adapters, as shown in FIG. 1a. The design of the task-specific adapter is closely tied to the number of tasks: an independent adapter is added for each downstream task in every layer of the pre-trained model, so that each task has an independent channel. This can be expressed as:
x_i' = W_up^i (W_down^i · x_i)
where x_i denotes the input data of the i-th task adapter, W_down^i denotes the down-sampling projection layer parameters unique to the i-th task, W_up^i denotes the up-sampling projection layer parameters unique to the i-th task, and x_i' denotes the output of the i-th task adapter.
In this scheme, when the pre-trained model is fine-tuned for downstream tasks through the task-specific adapters, all adapters are independent, and private representation information can be obtained for each downstream task.
However, the number of task-specific adapters in this scheme grows with the number of downstream tasks, which may lead to a large number of training parameters. Moreover, because each downstream task has an independent channel, only an independent feature representation of each downstream task can be obtained; the feature representations of the downstream tasks cannot interact, which may leave the pre-trained model with task-specific adapters performing poorly on the downstream tasks.
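The parameter-count concern can be made concrete with a back-of-the-envelope sketch (the hidden width, bottleneck width, and model depth below are assumed values, not figures from the patent):

```python
d, r = 16, 4                        # assumed hidden width and bottleneck width
params_per_adapter = d * r + r * d  # one down-projection plus one up-projection
layers = 12                         # assumed depth of the pre-trained model

# With task-specific adapters, trainable parameters grow linearly with task count,
# while no features are ever shared across tasks.
for n_tasks in (1, 5, 10):
    total = n_tasks * layers * params_per_adapter
    print(n_tasks, total)
```

Each added downstream task contributes a full set of per-layer adapters, so the training cost scales as O(number of tasks).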
Referring to FIG. 1b, a schematic diagram of a shared task adapter is provided in the related art.
In another scheme, the pre-trained model may be fine-tuned using a shared task adapter, as shown in FIG. 1b. The design of the shared task adapter is independent of the number of tasks: only one shared adapter is added in each layer of the pre-trained model for all downstream tasks, and every downstream task shares it, so all downstream tasks receive the same representation after passing through the pre-trained model. This can be expressed as:
x' = W_up (W_down · x)
where x denotes the input data of a single task, W_down denotes the down-sampling projection layer parameters shared by the tasks, W_up denotes the up-sampling projection layer parameters shared by all tasks, and x' denotes the output for the task.
In this scheme, when the pre-trained model is fine-tuned for downstream tasks through the shared task adapter, all downstream tasks share one adapter, the inputs and outputs of the tasks are transformed identically, and general representation information common to all tasks can be acquired.
Compared with task-specific adapters, the shared task adapter reduces the number of adapters, reduces the number of training parameters, and promotes interaction between tasks. However, it can only extract a general feature representation across the downstream tasks; it cannot adequately capture the specific information of each downstream task, so the pre-trained model with a shared task adapter also performs poorly on the downstream tasks.
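The limitation can be seen in a small sketch (the projection shapes and ReLU are illustrative assumptions): because every task passes through the same pair of projections, the same input always yields the same representation, regardless of which task consumes it:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4
w_down = 0.1 * rng.normal(size=(d, r))  # shared by all tasks
w_up = 0.1 * rng.normal(size=(r, d))    # shared by all tasks

def shared_adapter(x):
    # x' = W_up(W_down x): an identical transformation for every task
    return x + np.maximum(0.0, x @ w_down) @ w_up

x = rng.normal(size=(1, d))
out_task_a = shared_adapter(x)  # "task A" view of the input
out_task_b = shared_adapter(x)  # "task B" view of the same input
print(np.allclose(out_task_a, out_task_b))  # True: no task-specific information
```

Parameters no longer grow with the task count, but nothing in the output distinguishes one downstream task from another.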
That is, whether task-specific adapters or a shared task adapter is used, when the pre-trained model is applied to several downstream tasks simultaneously, the adapter-based fine-tuning schemes in the related art are generally not ideal, and the adapter-equipped pre-trained model performs poorly on those downstream tasks.
To solve the above technical problems, an embodiment of the present application provides a task processing method: a pre-training model in a multi-task processing model, which executes a plurality of tasks based on input data, determines a target general feature from the acquired data to be processed; an adapter in the multi-task processing model then determines a target private feature for each of the plurality of tasks from the data to be processed, or from reference features generated while the pre-training model processes that data, where the adapter's shared projection structure extracts a reference general feature, the knowledge extraction structure corresponding to each task extracts that task's reference private feature based on the reference general feature, and each task's target private feature is determined from its reference private feature; finally, each decoder in the multi-task processing model determines the processing result of its corresponding task from that task's target private feature and the target general feature, the model containing a decoder for each of the tasks.
Referring to fig. 1c, a schematic view of an adapter according to an embodiment of the present application is shown.
Referring to FIG. 1c, the adapter provided by the embodiment of the application extracts the reference general feature through its shared projection structure, extracts each task's reference private feature from the reference general feature through the knowledge extraction structure corresponding to that task, and then obtains the target private features from the reference private features. Meanwhile, the pre-training model determines the target general feature from the acquired data to be processed, and each decoder in the multi-task processing model determines the processing result of its corresponding task from that task's target private feature and the target general feature.
The adapter of this embodiment can extract the reference general feature shared among the tasks through the shared projection structure, and can also extract the reference private feature of each task through its corresponding knowledge extraction structure; that is, it captures both the features common to the plurality of tasks and the features private to each single task, improving the performance of the adapter-equipped pre-trained model across the tasks.
In this way, the shared projection structure in the adapter can be trained on the training samples of all of the tasks, so information is exchanged across tasks during training and the shared projection structure gains a strong ability to learn the reference general feature; each knowledge extraction structure can extract the reference private feature of a single task on the basis of the reference general feature extracted by the shared projection structure, gaining a strong ability to learn reference private features. When the pre-trained model with this adapter is applied to a plurality of tasks, it can better learn the feature representation under each task, improving the performance of the multi-task processing model on downstream tasks.
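A minimal sketch of this shared-then-private structure follows (the layer shapes, ReLU, and per-task linear heads are illustrative assumptions, not the patent's exact design): one shared projection produces the reference general feature, and a small per-task head, standing in for the knowledge extraction structure, derives each task's reference private feature from it:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 16, 4
tasks = ["A", "B", "C"]

w_shared = 0.1 * rng.normal(size=(d, r))                  # shared projection structure
heads = {t: 0.1 * rng.normal(size=(r, d)) for t in tasks} # knowledge extraction structures

def adapter_forward(x):
    # One cross-task feature, then one private feature per task derived from it.
    general = np.maximum(0.0, x @ w_shared)            # reference general feature
    return {t: general @ w for t, w in heads.items()}  # reference private features

x = rng.normal(size=(1, d))
private = adapter_forward(x)
print(sorted(private))     # ['A', 'B', 'C']
print(private["A"].shape)  # (1, 16)
```

The shared projection sees gradients from every task's training samples, while each head only ever serves its own task, which is the division of labor the paragraph above describes.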
Referring to FIG. 2, a scenario of the task processing method according to an embodiment of the present application may include a terminal device 201 or a server 202.
The terminal device 201 or the server 202 acquires data to be processed. As an example, the data to be processed may be an image, text, or the like, which is not particularly limited herein.
The terminal device 201 or the server 202 determines a target general feature from the data to be processed through the pre-training model in the multi-task processing model; the multi-task processing model is used to execute a plurality of tasks based on input data. As an example, assuming the data to be processed is image data, the pre-training model in the multi-task processing model may determine the target general feature corresponding to the image data, and the multi-task processing model may execute a plurality of tasks based on the input image data, such as a semantic segmentation task and an instance segmentation task.
The terminal device 201 or the server 202 determines respective target private characteristics of a plurality of tasks according to the data to be processed or reference characteristics generated when the pre-training model processes the data to be processed through an adapter in the multi-task processing model; the adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each of the tasks, the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks.
As an example, assuming that the data to be processed is image data, the multitasking model executes 3 tasks, A, B, C respectively, and the adaptor in the multitasking model determines the target private features a, b, and c corresponding to the tasks A, B, C respectively according to the image data or the reference features generated when the pre-training model processes the image data. Wherein the shared projection structure in the adapter can extract the reference common feature X; the knowledge extraction structure corresponding to each task A, B, C in the adapter can extract the reference private features X1, X2, X3 corresponding to each task A, B, C based on the reference common feature X; the target private features a, b, c to which the tasks A, B, C respectively correspond may then be determined from the reference private features x1, x2, x 3.
The terminal device 201 or the server 202 determines the processing result of the task corresponding to the decoder according to the target private feature and the target general feature of the task corresponding to the decoder through each decoder in the multi-task processing model; the multi-task processing model comprises a plurality of decoders corresponding to the tasks. As an example, the multitasking model executes 3 tasks, A, B, C respectively, the decoder corresponding to each task A, B, C determines the processing result corresponding to each task A, B, C according to the target private feature and the target general feature corresponding to each task A, B, C, for example, the decoder corresponding to task a determines the processing result a-1 of task a according to the target private feature a and the target general feature Y corresponding to task a, the decoder corresponding to task B determines the processing result B-1 of task B according to the target private feature B and the target general feature Y corresponding to task B, and the decoder corresponding to task C determines the processing result C-1 of task C according to the target private feature C and the target general feature Y corresponding to task C.
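How each decoder might combine the two features can be sketched as follows (fusing by concatenation followed by a linear map is an assumption made for illustration; the patent does not fix the decoder's internals at this point). Each task's decoder consumes that task's target private feature together with the shared target general feature Y:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 16, 5  # feature width, assumed output width per task

def decode(private, general, w_dec):
    """One per-task decoder: fuse the task's private feature with the general feature."""
    fused = np.concatenate([private, general], axis=-1)  # (batch, 2*d)
    return fused @ w_dec                                 # task-specific processing result

general = rng.normal(size=(1, d))  # target general feature Y from the pre-training model
privates = {t: rng.normal(size=(1, d)) for t in ["A", "B", "C"]}   # a, b, c
decoders = {t: rng.normal(size=(2 * d, k)) for t in ["A", "B", "C"]}

results = {t: decode(privates[t], general, decoders[t]) for t in decoders}
print(results["A"].shape)  # (1, 5): one processing result per task
```

Every decoder receives the same general feature Y, so cross-task knowledge reaches each head, while the private feature keeps the results task-specific.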
In this way, because the shared projection structure in the adapter is trained on the training samples of all of the tasks, information is exchanged across tasks during training and the shared projection structure gains a strong ability to learn the reference general feature. Each knowledge extraction structure in the adapter can then extract the reference private feature of a single task on the basis of the reference general feature extracted by the shared projection structure, gaining a strong ability to learn reference private features. When the multi-task processing model (that is, the pre-trained model with the adapter inserted) is applied to the plurality of tasks, the shared projection structure and knowledge extraction structures of the adapter allow it to better learn the feature representation under each task, improving the performance of the multi-task processing model on downstream tasks.
The task processing method provided by the embodiment of the present application can be applied to a terminal device or a server with data processing capability. The server may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal device includes, but is not limited to, a mobile phone, a tablet, a computer, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
The task processing method provided by the embodiment of the application relates to artificial intelligence, computer vision technology and a pre-training model.
Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, pre-training model technology, operation/interaction systems, mechatronics, and the like. The pre-training model, also called a large model or foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to perform machine vision tasks such as identifying and measuring targets, and to further perform graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multi-dimensional data. Large model technology has brought important innovation to the development of computer vision: pre-trained models in the vision field such as Swin-Transformer, ViT, V-MoE, and MAE can be quickly and widely applied to specific downstream tasks through fine-tuning (finetune). Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
The pre-training model (PTM), also called a foundation model or large model, refers to a deep neural network (DNN) with a large number of parameters that is trained on massive unlabeled data. The function-approximation capability of the large-parameter DNN enables the PTM to extract common features from the data, and the PTM is then adapted to downstream tasks through techniques such as fine-tuning, Parameter-Efficient Fine-Tuning (PEFT), and prompt-tuning. Therefore, the pre-training model can achieve good results in few-shot or zero-shot scenarios. PTMs can be classified according to the data modality they process into language models (ELMo, BERT, GPT), visual models (Swin-Transformer, ViT, V-MoE), speech models (VALL-E), multi-modal models (ViLBERT, CLIP, Flamingo, Gato), and so on, where a multi-modal model refers to a model that builds a feature representation of two or more data modalities. The pre-training model is an important tool for producing Artificial Intelligence Generated Content (AIGC), and can also serve as a general interface connecting multiple specific task models.
Several terms which may be involved in the following embodiments of the present application will be explained first.
The adapter refers to a bottleneck structure with a very small number of learnable parameters (generally composed of two fully connected layers, one for dimension-reducing down-projection and the other for dimension-restoring up-projection) that is inserted into a network structure with large-scale model parameters. When a certain downstream task is fine-tuned, only the bottleneck structure parameters are trained, while the original parameters of the pre-trained model remain unchanged.
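As an illustrative sketch (not the application's reference implementation), the bottleneck structure described above can be written in a few lines of NumPy; the feature width `d` and hidden width `k` are assumed values:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class BottleneckAdapter:
    """Minimal adapter: down-projection -> nonlinearity -> up-projection.

    Only these 2*k*d parameters would be trained during downstream
    fine-tuning; the pre-trained model's own weights stay frozen.
    """
    def __init__(self, d=16, k=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0, 0.02, size=(d, k))  # dimension-reducing map
        self.w_up = rng.normal(0, 0.02, size=(k, d))    # dimension-restoring map

    def __call__(self, x):
        return relu(x @ self.w_down) @ self.w_up

adapter = BottleneckAdapter(d=16, k=4)
x = np.ones((2, 16))   # a toy batch of token features
out = adapter(x)
print(out.shape)       # the adapter preserves the feature width d
```

Because the output width equals the input width, such a module can be inserted at (or in parallel with) any layer of the backbone without changing its interface.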
The Transformer is a model that uses the attention mechanism to improve model training speed. It mainly comprises an input module, an encoding module, a decoding module, and an output module; the encoding module is mainly implemented by an encoder and the decoding module by a decoder. The encoder is responsible for converting the input into features: it transforms an input sequence of indefinite length into a context variable of definite length and encodes the input sequence information in that context variable. The decoder is responsible for translating the features into the target: it generates the output sequence by decoding the information in the context variable.
Referring to fig. 3, the flowchart of a task processing method according to an embodiment of the present application is shown.
Referring to fig. 3, the task processing method provided by the embodiment of the present application may include:
s301: and obtaining data to be processed.
The data to be processed refers to the data on which the plurality of tasks in the embodiment of the present application are performed, i.e., the data input into the multi-task processing model; it may be an image, text, etc., and is not particularly limited herein.
S302: and determining the target general characteristics according to the data to be processed by a pre-training model in the multi-task processing model.
Wherein the multitasking model is for performing a plurality of tasks based on the input data.
A multi-task processing model refers to a model that can handle multiple tasks; it may include a pre-training model and an adapter. The pre-training model refers to the basic processing model corresponding to the plurality of tasks. The pre-training model can execute multiple tasks, has large-scale model parameters with excellent and reliable performance after extensive training, and may adopt a Transformer structure or the like.
The task means a task for execution based on the data to be processed, and as an example, when the data to be processed is image data, the plurality of tasks executed by the multitasking processing model may include at least two of a semantic segmentation task, an instance segmentation task, a panorama segmentation task, a human segmentation task, and a saliency detection task.
The purpose of semantic segmentation is to label each pixel in the image with the category of the content it represents. Because each pixel in the image is labeled, this type of task is often referred to as dense prediction. That is, semantic segmentation is classification at the pixel level: pixels belonging to the same class are grouped into one class, so semantic segmentation is a task of understanding an image from the pixel level. As an example, for an image containing a person, an animal, and a background, pixels belonging to the person are classified into one class, pixels belonging to the animal into another class, and background pixels into a third class.
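As a minimal illustration of pixel-level classification (the class ids and scores are made up for illustration, not taken from the application), a dense label map can be produced by taking the highest-scoring class at every pixel:

```python
import numpy as np

# Toy per-pixel scores for 3 hypothetical classes
# (person=0, animal=1, background=2) over a 2x3 image.
scores = np.array([
    [[2.0, 0.1, 0.5], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0]],
    [[1.2, 0.4, 0.1], [0.3, 2.2, 0.5], [0.0, 0.1, 1.8]],
])  # shape (H=2, W=3, C=3)

# Dense prediction: one class label per pixel.
label_map = scores.argmax(axis=-1)
print(label_map)
```

Pixels sharing a label then form the per-class regions described above.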
Instance segmentation simultaneously utilizes the results of object detection and semantic segmentation: through the index of the highest-confidence class of a target provided by object detection, the mask corresponding to that target is extracted from the semantic segmentation. In short, specific objects (specific instances) within a class are segmented.
Saliency detection means simulating human visual characteristics through an intelligent algorithm to extract the salient regions in an image (i.e., the regions of human interest).
Human segmentation is a subtask of the semantic segmentation task; it aims at pixel-level fine-grained segmentation of human images (e.g., dividing body parts and clothing).
The target general feature means a general feature among a plurality of tasks, and the target general feature can reflect general information according to which the plurality of tasks are executed, that is, the target general feature can play a corresponding role in the execution process of the plurality of tasks.
S303: and determining the target private characteristics of each of the tasks according to the data to be processed or the reference characteristics generated when the pre-training model processes the data to be processed by an adapter in the multi-task processing model.
The adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each of a plurality of tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks.
The target private feature means a specific feature corresponding to each task, and the target private feature can reflect specific private information according to which each task is executed, that is, the target private feature plays a specific role in the execution process of the corresponding task, for example, the target private feature a can play a specific role in executing the corresponding task a, but the target private feature a cannot play a role in executing the task B.
The shared projection structure refers to a structure for extracting the reference general feature from the data to be processed, or from the reference feature generated when the pre-training model processes the data to be processed.
It should be understood that the shared projection structure in the adapter can train based on the training samples of each of the plurality of tasks, and can perform information interaction across tasks in the training process, so that the shared projection structure has better reference universal feature learning capability.
Knowledge extraction structure means a structure for determining reference private features of a corresponding task. The knowledge extraction structure can extract the reference private features of a single task on the basis of sharing the reference general features extracted by the projection structure, and has better reference private feature learning capability.
S304: and determining a processing result of the task corresponding to the decoder according to the target private characteristic and the target general characteristic of the task corresponding to the decoder through each decoder in the multi-task processing model.
The multi-task processing model comprises a plurality of decoders corresponding to the tasks.
It should be understood that, by inputting the target general feature and the target private feature corresponding to each of the plurality of tasks into the corresponding decoder, the processing result of the corresponding task can be obtained. Since both the general feature shared among the tasks and the private feature of each task are used during processing, the specific information of each task can be fully obtained, thereby improving the performance of the pre-training model with the adapter inserted on downstream tasks.
According to the task processing method provided by the embodiment of the present application, the shared projection structure in the adapter can be trained on the respective training samples of the plurality of tasks, with information interacting across tasks during training, giving it a strong capability for learning the reference general feature. The knowledge extraction structure in the adapter can extract the reference private feature of a single task on the basis of the reference general feature extracted by the shared projection structure, giving it a strong capability for learning the reference private feature. When the multi-task processing model (i.e., the pre-training model with the adapter inserted) is applied to the plurality of tasks, the feature representation under each task can be better learned based on the shared projection structure and knowledge extraction structure of the adapter, thereby improving the performance of the multi-task processing model on downstream tasks.
Based on the task processing method provided in the foregoing embodiment, in order to further describe the process of determining, by the adapter, the target private feature according to the data to be processed or the reference feature, in some possible implementation manners, step S303 may include:
a1: and determining the reference universal characteristic according to the data to be processed or the reference characteristic through the shared projection structure in the adapter.
Because the adapter can be inserted at any position in the pre-training model, the data received by the shared projection structure in the adapter may be the data to be processed, or may be the reference feature generated when the pre-training model processes the data to be processed.
It should be appreciated that the common characteristics between the plurality of tasks may be determined by the shared projection architecture in the adapter, enabling interactions between the plurality of tasks.
In one possible implementation manner, the step A1 may specifically include:
b1: and carrying out downsampling treatment on the data to be processed or the reference characteristics through a downsampling projection layer in the shared projection structure to obtain the reference downsampling characteristics.
Downsampling, also known as subsampling, reduces the size of an image; its main purposes are to make the image conform to the size of the display area and to generate a thumbnail of the corresponding image, and it may be implemented by pooling.
B2: And carrying out nonlinear transformation processing on the reference downsampling feature through the nonlinear layer in the shared projection structure to obtain the reference transformation feature.
In this embodiment, the nonlinear layer may perform the nonlinear transformation using a nonlinear activation function, and different nonlinear activation functions, such as ReLU, Sigmoid, or Tanh, may be substituted for different tasks.
In the embodiment of the present application, the nonlinear layer may use ReLU as the nonlinear activation function. ReLU (Rectified Linear Unit), also called a rectified linear unit, is an activation function commonly used in artificial neural networks, and generally refers to the nonlinear function represented by the ramp function and its variants.
Wherein, the purpose of introducing the nonlinear activation function is to improve the nonlinear fitting capability of the adapter and enhance the expression capability of the model.
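The activation functions mentioned above can be sketched in a few lines (a generic illustration, not specific to the application); each maps its input nonlinearly, which is what gives the adapter its nonlinear fitting capability:

```python
import numpy as np

def relu(x):
    # Ramp function: negative inputs are clipped to zero.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real input into (-1, 1).
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))
print(sigmoid(0.0))
print(tanh(0.0))
```

Stacking only linear layers would collapse into a single linear map; inserting one of these between the down- and up-projections is what lets the bottleneck express nonlinear functions.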
B3: and carrying out upsampling processing on the reference transformation characteristics through an upsampling projection layer in the shared projection structure to obtain the reference general characteristics.
Upsampling, also known as image magnification or image interpolation, mainly aims to enlarge the original image so that it can be displayed on a higher-resolution display device; the upsampling method may be bilinear interpolation, deconvolution, unpooling, etc.
As shown in fig. 1c, the shared projection structure in the adapter may include a downsampling projection layer W_down, a nonlinear activation function ReLU, and an upsampling projection layer W_up. Interaction between the multiple tasks can be promoted through the shared projection structure, which can be expressed as:

F = ReLU(x · W_down) · W_up

where F represents the reference general feature, and x represents the data to be processed or the reference feature generated when the pre-training model processes the data to be processed. It should be understood that, because the adapter can be inserted at any layer of the pre-training model, the data input to the shared projection structure of the adapter may be the data to be processed or a reference feature generated when the pre-training model processes the data to be processed. The reference feature may be the output of a certain layer in the pre-training model, or the final output of the pre-training model, which is not specifically limited herein.
In this embodiment, the shared projection structure has a better reference general feature learning capability, so that the reference general feature corresponding to the data to be processed or the reference feature can be better obtained, so as to promote interaction among a plurality of tasks.
A2: the first sub-reference generic feature and the second sub-reference generic feature are determined from the reference generic feature by a gating structure in the adapter.
It should be understood that, in order to better learn both the common features of interaction among the plurality of tasks and the independent feature representations of each task, in this embodiment the reference general feature may be divided into two parts: one part serves as the common feature for interaction among the plurality of tasks, i.e., the first sub-reference general feature; the other part is used to determine the reference private feature corresponding to each task, i.e., the second sub-reference general feature, from which the reference private feature of the task corresponding to each knowledge extraction structure is determined.
As one possible implementation, the first sub-reference general feature may be expressed as:

F_s = s · F

where F_s represents the first sub-reference general feature, s represents the scaling coefficient of the gating structure, and F represents the reference general feature.
A3: and determining the reference private feature of the task corresponding to the knowledge extraction structure according to the second sub-reference universal feature through each knowledge extraction structure in the adapter.
In one possible implementation, step A3 may be expressed as:

F_i = γ_i ⊙ ((1 − s) · F) + β_i

where F_i represents the reference private feature of the i-th task, γ_i and β_i represent the scaling factor and shifting factor of the i-th task respectively, ⊙ represents the element-wise (dot) product operation, and (1 − s) · F represents the second sub-reference general feature.
In another possible implementation manner, step A3 may specifically include:
c1: and scaling the second sub-reference universal feature by a scaling factor in the knowledge extraction structure to obtain a reference scaling feature.
Scaling means changing the range of values of the feature, scaling to a specific interval.
Since the second sub-reference general feature may contain a large amount of information, scaling it allows the information contained therein to be mapped into a task-specific interval; the information within that interval has a higher reference value for executing the corresponding task.
C2: and carrying out shift processing on the reference scaling characteristic through a shift factor in the knowledge extraction structure to obtain the reference private characteristic of the task.
The shift process means an operation of moving the reference zoom feature.
The reference scaling feature is then shifted so that it better matches the corresponding task, providing information of greater reference value for the execution of that task.
In this embodiment, the scaling factor in the knowledge extraction structure is used to scale the second sub-reference general feature to obtain the reference scaling feature, and the shifting factor in the knowledge extraction structure then shifts the reference scaling feature to obtain the reference private feature, which can improve the accuracy of the determined reference private feature.
A4: for each task, determining target private characteristics of the task according to the first sub-reference universal characteristics and the reference private characteristics of the task.
In one possible implementation, the target private feature may be expressed as:

F̂_i = F_s + F_i

where F̂_i represents the target private feature corresponding to the i-th task, F_s represents the first sub-reference general feature, and F_i represents the reference private feature. Of course, in practical application, the target private feature may also be obtained by weighted summation of the first sub-reference general feature and the reference private feature according to specific weight coefficients.
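Putting steps A1 through A4 together, a sketch of the whole adapter computation (shared projection, gating split, per-task scale-and-shift, recombination) might look as follows; the dimensions, the gate value s, and the per-task factors are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def adapter_forward(x, w_down, w_up, s, scales, shifts):
    """x: (n, d) input features; scales/shifts: per-task (d,) vectors."""
    f = relu(x @ w_down) @ w_up          # A1: reference general feature F
    f_shared = s * f                     # A2: first sub-reference general feature
    f_rest = (1.0 - s) * f               # A2: second sub-reference general feature
    outputs = {}
    for task, gamma in scales.items():
        private = gamma * f_rest + shifts[task]  # A3: scale, then shift
        outputs[task] = f_shared + private       # A4: target private feature
    return outputs

rng = np.random.default_rng(0)
d, k, n = 8, 2, 4
w_down = rng.normal(0, 0.1, (d, k))
w_up = rng.normal(0, 0.1, (k, d))
scales = {"A": np.full(d, 1.5), "B": np.full(d, 0.5)}  # hypothetical gamma_i
shifts = {"A": np.zeros(d), "B": np.ones(d)}           # hypothetical beta_i

out = adapter_forward(rng.normal(size=(n, d)), w_down, w_up, s=0.3,
                      scales=scales, shifts=shifts)
print(sorted(out), out["A"].shape)
```

Every task shares the projection weights and the gated common part, while the tiny per-task scale/shift pair is all that differentiates the tasks inside the adapter.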
In the embodiment of the present application, the reference general feature is obtained through the shared projection structure in the adapter; the first and second sub-reference general features are obtained through the gating structure in the adapter; the reference private feature is then determined from the second sub-reference general feature through the knowledge extraction structure in the adapter; and finally the target private feature is determined from the first sub-reference general feature and the reference private feature. In this way, information among the plurality of tasks can be complemented while the independent information corresponding to each task is retained, improving the performance of the pre-training model with the adapter inserted on the plurality of tasks.
Referring to fig. 4, a schematic diagram of a multitasking model according to an embodiment of the present application is shown.
Based on the task processing method provided in the foregoing embodiment, in one possible implementation manner, the pre-training model includes a plurality of sub-coding structures, such as four blocks (blocks) in the pre-training model in fig. 4, and the multi-task processing model includes adapters corresponding to the plurality of sub-coding structures respectively.
As shown in fig. 4, in this embodiment the multi-task processing model includes a pre-training model, which may be a Transformer model comprising a plurality of blocks, such as the 4 blocks shown in fig. 4. The blocks may be connected in parallel or in series; this embodiment uses serial connection as an example. Each block is connected to a corresponding decoder and adapter, and each block includes several Transformer layers.
Taking block1 as an example, block1 comprises a sliding-window-based multi-head self-attention module (SW-MSA), two normalization modules (LayerNorm) arranged before and after the SW-MSA module, and a multi-layer perceptron (MLP); the adapters of block1 are disposed at both ends of the MLP, as shown in fig. 4.
The multi-layer perceptron (MLP), also called an artificial neural network (ANN), may have multiple hidden layers between its input and output layers; the simplest MLP has only one hidden layer, i.e., a three-layer structure.
Layer normalization (LayerNorm) is a normalization technique used in deep neural networks. It normalizes the output of each neuron in the network so that the output of each layer has a similar distribution.
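A hedged sketch of how an adapter might sit in parallel with the MLP inside one such block; the SW-MSA sublayer is deliberately omitted to keep the sketch short, and all widths are assumed values:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def block_forward(x, w1, w2, adapter_fn):
    """One simplified block: LayerNorm -> (frozen MLP || adapter) -> residual.

    Only the MLP sublayer with its parallel adapter branch is shown;
    the attention sublayer that would precede it is left out.
    """
    h = layer_norm(x)
    return x + relu(h @ w1) @ w2 + adapter_fn(h)

rng = np.random.default_rng(1)
d = 8
w1 = rng.normal(0, 0.1, (d, 4 * d))  # frozen MLP weights
w2 = rng.normal(0, 0.1, (4 * d, d))
a_down = rng.normal(0, 0.1, (d, 2))  # trainable adapter weights
a_up = rng.normal(0, 0.1, (2, d))
adapter_fn = lambda h: relu(h @ a_down) @ a_up

y = block_forward(rng.normal(size=(3, d)), w1, w2, adapter_fn)
print(y.shape)
```

Because the adapter branch is additive, setting its output to zero recovers the original block, which is why the frozen backbone's behavior is preserved at initialization.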
In one possible implementation, step S302 may include: determining, by each sub-coding structure (referring to a Transformer block in this embodiment) in the pre-training model, the target general feature output by the sub-coding structure according to the input data of the sub-coding structure.
The input data of a sub-coding structure is the data to be processed or the target general feature output by another sub-coding structure. It should be understood that, as shown in fig. 4, if the sub-coding structure is block1, its input data is the data to be processed; if the sub-coding structure is block2, its input data is the target general feature output by block1.
Correspondingly, step S303 may include: and determining, by each adapter, target private characteristics of each of a plurality of tasks output by the adapter according to input data of the adapter.
The input data of the adapter is input data of a sub-coding structure corresponding to the adapter or reference characteristics generated by the sub-coding structure corresponding to the adapter.
It should be appreciated that since the adapter may be inserted into different Transformer layers in the sub-coding structure, its input data may vary: if the adapter is inserted before, or in parallel with, the first layer of the sub-coding structure, its input data is the input data of the sub-coding structure; if the adapter is inserted between two layers, after the last layer, or in parallel with a layer other than the first, its input data is a reference feature generated by the sub-coding structure.
As an example, as shown in fig. 4, an adapter parallel to the MLP layer is inserted in each Transformer block, and only the parameters of the adapters are updated during fine-tuning. For multiple tasks, the decoder of each task receives its corresponding target private feature from a different adapter; that is, each adapter transmits the target private feature it generates directly to the decoder. The target general features derived from the pre-training model are transmitted to the decoder after each block.
Correspondingly, step S304 may include: and determining a processing result of the task corresponding to the decoder according to the target general feature output by each sub-coding structure and the target private feature of the task corresponding to the decoder output by each adapter through each decoder.
For the decoder of the i-th task, the multi-scale information received from the encoder, including the target private feature and the target general feature, can be expressed as:

M_i = { F_j + F_{i,j} }_j

where M_i represents the multi-scale information of the i-th task, and F_j and F_{i,j} represent the target general feature and the target private feature from the j-th block, respectively. The multi-scale information is then input into a designed decoder to obtain the processing result. In the multi-task processing model, the decoder is generally an upsampling structure, such as a SegFormer decoder or an HRNet-V2 decoder, which may be chosen according to the specific task and is not limited herein.
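A sketch of collecting the multi-scale information M_i for one task across several blocks, under the assumption that each block j yields a general feature F_j and a per-task private feature F_{i,j} of the same shape:

```python
import numpy as np

def multiscale_info(generic_feats, private_feats):
    """M_i = {F_j + F_{i,j}} for each block j, passed on to the decoder."""
    return [f_g + f_p for f_g, f_p in zip(generic_feats, private_feats)]

rng = np.random.default_rng(2)
num_blocks, d = 4, 8
generic = [rng.normal(size=(5, d)) for _ in range(num_blocks)]  # F_j per block
private = [rng.normal(size=(5, d)) for _ in range(num_blocks)]  # F_{i,j} per block

m_i = multiscale_info(generic, private)
print(len(m_i), m_i[0].shape)
```

The decoder then consumes this list of per-block sums; in a real segmentation model each scale would have a different spatial resolution, which is elided here for brevity.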
It should be understood that when the plurality of sub-coding structures are connected in series, the corresponding task processing result may also be determined solely from the output of the adapter corresponding to the last sub-coding structure, so as to improve task processing efficiency.
It should be appreciated that, given T tasks, the task-specific adapter scheme in the related art inserts T adapters in each Transformer layer; if each adapter consists of 2kd parameters for the downsampling and upsampling projections, then the total number of trainable parameters of a Transformer model with L layers is T·L·2kd. The shared-adapter scheme in the related art inserts a single adapter in each Transformer layer, for L·2kd parameters in total. The adapter provided in this embodiment additionally includes the knowledge extraction structures, whose parameters for T tasks amount to 2Td per layer, so the total number of trainable parameters is L·(2kd + 2Td). Compared with the original Transformer model, the total number of trainable parameters required in the embodiment of the present application is only about 1% of its parameters, which reduces the number of trainable parameters and lowers the cost.
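The parameter counts compared above can be checked with simple arithmetic; the concrete values of T, L, k, and d below are illustrative assumptions, not figures from the application:

```python
def task_specific_params(T, L, k, d):
    # Related art: one adapter per task per layer.
    return T * L * 2 * k * d

def shared_params(L, k, d):
    # Related art: a single shared adapter per layer.
    return L * 2 * k * d

def proposed_params(T, L, k, d):
    # This embodiment: shared projection plus T scale/shift pairs per layer.
    return L * (2 * k * d + 2 * T * d)

# Illustrative sizes.
T, L, k, d = 5, 24, 64, 1024
print(task_specific_params(T, L, k, d))
print(shared_params(L, k, d))
print(proposed_params(T, L, k, d))
```

With these sizes the proposed scheme needs far fewer trainable parameters than task-specific adapters, and only slightly more than a single shared adapter, since 2Td is small relative to 2kd.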
Furthermore, since the task-specific adapter in the related art establishes a separate path for each task during training and inference, each input must pass through the encoder T times to obtain the predictions of the T tasks, i.e., the training and inference cost of the task-specific adapter is O(T). The adapter provided by the embodiment of the present application lets the target general feature pass through the encoder only once, while the target private features are computed by the adapter, which reduces the computation of the encoder and improves training and inference efficiency.
Based on the task processing method provided in the above embodiment, the multitasking model may be trained by:
d1: and acquiring training samples corresponding to the tasks.
The training sample comprises training data and the corresponding labeling result. As an example, assume it is necessary to determine whether an image is a compliant image; the training data may then be a plurality of training images, and the corresponding labeling results may be compliant or non-compliant, as shown in fig. 5, which is a schematic diagram of a task processing scenario provided by an embodiment of the present application.
The training samples corresponding to task A may include training data X and its labeling result Y, and the training samples corresponding to task B may include training data X and its labeling result Z; although the training data X is the same for tasks A and B, the labeling result of the training data for task A is Y while that for task B is Z.
D2: aiming at each task, determining a training processing result corresponding to the training sample according to training data in the training sample corresponding to the task through a multi-task processing model to be trained. And according to the training processing result and the labeling result in the training sample, adjusting model parameters of a shared projection structure and a knowledge extraction structure corresponding to the task and model parameters of a decoder corresponding to the task, which are included in the adapter in the multi-task processing model.
It should be understood that the process of processing the training data in the training sample by the multitasking model to be trained is similar to the process of processing the data by the multitasking model in the above embodiment, and thus will not be described again.
In the training process, a loss value may be determined based on the difference between the training processing result and the labeling result, a loss function is constructed from the loss value, and the parameters of the adapter are adjusted based on the loss function. These operations are performed iteratively over different training samples until the multi-task processing model meets a training end condition, for example, until the number of training iterations reaches a preset number or the performance of the multi-task processing model reaches a preset performance requirement.
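As an illustration of this loop, the sketch below trains only lightweight per-task parameters (a scale/shift pair standing in for a knowledge extraction structure, plus a linear decoder weight) against a frozen encoder, using hand-derived gradients of a mean-squared-error loss. The shapes, the loss, and the optimizer are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 8, 64
W_enc = rng.normal(size=(D, D)) / np.sqrt(D)   # frozen pre-trained encoder (stand-in)

X = rng.normal(size=(N, D))                    # training data for one task
H = np.tanh(X @ W_enc)                         # target general features (no gradient)
y = H @ rng.normal(size=(D,))                  # synthetic labeling results

# trainable parameters: per-task scale/shift and a linear decoder weight
gamma, beta, w = np.ones(D), np.zeros(D), np.zeros(D)

lr = 0.05
losses = []
for _ in range(500):
    z = H * gamma + beta                       # task-private feature (scale, then shift)
    err = z @ w - y                            # residual of the training processing result
    losses.append(float(np.mean(err ** 2)))
    # gradients of 0.5 * mean(err^2) w.r.t. each trainable parameter
    w     -= lr * (z.T @ err) / N
    gamma -= lr * (H * (err[:, None] * w)).mean(axis=0)
    beta  -= lr * (err[:, None] * w).mean(axis=0)

print(losses[0] > losses[-1])                  # loss decreases as the adapter trains
```

Note that the encoder weight `W_enc` never receives a gradient update, matching the scheme in which only the adapter and decoder parameters are adjusted.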
In the embodiment of the present application, the shared projection structure and the knowledge extraction structures of the adapter are trained jointly: the model parameters of the shared projection structure and of the knowledge extraction structure corresponding to each task included in the adapter, as well as the model parameters of the decoder corresponding to each task, are adjusted. As a result, the plurality of tasks can interact during training, and the information of the plurality of tasks can complement each other. Specifically, the shared projection structure of the adapter learns to produce the target general feature, while each knowledge extraction structure learns to produce the independent feature information of its corresponding task, i.e., the target private feature. This improves the performance, across the plurality of tasks, of the pre-trained model into which the trained adapter is inserted.
Referring to fig. 6, the structure of a task processing device according to an embodiment of the present application is shown.
Referring to fig. 6, a task processing device 600 provided in an embodiment of the present application may include:
a data acquisition module 601, configured to acquire data to be processed;
a first feature extraction module 602, configured to determine, according to data to be processed, a target general feature through a pre-training model in the multitask processing model; the multitasking model is used for executing a plurality of tasks based on the input data;
A second feature extraction module 603, configured to determine, by an adapter in the multitasking model, target private features of each of the plurality of tasks according to the data to be processed or reference features generated when the pre-training model processes the data to be processed; the adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each of a plurality of tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks;
a decoding module 604, configured to determine, by each decoder in the multitasking model, a processing result of a task corresponding to the decoder according to a target private feature and a target general feature of the task corresponding to the decoder; the multi-task processing model comprises a plurality of decoders corresponding to the tasks.
As an example, the second feature extraction module 603 includes:
the reference general feature determining unit is used for determining a reference general feature according to the data to be processed or the reference feature through the shared projection structure in the adapter;
the gate control unit is used for determining a first sub-reference universal feature and a second sub-reference universal feature according to the reference universal feature through a gate control structure in the adapter;
The reference private feature determining unit is used for determining the reference private feature of the task corresponding to the knowledge extraction structure according to the second sub-reference general feature through each knowledge extraction structure in the adapter;
the second feature extraction unit is used for determining target private features of the tasks according to the first sub-reference general features and the reference private features of the tasks for each task.
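A minimal sketch of this second feature extraction path follows, with assumed forms for the operations the embodiment leaves unspecified: a sigmoid gate for the split into two sub-features, and addition for the final combination of the first sub-feature with each task's private feature:

```python
import numpy as np

rng = np.random.default_rng(2)
D, T = 16, 3
g_ref = rng.normal(size=(D,))                        # reference general feature

# gating structure: split the reference general feature into two sub-features
gate = 1.0 / (1.0 + np.exp(-rng.normal(size=(D,))))  # assumed sigmoid gate
g1 = gate * g_ref                                    # first sub-reference general feature
g2 = (1.0 - gate) * g_ref                            # second sub-reference general feature

# each knowledge extraction structure turns g2 into its task's reference
# private feature; the target private feature combines it with g1
# (addition is an assumption here)
scales = rng.normal(size=(T, D))
shifts = rng.normal(size=(T, D))
target_private = [g1 + (scales[t] * g2 + shifts[t]) for t in range(T)]
print(len(target_private))                           # 3
```

With this gate, the two sub-features sum back to the reference general feature, so no information is discarded by the split.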
As an example, the reference general feature determining unit includes:
the first processing subunit is used for carrying out downsampling processing on data to be processed or reference features through a downsampling projection layer in the shared projection structure to obtain reference downsampled features;
the second processing subunit is used for performing nonlinear transformation processing on the reference downsampled feature through a nonlinear layer in the shared projection structure to obtain a reference transformation feature;
and the reference general feature determining subunit is used for carrying out upsampling processing on the reference transformation features through an upsampling projection layer in the shared projection structure to obtain reference general features.
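The three subunits above describe a standard bottleneck projection. A sketch under assumed dimensions and a ReLU nonlinearity (the embodiment does not name the nonlinear layer):

```python
import numpy as np

rng = np.random.default_rng(3)
D, r = 64, 8                                     # feature and bottleneck dims (assumed)
W_down = rng.normal(size=(D, r)) / np.sqrt(D)    # down-sampling projection layer
W_up = rng.normal(size=(r, D)) / np.sqrt(r)      # up-sampling projection layer

def shared_projection(f):
    down = f @ W_down                  # reference downsampled feature (dim r)
    trans = np.maximum(down, 0.0)      # nonlinear layer (ReLU assumed)
    return trans @ W_up                # reference general feature (back to dim D)

f = rng.normal(size=(1, D))
print(shared_projection(f).shape)      # (1, 64)
```

Because r is much smaller than D, the shared projection adds only about 2·D·r parameters per adapter.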
As an example, a reference private feature determination unit includes:
the third processing subunit is used for performing scaling processing on the second sub-reference general feature through the scaling factors in the knowledge extraction structure to obtain a reference scaling feature;
And the reference private feature determination subunit is used for carrying out shift processing on the reference scaling feature through the shift factors in the knowledge extraction structure to obtain the reference private feature of the task.
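The scale-then-shift pipeline of these two subunits amounts to a per-task affine modulation. The sketch below vectorizes it over T tasks; all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
D, T = 64, 5
g2 = rng.normal(size=(D,))                 # second sub-reference general feature

# one (scale, shift) pair per task: 2*T*D parameters in total, far fewer
# than a separate encoder path per task
scales = rng.normal(size=(T, D))           # scaling factors
shifts = rng.normal(size=(T, D))           # shift factors

ref_scaled = scales * g2                   # scaling processing, broadcast over tasks
ref_private = ref_scaled + shifts          # shift processing -> T reference private features
print(ref_private.shape)                   # (5, 64)
```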
As an example, the pre-training model includes a plurality of sub-coding structures, and the multitasking model includes adapters corresponding to the plurality of sub-coding structures;
the first feature extraction module 602 is specifically configured to:
determining target general characteristics output by the sub-coding structures according to input data of the sub-coding structures through each sub-coding structure in the pre-training model; the input data of the sub-coding structure is data to be processed or target general characteristics output by other sub-coding structures;
the second feature extraction module 603 is specifically configured to:
determining, by each adapter, target private characteristics of each of a plurality of tasks output by the adapter according to input data of the adapter; the input data of the adapter is input data of a sub-coding structure corresponding to the adapter or reference characteristics generated by the sub-coding structure corresponding to the adapter.
As an example, the decoding module 604 is specifically configured to:
and determining a processing result of the task corresponding to the decoder according to the target general feature output by each sub-coding structure and the target private feature of the task corresponding to the decoder output by each adapter through each decoder.
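One plausible realization of this multi-level decoding (the fusion operation itself is not fixed by the embodiment; concatenation across sub-coding structures is assumed here):

```python
import numpy as np

L_layers, D = 4, 8                       # number of sub-coding structures (assumed)
generals = [np.full(D, float(i)) for i in range(L_layers)]     # per-layer general features
privates_t = [np.full(D, 10.0 * i) for i in range(L_layers)]   # task t's per-layer private features

# a decoder for task t fuses the general feature and that task's private
# feature from every sub-coding structure before producing its result
fused = np.concatenate([np.concatenate([g, p])
                        for g, p in zip(generals, privates_t)])
print(fused.shape)                       # (64,)
```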
As one example, the multitasking model is trained by:
the training acquisition module is used for acquiring training samples corresponding to each of the tasks; the training sample comprises training data and corresponding labeling results;
the training module is used for determining a training processing result corresponding to the training sample according to training data in the training sample corresponding to the task through a multi-task processing model to be trained aiming at each task; and according to the training processing result and the labeling result in the training sample, adjusting model parameters of a shared projection structure and a knowledge extraction structure corresponding to the task and model parameters of a decoder corresponding to the task, which are included in the adapter in the multi-task processing model.
As one example, when the data to be processed is image data, the plurality of tasks performed by the multitasking model include at least two of a semantic segmentation task, an instance segmentation task, a panoptic segmentation task, a human segmentation task, and a saliency detection task.
The task processing device provided by the embodiment of the present application has the same beneficial effects as the task processing method provided by the above embodiment, and therefore will not be described in detail.
The embodiment of the present application further provides a computer device, which may be a terminal device or a server. The terminal device and the server provided by the embodiment of the present application are introduced below from the perspective of their hardware implementation.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, for convenience of explanation, only the portions related to the embodiments of the present application are shown; for specific technical details not disclosed, please refer to the method portions of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like. The following takes a computer as an example:
fig. 7 is a block diagram showing a part of the structure of a computer related to a terminal provided by an embodiment of the present application. Referring to fig. 7, a computer includes: radio Frequency (RF) circuitry 1210, memory 1220, input unit 1230 (including touch panel 1231 and other input devices 1232), display unit 1240 (including display panel 1241), sensors 1250, audio circuitry 1260 (which may connect speaker 1261 and microphone 1262), wireless fidelity (wireless fidelity, wiFi) module 1270, processor 1280, and power supply 1290. Those skilled in the art will appreciate that the computer architecture shown in fig. 7 is not limiting and that more or fewer components than shown may be included, or that certain components may be combined, or that different arrangements of components may be provided.
Memory 1220 may be used to store software programs and modules, and processor 1280 may execute the various functional applications and data processing of the computer by executing the software programs and modules stored in memory 1220. The memory 1220 may mainly include a storage program area and a storage data area: the storage program area may store an operating system and application programs required for at least one function (such as a sound playing function, an image playing function, etc.); the storage data area may store data created according to the use of the computer (such as audio data, phonebooks, etc.), and the like. In addition, memory 1220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Processor 1280 is a control center of the computer and connects various parts of the entire computer using various interfaces and lines, performing various functions of the computer and processing data by running or executing software programs and/or modules stored in memory 1220, and invoking data stored in memory 1220. In the alternative, processor 1280 may include one or more processing units; preferably, the processor 1280 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, application programs, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1280.
In an embodiment of the present application, the processor 1280 included in the terminal further has the following functions:
acquiring data to be processed;
determining target general characteristics according to the data to be processed through a pre-training model in the multi-task processing model; the multitasking model is used for executing a plurality of tasks based on the input data;
determining respective target private characteristics of a plurality of tasks according to the data to be processed or reference characteristics generated when the pre-training model processes the data to be processed through an adapter in the multi-task processing model; the adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each of a plurality of tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks;
determining a processing result of a task corresponding to the decoder according to the target private feature and the target general feature of the task corresponding to the decoder through each decoder in the multi-task processing model; the multi-task processing model comprises a plurality of decoders corresponding to the tasks.
Optionally, the processor 1280 is further configured to perform steps of any implementation of the task processing method provided by the embodiment of the present application.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a server 1300 according to an embodiment of the present application. The server 1300 may vary considerably in configuration or performance and may include one or more central processing units (central processing units, CPU) 1322 (e.g., one or more processors) and memory 1332, one or more storage media 1330 (e.g., one or more mass storage devices) storing applications 1342 or data 1344. Wherein the memory 1332 and storage medium 1330 may be transitory or persistent. The program stored on the storage medium 1330 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Further, the central processor 1322 may be configured to communicate with the storage medium 1330, and execute a series of instruction operations in the storage medium 1330 on the server 1300.
The server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 8.
Wherein CPU 1322 is configured to perform the following steps:
acquiring data to be processed;
determining target general characteristics according to the data to be processed through a pre-training model in the multi-task processing model; the multitasking model is used for executing a plurality of tasks based on the input data;
determining respective target private characteristics of a plurality of tasks according to the data to be processed or reference characteristics generated when the pre-training model processes the data to be processed through an adapter in the multi-task processing model; the adapter comprises a shared projection structure and a knowledge extraction structure corresponding to each of a plurality of tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks;
determining a processing result of a task corresponding to the decoder according to the target private feature and the target general feature of the task corresponding to the decoder through each decoder in the multi-task processing model; the multi-task processing model comprises a plurality of decoders corresponding to the tasks.
Optionally, CPU 1322 may also be configured to perform the steps of any implementation of the task processing methods provided by embodiments of the present application.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program for executing any one of the task processing methods described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform any one of the task processing methods described in the foregoing respective embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media in which a computer program can be stored.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. A method of task processing, the method comprising:
acquiring data to be processed;
determining target general characteristics according to the data to be processed through a pre-training model in a multi-task processing model; the multitasking model is used for executing a plurality of tasks based on input data;
determining, by an adapter in the multitasking model, target private features of each of the plurality of tasks according to the data to be processed or reference features generated when the pre-training model processes the data to be processed; the adapter comprises a shared projection structure and knowledge extraction structures corresponding to the tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks;
determining a processing result of a task corresponding to the decoder according to a target private feature and the target general feature of the task corresponding to the decoder through each decoder in the multi-task processing model; the multi-task processing model comprises decoders corresponding to the tasks.
2. The method according to claim 1, wherein the determining, by the adapter in the multitasking model, the target private feature of each of the plurality of tasks according to the data to be processed or the reference feature generated when the pre-training model processes the data to be processed includes:
determining, by the shared projection structure in the adapter, the reference generic feature according to the data to be processed or the reference feature;
determining a first sub-reference generic feature and a second sub-reference generic feature from the reference generic feature by a gating structure in the adapter;
determining, by each knowledge extraction structure in the adapter, a reference private feature of a task corresponding to the knowledge extraction structure according to the second sub-reference common feature;
for each task, determining a target private feature of the task according to the first sub-reference universal feature and the reference private feature of the task.
3. The method of claim 2, wherein the determining the reference generic feature from the data to be processed or the reference feature by the shared projection architecture in the adapter comprises:
Performing downsampling processing on the data to be processed or the reference features through a downsampling projection layer in the shared projection structure to obtain reference downsampling features;
performing nonlinear transformation processing on the reference downsampling characteristic through a nonlinear layer in the shared projection structure to obtain a reference transformation characteristic;
and carrying out upsampling processing on the reference transformation characteristic through an upsampling projection layer in the shared projection structure to obtain the reference general characteristic.
4. The method of claim 2, wherein said determining, by each of the knowledge extraction structures in the adapter, a reference private feature of a task to which the knowledge extraction structure corresponds based on the second sub-reference common feature comprises:
scaling the second sub-reference universal feature by a scaling factor in the knowledge extraction structure to obtain a reference scaling feature;
and carrying out shift processing on the reference scaling feature through a shift factor in the knowledge extraction structure to obtain the reference private feature of the task.
5. The method according to any one of claims 1 to 4, wherein a plurality of sub-coding structures are included in the pre-training model, and wherein an adapter to which each of the plurality of sub-coding structures corresponds is included in the multitasking model;
The determining the target general feature according to the data to be processed by a pre-training model in a multi-task processing model comprises the following steps:
determining target general characteristics output by the sub-coding structures according to input data of the sub-coding structures through each sub-coding structure in the pre-training model; the input data of the sub-coding structure is the data to be processed or the target general characteristics output by other sub-coding structures;
the determining, by the adapter in the multitasking model, the target private feature of each of the plurality of tasks according to the data to be processed or the reference feature generated when the pre-training model processes the data to be processed, includes:
determining, by each of the adapters, target private characteristics of each of the plurality of tasks output by the adapter according to input data of the adapter; the input data of the adapter is input data of a sub-coding structure corresponding to the adapter or reference characteristics generated by the sub-coding structure corresponding to the adapter.
6. The method according to claim 5, wherein said determining, by each decoder in the multitasking model, a processing result of a task corresponding to the decoder based on a target private feature of the task corresponding to the decoder and the target general feature, comprises:
And determining a processing result of the task corresponding to the decoder according to the target general feature output by each sub-coding structure and the target private feature of the task corresponding to the decoder output by each adapter through each decoder.
7. The method of claim 1, wherein the multitasking model is trained by:
acquiring training samples corresponding to the tasks respectively; the training sample comprises training data and corresponding labeling results;
aiming at each task, determining a training processing result corresponding to a training sample according to training data in the training sample corresponding to the task through the multitask processing model to be trained; and according to the training processing result and the labeling result in the training sample, adjusting model parameters of the shared projection structure and the knowledge extraction structure corresponding to the task and model parameters of the decoder corresponding to the task, which are included in the adapter in the multi-task processing model.
8. The method of claim 1, wherein when the data to be processed is image data, the plurality of tasks performed by the multitasking model include at least two of a semantic segmentation task, an instance segmentation task, a panoptic segmentation task, a human segmentation task, and a saliency detection task.
9. A task processing device, the device comprising:
the data acquisition module is used for acquiring data to be processed;
the first feature extraction module is used for determining target general features according to the data to be processed through a pre-training model in the multi-task processing model; the multitasking model is used for executing a plurality of tasks based on input data;
the second feature extraction module is used for determining respective target private features of the tasks according to the data to be processed or reference features generated when the pre-training model processes the data to be processed through an adapter in the multi-task processing model; the adapter comprises a shared projection structure and knowledge extraction structures corresponding to the tasks, wherein the shared projection structure is used for extracting reference common features, the knowledge extraction structure is used for extracting reference private features of the corresponding tasks based on the reference common features, and target private features of the tasks are determined according to the reference private features of the tasks;
the decoding module is used for determining the processing result of the task corresponding to the decoder according to the target private characteristic and the target general characteristic of the task corresponding to the decoder through each decoder in the multi-task processing model; the multi-task processing model comprises decoders corresponding to the tasks.
10. A computer device, the computer device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the task processing method according to any one of claims 1 to 8 according to the computer program.
11. A computer-readable storage medium storing a computer program for executing the task processing method according to any one of claims 1 to 8.
CN202311358507.XA 2023-10-19 2023-10-19 Task processing method and related device Active CN117094362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311358507.XA CN117094362B (en) 2023-10-19 2023-10-19 Task processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311358507.XA CN117094362B (en) 2023-10-19 2023-10-19 Task processing method and related device

Publications (2)

Publication Number Publication Date
CN117094362A true CN117094362A (en) 2023-11-21
CN117094362B CN117094362B (en) 2024-02-09

Family

ID=88780217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311358507.XA Active CN117094362B (en) 2023-10-19 2023-10-19 Task processing method and related device

Country Status (1)

Country Link
CN (1) CN117094362B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934451A (en) * 2024-03-13 2024-04-26 中国水利水电第一工程局有限公司 Unmanned aerial vehicle inspection method and system applied to photovoltaic power station

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021151296A1 (en) * 2020-07-22 2021-08-05 Ping An Technology (Shenzhen) Co., Ltd. Multi-task classification method and apparatus, computer device, and storage medium
CN113704388A (en) * 2021-03-05 2021-11-26 Tencent Technology (Shenzhen) Co., Ltd. Training method and device for multi-task pre-training model, electronic equipment and medium
WO2021259305A1 (en) * 2020-06-24 2021-12-30 Huawei Technologies Co., Ltd. Multitask learning method and device
CN114282681A (en) * 2021-08-11 2022-04-05 Tencent Technology (Shenzhen) Co., Ltd. Multitask processing and model training method, device, medium and equipment
CN114424215A (en) * 2019-09-25 2022-04-29 Google LLC Multitasking adapter neural network
US20220147721A1 (en) * 2020-11-10 2022-05-12 Naver Corporation Adapters for zero-shot multilingual neural machine translation
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks
CN115269767A (en) * 2021-04-14 2022-11-01 Huawei Technologies Co., Ltd. Model training method, device and storage medium
CN115391499A (en) * 2022-07-22 2022-11-25 NetEase (Hangzhou) Network Co., Ltd. Method for generating a multi-task generation model, question-answer pair generation method, and related device
CN116524183A (en) * 2023-04-16 2023-08-01 Northwestern Polytechnical University Camouflage target detection method based on multi-task adapter fine-tuning



Also Published As

Publication number Publication date
CN117094362B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111898696B (en) Pseudo tag and tag prediction model generation method, device, medium and equipment
JP7373554B2 (en) Cross-domain image transformation
Liu et al. Real-time robust vision-based hand gesture recognition using stereo images
CN111553267B (en) Image processing method, image processing model training method and device
US20220222925A1 (en) Artificial intelligence-based image processing method and apparatus, device, and storage medium
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
CN112883149B (en) Natural language processing method and device
CN109034206A (en) Image classification recognition methods, device, electronic equipment and computer-readable medium
CN113435365B (en) Face image migration method and device
CN117094362B (en) Task processing method and related device
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
WO2022161302A1 (en) Action recognition method and apparatus, device, storage medium, and computer program product
CN111091010A (en) Similarity determination method, similarity determination device, network training device, network searching device and storage medium
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
WO2022222854A1 (en) Data processing method and related device
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN114298997B (en) Fake picture detection method, fake picture detection device and storage medium
CN112115744A (en) Point cloud data processing method and device, computer storage medium and electronic equipment
Han Texture image compression algorithm based on self-organizing neural network
CN117036658A (en) Image processing method and related equipment
CN114282543A (en) Text data processing method and device, computer equipment and storage medium
CN114298961A (en) Image processing method, device, equipment and storage medium
CN115861605A (en) Image data processing method, computer equipment and readable storage medium
CN114692715A (en) Sample labeling method and device
CN117173731B (en) Model training method, image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant