Disclosure of Invention
In view of the foregoing, the present application provides a task processing method, apparatus, device, and computer storage medium.
In a first aspect, the present application provides a task processing method, including:
acquiring a task request input by a user, wherein the task request comprises: the task type of the downstream task and the input information of the downstream task;
inputting the input information into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base;
and determining a target task result according to the first output result, the second output result and the task type.
Optionally, the determining a target task result according to the first output result, the second output result and the task type includes:
determining a splicing mode between the first output result and the second output result according to the task type;
and determining the target task result according to the first output result, the second output result and the splicing mode.
Optionally, the task type is: a matching task or a classification task, wherein the splicing mode includes: splicing and superposition;
The determining the target task result according to the first output result, the second output result and the splicing mode includes:
respectively performing splicing processing and superposition processing on the first output result and the second output result to obtain a first task result and a second task result;
and taking, from the first task result and the second task result, the task result meeting the first preset condition as the target task result.
Optionally, before the task request input by the user is obtained, the method further includes:
acquiring initial industrial knowledge from an industrial knowledge base;
performing format conversion on the initial industrial knowledge to obtain a target corpus corresponding to a preset template, wherein the form of the target corpus is determined according to the semantics of the initial industrial knowledge and the labels of the preset template;
and training the adapter network model according to the target corpus to obtain the industrial adapter network model.
Optionally, after training the adapter network model according to the target corpus to obtain the industrial adapter network model, the method further includes:
acquiring task sample data, wherein the task sample data comprises sample data of various task types;
Respectively inputting the sample data of the plurality of task types into the industrial adapter network model and the pre-training language model to obtain a first characteristic output by the industrial adapter network model and a second characteristic output by the pre-training language model under each task type;
according to the first feature, the second feature and a preset splicing mode, determining a correspondence between each task type and the preset splicing mode, wherein the preset splicing mode includes: splicing, superposition, fusion classification and classical attention mechanism.
Optionally, the determining, according to the first feature, the second feature, and the preset splicing manner, a correspondence between each task type and the preset splicing manner includes:
respectively performing splicing processing, superposition processing, fusion classification processing and classical attention mechanism processing on the first feature and the second feature to obtain a first training result, a second training result, a third training result and a fourth training result;
and determining the corresponding relation between the task type, the splicing processing, the superposition processing, the fusion classification processing and the classical attention mechanism processing according to the first training result, the second training result, the third training result and the fourth training result.
Optionally, before the sample data of the plurality of task types are input to the industrial adapter network model and the pre-training language model, the method further includes:
traversing connection points of the industrial adapter network model and the pre-training language model in a network searching mode;
and taking the connection point meeting the second preset condition as the target connection point of the industrial adapter network model and the pre-training language model, wherein the target connection point has an association relation with the task type.
In a second aspect, the present application provides a task processing apparatus, comprising:
the acquisition module is used for acquiring a task request input by a user, wherein the task request comprises: the task type of the downstream task and the input information of the downstream task;
the processing module is used for inputting the input information into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base;
and the determining module is used for determining a target task result according to the first output result, the second output result and the task type.
Optionally, the determining module determines a splicing mode between the first output result and the second output result according to the task type; and determining the target task result according to the first output result, the second output result and the splicing mode.
Optionally, the task type is: a matching task or a classification task, wherein the splicing mode includes: splicing and superposition;
the processing module is further used for respectively performing splicing processing and superposition processing on the first output result and the second output result to obtain a first task result and a second task result;
the determining module is specifically configured to use a task result that satisfies a first preset condition in the first task result and the second task result as a target task result.
Optionally, the acquiring module is further configured to acquire initial industrial knowledge from an industrial knowledge base;
the processing module is further used for carrying out format conversion on the initial industrial knowledge to obtain a target corpus corresponding to a preset template, and the form of the target corpus is determined according to the semantics of the initial industrial knowledge and the label of the preset template;
The processing module is further used for training the adapter network model according to the target corpus to obtain the industrial adapter network model.
Optionally, the acquiring module is further configured to acquire task sample data, where the task sample data includes sample data of multiple task types;
the processing module is further configured to input the sample data of the multiple task types to the industrial adapter network model and the pre-training language model, respectively, to obtain a first feature output by the industrial adapter network model and a second feature output by the pre-training language model under each task type;
the determining module is further configured to determine a correspondence between each task type and a preset splicing manner according to the first feature, the second feature, and the preset splicing manner, where the preset splicing manner includes: splicing, superposition, fusion classification and classical attention mechanism.
Optionally, the processing module is further configured to perform splicing processing, superposition processing, fusion classification processing, and classical attention mechanism processing on the first feature and the second feature to obtain a first training result, a second training result, a third training result, and a fourth training result;
The determining module is specifically configured to determine a correspondence between the task type and the splicing processing, the superposition processing, the fusion classification processing, and the classical attention mechanism processing according to the first training result, the second training result, the third training result, and the fourth training result.
Optionally, the processing module is further configured to traverse the connection points of the industrial adapter network model and the pre-training language model in a network searching manner;
the processing module is further configured to use a connection point that meets a second preset condition as the target connection point of the industrial adapter network model and the pre-training language model, where the target connection point has an association relationship with the task type.
In a third aspect, the present application provides a task processing device, comprising:
a memory;
a processor;
wherein the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the task processing method as described in the first aspect and various possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having stored thereon computer executable instructions that are executed by a processor to implement the task processing method as described in the first aspect and the various possible implementations of the first aspect.
In the task processing method provided by the application, a task request input by a user is acquired, the task request including: the task type of the downstream task and the input information of the downstream task; the input information is input into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base; and a target task result is determined according to the first output result, the second output result and the task type. Because the method uses the two models to output the task result, the task result is more accurate; and because different processing modes are adopted for different types of downstream tasks, the diversity of downstream task processing is improved, the range of the downstream task output results is enlarged, and the accuracy of the downstream task output results is improved.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The pre-training language model has strong language understanding capability and language generating capability; when performing general-field question answering, its replies are fluent, highly readable and logical, and it is widely applied.
In the industrial field, the industrial brain is a key core for realizing industrial networking, digitization and intelligence and for moving toward intelligent manufacturing, and how a pre-trained language model can better enable intelligent manufacturing is a hot spot of industrial-brain intelligence. However, when a pre-training language model is applied directly to the industrial field, the effect is often poor and the processing is inaccurate: the results obtained by directly fine-tuning the pre-training language model on industrial corpora of different forms and on corpora of different tasks are not necessarily accurate, and a heavy training workload is required again whenever another task is performed.
Existing knowledge graph query can return results efficiently and accurately, but the construction cost of a knowledge graph is very high and it cannot be used at large scale; moreover, when user intention is understood through knowledge graph query, the user intention must be converted into a graph query statement, so information is lost.
The traditional method for applying a pre-training language model to a specific technical field is fine-tuning (Finetune): a small network is added above the large network of the pre-training language model, and the parameters of the large network or the small network are fine-tuned with training data of a specific task, so as to improve accuracy on that task. However, a conventional pre-training language model has at least billions of network parameters, while the specific task data in the industrial field may have only thousands of samples; fine-tuning with data of that scale hardly affects the bottom network parameters and only adjusts the network between the feature layer and the output layer, so the pre-training language model cannot extract good features for the professional field. Moreover, the pre-training language model needs to be retrained for different fields and different tasks.
In the scheme of the pre-training language model adapter based on knowledge injection, the principle is that the adapter network structure is parallel to the pre-training language model structure: the injected knowledge data only affects the network parameters of the adapter and does not affect the parameters of the pre-training language model, and the accuracy of the output result is ensured by extracting the features of both the adapter network model and the pre-training language model and splicing those features. However, this scheme can only obtain an accurate result for the target task when the data correlation between the target task and the training of the adapter network is high; if that correlation is low, an accurate result cannot be obtained. For example, using the data of a named entity recognition task as knowledge injection brings no gain to a text classification task. Moreover, injecting multiple tasks generates multiple adapters, so the feature dimension is high and splicing is difficult; and because the adapter network is not adjusted according to a specific task, the feature splicing between the adapter network and the pre-training language model is inflexible, which affects the performance of knowledge injection.
Aiming at the problems in the prior art, the application provides a task processing method, which is characterized in that when the input information of different types of downstream tasks is obtained, the input information is input into an industrial adapter network model and a pre-training language model to obtain two output results, and then a target task result is determined according to the two output results and the task type of the downstream tasks; according to the method, the task results are output by adopting the two models, so that the task results are more accurate, different processing modes are adopted for different types of downstream tasks, the diversity of downstream task processing is improved, and meanwhile, the accuracy of the downstream task results is improved.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a task processing method according to an embodiment of the present application. As shown in fig. 1, the task processing method provided in this embodiment includes:
S101: acquiring a task request input by a user, wherein the task request comprises: the task type of the downstream task and the input information of the downstream task.
The downstream task may be, for example, a natural language processing (Natural Language Processing, NLP) task, and different task requests correspond to different types of downstream tasks. The task types of the downstream tasks may include, for example: matching tasks, classification tasks, extractive question-answering tasks, sequence labeling tasks, generative tasks and the like, and the input information of a downstream task may include, for example: a single sentence, a sentence pair, a question and an article, and the like.
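As an illustration of the notion that a task request carries both a task type and input information, the following minimal sketch models such a request. The class and field names here are assumptions chosen for illustration, not data structures defined by the application.

```python
from dataclasses import dataclass

# Hypothetical container for a user's task request: the downstream task
# type plus its input information (a sentence, a sentence pair, or a
# question-and-article pair, depending on the task).
@dataclass
class TaskRequest:
    task_type: str     # e.g. "matching", "classification", "extractive_qa"
    input_info: tuple  # the input information for the downstream task

# A matching-task request over a sentence pair.
req = TaskRequest(task_type="matching",
                  input_info=("The PVC plate has low density.",
                              "PVC sheets are lightweight."))
print(req.task_type)  # matching
```

A dispatcher could then route the request to the processing mode associated with `req.task_type`, mirroring the correspondence between task types and industrial brains described above.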
The user may input a task request through an input box displayed on the display interface, for example, or may input a corresponding task request through natural language.
After the task request input by the user is obtained, the task type of the corresponding downstream task and the input information of the downstream task can be obtained according to the task request.
It will be appreciated that different task requests may be processed using different industrial brains, with correspondence between the task type of the downstream task and the industrial brain. For example: when a user inputs a task request for indicating a matching task, the industrial brain corresponding to the matching task is selected to process the task request.
S102: and inputting the input information into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base.
The industrial adapter network model is, for example, a network model trained according to industrial knowledge. The pre-training language model is, for example, a language model obtained by training in advance on massive data, and may include, for example: the ELMo model, the GPT model, and the BERT model.
The first output result is a result output by the industrial adapter network model, the second output result is a result output by the pre-training language model, and the first output result and the second output result can comprise word vectors output by the corresponding models.
In this step, the input information may be input to the industrial adapter network model and the pre-training language model, respectively, or the input information may be input to the pre-training language model, and then the input information may be transmitted to the industrial adapter network model through the pre-training language model. The application is not limited in this regard. The first output result and the second output result may be different or the same, for example.
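The parallel inference in S102 can be sketched as follows. The two model functions below are toy stand-ins (random projections) for the trained industrial adapter network model and the pre-training language model; only the data flow, one input producing two word-vector outputs, is the point of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: each maps a token sequence to a word-vector matrix of
# shape (sequence length, hidden size). Real implementations would be
# the industrial adapter network model and the pre-training language model.
def adapter_model(tokens, dim=8):
    return rng.standard_normal((len(tokens), dim))

def pretrained_lm(tokens, dim=8):
    return rng.standard_normal((len(tokens), dim))

tokens = "PVC plate density".split()
first_output = adapter_model(tokens)   # first output result
second_output = pretrained_lm(tokens)  # second output result
print(first_output.shape, second_output.shape)  # (3, 8) (3, 8)
```

The two outputs may coincide in dimension, as here, or differ; the splicing mode chosen in S103 must account for that.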
S103: and determining a target task result according to the first output result, the second output result and the task type.
The target task result has a corresponding relationship with the task type, for example: if the task type is an extractive question-answering task, the target task result is an answer range; if the task type is a classification task, the target task result is a classification result.
The target task result can be determined jointly according to the first output result, the second output result and the task type.
For example, the type of the target task result can be determined according to the task type, and then the first output result and the second output result are used as the target task result together; or the result shared by the first output result and the second output result is used as a target task result.
It can be appreciated that in this step, the target task result is determined according to the results output by the industrial adapter network model and the pre-training language model, so that the range of the target task result is enlarged, and meanwhile, the accuracy of the target task result is improved.
According to the task processing method provided by this embodiment, a task request input by a user is acquired, the task request including: the task type of the downstream task and the input information of the downstream task; the input information is input into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base; and a target task result is determined according to the first output result, the second output result and the task type. Because the method uses the two models to output the task result, the task result is more accurate; and because different processing modes are adopted for different types of downstream tasks, the diversity of downstream task processing is improved, the range of the downstream task output results is enlarged, and the accuracy of the downstream task output results is improved.
Fig. 2 is a flowchart of a task processing method according to an embodiment of the present application. As shown in fig. 2, the task processing method of the present embodiment is described in detail on the basis of fig. 1, and includes:
s201: initial industrial knowledge is obtained from an industrial knowledge base.
The initial industrial knowledge includes, for example: patents, maintenance manuals, quality inspection standards, dictionaries and knowledge graphs in the industrial field, as well as corpora for named entity recognition, text classification, text summarization, intelligent question answering, and the like.
S202: and carrying out format conversion on the initial industrial knowledge to obtain a target corpus corresponding to a preset template, wherein the form of the target corpus is determined according to the semantics of the initial industrial knowledge and the labels of the preset template.
When training the adapter network model, the training process brings gain to the target task only when the data correlation between the target task and the adapter network model is high, for example: when the data of the named entity recognition task is used for optimizing the processing result of the adapter network model aiming at the text classification task, the data of the named entity recognition task does not bring gain to the processing result corresponding to the text classification task. Therefore, when optimizing the processing results of the adapter network model at present, knowledge corresponding to a plurality of different types of tasks is generally used to optimize the processing results of a plurality of corresponding adapter network models. Namely: in the prior art, knowledge injection of multiple tasks can generate multiple adapter network models, and the multiple adapter network models can lead to higher feature dimension of subsequent splicing, so that difficulty is brought to training of target tasks.
Therefore, in order to achieve the purpose of training multiple types of tasks through only one adapter network model, knowledge formats of the multiple types of tasks need to be converted into the same preset template, so that training is performed uniformly. Namely: and carrying out format conversion on the initial industrial knowledge to obtain a target corpus corresponding to the preset template. It is to be appreciated that the target corpus can include training sets of various types of tasks.
In this step, for example, the form of the corresponding preset template may be determined according to the semantics of the initial industrial knowledge and the label of the preset template, and then the initial industrial knowledge is filled into the corresponding preset template, so as to obtain the target corpus after format conversion.
The form of the target corpus may include, for example: "label: X"; "X (label)"; "label - X"; "[label classification] X"; wherein label is a label and X is an input sentence.
Taking a classification task in the industrial field as an example, format conversion is performed on the following original training corpus: input sentence X = "The lower the density of the PVC plate, the better the quality and plasticity"; label = "chemical industry".
Template 1: label: X;
sample generation: chemical industry: the lower the density of the PVC plate, the better the quality and plasticity.
Template 2: X (label);
sample generation: the lower the density of the PVC plate, the better the quality and plasticity (chemical industry).
Template 3: label-X;
sample generation: the lower the density of the PVC plate, the better the quality and plasticity-the chemical industry.
Template 4: From label news: X;
sample generation: from the news of the chemical industry: the lower the density of the PVC plate, the better the quality and plasticity.
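The template filling above can be sketched as a simple string substitution. This is a minimal illustration of the format conversion in S202 under the assumption that each preset template is a fill-in pattern over the input sentence X and its label; the function name is illustrative.

```python
# Preset templates mirroring Templates 1 to 4 above, written as Python
# format strings over an input sentence {x} and a label {label}.
TEMPLATES = [
    "{label}: {x}",            # Template 1: "label: X"
    "{x} ({label})",           # Template 2: "X (label)"
    "{label} - {x}",           # Template 3: "label - X"
    "From {label} news: {x}",  # Template 4
]

def convert(x, label):
    """Format conversion: fill the initial industrial knowledge into each
    preset template, yielding one target-corpus sample per template."""
    return [t.format(x=x, label=label) for t in TEMPLATES]

samples = convert("The lower the density of the PVC plate, the better the "
                  "quality and plasticity", "chemical industry")
for s in samples:
    print(s)
```

Running this reproduces the four sample generations shown above, one per template, so corpora for many task types end up in a single uniform format.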
Optionally, because the format of the preset template is fixed and limited, the target corpus obtained from it cannot contain all industrial knowledge in the industrial field; that is, the preset template limits the diversity of the target corpus. Therefore, in order for the target corpus to fully contain the industrial knowledge of the industrial field and to improve the diversity of the target corpus, the preset template needs to be expanded. A specific expansion mode may be: generating a plurality of expansion templates that are semantically close to the preset template by using paraphrase technology, screening the expansion templates, and deleting the expansion templates that do not meet a set condition.
The specific implementation manner of screening the expansion template may be, for example:
Calculating the template score of the expansion template through the template score function, and determining whether the expansion template meets the set condition according to the template score of each expansion template and a preset set value.
Specifically, the calculation formula of the template score function is as follows:

S(P) = (1/|D|) * Σ_{(x, y) ∈ D} log p_M(y | E_x)

wherein D represents all the training data; (x, y) represents a training sample whose input is x and whose labeling result is y (removing (x, y) from D yields the remaining training data D \ {(x, y)}); E_x represents, given the language template P, the expanded sample obtained in the case that the input sample is x; and p_M(y | E_x) represents the probability that the universal language model M assigns the tag word y to the expanded sample. It will be appreciated that the training data is the initial industrial knowledge. The set value is set according to different requirements of the user, and is not particularly limited herein.
When the calculated template score is greater than or equal to a set value, the expansion template is indicated to meet the set condition, and the expansion template does not need to be deleted at the moment. If the calculated template score is smaller than the set value, the expanded template is indicated to not meet the set condition, and the expanded template needs to be deleted at the moment.
The step determines the template score of each expansion template through the template score function, and screens out the expansion templates which do not meet the set conditions, thereby realizing the expansion of the preset templates, enabling the target corpus to fully contain all industrial knowledge in the industrial field, and improving the diversity of the target corpus.
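The screening step can be sketched as below. The `toy_model_prob` function is an illustrative stand-in for the universal language model M (a real implementation would score tag words with an actual language model); the scoring averages log-probabilities over the training data and keeps only templates whose score reaches the set value.

```python
import math

# Stand-in for the universal language model M: returns the probability
# assigned to the correct tag word y for input x under a candidate
# template. Here it simply favors templates that mention the label slot.
def toy_model_prob(template, x, y):
    return 0.9 if "{label}" in template else 0.1

def template_score(template, data):
    # Average log-probability of the correct tag word over the training data.
    return sum(math.log(toy_model_prob(template, x, y))
               for x, y in data) / len(data)

def screen(templates, data, set_value):
    # Keep expansion templates whose score meets the set condition;
    # the rest are deleted.
    return [t for t in templates if template_score(t, data) >= set_value]

data = [("PVC plate text", "chemical industry")]
kept = screen(["{label}: {x}", "just {x}"], data, set_value=math.log(0.5))
print(kept)  # ['{label}: {x}']
```

With the toy probabilities, only the template containing the label slot survives the set value, which is the shape of outcome the screening is meant to produce.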
S203: and training the adapter network model according to the target corpus to obtain the industrial adapter network model.
The implementation manner of training the adapter network model is as follows: and injecting the target corpus into the adapter network model so that the adapter network model performs iterative processing according to the target corpus, thereby completing training of the model and obtaining the industrial adapter network model.
Because the form of the target corpus is definite and similar, the adapter network model is trained through the target corpus, and only one corresponding industrial adapter network model is obtained. Therefore, the defect of generating a plurality of adapter network models in the prior art is avoided, and the difficulty of subsequent splicing is reduced.
S204: task sample data is obtained, wherein the task sample data comprises sample data of a plurality of task types.
The task sample data may be obtained from historical data, for example, and may include: matching task data, classification task data, extractive question-answering task data, sequence labeling task data, generative task data and the like.
The purpose of this step is to optimize the processing results of the downstream task (NLP) from the task sample data by training the completed industrial adapter network model and the pre-trained language model. The specific optimization process will be described in detail below.
S205: traversing the connection points of the industrial adapter network model and the pre-training language model in a network searching mode.
Wherein a plurality of connection points exist between the industrial adapter network model and the pre-trained language model. When the industrial adapter network model and the pre-training language model are connected by adopting different connection points, the prediction accuracy of the output result is different.
In this step, the connection points of the industrial adapter network model and the pre-training language model can be obtained through traversing in a network searching mode. The purpose of obtaining all the connection points between the two models is to select the optimal connection point from them in order to optimize the predicted outcome of the downstream task.
S206: and taking the target connection point meeting the second preset condition as the target connection point of the industrial adapter network model and the pre-training language model, wherein the target connection point has an association relation with the task type.
The second preset condition is, for example: the output result of the downstream task has the highest prediction accuracy. The target connection point is a connection point of the industrial adapter network model and the pre-training language model, which enables the output result of the downstream task to have the highest prediction accuracy.
It will be appreciated that when there are multiple training tasks and the training task of the industrial adapter network model is inconsistent with that of the pre-training language model, the two models will not share a reasonable optimization goal, thereby affecting the prediction accuracy.
Therefore, when optimizing the processing results of the downstream NLP task, it is necessary to ensure that the training tasks of the industrial adapter network model and the pre-training language model are the same. Namely: the connection point between the industrial adapter network model and the pre-training language model at which the output result of the downstream task has the highest prediction accuracy is taken as the target connection point.
The target connection point in the above steps is not fixed; under different task modes, the connection point between the industrial adapter network model and the pre-training language model can differ, so that the connection mode with the highest model prediction accuracy is obtained for each task mode.
S207: inputting the sample data of the plurality of task types into the industrial adapter network model and the pre-training language model respectively, to obtain a first feature output by the industrial adapter network model and a second feature output by the pre-training language model under each task type.
The first feature includes, for example: a word vector obtained by inputting the sample data into the industrial adapter network model; the second feature includes, for example: a word vector obtained by inputting the sample data into the pre-training language model.
Word vectors (word embeddings) are a collective term for a set of language modeling and feature learning techniques in natural language processing (NLP), in which words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, this involves a mathematical embedding from a space with one dimension per word to a continuous vector space of lower dimension. The first feature and the second feature may be, for example, features of the same dimension or features of different dimensions.
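The word-vector lookup described above can be illustrated with a toy table. The vocabulary words and the numeric values below are made up for illustration; a real model learns these vectors during training.

```python
# Minimal illustration of a word-embedding lookup table: each vocabulary
# word maps to a low-dimensional real-valued vector. All values are
# illustrative, not taken from any trained model.
embedding_table = {
    "valve":  [0.2, -0.1, 0.7],
    "sensor": [0.5,  0.3, -0.2],
}

def embed(tokens, table, dim=3):
    """Map a token sequence to its word vectors; unknown words get zeros."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]

vectors = embed(["valve", "sensor"], embedding_table)
```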
When sample data of multiple task types are input into the industrial adapter network model and the pre-training language model, the word vectors output by the two models under the different task types can be obtained.
S208: and determining the corresponding relation between each task type and the preset splicing mode according to the first characteristic, the second characteristic and the preset splicing mode.
The preset splicing modes may include, for example: splicing, superposition, fusion classification, and the classical attention mechanism.
Splicing refers to concatenating the two features, so that the feature dimension is expanded.
Superposition means that the two features are added element-wise, so that the dimension remains unchanged.
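The two simplest splicing modes can be sketched directly on plain Python vectors; the example vectors below are illustrative.

```python
# Sketch of the two simplest preset splicing modes from S208:
# splicing (concatenation, dimension expands) and superposition
# (element-wise addition, dimension unchanged).

def splice(v1, v2):
    """Concatenate two feature vectors: dimension becomes len(v1)+len(v2)."""
    return v1 + v2

def superpose(v1, v2):
    """Add two feature vectors element-wise: dimension is unchanged."""
    return [a + b for a, b in zip(v1, v2)]

w1 = [1.0, 2.0]   # e.g. a word vector from the pre-training language model
w2 = [0.5, 0.5]   # e.g. a word vector from the industrial adapter network
```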
Fusion classification (SE-gate) refers to a process of compression followed by excitation. In the compression stage, each dimension of the vectors is regarded as a channel, and the average of the vectors of the words in a sentence is calculated, yielding a preliminary weight for each channel; in the excitation stage, the weight of each channel is further adjusted according to the training data.
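The compress-then-excite idea can be sketched as follows. This is a simplified illustration of the mechanism described above, not the patent's parameters: a bare sigmoid stands in for the learned excitation step.

```python
import math

# Sketch of fusion classification (SE-gate): "compress" by averaging each
# channel over the words of a sentence, then "excite" by turning those
# averages into per-channel weights. In a real model the excitation step
# is a small trained network; here a plain sigmoid is an assumption.

def se_gate(word_vectors):
    n = len(word_vectors)
    dim = len(word_vectors[0])
    # Compression: per-channel mean over all words in the sentence.
    channel_means = [sum(v[c] for v in word_vectors) / n for c in range(dim)]
    # Excitation: map each mean to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in channel_means]
    # Re-weight every word vector channel-wise.
    return [[v[c] * weights[c] for c in range(dim)] for v in word_vectors]

gated = se_gate([[1.0, -2.0], [3.0, 0.0]])
```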
The classical attention mechanism selectively learns from the inputs by preserving the intermediate outputs of the LSTM encoder over the input sequence, and then training a model that associates the output sequence with those intermediate outputs.
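The core of the attention idea above can be illustrated in a few lines: keep the encoder's intermediate outputs for every input position, then weight them by their softmaxed dot-product similarity to a query vector. The vectors here are made up; a real model learns these representations.

```python
import math

# Toy dot-product attention over preserved encoder states. Illustrative
# only: real classical attention learns a scoring function jointly with
# the encoder/decoder rather than using raw dot products.

def attend(query, encoder_states):
    # Score each preserved intermediate state against the query.
    scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
    # Softmax the scores into attention weights.
    exps = [math.exp(x) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[c] for w, state in zip(weights, encoder_states))
               for c in range(dim)]
    return weights, context

weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```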
The correspondence between each task type and the splicing modes is obtained according to the word vectors obtained from the industrial adapter network model, the word vectors obtained from the pre-training language model, and the preset splicing modes.
It can be understood that, under different task types, different preset splicing modes yield different prediction accuracies. In this step, for each task type, the prediction accuracy corresponding to each preset splicing mode may be obtained first, and the preset splicing mode corresponding to each task type is then determined according to those accuracies.
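The construction of this correspondence can be sketched as a per-task argmax over measured accuracies. The task names, mode names, and accuracy numbers below are illustrative assumptions.

```python
# Sketch of S208's correspondence construction: for each task type, try
# every preset splicing mode, record its prediction accuracy, and keep
# the best-scoring mode per task. All numbers are illustrative.

def build_task_splice_table(accuracy_by_task_and_mode):
    """Map each task type to its highest-accuracy preset splicing mode."""
    return {
        task: max(mode_scores, key=mode_scores.get)
        for task, mode_scores in accuracy_by_task_and_mode.items()
    }

scores = {
    "matching":       {"splice": 0.90, "superpose": 0.88},
    "classification": {"splice": 0.85, "superpose": 0.91},
}
correspondence = build_task_splice_table(scores)
```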
S209: acquiring a task request input by a user, wherein the task request comprises: the task type of the downstream task and the input information of the downstream task.
S210: and inputting the input information into an industrial adapter network model and a pre-training language model to respectively obtain a first output result and a second output result, wherein the industrial adapter network model is obtained by training according to industrial knowledge acquired from an industrial knowledge base.
Steps S209 to S210 are similar to steps S101 to S102 described above, and will not be repeated here.
S211: and determining a splicing mode between the first output result and the second output result according to the task type.
After the first output result and the second output result are obtained, the splicing mode for the first output result and the second output result can be determined according to the correspondence between the task type and the splicing modes. The splicing mode may include, for example, one or more of the preset splicing modes.
S212: and determining the target task result according to the first output result, the second output result and the splicing mode.
The target task result has an association relation with the task type of the downstream task: downstream tasks of different task types correspond to different target task results.
Optionally, when the task type is a matching task or a classification task, the splicing modes include splicing and superposition, and the specific implementation of determining the target task result may be, for example:
respectively performing splicing processing and superposition processing on the first output result and the second output result to obtain a first task result and a second task result; and taking, of the first task result and the second task result, the task result meeting the first preset condition as the target task result.
The first preset condition may be, for example, that the accuracy is highest. Namely: when the task type is a matching task, the first preset condition is that the matching accuracy is highest; when the task type is a classification task, the first preset condition is that the classification accuracy is highest.
After the first output result and the second output result are obtained, splicing processing and superposition processing can be respectively performed on the first output result and the second output result to obtain a first task result and a second task result, and the task result with the higher accuracy of the two is then taken as the target task result. For the specific procedure, reference may be made to the downstream task examples described below.
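At inference time, steps S211 to S212 amount to a lookup plus a fusion. A minimal sketch, assuming a fixed task-to-mode table and toy output vectors (both hypothetical):

```python
# Sketch of S211-S212: look up the splicing mode for the request's task
# type, fuse the two model outputs accordingly, and use the fused vector
# as the basis of the target task result. Table and vectors are assumptions.

FUSERS = {
    "splice":    lambda a, b: a + b,                           # concatenate
    "superpose": lambda a, b: [x + y for x, y in zip(a, b)],   # element-wise add
}

TASK_TO_MODE = {"matching": "splice", "classification": "superpose"}

def fuse_outputs(task_type, first_output, second_output):
    """Return the splicing mode chosen for this task type and the fused vector."""
    mode = TASK_TO_MODE[task_type]
    return mode, FUSERS[mode](first_output, second_output)

mode, fused = fuse_outputs("classification", [0.2, 0.8], [0.1, -0.3])
```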
According to the task processing method provided by this embodiment, format conversion is performed on the initial industrial knowledge to obtain a target corpus of consistent type, and an adapter network model is then trained on that corpus to obtain the industrial adapter network model. Therefore, only one adapter network model needs to be generated, which avoids the drawback in the prior art of generating a plurality of adapter network models and facilitates the subsequent splicing of the adapter network model with the pre-training language model. The method adopts different processing modes for different types of downstream tasks, thereby improving both the diversity and the accuracy of downstream task processing.
Specific implementation ways for obtaining corresponding target task results for different downstream task types are described in detail below.
Fig. 3 is a schematic diagram of task processing where the task type is a matching task according to an embodiment of the present application. As shown in Fig. 3, the input information is: sentence 1 and sentence 2; the splicing modes include: splicing and superposition; and the target task result is: a matching result.
According to the input sentence 1 and sentence 2, the pre-training language model can obtain word vector 1 and the industrial adapter network model can obtain word vector 2. Splicing processing and superposition processing are respectively performed on word vector 1 and word vector 2 to obtain a first task result and a second task result, and the task result with the higher degree of match is then selected from the two as the target task result.
Fig. 4 is a schematic diagram of task processing where the task type is a classification task according to an embodiment of the present application. As shown in Fig. 4, the input information is: a sentence; the splicing modes include: splicing and superposition; and the target task result is: a classification result.
The pre-training language model can obtain word vector 1 from the input sentence, and the industrial adapter network model can obtain word vector 2 from the input sentence. Splicing processing and superposition processing are respectively performed on word vector 1 and word vector 2 to obtain a first task result and a second task result, and the task result with the higher accuracy is then selected from the two as the target task result.
Fig. 5 is a schematic diagram of task processing where the task type is an extractive question-answering task according to an embodiment of the present application. As shown in Fig. 5, the input information is: a question and an article; the splicing modes include: splicing, superposition, fusion classification, and the classical attention mechanism; and the target task result is: an answer span.
According to the input question and article, the pre-training language model can obtain word vector 1 and the industrial adapter network model can obtain word vector 2. Splicing processing, superposition processing, fusion classification processing, and classical attention mechanism processing are respectively performed on word vector 1 and word vector 2 to obtain a first, a second, a third, and a fourth task result, and the task result with the higher degree of match is then selected from the four as the target task result.
Fig. 6 is a schematic diagram of task processing where the task type is a sequence labeling task according to an embodiment of the present application. As shown in Fig. 6, the input information is: a sentence; the splicing modes include: splicing, superposition, fusion classification, and the classical attention mechanism; and the target task result is: a sequence labeling result.
The pre-training language model can obtain word vector 1 from the input sentence, and the industrial adapter network model can obtain word vector 2 from the input sentence. Splicing processing, superposition processing, fusion classification processing, and classical attention mechanism processing are respectively performed on word vector 1 and word vector 2 to obtain a first, a second, a third, and a fourth task result, and the task result with the higher degree of match is then selected from the four as the target task result.
Fig. 7 is a schematic diagram of task processing where the task type is a generative task according to an embodiment of the present application. As shown in Fig. 7, the input information is: a sentence; the splicing modes include: splicing, superposition, fusion classification, and the classical attention mechanism; and the target task result is: a generative result.
The pre-training language model can obtain word vector 1 from the input sentence, and the industrial adapter network model can obtain word vector 2 from the input sentence. Splicing processing, superposition processing, fusion classification processing, and classical attention mechanism processing are respectively performed on word vector 1 and word vector 2 to obtain a first, a second, a third, and a fourth task result. The task result with the higher degree of match is then selected from the four as the input word vector, a corresponding word vector is obtained by self-attention mechanism training together with the sentence generated immediately before, and that word vector is input to the context-attention mechanism to obtain the target task result.
Fig. 8 is a schematic structural diagram of a task processing device provided by the present application. As shown in fig. 8, the present application provides a task processing device 300 including:
the obtaining module 301 is configured to obtain a task request input by a user, where the task request includes: the task type of the downstream task and the input information of the downstream task;
the processing module 302 is configured to input the input information into an industrial adapter network model and a pre-training language model, to obtain a first output result and a second output result, where the industrial adapter network model is obtained by training according to industrial knowledge obtained from an industrial knowledge base;
And the determining module 303 is configured to determine a target task result according to the first output result, the second output result, and the task type.
Optionally, the determining module 303 is specifically configured to determine, according to the task type, a splicing manner between the first output result and the second output result; and determining the target task result according to the first output result, the second output result and the splicing mode.
Optionally, the task type is: a matching task or a classification task, wherein the splicing modes include: splicing and superposition;
the processing module 302 is further configured to respectively perform splicing processing and superposition processing on the first output result and the second output result, to obtain a first task result and a second task result;
the determining module 303 is specifically configured to take, of the first task result and the second task result, the task result that satisfies the first preset condition as the target task result.
Optionally, the acquiring module 301 is further configured to acquire initial industrial knowledge from an industrial knowledge base;
the processing module 302 is further configured to perform format conversion on the initial industrial knowledge to obtain a target corpus corresponding to a preset template, where a form of the target corpus is determined according to semantics of the initial industrial knowledge and a label of the preset template;
The processing module 302 is further configured to train an adapter network model according to the target corpus, so as to obtain the industrial adapter network model;
optionally, the obtaining module 301 is further configured to obtain task sample data, where the task sample data includes sample data of multiple task types;
the processing module 302 is further configured to input the sample data of the plurality of task types to the industrial adapter network model and the pre-training language model, respectively, to obtain a first feature output by the industrial adapter network model and a second feature output by the pre-training language model under each task type;
the determining module 303 is further configured to determine, according to the first feature, the second feature, and the preset splicing modes, a correspondence between each task type and the preset splicing modes, where the preset splicing modes include: splicing, superposition, fusion classification, and the classical attention mechanism;
optionally, the processing module 302 is further configured to perform splicing processing, superposition processing, fusion classification processing, and classical attention mechanism processing on the first feature and the second feature to obtain a first training result, a second training result, a third training result, and a fourth training result;
the determining module 303 is specifically configured to determine, according to the first training result, the second training result, the third training result, and the fourth training result, the correspondence between the task type and the splicing processing, the superposition processing, the fusion classification processing, and the classical attention mechanism processing;
optionally, the processing module 302 is further configured to traverse the connection points of the industrial adapter network model and the pre-training language model in a network searching manner;
the processing module 302 is further configured to use the connection point that meets a second preset condition as the target connection point of the industrial adapter network model and the pre-training language model, where the target connection point has an association relationship with the task type.
Fig. 9 is a schematic structural diagram of a task processing device provided by the present application. As shown in fig. 9, the present application provides a task processing device 400 including: a receiver 401, a transmitter 402, a processor 403 and a memory 404.
A receiver 401 for receiving instructions and data;
a transmitter 402 for transmitting instructions and data;
memory 404 for storing computer-executable instructions;
A processor 403, configured to execute the computer-executable instructions stored in the memory 404 to implement the steps of the task processing method in the above embodiments. Reference may be made in particular to the foregoing description of the task processing method embodiments.
Alternatively, the memory 404 may be separate or integrated with the processor 403.
When the memory 404 is provided separately, the task processing device further comprises a bus for connecting the memory 404 and the processor 403.
The present application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the task processing method performed by the above task processing device.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the present application has been described with reference to the preferred embodiments shown in the drawings, it will be readily understood by those skilled in the art that the scope of the application is not limited to those specific embodiments, and the above examples are only for illustrating the technical solution of the application, not for limiting it; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.