CN116029441A - Data processing method, device and equipment

Info

Publication number
CN116029441A
Authority
CN
China
Prior art keywords
model
sub
target
stabilizer
training
Legal status
Pending
Application number
CN202310090590.0A
Other languages
Chinese (zh)
Inventor
包容
崔世文
李志峰
许卓尔
孟昌华
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310090590.0A
Publication of CN116029441A
Status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the specification disclose a data processing method, apparatus and device. The method comprises: obtaining service data of a target service; inputting the service data into a pre-trained target model and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, and, after the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged and the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples; and performing service processing on the target service based on the prediction result of the service data.

Description

Data processing method, device and equipment
Technical Field
The present document relates to the field of computer technologies, and in particular, to a data processing method, apparatus, and device.
Background
In recent years, deep learning based on deep neural networks has achieved great success in many fields (such as natural language processing). However, model robustness is still largely lacking: in an emotion classification task, for example, replacing some words of the original text with near-synonyms may cause the deep neural network to misclassify. Such a slightly modified sample is called an adversarial sample, and the difference in model performance between the adversarial sample and the original sample reflects model robustness. The most widely used way to improve model robustness is to train the model by minimizing the empirical loss on adversarial samples. However, this adversarial training approach requires the technician to retrain the model and modify all of its parameters (including the inherent model parameters), which takes a large amount of time and is unfavorable for models already deployed in industrial applications. Therefore, it is necessary to provide a model processing method that is more scalable and can improve model robustness.
Disclosure of Invention
The embodiments of the specification aim to provide a model processing approach that is more scalable and can improve model robustness.
To achieve the above purpose, the embodiments of the present specification are implemented as follows:
An embodiment of the specification provides a data processing method, which includes the following steps: acquiring service data of a target service; inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, and, after the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged and the trained target model is obtained by training the pre-stabilizer with adversarial samples; and performing service processing on the target service based on the prediction result of the service data.
An embodiment of the present specification provides a data processing apparatus, which includes: a data acquisition module that acquires service data of a target service; a prediction module that inputs the service data into a pre-trained target model and predicts the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, and, after the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged and the trained target model is obtained by training the pre-stabilizer with adversarial samples; and a service processing module that performs service processing on the target service based on the prediction result of the service data.
A data processing device provided in an embodiment of the present specification includes: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire service data of a target service; input the service data into a pre-trained target model, and predict the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, and, after the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged and the trained target model is obtained by training the pre-stabilizer with adversarial samples; and perform service processing on the target service based on the prediction result of the service data.
The present specification further provides a storage medium for storing computer-executable instructions that, when executed by a processor, implement the following procedure: acquiring service data of a target service; inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, and, after the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged and the trained target model is obtained by training the pre-stabilizer with adversarial samples; and performing service processing on the target service based on the prediction result of the service data.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram illustrating an embodiment of a data processing method according to the present disclosure;
FIG. 2 is a diagram of another embodiment of a data processing method according to the present disclosure;
FIG. 3 is a diagram of another embodiment of a data processing method according to the present disclosure;
FIG. 4 is a diagram of an embodiment of a data processing apparatus according to the present disclosure;
FIG. 5 is a diagram of an embodiment of a data processing device according to the present disclosure.
Detailed Description
The embodiment of the specification provides a data processing method, a device and equipment.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Example 1
As shown in fig. 1, an embodiment of the present disclosure provides a data processing method. The execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile phone, a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (such as a smart watch or an in-vehicle device), and the server may be a single server or a server cluster including a plurality of servers, and may be a background server of a service such as a financial service or an online shopping service, or a background server of an application program. In this embodiment, a server is taken as an example for detailed description; the execution process of the terminal device is similar and is not repeated here. The method specifically includes the following steps:
In step S102, service data of a target service is acquired.
The target service may include various kinds of services, for example a payment service, a transfer service, a face recognition service, an online transaction service, or a public-opinion service for an event. The target service in this embodiment may be one service or may include a plurality of services, which may be set according to the actual situation, and the embodiments of the present disclosure are not limited thereto. The service data may be data generated during execution of the target service; in practical applications it may be set according to the specific target service and may include, for example, the time and place of executing the target service, account information, and other data related to the target service. For example, if the target service is a public-opinion service for an event, the service data may include the time, place and related persons or things of the event, the content text of the event, and the positive and negative comment information received for the event. This may be set according to the actual situation, and the embodiments of the present disclosure are not limited thereto.
In implementation, deep learning based on deep neural networks has in recent years achieved great success in many fields (such as natural language processing), yet model robustness is still largely lacking: in an emotion classification task, for example, replacing some words of the original text with near-synonyms may cause the deep neural network to misclassify. Such a slightly modified sample is called an adversarial sample, and the difference in model performance between the adversarial sample and the original sample reflects model robustness. The most widely used way to improve model robustness is to train the model by minimizing the empirical loss on adversarial samples. However, this adversarial training approach requires the technician to retrain the model and modify all of its parameters (including the inherent model parameters), which takes a large amount of time and is unfavorable for models already deployed in industrial applications. Therefore, it is necessary to provide a model processing method that is more scalable and can improve model robustness. To this end, the embodiments of the present disclosure provide an achievable processing manner, which may specifically include the following:
For the target service, each time a target user executes the target service, the relevant information of this execution may be recorded and used as service data of the target service. The recorded information may include the time of executing the target service, the information generated by each operation of the target user during execution, location information, and the like, which may be set according to the actual situation; the embodiments of the present disclosure are not limited thereto. In this way, service data of the target service can be obtained. When corresponding service prediction needs to be performed on the service data of the target service (for example, determining whether the service data carries a specified risk, such as a fraud risk), the service data currently generated by the target service can be obtained.
In step S104, the service data is input into a pre-trained target model, and the service data is predicted by a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data. The target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, and each sub-stabilizer is arranged in one network layer of the first sub-model. After the first sub-model has been trained, the model parameters of the first sub-model are kept unchanged, and the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples.
The first sub-model may be constructed by a plurality of different algorithms, for example by a specified neural network model; the neural network model may be of various types, such as a convolutional neural network model, and may be set according to the actual situation, which is not limited in the embodiments of the present disclosure. The first sub-model may be used to predict the service data of the target service to obtain a corresponding prediction result; however, if the service data is disturbed to a certain extent (for example, some words are replaced by near-synonyms), the prediction result obtained after the disturbed service data is input into the first sub-model may differ considerably from the original prediction result. The pre-stabilizer may be a component that can be inserted into the first sub-model to address the accuracy of predictions on disturbed service data. The pre-stabilizer may include the parameters that need to be added for each network layer of the first sub-model: for example, if the first sub-model includes an input layer, an intermediate layer and an output layer, the parameters to be added may be set for the input layer, the intermediate layer and the output layer respectively, and the pre-stabilizer may be constructed from these parameters; alternatively, the parameters to be added may be set only for the intermediate layer and the pre-stabilizer constructed from them. This may be set according to the actual situation, and the embodiments of the present disclosure are not limited thereto.
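For illustration, the following is a minimal Python (PyTorch) sketch of one possible form of the pre-stabilizer: each sub-stabilizer holds a small block of trainable vectors that are concatenated in front of a layer's hidden states so that later layers can attend to them. The parameter shapes, the prefix-style placement and the layer names are assumptions made for this example and are not prescribed by the embodiment.

    import torch
    import torch.nn as nn

    class SubStabilizer(nn.Module):
        """Trainable parameters attached to one network layer of the frozen first sub-model.
        Sketch only: the prefix shape and placement are assumptions."""
        def __init__(self, num_vectors: int, hidden_size: int):
            super().__init__()
            self.prefix = nn.Parameter(torch.zeros(num_vectors, hidden_size))
            nn.init.normal_(self.prefix, std=0.02)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # hidden_states: (batch, sequence length, hidden size)
            batch = hidden_states.size(0)
            prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
            return torch.cat([prefix, hidden_states], dim=1)  # prepend the trainable vectors

    class PreStabilizer(nn.Module):
        """One sub-stabilizer per selected network layer (e.g. input layer and hidden layer)."""
        def __init__(self, layer_names, num_vectors=8, hidden_size=768):
            super().__init__()
            self.sub_stabilizers = nn.ModuleDict(
                {name: SubStabilizer(num_vectors, hidden_size) for name in layer_names}
            )

        def forward(self, layer_name: str, hidden_states: torch.Tensor) -> torch.Tensor:
            return self.sub_stabilizers[layer_name](hidden_states)

Under these assumptions, a pre-stabilizer covering an input layer and a hidden layer could be created as PreStabilizer(["input", "hidden"]).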
In implementation, a corresponding algorithm may be obtained and the first sub-model constructed based on it. The input data of the first sub-model may be the service data generated when the target service is executed, and the output data may be the prediction result corresponding to the input data. A training sample for training the first sub-model (i.e. service data generated when the target service is executed) may then be obtained, and the first sub-model trained with the training sample to obtain a corresponding prediction result. An objective function may be set in advance, and the model parameters of the first sub-model optimized based on this objective function, with the first sub-model adjusted toward the objective function. The first sub-model is thus trained with the training samples while its model parameters are optimized through the objective function, finally yielding the trained first sub-model. In addition, in order to give the model of the target service network layers that resist adversarial-sample attacks, the parameters that need to be added to the first sub-model can be determined according to the actual situation, the pre-stabilizer can be built from these parameters, and the pre-stabilizer can be set in the first sub-model to obtain an initial target model. The obtained service data can then be disturbed, and the disturbed service data used as new training samples and input into the target model. The model parameters of the first sub-model in the target model remain unchanged, and only the parameters of the pre-stabilizer are trained. After a new training sample is input into the target model, analysis and service prediction can be carried out through the first sub-model (with its parameters kept unchanged) and the pre-stabilizer to obtain a corresponding prediction result. An objective function may be set in advance, and the parameters of the target model (in particular those of the pre-stabilizer) optimized based on this objective function, with the pre-stabilizer adjusted accordingly. The target model is thus trained with the new training samples while the parameters of the pre-stabilizer are optimized through the objective function, finally yielding the trained target model.
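The freezing step described above can be sketched as follows; the helper name and the learning rate are illustrative assumptions. Only the pre-stabilizer parameters are handed to the optimizer, while the first sub-model's inherent parameters are frozen.

    import torch

    def freeze_and_build_optimizer(first_sub_model: torch.nn.Module,
                                   pre_stabilizer: torch.nn.Module,
                                   learning_rate: float = 1e-4):
        """Freeze the inherent parameters of the first sub-model and optimize only the pre-stabilizer."""
        for p in first_sub_model.parameters():
            p.requires_grad_(False)          # inherent model parameters stay unchanged
        return torch.optim.AdamW(pre_stabilizer.parameters(), lr=learning_rate)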
After the service data of the target service is obtained in the above manner, it can be input into the pre-trained target model, and the service data is predicted through the first sub-model and the pre-stabilizer applied to the target service in the target model, so as to obtain a prediction result for the service data.
In step S106, the target service is processed based on the prediction result of the service data.
In implementation, if the prediction result of the service data indicates that the service data is not abnormal, service processing of the target service may continue; if the prediction result indicates that the service data is abnormal, the service processing of the target service may be refused. This may be set according to the actual situation, and the embodiment of the present specification is not limited thereto.
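A hedged sketch of this decision step is given below; the risk threshold, the sigmoid score and the return values are assumptions made for illustration rather than values given by this embodiment.

    import torch

    def process_target_service(service_data: torch.Tensor, target_model: torch.nn.Module,
                               risk_threshold: float = 0.5) -> str:
        """Continue or refuse the target service based on the prediction result (sketch)."""
        score = torch.sigmoid(target_model(service_data)).item()  # assumed probability of a specified risk
        if score < risk_threshold:
            return "continue"   # no abnormality: continue processing the target service
        return "reject"         # abnormality detected: refuse the service processing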
The embodiment of the specification provides a data processing method: service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, the model parameters of the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that, when it is inserted into the first sub-model, the combined target model can be adversarially trained. Specifically, trainable parameters (i.e. the parameters of the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model, allowing the subsequent parameters to be concatenated with the parameters of the pre-stabilizer, and adversarial training is then performed only on the parameters of the pre-stabilizer while the model parameters of the first sub-model remain frozen (i.e. unchanged). As a result, fewer parameters need to be trained, training is faster, less GPU memory is occupied, and the robustness achieved by training the pre-stabilizer is comparable to that of adversarially training all model parameters. Different pre-stabilizers can be substituted for different scenarios, so the model adapts to different scenarios, the robustness of the model in real scenarios is improved more conveniently, and the service model is deployed more flexibly.
Example 2
As shown in fig. 2, an embodiment of the present disclosure provides a data processing method. The execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile phone, a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (such as a smart watch or an in-vehicle device), and the server may be a single server or a server cluster including a plurality of servers, and may be a background server of a service such as a financial service or an online shopping service, or a background server of an application program. In this embodiment, a server is taken as an example for detailed description; the execution process of the terminal device is similar and is not repeated here. The method specifically includes the following steps:
In step S202, a training sample is acquired, together with a distance threshold for the distance between an adversarial sample constructed from the training sample and the training sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer, and disturbance information.
The first sub-model may be constructed based on a deep neural network model, and the deep neural network model may be of various types, for example a convolutional neural network model, a recurrent neural network model, a deep belief network model, a deep auto-encoder, or a generative adversarial network model, which may be set according to the actual situation and is not limited in this embodiment of the specification. The first sub-model may include a plurality of different network layers; in this embodiment, the first sub-model may include an input layer and a hidden layer, and the pre-stabilizer may include an input sub-stabilizer and a hidden sub-stabilizer, the input sub-stabilizer being arranged in the input layer of the first sub-model and the hidden sub-stabilizer being arranged in the hidden layer of the first sub-model. Accordingly, the parameters of the input sub-stabilizer are parameters inserted into the input layer of the first sub-model, and the parameters of the hidden sub-stabilizer are parameters inserted into the hidden layer of the first sub-model. The hidden layer of the first sub-model may be built with an attention mechanism, which enables a deep neural network to focus on a subset of its inputs (or features), i.e. to select particular inputs, and can be applied to any type of input regardless of its data form. Attention mechanisms can be divided into two types. One is the top-down, conscious attention mechanism, which may be called focused attention: it is task-dependent and actively, consciously focuses on an object for a predetermined purpose. The other is the bottom-up, unconscious attention mechanism, which may be called saliency-based attention: it is driven by external stimuli, requires no active intervention, and is independent of the task. If the stimulus information of an object differs from its surroundings, an unconscious "winner-take-all" or gating mechanism may shift attention to that object. Whether attention is intentional or unintentional, most brain activities rely on it, such as memorizing information, reading or thinking. In practical applications, the hidden layer of the first sub-model may be constructed with the multi-head attention mechanism of the Transformer architecture and may include the stacked encoders of the Transformer architecture; the multi-head attention mechanism uses multiple queries to compute, in parallel, multiple selections from the input data, with each attention head focusing on a different part of the input. Soft attention takes the expectation over all input data under the attention distribution, while hard attention can be implemented in two ways: one selects the input data with the highest probability, and the other samples randomly from the attention distribution. Which attention mechanism is used in this embodiment may be set according to the actual situation, which is not limited in this embodiment.
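As an illustration of such a hidden layer, the following sketch stacks Transformer encoder layers whose multi-head self-attention attends to different parts of the input; the dimensions below are illustrative assumptions, not values from this embodiment.

    import torch
    import torch.nn as nn

    # Hidden layer built from stacked Transformer encoders with multi-head self-attention.
    hidden_size, num_heads, num_layers = 768, 12, 6   # illustrative sizes
    encoder_layer = nn.TransformerEncoderLayer(
        d_model=hidden_size, nhead=num_heads, batch_first=True
    )
    hidden_layer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    x = torch.randn(2, 16, hidden_size)   # (batch, sequence length, hidden size)
    h = hidden_layer(x)                   # contextualized representations, same shape as x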
The distance threshold may be set according to the actual situation. In practical applications, an adversarial sample and its training sample (i.e. the original sample) are almost indistinguishable visually or semantically, yet a specified model or algorithm classifies them as two different samples. Specifically, if the adversarial sample and the training sample are images, they are almost unchanged visually but are classified differently by the specified model or algorithm; if they are texts, they are almost unchanged semantically but are classified differently. This can therefore serve as the basic setting rule: the adversarial sample and the training sample should be almost unchanged visually or semantically while still being separated into two different samples by the specified model or algorithm, and the distance threshold may be set based on this rule, specifically according to the actual situation, which the present specification does not limit. The optimizer may be constructed with a preset optimization algorithm and used to optimize the model parameters of a model during its training. In practical applications the optimizer may be of various types, for example an SGD optimizer, an SGDM optimizer, a NAG optimizer, an AdaGrad optimizer, an AdaDelta optimizer, an Adam optimizer, a Nadam optimizer or an AdamW optimizer, which may be set according to the actual situation and is not limited in this embodiment of the specification; for example, the optimizer in this embodiment may be an AdamW optimizer. The disturbance information may be the information that needs to be added to a training sample (i.e. an original sample) to convert it into a satisfactory adversarial sample.
In implementation, the first sub-model may be constructed based on a preset algorithm, a training sample for training the first sub-model (i.e. service data generated when the target service is executed) may be obtained, and the first sub-model may be trained with the training sample to finally obtain the trained first sub-model. In addition, the distance threshold for the distance between an adversarial sample constructed from a training sample and that training sample, the optimizer, the initialized pre-stabilizer, the disturbance information, and the like used for training the target model may be acquired; this information may be set according to the actual situation, which is not limited in the embodiments of the present specification. In practical applications, the disturbance information may be obtained in a plurality of different manners, for example by sampling from a uniform distribution, which may be set according to the actual situation and is not limited in the embodiments of the present disclosure.
In step S204, a corresponding adversarial sample is constructed based on the training sample, the disturbance information and the distance threshold, and a target model to be trained is constructed based on the first sub-model and the initialized pre-stabilizer.
In implementation, after the training sample, the disturbance information and the distance threshold are obtained in the above manner, a training sample may be taken and the disturbance information added to it to generate a corresponding adversarial sample. The distance threshold may then be used to judge whether the generated adversarial sample can be used for subsequent training of the target model: if the distance between the generated adversarial sample and the training sample is within the distance threshold, the adversarial sample can be used for subsequent training of the target model and is retained; if the distance is not within the distance threshold, the adversarial sample cannot be used for subsequent training and may be discarded.
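A minimal sketch of this retain-or-discard rule is given below, assuming the sample is a continuous embedding and the distance is measured with the L-infinity norm (both are assumptions of the example).

    import torch

    def build_adversarial_sample(x: torch.Tensor, delta: torch.Tensor, epsilon: float):
        """Add the disturbance and keep the result only if it stays within the distance threshold (sketch)."""
        x_adv = x + delta                                  # add the disturbance information
        if (x_adv - x).abs().max().item() <= epsilon:      # within the distance threshold: retain
            return x_adv
        return None                                        # otherwise discard the adversarial sample

    # Example: disturbance sampled from a uniform distribution, as mentioned above.
    x = torch.randn(1, 16, 768)
    delta = torch.empty_like(x).uniform_(-0.01, 0.01)
    adversarial = build_adversarial_sample(x, delta, epsilon=0.005)  # may be None if the sample drifts too far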
The parameters of the pre-stabilizer can be initialized to obtain an initialized pre-stabilizer, which can then be analyzed to determine the parameters of the network layers it contains, or the sub-stabilizers it contains; the pre-stabilizer can then be combined with the first sub-model according to the analysis result to construct a corresponding target model. Based on the foregoing, the pre-stabilizer includes an input sub-stabilizer and a hidden sub-stabilizer, so the input sub-stabilizer may be arranged in the input layer of the first sub-model and the hidden sub-stabilizer in the hidden layer of the first sub-model; accordingly, the parameters of the input sub-stabilizer are inserted into the input layer of the first sub-model and the parameters of the hidden sub-stabilizer into the hidden layer of the first sub-model, yielding the target model to be trained.
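For illustration, the composition of the target model could look like the following sketch, which reuses the PreStabilizer from the earlier sketch; the attributes input_layer, hidden_layer and output_layer of the first sub-model, and the exact insertion points, are assumptions made for the example.

    import torch.nn as nn

    class TargetModel(nn.Module):
        """Frozen first sub-model combined with the pre-stabilizer (sketch)."""
        def __init__(self, first_sub_model: nn.Module, pre_stabilizer: nn.Module):
            super().__init__()
            self.first_sub_model = first_sub_model
            self.pre_stabilizer = pre_stabilizer

        def forward(self, x):
            h = self.first_sub_model.input_layer(x)      # embeddings from the input layer (assumed attribute)
            h = self.pre_stabilizer("input", h)          # input sub-stabilizer inserted at the input layer
            h = self.pre_stabilizer("hidden", h)         # hidden sub-stabilizer, visible to the hidden layer
            h = self.first_sub_model.hidden_layer(h)     # stacked encoder (hidden) layers, parameters frozen
            return self.first_sub_model.output_layer(h)  # prediction head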
In step S206, model training is performed on the target model to be trained based on the adversarial samples and the optimizer, to obtain the trained target model.
In implementation, the input data of the target model to be trained may be adversarial samples, and the output data may be the prediction results corresponding to the input data. Adversarial samples for training the target model may then be obtained, the target model trained with them to obtain the corresponding prediction results, and the model parameters of the target model optimized with the optimizer, with the target model adjusted according to the optimizer. The target model can thus be trained with the adversarial samples while its parameters are optimized through the optimizer, finally obtaining the trained target model.
In practical applications, the specific processing of step S206 may vary; an alternative processing manner is provided below, which may specifically include the following processing of step A2 to step A6.
In step A2, a disturbance normalization parameter, an adversarial training update rate, a number of adversarial training steps, and a learning rate are acquired.
In implementation, the disturbance normalization parameter, the adversarial training update rate, the number of adversarial training steps, the learning rate, and other information may be obtained according to the actual situation. Specifically, the disturbance normalization parameter may be determined from the disturbance information, and the adversarial training update rate, the number of adversarial training steps, the learning rate, and the like may be set according to expert experience or according to the actual situation, which the embodiments of the present specification do not limit.
In step A4, model training is performed on the target model to be trained based on the adversarial samples and the optimizer, first gradient information of the target model with respect to the disturbance information is determined, and the parameters of the pre-stabilizer are updated.
In step A6, if the number of update steps performed on the parameters of the pre-stabilizer has not reached the preset number of adversarial training steps, the parameters of the pre-stabilizer are further updated based on the first gradient information, the adversarial training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the number of adversarial training steps, and the learning rate, to obtain the trained target model.
In practice, the processing of step A4 and step A6 may refer to the following calculation procedure. The input data may include a set of training samples $\{(X_i, y_i)\}_{i=1}^{M}$, where $M$ is the number of training samples, $X_i$ is the $i$-th training sample and $y_i$ is the label information of the $i$-th training sample, together with the distance threshold $\epsilon$, the disturbance information $\delta$, the adversarial training update rate $\alpha$, the number of adversarial training steps $K$, the pre-stabilizer $S$, the first sub-model $f$ and the learning rate $\tau$. The output data is the trained pre-stabilizer $S$.
Starting from the initialized pre-stabilizer and the sampled disturbance information, the first gradient information is computed and the parameters of the pre-stabilizer are updated; as long as the number of update steps performed on the parameters of the pre-stabilizer has not reached the preset number of adversarial training steps, the parameters of the pre-stabilizer continue to be updated based on the first gradient information, the adversarial training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the number of adversarial training steps, and the learning rate, until the preset number of adversarial training steps is reached; the trained target model is then obtained.
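The procedure of steps A2 to A6 can be sketched as follows. The projection of the disturbance back within the distance threshold, the use of the gradient sign as the normalized update direction, and the per-step optimizer update are standard adversarial-training choices assumed for this example; the first sub-model's parameters are assumed to be already frozen as described above, and the inputs are assumed to be continuous embeddings.

    import torch
    import torch.nn.functional as F

    def train_pre_stabilizer(target_model, pre_stabilizer, loader, epsilon, alpha, K, tau):
        """K-step adversarial updates of the disturbance delta, AdamW updates of the pre-stabilizer only (sketch)."""
        optimizer = torch.optim.AdamW(pre_stabilizer.parameters(), lr=tau)   # learning rate tau
        for x, y in loader:                                                  # x: input embeddings, y: labels
            delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
            for _ in range(K):                                               # preset number of adversarial training steps
                loss = F.cross_entropy(target_model(x + delta), y)
                optimizer.zero_grad()
                loss.backward()                                              # gradients for delta and the pre-stabilizer
                with torch.no_grad():
                    # first gradient information of the model with respect to the disturbance
                    delta += alpha * delta.grad.sign()                       # adversarial training update rate alpha
                    delta.clamp_(-epsilon, epsilon)                          # keep the disturbance within the distance threshold
                delta.grad.zero_()
                optimizer.step()                                             # update only the pre-stabilizer parameters
        return pre_stabilizer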
In step S208, service data of the target service is acquired.
In this embodiment, the target service may be a service in the field of natural language processing, and may specifically be set according to an actual situation.
In step S210, the service data is input into a pre-trained target model, and the service data is predicted by a first sub-model and a pre-stabilizer applied to a target service in the target model, so as to obtain a prediction result for the service data.
Based on the foregoing, the first sub-model may include an input layer and a hidden layer, and the pre-stabilizer may include an input sub-stabilizer and a hidden sub-stabilizer, the input sub-stabilizer being arranged in the input layer of the first sub-model and the hidden sub-stabilizer in the hidden layer of the first sub-model. The hidden layer of the first sub-model may be built with an attention mechanism, for example the multi-head attention mechanism of the Transformer architecture, and may include the stacked encoders of the Transformer architecture. The first sub-model may be constructed based on a deep neural network model.
In step S212, the target service is processed based on the prediction result of the service data.
The trained target model makes it more convenient to improve the robustness of the model in real scenarios and makes model deployment more flexible. The performance of the trained target model can be evaluated with various text adversarial attack algorithms; for example, in this embodiment it is evaluated with TextBugger, PWWS and TextFooler. TextBugger is an adversarial attack algorithm suitable for both white-box and black-box scenarios: it first finds the important words or sentences (in the white-box or black-box setting respectively) and then selects the better disturbance from the generated candidates, or, in the black-box case, uses a scoring function to find the important words to manipulate. PWWS is a synonym-substitution text attack algorithm that considers both word saliency and classification probability: the former reflects how much the original word influences the classification, and the latter measures, through its change in value, the attack effect of the proposed substitution. TextFooler is a comprehensive adversarial framework that creates adversarial examples while maintaining human prediction consistency, semantic similarity, and language fluency: it first identifies the words important to the target model and then replaces them with semantically similar and grammatically correct words until the prediction is altered.
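A generic sketch of such an evaluation is shown below; attack_fn stands in for any of the text adversarial attack algorithms mentioned above, and its interface, as well as the predict method assumed on the model, are illustrative assumptions rather than the API of a real attack toolkit.

    def evaluate_robustness(target_model, attack_fn, dataset):
        """Measure clean accuracy and accuracy under attack (sketch; interfaces are assumptions)."""
        clean_correct, robust_correct = 0, 0
        for text, label in dataset:
            if target_model.predict(text) == label:
                clean_correct += 1
                adversarial_text = attack_fn(target_model, text, label)   # perturbed copy of the text
                if target_model.predict(adversarial_text) == label:
                    robust_correct += 1
        total = max(len(dataset), 1)
        return clean_correct / total, robust_correct / total               # clean accuracy, accuracy under attack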
The embodiment of the specification provides a data processing method: service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, the model parameters of the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that, when it is inserted into the first sub-model, the combined target model can be adversarially trained. Specifically, trainable parameters (i.e. the parameters of the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model, allowing the subsequent parameters to be concatenated with the parameters of the pre-stabilizer, and adversarial training is then performed only on the parameters of the pre-stabilizer while the model parameters of the first sub-model remain frozen (i.e. unchanged). As a result, fewer parameters need to be trained, training is faster, less GPU memory is occupied, and the robustness achieved by training the pre-stabilizer is comparable to that of adversarially training all model parameters. Different pre-stabilizers can be substituted for different scenarios, so the model adapts to different scenarios, the robustness of the model in real scenarios is improved more conveniently, and the service model is deployed more flexibly.
Example 3
Based on the foregoing embodiments, the above processing is described below with a specific application scenario: a text processing scenario for natural language processing. In this scenario the target service is a service in the natural language processing field, the training samples are training text samples, the adversarial samples are adversarial texts, and the service data is text data.
As shown in fig. 3, an embodiment of the present disclosure provides a data processing method. The execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile phone, a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (such as a smart watch or an in-vehicle device), and the server may be a single server or a server cluster including a plurality of servers, and may be a background server of a service such as a financial service or an online shopping service, or a background server of an application program. In this embodiment, a server is taken as an example for detailed description; the execution process of the terminal device is similar and is not repeated here. The method specifically includes the following steps:
In step S302, a training text sample is acquired, together with a distance threshold for the distance between an adversarial text constructed from the training text sample and the training text sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer, and disturbance information.
The first sub-model may be used for natural language processing, for example semantic detection, text classification, or semantic emotion classification, which may be set according to the actual situation and is not limited in the embodiments of the present specification. The first sub-model may be built based on a deep neural network model and may include an input layer and a hidden layer; the pre-stabilizer may include an input sub-stabilizer arranged in the input layer of the first sub-model and a hidden sub-stabilizer arranged in the hidden layer of the first sub-model. The hidden layer of the first sub-model may be constructed with an attention mechanism, for example the multi-head attention mechanism of the Transformer architecture, and may include the stacked encoders of the Transformer architecture. An adversarial text and its training text sample may be almost indistinguishable visually or semantically yet be separated into two different samples by a specified model or algorithm; this serves as the basic setting principle, and the distance threshold may be set based on it, specifically according to the actual situation, which is not limited in the embodiments of the present specification. The optimizer in this embodiment may be an AdamW optimizer or the like. The disturbance information may be of various kinds, for example noise information or word replacement (such as near-synonym replacement), and may be set according to the actual situation. The corresponding disturbance information may be obtained, for example, by sampling from a uniform distribution, which may be set according to the actual situation and is not limited in the embodiments of the present specification.
In step S304, a corresponding adversarial text is constructed based on the training text sample, the disturbance information and the distance threshold, and a target model to be trained is constructed based on the first sub-model and the initialized pre-stabilizer.
In step S306, a disturbance normalization parameter, an adversarial training update rate, a number of adversarial training steps, and a learning rate are acquired.
In step S308, model training is performed on the target model to be trained based on the adversarial text and the optimizer, first gradient information of the target model with respect to the disturbance information is determined, and the parameters of the pre-stabilizer are updated.
In step S310, if the number of update steps performed on the parameters of the pre-stabilizer has not reached the preset number of adversarial training steps, the parameters of the pre-stabilizer are further updated based on the first gradient information, the adversarial training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the number of adversarial training steps, and the learning rate, to obtain the trained target model.
In step S312, text data of the target service is acquired.
In step S314, the text data is input into a pre-trained target model, and the text data is predicted by a first sub-model and a pre-stabilizer applied to the target service in the target model, so as to obtain a prediction result for the text data.
In step S316, the target service is processed based on the prediction result of the text data.
The embodiment of the specification provides a data processing method: service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, the model parameters of the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that, when it is inserted into the first sub-model, the combined target model can be adversarially trained. Specifically, trainable parameters (i.e. the parameters of the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model, allowing the subsequent parameters to be concatenated with the parameters of the pre-stabilizer, and adversarial training is then performed only on the parameters of the pre-stabilizer while the model parameters of the first sub-model remain frozen (i.e. unchanged). As a result, fewer parameters need to be trained, training is faster, less GPU memory is occupied, and the robustness achieved by training the pre-stabilizer is comparable to that of adversarially training all model parameters. Different pre-stabilizers can be substituted for different scenarios, so the model adapts to different scenarios, the robustness of the model in real scenarios is improved more conveniently, and the service model is deployed more flexibly.
Example 4
Based on the same concept as the data processing method provided above, an embodiment of the present specification further provides a data processing apparatus, as shown in fig. 4.
The data processing apparatus includes: a data acquisition module 401, a prediction module 402, and a traffic processing module 403, wherein:
a data acquisition module 401 for acquiring service data of a target service;
the prediction module 402 is configured to input the service data into a pre-trained target model, predict the service data through a first sub-model and a front stabilizer applied to the target service in the target model, to obtain a prediction result for the service data, where the target model includes the first sub-model and the front stabilizer, the first sub-model includes a plurality of different network layers, the front stabilizer includes one or more sub-stabilizers, the sub-stabilizer is disposed in one network layer of the first sub-model, and after the first sub-model performs model training, keep model parameters in the first sub-model unchanged, and train parameters in the front stabilizer through countermeasure samples to obtain a trained target model;
And a service processing module 403, configured to perform service processing on the target service based on the prediction result of the service data.
In this embodiment of the present disclosure, the first sub-model includes an input layer and a hidden layer, and the pre-stabilizer includes an input sub-stabilizer and a hidden sub-stabilizer, where the input sub-stabilizer is disposed in the input layer of the first sub-model, and the hidden sub-stabilizer is disposed in the hidden layer of the first sub-model.
In the embodiment of the present specification, the hidden layer of the first sub-model is constructed through an attention mechanism.
In this embodiment of the present disclosure, the hidden layer of the first sub-model is constructed with the multi-head attention mechanism of the Transformer architecture, and the hidden layer of the first sub-model includes the stacked encoders of the Transformer architecture.
In an embodiment of the present disclosure, the apparatus further includes:
the sample acquisition module acquires a training sample, together with a distance threshold for the distance between an adversarial sample constructed from the training sample and the training sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer, and disturbance information;
the construction module is used for constructing a corresponding adversarial sample based on the training sample, the disturbance information and the distance threshold, and constructing a target model to be trained based on the first sub-model and the initialized pre-stabilizer;
and the training module is used for performing model training on the target model to be trained based on the adversarial sample and the optimizer to obtain the trained target model.
In an embodiment of the present disclosure, the training module includes:
the information acquisition unit acquires a disturbance normalization parameter, an adversarial training update rate, a number of adversarial training steps, and a learning rate;
the updating unit is used for performing model training on the target model to be trained based on the adversarial sample and the optimizer, determining first gradient information of the target model with respect to the disturbance information, and updating the parameters of the pre-stabilizer;
and the training unit is used for further updating the parameters of the pre-stabilizer based on the first gradient information, the adversarial training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the number of adversarial training steps and the learning rate, if the number of update steps performed on the parameters of the pre-stabilizer has not reached the preset number of adversarial training steps, to obtain the trained target model.
In the embodiment of the present disclosure, the first sub-model is constructed based on a deep neural network model, and the target service is a service in the field of natural language processing.
The embodiment of the present disclosure provides a data processing apparatus: service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, the model parameters of the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters of the pre-stabilizer with adversarial samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that, when it is inserted into the first sub-model, the combined target model can be adversarially trained. Specifically, trainable parameters (i.e. the parameters of the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model, allowing the subsequent parameters to be concatenated with the parameters of the pre-stabilizer, and adversarial training is then performed only on the parameters of the pre-stabilizer while the model parameters of the first sub-model remain frozen (i.e. unchanged). As a result, fewer parameters need to be trained, training is faster, less GPU memory is occupied, and the robustness achieved by training the pre-stabilizer is comparable to that of adversarially training all model parameters. Different pre-stabilizers can be substituted for different scenarios, so the model adapts to different scenarios, the robustness of the model in real scenarios is improved more conveniently, and the service model is deployed more flexibly.
Example 5
Based on the same concept as the data processing apparatus described above, an embodiment of the present specification further provides a data processing device, as shown in fig. 5.
The data processing device may be the terminal device or the server described in the above embodiments.
The data processing apparatus may vary considerably in configuration or performance and may include one or more processors 501 and memory 502, in which memory 502 may store one or more stored applications or data. Wherein the memory 502 may be transient storage or persistent storage. The application programs stored in memory 502 may include one or more modules (not shown) each of which may include a series of computer executable instructions for use in a data processing apparatus. Still further, the processor 501 may be arranged to communicate with the memory 502 and execute a series of computer executable instructions in the memory 502 on a data processing apparatus. The data processing device may also include one or more power supplies 503, one or more wired or wireless network interfaces 504, one or more input/output interfaces 505, and one or more keyboards 506.
In particular, in this embodiment, the data processing apparatus includes a memory and one or more programs, wherein the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer executable instructions for the data processing apparatus, and the one or more programs are configured to be executed by the one or more processors and comprise instructions for:
acquiring service data of a target service;
inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and carrying out service processing on the target service based on the prediction result of the service data (a minimal illustrative sketch of this prediction flow follows).
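To make this prediction flow concrete, the following is a minimal PyTorch-style sketch of the inference stage, assuming the combined target model behaves as an ordinary torch.nn.Module and the service data has already been encoded into a tensor; the function and variable names (predict_service_data, target_model, features) are illustrative assumptions and not part of this disclosure.

```python
import torch

def predict_service_data(target_model: torch.nn.Module, features: torch.Tensor) -> torch.Tensor:
    """Run the combined target model (frozen first sub-model plus trained
    pre-stabilizer) on encoded service data and return a prediction result."""
    target_model.eval()                  # inference only; no parameters change here
    with torch.no_grad():
        logits = target_model(features)  # the sub-stabilizers act inside this forward pass
    return torch.argmax(logits, dim=-1)  # prediction result used for service processing
```

Service processing for the target service would then branch on the returned prediction.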
In this embodiment of the present disclosure, the first sub-model includes an input layer and a hidden layer, and the pre-stabilizer includes an input sub-stabilizer and a hidden sub-stabilizer, where the input sub-stabilizer is disposed in the input layer of the first sub-model, and the hidden sub-stabilizer is disposed in the hidden layer of the first sub-model.
In the embodiment of the present specification, the hidden layer of the first sub-model is constructed through an attention mechanism.
In this embodiment of the present disclosure, the hidden layer of the first sub-model is constructed by using a multi-head attention mechanism in a Transformers architecture, and the hidden layer of the first sub-model includes encoders stacked in the Transformers architecture.
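The disclosure describes the sub-stabilizers only as trainable parameters placed in the input layer and in the stacked encoder (hidden) layers, without fixing their concrete form. The sketch below shows one plausible, prefix-style realization in which each sub-stabilizer is a small block of trainable vectors concatenated in front of a layer's representations; the class names, dimensions and the choice of a prefix mechanism are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class SubStabilizer(nn.Module):
    """One sub-stabilizer: a block of trainable vectors concatenated in front of
    the representations of a single network layer whose own weights stay frozen."""
    def __init__(self, num_tokens: int, hidden_dim: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(num_tokens, hidden_dim) * 0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch = hidden_states.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, hidden_states], dim=1)  # prepend trainable states

class PreStabilizer(nn.Module):
    """Pre-stabilizer: an input sub-stabilizer for the input (embedding) layer and
    one hidden sub-stabilizer per stacked encoder of the first sub-model."""
    def __init__(self, num_encoder_layers: int, num_tokens: int, hidden_dim: int):
        super().__init__()
        self.input_stabilizer = SubStabilizer(num_tokens, hidden_dim)
        self.hidden_stabilizers = nn.ModuleList(
            [SubStabilizer(num_tokens, hidden_dim) for _ in range(num_encoder_layers)]
        )
```

Under this reading, the input sub-stabilizer wraps the output of the embedding (input) layer and each hidden sub-stabilizer wraps the output of one stacked encoder, so none of the original weights of the first sub-model are touched.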
In this embodiment of the present specification, the operations further include:
acquiring a training sample, and acquiring a distance threshold value for the distance between the training sample and a countermeasure sample constructed based on the training sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer and disturbance information;
constructing a corresponding countermeasure sample based on the training sample, the disturbance information and the distance threshold value, and constructing a target model to be trained based on the first sub-model and the initialized pre-stabilizer;
model training is carried out on the target model to be trained based on the countermeasure sample and the optimizer to obtain the trained target model (a sketch of assembling and freezing this target model to be trained follows).
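As a sketch of assembling the target model to be trained referenced above, the snippet below freezes the pre-trained first sub-model and hands only the pre-stabilizer parameters to the optimizer; the helper name and the choice of AdamW are assumptions, since the disclosure only states that an optimizer is acquired.

```python
import torch
import torch.nn as nn

def build_target_model_to_train(first_sub_model: nn.Module,
                                pre_stabilizer: nn.Module,
                                learning_rate: float = 1e-3):
    """Freeze the inherent model parameters of the first sub-model and expose only
    the pre-stabilizer parameters to the optimizer (illustrative sketch)."""
    for p in first_sub_model.parameters():
        p.requires_grad_(False)          # model parameters in the first sub-model stay unchanged
    trainable = list(pre_stabilizer.parameters())
    optimizer = torch.optim.AdamW(trainable, lr=learning_rate)
    return first_sub_model, pre_stabilizer, optimizer
```

Because only the pre-stabilizer parameters are handed to the optimizer, the subsequent countermeasure training leaves the first sub-model exactly as deployed.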
In this embodiment of the present disclosure, performing model training on the target model to be trained based on the countermeasure sample and the optimizer to obtain the trained target model includes:
obtaining a disturbance normalization parameter, a countermeasure training update rate, a countermeasure training step length and a learning rate;
carrying out model training on the target model to be trained based on the countermeasure sample and the optimizer, determining first gradient information of the target model with respect to the disturbance information, and updating the parameters in the pre-stabilizer;
if the number of update steps performed on the parameters in the pre-stabilizer has not reached the preset number of countermeasure training steps, further updating the parameters in the pre-stabilizer based on the first gradient information, the countermeasure training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the countermeasure training step length and the learning rate, so as to obtain the trained target model (one plausible realization of this loop is sketched below).
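The disclosure names the quantities involved in this update (the first gradient information, the countermeasure training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the countermeasure training step length and the learning rate) but does not spell out the exact rule. The sketch below is one plausible PGD/FreeLB-style reading in which the disturbance is an additive perturbation on the input embeddings that is refined for the preset number of countermeasure training steps while staying inside the distance threshold, and gradients accumulate only on the pre-stabilizer parameters; the hyperparameter names and the update rule itself are assumptions for illustration, not the claimed method.

```python
import torch
import torch.nn.functional as F

def countermeasure_train_step(target_model, embeddings, labels, optimizer,
                              adv_steps=3, adv_update_rate=1e-2,
                              distance_threshold=0.1, norm_p=2):
    """One training step: refine the disturbance for a preset number of inner
    steps and update only the pre-stabilizer parameters (illustrative sketch)."""
    delta = torch.zeros_like(embeddings, requires_grad=True)    # disturbance information
    optimizer.zero_grad()
    for _ in range(adv_steps):                                  # preset number of countermeasure training steps
        logits = target_model(embeddings + delta)
        loss = F.cross_entropy(logits, labels)
        loss.backward()                                         # gradients flow to delta and to the stabilizer
        with torch.no_grad():
            grad = delta.grad                                    # first gradient information w.r.t. the disturbance
            grad = grad / grad.norm(p=norm_p).clamp_min(1e-12)   # disturbance normalization
            delta = delta + adv_update_rate * grad               # countermeasure training update rate
            norm = delta.norm(p=norm_p)
            if norm > distance_threshold:                        # keep the countermeasure sample near the original
                delta = delta * (distance_threshold / norm)
        delta = delta.detach().requires_grad_(True)
    optimizer.step()                                             # learning-rate update of the pre-stabilizer only
    return loss.item()
```

The mapping between the disclosure's named hyperparameters and the variables above is itself an assumption; in particular, how the countermeasure training step length and update rate are combined is not specified in the text.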
In the embodiment of the present disclosure, the first sub-model is constructed based on a deep neural network model, and the target service is a service in the field of natural language processing.
The embodiment of the specification provides a data processing device. Service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, where the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is disposed in one network layer of the first sub-model, the model parameters in the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters in the pre-stabilizer with countermeasure samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that the combined target model can undergo countermeasure training once the pre-stabilizer is inserted into the first sub-model. Specifically, trainable parameters (i.e., the parameters in the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model and concatenated with the representations that follow, and only the parameters in the pre-stabilizer are then countermeasure-trained while the model parameters in the first sub-model remain frozen (i.e., kept unchanged). As a result, fewer parameters need to be trained, training is faster, and less GPU memory is occupied, while the robustness of the model obtained by training the pre-stabilizer is comparable to that obtained by countermeasure training of all model parameters. Moreover, different pre-stabilizers can be substituted for different scenes, so that the model is adapted to different scenes, which makes it more convenient to improve the robustness of the model in actual scenes and more flexible to deploy the service model.
Example six
Further, based on the methods shown in fig. 1 to 3, one or more embodiments of the present disclosure further provide a storage medium for storing computer executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer executable instruction information stored in the storage medium can implement the following flow when executed by a processor:
acquiring service data of a target service;
inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and carrying out service processing on the target service based on the prediction result of the service data.
In this embodiment of the present disclosure, the first sub-model includes an input layer and a hidden layer, and the pre-stabilizer includes an input sub-stabilizer and a hidden sub-stabilizer, where the input sub-stabilizer is disposed in the input layer of the first sub-model, and the hidden sub-stabilizer is disposed in the hidden layer of the first sub-model.
In the embodiment of the present specification, the hidden layer of the first sub-model is constructed through an attention mechanism.
In this embodiment of the present disclosure, the hidden layer of the first sub-model is constructed by using a multi-head attention mechanism in a Transformers architecture, and the hidden layer of the first sub-model includes encoders stacked in the Transformers architecture.
In this embodiment of the present specification, the flow further includes:
acquiring a training sample, and acquiring a distance threshold value for the distance between the training sample and a countermeasure sample constructed based on the training sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer and disturbance information;
constructing a corresponding countermeasure sample based on the training sample, the disturbance information and the distance threshold value, and constructing a target model to be trained based on the first sub-model and the initialized pre-stabilizer;
model training is carried out on the target model to be trained based on the countermeasure sample and the optimizer to obtain the trained target model.
In this embodiment of the present disclosure, performing model training on the target model to be trained based on the countermeasure sample and the optimizer to obtain the trained target model includes:
obtaining a disturbance normalization parameter, a countermeasure training update rate, a countermeasure training step length and a learning rate;
carrying out model training on the target model to be trained based on the countermeasure sample and the optimizer, determining first gradient information of the target model with respect to the disturbance information, and updating the parameters in the pre-stabilizer;
if the number of update steps performed on the parameters in the pre-stabilizer has not reached the preset number of countermeasure training steps, further updating the parameters in the pre-stabilizer based on the first gradient information, the countermeasure training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the countermeasure training step length and the learning rate, so as to obtain the trained target model.
In the embodiment of the present disclosure, the first sub-model is constructed based on a deep neural network model, and the target service is a service in the field of natural language processing.
The embodiment of the present disclosure provides a storage medium. Service data of a target service is acquired and input into a pre-trained target model, and the service data is predicted through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, where the target model includes the first sub-model and the pre-stabilizer, the first sub-model includes a plurality of different network layers, the pre-stabilizer includes one or more sub-stabilizers, each sub-stabilizer is disposed in one network layer of the first sub-model, the model parameters in the first sub-model are kept unchanged after the first sub-model has been trained, the trained target model is obtained by training the parameters in the pre-stabilizer with countermeasure samples, and service processing is performed on the target service based on the prediction result of the service data. In this way, a pre-stabilizer is proposed, which may be a lightweight and scalable module designed so that the combined target model can undergo countermeasure training once the pre-stabilizer is inserted into the first sub-model. Specifically, trainable parameters (i.e., the parameters in the pre-stabilizer) are inserted into the corresponding network layers of the first sub-model and concatenated with the representations that follow, and only the parameters in the pre-stabilizer are then countermeasure-trained while the model parameters in the first sub-model remain frozen (i.e., kept unchanged). As a result, fewer parameters need to be trained, training is faster, and less GPU memory is occupied, while the robustness of the model obtained by training the pre-stabilizer is comparable to that obtained by countermeasure training of all model parameters. Moreover, different pre-stabilizers can be substituted for different scenes, so that the model is adapted to different scenes, which makes it more convenient to improve the robustness of the model in actual scenes and more flexible to deploy the service model.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (such as a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained by slightly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing one or more embodiments of the present description.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of data processing, the method comprising:
acquiring service data of a target service;
inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and carrying out service processing on the target service based on the prediction result of the service data.
2. The method of claim 1, the first sub-model comprising an input layer and a hidden layer, the pre-stabilizer comprising an input sub-stabilizer disposed in the input layer of the first sub-model and a hidden sub-stabilizer disposed in the hidden layer of the first sub-model.
3. The method of claim 2, wherein the hidden layer of the first sub-model is constructed by an attention mechanism.
4. The method of claim 3, wherein the hidden layer of the first sub-model is constructed by a multi-headed attention mechanism in a Transformers architecture, and wherein the hidden layer of the first sub-model comprises encoders stacked in the Transformers architecture.
5. The method of claim 4, the method further comprising:
acquiring a training sample, and acquiring a distance threshold value for the distance between the training sample and a countermeasure sample constructed based on the training sample, the pre-trained first sub-model, an optimizer, an initialized pre-stabilizer and disturbance information;
constructing a corresponding countermeasure sample based on the training sample, the disturbance information and the distance threshold value, and constructing a target model to be trained based on the first sub-model and the initialized pre-stabilizer;
model training is carried out on the target model to be trained based on the countermeasure sample and the optimizer, and the trained target model is obtained.
6. The method of claim 5, wherein performing model training on the target model to be trained based on the countermeasure sample and the optimizer to obtain the trained target model comprises:
obtaining a disturbance normalization parameter, a countermeasure training update rate, a countermeasure training step length and a learning rate;
carrying out model training on the target model to be trained based on the countermeasure sample and the optimizer, determining first gradient information of the target model with respect to the disturbance information, and updating the parameters in the pre-stabilizer;
if the number of update steps performed on the parameters in the pre-stabilizer has not reached the preset number of countermeasure training steps, further updating the parameters in the pre-stabilizer based on the first gradient information, the countermeasure training update rate, the disturbance information, the distance threshold, the disturbance normalization parameter, the countermeasure training step length and the learning rate, so as to obtain the trained target model.
7. The method of any one of claims 1 to 6, wherein the first sub-model is constructed based on a deep neural network model, and the target service is a service in the natural language processing field.
8. A data processing apparatus, the apparatus comprising:
the data acquisition module is used for acquiring service data of a target service;
the prediction module is used for inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and the service processing module is used for carrying out service processing on the target service based on the prediction result of the service data.
9. A data processing apparatus, the data processing apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring service data of a target service;
inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and carrying out service processing on the target service based on the prediction result of the service data.
10. A storage medium for storing computer executable instructions that when executed by a processor implement the following:
acquiring service data of a target service;
inputting the service data into a pre-trained target model, and predicting the service data through a first sub-model and a pre-stabilizer applied to the target service in the target model to obtain a prediction result for the service data, wherein the target model comprises the first sub-model and the pre-stabilizer, the first sub-model comprises a plurality of different network layers, the pre-stabilizer comprises one or more sub-stabilizers, each sub-stabilizer is arranged in one network layer of the first sub-model, model parameters in the first sub-model are kept unchanged after the first sub-model has undergone model training, and the trained target model is obtained after the pre-stabilizer is trained through countermeasure samples;
and carrying out service processing on the target service based on the prediction result of the service data.