CN116186540A - Data processing method, device and equipment

Info

Publication number: CN116186540A
Application number: CN202310197133.1A
Authority: CN (China)
Prior art keywords: data, characterization, data sample, characterization vector, module
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 蒋晨之, 傅幸, 王维强
Applicant and current assignee: Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/04 Architecture, e.g. interconnection topology


Abstract

The embodiments of this specification provide a data processing method, apparatus, and device. The method includes: acquiring a data sample for training a target model; inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity between the first characterization vector and a preset category characterization vector, and obtaining a first loss value based on the second classification label, the first classification label, and a preset classification loss function; inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector, and a preset contrast loss function; and iteratively training the target model based on the first loss value and the second loss value to obtain a trained target model.

Description

Data processing method, device and equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and device.
Background
With the rapid development of computer technology, the variety and number of application services that enterprises provide to users keep increasing; accordingly, the volume of user data grows and its structure becomes more complex, and such data can be detected through a trained model. For example, for risk detection, a risk detection model may be trained from training data and the corresponding risk labels, and the trained risk detection model may then be used to perform risk detection on the data to be detected.
However, because determining risk labels for training data is highly complex, only a small amount of training data is available for model training, and the detection accuracy of the trained model is therefore poor. A solution that can improve the detection accuracy of the model is thus needed.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a data processing method, apparatus and device, so as to provide a solution capable of improving the detection accuracy of a model.
To achieve the above object, the embodiments of the present specification are implemented as follows:
In a first aspect, embodiments of the present specification provide a data processing method, including: acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process; inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function; inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation; and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
In a second aspect, embodiments of the present disclosure provide a data processing apparatus, the apparatus comprising: the sample acquisition module is used for acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process; the first determining module is used for inputting the data sample into the first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function; the second determining module is used for inputting the data sample into the second module of the target model, obtaining a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation; and the training module is used for carrying out iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
In a third aspect, embodiments of the present specification provide a data processing apparatus, the data processing apparatus comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process; inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function; inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation; and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
In a fourth aspect, embodiments of the present description provide a storage medium for storing computer-executable instructions that, when executed, implement the following: acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process; inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function; inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation; and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in this specification, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1A is a flowchart illustrating an embodiment of a data processing method according to the present disclosure;
FIG. 1B is a schematic diagram illustrating a data processing method according to the present disclosure;
FIG. 2 is a schematic diagram of a target model of the present disclosure;
FIG. 3 is a schematic diagram of another target model of the present disclosure;
FIG. 4 is a schematic diagram illustrating a processing procedure of another data processing method according to the present disclosure;
FIG. 5 is a schematic diagram illustrating a processing procedure of another data processing method according to the present disclosure;
FIG. 6 is a schematic diagram of an embodiment of a data processing apparatus according to the present disclosure;
FIG. 7 is a schematic diagram of a data processing apparatus according to the present specification.
Detailed Description
The embodiment of the specification provides a data processing method, a device and equipment.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Example 1
As shown in fig. 1A and fig. 1B, the embodiment of the present disclosure provides a data processing method, where an execution body of the method may be a server, and the server may be an independent server or a server cluster formed by a plurality of servers. The method specifically comprises the following steps:
In S102, a data sample for training a target model is acquired.
The data sample may include input data of a user in a human-computer interaction process. The data sample may be data corresponding to a preset service and/or a preset user that is collected within a preset acquisition period, and the preset acquisition period may be, for example, roughly the past week, roughly the past half month, or each day. Specifically, the data sample may be behavior data generated by users triggering execution of a resource transfer service, collected between 10:00 and 14:00 each day; or it may be multiple pieces of behavior data input by a preset user during human-computer interaction; or it may be multiple pieces of behavior data input by a preset user over roughly the past week. The data sample may include one or more of text data, voice data, picture data, video data, and other data input by the user. The target model may be a model constructed by any deep learning algorithm for detecting data; for example, the target model may be a model constructed by a neural network algorithm for classifying data, so that data can be classified according to the classification result output by the target model and corresponding risk processing can be performed.
In implementation, with the rapid development of computer technology, the variety and number of application services that enterprises provide to users keep increasing; accordingly, the volume of user data grows and its structure becomes more complex, and such data can be detected through a trained model. For example, for risk detection, a risk detection model may be trained from training data and the corresponding risk labels, and the trained risk detection model may then be used to perform risk detection on the data to be detected. However, because determining risk labels for training data is highly complex, only a small amount of training data is available for model training, and the detection accuracy of the trained model is therefore poor; a solution that can improve the detection accuracy of the model is thus needed. For this reason, the embodiments of the present specification provide a technical solution that can solve the above-mentioned problems; specifically, reference may be made to the following.
In implementation, taking the data sample as the data corresponding to a preset service collected within a preset acquisition period as an example, the behavior data generated by triggering execution of the preset service over roughly the past half month may be collected; that is, the terminal device may collect the operation data (such as click operations, input operations, etc.) generated when users trigger execution of the preset service.
The server may receive a plurality of operation data transmitted from a plurality of terminal devices and determine the received operation data as data samples. Alternatively, the server may store the received operation data, and upon reaching the model training period, may determine a data sample for training the target model based on the stored operation data, e.g., may randomly extract the stored operation data as the data sample, or select a predetermined number of operation data from the operation data as the data sample based on the storage time of the operation data, or the like.
In addition, a plurality of different types of data can be stored in the server, the server can select corresponding data from the stored data as a data sample based on different target models, for example, if the target model is a model for risk detection for the resource transfer service, the server can select the data corresponding to the resource transfer service from the stored data as the data sample.
The above manner of acquiring data samples is only one optional, realizable manner. In practical application scenarios there may be many different acquisition manners, and different acquisition manners may be selected according to the actual application scenario; this is not specifically limited in the embodiments of this specification.
In S104, the data sample is input into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; a second classification label corresponding to the data sample is determined based on the similarity between the first characterization vector and the preset category characterization vectors, and a first loss value is obtained based on the second classification label, the first classification label, and a preset classification loss function.
The first module may be configured to perform classification processing on the data sample to obtain the first classification label corresponding to the data sample; meanwhile, the first module may also be configured to perform characterization extraction processing on the data sample to obtain the first characterization vector corresponding to the data sample.
In implementation, as shown in FIG. 2, the data sample may be input into the target model, and the first classification label corresponding to the data sample and the first characterization vector corresponding to the data sample may be obtained through the classification processing and characterization extraction processing of the first module.
In addition, a plurality of categories may be preset for the application scenario of the target model, and corresponding characterization vectors (i.e., preset category characterization vectors) may be preset for each category, for example, assuming that the application scenario of the target model is a risk detection scenario, three categories may be preset for the risk detection scenario, such as a high risk category, a medium risk category, and a no risk category, and corresponding characterization vectors may be set for each category.
The above manner of determining the preset category characterization vectors is only one optional, realizable manner. In practical application scenarios there may be many different determination manners; for example, historical data corresponding to each category may be obtained, and the category characterization vector corresponding to each category may be determined based on the characterization vectors corresponding to the historical data. Different determination manners may be selected according to the actual application scenario, which is not limited in the embodiments of this specification.
The second classification label corresponding to the data sample may be determined based on the similarity between the first characterization vector and each preset category characterization vector; for example, the category corresponding to the preset category characterization vector with the highest similarity may be determined as the second classification label corresponding to the data sample. Specifically, assuming that the similarity between the first characterization vector and the characterization vector of the high risk category is 0.8, the similarity with the characterization vector of the medium risk category is 0.2, and the similarity with the characterization vector of the no risk category is 0.1, the second classification label corresponding to the data sample may be determined to be the high risk category.
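For illustration only (this sketch is not part of the embodiments above), the label selection in this step might look as follows in Python with PyTorch, assuming inner-product similarity and the three example risk categories; all names and sizes are hypothetical:

```python
import torch
import torch.nn.functional as F

def second_classification_label(first_vec: torch.Tensor,
                                class_vecs: torch.Tensor) -> int:
    """Return the index of the preset category whose characterization
    vector is most similar to the first characterization vector."""
    sims = class_vecs @ first_vec      # one inner-product similarity per category
    return int(sims.argmax().item())   # category with the highest similarity

# Example: three preset categories (high risk, medium risk, no risk).
class_vecs = F.normalize(torch.randn(3, 128), dim=1)   # preset category vectors
first_vec = F.normalize(torch.randn(128), dim=0)       # first characterization vector
label = second_classification_label(first_vec, class_vecs)
```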
Finally, a first loss value can be obtained based on the second classification label, the first classification label, and a preset classification loss function. The preset classification loss function may be any function capable of constraining the second classification label and the first classification label to be close to each other; that is, the difference between the first classification label and the second classification label can be controlled through the preset classification loss function to be smaller than a preset first threshold, so as to improve the classification accuracy of the first module.
In S106, the data sample is input to the second module of the target model, to obtain a second characterization vector corresponding to the data sample, and based on the first characterization vector, the second characterization vector, and a preset contrast loss function, a second loss value is obtained.
The second module may perform characterization extraction processing on the data sample. The characterization extraction algorithm of the second module may be the same as that of the first module, and since the second module has no gradient back propagation, the parameters of the second module's characterization extraction algorithm may be obtained from the first module's parameters by an exponential moving average (Exponential Moving Average, EMA).
In implementation, as shown in FIG. 2, the second module may perform characterization extraction processing on the data sample to obtain a second characterization vector corresponding to the data sample.
Then, the server may obtain a second loss value based on the first characterization vector, the second characterization vector, and a preset contrast loss function. The preset contrast loss function may be any function capable of constraining the first characterization vector and the second characterization vector to be close to each other; that is, the difference between the first characterization vector and the second characterization vector can be controlled through the preset contrast loss function to be smaller than a preset second threshold, so as to improve the accuracy of the characterization extraction of the first module and the second module.
In S108, iterative training is performed on the target model based on the first loss value and the second loss value, to obtain a trained target model.
In an implementation, the target loss value may be determined based on the first loss value and the second loss value (e.g., a sum of the first loss value and the second loss value may be determined as the target loss value), and the target model is iteratively trained by the target loss value to obtain a trained target model.
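For illustration, a minimal Python/PyTorch sketch of this training step, assuming the target loss value is taken as the plain sum of the two loss values (the embodiments leave the exact combination open; a weighted sum would work the same way):

```python
import torch

def train_step(optimizer: torch.optim.Optimizer,
               first_loss: torch.Tensor,
               second_loss: torch.Tensor) -> float:
    """One training iteration: combine the two loss values into the target
    loss and update the first module's parameters by backpropagation."""
    target_loss = first_loss + second_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return float(target_loss.item())
```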
The trained target model may be used for performing detection processing on data, for example, the trained target model may be used for performing corresponding detection processing on data of a scene type corresponding to the data sample, for example, the scene type corresponding to the data sample may be a resource transfer scene, and then the trained target model may be used for performing risk detection processing on data of the resource transfer scene.
The embodiment of this specification provides a data processing method: a data sample for training a target model is acquired, where the data sample includes input data of a user in a human-computer interaction process; the data sample is input into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; a second classification label corresponding to the data sample is determined based on the similarity between the first characterization vector and the preset category characterization vectors, and a first loss value is obtained based on the second classification label, the first classification label, and a preset classification loss function; the data sample is input into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and a second loss value is obtained based on the first characterization vector, the second characterization vector, and a preset contrast loss function, where the second module has no gradient back propagation; and the target model is iteratively trained based on the first loss value and the second loss value to obtain a trained target model. In this way, on the one hand, the data samples used for training the target model may be unlabeled samples, i.e., gray samples, which avoids the poor model training effect caused by the small data volume of labeled samples; on the other hand, the aggregation of the characterization space can be optimized through the second classification label. In addition, through the first characterization vector, the second characterization vector, and the preset contrast loss function, data samples of the same category become more aggregated and data samples of different categories become more dispersed, which also optimizes the aggregation of the characterization space. Therefore, even when unlabeled data samples are used to train the target model, the model can learn a better characterization space; that is, gray samples are effectively utilized during training while the aggregation of the model's characterization space is optimized, thereby improving the detection accuracy of the trained target model.
Example two
The embodiment of the specification provides a data processing method, and an execution subject of the method may be a server, where the server may be an independent server or may be a server cluster formed by a plurality of servers. The method specifically comprises the following steps:
In S102, a data sample for training a target model is acquired.
The data sample may include input data of a user during human-computer interaction.
In S104, the data sample is input to the first module of the target model, so as to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample.
As shown in FIG. 3, the first module may include a first data enhancement submodule, an Encoder, a Classifier, and a first characterization extraction submodule. The first data enhancement submodule is configured to perform data enhancement processing on the data sample, the encoder is configured to encode the enhanced data sample to obtain a first encoding result, the classifier is configured to perform classification processing on the first encoding result, and the first characterization extraction submodule is configured to perform characterization extraction processing on the first encoding result.
The second module may include a second data enhancement submodule, a momentum encoder, and a second characterization extraction submodule. The second data enhancement submodule is configured to perform data enhancement processing on the data sample, the momentum encoder is configured to encode the enhanced data sample to obtain a second encoding result, and the second characterization extraction submodule is configured to perform characterization extraction processing on the second encoding result. The structure of the momentum encoder may be the same as the structure of the encoder, and the parameters of the momentum encoder may be the same as the parameters of the encoder.
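For illustration, a minimal structural sketch of the two modules in Python/PyTorch; the data enhancement submodules are omitted, the linear layers are hypothetical stand-ins for whatever encoder, classifier, and characterization extraction networks an implementation would use, and all dimensions are assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstModule(nn.Module):
    """Encoder feeding a classifier and a characterization extraction head."""
    def __init__(self, dim_in: int = 256, dim_hid: int = 128, num_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hid), nn.ReLU())
        self.classifier = nn.Linear(dim_hid, num_classes)
        self.extractor = nn.Linear(dim_hid, dim_hid)  # characterization extraction

    def forward(self, x: torch.Tensor):
        h = self.encoder(x)                        # first encoding result
        logits = self.classifier(h)                # for the first classification label
        q = F.normalize(self.extractor(h), dim=1)  # first characterization vector
        return logits, q

# The second module mirrors the encoder and extractor (momentum branch) with the
# same structure and, initially, the same parameters, but receives no gradients.
first = FirstModule()
momentum_encoder = copy.deepcopy(first.encoder)
momentum_extractor = copy.deepcopy(first.extractor)
for p in list(momentum_encoder.parameters()) + list(momentum_extractor.parameters()):
    p.requires_grad = False
```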
Because the second module has no gradient back propagation, the parameters of the second module's momentum encoder can be obtained from the parameters of the first module's encoder by EMA. Likewise, the second characterization extraction submodule of the second module can have the same structure and parameters as the first characterization extraction submodule of the first module; that is, the parameters of the second characterization extraction submodule can also be obtained from the first characterization extraction submodule by EMA.
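For illustration, a minimal sketch of the EMA copy in Python/PyTorch; the momentum coefficient m = 0.99 is a hypothetical value, since the embodiments do not fix one:

```python
import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module,
               momentum: torch.nn.Module,
               m: float = 0.99) -> None:
    """Update the second module's parameters as an exponential moving
    average of the first module's parameters (no backpropagation)."""
    for p_online, p_momentum in zip(online.parameters(), momentum.parameters()):
        p_momentum.mul_(m).add_(p_online, alpha=1.0 - m)
```

In a training loop this would be called once per iteration, e.g. `ema_update(first.encoder, momentum_encoder)` and `ema_update(first.extractor, momentum_extractor)` for the sketch above.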
In addition, the data samples may include unlabeled samples or inaccurately labeled samples (i.e., gray samples). In order to ensure the accuracy of model training, the classifier in the first module may be obtained by training a model constructed by a preset classification algorithm on historical data, where the amount of historical data used for training the classifier is greater than a preset number threshold.
In a case where the data samples include unlabeled samples, inaccurately labeled samples, and labeled samples, the proportion of labeled data samples among the data samples may be greater than a preset proportion in order to ensure the accuracy of model training.
In S402, the similarity between the first characterization vector and each preset category characterization vector is determined based on the first characterization vector and the preset category characterization vectors.
In S404, a second classification label corresponding to the data sample is determined based on the initial classification label and the similarity.
In S406, the data sample is sampled to obtain sampled data.
In implementation, taking iterative training of the target model in a mini-batch manner as an example, the mini-batch data obtained by sampling can be used as the sampled data.
In S408, a first loss value is determined based on the first classification label, the second classification label, and the number of sampled data corresponding to the sampled data.
In an implementation, a corresponding second classification label may be determined for the sampled data. For example, the first characterization vector and the preset category characterization vectors corresponding to the sampled data may be substituted into the formula

$$z_{ij} = q_i \cdot \mu_j,$$

obtaining the similarity between the first characterization vector and each preset category characterization vector, where $z_{ij}$ is the similarity between the first characterization vector corresponding to the i-th data in the sampled data and the characterization vector of the j-th preset category, $q_i$ is the first characterization vector corresponding to the i-th data in the sampled data, and $\mu_j$ is the characterization vector corresponding to the j-th category.
The similarity between the first characterization vector and each preset category characterization vector, together with the initial classification label, may then be substituted into the formula

$$s_i' = \theta \cdot s_i + (1 - \theta) \cdot z_i,$$

obtaining the second classification label, where $s_i'$ is the second classification label corresponding to the i-th data in the sampled data, $s_i$ is the initial classification label corresponding to the i-th data in the sampled data, $z_i = (z_{i1}, \ldots, z_{iC})$ is the similarity vector obtained above, and $\theta$ is a preset first hyperparameter.
The first classification label, the second classification label, and the number of sampled data corresponding to the sampled data may be substituted into the formula

$$L_{cls} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{j=1}^{C} s_{ij}' \log f_j(x_i),$$

obtaining the first loss value, where $L_{cls}$ is the first loss value, $B$ is the number of sampled data, $x_i$ is the i-th data in the sampled data, $f_j(x_i)$ is the prediction probability of the j-th category output by the first module for the i-th data sample, $C$ is the number of preset categories, and $s_{ij}'$ is the component of the second classification label $s_i'$ corresponding to the j-th preset category.
In addition, in a case where the first classification label is the same as a preset category, the characterization vector of that preset category may be updated. For example, the preset category characterization vector and the first characterization vector may be substituted into the formula

$$\mu_c' = \mathrm{Normalize}\left(\gamma \cdot \mu_c + (1 - \gamma) \cdot q_i\right),$$

obtaining the updated preset category characterization vector, where $\mu_c'$ is the preset category characterization vector after updating, $\mu_c$ is the preset category characterization vector before updating, and $\gamma$ is a preset second hyperparameter.
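For illustration, a minimal Python/PyTorch sketch of this update; the value of gamma is a hypothetical one:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_category_vector(mu_c: torch.Tensor,
                           q_i: torch.Tensor,
                           gamma: float = 0.99) -> torch.Tensor:
    """mu_c' = Normalize(gamma * mu_c + (1 - gamma) * q_i), applied when the
    first classification label equals this preset category."""
    return F.normalize(gamma * mu_c + (1.0 - gamma) * q_i, dim=0)
```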
In S106, the data sample is input into the second module of the target model, and a second characterization vector corresponding to the data sample is obtained.
In S410, the historical characterization vectors output by the momentum encoder are obtained, and a characterization memory bank is constructed based on the historical characterization vectors, the first characterization vector, and the second characterization vector.
In implementation, the momentum encoder may output the second characterization vector corresponding to the current data sample as well as the historical characterization vectors corresponding to historical data, and the set of the historical characterization vectors, the first characterization vector, and the second characterization vector may be determined as the characterization memory bank.
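For illustration, a minimal Python/PyTorch sketch of such a characterization memory bank; the fixed-size FIFO behavior and its capacity are assumptions, since the embodiments do not specify how old characterization vectors are retired:

```python
from collections import deque
import torch

class CharacterizationMemory:
    """Holds historical characterization vectors (with their labels) together
    with the current first and second characterization vectors."""
    def __init__(self, max_size: int = 4096):
        self.queue = deque(maxlen=max_size)   # oldest entries drop out first

    def add(self, vecs: torch.Tensor, labels: torch.Tensor) -> None:
        for v, y in zip(vecs.detach(), labels):
            self.queue.append((v, int(y)))

    def tensors(self):
        vecs = torch.stack([v for v, _ in self.queue])     # (N, D)
        labels = torch.tensor([y for _, y in self.queue])  # (N,)
        return vecs, labels
```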
In S412, a forward characterization vector corresponding to the first characterization vector is obtained from the characterization memory bank.
In implementation, since the second characterization vector comes from the same data source as the first characterization vector (i.e., the first characterization vector and the second characterization vector are both obtained by performing data enhancement processing, encoding processing, and characterization extraction processing on the same data sample), the second characterization vector may be determined as a forward characterization vector corresponding to the first characterization vector.
In addition, the following provides an alternative implementation manner, which can be seen from the following steps one to two:
step one, obtaining a classification label corresponding to the data sample, and determining a label corresponding to the expression vector in the characterization memory based on the classification label corresponding to the data sample.
The data samples may include unlabeled samples, inaccurately labeled samples (i.e., gray samples), labeled samples, and the like. For an unlabeled or inaccurately labeled data sample, the classification label corresponding to the data sample may be the first classification label; for a labeled data sample, the classification label corresponding to the data sample may be its annotated label or the first classification label corresponding to the data sample.
Step two: among the characterization vectors in the characterization memory bank, determine the characterization vectors whose labels are identical to the first classification label as forward characterization vectors.
In implementation, the first classification label may be the label corresponding to the category with the largest prediction probability among the prediction categories output by the first module, so the characterization vectors in the characterization memory bank whose labels are the same as the first classification label may be determined as the forward characterization vectors.
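For illustration, a minimal Python/PyTorch sketch of this selection, reusing the memory bank sketched above:

```python
import torch

def forward_characterization_vectors(memory_vecs: torch.Tensor,
                                     memory_labels: torch.Tensor,
                                     first_label: int) -> torch.Tensor:
    """Return the characterization vectors in the memory bank whose label
    is the same as the first classification label (the set P(x_i))."""
    return memory_vecs[memory_labels == first_label]
```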
In S414, a second loss value is obtained based on the characterization vectors in the characterization memory bank, the forward characterization vectors, the first characterization vector, and the preset contrast loss function.
In practice, the characterization vectors in the characterization memory bank, the forward characterization vectors, and the first characterization vector may be substituted into the formula

$$L_{cont} = -\sum_{q_i \in B_q} \frac{1}{|P(x_i)|} \sum_{k^+ \in P(x_i)} \log \frac{\exp(q_i \cdot k^+ / \tau)}{\sum_{k' \in A(x_i)} \exp(q_i \cdot k' / \tau)},$$

obtaining the second loss value, where $L_{cont}$ is the second loss value, $B_q$ is the set of first characterization vectors corresponding to the sampled data, $q_i$ is the i-th first characterization vector, $P(x_i)$ is the data set formed by the forward characterization vectors, $k^+$ is a forward characterization vector, $A(x_i)$ is the characterization memory bank, $k'$ is a characterization vector in the characterization memory bank, and $\tau$ is a preset third hyperparameter.

Here, $P(x_i)$ is constructed from the characterization vectors in the characterization memory bank whose labels are the same as the first classification label, i.e.

$$P(x_i) = \{\, k' \in A(x_i) : \text{the label of } k' \text{ is } \hat{y}_i \,\},$$

where $\hat{y}_i$ is the prediction category with the largest prediction probability output by the first module for the i-th data sample.
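For illustration, a minimal Python/PyTorch sketch of the contrast loss above; the temperature value tau = 0.07 is a hypothetical choice:

```python
import torch

def second_loss(q: torch.Tensor,              # (B, D) first characterization vectors
                memory_vecs: torch.Tensor,    # (N, D) characterization memory bank A(x_i)
                memory_labels: torch.Tensor,  # (N,) labels of the memory vectors
                first_labels: torch.Tensor,   # (B,) first classification labels
                tau: float = 0.07) -> torch.Tensor:
    """L_cont: pull each q_i toward its forward vectors P(x_i) in the memory
    bank and push it away from the other characterization vectors."""
    logits = q @ memory_vecs.t() / tau                     # inner products / tau
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = q.new_zeros(())
    for i in range(q.size(0)):
        positives = memory_labels == first_labels[i]       # the set P(x_i)
        if positives.any():
            loss = loss - log_prob[i, positives].mean()    # 1/|P(x_i)| sum over k+
    return loss
```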
In S108, iterative training is performed on the target model based on the first loss value and the second loss value, to obtain a trained target model.
In S416, target data to be detected is acquired.
The scenario corresponding to the target data may be the same as the scenario corresponding to the data samples; for example, if the scenario corresponding to the data samples is a resource transfer scenario, the scenario corresponding to the target data may also be a resource transfer scenario. In addition, the data structure of the target data may be the same as that of the data samples; for example, if the data samples include text data and picture data, the target data may also include text data and/or picture data.
In an implementation, taking the scenario corresponding to the data sample as a resource transfer scenario as an example, upon detecting that the user triggers an execution instruction for the resource transfer service, the terminal device may collect the data corresponding to the execution of the resource transfer service (such as the resource transfer time, the resource transfer amount, etc.), determine the collected data as the target data, and send it to the server.
After the target data is obtained, different detection methods may be selected according to the application scenario corresponding to the target data. For example, when the application scenario corresponding to the target data is a search scenario, a classification scenario, or the like, as shown in FIG. 4, the classification result of the target data may be determined through the trained target model. Alternatively, when the application scenario corresponding to the target data is a risk detection scenario such as a resource transfer scenario, as shown in FIG. 5, the target data may be processed through the first module of the trained target model to obtain a target characterization vector corresponding to the target data, and the risk detection result of the target data may then be determined based on a pre-trained risk detection model.
In S418, the target data is input into the trained target model, a classification label corresponding to the target data is obtained, and a classification result of the target data is determined based on the classification label corresponding to the target data.
In implementation, the target data may be input into the trained target model, so as to obtain a classification label corresponding to the target data through the classifier in the first module of the trained target model, and determine a classification result of the target data based on the classification label corresponding to the target data.
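For illustration, a minimal Python/PyTorch inference sketch, reusing the hypothetical FirstModule from the earlier sketch:

```python
import torch

@torch.no_grad()
def classify_target_data(model: torch.nn.Module,
                         target_data: torch.Tensor) -> int:
    """Run the trained first module on target data and return the
    classification label from the classifier's output."""
    logits, _ = model(target_data.unsqueeze(0))   # add a batch dimension
    return int(logits.argmax(dim=1).item())
```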
The classification result of the target data may be used for processing such as data retrieval.
In S420, the target data is input into the trained target model, so that the target data is processed by the first module of the trained target model, and a target characterization vector corresponding to the target data is obtained.
In S422, the target token vector is input into a pre-trained risk detection model, and a risk detection result of the target data is obtained.
The risk detection model is a model constructed based on a preset deep learning algorithm.
In implementation, taking the target data being resource transfer data as an example, a target characterization vector corresponding to the target data can be obtained through the first module of the trained target model, and a risk detection result corresponding to the target characterization vector can then be determined through the risk detection model, where the risk detection model may be a well-performing model obtained in advance by training with a large amount of data.
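For illustration, a minimal Python/PyTorch sketch of this two-stage pipeline; `risk_model` stands in for the separately pre-trained risk detection model:

```python
import torch

@torch.no_grad()
def risk_detect(target_model: torch.nn.Module,
                risk_model: torch.nn.Module,
                target_data: torch.Tensor) -> torch.Tensor:
    """Extract the target characterization vector with the trained first
    module, then score it with the pre-trained risk detection model."""
    _, target_vec = target_model(target_data.unsqueeze(0))
    return risk_model(target_vec)   # risk detection result
```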
Because malicious events such as a malicious third party stealing users' private information tend to occur in batches and to shift form to evade prevention and control, the risk detection business needs timely prevention and control so that losses can be stopped in batches. Compared with having an ordinary classification model directly output a black or white label for risk prevention and control, performing vector recall in the model's characterization space enables risk prevention and control that is more informative, more interpretable, and more timely and flexible: hidden similar data can be recalled in time from the currently known, limited data, and the recalled data can then be analyzed, subjected to risk prevention and control, and used to stop losses promptly. Therefore, unlike the learning target of a binary classification model, a better characterization space needs to be learned, so that data with similar business forms are aggregated in the characterization space.
In addition, the sample data labeled black among risk detection sample data are often only a small portion of the real black sample data, and training a model with only this portion of sample data is accompanied by the problem of a small data volume, so the trained model has poor generalization and robustness. Risk detection usually also involves a large number of gray samples whose labels are uncertain and among which black samples may exist; therefore, if gray samples can be effectively utilized in characterization learning, the model's characterization space will be better.
In the method provided above, when the second classification label is determined, the similarity between characterizations is computed by inner product (i.e., the similarity between the first characterization vector and the preset category characterization vectors), so gray samples can be effectively utilized to improve the model while the aggregation of the characterization space is optimized. The contrast loss function is also based on inner-product distances between characterizations (i.e., the second loss value is determined based on the first characterization vector, the forward characterization vectors, and the characterization vectors in the characterization memory bank), which makes samples within a category more aggregated and samples across categories more dispersed. In this way, the model can learn a better characterization space even with label-noisy gray samples, which benefits downstream consumers of the characterizations such as vector retrieval; that is, the method both effectively utilizes gray samples and optimizes the aggregation of the model's characterization space.
The embodiment of this specification provides a data processing method: a data sample for training a target model is acquired, where the data sample includes input data of a user in a human-computer interaction process; the data sample is input into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; a second classification label corresponding to the data sample is determined based on the similarity between the first characterization vector and the preset category characterization vectors, and a first loss value is obtained based on the second classification label, the first classification label, and a preset classification loss function; the data sample is input into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and a second loss value is obtained based on the first characterization vector, the second characterization vector, and a preset contrast loss function, where the second module has no gradient back propagation; and the target model is iteratively trained based on the first loss value and the second loss value to obtain a trained target model. In this way, on the one hand, the data samples used for training the target model may be unlabeled samples, i.e., gray samples, which avoids the poor model training effect caused by the small data volume of labeled samples; on the other hand, the aggregation of the characterization space can be optimized through the second classification label. In addition, through the first characterization vector, the second characterization vector, and the preset contrast loss function, data samples of the same category become more aggregated and data samples of different categories become more dispersed, which also optimizes the aggregation of the characterization space. Therefore, even when unlabeled data samples are used to train the target model, the model can learn a better characterization space; that is, gray samples are effectively utilized during training while the aggregation of the model's characterization space is optimized, thereby improving the detection accuracy of the trained target model.
Example III
Based on the same concept as the data processing method provided above, an embodiment of this specification further provides a data processing apparatus, as shown in FIG. 6.
The data processing apparatus includes: sample acquisition module 601, first determination module 602, second determination module 603, and training module 604, wherein:
the sample acquiring module 601 is configured to acquire a data sample for training a target model, where the data sample includes input data of a user in a human-computer interaction process;
a first determining module 602, configured to input the data sample into a first module of the target model, and obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
a second determining module 603, configured to input the data sample into a second module of the target model, obtain a second characterization vector corresponding to the data sample, and obtain a second loss value based on the first characterization vector, the second characterization vector, and a preset contrast loss function, where the second module has no gradient back propagation;
And a training module 604, configured to iteratively train the target model based on the first loss value and the second loss value, to obtain a trained target model.
In this embodiment of the present disclosure, the first module includes a first data enhancement submodule, an encoder, a classifier, and a first characterization extraction submodule, where the first data enhancement submodule is configured to perform data enhancement processing on the data sample, the encoder is configured to encode the enhanced data sample to obtain a first encoding result, the classifier is configured to perform classification processing on the first encoding result, and the first characterization extraction submodule is configured to perform characterization extraction processing on the first encoding result;
the second module includes a second data enhancement submodule, a momentum encoder, and a second characterization extraction submodule, where the second data enhancement submodule is configured to perform data enhancement processing on the data sample, the momentum encoder is configured to encode the enhanced data sample to obtain a second encoding result, and the second characterization extraction submodule is configured to perform characterization extraction processing on the second encoding result; the structure of the momentum encoder is the same as the structure of the encoder, and the parameters of the momentum encoder are the same as the parameters of the encoder.
In an embodiment of the present disclosure, the apparatus further includes:
the first acquisition module is used for acquiring target data to be detected;
the classification module is used for inputting the target data into the trained target model to obtain a classification label corresponding to the target data, and determining a classification result of the target data based on the classification label corresponding to the target data.
In an embodiment of the present disclosure, the apparatus further includes:
the second acquisition module is used for acquiring target data to be detected;
the vector acquisition module is used for inputting the target data into the trained target model, so as to process the target data through a first module of the trained target model and obtain a target characterization vector corresponding to the target data;
the result acquisition module is used for inputting the target characterization vector into a pre-trained risk detection model to obtain a risk detection result of the target data, wherein the risk detection model is constructed based on a preset deep learning algorithm.
In the embodiment of the present disclosure, the first determining module 602 is configured to:
determining the similarity between the first characterization vector and each preset category characterization vector based on the first characterization vector and the preset category characterization vector;
And determining a second classification label corresponding to the data sample based on the initial classification label and the similarity.
In this embodiment of the present disclosure, the second determining module 603 is configured to:
acquiring a history characterization vector output by the momentum encoder, and constructing a characterization memory based on the history characterization vector, the first characterization vector and the second characterization vector;
acquiring a forward characterization vector corresponding to the first characterization vector in the characterization memory;
and obtaining the second loss value based on the characterization vectors in the characterization memory, the forward characterization vector, the first characterization vector, and the preset contrast loss function.
In this embodiment of the present disclosure, the second determining module 603 is configured to:
acquiring the classification label corresponding to the data sample, and determining the labels corresponding to the characterization vectors in the characterization memory based on the classification label corresponding to the data sample;
and determining the characterization vector corresponding to the label which is the same as the first classification label in the labels corresponding to the characterization vectors in the characterization memory as the forward characterization vector.
In this embodiment of the present disclosure, the preset classification loss function is a cross entropy loss function, and the first determining module 602 is configured to:
sampling the data sample to obtain sampled data;
and determining the first loss value based on the first classification label corresponding to the sampled data, the second classification label, and the number of the sampled data.
The embodiment of this specification provides a data processing apparatus. The apparatus acquires a data sample for training a target model, where the data sample includes input data of a user in a human-computer interaction process; inputs the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determines a second classification label corresponding to the data sample based on the similarity between the first characterization vector and the preset category characterization vectors, and obtains a first loss value based on the second classification label, the first classification label, and a preset classification loss function; inputs the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtains a second loss value based on the first characterization vector, the second characterization vector, and a preset contrast loss function, where the second module has no gradient back propagation; and iteratively trains the target model based on the first loss value and the second loss value to obtain a trained target model. In this way, on the one hand, the data samples used for training the target model may be unlabeled samples, i.e., gray samples, which avoids the poor model training effect caused by the small data volume of labeled samples; on the other hand, the aggregation of the characterization space can be optimized through the second classification label. In addition, through the first characterization vector, the second characterization vector, and the preset contrast loss function, data samples of the same category become more aggregated and data samples of different categories become more dispersed, which also optimizes the aggregation of the characterization space. Therefore, even when unlabeled data samples are used to train the target model, the model can learn a better characterization space; that is, gray samples are effectively utilized during training while the aggregation of the model's characterization space is optimized, thereby improving the detection accuracy of the trained target model.
Example IV
Based on the same idea, the embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 7.
The data processing apparatus may vary considerably in configuration or performance and may include one or more processors 701 and memory 702, where the memory 702 may store one or more stored applications or data. Wherein the memory 702 may be transient storage or persistent storage. The application programs stored in the memory 702 may include one or more modules (not shown) each of which may include a series of computer executable instructions for use in a data processing apparatus. Still further, the processor 701 may be arranged to communicate with a memory 702 and execute a series of computer executable instructions in the memory 702 on a data processing apparatus. The data processing device may also include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input/output interfaces 705, and one or more keyboards 706.
In particular, in this embodiment, the data processing apparatus includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the data processing apparatus, and the one or more programs configured to be executed by the one or more processors comprise instructions for:
Acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process;
inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation;
and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
Optionally, the first module includes a first data enhancement submodule, an encoder, a classifier and a first characterization extraction submodule, where the first data enhancement submodule is used for performing data enhancement processing on the data sample, the encoder is used for performing encoding processing on the enhanced data sample to obtain a first encoding result, the classifier is used for performing classification processing on the first encoding result, and the first characterization extraction submodule is used for performing characterization extraction processing on the first encoding result;
the second module includes a second data enhancement submodule, a momentum encoder and a second characterization extraction submodule, where the second data enhancement submodule is used for performing data enhancement processing on the data sample, the momentum encoder is used for performing encoding processing on the enhanced data sample to obtain a second encoding result, and the second characterization extraction submodule is used for performing characterization extraction processing on the second encoding result; the structure of the momentum encoder is identical to that of the encoder, and the parameters of the momentum encoder are identical to those of the encoder.
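As one hedged reading of this structure, the sketch below builds the momentum encoder as a copy of the encoder (same structure, initially identical parameters), excludes it from gradient back propagation, and uses a MoCo-style exponential moving average as one common way the momentum encoder could track the encoder during training; the update rule, its coefficient, and the toy layer sizes are assumptions of this sketch.

```python
import copy
import torch

# Hypothetical assembly of the two encoding branches; layer sizes are arbitrary.
encoder = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 64)
)
momentum_encoder = copy.deepcopy(encoder)   # same structure, same initial parameters
for param in momentum_encoder.parameters():
    param.requires_grad = False             # the second module receives no gradients

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.999):
    # MoCo-style exponential moving average: an assumed way for the momentum
    # encoder to track the encoder's parameters between training steps.
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.mul_(m).add_(p, alpha=1.0 - m)
```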
Optionally, the method further comprises:
acquiring target data to be detected;
inputting the target data into the trained target model to obtain a classification label corresponding to the target data, and determining a classification result of the target data based on the classification label corresponding to the target data.
Optionally, the method further comprises:
acquiring target data to be detected;
inputting the target data into the trained target model, and processing the target data through a first module of the trained target model to obtain a target characterization vector corresponding to the target data;
and inputting the target characterization vector into a pre-trained risk detection model to obtain a risk detection result of the target data, wherein the risk detection model is constructed based on a preset deep learning algorithm.
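A short sketch of the two detection paths just described follows; trained_model and risk_model are hypothetical handles, since this specification fixes the data flow rather than an API.

```python
import torch

@torch.no_grad()
def detect(trained_model, risk_model, target_data):
    # First module of the trained target model: classification logits and
    # the target characterization vector.
    logits, target_rep = trained_model.first_module(target_data)

    # First path: classification label -> classification result.
    classification_label = logits.argmax(dim=-1)

    # Second path: feed the target characterization vector into a
    # separately pre-trained risk detection model.
    risk_result = risk_model(target_rep)
    return classification_label, risk_result
```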
Optionally, the determining, based on the similarity between the first characterization vector and preset class characterization vectors, a second classification label corresponding to the data sample includes:
determining the similarity between the first characterization vector and each preset class characterization vector;
and determining the second classification label corresponding to the data sample based on the initial classification label and the similarity.
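For instance, the similarity-based part of this determination could take the following form; cosine similarity and the argmax decision are assumptions of this sketch, and how the initial classification label enters the decision is deliberately left out.

```python
import torch
import torch.nn.functional as F

def second_classification_label(z1, class_reps):
    # Cosine similarity between the first characterization vector(s) z1 (B, D)
    # and each preset class characterization vector in class_reps (C, D);
    # the most similar class is taken here as the second classification label.
    sim = F.normalize(z1, dim=-1) @ F.normalize(class_reps, dim=-1).T   # (B, C)
    return sim.argmax(dim=-1)
```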
Optionally, the obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function includes:
acquiring history characterization vectors output by the momentum encoder, and constructing a characterization memory bank based on the history characterization vectors, the first characterization vector and the second characterization vector;
acquiring a forward characterization vector corresponding to the first characterization vector in the characterization memory bank;
and obtaining the second loss value based on the characterization vectors in the characterization memory bank, the forward characterization vector, the first characterization vector and the preset contrast loss function.
Optionally, the acquiring a forward characterization vector corresponding to the first characterization vector in the characterization memory bank includes:
acquiring a classification label corresponding to the data sample, and determining labels corresponding to the characterization vectors in the characterization memory bank based on the classification label corresponding to the data sample;
and determining, among the characterization vectors in the characterization memory bank, a characterization vector whose label is the same as the first classification label as the forward characterization vector.
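A hedged sketch of this memory-bank contrast loss is given below. The supervised-contrastive form, the temperature value and the tensor layout are assumptions of this sketch, while the selection of forward (positive) characterization vectors by matching labels follows the passage above.

```python
import torch
import torch.nn.functional as F

def memory_bank_contrast_loss(z1, first_labels, bank_reps, bank_labels, tau=0.07):
    # bank_reps (K, D) / bank_labels (K,) stand for the characterization memory
    # bank: history characterization vectors from the momentum encoder together
    # with the current first and second characterization vectors, and their
    # labels. Entries whose label equals a sample's first classification label
    # act as forward characterization vectors; the rest act as negatives.
    z1 = F.normalize(z1, dim=-1)                         # (B, D)
    bank = F.normalize(bank_reps, dim=-1)                # (K, D)
    sim = z1 @ bank.T / tau                              # (B, K)
    pos_mask = (first_labels.unsqueeze(1) == bank_labels.unsqueeze(0)).float()

    # Supervised-contrastive style: mean log-probability of the positives.
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)       # avoid division by zero
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()
```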
Optionally, the preset classification loss function is a cross entropy loss function, and the obtaining a first loss value based on the second classification label, the first classification label and the preset classification loss function includes:
sampling the data samples to obtain sampled data;
and determining the first loss value based on the first classification labels corresponding to the sampled data, the second classification labels and the number of the sampled data.
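By way of example, with a cross entropy loss this computation could look as follows; uniform sampling and the sample_size value are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def first_loss(logits, second_labels, sample_size=256):
    # Sub-sample the data, then average the cross entropy between the first
    # classification labels (logits) and the second classification labels
    # over the number of sampled data (mean reduction).
    n = logits.size(0)
    idx = torch.randperm(n, device=logits.device)[: min(sample_size, n)]
    return F.cross_entropy(logits[idx], second_labels[idx])
```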
The embodiment of the specification provides data processing equipment. The equipment acquires a data sample for training a target model, the data sample including input data of a user in a human-computer interaction process; inputs the data sample into a first module of the target model to obtain a first classification label and a first characterization vector corresponding to the data sample; determines a second classification label corresponding to the data sample based on the similarity between the first characterization vector and preset class characterization vectors, and obtains a first loss value based on the second classification label, the first classification label and a preset classification loss function; inputs the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtains a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, the second module having no gradient back propagation; and performs iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.

In this way, on the one hand, the data samples used for training the target model can be unlabeled samples, that is, gray samples, which avoids the poor training effect caused by a small amount of labeled data. On the other hand, the second classification labels optimize the aggregation of the characterization space. In addition, through the first characterization vector, the second characterization vector and the preset contrast loss function, data samples of the same class become more aggregated and data samples of different classes become more dispersed, which further optimizes the aggregation of the characterization space. Even when unlabeled data samples are used, the model can therefore learn a better characterization space; that is, gray samples are effectively utilized during training, the aggregation of the model's characterization space is optimized, and the detection accuracy of the trained target model is improved.
Example V
The embodiments of the present specification further provide a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the foregoing data processing method embodiments and can achieve the same technical effects, which are not described herein again to avoid repetition. The computer readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The embodiment of the specification provides a computer readable storage medium. The instructions stored thereon, when executed, acquire a data sample for training a target model, the data sample including input data of a user in a human-computer interaction process; input the data sample into a first module of the target model to obtain a first classification label and a first characterization vector corresponding to the data sample; determine a second classification label corresponding to the data sample based on the similarity between the first characterization vector and preset class characterization vectors, and obtain a first loss value based on the second classification label, the first classification label and a preset classification loss function; input the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtain a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, the second module having no gradient back propagation; and perform iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.

In this way, on the one hand, the data samples used for training the target model can be unlabeled samples, that is, gray samples, which avoids the poor training effect caused by a small amount of labeled data. On the other hand, the second classification labels optimize the aggregation of the characterization space. In addition, through the first characterization vector, the second characterization vector and the preset contrast loss function, data samples of the same class become more aggregated and data samples of different classes become more dispersed, which further optimizes the aggregation of the characterization space. Even when unlabeled data samples are used, the model can therefore learn a better characterization space; that is, gray samples are effectively utilized during training, the aggregation of the model's characterization space is optimized, and the detection accuracy of the trained target model is improved.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement of a technology could clearly be distinguished as an improvement in hardware (for example, an improvement of a circuit structure such as a diode, a transistor or a switch) or an improvement in software (an improvement of a method flow). However, with the development of technology, improvements of many method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by a user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is now mostly implemented by "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled is written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained by merely programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely by computer readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described as being divided into various units by function. Of course, when implementing one or more embodiments of the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments, and for relevant parts, reference may be made to the description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (11)

1. A data processing method, comprising:
acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process;
inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation;
and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
2. The method of claim 1, wherein the first module comprises a first data enhancement submodule, an encoder, a classifier and a first characterization extraction submodule, the first data enhancement submodule being used for performing data enhancement processing on the data sample, the encoder being used for performing encoding processing on the enhanced data sample to obtain a first encoding result, the classifier being used for performing classification processing on the first encoding result, and the first characterization extraction submodule being used for performing characterization extraction processing on the first encoding result;
the second module comprises a second data enhancement submodule, a momentum encoder and a second characterization extraction submodule, the second data enhancement submodule being used for performing data enhancement processing on the data sample, the momentum encoder being used for performing encoding processing on the enhanced data sample to obtain a second encoding result, and the second characterization extraction submodule being used for performing characterization extraction processing on the second encoding result, wherein the structure of the momentum encoder is identical to that of the encoder, and the parameters of the momentum encoder are identical to those of the encoder.
3. The method of claim 2, the method further comprising:
acquiring target data to be detected;
inputting the target data into the trained target model to obtain a classification label corresponding to the target data, and determining a classification result of the target data based on the classification label corresponding to the target data.
4. The method of claim 2, the method further comprising:
acquiring target data to be detected;
inputting the target data into the trained target model, and processing the target data through a first module of the trained target model to obtain a target characterization vector corresponding to the target data;
and inputting the target characterization vector into a pre-trained risk detection model to obtain a risk detection result of the target data, wherein the risk detection model is constructed based on a preset deep learning algorithm.
5. The method of claim 2, wherein the determining, based on the similarity between the first characterization vector and preset class characterization vectors, a second classification label corresponding to the data sample comprises:
determining the similarity between the first characterization vector and each preset class characterization vector;
and determining the second classification label corresponding to the data sample based on the initial classification label and the similarity.
6. The method of claim 5, wherein the obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function comprises:
acquiring history characterization vectors output by the momentum encoder, and constructing a characterization memory bank based on the history characterization vectors, the first characterization vector and the second characterization vector;
acquiring a forward characterization vector corresponding to the first characterization vector in the characterization memory bank;
and obtaining the second loss value based on the characterization vectors in the characterization memory bank, the forward characterization vector, the first characterization vector and the preset contrast loss function.
7. The method of claim 6, wherein the acquiring a forward characterization vector corresponding to the first characterization vector in the characterization memory bank comprises:
acquiring a classification label corresponding to the data sample, and determining labels corresponding to the characterization vectors in the characterization memory bank based on the classification label corresponding to the data sample;
and determining, among the characterization vectors in the characterization memory bank, a characterization vector whose label is the same as the first classification label as the forward characterization vector.
8. The method of claim 6, wherein the preset classification loss function is a cross entropy loss function, and the obtaining a first loss value based on the second classification label, the first classification label and the preset classification loss function comprises:
sampling the data samples to obtain sampled data;
and determining the first loss value based on the first classification labels corresponding to the sampled data, the second classification labels and the number of the sampled data.
9. A data processing device, comprising:
the sample acquisition module is used for acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process;
the first determining module is used for inputting the data sample into the first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
the second determining module is used for inputting the data sample into the second module of the target model, obtaining a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation;
and the training module is used for carrying out iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
10. Data processing equipment, the data processing equipment comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process;
inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation;
and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
11. A storage medium for storing computer-executable instructions that when executed implement the following:
acquiring a data sample for training a target model, wherein the data sample comprises input data of a user in a human-computer interaction process;
inputting the data sample into a first module of the target model to obtain a first classification label corresponding to the data sample and a first characterization vector corresponding to the data sample; determining a second classification label corresponding to the data sample based on the similarity of the first characterization vector and a preset classification characterization vector, and obtaining a first loss value based on the second classification label, the first classification label and a preset classification loss function;
inputting the data sample into a second module of the target model to obtain a second characterization vector corresponding to the data sample, and obtaining a second loss value based on the first characterization vector, the second characterization vector and a preset contrast loss function, wherein the second module has no gradient back propagation;
and performing iterative training on the target model based on the first loss value and the second loss value to obtain a trained target model.
CN202310197133.1A 2023-02-23 2023-02-23 Data processing method, device and equipment Pending CN116186540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310197133.1A CN116186540A (en) 2023-02-23 2023-02-23 Data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310197133.1A CN116186540A (en) 2023-02-23 2023-02-23 Data processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN116186540A 2023-05-30

Family

ID=86448509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310197133.1A Pending CN116186540A (en) 2023-02-23 2023-02-23 Data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN116186540A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination