CN115130536A - Training method of feature extraction model, data processing method, device and equipment - Google Patents

Training method of feature extraction model, data processing method, device and equipment

Info

Publication number
CN115130536A
CN115130536A
Authority
CN
China
Prior art keywords
training
sample
data
anchor point
similarity
Prior art date
Legal status
Pending
Application number
CN202210369228.2A
Other languages
Chinese (zh)
Inventor
李文豪
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210369228.2A
Publication of CN115130536A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

An embodiment of the present application provides a training method for a feature extraction model, a data processing method, an apparatus and a device, relating to the technical fields of artificial intelligence, payment security, finance and cloud. The training method comprises the following steps: acquiring a training set, and constructing a plurality of positive sample pairs and a plurality of negative sample pairs based on the training set, wherein each positive sample pair comprises two training samples of the same category and each negative sample pair comprises two training samples of different categories; repeatedly executing a training operation on a neural network model based on the training set until a preset condition is met, so as to obtain a trained feature extraction model; and, for each training operation, if the preset condition is not met, determining a plurality of new sample pairs based on the similarity between the feature vectors of the training samples output by the model, and using the new sample pairs as the sample pairs on which subsequent training operations are based. The training method provided by the present application can effectively improve the performance of the feature extraction model.

Description

Training method of feature extraction model, data processing method, device and equipment
Technical Field
The present application relates to the technical fields of artificial intelligence, payment security, finance and cloud, and in particular to a training method for a feature extraction model, a data processing method, an apparatus and a device.
Background
With the rapid development of artificial intelligence technology, artificial intelligence is being applied in more and more fields. Data processing methods based on artificial intelligence have been widely applied in an increasing number of scenarios, and data processing based on neural network models is one of the most important branches.
In data processing based on neural network models, feature extraction from the model's input data is usually indispensable, and the expressive power of the features extracted by the model is a critical factor affecting the quality of the model's results. Obtaining a high-performance feature extraction model through training is therefore an important research topic in artificial intelligence. To improve model performance, various training methods have been proposed. Although some models trained with existing methods can meet application requirements to a certain extent, model performance still needs to be improved; in particular, when a model is applied to a specific task (for example, a classification task in a specific scenario), the expressive power of the features it extracts still needs improvement.
Disclosure of Invention
An object of the embodiments of the present application is to provide a training method capable of effectively improving performance of a feature extraction model, and a data processing method, an apparatus, an electronic device, and a computer-readable storage medium based on the training method. In order to achieve the purpose, the technical scheme provided by the embodiment of the application is as follows:
in one aspect, an embodiment of the present application provides a method for training a feature extraction model, where the method includes:
acquiring a training set, wherein the training set comprises training samples of a plurality of categories;
constructing a plurality of sample pairs based on the training set, the plurality of sample pairs including a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair includes two training samples belonging to the same class, and each negative sample pair includes two training samples belonging to different classes;
repeatedly executing a training operation on the neural network model based on the training set until a preset condition is met, and taking the neural network model that meets the preset condition as the trained feature extraction model; wherein the preset condition includes convergence of the total training loss corresponding to the neural network model or the number of training iterations reaching a set number, and the training operation includes:
respectively inputting each training sample in a plurality of sample pairs into the neural network model to obtain a feature vector of each training sample;
determining a total training loss based on a first similarity between feature vectors of training samples in each of the sample pairs;
if the total training loss has not converged and the number of training iterations has not reached the set number, adjusting the model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the training samples, and using the new sample pairs as the sample pairs on which subsequent training operations are based.
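The training operation above reduces to a simple control loop: extract feature vectors, compute the total loss from the first similarities, stop when the preset condition is met, otherwise adjust the parameters and re-mine sample pairs from the second similarities. The skeleton below is an illustrative sketch only; every callable and name is a placeholder supplied by the caller, since the patent leaves the model, loss and mining procedure open at this point.

```python
def run_training_operations(model, pairs, extract, total_loss, mine_pairs,
                            adjust, max_steps=100, tol=1e-4):
    """Repeat the training operation until the preset condition is met:
    the total training loss converges (change below `tol`) or the number
    of iterations reaches `max_steps` (the "set number of times")."""
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_steps):
        feats = extract(model, pairs)        # feature vector of each sample
        loss = total_loss(feats, pairs)      # based on the first similarity
        if abs(prev_loss - loss) < tol:      # total training loss converged
            break
        model = adjust(model, feats, pairs)  # adjust model parameters
        pairs = mine_pairs(feats)            # new pairs from second similarity
        prev_loss = loss
    return model, loss
```

With stub callables whose loss halves on every adjustment, the loop stops once successive losses differ by less than `tol`.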
In an alternative embodiment of the present application, the total training loss characterizes the degree of difference between the samples within each positive sample pair and the degree of similarity between the samples within each negative sample pair, among the plurality of sample pairs input to the model.
In an optional embodiment of the present application, the second similarity corresponding to the new positive sample pair corresponding to each training sample is smaller than the second similarity corresponding to the positive sample pair before updating corresponding to the training sample, and the second similarity corresponding to the new negative sample pair corresponding to each training sample is larger than the second similarity corresponding to the negative sample pair before updating corresponding to the training sample.
Optionally, the feature extraction model is a feature extraction model in a classification model, and the classification model is configured to extract a feature vector of the first data to be processed through the feature extraction model, and identify a classification result corresponding to the first data to be processed based on the extracted feature vector; the classification result is one of a plurality of specified classes, the plurality of classes include the plurality of specified classes, and one training sample of each specified class is second data to be processed corresponding to the specified class.
Optionally, the classification model may identify, based on the extracted feature vector, whether the first data to be processed corresponds to a target class; that is, the classification result represents that the first data to be processed corresponds to the target class or to a non-target class. Accordingly, the plurality of specified classes include the target class and at least one non-target class.
Optionally, the designated class is a designated object class, the second to-be-processed data is service data corresponding to a sample object of the designated object class, the first to-be-processed data is service data corresponding to a target object, and the classification result represents whether the target object is an object of the target class.
In another aspect, an embodiment of the present application provides a training apparatus for a feature extraction model, where the apparatus includes:
the training data acquisition module is used for acquiring a training set, and the training set comprises a plurality of classes of training samples;
a training data processing module, configured to construct a plurality of sample pairs based on the training set, where the plurality of sample pairs include a plurality of positive sample pairs and a plurality of negative sample pairs, each positive sample pair including two training samples of the same category and each negative sample pair including two training samples of different categories;
the model training module is used for repeatedly executing a training operation on the neural network model based on the training set until a preset condition is met, and taking the neural network model that meets the preset condition as the trained feature extraction model; wherein the preset condition includes convergence of the total training loss corresponding to the neural network model or the number of training iterations reaching a set number, and the training operation includes:
respectively inputting each training sample in a plurality of sample pairs into the neural network model to obtain a feature vector of each training sample;
determining a total training loss based on a first similarity between feature vectors of training samples in each of the sample pairs;
if the total training loss has not converged and the number of training iterations has not reached the set number, adjusting the model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the training samples, and using the new sample pairs as the sample pairs on which subsequent training operations are based.
Optionally, the model training module may be configured to: for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each first sample, and taking the corresponding first sample with the lowest second similarity and the training sample as a new positive sample pair, wherein the first sample is a training sample belonging to the same class as the training sample in each training sample; and for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each second sample, and taking the corresponding second sample with the highest second similarity and the training sample as a new negative sample pair, wherein the second sample is a training sample belonging to a different class from the training sample in each training sample.
Optionally, the training data processing module may be configured to: respectively taking each training sample in the training set as an anchor point, and constructing a sample group corresponding to each anchor point, wherein the sample group corresponding to each anchor point comprises a positive sample pair and a negative sample pair corresponding to the anchor point, the positive sample pair corresponding to one anchor point comprises the anchor point and the positive sample of the anchor point, and the negative sample pair corresponding to one anchor point comprises the anchor point and the negative sample of the anchor point;
accordingly, the model training module, in determining the total loss of training, may be configured to: for each sample group, determining training loss corresponding to the sample group according to a first similarity between the feature vectors of two samples in a positive sample pair of the sample group and a first similarity between the feature vectors of two samples in a negative sample pair of the sample group; determining a total training loss according to the training loss corresponding to each sample group;
the model training module, in determining a plurality of new sample pairs, may be configured to: and determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples, and taking the sample pairs in the new sample group corresponding to each anchor point as a plurality of sample pairs in the subsequent training operation.
Optionally, the model training module, when determining a new sample set corresponding to each anchor point, may be configured to: for each anchor point, respectively determining second similarity between the feature vector of the anchor point and the feature vector of each first sample, and determining the corresponding first sample with the lowest second similarity as a new positive sample corresponding to the anchor point, wherein the first sample is a training sample belonging to the same category as the anchor point in each training sample; and for each anchor point, determining a second similarity between the feature vector of the anchor point and the feature vector of each second sample, and determining the corresponding second sample with the highest second similarity as a new negative sample corresponding to the anchor point, wherein the second sample is a training sample belonging to a different category from the anchor point in each training sample.
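The per-anchor mining rule above can be sketched as follows, with cosine similarity standing in for the second similarity (the patent does not fix the similarity measure, and all names here are illustrative). Each class is assumed to contain at least two samples so that every anchor has a candidate positive.

```python
import numpy as np

def mine_hard_groups(features, labels):
    """For each anchor: the same-class sample with the LOWEST similarity
    becomes the new positive, and the different-class sample with the
    HIGHEST similarity becomes the new negative."""
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarities
    idx = np.arange(len(feats))
    groups = []
    for i in idx:
        same = idx[(labels == labels[i]) & (idx != i)]   # first samples
        diff = idx[labels != labels[i]]                  # second samples
        hard_pos = same[np.argmin(sim[i, same])]
        hard_neg = diff[np.argmax(sim[i, diff])]
        groups.append((i, hard_pos, hard_neg))
    return groups
```

Each returned triple is one new sample group: the anchor with its hardest positive and hardest negative under the current feature vectors.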
Optionally, the training data processing module may be configured to, when constructing the sample group corresponding to each anchor point: constructing at least one batch data set according to the training set, wherein each batch data set comprises p classes of training samples, the number of the training samples of each class is k, p is more than or equal to 2, and k is more than or equal to 3; for each batch of data sets, respectively taking each training sample in the batch of data sets as an anchor point, and constructing a sample group corresponding to each anchor point in the batch of data sets based on each training sample in the batch of data sets;
accordingly, the model training module may be configured to: repeatedly executing training operation on the neural network model based on each batch of data sets, wherein each training operation is performed based on a sample group corresponding to each anchor point in one batch of data sets;
the model training module, when determining a new set of samples corresponding to each of the anchor points, may be configured to: and for each anchor point in the batch of data sets corresponding to the current training operation, determining a new sample group corresponding to the anchor point according to the second similarity between the anchor point and each training sample except the anchor point in the batch of data sets.
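The batch construction above (p classes per batch, k training samples per class, p ≥ 2, k ≥ 3) is often called PK sampling in the metric-learning literature. The sketch below is one way to draw such a batch; the sampling strategy and all names are assumptions, as the patent does not prescribe how the p classes or the k samples are chosen.

```python
import random
from collections import defaultdict

def build_batch(samples, labels, p=2, k=3, seed=0):
    """Draw one batch data set containing p classes with k training
    samples each; returns a list of (sample, class) tuples."""
    assert p >= 2 and k >= 3
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    eligible = [c for c, items in by_class.items() if len(items) >= k]
    batch = []
    for c in rng.sample(eligible, p):             # choose p classes
        for s in rng.sample(by_class[c], k):      # k samples per class
            batch.append((s, c))
    return batch
```

Because every sampled class contributes k ≥ 3 samples, each anchor in the batch has at least two same-class candidates for positives and at least k different-class candidates for negatives.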
Optionally, for each sample group, the model training module, when determining the training loss corresponding to the sample group, may be configured to:
determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance represents a first similarity corresponding to the positive sample pair of the sample group, and the second distance represents a first similarity corresponding to the negative sample pair of the sample group;
determining a difference between the first distance and the second distance; and determining the training loss corresponding to the sample group according to the difference, wherein the training loss corresponding to the sample group is positively correlated with the difference.
Optionally, the training loss corresponding to each sample group is determined based on the following expression:
s(x) = ln(1 + e^x)
x = d(a, p) - d(a, n) + β
wherein s (x) represents the training loss corresponding to the sample group, a, p and n represent the anchor point, the positive sample and the negative sample in the sample group, respectively, d (a, p) represents the first distance, d (a, n) represents the second distance, and β represents the preset adjustment threshold.
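A minimal numeric sketch of the expression above, assuming d(·,·) is the Euclidean distance (the patent text does not fix the metric) and an illustrative β of 0.5; note that ln(1 + e^x) is the softplus function, computed here via `log1p` for numerical stability.

```python
import numpy as np

def group_loss(anchor, positive, negative, beta=0.5):
    """s(x) = ln(1 + e^x) with x = d(a, p) - d(a, n) + beta."""
    d_ap = np.linalg.norm(anchor - positive)   # first distance d(a, p)
    d_an = np.linalg.norm(anchor - negative)   # second distance d(a, n)
    x = d_ap - d_an + beta
    return float(np.log1p(np.exp(x)))          # stable ln(1 + e^x)
```

The loss is near zero when the positive sits much closer to the anchor than the negative (x strongly negative), and grows roughly linearly in x otherwise.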
In another aspect, an embodiment of the present application further provides a data processing method based on a neural network model, where the method includes:
acquiring data to be processed; inputting the data to be processed into a first feature extraction model, and extracting a first feature vector corresponding to the data to be processed through the first feature extraction model, wherein the first feature extraction model is obtained by training through a training method provided in any optional embodiment of the application;
and determining a classification result corresponding to the data to be processed based on the first feature vector.
In another aspect, an embodiment of the present application further provides a data processing apparatus based on a neural network model, where the apparatus includes:
the data acquisition module is used for acquiring data to be processed;
the data processing module is configured to input the data to be processed into a first feature extraction model, extract a first feature vector corresponding to the data to be processed through the first feature extraction model, and determine a classification result corresponding to the data to be processed based on the first feature vector, where the first feature extraction model is obtained by training using a training method provided in any optional embodiment of the present application.
Optionally, the data processing module may be configured to: extract a second feature vector of the data to be processed through a second feature extraction model; fuse the first feature vector and the second feature vector; and determine a classification result corresponding to the data to be processed based on the fused features.
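A minimal sketch of the fusion-and-classify step, assuming concatenation as the fusion operation and a linear classifier over the fused feature; the patent fixes neither choice, and `weights`/`bias` are illustrative parameters.

```python
import numpy as np

def classify_with_fusion(first_vec, second_vec, weights, bias):
    """Fuse the first and second feature vectors and return the index of
    the highest-scoring class as the classification result."""
    fused = np.concatenate([first_vec, second_vec])   # fused feature
    logits = weights @ fused + bias                   # linear classifier
    return int(np.argmax(logits))
```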
Optionally, the data acquisition module may be configured to: acquire service data of a target service over multiple time periods for a target object, where the service data for each time period includes an attribute value of at least one service attribute of the target service; construct a service time-series feature matrix for the target object based on the service data of the multiple time periods; use the service time-series feature matrix as the data to be processed; and let the classification result represent the object type of the target object. The number of rows of the service time-series feature matrix is the number of time periods, the number of columns is the number of service attributes, and each element value in the matrix represents the attribute value of one service attribute in one time period.
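A minimal sketch of building the service time-series feature matrix described above: one row per time period, one column per service attribute. The dictionary-based input format and the attribute names in the example are assumptions, and missing attribute values default to 0.0, which the patent does not specify.

```python
import numpy as np

def build_service_matrix(period_records, attributes):
    """Element (i, j) is the attribute value of service attribute j
    in time period i; rows = periods, columns = attributes."""
    matrix = np.zeros((len(period_records), len(attributes)))
    for i, record in enumerate(period_records):
        for j, attr in enumerate(attributes):
            matrix[i, j] = record.get(attr, 0.0)   # 0.0 if missing (assumed)
    return matrix
```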
Based on the method provided by the embodiment of the present application, the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to implement the training method provided by any optional embodiment of the present application, or to implement the data processing method provided by any optional embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the training method provided in any optional embodiment of the present application, or implements the data processing method provided in any optional embodiment of the present application.
An embodiment of the present application further provides a computer program product, where the computer program product includes a computer program, and when executed by a processor, the computer program implements the training method provided in any optional embodiment of the present application, or implements the data processing method provided in any optional embodiment of the present application.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the training method provided by the embodiment of the application, in the process of training the neural network model based on the positive sample pair and the negative sample pair, the updating of each sample pair is continuously carried out based on the similarity between the feature vectors of the training samples obtained by model extraction, namely, a new sample pair is determined according to the similarity. By adopting the method, in the model training process, the scheme of simply randomly selecting the positive sample pairs and the negative sample pairs is not adopted, but the sample pairs are selected according to the similarity between the samples, so that the optimization of the sample pairs is realized, and the training effect of the model is improved. Based on this method of this application, can select assorted sample pair according to the application demand in the training process to better satisfy the application demand, optionally, can let the model study through the more difficult sample combination of screening study, thereby can effectively promote the model performance of the feature extraction model that trains, promote the distinguishment of the eigenvector that the model output, and can accelerate the training efficiency of model, through this mode training model, can be better satisfy practical application demand.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data processing method in an application scenario according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a training method for a feature extraction model according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a training mode of a feature extraction model according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a model training principle provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a training apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device to which the embodiment of the present application is applied.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" can be implemented as "A", as "B", or as "A and B". When a plurality of (two or more) items is described and the relationship between them is not explicitly defined, the reference may be to one, more or all of the items; for example, for the description "parameter A includes A1, A2, A3", parameter A may be implemented to include A1, A2 or A3, or to include at least two of the three items A1, A2, A3.
In the related art, feature extraction from data (for example, from time-series data) usually adopts either a feature construction scheme based on the statistical domain, frequency domain or time domain, or a deep-learning-based feature Embedding approach. The former requires prior knowledge and expert experience and incurs a certain degree of information loss. As for the latter, most current feature embedding approaches use unsupervised embedding methods such as word2vec and item2vec; however, when features extracted in these ways are applied to a specific scenario, their gain cannot be guaranteed. That is, the extracted features are general-purpose, but their discriminability on a specific task is poor.
Aiming at the above problems in the prior art, the embodiments of the present application provide a training method for a feature extraction model and a data processing method based on deep metric learning.
The solutions provided in the embodiments of the present application are implemented based on Artificial Intelligence (AI) technology; for example, both the feature extraction model and the classification model are AI-based neural network models. The feature extraction model in the embodiments of the present application may be based on any existing feature extraction model or on an improvement of an existing one; that is, the training method provided herein can be applied to train any feature extraction model and can improve the performance of the trained model. The feature extraction model may be trained in a Machine Learning (ML) manner based on a training set (i.e., a large number of training samples).
Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning and decision-making. With the research and progress of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, smart customer service, the Internet of Vehicles and intelligent transportation.
Optionally, the data processing related to the method provided by the embodiment of the present application may be implemented based on a cloud technology. For example, data calculations involved in the application process (e.g., classifying data to be processed) and the training process of the feature extraction model (e.g., similarity calculation between feature vectors in the model training process, calculation of training loss, adjustment of model parameters, etc.) may be implemented using cloud computing techniques. Among them, cloud computing (cloud computing) is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as if they are infinitely expandable and can be acquired at any time, used on demand, expanded at any time, and paid for use.
The training method or the data processing method of the feature extraction model provided by the embodiment of the application can be executed by any electronic device, such as a user terminal or a server. For example, the data to be processed may be data sent by a user to a server through a user terminal thereof, the server may deploy a trained feature extraction model, the server extracts a feature vector of the data to be processed by executing the data processing method provided in the embodiment of the present application, and may perform subsequent processing based on the extracted feature vector according to application requirements, where the subsequent processing may include, but is not limited to, classification of the data to be processed, similarity determination between the data to be processed and other data, and the like. For example, a user may send an image set including a large number of images to a server through a user terminal of the user, the server may invoke a trained feature extraction model to extract feature vectors of each image in the image set, may group the images in the image set according to image categories according to similarities between the feature vectors of the images, and may provide the group result to the user.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The user terminal (also referred to as a user device) may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart voice interaction device (e.g., a smart speaker), a wearable electronic device (e.g., a smart watch), an in-vehicle terminal, a smart home appliance (e.g., a smart television), an AR/VR device, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Optionally, the method provided in this embodiment of the present application may be implemented as a stand-alone application program or a functional module/plug-in of an application program, for example, the application program may be special data classification software or other application programs with a data classification function, and classification of data to be processed may be implemented through the application program.
The feature extraction model obtained by the training method provided in the embodiments of the present application is applicable to any scenario that requires extracting discriminative feature vectors (i.e., feature representations) from data to be processed, including but not limited to data classification, object classification, and the like. For example, the model may be applied to risk control management: feature vectors of service data of related services can be extracted through the feature extraction model, and risk identification can be performed based on those feature vectors. Specifically, the feature extraction model may serve as the backbone network of a classification model; after the feature vector of the service data is extracted through the backbone network, the risk level corresponding to the service data can be predicted by the classification module of the classification model based on that feature vector. Different risk levels in this scenario correspond to different categories, so in model training the training set may include a plurality of training samples for each risk level, where one training sample for a risk level contains sample service data corresponding to that level (i.e., service data whose true risk level is known).
It should be noted that alternative embodiments of the present application may involve subject-related data. When the embodiments of the present application are applied to specific products or technologies, the subject's permission or consent needs to be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. That is, if data related to a subject is involved in the embodiments of the present application, that data needs to be obtained with the subject's approval and in compliance with the applicable laws, regulations, and standards.
For better illustration and understanding of the method provided in the embodiments of the present application, an alternative implementation is described below with reference to a specific application scenario. In this scenario embodiment, the data processing method is applied to an application program with a mobile payment function. Based on the method, the type (i.e., category) of an object corresponding to the application can be identified; for example, whether the object is a risk object, that is, an object of a target category, may be determined based on business data related to the object. The business data may belong to one or more specified types of business (i.e., there may be one or more target businesses); the manner of dividing object types is not limited in the embodiments of the present application.
Fig. 1 shows a schematic structural diagram of a data processing system to which the embodiments of the present application are applicable. As shown in fig. 1, the data processing system may include a terminal device 11, a terminal device 21, an application server 20, and a training server 30, where the application server 20 may be the server of the application program providing the above-mentioned mobile payment function. The terminal device 11 may be an electronic device used by an object (user) of the application; it may communicate with the application server 20 via a network, be operated through the user interface of the application, and use the services provided by the application. The terminal device 21 may be an electronic device on the management side of the application, on which a management client of the application may run; the terminal device 21 may also communicate with the application server 20 through a network, and an authorized administrator or related personnel may manage the application through the management client.
The training server 30 may train the neural network model by executing the training method provided in the embodiments of the present application, so as to obtain a trained feature extraction model. Optionally, after obtaining the trained feature extraction model, the training server 30 may further train a classification model that includes the feature extraction model, so as to obtain a trained classification model. The application server 20 and the training server 30 may communicate via a network. After the training server 30 obtains the trained classification model, the model may be deployed on the application server 20; the application server 20 then processes the data to be processed by invoking the trained classification model when executing the data processing method provided in the embodiments of the present application, where the processing may include but is not limited to identifying the category of the object corresponding to the data to be processed. Optionally, the application server 20 may further send the identification result to the terminal device 21, so as to show the result to an administrator of the application program.
In the application scenario, the data processing method provided in the embodiment of the present application may be executed by the application server 20, or may be executed by other electronic devices independent of the application server, such as a cloud server. The following description will be given taking an application server as an execution subject.
Fig. 2 is a schematic flow chart of data processing based on the data processing system shown in fig. 1. With reference to fig. 1 and fig. 2, this scenario embodiment may include steps S11 to S13 and steps S21 to S23, as follows:
Step S11: a training data set is obtained.
Step S12: the training server 30 trains the feature extraction model and the classification model.
The classification model includes a feature extraction model and a classification module, and may also include other neural network structures. Optionally, the feature extraction model may be trained first; after the trained feature extraction model is obtained, the classification model may then be trained with the model parameters of the feature extraction model fixed (that is, only the model parameters of the classification module are learned). Of course, the whole classification model may also be trained directly end to end, in which case the training loss of the classification model may include the total training loss of the feature extraction model part and the classification loss of the classification model. The following description assumes that the feature extraction model is trained first.
Optionally, in this scenario embodiment, the training data set may include a first training set and a second training set, both of which contain a large number of training samples. The first training set is the sample set used for training the feature extraction model, and the second training set is the sample set used for training the classification model. For convenience of description, the training samples in the first training set are referred to as first samples, and the training samples in the second training set are referred to as second samples.
The sample acquisition method is not limited in this application. Optionally, in order to make the trained classification model better fit the application scenario, the samples in the training data set may be samples corresponding to that scenario. Optionally, training samples may be constructed based on historical service data of the application program, based on simulated service data obtained by simulating the operation of the application program, or in other manners. Taking as an example that the purpose of training the classification model is to identify whether a target object is an object of a target class, the training samples in the first training set may include two classes, namely a target class and a non-target class: the first training set includes multiple samples of the target class and multiple samples of the non-target class, where one sample of the target class is the service data (or the service timing feature obtained from that service data) corresponding to one sample object of the target class, and similarly one sample of the non-target class is the service data or service timing feature corresponding to one sample object of the non-target class. Likewise, the training samples in the second training set may include samples corresponding to sample objects of a specified category and samples corresponding to sample objects of a non-specified category.
Taking the first training set as an example, a plurality of first samples may be constructed based on service data of a plurality of specified types for each sample object. Taking the construction of one sample as an example: the service data of the specified types for one sample object over a plurality of time periods may be obtained, for example, 30 days of transaction data of the specified types for that sample object. The transaction data may also be referred to as transaction characteristics, i.e., attribute values of the service attributes of each type; for a payment service, for example, these may include the amount of resources involved in a payment. Assuming that the transaction data of the specified types correspond to attribute values of 100 service attributes in total, the service timing characteristic corresponding to the sample object obtained from the 30 days of transaction data can be represented as a 30 × 100 characteristic matrix; the service timing characteristic may therefore also be referred to as a service timing characteristic matrix, where 30 is the dimension of the time sequence (i.e., 30 days) and 100 is the dimension of the service characteristics (i.e., 100 attribute values). The service timing characteristic and the real object type of the sample object (i.e., the sample label, or category label) may together be used as one sample.
As an example, table 1 below shows an example of a plurality of service data of specified types corresponding to one sample. As shown in table 1, the example includes n specified types of service data for m days, where attribute value i (i ∈ [1, n]) represents the attribute value of the i-th specified type of service data and D_j (j ∈ [1, m]) represents the service data of day j. The service timing feature corresponding to the service data in this example may be an m × n feature matrix, and this feature matrix is input into the feature extraction model as one sample to obtain the corresponding feature vector.
TABLE 1

        Attribute value 1   Attribute value 2   Attribute value 3   Attribute value 4   ...   Attribute value n
D_1     a_11                a_12                a_13                a_14                ...   a_1n
D_2     a_21                a_22                a_23                a_24                ...   a_2n
...
D_m     a_m1                a_m2                a_m3                a_m4                ...   a_mn
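As a concrete illustration, the construction of the m × n service timing feature matrix described above can be sketched as follows. This is a minimal sketch: the per-day record format and the function name are hypothetical, since the embodiment does not fix a concrete data format.

```python
import numpy as np

def build_timing_feature_matrix(daily_records, num_days=30, num_attrs=100):
    """Stack per-day attribute-value vectors into a (num_days x num_attrs)
    service timing feature matrix, as in Table 1 (row j = day D_j,
    column i = attribute value i)."""
    matrix = np.zeros((num_days, num_attrs), dtype=np.float32)
    for day_idx, attrs in enumerate(daily_records):
        matrix[day_idx, :] = np.asarray(attrs, dtype=np.float32)
    return matrix

# 30 days of transaction data, 100 attribute values per day
records = [[float(day * 100 + i) for i in range(100)] for day in range(30)]
features = build_timing_feature_matrix(records)
print(features.shape)  # (30, 100)
```

The resulting matrix is what would be fed into the feature extraction model as one sample.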
The dimension of the feature vector output by the feature extraction model is determined by the dimension of the model's output layer and can be configured according to actual application requirements. For example, the dimension of the output feature vector may be adjusted according to the dimension of the features input into the model and the business requirements: the larger the dimension of the input features, the larger the output dimension can be. In theory, a higher-dimensional output vector carries denser information, but it also slows down model training and inference. In practical applications, an output dimension that is too low is not recommended either; otherwise the information loss is large and the feature vector's ability to represent the input data is weak.
Assume that in a practical application the purpose of data processing is to identify objects of a first type (i.e., objects of the above-mentioned target class, such as non-compliant objects). The object types then include the first type (i.e., the first class) and a second type, and the sample label of a sample indicates the first type or the second type: if a sample object is of the first type, the corresponding sample may be regarded as a first-type sample, and a sample corresponding to a second-type sample object may be regarded as a second-type sample, where the first-type samples and second-type samples together form the first training set. It is to be understood that in the model training process, the first training set may contain two classes of samples or more than two classes. When the trained feature extraction model is applied in the classification model, if the purpose of training is to identify whether an object is of the first type, the second training set of the classification model may likewise contain samples of two categories, that is, multiple samples corresponding to first-type sample objects and multiple samples corresponding to second-type sample objects, so that the trained binary classification model is better suited to this specific classification task.
The model structure of the feature extraction model is not limited in the embodiments of the present application; it may be any existing feature extraction network, or a modification of an existing feature extraction network. As an alternative, the feature extraction model may use a ResNet50 network (a deep learning network) as its base network. Since the model is used to extract feature vectors of input data, the feature extraction model in this embodiment is a network obtained by modifying the ResNet50 network. Optionally, the modification may include replacing the final average pooling layer, fully connected layer, and Softmax layer of ResNet50 with a fully connected layer, a Batch Normalization layer, and another fully connected layer. The parameter information of each layer in the modified model (including the number of neurons) can be configured according to actual requirements. For example, the parameters of the fully connected layers before and after the Batch Normalization layer may be 2048 × 1000 and 1024 × 128 respectively, where 2048 and 1024 represent the input feature dimensions of the two fully connected layers and 1000 and 128 represent their output feature dimensions. With this parameter configuration, the dimension of the feature vector output by the feature extraction model is 128, that is, the feature vector includes 128 feature values.
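The modified head (fully connected, then Batch Normalization, then fully connected) can be sketched numerically as below. This is an illustrative sketch only: because the layer sizes quoted in the text (2048 × 1000 and 1024 × 128) do not compose directly, the sketch assumes dimensions 2048 → 1024 → 128, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(x, w, b):
    return x @ w + b

def batch_norm(x, eps=1e-5):
    # inference-style normalization over the batch dimension
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# ResNet50's pooled backbone output is 2048-d per sample; the replaced head
# maps it down to a 128-d feature vector (assumed dims: 2048 -> 1024 -> 128).
d_in, d_mid, d_out = 2048, 1024, 128
w1, b1 = rng.standard_normal((d_in, d_mid)) * 0.01, np.zeros(d_mid)
w2, b2 = rng.standard_normal((d_mid, d_out)) * 0.01, np.zeros(d_out)

backbone_features = rng.standard_normal((8, d_in))  # batch of 8 samples
h = batch_norm(fully_connected(backbone_features, w1, b1))
embeddings = fully_connected(h, w2, b2)
print(embeddings.shape)  # (8, 128)
```

In a real implementation these layers would be trainable modules of the network; the sketch only demonstrates the shape flow from the 2048-d backbone output to the 128-d feature vector.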
In this embodiment of the application, the first training set includes first training samples of multiple classes; assume the number of classes is M. An initial sample group (triplet) for model training is first constructed based on the first training set: a plurality of batch data sets may be constructed by sampling, and the triplets corresponding to each batch data set are constructed based on the samples in that batch. Optionally, during each sampling, first training samples of p categories may be randomly selected from the first training set containing M categories, and k samples randomly selected from each of those categories, so one sampling (one batch data set) contains p × k samples. In the training process, each sample in the batch is selected in turn as an anchor point, and then the positive sample that is hardest for that anchor and the negative sample that is hardest for that anchor are selected to form a triplet with it. Optionally, a corresponding triplet may be constructed for every sample in the batch data set, so one sampling yields p × k triplets.
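The p × k batch construction can be sketched as follows. The helper name is hypothetical; sampling is uniform without replacement within each chosen class, which matches the random selection described but is otherwise an assumption.

```python
import random

def sample_pk_batch(samples_by_class, p, k, rng=random):
    """Randomly choose p classes, then k samples per class: p*k per batch."""
    chosen_classes = rng.sample(list(samples_by_class), p)
    batch = []
    for c in chosen_classes:
        for s in rng.sample(samples_by_class[c], k):
            batch.append((c, s))  # keep the class label with each sample
    return batch

# 5 classes (the "M" classes) with 10 samples each
data = {c: [f"class{c}_sample{i}" for i in range(10)] for c in range(5)}
batch = sample_pk_batch(data, p=3, k=4)
print(len(batch))  # 12, i.e. p * k
```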
It should be noted that the initial triplet may be randomly generated, that is, before the constructed neural network model (the feature extraction model before training) is trained, for each sample in the batch data set, a sample belonging to the same class as the sample may be randomly selected as a positive sample, and a sample belonging to a different class from the sample may be selected as a negative sample, so that the three samples form a triplet. In the training process, a new triplet corresponding to each sample may be re-determined based on the similarity between feature vectors of samples output by the model, that is, for each sample (i.e., anchor point), one positive sample (the sample of the same class with the lowest similarity to the sample) and one negative sample (the sample of a different class with the highest similarity to the sample) that are the most difficult to the sample are selected and taken as the new triplet corresponding to the sample.
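One reading of the hard-mining step described above, given the feature vectors output by the model, is sketched below: for each anchor, the hardest positive is the same-class sample with the lowest similarity, and the hardest negative is the different-class sample with the highest similarity. Cosine similarity is assumed here; the embodiment leaves the similarity measure open.

```python
import numpy as np

def mine_hard_triplets(embeddings, labels):
    """Return (anchor, hardest_positive, hardest_negative) index triplets."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T  # pairwise cosine similarity
    labels = np.asarray(labels)
    triplets = []
    for a in range(len(labels)):
        pos_mask = labels == labels[a]
        pos_mask[a] = False          # an anchor is not its own positive
        neg_mask = labels != labels[a]
        if not pos_mask.any() or not neg_mask.any():
            continue
        pos_idx = np.where(pos_mask)[0]
        neg_idx = np.where(neg_mask)[0]
        hardest_pos = pos_idx[np.argmin(sim[a, pos_idx])]
        hardest_neg = neg_idx[np.argmax(sim[a, neg_idx])]
        triplets.append((a, int(hardest_pos), int(hardest_neg)))
    return triplets

emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = [0, 0, 1, 1]
print(len(mine_hard_triplets(emb, labels)))  # 4: one triplet per anchor
```

Running this after each epoch on the feature vectors of a batch would yield the p × k updated triplets for the next epoch, as the training operation describes.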
Each batch data set is obtained through random sampling. After the p × k initial triplets of each batch data set are constructed, the neural network model can be iteratively trained based on the batch data sets, where the relevant training parameters (such as the number of training epochs, the learning rate, and the like) can be set as required. As an alternative, the number of training epochs (one epoch meaning that all batch data sets have participated in training once) can be set to t_1 = 25000. The network optimizer used in training may be an Adam optimizer, whose default learning rate ε is 3e-4 and whose two exponential decay rate coefficients β_1 and β_2 are 0.9 and 0.999, respectively. In the embodiment of the application, the relevant parameters of the Adam optimizer are adjusted: β_1 is set to 0.5, and for the learning rate an epoch threshold t_0 = 15000 is set, with the learning rate ε(t) expressed as:
[The formula for ε(t) appears only as an image in the source and is not reproduced here; per the surrounding text, ε(t) is piecewise in the epoch number t with threshold t_0.]
where ε_0 = 3e-4, t represents the current training epoch, and ε(t) represents the learning rate at the current epoch. That is, once the epoch number reaches t_0, the learning rate changes as the epoch number increases; specifically, it is positively correlated with t, i.e., the larger t is, the larger ε(t) is.
In iteratively training the model, one pass of training based on one batch data set may be referred to as one iteration. During training, each first training sample in the batch data set is input into the neural network model to obtain its feature vector. The total training loss corresponding to the iteration is then calculated according to the first similarities between the feature vectors of the first training samples in the p × k triplets (for example, based on the distance between the feature vector of the anchor point and that of the positive sample, and the distance between the feature vector of the anchor point and that of the negative sample in each triplet), and the model parameters are updated. The triplets in the batch data set are also updated according to the second similarities between the feature vectors of the first training samples, yielding p × k new triplets that are used in the next epoch of training. This process of training the model and updating the triplets based on each batch data set continues until the number of epochs reaches the set value t_1 or the total training loss converges, at which point the neural network model is taken as the trained feature extraction model. Optionally, the model can also be verified or tested: a model that meets the verification or test conditions is taken as the trained model, and otherwise training can continue.
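The distance-based loss term mentioned above can be illustrated with a standard triplet margin loss. This is a sketch under assumptions: the embodiment only says the loss may be based on anchor-positive and anchor-negative distances, so the hinge form and the margin value 0.3 are illustrative choices.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """max(d(a, p) - d(a, n) + margin, 0), averaged over the batch."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)
    d_an = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # positive close to the anchor
n = np.array([[5.0, 0.0]])   # negative far from the anchor
print(triplet_margin_loss(a, p, n))  # 0.0: the margin is already satisfied
```

A loss of zero for a triplet means the negative is already farther from the anchor than the positive by at least the margin, so only unsatisfied (hard) triplets contribute gradient.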
After the trained feature extraction model is obtained, the classification model including the feature extraction model may be trained based on the samples in the second training set to obtain the trained classification model. The embodiments of the present application do not limit the training mode of the classification model; optionally, while the classification model is trained, the model parameters of the trained feature extraction model may be fixed.
Step S13: the trained classification model is deployed to the application server 20.
Step S21: the application server 20 retrieves the data to be processed.
Step S22: the application server 20 identifies the data to be processed through the trained classification model to obtain a corresponding classification result.
Step S23: the application server 20 transmits the classification result to the terminal device 21 to provide the classification result to the manager.
After the trained classification model is obtained, the training server 30 may send it to the application server 20. The application server 20 may recognize data to be processed through the classification model and provide the recognized classification result to a manager through the terminal device 21: for example, the classification result may be sent to the terminal device 21, which displays it to the manager or related personnel through the management client of the application program, or corresponding prompt information may be sent to the manager when the classification result is a specified result, for example, a prompt when the classification result indicates an abnormality.
In this scenario embodiment, the service data to be processed may be the service data of multiple specified types of services generated by an object when using the application. The application server 20 may obtain the service timing feature corresponding to the object based on the attribute values of each service corresponding to the object; this service timing feature is the data to be processed in this application scenario. The trained classification model then performs feature extraction on the input service timing feature to obtain the corresponding feature vector, and the classification module of the classification model obtains the corresponding classification result based on that feature vector, where the classification result represents the object type of the object corresponding to the service timing feature.
It can be understood that the training method of the feature extraction model and the data processing method provided in the embodiments of the present application may be applied to, but are not limited to, the above application scenarios. By adopting the characteristic extraction model obtained by training the scheme provided by the application, the discrimination degree of the characteristic vector of the data to be processed obtained by the model extraction can be effectively improved, so that the processing effect can be effectively improved when the characteristic vector is further processed, and the classification accuracy can be effectively improved in the application scene.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps and the like in different embodiments is not repeated.
Fig. 3 is a flowchart illustrating a training method of a feature extraction model according to an embodiment of the present application, and as shown in fig. 3, the training method may include the following steps S110 to S130.
Step S110: a training set is obtained, the training set including training samples of a plurality of classes.
In the embodiment of the present application, the training data set includes training subsets corresponding to a plurality of (at least two) classes, and each training subset includes a plurality of training samples belonging to the same class. The classification method is not limited in the present application, and classification may be performed according to an actual application scenario and an application requirement.
The embodiment of the present application is not limited to the acquisition mode of the training set, and the acquisition mode may be based on a data set used for classification model training, a data set designed manually, or a training set constructed from sample data acquired in an actual application scenario. Optionally, the training samples in the training set may be a large number of training samples corresponding to the same application scenario, so that the trained feature extraction model may be better applicable to the application scenario, and feature vectors corresponding to different data to be processed in the scenario extracted by the model have better discrimination. Of course, the training set may also adopt a large number of training samples corresponding to a plurality of application scenarios, so that the trained feature extraction model has better generality.
Optionally, in practical applications, the data format of the training samples may correspond to the data format of the data to be processed in the application scenario to which the trained model is applied (that is, the data that is to be input into the model for feature extraction); in other words, the training samples may be selected according to the task to be solved in the actual application scenario. Optionally, the feature extraction model may serve as the feature extraction module in a classification model, where the classification model is configured to extract the feature vector of first to-be-processed data through the feature extraction model, and to identify the classification result corresponding to the first to-be-processed data based on the extracted feature vector; in this alternative, the plurality of classes include a plurality of specified classes, and one training sample of each specified class is second to-be-processed data corresponding to that specified class.
The data form of the first data to be processed is determined by the specific classification task to be solved in the actual service scene to which the classification model is applied. For example, the classification model is used to identify the class of the object according to the service data of the object in the target application, that is, the feature extraction model is applied to the object classification task and is to extract the feature vector for object classification based on the service data of the object, the training sample is the service data of the sample object or the processed service data feature (such as the service time sequence feature matrix in the foregoing) obtained by processing the service data, and the training set includes the service data or service data feature corresponding to the sample object of each class. For another example, the classification model is used to classify texts, the training set may include sample texts (second to-be-processed data) of multiple classes (i.e., the multiple specified classes), and the trained feature extraction model may perform feature extraction on the to-be-processed texts (first to-be-processed data) input into the classification model, and predict text classes of the to-be-processed texts by the classification module of the classification model based on the extracted feature vectors.
The classification model may be a binary classification model or a multi-classification model.
Optionally, the classification model may specifically identify, based on the extracted feature vector, whether the first to-be-processed data is data corresponding to the target category, that is, the classification result represents that the first to-be-processed data is data corresponding to the target category or data corresponding to a non-target category, and correspondingly, the multiple specified categories include the target category and at least one non-target category.
In this alternative, the classification model is a binary classification model whose function is to identify whether the data to be processed input into it is data of a target class (a specific one of the multiple specified classes). In the training process of the feature extraction model, the training samples of the multiple classes then at least include training samples corresponding to the target class and training samples corresponding to other classes (classes that are not the target class), where the other classes may be one or more. For example, if the classification task of the classification model is to identify whether an image to be processed is a class-A image, the training set may include a plurality of class-A images and a plurality of images that are not class-A, and each image may serve as one training sample.
It can be understood that the classification manner of the multiple designated classes may be pre-classified according to the actual application scenario and the application requirement, for example, the classification model is to identify whether the object is an object of a target class based on the service data of the object, then the designated class may be the object class, correspondingly, the second to-be-processed data is the training sample, that is, the service data corresponding to the sample object of a certain object class or the service feature data obtained based on the service data, the first to-be-processed data is the service data or the service feature data corresponding to the target object (the object to be identified), and the classification result represents whether the target object is the object of the target class. For another example, the classification model is used for image classification, and the specified category is an image category.
Step S120: and constructing a plurality of sample pairs based on the training set, wherein the plurality of sample pairs comprise a plurality of positive sample pairs and a plurality of negative sample pairs, the positive sample pairs comprise two training samples of the same category, and the negative sample pairs comprise two training samples of different categories.
The plurality of sample pairs constructed in this step may be understood as initial sample pairs. In the training process of step S130, the sample pairs are continuously optimized and updated based on the output of the neural network model, and the new sample pairs obtained in each training operation serve as the sample pairs on which subsequent training (for example, the next training operation) is based.
The embodiment of the present application does not limit the method for constructing the initial sample pairs. For example, the initial sample pairs may be randomly generated, that is, any two training samples of the same class may be used as a positive sample pair, and any two training samples of different classes may be used as a negative sample pair.
It is to be understood that "positive" and "negative" in the positive sample pairs and negative sample pairs of the embodiments of the present application are relative concepts: for a given sample, a sample belonging to the same class may serve as a positive sample of that sample, and a sample belonging to a different class may serve as a negative sample of that sample. In the sample groups described later, the positive sample and the negative sample in a sample group are likewise defined relative to the anchor point of the group: a sample in the group belonging to the same class as the anchor point is a positive sample of the anchor point, and a sample belonging to a different class is a negative sample of the anchor point.
Step S130: and repeatedly executing training operation on the neural network model based on the training set until the preset condition is met, and taking the neural network model meeting the preset condition as a trained feature extraction model.
The preset condition is a training end condition of the model, and may include, but is not limited to, total loss convergence of training corresponding to the neural network model or the number of times of training reaches a set number of times, and the flow of the training operation is as shown in fig. 4, and may include the following steps S131 to S133.
Step S131: respectively inputting each training sample in the plurality of sample pairs into a neural network model to obtain a feature vector of each training sample;
step S132: determining a total loss of training based on a first similarity between feature vectors of training samples in each sample pair;
step S133: if the total training loss is not converged and the training times do not reach the set times, adjusting model parameters of the neural network model, determining a plurality of new sample pairs based on the second similarity between the feature vectors of the training samples, and taking the new sample pairs as the sample pairs based on the subsequent training operation.
It can be understood that, since the two training samples in a positive sample pair belong to the same class while the two training samples in a negative sample pair belong to different classes, the model is trained so that the similarity between the feature vectors of the two samples in a positive sample pair (which may be referred to as the similarity of the sample pair) is as high as possible, and the similarity between the feature vectors of the two samples in a negative sample pair is as low as possible. The total training loss therefore characterizes the dissimilarity of each positive sample pair and the similarity of each negative sample pair input into the model during the current training operation. Optionally, for each sample pair, the training loss corresponding to the sample pair may be calculated according to the first similarity between the feature vectors of its two samples, and the total training loss of the model may then be obtained from the training losses of all sample pairs, for example, by taking their sum or average.
The embodiment of the present application does not limit the specific calculation manner of the first similarity; in principle, any conventional manner of calculating the similarity between two feature vectors may be adopted. For example, the Euclidean distance between two feature vectors may be calculated and used to represent the first similarity, where a larger distance indicates a lower similarity.
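As a minimal sketch of such a distance-based similarity (plain NumPy, with illustrative function names not taken from the source), the first similarity can be derived from the Euclidean distance between two feature vectors, with cosine similarity as one alternative measure:

```python
import numpy as np

def euclidean_distance(u, v):
    # L2 distance between two feature vectors;
    # a larger distance represents a lower first similarity
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))

def cosine_similarity(u, v):
    # alternative similarity measure in [-1, 1]
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Either measure can serve as the first similarity; the choice only changes whether "more similar" means a smaller distance or a larger cosine value.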
In order to obtain a feature extraction network meeting the actual application requirements, an initial neural network model needs to be continuously trained based on training samples in a training set until a preset training end condition, namely the preset condition, is met.
The specific neural network structure of the feature extraction model is not limited in the embodiment of the present application, and may be selected according to practical application requirements, and may be a feature extraction model based on any neural network structure for feature extraction, for example, the feature extraction model may include but is not limited to a feature extraction model based on a convolutional neural network or a feature extraction model based on a cyclic neural network.
When the neural network model is trained continuously, the preset condition for ending training may be configured according to actual requirements, and is not limited by the embodiment of the present application. For example, the preset condition may include the total training loss satisfying a certain condition, that is, the total training loss converging (a convergence criterion may be set as required, for example, at least one of the total training loss being less than a set value, or the difference between the total training losses of two consecutive training operations being less than a set threshold), or the number of training iterations reaching a set number. Optionally, the preset condition may further include at least one of a verification condition or a test condition; accordingly, a verification data set and/or a test data set may be preconfigured, and during training, whether the performance of the current model (the model after one or more training operations) meets the condition may be evaluated based on the verification data set and/or the test data set, with whether training needs to continue determined based on the evaluation result. It should be noted that the number of training iterations reaching the set number in the preset condition refers to the number of training epochs reaching a set number, that is, all samples in the training set have participated in training for the set number of epochs.
In the training mode provided in this embodiment of the present application, after each training operation, in addition to calculating the total loss of the model based on the first similarity of each positive sample pair (i.e., the similarity between the feature vectors of its two samples) and of each negative sample pair, the sample pairs are updated. Specifically, new positive sample pairs and new negative sample pairs may be re-determined according to the similarity between the feature vectors of the training samples output by the model, such that the similarity of each new positive sample pair is as low as possible and the similarity of each new negative sample pair is as high as possible. The similarity of the new positive sample pair corresponding to a training sample is not higher than that of the pre-update positive sample pair of the sample (the positive sample pair input into the model in the current training), and the similarity of the new negative sample pair corresponding to a training sample is not lower than that of the pre-update negative sample pair of the sample. That is, the newly determined positive and negative sample pairs are sample pairs that are difficult for the model. By re-determining the sample pairs used for subsequent training in each training operation in this way, the model learns from sample pairs that are as difficult as possible, thereby optimizing the model as much as possible and improving the performance of the finally trained feature extraction model.
Similarly, the embodiment of the present application does not limit the calculation manner of the second similarity, which may be configured according to actual requirements; for example, the similarity of a sample pair may be obtained by calculating the Euclidean distance between the feature vectors, the cosine similarity, or by other methods.
Optionally, the determining a plurality of new sample pairs based on the second similarity between the feature vectors of the training samples includes:
for each training sample, respectively determining second similarity between the feature vector of the training sample and the feature vector of each first sample, and taking the corresponding first sample with the lowest second similarity and the training sample as a new positive sample pair, wherein the first sample is a training sample belonging to the same category as the training sample in each training sample;
and for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each second sample, and taking the corresponding second sample with the highest second similarity and the training sample as a new negative sample pair, wherein the second sample is a training sample belonging to a different class from the training sample in each training sample.
In this alternative, for each training sample in the training set, the training sample and the most dissimilar training sample of the same class may be used as a positive sample pair, and the training sample and the most similar training sample of a different class may be used as a negative sample pair. That is, the most difficult sample combinations are used as training sample pairs for the neural network model to learn, so that the trained model can distinguish highly similar samples of different classes and can identify samples of the same class with very low similarity as the same class. In other words, when the feature extraction model trained by the method provided by the present application is used for feature extraction of data to be processed, even for two pieces of to-be-processed data of the same class with very low similarity, the similarity of the feature vectors output by the model is relatively high; and even for two pieces of to-be-processed data of different classes with high similarity, the similarity of the feature vectors output by the model is low.
It can be understood that, in practical implementation, in addition to constructing a new positive sample pair from the sample with the lowest similarity and a new negative sample pair from the sample with the highest similarity, the number of new sample pairs may be expanded based on the same principle. For example, for a training sample, at least two new positive sample pairs or negative sample pairs may be constructed: the similarity between the training sample and each other training sample of the same class may be calculated from the feature vectors output by the model, the other training samples ranked in order of similarity from low to high, and at least the first two samples (or a top portion selected according to a set proportion) each combined with the sample to construct a new positive sample pair; for example, the two samples with the lowest similarity may be selected and each combined with the sample to obtain two new positive sample pairs. Similarly, new negative sample pairs may be constructed in a like manner; for example, for a training sample, the two training samples of different classes with the highest similarity may be selected and each combined with the training sample to obtain a new negative sample pair. In this way, the number of training sample pairs can be expanded while ensuring that the newly determined sample pairs remain difficult to learn.
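The hard-pair update described above can be sketched as follows, a NumPy illustration under the assumption that Euclidean distance serves as the (inverse) second similarity; the function and variable names are hypothetical:

```python
import numpy as np

def hardest_pairs(features, labels):
    # For each sample, pick the least-similar same-class sample
    # (hard positive) and the most-similar different-class sample
    # (hard negative), measuring similarity by Euclidean distance.
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # pairwise distance matrix, shape (n, n)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    pos_pairs, neg_pairs = [], []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                  # exclude the sample itself
        diff = labels != labels[i]
        if same.any():
            # hard positive: same class, largest distance (lowest similarity)
            pos_pairs.append((i, int(np.where(same)[0][np.argmax(dists[i][same])])))
        if diff.any():
            # hard negative: different class, smallest distance (highest similarity)
            neg_pairs.append((i, int(np.where(diff)[0][np.argmin(dists[i][diff])])))
    return pos_pairs, neg_pairs
```

Extending to the top-two (or top-fraction) hardest samples per anchor, as discussed above, only requires replacing the argmax/argmin with a partial sort.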
As an alternative, the constructing a plurality of sample pairs based on the training samples in the training set may include:
respectively taking each training sample in the training set as an anchor point, and constructing a sample group corresponding to each anchor point, wherein the sample group corresponding to each anchor point comprises a positive sample pair and a negative sample pair corresponding to the anchor point, the positive sample pair corresponding to one anchor point comprises the anchor point and the positive sample of the anchor point, and the negative sample pair corresponding to one anchor point comprises the anchor point and the negative sample of the anchor point;
accordingly, the determining the total loss of training based on the first similarity between the feature vectors of the training samples in each sample pair may include:
for each sample group, determining training loss corresponding to the sample group according to a first similarity between the feature vectors of two samples in a positive sample pair of the sample group and a first similarity between the feature vectors of two samples in a negative sample pair of the sample group; determining the total training loss according to the training loss corresponding to each sample group;
the determining a plurality of new sample pairs based on the second similarity between the feature vectors of the training samples may include:
and determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples, and taking the sample pairs in the new sample group corresponding to each anchor point as a plurality of sample pairs in the subsequent training operation.
In this alternative, a sample set may be used for training of the neural network model. Each sample group is a triplet including an anchor point (i.e., a training sample), a positive sample corresponding to the anchor point, and a negative sample corresponding to the anchor point, i.e., the positive sample corresponding to the anchor point is a training sample belonging to the same class as the anchor point, and the negative sample corresponding to the anchor point is a training sample belonging to a different class from the anchor point, i.e., a sample group is two sample pairs formed by three training samples with reference to the anchor point.
For example, a random generation mode may be adopted, each training sample is respectively used as an anchor point, one of the training samples belonging to the same category as the anchor point may be randomly selected as a positive sample corresponding to the anchor point, and one of the training samples belonging to different categories as a negative sample corresponding to the anchor point may be randomly selected as a negative sample corresponding to the anchor point, so as to obtain a sample group corresponding to the anchor point.
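A minimal sketch of this random initial construction (Python standard library only; the names are illustrative, not from the source):

```python
import random

def random_triplets(labels, seed=0):
    # Build one (anchor, positive, negative) sample group per training
    # sample by random selection inside / outside the anchor's class.
    rng = random.Random(seed)
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)
    triplets = []
    for a, c in enumerate(labels):
        positives = [i for i in by_class[c] if i != a]
        negatives = [i for cls, idxs in by_class.items() if cls != c for i in idxs]
        if positives and negatives:
            triplets.append((a, rng.choice(positives), rng.choice(negatives)))
    return triplets
```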
For the neural network model, the purpose of training the model is to make the similarity between the feature vectors of the positive sample pairs in each sample group learned by the model as high as possible, and the similarity between the feature vectors of the negative sample pairs as low as possible, that is, to draw the distance between the positive sample pairs as close as possible and draw the distance between the negative sample pairs away, so that the learned feature vectors have good class distinction.
As a schematic illustration, fig. 5 shows the principle of feature extraction model training based on triplets. As shown in fig. 5, the length of the line segment between the anchor point and the positive sample represents the distance between them, as does the line segment between the anchor point and the negative sample, where a longer line segment indicates a lower similarity. The anchor point on the left side of the figure, with its distances to the corresponding positive and negative samples, may be understood as a sample group before training; the anchor point on the right side, with its distances to the positive and negative samples, may be understood as the distances between the feature vectors output by the trained feature extraction model. As can be seen from fig. 5, the purpose of training the model is to increase the similarity between the feature vectors of samples of the same class and reduce the similarity between the feature vectors of samples of different classes.
In order to achieve the above object, in the training process, for each sample group, a training loss corresponding to the sample group may be calculated based on a similarity between a pair of positive samples in the sample group (i.e., a similarity between an anchor point in the sample group and a feature vector of its corresponding positive sample) and a similarity between a pair of negative samples in the sample group, where the training loss corresponding to a sample group characterizes a difference between the pair of positive samples in the sample group and a difference between the pair of negative samples. In this embodiment, the form of the loss function of the neural network model is not limited, and for example, the loss function may include, but is not limited to, a triplet loss function, where the training loss corresponding to a triplet is calculated based on a distance between a positive sample pair (representing a similarity between the sample pairs) and a distance between a negative sample pair in the triplet.
Optionally, for each sample group, the determining the training loss corresponding to the sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group may include:
determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance represents a first similarity corresponding to the positive sample pair of the sample group, and the second distance represents a first similarity corresponding to the negative sample pair of the sample group;
determining a difference between the first distance and the second distance;
and determining the training loss corresponding to the sample group according to the difference, wherein the training loss corresponding to the sample group is positively correlated with the difference.
Since a larger distance between two feature vectors indicates that they are more dissimilar, and a smaller distance indicates that they are more similar, the similarity of a sample pair can be characterized by the distance between the feature vectors of its two training samples: the larger the distance, the lower the similarity. Because the goal of training is to make the similarity between the feature vectors of the positive sample pair in each sample group output by the model as high as possible and that of the negative sample pair as low as possible, that is, to pull the positive sample as close to the anchor point as possible and push the negative sample as far away as possible, the training loss corresponding to each sample group can be calculated based on the difference between the first distance (positive sample to anchor point) and the second distance (negative sample to anchor point). In the embodiment of the present application, the training loss corresponding to each sample group is positively correlated with this difference: the smaller the difference, the smaller the loss; and a smaller loss indicates a smaller distance between the positive sample and the anchor point and a larger distance between the negative sample and the anchor point.
As a comparison for the improved triplet loss function provided by the embodiment of the present application, the expression of the conventional triplet loss corresponding to a sample group (with the sample group obtained by random selection) is as follows:
(d(a, p) - d(a, n) + α)+        (1)
where d(a, p) represents the first distance between the anchor point and the positive sample in the sample group, d(a, n) represents the second distance between the anchor point and the negative sample in the sample group, α is a preset parameter value, and the subscript + indicates that when the value in parentheses is greater than or equal to 0, the loss equals that value, and when the value in parentheses is less than 0, the loss is 0.
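Expression (1) can be sketched directly (NumPy; the function name is illustrative, and α is the margin parameter):

```python
import numpy as np

def hinge_triplet_loss(anchor, positive, negative, alpha=0.2):
    # Conventional triplet loss per expression (1): hard cutoff at zero,
    # so triplets already satisfying the margin contribute zero loss.
    d_ap = np.linalg.norm(np.asarray(anchor, float) - np.asarray(positive, float))
    d_an = np.linalg.norm(np.asarray(anchor, float) - np.asarray(negative, float))
    return float(max(d_ap - d_an + alpha, 0.0))
```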
As can be seen from the above expression, in training with the conventional triplet loss function, for each triplet whose relationship determined based on the model output is already correct (i.e., d(a, p) - d(a, n) + α is less than 0), the loss corresponding to the triplet is directly set to 0; the loss is thus calculated in a hard-cutoff manner, and the triplets with d(a, p) - d(a, n) + α less than 0 are not considered. However, these triplets also influence the training result of the model: when d(a, p) - d(a, n) + α is negative, the distance of the positive sample pair in the triplet is already smaller than that of the negative sample pair, but if such a triplet is also given a training loss, the distance between the feature vectors of its positive sample pair and the distance between the feature vectors of its negative sample pair can still be optimized (i.e., the difference between them can still be increased), which also helps improve model performance. The existing triplet loss ignores the training loss corresponding to this part of the triplets.
In view of the above problem, in the scheme for calculating the training loss corresponding to the sample group provided in the embodiment of the present application, when the total training loss corresponding to the model is calculated, the training losses corresponding to all the sample groups input into the model are considered, and the training loss corresponding to each sample group is in positive correlation with a difference between a first distance between a positive sample pair and a second distance between a negative sample pair in the sample group, that is, the larger the value is, the larger the loss is, and the smaller the value is, the smaller the loss is. The difference and the loss may be in a linear variation relationship or a nonlinear variation relationship, and the specific form of the loss function may be designed and selected based on the principle of the scheme provided by the present application.
As an alternative, the training loss corresponding to each sample set is determined based on the following expression:
s(x) = ln(1 + e^x)        (2)
x = d(a, p) - d(a, n) + β        (3)
wherein s (x) represents the training loss corresponding to the sample group, a, p and n represent the anchor point, the positive sample and the negative sample in the sample group, respectively, d (a, p) represents the first distance, d (a, n) represents the second distance, and β represents the preset adjustment threshold.
As can be seen from expressions (2) and (3), in this alternative of the present application, the training loss corresponding to a sample group decays exponentially rather than being hard cut off. By calculating the total training loss of the model with this scheme and using the loss to constrain the parameter adjustment of the model, the trained model can better pull together samples of the same class in the feature embedding space, further improving model performance.
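A sketch of the smoothed loss of expressions (2) and (3) (NumPy and the math module; the function name is illustrative, and β is the preset adjustment threshold):

```python
import math
import numpy as np

def softplus_triplet_loss(anchor, positive, negative, beta=0.0):
    # Smoothed triplet loss: s(x) = ln(1 + e^x) with
    # x = d(a, p) - d(a, n) + beta. The loss decays smoothly
    # instead of cutting off at zero, so "already correct"
    # triplets still contribute a small loss.
    d_ap = np.linalg.norm(np.asarray(anchor, float) - np.asarray(positive, float))
    d_an = np.linalg.norm(np.asarray(anchor, float) - np.asarray(negative, float))
    x = d_ap - d_an + beta
    return math.log1p(math.exp(x))
```

Note that s(x) is strictly positive for any x, which is exactly the property distinguishing it from the hard-cutoff loss of expression (1).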
In the training method provided in the embodiment of the application, for each training operation, after the training loss corresponding to each sample group is calculated based on the feature vectors of the training samples output by the model, the total training loss may be calculated based on the training losses of all sample groups, for example, by taking their sum or average. It may then be determined whether the total training loss satisfies the preset condition; if it does, model training may be ended, and if it does not, the model parameters may be adjusted, for example, by a gradient descent algorithm, and the model is trained continuously based on new sample groups.
It will be appreciated that, in this alternative, determining new sample pairs based on the similarity between the feature vectors of the training samples output by the model means determining the new sample group corresponding to each anchor point, where the new sample group corresponding to an anchor point includes a new positive sample pair and a new negative sample pair corresponding to the anchor point.
Optionally, the determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples may include:
for each anchor point, respectively determining second similarity between the feature vector of the anchor point and the feature vector of each first sample, and determining the corresponding first sample with the lowest second similarity as a new positive sample corresponding to the anchor point, wherein the first sample is a training sample belonging to the same category as the anchor point in each training sample;
and for each anchor point, determining a second similarity between the feature vector of the anchor point and the feature vector of each second sample, and determining the corresponding second sample with the highest second similarity as a new negative sample corresponding to the anchor point, wherein the second sample is a training sample belonging to a different class from the anchor point in each training sample.
Similarly, in practical implementation, for each anchor point, one new sample group or a plurality of new sample groups may be determined. Specifically, for an anchor point, at least one sample with the lowest similarity to the anchor point among the training samples of the same class may be taken as a positive sample of the anchor point, each positive sample being combined with the anchor point to obtain a positive sample pair; at least one sample with the highest similarity to the anchor point among the training samples of different classes may be taken as a negative sample of the anchor point, each negative sample being combined with the anchor point to obtain a negative sample pair; and each positive sample pair may be combined with a negative sample pair of the anchor point to obtain a sample group. For example, with two positive samples and one negative sample, two positive sample pairs and one negative sample pair are obtained, and combining each positive sample pair with the negative sample pair yields two sample groups corresponding to the anchor point. With the scheme provided by the embodiment of the present application, the sample groups can be expanded conveniently and quickly while ensuring that the expanded sample groups remain difficult sample combinations; training the neural network model with these combinations can further improve the expressive power of the feature vectors output by the trained model.
As an alternative, taking each training sample in the training set as an anchor point, and constructing a sample group corresponding to each anchor point may include:
constructing at least one batch data set according to the training set, wherein each batch data set comprises p classes of training samples, the number of the training samples of each class is k, p is more than or equal to 2, and k is more than or equal to 3;
for each batch of data sets, respectively taking each training sample in the batch of data sets as an anchor point, and constructing a sample group corresponding to each anchor point in the batch of data sets based on each training sample in the batch of data sets;
wherein, the repeatedly executing training operation on the neural network model based on the training set may include:
repeatedly executing training operation on the neural network model based on each batch of data sets, wherein each training operation is performed based on a sample group corresponding to each anchor point in one batch of data sets;
correspondingly, the determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples includes:
and for each anchor point in the batch of data sets corresponding to the current training operation, determining a new sample group corresponding to the anchor point according to the second similarity between the anchor point and each training sample except the anchor point in the batch of data sets.
For model training, because the number of training samples in the training set is usually large, using all samples in the training set for every iteration of training incurs a large computational overhead. To address this, the training set is divided into a plurality of batch data sets (a batch data set is also referred to as a batch), and each iteration of training may be performed based on one batch data set. This scheme reduces the amount of computation required to calculate the total training loss each time and helps ensure stable convergence of the model's loss function.
In the embodiment of the present application, the size of a batch data set (that is, the total number of training samples in it) is not limited, as long as each batch data set contains training samples of at least two classes, with no fewer than three training samples per class. After the training set is divided into a plurality of batch data sets, the initial sample groups can be constructed per batch data set; that is, the three training samples in each constructed sample group come from the same batch data set. Similarly, when a new sample group is determined during training, the new sample group corresponding to each anchor point in a batch data set is also determined from that same batch data set.
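To make the p-class / k-samples-per-class batch construction and the initial sample groups concrete, the following sketch builds one batch and an initial triplet for each anchor. The helper names and the random sampling policy are illustrative assumptions, not from the source; any policy satisfying p ≥ 2 and k ≥ 3 would do.

```python
import random

def build_pk_batch(samples_by_class, p=2, k=3, rng=random):
    """Draw p classes and k samples per class (p >= 2, k >= 3),
    yielding a batch of p * k (class, sample) entries."""
    classes = rng.sample(sorted(samples_by_class), p)
    return [(c, s) for c in classes
            for s in rng.sample(samples_by_class[c], k)]

def initial_triplets(batch):
    """Use every sample in the batch as an anchor once, pairing it with
    one same-class positive and one different-class negative drawn from
    the same batch."""
    triplets = []
    for i, (ca, a) in enumerate(batch):
        pos = next(s for j, (c, s) in enumerate(batch) if j != i and c == ca)
        neg = next(s for c, s in batch if c != ca)
        triplets.append((a, pos, neg))
    return triplets
```

Because k ≥ 3, every anchor always has a same-class positive other than itself, and because p ≥ 2 a negative always exists within the batch.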
It will be appreciated that, when performing model training operations with batch data sets, each batch data set participates in the training of the model multiple times. One iterative training is carried out based on one batch data set: each sample in the batch data set is input into the neural network model to obtain its feature vector; based on these output vectors, the training loss corresponding to each sample group in the batch data set can be calculated, giving the total training loss of the model. If the total training loss does not meet the preset condition, the sample group corresponding to each anchor point can be updated based on the similarities among the feature vectors of all samples output by the model to obtain new sample groups, and each obtained new sample group is used as a sample group of that batch data set for the subsequent training operation.
When the model is trained based on a batch data set, the total training loss corresponding to the model (i.e., the training loss over all triplets in the batch data set) may be expressed as follows:

L_th = Σ_α s(x_α)

s(x) = ln(1 + e^x)

x_α = d(α, p_α) - d(α, n_α) + β

wherein L_th represents the total training loss, and s(x_α) represents the training loss corresponding to an anchor point α in the batch data set, that is, the training loss of the triplet corresponding to the anchor point α; the summation runs over all anchors in the batch. A batch data set may contain p × k triplets, since each sample in the batch data set is used in turn as an anchor to construct the triplet corresponding to that anchor. d(α, p_α) represents the distance between the feature vectors of the positive sample pair in the triplet corresponding to the anchor point α, d(α, n_α) represents the distance between the feature vectors of the negative sample pair in that triplet, and β represents a preset adjustment threshold. It will be appreciated that the first time a batch data set is input into the model, d(α, p_α) and d(α, n_α) are the distances, calculated from the output of the model, corresponding to the sample pairs in the initial triplet of the anchor point α; in subsequent inputs into the model, they are the distances corresponding to the sample pairs in the new triplet of the anchor point α (i.e., the hardest sample combination).
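The per-triplet loss s(x) = ln(1 + e^x) with x = d(a, p) - d(a, n) + β, and its summation over all triplets of a batch, can be sketched in plain Python (function names are illustrative, not from the source):

```python
import math

def triplet_soft_loss(d_ap, d_an, beta=0.0):
    """s(x) = ln(1 + e^x) with x = d(a, p) - d(a, n) + beta: a smooth,
    always-positive surrogate for the hinge max(0, x)."""
    x = d_ap - d_an + beta
    return math.log1p(math.exp(x))

def batch_total_loss(pair_distances, beta=0.0):
    """Sum the per-anchor triplet losses over all triplets in a batch;
    pair_distances is a sequence of (d(a,p), d(a,n)) tuples."""
    return sum(triplet_soft_loss(d_ap, d_an, beta)
               for d_ap, d_an in pair_distances)
```

Note the loss shrinks toward zero as the negative pair becomes much farther than the positive pair (x very negative) and grows without bound in the opposite case, which is what drives positives together and negatives apart.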
It should be noted that, in practical application, assuming the nth training operation is performed based on batch data set 1, then after the nth training operation is completed and the new sample groups corresponding to batch data set 1 are obtained, the (n+1)th operation may be performed based on those new sample groups, or based on the original sample groups of other batch data sets or the new sample groups of other batch data sets. That is to say, which batch data set each training operation is based on is not limited in the embodiments of the present application, as long as each batch data set can participate in the training of the model multiple times. For example, the minimum number of times each batch data set participates in training may be preset, and the participation counts of the batch data sets may be kept approximately equal, so that the number of times each training sample in the training set participates in the training of the model is substantially balanced.
Optionally, assuming the training set is divided into 3 batch data sets, denoted S1, S2, and S3, the model may first be trained sequentially using the initial triplets in S1, S2, and S3 (once all three batch data sets have each participated in one training operation, one generation of training is completed), and then training operations may be performed sequentially using the corresponding updated triplets in S1, S2, and S3; by continuously repeating this, a trained feature extraction model satisfying the preset condition is obtained.
The trained feature extraction model obtained with the training method of the embodiment of the present application can, when performing feature extraction on the data to be processed input into it, extract feature vectors with good distinctiveness, so that further processing of the data to be processed can be realized based on those vectors: for example, identifying the type of the data to be processed, judging the degree of similarity between different data based on their feature vectors, or classifying data sets, and the like. The data to be processed may take various forms, including but not limited to text, characters (such as numerical values), images, or other forms of data. The trained feature extraction model can be applied to any scene that requires feature vectors with good discriminating capability; for example, it can serve as the feature extraction module in other models (such as a classification model or a similarity judgment model).
In order to test the effect of the feature extraction model obtained with the training method of the embodiment of the present application, after obtaining the feature extraction model (i.e., the embedding model) that meets the preset conditions based on the training set, the model was tested on a cross-time test set: the hidden vectors (i.e., feature vectors) of the test data were extracted, and the test results show that the IV value (Information Value, used to measure the predictive power of features) of the model reaches 2.7. In addition, on the cross-time test set, the feature vectors extracted by this feature extraction model were concatenated with the feature vectors extracted by a feature extraction model trained in the prior art and fed into a conventional machine learning model such as xgboost to test the classification effect of the features on a classification task; the test results show that the AUC (Area Under the ROC Curve) index of the test set under the xgboost model can be improved by 5 percentage points, and the KS value (a model evaluation index) by 4 percentage points. Experiments thus prove that training a feature extraction model for a specific scene with the training method of the embodiment of the present application yields hidden-vector features with good discrimination for classification tasks.

An embodiment of the present application further provides a data processing method. As shown in fig. 6, the method may include the following steps:
step S210: acquiring data to be processed;
step S220: inputting data to be processed into a first feature extraction model, and extracting a first feature vector of the data to be processed through the first feature extraction model, wherein the first feature extraction model is obtained by training through a training method in any optional embodiment of the application;
step S230: and determining a classification result corresponding to the data to be processed based on the first feature vector.
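The three steps above amount to a simple extract-then-classify pipeline, which can be sketched as follows. The `extract` and `classifier` callables are hypothetical stand-ins for the first feature extraction model and the classification module; they are not part of the source.

```python
def classify(data, extract, classifier):
    """Steps S210-S230: given data to be processed, extract its feature
    vector with the feature extraction model, then map it to a class."""
    feature_vec = extract(data)      # step S220
    return classifier(feature_vec)   # step S230
```

In practice `extract` would be the trained neural network and `classifier` a downstream model; here they can be any callables with matching shapes.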
The form of the data to be processed is not limited in the embodiment of the present application, and may differ across application scenarios. For example, the data to be processed may be text, an image, or a feature matrix obtained by processing service data. For different application scenarios and requirements, the form of the data processing result may also differ: if the goal is to identify the type of the data to be processed, the data processing result is the corresponding data classification result; if the goal is to determine whether the data to be processed is similar to other data, the similarity may be calculated based on the first feature vector of the data to be processed and the feature vectors of the other data, and the data processing result is the calculated similarity or a similarity judgment determined from the similarity and a set threshold. In this embodiment, the description takes as an example the application of the feature extraction model, trained with the training method provided in the embodiment of the present application, to data classification; using the feature vectors it extracts can effectively improve the accuracy of the classification result.
Optionally, the data processing method may further include: extracting a second feature vector of the data to be processed through a second feature extraction model;
correspondingly, in step S230, determining a data processing result corresponding to the data to be processed based on the first feature vector may include:
fusing the first feature vector and the second feature vector;
and determining a data processing result of the data to be processed based on the fused features.
In order to further improve the accuracy of the classification result, in the optional mode of the present application, feature vectors of the data to be processed, which are extracted by the multiple feature extraction models, may be fused, and the classification result may be determined by fusing the features. Tests prove that the feature vectors of the data to be processed extracted by the feature extraction model obtained by training by the training method provided by the embodiment of the application are fused with the feature vectors of the data to be processed extracted by other feature extraction models and then are used for classification tasks, so that the accuracy of the final classification result can be effectively improved.
The embodiment of the present application is not limited to the model structure and the training mode of the second feature extraction model, and the second feature extraction model may be a feature extraction model obtained by training in an existing training mode. The specific fusion mode of the feature vectors is not limited in the embodiment of the present application, and may include but is not limited to at least one of splicing, adding, or calculating an average value of the first feature vector and the second feature vector.
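The fusion modes named above (splicing, adding, or averaging the first and second feature vectors) can be sketched in one small function. This is a minimal illustration; the function name and list-based vector representation are assumptions, and a real implementation would operate on tensors.

```python
def fuse_features(v1, v2, mode="concat"):
    """Fuse two feature vectors by one of the modes named in the text:
    splicing (concat), element-wise addition, or element-wise average."""
    if mode == "concat":
        return list(v1) + list(v2)
    if len(v1) != len(v2):
        raise ValueError("add/avg fusion requires equal dimensions")
    if mode == "add":
        return [a + b for a, b in zip(v1, v2)]
    if mode == "avg":
        return [(a + b) / 2 for a, b in zip(v1, v2)]
    raise ValueError(f"unknown fusion mode: {mode}")
```

Splicing preserves both vectors in full (at the cost of a larger dimension), while adding or averaging requires the two extractors to emit vectors of the same size.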
In an optional embodiment of the application, the acquiring the data to be processed may include:
acquiring service data of a target service in multiple time periods corresponding to a target object, wherein the service data corresponding to each time period comprises an attribute value of at least one service attribute of the target service;
based on the service data corresponding to the multiple time periods, constructing a service time-series feature matrix corresponding to the target object, and using the service time-series feature matrix as the data to be processed, wherein the classification result represents the object type of the target object.
The number of rows of the service time sequence characteristic matrix is the number of time periods of a plurality of time periods, the number of columns is the number of attributes of at least one service attribute, and each element value in the service time sequence characteristic matrix represents an attribute value of one service attribute corresponding to one time period.
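A minimal sketch of building such a matrix follows: one row per time period, one column per service attribute, with each cell holding the attribute value for that period. The record layout (a dict keyed by (period, attribute)) and the fill value for missing entries are assumptions for illustration.

```python
def build_timeseries_matrix(records, periods, attrs, fill=0.0):
    """Build the service time-series feature matrix described in the text:
    rows = time periods, columns = service attributes; `records` maps
    (period, attribute) -> attribute value, missing entries get `fill`."""
    return [[records.get((t, a), fill) for a in attrs] for t in periods]
```

The resulting matrix has len(periods) rows and len(attrs) columns, matching the row/column definition in the text.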
As can be seen from the application scenario embodiments provided above, the data processing method provided in the embodiments of the present application may be applied to type identification of a target object. Optionally, the classification result may be obtained through a classification model that contains the first feature extraction model: for example, the service time-series feature matrix may be input into the classification model, a first feature vector of the feature matrix extracted through the first feature extraction model, a second feature vector extracted through the second feature extraction model, the two feature vectors spliced together, and the classification result obtained through the classification module; the embodiments of the present application do not limit the specific structure of the classification module. Alternatively, the classification model may include, but is not limited to, one based on a conventional machine learning model such as xgboost.
Based on the same principle as the training method provided in the embodiment of the present application, the embodiment of the present application further provides a training apparatus for feature extraction models, as shown in fig. 7, the training apparatus 100 includes a training data obtaining module 110, a training data processing module 120, and a model training module 130.
A training data obtaining module 110, configured to obtain a training set, where the training set includes training samples of multiple categories;
a training data processing module 120, configured to construct a plurality of sample pairs based on a training set, where the plurality of sample pairs include a plurality of positive sample pairs and a plurality of negative sample pairs, where a positive sample pair includes two training samples of a same category, and a negative sample pair includes two training samples of different categories;
a model training module 130, configured to repeatedly perform a training operation on the neural network model based on a training set until a preset condition is met, and use the neural network model meeting the preset condition as a trained feature extraction model; wherein, the preset condition includes that the total loss of training corresponding to the neural network model converges or the training frequency reaches the set frequency, and the training operation includes:
respectively inputting each training sample in the plurality of sample pairs into a neural network model to obtain a feature vector of each training sample; determining a total loss of training based on a first similarity between feature vectors of training samples in each sample pair; if the total training loss is not converged and the training times do not reach the set times, adjusting model parameters of the neural network model, determining a plurality of new sample pairs based on the second similarity between the feature vectors of the training samples, and taking the new sample pairs as the sample pairs based on the subsequent training operation.
Optionally, the model training module may be configured to: for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each first sample, and taking the corresponding first sample with the lowest second similarity and the training sample as a new positive sample pair, wherein the first sample is a training sample belonging to the same class as the training sample in each training sample; and for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each second sample, and taking the corresponding second sample with the highest second similarity and the training sample as a new negative sample pair, wherein the second sample is a training sample belonging to a different class from the training sample in each training sample.
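The hard-pair mining described above (the same-class sample with the lowest similarity becomes the new positive; the different-class sample with the highest similarity becomes the new negative) can be sketched as follows. Cosine similarity is used here as one possible second-similarity measure — the source does not fix a specific metric, so this choice and the helper names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mine_hard_pairs(embeddings, labels, idx):
    """For anchor idx: hardest positive = same-class sample with the
    LOWEST similarity to the anchor; hardest negative = different-class
    sample with the HIGHEST similarity. Returns their indices."""
    anchor = embeddings[idx]
    pos = min((j for j in range(len(labels))
               if j != idx and labels[j] == labels[idx]),
              key=lambda j: cosine(anchor, embeddings[j]))
    neg = max((j for j in range(len(labels)) if labels[j] != labels[idx]),
              key=lambda j: cosine(anchor, embeddings[j]))
    return pos, neg
```

Run per anchor after each forward pass, this yields the new sample pairs used by the subsequent training operation.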
Optionally, the training data processing module may be configured to: respectively taking each training sample in the training set as an anchor point, and constructing a sample group corresponding to each anchor point, wherein the sample group corresponding to each anchor point comprises a positive sample pair and a negative sample pair corresponding to the anchor point, the positive sample pair corresponding to one anchor point comprises the anchor point and the positive sample of the anchor point, and the negative sample pair corresponding to one anchor point comprises the anchor point and the negative sample of the anchor point;
accordingly, the model training module, in determining the total loss of training, may be configured to: for each sample group, determining training loss corresponding to the sample group according to a first similarity between the feature vectors of two samples in a positive sample pair of the sample group and a first similarity between the feature vectors of two samples in a negative sample pair of the sample group; determining the total training loss according to the training loss corresponding to each sample group;
the model training module, in determining a plurality of new sample pairs, may be configured to: and determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples, and taking the sample pairs in the new sample group corresponding to each anchor point as a plurality of sample pairs in the subsequent training operation.
Optionally, the model training module may be configured to, when determining a new sample group corresponding to each anchor point: for each anchor point, respectively determining second similarity between the feature vector of the anchor point and the feature vector of each first sample, and determining the corresponding first sample with the lowest second similarity as a new positive sample corresponding to the anchor point, wherein the first sample is a training sample belonging to the same category as the anchor point in each training sample; and for each anchor point, determining a second similarity between the feature vector of the anchor point and the feature vector of each second sample, and determining the corresponding second sample with the highest second similarity as a new negative sample corresponding to the anchor point, wherein the second sample is a training sample belonging to a different class from the anchor point in each training sample.
Optionally, the training data processing module may be configured to, when constructing the sample group corresponding to each anchor point: constructing at least one batch data set according to the training set, wherein each batch data set comprises p classes of training samples, the number of the training samples of each class is k, p is more than or equal to 2, and k is more than or equal to 3; for each batch of data sets, respectively taking each training sample in the batch of data sets as an anchor point, and constructing a sample group corresponding to each anchor point in the batch of data sets based on each training sample in the batch of data sets;
accordingly, the model training module may be configured to: repeatedly executing training operation on the neural network model based on each batch of data sets, wherein each training operation is performed based on a sample group corresponding to each anchor point in one batch of data sets;
the model training module, when determining the new sample set corresponding to each anchor point, may be configured to: and for each anchor point in the batch of data sets corresponding to the current training operation, determining a new sample group corresponding to the anchor point according to the second similarity between the anchor point and each training sample except the anchor point in the batch of data sets.
Optionally, for each sample group, the model training module when determining the training loss corresponding to the sample group may be configured to:
determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance represents a first similarity corresponding to the positive sample pair of the sample group, and the second distance represents a first similarity corresponding to the negative sample pair of the sample group;
determining a difference between the first distance and the second distance; and determining the training loss corresponding to the sample group according to the difference, wherein the training loss corresponding to the sample group is positively correlated with the difference.
Optionally, the training loss corresponding to each sample group is determined based on the following expression:
s(x)=ln(1+e x )
x=d(a,p)-d(a,n)+β
wherein s (x) represents the training loss corresponding to the sample group, a, p and n represent the anchor point, the positive sample and the negative sample in the sample group, respectively, d (a, p) represents the first distance, d (a, n) represents the second distance, and β represents the preset adjustment threshold.
Based on the same principle as the data processing method provided by the embodiment of the present application, the embodiment of the present application further provides a data processing apparatus based on a neural network model. As shown in fig. 8, the data processing apparatus 200 includes a data obtaining module 210 and a data processing module 220.
A data obtaining module 210, configured to obtain data to be processed;
the data processing module 220 is configured to input the data to be processed into the first feature extraction model, extract a first feature vector corresponding to the data to be processed through the first feature extraction model, and obtain a classification result corresponding to the data to be processed through the classification module based on the first feature vector, where the first feature extraction model is obtained by training using the training method provided in any optional embodiment of the present application.
Optionally, the data processing module 220 may be specifically configured to: extracting a second feature vector of the data to be processed through the second feature extraction model; fusing the first feature vector and the second feature vector; and determining a classification result corresponding to the data to be processed based on the fused features.
Optionally, the data obtaining module may be configured to: acquiring service data of a target service in multiple time periods corresponding to a target object, wherein the service data corresponding to each time period comprises an attribute value of at least one service attribute of the target service; based on the service data corresponding to a plurality of time periods, constructing a service characteristic matrix corresponding to the target object, taking the service characteristic matrix as data to be processed, and representing the object type of the target object by the classification result; the number of rows of the service characteristic matrix is the number of time periods of a plurality of time periods, the number of columns is the number of attributes of the at least one service attribute, and each element value in the service characteristic matrix represents an attribute value of one service attribute corresponding to one time period.
The apparatuses in the embodiments of the present application may perform the methods provided in the embodiments of the present application, and their implementation principles are similar. The actions performed by the modules in the apparatuses correspond to the steps in the corresponding methods; for detailed functional descriptions of the modules, reference may be made to the descriptions of the corresponding methods above, which are not repeated here. The training apparatus and the data processing apparatus provided by the embodiments of the present application may be any electronic device.
An electronic device is further provided in an embodiment of the present application, where the electronic device may include a memory, a processor, and a computer program stored in the memory, and the processor may implement the method in any optional embodiment of the present application when executing the computer program stored in the memory.
Optionally, fig. 9 shows a schematic structural diagram of an electronic device applicable to the embodiment of the present invention, as shown in fig. 9, the electronic device may be a server or a user terminal, and the electronic device may be configured to implement the method provided in any embodiment of the present invention.
As shown in fig. 9, the electronic device 2000 may include at least one processor 2001, a memory 2002, a communication module 2003, an input/output interface 2004 and the like, and optionally, the components may be connected via a bus 2005 for communication. It should be noted that the structure of the electronic device 2000 shown in fig. 9 is only schematic and does not limit the electronic device to which the method provided in the embodiment of the present application is applied.
The memory 2002 may be used to store an operating system, application programs, and the like, and the application programs may include computer programs that implement the methods illustrated in the embodiments of the present invention when called by the processor 2001, and may also include programs for implementing other functions or services. The Memory 2002 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and computer programs, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The processor 2001 is connected to the memory 2002 via the bus 2005, and realizes a corresponding function by calling an application program stored in the memory 2002. The Processor 2001 may be a CPU (Central Processing Unit), a general purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the present disclosure. The processor 2001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs and microprocessors, and the like.
The electronic device 2000 may be connected to a network through a communication module 2003 (which may include, but is not limited to, components such as a network interface, etc.) to enable interaction of data with other devices (such as a user terminal or a server, etc.) through the network, such as sending data to or receiving data from other devices. The communication module 2003 may include a wired network interface, a wireless network interface, and/or the like, that is, the communication module may include at least one of a wired communication module or a wireless communication module.
The electronic device 2000 may be connected to a desired input/output device, such as a keyboard, a display device, etc., through the input/output interface 2004, and the electronic device 2000 may have a display device itself, and may be connected to other display devices through the interface 2004. Optionally, a storage device, such as a hard disk, may be connected through the interface 2004, so as to store data in the electronic device 2000 in the storage device, or read data in the storage device, and store data in the storage device in the memory 2002. It is to be appreciated that the input/output interface 2004 can be a wired interface or a wireless interface. Depending on the actual application scenario, the device connected to the input/output interface 2004 may be a component of the electronic device 2000, or may be an external device connected to the electronic device 2000 when necessary.
The bus 2005 used to connect the components may include a path that carries information between the components. The bus 2005 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2005 can be classified into an address bus, a data bus, a control bus, and the like according to functions.
Alternatively, for the solution provided in the embodiment of the present invention, the memory 2002 may be used for storing a computer program for executing the solution of the present invention, and the processor 2001 executes the computer program to implement the actions of the method or apparatus provided in the embodiment of the present invention.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program can implement the corresponding content of the foregoing method embodiment.
The embodiment of the present application further provides a computer program product, which includes a computer program that, when being executed by a processor, can implement the corresponding content of the foregoing method embodiment.
It should be noted that the terms "first," "second," "third," "fourth," "1," "2," and the like (if any) in the description and claims of this application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing describes only optional implementations of some implementation scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application, adopted without departing from that technical idea, also fall within the protection scope of the embodiments of the present application.

Claims (15)

1. A training method of a feature extraction model is characterized by comprising the following steps:
acquiring a training set, wherein the training set comprises training samples of a plurality of categories;
constructing a plurality of sample pairs based on the training set, the plurality of sample pairs comprising a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises two training samples of the same class, and each negative sample pair comprises two training samples of different classes;
repeatedly executing training operation on the neural network model based on the training set until a preset condition is met, and taking the neural network model meeting the preset condition as a trained feature extraction model; wherein the preset condition includes that the total loss of training corresponding to the neural network model converges or the number of training times reaches a set number of times, and the training operation includes:
respectively inputting each training sample in a plurality of sample pairs into the neural network model to obtain a feature vector of each training sample;
determining a total training loss based on a first similarity between feature vectors of training samples in each of the sample pairs;
if the total training loss has not converged and the number of training times has not reached the set number of times, adjusting model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the respective training samples, and using the new sample pairs as the sample pairs on which the subsequent training operation is based.
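The training operation in claim 1 alternates between computing a pair-based loss over the current sample pairs and re-mining new pairs from the current feature vectors. The following is a high-level sketch of that loop; `model`, `loss_fn`, and `mine_pairs` are illustrative assumptions, not part of the patent:

```python
def train(model, samples, pairs, loss_fn, mine_pairs,
          max_steps=100, tol=1e-4):
    """Repeat the training operation until the total loss converges or the
    step budget is exhausted (sketch; all callables are assumptions)."""
    prev_loss = float("inf")
    for step in range(max_steps):
        feats = [model(s) for s in samples]          # feature vector per training sample
        total_loss = sum(loss_fn(feats[i], feats[j], is_pos)
                         for i, j, is_pos in pairs)  # loss from first similarity
        if abs(prev_loss - total_loss) < tol:        # preset condition met
            break
        model.step(total_loss)                       # adjust model parameters
        pairs = mine_pairs(feats)                    # new pairs from second similarity
        prev_loss = total_loss
    return model
```

The key property the claim relies on is that `mine_pairs` runs inside the loop, so the pairing adapts to the evolving feature space rather than being fixed up front.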
2. The method of claim 1, wherein determining a plurality of new sample pairs based on the second similarity between the feature vectors of the respective training samples comprises:
for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each first sample, and taking the corresponding first sample with the lowest second similarity and the training sample as a new positive sample pair, wherein the first sample is a training sample belonging to the same class as the training sample in each training sample;
and for each training sample, respectively determining a second similarity between the feature vector of the training sample and the feature vector of each second sample, and taking the corresponding second sample with the highest second similarity and the training sample as a new negative sample pair, wherein the second sample is a training sample belonging to a different class from the training sample in each training sample.
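Claim 2 describes online hard-example mining: after each training operation, every sample is re-paired with its least-similar same-class sample (hard positive) and its most-similar different-class sample (hard negative). A minimal NumPy sketch of that re-pairing step, assuming cosine similarity as the second similarity and at least two samples per class (the array and function names are illustrative):

```python
import numpy as np

def mine_hard_pairs(features, labels):
    """For each sample, pick the least-similar same-class sample as its new
    positive and the most-similar different-class sample as its new negative.
    Sketch only; assumes each class has at least two samples present."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                           # second similarity between all feature vectors
    n = len(labels)
    pos_idx, neg_idx = [], []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        diff = labels != labels[i]
        # hard positive: same class, lowest second similarity
        pos_idx.append(np.flatnonzero(same)[np.argmin(sim[i][same])])
        # hard negative: different class, highest second similarity
        neg_idx.append(np.flatnonzero(diff)[np.argmax(sim[i][diff])])
    return np.array(pos_idx), np.array(neg_idx)
```

The effect is that later training operations concentrate on the pairs the model currently finds hardest, instead of re-using pairs that are already well separated.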
3. The method of claim 1, wherein constructing a plurality of sample pairs based on training samples in the training set comprises:
respectively taking each training sample in the training set as an anchor point, and constructing a sample group corresponding to each anchor point, wherein the sample group corresponding to each anchor point comprises a positive sample pair and a negative sample pair corresponding to the anchor point, the positive sample pair corresponding to one anchor point comprises the anchor point and the positive sample of the anchor point, and the negative sample pair corresponding to one anchor point comprises the anchor point and the negative sample of the anchor point;
determining a total training loss based on a first similarity between feature vectors of training samples in each of the sample pairs, comprising:
for each sample group, determining training loss corresponding to the sample group according to a first similarity between the feature vectors of two samples in a positive sample pair of the sample group and a first similarity between the feature vectors of two samples in a negative sample pair of the sample group;
determining a total training loss according to the training loss corresponding to each sample group;
determining a plurality of new sample pairs based on the second similarity between the feature vectors of the respective training samples, including:
and determining a new sample group corresponding to each anchor point based on the second similarity between the feature vectors of the training samples, and taking the sample pairs in the new sample group corresponding to each anchor point as a plurality of sample pairs in the subsequent training operation.
4. The method of claim 3, wherein determining a new set of samples corresponding to each anchor point based on the second similarity between the feature vectors of the training samples comprises:
for each anchor point, respectively determining second similarity between the feature vector of the anchor point and the feature vector of each first sample, and determining the corresponding first sample with the lowest second similarity as a new positive sample corresponding to the anchor point, wherein the first sample is a training sample belonging to the same category as the anchor point in each training sample;
and for each anchor point, determining a second similarity between the feature vector of the anchor point and the feature vector of each second sample, and determining the corresponding second sample with the highest second similarity as a new negative sample corresponding to the anchor point, wherein the second sample is a training sample belonging to a different category from the anchor point in each training sample.
5. The method according to claim 3 or 4, wherein the constructing a sample group corresponding to each anchor point by using each training sample in the training set as an anchor point respectively comprises:
constructing at least one batch data set according to the training set, wherein each batch data set comprises p classes of training samples, the number of the training samples of each class is k, p is more than or equal to 2, and k is more than or equal to 3;
for each batch of data sets, taking each training sample in the batch of data sets as an anchor point, and constructing a sample group corresponding to each anchor point in the batch of data sets based on each training sample in the batch of data sets;
the repeatedly performing training operations on the neural network model based on the training set includes:
repeatedly executing training operation on the neural network model based on each batch of data sets, wherein each training operation is performed based on a sample group corresponding to each anchor point in one batch of data sets;
determining a new sample set corresponding to each anchor point based on the second similarity between the feature vectors of the training samples, including:
and for each anchor point in the batch of data sets corresponding to the current training operation, determining a new sample group corresponding to the anchor point according to the second similarity between the anchor point and each training sample except the anchor point in the batch of data sets.
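The batch construction in claim 5 — each batch holding p ≥ 2 classes with k ≥ 3 training samples per class — resembles what the metric-learning literature calls PK sampling. One possible sketch of building such batches, with all names and the sampling policy being illustrative assumptions:

```python
import random
from collections import defaultdict

def build_batches(samples, labels, p=4, k=3, seed=0):
    """Group the training set into batches of p classes with k samples each
    (sketch; the parameters follow the claim, not any disclosed code)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    # only classes with at least k remaining samples can contribute to a batch
    classes = [c for c, items in by_class.items() if len(items) >= k]
    batches = []
    while len(classes) >= p:
        chosen = rng.sample(classes, p)
        batch = []
        for c in chosen:
            picked = rng.sample(by_class[c], k)
            batch.extend((x, c) for x in picked)
            for x in picked:
                by_class[c].remove(x)
        classes = [c for c in classes if len(by_class[c]) >= k]
        batches.append(batch)
    return batches
```

Keeping several samples per class in every batch is what makes the per-anchor sample groups of claims 3–5 constructible within a single batch.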
6. The method of claim 3 or 4, wherein for each sample group, determining the training loss corresponding to the sample group according to the first similarity between the feature vectors of the two samples in the positive sample pair of the sample group and the first similarity between the feature vectors of the two samples in the negative sample pair of the sample group comprises:
determining a first distance between the feature vectors of the two samples in the positive sample pair of the sample group and a second distance between the feature vectors of the two samples in the negative sample pair of the sample group, wherein the first distance characterizes a first similarity corresponding to the positive sample pair of the sample group, and the second distance characterizes a first similarity corresponding to the negative sample pair of the sample group;
determining a difference between the first distance and the second distance;
and determining the training loss corresponding to the sample group according to the difference, wherein the training loss corresponding to the sample group is positively correlated with the difference.
7. The method of claim 6, wherein the training loss for each of the sample sets is determined based on the following expression:
s(x) = ln(1 + e^x)
x = d(a, p) − d(a, n) + β
wherein s(x) represents the training loss corresponding to the sample group, a, p, and n represent the anchor point, the positive sample, and the negative sample in the sample group, respectively, d(a, p) represents the first distance, d(a, n) represents the second distance, and β represents a preset adjustment threshold.
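The loss in claim 7 is the softplus (soft-margin) form of the triplet loss: s(x) = ln(1 + e^x) with x = d(a, p) − d(a, n) + β, so the loss grows as the anchor-positive distance exceeds the anchor-negative distance by more than the threshold allows. A minimal sketch, assuming Euclidean distance for d and an illustrative β:

```python
import numpy as np

def triplet_soft_margin_loss(anchor, positive, negative, beta=0.1):
    """Soft-margin triplet loss from the claim:
    s(x) = ln(1 + e^x), x = d(a, p) - d(a, n) + beta.
    Euclidean distance is an assumption here, not fixed by the claim."""
    d_ap = np.linalg.norm(anchor - positive)   # first distance
    d_an = np.linalg.norm(anchor - negative)   # second distance
    x = d_ap - d_an + beta
    return np.log1p(np.exp(x))                 # ln(1 + e^x) via log1p
```

Unlike the hinge form max(0, x), the softplus is smooth everywhere and never exactly zero, so every sample group keeps contributing a (possibly tiny) gradient.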
8. A data processing method based on a neural network model is characterized by comprising the following steps:
acquiring data to be processed;
inputting the data to be processed into a first feature extraction model, and extracting a first feature vector corresponding to the data to be processed through the first feature extraction model, wherein the first feature extraction model is obtained by training by adopting the method of any one of claims 1 to 7;
and determining a classification result corresponding to the data to be processed based on the first feature vector.
9. The method of claim 8, further comprising:
extracting a second feature vector of the data to be processed through a second feature extraction model;
the determining, based on the first feature vector, a classification result corresponding to the data to be processed includes:
fusing the first feature vector and the second feature vector;
and determining a classification result corresponding to the data to be processed based on the fused features.
10. The method according to claim 8 or 9, wherein the acquiring the data to be processed comprises:
acquiring service data of a target service in multiple time periods corresponding to a target object, wherein the service data corresponding to each time period comprises an attribute value of at least one service attribute of the target service;
based on the service data corresponding to the plurality of time periods, constructing a service time sequence characteristic matrix corresponding to the target object, and taking the service time sequence characteristic matrix as the data to be processed, wherein the classification result represents the object type of the target object;
the number of rows of the service timing characteristic matrix is the number of time periods of the plurality of time periods, the number of columns is the number of attributes of the at least one service attribute, and each element value in the service timing characteristic matrix represents an attribute value of one service attribute corresponding to one time period.
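Claim 10 arranges the business data as a matrix whose rows are time periods and whose columns are service attributes, each element holding one attribute value for one period. A small sketch of that construction; the record layout and field names are illustrative assumptions:

```python
import numpy as np

def build_service_matrix(period_records, attribute_names):
    """Stack per-period attribute values into a
    (number of time periods) x (number of attributes) matrix,
    as described in the claim; the dict-based records are an assumption."""
    return np.array([[rec[name] for name in attribute_names]
                     for rec in period_records], dtype=float)
```

The resulting matrix is what the claim feeds to the first feature extraction model as the data to be processed.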
11. A training device for a feature extraction model, comprising:
the training data acquisition module is used for acquiring a training set, and the training set comprises a plurality of classes of training samples;
a training data processing module, configured to construct a plurality of sample pairs based on the training set, where the plurality of sample pairs include a plurality of positive sample pairs and a plurality of negative sample pairs, each positive sample pair including two training samples of the same category and each negative sample pair including two training samples of different categories;
the model training module is used for repeatedly executing training operation on the neural network model based on the training set until a preset condition is met, and taking the neural network model meeting the preset condition as a trained feature extraction model; wherein the preset condition includes that the total loss of training corresponding to the neural network model converges or the number of training times reaches a set number of times, and the training operation includes:
respectively inputting each training sample in a plurality of sample pairs into the neural network model to obtain a feature vector of each training sample;
determining, based on a first similarity between the feature vectors of the training samples in each of the sample pairs, a total training loss characterizing the degree of difference between the training samples in each positive sample pair and the degree of similarity between the training samples in each negative sample pair;
if the total training loss has not converged and the number of training times has not reached the set number of times, adjusting model parameters of the neural network model, determining a plurality of new sample pairs based on a second similarity between the feature vectors of the respective training samples, and using the new sample pairs as the sample pairs on which the subsequent training operation is based.
12. A data processing apparatus based on a neural network model, comprising:
the data acquisition module is used for acquiring data to be processed;
the data processing module is used for inputting the data to be processed into a first feature extraction model, extracting a first feature vector corresponding to the data to be processed through the first feature extraction model, and determining a classification result corresponding to the data to be processed based on the first feature vector;
wherein the first feature extraction model is trained using the method of any one of claims 1 to 7.
13. An electronic device, characterized in that the electronic device comprises a memory in which a computer program is stored and a processor that executes the computer program to implement the method of any of claims 1 to 7 or to implement the method of any of claims 8 to 10.
14. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 7 or carries out the method of any one of claims 8 to 10.
15. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7 or implements the method of any one of claims 8 to 10.
CN202210369228.2A 2022-04-08 2022-04-08 Training method of feature extraction model, data processing method, device and equipment Pending CN115130536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369228.2A CN115130536A (en) 2022-04-08 2022-04-08 Training method of feature extraction model, data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210369228.2A CN115130536A (en) 2022-04-08 2022-04-08 Training method of feature extraction model, data processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN115130536A 2022-09-30

Family

ID=83376396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369228.2A Pending CN115130536A (en) 2022-04-08 2022-04-08 Training method of feature extraction model, data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN115130536A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226674A (en) * 2023-05-06 2023-06-06 中国建筑西南设计研究院有限公司 Layout model training method, layout method and device for frame beams
CN116226674B (en) * 2023-05-06 2023-09-05 中国建筑西南设计研究院有限公司 Layout model training method, layout method and device for frame beams

Similar Documents

Publication Publication Date Title
CN111444952B (en) Sample recognition model generation method, device, computer equipment and storage medium
CN114265979B (en) Method for determining fusion parameters, information recommendation method and model training method
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN111444951B (en) Sample recognition model generation method, device, computer equipment and storage medium
CN110310114B (en) Object classification method, device, server and storage medium
CN111932386A (en) User account determining method and device, information pushing method and device, and electronic equipment
CN112257841A (en) Data processing method, device and equipment in graph neural network and storage medium
CN113449011A (en) Big data prediction-based information push updating method and big data prediction system
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN112817563B (en) Target attribute configuration information determining method, computer device, and storage medium
CN112667979A (en) Password generation method and device, password identification method and device, and electronic device
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN110222838B (en) Document sorting method and device, electronic equipment and storage medium
CN115100717A (en) Training method of feature extraction model, and cartoon object recognition method and device
CN115130536A (en) Training method of feature extraction model, data processing method, device and equipment
CN111582341B (en) User abnormal operation prediction method and device
CN110765352B (en) User interest identification method and device
CN112364198A (en) Cross-modal Hash retrieval method, terminal device and storage medium
CN112131199A (en) Log processing method, device, equipment and medium
CN114546804A (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN115186096A (en) Recognition method, device, medium and electronic equipment for specific type word segmentation
CN115114329A (en) Method and device for detecting data stream abnormity, electronic equipment and storage medium
CN116628236B (en) Method and device for delivering multimedia information, electronic equipment and storage medium
CN117786234B (en) Multimode resource recommendation method based on two-stage comparison learning
CN117078789B (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination