CN113255824B - Method and apparatus for training classification model and data classification


Info

Publication number
CN113255824B
CN113255824B (application CN202110658623.8A)
Authority
CN
China
Prior art keywords
sample
concept
network
category
representation
Prior art date
Legal status
Active
Application number
CN202110658623.8A
Other languages
Chinese (zh)
Other versions
CN113255824A
Inventor
詹忆冰
韩梦雅
Current Assignee
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN202110658623.8A
Publication of CN113255824A
Application granted
Publication of CN113255824B
Legal status: Active
Anticipated expiration

Classifications

    • G06F18/2415 (Pattern recognition; classification techniques relating to the classification model based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate)
    • G06F18/214 (Generating training patterns; bootstrap methods, e.g. bagging or boosting)
    • G06F18/23 (Clustering techniques)
    • G06N20/00 (Machine learning)


Abstract

Embodiments of the present disclosure disclose methods and apparatus for training classification models and for data classification. One implementation of the method comprises performing the following training steps: selecting at least one sample from a sample set; extracting a concept representation of each sample and a concept representation of each category with a concept representation network; calculating the prediction probability of the category to which each sample belongs from the distance between the concept representation of each sample and the concept representation of the category to which it belongs; calculating a total loss value from the prediction probabilities and the category labels; and, if the total loss value is less than a predetermined threshold, constructing a classification model based on the concept representation network. This embodiment enables robust, trusted knowledge of new classes to be learned from a limited number of labeled samples.

Description

Method and apparatus for training classification model and data classification
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for training classification models and data classification.
Background
Owing to its excellent ability to learn from data and its strong task performance, deep learning has been applied increasingly widely in daily life, work, and study, for example in face recognition and commodity retrieval. However, because of model complexity, deep learning typically requires massive amounts of labeled data collected for a given task in order to train a model with stable performance and high confidence.
However, in real-life scenarios it is often difficult to obtain large amounts of labeled data: 1) in some scenarios, such as commodity retrieval, a large amount of commodity data exists, but most of it carries no direct annotation, and manual annotation is expensive, time-consuming, and labor-intensive; 2) in other scenarios, such as medical ones, it is difficult to collect many samples for certain diseases (for a rare disease, data from only one patient may be available), so data diversity is insufficient and a deep model with good generalization cannot be trained from such data.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for training classification models and data classification.
In a first aspect, embodiments of the present disclosure provide a method of training a classification model, comprising performing the following training steps: selecting at least one sample from a sample set, wherein the samples in the sample set have category labels; extracting a concept representation of each sample and a concept representation of each category based on a concept representation network; calculating the prediction probability of the category to which each sample belongs according to the distance between the concept representation of each sample and the concept representation of the category to which it belongs; calculating a total loss value according to the prediction probabilities and the category labels; and, if the total loss value is less than a predetermined threshold, constructing a classification model based on the concept representation network.
In some embodiments, the method further comprises: if the total loss value is not smaller than the predetermined threshold, adjusting the relevant parameters of the concept representation network and continuing to execute the training steps.
In some embodiments, extracting the concept representation for each sample and the concept representation for each category based on the concept representation network comprises: extracting a concept representation of each sample based on a concept representation network; clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
In some embodiments, the concept characterization network includes a feature extraction network, a regional self-attention mechanism network, and a concept aggregation pooling network; and extracting a concept representation of each sample and a concept representation of each category based on the concept representation network, comprising: inputting at least one sample into a feature extraction network respectively to obtain the regional features of each sample; the regional characteristics of each sample are respectively input into a regional self-attention mechanism network to obtain the enhanced regional characteristics of each sample; respectively inputting the enhanced region characteristics of each sample into a concept aggregation pooling network to obtain concept characterization of each sample; clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
In some embodiments, the method further comprises: selecting a feature extraction network whose number of network layers is positively correlated with the computation required by the application field of the sample set.
In some embodiments, inputting the regional features of each sample into the regional self-attention mechanism network to obtain the enhanced regional features of each sample comprises: encoding the position information of the regional features of each sample to obtain a position code for each sample; computing the global average of the regional features of each sample to obtain global context information for each sample; combining the regional features, the position code, and the global context information of each sample into regional information for each sample; and inputting the regional information of each sample into the regional self-attention mechanism network to obtain the enhanced regional features of each sample.
In some embodiments, the respective input of the enhanced region features for each sample into the concept aggregation pooling network results in a concept representation for each sample, comprising: respectively inputting the enhanced region characteristics of each sample into an attention pooling network to obtain a first conceptual representation of each sample; respectively carrying out average pooling on the enhanced region characteristics of each sample to obtain a second conceptual representation of each sample; a weighted sum of the first and second conceptual representations of each sample is determined as the conceptual representation of each sample.
In some embodiments, the category labels are smoothed labels.
In a second aspect, embodiments of the present disclosure provide a data classification method, including: acquiring target data to be classified and at least one class of sample data set; the target data and the sample data set are input into the classification model generated by the method according to any one of the first aspect, and the prediction probability of the category to which the target data belongs is output.
In a third aspect, embodiments of the present disclosure provide an apparatus for training a classification model, comprising: a selection unit configured to select at least one sample from a sample set, wherein the samples in the sample set have category labels; an extraction unit configured to extract a concept representation of each sample and a concept representation of each category based on the concept representation network; a prediction unit configured to calculate a prediction probability of the category to which each sample belongs according to the distance between the concept representation of each sample and the concept representation of the category to which the sample belongs; a calculation unit configured to calculate a total loss value from the prediction probabilities and the category labels; and a loop unit configured to construct a classification model based on the concept representation network if the total loss value is less than a predetermined threshold.
In a fourth aspect, embodiments of the present disclosure provide a data classification apparatus, comprising: an acquisition unit configured to acquire target data to be classified and a sample data set of at least one category; a classification unit configured to input the target data and the sample data set into a classification model generated by the method according to any one of the first aspects, and output a prediction probability of a category to which the target data belongs.
In a fifth aspect, embodiments of the present disclosure provide an electronic device for outputting information, comprising: one or more processors; a storage device having one or more computer programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as in any of the first and second aspects.
In a sixth aspect, embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method as in any one of the first and second aspects.
According to the method and apparatus for training a classification model and for data classification, learning concept representations of individual samples and of categories increases the weight of concept-related information and suppresses the influence of background, noise, and sample-irrelevant information on the concept representation. This addresses the problem of how to robustly obtain the concept representation of a single datum in the small-sample setting when the data content contains concept-irrelevant information. The concept representation of an entire category is then obtained by aggregating the concept representations of individual data. Taking images as an example, local regions of an image are treated as its basic information units, and an adaptive method is used to identify and aggregate the regions that carry concept-related information.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram to which the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method of training a classification model according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method of training a classification model according to the present disclosure;
FIG. 4 is a schematic structural view of one embodiment of an apparatus for training a classification model according to the present disclosure;
FIG. 5 is a flow chart of one embodiment of a data classification method according to the present disclosure;
FIG. 6 is a schematic diagram of a structure of one embodiment of a data sorting apparatus according to the present disclosure;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 of a method of training a classification model, an apparatus of training a classification model, a method of data classification, or an apparatus of data classification to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing a communication link between the terminals 101, 102, the database server 104 and the server 105. The network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user 110 may interact with the server 105 via the network 103 using the terminals 101, 102 to receive or send messages or the like. The terminals 101, 102 may have various client applications installed thereon, such as model training class applications, data classification applications, shopping class applications, payment class applications, web browsers, instant messaging tools, and the like.
The terminals 101 and 102 may be hardware or software. When the terminals 101, 102 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop computers, and desktop computers. When the terminals 101, 102 are software, they may be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminals 101, 102 are hardware, an image acquisition device may also be mounted thereon. The image capturing device may be various devices capable of implementing the function of capturing images, such as a camera, a sensor, and the like. The user 110 may acquire images using an image acquisition device on the terminal 101, 102.
Database server 104 may be a database server that provides various services. For example, a database server may have stored therein a sample set. The sample set contains a small number of samples. Wherein the sample has a category label. Thus, the user 110 may also select samples from the sample set stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using the samples in the sample set sent by the terminals 101, 102, and may send the training results (e.g., the generated classification model) to the terminals 101, 102. In this way, the user can apply the generated classification model for data classification.
The database server 104 and the server 105 may be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present invention is not particularly limited herein. Database server 104 and server 105 may also be servers of a distributed system or servers that incorporate blockchains. Database server 104 and server 105 may also be cloud servers, or intelligent cloud computing servers or intelligent cloud hosts with artificial intelligence technology.
It should be noted that, the method for training the classification model or the method for classifying data provided by the embodiments of the present disclosure is generally performed by the server 105. Accordingly, means for training a classification model or means for data classification are typically also provided in the server 105.
It should be noted that the database server 104 may not be provided in the system architecture 100 in cases where the server 105 may implement the relevant functions of the database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a classification model according to the present disclosure is shown. The method of training a classification model may include the steps of:
at step 201, at least one sample is selected from a set of samples.
In this embodiment, the execution subject of the method of training the classification model (e.g., the server 105 shown in fig. 1) may acquire a sample set in various ways. For example, the executing entity may obtain the existing sample set stored therein from a database server (e.g., database server 104 shown in fig. 1) through a wired connection or a wireless connection. As another example, a user may collect a sample through a terminal (e.g., terminals 101, 102 shown in fig. 1). In this way, the executing body may receive samples collected by the terminal and store the samples locally, thereby generating a sample set.
Here, the sample set may include at least one sample, each sample having a category label, and may include samples of multiple categories. The number of samples per category need not be large, which is precisely how the small-sample model training problem can be addressed. The samples used to train a given model belong to the same domain; for example, a classification model for image data is trained on images with category labels. Different images may have different category labels, e.g., cat, dog, tree, car, etc.
In this embodiment, the executing body may select a batch of samples from the acquired sample set, and execute the training steps of steps 202 to 206. The selection manner and the selection number of the samples are not limited in this disclosure. For example, a batch of samples belonging to the same category may be randomly selected, or a batch of samples belonging to a plurality of categories may be randomly selected.
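As a rough illustration of this selection step (the function name, data layout, and per-class balancing are our own assumptions, not from the disclosure), a batch of labeled samples can be drawn from the sample set as follows:

```python
import random

def sample_batch(sample_set, n_classes=2, n_per_class=2, seed=0):
    """Select a small batch of labeled samples from the sample set.

    `sample_set` is assumed to be a list of (data, label) pairs; the
    disclosure does not fix the data layout, so this is illustrative.
    """
    rng = random.Random(seed)
    by_label = {}
    for data, label in sample_set:
        by_label.setdefault(label, []).append((data, label))
    # Randomly pick a few categories, then a few samples per category.
    chosen = rng.sample(sorted(by_label), min(n_classes, len(by_label)))
    batch = []
    for lab in chosen:
        pool = by_label[lab]
        batch.extend(rng.sample(pool, min(n_per_class, len(pool))))
    return batch
```

Drawing samples of a single category or of several categories, as the text notes, is just a matter of the `n_classes` argument.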
Step 202, extracting a concept representation of each sample and a concept representation of each category based on the concept representation network.
In this embodiment, the concept representation network is a neural network model for extracting concept representations from samples, for example a multilayer convolutional neural network (CNN) that extracts the feature information of a sample. Depending on the characteristics of different application fields, different network structures may be adopted to extract the feature information, such as a simple multilayer perceptron, an LSTM, or a Transformer. The larger the computation required in the application field, the more network layers the selected structure has.
The characteristic information of the sample can be directly used as conceptual characterization. A concept representation of each sample is extracted based on the concept representation network. And clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
Optionally, the extracted feature information can be enhanced before being used as the concept representation. The enhanced concept representation further strengthens trusted sample features and weakens untrusted ones, thereby improving the credibility of the model.
In some alternative implementations of the present embodiments, the concept characterization network includes a feature extraction network, a regional self-attention mechanism network, and a concept aggregation pooling network; and extracting a concept representation for each sample based on the concept representation network, comprising:
s2021, inputting the at least one sample into a feature extraction network respectively to obtain the regional features of each sample.
In this embodiment, the corresponding feature extraction network may be selected according to the domain of the sample set application. For example, the image field employs CNN as a feature extraction network.
Assume the information extracted from one image is characterized as follows:

R = F(x; θ) ∈ R^{h×w×c}

where x represents an image, F is a mapping function, and θ are its training parameters. h, w, c are the height, width, and feature dimension of the last-layer output, respectively. Each element of the CNN output corresponds to a local region of the image of interest, so an image can be divided into h·w blocks, each block characterized by a regional feature r_i ∈ R^c, where i = 1, 2, …, h·w indexes the region.
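The division of the last-layer feature map into h·w regional features can be sketched in a few lines (a minimal NumPy illustration; the (h, w, c) layout and row-major reshape are assumptions, not fixed by the disclosure):

```python
import numpy as np

def to_region_features(feature_map):
    """Flatten a CNN feature map of shape (h, w, c) into h*w regional
    features r_i in R^c, one per spatial block."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)
```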
S2022, the regional characteristics of each sample are respectively input into a regional self-attention mechanism network, so that the enhanced regional characteristics of each sample are obtained.
In this embodiment, an enhanced representation of each region is first obtained using a region-based self-attention mechanism network. Let all the r_i form a complete set R.
Since the regional self-attention mechanism network cannot by itself model regional position information, a new position-information code for the two-dimensional feature map is designed first. It uses two one-hot vectors to describe, respectively, which row and which column of the h×w block grid output by the CNN the region occupies: if the region is at the j-th position along the height, the j-th entry of the first one-hot vector is 1 and all other entries are 0; the second one-hot vector encodes the position along the width in the same way. The position information of the region in row i and column j, i.e., of the (i·w+j)-th region, is represented as

p_{i·w+j} = [onehot_h(i); onehot_w(j)]

where i denotes the position along the height h and j the position along the width w. The position code of the region is then obtained with a neural network, such as a simple linear transformation, defined as

P_{i·w+j} = W_p · p_{i·w+j}
Second, note that a self-attention network or Transformer layer requires multiple stacked layers to obtain global information and thus more robust results, while a multi-layer self-attention network or Transformer requires a large amount of training data, which a small-sample dataset cannot provide. The regional features together with the position codes of the regions may be used directly as the region features.
Optionally, in order to introduce global information in a shallow network, a global average feature is further introduced as global context information to obtain a better perception result. The global context information is defined as:

g = (1/(h·w)) Σ_{i=1}^{h·w} r_i

The regional features, the position codes of the regions, and the global context information together constitute new region information, defined as:

r_i^n = [r_i; P_i; g]
At this point, the self-attention mechanism provided by the Transformer layer lets the features of different regions perceive one another. The process is defined as:

R_e = FFN(Attention(R_n))

where the self-attention mechanism is defined as:

Attention(R) = softmax(Q K^T / √d) V, with Q = R W_Q, K = R W_K, V = R W_V

and the feed-forward neural network is defined as:

FFN(R) = W_2 σ(W_1 R + b_1) + b_2
To obtain a more robust result, the usual residual connections and Layer Normalization may be used. The self-attention mechanism and the feed-forward neural network are both prior art, and the related parameters in the formulas are common knowledge, so they are not described further. Residual connections and Layer Normalization can further improve the reliability of the classification model.
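A minimal single-head sketch of the enhancement step above (self-attention followed by the FFN, with residual connections and layer normalization; all weight matrices are assumed inputs rather than the disclosure's trained parameters, and single-head attention is a simplification):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(X, eps=1e-5):
    # Normalize each row to zero mean and unit variance (Layer Normalization).
    return (X - X.mean(-1, keepdims=True)) / np.sqrt(X.var(-1, keepdims=True) + eps)

def self_attention_block(R, Wq, Wk, Wv, W1, b1, W2, b2):
    """R has shape (n_regions, d). Returns the enhanced region features."""
    # Attention(R) = softmax(Q K^T / sqrt(d)) V
    Q, K, V = R @ Wq, R @ Wk, R @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V
    X = layer_norm(R + A)                          # residual + LayerNorm
    # FFN(X) = W2 sigma(W1 X + b1) + b2, with sigma = ReLU
    F = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
    return layer_norm(X + F)                       # residual + LayerNorm
```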
S2023, the enhancement region features of each sample are respectively input into a concept aggregation pooling network to obtain the concept representation of each sample.
In this embodiment, after the enhanced region representations are obtained, the concept representation of the final image may be obtained through a concept aggregation pooling network. The concept aggregation pooling network may use attention pooling or average pooling, or both simultaneously. Attention pooling is defined as follows:

G(R_e) = Σ_i a_i r_i^e

where

M = Sigmoid(Φ_2 σ(Φ_1 R_e))

M is a learned weight vector whose entries m_i lie in the range 0–1, and the attention weights a_i are obtained by the normalization a_i = m_i / Σ_j m_j. Here σ is the ReLU nonlinear activation, the same activation used in the feed-forward neural network.
The attention pooling mechanism highlights region information that carries concept information (trusted) and attenuates region information without concept information (untrusted). Trusted concept information can thus be used effectively, and the credibility of the model is not compromised even in a small-sample scenario.
However, attention pooling may focus too strongly on a few regions with strong concept information and ignore valuable information that other regions may provide; average pooling can therefore be further adopted as a complement to attention pooling, yielding a more robust aggregation result. The average pooling result G_avg(R_e) is likewise computed over R_e. The final concept aggregation result is:

e = G(R_e) + α · G_avg(R_e)

where α is a weight adjustment parameter that balances the importance of average pooling and attention pooling.
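The combined pooling step can be sketched as follows (a NumPy illustration; `Phi1`/`Phi2` stand for the learned projections Φ_1, Φ_2, and the exact way α mixes the two pooling results is a reconstruction, since the original formula is not legible in this text):

```python
import numpy as np

def concept_pool(R_e, Phi1, Phi2, alpha=0.5):
    """Combine attention pooling and average pooling of the enhanced
    region features R_e (shape n_regions x c)."""
    H = np.maximum(R_e @ Phi1, 0.0)            # sigma = ReLU
    m = 1.0 / (1.0 + np.exp(-(H @ Phi2)))      # per-region weights in (0, 1)
    a = m / m.sum()                            # normalized attention weights a_i
    attn = (a[:, None] * R_e).sum(axis=0)      # attention pooling G(R_e)
    avg = R_e.mean(axis=0)                     # average pooling G_avg(R_e)
    return attn + alpha * avg                  # weighted combination
```

With all projection weights at zero, every region gets the same weight and attention pooling degenerates to average pooling, which makes the complementarity of the two branches easy to check.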
Clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
In this embodiment, the concept representation of a category is defined as:

c_k = (1/|S_k|) Σ_{x_a ∈ S_k} e_a

where S_k denotes the set of samples of class k, |S_k| is the number of samples in that set, e_a is the concept representation of image x_a, and a is the index of the image in the sample set.
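This per-category averaging can be sketched directly (function name is ours; `embeddings` are the per-sample concept representations e_a):

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Average the concept representations of samples sharing a label:
    c_k = (1 / |S_k|) * sum of e_a over the samples of class k."""
    protos = {}
    for k in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        protos[k] = np.stack([embeddings[i] for i in idx]).mean(axis=0)
    return protos
```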
Step 203, calculating the prediction probability of the category to which each sample belongs according to the distance between the conceptual representation of each sample and the conceptual representation of the category to which each sample belongs.
In the present embodiment, the distance (including but not limited to Euclidean distance, cosine distance, etc.) between the concept representation of each sample and the concept representation of the category to which it belongs is calculated as the distance of each sample. The distance of each sample is then input into a distance metric function to obtain its prediction probability. This process may be implemented by a classifier.
For one sample feature e_q^b, the class prediction probability is as follows:

p(y = k | x_q^b) = exp(−D(e_q^b, c_k)) / Σ_{k'=1}^{N} exp(−D(e_q^b, c_{k'}))

where D denotes a distance function (e.g., the Euclidean distance), b is the index of the sample in the sample set, and q marks the query set; these indices only serve to distinguish samples. N is the total number of categories, x_q^b denotes a sample, and y its corresponding label.
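A small sketch of this softmax over negative distances (Euclidean distance is assumed here as one of the allowed choices):

```python
import numpy as np

def predict_proba(e_q, prototypes):
    """p(y = k | x_q) is proportional to exp(-D(e_q, c_k)), with D the
    Euclidean distance to each category's concept representation."""
    keys = sorted(prototypes)
    d = np.array([np.linalg.norm(e_q - prototypes[k]) for k in keys])
    z = -d - (-d).max()              # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return dict(zip(keys, p))
```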
Step 204, calculating the total loss value according to the prediction probability of the category to which each sample belongs and the category label.
In this embodiment, an aggregation loss function may be used to further enhance the robustness of the model, and the final loss is defined as:

L = -(1/n_q) · Σ_b ŷ_b · log p_b + β · L_agg

where L is the total loss value, β is the coefficient of the aggregation loss L_agg, ŷ_b may be either the actual sample label or a smoothed sample label (for example, if the class of an image is a dog, the hard label assigns it probability 1 of belonging to the dog class, while a smoothed label assigns probability 0.95 to the dog class and 0.05 to the cat class), p_b is the predicted probability for sample b, and n_q is the total number of samples in the sample set.
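A hedged sketch of this loss computation, assuming a label-smoothed cross-entropy as the classification term; the exact form of the aggregation loss is not specified in this passage, so it is passed in as a precomputed value, and `total_loss` is a hypothetical helper name.

```python
import numpy as np

def total_loss(probs, labels, num_classes, beta=0.1, agg_loss=0.0,
               smoothing=0.05):
    """Label-smoothed cross-entropy plus a weighted aggregation term.

    probs:    (n_q, num_classes) predicted class probabilities
    labels:   (n_q,) integer ground-truth labels
    beta:     coefficient of the aggregation loss
    agg_loss: value of the aggregation loss term (computed elsewhere)
    """
    n_q = probs.shape[0]
    # smoothed targets: 1 - smoothing on the true class, rest spread evenly
    smoothed = np.full((n_q, num_classes), smoothing / (num_classes - 1))
    smoothed[np.arange(n_q), labels] = 1.0 - smoothing  # e.g. dog: 0.95
    ce = -(smoothed * np.log(probs + 1e-12)).sum(axis=1).mean()
    return ce + beta * agg_loss
```

With two classes and `smoothing=0.05` the targets become exactly the 0.95/0.05 split from the dog/cat example above.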
Step 205, if the total loss value is less than the predetermined threshold, constructing a classification model based on the concept characterization network.
In this embodiment, the total loss value is used to represent the gap between the predicted result and the real tag. The smaller the total loss value, the closer the predicted result of the model is to the real label. If the total loss value is less than the predetermined threshold, indicating that the concept characterization network training is complete, the concept characterization network and the classifier may be determined as a classification model.
If the total loss value is not less than the predetermined threshold, the relevant parameters of the concept characterization network are adjusted, and steps 201-205 are performed again.
In this embodiment, if the execution subject determines that the concept characterization network is not fully trained, the relevant parameters in the concept characterization network may be adjusted, for example, by using back propagation to modify the weights of the convolutional layers in the concept characterization network. The process may then return to step 201 to re-select samples from the sample set, so that the training steps described above can be continued.
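The adjust-and-retry loop over steps 201-205 can be sketched as below. The `network` methods (`select`, `extract`, `predict`, `loss`, `backward`) are hypothetical stand-ins for the operations described above, not an actual API from the patent.

```python
def train_concept_network(sample_set, network, threshold, max_rounds=1000):
    """Training-loop skeleton for steps 201-205 (hypothetical helper names).

    Repeats: select samples, extract concept representations, predict,
    compute the total loss; stops once the loss drops below the threshold,
    otherwise adjusts the network parameters (e.g. via backpropagation).
    """
    for _ in range(max_rounds):
        samples, labels = network.select(sample_set)                # step 201
        sample_reps, class_reps = network.extract(samples, labels)  # step 202
        probs = network.predict(sample_reps, class_reps)            # step 203
        loss = network.loss(probs, labels)                          # step 204
        if loss < threshold:                                        # step 205
            return network               # training complete
        network.backward(loss)          # adjust parameters and loop again
    return network
```

The `max_rounds` cap is an added safeguard so the sketch terminates even if the threshold is never reached.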
According to the method for training a classification model provided by this embodiment, a single sample is first characterized by extracting basic local region information through a feature extraction network. The region self-attention mechanism network then fuses the local region representations with one another to obtain more reliable local region representations, which carry better conceptual information. Finally, all the local region information is aggregated through the concept aggregation pooling network to obtain the final sample concept representation.
The sample category representation is obtained by directly averaging the representations belonging to the same category; if a category has only one supporting sample, the concept representation of that sample is used directly as the concept representation of the category.
By learning robust concept representations of the data (or samples, or commodities), the data is split into local information descriptions, and a more robust concept representation is obtained through the self-attention mechanism and the pooling mechanism, thereby enabling more reliable classification with small numbers of samples.
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for training a classification model according to the present embodiment. In the application scenario of fig. 3, a sample set (i.e., the support set in the figure) is obtained, and 2 samples are selected from it: sample A and sample B, which share the same class label X. The 2 samples are input into a CNN for feature extraction to obtain the region features of sample A and the region features of sample B, and the region features, position codes, and global context information of sample A and sample B are each passed through a transformer layer to obtain the enhanced region features of sample A and the enhanced region features of sample B. The enhanced region features are respectively input into the concept aggregation pooling network to obtain the concept representation of sample A and the concept representation of sample B. The concept representations of sample A and sample B may be clustered to obtain the concept representation of class X. A distance D1 between the concept representation of sample A and the concept representation of class X is calculated, and the probability P1 that sample A belongs to class X is determined from D1. Likewise, a distance D2 between the concept representation of sample B and the concept representation of class X is calculated, and the probability P2 that sample B belongs to class X is determined from D2. A first loss value between P1 and the real label and a second loss value between P2 and the real label are calculated; the sum of the first loss value and the second loss value is the total loss value. If the total loss value is less than the predetermined threshold, the classification model training is complete.
Otherwise, samples are re-selected from the sample set and training continues. Once the classification model is trained, a query image together with samples A and B can be used as the input of the classification model to obtain the probability that the query image belongs to category X.
Referring to fig. 4, a flow 400 of one embodiment of a data classification method provided by the present disclosure is shown. The method of data classification may include the steps of:
step 401, obtaining target data to be classified and at least one class of sample data set.
In the present embodiment, the execution subject of the method of data classification (e.g., the server 105 shown in fig. 1) may acquire target data to be classified and sample data sets of at least one category in various ways. For example, the executing entity may obtain the sample data set stored therein from a database server (e.g., database server 104 shown in fig. 1) via a wired connection or a wireless connection. For another example, the executing body may also receive target data collected by a terminal (e.g., terminals 101, 102 shown in fig. 1) or other devices.
Step 402, inputting the target data and the sample data set into a classification model, and outputting the prediction probability of the category to which the target data belongs.
In this embodiment, the execution subject may input the target data and the sample data set acquired in step 401 into the classification model, so as to obtain the prediction probability of the category to which the target data belongs. For example, the probability that the target data belongs to a cat is 0.8 and the probability that it belongs to a dog is 0.2.
In this embodiment, the classification model may be generated using the method described above in connection with the embodiment of FIG. 2. The specific generation process may be referred to in the description of the embodiment of fig. 2, and will not be described herein.
It should be noted that the data classification method of the present embodiment may be used to test the classification models generated by the above embodiments, and the classification model may be further optimized continuously according to the test results. The method may also be a practical application of the classification models generated by the above embodiments. Classifying data with the classification models generated by the above embodiments improves the reliability of data classification.
With continued reference to FIG. 5, as an implementation of the method illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for training a classification model. The embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for training a classification model according to the present embodiment may include a selection unit 501, an extraction unit 502, a prediction unit 503, a calculation unit 504, and a circulation unit 505. The selection unit 501 is configured to select at least one sample from a sample set, wherein the samples in the sample set have category labels; the extraction unit 502 is configured to extract a concept representation of each sample and a concept representation of each category based on the concept representation network; the prediction unit 503 is configured to calculate the prediction probability of the category to which each sample belongs according to the distance between the conceptual representation of the sample and the conceptual representation of the category to which it belongs; the calculation unit 504 is configured to calculate a total loss value from the prediction probabilities and the category labels of the categories to which the samples belong; and the circulation unit 505 is configured to construct a classification model based on the concept characterization network if the total loss value is less than a predetermined threshold.
In this embodiment, the specific processing of the selection unit 501, the extraction unit 502, the prediction unit 503, the calculation unit 504, and the circulation unit 505 of the apparatus 500 for training the classification model may refer to steps 201 to 205 in the corresponding embodiment of fig. 2. In some optional implementations of the present embodiment, the circulation unit 505 is further configured to: if the total loss value is not less than the predetermined threshold, adjust the relevant parameters of the concept characterization network and notify the selection unit 501, the extraction unit 502, the prediction unit 503, the calculation unit 504, and the circulation unit 505 to continue executing steps 201-205.
In some optional implementations of the present embodiment, the extraction unit 502 is further configured to: extracting a concept representation of each sample based on a concept representation network; clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
In some optional implementations of the present embodiment, the concept characterization network includes a feature extraction network, a regional self-attention mechanism network, and a concept aggregation pooling network; and the extraction unit 502 is further configured to: inputting the at least one sample into a feature extraction network respectively to obtain the regional features of each sample; the regional characteristics of each sample are respectively input into a regional self-attention mechanism network to obtain the enhanced regional characteristics of each sample; and respectively inputting the enhanced region characteristics of each sample into a concept aggregation pooling network to obtain the concept representation of each sample.
In some optional implementations of the present embodiment, the extraction unit 502 is further configured to: select a feature extraction network whose number of network layers is positively correlated with the computation amount, according to the computation amount of the field in which the sample set is applied.
In some optional implementations of the present embodiment, the extraction unit 502 is further configured to: respectively encoding the position information of the regional characteristics of each sample to obtain the position code of each sample; calculating global average characteristics of the regional characteristics of each sample respectively to obtain global context information of each sample; forming the regional characteristics, the position codes and the global context information of each sample into regional information of each sample; and respectively inputting the regional information of each sample into a regional self-attention mechanism network to obtain the enhanced regional characteristics of each sample.
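A toy sketch of assembling the region information described above (region features, position codes, and global context). The patent does not fix the exact position-encoding scheme, so a simple one-hot position code is assumed here, and `build_region_info` is a hypothetical helper name.

```python
import numpy as np

def build_region_info(region_feats):
    """Form region information from features, position codes, global context.

    region_feats: (num_regions, dim) region features from the extraction network
    Returns (num_regions, dim*3): each region's features concatenated with
    its position code and the global average feature of all regions.
    """
    n, d = region_feats.shape
    # toy position code: one-hot over region index (wraps if n > d)
    pos = np.zeros((n, d))
    pos[np.arange(n), np.arange(n) % d] = 1.0
    # global context: the global average feature, repeated for every region
    ctx = np.broadcast_to(region_feats.mean(axis=0), (n, d))
    return np.concatenate([region_feats, pos, ctx], axis=1)
```

The concatenated rows would then be fed to the regional self-attention mechanism network to produce the enhanced region features.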
In some optional implementations of the present embodiment, the extraction unit 502 is further configured to: respectively inputting the enhanced region characteristics of each sample into an attention pooling network to obtain a first conceptual representation of each sample; respectively carrying out average pooling on the enhanced region characteristics of each sample to obtain a second conceptual representation of each sample; a weighted sum of the first and second conceptual representations of each sample is determined as the conceptual representation of each sample.
In some alternative implementations of the present embodiment, the category labels are smoothed labels.
With continued reference to fig. 6, as an implementation of the method of fig. 4 described above, the present disclosure provides one embodiment of a data sorting apparatus. The embodiment of the device corresponds to the embodiment of the method shown in fig. 4, and the device can be applied to various electronic devices.
As shown in fig. 6, the data sorting apparatus 600 of the present embodiment may include: an acquisition unit 601 and a classification unit 602. Wherein the acquisition unit 601 is configured to acquire target data to be classified and a sample data set of at least one class. The classification unit 602 is configured to input the target data and the sample data set into a classification model generated by the apparatus 500, and output a prediction probability of a category to which the target data belongs.
According to an embodiment of the disclosure, the disclosure further provides an electronic device and a readable storage medium.
An electronic device for outputting information, comprising: one or more processors; storage means having stored thereon one or more computer programs which, when executed by the one or more processors, cause the one or more processors to implement a method as described in flow 200 or 400.
A computer readable medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the method as described in flow 200 or 400.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a method for outputting information. For example, in some embodiments, the method for outputting information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When a computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method for outputting information described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method for outputting information by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a server of a distributed system or a server that incorporates a blockchain. The server can also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A method of training a classification model, wherein the classification model is an image classification model, comprising: performing the following training steps:
selecting at least one sample from a sample set, wherein the sample in the sample set is an image with a category label;
extracting a concept representation of each sample and a concept representation of each category based on a concept representation network, wherein the concept representation network comprises a feature extraction network, a regional self-attention mechanism network and a concept aggregation pooling network, wherein the concept aggregation pooling network performs attention pooling and/or average pooling;
calculating the prediction probability of the category to which each sample belongs according to the distance between the conceptual representation of each sample and the conceptual representation of the category to which each sample belongs;
calculating a total loss value according to the prediction probability of the category to which each sample belongs and the category label;
if the total loss value is less than a predetermined threshold, constructing a classification model based on the concept characterization network.
2. The method of claim 1, wherein the method further comprises:
and if the total loss value is not smaller than a preset threshold value, adjusting the related parameters of the concept characterization network, and continuing to execute the training step.
3. The method of claim 1, wherein the extracting the concept representation of each sample and the concept representation of each category based on the concept representation network comprises:
extracting a concept representation of each sample based on a concept representation network;
clustering the concept characterizations of the samples with the same category labels to obtain the concept characterizations of each category.
4. A method according to claim 3, wherein the extracting a conceptual representation of each sample based on a conceptual representation network comprises:
inputting the at least one sample into a feature extraction network respectively to obtain the regional features of each sample;
respectively inputting the regional characteristics of each sample into a regional self-attention mechanism network to obtain the enhanced regional characteristics of each sample;
and respectively inputting the enhanced region characteristics of each sample into a concept aggregation pooling network to obtain the concept representation of each sample.
5. The method of claim 4, wherein the method further comprises:
and selecting a characteristic extraction network with the positive correlation between the network layer number and the calculated amount according to the calculated amount of the field of the sample set application.
6. The method of claim 4, wherein the inputting the regional characteristics of each sample into the regional self-attention mechanism network, respectively, results in the enhanced regional characteristics of each sample, comprising:
respectively encoding the position information of the regional characteristics of each sample to obtain the position code of each sample;
calculating global average characteristics of the regional characteristics of each sample respectively to obtain global context information of each sample;
forming the regional characteristics, the position codes and the global context information of each sample into regional information of each sample;
and respectively inputting the regional information of each sample into a regional self-attention mechanism network to obtain the enhanced regional characteristics of each sample.
7. The method of claim 4, wherein the inputting the enhanced region features of each sample into the concept aggregation pooling network, respectively, results in a concept representation of each sample, comprises:
respectively inputting the enhanced region characteristics of each sample into an attention pooling network to obtain a first conceptual representation of each sample;
respectively carrying out average pooling on the enhanced region characteristics of each sample to obtain a second conceptual representation of each sample;
a weighted sum of the first and second conceptual representations of each sample is determined as the conceptual representation of each sample.
8. The method of any of claims 1-7, wherein the class label is a smoothed label.
9. A method of data classification, comprising:
acquiring target data to be classified and at least one class of sample data set;
inputting the target data and the sample data set into a classification model generated by the method according to any one of claims 1-8, and outputting the prediction probability of the category to which the target data belongs.
10. An apparatus for training a classification model, wherein the classification model is an image classification model, comprising:
a selecting unit configured to select at least one sample from a sample set, wherein the samples in the sample set are images with category labels;
an extraction unit configured to extract a concept representation of each sample and a concept representation of each category based on a concept representation network, wherein the concept representation network comprises a feature extraction network, a regional self-attention mechanism network, and a concept aggregation pooling network, wherein the concept aggregation pooling network performs attention pooling and/or average pooling;
a prediction unit configured to calculate a prediction probability of a category to which each sample belongs according to a distance between a conceptual representation of each sample and the category to which the sample belongs;
a calculation unit configured to calculate a total loss value from the prediction probability and the class label of the class to which each sample belongs;
and a circulation unit configured to construct a classification model based on the concept characterization network if the total loss value is less than a predetermined threshold.
11. A data sorting apparatus comprising:
an acquisition unit configured to acquire target data to be classified and a sample data set of at least one category;
a classification unit configured to input the target data and the sample data set into a classification model generated by the method according to any one of claims 1 to 8, and output a prediction probability of a category to which the target data belongs.
12. An electronic device for outputting information, comprising:
one or more processors;
a storage device having one or more computer programs stored thereon,
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-9.
13. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-9.
CN202110658623.8A 2021-06-15 2021-06-15 Method and apparatus for training classification model and data classification Active CN113255824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110658623.8A CN113255824B (en) 2021-06-15 2021-06-15 Method and apparatus for training classification model and data classification


Publications (2)

Publication Number Publication Date
CN113255824A CN113255824A (en) 2021-08-13
CN113255824B true CN113255824B (en) 2023-12-08

Family

ID=77187958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110658623.8A Active CN113255824B (en) 2021-06-15 2021-06-15 Method and apparatus for training classification model and data classification

Country Status (1)

Country Link
CN (1) CN113255824B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673420B (en) * 2021-08-19 2022-02-15 清华大学 Target detection method and system based on global feature perception

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087974A1 (en) * 2018-10-30 2020-05-07 Beijing ByteDance Network Technology Co., Ltd. Model generation method and device
CN111353542A (en) * 2020-03-03 2020-06-30 Tencent Technology (Shenzhen) Co., Ltd. Training method and device of image classification model, computer equipment and storage medium
CN111860573A (en) * 2020-06-04 2020-10-30 Beijing Megvii Technology Co., Ltd. Model training method, image class detection method and device and electronic equipment
CN111858991A (en) * 2020-08-06 2020-10-30 Nanjing University Small sample learning algorithm based on covariance measurement
CN112163465A (en) * 2020-09-11 2021-01-01 South China University of Technology Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium
CN112560999A (en) * 2021-02-18 2021-03-26 Chengdu Ruiyan Technology Co., Ltd. Target detection model training method and device, electronic equipment and storage medium
US10963754B1 (en) * 2018-09-27 2021-03-30 Amazon Technologies, Inc. Prototypical network algorithms for few-shot learning
CN112801265A (en) * 2020-11-30 2021-05-14 Huawei Technologies Co., Ltd. Machine learning method and device
WO2021090518A1 (en) * 2019-11-08 2021-05-14 NEC Corporation Learning device, information integration system, learning method, and recording medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10755099B2 (en) * 2018-11-13 2020-08-25 Adobe Inc. Object detection in images


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A survey of knowledge-based zero-shot visual recognition; Feng Yaogong et al.; Journal of Software; Vol. 32, No. 2; full text *


Similar Documents

Publication Publication Date Title
Huang et al. Surface defect saliency of magnetic tile
JP7403605B2 (en) Multi-target image text matching model training method, image text search method and device
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
US20220139063A1 (en) Filtering detected objects from an object recognition index according to extracted features
CN113434716A (en) Cross-modal information retrieval method and device
CN115034315A (en) Business processing method and device based on artificial intelligence, computer equipment and medium
Harikrishna et al. Deep learning-based real-time object classification and recognition using supervised learning approach
Moghaddam et al. Jointly human semantic parsing and attribute recognition with feature pyramid structure in EfficientNets
CN113255824B (en) Method and apparatus for training classification model and data classification
Sun et al. Multiple pedestrians tracking algorithm by incorporating histogram of oriented gradient detections
Kong et al. Collaborative model tracking with robust occlusion handling
Wang et al. Multi‐scale feature fusion network for person re‐identification
CN114419327B (en) Image detection method and training method and device of image detection model
CN114329016B (en) Picture label generating method and text mapping method
Hrovatič et al. Efficient ear alignment using a two‐stack hourglass network
Liao et al. Bow image retrieval method based on SSD target detection
Jun et al. Two-view correspondence learning via complex information extraction
Wang et al. Image target recognition based on improved convolutional neural network
Fan et al. Fabric Defect Detection Using Deep Convolution Neural Network
Han et al. Application of Multi‐Feature Fusion Based on Deep Learning in Pedestrian Re‐Recognition Method
Li et al. An object detection approach with residual feature fusion and second‐order term attention mechanism
Zhang et al. Research on Multitarget Recognition and Detection Based on Computer Vision
CN118015525B (en) Method, device, terminal and storage medium for identifying road ponding in image
Yu et al. CRGF-YOLO: An Optimized Multi-Scale Feature Fusion Model Based on YOLOv5 for Detection of Steel Surface Defects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100176 601, 6th floor, building 2, No. 18, Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing

Applicant after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 100176 601, 6th floor, building 2, No. 18, Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing

Applicant before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

GR01 Patent grant