WO2021204269A1 - Training of a classification model, and object classification - Google Patents


Info

Publication number
WO2021204269A1
WO2021204269A1 · PCT/CN2021/086271 · CN2021086271W
Authority
WO
WIPO (PCT)
Prior art keywords
convolution
label
result
vector
classification
Prior art date
Application number
PCT/CN2021/086271
Other languages
English (en)
French (fr)
Inventor
曹绍升
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021204269A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular to training of a classification model and object classification.
  • Object classification refers to predicting, through a pre-trained classification model, which specific category an object to be classified belongs to under a specific classification system.
  • the classification model here may be, for example, a convolutional neural network, etc., and the convolutional neural network may include a convolutional layer, a pooling layer, and the like.
  • Taking the classification model as a convolutional neural network as an example, in the convolutional layer, local features are extracted from the sample through a convolution operation.
  • In the pooling layer, global features are extracted from the local features through maximum pooling or average pooling operations.
  • However, maximum pooling strengthens certain local features of the sample, and these features may not be important information related to object classification.
  • Average pooling, in turn, dilutes effective features. Therefore, traditional training methods often cannot extract effective global features from samples, which affects the accuracy of the trained classification model.
  • One or more embodiments of this specification describe a method and device for training and object classification of a classification model, which can improve the accuracy of the model, and thus can achieve effective classification of objects.
  • a method for training a classification model including:
  • in the convolution layer, based on several convolution windows of different widths, performing multiple convolution processing on the feature vector of the sample to obtain multiple convolution results;
  • in the pooling layer, calculating the similarity between each convolution result in the multiple convolution results and the label vector of the classification label; determining, based on the calculated similarity, the attention weight value corresponding to each convolution result; and, based on the attention weight value corresponding to each convolution result, performing a weighted average pooling operation on each convolution result to obtain a pooling result;
  • taking the pooling result as a sample representation vector of the sample, determining a prediction loss based at least on the sample representation vector and the label vector of the classification label; and, based on the prediction loss, adjusting the parameters of the classification model.
  • an object classification method including:
  • in the convolution layer, based on several convolution windows of different widths, performing multiple convolution processing on the initial representation vector to obtain multiple convolution results;
  • in the pooling layer, calculating the similarity between each convolution result of the multiple convolution results and the category vector of the current category; determining, based on the calculated similarity, the attention weight value corresponding to each convolution result; and, based on the attention weight value corresponding to each convolution result, performing a weighted average pooling operation on each convolution result to obtain a pooling result;
  • the target category to which the object to be classified belongs is determined from the several predetermined categories based on the multiple calculated similarities.
  • a training device for a classification model including:
  • the obtaining unit is used to obtain samples with classification labels
  • a determining unit configured to determine, in the embedding layer, the feature vector of the sample acquired by the acquiring unit and the label vector of the classification label;
  • a convolution unit configured to perform multiple convolution processing on the feature vector of the sample determined by the determining unit based on several convolution windows of different widths in the convolution layer to obtain multiple convolution results;
  • the pooling unit is configured to, in the pooling layer, calculate the similarity between each convolution result of the multiple convolution results obtained by the convolution unit and the label vector of the classification label; determine, based on the calculated similarity, the attention weight value corresponding to each convolution result; and, based on the attention weight value corresponding to each convolution result, perform a weighted average pooling operation on each convolution result to obtain a pooling result;
  • the determining unit is further configured to use the pooling result obtained by the pooling unit as a sample representation vector of the sample, and determine the prediction loss based at least on the sample representation vector and the label vector of the classification label;
  • the adjustment unit is configured to adjust the parameters of the classification model based on the prediction loss determined by the determination unit.
  • an object classification device including:
  • the acquiring unit is used to acquire the object to be classified and a number of predetermined categories
  • the calculation unit is configured to take each of the several predetermined categories acquired by the acquisition unit as the current category in turn, and perform a similarity calculation based on the current category; the calculation unit includes: a determining subunit, used to determine, in the embedding layer, the initial representation vector of the object to be classified and the category vector of the current category; a convolution subunit, used in the convolution layer to perform, based on several convolution windows of different widths, multiple convolution processing on the initial representation vector to obtain multiple convolution results; a pooling subunit, used in the pooling layer to calculate the similarity between each convolution result of the multiple convolution results and the category vector of the current category, determine the attention weight value corresponding to each convolution result based on the calculated similarity, and perform a weighted average pooling operation on each convolution result based on the attention weight value corresponding to each convolution result to obtain a pooling result; and an acquisition subunit, used to take the pooling result as the final representation vector of the object to be classified and calculate the similarity between the final representation vector and the category vector of the current category;
  • the determining unit is configured to, after the calculation unit performs the similarity calculation based on each of the plurality of predetermined categories, determine, from the plurality of predetermined categories and based on the calculated similarities, the target category to which the object to be classified belongs.
  • a computer storage medium is provided with a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the second aspect.
  • a computing device including a memory and a processor, where the memory stores executable code, and the processor, when executing the executable code, implements the method of the first aspect or the second aspect.
  • In the pooling layer, each convolution result can be pooled by weighted averaging based on its respective attention weight value. Since the attention weight value of each convolution result is determined based on the similarity between that result and the label vector of the classification label, the training method provided in this specification can, for each sample and under the guidance of its classification label, determine the importance of each convolution result (a convolution result corresponds to a feature combination, such as multiple words), use this importance as the attention weight value of the convolution result, and perform weighted average pooling to generate the corresponding sample representation vector. This can greatly improve the accuracy of the generated sample representation vectors. It is understandable that, when the accuracy of the sample representation vectors is improved, the classification model trained on them can be more accurate, and thus effective classification of objects can be realized.
  • FIG. 1 is a schematic diagram of a training method of a classification model provided by an embodiment of this specification
  • Figure 2 is a schematic diagram of the text classification model provided in this specification;
  • FIG. 3 is a flowchart of an object classification method provided by an embodiment of this specification.
  • Figure 4 is a schematic diagram of a training device for a classification model provided by an embodiment of this specification
  • Fig. 5 is a schematic diagram of an object classification device provided by an embodiment of this specification.
  • the applicant of this application proposes to introduce a pooling method based on the attention mechanism in the training process of the classification model.
  • a sample with a classification label is obtained.
  • In the embedding layer of the model, the feature vector of the sample and the label vector of the classification label are determined.
  • In the convolutional layer of the model, based on several convolution windows of different widths, the feature vector of the sample is subjected to multiple convolution processing to obtain multiple convolution results.
  • each convolution result obtained by the convolution layer is assigned a corresponding attention weight value.
  • the attention weight value of each convolution result is determined based on the similarity between the convolution result and the label vector of the classification label.
  • the prediction loss is determined. Based on the prediction loss, adjust the parameters of the classification model.
  • the classification model described in this specification includes: embedding layer, convolutional layer, pooling layer, and so on.
  • With the training method of the classification model provided in this specification, for each sample, the importance of each part of the features in the sample is determined under the guidance of the corresponding classification label, and the sample representation vector is generated accordingly. This greatly improves the accuracy of the generated sample representation vector. It is understandable that, when the accuracy of the sample representation vector is improved, the classification model trained on it can be more accurate, and thus effective classification of objects can be realized.
  • Fig. 1 is a flowchart of a training method of a classification model provided by an embodiment of this specification.
  • The execution subject of the method may be any device, system, or server with processing capability. As shown in Figure 1, the method may specifically include:
  • Step 102 Obtain samples with classification labels.
  • the classification model described in this specification can be used for business processing.
  • the business processing here may include, but is not limited to, business processing based on image recognition (such as face recognition, target detection, etc.), business processing based on audio recognition (such as speech recognition, voiceprint analysis, etc.), and business processing based on text analysis (such as text classification and intent recognition, etc.).
  • If the above business processing is based on image recognition, the acquired sample may be a picture, the classification label of the sample may be a picture category label, and the trained classification model may be a picture classification model.
  • If the above business processing is based on audio recognition, the sample obtained may be audio, the classification label of the sample may be an audio category label, and the trained classification model may be an audio classification model.
  • If the business processing is based on text analysis, the sample obtained may be a text, the classification label of the sample may be a text category label, and the trained classification model may be a text classification model.
  • the aforementioned classification label may be any classification label in a predetermined label set.
  • For example, the predetermined label set may include the following two classification labels: advertisement and non-advertisement; the above-mentioned classification label can then be advertisement or non-advertisement.
  • Of course, the number of classification labels in the predetermined label set may also be more than two. For example, in subject information classification (a type of text analysis), each subject can correspond to a classification label, so that the predetermined label set contains more than two classification labels.
  • Step 104 in the embedding layer, determine the feature vector of the sample and the label vector of the classification label.
  • the above-mentioned sample feature may refer to the words in the text.
  • the above-mentioned sample feature may refer to a speech segment.
  • Taking the sample as a text as an example, the process of determining the corresponding feature vector can be: assuming the obtained text contains n words x1, x2, ..., xn, then for each word xi an m-dimensional vector can be randomly initialized, so that the word vector corresponding to each word is obtained: f1, f2, ..., fn, where each fi is an m-dimensional vector.
  • n and m are positive integers, and 1 ≤ i ≤ n.
  • the word vector corresponding to each word can also be determined based on the word vectorization method.
  • the word vectorization method here includes any of the following: Word2Vec and GloVe.
  • the feature vector of the audio can be obtained based on the representation vector corresponding to each speech segment included in the audio.
  • the dimension of the representation vector corresponding to each speech segment is the same, for example, all of them are m-dimensional.
  • the classification label of the text can also be initialized as an N-dimensional vector (hereinafter referred to as a label vector, denoted as: hy).
  • N is a positive integer.
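  • The embedding step described above can be sketched as follows. This is an illustrative sketch only; the vocabulary, the dimensions m = 8 and N = 16, and the Gaussian initialization scale are assumptions for demonstration, not values fixed by this specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(words, table, m=8):
    """Look up, or randomly initialize, an m-dimensional vector per word."""
    vecs = []
    for w in words:
        if w not in table:
            # Random initialization on first encounter, as described above.
            table[w] = rng.normal(scale=0.1, size=m)
        vecs.append(table[w])
    return np.stack(vecs)  # word vectors f1..fn, shape (n, m)

table = {}
f = embed_text(["the", "quick", "fox"], table)     # n = 3 word vectors
h_y = rng.normal(scale=0.1, size=16)               # label vector h_y (N = 16)
```

  • In practice, the randomly initialized vectors (and h_y) are adjusted during training; alternatively, pretrained Word2Vec or GloVe vectors can be used as mentioned above.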
  • Step 106 In the convolution layer, based on several convolution windows of different widths, perform multiple convolution processing on the feature vector of the sample to obtain multiple convolution results.
  • the several convolution windows with different widths here can be, for example, conv3, conv4, conv5, etc., and the specific number can be preset manually.
  • Taking conv3 as an example, the corresponding width is the width of 3 words or 3 speech segments. Assuming that each word or each speech segment is an m-dimensional vector, this width can be expressed as 3*m.
  • Taking any first convolution window among the several convolution windows as an example, the corresponding convolution processing may specifically include: based on the width of the first convolution window, determining the dimension of the elements selected from the feature vector for convolution processing.
  • Taking the first convolution window as conv3 as an example, since its width is the width of 3 words or 3 speech segments, the dimension of the elements selected for the corresponding convolution processing is 3*m.
  • the predetermined step size here is an integer multiple of the dimension of a word vector (or of the representation vector corresponding to a speech segment); for example, it can be m or 2m.
  • the convolution processing based on a certain width of the convolution window may be an iterative process.
  • the termination condition of the iteration can also be that the number of iterations reaches a fixed number, etc., which is not limited in this specification.
  • the foregoing iterative process based on the first convolution window may specifically include:
  • In each iteration, starting from the current position, elements of the above-mentioned dimension are selected from the feature vector of the sample.
  • The selected elements are spliced to obtain the current splicing vector.
  • A linear transformation is performed on the current splicing vector to obtain a linear transformation result.
  • Based on the linear transformation result, one of the multiple convolution results is determined.
  • Based on the current position and the predetermined step size, the next position is determined and taken as the new current position.
  • the feature vector of the text is obtained by concatenating the word vectors f1, f2, ..., fn, where any fi is an m-dimensional vector.
  • the first convolution window is conv3
  • starting from the current position, 3*m elements can be selected from the feature vector, that is, the word vectors of 3 words: f_t, f_{t+1}, and f_{t+2};
  • by splicing them, the current splicing vector can be obtained: [f_t; f_{t+1}; f_{t+2}].
  • the current splicing vector can then be linearly transformed as shown in Formula 1: h_t = W · [f_t; f_{t+1}; f_{t+2}] + b, where W and b are the parameters of the convolution window.
  • In one example, the linear transformation result of the current splicing vector is directly used as the convolution result h_t in Formula 1.
  • In another example, an activation function can also be used to perform a nonlinear transformation on the linear transformation result; the nonlinear transformation result is then used as the above-mentioned convolution result h_t.
  • The activation function here may include, but is not limited to, the tanh function, the ReLU function, the sigmoid function, and so on.
  • After the convolution result h_t is determined, based on the current position (the starting position of the word vector f_t) and a predetermined step size (assumed to be m), the next position (the starting position of the word vector f_{t+1}) can be determined, and the next position is updated to be the current position. After that, based on the updated current position, the steps of the iteration are repeated until the termination condition of the iteration is satisfied.
  • the multiple convolution results can be expressed as: h1, h2, ..., hk, where each hi is a vector containing N elements.
  • i and k are positive integers, and 1 ≤ i ≤ k.
  • each hi can be regarded as local information of the sample.
  • the use of convolution windows of different widths can more comprehensively capture the local information of the sample.
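  • The sliding-window convolution described above can be sketched as follows; the tanh nonlinearity, the parameter shapes, the step of one word, and the sizes n = 6, m = 8, N = 16 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_window(F, width, W, b):
    """Slide a window of `width` word vectors over F (shape (n, m)),
    splice each window, and apply a linear map followed by tanh."""
    n, m = F.shape
    results = []
    for t in range(n - width + 1):                  # step size: one word (m dims)
        spliced = F[t:t + width].reshape(-1)        # [f_t; f_{t+1}; ...], (width*m,)
        results.append(np.tanh(W @ spliced + b))    # convolution result h_t, (N,)
    return results

n, m, N = 6, 8, 16
F = rng.normal(size=(n, m))                         # word vectors f1..f6
H = []
for width in (3, 4, 5):                             # conv3, conv4, conv5
    W = rng.normal(scale=0.1, size=(N, width * m))
    b = np.zeros(N)
    H += conv_window(F, width, W, b)
# k = 4 + 3 + 2 = 9 convolution results h1..hk, each with N elements
```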
  • The processing of the pooling layer can be understood as: determining important local information from the local information captured by the convolutional layer, and strengthening its semantics to improve the accuracy of the finally generated sample representation vector. A detailed description follows.
  • Step 108 In the pooling layer, calculate the similarity between each of the multiple convolution results and the label vector of the classification label. And based on the calculated similarity, the attention weight value corresponding to each convolution result is determined. Based on the attention weight value corresponding to each convolution result, the weighted average pooling operation is performed on each convolution result to obtain the pooling result.
  • the multiple convolution results h1-hk obtained based on the convolution layer are all vectors containing N elements. Since the aforementioned label vector hy is also an N-dimensional vector, each convolution result hi has the same dimension as the label vector hy.
  • the step of calculating the similarity between each convolution result in the multiple convolution results and the label vector of the classification label may include: for each convolution result in the multiple convolution results, calculating at least the first dot product between the convolution result and the label vector of the classification label; calculating at least the second dot product between the convolution result and the vector of each classification label in the predetermined label set, and summing the second dot products to obtain a summation result; and determining the similarity between the convolution result and the label vector of the classification label based on the ratio of the first dot product to the summation result.
  • the vector of each classification label in the aforementioned predetermined label set may be an N-dimensional vector, and the vector of each classification label may be obtained by random initialization, or may be obtained by adjusting the initial vector during the model training process.
  • In one example, the similarity between each of the multiple convolution results and the label vector of the classification label may be calculated based on the following formula: a_t = g(h_t, h_y) / Σ_{y'∈Y} g(h_t, h_{y'}), where a_t is the similarity between the convolution result h_t and the label vector h_y of the classification label, which may be a real value in [0, 1].
  • Y is a predetermined label set
  • h_{y'} is the label vector of a certain classification label in the predetermined label set
  • g() is a parameterized dot product calculation function, which may include but is not limited to the following definition: g(h_t, h_y) = exp(h_t^T · W_a · h_y)
  • W_a is a parameter matrix, which is initialized randomly and updated by back-propagated gradients.
  • the calculated similarity can be used as the attention weight value of the convolution result. After that, based on the attention weight values, the semantics of important local information are strengthened, and the weighted average pooling operation is further performed.
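  • The similarity-based attention weights described above can be sketched as follows. The use of exp() inside the dot product function g(), which keeps each weight in [0, 1], is an assumption of this sketch; the exact form of g() may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 16, 9

W_a = rng.normal(scale=0.1, size=(N, N))            # randomly initialized W_a
labels = rng.normal(scale=0.1, size=(2, N))         # one vector per label in Y
h_y = labels[0]                                     # label vector of this sample
H = [rng.normal(size=N) for _ in range(k)]          # convolution results h_1..h_k

def g(h_t, h_label):
    """Parameterized dot product; exp keeps it positive (an assumption)."""
    return np.exp(h_t @ W_a @ h_label)

def attention_weight(h_t):
    """a_t: ratio of the dot product with h_y to the sum over all labels in Y."""
    return g(h_t, h_y) / sum(g(h_t, hl) for hl in labels)

a = np.array([attention_weight(h_t) for h_t in H])  # one weight per h_t, in [0, 1]
```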
  • the following describes the steps of performing a weighted average pooling operation on each convolution result based on the attention weight value corresponding to each convolution result.
  • this step may specifically be: for any first convolution result among the convolution results, the attention weight value of the first convolution result is used as the attention weight value of each of the N elements therein.
  • the weighted average pooling operation is sequentially performed on the elements at the same position of each convolution result to obtain the pooling result.
  • the weighted average pooling operation can be implemented by the following formula: q(j) = (1/k) · Σ_{t=1}^{k} a_t · h_t(j), where h_t(j) is the j-th element of the convolution result h_t (which contains N elements), a_t is the attention weight value of the convolution result h_t, k is the number of convolution results, and q(j) is the j-th element of the pooling result q.
  • the pooling result is also a vector containing N elements, so the pooling result has the same dimension as the convolution results.
  • In the pooling method of this solution, each element is assigned an attention weight value; that is to say, an attention mechanism is introduced, which helps to determine the truly important information of the sample. When the accuracy of the determined important information of the sample is improved, the accuracy of the generated sample representation vector can be improved, so that the classification model trained on it is more accurate.
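  • The element-wise weighted average pooling described above can be sketched as follows; dividing by k is one natural reading of "weighted average", and the sizes k = 9, N = 16 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, N = 9, 16
H = rng.normal(size=(k, N))             # k convolution results, N elements each
a = rng.uniform(size=k)                 # attention weight a_t for each result

# q(j) = (1/k) * sum over t of a_t * h_t(j), computed for every position j at once
q = (a[:, None] * H).mean(axis=0)       # pooling result q, same dimension N
```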
  • step 110 the pooling result obtained by the pooling layer is used as the sample representation vector of the sample, and the prediction loss is determined based on at least the sample representation vector and the label vector of the classification label.
  • the dot product between the sample representation vector and the label vector of the classification label may be calculated first, and then the prediction loss is determined based on the dot product.
  • the prediction loss determined here can be inversely related to the dot product calculated above.
  • In another example, several other classification labels (also called "negative labels") can be randomly selected from the predetermined label set, and the prediction loss is determined based on both the positive and the negative labels.
  • Specifically, the third dot product between the sample representation vector and the label vector of the classification label can be calculated, as well as the fourth dot product between the sample representation vector and the label vector of each negative label.
  • The prediction loss is then determined so that it is inversely related to the third dot product and positively related to each fourth dot product.
  • In one example, the above prediction loss can be determined based on the following formula (Formula 5): L = −log σ(q · h_y) − Σ_{i=1}^{λ} log σ(−q · h_{y''})
  • L is the prediction loss
  • q is the sample representation vector of the sample
  • h_y is the label vector of the classification label of the above sample
  • Y is the predetermined label set
  • y'' is a classification label randomly selected from the other classification labels in Y, and h_{y''} is its corresponding label vector
  • λ is a predefined hyperparameter, and its value range can be [2, 10]
  • σ is an excitation function; for example, it can be the sigmoid function.
  • In practice, the value of λ can be set based on its value range. If the total number of classification labels in the predetermined label set is less than λ, then λ is set to that total number; otherwise, the original value remains unchanged.
  • In this way, the correlation between the positive classification label (that is, the classification label of the above sample) and the sample features becomes higher, and the correlation between the negative classification labels (the labels in the predetermined label set other than the positive classification label) and the sample features becomes lower.
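  • A loss with the properties described above (inversely related to the positive dot product, positively related to the negative ones) can be sketched with a sigmoid excitation function; this negative-sampling form is an assumption modeled on the description, with λ = 2 sampled negative labels.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prediction_loss(q, h_pos, h_negs):
    """Decreases as q . h_y grows; increases as any q . h_y'' grows."""
    loss = -np.log(sigmoid(q @ h_pos))          # pull the positive label closer
    for h_neg in h_negs:
        loss -= np.log(sigmoid(-(q @ h_neg)))   # push each negative label away
    return loss

q = rng.normal(size=N)                          # sample representation vector
h_pos = q + 0.1 * rng.normal(size=N)            # label vector aligned with q
h_negs = [rng.normal(size=N) for _ in range(2)] # lambda = 2 negative labels
loss = prediction_loss(q, h_pos, h_negs)
```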
  • Step 112 Adjust the parameters of the classification model based on the prediction loss.
  • the parameters of the classification model may be adjusted by back-propagating the gradient; specifically, the parameters of the embedding layer, the convolutional layer, and the pooling layer can be adjusted.
  • the parameters of the embedding layer include the representation vector of each sample feature in the sample (such as the word vectors) and the label vector of the classification label. It should be understood that adjusting the representation vectors of the sample features and the label vector is equivalent to adjusting the above-mentioned attention weight values.
  • the above steps 102 to 112 are executed iteratively, and the model parameters used in each iteration are the parameters after the last adjustment.
  • the termination condition of the iteration may be that the number of iterations reaches a predetermined round or the value of the loss function shown in Formula 5 converges.
  • the representation vector of each sample feature in the sample and the label vector of the classification label will be initialized randomly.
  • the representation vector and label vector of the aforementioned sample features can be continuously adjusted until the optimal representation vector of each sample feature and classification label is obtained.
  • the optimal representation vector of each sample feature and classification label obtained here can be applied to the subsequent steps of object classification, which can improve the accuracy of object classification.
  • this solution will sequentially perform convolution processing on different positions and different numbers of consecutive sample features in the sample, so that the local information of the sample can be captured more comprehensively.
  • For each convolution result, this solution determines the attention weight value of the convolution result based on the similarity between the convolution result and the label vector of the classification label. After that, based on the determined attention weight values, a weighted average pooling operation is performed on the convolution results to obtain the sample representation vector of the sample. Since different attention weight values reflect the different importance of the corresponding convolution results, pooling the convolution results based on the attention weight values effectively strengthens important local information, so that the sample can be represented better.
  • the training method of the classification model provided by one embodiment of the present specification can greatly improve the accuracy of the classification model, and further can realize the effective classification of objects.
  • the classification model is a text classification model
  • the classification label is a text category label
  • the sample is a text as an example to illustrate the training process of the text classification model.
  • Figure 2 is a schematic diagram of the text classification model provided in this specification.
  • the text classification model may include: an embedding layer, a convolutional layer, a pooling layer, and a softmax layer.
  • In the embedding layer, the word vectors of the n words x1, x2, ..., xn contained in the text can be determined, which can be expressed as f1, f2, ..., fn.
  • the label vector hy of the classification label of the text can also be determined.
  • In the convolution layer, based on the conv3, conv4, and conv5 convolution windows of different widths, the word vectors of the words are subjected to multiple convolution processing to obtain multiple convolution results.
  • the similarity between each convolution result and the label vector of the classification label is calculated. And based on the calculated similarity, the attention weight value ai corresponding to each convolution result is determined. Based on the attention weight value corresponding to each convolution result, the weighted average pooling operation is performed on each convolution result to obtain the pooling result q. In the softmax layer, the prediction loss is determined based on the pooling result q (that is, the text representation vector) and the label vector of the classification label.
  • the training method of the classification model shown in FIG. 2 is also applicable to the training of other object classification models containing hash features.
  • it can also be applied to the training of voiceprint recognition models.
  • the step of determining the word vector of each word in the embedding layer can be replaced by determining the representation vector corresponding to each speech segment input by the user.
  • the processing of the convolutional layer and the pooling layer is similar to the training process of the text classification model, and will not be repeated here.
  • the following describes the pooling method based on the attention mechanism in the model training process of this scheme in combination with two examples.
  • the trained model is an advertisement prediction model
  • the input sample can be a text sentence
  • the classification label includes the following two types: advertisement and non-advertising.
  • For example, consider the sample sentence: "The Venetian Macao online gambling platform is now online; download it quickly and add WeChat 22xxx32."
  • When the corresponding classification label is "advertisement", the adaptive pooling method based on the attention mechanism can determine that words such as "online", "gambling", and "download" are closely related to the label. During pooling, the convolution results corresponding to these words receive larger weights.
  • Taking the binary pornography classification scenario of text analysis as another example, the trained model is a pornography prediction model,
  • the input sample can likewise be a text sentence,
  • and the classification labels include the following two types: pornographic and non-pornographic. For the same sample, when the corresponding classification label is "pornographic", the adaptive pooling method based on the attention mechanism finds that words such as "book" and "escort" are closely related to the label, and the convolution layer results containing these words receive larger weights during pooling. It can thus be seen that, for the same text, the attention-based pooling method proposed in this specification identifies different important local information when the corresponding classification labels differ; in other words, different representations of a text can be obtained in different model training scenarios, which can greatly improve the accuracy of the models trained in those scenarios.
  • The above describes the training process of the classification model; the following describes object classification based on the trained classification model. FIG. 3 is a flowchart of an object classification method provided by an embodiment of this specification. As shown in FIG. 3, the method may include:
  • Step 302: Obtain an object to be classified and several predetermined categories.
  • The object to be classified here may include, but is not limited to, any of the following: a text to be classified, audio to be classified, or a picture to be classified.
  • Taking the object to be classified being a text as an example, the foregoing predetermined categories may include the two categories of advertisement and non-advertisement.
  • In practical applications, in different classification scenarios, the number of predetermined categories may also be more than two. For example, in a scenario where microblog content is classified by subject information, each piece of subject information can serve as a predetermined category.
  • Step 304: Take each of the several predetermined categories in turn as the current category, and perform a similarity calculation based on the current category.
  • Step 304 may specifically include the following steps:
  • Step 3042: In the embedding layer, determine the initial representation vector of the object to be classified and the category vector of the current category.
  • The initial representation vector of the object to be classified may be determined based on the representation vectors of the features in the object to be classified (for example, word vectors, or the representation vectors of speech segments).
  • The representation vector of each feature and the category vector of the current category may both be obtained through training in the classification model training process described above; that is, they can all be the optimal representation vectors mentioned above.
  • Step 3044: In the convolution layer, perform multiple convolution processing on the initial representation vector based on several convolution windows of different widths to obtain multiple convolution results.
  • Step 3046: In the pooling layer, calculate the similarity between each of the multiple convolution results and the category vector of the current category, and determine the attention weight value corresponding to each convolution result based on the calculated similarity. Based on the attention weight values corresponding to the convolution results, perform a weighted average pooling operation on the convolution results to obtain the pooling result.
  • Steps 3044 to 3046 here are the same as steps 106 to 108 above and are not repeated here.
  • Step 3048: Use the pooling result as the final representation vector of the object to be classified, and calculate the similarity between the final representation vector and the category vector of the current category.
  • the similarity between the two vectors may include, but is not limited to: cosine similarity, Euclidean distance, Manhattan distance, Pearson correlation coefficient, and so on.
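Cosine similarity, the first of the measures listed above, can be sketched as follows (a minimal self-contained illustration; the patent does not prescribe a particular implementation):

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Identical directions give 1.0, orthogonal vectors give 0.0; Euclidean or Manhattan distance and the Pearson correlation coefficient could be substituted with the same interface.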
  • Step 306: After the above similarity calculation has been performed based on each of the several predetermined categories, determine, based on the multiple calculated similarities, the target category to which the object to be classified belongs from among the several predetermined categories.
  • For example, suppose there are two predetermined categories y0 and y1, whose category vectors are hy0 and hy1, respectively.
  • When y0 is taken as the current category, the final representation vector q0 of the object to be classified can be obtained, and the similarity S0 between hy0 and q0 can be calculated.
  • When y1 is taken as the current category, the final representation vector q1 of the object to be classified can be obtained, and the similarity S1 between hy1 and q1 can be calculated.
  • If S0>S1, the category y0 can be taken as the target category to which the object to be classified belongs.
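The final selection step above amounts to taking the category with the largest similarity, which can be sketched as (the dictionary interface here is an illustrative assumption):

```python
def pick_target_category(similarities):
    # similarities: mapping from category name to the similarity between
    # that category's vector and the object's final representation vector
    # computed for that category
    return max(similarities, key=similarities.get)
```

With S0=0.8 for y0 and S1=0.3 for y1, the target category is y0.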
  • The object classification method provided in the embodiments of this specification classifies the object to be classified based on the feature representation vectors and category vectors trained during the model training process, which can greatly improve the accuracy of object classification.
  • The solutions provided in this specification can be widely used in various text classification scenarios; for example, they can be applied to the classification of comment texts in major online communities.
  • To increase user activity, existing communities provide users with a comment area, but some users use the comment area to advertise third-party products or publish pornographic information. Therefore, a classification model for comment texts needs to be established to determine whether a given comment text violates the rules.
  • When classifying comment texts, one frequently encounters cases where the effective information is buried under a large amount of irrelevant information. For example: "Friends, the National Day holiday is coming soon. What are you waiting for? No need to go out, no need to go out, no need to go out, you can participate in the activity at home. Macao gambling is online, website: xxx". This text is a typical gambling advertisement and violates the rules, yet it is hard to recognize, because the effective information "Macao gambling is online, website: xxx" is only a small part of the text, preceded by a long run of irrelevant information. If the traditional average pooling method is used, the long irrelevant information dilutes (averages out) the effective information, so that it cannot be expressed correctly. If the traditional max pooling method is used, the strengthened information is determined only by the text content, and the truly violating content cannot be strengthened; for example, it would strengthen local information such as "friends", "National Day holiday", "go out", "participate in the activity", "gambling", and "online", which also makes the final representation of the text inaccurate.
  • With the text classification method provided by this scheme, because the attention-based pooling method strengthens the local information related to the label, the effective information content of the text can be determined accurately, which in turn improves the accuracy of the text representation vector. With a more accurate text representation vector, the text classification model trained on it can be more precise, and effective text classification can then be achieved.
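The dilution effect described above can be made concrete with a toy comparison. The scores and weights below are invented for illustration: one convolution position carries a strong violating signal, the other ten are irrelevant; plain averaging drowns the signal, while attention-style weighting preserves it.

```python
def average_pool(scores):
    # plain average pooling: every local score contributes equally
    return sum(scores) / len(scores)

def weighted_pool(scores, weights):
    # attention-style pooling: label-relevant positions dominate
    return sum(w * s for w, s in zip(scores, weights)) / sum(weights)

# one strongly violating local signal among ten irrelevant ones
scores = [0.9] + [0.0] * 10
# hypothetical attention weights: large only for the violating part
weights = [0.9] + [0.01] * 10
```

Here average pooling yields about 0.08 (the signal is diluted), while the weighted pool stays above 0.8.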
  • Corresponding to the training method above, an embodiment of this specification also provides a training device for the classification model.
  • The classification model includes an embedding layer, a convolution layer, and a pooling layer.
  • the device may include:
  • the obtaining unit 402 is used to obtain samples with classification labels.
  • the determining unit 404 is configured to determine the feature vector of the sample acquired by the acquiring unit 402 and the label vector of the classification label in the embedding layer.
  • the convolution unit 406 is configured to perform multiple convolution processing on the feature vector of the sample determined by the determining unit 404 based on several convolution windows of different widths in the convolution layer to obtain multiple convolution results.
  • the several convolution windows with different widths include the first convolution window.
  • The convolution unit 406 may be specifically configured to: for the first convolution window, determine, based on the width of the first convolution window, the number of dimensions of the feature vector selected by the corresponding convolution processing; and perform convolution processing on the feature vector of the sample based on that number of dimensions, the parameters of the first convolution window, and the predetermined step size.
  • The convolution unit 406 may also be specifically configured to iteratively perform the following steps until a predetermined number of iterations is reached: starting from the current position, select the above number of elements from the feature vector of the sample; splice the selected elements to obtain the current splicing vector; linearly transform the current splicing vector based on the parameters of the first convolution window to obtain a linear transformation result; determine one of the multiple convolution results based on the linear transformation result; and determine the next position based on the current position and the predetermined step size, taking the next position as the current position.
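The sliding-window iteration just described can be sketched as follows. For brevity this toy linear transform produces a scalar per position, whereas the model's windows produce N-element vectors; the function names and parameters are illustrative assumptions.

```python
def conv_window_scan(feature_vec, width_elems, step, weight, bias, num_iters):
    # feature_vec: the sample's feature vector (spliced word vectors),
    #   given as a flat list of numbers
    # width_elems: number of elements selected per convolution, i.e. the
    #   window width in words times the word-vector dimension m
    # weight, bias: the window's parameters for the linear transform
    results = []
    pos = 0
    for _ in range(num_iters):
        selected = feature_vec[pos:pos + width_elems]            # select elements
        h = sum(w * x for w, x in zip(weight, selected)) + bias  # linear transform
        results.append(h)                                        # one convolution result
        pos += step                                              # next position
    return results
```

With a window covering 2 elements and step 1 over [1, 2, 3, 4], three iterations produce the sums of adjacent pairs.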
  • The convolution unit 406 may also be specifically configured to: use the linear transformation result as one of the multiple convolution results; or apply an activation function to perform a nonlinear transformation on the linear transformation result, and use the nonlinear transformation result as one of the multiple convolution results.
  • The pooling unit 408 is configured to: in the pooling layer, calculate the similarity between each of the multiple convolution results obtained by the convolution unit 406 and the label vector of the classification label; determine, based on the calculated similarities, the attention weight value corresponding to each convolution result; and perform, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain the pooling result.
  • Each of the above convolution results is a vector containing N elements.
  • The pooling unit 408 may be specifically configured to: for any first convolution result among the convolution results, use the attention weight value of the first convolution result as the attention weight value of each of its N elements; and, based on the attention weight values of the N elements of each convolution result, perform a weighted average pooling operation on the same-position elements of the convolution results in turn to obtain the pooling result.
  • The above-mentioned classification label belongs to a predetermined label set.
  • The pooling unit 408 may be specifically configured to: for each of the multiple convolution results, calculate at least a first dot product between the convolution result and the label vector of the classification label; calculate at least a second dot product between the convolution result and the vector of each classification label in the predetermined label set, and sum the second dot products to obtain a first summation result; and determine the similarity between the convolution result and the label vector of the classification label based on the ratio of the first dot product to the first summation result.
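The dot-product-ratio similarity just described can be sketched directly. Note one caveat: with raw dot products, the first summation result can be zero or negative, so a practical implementation would typically exponentiate the scores first; the sketch below follows the ratio exactly as described and is illustrative only.

```python
def dot_ratio_similarity(conv_result, label_vec, label_set_vecs):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    # first dot product: convolution result against the true label vector
    first = dot(conv_result, label_vec)
    # second dot products against every label vector in the predetermined
    # label set, summed into the first summation result
    first_sum = sum(dot(conv_result, v) for v in label_set_vecs)
    return first / first_sum
```

A convolution result aligned only with the true label scores 1.0; one equally aligned with both labels of a two-label set scores 0.5.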
  • the determining unit 404 is further configured to use the pooling result obtained by the pooling unit 408 as the sample representation vector of the sample, and determine the prediction loss based at least on the sample representation vector and the label vector of the classification label.
  • The determining unit 404 may be specifically configured to: randomly select, from a predetermined label set containing the classification label, several other classification labels different from the classification label; and determine the prediction loss based on the sample representation vector, the label vector of the classification label, and the label vectors of the several other classification labels.
  • The determining unit 404 may also be specifically configured to: calculate a third dot product between the sample representation vector and the label vector of the classification label; calculate a fourth dot product between the sample representation vector and the label vector of each of the several other classification labels, and sum the fourth dot products to obtain a second summation result; and determine the prediction loss based on the third dot product and the second summation result, so that the prediction loss is inversely related to the third dot product and positively related to the fourth dot products.
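A loss with these monotonicity properties can be sketched in the style of negative sampling. The sigmoid-log form and the parameter name `lam` are assumptions for illustration; what the sketch guarantees is exactly what the text requires: the loss shrinks as the third dot product grows and grows with each fourth dot product.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prediction_loss(sample_vec, pos_label_vec, neg_label_vecs, lam=5):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    # clamp lam to the number of available negative labels, mirroring the
    # rule that the negative-sample count never exceeds the label-set size
    negs = neg_label_vecs[:min(lam, len(neg_label_vecs))]
    third = dot(sample_vec, pos_label_vec)       # third dot product
    loss = -math.log(sigmoid(third))             # shrinks as third grows
    for nv in negs:                              # fourth dot products
        loss += -math.log(sigmoid(-dot(sample_vec, nv)))  # grows with each
    return loss
```

A sample vector well aligned with its positive label therefore incurs a smaller loss than a poorly aligned one.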
  • the adjusting unit 410 is configured to adjust the parameters of the classification model based on the prediction loss determined by the determining unit 404.
  • The above classification model may be a text classification model, where the classification label is a text category label and the sample is a text; or a picture classification model, where the classification label is a picture category label and the sample is a picture; or an audio classification model, where the classification label is an audio category label and the sample is audio.
  • the training device of the classification model provided in an embodiment of the present specification can improve the accuracy of the classification model, and further can realize the effective classification of objects.
  • an embodiment of this specification also provides an object classification device.
  • the device operates based on a pre-trained classification model, which includes an embedding layer, a convolutional layer, and a pooling layer.
  • the device may include:
  • the acquiring unit 502 is configured to acquire the object to be classified and several predetermined categories.
  • the calculation unit 504 is configured to use each of the several predetermined categories acquired by the acquiring unit 502 as the current category in turn, and perform similarity calculation based on the current category.
  • The calculation unit 504 includes: a determining subunit 5042, configured to determine, in the embedding layer, the initial representation vector of the object to be classified and the category vector of the current category; a convolution subunit 5044, configured to perform, in the convolution layer, multiple convolution processing on the initial representation vector based on several convolution windows of different widths to obtain multiple convolution results; a pooling subunit 5046, configured to calculate, in the pooling layer, the similarity between each of the multiple convolution results and the category vector of the current category, determine the attention weight value corresponding to each convolution result based on the calculated similarities, and perform a weighted average pooling operation on the convolution results based on those attention weight values to obtain the pooling result; and an acquiring subunit 5048, configured to use the pooling result as the final representation vector of the object to be classified and calculate the similarity between the final representation vector and the category vector of the current category.
  • the determining unit 506 is configured to, after the calculating unit 504 performs similarity calculation based on each of the several predetermined categories, based on the multiple calculated similarities, determine the target category to which the object to be classified belongs from the several predetermined categories.
  • the object classification device provided in an embodiment of this specification can realize effective classification of objects.
  • the embodiments of this specification provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method shown in FIG. 1 or FIG. 3.
  • An embodiment of this specification provides a computing device, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method shown in FIG. 1 or FIG. 3 is implemented.
  • the steps of the method or algorithm described in conjunction with the disclosure of this specification can be implemented in a hardware manner, or can be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC may be located in the server.
  • the processor and the storage medium may also exist as discrete components in the server.
  • the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof.
  • these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.


Abstract

The embodiments of this specification provide methods and apparatuses for training a classification model and for object classification. In the training method, a sample with a classification label is obtained. In the embedding layer, the feature vector of the sample and the label vector of the classification label are determined. In the convolution layer, the feature vector of the sample is convolved multiple times based on several convolution windows of different widths, yielding multiple convolution results. In the pooling layer, the similarity between each convolution result and the label vector of the classification label is calculated, and based on the calculated similarities, the attention weight value corresponding to each convolution result is determined. Based on these attention weight values, a weighted average pooling operation is performed on the convolution results to obtain a pooling result. The pooling result is used as the sample representation vector of the sample, and a prediction loss is determined based at least on the sample representation vector and the label vector of the classification label. The parameters of the classification model are adjusted based on the prediction loss.

Description

Training of classification models and object classification
Technical field
One or more embodiments of this specification relate to the field of computer technology, and in particular to the training of classification models and to object classification.
Background
Object classification refers to predicting, through a pre-trained classification model, to which of the specific categories under a specific classification system an object to be classified belongs. The classification model here may be, for example, a convolutional neural network, which may include a convolution layer, a pooling layer, and so on.
Taking the case where the classification model is a convolutional neural network as an example, in traditional model training methods the convolution layer extracts local features from a sample through convolution operations, and the pooling layer extracts global features from those local features through max pooling or average pooling operations. However, max pooling strengthens local features of the sample, and those features may not be the important information relevant to object classification; moreover, average pooling dilutes the effective features. Traditional training methods therefore often fail to extract effective global features from the sample, which in turn affects the accuracy of the trained classification model.
In view of this, an improved scheme is desired that can improve the accuracy of classification models and thereby achieve effective classification of objects.
Summary
One or more embodiments of this specification describe a classification model training method and apparatus and an object classification method and apparatus, which can improve model accuracy and thereby achieve effective classification of objects.
In a first aspect, a training method for a classification model is provided, including:
obtaining a sample with a classification label;
in the embedding layer, determining the feature vector of the sample and the label vector of the classification label;
in the convolution layer, performing multiple convolution processing on the feature vector of the sample based on several convolution windows of different widths to obtain multiple convolution results;
in the pooling layer, calculating the similarity between each of the multiple convolution results and the label vector of the classification label; determining, based on the calculated similarities, the attention weight value corresponding to each convolution result; and performing, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain a pooling result;
using the pooling result as the sample representation vector of the sample, and determining a prediction loss based at least on the sample representation vector and the label vector of the classification label;
adjusting the parameters of the classification model based on the prediction loss.
In a second aspect, an object classification method is provided, including:
obtaining an object to be classified and several predetermined categories;
taking each of the several predetermined categories in turn as the current category and performing a similarity calculation based on the current category, the similarity calculation including:
in the embedding layer, determining the initial representation vector of the object to be classified and the category vector of the current category;
in the convolution layer, performing multiple convolution processing on the initial representation vector based on several convolution windows of different widths to obtain multiple convolution results;
in the pooling layer, calculating the similarity between each of the multiple convolution results and the category vector of the current category; determining, based on the calculated similarities, the attention weight value corresponding to each convolution result; and performing, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain a pooling result;
using the pooling result as the final representation vector of the object to be classified, and calculating the similarity between the final representation vector and the category vector of the current category;
after the similarity calculation has been performed based on each of the several predetermined categories, determining, based on the multiple calculated similarities, the target category to which the object to be classified belongs from among the several predetermined categories.
In a third aspect, a training apparatus for a classification model is provided, including:
an acquiring unit, configured to acquire a sample with a classification label;
a determining unit, configured to determine, in the embedding layer, the feature vector of the sample acquired by the acquiring unit and the label vector of the classification label;
a convolution unit, configured to perform, in the convolution layer, multiple convolution processing on the feature vector of the sample determined by the determining unit based on several convolution windows of different widths, to obtain multiple convolution results;
a pooling unit, configured to calculate, in the pooling layer, the similarity between each of the multiple convolution results obtained by the convolution unit and the label vector of the classification label; determine, based on the calculated similarities, the attention weight value corresponding to each convolution result; and perform, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain a pooling result;
the determining unit being further configured to use the pooling result obtained by the pooling unit as the sample representation vector of the sample, and determine a prediction loss based at least on the sample representation vector and the label vector of the classification label;
an adjusting unit, configured to adjust the parameters of the classification model based on the prediction loss determined by the determining unit.
In a fourth aspect, an object classification apparatus is provided, including:
an acquiring unit, configured to acquire an object to be classified and several predetermined categories;
a calculation unit, configured to take each of the several predetermined categories acquired by the acquiring unit in turn as the current category and perform a similarity calculation based on the current category; the calculation unit including: a determining subunit, configured to determine, in the embedding layer, the initial representation vector of the object to be classified and the category vector of the current category; a convolution subunit, configured to perform, in the convolution layer, multiple convolution processing on the initial representation vector based on several convolution windows of different widths to obtain multiple convolution results; a pooling subunit, configured to calculate, in the pooling layer, the similarity between each of the multiple convolution results and the category vector of the current category, determine, based on the calculated similarities, the attention weight value corresponding to each convolution result, and perform, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain a pooling result; and an acquiring subunit, configured to use the pooling result as the final representation vector of the object to be classified and calculate the similarity between the final representation vector and the category vector of the current category;
a determining unit, configured to determine, after the calculation unit has performed the similarity calculation based on each of the several predetermined categories, the target category to which the object to be classified belongs from among the several predetermined categories based on the multiple calculated similarities.
In a fifth aspect, a computer storage medium is provided on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first or second aspect.
In a sixth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method of the first or second aspect is implemented.
With the classification model training method provided by one or more embodiments of this specification, in the pooling layer, weighted average pooling can be performed on the convolution results based on their respective attention weight values. Since the attention weight value of each convolution result is determined based on the similarity between that result and the label vector of the classification label, for each sample the training method provided in this specification can determine, under the guidance of the corresponding classification label, the importance of each convolution result (one convolution result corresponds to one feature combination, e.g., several words), use that importance as the attention weight value for weighted average pooling, and thereby generate the sample representation vector. This can greatly improve the accuracy of the generated sample representation vector. Understandably, with a more accurate sample representation vector, the classification model trained on it can be more precise, and effective classification of objects can then be achieved.
Brief description of the drawings
To explain the technical solutions of the embodiments of this specification more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a classification model training method provided by an embodiment of this specification;
FIG. 2 is a schematic diagram of the text classification model provided by this specification;
FIG. 3 is a flowchart of an object classification method provided by an embodiment of this specification;
FIG. 4 is a schematic diagram of a classification model training apparatus provided by an embodiment of this specification;
FIG. 5 is a schematic diagram of an object classification apparatus provided by an embodiment of this specification.
Detailed description
The solutions provided in this specification are described below with reference to the accompanying drawings.
Before describing the solutions provided in this specification, the inventive concept of the scheme is explained as follows.
As mentioned above, in traditional classification model training methods, the pooling layer extracts global features from the local features obtained through the convolution layer by max pooling and average pooling operations. However, max pooling strengthens local features of the sample that may not be highly correlated with the label, and average pooling dilutes the effective features. Traditional training methods therefore often cannot produce an effective classification model.
For this reason, the applicant of this application proposes to introduce an attention-based pooling method into the training process of the classification model. Specifically, a sample with a classification label is obtained. In the embedding layer of the model, the feature vector of the sample and the label vector of the classification label are determined. In the convolution layer of the model, the feature vector of the sample is convolved multiple times based on several convolution windows of different widths, obtaining multiple convolution results. In the pooling layer of the model, each convolution result obtained through the convolution layer is assigned a corresponding attention weight value, where the attention weight value of each convolution result is determined based on the similarity between that convolution result and the label vector of the classification label. A weighted average pooling operation is then performed on the convolution results based on their respective attention weight values, and the resulting pooling result is used as the sample representation vector of the sample. Finally, the prediction loss is determined based on the sample representation vector and the label vector of the classification label, and the parameters of the classification model are adjusted based on the prediction loss.
In other words, the classification model described in this specification includes an embedding layer, a convolution layer, a pooling layer, and so on. Moreover, in the training method provided in this specification, for each sample, the importance of each local feature of the sample is determined under the guidance of the corresponding classification label, and the sample representation vector is generated on that basis, which can greatly improve the accuracy of the generated sample representation vector. Understandably, with a more accurate sample representation vector, the classification model trained on it can be more precise, and effective classification of objects can then be achieved.
The above is the inventive concept provided by this specification, from which the present scheme can be derived. The scheme is elaborated below.
FIG. 1 is a flowchart of a classification model training method provided by an embodiment of this specification. The method may be executed by a device with processing capability: a server, a system, or an apparatus. As shown in FIG. 1, the method may specifically include:
Step 102: Obtain a sample with a classification label.
The classification model described in this specification can be used for business processing. The business processing here may include, but is not limited to, business processing based on image recognition (e.g., face recognition, object detection), business processing based on audio recognition (e.g., speech recognition, voiceprint analysis), and business processing based on text analysis (e.g., text classification, intent recognition).
Specifically, if the business processing is based on image recognition, the obtained sample may be a picture, the classification label a picture category label, and the trained classification model a picture classification model. If the business processing is based on audio recognition, the sample may be audio, the classification label an audio category label, and the trained model an audio classification model. If the business processing is based on text analysis, the sample may be a text, the classification label a text category label, and the trained model a text classification model.
In addition, the classification label may be any label in a predetermined label set. Taking text analysis as the business processing performed by the classification model, the predetermined label set may include the following two classification labels: advertisement and non-advertisement, in which case the above classification label may be advertisement or non-advertisement.
It should be understood that the above description of the predetermined label set is only exemplary. In practical applications, in different classification scenarios, the number of classification labels in the predetermined label set may be more than two. For example, in a scenario where microblog content is classified by subject information (a kind of text analysis), if more than two kinds of subject information are predefined, the predetermined label set may contain more than two classification labels.
Step 104: In the embedding layer, determine the feature vector of the sample and the label vector of the classification label.
Specifically, the representation vector of each sample feature in the sample may be determined first, and the feature vector of the sample then determined based on those representation vectors. Taking a text sample as an example, the sample features may be the words in the text; taking audio as an example, the sample features may be speech segments.
When the sample is a text, the corresponding feature vector may be determined as follows: suppose the obtained text contains n words x1, x2, ..., xn. For each word xi, an m-dimensional vector can be randomly initialized, yielding the word vector of each word: f1, f2, ..., fn. That is, each fi is an m-dimensional vector, where n and m are positive integers and 1≤i≤n. The word vectors of the words can then be spliced to obtain the feature vector of the text.
Of course, in practical applications, the word vector of each word can also be determined based on a word vectorization method, such as any of Word2Vec, GloVe, and the like.
When the sample is audio, the feature vector of the audio can be obtained based on the representation vectors of the speech segments contained in the audio, where the representation vectors of the speech segments all have the same dimension, e.g., m.
It should be noted that, in the embedding layer, the classification label of the text can also be initialized as an N-dimensional vector (hereinafter called the label vector, denoted hy), where N is a positive integer. The label vector is then adjusted continuously during model training; understandably, after training ends, the optimal representation vector of the classification label is obtained.
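The embedding-layer initialization just described can be sketched as follows. The function name and the uniform initialization range are illustrative assumptions; the specification only requires random initialization of m-dimensional word vectors and N-dimensional label vectors, which training then refines.

```python
import random

def init_embeddings(words, labels, m, n):
    # each word gets a randomly initialized m-dimensional word vector, and
    # each classification label gets an N-dimensional label vector; both
    # sets of vectors are adjusted over the course of training
    word_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(m)] for w in words}
    label_vecs = {y: [random.uniform(-0.5, 0.5) for _ in range(n)] for y in labels}
    return word_vecs, label_vecs
```

For a two-word text and a two-label set with m=4 and N=3, this yields one 4-dimensional vector per word and one 3-dimensional vector per label.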
Step 106: In the convolution layer, perform multiple convolution processing on the feature vector of the sample based on several convolution windows of different widths, obtaining multiple convolution results.
The several convolution windows of different widths may be, for example, conv3, conv4, and conv5; their number can be preset manually. Taking conv3 as an example, its width is the width of 3 words or 3 speech segments; assuming each word or speech segment is an m-dimensional vector, its width can be expressed as 3*m.
Taking any first convolution window among the several convolution windows of different widths as an example, its convolution processing may specifically include: for the first convolution window, determining, based on its width, the number of dimensions of the feature vector selected by the corresponding convolution processing. With conv3 as the first convolution window, since its width is the width of 3 words or 3 speech segments, the number of selected dimensions is 3*m. Convolution processing is then performed on the feature vector of the sample based on the determined number of dimensions, the parameters of the first convolution window, and a predetermined step size. The predetermined step size is an integer multiple of the dimension of one word vector (or of the representation vector of one speech segment), e.g., m or 2m.
It should be noted that, in practical applications, the convolution processing based on a convolution window of a given width may be an iterative process. In one example, the termination condition of the iteration may be that the number of iterations reaches a fixed number; this specification does not limit this.
In one implementation, the iterative process based on the first convolution window may specifically include:
starting from the current position, selecting the above number of elements from the feature vector of the sample; splicing the selected elements to obtain the current splicing vector; linearly transforming the current splicing vector based on the parameters of the first convolution window to obtain a linear transformation result; determining one of the multiple convolution results based on the linear transformation result; and determining the next position based on the current position and the predetermined step size, taking the next position as the current position.
Taking a text sample as an example, suppose the feature vector of the text is obtained by splicing the word vectors f1, f2, ..., fn, where each fi is an m-dimensional vector. With conv3 as the first convolution window, if the current position is the start position of word vector f_t (1≤t≤n), 3*m elements can be selected from the feature vector, i.e., the word vectors of 3 words: f_t, f_{t+1}, and f_{t+2}. Splicing these three word vectors gives the current splicing vector [f_t; f_{t+1}; f_{t+2}], which can then be linearly transformed as shown in Formula 1:
h_t = W_{c,3} · [f_t; f_{t+1}; f_{t+2}] + b_{c,3}    (Formula 1)
where W_{c,3} and b_{c,3} are the window parameters of conv3, and h_t is one of the multiple convolution results.
It should be understood that Formula 1 directly uses the linear transformation result of the current splicing vector as the convolution result h_t. In practical applications, after linearly transforming the current splicing vector based on the parameters of the first convolution window, an activation function can also be applied to perform a nonlinear transformation on the linear transformation result, and the nonlinear transformation result is then used as the convolution result h_t. The activation function here may include, but is not limited to, the tanh, relu, and sigmoid functions.
After the convolution result h_t is determined, the next position, namely the start position of word vector f_{t+1}, can be determined based on the current position (the start position of f_t) and the predetermined step size (assumed to be m), and the next position is updated as the current position. The iterative steps above are then repeated based on the updated current position until the termination condition of the iteration is met.
Understandably, after the iterative process based on the first convolution window ends, at least one convolution result corresponding to the first convolution window is obtained. The corresponding iterative process is then executed based on the next convolution window, until the iterative processes based on all the convolution windows of different widths have been completed; at that point, all the convolution processing of the convolution layer is finished.
After all the convolution processing of the convolution layer is completed, arranging the at least one convolution result of each of the several convolution windows of different widths yields the above multiple convolution results. In one example, the multiple convolution results can be denoted h1, h2, ..., hk, where each hi is a vector containing N elements, and i and k are positive integers with 1≤i≤k.
It should be noted that since the multiple convolution results are all obtained by convolving several consecutively arranged sample features (e.g., 3, 4, or 5 words), each hi can be regarded as local information of the sample; using convolution windows of different widths captures the local information of the sample more comprehensively.
Understandably, the processing of the pooling layer below can be understood as determining the important local information among the local information captured through the convolution layer and semantically strengthening it, so as to improve the accuracy of the finally generated sample representation vector. This is elaborated below.
Step 108: In the pooling layer, calculate the similarity between each of the multiple convolution results and the label vector of the classification label; determine, based on the calculated similarities, the attention weight value corresponding to each convolution result; and perform, based on those attention weight values, a weighted average pooling operation on the convolution results to obtain the pooling result.
In one example, the multiple convolution results h1-hk obtained through the convolution layer are all vectors containing N elements. Since, as mentioned above, the label vector hy is also N-dimensional, each convolution result hi has the same dimension as the label vector hy.
In this example, the step of calculating the similarity between each convolution result and the label vector of the classification label may include: for each of the multiple convolution results, calculating at least a first dot product between the convolution result and the label vector of the classification label; calculating at least a second dot product between the convolution result and the vector of each classification label in the predetermined label set, and summing the second dot products to obtain a first summation result; and determining the similarity between the convolution result and the label vector of the classification label based on the ratio of the first dot product to the first summation result.
It should be noted that the vectors of the classification labels in the predetermined label set may all be N-dimensional, and may be obtained by random initialization or by adjusting the initial vectors during model training.
In one implementation, the similarity between each convolution result and the label vector of the classification label can be calculated based on the following formula:
a_t = g(h_t, h_y) / Σ_{y'∈Y} g(h_t, h_{y'})    (Formula 2)
where h_t is the t-th convolution result, h_y is the label vector of the classification label, and a_t is the similarity between h_t and h_y, which can be a real value in [0,1]. Y is the predetermined label set, h_{y'} is the label vector of a classification label in the predetermined label set, and g() is a dot-product function, which may include, but is not limited to, the following definitions:
g_1(h_t, h_y) = h_t^T · h_y
g_2(h_t, h_y) = h_t^T · W_a · h_y    (Formula 3)
where W_a is a hyperparameter, randomly initialized and updated by back-propagated gradients.
Understandably, when the similarity between a convolution result and the label vector of the classification label is calculated based on Formula 2, the larger the dot product between the convolution result and the label vector, the larger the similarity between the convolution result and the classification label. And the larger the similarity, the more important the local information represented by that convolution result, so that it can be semantically strengthened.
After the similarity between each of the convolution results h1-hk and the label vector of the classification label has been calculated, the calculated similarity can be used as the attention weight value of that convolution result. Based on these attention weight values, the semantic strengthening of important local information is realized, and the weighted average pooling operation is then performed.
The step of performing the weighted average pooling operation on the convolution results based on their corresponding attention weight values is described below.
In one implementation, this step may specifically be: for any first convolution result among the convolution results, use the attention weight value of the first convolution result as the attention weight value of each of its N elements; based on the attention weight values of the N elements of each convolution result, perform a weighted average pooling operation on the same-position elements of the convolution results in turn to obtain the pooling result. The weighted average pooling operation here can be implemented by the following formula:
q^(j) = (1/k) · Σ_{t=1..k} a_t · h_t^(j)    (Formula 4)
where h_t^(j) is the j-th of the N elements of convolution result h_t, a_t is the attention weight value of h_t, k is the number of convolution results, and q^(j) is the j-th element of the pooling result.
From the above formula, the pooling result is also a vector containing N elements, so the pooling result has the same dimension as each convolution result. Moreover, when the same-position element of the pooling result is computed from the same-position elements of the convolution results, each element is assigned an attention weight value. That is, an attention mechanism is introduced into the pooling method of this scheme, which helps determine the truly important information of the sample. When the truly important information of the sample is determined more accurately, the accuracy of the generated sample representation vector improves, and the classification model trained on it is then more precise.
Step 110: Use the pooling result obtained through the pooling layer as the sample representation vector of the sample, and determine the prediction loss based at least on the sample representation vector and the label vector of the classification label.
In one implementation, the dot product between the sample representation vector and the label vector of the classification label can be calculated first, and the prediction loss then determined based on that dot product; the prediction loss so determined can be inversely related to the calculated dot product.
In another implementation, several other classification labels (also called "negative labels") different from the classification label can be randomly selected from the predetermined label set, and the prediction loss determined based on the sample representation vector, the label vector of the classification label, and the label vectors of the several other classification labels.
Specifically, a third dot product between the sample representation vector and the label vector of the classification label can be calculated; a fourth dot product between the sample representation vector and the label vector of each of the several other classification labels can be calculated, and the fourth dot products summed to obtain a second summation result. The prediction loss is determined based on the third dot product and the second summation result, so that the prediction loss is inversely related to the third dot product and positively related to the fourth dot products.
In one example, the prediction loss can be determined based on the following formula:
L = -log σ(q^T · h_y) - Σ_{y''} log σ(-q^T · h_{y''})    (Formula 5)
where the sum runs over the λ randomly selected other classification labels y''. L is the prediction loss, q is the sample representation vector of the sample, h_y is the label vector of the classification label of the sample, Y is the predetermined label set, y'' is one of the several randomly selected other classification labels, and h_{y''} is its label vector. λ is a predefined hyperparameter whose value range can be [2,10], and σ is an activation function, e.g., the sigmoid function.
It should be particularly emphasized that λ in the above formula can be set based on its value range: if the total number of classification labels in the predetermined label set is smaller than the value of λ, λ is set to that total number; otherwise the original value is kept.
It should be noted that when the prediction loss is determined based on the above formula, the correlation between the positive classification label (i.e., the classification label of the sample) and the sample features becomes higher, while the correlation between the negative classification labels (the labels in the predetermined label set other than the positive classification label) and the sample features becomes lower.
Step 112: Adjust the parameters of the classification model based on the prediction loss.
In one example, the parameters of the classification model can be adjusted by computing gradients and back-propagating them. Specifically, the parameters of the embedding layer, the convolution layer, and the pooling layer can be adjusted, where the parameters of the embedding layer include the representation vector of each sample feature (e.g., word vectors) and the label vector of the classification label. It should be understood that when the representation vectors of the sample features and the label vector are adjusted, the above attention weight values are in effect adjusted as well.
It should be noted that, in practical applications, steps 102-112 are executed iteratively, and each iteration uses the parameters adjusted in the previous iteration. The termination condition of the iteration can be that the number of iterations reaches a predetermined number of rounds, or that the value of the loss function shown in Formula 5 converges.
In short, in the embedding layer, this scheme initially randomly initializes the representation vector of each sample feature and the label vector of the classification label. Then, during iterative model training, these representation vectors and the label vector are continuously adjusted until the optimal representation vectors of the sample features and the classification label are obtained. The optimal representation vectors obtained here can be used in the subsequent object classification steps, which can improve the accuracy of object classification.
Second, in the convolution layer, this scheme convolves consecutive sample features at different positions and in different numbers in turn, so that the local information of the sample can be captured more comprehensively.
Finally, in the pooling layer, for each convolution result, this scheme determines its attention weight value based on the similarity between the convolution result and the label vector of the classification label, and then performs a weighted average operation on the convolution results based on the determined attention weight values to obtain the sample representation vector of the sample. Since different attention weight values reflect the different importance of the corresponding convolution results, pooling the convolution results based on these weight values can effectively strengthen the important local information and thus express the sample better.
In summary, the classification model training method provided by an embodiment of this specification can greatly improve the accuracy of the classification model, and effective classification of objects can then be achieved.
The training process of a text classification model is described below, taking as an example a classification model used for text-analysis-based business processing, i.e., the classification model is a text classification model, the classification label is a text category label, and the sample is a text.
图2为本说明书提供的文本分类模型示意图。图2中,该文本分类模型可以包括:嵌入层、卷积层、池化层和softmax层。在嵌入层中,可以确定文本包含的n个词:x1、x2、…、xn各自的词向量,这里的n个词的词向量可以分别表示为:f1、f2、…、fn。此外,还可以确定文本的分类标签的标签向量hy。在卷积层中,基于conv3、conv4以 及conv5等不同宽度的卷积窗口,对各个词的词向量进行多次卷积处理,得到多个卷积结果。分别表示为:h1、h2、...、hk。在池化层中,计算各卷积结果与分类标签的标签向量之间的相似度。并基于计算得到的相似度,确定对应于各卷积结果的注意力权重值ai。基于对应于各卷积结果的注意力权重值,对各卷积结果进行加权平均池化操作,得到池化结果q。在softmax层,基于池化结果q(即文本表示向量)以及分类标签的标签向量,确定预测损失。
需要说明的是,图2示出的分类模型的训练方法,也同样适用于其它包含散列特征的对象分类模型的训练。如,也可以适用于声纹识别模型的训练等。可以理解的是,当应用于声纹识别模型的训练时,嵌入层中确定的各词的词向量的步骤可以替换确定用户输入的各个语音片段对应的表示向量。而卷积层和池化层的处理则与文本分类模型的训练过程类似,在此不复赘述。
以下结合两个例子,对本方案的模型训练过程中的基于注意力机制的池化方法进行说明。
先以文本分析的广告二分类场景为例来说,所训练的模型为广告预测模型,输入样本可以为一个文本句子,分类标签包括如下两种:广告和非广告。
那么对于样本:“澳门威尼斯人在线赌博平台上线了,快快下载,约小姐加微信22xxx32”。在对应的分类标签为:广告时,通过注意力机制的自适应池化方法,就可以分析出“上线”,“赌博”,“下载”等词与标签关系密切,在池化过程中,包含这些词语的相应的卷积层结果会得到更大的权值。
再以文本分析的色情二分类场景为例来说,所训练的模型为色情预测模型,输入样本同样可以为一个文本句子,分类标签包括如下两种:色情和非色情。
那么对于样本:“澳门威尼斯人在线赌博平台上线了,快快下载,约小姐加微信22xxx32”。在对应的分类标签为:色情时,那么通过注意力机制的自适应池化方法,就可以分析出“约”“小姐”等词语与标签关系密切。在池化过程中,包含这些词语的相应的卷积层结果则会得到更大的权值。
由此可以看出,本说明书提出的基于注意力机制的池化方法,针对同一文本,在其对应的分类标签不同的情况下,所分析得到的重要的局部信息也不一致。也就是说,通过本说明书提供的方案,在不同的模型训练场景下,可以得到文本的不同表达,这可以大大提升不同场景下所训练得到的模型的精度。
以上均是对分类模型训练过程的说明,以下对基于训练后的分类模型的对象分类过程进行说明。
图3为本说明书一个实施例提供的对象分类方法流程图。如图3所示,该方法可以包括:
步骤302,获取待分类对象以及若干预定类别。
这里的待分类对象可以包括但不限于以下任一种:待分类文本、待分类音频以及待分类图片。以待分类对象为待分类文本为例来说,上述若干预定类别可以包括广告和非广告两个类别。当然,在实际应用中,在不同的分类场景下,上述预定类别的数目也可以多于两个。比如,在针对微博内容进行主题信息分类的场景下,每个主题信息均可以作为一个预定类别。
步骤304,将若干预定类别中的每个类别依次作为当前类别,基于当前类别进行相似度计算。
上述步骤304具体可以包括如下步骤:
步骤3042,在嵌入层中,确定待分类对象的初始表示向量以及当前类别的类别向量。
这里,可以是基于待分类对象中的各特征的表示向量(如,词向量或者语音片段的表示向量),确定待分类对象的初始表示向量。其中,各特征的表示向量以及当前类别的类别向量均可以是在上述分类模型训练过程中所训练得到的。也即其均可以为上述提供的最优表示向量。
步骤3044,在卷积层中,基于若干不同宽度的卷积窗口,对初始表示向量进行多次卷积处理,得到多个卷积结果。
步骤3046,在池化层中,计算多个卷积结果中各卷积结果与当前类别的类别向量之间的相似度,并基于计算得到的相似度确定对应于各卷积结果的注意力权重值。基于对应于各卷积结果的注意力权重值,对各卷积结果进行加权平均池化操作,得到池化结果。
这里的步骤3044-步骤3046同上述步骤106-步骤108,在此不复赘述。
步骤3048,将池化结果作为待分类对象的最终表示向量,并计算最终表示向量与当前类别的类别向量之间的相似度。
这里的两个向量之间的相似度可以包括但不限于:余弦相似度、欧氏距离、曼哈顿距离以及皮尔逊相关系数等等。
步骤306,在基于若干预定类别中的每个类别进行上述相似度计算之后,基于计算得到的多个相似度,从若干预定类别中确定出待分类对象所属的目标类别。
举例来说,假设有两个预定类别:y0和y1,且该两个类别的类别向量分分别为:hy0和hy1。此外,还假设在将y0作为当前类别时,可以得到待分类对象的最终表示向量q0,并且可以计算得到hy0与q0相似度S0。在将y1作为当前类别时,可以得到待分类对象的最终表示向量q1,并且可以计算得到hy1与q1相似度S1。那么,如果S0>S1,则可以将类别y0作为待分类对象所属的目标类别。
综上,本说明书实施例提供的对象分类方法,可以基于在模型训练过程中所训练得到的特征的表示向量以及类别向量,对待分类对象进行分类,这可以大大提升对象分类的准确性。
以下针对本说明书实施例提供的对象分类方法的应用场景以及在实际应用场景中所达到的效果进行说明:
本说明书提供的方案可以广泛应用于各种文本分类场景中,比如,可以应用于各大社区的评论文本的分类场景中。为了增加用户活跃度,现有的社区均为用户提供了评论区,但部分用户利用评论区进行第三方产品广告宣传或者发布色情信息。因此,需要建立评论文本的分类模型,以判断某评论文本是否为违规文本。
在针对评论文本进行分类的过程中,经常会遇到有效信息被大部分无效信息覆盖的情况。例如,“朋友们,国庆佳节就快到了,您还等什么,无需出门,无需出门,无需出门,在家即可参与活动,澳门赌博上线了,网址:xxx”,这个文本是个典型的“赌博广告”,属于违规文本,但是很难被识别出来,原因在于有效信息“澳门赌博上线了,网址:xxx”只是文本的很小一部分,前面有很长的无关信息。
针对上述评论文本,如果是用传统的平均池化的方式,那么很长的无关信息就会把有效信息冲淡(平均化),从而导致其不能被正确表达。如果是用传统的最大池化的方式,其加强的信息往往只根据文本内容决定,而不能对真正违规的内容进行加强,比如,其会加强“朋友们”,“国庆佳节”,“出门”,“参与活动”,“赌博”和“上线”等局部信息,这也会使得文本最终表达不准确。
而如果采用本方案提供的文本分类方法,由于其可以结合基于注意力机制的池化方法,也即其会加强与标签有关的局部信息,从而可以准确确定出此文本的有效信息内容,进而可以提升文本表示向量的准确性。在文本表示向量的准确性提高的前提下, 基于其训练得到的文本分类模型可以更精确,进而可以实现文本的有效分类。
与上述分类模型的训练方法对应地,本说明书一个实施例还提供一种分类模型的训练装置。该分类模型包括嵌入层、卷积层和池化层。如图4所示,该装置可包括:
获取单元402,用于获取带分类标签的样本。
确定单元404,用于在嵌入层中,确定获取单元402获取的样本的特征向量以及分类标签的标签向量。
卷积单元406,用于在卷积层中,基于若干不同宽度的卷积窗口,对确定单元404确定的样本的特征向量进行多次卷积处理,得到多个卷积结果。
上述若干不同宽度的卷积窗口包括第一卷积窗口。
卷积单元406具体可以用于:对于第一卷积窗口,基于第一卷积窗口的宽度,确定对应的卷积处理所选取的特征向量的维数;基于维数、第一卷积窗口的参数以及预定步长,对样本的特征向量进行卷积处理。
卷积单元406还具体可以用于迭代地执行以下步骤,直至达到预定次数:从当前位置开始,在样本的特征向量中选取上述维数个元素;对选取的上述维数个元素进行拼接,得到当前拼接向量;基于第一卷积窗口的参数,对当前拼接向量进行线性变换,得到线性变换结果;基于线性变换结果,确定多个卷积结果中的一个卷积结果;基于当前位置以及预定步长,确定下一位置,并将下一位置作为当前位置。
卷积单元406还具体可以用于:将线性变换结果作为多个卷积结果中的一个卷积结果;或者,采用激活函数,对线性变换结果进行非线性变换,将非线性变换结果作为多个卷积结果中的一个卷积结果。
池化单元408,用于在池化层中,计算卷积单元406得到的多个卷积结果中各卷积结果与分类标签的标签向量之间的相似度,并基于计算得到的相似度,确定对应于所各卷积结果的注意力权重值。基于对应于各卷积结果的注意力权重值,对各卷积结果进行加权平均池化操作,得到池化结果。
上述每个卷积结果为一个包含N个元素的向量。池化单元408具体可以用于:对于各卷积结果中任意的第一卷积结果,将第一卷积结果的注意力权重值作为其中的N个元素的注意力权重值;基于各卷积结果各自的N个元素的注意力权重值,依次对各卷积结果的相同位置的元素进行加权平均池化操作,得到池化结果。
上述分类标签属于预定标签集合。池化单元408具体可以用于:对于多个卷积结果中的每个卷积结果,至少计算该卷积结果与分类标签的标签向量之间的第一点积;至少计算该卷积结果与预定标签集合中各分类标签的向量之间的第二点积,并对各第二点积进行求和,得到第一求和结果;基于第一点积与第一求和结果之比,确定卷积结果与分类标签的标签向量之间的相似度。
确定单元404,还用于将池化单元408得到的池化结果作为样本的样本表示向量,并至少基于样本表示向量以及分类标签的标签向量,确定预测损失。
确定单元404具体可以用于:从包含分类标签的预定标签集合中,随机选取不同于分类标签的若干其它分类标签;基于样本表示向量、分类标签的标签向量以及若干其它分类标签的标签向量,确定预测损失。
确定单元404还具体可以用于:计算样本表示向量与分类标签的标签向量之间的第三点积;计算样本表示向量与若干其它分类标签的标签向量之间的第四点积,并对各第四点积进行求和,得到第二求和结果;基于第三点积与第二求和结果,确定预测损失,以使预测损失反相关于第三点积,且正相关于第四点积。
an adjustment unit 410, configured to adjust the parameters of the classification model based on the prediction loss determined by the determination unit 404.
Optionally, the classification model is a text classification model, the classification label is a text category label, and the sample is a text; or the classification model is an image classification model, the classification label is an image category label, and the sample is an image; or the classification model is an audio classification model, the classification label is an audio category label, and the sample is audio.
The functions of the functional modules of the apparatus in the above embodiment of this specification can be implemented through the steps of the above method embodiment; the specific working process of the apparatus provided by an embodiment of this specification is therefore not repeated here.
The training apparatus for a classification model provided by an embodiment of this specification can improve the accuracy of the classification model, thereby enabling effective classification of objects.
Corresponding to the above object classification method, an embodiment of this specification further provides an object classification apparatus. The apparatus runs based on a pre-trained classification model, which includes an embedding layer, a convolution layer and a pooling layer. As shown in FIG. 5, the apparatus may include:
an acquisition unit 502, configured to acquire an object to be classified and several predetermined categories;
a computation unit 504, configured to take each of the several predetermined categories acquired by the acquisition unit 502 as the current category in turn, and perform a similarity computation based on the current category.
The computation unit 504 includes: a determination subunit 5042, configured to determine, in the embedding layer, an initial representation vector of the object to be classified and a category vector of the current category; a convolution subunit 5044, configured to perform, in the convolution layer, multiple convolution operations on the initial representation vector based on several convolution windows of different widths, to obtain multiple convolution results; a pooling subunit 5046, configured to compute, in the pooling layer, the similarity between each of the multiple convolution results and the category vector of the current category, determine, based on the computed similarities, an attention weight corresponding to each convolution result, and perform, based on the attention weights, weighted average pooling on the convolution results to obtain a pooling result; and an acquisition subunit 5048, configured to take the pooling result as a final representation vector of the object to be classified, and compute the similarity between the final representation vector and the category vector of the current category.
a determination unit 506, configured to determine, after the computation unit 504 has performed the similarity computation for each of the several predetermined categories, a target category to which the object to be classified belongs from among the several predetermined categories, based on the multiple computed similarities.
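The per-category inference loop could be sketched as follows. The `toy_encode` function is a hypothetical stand-in for the full embedding/convolution/attention-pooling pipeline, reduced here to a category-conditioned attention average so the example stays self-contained:

```python
import numpy as np

def classify(encode, obj, category_vecs):
    """Pick the category whose vector is most similar to the object's
    category-conditioned final representation."""
    sims = []
    for cat_vec in category_vecs:            # each predetermined category in turn
        final_repr = encode(obj, cat_vec)    # embedding + conv + attention pooling
        sims.append(final_repr @ cat_vec)    # similarity to the category vector
    return int(np.argmax(sims)), sims        # target category and all similarities

# Toy stand-in encoder: attention-pool the object's position vectors toward
# the candidate category (a drastic simplification of the full model).
def toy_encode(word_vecs, cat_vec):
    w = np.exp(word_vecs @ cat_vec)
    w /= w.sum()
    return (w[:, None] * word_vecs).sum(axis=0)

words = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
cats = np.array([[1.0, 0.0], [0.0, 1.0]])    # two predetermined categories
target, sims = classify(toy_encode, words, cats)
print(target, sims)
```

Because the representation is recomputed once per candidate category, the apparatus obtains one similarity per category and selects the maximum as the target category.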
The functions of the functional modules of the apparatus in the above embodiment of this specification can be implemented through the steps of the above method embodiment; the specific working process of the apparatus provided by an embodiment of this specification is therefore not repeated here.
The object classification apparatus provided by an embodiment of this specification can achieve effective classification of objects.
In another aspect, an embodiment of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method shown in FIG. 1 or FIG. 3.
In another aspect, an embodiment of this specification provides a computing device, including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method shown in FIG. 1 or FIG. 3 is implemented.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The steps of the methods or algorithms described in connection with the disclosure of this specification may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a server. Of course, the processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general-purpose or special-purpose computer.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
The specific embodiments described above further illustrate the objectives, technical solutions, and beneficial effects of this specification in detail. It should be understood that the above are merely specific embodiments of this specification and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of this specification shall fall within its scope of protection.

Claims (22)

  1. A training method for a classification model, the classification model comprising an embedding layer, a convolution layer and a pooling layer; the method comprising:
    acquiring a sample with a classification label;
    in the embedding layer, determining a feature vector of the sample and a label vector of the classification label;
    in the convolution layer, performing multiple convolution operations on the feature vector of the sample based on several convolution windows of different widths, to obtain multiple convolution results;
    in the pooling layer, computing a similarity between each of the multiple convolution results and the label vector of the classification label; determining, based on the computed similarities, an attention weight corresponding to each convolution result; and performing, based on the attention weights corresponding to the convolution results, a weighted average pooling operation on the convolution results, to obtain a pooling result;
    taking the pooling result as a sample representation vector of the sample, and determining a prediction loss based at least on the sample representation vector and the label vector of the classification label; and
    adjusting parameters of the classification model based on the prediction loss.
  2. The method according to claim 1, wherein each convolution result is a vector containing N elements;
    the performing, based on the attention weights corresponding to the convolution results, a weighted average pooling operation on the convolution results to obtain a pooling result comprises:
    for any first convolution result among the convolution results, taking the attention weight of the first convolution result as the attention weight of each of its N elements; and
    performing, based on the attention weights of the N elements of each convolution result, weighted average pooling on the elements at the same position across the convolution results, position by position, to obtain the pooling result.
  3. The method according to claim 1, wherein the several convolution windows of different widths include a first convolution window;
    the performing multiple convolution operations on the feature vector of the sample based on several convolution windows of different widths comprises:
    for the first convolution window, determining, based on a width of the first convolution window, a number of dimensions of the feature vector selected for the corresponding convolution operation; and
    performing convolution on the feature vector of the sample based on the number of dimensions, parameters of the first convolution window, and a predetermined stride.
  4. The method according to claim 3, wherein the performing convolution on the feature vector of the sample based on the number of dimensions, the parameters of the first convolution window, and the predetermined stride comprises:
    iteratively performing the following steps until a predetermined number of iterations is reached:
    starting from a current position, selecting said number of elements from the feature vector of the sample;
    concatenating the selected elements to obtain a current concatenated vector;
    performing a linear transformation on the current concatenated vector based on the parameters of the first convolution window, to obtain a linear transformation result;
    determining one of the multiple convolution results based on the linear transformation result; and
    determining a next position based on the current position and the predetermined stride, and taking the next position as the current position.
  5. The method according to claim 4, wherein the determining one of the multiple convolution results based on the linear transformation result comprises:
    taking the linear transformation result as one of the multiple convolution results; or
    applying an activation function to perform a nonlinear transformation on the linear transformation result; and
    taking the nonlinear transformation result as one of the multiple convolution results.
  6. The method according to claim 1, wherein the classification label belongs to a predetermined label set;
    the computing a similarity between each of the multiple convolution results and the label vector of the classification label comprises:
    for each of the multiple convolution results, computing at least a first dot product between the convolution result and the label vector of the classification label;
    computing at least second dot products between the convolution result and the vectors of the classification labels in the predetermined label set, and summing the second dot products to obtain a first summation result; and
    determining the similarity between the convolution result and the label vector of the classification label based on a ratio of the first dot product to the first summation result.
  7. The method according to claim 1, wherein the determining a prediction loss based at least on the sample representation vector and the label vector of the classification label comprises:
    randomly selecting, from a predetermined label set containing the classification label, several other classification labels different from the classification label; and
    determining the prediction loss based on the sample representation vector, the label vector of the classification label, and label vectors of the several other classification labels.
  8. The method according to claim 7, wherein the determining the prediction loss based on the sample representation vector, the label vector of the classification label, and the label vectors of the several other classification labels comprises:
    computing a third dot product between the sample representation vector and the label vector of the classification label;
    computing fourth dot products between the sample representation vector and the label vectors of the several other classification labels, and summing the fourth dot products to obtain a second summation result; and
    determining the prediction loss based on the third dot product and the second summation result, such that the prediction loss is inversely correlated with the third dot product and positively correlated with the fourth dot products.
  9. The method according to claim 1, wherein:
    the classification model is a text classification model, the classification label is a text category label, and the sample is a text; or
    the classification model is an image classification model, the classification label is an image category label, and the sample is an image; or
    the classification model is an audio classification model, the classification label is an audio category label, and the sample is audio.
  10. An object classification method, performed based on a pre-trained classification model, the classification model comprising an embedding layer, a convolution layer and a pooling layer; the method comprising:
    acquiring an object to be classified and several predetermined categories;
    taking each of the several predetermined categories as a current category in turn, and performing a similarity computation based on the current category, the similarity computation comprising:
    in the embedding layer, determining an initial representation vector of the object to be classified and a category vector of the current category;
    in the convolution layer, performing multiple convolution operations on the initial representation vector based on several convolution windows of different widths, to obtain multiple convolution results;
    in the pooling layer, computing a similarity between each of the multiple convolution results and the category vector of the current category; determining, based on the computed similarities, an attention weight corresponding to each convolution result; and performing, based on the attention weights corresponding to the convolution results, a weighted average pooling operation on the convolution results, to obtain a pooling result; and
    taking the pooling result as a final representation vector of the object to be classified, and computing a similarity between the final representation vector and the category vector of the current category; and
    after the similarity computation has been performed for each of the several predetermined categories, determining, based on the multiple computed similarities, a target category to which the object to be classified belongs from among the several predetermined categories.
  11. A training apparatus for a classification model, the classification model comprising an embedding layer, a convolution layer and a pooling layer; the apparatus comprising:
    an acquisition unit, configured to acquire a sample with a classification label;
    a determination unit, configured to determine, in the embedding layer, a feature vector of the sample acquired by the acquisition unit and a label vector of the classification label;
    a convolution unit, configured to perform, in the convolution layer, multiple convolution operations on the feature vector of the sample determined by the determination unit based on several convolution windows of different widths, to obtain multiple convolution results;
    a pooling unit, configured to compute, in the pooling layer, a similarity between each of the multiple convolution results obtained by the convolution unit and the label vector of the classification label; determine, based on the computed similarities, an attention weight corresponding to each convolution result; and perform, based on the attention weights corresponding to the convolution results, a weighted average pooling operation on the convolution results, to obtain a pooling result;
    the determination unit being further configured to take the pooling result obtained by the pooling unit as a sample representation vector of the sample, and determine a prediction loss based at least on the sample representation vector and the label vector of the classification label; and
    an adjustment unit, configured to adjust parameters of the classification model based on the prediction loss determined by the determination unit.
  12. The apparatus according to claim 11, wherein each convolution result is a vector containing N elements, and the pooling unit is specifically configured to:
    for any first convolution result among the convolution results, take the attention weight of the first convolution result as the attention weight of each of its N elements; and
    perform, based on the attention weights of the N elements of each convolution result, weighted average pooling on the elements at the same position across the convolution results, position by position, to obtain the pooling result.
  13. The apparatus according to claim 11, wherein the several convolution windows of different widths include a first convolution window, and the convolution unit is specifically configured to:
    for the first convolution window, determine, based on a width of the first convolution window, a number of dimensions of the feature vector selected for the corresponding convolution operation; and
    perform convolution on the feature vector of the sample based on the number of dimensions, parameters of the first convolution window, and a predetermined stride.
  14. The apparatus according to claim 13, wherein the convolution unit is further specifically configured to:
    iteratively perform the following steps until a predetermined number of iterations is reached:
    starting from a current position, select said number of elements from the feature vector of the sample;
    concatenate the selected elements to obtain a current concatenated vector;
    perform a linear transformation on the current concatenated vector based on the parameters of the first convolution window, to obtain a linear transformation result;
    determine one of the multiple convolution results based on the linear transformation result; and
    determine a next position based on the current position and the predetermined stride, and take the next position as the current position.
  15. The apparatus according to claim 14, wherein the convolution unit is further specifically configured to:
    take the linear transformation result as one of the multiple convolution results; or
    apply an activation function to perform a nonlinear transformation on the linear transformation result; and
    take the nonlinear transformation result as one of the multiple convolution results.
  16. The apparatus according to claim 11, wherein the classification label belongs to a predetermined label set, and the pooling unit is specifically configured to:
    for each of the multiple convolution results, compute at least a first dot product between the convolution result and the label vector of the classification label;
    compute at least second dot products between the convolution result and the vectors of the classification labels in the predetermined label set, and sum the second dot products to obtain a first summation result; and
    determine the similarity between the convolution result and the label vector of the classification label based on a ratio of the first dot product to the first summation result.
  17. The apparatus according to claim 11, wherein the determination unit is specifically configured to:
    randomly select, from a predetermined label set containing the classification label, several other classification labels different from the classification label; and
    determine the prediction loss based on the sample representation vector, the label vector of the classification label, and label vectors of the several other classification labels.
  18. The apparatus according to claim 17, wherein the determination unit is further specifically configured to:
    compute a third dot product between the sample representation vector and the label vector of the classification label;
    compute fourth dot products between the sample representation vector and the label vectors of the several other classification labels, and sum the fourth dot products to obtain a second summation result; and
    determine the prediction loss based on the third dot product and the second summation result, such that the prediction loss is inversely correlated with the third dot product and positively correlated with the fourth dot products.
  19. The apparatus according to claim 11, wherein:
    the classification model is a text classification model, the classification label is a text category label, and the sample is a text; or
    the classification model is an image classification model, the classification label is an image category label, and the sample is an image; or
    the classification model is an audio classification model, the classification label is an audio category label, and the sample is audio.
  20. An object classification apparatus, running based on a pre-trained classification model, the classification model comprising an embedding layer, a convolution layer and a pooling layer; the apparatus comprising:
    an acquisition unit, configured to acquire an object to be classified and several predetermined categories;
    a computation unit, configured to take each of the several predetermined categories acquired by the acquisition unit as a current category in turn, and perform a similarity computation based on the current category;
    the computation unit comprising:
    a determination subunit, configured to determine, in the embedding layer, an initial representation vector of the object to be classified and a category vector of the current category;
    a convolution subunit, configured to perform, in the convolution layer, multiple convolution operations on the initial representation vector based on several convolution windows of different widths, to obtain multiple convolution results;
    a pooling subunit, configured to compute, in the pooling layer, a similarity between each of the multiple convolution results and the category vector of the current category; determine, based on the computed similarities, an attention weight corresponding to each convolution result; and perform, based on the attention weights corresponding to the convolution results, a weighted average pooling operation on the convolution results, to obtain a pooling result;
    an acquisition subunit, configured to take the pooling result as a final representation vector of the object to be classified, and compute a similarity between the final representation vector and the category vector of the current category; and
    a determination unit, configured to determine, after the computation unit has performed the similarity computation for each of the several predetermined categories, a target category to which the object to be classified belongs from among the several predetermined categories, based on the multiple computed similarities.
  21. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-9 or the method according to claim 10.
  22. A computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method according to any one of claims 1-9 or the method according to claim 10 is implemented.
PCT/CN2021/086271 2020-04-10 2021-04-09 Training of a classification model, and object classification WO2021204269A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010276683.9A CN111178458B (zh) 2020-04-10 2020-04-10 Training of a classification model, and object classification method and apparatus
CN202010276683.9 2020-04-10

Publications (1)

Publication Number Publication Date
WO2021204269A1 true WO2021204269A1 (zh) 2021-10-14

Family

ID=70653464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086271 WO2021204269A1 (zh) 2020-04-10 2021-04-09 Training of a classification model, and object classification

Country Status (2)

Country Link
CN (1) CN111178458B (zh)
WO (1) WO2021204269A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970716A (zh) * 2022-05-26 2022-08-30 支付宝(杭州)信息技术有限公司 Training method and apparatus for a representation model, readable storage medium, and computing device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178458B (zh) 2020-04-10 2020-08-14 支付宝(杭州)信息技术有限公司 Training of a classification model, and object classification method and apparatus
CN111340605B (zh) 2020-05-22 2020-11-24 支付宝(杭州)信息技术有限公司 Method and apparatus for training a user behavior prediction model and for predicting user behavior
CN111652315B (zh) 2020-06-04 2023-06-02 广州虎牙科技有限公司 Model training and object classification method and apparatus, electronic device, and storage medium
CN111507320A (zh) 2020-07-01 2020-08-07 平安国际智慧城市科技股份有限公司 Method, apparatus, device, and storage medium for detecting violations in restaurant kitchens
CN112132178B (zh) 2020-08-19 2023-10-13 深圳云天励飞技术股份有限公司 Object classification method and apparatus, electronic device, and storage medium
CN112101437B (zh) 2020-09-07 2024-05-31 平安科技(深圳)有限公司 Fine-grained classification model processing method based on image detection, and related devices
CN112214992A (zh) 2020-10-14 2021-01-12 哈尔滨福涛科技有限责任公司 Narrative structure analysis method combining deep learning and rules
CN113159840B (zh) 2021-04-12 2024-06-07 深圳市腾讯信息技术有限公司 Object type prediction method, apparatus, and storage medium
CN113780066B (zh) 2021-07-29 2023-07-25 苏州浪潮智能科技有限公司 Pedestrian re-identification method and apparatus, electronic device, and readable storage medium
CN114360520A (zh) 2022-01-14 2022-04-15 平安科技(深圳)有限公司 Training method, apparatus, device, and storage medium for a speech classification model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984745A (zh) * 2018-07-16 2018-12-11 福州大学 Neural network text classification method fusing multiple knowledge graphs
US20190180147A1 (en) * 2016-06-30 2019-06-13 Konica Minolta Laboratory U.S.A., Inc. Method and system for cell annotation with adaptive incremental learning
CN110209823A (zh) * 2019-06-12 2019-09-06 齐鲁工业大学 Multi-label text classification method and system
CN110263162A (zh) * 2019-06-05 2019-09-20 阿里巴巴集团控股有限公司 Convolutional neural network, text classification method using same, and text classification apparatus
CN111178458A (zh) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Training of a classification model, and object classification method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142995B (zh) * 2014-07-30 2017-09-26 中国科学院自动化研究所 Social event recognition method based on visual attributes
JP2019020893A (ja) * 2017-07-13 2019-02-07 国立研究開発法人情報通信研究機構 Non-factoid question answering device



Also Published As

Publication number Publication date
CN111178458B (zh) 2020-08-14
CN111178458A (zh) 2020-05-19


Legal Events

Date / Code / Description:
- 121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 21785696; country of ref document: EP; kind code of ref document: A1)
- NENP — Non-entry into the national phase (ref country code: DE)
- 122 — Ep: PCT application non-entry in European phase (ref document number: 21785696; country of ref document: EP; kind code of ref document: A1)