WO2021143396A1 - Method and apparatus for classification prediction using a text classification model - Google Patents

Method and apparatus for classification prediction using a text classification model

Info

Publication number
WO2021143396A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
attention
sequence
label
text
Prior art date
Application number
PCT/CN2020/134518
Other languages
English (en)
Chinese (zh)
Inventor
熊涛
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 filed Critical 支付宝(杭州)信息技术有限公司
Publication of WO2021143396A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • One or more embodiments of the present specification relate to the field of machine learning, and in particular, to a method and device for classification prediction using a text classification model.
  • Text classification is a common and typical type of natural language processing task performed by computers, and is widely used in a variety of business scenarios.
  • For example, in an intelligent customer service scenario, the questions raised by users need to be classified as input text, for purposes such as user intention recognition, automatic question answering, or dispatch to manual customer service.
  • the classified categories can correspond to various standard questions organized in advance.
  • the standard question corresponding to the user's casual and colloquial question description can be determined, and then the answer to the question can be determined and pushed to the user.
  • the classified categories can correspond to the manual customer service skill sets that are trained for different knowledge fields.
  • Text classification can also be used in various application scenarios, such as document data classification, public opinion analysis, spam identification, and so on.
  • In these scenarios, the accuracy of text classification is the core issue of concern. Therefore, an improved solution that can further improve the accuracy of text classification is desired.
  • One or more embodiments of this specification describe a method and device for text classification prediction using a text classification model.
  • the text classification model comprehensively considers the semantic information of text fragments of different lengths and the correlation information with the label description text to perform text classification prediction, thereby improving the accuracy and efficiency of classification prediction.
  • According to a first aspect, a method for classification prediction using a text classification model is provided, the method being used to predict the category corresponding to the input text among K predetermined categories;
  • the text classification model includes an embedding layer, a convolutional layer, an attention layer, and a classifier,
  • the attention layer includes a first attention module, and the method includes: obtaining K label vectors corresponding to the K categories, wherein each label vector is obtained by word embedding the label description text of the corresponding category; using the embedding layer to perform word embedding on the input text to obtain a word vector sequence; inputting the word vector sequence into the convolutional layer, where the convolutional layer uses several convolution windows corresponding to several text fragments of different lengths to perform convolution processing on the word vector sequence to obtain several fragment vector sequences, the word vector sequence and the several fragment vector sequences constituting a vector sequence set; separately inputting each vector sequence in the vector sequence set into the first attention module to perform first attention processing, obtaining each first sequence vector corresponding to each vector sequence, wherein the first attention processing includes determining the first weighting factor corresponding to each vector element according to the similarity between each vector element in the input vector sequence and the K label vectors, and using the first weighting factors to perform a weighted summation of the vector elements; obtaining the first attention representation of the input text according to the respective first sequence vectors; determining a characterization vector of the input text at least according to the first attention representation; and inputting the characterization vector into the classifier to obtain category prediction results of the input text in the K categories.
  • the input text is a user question; correspondingly, the label description text corresponding to each of the K categories includes a standard question description text.
  • the K label vectors are predetermined in the following manner: for each of the K categories, a label description text corresponding to the category is obtained; word embedding is performed on the label description text to obtain the word vector of each description word contained in it; and the word vectors of the description words are synthesized to obtain the label vector corresponding to the category.
  • the first weighting factor corresponding to each vector element is specifically determined by the following method: for each vector element in the input vector sequence, calculate the K similarities between the vector element and the K label vectors; based on the maximum value of the K similarities, determine the first weighting factor corresponding to the vector element.
  • calculating the K similarities between the vector element and the K label vectors may include: calculating the cosine similarity between the vector element and each label vector; or determining the similarity based on the Euclidean distance between the vector element and each label vector; or determining the similarity based on the dot product of the vector element and each label vector.
  • determining the first weighting factor corresponding to the vector element specifically includes: determining the mutual attention score of the vector element based on the maximum value of the K similarities; and, according to the mutual attention scores corresponding to all vector elements, normalizing the mutual attention score of the vector element to obtain the first weighting factor corresponding to the vector element.
  • obtaining the first attention representation of the input text according to the respective first sequence vectors may specifically include: synthesizing the respective first sequence vectors to obtain the first attention representation, where the synthesis includes one of the following: summation, weighted summation, and averaging.
  • the attention layer may further include a second attention module; correspondingly, the method further includes inputting each vector sequence in the vector sequence set into the second attention module to perform second attention processing, obtaining each second sequence vector corresponding to each vector sequence; the second attention processing includes, for each vector element in the input vector sequence, determining the second weighting factor corresponding to the vector element according to the similarity between the vector element and each other vector element in the input vector sequence, and using the second weighting factors to weight and sum the vector elements in the input sequence; the second attention representation of the input text is then obtained according to the second sequence vectors.
  • the characterization vector may be determined according to the first attention representation and the second attention representation.
  • the second weighting factor corresponding to the vector element may be determined in the following manner: calculating each similarity between the vector element and the other vector elements; and determining the second weighting factor corresponding to the vector element based on the average value of these similarities.
  • the attention layer further includes a third attention module in which an attention vector is maintained; the method further includes forming a total sequence based at least on the splicing of each vector sequence in the vector sequence set, and using the third attention module to perform third attention processing on the total sequence; the third attention processing includes, for each vector element in the total sequence, determining the third weighting factor corresponding to the vector element according to the similarity between the vector element and the attention vector, and using the third weighting factors to weight and sum the vector elements in the total sequence to obtain the third attention representation of the input text.
  • the characterization vector may be determined according to the first attention representation and the third attention representation.
  • the attention layer includes a first attention module, a second attention module, and a third attention module
  • the characterization vector may be determined based on the first attention representation, the second attention representation, and the third attention representation.
  • the first attention representation, the second attention representation, and the third attention representation may be weighted and summed based on predetermined weight coefficients to obtain the characterization vector.
  • the attention layer further includes a fusion module; before forming the total sequence input to the third attention module, the method further includes: separately inputting each vector sequence in the vector sequence set into the fusion module to perform fusion conversion processing, obtaining each fusion sequence corresponding to each vector sequence, where the fusion conversion processing includes, for each vector element in the input vector sequence, determining the label weight factor corresponding to each label vector according to the similarity between the vector element and each of the K label vectors, and converting the vector element into a fusion vector given by the weighted summation of the K label vectors based on the label weight factors, thereby converting the input vector sequence into the corresponding fusion sequence.
  • each vector sequence and each fusion sequence may be spliced to obtain the total sequence, which is input into the third attention module.
  • the input text is training text
  • the training text corresponds to a category label indicating its true category
  • the method further includes: obtaining a text prediction loss according to the category prediction result and the category label; determining the total prediction loss at least according to the text prediction loss; and updating the text classification model in a direction that reduces the total prediction loss, thereby training the text classification model.
  • the method further includes: inputting the K label vectors corresponding to the K categories into the classifier respectively to obtain the corresponding K prediction results; and comparing each of the K categories with its corresponding prediction result, obtaining the label prediction loss based on the comparison results. In this case, the total loss can be determined according to the text prediction loss and the label prediction loss, so as to perform model training.
  • According to a second aspect, a device for classification prediction using a text classification model is provided, the device being used to predict the category corresponding to the input text among K predetermined categories;
  • the text classification model includes an embedding layer, a convolutional layer, an attention layer, and a classifier,
  • the attention layer includes a first attention module
  • the device includes: a label vector acquisition unit configured to acquire K label vectors corresponding to the K categories, wherein each label vector is obtained by word embedding the label description text of the corresponding category;
  • the word sequence obtaining unit is configured to use the embedding layer to perform word embedding on the input text to obtain a word vector sequence;
  • the fragment sequence obtaining unit is configured to input the word vector sequence into the convolutional layer, where the convolutional layer uses several convolution windows corresponding to several text fragments of different lengths to perform convolution processing on the word vector sequence to obtain several fragment vector sequences;
  • the word vector sequence and the several fragment vector sequences constitute a vector sequence set;
  • the first attention unit is configured to input each vector sequence in the vector sequence set into the first attention module to perform the first attention processing, so as to obtain each first sequence vector corresponding to each vector sequence.
  • According to a third aspect, a computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • According to a fourth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and the processor implements the method of the first aspect when executing the executable code.
  • In this way, the convolutional layer and the attention layer in the text classification model are used to comprehensively consider the text fragments of different lengths and the similarity information with the label vectors to obtain the characterization vector, so that when the characterization vector is used for text classification, more consideration is given to the contextual semantic information of different lengths and to the relevance information of the label description text, so as to obtain more accurate category prediction results.
  • Fig. 1 is a schematic diagram of a text classification model according to an embodiment disclosed in this specification
  • Figure 2 shows a flow chart of a method for text classification using a text classification model according to an embodiment
  • FIG. 3 shows a schematic diagram of performing convolution processing on a word vector sequence in an embodiment
  • FIG. 4 shows a schematic diagram of performing first attention processing on an input vector sequence in an embodiment
  • Figure 5 shows a schematic diagram of performing second attention processing on an input vector sequence in an embodiment
  • Fig. 6 shows a schematic diagram of performing fusion conversion processing on an input vector sequence in an embodiment
  • FIG. 7 shows a schematic diagram of attention processing of the attention layer in an embodiment
  • Figure 8 shows the further method steps involved in the model training stage
  • Fig. 9 shows a schematic block diagram of a text classification prediction device according to an embodiment.
  • In the embodiments of this specification, a new text classification model is proposed, which further improves the classification and prediction effect for text by comprehensively considering the information of text fragments and the information of the label description text.
  • FIG. 1 is a schematic diagram of a text classification model according to an embodiment disclosed in this specification.
  • the text classification model includes an embedding layer 11, a convolutional layer 12, an attention layer 13, and a classifier 14.
  • the embedding layer 11 uses a specific word embedding algorithm to convert each input word into a word vector. Using the embedding layer 11, the label description texts corresponding to the K categories as classification targets can be converted into K label vectors in advance. When classifying and predicting the input text, the embedding layer 11 embeds the input text and converts it into a sequence of word vectors.
  • the convolution layer 12 is used to perform convolution processing on the word vector sequence.
  • the convolutional layer 12 uses multiple convolution kernels, or convolution windows, of different widths to perform convolution processing, thereby obtaining multiple fragment vector sequences, which represent the input text at the level of text fragments of different lengths.
  • the attention layer 13 adopts an attention mechanism and combines label vectors to process the above-mentioned vector sequences.
  • the attention layer 13 may include a first attention module 131 for performing first attention processing on the input vector sequence.
  • the first attention processing includes synthesizing the vector elements according to the similarity between each vector element in the input vector sequence and the aforementioned K label vectors, so as to obtain a sequence vector corresponding to the input vector sequence. Therefore, the first attention processing can also be referred to as label attention processing, and the first attention module can also be referred to as a mutual attention module (with labels).
  • the attention layer 13 may further include a second attention module 132 and/or a third attention module 133.
  • the second attention module 132 may be called an intra-attention module, which is used to synthesize each vector element according to the similarity between each vector element and other vector elements in the input vector sequence.
  • the third attention module 133 may be called a self-attention module, which is used to synthesize each vector element according to the similarity between each vector element in the input vector sequence and the attention vector.
  • the characterization vector of the input text can be obtained and input into the classifier 14.
  • the classifier 14 determines the classification corresponding to the input text based on the characterization vector, and realizes the classification prediction of the text.
  • the text classification model shown in Figure 1 has at least the following characteristics.
  • the text classification model characterizes the input text at the level of text fragments of different lengths, and obtains multiple fragment-level vector sequences, so as to better explore the semantic information of contexts of different lengths.
  • the text classification model in this embodiment also embeds the label description text of each category to obtain label vectors that carry semantic information.
  • the sequence representation of each sequence is synthesized based on the similarity between each element of the word vector sequence and the fragment vector sequences and the label vectors.
  • In this way, the final characterization vector of the input text contains the similarity information between vector sequences of different levels (the word level, and the levels of text fragments of different lengths) and the label vectors, so as to make better use of the context information of the input text and of the semantic similarity with the label description text for classifying the text, thereby improving the classification accuracy.
  • Fig. 2 shows a flowchart of a method for text classification using a text classification model according to an embodiment. It can be understood that the method can be executed by any device, device, platform, or device cluster with computing and processing capabilities. As shown in Figure 2, the text classification process includes at least the following steps.
  • In step 21, K label vectors corresponding to the K categories as classification targets are obtained, where each label vector is obtained by word embedding the label description text of the corresponding category.
  • tags are generally used to represent the K categories.
  • the tags are, for example, the numbers from 1 to K, the id numbers of the categories, or the one-hot codes of the K categories, and so on.
  • the tag itself often does not contain semantic information, but is just a code for the category.
  • each category often has corresponding description information describing the characteristics of the content of the category, and we can use it as the description information for the label, that is, the label description text.
  • the label description text often contains semantic information related to the corresponding category.
  • K categories as classification targets correspond to predetermined K standard questions.
  • the label description text of each category is the standard question description text corresponding to the category.
  • the label description text of category 1 is the standard question 1 "How to repay Huabei" under this category
  • the label description text of category 2 is the standard question 2 "How much money can I borrow" under the category.
  • the classification targets are K categories corresponding to the predetermined K manual customer service skill sets.
  • the label description text of each category may be a description of the corresponding skill group, for example, including the knowledge field of the skill group.
  • the label description text corresponding to each category can also be correspondingly obtained.
  • the label vector corresponding to each category can be obtained.
  • the process of converting the label description text of each category into a label vector may include the following steps.
  • a specific word embedding algorithm is used to embed each descriptive word contained in the label description text to obtain the word vector of each descriptive word.
  • the aforementioned specific word embedding algorithm may be an algorithm in an existing word embedding tool, such as word2vec, or a pre-trained word embedding algorithm for a specific text scene. Assuming that the specific word embedding algorithm used converts each word into an h-dimensional vector, and the label description text contains m words, in this step, m h-dimensional vectors corresponding to the label description text are obtained.
  • the word vectors of each descriptor are synthesized to obtain the label vector l j corresponding to the category Cj.
  • the m h-dimensional vectors obtained in the previous step may be synthesized, and the h-dimensional vector obtained after synthesis may be used as the label vector l j .
  • the above-mentioned synthesis can be averaging, summation, weighted summation, and so on.
  • the embedding layer 11 may convert the label description texts of the K categories into label vectors in advance, and store the obtained K label vectors in the memory for use in classification prediction.
  • In this case, in step 21, the K pre-stored label vectors are simply read.
  • the respective label description texts of the K categories may be input to the embedding layer, and word embedding may be performed to obtain the label vector of each category.
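  • As a concrete illustration of this label-vector construction, the following minimal numpy sketch builds K label vectors by embedding and averaging the words of each label description text; the embedding table `emb`, its dimension h, and the helper names are hypothetical stand-ins, not part of the patent:

```python
import numpy as np

# Hypothetical pre-trained word embedding table (word -> h-dimensional vector);
# in practice this would come from word2vec or a scene-specific embedding.
h = 4
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(h) for w in
       ["how", "to", "repay", "huabei", "much", "money", "can", "i", "borrow"]}

def label_vector(description: str) -> np.ndarray:
    """Embed each descriptive word of a label description text and
    synthesize the word vectors by averaging (summation would also work)."""
    vectors = [emb[w] for w in description.lower().split() if w in emb]
    return np.mean(vectors, axis=0)

# One label vector per category, e.g. K = 2 standard questions.
label_vectors = np.stack([
    label_vector("How to repay Huabei"),
    label_vector("How much money can I borrow"),
])  # shape (K, h)
```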
  • In step 22, using the embedding layer 11, word embedding is performed on the input text to obtain a word vector sequence.
  • the embedding layer 11 adopts the aforementioned specific word embedding algorithm to perform word embedding on each word in the input text, so as to obtain the word vector sequence corresponding to the input text. Assuming that the input text contains N words {w_1, w_2, ..., w_N} arranged in sequence, the word vector sequence X_W = (x_1, x_2, ..., x_N) can be obtained, where x_i is the h-dimensional word vector of the word w_i.
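  • Continuing the sketch above (and reusing its hypothetical `emb` table), the embedding-layer step can be pictured as a simple lookup that turns the words of the input text into an (N, h) word vector sequence:

```python
def embed_text(text: str) -> np.ndarray:
    """Map each word of the input text to its word vector, producing the
    word vector sequence X_W of shape (N, h). Out-of-vocabulary words are
    simply skipped in this toy version."""
    return np.stack([emb[w] for w in text.lower().split() if w in emb])

X_w = embed_text("how can i repay")  # toy user question, shape (4, h)
```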
  • steps 21 and 22 can be executed in parallel or in any order, which is not limited here.
  • In step 23, the above word vector sequence is input to the convolutional layer 12, and several convolution kernels, that is, convolution windows, of different widths are used to perform convolution processing on the word vector sequence.
  • Fig. 3 shows a schematic diagram of performing convolution processing on a sequence of word vectors in an embodiment.
  • a convolution window with a width of 5 (radius 2) is used for convolution processing.
  • At position i, the convolution window covers 5 consecutive word vectors centered on the current word, namely the current word vector x_i together with the two word vectors before and the two after it (x_{i-2}, x_{i-1}, x_i, x_{i+1}, x_{i+2}); a convolution operation on these 5 word vectors yields the fragment vector s_i corresponding to position i.
  • the aforementioned convolution operation may be a combination operation of word vectors defined by an activation function.
  • The window then slides by one position: with the word vector x_{i+1} as the current word at the center, a convolution operation on the 5 word vectors centered on it yields the fragment vector s_{i+1} corresponding to position i+1.
  • When the convolution window has traversed the input text, the fragment vectors corresponding to the N positions are obtained, forming the fragment vector sequence X_S = (s_1, s_2, ..., s_N) corresponding to this convolution window.
  • In practice, the convolutional layer uses several convolution windows with different widths for processing. For example, in a specific example, four convolution windows with widths of 3, 5, 9 and 15 are used to process the word vector sequence X_W separately, obtaining four fragment vector sequences X_S1, X_S2, X_S3, X_S4, which respectively represent the input text at the level of text fragments with lengths of 3, 5, 9 and 15 words.
  • the number of convolution windows used and the width of each convolution window can be determined according to factors such as the length of the input text, the length of the text fragments to be considered, and so on, so that several fragment vector sequences are obtained.
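  • The multi-width convolution can be sketched as follows; real implementations learn convolution kernels, so the mean-plus-tanh combination used here is only a stand-in for "a combination operation of word vectors defined by an activation function":

```python
def fragment_sequences(X_w: np.ndarray, widths=(3, 5)) -> list:
    """Slide convolution windows of several odd widths over the word vector
    sequence X_w (shape (N, h)); each window position yields one fragment
    vector, so every width produces a fragment vector sequence of shape (N, h)."""
    N, _ = X_w.shape
    out = []
    for width in widths:
        r = width // 2                          # window radius
        padded = np.pad(X_w, ((r, r), (0, 0)))  # zero-pad to keep N positions
        seq = np.stack([np.tanh(padded[i:i + width].mean(axis=0))
                        for i in range(N)])
        out.append(seq)
    return out

# Vector sequence set: the word vector sequence plus the fragment sequences.
sequence_set = [X_w] + fragment_sequences(X_w, widths=(3, 5))
```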
  • the above word vector sequence X W and several fragment vector sequences X S can form a vector sequence set.
  • Each vector sequence in the set contains N h-dimensional vector elements and can be uniformly denoted as a vector sequence X.
  • In step 24, each vector sequence X in the above vector sequence set is input to the first attention module in the attention layer, and first attention processing is performed to obtain the first sequence vector corresponding to each vector sequence X.
  • As mentioned earlier, the first attention module is also called the mutual attention module (with labels); correspondingly, the first attention processing can also be called label attention processing, in which the vector elements are synthesized according to their similarity with the label vectors to obtain the corresponding sequence vector.
  • Specifically, the first attention processing may include: for each vector element x_i in the input vector sequence X, determining the first weighting factor corresponding to x_i according to the similarity between x_i and the K label vectors obtained in step 21, and using the first weighting factors to weight and sum the vector elements in the input vector sequence, so as to obtain the first sequence vector V1(X) corresponding to the input vector sequence X.
  • determining the first weighting factor corresponding to the vector element x i can be performed in the following manner.
  • the similarity a_ij between the vector element x_i and the label vector l_j can be calculated as the cosine similarity, as shown in the following formula (1): a_ij = (x_i · l_j) / (‖x_i‖ ‖l_j‖)
  • the similarity a ij between the vector element x i and the label vector l j can also be determined based on the Euclidean distance between the two. The greater the distance, the smaller the similarity.
  • the similarity a_ij can also be directly determined as the dot product (inner product) of the vector element x_i and the label vector l_j, that is, a_ij = x_i · l_j. In more examples, the similarity can also be determined in other ways.
  • After the K similarities between the vector element x_i and the K label vectors are calculated in this way, the maximum value among them can be determined, and the first weighting factor corresponding to x_i can be determined based on this maximum value.
  • It can be understood that, since the K categories differ from one another, the corresponding K label vectors are usually far away from each other in the vector space.
  • If the vector element x_i has a high similarity with any label vector l_j, it means that the word or text fragment corresponding to this vector element may be strongly related to the corresponding category j; therefore, the vector element x_i should be given more attention, that is, a higher weight. Hence, in the above steps, the first weighting factor of the vector element is determined according to the maximum value of the similarities.
  • In one example, the maximum value of the K similarities is directly used as the first weighting factor corresponding to the vector element x_i.
  • In another example, the maximum value of the K similarities corresponding to the vector element x_i is determined as the mutual attention score a_i of x_i; similarly, the mutual attention scores corresponding to all the vector elements in the input vector sequence are obtained. Then, according to the mutual attention scores of all vector elements, the mutual attention score a_i of the vector element x_i is normalized to obtain the first weighting factor α_i corresponding to that vector element.
  • the above normalization processing is realized by the softmax function, as shown in the following formula (2): α_i = exp(a_i) / Σ_{k=1..N} exp(a_k)
  • After obtaining the first weighting factor of each vector element, the first attention module can weight and sum the vector elements based on the first weighting factors to obtain the first sequence vector V1(X) of the input vector sequence X, namely: V1(X) = Σ_{i=1..N} α_i · x_i
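  • Putting formulas (1) and (2) together, the first attention processing of one vector sequence can be sketched as below (cosine similarity, max-pooling over the K label similarities, softmax normalization, weighted sum); the helper names are illustrative, and the last line previews the synthesis of step 25 by averaging:

```python
def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def label_attention(X: np.ndarray, L: np.ndarray) -> np.ndarray:
    """First (mutual/label) attention: returns V1(X) for one vector sequence
    X of shape (N, h), given the K label vectors L of shape (K, h)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Ln = L / np.linalg.norm(L, axis=1, keepdims=True)
    sim = Xn @ Ln.T              # (N, K) label attention matrix, formula (1)
    scores = sim.max(axis=1)     # mutual attention score: max over K similarities
    alpha = softmax(scores)      # first weighting factors, formula (2)
    return alpha @ X             # V1(X) = sum_i alpha_i * x_i

S_label = np.mean([label_attention(X, label_vectors) for X in sequence_set], axis=0)
```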
  • Fig. 4 shows a schematic diagram of performing first attention processing on an input vector sequence in an embodiment.
  • As shown, the N vector elements of the input vector sequence and the K label vectors are compared pairwise, forming an N*K similarity matrix, which is called the label attention matrix.
  • A maximum pooling operation is performed on the label attention matrix, that is, for each vector element, the maximum value among the K similarities corresponding to it is selected to obtain the mutual attention score of that vector element; the weighting factor of each vector element is then obtained based on its mutual attention score, and the vector elements are weighted and summed based on the weighting factors to obtain the first sequence vector representation V1 of the input vector sequence.
  • In this way, the corresponding first sequence vector can be obtained for each vector sequence: the word vector sequence X_W yields the first sequence vector V1(X_W), and the several fragment vector sequences X_S yield the corresponding first sequence vectors V1(X_S).
  • In step 25, the first attention representation S_label of the input text is obtained according to the first sequence vectors corresponding to the above vector sequences.
  • Specifically, the first sequence vectors, including V1(X_W) and the several V1(X_S), can be synthesized to obtain the first attention representation S_label; the synthesis method can be summation, weighted summation, averaging, and so on.
  • Next, in step 26, the characterization vector S of the input text is determined at least according to the above first attention representation S_label.
  • the first attention representation can be used as the characterization vector S.
  • In step 27, the characterization vector S is input to the classifier 14, and through the operation of the classifier, the category prediction results of the input text in the K categories are obtained.
  • As mentioned above, the attention layer 13 may further include a second attention module 132 and/or a third attention module 133.
  • the processing procedures of the second attention module and the third attention module are described below.
  • The second attention module 132, also known as the intra-attention module, synthesizes the vector elements according to the similarity between each vector element in the input vector sequence and the other vector elements.
  • When a vector sequence X is input to the second attention module 132, the module 132 performs second attention processing, also called internal attention processing, on the input vector sequence X.
  • The internal attention processing specifically includes: for each vector element x_i in the input vector sequence X, determining the second weighting factor corresponding to x_i according to the similarity between the vector element and each other vector element x_j in the input vector sequence X, and using the second weighting factors to weight and sum the vector elements in the input sequence, so as to obtain the second sequence vector V2(X) corresponding to the input vector sequence X.
  • determining the second weighting factor corresponding to the vector element x i can be performed in the following manner.
  • First, each similarity a_ij between the vector element x_i and each other vector element x_j is calculated.
  • the calculation of the similarity can adopt the cosine similarity, or it can be determined based on other methods such as the vector distance, the vector dot multiplication result, etc., which will not be repeated here.
  • In one example, the mean value of the aforementioned similarities is directly used as the second weighting factor corresponding to the vector element x_i.
  • In another example, the mean of the similarities corresponding to the vector element x_i is determined as the internal attention score of x_i; then, based on the internal attention scores of all vector elements, normalization processing, for example by the softmax function, is performed to obtain the second weighting factor corresponding to x_i.
  • After obtaining the second weighting factor of each vector element, the second attention module can weight and sum the vector elements based on the second weighting factors to obtain the second sequence vector V2(X) of the input vector sequence X, namely: V2(X) = Σ_{i=1..N} β_i · x_i, where β_i is the second weighting factor of x_i.
  • Fig. 5 shows a schematic diagram of performing second attention processing on an input vector sequence in an embodiment.
  • As shown, the N vector elements in the input vector sequence are arranged as rows and columns respectively, and the similarity between every two vector elements x_i and x_j is calculated, so that an N*N similarity matrix is formed, which is called the internal attention matrix.
  • An average pooling operation is performed on the internal attention matrix, that is, the average of the similarity values corresponding to each vector element is calculated to obtain the internal attention score of that vector element; the weighting factor of each vector element is then obtained based on its internal attention score, and the vector elements are weighted and summed based on the weighting factors to obtain the second sequence vector representation V2 of the input vector sequence.
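  • A corresponding minimal sketch of the internal attention processing, reusing the `softmax` helper from the sketch above:

```python
def internal_attention(X: np.ndarray) -> np.ndarray:
    """Second (internal) attention: score each vector element by the average
    of its similarities to the other elements, normalize, and weight-sum."""
    N = X.shape[0]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T                     # (N, N) internal attention matrix
    np.fill_diagonal(sim, 0.0)          # leave out each element's self-similarity
    scores = sim.sum(axis=1) / (N - 1)  # average pooling over the other elements
    beta = softmax(scores)              # second weighting factors
    return beta @ X                     # V2(X) = sum_i beta_i * x_i
```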
  • Each vector sequence X in the aforementioned vector sequence set can be input to the second attention module 132 for the above internal attention processing, so as to obtain the respectively corresponding second sequence vectors V2(X), including V2(X_W) corresponding to the word vector sequence X_W and a number of second sequence vectors V2(X_S) corresponding to the fragment vector sequences X_S.
  • the second sequence vectors V2(X) corresponding to the above vector sequences can be synthesized to obtain the second attention representation S intra of the input text.
  • In this case, the step 26 of determining the characterization vector S in FIG. 2 may include: determining the characterization vector S based on the first attention representation S_label and the second attention representation S_intra.
  • the first attention representation S label and the second attention representation S intra can be synthesized through a variety of methods, such as summation, weighted summation, averaging, etc., to obtain the characterization vector S.
  • the attention layer 13 may further include a third attention module 133.
  • the third attention module 133 may be referred to as a self-attention module, which is used to perform self-attention processing, that is, to synthesize each vector element according to the similarity between each vector element in the input vector sequence and the attention vector.
  • the self-attention module 133 maintains an attention vector v, which has the same dimension as the vector obtained by word embedding, and both are h-dimensional.
  • the parameters contained in the attention vector v can be determined through training.
  • Unlike the first two attention modules, the third attention module 133 operates on a total sequence X′ formed based on each vector sequence in the vector sequence set.
  • The third attention module 133 performs third attention processing, that is, self-attention processing, on the total sequence X′, which specifically includes: for each vector element x_i in the total sequence X′, determining the third weighting factor corresponding to the vector element according to the similarity between x_i and the attention vector v, and using the third weighting factors to weight and sum the vector elements in the total sequence to obtain the third attention representation of the input text.
  • determining the third weighting factor corresponding to the vector element x i can be performed in the following manner.
  • First, the similarity between the vector element x_i and the attention vector v is calculated as the self-attention score of x_i; the calculation of the similarity can adopt the cosine similarity, or be determined based on other methods such as the vector distance or the vector dot product, which will not be repeated here. Based on the self-attention score, the third weighting factor corresponding to the vector element x_i is then determined.
  • the above self-attention score is directly used as the third weighting factor corresponding to the vector element x i
  • In another example, the self-attention scores of the vector elements are normalized, and the normalized value is used as the third weighting factor corresponding to the vector element x_i.
  • In a specific example, the similarity between the vector element x_i and the attention vector v is calculated as the vector dot product, and the normalization is performed by the softmax function, so that the following third weighting factor γ_i can be obtained: γ_i = exp(v^T · x_i) / Σ_{k=1..M} exp(v^T · x_k)
  • where v^T is the transpose of the attention vector v, and M is the number of vector elements contained in the total sequence X′.
  • After obtaining the third weighting factor of each vector element, the third attention module can weight and sum the vector elements based on the third weighting factors. Since the total sequence already contains the information of each vector sequence, the result of processing the total sequence can be directly used as the third attention representation S_self of the input text, namely: S_self = Σ_{i=1..M} γ_i · x_i
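  • A minimal sketch of this self-attention step, reusing `softmax` from the earlier sketch, with the trainable attention vector `v` stubbed in as a random h-dimensional vector (in the model it is learned during training):

```python
def self_attention(X_total: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Third (self) attention over the total sequence X' of shape (M, h):
    gamma_i = exp(v.T x_i) / sum_k exp(v.T x_k); S_self = sum_i gamma_i x_i."""
    gamma = softmax(X_total @ v)   # third weighting factors
    return gamma @ X_total

v = rng.standard_normal(h)         # stand-in for the trained attention vector
```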
  • In the above description, the third attention module 133 performs self-attention processing on the total sequence X′ formed by splicing the vector sequences together, so as to obtain the third attention representation.
  • In one embodiment, before the splicing, each vector sequence can also be fused and transformed to obtain a corresponding fusion sequence, and the fusion sequences and the vector sequences can then be spliced together to form a more comprehensive total sequence X′.
  • the attention layer 13 further includes a fusion module, which is used to perform fusion conversion processing on the input vector sequence X and convert it into a corresponding fusion sequence Q.
  • The fusion conversion processing may specifically include: for each vector element x_i in the input vector sequence X, determining the label weight factor corresponding to each label vector l_j according to the similarity between x_i and each label vector l_j in the aforementioned K label vectors, and, based on the label weight factors, converting the vector element x_i into the fusion vector q_i given by the weighted summation of the K label vectors, thereby converting the input vector sequence X into the corresponding fusion sequence Q.
  • the process of correspondingly transforming the vector element x i into the fusion vector q i can be performed in the following manner.
  • First, the similarity a_ij between the vector element x_i and each label vector l_j is calculated; the similarity calculation can be realized, for example, by formula (1), or determined based on the vector distance, the dot product, and so on, which will not be repeated.
  • Then, the label weight factor η_j corresponding to each label vector l_j is determined based on these similarities.
  • In one example, the similarity a_ij is directly used as the label weight factor η_j corresponding to the label vector l_j.
  • In another example, the similarities a_ij are normalized, and the normalized values are used as the label weight factors η_j corresponding to the label vectors l_j.
  • For example, the label weight factor can be determined by the following softmax formula: η_j = exp(a_ij) / Σ_{k=1..K} exp(a_ik)
  • Then, the label vectors can be weighted and summed based on the label weight factors, thereby converting the vector element x_i into the fusion vector q_i: q_i = Σ_{j=1..K} η_j · l_j
  • Fig. 6 shows a schematic diagram of performing fusion conversion processing on an input vector sequence in an embodiment.
  • As shown, with the N vector elements in the input vector sequence X as columns and the K label vectors as rows, the similarity between each vector element x_i and each label vector l_j is calculated, which can form an N*K similarity matrix.
  • For each vector element x_i, based on the similarities corresponding to that vector element in the similarity matrix, the label weight factor corresponding to each label vector is determined, and the label vectors are weighted and summed based on the label weight factors to obtain the fusion vector q_i corresponding to the vector element x_i.
  • By converting each vector element x_i in the input vector sequence X into the corresponding fusion vector q_i in this way, the vector sequence X is converted into the fusion sequence Q.
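  • The fusion conversion can be sketched as follows, using the dot product as the similarity and softmax as the normalization per the formulas above; `fuse_with_labels` is an illustrative name:

```python
def fuse_with_labels(X: np.ndarray, L: np.ndarray) -> np.ndarray:
    """Fusion conversion: replace each vector element x_i by the weighted sum
    of the K label vectors, with label weight factors obtained by softmax
    normalization of the x_i-to-label similarities."""
    sim = X @ L.T                                    # (N, K) similarities a_ij
    weights = np.apply_along_axis(softmax, 1, sim)   # row-wise label weight factors
    return weights @ L                               # fusion sequence Q, shape (N, h)
```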
  • By performing the above fusion conversion processing on each vector sequence in the vector sequence set, each corresponding fusion sequence can be obtained, for example, the fusion sequence Q_W corresponding to the word vector sequence X_W and the fusion sequences Q_S corresponding to the fragment vector sequences X_S.
  • Then, each original vector sequence (X_W, X_S1, X_S2, ...) and each fusion sequence (Q_W, Q_S1, Q_S2, ...) obtained as above can be spliced to obtain the total sequence X′.
  • the third attention module 133 is used to process the total sequence X′ to obtain the third attention representation S self .
  • In this case, the step 26 of determining the characterization vector S in FIG. 2 may include: determining the characterization vector S based on the first attention representation S_label and the third attention representation S_self.
  • the first attention representation S label and the third attention representation S self can be synthesized in a variety of ways to obtain the characterization vector S.
  • In that case, the step 26 of determining the characterization vector S in FIG. 2 may include: determining the characterization vector S based on the first attention representation S_label, the second attention representation S_intra, and the third attention representation S_self.
  • Specifically, the first attention representation, the second attention representation, and the third attention representation can be weighted and summed based on predetermined weight coefficients to obtain the characterization vector S, as shown in the following formula: S = λ_1 · S_label + λ_2 · S_intra + λ_3 · S_self
  • ⁇ 1 , ⁇ 2 , and ⁇ 3 are weight coefficients, which may be pre-set hyperparameters.
  • FIG. 7 shows a schematic diagram of attention processing of the attention layer in an embodiment.
  • the schematic diagram shows the input and output of each attention module when the attention layer contains the first, second and third attention modules.
  • the input of the first attention module includes a vector sequence set consisting of a word vector sequence X W and a segment vector sequence X S , and K label vectors.
  • the first attention module obtains the first sequence vector of the vector sequence according to the similarity between the vector elements and the K label vectors. By synthesizing each first sequence vector, the first attention representation S label of the input text can be obtained.
  • the input of the second attention module includes the aforementioned vector sequence set. For each vector sequence X in the set, the second attention module obtains the second sequence vector of the vector sequence according to the similarity between the various vector elements. By synthesizing each second sequence vector, the second attention representation S intra of the input text can be obtained.
  • the input of the fusion module includes the aforementioned vector sequence set and K label vectors.
  • The fusion module converts each vector sequence X in the vector sequence set into a fusion sequence Q through the fusion conversion processing, and thus outputs each fusion sequence corresponding to each vector sequence in the vector sequence set.
  • The input of the third attention module is the total sequence formed by splicing each vector sequence in the aforementioned vector sequence set and each fusion sequence.
  • the third attention module performs self-attention processing on the total sequence, and obtains the third attention representation S self of the input text.
  • the final characterization vector of the input text can be synthesized based on the output of the first, second and third attention modules.
  • In different embodiments, on the basis that the attention layer includes the first attention module, the attention layer may also include the second attention module and/or the third attention module.
  • the process of classifying and predicting the input text is not only applicable to the training phase of the text classification model, but also applicable to the use phase after the model training is completed.
  • the input text input to the model is training text
  • the training text corresponds to a category label y indicating its true category.
  • the model needs to be trained based on the foregoing category prediction result.
  • the training process is shown in FIG. 8.
  • FIG. 8 shows the method steps further included in the model training stage.
  • In step 81, the text prediction loss L_text is obtained according to the category prediction result y′ for the training text and the category label y of the training text.
  • It can be understood that the category prediction result y′ is obtained by the classifier 14 applying a predetermined classification function to the characterization vector S of the input text; therefore, the category prediction result can be expressed as: y′ = f_c(S), where f_c is the classification function.
  • Generally, the category prediction result y′ includes the predicted probabilities that the current training text belongs to each of the predetermined K categories. Therefore, the text prediction loss L_text can be obtained from the probability distribution indicated by the category prediction result y′ and the true classification indicated by the category label y, through a loss function in the form of cross entropy. In other embodiments, other known loss function forms can also be used to obtain the text prediction loss L_text.
  • the total prediction loss L is determined based on at least the aforementioned text prediction loss L text.
  • In a basic embodiment, the text prediction loss is directly determined as the total prediction loss L.
  • In step 83, the text classification model is updated in the direction that reduces the total prediction loss L.
  • gradient descent, back propagation and other methods can be used to adjust the model parameters in the text classification model, so that the total prediction loss L is reduced until a predetermined convergence condition is reached, thereby realizing the training of the model.
  • Specifically, the K label vectors l_j (j from 1 to K) corresponding to the K categories can be input to the classifier 14 respectively, so that the classifier 14 performs classification prediction based on each input label vector, obtaining the corresponding K label prediction results,
  • where the label prediction result y″_j corresponding to the label vector l_j can be expressed as: y″_j = f_c(l_j)
  • the K categories and their corresponding label prediction results are respectively compared, and the label prediction loss L label is obtained based on the comparison results.
  • Specifically, for each of the K categories, a cross-entropy loss function can be used to obtain the label prediction loss under that category, and the label prediction losses of the categories are then summed to obtain the total label prediction loss L_label.
  • the step 82 of determining the total loss in FIG. 8 may include determining the total loss L according to the text prediction loss L text and the label prediction loss L label .
  • For example, the total loss L may be determined as: L = L_text + λ · L_label,
  • where λ is a hyperparameter.
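  • A sketch of this training objective, assuming a simple linear-softmax classifier f_c(s) = softmax(W·s); the weight matrix `W`, the coefficient name `lam`, and the helper functions are illustrative, not the patent's own notation:

```python
def total_loss(S: np.ndarray, y: int, L: np.ndarray,
               W: np.ndarray, lam: float = 0.5) -> float:
    """Total prediction loss L = L_text + lam * L_label, both cross-entropy.
    S: characterization vector of the training text; y: its true category;
    L: the K label vectors, shape (K, h); W: classifier weights, shape (K, h)."""
    def xent(s, target):
        p = softmax(W @ s)                  # predicted category distribution
        return -np.log(p[target] + 1e-12)   # cross-entropy against true class

    l_text = xent(S, y)                     # text prediction loss
    # each label vector l_j should be classified into its own category j
    l_label = sum(xent(l_j, j) for j, l_j in enumerate(L))
    return l_text + lam * l_label
```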
  • By introducing the label prediction loss in this way, the classifier can be trained in a more targeted manner, achieving a better training effect.
  • the text classification model can be used to classify and predict the input text of unknown category.
  • Since the classification prediction model combines the semantic information at the level of text fragments of different lengths and the semantic information of the label description text, the classification prediction of the text can be realized with higher accuracy.
  • a device for classification prediction using a text classification model is provided.
  • the device is used to predict the category corresponding to the input text in the predetermined K categories.
  • the text classification model used includes an embedding layer, a convolutional layer, an attention layer, and a classifier, where the attention layer includes at least the first attention module, as shown in Figure 1.
  • the above classification prediction device can be deployed in any device, platform or device cluster with computing and processing capabilities.
  • Fig. 9 shows a schematic block diagram of a text classification prediction device according to an embodiment. As shown in FIG. 9, the prediction device 900 includes the following units.
  • the label vector obtaining unit 901 is configured to obtain K label vectors respectively corresponding to the K categories, where each label vector is obtained by word embedding the label description text of the corresponding category;
  • the word sequence obtaining unit 902 is configured to use the embedding layer to perform word embedding on the input text to obtain a word vector sequence;
  • the segment sequence acquiring unit 903 is configured to input the word vector sequence into the convolutional layer, where the convolutional layer uses a number of convolution windows corresponding to a number of text segments of different lengths to perform convolution processing on the word vector sequence to obtain several fragment vector sequences; the word vector sequence and the several fragment vector sequences constitute a vector sequence set;
  • the first attention unit 904 is configured to input each vector sequence in the vector sequence set into the first attention module to perform first attention processing to obtain each first sequence vector corresponding to each vector sequence, wherein the first attention processing includes determining the first weighting factor corresponding to each vector element according to the similarity between each vector element in the input vector sequence and the K label vectors, and using the first weighting factors to perform a weighted summation of the vector elements;
  • the first representation obtaining unit 905 is configured to obtain the first attention representation of the input text according to the respective first sequence vectors;
  • the characterization vector determining unit 906 is configured to determine the characterization vector of the input text at least according to the first attention representation;
  • the prediction result obtaining unit 907 is configured to input the characterization vector into the classifier to obtain category prediction results of the input text in the K categories.
  • the input text is a user question; correspondingly, the label description text corresponding to each of the K categories includes a standard question description text.
  • the label vector obtaining unit 901 is configured to predetermine the K label vectors in the following manner: for each of the K categories, obtain the label description text corresponding to the category; perform word embedding on the label description text to obtain the word vector of each description word contained in it; and synthesize the word vectors of the description words to obtain the label vector corresponding to the category.
  • the first weighting factor corresponding to each vector element is determined by the following method: for each vector element in the input vector sequence, calculate the K similarities between the vector element and the K label vectors; based on the maximum value of the K similarities, determine the first weighting factor corresponding to the vector element.
  • the K similarities between the vector element and the K label vectors can be calculated by: calculating the cosine similarity between the vector element and each label vector; or determining the similarity based on the Euclidean distance between the vector element and each label vector; or determining the similarity based on the dot product of the vector element and each label vector.
  • determining the first weighting factor corresponding to the vector element based on the maximum value of the K similarities may include: determining the mutual attention score of the vector element based on the maximum value of the K similarities; and, according to the mutual attention scores corresponding to all vector elements, normalizing the mutual attention score of the vector element to obtain the first weighting factor corresponding to the vector element.
  • obtaining the first attention representation of the input text according to the respective first sequence vectors includes, according to an embodiment, synthesizing the respective first sequence vectors to obtain the first attention representation, where the synthesis includes one of the following: summation, weighted summation, and averaging.
  • the attention layer of the text classification model further includes a second attention module.
  • In this case, the device 900 further includes (not shown in the figure) a second attention unit and a second representation acquisition unit, wherein: the second attention unit is configured to separately input each vector sequence in the vector sequence set into the second attention module to perform second attention processing to obtain each second sequence vector corresponding to each vector sequence, wherein the second attention processing includes, for each vector element in the input vector sequence, determining the second weighting factor corresponding to the vector element according to the similarity between the vector element and each other vector element in the input vector sequence, and using the second weighting factor to weight and sum each vector element in the input sequence; and the second representation acquisition unit is configured to obtain the second attention representation of the input text according to the respective second sequence vectors.
  • the characterization vector determining unit 906 in FIG. 9 is configured to determine the characterization vector according to the first attention expression and the second attention expression.
  • the second weighting factor corresponding to the vector element can be determined in the following manner: calculating the respective similarities between the vector element and the other vector elements; and determining the second weighting factor corresponding to the vector element based on the average of these similarities.
  • the attention layer further includes a third attention module in which attention vectors are maintained.
  • the device 900 further includes (not shown in the figure) a total sequence forming unit and a third attention unit, wherein,
  • the total sequence forming unit is configured to form a total sequence based at least on the splicing of each vector sequence in the vector sequence set; the third attention unit is configured to use the third attention module to perform third attention processing on the total sequence, where the third attention processing includes, for each vector element in the total sequence, determining the third weighting factor corresponding to the vector element according to the similarity between the vector element and the attention vector, and using the third weighting factor to weight and sum each vector element in the total sequence to obtain the third attention representation of the input text.
  • In this case, the aforementioned characterization vector determining unit 906 is configured to determine the characterization vector according to the first attention representation and the third attention representation.
  • the aforementioned characterization vector determining unit 906 is configured to determine the characterization vector according to the first attention representation, the second attention representation, and the third attention representation.
  • the characterization vector determining unit 906 may perform a weighted summation of the first attention representation, the second attention representation, and the third attention representation based on predetermined weight coefficients to obtain the characterization vector.
  • the attention layer further includes a fusion module.
  • the device 900 further includes a fusion unit (not shown) configured to input each vector sequence in the vector sequence set into the fusion module for fusion conversion processing, obtaining a fusion sequence corresponding to each vector sequence.
  • the fusion conversion processing includes, for each vector element in the input vector sequence, determining a label weighting factor for each label vector according to the similarity between the vector element and each of the K label vectors, and, based on the label weighting factors, converting the vector element into a fusion vector formed as the weighted sum of the K label vectors, thereby converting the input vector sequence into a corresponding fusion sequence.
  • the total sequence forming unit may be configured to splice the respective vector sequences and the respective fusion sequences to obtain the total sequence (see Sketch 4 following this list).
  • the input text may be a training text, which has a category label indicating its true category.
  • the device 900 further includes a training unit (not shown) configured to obtain a text prediction loss according to the category prediction result and the category label, determine a total prediction loss at least according to the text prediction loss, and update the text classification model in the direction that reduces the total prediction loss.
  • the training unit is further configured to: input the K label vectors corresponding to the K categories into the classifier to obtain K corresponding prediction results; compare each of the K categories with its corresponding prediction result, obtaining a label prediction loss based on the comparison results; and determine the total prediction loss according to the text prediction loss and the label prediction loss (see Sketch 5 following this list).
  • in this way, the text classification model achieves accurate classification of the input text.
  • a computer-readable storage medium having a computer program stored thereon which, when executed in a computer, causes the computer to perform the method described in conjunction with FIG. 2.
  • a computing device including a memory and a processor, the memory storing executable code, and the processor, when executing the executable code, implementing the method described in conjunction with FIG. 2.
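
Sketch 1 below is a minimal NumPy illustration (not part of the original disclosure) of the first attention processing and the synthesis step referenced in the list above. The dot-product similarity, the max reduction over the K label similarities, and the softmax normalization of the weighting factors are assumptions; the disclosure leaves these choices open.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def first_attention(seq, label_vecs):
    """First attention processing on one vector sequence.
    seq: (T, d) vector elements; label_vecs: (K, d) label vectors."""
    sim = seq @ label_vecs.T        # (T, K) element-to-label similarities
    scores = sim.max(axis=1)        # one score per element (max over labels is assumed)
    weights = softmax(scores)       # first weighting factors over positions
    return weights @ seq            # weighted sum -> first sequence vector (d,)

def first_attention_representation(sequences, label_vecs, mode="average"):
    """Synthesize the per-sequence first sequence vectors into one representation."""
    vecs = np.stack([first_attention(s, label_vecs) for s in sequences])
    if mode == "summation":
        return vecs.sum(axis=0)
    if mode == "weighted":
        coeffs = np.linspace(1.0, 2.0, len(vecs))   # illustrative coefficients
        return (coeffs / coeffs.sum()) @ vecs
    return vecs.mean(axis=0)        # averaging
```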
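Sketch 2 illustrates the second attention processing: each element's weight derives from the average of its similarities to the other elements of the same sequence, as stated above. The softmax normalization of those averages is again an assumption.

```python
import numpy as np

def second_attention(seq):
    """Second attention processing on one (T, d) vector sequence."""
    T = seq.shape[0]
    sim = seq @ seq.T                          # (T, T) pairwise similarities
    np.fill_diagonal(sim, 0.0)                 # exclude each element's self-similarity
    avg_sim = sim.sum(axis=1) / max(T - 1, 1)  # average similarity to the others
    e = np.exp(avg_sim - avg_sim.max())
    weights = e / e.sum()                      # second weighting factors (softmax assumed)
    return weights @ seq                       # second sequence vector (d,)
```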
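Sketch 3 covers the third attention processing, in which the attention vector maintained in the third attention module scores every element of the spliced total sequence; the softmax normalization is an assumption.

```python
import numpy as np

def third_attention(sequences, attention_vec):
    """sequences: list of (T_i, d) arrays; attention_vec: (d,) vector
    maintained inside the third attention module (trained with the model)."""
    total_seq = np.concatenate(sequences, axis=0)  # splice into the total sequence
    scores = total_seq @ attention_vec             # similarity to the attention vector
    e = np.exp(scores - scores.max())
    weights = e / e.sum()                          # third weighting factors
    return weights @ total_seq                     # third attention representation (d,)
```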
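Sketch 4 shows the fusion conversion processing and the extended total sequence: every vector element is replaced by a weighted sum of the K label vectors, and the fusion sequences are spliced together with the original sequences. The softmax over label similarities is an assumed choice of label weighting factor.

```python
import numpy as np

def fuse_sequence(seq, label_vecs):
    """Fusion conversion of one (T, d) sequence against (K, d) label vectors."""
    sim = seq @ label_vecs.T                        # (T, K) element-to-label similarities
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)      # label weighting factors per element
    return weights @ label_vecs                     # (T, d) fusion sequence

def form_total_sequence(sequences, label_vecs):
    """Splice the original vector sequences and their fusion sequences."""
    fused = [fuse_sequence(s, label_vecs) for s in sequences]
    return np.concatenate(list(sequences) + fused, axis=0)
```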
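Sketch 5 assembles the training objective described above: a text prediction loss on the training text plus a label prediction loss that pushes each label vector, fed through the same classifier, toward its own category. Cross-entropy and the unweighted sum of the two losses are assumptions.

```python
import numpy as np

def cross_entropy(probs, true_idx):
    return -np.log(probs[true_idx] + 1e-12)

def total_prediction_loss(classify, repr_vec, true_category, label_vecs):
    """classify: function mapping a (d,) vector to K class probabilities."""
    text_loss = cross_entropy(classify(repr_vec), true_category)
    label_loss = sum(cross_entropy(classify(label_vecs[k]), k)
                     for k in range(len(label_vecs)))
    return text_loss + label_loss   # unweighted sum is an assumption

# toy usage with a fixed softmax classifier (K=4 categories, d=8)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))

def classify(v):
    z = W @ v
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

print(total_prediction_loss(classify, rng.normal(size=8), 2, rng.normal(size=(4, 8))))
```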

Abstract

Disclosed are a method and an apparatus for performing classification prediction using a text classification model. The text classification model includes an embedding layer, a convolution layer, an attention layer, and a classifier. The method for performing classification prediction includes: performing word embedding in advance on the label description texts corresponding to K categories to obtain K label vectors; during prediction, performing word embedding on an input text using the embedding layer to obtain a word vector sequence; at the convolution layer, performing convolution processing on the word vector sequence using convolution windows of different widths to obtain fragment vector sequences; then, at the attention layer, performing first attention processing on each vector sequence, the first attention processing including determining a weighting factor for each vector element in the sequence according to the similarity between that element and the K label vectors, and then performing a weighted summation to obtain a first sequence vector; and obtaining a characterization vector of the input text based on the first sequence vectors of the respective sequences, the classifier obtaining a category prediction result for the input text based on the characterization vector.
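For concreteness, the following NumPy sketch (not part of the original disclosure) illustrates the convolution step summarized in the abstract; the mean filter and the window widths (2, 3, 4) are illustrative stand-ins for the learned convolution kernels.

```python
import numpy as np

def conv_fragments(word_vecs, width):
    """One convolution window width: each span of `width` consecutive word
    vectors yields a fragment vector (mean filter stands in for the kernel)."""
    T = word_vecs.shape[0]
    return np.stack([word_vecs[i:i + width].mean(axis=0)
                     for i in range(T - width + 1)])

word_vecs = np.random.randn(10, 8)   # toy word vector sequence from the embedding layer
fragment_sequences = [conv_fragments(word_vecs, w) for w in (2, 3, 4)]
print([s.shape for s in fragment_sequences])   # [(9, 8), (8, 8), (7, 8)]
```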
PCT/CN2020/134518 2020-01-16 2020-12-08 Method and apparatus for performing classification prediction using a text classification model WO2021143396A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010049397.9 2020-01-16
CN202010049397.9A CN111291183B (zh) 2020-01-16 2020-01-16 Method and apparatus for classification prediction using a text classification model

Publications (1)

Publication Number Publication Date
WO2021143396A1 (fr)

Family

ID=71025468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134518 WO2021143396A1 (fr) 2020-01-16 2020-12-08 Method and apparatus for performing classification prediction using a text classification model

Country Status (2)

Country Link
CN (1) CN111291183B (fr)
WO (1) WO2021143396A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291183B (zh) * 2020-01-16 2021-08-03 支付宝(杭州)信息技术有限公司 Method and apparatus for classification prediction using a text classification model
CN111340605B (zh) * 2020-05-22 2020-11-24 支付宝(杭州)信息技术有限公司 Method and apparatus for training a user behavior prediction model and predicting user behavior
CN112395419B (zh) * 2021-01-18 2021-04-23 北京金山数字娱乐科技有限公司 Training method and apparatus for a text classification model, and text classification method and apparatus
CN113838468A (zh) * 2021-09-24 2021-12-24 中移(杭州)信息技术有限公司 Streaming speech recognition method, terminal device, and medium
CN113806545B (zh) * 2021-09-24 2022-06-17 重庆理工大学 Sentiment classification method for review texts based on label description generation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710800B (zh) * 2018-11-08 2021-05-25 北京奇艺世纪科技有限公司 Model generation method, video classification method, apparatus, terminal, and storage medium
CN111428520B (zh) * 2018-11-30 2021-11-23 腾讯科技(深圳)有限公司 Text translation method and apparatus
CN110134789B (zh) * 2019-05-17 2021-05-25 电子科技大学 Multi-label long text classification method introducing a multi-way selection fusion mechanism
CN110442707B (zh) * 2019-06-21 2022-06-17 电子科技大学 Multi-label text classification method based on seq2seq
CN110362684B (zh) * 2019-06-27 2022-10-25 腾讯科技(深圳)有限公司 Text classification method, apparatus, and computer device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090248394A1 (en) * 2008-03-25 2009-10-01 Ruhi Sarikaya Machine translation in continuous space
CN110046248A (zh) * 2019-03-08 2019-07-23 阿里巴巴集团控股有限公司 Model training method for text analysis, and text classification method and apparatus
CN110163220A (zh) * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 Image feature extraction model training method, apparatus, and computer device
CN110347839A (zh) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 Text classification method based on a generative multi-task learning model
CN110609897A (zh) * 2019-08-12 2019-12-24 北京化工大学 Multi-category Chinese text classification method fusing global and local features
CN111291183A (zh) * 2020-01-16 2020-06-16 支付宝(杭州)信息技术有限公司 Method and apparatus for classification prediction using a text classification model

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761935A (zh) * 2021-08-04 2021-12-07 厦门快商通科技股份有限公司 Short text semantic similarity measurement method, system, and apparatus
CN113761935B (zh) * 2021-08-04 2024-02-27 厦门快商通科技股份有限公司 Short text semantic similarity measurement method, system, and apparatus
CN113554241A (zh) * 2021-09-02 2021-10-26 国网山东省电力公司泰安供电公司 User stratification method and prediction method based on user electricity-use complaint behavior
CN113554241B (zh) * 2021-09-02 2024-04-26 国网山东省电力公司泰安供电公司 User stratification method and prediction method based on user electricity-use complaint behavior
CN114092949A (zh) * 2021-11-23 2022-02-25 支付宝(杭州)信息技术有限公司 Category prediction model training and interface element category recognition method and apparatus
CN115795037B (zh) * 2022-12-26 2023-10-20 淮阴工学院 Label-aware multi-label text classification method
CN115795037A (zh) * 2022-12-26 2023-03-14 淮阴工学院 Label-aware multi-label text classification method
CN116561314B (zh) * 2023-05-16 2023-10-13 中国人民解放军国防科技大学 Text classification method based on self-attention with adaptive threshold selection
CN116561314A (zh) * 2023-05-16 2023-08-08 中国人民解放军国防科技大学 Text classification method based on self-attention with adaptive threshold selection
CN116611057A (zh) * 2023-06-13 2023-08-18 北京中科网芯科技有限公司 Data security detection method and system
CN116611057B (zh) * 2023-06-13 2023-11-03 北京中科网芯科技有限公司 Data security detection method and system
CN116662556A (zh) * 2023-08-02 2023-08-29 天河超级计算淮海分中心 Text data processing method fusing user attributes
CN116662556B (zh) * 2023-08-02 2023-10-20 天河超级计算淮海分中心 Text data processing method fusing user attributes

Also Published As

Publication number Publication date
CN111291183B (zh) 2021-08-03
CN111291183A (zh) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2021143396A1 (fr) Method and apparatus for performing classification prediction using a text classification model
US11270225B1 (en) Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents
CN109101537B (zh) Multi-turn dialogue data classification method and apparatus based on deep learning, and electronic device
US20220019745A1 (en) Methods and apparatuses for training service model and determining text classification category
CN110046248B (zh) Model training method for text analysis, and text classification method and apparatus
US8331655B2 (en) Learning apparatus for pattern detector, learning method and computer-readable storage medium
US20160140425A1 (en) Method and apparatus for image classification with joint feature adaptation and classifier learning
CN111191791A (zh) Application method, training method, apparatus, device, and medium for machine learning models
CN112015868A (zh) Question answering method based on knowledge graph completion
CN113268609A (zh) Knowledge-graph-based dialogue content recommendation method, apparatus, device, and medium
CN112785441B (zh) Data processing method, apparatus, terminal device, and storage medium
CN114936623A (zh) Aspect-level sentiment analysis method fusing multimodal data
US20180137410A1 (en) Pattern recognition apparatus, pattern recognition method, and computer program product
US10733483B2 (en) Method and system for classification of data
CN112988970A (zh) Text matching algorithm serving intelligent question answering systems
CN110543566B (zh) Intent classification method based on self-attention neighbor relation encoding
CN111950647A (zh) Classification model training method and device
CN111339734A (zh) Method for generating images from text
CN116258938A (zh) Image retrieval and recognition method based on autonomously evolving loss
CN114970882A (zh) Model prediction method and model system suited to multiple scenarios and tasks
CN115017321A (zh) Knowledge point prediction method and apparatus, storage medium, and computer device
CN111339303A (zh) Text intent induction method and apparatus based on clustering and automatic summarization
CN113343666B (zh) Method, apparatus, device, and storage medium for determining score confidence
EP4198836A1 (fr) Procédé et système d'explication locale de champ de prédiction de réseau neuronal
US20210365794A1 (en) Discovering Novel Artificial Neural Network Architectures

Legal Events

Date Code Title Description

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20913983; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: PCT application non-entry in European phase (Ref document number: 20913983; Country of ref document: EP; Kind code of ref document: A1)