WO2021051503A1

WO2021051503A1 - Semantic representation model-based text classification method and apparatus, and computer device

Info

Publication number: WO2021051503A1
Application number: PCT/CN2019/116339
Authority: WO
Inventors: 邓悦; 金戈; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-09-19
Filing date: 2019-11-07
Publication date: 2021-03-25
Also published as: CN110781312A; CN110781312B

Abstract

A semantic representation model-based text classification method and apparatus, a computer device and a storage medium. The method comprises: acquiring inputted original text, and preprocessing the original text so as to obtain a word sequence; calculating to obtain a vector wi; generating a text embedding vector sequence {w1, w2,..., wn}; inputting the word sequence into a preset knowledge embedding model to acquire an entity embedding vector sequence {e1, e2,..., en}; inputting the text embedding vector sequence into a M-layer word granularity encoder for calculation to obtain an intermediate text embedding vector sequence; inputting the intermediate text embedding vector sequence and the entity embedding vector sequence into a N-layer knowledge granularity encoder for calculation to obtain a final text embedding vector sequence and a final entity embedding vector sequence; and inputting the final text embedding vector sequence and the final entity embedding vector sequence into a classification model to obtain a text classification result. Thus, the accuracy of text classification is improved.

Description

Text classification method, device and computer equipment based on semantic representation model

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on September 19, 2019, the application number is 2019108866221, and the invention title is "Text Classification Method, Apparatus and Computer Equipment Based on Semantic Representation Model". The entire content of the patent application is approved The reference is incorporated in this application.

Technical field

This application relates to the computer field, and in particular to a text classification method, device, computer equipment and storage medium based on a semantic representation model.

Background technique

Text classification is an important part of natural language processing, and text classification models are generally used for text classification. The performance of the text classification model largely depends on its semantic representation model. Common semantic representation models, such as models based on the word2vec algorithm, models based on the two-way LSTM network, etc., only consider the relationship between the word itself and/or the context. When in a professional question and answer context, such as in the process of a professional interview, the performance is The questions that appear in the interview are professional (professional vocabulary, professional relationship expression, etc.), and often examine whether the candidate has a clear grasp of a certain concept or a certain definition, that is, the problem has a knowledge background, so traditional The semantic representation model cannot accurately reflect the professional vocabulary and the relationship between the professional vocabulary (ie entity and entity relationship), so it cannot accurately reflect the input text, thereby reducing the accuracy of the final text classification.

technical problem

The main purpose of this application is to provide a text classification method, device, computer equipment and storage medium based on a semantic representation model, aiming to improve the accuracy of text classification.

Technical solutions

In order to achieve the above objective, this application proposes a text classification method based on a semantic representation model, which includes the following steps:

Acquiring the input original text, and preprocessing the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

According to the preset word vector generation method, the corresponding relationship between the position of the i-th word in the original text and the sentence segmentation vector, and the corresponding relationship between the position of the i-th word in the word sequence and the position vector, corresponding Obtain the word vector ai, sentence segmentation vector bi, and position vector ci corresponding to the i-th word in the word sequence, and calculate the text embedding corresponding to the i-th word according to the formula: wi=ai+bi+ci Vector wi, where the word vector ai, sentence segmentation vector bi and position vector ci have the same dimensions;

Generate a text embedding vector sequence {w1, w2,..., wn}, where there are a total of n words in the word sequence;

Input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word;

Input the text embedding vector sequence into the preset M-layer word granularity encoder for calculation, thereby obtaining the intermediate text embedding vector sequence output by the last layer word granularity encoder; wherein the M-layer word granularity encoder and the pre- Suppose that the N-level knowledge granularity encoders are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, thereby obtaining the final text embedding vector sequence and the final entity embedding vector output by the last layer of knowledge granularity encoder sequence;

The final text embedding vector sequence and the final entity embedding vector sequence are input into a preset classification model for processing to obtain a text classification result.

This application proposes a text classification device based on a semantic representation model, including:

The text acquisition unit is configured to acquire the input original text and preprocess the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

The first embedding calculation unit is used to generate the word vector according to the preset method, the corresponding relationship between the position of the sentence to which the i-th word belongs in the original text and the sentence segmentation vector, and the position of the i-th word in the word sequence Correspondence with the position vector, correspondingly obtain the word vector ai, sentence segmentation vector bi and position vector ci corresponding to the i-th word in the word sequence, and calculate according to the formula: wi=ai+bi+ci The text embedding vector wi corresponding to the i-th word, where the word vector ai, the sentence segmentation vector bi and the position vector ci have the same dimensions;

The first sequence generating unit is used to generate a text embedding vector sequence {w1, w2,..., wn}, wherein there are a total of n words in the word sequence;

The second sequence generating unit is used to input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word ；

The intermediate sequence generating unit is used to input the text embedding vector sequence into a preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein the M The layer word granularity encoder and the preset N layer knowledge granularity encoder are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

The final sequence generating unit is used to input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding output by the last layer of knowledge granularity encoder Vector sequence and final entity embedding vector sequence;

The text classification unit is used to input the final text embedding vector sequence and the final entity embedding vector sequence into a preset classification model for processing to obtain a text classification result.

The present application provides a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the steps of any one of the above methods when the computer program is executed.

The present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the above-mentioned methods are implemented.

Beneficial effect

The text classification method, device, computer equipment and storage medium based on the semantic representation model of the present application obtain the input original text and preprocess the original text to obtain the word sequence; obtain the word vector ai and sentence segmentation vector bi and position vector ci, and calculate vector wi according to the formula: wi=ai+bi+ci; generate text embedding vector sequence {w1,w2,...,wn}; input the word sequence into the preset knowledge embedding model , Obtain the entity embedding vector sequence {e1, e2,..., en}; input the text embedding vector sequence into the preset M-layer word granularity encoder for calculation, thereby obtaining the intermediate text embedding vector sequence; The intermediate text embedding vector sequence and the entity embedding vector sequence are input into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding vector sequence and the final entity embedding vector sequence; the final text embedding vector sequence and the final The entity embedding vector sequence is input into the preset classification model for processing, and the text classification result is obtained. Thus, the entity embedding vector is introduced into the classification process, which improves the accuracy of text classification.

Description of the drawings

FIG. 1 is a schematic flowchart of a text classification method based on a semantic representation model according to an embodiment of this application;

2 is a schematic block diagram of the structure of a text classification device based on a semantic representation model according to an embodiment of the application;

FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

The best implementation of this application

1, an embodiment of the present application provides a method for text classification based on a semantic representation model, including the following steps:

S1. Obtain the input original text, and preprocess the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

S2. According to the preset word vector generation method, the corresponding relationship between the position of the sentence to which the i-th word belongs in the original text and the sentence segmentation vector, and the corresponding relationship between the position of the i-th word in the word sequence and the position vector , Correspondingly obtain the word vector ai, the sentence segmentation vector bi and the position vector ci corresponding to the i-th word in the word sequence, and calculate according to the formula: wi=ai+bi+ci to obtain the i-th word corresponding to The text embedding vector wi, where the word vector ai, sentence segmentation vector bi and position vector ci have the same dimensions;

S3. Generate a text embedding vector sequence {w1, w2,...,wn}, wherein there are n words in the word sequence;

S4. Input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word;

S5. Input the text embedding vector sequence into a preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein the M-layer word granularity encoder Connect with the preset N-layer knowledge granularity encoder sequentially to form a semantic representation model, where M and N are both greater than or equal to 2;

S6. Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding vector sequence and the final entity output by the last layer of knowledge granularity encoder Embedded vector sequence;

S7. Input the final text embedding vector sequence and the final entity embedding vector sequence into a preset classification model for processing to obtain a text classification result.

This application introduces the entity embedding vector sequence into the semantic representation model, so that the semantic representation model and text classification model can be competent for more complex situations (such as processing text with professional vocabulary and the relationship between professional vocabulary), and improve the final text Accuracy of classification.

As described in the above step S1, the input original text is obtained, and the original text is preprocessed to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division. The original text may include multiple sentences, and each sentence includes multiple words. Therefore, the word sequence is obtained by preprocessing including at least sentence division and word division. Among them, sentence division and word division can use open source division tools, such as jieba tools, SnowNLP tools, etc. Wherein, the original text may be any feasible text, preferably text with designated words, wherein the designated words are knowledge nodes in a preset knowledge graph, and the designated words are professional vocabulary in a preset field.

As described in step S2 above, according to the preset word vector generation method, the corresponding relationship between the position of the sentence to which the i-th word belongs in the original text and the sentence segmentation vector, the position of the i-th word in the word sequence and Correspondence of the position vector, corresponding to the word vector ai, sentence segmentation vector bi and position vector ci corresponding to the i-th word in the word sequence, and according to the formula: wi=ai+bi+ci, the first The text embedding vector wi corresponding to i words, where the word vector ai, the sentence segmentation vector bi and the position vector ci have the same dimensions. The word vector generation method can adopt any feasible method, such as querying a preset word vector library to obtain the word vector corresponding to the words in the word sequence. The word vector library can be an existing database or For example, the word2vec model can be used to train the collected corpus; or, the word vector generation method is, for example: before the training of the semantic representation model, the word vector corresponding to each word is initialized to a random value, and then in the training process , Optimize together with other network parameters to obtain the word vector corresponding to each word. Since the text embedding vector wi is not only composed of the word vector ai, but also composed of the sentence segmentation vector bi and the position vector ci, it can also reflect the sentence position and word position of the i-th word.

As described in the above step S3, a text embedding vector sequence {w1, w2,...,wn} is generated, wherein there are a total of n words in the word sequence. The text embedding vector sequence {w1,w2,...,wn} is composed of the text embedding vectors corresponding to n words, and the text embedding vector is displayed in the form of a column vector, so the text embedding vector sequence {w1,w2,...,wn} is also Treated as a matrix with n columns;

As described in step S4 above, the word sequence is input into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,...,en}, where en is the entity embedding vector corresponding to the nth word. The knowledge embedding model is, for example, the TransE model, which can extract the entities and relationships in the knowledge graph in the form of vectors, and because the knowledge nodes and relationships in the knowledge graph are more professional (you can select the appropriate Knowledge graph) to obtain the entity embedding vector corresponding to each word. The knowledge embedding model, such as the TransE model, is a traditional model, and will not be repeated here. Further, if there is a word that is not an entity, the entity embedding vector corresponding to the word is set to 0.

As described in step S5 above, the text embedding vector sequence is input into the preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein the M The layer word granularity encoder and the preset N layer knowledge granularity encoder are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2. Wherein, the calculation process in the M-layer word granular encoder is, for example: in the multi-head self-attention mechanism layer in the first-layer word granular encoder, the text embedding vector sequence is respectively multiplied by the trained h th A parameter matrix group, thereby obtaining the first matrix {Q1,Q2,...,Qh}, the second matrix {K1,K2,...,Kh} and the third matrix {V1,V2,...,Vh}, each of which is A parameter matrix group includes three q×k first parameter matrices; according to the formula:

Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h; according to the formula: Multihead({w ₁ ,w ₂ ,…,w _n })=Concat(head ₁ ,head ₂ ,…,head _h ) W, the multihead self-attention matrix Multihead is calculated, where W is the preset second parameter matrix, and the Concat function refers to directly splicing the matrix in the column direction; the multihead self-attention matrix is input into the feedforward fully connected layer , The temporary text embedding vector FFN(x) is obtained, where the calculation formula in the feedforward fully connected layer is: FFN(x)=gelu(xW ₁ +b ₁ )W ₂ +b ₂ , where x is the The multi-head self-attention matrix, W ₁ , W ₂ are preset parameter matrices, b ₁ , b ₂ are preset bias values; the temporary text embedding vectors corresponding to all words form a temporary text embedding vector sequence, and The temporary text embedding vector sequence is input into the next layer of word granularity encoder, until the intermediate text embedding vector sequence output by the last layer of word granularity encoder is obtained.

As described in step S6 above, input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding output by the last layer of knowledge granularity encoder The vector sequence and the final entity are embedded in the vector sequence. Wherein, the calculation process in the N-layer knowledge granular encoder is, for example: inputting the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer of the first-layer knowledge granular encoder To get the first vector sequence

And the second vector sequence

The first vector sequence and the second vector sequence are input to the information aggregation layer in the first-level knowledge granularity encoder, so as to obtain the final text embedding vector mj and the final entity embedding vector pj corresponding to the jth word, where the information The calculation formula in the aggregation layer is: mj=gelu(W ₃ h _j +b ₃ ); pj=gelu(W ₄ h _j +b ₄ ); where

All are preset parameter matrices, b ₃ , b ₄ , b ₅ are preset offset values; generate the first text embedding vector sequence {m1, m2,..., mn} and the first entity embedding vector sequence {m1 ,m2,...,mn}, and input the first text embedding vector sequence {m1,m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn} into the next level of knowledge granularity coding Until the final text embedding vector sequence and the final entity embedding vector sequence output by the last level of knowledge granularity encoder are obtained.

As described in step S7 above, the final text embedding vector sequence and the final entity embedding vector sequence are input into a preset classification model for processing, and a text classification result is obtained. The classification model may be any feasible classification model, such as a softmax classifier. Since the final text embedding vector sequence and the final entity embedding vector sequence utilize the entity embedding vector, the final text classification result is more suitable for professional situations and the classification is more accurate.

In one embodiment, each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and the text embedding vector sequence is input to the preset M layer word granularity. The step S5 of calculating in the encoder to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder includes:

S501. In the multi-head self-attention mechanism layer of the first-layer word granularity encoder, the text embedding vector sequence is respectively multiplied by the trained h first parameter matrix groups to obtain the first matrix {Q1, Q2 ,...,Qh}, the second matrix {K1,K2,...,Kh} and the third matrix {V1,V2,...,Vh}, where each first parameter matrix group includes three q×k first Parameter matrix

S502. According to the formula:

Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h;

S503. According to the formula: Multihead({w ₁ ,w ₂ ,...,w _n })=Concat(head ₁ ,head ₂ ,...,head _h )W, calculate the multihead self-attention matrix Multihead, where W is the preset The second parameter matrix of, Concat function refers to the direct splicing of the matrix in the column direction;

S504. Input the multi-head self-attention matrix into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is: FFN(x)= gelu(xW ₁ +b ₁ )W ₂ +b ₂ , where x is the multi-head self-attention matrix, W ₁ and W ₂ are preset parameter matrices, and b ₁ and b ₂ are preset bias values;

S505. The temporary text embedding vectors corresponding to all words are formed into a temporary text embedding vector sequence, and the temporary text embedding vector sequence is input into the next layer of word granularity encoder, until the intermediate text output by the last layer of word granularity encoder is obtained Embedding a sequence of vectors.

As mentioned above, the intermediate text embedding vector sequence output by the last layer of word granularity encoder is achieved. Since each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected sequentially, the relationship between words and words (contextual relationship) is reflected. And in order to improve the performance of self-attention, this application adopts the formula: Multihead({w ₁ ,w ₂ ,…,w _n })=Concat(head ₁ ,head ₂ ,…,head _h )W, and calculates the multihead self Attention matrix Multihead, where W is the preset second parameter matrix, and the Concat function refers to the method of directly splicing the matrix in the column direction to construct a comprehensive matrix, and then multiplying the second parameter matrix W to obtain the multihead self-attention matrix , Thereby improving the performance of self-attention (using multiple self-attention groups). The multi-head self-attention matrix is then input into the feedforward fully connected layer to obtain a temporary text embedding vector, and the temporary text embedding vectors corresponding to all words are formed into a temporary text embedding vector sequence. Therefore, the output of the first-level word granularity encoder is the temporary text embedding vector sequence. Since this application is provided with an M-layer word granularity encoder, repeat the above calculation process to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder.

In one embodiment, each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and the intermediate text embedding vector sequence and the entity embedding vector sequence are input to the N The step S6 of obtaining the final text embedding vector sequence and the final entity embedding vector sequence outputted by the knowledge granularity encoder of the last layer of knowledge granularity encoder is:

S601. Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-layer knowledge granularity encoder, so as to obtain the first vector sequence

And the second vector sequence

S602. Input the first vector sequence and the second vector sequence into the information aggregation layer in the first-level knowledge granularity encoder, so as to obtain the final text embedding vector mj and the final entity embedding vector pj corresponding to the jth word. The calculation formula in the information aggregation layer is:

mj=gelu(W ₃ h _j +b ₃ ); pj=gelu(W ₄ h _j +b ₄ ); where

They are all preset parameter matrices, and b ₃ , b ₄ , and b ₅ are all preset offset values;

S603. Generate the first text embedding vector sequence {m1, m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector sequence {m1,m2, …,Mn} and the first entity embedding vector sequence {m1,m2,…,mn} are input into the next level of knowledge granularity encoder, until the final text embedding vector sequence and final entity embedding output from the last layer of knowledge granularity encoder are obtained Vector sequence.

As mentioned above, the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder are achieved. Each level of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer. The calculation method of the multi-head self-attention mechanism layer is the same as the calculation method of the multi-head self-attention mechanism layer in the aforementioned word granular encoder. The same, but because the parameter matrix used is trained, the parameter matrix can be different. The information aggregation layer is used to obtain the final text embedding vector mj and the final entity embedding vector pj by using the activation function gelu. The calculation formula in the information aggregation layer is:

mj=gelu(W ₃ h _j +b ₃ ); pj=gelu(W ₄ h _j +b ₄ ); where

They are all preset parameter matrices, and b ₃ , b ₄ , and b ₅ are all preset offset values. Thus, the first text embedding vector sequence {m1, m2,...,mn} and the first entity embedding vector sequence {m1, m2,...,mn} output by the first-level knowledge granularity encoder can be obtained. The calculation process of the knowledge granular encoder is repeated until the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granular encoder.

In one embodiment, said inputting said text embedding vector sequence into a preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein Before step S5 where the M-layer word granular encoder and the preset N-layer knowledge granular encoder are sequentially connected to form a semantic representation model, it includes:

S41. Call the pre-collected training text;

S42. Generate a training text embedding vector sequence corresponding to the training text according to a preset text embedding vector sequence generation method, and input the training text embedding vector sequence into a preset M-layer word granularity encoder Perform calculations to obtain the first sub-attention matrix output by the M-layer word granularity encoder, and then input the first sub-attention matrix into the preset first loss function to obtain the first loss function value;

S43. Generate a training entity embedding vector sequence corresponding to the training text according to a preset entity embedding vector sequence generation method, and input the training entity embedding vector sequence and the training text embedding vector sequence into the preset Suppose the calculation is performed in the N-layer knowledge granular encoder to obtain the second sub-attention matrix output by the N-layer knowledge granular encoder, and then the second sub-attention matrix is input into the preset second loss function , So as to obtain the second loss function value;

S44. According to the formula: total loss function value=the first loss function value+the second loss function value, calculate the total loss function value, and determine whether the total loss function value is greater than a preset loss function threshold;

S45. If the total loss function value is greater than the preset loss function threshold value, adjust the semantic representation model parameters so that the total loss function value is less than the loss function threshold value.

As described above, the training semantic representation model is realized. The M-layer word granularity encoder and the preset N-layer knowledge granularity encoder are sequentially connected to form a semantic representation model. Therefore, this application adopts a comprehensive consideration of the first loss function and the second loss function, and simultaneously trains the M-layer words Granular encoder and N-layer knowledge granular encoder. Accordingly, set the total loss function value=the first loss function value+the second loss function value, and determine whether the total loss function value is greater than a preset loss function threshold. Since the total loss function measures the degree of difference between the output and expectations, if the value of the total loss function is small, it indicates that the semantic representation model is suitable for the current training data, otherwise, the parameters need to be adjusted. Therefore, if the total loss function value is greater than the preset loss function threshold value, the semantic representation model parameters are adjusted so that the total loss function value is less than the loss function threshold value.

In one embodiment, the step S42 of generating a training text embedding vector sequence corresponding to the training text according to a preset text embedding vector sequence generating method includes:

S421. Replace random words in the training text with mask marks, and perform preprocessing on the training text after the mask marks to obtain a training word sequence, wherein the preprocessing includes at least sentence division And word division;

S422. According to the preset word vector library, the corresponding relationship between the position of the sentence to which the i-th word belongs in the training text and the sentence segmentation vector, and the position and position of the i-th word in the training word sequence Correspondence of the vectors, correspondingly acquiring the training word vector di, the training sentence segmentation vector fi, and the training position vector gi corresponding to the i-th word in the training word sequence;

S423. According to the formula: ti=di+fi+gi, the training text embedding vector ti corresponding to the i-th word is calculated, wherein the training word vector di, the training sentence segmentation vector fi and the training position vector gi are the same Dimension of

S424. Generate a training text embedding vector sequence {t1, t2,..., tn}, where there are a total of n words in the training word sequence.

As described above, it is realized that the training text embedding vector sequence corresponding to the training text is generated according to the preset text embedding vector sequence generating method. Wherein, the random words in the training text are replaced with mask marks, and the training text after the mask marks is preprocessed to obtain the training word sequence, that is, the training is performed by mask embedding, It is expected that the model can predict the corresponding words at the mask mark based on the context. Since the semantic representation model is trained, the preprocessing method and the method of generating the text embedding vector sequence for training are the same as the preprocessing method and the method of generating the text embedding vector sequence when the semantic representation model operates normally.

In one embodiment, the method for generating a text embedding vector sequence according to a preset generates a training text embedding vector sequence corresponding to the training text, and inputs the training text embedding vector sequence into a preset M The first sub-attention matrix output by the M-layer word granular encoder is calculated, and then the first sub-attention matrix is input into the preset first loss function to obtain Before step S42 of the first loss function value, it includes:

S411. Set the first loss function as: LOSS1=-∑Y _i log X _i , where LOSS1 is the first loss function, Yi is the expected first sub-attention matrix corresponding to the training text, Xi Is the first sub-attention matrix;

S412. Set the second loss function as: LOSS2=-∑(G _i log H _i +(1-Gi)log(1-Hi)), where LOSS2 is the second loss function, and Gi is the The expected second sub-attention matrix corresponding to the training text, Hi is the second sub-attention matrix.

As described above, it is achieved to set the first loss function and the second loss function. The loss function is used to measure the difference between the value generated by the training data and the expected value, so as to reflect whether the parameters of the model need to be adjusted. In this application, the first loss function is set as: LOSS1=-∑Y _i log X _i , where LOSS1 is the first loss function, and Yi is the expected first sub-attention matrix corresponding to the training text, Xi is the first sub-attention matrix; the second loss function is set as: LOSS2=-∑(G _i log H _i +(1-Gi)log(1-Hi)), where LOSS2 is the The second loss function, Gi is the expected second sub-attention matrix corresponding to the training text, and Hi is the method of the second sub-attention matrix to measure the first sub-attention matrix and the second sub-attention matrix The degree of difference relative to the expected value.

In one embodiment, after the step S7 of inputting the final text embedding vector sequence and the final entity embedding vector sequence into a preset classification model to obtain a text classification result, the method includes:

S71. Obtain a designated answer sentence corresponding to the text classification result according to the preset correspondence between the classification result and the answer sentence;

S72. Output the designated answer sentence.

As described above, the output of the designated answer sentence is realized. Since this application is particularly suitable for the interview question and answer process in a professional context, the original text should be the interviewer's answer to the question, and the text classification result is the analysis of the answer. Since it is an interview question and answer process, this application also adopts the method of obtaining the designated answer sentence corresponding to the text classification result according to the preset correspondence between the classification result and the answer sentence; outputting the designated answer sentence to complete the question and answer process In the final interaction with the interviewer. Wherein, the designated answer sentence is, for example, congratulations, passed the interview, etc.

2, an embodiment of the present application provides a text classification device based on a semantic representation model, including:

The text acquisition unit 10 is configured to acquire the input original text and preprocess the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

The first embedding calculation unit 20 is configured to generate a word vector according to a preset method, the corresponding relationship between the position of the sentence to which the i-th word belongs in the original text and the sentence segmentation vector, and the position of the i-th word in the word sequence Correspondence between position and position vector, correspondingly obtain the word vector ai, sentence segmentation vector bi and position vector ci corresponding to the i-th word in the word sequence, and calculate according to the formula: wi=ai+bi+ci Obtain the text embedding vector wi corresponding to the i-th word, where the word vector ai, the sentence segmentation vector bi and the position vector ci have the same dimensions;

The first sequence generating unit 30 is configured to generate a text embedding vector sequence {w1, w2,..., wn}, wherein there are a total of n words in the word sequence;

The second sequence generating unit 40 is configured to input the word sequence into a preset knowledge embedding model to obtain an entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding corresponding to the nth word vector;

The intermediate sequence generating unit 50 is configured to input the text embedding vector sequence into a preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein The M-layer word granular encoder and the preset N-layer knowledge granular encoder are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

The final sequence generating unit 60 is configured to input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text output by the last layer of knowledge granularity encoder Embedding vector sequence and final entity embedding vector sequence;

The text classification unit 70 is configured to input the final text embedding vector sequence and the final entity embedding vector sequence into a preset classification model for processing to obtain a text classification result.

The operations performed by the above-mentioned units respectively correspond to the steps of the text classification method based on the semantic representation model of the foregoing embodiment, and will not be repeated here.

In one embodiment, each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and the intermediate sequence generating unit 50 includes:

The matrix group calculation subunit is used to multiply the text embedding vector sequence by the trained h first parameter matrix groups in the multi-head self-attention mechanism layer of the first-layer word granularity encoder to obtain the first parameter matrix group. A matrix {Q1,Q2,...,Qh}, a second matrix {K1,K2,...,Kh} and a third matrix {V1,V2,...,Vh}, each of the first parameter matrix groups includes three q×k first parameter matrix;

The first matrix obtains subunits, which are used according to the formula:

The second matrix obtains subunits, which are used according to the formula:

Multihead({w ₁ ,w ₂ ,…,w _n })=Concat(head ₁ ,head ₂ ,…,head _h )W, calculate the multihead self-attention matrix Multihead, where W is the preset second parameter matrix , Concat function refers to directly splicing the matrix in the column direction;

The temporary vector acquisition subunit is used to input the multi-head self-attention matrix into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is : FFN(x)=gelu(xW ₁ +b ₁ )W ₂ +b ₂ , where x is the multi-head self-attention matrix, W ₁ , W ₂ are preset parameter matrices, and b ₁ , b ₂ are preset parameter matrices. Set offset value;

The intermediate sequence generation subunit is used to compose the temporary text embedding vector corresponding to all words into a temporary text embedding vector sequence, and input the temporary text embedding vector sequence into the next layer of word granularity encoder until the last layer of word granularity is obtained The intermediate text output by the encoder is embedded in the vector sequence.

The operations performed by the above subunits respectively correspond to the steps of the semantic representation model-based text classification method of the foregoing embodiment, and will not be repeated here.

In one embodiment, each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and the final sequence generating unit 60 includes:

The first acquisition subunit is used to input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-level knowledge granularity encoder, thereby obtaining the first vector sequence

And the second vector sequence

The first calculation subunit is used to input the first vector sequence and the second vector sequence into the information aggregation layer in the first-layer knowledge granularity encoder, so as to obtain the final text embedding vector mj and mj corresponding to the j-th word. The final entity embedding vector pj, where the calculation formula in the information aggregation layer is:

mj=gelu(W ₃ h _j +b ₃ ); pj=gelu(W ₄ h _j +b ₄ ); where

The second calculation subunit is used to generate the first text embedding vector sequence {m1, m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector The sequence {m1,m2,…,mn} and the first entity embedding vector sequence {m1,m2,…,mn} are input into the next level of knowledge granularity encoder until the final text embedding output by the last layer of knowledge granularity encoder is obtained The vector sequence and the final entity are embedded in the vector sequence.

In one embodiment, the device includes:

The text calling unit is used to call the pre-collected training text;

The first acquiring unit is configured to generate a training text embedding vector sequence corresponding to the training text according to a preset text embedding vector sequence generation method, and input the training text embedding vector sequence into a preset M layer The word granularity encoder performs calculations to obtain the first sub-attention matrix output by the M-layer word granular encoder, and then inputs the first sub-attention matrix into the preset first loss function, thereby obtaining the first sub-attention matrix A loss function value;

The second acquiring unit is configured to generate a training entity embedding vector sequence corresponding to the training text according to a preset entity embedding vector sequence generating method, and embed the training entity into the vector sequence and the training text The embedding vector sequence is input into the preset N-layer knowledge granular encoder for calculation, so as to obtain the second sub-attention matrix output by the N-layer knowledge granular encoder, and then the second sub-attention matrix is input into the preset In the second loss function, the value of the second loss function is thus obtained;

The third acquiring unit is configured to calculate the total loss function value according to the formula: total loss function value=the first loss function value+the second loss function value, and determine whether the total loss function value is greater than a preset value The threshold of loss function;

The adjustment unit is configured to, if the total loss function value is greater than a preset loss function threshold, adjust the semantic representation model parameters so that the total loss function value is less than the loss function threshold.

The operations performed by the above-mentioned units respectively correspond to the steps of the text classification method based on the semantic representation model of the foregoing embodiment one-to-one, and will not be repeated here.

In one embodiment, the first acquiring unit includes:

The second acquisition subunit is used to replace random words in the training text with mask marks, and preprocess the training text after the mask marks to obtain a training word sequence, wherein Preprocessing includes at least sentence division and word division;

The third acquisition subunit is used for the corresponding relationship between the position of the sentence to which the i-th word belongs in the training text and the sentence segmentation vector according to the preset word vector library, and the position of the i-th word in the training word Correspondence between the position in the sequence and the position vector, correspondingly acquiring the training word vector di, the training sentence segmentation vector fi, and the training position vector gi corresponding to the i-th word in the training word sequence;

The fourth acquisition subunit is used to calculate the training text embedding vector ti corresponding to the i-th word according to the formula: ti=di+fi+gi, where the training word vector di, the training sentence segmentation vector fi and training Use the position vector gi to have the same dimension;

The fifth acquisition subunit is used to generate a training text embedding vector sequence {t1, t2,..., tn}, wherein there are a total of n words in the training word sequence.

In one embodiment, the device includes:

The first setting unit is configured to set the first loss function as: LOSS1=-∑Y _i log X _i , where LOSS1 is the first loss function, and Yi is the expected first sub-function corresponding to the training text An attention matrix, Xi is the first sub-attention matrix;

The second setting unit is configured to set the second loss function as: LOSS2=-∑(G _i log H _i +(1-Gi)log(1-Hi)), where LOSS2 is the second loss function , Gi is the expected second sub-attention matrix corresponding to the training text, and Hi is the second sub-attention matrix.

In one embodiment, the device includes:

The sentence acquisition unit is configured to acquire the designated answer sentence corresponding to the text classification result according to the preset correspondence between the classification result and the answer sentence;

The sentence output unit is used to output the specified answer sentence.

3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in the figure. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor designed by the computer is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used to store the data used in the text classification method based on the semantic representation model. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to realize a text classification method based on the semantic representation model.

The above-mentioned processor executes the above-mentioned semantic representation model-based text classification method, wherein the steps included in the method respectively correspond to the steps of executing the semantic representation model-based text classification method of the foregoing embodiment, and will not be repeated here.

An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a method for text classification based on a semantic representation model is realized, wherein the steps included in the method are respectively the same as those in the foregoing The steps of the text classification method based on the semantic representation model of the embodiment correspond one-to-one, and will not be repeated here. The computer-readable storage medium is, for example, a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.

Claims

A text classification method based on a semantic representation model, including:

Acquiring the input original text, and preprocessing the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

According to the preset word vector generation method, the corresponding relationship between the position of the i-th word in the original text and the sentence segmentation vector, and the corresponding relationship between the position of the i-th word in the word sequence and the position vector, corresponding Obtain the word vector ai, sentence segmentation vector bi, and position vector ci corresponding to the i-th word in the word sequence, and calculate the text embedding corresponding to the i-th word according to the formula: wi=ai+bi+ci Vector wi, where the word vector ai, sentence segmentation vector bi and position vector ci have the same dimensions;

Generate a text embedding vector sequence {w1, w2,..., wn}, where there are a total of n words in the word sequence;

Input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word;

Input the text embedding vector sequence into the preset M-layer word granularity encoder for calculation, thereby obtaining the intermediate text embedding vector sequence output by the last layer word granularity encoder; wherein the M-layer word granularity encoder and the pre- Suppose that the N-level knowledge granularity encoders are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, thereby obtaining the final text embedding vector sequence and the final entity embedding vector output by the last layer of knowledge granularity encoder sequence;

The final text embedding vector sequence and the final entity embedding vector sequence are input into a preset classification model for processing to obtain a text classification result.
According to the text classification method based on the semantic representation model of claim 1, each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and said embedding the text into the vector The sequence is input into the preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder, including:

In the multi-head self-attention mechanism layer of the first-layer word granularity encoder, the text embedding vector sequence is respectively multiplied by the trained h first parameter matrix groups, thereby obtaining the first matrix {Q1,Q2,... ,Qh}, the second matrix {K1,K2,...,Kh} and the third matrix {V1,V2,...,Vh}, where each first parameter matrix group includes three q×k first parameter matrices ；

According to the formula:
Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h;

According to the formula: Multihead({w 1 ,w 2 ,…,w n })=Concat(head 1 ,head 2 ,…,head h )W, the multihead self-attention matrix Multihead is calculated, where W is the preset first Two-parameter matrix, the Concat function refers to directly splicing the matrix in the column direction;

The multi-head self-attention matrix is input into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is: FFN(x)=gelu( xW 1 + b 1 ) W 2 + b 2 , where x is the multi-head self-attention matrix, W 1 and W 2 are preset parameter matrices, and b 1 and b 2 are preset bias values;

The temporary text embedding vectors corresponding to all words are formed into a temporary text embedding vector sequence, and the temporary text embedding vector sequence is input into the next layer of word granularity encoder, until the intermediate text embedding vector output by the last layer of word granularity encoder is obtained sequence.
According to the text classification method based on the semantic representation model of claim 1, each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and the intermediate text is embedded in the vector sequence and the information aggregation layer. The steps of inputting the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder, include:

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-layer knowledge granularity encoder, thereby obtaining the first vector sequence
And the second vector sequence

The first vector sequence and the second vector sequence are input to the information aggregation layer in the first-level knowledge granularity encoder, so as to obtain the final text embedding vector mj and the final entity embedding vector pj corresponding to the jth word, where the information The calculation formula in the aggregation layer is:

mj=gelu(W 3 h j +b 3 ); pj=gelu(W 4 h j +b 4 ); where
W 3 , W 4 ,
They are all preset parameter matrices, and b 3 , b 4 , and b 5 are all preset offset values;

Generate the first text embedding vector sequence {m1,m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector sequence {m1,m2,..., mn} and the first entity embedding vector sequence {m1,m2,...,mn} are input into the next level of knowledge granularity encoder, until the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder are obtained .
The text classification method based on the semantic representation model according to claim 1, said inputting said text embedding vector sequence into a preset M-layer word granularity encoder for calculation, thereby obtaining the output of the last layer of word granularity encoder The intermediate text embedding vector sequence; wherein the M-layer word granularity encoder and the preset N-layer knowledge granularity encoder are sequentially connected to form a semantic representation model before the step includes:

Call the pre-collected training text;

According to the preset text embedding vector sequence generating method, generate the training text embedding vector sequence corresponding to the training text, and input the training text embedding vector sequence into the preset M-layer word granularity encoder for calculation , So as to obtain the first sub-attention matrix output by the M-layer word granularity encoder, and then input the first sub-attention matrix into the preset first loss function to obtain the first loss function value;

According to the preset entity embedding vector sequence generating method, the training entity embedding vector sequence corresponding to the training text is generated, and the training entity embedding vector sequence and the training text embedding vector sequence are input into the preset Perform calculations in the N-layer knowledge granular encoder to obtain the second sub-attention matrix output by the N-layer knowledge granular encoder, and then input the second sub-attention matrix into the preset second loss function, thereby Obtain the second loss function value;

According to the formula: total loss function value=the first loss function value+the second loss function value, the total loss function value is calculated, and it is judged whether the total loss function value is greater than a preset loss function threshold;

If the total loss function value is greater than the preset loss function threshold value, the semantic representation model parameters are adjusted so that the total loss function value is less than the loss function threshold value.
The text classification method based on a semantic representation model according to claim 4, wherein the step of generating a training text embedding vector sequence corresponding to the training text according to a preset text embedding vector sequence generating method comprises:

Replace random words in the training text with mask marks, and perform preprocessing on the training text after the mask marks to obtain a training word sequence, wherein the preprocessing includes at least sentence division and words Divide

According to the preset word vector library, the corresponding relationship between the position of the sentence to which the i-th word belongs in the training text and the sentence segmentation vector, the position of the i-th word in the training word sequence and the position vector Correspondence, correspondingly acquiring the training word vector di, the training sentence segmentation vector fi, and the training position vector gi corresponding to the i-th word in the training word sequence;

According to the formula: ti=di+fi+gi, the training text embedding vector ti corresponding to the i-th word is calculated, where the training word vector di, the training sentence segmentation vector fi and the training position vector gi have the same dimensions ；

Generate a training text embedding vector sequence {t1, t2,..., tn}, wherein there are a total of n words in the training word sequence.
The method for text classification based on a semantic representation model according to claim 4, wherein said training text embedding vector sequence corresponding to said training text is generated according to a preset text embedding vector sequence generation method, and said training The text embedding vector sequence is input into the preset M-layer word granularity encoder for calculation, so as to obtain the first sub-attention matrix output by the M-layer word granularity encoder, and then the first sub-attention matrix is input to the pre- Before the step of obtaining the value of the first loss function in the first loss function, it includes:

Set the first loss function as: LOSS1=-∑Y i logX i , where LOSS1 is the first loss function, Yi is the expected first sub-attention matrix corresponding to the training text, and Xi is the The first sub-attention matrix;

Set the second loss function as: LOSS2=-∑(G i logH i +(1-Gi)log(1-Hi)), where LOSS2 is the second loss function, and Gi is the training text The corresponding expected second sub-attention matrix, Hi is the second sub-attention matrix.
The text classification method based on a semantic representation model according to claim 1, after the step of inputting the final text embedding vector sequence and final entity embedding vector sequence into a preset classification model for processing, and obtaining a text classification result ,include:

Obtaining the designated answer sentence corresponding to the text classification result according to the preset correspondence between the classification result and the answer sentence;

The specified answer sentence is output.
A text classification device based on a semantic representation model, including:

The text acquisition unit is configured to acquire the input original text and preprocess the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

The first embedding calculation unit is used to generate the word vector according to the preset method, the corresponding relationship between the position of the sentence to which the i-th word belongs in the original text and the sentence segmentation vector, and the position of the i-th word in the word sequence Correspondence with the position vector, correspondingly obtain the word vector ai, sentence segmentation vector bi and position vector ci corresponding to the i-th word in the word sequence, and calculate according to the formula: wi=ai+bi+ci The text embedding vector wi corresponding to the i-th word, where the word vector ai, the sentence segmentation vector bi and the position vector ci have the same dimensions;

The first sequence generating unit is used to generate a text embedding vector sequence {w1, w2,..., wn}, wherein there are a total of n words in the word sequence;

The second sequence generating unit is used to input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word ；

The intermediate sequence generating unit is used to input the text embedding vector sequence into a preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder; wherein the M The layer word granularity encoder and the preset N layer knowledge granularity encoder are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

The final sequence generating unit is used to input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding output by the last layer of knowledge granularity encoder Vector sequence and final entity embedding vector sequence;

The text classification unit is used to input the final text embedding vector sequence and the final entity embedding vector sequence into a preset classification model for processing to obtain a text classification result.
According to the text classification device based on the semantic representation model of claim 8, each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and the intermediate sequence generating unit includes :

The matrix group calculation subunit is used to multiply the text embedding vector sequence by the trained h first parameter matrix groups in the multi-head self-attention mechanism layer of the first-layer word granularity encoder to obtain the first parameter matrix group. A matrix {Q1,Q2,...,Qh}, a second matrix {K1,K2,...,Kh} and a third matrix {V1,V2,...,Vh}, each of the first parameter matrix groups includes three q×k first parameter matrix;

The first matrix obtains subunits, which are used according to the formula:
Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h;

The second matrix obtains subunits, which are used according to the formula:

Multihead({w 1 ,w 2 ,…,w n })=Concat(head 1 ,head 2 ,…,head h )W, calculate the multihead self-attention matrix Multihead, where W is the preset second parameter matrix , Concat function refers to directly splicing the matrix in the column direction;

The temporary vector acquisition subunit is used to input the multi-head self-attention matrix into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is : FFN(x)=gelu(xW 1 +b 1 )W 2 +b 2 , where x is the multi-head self-attention matrix, W 1 , W 2 are preset parameter matrices, and b 1 , b 2 are preset parameter matrices. Set offset value;

The intermediate sequence generation subunit is used to compose the temporary text embedding vector corresponding to all words into a temporary text embedding vector sequence, and input the temporary text embedding vector sequence into the next layer of word granularity encoder until the last layer of word granularity is obtained The intermediate text output by the encoder is embedded in the vector sequence.
According to the text classification device based on the semantic representation model of claim 8, each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and the final sequence generation unit includes:

The first acquisition subunit is used to input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-level knowledge granularity encoder, thereby obtaining the first vector sequence
And the second vector sequence

The first calculation subunit is used to input the first vector sequence and the second vector sequence into the information aggregation layer in the first-layer knowledge granularity encoder, so as to obtain the final text embedding vector mj and mj corresponding to the j-th word. The final entity embedding vector pj, where the calculation formula in the information aggregation layer is:

mj=gelu(W 3 h j +b 3 ); pj=gelu(W 4 h j +b 4 ); where
W 3 , W 4 ,
They are all preset parameter matrices, and b 3 , b 4 , and b 5 are all preset offset values;

The second calculation subunit is used to generate the first text embedding vector sequence {m1, m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector The sequence {m1,m2,…,mn} and the first entity embedding vector sequence {m1,m2,…,mn} are input into the next level of knowledge granularity encoder until the final text embedding output by the last layer of knowledge granularity encoder is obtained The vector sequence and the final entity are embedded in the vector sequence.
The text classification device based on a semantic representation model according to claim 8, the device comprising:

The text calling unit is used to call the pre-collected training text;

The first acquiring unit is configured to generate a training text embedding vector sequence corresponding to the training text according to a preset text embedding vector sequence generation method, and input the training text embedding vector sequence into a preset M layer The word granularity encoder performs calculations to obtain the first sub-attention matrix output by the M-layer word granular encoder, and then inputs the first sub-attention matrix into the preset first loss function, thereby obtaining the first sub-attention matrix A loss function value;

The second acquiring unit is configured to generate a training entity embedding vector sequence corresponding to the training text according to a preset entity embedding vector sequence generating method, and embed the training entity into the vector sequence and the training text The embedding vector sequence is input into the preset N-layer knowledge granular encoder for calculation, so as to obtain the second sub-attention matrix output by the N-layer knowledge granular encoder, and then the second sub-attention matrix is input into the preset In the second loss function, the value of the second loss function is thus obtained;

The third acquiring unit is configured to calculate the total loss function value according to the formula: total loss function value=the first loss function value+the second loss function value, and determine whether the total loss function value is greater than a preset value The threshold of loss function;

The adjustment unit is configured to, if the total loss function value is greater than a preset loss function threshold, adjust the semantic representation model parameters so that the total loss function value is less than the loss function threshold.
The apparatus for text classification based on a semantic representation model according to claim 11, wherein the first acquiring unit comprises:

The second acquisition subunit is used to replace random words in the training text with mask marks, and preprocess the training text after the mask marks to obtain a training word sequence, wherein Preprocessing includes at least sentence division and word division;

The third acquisition subunit is used for the corresponding relationship between the position of the sentence to which the i-th word belongs in the training text and the sentence segmentation vector according to the preset word vector library, and the position of the i-th word in the training word Correspondence between the position in the sequence and the position vector, correspondingly acquiring the training word vector di, the training sentence segmentation vector fi, and the training position vector gi corresponding to the i-th word in the training word sequence;

The fourth acquisition subunit is used to calculate the training text embedding vector ti corresponding to the i-th word according to the formula: ti=di+fi+gi, where the training word vector di, the training sentence segmentation vector fi and training Use the position vector gi to have the same dimension;

The fifth acquisition subunit is used to generate a training text embedding vector sequence {t1, t2,..., tn}, wherein there are a total of n words in the training word sequence.
The text classification device based on a semantic representation model according to claim 11, the device comprising:

The first setting unit is configured to set the first loss function as: LOSS1=-∑Y i logX i , where LOSS1 is the first loss function, and Yi is the expected first sub-attention corresponding to the training text Force matrix, Xi is the first sub-attention matrix;

The second setting unit is configured to set the second loss function as: LOSS2=-∑(G i logH i +(1-Gi)log(1-Hi)), where LOSS2 is the second loss function, Gi is the expected second sub-attention matrix corresponding to the training text, and Hi is the second sub-attention matrix.
The text classification device based on a semantic representation model according to claim 8, the device comprising:

The sentence acquisition unit is configured to acquire the designated answer sentence corresponding to the text classification result according to the preset correspondence between the classification result and the answer sentence;

The sentence output unit is used to output the specified answer sentence.
A computer device includes a memory and a processor, the memory stores a computer program, the processor implements a semantic representation model-based text classification method when the processor executes the computer program, the semantic representation model-based text classification method, include:

Acquiring the input original text, and preprocessing the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

According to the preset word vector generation method, the corresponding relationship between the position of the i-th word in the original text and the sentence segmentation vector, and the corresponding relationship between the position of the i-th word in the word sequence and the position vector, corresponding Obtain the word vector ai, sentence segmentation vector bi, and position vector ci corresponding to the i-th word in the word sequence, and calculate the text embedding corresponding to the i-th word according to the formula: wi=ai+bi+ci Vector wi, where the word vector ai, sentence segmentation vector bi and position vector ci have the same dimensions;

Generate a text embedding vector sequence {w1, w2,..., wn}, where there are a total of n words in the word sequence;

Input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word;

Input the text embedding vector sequence into the preset M-layer word granularity encoder for calculation, thereby obtaining the intermediate text embedding vector sequence output by the last layer word granularity encoder; wherein the M-layer word granularity encoder and the pre- Suppose that the N-level knowledge granularity encoders are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, thereby obtaining the final text embedding vector sequence and the final entity embedding vector output by the last layer of knowledge granularity encoder sequence;

The final text embedding vector sequence and the final entity embedding vector sequence are input into a preset classification model for processing to obtain a text classification result.
The computer device according to claim 15, wherein each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and the said text embedding vector sequence is input to a preset The steps of calculating in the M-layer word granular encoder to obtain the intermediate text embedding vector sequence output by the last layer of word granular encoder include:

In the multi-head self-attention mechanism layer of the first-layer word granularity encoder, the text embedding vector sequence is respectively multiplied by the trained h first parameter matrix groups, thereby obtaining the first matrix {Q1,Q2,... ,Qh}, the second matrix {K1,K2,...,Kh} and the third matrix {V1,V2,...,Vh}, where each first parameter matrix group includes three q×k first parameter matrices ；

According to the formula:
Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h;

According to the formula: Multihead({w 1 ,w 2 ,…,w n })=Concat(head 1 ,head 2 ,…,head h )W, the multihead self-attention matrix Multihead is calculated, where W is the preset first Two-parameter matrix, the Concat function refers to directly splicing the matrix in the column direction;

The multi-head self-attention matrix is input into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is: FFN(x)=gelu( xW 1 + b 1 ) W 2 + b 2 , where x is the multi-head self-attention matrix, W 1 and W 2 are preset parameter matrices, and b 1 and b 2 are preset bias values;

The temporary text embedding vectors corresponding to all words are formed into a temporary text embedding vector sequence, and the temporary text embedding vector sequence is input into the next layer of word granularity encoder, until the intermediate text embedding vector output by the last layer of word granularity encoder is obtained sequence.
The computer device according to claim 15, wherein each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and said inputting said intermediate text embedding vector sequence and said entity embedding vector sequence The steps of performing calculation in the N-layer knowledge granular encoder to obtain the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granular encoder include:

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-layer knowledge granularity encoder, thereby obtaining the first vector sequence
And the second vector sequence

The first vector sequence and the second vector sequence are input to the information aggregation layer in the first-level knowledge granularity encoder, so as to obtain the final text embedding vector mj and the final entity embedding vector pj corresponding to the jth word, where the information The calculation formula in the aggregation layer is:

mj=gelu(W 3 h j +b 3 ); pj=gelu(W 4 h j +b 4 ); where
W 3 , W 4 ,
They are all preset parameter matrices, and b 3 , b 4 , and b 5 are all preset offset values;

Generate the first text embedding vector sequence {m1,m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector sequence {m1,m2,..., mn} and the first entity embedding vector sequence {m1,m2,...,mn} are input into the next level of knowledge granularity encoder, until the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder are obtained .
A non-volatile computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor to realize a text classification method based on a semantic representation model, the text classification method based on a semantic representation model, include:

Acquiring the input original text, and preprocessing the original text to obtain a word sequence, wherein the preprocessing includes at least sentence division and word division;

According to the preset word vector generation method, the corresponding relationship between the position of the i-th word in the original text and the sentence segmentation vector, and the corresponding relationship between the position of the i-th word in the word sequence and the position vector, corresponding Obtain the word vector ai, sentence segmentation vector bi, and position vector ci corresponding to the i-th word in the word sequence, and calculate the text embedding corresponding to the i-th word according to the formula: wi=ai+bi+ci Vector wi, where the word vector ai, sentence segmentation vector bi and position vector ci have the same dimensions;

Generate a text embedding vector sequence {w1, w2,..., wn}, where there are a total of n words in the word sequence;

Input the word sequence into the preset knowledge embedding model to obtain the entity embedding vector sequence {e1, e2,..., en}, where en is the entity embedding vector corresponding to the nth word;

Input the text embedding vector sequence into the preset M-layer word granularity encoder for calculation, thereby obtaining the intermediate text embedding vector sequence output by the last layer word granularity encoder; wherein the M-layer word granularity encoder and the pre- Suppose that the N-level knowledge granularity encoders are sequentially connected to form a semantic representation model, where M and N are both greater than or equal to 2;

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, thereby obtaining the final text embedding vector sequence and the final entity embedding vector output by the last layer of knowledge granularity encoder sequence;

The final text embedding vector sequence and the final entity embedding vector sequence are input into a preset classification model for processing to obtain a text classification result.
According to the non-volatile computer-readable storage medium of claim 18, each layer of word granularity encoder is composed of a multi-head self-attention mechanism layer and a feedforward fully connected layer connected in sequence, and the text The embedding vector sequence is input into the preset M-layer word granularity encoder for calculation, so as to obtain the intermediate text embedding vector sequence output by the last layer of word granularity encoder, including:

In the multi-head self-attention mechanism layer of the first-layer word granularity encoder, the text embedding vector sequence is respectively multiplied by the trained h first parameter matrix groups, thereby obtaining the first matrix {Q1,Q2,... ,Qh}, the second matrix {K1,K2,...,Kh} and the third matrix {V1,V2,...,Vh}, where each first parameter matrix group includes three q×k first parameter matrices ；

According to the formula:
Calculate the z-th sub-attention matrix, where z is greater than or equal to 1 and less than or equal to h;

According to the formula: Multihead({w 1 ,w 2 ,…,w n })=Concat(head 1 ,head 2 ,…,head h )W, the multihead self-attention matrix Multihead is calculated, where W is the preset first Two-parameter matrix, the Concat function refers to directly splicing the matrix in the column direction;

The multi-head self-attention matrix is input into the feedforward fully connected layer to obtain a temporary text embedding vector FFN(x), wherein the calculation formula in the feedforward fully connected layer is: FFN(x)=gelu( xW 1 + b 1 ) W 2 + b 2 , where x is the multi-head self-attention matrix, W 1 and W 2 are preset parameter matrices, and b 1 and b 2 are preset bias values;

The temporary text embedding vectors corresponding to all words are formed into a temporary text embedding vector sequence, and the temporary text embedding vector sequence is input into the next layer of word granularity encoder, until the intermediate text embedding vector output by the last layer of word granularity encoder is obtained sequence.
The non-volatile computer-readable storage medium according to claim 18, each layer of knowledge granularity encoder includes a multi-head self-attention mechanism layer and an information aggregation layer, and said intermediate text is embedded in a vector sequence The steps of inputting the entity embedding vector sequence into the N-layer knowledge granularity encoder for calculation, so as to obtain the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder, include:

Input the intermediate text embedding vector sequence and the entity embedding vector sequence into the multi-head self-attention mechanism layer in the first-layer knowledge granularity encoder, thereby obtaining the first vector sequence
And the second vector sequence

The first vector sequence and the second vector sequence are input to the information aggregation layer in the first-level knowledge granularity encoder, so as to obtain the final text embedding vector mj and the final entity embedding vector pj corresponding to the jth word, where the information The calculation formula in the aggregation layer is:

mj=gelu(W 3 h j +b 3 ); pj=gelu(W 4 h j +b 4 ); where
W 3 , W 4 ,
They are all preset parameter matrices, and b 3 , b 4 , and b 5 are all preset offset values;

Generate the first text embedding vector sequence {m1,m2,...,mn} and the first entity embedding vector sequence {m1,m2,...,mn}, and embed the first text into the vector sequence {m1,m2,..., mn} and the first entity embedding vector sequence {m1,m2,...,mn} are input into the next level of knowledge granularity encoder, until the final text embedding vector sequence and the final entity embedding vector sequence output by the last layer of knowledge granularity encoder are obtained .