CN110543242A - Expression input method based on BERT technology and device thereof - Google Patents

Expression input method based on BERT technology and device thereof

Info

Publication number
CN110543242A
CN110543242A
Authority
CN
China
Prior art keywords
expression
user
input
corpus
feature
Prior art date
Legal status
Granted
Application number
CN201910679545.2A
Other languages
Chinese (zh)
Other versions
CN110543242B (en)
Inventor
周诚
Current Assignee
Beijing Wisdom Octopus Technology Co Ltd
Original Assignee
Beijing Wisdom Octopus Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Wisdom Octopus Technology Co Ltd
Priority claimed from CN201910679545.2A
Publication of CN110543242A
Application granted
Publication of CN110543242B
Current legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 - Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F 3/0233 - Character input methods
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The expression input method based on the BERT technology comprises the following steps. S1: pre-train a corpus-feature BERT model. S2: pre-train a classifier model, classify a plurality of preset expressions, and pre-train the classification of expressions according to features. S3: when corpus information input by a user is received, perform corpus word processing, including word segmentation and stop-word removal, with words as the unit, and arrange the result into the input data format required by the corpus-feature BERT model. S4: input the data into the corpus-feature BERT model for feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk. S5: input the feature vectors v1, v2, ..., vk into the pre-trained classifier model to obtain the expression category they finally belong to. S6: display the expression display information, including pictures, animations and the like, corresponding to the expression required by the user. The method can classify expressions so that input words are accurately matched to the corresponding expression categories; an expression-package recommendation algorithm based on user behavior is designed, the expressions the user is most interested in are displayed, excessive screen-sliding selection is avoided, and the user experience is improved.

Description

Expression input method based on BERT technology and device thereof
Technical Field
The invention relates to an input algorithm for animated expressions, and in particular to an animated-expression input method and device based on the BERT technology.
Background
An input method performs fuzzy keyword matching on the information entered by a user, guesses the user's intended input, and dynamically pops up an expression window (GIFs and the like) so that the user can click to send the corresponding animated expression to the application currently in use and on to the recipient.
Existing expression input methods are generally built on emotion classification techniques and deep learning techniques: an emotion classification technique is needed to infer the user's intended input, and deep learning can make that inference more accurate. The related technologies are briefly introduced below.
Emotion classification technology
(I) Dictionary-based methods
A text emotion dictionary is constructed and annotated with polarity and intensity, and text emotion classification is then performed on that basis; the dictionary-based text emotion analysis process is shown in FIG. 1.
(II) Machine-learning-based methods
Supervised machine-learning methods: naive Bayes (NB) and the Support Vector Machine (SVM) are common supervised learning algorithms, but research indicates that NB and SVM face the problems of the conditional-independence assumption and of kernel-function selection when used alone. Sharma et al. therefore built an ensemble of 'weak' SVM classifiers with Boosting, exploiting the classification performance of Boosting while using the SVM as the base classifier; their results show that the accuracy of the ensemble classifier is clearly superior to that of a plain SVM classifier.
When a computer performs a text emotion analysis task, it usually analyzes each word of the text independently, mines words that may carry emotional color, integrates the emotion words in a sentence to judge the sentence's emotion, and stacks these judgments layer by layer to determine the emotion polarity of the whole text. However, not every word in a text is equally important for emotion analysis, and the computer cannot judge a word's importance automatically, so Deng et al. proposed a supervised word-weighting scheme based on two factors: the importance of the word in the whole text and its importance for expressing emotion. Review-type texts often lack logical structure and are largely disordered, and general supervised learning algorithms achieve low accuracy on such texts, so Perikos et al. designed an ensemble of 3 classifiers: the first two are statistical (naive Bayes and maximum entropy), and the third is a knowledge-based tool for deep analysis of natural-language sentences. Similarly, Tripathy et al. split the text into unigrams, bigrams, trigrams and their combinations and then performed review sentiment analysis with naive Bayes, maximum entropy, stochastic gradient descent and support vector machines respectively, comparing not only the different methods but also their combinations.
(III) Weakly supervised deep-learning methods:
Using neural network models inevitably involves word-vector embedding techniques, i.e. converting human language into machine language, such as Word2Vec; Giatsoglou et al. combined the context-sensitive coding provided by Word2Vec with the emotional information provided by a lexicon. Although word-vector embedding considers a word's context, it ignores the emotion of the whole text, so Tang et al. proposed solving this by encoding the emotion information of the text together with the word context in emotion embeddings, and developed a neural network with a clipping loss function to collect emotion signals automatically. Fernández-Gavilanes et al. proposed a new unsupervised emotion analysis algorithm that uses dependency syntax to determine emotion polarity. In text emotion analysis, several emotion words with inconsistent polarity often appear in the same sentence; an attention mechanism can effectively address this, so a multi-attention convolutional neural network (MATT-CNN) was constructed by combining word-vector, part-of-speech and position attention mechanisms.
RNNs are among the network models most often used in deep learning and have been widely applied to natural language processing. The term RNN usually refers to the Recurrent Neural Network, a time-series-based model, but it may also refer to the Recursive Neural Network, a model that focuses on structural hierarchy. Liu Jinshuo et al. take pre-trained word vectors as the input of a lower-layer recurrent network and then feed the sentence vectors it outputs, in temporal order, into an upper-layer recurrent network, effectively combining the two networks and addressing the problem of low classifier accuracy. Xie Tie et al. capture sentence semantics with a deep recursive neural network and introduce a Chinese 'sentiment training treebank' as training data to find word-level emotion information. Text emotion analysis is mostly carried out on short review texts, whose shortness and sparse features lower the accuracy of traditional feature extraction, so Sun et al. combined deep belief networks with feature extraction to obtain an extended feature-extraction method addressing sparse short-text features. Cao Yu et al. focus on the problems that the recurrent neural network (RNN) cannot learn long-distance dependencies and that the fully connected classification layer in convolutional-neural-network (CNN) text emotion models cannot effectively classify non-linearly distributed data. Considering that traditional methods cannot obtain textual semantic information, Zhu Shaojie et al. introduced a semi-supervised deep-learning method, Recursive AutoEncoders (RAE), which achieves higher accuracy at lower feature dimensions. The Long Short-Term Memory network (LSTM) is a special RNN: RNNs cannot handle long-distance dependence, whereas the LSTM can capture dependencies between texts and store information for a long time. Zhong Ying et al. proposed an LSTM model based on a multi-attention mechanism and applied it to microblog comments on the P10 flash-memory incident for social emotion analysis. Hu et al. built a keyword lexicon on top of an LSTM model, which helps mine latent meanings in the text and further improves the correctness of polarity judgments. Ma et al. proposed adding to the LSTM a stacked attention mechanism consisting of target-level and sentence-level attention models, called a perceptual LSTM, with particular attention to exploiting common-sense knowledge in deep neural sequence models. The LSTM is an effective chain-structured recurrent network, but a chain structure cannot effectively represent the structural hierarchy of language, so Liojun et al. extended the LSTM to a tree-structured recursive network to capture deeper textual information. The difficulty of text emotion analysis differs at the sentence and document levels, so no general model is universally applicable.
However, existing input methods can only display a limited set of GIF expressions and can only support more expressions through manual updates; the user must manually enter the expression page and select a GIF expression to send it to the other party, so personalized requirements cannot be met. The patent with application number 201610356623.1 discloses an expression input method whose technical scheme comprises the following steps:
S1: acquiring, in real time, the character string currently input by the user;
S2: connecting to a remote server through a network and performing fuzzy matching in the remote server according to the obtained character string to obtain the latest dynamic expression picture resources, which are stored in a background database. The specific fuzzy-matching operation is: associate the acquired character string with the names of the corresponding expression contents according to search rules, match names with the same or similar meaning in the remote server's dynamic-expression-picture database, and screen out the latest dynamic expression pictures from the matched results;
S3: displaying the latest dynamic expression pictures on the interface in a pop-up frame for the user to select; if the user clicks the pop-up frame, the input of that dynamic expression picture is confirmed, the pop-up frame disappears, and the picture is sent automatically, so that the latest dynamic expression picture can be input quickly and the user experience is improved.
This kind of expression input is a template-matching technique based on keywords and the like: the number of expression packages is very limited, fuzzy matching requires access to a remote server, and network problems easily limit the usability of the input method.
Disclosure of Invention
The invention aims to provide an expression input method based on the BERT technology, in order to solve technical problems of the prior art such as the very limited number of expression packages.
An expression input method based on the BERT technology comprises the following steps:
S1: pre-training a corpus-feature BERT model, and performing feature-extraction training that produces feature representations of the characters/words the user wants to express;
S2: pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of expressions according to features; an emotion analysis algorithm is added in the classification process to improve the classification result (the above steps are done offline);
S3: when corpus information input by a user is received, performing corpus word processing, including word segmentation and stop-word removal, with 'characters' or 'words' as the unit, and arranging the result into the input data format required by the corpus-feature BERT model;
S4: inputting the data into the corpus-feature BERT model for feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk;
S5: inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model and normalizing their class probabilities with a SoftMax function to obtain the expression category they finally belong to;
S6: displaying, through User-CF or Item-CF on the user's expression history, the expression display information, including pictures, animations and the like, corresponding to the expression required by the user. A sketch of the overall pipeline is given below.
Compared with the prior art, the invention has the following advantages:
First, unlike the traditional text-chat and expression-chat flow, in which only the expression of the related feeling (i.e., a corresponding expression package) is displayed directly on the client, this input method can directly attach the required expression text to the corresponding expression package, expressing the user's needs more accurately.
Second, the input method's expression packages cover more than two hundred emotion types, basically covering the emotion scenes of daily life; traditional text chat becomes animated-picture chat, which makes chatting more interesting.
Third, the invention adopts the BERT technology and pre-trains a corpus-feature BERT model to extract features from the characters/words the user wants to express; the BERT model's feature extraction is more accurate, so the feature representation of characters/words is more comprehensive and precise.
In addition, the more accurate features are input to the LSTM algorithm, which can conveniently compute, over the hundreds of expression classes, the class with the highest probability and find the corresponding expression; each expression is assigned in advance a correspondingly displayed picture, animation, or synthesized picture and animation. An emotion analysis algorithm is further added to this process: when the user inputs content, the emotion the user may want to express is analyzed, the expression with the highest classification probability is determined under that emotion, expression pictures accurately matching the expression data are then screened and recommended, and the expression data and pictures are randomly laid out into thumbnails or animations of expression packages and displayed in the expression input panel. This makes expression input accurate and greatly improves the user experience.
Finally, the expression display information, including pictures, animations and the like, corresponding to the expression required by the user is displayed using User-CF or Item-CF, achieving a better display effect and a good expression-input experience. Moreover, to address the fact that expression packages are numerous while mobile-terminal display space is limited, an expression-package recommendation algorithm based on user behavior is designed, improving the user experience.
Drawings
FIG. 1 is a diagram of a process of dictionary-based text emotion analysis;
FIG. 2 is a schematic flow chart of the input method;
FIG. 3 is a visual representation of a BERT model input representation;
FIG. 4 is a block diagram of the LSTM algorithm;
FIG. 5 is a functional block diagram of an example;
FIGS. 6A-6C are diagrams showing the display effect of the Octopus input method;
FIG. 7 is a schematic diagram of a BERT model input device.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention discloses an expression input method based on the BERT (Bidirectional Encoder Representations from Transformers) technology. First, the emotion the user wants to express is given a feature representation, so that a more accurate expression can be provided when the user inputs the required expression. Second, a deep-learning classification method divides the emotional expressions into a number of classes (for example, 202 classes). Finally, to address the fact that expression packages are numerous while mobile-terminal display space is limited, an expression-package recommendation algorithm based on user behavior is designed: the expression packages the user is most interested in are displayed, excessive screen-sliding selection is avoided, and the user experience is improved. The input method differs from the traditional text-chat and expression-chat flow in that only the related emotion expressions (i.e., the corresponding expression packages) are displayed directly on the client, and the required expression text can be attached directly to the corresponding expression package, expressing the user's needs more accurately. Furthermore, the expression packages cover more than two hundred emotion types, basically covering the emotion scenes of daily life, and traditional text chat becomes animated-picture chat, which makes chatting more interesting.
Please refer to FIG. 2, which is a flowchart of the input method. It comprises the following steps:
S110: pre-training a corpus-feature BERT model, and performing feature-extraction training that produces feature representations of the characters/words the user wants to express;
S120: pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of expressions according to features;
S130: when corpus information input by a user is received, performing corpus word processing, including word segmentation and stop-word removal, with 'characters' or 'words' as the unit, and arranging the result into the input data format required by the corpus-feature BERT model;
S140: inputting the data into the corpus-feature BERT model for feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk;
S150: inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model, normalizing their class probabilities with a SoftMax function, and finding the expression category they finally belong to from the emotion classification;
S160: displaying, through User-CF or Item-CF on the user's expression history, the expression display information, including pictures, animations and the like, corresponding to the expression required by the user.
First, S130 is introduced.
BERT is a new language representation model: Bidirectional Encoder Representations from Transformers. BERT aims to pre-train deep bidirectional representations by jointly conditioning on context in all layers. The pre-trained BERT representations can therefore be fine-tuned with just one additional output layer and are suitable for building state-of-the-art models for a wide range of tasks; that is, BERT, the Transformer's bidirectional encoder representation, improves on fine-tuning-based approaches. BERT proposes a new training objective, the Masked Language Model (MLM), to overcome the one-directional limitation mentioned above. The inspiration for MLM comes from the cloze task: some tokens in the input of the model (i.e., the corpus-feature BERT model) are randomly masked, and the objective is to predict the original vocabulary id of each masked word based only on its context. Unlike left-to-right language-model training, the MLM objective allows the representation to fuse the left and right context, so a deep bidirectional Transformer can be trained. In addition to the masked language model, a 'next sentence prediction' task is introduced, which pre-trains text-pair representations jointly with the MLM.
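As an illustration of the masked-language-model objective, the sketch below randomly masks a fraction of the tokens in a sentence; the model is then trained to recover the originals. The 15% masking rate and the [MASK] token follow the published BERT recipe and are assumptions here, since the patent does not fix these details.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly replace some tokens with [MASK]; the model must predict the originals."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)           # original token to be predicted (MLM loss applies here)
        else:
            masked.append(tok)
            targets.append(None)          # position not included in the MLM loss
    return masked, targets

print(mask_tokens(["今", "天", "好", "难", "过"]))
```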
The input representation of BERT: the input representation can unambiguously represent a single text sentence or a pair of text sentences (e.g., [Question, Answer]) in one token sequence. For a given token, its input representation is constructed by summing the corresponding token, segment and position embeddings. FIG. 3 visualizes the input representation: the representation of each word is obtained by adding its Token Embedding, Segment Embedding and Position Embedding. The Token Embedding is a simple table look-up; the Segment Embedding indicates which sentence the word belongs to; the Position Embedding carries the information of the word's position in the sentence and is also a table look-up. The corpus-feature BERT model is a feature-extraction model consisting of bidirectional Transformers. In the figure, E denotes the embedding of a word, T denotes the new feature representation of each word after BERT encoding, and Trm denotes the Transformer feature extractor. During training, a masked language model is used: some tokens in the input are randomly masked and then predicted during pre-training. A sentence-level task, next sentence prediction, is added: some sentences are randomly replaced, and the preceding sentence is used to predict IsNext/NotNext. Through these two tasks, the three word embeddings are optimized on large-scale unlabeled corpora to obtain the pre-trained corpus-feature BERT model. Arranging data into the input format required by the corpus-feature BERT model further comprises: using WordPiece embeddings with a vocabulary of many (say 30,000) tokens, denoting split word pieces with ##; using learned position embeddings and supporting sequences of up to 512 tokens.
The first token of every sequence is always a special classification embedding; the final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks, and for non-classification tasks this vector is ignored. Sentence pairs are packed into one sequence and distinguished in two ways: first, they are separated by the special token [SEP]; second, a learned sentence A embedding is added to every token of the first sentence and a sentence B embedding to every token of the second sentence.
For a single-sentence input, only the sentence A embedding is used.
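A minimal sketch of the input representation described above: each token's vector is the element-wise sum of its token, segment and position embeddings. The toy vocabulary, the embedding size and the random tables are illustrative assumptions.

```python
import numpy as np

vocab = {"[CLS]": 0, "[SEP]": 1, "好": 2, "难": 3, "过": 4}
d = 8                                              # toy embedding size
tok_emb = np.random.randn(len(vocab), d)           # Token Embedding (table look-up)
seg_emb = np.random.randn(2, d)                    # Segment Embedding: sentence A / sentence B
pos_emb = np.random.randn(512, d)                  # learned Position Embedding, up to 512 positions

def input_representation(tokens, segment_ids):
    ids = [vocab[t] for t in tokens]
    # the representation of each token is the sum of the three embeddings
    return np.stack([tok_emb[i] + seg_emb[s] + pos_emb[p]
                     for p, (i, s) in enumerate(zip(ids, segment_ids))])

x = input_representation(["[CLS]", "好", "难", "过", "[SEP]"], [0, 0, 0, 0, 0])
print(x.shape)   # (5, 8): one summed embedding per token
```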
Next, S140 is introduced: inputting the data into the corpus-feature BERT model for feature extraction and obtaining the corresponding feature vectors v1, v2, ..., vk.
For each word of a sentence x = x1, x2, ..., xn, the sum of its three representations, token embedding, segment embedding and position embedding, is generated as its input; after BERT encoding, the corresponding feature vectors v1, v2, ..., vk are obtained.
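The patent does not name a specific toolkit; as one possible realization, the sketch below uses the Hugging Face transformers library with a public Chinese BERT checkpoint to obtain per-token feature vectors v1, v2, ..., vk for an input sentence. The checkpoint name is an assumption.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed public checkpoint
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

text = "好难过"
inputs = tokenizer(text, return_tensors="pt")     # builds token ids, segment ids and attention mask
with torch.no_grad():
    outputs = model(**inputs)

features = outputs.last_hidden_state[0]           # shape (k, 768): one feature vector per token, v1..vk
print(features.shape)
```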
For the resulting word-optimized text, feature extraction can also be performed with models such as TextCNN. TextCNN is a stacked model formed by several CNNs in parallel; it extracts from the representations in a sentence the features that help classification, and the final classification feature representation is obtained after pooling the extracted features. TextCNN consists of several different convolution layers in parallel, computed with convolution kernels of different sizes; using several kernel sizes helps extract both semantic and sentence-pattern features. The pooling layer pools the convolution results, keeping the most important features of the convolution computation. A semantic file of the text is built from the word-optimized text and processed by the convolution layers to obtain feature maps; the feature maps are input to the pooling layer, word vectors are obtained by max pooling, and the word vectors are concatenated into the feature vector. Although only one feature-extraction scheme is disclosed above, other feature-extraction algorithms may be adopted; the scheme is given only as an example and is not intended to limit the invention.
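The following is a generic TextCNN sketch in PyTorch (parallel convolutions with several kernel sizes followed by max-over-time pooling), illustrating the alternative feature extractor mentioned above; all dimensions and the 202-class output size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, kernel_sizes=(2, 3, 4), channels=100, n_classes=202):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # parallel convolution layers with different kernel sizes capture different n-gram features
        self.convs = nn.ModuleList([nn.Conv1d(emb_dim, channels, k) for k in kernel_sizes])
        self.fc = nn.Linear(channels * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                   # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, emb_dim, seq_len)
        # max-over-time pooling keeps the strongest feature of each convolution channel
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        feat = torch.cat(pooled, dim=1)             # concatenated classification feature vector
        return self.fc(feat)

logits = TextCNN()(torch.randint(0, 30000, (2, 20)))
print(logits.shape)                                 # torch.Size([2, 202])
```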
Next, S150 is introduced: inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model and normalizing their class probabilities with a SoftMax function to obtain the expression category they finally belong to. The classifier model may be an RNN, CBOW and the like, among which the LSTM works best; the LSTM is taken as the example below.
The LSTM algorithm was proposed to address the vanishing-gradient and exploding-gradient deficiencies of the RNN, and it has long short-term memory ability. Gradients are generally updated with BPTT (Back Propagation Through Time). In an LSTM network, the neurons of an ordinary RNN are replaced with blocks, illustrated in FIG. 4.
An LSTM layer is made up of a number of connected blocks, as shown in FIG. 4. Each block contains one or more recurrently connected memory cells (the cells in FIG. 4) and three other units: an input gate, an output gate and a forget gate. The input gate and output gate control the input and output of data (the functions g and h in the figure); the network decides whether to 'forget' or 'remember' the currently input data via the forget gate, which is initialized to a value of 1.
1. Forward propagation
Assume the current time is t and the previous time is t-1; at each step of the network iteration all hidden-layer units are stored and every output is activated. N is the total number of neurons in the network and wij denotes the weight from neuron i to neuron j. For each LSTM block, the subscripts l, φ and ω denote the input gate, the forget gate and the output gate respectively, c denotes an element of the cell set C, sc denotes the state value of cell c, f is the squashing function of the gates, and g and h are the input and output squashing functions of the cell. Then:
Input gates:
y_l(t) = f(x_l(t)), where x_l(t) = Σi w_il · y_i(t-1) is the weighted net input of the gate
Forget gates:
y_φ(t) = f(x_φ(t))
Cells:
s_c(t) = y_φ(t) · s_c(t-1) + y_l(t) · g(x_c(t))
Output gates:
y_ω(t) = f(x_ω(t))
Cell output:
y_c(t) = y_ω(t) · h(s_c(t))
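A numerical sketch of one forward step of a single LSTM block, following the gate equations above. Using the logistic sigmoid for f and tanh for g and h is a common choice and an assumption here; the weight matrices are random stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_block_step(x_t, y_prev, s_prev, W, U, b):
    """One forward step: input gate l, forget gate phi, output gate omega and cell state s."""
    z = {g: W[g] @ x_t + U[g] @ y_prev + b[g] for g in ("l", "phi", "omega", "c")}
    y_l     = sigmoid(z["l"])        # input gate:  y_l(t)     = f(x_l(t))
    y_phi   = sigmoid(z["phi"])      # forget gate: y_phi(t)   = f(x_phi(t))
    y_omega = sigmoid(z["omega"])    # output gate: y_omega(t) = f(x_omega(t))
    s_t = y_phi * s_prev + y_l * np.tanh(z["c"])   # s_c(t) = y_phi(t)*s_c(t-1) + y_l(t)*g(x_c(t))
    y_t = y_omega * np.tanh(s_t)                   # cell output: y_c(t) = y_omega(t)*h(s_c(t))
    return y_t, s_t

n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
W = {g: rng.standard_normal((n_hid, n_in)) for g in ("l", "phi", "omega", "c")}
U = {g: rng.standard_normal((n_hid, n_hid)) for g in ("l", "phi", "omega", "c")}
b = {g: np.zeros(n_hid) for g in ("l", "phi", "omega", "c")}
y, s = lstm_block_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(y, s)
```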
2. Backward propagation
Assume the current start time is t; the backward-propagation iteration uses standard BPTT for the parameter update.
Define ε(t) = e(t). For each LSTM block, the value δ is computed in turn for the cell outputs, the output gates, the state values, the cells, the forget gates and the input gates; applying standard BPTT to these δ values then yields the corresponding error terms used for the weight updates.
That is, the classifier model is an LSTM neural network model, and inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model further comprises:
acquiring the feature vectors v1, v2, ..., vk as the input sequence of the LSTM neural network model; the LSTM neural network model comprises a plurality of LSTM layers, each LSTM layer is formed by connecting a plurality of blocks, and each block comprises one or more recurrently connected memory cells and three further units: an input gate, an output gate and a forget gate, the forget gate deciding, via the functions g and h, whether to 'forget' or 'remember' the currently input data;
calculating an output Y for the input sequence, by forward propagation and/or backward propagation, through the parameters of the LSTM neural network model;
and obtaining, from the output Y, the best-matching expression classification among the plurality of preset expression classifications.
A simple example: at the LSTM step the problem is already one of classification, and the effect of forward and backward propagation is briefly illustrated here. Suppose there are currently four expressions, happiness, anger, sadness and joy, with feature vectors v1, v2, v3 and v4 respectively, which are input into the LSTM; four results are expected at the output, i.e. the target output is [1, 2, 3, 4]. In forward propagation, after the series of calculations there will be actual results t1, t2, t3, t4; since the expected output is [1, 2, 3, 4], the error is [1-t1, 2-t2, 3-t3, 4-t4], so a backward-propagation pass follows and the error is propagated back. After propagation the network has new weights, a new round of forward propagation is carried out, and a new actual output is obtained; the network decides when training is finished according to the set error tolerance and the number of iterations. With the optimal network weights, once an input (i.e., the words and sentences typed by the user) enters the network, the classification score of each expression is obtained. For the 202 expression classes, training proceeds through the same forward and backward processes; after an input passes through the network, 202 results are obtained, and the expression corresponding to the largest value is the classification result. This is the training process of the LSTM.
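A compact sketch of the classifier described above: the BERT feature vectors v1..vk are fed as a sequence into an LSTM, the last hidden state is projected to the expression classes, and SoftMax gives the class probabilities. The four toy classes, the dimensions and the training loop length are assumptions; the patent uses roughly 202 classes.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    def __init__(self, feat_dim=768, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, feats):                 # feats: (batch, k, feat_dim) = v1..vk per sentence
        _, (h_n, _) = self.lstm(feats)
        return self.out(h_n[-1])              # raw scores per expression class

model = ExpressionClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()               # applies log-softmax + NLL internally

feats = torch.randn(8, 12, 768)               # a batch of 8 sentences, k = 12 feature vectors each
labels = torch.randint(0, 4, (8,))            # e.g. 0=happy, 1=angry, 2=sad, 3=joyful
for _ in range(5):                            # forward + backward propagation until error is acceptable
    loss = loss_fn(model(feats), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

probs = torch.softmax(model(feats[:1]), dim=1)  # SoftMax-normalized class probabilities
print(probs.argmax(dim=1))                      # index of the predicted expression class
```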
Of course, emotion algorithms may also be used throughout the process. For example, the expression types can be classified on multiple levels, with emotion as one of the levels, and the expression types possibly related to a certain emotion placed under the sub-directory corresponding to that emotion. During expression-classification training and subsequent classification, a corresponding emotion algorithm can be run to obtain the emotion the input content may express, and the input is then classified into the corresponding expression class in a refined way, which improves classification accuracy.
Finally, S160 is introduced:
S160: displaying, through User-CF or Item-CF, the expression display information, including pictures, animations and the like, corresponding to the expression required by the user.
The expression display information, i.e., the pictures or animations corresponding to the expression, is recommended based on collaborative filtering.
(1) Item-CF
Item-based collaborative filtering is similar to user-based collaborative filtering: it uses all users' preferences (ratings) for items or information to find the similarity between items, and then recommends similar items to the user according to the user's historical preference information. Item-based collaborative filtering can be regarded as a kind of degeneration of association-rule recommendation, but since collaborative filtering takes the users' actual ratings into account and only computes similarity rather than mining frequent sets, it is generally considered to achieve higher accuracy and coverage.
(2) User-CF
The basic principle of user-based collaborative filtering is that, according to all users' preferences (ratings) for items or information, a 'neighbour' group of users with tastes similar to the current user's is found, a K-nearest-neighbour algorithm typically being used in practice; the current user is then given recommendations based on the historical preference information of those K neighbours.
There are many item-based collaborative filtering methods; some are given in the following examples, starting with an example. Consider a group of related or similar users with expressions a, b, ..., n, where each expression has several (for example 2-4) labels, mainly 2-4 keywords. For instance, expression a has labels (taga): taga1 (e.g., happy), taga2, taga3 (e.g., Yao Ming), ..., tagaN. The keywords in taga obtain their corresponding feature vectors through the BERT model:
v(taga1) = [v11, v12, ..., v1m]
v(taga2) = [v21, v22, ..., v2m]
v(taga3) = [v31, v32, ..., v3m]
...
v(tagaN) = [vN1, vN2, ..., vNm]
Averaging the keyword vectors of taga element-wise yields:
v(taga) = [v11+v21+...+vN1, v12+v22+...+vN2, ..., v1m+v2m+...+vNm] / N = [V1, V2, ..., Vm]
Similarly, expression b has labels tagb, and each label keyword obtains its corresponding feature vector through the BERT model, v(tagb) = [V21, V22, ..., V2m]; expression c has labels tagc, and each label keyword obtains its corresponding feature vector v(tagc) = [V31, V32, ..., V3m].
The similarity between expressions can then be obtained with the cosine similarity, cos(u, w) = u · w / (|u| |w|):
similarity of expression a and expression b: cos(v(taga), v(tagb));
similarity of expression a and expression c: cos(v(taga), v(tagc));
similarity of expression b and expression c: cos(v(tagb), v(tagc)).
By computing the cosine value between a given expression and several related expressions, the most similar expression is found, and that similar expression can be recommended to the user.
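A sketch of the tag-based expression similarity described above: each tag keyword is mapped to a feature vector (here random stand-ins for the BERT keyword vectors), the vectors of one expression's tags are averaged, and pairwise cosine similarity is computed. The expression names, tag sets and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-ins for the BERT keyword feature vectors v(tag) of dimension m
keyword_vec = {kw: rng.standard_normal(16) for kw in
               ["happy", "Yao Ming", "crying", "angry", "laugh", "tears"]}

expressions = {
    "a": ["happy", "laugh", "Yao Ming"],
    "b": ["happy", "laugh"],
    "c": ["crying", "tears", "angry"],
}

def tag_vector(keywords):
    # element-wise average of the keyword vectors: v(taga) = (v(taga1)+...+v(tagaN)) / N
    return np.mean([keyword_vec[k] for k in keywords], axis=0)

def cosine(u, w):
    return float(u @ w / (np.linalg.norm(u) * np.linalg.norm(w)))

vecs = {name: tag_vector(tags) for name, tags in expressions.items()}
print("cos(a,b) =", cosine(vecs["a"], vecs["b"]))
print("cos(a,c) =", cosine(vecs["a"], vecs["c"]))
print("cos(b,c) =", cosine(vecs["b"], vecs["c"]))
```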
(3) Content-based recommendations
Content-based recommendation is the most widely used recommendation mechanism from the early days of recommendation engines. Its core idea is to find the relevance of an item or content from the metadata of the recommended items and then recommend similar items to the user based on the user's past preference records. Such recommender systems are mostly used in news applications: tags are extracted from an article as its keywords, and the similarity of two articles can then be evaluated through the tags.
The advantages of such a recommender system are:
A. it is easy to implement and does not require user data, so there are no sparsity or cold-start problems;
B. recommendations are based on the items' own features, so there is no problem of over-recommending popular items.
(4) Association rule based recommendations
Recommendations based on association rules are common in e-commerce systems and have also proven effective. Their practical meaning is that users who have bought some items tend to buy certain others. The primary goal of an association-rule recommender is to mine association rules, i.e., sets of items bought by many users at the same time, whose members can then be recommended to one another. Current association-rule mining algorithms mainly develop and evolve from Apriori and FP-Growth.
Recommender systems based on association rules generally have a higher conversion rate, because when a user has already bought several items of a frequent set, there is a higher likelihood of buying the other items in that set. The disadvantages of this mechanism are:
A. the computation is heavy, but it can be carried out offline, so the influence is small;
B. since user data are used, the cold-start and sparsity problems are unavoidable;
C. popular (hot) items are easily over-recommended.
In this example, one of the above recommendation methods can be selected to display to the user the picture or animation information corresponding to the expression category.
Application example
FIG. 5 is a diagram of the application environment of the dynamic-expression generation method in one embodiment. Referring to FIG. 5, the method is applied in a dynamic-expression generation system comprising a terminal 110 and a server 120 connected through a network. The terminal 110 may specifically be a mobile terminal, such as at least one of a mobile phone, a tablet computer, a notebook computer and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of several servers. The terminal 110 may enter the expression input panel in the conversation page, detect the characters or words entered in the expression input panel, and collect these data; the terminal 110 may further perform feature extraction on the characters or words, find from the emotion classification the expression they finally belong to, screen and recommend expression pictures accurately matching the expression data, lay the expression data and pictures out at random into thumbnails or animations of expression packages displayed in the expression input panel, and display one or more thumbnails and/or animations as session messages in the conversation page according to various recommendation algorithms.
It should be noted that this application environment is only an example. In some embodiments the terminal 110 may send the input-word operation to the server 120; the server 120 then performs feature extraction on the characters or words, finds the final expression from the emotion classification, screens and recommends expression pictures accurately matching the expression data, and lays the expression data and pictures out at random into thumbnails or animations of expression packages (random layout may be used on first use, and as the user's usage habits accumulate, the user's preferred expressions can be recommended with a recommendation algorithm). Finally the synthesized expression thumbnails corresponding to one or more dynamic expressions are fed back to the terminal 110, which adds them to the expression input panel, so that when the terminal 110 detects a trigger operation on an expression thumbnail in the panel it can pull the corresponding dynamic expression from the server 120 and display it as a session message in the conversation page. Of course, the terminal 110 may also present several animations to the user, recommend them through various recommendation algorithms, and display the animation selected by the user in the conversation page.
Referring to FIGS. 6A-6C, in one embodiment a dynamic-expression generation method is provided. The embodiment is mainly illustrated by applying the method to the terminal 110 of FIG. 5. The dynamic-expression generation method specifically comprises the following steps:
S1: entering the expression input panel in the conversation page.
The conversation page is a page for presenting conversation messages, for example a page in a social application that presents the messages sent by both parties to a conversation. A social application is an application for network-based social interaction; it generally has an instant-messaging function and may be an instant-messaging application, in which case the conversation message is an instant message.
An expression is an image with a meaning-expressing function that can reflect the inner activity, emotion or specific semantics of the user who sends it. Expressions include static expressions and dynamic expressions. Generally, a static expression is a single static picture, which may be in PNG (Portable Network Graphics) format, and a dynamic expression is an animation synthesized from several frames of pictures, which may be in GIF (Graphics Interchange Format) format.
The expression input panel is a container that stores the expression thumbnails of all expressions, and the user can add new expressions to it. The panel may contain several tabs for holding expression thumbnails of different categories or sources, for example a common tab for expressions designed by the developer of the social application, a collection tab for expressions the user has collected, and an add tab for downloading, saving or importing new expressions. Generally, in the conversation page the expression input panel can be switched back and forth with the text input panel, so that the user can type text in the text input box and send a text message to the communication partner when the text input panel is active, and send an expression message by entering an expression when the expression input panel is active. The expression input panel and the text input panel may collectively be called the conversation panel.
Specifically, the terminal may display an expression icon in the conversation page and, when a trigger event on that icon is detected, display the expression input panel in the page and enter it. When the terminal detects a trigger operation by the current user on any expression thumbnail in the panel, it obtains the expression corresponding to that thumbnail, sends it to the other terminal logged in with the communication partner's account, and displays it in the current conversation page. In this embodiment, the trigger operation on an expression thumbnail may be a click, press, move or slide operation.
In this example, different icon frames or icons are recommended to the user; although synthesized emoticons are not shown in the example, arranging synthesized icons among the emoticons is not excluded.
S2: the terminal receives the chat corpus input by the user, such as '好难过' ('so sad'), and performs feature-extraction processing on this chat corpus:
performing word segmentation and stop-word processing on the corpus (a sketch is given below);
performing feature extraction on the corpus with BERT, the corresponding feature vectors being v1, v2, ..., vk.
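A minimal sketch of the preprocessing in this step: word segmentation and stop-word removal before the text is handed to the BERT feature extractor. jieba is one common Chinese segmenter and an assumption here, since the patent does not name a segmentation tool; the stop-word list is a tiny illustrative example.

```python
import jieba   # third-party Chinese word segmenter (pip install jieba); an illustrative choice

STOPWORDS = {"的", "了", "啊", "呀"}   # tiny illustrative stop-word list

def preprocess(text: str):
    """Segment the chat corpus into words and drop stop words."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOPWORDS]

tokens = preprocess("今天真的好难过啊")
print(tokens)        # e.g. ['今天', '真的', '好', '难过'] -> fed to the corpus-feature BERT model
```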
S3: the feature vectors obtained by the terminal after BERT feature extraction are input into the LSTM, and the corresponding classification result among the 202 expressions is obtained using the softmax function.
The terminal receives the characters and words input by the user, performs feature extraction on the corpus with BERT, extracts the features related to 'sad', and the LSTM matches them to obtain the emotion classification 'sad'.
S4: expression pictures accurately matching the expression data are screened and recommended, and the expression data and pictures are laid out at random into a thumbnail or animation of the expression package and displayed in the expression input panel. (Random layout may be used on first use; as the user's usage habits accumulate, the user's preferred expressions can be recommended with a recommendation algorithm.)
The terminal sets several expression pictures for each emotion classification and specifies how the pictures are combined into an animation. For example, for the 'sad' classification a matching picture (such as a crying-related picture) is found, and a thumbnail or animation of the expression package is then generated from it according to the preset configuration method.
S5: the thumbnails or animations of the expression packages are displayed on the user input interface in a tiled arrangement, and after the user selects one, it is displayed directly in the conversation box. To address the fact that expression packages are numerous while mobile-terminal display space is limited, the expression packages to display can be recommended with User-CF or Item-CF, so that the user finds an expression package matching his or her personality at first glance and page-turning searches are avoided as much as possible.
The invention also provides an expression input device based on the BERT technology (please refer to FIG. 7), comprising:
the corpus-feature BERT model 110, used for performing feature-extraction training that produces feature representations of the characters/words the user wants to express, receiving the corpus information input by the user, and performing feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk;
the classifier model 120, used for classifying a plurality of preset expressions, pre-training the classification of expressions according to features, receiving the input feature vectors v1, v2, ..., vk, normalizing their class probabilities with a SoftMax function, and finding from the emotion classification the expression they finally belong to;
the expression display device 130, which displays the expression display information, including pictures and animations, corresponding to the expression required by the user.
The corpus-feature BERT model 110 further comprises:
an input-data-format preprocessing module 111, which performs corpus word processing, including word segmentation and stop-word removal, with words as the unit, and arranges the result into the input data format required by the corpus-feature BERT model;
a feature-extraction processing module 112, which receives the corpus and character data input in the preset input data format, performs feature extraction, and obtains the corresponding feature vectors v1, v2, ..., vk.
The classifier model is an LSTM neural network model, which further comprises:
a plurality of LSTM layers 121, each LSTM layer being formed by connecting a plurality of blocks, each block comprising one or more recurrently connected memory cells and three further units: an input gate, an output gate and a forget gate, the forget gate deciding, via the functions g and h, whether to 'forget' or 'remember' the currently input data;
a forward/backward propagation calculation module 122, for calculating an output Y for the input sequence, by forward propagation and/or backward propagation, through the parameters of the LSTM neural network model;
and an emotion classification module 123, configured to obtain, from the output Y, the best-matching expression classification among the plurality of preset expression classifications.
The display device 130 further comprises:
a picture/animation forming module 131, which, for the final expression classification data corresponding to the input, screens and recommends expression pictures accurately matching the expression data, lays the expression data and pictures out at random into thumbnails or animations of expression packages, and displays them in the expression input panel;
a recommendation module 132, which recommends the thumbnails or animations to the user terminal through a recommendation algorithm.
Expression recommendation can be implemented in particular as follows.
The first method displays expressions to the user directly according to how often the user has used them: if the 'little yellow avatar' expression is the one the user has used most often among the 'favourite' expressions, it can be recommended for display to the user.
The second method uses collaborative filtering; a simple example follows.
User-based collaborative filtering: there are a user A, a user B and a user C; the click counts input by the users are received, and each user's usage of each expression (i.e., the number of clicks) is shown in the following table.
         Expression a   Expression b   Expression n
User A        8              5              3
User B        7             10              2
User C        4              2              1
User similarity may also be obtained with other distance or similarity calculations such as clustering; the Euclidean distance is used as the example here. The Euclidean distance formula is: in n-dimensional space, the distance between point x and point y is d(x, y) = sqrt((x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2).
The distances between the user and the neighbouring users are calculated; the neighbouring user with the smallest distance is found, and the expressions used by that neighbour are recommended to the user.
Taking user A as an example, the recommendation calculation process for user A is described.
For user A (userA) and user B (userB), d(userA, userB) is computed;
for user A (userA) and user C (userC), d(userA, userC) is computed;
for user B (userB) and user C (userC), d(userB, userC) is computed.
The expressions used by user A may then be recommended to user B, or the expressions used by user B recommended to user A. A sketch of this calculation on the table above is given below.
If a clustering method is used instead, the users with the same favourite items (here, the users' expressions) are identified for user A, user B, ..., user N, and the expressions used by those users can be recommended to one another.
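A sketch of the user-based calculation above, using only the three expression columns shown in the table; the resulting distances are therefore illustrative, since the real table may contain more expressions.

```python
import math

clicks = {                       # click counts per user for expressions a, b, n (from the table)
    "A": [8, 5, 3],
    "B": [7, 10, 2],
    "C": [4, 2, 1],
}

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

target = "A"
distances = {u: euclidean(clicks[target], v) for u, v in clicks.items() if u != target}
nearest = min(distances, key=distances.get)     # the closest "neighbour" user
print(distances)                                # roughly {'B': 5.20, 'C': 5.39} for these three columns
print(f"recommend expressions used by user {nearest} to user {target}")
```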
Item-based collaborative filtering: each of the expressions a, b, ..., n has 2-4 expression labels (mainly 2-4 keywords); for example, expression a has labels (taga): happy, ..., Yao Ming. The keywords in taga obtain their corresponding feature vectors through the BERT model:
v(happy) = [v11, v12, ..., v1m]
v(second keyword) = [v21, v22, ..., v2m]
v(Yao Ming) = [v31, v32, ..., v3m]
Averaging the keyword vectors of taga element-wise yields:
v(taga) = [v11+v21+v31, v12+v22+v32, ..., v1m+v2m+v3m] / 3 = [V1, V2, ..., Vm]
Similarly, expression b has labels tagb, and each label keyword obtains its corresponding feature vector through the BERT model, v(tagb) = [V21, V22, ..., V2m]; expression c has labels tagc, and each label keyword obtains its corresponding feature vector v(tagc) = [V31, V32, ..., V3m].
The similarity between expressions can then be obtained with the cosine similarity:
similarity of expression a and expression b: cos(v(taga), v(tagb));
similarity of expression a and expression c: cos(v(taga), v(tagc));
similarity of expression b and expression c: cos(v(tagb), v(tagc)).
In practical applications, the expression whose cosine value indicates the closest match is the most similar, and that similar expression can be recommended to the user.
Besides this cosine-based calculation, other similarity methods such as clustering can be used; in the end the expressions with similar results are obtained and recommended to the user.
A computer device may be embodied as the terminal 110 in fig. 5. The computer device comprises a processor, a memory, a network interface, an input device and a display screen which are connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by a processor, causes the processor to implement the dynamic expression generation method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a dynamic expression generation method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like. The camera of the computer equipment can be a front camera or a rear camera, and the sound collection device of the computer equipment can be a microphone.
Those skilled in the art will appreciate that the structure shown is only a block diagram of the partial structure related to the disclosed solution and does not constitute a limitation on the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components. In one embodiment, the dynamic expression generation apparatus provided in the present application may be implemented in the form of a computer program that runs on a computer device as described above. The memory of the computer device may store the program modules constituting the dynamic expression generation apparatus, and the computer program constituted by these program modules causes the processor to execute the steps of the dynamic expression generation method according to the embodiments of the present application described in this specification.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described dynamic expression generation method. Here, the steps of the dynamic expression generation method may be the steps in the dynamic expression generation methods of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above-described dynamic expression generation method. Here, the steps of the dynamic expression generation method may be the steps in the dynamic expression generation methods of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The present invention has been described with reference to specific examples, which are provided only to aid understanding of the invention and are not intended to limit it. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (16)

1. An expression input method based on BERT technology, characterized by comprising the following steps:
S1: pre-training a corpus feature BERT model, and performing feature-extraction training on the feature representation of the characters/words a user wishes to express;
S2: pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of expressions according to features;
S3: when corpus information input by a user is received, performing corpus word processing, including word segmentation and stop-word removal, with words/phrases as the unit, and formatting the result into the input data format required by the corpus feature BERT model;
S4: inputting the data into the corpus feature BERT model for feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk, where k is the total number of words obtained after all corpora are segmented;
S5: inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model, normalizing the class probabilities of the feature vectors with a SoftMax function, and finding the expression class to which they finally belong among the emotion classes;
S6: recommending, via User-CF or Item-CF based on the user's historical expression usage, and displaying the picture and animation corresponding to the expression the user requires.
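By way of illustration and not limitation, the following Python sketch traces steps S3 to S5 of claim 1 with the open-source transformers and PyTorch libraries; the model checkpoint, the toy classifier head and the emotion label set are assumptions made for the example and are not part of the claims.

```python
import torch
from transformers import BertTokenizer, BertModel

# S3: format the user's input corpus for the corpus-feature BERT model.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")
text = "today was a really good day"
inputs = tokenizer(text, return_tensors="pt")

# S4: extract the feature vectors v1..vk (one hidden vector per token).
with torch.no_grad():
    features = bert(**inputs).last_hidden_state  # shape (1, k, 768)

# S5: a toy classifier head standing in for the pre-trained classifier model.
emotion_labels = ["happy", "sad", "angry", "surprised"]  # assumed label set
classifier = torch.nn.Linear(768, len(emotion_labels))
logits = classifier(features.mean(dim=1))       # pool the token vectors, then classify
probs = torch.softmax(logits, dim=-1)           # SoftMax normalization of the class probabilities
predicted = emotion_labels[int(probs.argmax())]
print(predicted, probs.tolist())

# S6 would then look up the pictures/animations registered for `predicted`
# and rank them with User-CF or Item-CF before displaying them to the user.
```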
2. The method as claimed in claim 1, wherein, for the pre-trained corpus feature BERT model in S3, the input representation is generated by summing three embeddings, namely a token embedding, a segment embedding and a position embedding, for each word or phrase in a sentence x1, x2, ..., xn.
3. The method of claim 1, wherein the input data format required by the corpus feature BERT model in S3 further comprises:
using WordPiece embeddings with a token vocabulary, where split word pieces are denoted with ##;
using learned positional embeddings, with a supported sequence length of at most 512 tokens; the first token of each sequence is always a special classification embedding ([CLS]), whose final hidden state is used as the aggregate sequence representation for classification tasks, while for non-classification tasks this vector is ignored;
sentence pairs are packed into a single sequence and the sentences are distinguished in two ways: first, they are separated by a special [SEP] token; second, a learned sentence A embedding is added to every token of the first sentence and a sentence B embedding to every token of the second sentence;
for a single-sentence input, only the sentence A embedding is used.
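By way of illustration only, the input format described in claim 3 can be inspected with the transformers tokenizer; the checkpoint name and the example sentences are assumptions.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

# A sentence pair packed into one sequence: [CLS] sentence A [SEP] sentence B [SEP]
encoded = tokenizer("how are you", "i am fine", return_tensors="pt")

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()))
# token_type_ids plays the role of the sentence A / sentence B embedding:
# 0 for every token of the first sentence, 1 for every token of the second.
print(encoded["token_type_ids"][0].tolist())
# Position embeddings are learned inside the model and cover at most 512 positions;
# WordPiece splits rare words into pieces marked with "##".
```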
4. The method of claim 1, wherein classifying the plurality of preset expressions further comprises:
the classifier model may perform classification using algorithms including CBOW and LSTM, wherein the expression classes are defined in advance according to the emotions a user may express, and the set of expression classes can subsequently be extended.
5. The method of claim 1, wherein the classifier model is an LSTM neural network model, and inputting the feature vectors v1, v2, ..., vk into the pre-trained classifier model further comprises:
acquiring the feature vectors v1, v2, ..., vk as the input sequence of the LSTM neural network model, the LSTM neural network model comprising a plurality of LSTM layers, each LSTM layer formed by connecting a plurality of blocks, where a block comprises one or more recurrently connected memory cells and three further units: an input gate (Input gate), an output gate (Output gate) and a forget gate (Forget gate), the forget gate adjusting, through the activation functions g and h, whether the currently input data is "forgotten" or "remembered";
computing an output Y for the input sequence by forward propagation and/or backward propagation through the parameters of the LSTM neural network model;
and obtaining, from the output Y, the best-matching expression class among the plurality of preset expression classes.
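By way of illustration and not limitation, a minimal PyTorch sketch of the LSTM classifier of claim 5 is given below; the layer sizes and the number of expression classes are assumptions.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Feed the BERT feature vectors v1..vk through LSTM layers, then classify."""

    def __init__(self, feature_dim=768, hidden_dim=256, num_classes=8):
        super().__init__()
        # Each LSTM cell contains the input, output and forget gates described above.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):              # features: (batch, k, feature_dim)
        _, (h_n, _) = self.lstm(features)     # forward propagation through the gates
        return self.out(h_n[-1])              # output Y: one score per expression class

model = ExpressionClassifier()
dummy_features = torch.randn(1, 12, 768)      # stand-in for v1..vk from the BERT model
probs = torch.softmax(model(dummy_features), dim=-1)
print(int(probs.argmax()), probs.shape)
```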
6. The method of claim 1, wherein S6 further comprises:
according to the preferences of all users for items or information, discovering a "neighbor" user group whose tastes and preferences are similar to those of the current user by means of a "K-nearest-neighbor" algorithm, and then making recommendations for the current user based on the historical preference information of the K nearest neighbors.
7. The expression input method of claim 6, further comprising:
the current user or a neighboring user group has expressions a, b, ..., n, each expression having a number N of expression label keywords; for example, expression a has a label (taga) with keywords taga1, taga2, taga3, ..., tagaN, and each keyword in taga obtains its corresponding feature vector through the BERT model:
v(taga1) = [v11, v12, ..., v1m]
v(taga2) = [v21, v22, ..., v2m]
v(taga3) = [v31, v32, ..., v3m]
...
v(tagaN) = [vN1, vN2, ..., vNm]
weight-averaging the keyword vectors of taga yields:
v(taga) = [v11+v21+...+vN1, v12+v22+...+vN2, ..., v1m+v2m+...+vNm] / N = [V1, V2, ..., Vm]
similarly, expression b has label tagb, and each label keyword passes through the BERT model to obtain the corresponding feature vector V(tagb) = [V21, V22, ..., V2m]; expression c has label tagc, and each label keyword passes through the BERT model to obtain the corresponding word feature vector V(tagc) = [V31, V32, ..., V3m]; and so on; the similarity between expressions can then be obtained using cosine similarity:
the similarity between expression a and expression b is cos(taga, tagb);
the similarity between expression a and expression c is cos(taga, tagc);
the similarity between expression b and expression c is cos(tagb, tagc);
and by computing the cosine values between a given expression and a plurality of related expressions, the expression with the largest cosine similarity (smallest cosine distance) is the most similar, and that similar expression can be recommended to the user.
8. The method of claim 1, wherein S6 further comprises:
finding the similarity between items or between users through a similarity algorithm, including a clustering algorithm, using the preferences of all users for items or information, and recommending expressions to the user accordingly.
9. The method of claim 8, wherein calculating the similarity of the user to the user comprises one of the following algorithms:
the user similarity uses a distance-based similarity calculation method, including Euclidean distance and clustering, where the Euclidean distance formula is: in n-dimensional space, the distance between point x and point y is d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2);
calculating the distance between the user and each neighboring user;
and finding the neighboring user with the minimum distance, and recommending the expressions used by that neighbor to the user.
10. An expression input device based on BERT technology, comprising:
the corpus feature BERT model, used to perform feature-extraction training on the characters/words a user wishes to express, to receive the corpus information input by the user, and to perform feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk;
the classifier model, used to classify a plurality of preset expressions, to pre-train the classification of expressions according to features, to receive the input feature vectors v1, v2, ..., vk, to normalize the class probabilities of the feature vectors with a SoftMax function, and to find the expression to which the feature vectors finally belong among the emotion classes;
and an expression display device, which displays expression presentation information, including the pictures and animations corresponding to the expression the user requires.
11. the emotive input apparatus of claim 10, wherein the corpus-feature BERT model further comprises:
an input data format preprocessing module: performing corpus word processing, including word segmentation and stop-word removal, with words as the unit, and formatting the result into the input data format required by the corpus feature BERT model;
a feature extraction processing module: receiving the corpus text data input according to the preset input data format, and performing feature extraction to obtain the corresponding feature vectors v1, v2, ..., vk.
12. the expression input apparatus of claim 10, wherein the classifier model is an LSTM neural network model, further comprising:
the LSTM neural network model comprises a plurality of LSTM layers, each LSTM layer formed by connecting a plurality of blocks, where a block comprises one or more recurrently connected memory cells and three further units: an input gate (Input gate), an output gate (Output gate) and a forget gate (Forget gate), the forget gate adjusting, through the activation functions g and h, whether the currently input data is "forgotten" or "remembered";
a forward propagation/backward propagation calculation module, for computing an output Y for the input sequence by forward propagation and/or backward propagation through the parameters of the LSTM neural network model;
and an emotion classification module, for obtaining, from the output Y, the best-matching expression class among the plurality of preset expression classes.
13. The expression input apparatus of claim 10, wherein the presentation apparatus further comprises:
a picture/animation forming module: screening and recommending expression pictures that accurately match the final expression classification data corresponding to the input, composing the expression data and expression pictures into thumbnails or images of expression packs, and displaying them in an expression input panel.
14. the expression input apparatus of claim 13, wherein the presentation apparatus further comprises:
a recommendation module: recommending the thumbnail or animation to the user terminal through a recommendation algorithm.
15. A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
16. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN201910679545.2A 2019-07-25 2019-07-25 Expression input method and device based on BERT technology Active CN110543242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910679545.2A CN110543242B (en) 2019-07-25 2019-07-25 Expression input method and device based on BERT technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910679545.2A CN110543242B (en) 2019-07-25 2019-07-25 Expression input method and device based on BERT technology

Publications (2)

Publication Number Publication Date
CN110543242A true CN110543242A (en) 2019-12-06
CN110543242B CN110543242B (en) 2023-07-04

Family

ID=68710327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910679545.2A Active CN110543242B (en) 2019-07-25 2019-07-25 Expression input method and device based on BERT technology

Country Status (1)

Country Link
CN (1) CN110543242B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078882A (en) * 2019-12-13 2020-04-28 北京工业大学 Text emotion measuring method and device
CN112182373A (en) * 2020-09-25 2021-01-05 中国人民大学 Context expression learning-based personalized search method
CN112270187A (en) * 2020-11-05 2021-01-26 中山大学 Bert-LSTM-based rumor detection model
CN112860893A (en) * 2021-02-08 2021-05-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112883896A (en) * 2021-03-10 2021-06-01 山东大学 Micro-expression detection method based on BERT network
CN113589957A (en) * 2021-07-30 2021-11-02 广州赛宸信息技术有限公司 Method and system for rapidly inputting professional words of laws and regulations
CN114553810A (en) * 2022-02-22 2022-05-27 广州博冠信息科技有限公司 Expression picture synthesis method and device and electronic equipment
CN114780190A (en) * 2022-04-13 2022-07-22 脸萌有限公司 Message processing method and device, electronic equipment and storage medium
CN114818659A (en) * 2022-06-29 2022-07-29 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium
US11881210B2 (en) 2020-05-05 2024-01-23 Google Llc Speech synthesis prosody using a BERT model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
WO2018057918A1 (en) * 2016-09-23 2018-03-29 Ehr Command Center, Llc Data command center visual display system
US20190034798A1 (en) * 2017-07-25 2019-01-31 University Of Massachusetts Medical School Method for Meta-Level Continual Learning
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning
CN109992648A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 The word-based depth text matching technique and device for migrating study

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
WO2018057918A1 (en) * 2016-09-23 2018-03-29 Ehr Command Center, Llc Data command center visual display system
US20190034798A1 (en) * 2017-07-25 2019-01-31 University Of Massachusetts Medical School Method for Meta-Level Continual Learning
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning
CN109992648A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 The word-based depth text matching technique and device for migrating study

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078882A (en) * 2019-12-13 2020-04-28 北京工业大学 Text emotion measuring method and device
US11881210B2 (en) 2020-05-05 2024-01-23 Google Llc Speech synthesis prosody using a BERT model
CN112182373A (en) * 2020-09-25 2021-01-05 中国人民大学 Context expression learning-based personalized search method
CN112270187A (en) * 2020-11-05 2021-01-26 中山大学 Bert-LSTM-based rumor detection model
CN112860893B (en) * 2021-02-08 2023-02-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112860893A (en) * 2021-02-08 2021-05-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112883896A (en) * 2021-03-10 2021-06-01 山东大学 Micro-expression detection method based on BERT network
CN113589957A (en) * 2021-07-30 2021-11-02 广州赛宸信息技术有限公司 Method and system for rapidly inputting professional words of laws and regulations
CN114553810A (en) * 2022-02-22 2022-05-27 广州博冠信息科技有限公司 Expression picture synthesis method and device and electronic equipment
CN114780190B (en) * 2022-04-13 2023-12-22 脸萌有限公司 Message processing method, device, electronic equipment and storage medium
CN114780190A (en) * 2022-04-13 2022-07-22 脸萌有限公司 Message processing method and device, electronic equipment and storage medium
CN114818659B (en) * 2022-06-29 2022-09-23 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium
CN114818659A (en) * 2022-06-29 2022-07-29 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium

Also Published As

Publication number Publication date
CN110543242B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN110543242B (en) Expression input method and device based on BERT technology
CN111291181B (en) Representation learning for input classification via topic sparse self-encoder and entity embedding
CN107992531B (en) News personalized intelligent recommendation method and system based on deep learning
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
Shrivastava et al. An effective approach for emotion detection in multimedia text data using sequence based convolutional neural network
CN109753566A (en) The model training method of cross-cutting sentiment analysis based on convolutional neural networks
CN109145112A (en) A kind of comment on commodity classification method based on global information attention mechanism
CN108388544A (en) A kind of picture and text fusion microblog emotional analysis method based on deep learning
CN110765769B (en) Clause feature-based entity attribute dependency emotion analysis method
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
Huang et al. Hierarchical multi-attention networks for document classification
Choi et al. Residual-based graph convolutional network for emotion recognition in conversation for smart Internet of Things
Suman et al. A multimodal author profiling system for tweets
Chen et al. Deep neural networks for multi-class sentiment classification
Tesfaye et al. Automated amharic hate speech posts and comments detection model using recurrent neural network
Ibrahim et al. An intelligent hybrid neural collaborative filtering approach for true recommendations
Chaudhuri Visual and text sentiment analysis through hierarchical deep learning networks
Mohades Deilami et al. Contextualized multidimensional personality recognition using combination of deep neural network and ensemble learning
Suman et al. An attention based multi-modal gender identification system for social media users
Bugueño et al. An empirical analysis of rumor detection on microblogs with recurrent neural networks
Vayadande et al. Mood Detection and Emoji Classification using Tokenization and Convolutional Neural Network
Meddeb et al. Deep learning based semantic approach for Arabic textual documents recommendation
Ueno et al. A spoiler detection method for japanese-written reviews of stories
CN115659990A (en) Tobacco emotion analysis method, device and medium
CN111259228A (en) Personalized news recommendation method based on big data deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant