CN110543242B - Expression input method and device based on BERT technology - Google Patents

Expression input method and device based on BERT technology

Info

Publication number
CN110543242B
CN110543242B (application CN201910679545.2A)
Authority
CN
China
Prior art keywords
expression
user
input
word
feature
Prior art date
Legal status
Active
Application number
CN201910679545.2A
Other languages
Chinese (zh)
Other versions
CN110543242A (en)
Inventor
周诚 (Zhou Cheng)
Current Assignee
Beijing Wisdom Octopus Technology Co., Ltd.
Original Assignee
Beijing Wisdom Octopus Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Wisdom Octopus Technology Co., Ltd.
Priority to CN201910679545.2A
Publication of CN110543242A
Application granted
Publication of CN110543242B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F 3/0233 Character input methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The expression input method based on the BERT technology comprises the following steps. S1: pre-training a corpus feature BERT model. S2: pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of the expressions according to features. S3: when corpus information input by a user is received, performing corpus word processing, including word segmentation and stop-word removal, in units of words, and setting the input data format required by the corpus feature BERT model. S4: inputting the processed data into the corpus feature BERT model for feature extraction to obtain the corresponding feature vectors $v_1, v_2, \ldots, v_k$. S5: inputting the feature vectors $v_1, v_2, \ldots, v_k$ into the pre-trained classifier model to obtain the expression to which the input finally belongs. S6: displaying the expression display information, such as pictures and animations, corresponding to the expression required by the user. According to the invention, expressions can be classified, input words can be accurately matched to the corresponding expression categories, and an expression-package recommendation algorithm based on user behavior is designed to display the expressions the user is most interested in, so that excessive screen-sliding selection is avoided and the user experience is improved.

Description

Expression input method and device based on BERT technology
Technical Field
The invention relates to an animated-expression input algorithm, and in particular to an expression input method and device based on the BERT technique.
Background
An input method of this kind performs fuzzy keyword matching on the information the user types, guesses the user's input intention, dynamically pops up an expression window containing GIFs and the like for the user to click, and sends the corresponding GIF or other animated expression to the current application and on to the recipient.
Existing expression input methods generally build on emotion classification techniques and deep learning techniques: emotion classification is needed to make the guesses more accurate, and deep learning may be used to infer the input the user intends. The related art is briefly described below.
1. Emotion classification technique
(I) Dictionary-based methods
Using a constructed text emotion dictionary, the polarity and strength of emotion words are marked and the emotion of the text is then classified; the dictionary-based text emotion analysis process is shown in Fig. 1.
(II) Machine-learning-based methods
Supervised machine learning methods: among machine learning methods, naive Bayes (NB) and the support vector machine (SVM) are common supervised learning algorithms, but research indicates that NB and SVM, when used alone, face the conditional independence assumption and the kernel-selection problem respectively. Sharma et al. therefore used Boosting to integrate "weak" SVM classifiers, exploiting the classification performance of Boosting while keeping the SVM as the base classifier; their results show that the ensemble classifier is clearly superior in accuracy to a plain SVM classifier.
When executing a text emotion analysis task, a computer usually analyzes each word of the text separately, mines words that may carry emotional color, integrates the emotion words in a sentence to judge the sentence's emotion, and stacks these judgments layer by layer to decide the emotion polarity of the whole text. However, the words in a text are not equally important to emotion analysis, and the computer cannot determine their importance automatically, so Deng et al. proposed a supervised word-weight assignment scheme based on two factors: a word's importance in the whole text and its importance in expressing emotion. Review-style text lacks logical structure and is relatively disordered, and general supervised learning algorithms achieve low accuracy on it, so Perikos et al. designed an ensemble of three classifiers: the first two are statistical (naive Bayes and maximum entropy), and the third is a knowledge-based tool that performs deep analysis of natural-language sentences. Similarly, Tripathy et al. divided text into unigrams, bigrams, trigrams, and their combinations, and then performed emotion analysis on reviews with naive Bayes, maximum entropy, stochastic gradient descent, and support vector machine methods, comparing not only the individual methods but also their combinations.
Weakly supervised deep learning methods:
the use of neural network models inevitably involves Word vector embedding techniques, i.e., converting human language into machine language, such as Word2Vec, giatsoglou, et al, which combine context-sensitive coding provided by Word2Vec with emotion information provided by a dictionary. Although word vector embedding techniques consider the context of words, ignoring emotion of the whole text, tang et al propose to solve this problem by encoding emotion information of the text along with the context of the words in emotion embedding, and develop a neural network with clipping penalty function to automatically collect emotion signals. Fernandez-Gavil-anes et al propose a new unsupervised emotion analysis algorithm that uses dependency syntax to determine emotion polarity. When a text emotion analysis task is performed, a plurality of emotion words with inconsistent emotion polarities in the same sentence often appear, and Liang and the like consider that an attention mechanism (attention mechanism) can effectively solve the problem, so that a Multi-attention convolutional neural network MATT-CNN (Multi-ATTentionConvolution NeuralNetworks) is constructed by combining a word vector attention mechanism, a part-of-speech attention mechanism and a position attention mechanism.
RNNs are among the network models most often used in deep learning and have been applied widely in natural language processing. Usually "RNN" refers to the recurrent neural network, a time-series-based model, but it may also denote the recursive neural network, a model that focuses on structural hierarchy. Liu Jinshuo et al. took pre-trained word vectors as the input of a lower-layer recursive neural network, then fed the sentence vectors it output into an upper-layer recurrent neural network in sequential order, effectively combining the two networks and addressing the low accuracy of the classifier. Xie Tie et al. captured sentence semantics with a deep recursive neural network and introduced a Chinese "sentiment training treebank" as training data to discover word emotion information. Most databases used in text emotion analysis are short review-style texts, whose brevity and sparse features make traditional feature extraction inaccurate; Sun et al. combined a deep belief network with feature extraction to obtain an extended feature extraction method for sparse short texts. Cao Yuhui addressed the problems that an RNN cannot learn long-distance dependency information and that the fully connected classification layer in CNN text emotion analysis models cannot effectively classify nonlinearly distributed data. Considering that traditional methods cannot acquire text semantic information, Zhu Shaojie introduced a semi-supervised recursive autoencoder (RAE) method based on deep learning, which achieves higher accuracy when the feature dimension is low. Because an ordinary RNN cannot solve the long-distance dependency problem, the long short-term memory network (LSTM), a special RNN, is used: LSTM can capture dependency relationships between texts and store information over long spans. Zhou Ying et al. proposed an LSTM model based on a multi-attention mechanism, applied to microblog comments on the Huawei P10 flash-memory incident to analyze netizens' sentiment. Hu et al. built a keyword lexicon on top of the LSTM model, which helps mine latent meanings in the text and further improves the accuracy of polarity judgment. Ma et al. proposed adding to LSTM a stacked attention mechanism consisting of target-level and sentence-level attention models, called Sentic LSTM, focusing on the use of commonsense knowledge in deep neural sequence models. LSTM is an effective chain-structured recurrent network, but a chain structure cannot effectively represent the structural hierarchy of language, so Liang Jun et al. extended LSTM to a tree-structured recursive network to capture deep-level information in text. The difficulty of emotion analysis differs at the sentence level and the document level, so a general model lacks universality.
However, existing input methods can only display a limited set of GIF expressions, and supporting more expressions requires manual updates; in use, the user must manually enter the expression page and select a GIF expression before the corresponding expression can be sent to the other party, so personalized needs cannot be met. The patent with application number 201610356623.1 discloses an expression input method whose technical scheme comprises the following steps:
S1, acquiring the character string currently input by the user in real time;
S2, connecting to a remote server through a network, performing fuzzy matching in the remote server according to the acquired character string to obtain the latest dynamic expression picture resources, and storing them in a background database; the fuzzy matching in the remote server proceeds as follows: the acquired character string is associated with the names of corresponding expression contents according to search rules, dynamic expression pictures with identical or similar-meaning names are matched in the dynamic-expression picture database of the remote server, and the latest dynamic expression pictures are screened out of the matched results;
S3, displaying the latest dynamic expression pictures in a pop-up frame on the interface for the user to select; if the user clicks the dynamic-expression pop-up frame, input of that dynamic expression picture is confirmed, the frame disappears, and the picture is sent automatically, realizing quick input of the latest dynamic expression pictures and improving the user experience.
Such expression input is based on template matching of keywords and the like, so the number of expression packages is very limited; moreover, fuzzy matching requires access to a remote server, and when the network has problems the usability of the input method is easily affected.
Disclosure of Invention
A first object of the present invention is to provide an expression input method based on the BERT technique, so as to solve the technical problem that the number of expression packages in the prior art is very limited.
An expression input method based on a BERT technique, comprising:
S1, pre-training a corpus feature BERT model, and performing feature extraction training on the characters/words the user wants to express;
S2, pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of the expressions according to features; an emotion analysis algorithm is added to the classification process to improve the classification result (the above processes are completed offline);
S3, when corpus information input by the user is received, performing corpus word processing, including word segmentation and stop-word removal, in units of characters or words, and setting the input data format required by the corpus feature BERT model;
S4, inputting the processed data into the corpus feature BERT model for feature extraction to obtain the corresponding feature vectors $v_1, v_2, \ldots, v_k$, where k is the total number of words obtained after segmenting the whole corpus;
S5, inputting the feature vectors $v_1, v_2, \ldots, v_k$ into the pre-trained classifier model, and normalizing the class probabilities of the feature vectors with a softmax function to obtain the expression to which the input finally belongs (a code sketch of S3-S5 follows this list);
S6, displaying the expression display information, including pictures and animations, corresponding to the expression required by the user, using User-CF or Item-CF over the user's historical expression usage.
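As an illustration only, the S3-S5 pipeline can be sketched in code. This is a minimal sketch assuming the HuggingFace transformers library, the bert-base-chinese checkpoint, and an untrained LSTM classification head; none of these artifacts are specified by the patent.

```python
# Hedged sketch of steps S3-S5: BERT feature extraction followed by an LSTM
# classifier with softmax normalization. The checkpoint name, the 202-class
# head, and all weights are illustrative assumptions, not the patent's.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

NUM_CLASSES = 202  # the patent mentions roughly 202 expression categories

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

class ExpressionClassifier(nn.Module):
    def __init__(self, hidden=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(input_size=768, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats):                 # feats: (batch, k, 768) = v1..vk
        _, (h_n, _) = self.lstm(feats)        # last hidden state summarizes the sequence
        return torch.softmax(self.fc(h_n[-1]), dim=-1)  # class probabilities

def classify(text: str) -> int:
    # S3: tokenize into the input format BERT expects ([CLS] ... [SEP])
    enc = tokenizer(text, return_tensors="pt")
    # S4: per-token feature vectors v1, ..., vk from the pre-trained BERT model
    with torch.no_grad():
        feats = bert(**enc).last_hidden_state
    # S5: LSTM classifier + softmax -> most probable expression category
    probs = ExpressionClassifier()(feats)     # untrained here; shapes only
    return int(probs.argmax(dim=-1))

print(classify("好难过"))  # index of the predicted expression category
```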
Compared with the prior art, the invention has the following advantages:
First, unlike the traditional text-chat and expression-chat flows, in which the relevant emotion expressions (i.e., the corresponding expression packages) are merely displayed directly on the client, this input method can directly impose the desired expression text on the corresponding expression package, so that the user is expressed more accurately;
Second, the expression packages of this input method cover more than two hundred emotion categories, essentially covering the emotional scenes of daily life; and turning traditional text chat into animated-picture chat makes chatting more interesting.
Further, the invention uses the BERT technique to pre-train a corpus feature BERT model and performs feature extraction on the characters/words the user wants to express; since the BERT model extracts features more accurately, the feature extraction of characters/words becomes more comprehensive and more precise.
In addition, the more accurately extracted features are input to an LSTM algorithm, which conveniently computes the highest classification probability among hundreds of expression classes and finds the corresponding expression; each expression is preset with corresponding pictures, animations, or synthesized pictures and animations for display. An emotion analysis algorithm is also added to this process: when the user inputs content, the emotion the user wants to express is analyzed, expression pictures exactly matching the expression data are screened and recommended according to the expression with the highest classification probability for that emotion, and the expression data and pictures are typeset at random into thumbnails or icons of expression packages and shown in the expression input panel, which greatly improves the accuracy of expression input and the user experience.
Finally, the expression display information, including pictures and animations, corresponding to the expression required by the user is displayed using User-CF or Item-CF, achieving a better display effect and a good expression input experience. In addition, given that the number of expression packages is large while the display space of a mobile terminal is limited, the invention designs an expression-package recommendation algorithm based on user behavior, improving the user experience.
Drawings
FIG. 1 is a dictionary-based text emotion analysis process diagram;
FIG. 2 is a schematic flow chart of the input method;
FIG. 3 is a visual representation of a BERT model input representation;
FIG. 4 is a block diagram of the LSTM algorithm;
FIG. 5 is a schematic block diagram of an example;
FIGS. 6A-6C are diagrams showing the display effects of the Octopus input method;
FIG. 7 is a schematic diagram of the expression input device based on BERT technology.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention discloses an expression input method based on the BERT (Bidirectional Encoder Representations from Transformers) technique. First, the emotion the user wants to express is represented as features, so that when the user inputs, the required expression can be provided more accurately; second, a deep-learning classification method divides emotion expressions into many classes (e.g., 202 classes); finally, given that the number of expression packages is large while the display space of a mobile terminal is limited, an expression-package recommendation algorithm based on user behavior is designed to display the expressions the user is most interested in, avoiding excessive screen-sliding selection and improving the user experience. Unlike the traditional text-chat and expression-chat flows, in which the relevant emotion expressions (i.e., corresponding expression packages) are merely displayed directly on the client, this input method can directly impose the desired expression text on the corresponding expression package, so that the user is expressed more accurately. Moreover, the expression packages of this input method cover more than two hundred emotion categories, essentially covering the emotional scenes of daily life; and turning traditional text chat into animated-picture chat makes chatting more interesting.
Please refer to fig. 2, which is a flowchart of the input method. It comprises the following steps:
S110, pre-training a corpus feature BERT model, and performing feature extraction training on the characters/words the user wants to express;
S120, pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of the expressions according to features;
S130, when corpus information input by the user is received, performing corpus word processing, including word segmentation and stop-word removal, in units of characters or words, and setting the input data format required by the corpus feature BERT model;
S140, inputting the processed data into the corpus feature BERT model for feature extraction to obtain the corresponding feature vectors $v_1, v_2, \ldots, v_k$, where k is the total number of words obtained after segmenting the whole corpus;
S150, inputting the feature vectors $v_1, v_2, \ldots, v_k$ into the pre-trained classifier model, normalizing their class probabilities with a softmax function, and finding the finally belonging expression among the emotion classes;
S160, displaying the expression display information, including pictures and animations, corresponding to the expression required by the user, using User-CF or Item-CF over the user's historical expression usage.
S130 is first introduced.
BERT is a new language representation model: the Bidirectional Encoder Representations from Transformers. BERT aims to pre-train deep bidirectional representations by jointly conditioning on context in all layers. The pre-trained BERT representation can therefore be fine-tuned with one additional output layer, making it suitable for building state-of-the-art models for a wide range of tasks. BERT proposes a new training objective, the masked language model (MLM), to overcome the unidirectionality limitation mentioned above; the inspiration for the MLM comes from the Cloze task. The MLM randomly masks some tokens in the model input (i.e., the corpus feature BERT model), with the goal of predicting the original vocabulary id of each masked word based only on its context. Unlike left-to-right language-model training, the MLM objective allows the representation to fuse context from both the left and the right, so a deep bidirectional Transformer can be trained. In addition to the masked language model, a "next sentence prediction" task is introduced, which pre-trains representations of text pairs jointly with the MLM.
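As an illustration of the masking step just described, the following minimal sketch randomly masks tokens so that the model must recover the original ids from bidirectional context; the 15% rate and the toy token list are assumptions for the example, not values fixed by the patent.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly replace tokens with [MASK]; the model must recover the
    original ids from context on both sides (the MLM objective)."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)      # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)     # no loss contribution here
    return masked, targets

print(mask_tokens(["今天", "天气", "真", "好"]))
```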
The input representation of BERT: the input representation can unambiguously represent either a single text sentence or a pair of text sentences (e.g., [Question, Answer]) in one token sequence. For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. Fig. 3 is a visualization of the input representation. The representation of each word is obtained by summing three parts: the token embedding, the segment embedding, and the position embedding. The token embedding is a simple table lookup; the segment embedding indicates which sentence the word belongs to; the position embedding encodes the word's position in the sentence and is also a table lookup. The corpus feature BERT model is a feature extraction model composed of bidirectional Transformers. In the figure, E denotes the word embedding, T denotes the new feature representation of each word after BERT encoding, and Trm denotes the Transformer feature extractor. During training, the masked language model randomly masks some input tokens, which are then predicted during pre-training, and a sentence-level task, next sentence prediction, is added: some sentences are replaced at random and the model makes IsNext/NotNext predictions on the final sentence. Through these two tasks, the three representations of each word are optimized on a large-scale unlabeled corpus, yielding the pre-trained corpus feature BERT model. Setting the input data format required by the corpus feature BERT model further includes: using WordPiece embeddings with a vocabulary of many (e.g., 30,000) tokens, with segmented word pieces denoted by ##, and using learned position embeddings with supported sequence lengths of up to 512 tokens.
The first token of every sequence is always the special classification embedding ([CLS]); the final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks, and for non-classification tasks this vector is ignored. Sentence pairs are packed into a single sequence, with the sentences distinguished in two ways: first, they are separated by the special token [SEP]; second, a learned sentence A embedding is added to every token of the first sentence and a sentence B embedding to every token of the second sentence.
For single-sentence input, only the sentence A embedding is used.
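A minimal sketch of this three-way embedding sum, assuming PyTorch; the sizes echo the figures above (30,000-token vocabulary, 512 positions) but are otherwise illustrative.

```python
import torch
import torch.nn as nn

class BertInput(nn.Module):
    """Token + segment + position embedding sum, as in the input
    representation described above (sizes are illustrative)."""
    def __init__(self, vocab=30000, max_len=512, seg=2, dim=768):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)     # simple table lookup
        self.seg = nn.Embedding(seg, dim)       # which sentence (A/B)
        self.pos = nn.Embedding(max_len, dim)   # learned positions

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1)).unsqueeze(0)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)

emb = BertInput()
x = emb(torch.tensor([[101, 2769, 102]]), torch.tensor([[0, 0, 0]]))
print(x.shape)  # torch.Size([1, 3, 768])
```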
S140, inputting the processed data into the corpus feature BERT model for feature extraction to obtain the corresponding feature vectors $v_1, v_2, \ldots, v_k$, further includes:
for a sentence $X = x_1, x_2, \ldots, x_n$, optimizing the three representations of each word using the Masked Language Model and Next Sentence Prediction as optimization objectives.
For the resulting word-optimized text, a model such as textCNN can be used for feature extraction. The textCNN model is a stack of several CNNs in parallel; it can extract, from the representations within a sentence, the features helpful for classification, and a pooling operation over the extracted features yields the final classification feature representation. textCNN consists of several different convolutional layers in parallel, computed with convolution kernels of several sizes; using multiple kernel sizes is conducive to extracting sentence-level semantic and syntactic features. The pooling layer pools the convolved results, extracting the most important features after the convolution computation. The word-optimized text is built into a semantic file of the text, and a feature map is obtained through the convolutional layers; the feature map is fed into the pooling layer, word vectors are obtained by max pooling, and the word vectors are concatenated into a feature vector. The above discloses only one feature extraction scheme; other feature extraction algorithms may be used, and this is merely an example, not a limitation of the invention.
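A hedged sketch of the parallel-convolution idea just described, assuming PyTorch; the kernel sizes and filter counts are invented for illustration.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Parallel convolutions over BERT token features with several kernel
    sizes, max-pooled and concatenated (sizes are illustrative)."""
    def __init__(self, dim=768, n_filters=100, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, n_filters, k) for k in kernel_sizes)

    def forward(self, feats):            # feats: (batch, k, dim)
        x = feats.transpose(1, 2)        # Conv1d expects (batch, dim, k)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)  # final classification features

print(TextCNN()(torch.randn(1, 10, 768)).shape)  # torch.Size([1, 300])
```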
Next, S150 is described. The feature vectors $v_1, v_2, \ldots, v_k$ are input into the pre-trained classifier model, and the class probabilities of the feature vectors are normalized with a softmax function to obtain the finally belonging expression. The classifier model may be an RNN, CBOW, and so on, with LSTM being the most effective; LSTM is taken as the example below.
The LSTM algorithm was proposed to remedy the vanishing-gradient and exploding-gradient defects of RNNs, and it has long short-term memory capability. Gradient updates are typically performed with BPTT (Back-Propagation Through Time). In an LSTM network, the neurons of an ordinary RNN are replaced with blocks, shown schematically in Fig. 4.
An LSTM layer is formed by connecting several such blocks, as shown in Fig. 4. Each block contains one or more recurrently connected memory cells (the cells in Fig. 4) and three further units: an input gate, an output gate, and a forget gate. As the names imply, the input gate and output gate govern the input and output of data, squashed by g and h in the figure; the network adjusts whether the currently input data are "forgotten" or "remembered", with the forget gate's value initialized to 1.
1. Forward propagation
Let the current time be $t$ and the previous time be $t-1$. At every iteration step the network stores all hidden units and activates each output. Let $N$ be the total number of neurons in the network and $w_{ij}$ the weight from neuron $i$ to neuron $j$. For each LSTM block, the parameters of the input gate, forget gate, and output gate are denoted by $l$, $\phi$, $\omega$; $c$ denotes an element of the cell set $C$; $s_c$ is the state value of cell $c$; $f$ is the squashing function of the gates; and $g$ and $h$ are the input and output squashing functions of the cell. Then:

Input gates:
$$x_l(t) = \sum_{i=1}^{N} w_{li}\, y_i(t-1), \qquad y_l = f(x_l)$$

Forget gates:
$$x_\phi(t) = \sum_{i=1}^{N} w_{\phi i}\, y_i(t-1), \qquad y_\phi = f(x_\phi)$$

Cells:
$$x_c(t) = \sum_{i=1}^{N} w_{ci}\, y_i(t-1), \qquad s_c(t) = y_\phi\, s_c(t-1) + y_l\, g(x_c)$$

Output gates:
$$x_\omega(t) = \sum_{i=1}^{N} w_{\omega i}\, y_i(t-1), \qquad y_\omega = f(x_\omega)$$

Cell output:
$$y_c(t) = y_\omega\, h(s_c(t))$$
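The forward equations can be traced with a single-cell sketch. Taking f as the logistic sigmoid and g = h = tanh is one common choice, assumed here since the patent leaves f, g, and h abstract.

```python
import math

def sigmoid(x):  # gate squashing function f (an assumed choice)
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_l, x_phi, x_c, x_omega, s_prev):
    """One forward step of the equations above for a single cell;
    the net inputs x_* are assumed already computed as weighted sums."""
    y_l = sigmoid(x_l)                            # input gate
    y_phi = sigmoid(x_phi)                        # forget gate
    s_c = y_phi * s_prev + y_l * math.tanh(x_c)   # cell state update
    y_omega = sigmoid(x_omega)                    # output gate
    y_c = y_omega * math.tanh(s_c)                # cell output
    return s_c, y_c

print(lstm_step(0.5, 1.0, 0.3, 0.8, s_prev=0.2))
```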
2. Back-propagation
Let the training start time be $t_1$. The back-propagation iteration uses standard BPTT to update the parameters. For the output error $e_j$, define
$$\varepsilon_j(t) \equiv \frac{\partial E}{\partial y_j(t)}, \qquad \varepsilon_j(t_1) = e_j(t_1), \qquad \varepsilon_j(t) = e_j(t) + \sum_{i} w_{ij}\, \delta_i(t+1)$$
where $\delta_j(t) \equiv \partial E / \partial x_j(t)$. For each LSTM block, the $\delta$ values are calculated using the following formulas:

Cell outputs:
$$\varepsilon_c(t) = \sum_{i} w_{ic}\, \delta_i(t+1)$$

Output gates:
$$\delta_\omega(t) = f'(x_\omega) \sum_{c \in C} h(s_c(t))\, \varepsilon_c(t)$$

States:
$$\varepsilon_{s_c}(t) = y_\omega(t)\, h'(s_c(t))\, \varepsilon_c(t) + y_\phi(t+1)\, \varepsilon_{s_c}(t+1)$$

Cells:
$$\delta_c(t) = y_l(t)\, g'(x_c(t))\, \varepsilon_{s_c}(t)$$

Forget gates:
$$\delta_\phi(t) = f'(x_\phi) \sum_{c \in C} s_c(t-1)\, \varepsilon_{s_c}(t)$$

Input gates:
$$\delta_l(t) = f'(x_l) \sum_{c \in C} g(x_c(t))\, \varepsilon_{s_c}(t)$$

The corresponding error values for the weight updates then follow from standard BPTT:
$$\frac{\partial E}{\partial w_{ij}} = \sum_{t} \delta_i(t)\, y_j(t-1)$$
That is, the classifier model is an LSTM neural network model, and inputting the feature vectors $v_1, v_2, \ldots, v_k$ into the pre-trained classifier model further includes:
taking the feature vectors $v_1, v_2, \ldots, v_k$ as the input sequence of the LSTM neural network model, where the model comprises several LSTM layers, each layer formed by connecting several blocks, each block containing one or more recurrently connected memory cells and three further units: an input gate, an output gate, and a forget gate, the forget gate adjusting, through the squashing functions g and h, whether the currently input data are "forgotten" or "remembered";
computing, with the parameters of the LSTM neural network model, the output Y from the input sequence by forward propagation and/or back-propagation;
and obtaining, through the output Y, the expression classification information that corresponds best among the plurality of preset expression classifications.
A simple example: after the LSTM step, the task has become a classification problem; the effect of forward and backward propagation is briefly illustrated here. Suppose there are currently four expressions (happy, angry, sad, joyful) with feature vectors v1, v2, v3, v4, which are input to the LSTM; the final expected output is the four results, i.e., the target output is [1, 2, 3, 4]. After the series of calculations there is an actual output t1, t2, t3, t4; since the desired output is [1, 2, 3, 4], the error is [1-t1, 2-t2, 3-t3, 4-t4], hence the back-propagation process, in which the error is propagated backwards. After propagation, the network has new weights, performs a new forward pass, and obtains a new actual output; when training ends is decided by the preset acceptable error range and the number of network iterations. Once the optimal network weights are available and an input (i.e., the words and sentences typed by the client) is fed into the network, a classification result for each expression is obtained. With 202 expression classes, the training process is the above (forward and backward) process; every input to the network yields 202 results, and whichever value is largest, the expression corresponding to that value is the result. Such is the training process of the LSTM.
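The toy four-expression example corresponds to an ordinary training loop. The sketch below uses assumed data and dimensions, and replaces the patent's raw-target formulation with the idiomatic cross-entropy loss over class labels.

```python
import torch
import torch.nn as nn

# Toy version of the example above: four expression classes (happy, angry,
# sad, joyful), feature vectors v1..v4, trained until the error is acceptable.
torch.manual_seed(0)
feats = torch.randn(4, 1, 768)            # v1..v4 as 1-step sequences (invented)
labels = torch.tensor([0, 1, 2, 3])       # target class per vector

lstm = nn.LSTM(768, 64, batch_first=True)
head = nn.Linear(64, 4)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                   # iteration cap = one stopping rule
    _, (h, _) = lstm(feats)
    logits = head(h[-1])
    loss = loss_fn(logits, labels)        # error between actual and expected output
    opt.zero_grad()
    loss.backward()                       # back-propagate the error (BPTT)
    opt.step()                            # new weights, then a new forward pass
    if loss.item() < 0.05:                # preset acceptable error range
        break

print(logits.argmax(dim=1))               # tensor([0, 1, 2, 3]) once trained
```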
Of course, an emotion algorithm can also be considered throughout the process. For example, the expression types can be classified at multiple levels, with emotion as one level and the expression types possibly related to a given emotion classified under the sub-directory corresponding to that emotion. During expression classification training and subsequent classification, the emotion algorithm can generally be run first to obtain the emotion the input content may express, and the input is then finely classified under the corresponding expression category, which can improve classification accuracy.
Finally, S160:
S160, displaying the expression display information, including pictures and animations, corresponding to the expression required by the user, using User-CF or Item-CF.
Expression display information in the pictures or animations corresponding to an expression is recommended based on collaborative filtering.
(1) Item-CF
Item-based collaborative filtering is similar to user-based collaborative filtering: it uses the preferences (scores) of all users on items or information to find the similarity between items, and then recommends similar items to a user according to the user's historical preference information. Item-based collaborative filtering can be regarded as a degenerate form of association-rule recommendation, but because collaborative filtering takes the user's actual scores into account more and only computes similarity rather than mining frequent sets, it can be considered more accurate and to have higher coverage.
(2) User-CF
The basic principle of user-based collaborative filtering recommendation is to find, according to the preferences (ratings) of all users for items or information, a "neighbor" user group with tastes and preferences similar to the current user's, typically computed with a K-Nearest Neighbor algorithm; recommendations are then made for the current user based on the historical preference information of those K neighbors.
There are many item-based collaborative filtering variants; some are also given in the examples that follow. One example is described here. The user, or a close user group, uses expressions a, b, and so on, each expression carrying several, say N (e.g., N may be 2-4), expression labels (mainly 2-4 keywords). For example, expression a has labels (taga): taga1 (e.g., happy), taga2 (e.g., amitraz), taga3 (e.g., Yao Ming), ..., tagaN. The keywords in taga obtain their corresponding feature vectors through the BERT model:
$$v(taga_1) = [v_{11}, v_{12}, \ldots, v_{1m}]$$
$$v(taga_2) = [v_{21}, v_{22}, \ldots, v_{2m}]$$
$$v(taga_3) = [v_{31}, v_{32}, \ldots, v_{3m}]$$
$$\vdots$$
$$v(taga_N) = [v_{N1}, v_{N2}, \ldots, v_{Nm}]$$
The keyword vectors of taga are weighted and averaged:
$$v(taga) = \Big[\textstyle\sum_{n=1}^{N} v_{n1},\; \sum_{n=1}^{N} v_{n2},\; \ldots,\; \sum_{n=1}^{N} v_{nm}\Big] / N = [V_{11}, V_{12}, \ldots, V_{1m}]$$
the same expression b has tags, and each tag keyword of the tag b passes through the BERT model to obtain a corresponding feature vector V (tag) = [ V ] 21 ,V 22 ,...,V 2m ]Expression c has tags tagc, and each tag keyword of the tags obtains a feature vector V (tagc) = [ V ] of the corresponding word through a BERT model 31 ,V 32 ,...,V 3m ]
The cosine similarity can be used to find the similarity between expressions:
the similarity of expression a and expression b is: cos (taga, tagb):
Figure GDA0002214446800000121
the similarity of expression a and expression c is: cos (taga, tagc):
Figure GDA0002214446800000122
the similarity of expression b and expression c is: cos (tagb, tagc):
Figure GDA0002214446800000123
by calculating a plurality of relevant expression cos values of a certain expression, the expression with the minimum cos value is calculated to be the most similar expression, and the similar expression can be recommended to the user.
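A small sketch of the tag-averaging and cosine comparison just described; random vectors stand in for the BERT keyword vectors, and the names are illustrative.

```python
import torch

def tag_vector(keyword_vecs):            # average of a tag's keyword vectors
    return torch.stack(keyword_vecs).mean(dim=0)

def cos(a, b):                           # cosine similarity of two expressions
    return torch.dot(a, b) / (a.norm() * b.norm())

m = 768
taga = tag_vector([torch.randn(m) for _ in range(3)])  # stand-ins for BERT vectors
tagb = tag_vector([torch.randn(m) for _ in range(3)])
tagc = tag_vector([torch.randn(m) for _ in range(2)])

sims = {"b": cos(taga, tagb), "c": cos(taga, tagc)}
most_similar = max(sims, key=lambda k: sims[k])  # largest cosine = most similar
print(most_similar, sims[most_similar].item())
```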
(3) Content-based recommendation
Content-based recommendation was the most widely used mechanism when recommendation engines first appeared. Its core idea is to discover the relevance of an item or content from its metadata and then recommend similar items to a user based on the user's past preference records. It is mainly applied to informational items: tags are extracted for each item as its keywords, and the similarity of two items can be evaluated through their tags.
The advantages of this recommendation mechanism are:
A. It is easy to implement and requires no user data, so there are no sparsity or cold-start problems.
B. Recommending by the item's own features avoids the over-recommendation problem.
(4) Recommendation based on association rules
Recommendation based on association rules is common in e-commerce systems and has proven effective. Intuitively, users who buy certain items are more inclined to buy certain other items. The primary goal of an association-rule recommender is to mine association rules, that is, sets of items purchased simultaneously by many users, within which the items can be recommended to one another; a minimal sketch follows this list. Existing association-rule mining algorithms mainly derive from two algorithms, Apriori and FP-Growth.
Recommenders based on association rules generally have higher conversion rates, because once a user has bought several items of a frequent set, the likelihood of buying the remaining items of that set is higher. The drawbacks of this mechanism are:
A. The computation is heavy, but it can be performed offline, so the impact is small.
B. Because user data are used, cold-start and sparsity problems are unavoidable.
C. Popular items are easily over-recommended.
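For illustration, the core of association-rule mining can be reduced to counting item sets that co-occur across users. This minimal frequent-pair sketch, over invented data, is a stand-in for full Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations

# Expressions each user has used (invented data)
histories = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c"},
]

pair_counts = Counter()
for items in histories:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

min_support = 3  # pairs used together by at least 3 users
frequent = {p for p, n in pair_counts.items() if n >= min_support}
print(frequent)  # {('a', 'b'), ('b', 'c')} -> recommend b to users of a, etc.
```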
In this example, one of the images, pictures, or animations corresponding to the expression category can be selected and then displayed to the user.
Application example
Fig. 5 is a diagram of the application environment of the dynamic expression generating method in one embodiment. Referring to Fig. 5, the method is applied to a dynamic expression generating system comprising a terminal 110 and a server 120 connected through a network. The terminal 110 may be a mobile terminal: at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of several servers. The terminal 110 can enter the expression input panel in a conversation page, detect the characters or words typed in the expression input panel, and collect these data; the terminal 110 can further obtain expression pictures exactly matching the expression data by extracting features from the characters or words and finding the finally belonging expression among the emotion classes, randomly typeset the expression data and pictures into thumbnails or animations of expression packages shown in the expression input panel, and, according to various recommendation algorithms, display one or more thumbnails and/or animations as a conversation message in the conversation page.
It should be noted that the above application environment is only an example. In some embodiments the terminal 110 may instead send the input-word operation to the server 120; the server 120 extracts features from the characters or words, finds the finally belonging expression among the emotion classes, screens the expression pictures exactly matching the expression data for recommendation, and typesets the expression data and pictures at random into thumbnails or animations of expression packages (initial use may be random typesetting, and as the user's usage habits accumulate, the user's preferred expressions can also be recommended with a recommendation algorithm). Finally, the synthesized dynamic-expression thumbnails are fed back to the terminal 110, which adds them to the expression input panel; when the terminal 110 detects a trigger operation on an expression thumbnail in the panel, it can pull the corresponding dynamic expression from the server 120 and display it as a conversation message in the conversation page. Of course, the terminal 110 may also present several images to the user, recommend through various recommendation algorithms, and display in the conversation page the one the user selects.
Referring to Figs. 6A-6C, in one embodiment a dynamic expression generating method is provided. This embodiment is mainly illustrated by applying the method to the terminal 110 in Fig. 5. The dynamic expression generating method specifically comprises the following steps:
s1, entering an expression input panel in a conversation page.
The session page is a page for displaying session messages, for example, may be a page for displaying session messages sent by both parties of a session in a social application. The social application is an application for performing network social interaction based on a social network, the social application generally has an instant messaging function, the social application can be an instant messaging application, and the session message can be an instant session message.
An expression is an image with a meaning-expressing function, reflecting the mental activity, emotion, or specific semantics of the user who sends it. Expressions include static expressions and dynamic expressions. Typically, a static expression is a single still picture, which may be in the PNG (Portable Network Graphics) file format, while a dynamic expression is an animation synthesized from multiple frames, which may be in the GIF (Graphics Interchange Format) file format.
The expression input panel is a container for storing expression thumbnails corresponding to each expression, and a user can add a new expression in the expression input panel. The expression input panel may also include multiple tabs for accommodating expression thumbnails corresponding to different categories or different sources of expressions, such as a common tab for accommodating expression thumbnails corresponding to expressions designed by a developer of a social application, a collection tab for accommodating expression thumbnails corresponding to expressions of a current user collection, and an add tab for downloading, saving, or importing new expressions, and so forth. In general, in a conversation page, an expression input panel can be switched back and forth with a text input panel, a user can input text in a text input box when switching to the text input panel and send a text message to a communication counterpart, and when switching to the expression input panel, the user sends an expression message to the communication counterpart by inputting an expression. The expression input panel and the text input panel may be collectively referred to as a conversation panel.
Specifically, the terminal may display an expression input icon in the conversation page and, when a trigger event on the icon is detected, display the expression input panel in the conversation page and enter it. When the terminal detects a trigger operation by the current user on any expression thumbnail in the panel, it can acquire the corresponding expression, send it to the counterpart terminal logged in with the communication counterpart's account, and display it in the current conversation page. In this embodiment, the trigger operation on an expression thumbnail may be a click, a press, a move, a slide, or the like.
In this example, a different icon frame or icon is used for recommendation to the user; although expression thumbnails are not shown in the example, it is not excluded that the synthesized icon can be placed among the expression thumbnails.
S2, the terminal receives the chat corpus input by the user, such as "so sad (好难过)", and performs feature extraction with an open chat corpus:
the corpus is segmented into words and stop words are removed (as sketched below);
features are extracted from the corpus with BERT, giving the corresponding feature vectors $v_1, v_2, \ldots, v_k$, where k is the total number of corpus tokens.
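A sketch of this segmentation and stop-word step; the jieba segmenter and the tiny stop-word list are illustrative assumptions, not the patent's implementation.

```python
import jieba  # a common Chinese word segmenter (an assumed choice)

STOP_WORDS = {"的", "了", "吗", "啊"}  # tiny illustrative stop-word list

def preprocess(text: str) -> list[str]:
    """Segment the chat corpus into words and drop stop words,
    producing the token sequence handed to the BERT model."""
    return [w for w in jieba.lcut(text) if w not in STOP_WORDS]

print(preprocess("好难过啊"))  # e.g. ['好', '难过']
```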
S3, the terminal inputs the BERT-extracted feature vectors into the LSTM and obtains the corresponding classification result among the 202 expressions with a softmax function;
the terminal receives the words and characters input by the user, extracts features from the corpus with BERT, obtaining the features related to "sad", and LSTM matching then yields the emotion classification related to sadness.
S4: the expression pictures exactly matching the expression data are screened and recommended, and the expression data and pictures are typeset at random into thumbnails or animations of expression packages displayed in the expression input panel. (Initial use may be random typesetting; as the user's usage habits accumulate, the user's preferred expressions can also be recommended with a recommendation algorithm.)
The terminal presets several expression pictures for the different emotion classes and applies a specific configuration method for synthesizing the pictures into animations. For example, matching pictures (e.g., crying-related pictures) are found for the "sad" class, and thumbnails or animations of expression packages are generated from the crying-related pictures according to a preset configuration method.
S5, the thumbnails or animations of the expression packages are displayed on the user input interface in a tiled arrangement, and after the user selects an animation it is shown directly in the conversation box. Given the large number of expression packages and the limited display of a mobile terminal, the expression pictures to display to the user can be recommended with User-CF or Item-CF, so that the user finds a picture matching his or her individuality at first sight and is spared scrolling back through pictures as much as possible.
The present invention also provides an expression input device (refer to fig. 7) based on BERT technology, which comprises:
a corpus feature BERT model 110, which performs feature extraction training on the characters/words the user wants to express, receives corpus information input by the user, and performs feature extraction to obtain the corresponding feature vectors $v_1, v_2, \ldots, v_k$, where k is the total number of words obtained after segmenting the whole corpus;
a classifier model 120, which classifies a plurality of preset expressions, pre-trains the classification of the expressions according to features, receives the input feature vectors $v_1, v_2, \ldots, v_k$, normalizes their class probabilities with a softmax function, and finds the finally belonging expression among the emotion classes;
an expression display device 130, which displays the expression display information, including the pictures and animations, corresponding to the expression required by the user.
The corpus feature BERT model 110 further includes:
an input data format preprocessing module 111, which performs corpus word processing, including word segmentation and stop-word removal, in units of characters/words, and sets the input data format required by the corpus feature BERT model;
a feature extraction processing module 112, which receives corpus and text data input in the preset input data format and performs feature extraction, obtaining the corresponding feature vectors $v_1, v_2, \ldots, v_k$, where k is the total number of words obtained after segmenting the whole corpus.
The classifier model is an LSTM neural network model, further comprising:
several LSTM layers 121, each formed by connecting several blocks, each block containing one or more recurrently connected memory cells and three further units: an input gate, an output gate, and a forget gate, the forget gate adjusting, through the squashing functions g and h, whether the currently input data are "forgotten" or "remembered";
a forward/back-propagation calculation module 122, which computes the output Y from the input sequence with the parameters of the LSTM neural network model by forward propagation and/or back-propagation;
an emotion classification module 123, which obtains, through the output Y, the expression classification information that corresponds best among the plurality of preset expression classifications.
The display device 130 further comprises:
a picture/animation forming module 131, which, from the final expression classification data corresponding to the input, screens and recommends the expression pictures exactly matching the expression data, typesets the expression data and pictures at random into thumbnails or animations of expression packages, and displays them in the expression input panel;
a recommendation module 132, which recommends the thumbnails or animations to the user terminal through a recommendation algorithm.
Expression recommendation can be implemented specifically with the following methods.
In the first method, expressions are displayed to the user directly according to how often the user has used them; for example, if within the "happy" category the user has used the "xiaohuang (little yellow) avatar" expression the most, that expression can be recommended and displayed to the user.
The second method is collaborative filtering; a simple example follows.
In the user-based collaborative filtering method there are user A, user B, and user C, expressions a, b, and n, and the click counts input by the users; each user's usage of each expression (i.e., click count) is shown in the following table.
         Expression a   Expression b   Expression n
User A        8              5              3
User B        7             10              2
User C        4              2              1
User similarity can be illustrated here with Euclidean distance, clustering, and other distance-similarity methods. The Euclidean distance formula is as follows: in n-dimensional space, the distance between points x and y is
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
The distance between the user and neighboring users is computed; the neighbor with the smallest distance is found, and the expressions used by that neighbor are recommended to the user.
Taking user A as an example, a small recommendation calculation for user A is described concretely.
For user A (userA) and user B (userB):
$$d(userA, userB) = \sqrt{(8-7)^2 + (5-10)^2 + (3-2)^2} = \sqrt{27} \approx 5.20$$
For user A (userA) and user C (userC):
$$d(userA, userC) = \sqrt{(8-4)^2 + (5-2)^2 + (3-1)^2} = \sqrt{29} \approx 5.39$$
For user B (userB) and user C (userC):
$$d(userB, userC) = \sqrt{(7-4)^2 + (10-2)^2 + (2-1)^2} = \sqrt{74} \approx 8.60$$
Since the distance between user A and user B is the smallest, the expressions used by user A may be recommended to user B, or the expressions used by user B recommended to user A.
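The neighbor computation above can be reproduced with a short sketch over the click-count table.

```python
import math

clicks = {  # rows of the click-count table above
    "A": [8, 5, 3],
    "B": [7, 10, 2],
    "C": [4, 2, 1],
}

def dist(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clicks[u], clicks[v])))

target = "A"
neighbors = sorted((dist(target, u), u) for u in clicks if u != target)
print(neighbors)  # [(~5.20, 'B'), (~5.39, 'C')] -> B is A's nearest neighbor
```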
If the clustering method is used instead, users A, B, ..., N are clustered so that users who favor the same items (here, the expressions they use) are grouped together, and the expressions used by users in the same cluster can be recommended to one another.
The collaborative filtering method based on commodities is characterized in that expression a, expression b, expression n has 2-4 expression labels (mainly 2-4 keywords) for each expression, for example, expression a has labels (taga) of happy, amimidine and Yao Ming, and the keywords in taga obtain corresponding feature vectors through a BERT model:
v (happy) = [ v 11 ,v 12 ,...,v 1m ]
v (amimity) = [ v 21 ,v 22 ,..,v 2m ]
v (pyridine) = [ v 31 ,v 32 ,...,v 3m ]
Weighting and averaging each keyword vector of taga:
v(taga)=[v 11 +v 21 +v 31 ,v 12 +v 22 +v 32 ,...,v 1m +v 2m +v 3m ]/m
=[V 11 ,V 12 ,...,V 1m ]
the same expression b has tags, and each tag keyword of the tag b passes through the BERT model to obtain a corresponding feature vector V (tag) = [ V ] 21 ,V 22 ,...,V 2m ]Expression c has tags tagc, and each tag keyword of the tags obtains a feature vector V (tagc) = [ V ] of the corresponding word through a BERT model 31 ,V 32 ,...,V 3m ]
Cosine similarity can then be used to find the similarity between expressions.
The similarity of expressions a and b is cos(taga, tagb):
cos(taga, tagb) = \frac{\sum_{i=1}^{m} V_{1i} V_{2i}}{\sqrt{\sum_{i=1}^{m} V_{1i}^{2}} \sqrt{\sum_{i=1}^{m} V_{2i}^{2}}}
The similarity of expressions a and c is cos(taga, tagc):
cos(taga, tagc) = \frac{\sum_{i=1}^{m} V_{1i} V_{3i}}{\sqrt{\sum_{i=1}^{m} V_{1i}^{2}} \sqrt{\sum_{i=1}^{m} V_{3i}^{2}}}
The similarity of expressions b and c is cos(tagb, tagc):
cos(tagb, tagc) = \frac{\sum_{i=1}^{m} V_{2i} V_{3i}}{\sqrt{\sum_{i=1}^{m} V_{2i}^{2}} \sqrt{\sum_{i=1}^{m} V_{3i}^{2}}}
In practical applications, the expression with the highest cosine similarity (equivalently, the smallest cosine distance) is the most similar, and that similar expression can be recommended to the user.
Other similarity methods, such as clustering, can also be used to compute expression similarity; the expressions found to be most similar are then recommended to the user.
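A minimal sketch of the item-based variant, assuming keyword vectors have already been produced by the BERT model (the toy 4-dimensional vectors below are placeholders for the m-dimensional BERT outputs):

import numpy as np

def tag_vector(keyword_vectors):
    # Average the keyword vectors of one expression's label set, as above.
    return np.mean(keyword_vectors, axis=0)

def cosine_similarity(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy stand-ins for the BERT keyword vectors of taga, tagb, tagc.
v_taga = tag_vector(np.array([[0.9, 0.1, 0.3, 0.2],
                              [0.8, 0.2, 0.1, 0.4],
                              [0.7, 0.3, 0.2, 0.1]]))
v_tagb = tag_vector(np.array([[0.85, 0.15, 0.25, 0.30]]))
v_tagc = tag_vector(np.array([[0.10, 0.90, 0.80, 0.05]]))

# The expression whose tag vector is most cosine-similar to taga is the
# most similar expression and can be recommended to the user.
sims = {"b": cosine_similarity(v_taga, v_tagb),
        "c": cosine_similarity(v_taga, v_tagc)}
print(max(sims, key=sims.get))  # -> b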
A computer device may specifically be the terminal 110 in fig. 5. The computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the dynamic expression generating method; the internal memory may likewise store such a computer program. The display screen may be a liquid crystal or electronic ink display. The input device may be a touch layer covering the display screen, keys, a trackball, or a touch pad on the housing of the computer device, or an external keyboard, touch pad, or mouse. The camera of the computer device may be a front or rear camera, and the sound collecting device may be a microphone.
Those skilled in the art will appreciate that the structure described above is only a partial structure related to the solution of the present application and does not limit the computer device to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently. In one embodiment, the dynamic expression generating apparatus provided in the present application may be implemented as a computer program that runs on a computer device as described above. The memory of the computer device may store the program modules constituting the dynamic expression generating apparatus, and the computer program composed of these modules causes the processor to execute the steps of the dynamic expression generating method of the embodiments described in this specification.
In one embodiment, a computer device is provided that includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the dynamic expression generating method described above. The steps of the dynamic expression generating method herein may be the steps in the dynamic expression generating method of the above embodiments.
In one embodiment, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the dynamic expression generating method described above. The steps of the dynamic expression generating method herein may be the steps in the dynamic expression generating method of the above embodiments.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The foregoing description of the invention is presented for purposes of illustration and description and is not intended to be limiting. Those skilled in the art may make simple deductions, modifications, or substitutions based on the idea of the invention.

Claims (14)

1. An expression input method based on BERT technology, characterized by comprising the following steps:
S1: pre-training a corpus feature BERT model and performing feature-extraction training on the characters/words a user wishes to express;
S2: pre-training a classifier model, classifying a plurality of preset expressions, and pre-training the classification of expressions according to features;
S3: when corpus information input by a user is received, performing corpus processing, including word segmentation and stop-word removal, in units of characters/words, and arranging the result into the input data format required by the corpus feature BERT model;
S4: inputting the processed corpus into the corpus feature BERT model for feature extraction to obtain corresponding feature vectors v_1, v_2, ..., v_k, where k is the total number of words obtained after segmenting the whole corpus;
S5: inputting the feature vectors v_1, v_2, ..., v_k into the pre-trained classifier model, normalizing the class probabilities with a softmax function, and finding the final expression class within the emotion classification;
S6: displaying the pictures and animations corresponding to the expression the user requires, using the user's historical expressions via User-CF or Item-CF;
S6 further comprises:
according to the preferences of all users for items or information, finding a "neighbor" user group whose taste and preference are similar to the current user's by means of a K-nearest-neighbor algorithm, then recommending to the current user based on the historical preference information of those K neighbors;
for the expressions a, b, ..., n used by the current user or a similar user group, each expression has N expression tags; expression a has tags (taga): taga1, taga2, taga3, ..., tagaN, and the keywords in taga obtain their corresponding feature vectors through the BERT model:
v(taga1) = [v_{11}, v_{12}, ..., v_{1m}]
v(taga2) = [v_{21}, v_{22}, ..., v_{2m}]
v(taga3) = [v_{31}, v_{32}, ..., v_{3m}]
...
v(tagaN) = [v_{N1}, v_{N2}, ..., v_{Nm}]
weighting and averaging each keyword vector of taga:
v(taga) = [v_{11}+v_{21}+...+v_{N1}, v_{12}+v_{22}+...+v_{N2}, ..., v_{1m}+v_{2m}+...+v_{Nm}] / N
        = [V_{11}, V_{12}, ..., V_{1m}]
similarly, expression b has a tag set tagb, each keyword of which obtains through the BERT model a corresponding feature vector V(tagb) = [V_{21}, V_{22}, ..., V_{2m}]; expression c has a tag set tagc, each keyword of which obtains the corresponding feature vector V(tagc) = [V_{31}, V_{32}, ..., V_{3m}];
and so on; cosine similarity can then be used to find the similarity between expressions:
the similarity of expressions a and b is cos(taga, tagb):
cos(taga, tagb) = \frac{\sum_{i=1}^{m} V_{1i} V_{2i}}{\sqrt{\sum_{i=1}^{m} V_{1i}^{2}} \sqrt{\sum_{i=1}^{m} V_{2i}^{2}}}
the similarity of expressions a and c is cos(taga, tagc):
cos(taga, tagc) = \frac{\sum_{i=1}^{m} V_{1i} V_{3i}}{\sqrt{\sum_{i=1}^{m} V_{1i}^{2}} \sqrt{\sum_{i=1}^{m} V_{3i}^{2}}}
the similarity of expressions b and c is cos(tagb, tagc):
cos(tagb, tagc) = \frac{\sum_{i=1}^{m} V_{2i} V_{3i}}{\sqrt{\sum_{i=1}^{m} V_{2i}^{2}} \sqrt{\sum_{i=1}^{m} V_{3i}^{2}}}
by computing the cosine values between a given expression and a number of related expressions, the expression with the highest cosine similarity is identified as the most similar, and that similar expression can be recommended to the user.
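By way of non-limiting illustration of steps S4-S5 of claim 1, a minimal Python sketch of the softmax normalization over classifier scores (the category names and scores below are hypothetical, not taken from the patent):

import numpy as np

def softmax(z):
    # Normalize class scores into probabilities, as in step S5.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical output of a classifier (e.g. the LSTM of claim 5) applied
# to the BERT feature vectors v_1..v_k: one score per emotion category.
emotion_classes = ["happy", "sad", "angry", "surprised"]
class_scores = np.array([2.1, 0.3, -0.5, 0.9])

probs = softmax(class_scores)
print(emotion_classes[int(np.argmax(probs))])  # -> happy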
2. The expression input method of claim 1, wherein the pre-trained corpus feature BERT model in S3 is a BERT model in which the representation of each character/word in a sentence X = x_1, x_2, ..., x_n is generated by summing three representations (token embedding, segment embedding, and position embedding), and these representations are optimized using Masked Language Model and Next Sentence Prediction as optimization targets.
3. The expression input method of claim 1, wherein the input data format set for the corpus feature BERT model in S3 further satisfies:
word segmentation uses a WordPiece vocabulary and its word-piece embeddings, with split word pieces marked by ##;
learned positional embeddings are used, supporting sequence lengths of up to 512 tokens;
the first token of every sequence is always a special classification embedding ([CLS]); the final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks, and for non-classification tasks this vector is ignored;
sentence pairs are packed into a single sequence and distinguished in two ways: first, they are separated by a special token [SEP]; second, a learned sentence A embedding is added to every token of the first sentence and a sentence B embedding to every token of the second sentence;
for single-sentence input, only the sentence A embedding is used.
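A sketch of the input format of claim 3, assuming the HuggingFace transformers tokenizer (a library choice of ours, not named in the patent):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Pack a sentence pair into one sequence: [CLS] A [SEP] B [SEP].
# [CLS] always heads the sequence; token_type_ids select the learned
# sentence A / sentence B embeddings (0 for A's tokens, 1 for B's).
# Position embeddings are added inside the model, up to 512 tokens.
enc = tokenizer("the weather is lovely today", "i feel very happy",
                max_length=512, truncation=True)

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(enc["token_type_ids"])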
4. The expression input method of claim 1, wherein classifying the plurality of preset expressions further comprises:
the classifier module may adopt algorithms including CBOW and LSTM for classification, wherein the expression categories are classified in advance according to the user's emotional expression and may be set as extensible later.
5. The expression input method of claim 1, wherein the classifier model is an LSTM neural network model, and inputting the feature vectors v_1, v_2, ..., v_k into the pre-trained classifier model further comprises:
taking the feature vectors v_1, v_2, ..., v_k as the input sequence of the LSTM neural network model; the LSTM neural network model comprises a plurality of LSTM layers, each formed by connecting a plurality of blocks, and each block contains one or more cyclically connected memory cells plus three further units: an input gate, an output gate, and a forget gate, where the forget gate uses the activation functions σ_g and σ_h to adjust whether to "forget" or "remember" the currently input data;
computing an output Y from the input sequence by forward propagation and/or backward propagation through the parameters of the LSTM neural network model,
and obtaining from the output Y the best-matching expression classification among the plurality of preset expression classifications.
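One way the classifier of claim 5 could look in Python with PyTorch (layer sizes and class count are hypothetical; the claim does not fix them):

import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    # LSTM over the BERT feature vectors v_1..v_k, then a softmax over
    # the preset expression categories.
    def __init__(self, feature_dim=768, hidden_dim=256, num_classes=8):
        super().__init__()
        # Each LSTM cell carries input, output and forget gates that
        # decide what to "remember" or "forget" along the sequence.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):           # (batch, k, feature_dim)
        _, (h_n, _) = self.lstm(features)  # forward propagation
        return torch.softmax(self.fc(h_n[-1]), dim=-1)  # output Y

model = ExpressionClassifier()
probs = model(torch.randn(1, 10, 768))     # k = 10 feature vectors
print(int(probs.argmax()))                 # best-matching expression class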
6. The expression input method of claim 1, wherein S6 further comprises:
using all users' preferences for items or information, the similarity between items or between users and items can be found by a similarity algorithm, including a clustering algorithm, and expressions can be recommended to the user accordingly.
7. The method of claim 6, wherein calculating the similarity between users comprises:
finding distance similarity using Euclidean distance or clustering, where the Euclidean distance formula is: in n-dimensional space, the distance d(x, y) between points x and y is
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
calculating the distance between the user and each neighboring user;
and finding the neighboring user with the smallest distance and recommending the expressions used by that neighbor to the user.
8. A BERT technology based expression input apparatus for performing the expression input method of any of claims 1-7, comprising:
a corpus feature BERT model that performs feature-extraction training on the characters/words a user wishes to express, receives corpus information input by the user, performs feature extraction, and obtains corresponding feature vectors v_1, v_2, ..., v_k, where k is the total number of words obtained after segmenting the whole corpus;
a classifier model that classifies a plurality of preset expressions, is pre-trained to classify expressions according to features, receives the input feature vectors v_1, v_2, ..., v_k, normalizes the class probabilities with a softmax function, and finds the final expression class within the emotion classification;
and an expression display device that displays the pictures and animations corresponding to the expression the user requires.
9. The expression input device of claim 8, wherein the corpus feature BERT model further comprises:
an input data format preprocessing module: performs corpus processing, including word segmentation and stop-word removal, in units of characters/words, and arranges the result into the input data format required by the corpus feature BERT model;
a feature extraction processing module: receives corpus and text data input in the preset input data format, performs feature extraction, and obtains corresponding feature vectors v_1, v_2, ..., v_k, where k is the total number of words obtained after segmenting the whole corpus.
10. The expression input device of claim 8, wherein the classifier model is an LSTM neural network model, further comprising:
the LSTM neural network model comprises a plurality of LSTM layers, each formed by connecting a plurality of blocks, and each block contains one or more cyclically connected memory cells plus three further units: an input gate, an output gate, and a forget gate, where the forget gate uses the activation functions σ_g and σ_h to adjust whether to "forget" or "remember" the currently input data;
a forward propagation/backward propagation calculation module that computes an output Y from the input sequence by forward propagation and/or backward propagation through the parameters of the LSTM neural network model,
and an emotion classification module that obtains from the output Y the best-matching expression classification among the plurality of preset expression classifications.
11. The expression input device of claim 8, wherein the expression presentation device further comprises:
a picture/motion picture forming module: uses the final expression classification corresponding to the input to screen and recommend expression pictures accurately matching the classification data, randomly typesets the expression classification data and pictures into a thumbnail or animated image of an expression pack, and displays it in the expression input panel.
12. The expression input device of claim 11, wherein the expression presentation device further comprises:
a recommendation module: recommends the thumbnail or animated image to the user terminal through a recommendation algorithm.
13. A computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the input method of any one of claims 1 to 7.
14. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the input method of any one of claims 1 to 5.
CN201910679545.2A 2019-07-25 2019-07-25 Expression input method and device based on BERT technology Active CN110543242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910679545.2A CN110543242B (en) 2019-07-25 2019-07-25 Expression input method and device based on BERT technology


Publications (2)

Publication Number Publication Date
CN110543242A CN110543242A (en) 2019-12-06
CN110543242B true CN110543242B (en) 2023-07-04

Family

ID=68710327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910679545.2A Active CN110543242B (en) 2019-07-25 2019-07-25 Expression input method and device based on BERT technology

Country Status (1)

Country Link
CN (1) CN110543242B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078882A (en) * 2019-12-13 2020-04-28 北京工业大学 Text emotion measuring method and device
US11881210B2 (en) 2020-05-05 2024-01-23 Google Llc Speech synthesis prosody using a BERT model
CN112182373B (en) * 2020-09-25 2023-06-02 中国人民大学 Sexualization search method based on context representation learning
CN112270187A (en) * 2020-11-05 2021-01-26 中山大学 Bert-LSTM-based rumor detection model
CN112860893B (en) * 2021-02-08 2023-02-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112883896B (en) * 2021-03-10 2022-10-11 山东大学 Micro-expression detection method based on BERT network
CN113589957A (en) * 2021-07-30 2021-11-02 广州赛宸信息技术有限公司 Method and system for rapidly inputting professional words of laws and regulations
CN114553810A (en) * 2022-02-22 2022-05-27 广州博冠信息科技有限公司 Expression picture synthesis method and device and electronic equipment
CN114780190B (en) * 2022-04-13 2023-12-22 脸萌有限公司 Message processing method, device, electronic equipment and storage medium
CN114818659B (en) * 2022-06-29 2022-09-23 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
WO2018057918A1 (en) * 2016-09-23 2018-03-29 Ehr Command Center, Llc Data command center visual display system
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning
CN109992648A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 The word-based depth text matching technique and device for migrating study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11836611B2 (en) * 2017-07-25 2023-12-05 University Of Massachusetts Method for meta-level continual learning


Also Published As

Publication number Publication date
CN110543242A (en) 2019-12-06

Similar Documents

Publication Publication Date Title
CN110543242B (en) Expression input method and device based on BERT technology
CN111291181B (en) Representation learning for input classification via topic sparse self-encoder and entity embedding
CN107992531B (en) News personalized intelligent recommendation method and system based on deep learning
Bouveyron et al. Model-based clustering and classification for data science: with applications in R
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
Wang et al. Sentiment analysis for social media images
CN109753566A (en) The model training method of cross-cutting sentiment analysis based on convolutional neural networks
CN110765769B (en) Clause feature-based entity attribute dependency emotion analysis method
CN110196945A (en) A kind of microblog users age prediction technique merged based on LSTM with LeNet
Mehta et al. Sentiment analysis of tweets using supervised learning algorithms
US9129216B1 (en) System, method and apparatus for computer aided association of relevant images with text
Huang et al. C-Rnn: a fine-grained language model for image captioning
CN113557521A (en) System and method for extracting temporal information from animated media content items using machine learning
Chen et al. Deep neural networks for multi-class sentiment classification
Sunarya et al. Comparison of accuracy between convolutional neural networks and Naïve Bayes Classifiers in sentiment analysis on Twitter
Vie et al. Using posters to recommend anime and mangas in a cold-start scenario
Tamil Priya et al. Transfer learning techniques for emotion classification on visual features of images in the deep learning network
Chaudhuri Visual and text sentiment analysis through hierarchical deep learning networks
Mohades Deilami et al. Contextualized multidimensional personality recognition using combination of deep neural network and ensemble learning
Priya et al. Affective emotion classification using feature vector of image based on visual concepts
Suman et al. An attention based multi-modal gender identification system for social media users
Chen et al. Weighted co-training for cross-domain image sentiment classification
Meddeb et al. Deep learning based semantic approach for Arabic textual documents recommendation
CN111259228A (en) Personalized news recommendation method based on big data deep learning
US20230237093A1 (en) Video recommender system by knowledge based multi-modal graph neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant