CN111372141B - Expression image generation method and device and electronic equipment

Info

Publication number: CN111372141B (application number CN202010193410.8A; earlier published as CN111372141A)
Authority: CN (China)
Prior art keywords: image, expression, barrage, expression image, processed
Inventors: 黄海兵, 刘水
Assignee / applicant: Tencent Technology Shenzhen Co Ltd
Original language: Chinese (zh)
Legal status: Active (granted)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie

Abstract

The application discloses an expression image generation method and device and an electronic device, and relates to the technical field of the Internet. The method comprises the following steps: acquiring input barrage text information; identifying the target semantic type to which the content of the barrage text information belongs; acquiring an expression image belonging to the target semantic type as an image to be processed; and adding the barrage text information to the image to be processed to generate a barrage expression image. Through this design, both the forms of barrage information and the barrage expression images available for selection are enriched.

Description

Expression image generation method and device and electronic equipment
Technical Field
The application relates to the technical field of the Internet, and in particular to an expression image generation method, an expression image generation apparatus, and an electronic device.
Background
Currently, applications with video playback capabilities typically support interaction through bullet screen information. However, existing applications with a video playing function usually support only bullet screen information in text form, which is monotonous.
Disclosure of Invention
The application provides an expression image generation method and device and an electronic device, so as to alleviate the above problems.
In a first aspect, an embodiment of the present application provides a method for generating an expression image, including: acquiring input barrage text information; identifying a target semantic type to which the content of the barrage text information belongs; acquiring an expression image belonging to a target semantic type as an image to be processed; and adding the barrage text information into the image to be processed to generate a barrage expression image.
In a second aspect, an embodiment of the present application provides a method for generating an expression image, including: acquiring input barrage text information; displaying a barrage expression image on a video playing interface, wherein the barrage expression image comprises input barrage text information; responding to the selection operation, and determining a barrage expression image corresponding to the selection operation from the barrage expression images displayed currently; and displaying the barrage expression image corresponding to the selection operation in the barrage.
In a third aspect, an embodiment of the present application provides an expression image generating apparatus, including: the device comprises a first acquisition module, an identification module, a second acquisition module and an image generation module. The first acquisition module is used for acquiring the input barrage text information. The identification module is used for identifying the target semantic type to which the content of the barrage text information belongs. The second acquisition module is used for acquiring the expression image belonging to the target semantic type as an image to be processed. The image generation module is used for adding the barrage text information into the image to be processed to generate barrage expression images.
In a fourth aspect, an embodiment of the present application provides an expression image generating apparatus, including: the device comprises a first acquisition module, a first display module, a selection module and a second display module. The first acquisition module is used for acquiring the input barrage text information. The first display module is used for displaying the barrage expression image on the video playing interface, wherein the barrage expression image comprises input barrage text information. And the selection module is used for responding to the selection operation and determining the barrage expression image corresponding to the selection operation from the barrage expression images currently displayed. The second display module is used for displaying the barrage expression image corresponding to the selection operation in the barrage.
In a fifth aspect, embodiments of the present application provide an electronic device, including: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the methods described above.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, the program code being callable by a processor to perform the method described above.
Compared with the prior art, the scheme provided by the application acquires the input barrage text information, identifies the target semantic type to which its content belongs, acquires an expression image belonging to the target semantic type as the image to be processed, and adds the barrage text information to the image to be processed to generate a barrage expression image. This enriches the types of barrage information and provides rich barrage expression images for users to select, thereby increasing users' enthusiasm for participating in barrage interaction, improving user experience, and improving user viscosity.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic view of an application environment suitable for use in embodiments of the present application.
Fig. 2 shows a flowchart of an expression image generating method according to an embodiment of the present application.
Fig. 3 is an interactive flowchart of an expression image generating method according to another embodiment of the present application.
Fig. 4A is a schematic architecture diagram of a text classification model according to an embodiment of the present application.
FIG. 4B is a schematic diagram of a model architecture for converting words into vectors according to an embodiment of the present application.
Fig. 5 is another flow chart of the method for generating the expression image shown in fig. 2.
Fig. 6 is a schematic architecture diagram of an image classification model according to an embodiment of the present application.
Fig. 7 is a schematic diagram illustrating a sub-step of step S103 shown in fig. 2.
Fig. 8 is a schematic diagram of another sub-step of step S103 shown in fig. 2.
Fig. 9 is an interface schematic diagram of a video playing application according to an embodiment of the present application.
Fig. 10 is a schematic diagram of a further sub-step of step S103 shown in fig. 2.
Fig. 11 is a schematic diagram illustrating a sub-step of step S104 shown in fig. 2.
Fig. 12 is a schematic diagram of another sub-step of step S104 shown in fig. 2.
Fig. 13 is an interface schematic diagram of another video playing application according to an embodiment of the present application.
Fig. 14 is a schematic flow chart of another method for generating the expression image shown in fig. 2.
Fig. 15 is an interface display schematic diagram of the flow shown in fig. 14.
Fig. 16 is a flowchart of an expression image generating method according to another embodiment of the present application.
Fig. 17 is a block diagram of an expression image generating apparatus according to an embodiment of the present application.
Fig. 18 is a block diagram of an expression image generating apparatus according to another embodiment of the present application.
Fig. 19 is a block diagram of an electronic device provided in an embodiment of the present application.
Fig. 20 shows a storage unit for storing or carrying program code for implementing the expression image generation method according to an embodiment of the application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Currently, applications with a video playing function generally support barrage information interaction, but the barrage information is generally in text form only, and its form is limited. In some video playing applications of the related art, a user may select a desired expression image from downloaded or collected expression images as bullet screen information and send it. However, in this way, the bullet screen expression images available for the user to select are very limited, resulting in poor user experience and poor user viscosity.
Through long-term research, the inventor provides a method, a device and electronic equipment for generating expression images, which enriches the form of barrage information on one hand and provides rich barrage expression images for selection on the other hand, so that user experience can be improved and user viscosity can be improved. This will be described in detail below.
Referring to fig. 1, fig. 1 is a schematic view of an application environment suitable for an embodiment of the present application. The server 100 may be communicatively connected to the terminal device 200 through a network, in which the terminal device 200 runs a client 210, and the terminal device 200 may log in to the server 100 through the client 210, and provide corresponding services to the user through cooperation with the server 100.
The server 100 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers. The terminal device 200 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a personal computer (Personal Computer, PC), a wearable device, or the like. The client 210 may be any application program supporting transmission of an emoticon, for example, an instant messaging application, a video playing application, a page browsing platform, etc., which is not limited in this embodiment.
The method and apparatus for generating an emoticon provided in the embodiments of the present application are applied to an electronic device, which may be the server 100 or the terminal device 200 shown in fig. 1, where when the electronic device is the terminal device 200, the method for generating an emoticon may be performed by the client 210 in the terminal device 200. In this embodiment, the client 210 may be a video playing application, or other application with a video playing function. For example, if the browser client has a video playing function, the browser client may be the client 210. As another example, the social platform application has a video playing function, and the social platform application may act as the client 210.
Referring to fig. 2, fig. 2 is a flowchart of an expression image generating method according to an embodiment of the present application, which is applied to an electronic device. This embodiment takes the terminal device 200 as an example of the electronic device and describes the steps included in the method.
S101, acquiring input barrage text information.
In this embodiment, text refers to a representation of a written language, and refers to one or more characters having a particular meaning, for example, a word, a phrase, a sentence, a paragraph, or a chapter having a particular meaning. The barrage text information refers to barrage information in text form.
An input area, for example, an input box or an input interface, may be provided on the display interface of the client 210 of the terminal device 200, so that the user may input information. The client 210 may acquire bullet screen text information input in the input area upon detecting an input operation of the user in the input area.
S102, identifying the target semantic type to which the content of the barrage text information belongs.
After acquiring the input barrage text information, the client 210 may perform semantic recognition on the content of the barrage text information, and perform classification processing on the barrage text information according to the result of the semantic recognition, so as to determine the type of the content of the barrage text information, where the determined type represents the semantic type of the barrage text information, that is, the target semantic type.
After classifying the barrage text information, the client 210 may obtain a semantic type tag of the barrage text information, where the semantic type tag may characterize the target semantic type.
S103, acquiring an expression image belonging to the target semantic type as an image to be processed.
An expression image refers to any image used in electronic communication (e.g., a text message, an email, or an application) to express an emotional attitude or convey information. In some scenarios, the expression image may be a meme image; for example, images containing popular characters, quotes, or screenshots of film and television works that can be used to express an emotion may be regarded as expression images. In addition, in some cases, in order to express a specific emotional attitude, corresponding text information can be combined with such an image to form a new meme picture.
It will be appreciated that in this embodiment, the expression image may be either a still image or a moving image. The present embodiment is not limited thereto.
In this embodiment, an expression image library may be provided, so that expression images belonging to the target semantic type may be obtained from the expression image library.
In one embodiment, the emoticon library may be deployed in a device (e.g., server 100 or other server) in communication with the client 210. In this case, the client 210 may send a search request to the device in which the expression image library is deployed after identifying the target semantic type, the search request including type information characterizing the target semantic type, the device may access the expression image library according to the search request, search the expression image library for expression images belonging to the target semantic type, and return the searched expression images belonging to the target semantic type to the client 210, and the client 210 determines the received expression image as the image to be processed.
In another embodiment, the expression image library may be stored in the terminal device 200, and at this time, the client 210 may search the expression image belonging to the target semantic type from the expression image library as the image to be processed, thereby implementing S103.
It should be noted that, in this embodiment, the expression image in the expression image library may be an expression image without text information, or may be an expression image with text information. The emoticons in the emoticon library may be divided into at least two semantic types, and each of the emoticons may have a semantic type tag that characterizes the semantic type to which it belongs.
It should be noted that, in this embodiment, the semantic type tag determined for the bullet screen text information by the client 210 through the classification processing is drawn from the same set of semantic type tags as those of the expression images in the expression image library, so that an association between the expression images and the bullet screen text information can be established.
After obtaining the semantic type tag of the input barrage text information, the client 210 may search the expression image library for expression images carrying that semantic type tag, and the found expression images are the images to be processed.
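The patent does not spell out how this lookup is implemented; a minimal sketch, assuming an in-memory list of tagged images (all names here are illustrative, not from the patent), might look like this:

```python
# Illustrative lookup of expression images by semantic type tag.
# In practice the library could be a database table keyed by the tag.
from dataclasses import dataclass
from typing import List

@dataclass
class ExpressionImage:
    path: str                 # where the image file is stored
    semantic_tags: List[str]  # e.g. ["A1"]; an image may carry several tags

def find_images_by_tag(library: List[ExpressionImage], target_tag: str) -> List[ExpressionImage]:
    """Return every expression image whose tags contain the target semantic type."""
    return [img for img in library if target_tag in img.semantic_tags]

# Usage: the images returned here become the "images to be processed" of step S103.
library = [ExpressionImage("worship_01.png", ["A1"]),
           ExpressionImage("laugh_07.png", ["A2"])]
to_be_processed = find_images_by_tag(library, "A1")
```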
S104, adding the barrage text message to the image to be processed, and generating a barrage expression image.
After determining the to-be-processed image, the client 210 may add the barrage text information to the to-be-processed image, to obtain the to-be-processed image added with the barrage text information, that is, the barrage expression image. The client 210 may display the obtained bullet screen expression image for selection by the user.
Illustratively, the client 210 may call a text information add interface (e.g., putText) in an open source computer vision library (Open Source Computer Vision Library, openCV) to add a text box containing the barrage text information to the image to be processed.
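As a concrete illustration of step S104, the following is a minimal OpenCV sketch along the lines suggested above. File names, text position, and styling are assumptions; note also that OpenCV's built-in Hershey fonts cover only ASCII, so Chinese bullet-screen text would typically be rendered via a FreeType or PIL fallback instead.

```python
import cv2

def add_barrage_text(image_path: str, barrage_text: str, output_path: str) -> None:
    img = cv2.imread(image_path)                 # the image to be processed
    h, w = img.shape[:2]
    org = (int(w * 0.05), int(h * 0.9))          # bottom-left corner of the text (assumed layout)
    cv2.putText(img, barrage_text, org,
                cv2.FONT_HERSHEY_SIMPLEX,        # font face (ASCII only)
                1.0,                             # font scale
                (255, 255, 255),                 # white text
                2, cv2.LINE_AA)
    cv2.imwrite(output_path, img)                # the generated barrage expression image

add_barrage_text("to_be_processed.png", "GOAL!!!", "barrage_expression.png")
```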
Through the flow shown in fig. 2, on one hand, bullet screen information in the form of expression images is provided, on the other hand, bullet screen text information and expression images with the same semantic type can be combined at will to form bullet screen expression images which can be selected by a user, the number of bullet screen expression images which can be selected by the user is enriched, the user experience can be improved, the enthusiasm of the user for participating in the interaction of the expression images is increased, and the viscosity of the user is improved.
In another embodiment of the present application, the electronic device may be the server 100 shown in fig. 1, and the above-described expression image generating method may be used in the server 100. Referring to fig. 3, fig. 3 shows an interaction flow of the server 100 with the client 210 in the implementation of the emoticon method.
S301, the client 210 receives the input barrage text information.
S302, the client 210 sends the received barrage text information to the server 100.
The client 210 may detect the barrage text information input by the user in the input area, acquire the barrage text information, and send the barrage text information to the server 100.
S303, the server 100 acquires the barrage text information sent by the client 210, identifies the target semantic type to which the content of the barrage text information belongs, and acquires the expression image belonging to the target semantic type as the image to be processed.
The process of identifying the target semantic type to which the content of the barrage text information belongs by the server 100 is similar to S102 in the previous embodiment, and will not be described here again.
In an alternative manner of this embodiment, the server 100 may include an expression image library, in which case the server 100 may search for an expression image belonging to the target semantic type from the expression image library as the image to be processed.
In another alternative of this embodiment, an expression image library may be deployed in a device in communication with the server 100, in which case, after determining the target semantic type, the server 100 may send a search request to the device deploying the expression image library, so that the device searches for an expression image belonging to the target semantic type from the expression image library, returns the expression image to the server 100, and the server 100 uses the expression image returned by the device as the image to be processed.
S304, the server 100 adds the received barrage text information to the image to be processed to obtain a barrage expression image.
The implementation process of step S304 is similar to S104 in the previous embodiment, and will not be described here again.
S305, the server 100 transmits the bullet screen expression image to the client 210.
In practice, after generating the bullet screen expression image with bullet screen text information, the server 100 may send the generated bullet screen expression image to the client 210.
S306, the client 210 displays the bullet screen expression image.
The client 210 receives the barrage expression image and displays it on the video playing interface for the user to select.
Through the flow shown in fig. 3, on one hand, forms of barrage information are enriched, on the other hand, user selectable barrage expression image types can be enriched, user experience is improved, and user viscosity is improved.
The implementation flow of the expression image generating method provided in the embodiment of the present application is described in detail below with reference to fig. 2.
In S102, the electronic device may identify the content of the barrage text information through the text classification model, so as to obtain a semantic type tag of the barrage text information, where the semantic type represented by the semantic type tag is the target semantic type.
The text classification model can divide the content of barrage text information into one of at least two semantic types to which the expression images in the expression image library belong. In other words, the bullet screen text information and the emoticons have the same semantic type distribution. For example, the expression images in the expression image library are divided into three semantic types A1, A2 and A3, and correspondingly, the text classification model can divide the input barrage text information into one of the three semantic types A1, A2 and A3. Thus, the semantic correlation between the content of the barrage text information and the expression image can be realized.
In this embodiment, the semantic types of the bullet screen text information and the expression images can be customized according to requirements; for example, multiple semantic types such as meme battle, worship, delight, and the like can be defined. Of course, the foregoing semantic types are merely illustrative and are not intended to limit the present application.
Taking the example that at least two custom semantic types exist, in the implementation process, a semantic type label can be added to the expression image according to the custom semantic type suitable for the expression image in the expression image library in the at least two custom semantic types, and the semantic type label characterizes the custom semantic type suitable for the expression image. Alternatively, each emoticon may have one or more semantic type tags, which the present embodiment is not limited to.
Correspondingly, bullet screen text information can be collected as samples, and a semantic type tag is added to each sample according to the custom semantic type that fits the sample; typically, one sample has one semantic type tag. Supervised Learning (SL) is then performed on the text classification model based on the samples with semantic type tags, thereby obtaining a trained text classification model. The trained text classification model can classify input barrage text information to obtain its semantic type tag, where the semantic type tag characterizes which of the at least two custom semantic types the content of the barrage text information belongs to.
In this way, the content of the bullet screen text information txt-1 is assumed to be identified through the trained text classification model, so that the semantic type label of the txt-1 is obtained to be A1, and the txt-1 is determined to belong to the custom semantic type A1. Correspondingly, in S103, the electronic device may find an expression image with a semantic type tag A1 from the expression image library, and determine that the expression image belongs to the custom semantic type A1, and may use the expression image as the image to be processed.
Alternatively, in the embodiment of the present application, there may be various Text classification models for identifying the content of the barrage Text information, for example, a FastText (fast Text) classification model, a Text-CNN (Text-Convolutional Neural Networks, text-convolutional neural network) model, a ResLCNN (Residual Layer CNN, residual layer convolutional neural network), and the like. The foregoing text classification model is merely an example, and embodiments of the present application are not limited thereto.
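Taking FastText as an example, the open-source fastText Python package offers a supervised mode that matches the training scheme described above. The sketch below is purely illustrative (file names, hyperparameters, and the sample text are assumptions, not values given by the patent); each training line pairs segmented bullet-screen text with its custom semantic type label, e.g. "__label__A1 ...".

```python
import fasttext

# train.txt: one sample per line, "__label__<tag>" first,
# followed by the space-separated words produced by word segmentation.
model = fasttext.train_supervised(
    input="train.txt",
    epoch=25,
    lr=0.5,
    wordNgrams=2,   # include bigram information, as discussed below
    dim=100,
)

labels, probs = model.predict("so awesome worship")   # e.g. (('__label__A1',), array([0.97]))
target_semantic_type = labels[0].replace("__label__", "")
```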
Referring to FIG. 4A, FIG. 4A shows a schematic structure of the FastText model. The training process of the text classification model is described below using the FastText model as an example.
The inputs X_1, …, X_N represent N word vectors obtained from one piece of barrage text information, where N is a positive integer. The FastText model includes an Input Layer L11, a Hidden Layer L12, and an Output Layer L13.
For barrage text information to be input into the FastText model, such as txt-2, word segmentation is first performed to divide txt-2 into at least one word, and each resulting word is encoded into a corresponding word vector. In one example, the content of txt-2 may be "build a machine learning model", which after segmentation may yield words such as "build", "a", "machine", "learning", "model". Each word obtained by the word segmentation processing is then encoded into a word vector.
Further, in some scenarios, in order to identify the content of the bullet screen text information more accurately so that the subsequent classification is more accurate, word order information can be introduced on top of the words obtained by word segmentation. For example, N-gram information of the bullet screen text information may be acquired, each piece of N-gram information and each word obtained by word segmentation may be encoded into vector form, and the resulting vectors may be input into the input layer L11. In detail, the N-gram information of a piece of bullet screen text information can be obtained as follows:
After word segmentation is performed on the barrage text information to obtain at least one word, at least two adjacent words are combined according to their positions in the barrage text information to obtain the N-gram information of the barrage text information.
Taking txt-2 as an example, each of the words obtained by word segmentation can be regarded as unigram information of txt-2. Two adjacent words, such as "build a", "a machine", "machine learning", and "learning model", can be regarded as bi-gram information of txt-2; three adjacent words, such as "build a machine" and "machine learning model", can be regarded as tri-gram information of txt-2. Similarly, there may also be 4-gram information, 5-gram information, and so on. In practical applications, bi-gram and tri-gram information are usually added on the basis of the unigrams obtained by word segmentation, and then each piece of N-gram information is encoded.
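A minimal sketch of this N-gram construction is shown below; the segmentation itself (e.g. with a segmentation tool) is assumed to have been done beforehand, and the example sentence mirrors txt-2 above.

```python
from typing import List

def ngrams(words: List[str], n: int) -> List[str]:
    """Join n adjacent words, following their order in the original text."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = ["build", "a", "machine", "learning", "model"]   # segmented txt-2 (illustrative)
unigrams = words
bigrams = ngrams(words, 2)   # ["build a", "a machine", "machine learning", "learning model"]
trigrams = ngrams(words, 3)  # ["build a machine", "a machine learning", "machine learning model"]
features = unigrams + bigrams + trigrams   # each item is then encoded as a vector
```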
In this embodiment, the N-gram information may be encoded in a plurality of ways. One way is, for example, one-hot encoding, which may be implemented as follows:
given a vocabulary that contains substantially all of the words that may appear in the current scene, the words in the vocabulary have an order of arrangement, and each word has an index (index). Then, each word obtained by the word segmentation process may correspond to a word in the vocabulary. If all words in the vocabulary are grouped into a one-dimensional vector (e.g., a one-dimensional column vector), the elements in the one-dimensional vector correspond one-to-one with the words in the dictionary. For each word obtained by word segmentation, setting the value of the element corresponding to the word to be 1 in the vector, and keeping the values of the other elements to be 0, thus obtaining the word vector of the word.
Another way may be, for example, a distributed representation based on a neural network, i.e., word embedding in the narrow sense. Unlike the one-hot encoding scheme described above, the word vectors obtained by the word embedding scheme have a lower dimension, and their elements are generally floating-point numbers. The essence of word embedding is that each word (e.g., each piece of N-gram information described above) is mapped to a vector in a space of a given dimension; the more similar the semantics of two words, the greater the similarity of their word vectors. The similarity may generally be the cosine similarity of the two vectors.
For example, one typical implementation of the word embedding representation is the word2vec (word to vector) model. word2vec is used to predict a missing word from a given context, e.g., the missing word given a context of C words. In this case, the input of the word2vec model is the C context words, and the output is a vector whose dimension equals the number of words in the vocabulary; the elements of the vector correspond one-to-one to the words in the vocabulary, and each element represents the probability that the corresponding word is the missing word. The word vectors are a by-product of the word2vec model.
Referring to fig. 4B, fig. 4B shows an architecture diagram of a word2vec model, which includes an input layer L21, an hidden layer L22, and an output layer L23.
If a missing target word y needs to be predicted, the context words of the target word y can be acquired; the number acquired can be set as needed, for example, C context words of the target word y are acquired. First, one-hot encoding can be performed on the C context words to obtain C V-dimensional vectors {x_1, x_2, …, x_C}, where V denotes the number of words in the vocabulary and is an integer greater than 1. For the k-th context word ω_k (1 ≤ k ≤ C), the vector obtained by one-hot encoding it can be expressed as {x_1, x_2, …, x_V}, in which only the component corresponding to ω_k equals 1 and all other components equal 0.
The input layer L21 is connected to the hidden layer L22 by a weight matrix W, and the hidden layer L22 is connected to the output layer L23 by a weight matrix W'. For example, in the scenario shown in fig. 4B, the dimension of each vector input to the input layer L21 is V, the output vector of the hidden layer L22 is an N-dimensional vector h, and the output vector of the output layer L23 is a V-dimensional vector.
The input layer L21 may be connected to the hidden layer L22 by a V×N-dimensional weight matrix; in other words, the weight matrix W may be a V×N matrix. Based on the above description, the output vector of the hidden layer L22 can be expressed as:

h = \frac{1}{C} \sum_{i=1}^{C} W^{T} X_i = \frac{1}{C}\left(v_{\omega_1} + v_{\omega_2} + \dots + v_{\omega_C}\right), \quad (1)

where C is the number of context words, ω_1, ω_2, …, ω_C denote the individual context words, X_i is the one-hot vector of one context word input to the input layer L21, and W^{T} X_i is the hidden-layer vector representation v_{ω_i} of the context word ω_i. It can be seen that the output of the hidden layer L22 is the average of the vector representations of the C context words.
Correspondingly, the hidden layer L22 may be connected to the output layer L23 by an N×V-dimensional weight matrix; in other words, the weight matrix W' may be an N×V matrix. The processing from the hidden layer L22 to the output layer L23 is as follows: denoting the elements of the weight matrix W' as ω'_{ij}, a score μ_j can be calculated for the j-th word of the vocabulary:

\mu_j = {v'_{\omega_j}}^{T} \cdot h, \quad (2)

where v'_{ω_j} is the j-th column of the weight matrix W' and {v'_{ω_j}}^{T} is its transpose. Based on this, the posterior probability of the j-th word being the target word can be obtained through the log-linear classification model softmax, which satisfies a multinomial distribution:

y_j = p(\omega_j \mid \omega_1, \dots, \omega_C) = \frac{\exp(\mu_j)}{\sum_{j'=1}^{V} \exp(\mu_{j'})}, \quad (3)

where y_j is the output of the j-th node of the output layer L23, i.e., the probability that the j-th word in the vocabulary is the target word y.
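The following is a purely illustrative numpy sketch of this CBOW forward pass, corresponding to expressions (1)-(3): the hidden vector is the average of the context-word rows of W, scores are taken against the columns of W', and softmax turns them into probabilities. Dimensions, indices, and the random initialization are assumptions for illustration only.

```python
import numpy as np

V, N, C = 5000, 300, 4                          # vocabulary size, embedding size, context size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(V, N))         # input -> hidden weights
W_prime = rng.normal(scale=0.01, size=(N, V))   # hidden -> output weights

context_ids = np.array([12, 87, 401, 3])        # indices of the C context words

h = W[context_ids].mean(axis=0)                 # expression (1): average of context vectors
mu = h @ W_prime                                # expression (2): score for every vocabulary word
y = np.exp(mu - mu.max())
y /= y.sum()                                    # expression (3): softmax posterior
predicted_target = int(np.argmax(y))            # most probable target word index
```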
In the implementation process, the initial values of the weight matrices W and W' may be determined randomly; the word2vec model is then trained on training samples, the loss between the predicted output and the true output is calculated, the gradient of the loss with respect to the weight matrices is computed, and the weight matrices are adjusted in the direction of gradient descent. Given that the goal of training word2vec is to maximize the conditional probability of the target word given the input context, the loss function may be, for example, a cross-entropy loss function, which is not limited by the present embodiment.
Illustratively, the expression of the cross-entropy loss function may be:

loss = -\sum_{j=1}^{V} a_j \log y_j, \quad (4)

where V is the number of words in the vocabulary, y_j is the j-th element of the output vector of the word2vec model, i.e., the predicted probability that the target word y is the j-th word in the vocabulary, and a_j is the true probability that the target word y is the j-th word in the vocabulary.
After the word2vec model converges, training may be stopped. The convergence may be, for example, that the value of the loss function is not reduced, or that the value of the loss function is smaller than a set threshold, or that the set number of iterations is reached, or that the accuracy obtained by the word2vec model on the test set reaches the target accuracy, which is not limited in this embodiment.
When the word2vec model converges, two weight matrices W and W' are obtained. The i-th row of W represents the N-dimensional word vector of the i-th word in the vocabulary, and the j-th column of W' may also represent the N-dimensional word vector of the j-th word in the vocabulary. That is, the word vectors are a by-product of the word2vec model. Typically, the weight matrix W is used to obtain the word vector of a given word.
Because each piece of N-gram information of the barrage text information belongs to the vocabulary, the word vector corresponding to each piece of N-gram information can be obtained from the weight matrix W of the converged word2vec model and used as input of the FastText model.
Referring back to fig. 4A, the architecture of the FastText model is similar to that of the word2vec model described above. The hidden layer L12 averages the word vectors of the input N-gram information in a manner similar to expression (1), so as to obtain the output vector of the hidden layer L12. The output layer L13 may include at least a fully connected layer and a softmax layer; the fully connected layer obtains, according to expression (2), a score for the barrage text information belonging to each semantic type, and the softmax layer normalizes these scores into the probability of belonging to each semantic type.
It can be understood that although the architecture of the FastText model is similar to that of the word2vec model, its inputs and outputs differ: the input of the FastText model is the word vectors of the N-gram information of the barrage text information, and the output is the probability that the barrage text information belongs to each semantic type. The semantic type with the highest probability can be used as the predicted semantic type to which the text information belongs.
Wherein, each semantic type can have a corresponding semantic type label, and the semantic type label of the predicted semantic type can be regarded as the predicted type information output by the FastText model.
Similarly, the FastText model has a weight matrix between the input layer L11 and the hidden layer L12, and between the hidden layer L12 and the output layer L13. In this embodiment, in order to establish an association between the semantic type to which the content of the barrage text information belongs and the semantic type to which the expression image belongs, as described above, at least two custom semantic types, for example, four types A1, A2, B1, and B2, may be set, and the custom semantic type suitable for the expression image in the expression image library is determined from the types A1, A2, B1, and B2, and a semantic type tag of the suitable custom semantic type is added for the expression image.
In this case, bullet screen text information sent by users through the client 210 may be collected as samples, and a semantic type tag is added to each collected sample according to the custom semantic types A1, A2, B1, and B2 set for the expression images in the expression image library, the added tag being one of A1, A2, B1, and B2. The samples with semantic type tags are then used to train the FastText model, and a loss function is set to represent the error between the true semantic type of the bullet screen text information and the predicted semantic type. Illustratively, the loss function of the FastText model may also be the cross-entropy loss function described above; its expression is similar to expression (4) and will not be repeated here.
The weight matrix can be updated in the gradient descent direction by calculating the gradient of the loss function with respect to the weight matrix of the FastText model. After the FastText model converges, training is stopped. The convergence may be the number of training iterations reaching the setting, or the value of the loss function is no longer reduced or smaller than the setting threshold, which is not limited in this embodiment. At this time, the trained FastText model can be used to classify the barrage text information, and the trained FastText model can divide the barrage text information into a semantic type to which the expression images in the expression image library belong, so that the expression images associated with the barrage text information can be determined.
Further, in order to reduce the amount of work and the duration required to set the semantic type label for the expression image in the expression image library, in this embodiment, the expression image library may include a first expression image and a second expression image. Initially, the expression image library may not include the second expression image. The first expression image is the image provided with the semantic type label corresponding to the custom semantic type. Then, before searching the expression image belonging to the target semantic type from the expression image library as the image to be processed, the expression image generating method provided in this embodiment may further include a flow shown in fig. 5, which is described in detail below.
S401, training an image classification model according to the first expression image and the semantic type label of the first expression image.
In this embodiment, the image classification model may include an image feature extraction part and a classification part. The feature extraction part is configured to extract image features from an input expression image; the classification part then determines, according to the extracted image features, the scores of the input expression image belonging to the different types, and converts the score for each semantic type into a corresponding probability.
Referring to fig. 6, fig. 6 schematically illustrates a structure of an image classification model. The feature extraction part may include, for example, two convolutional neural network units, each of which may include one convolution layer and one pooling layer, where the pooling layer may be a mean pooling layer or a max pooling layer. The classification part may include, for example, a fully connected layer and a softmax layer for normalization. In addition, the image classification model may further include an output layer, configured to output, according to the probabilities that the input expression image belongs to the different types, the semantic type tag of the semantic type with the maximum probability as the prediction type information of the image classification model.
In this embodiment, the image classification model shown in fig. 6 may be trained using the first expression images with semantic type tags in the expression image library. The error between the prediction type information output by the image classification model for a first expression image and the semantic type tag of that first expression image may be calculated according to a loss function; the loss function may be, for example, a squared loss, a logarithmic loss, an exponential loss, or a cross-entropy loss, which is not limited in this embodiment. The model parameters of the image classification model are updated along the direction of gradient descent according to the gradient of the loss function with respect to the model parameters. This process is repeated, and training stops when the image classification model meets the optimization condition.
The optimization condition may be, for example, reaching a set number of iterations, or, for example, that the classification accuracy obtained by performing verification on the test set reaches a target proportion (for example, 70% -90%, such as 80%), or, for example, that the loss function converges. The present embodiment is not limited thereto. The image classification model obtained after the training is stopped can relatively accurately divide the expression image into one of the custom types A1, A2, B1 and B2.
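The patent does not tie the model in fig. 6 to any particular framework. A hedged PyTorch sketch of such a two-unit convolutional classifier and one training step is shown below; channel counts, kernel sizes, the 64x64 input, and the learning rate are assumptions, not values given by the patent.

```python
import torch
import torch.nn as nn

class ExpressionImageClassifier(nn.Module):
    def __init__(self, num_semantic_types: int = 4):   # e.g. A1, A2, B1, B2
        super().__init__()
        self.features = nn.Sequential(                  # two convolution + pooling units
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_semantic_types)  # for 64x64 inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))            # raw scores; softmax applied in the loss

model = ExpressionImageClassifier()
criterion = nn.CrossEntropyLoss()                        # combines softmax with cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)                       # a batch of first expression images
labels = torch.randint(0, 4, (8,))                       # their semantic type labels
loss = criterion(model(images), labels)
loss.backward()                                          # gradient of the loss w.r.t. parameters
optimizer.step()                                         # update along the descending direction
```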
S402, acquiring a second expression image.
In this embodiment, after obtaining the trained image classification model, the electronic device may obtain other expression images, and classify the obtained other expression images through the trained image classification model to determine the semantic type label thereof. And the other expression images additionally acquired on the basis of the first expression image are the second expression image.
In this embodiment, the electronic device may acquire the expression image from the network through the data acquisition component as the second expression image. The data acquisition component may be, for example, a data crawling component, such as a web crawler.
S403, obtaining prediction type information output by the trained image classification model according to the second expression image, and taking the prediction type information as a semantic type label of the second expression image.
S404, adding the second expression image with the semantic type label into an expression image library.
After the second expression image is acquired, the electronic device can input the acquired second expression image into the trained image classification model to obtain output prediction type information. The image classification model may be located in the electronic device, or may be located in another device that is communicatively connected to the electronic device, which is not limited in this embodiment.
Referring to the foregoing, the prediction type information output by the image classification model is a semantic type tag, and characterizes the prediction semantic type of the second expression image in the custom semantic types A1, A2, B1, and B2, so that the electronic device may set the prediction type information as the semantic type tag of the second expression image, and add the second expression image with the semantic type tag set to the expression image library.
Through the flow shown in fig. 5, a large number of expression images with semantic type labels can be obtained with less workload and less time, which enriches the number of expression images available for users to select, further improves the user experience, and improves user viscosity.
Alternatively, in order to further improve user viscosity, an expression image with higher heat may be crawled as the second expression image. Based on this, S402 may be implemented by the following procedure:
for each crawled expression image, counting the number of times the expression image is crawled within a first time period as a first number of times; and if the first number of times of any expression image reaches a second number of times, taking that expression image as a second expression image.
The second number of times and the first time period can be set flexibly; for example, the second number of times can be any integer from 5 to 10, such as 6, and the first time period may be, for example, 2 hours, half a day, or one day.
If the first number of times of a certain expression image reaches the second number of times, it means that the expression image has a certain degree of heat on the network, so the expression image can be used as a second expression image and processed according to S403 and S404. In this way, the expression images in the expression image library are images with relatively high heat, which increases the probability that users will use the expression images generated by the method of this embodiment and further improves user viscosity.
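A minimal sketch of this heat filter is given below; identifying each expression image by its URL and the threshold value are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List

def select_hot_images(crawled_urls: Iterable[str], threshold: int = 6) -> List[str]:
    """Keep only images crawled at least `threshold` times within the first time period."""
    counts = Counter(crawled_urls)              # first number of times, per image
    return [url for url, n in counts.items() if n >= threshold]

# crawled_urls would be accumulated over, say, one day, then the counter reset.
hot = select_hot_images(["a.gif", "b.png", "a.gif", "a.gif"], threshold=3)  # ["a.gif"]
```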
Further, considering that the heat of the expression image is changed with the lapse of time, in this embodiment, a new second expression image may be periodically acquired, and processed according to S403 and S404, and then the existing second expression image in the expression image library may be replaced with the new second expression image having the semantic type tag. Therefore, the expression images in the expression image library can be ensured to be the expression images with higher heat at present, and the use probability of the user is improved.
Referring to fig. 2 again, in the present embodiment, the expression image library may be further divided into a plurality of image libraries. Based on this, the electronic device may search the expression image library for expression images belonging to the target semantic type as images to be processed in various ways.
In one embodiment, the expression image library may be divided into at least two first image libraries according to different usage rights, in which case, referring to fig. 7, the step of searching the expression image library for the expression image belonging to the target semantic type as the image to be processed may be implemented through a flow shown in fig. 7.
S103-1, acquiring authority information of the current account information.
Taking the case where the electronic device is the terminal device 200 as an example, the current account information may be the account information currently logged in on the client 210; taking the case where the electronic device is the server 100 as an example, the current account information may be the account information currently logged in on the client 210 that sends the text information to the server 100.
In this embodiment, the authority information of each account information may include an identifier of a first image library that may be used by the user corresponding to the account information.
In one example, the server 100 may divide the expression images into two first image libraries according to whether payment is required for their use. The expression images requiring payment may be placed in the first image library DB-01, and the expression images not requiring payment may be placed in the first image library DB-02.
User 1 whose account information is U1 has paid for DB-01, user 1 has the use rights of DB-01 and DB-02, and its rights information data1 may include: identification of DB-01 and DB-02; user 2 whose account information is U2 does not pay for DB-01, and user U2 has only the use right of DB-02 and does not have the use right of DB-01, and its right information data2 includes: identification of DB-02.
In another example, the server 100 may divide the expression images not requiring payment into one first image library, and the rest of expression images may be divided into other different first image libraries according to the difference of the usage cost to be paid, and the specific implementation logic is similar to that of the previous example, which is not described herein.
In another example, the electronic device may divide the expression images in the expression image library into at least two first image libraries according to their heat information, and establish correspondences between the different first image libraries and different user levels. For example, three heat intervals may be set according to the heat information, denoted intervals 1, 2, and 3 in ascending order of heat.
Wherein, the expression image with heat information belonging to the interval 1 is divided into a first image library DB-03, the expression image with heat information belonging to the interval 2 is divided into a first image library DB-04, and the expression image with heat information belonging to the interval 3 is divided into a first image library DB-05. Correspondingly, assuming that the server 100 classifies users into 18 ranks, users with user ranks 1 to 6 may use the emoticons in DB-03, users with user ranks 7 to 12 may use the emoticons in DB-03, DB-04, and users with user ranks 13 to 18 may use the emoticons in DB-03, DB-04, DB-05.
In this case, the authority information data3 of the account information of user levels 1-6 may include the identification of DB-03; the authority information data4 of the account information of user levels 7-12 may include the identifications of both DB-03 and DB-04; and the authority information data5 of the account information of user levels 13-18 may include the identifications of all three of DB-03, DB-04, and DB-05.
It can be understood that the electronic device may further divide the at least two first image libraries according to other standards, and set the authority of using each first image library respectively, so that when a user corresponding to any one piece of account information has the use authority of a certain first image library, the identifier of the first image library may be added in the authority information of the account information. For example, the electronic device may divide the dynamic expression image into one first image library DB-06 and the static expression image into the other first image library DB-07 according to whether the expression image is a dynamic image or not. In this case, the electronic device may set its authority information according to whether the account information is member account information. For example, if an account information is member account information, its authority information may be set to include identifications of both DB-06 and DB-07; if an account information is non-member account information, its authority information may be set to include an identification of DB-07.
S103-2, determining a first image library corresponding to the authority information of the current account information from at least two first image libraries included in the expression image library.
S103-3, searching the expression image belonging to the target semantic type from the determined first image library as an image to be processed.
In an implementation process, after obtaining the authority information of the current account information, the electronic device may identify the identifier included in the authority information. Then, according to the identified identification, searching a first image library with the identification from at least two first image libraries included in the expression image library, and searching expression images belonging to the target semantic type from the first image library with the identification. The searched expression image belonging to the target semantic type can be used as the image to be processed for adding the input barrage text information.
The flow shown in fig. 7 is further described below by taking the example that the expression image library includes the first image libraries DB-01 and DB-02 and the electronic device is the server 100 described above as an example:
if the user 1 corresponding to the account information U1 inputs the bullet screen text information txt-3 on the interface of the client 210, after recognizing txt-3 through the text classification model and obtaining the semantic type tag (for example, A1), the server 100 may obtain the authority information data1 of the account information U1, recognize that the data1 includes the identifications of DB-01 and DB-02, determine the first image library DB-01 from the expression image library according to the identification of DB-01, and determine the first image library DB-02 from the expression image library according to the identification of DB-02. At this time, the server 100 may search for the emoji image having the semantic type tag A1 from the first image library DB-01 and DB-02, respectively, where the found emoji images are all images to be processed.
If the user 2 corresponding to the account information U2 inputs the bullet screen text information txt-3 on the interface of the client 210, after obtaining the semantic type tag A1 thereof through the text classification model, the server 100 may obtain the authority information data2 of the account information U2, and identify that the data2 includes the identifier of DB-02, determine the first image library DB-02 from the expression image library according to the identifier, and search the expression image with the semantic type tag A1 therefrom, where each expression image is the image to be processed.
It will be appreciated that the above-described flow is implemented by the server 100 by way of example only, and that the flow shown in fig. 7 may also be implemented by the client 210 in the terminal apparatus 200. The present embodiment is not limited thereto.
Optionally, in this embodiment, the electronic device may further search an expression image belonging to the target semantic type from the expression image library as the image to be processed through another implementation manner. In detail, a corresponding relation between the expression image and the video type can be established, and the expression image library is divided into at least two second image libraries according to different video types corresponding to the expression image.
Video types can be classified into, for example, a cartoon type, an ancient-style type, a modern-drama type, a sports type, a game type, and so on. In one manner, the video type corresponding to an expression image can be determined by manually adding a label. In another manner, video type labels may be manually added to a portion of the expression images (the first expression images), and an image classification model is trained on the first expression images carrying video type labels; the training process is similar to that of the model shown in fig. 6 and is not repeated here. The electronic device may then use the trained image classification model to determine video type labels for the other expression images (e.g., the second expression images).
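Purely as an illustration of this semi-supervised labelling idea, the sketch below trains a toy classifier on the manually tagged first expression images and predicts video type tags for the second expression images. Flattened random pixels and logistic regression are stand-ins chosen for the example, not the image classification model actually trained in the embodiment.

```python
# Toy illustration: fit a classifier on manually tagged first expression images,
# then predict video type tags for the untagged second expression images.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# first expression images: manually tagged (features are fake flattened 8x8 images)
X_first = rng.random((20, 64))
y_first = np.array(["cartoon"] * 10 + ["game"] * 10)

# second expression images: not yet tagged
X_second = rng.random((5, 64))

model = LogisticRegression(max_iter=1000).fit(X_first, y_first)
predicted_video_type_tags = model.predict(X_second)
print(predicted_video_type_tags)  # predicted video type tag for each second image
```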
After determining the video type tag of each of the emoticons, the electronic device may divide the emoticons having the same video type tag into a second image library and associate the same video type tag with the second image library.
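As a simple illustration of this grouping step, the sketch below keys a dictionary by video type tag; the image identifiers and tags are made up for the example.

```python
# Illustrative grouping of labelled expression images into second image libraries,
# keyed by their video type tag.
from collections import defaultdict

labelled_images = [
    ("img-201", "cartoon"),
    ("img-202", "game"),
    ("img-203", "cartoon"),
]

second_image_libraries = defaultdict(list)
for image_id, video_type_tag in labelled_images:
    second_image_libraries[video_type_tag].append(image_id)

# each key now acts as the video type tag associated with that second image library
print(dict(second_image_libraries))  # {'cartoon': ['img-201', 'img-203'], 'game': ['img-202']}
```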
Based on the above situation, referring to fig. 8, the electronic device may further search the expression image belonging to the target semantic type from the expression image library through the flow shown in fig. 8 as the image to be processed, which is described in detail below.
S103-4, acquiring the video viewing record of the current account information within the target duration before the current time.
In this embodiment, the way of obtaining the video viewing record differs according to the type of the client 210. In one embodiment, the client 210 may be an instant messaging application. In this case, web pages containing video information accessed through the client 210 within the target duration may be acquired, and the image information included in those web pages may be recognized by the image classification model used for identifying video types, so as to determine the video types corresponding to the video information.
Further, in the case where the client 210 is a social platform application with video playing function, account information for logging in to the social platform application may be generally authorized to log in to other video playing applications and associated with corresponding video playing servers. In this case, taking account information U3 as an example, authorization information associated with the account information U3 may be searched, and a video playing application associated with the account information U3 may be determined according to the authorization information, so as to request a video viewing record of the account information U3 in a target duration from a video playing server corresponding to the video playing application.
In another embodiment, the client 210 may be a video playing application. In this case, the video viewing record associated with the current account information within the target duration may be obtained directly from the server 100 corresponding to the client 210. Some video playing applications record the video information recently watched by the user; in that case, the video information within the target duration may be selected directly from the recently watched video information recorded by the server 100 and used as the video viewing record.
Alternatively, the target duration may be flexibly set, for example, to 1 day, 2 days, or 1 week, which is not limited in this embodiment.
S103-5, determining the target video type from the video watching record.
The video viewing record includes the video information watched, within the target duration, by the user corresponding to the current account information. Such video information is usually associated with a specific video type tag. For example, the video playing interface shown in fig. 9 displays a plurality of video type tags such as television show, cartoon, and game; the video type represented by each video type tag may be further divided into several sub-types, for example, the television show type may be further divided into costume drama, urban drama, live-action cartoon, and the like.
Based on this, the electronic device may determine, according to the video type tag associated with each piece of video information in the video viewing record, the video type to which that video information belongs, and count, for each video type, the number of occurrences of video information belonging to that type in the video viewing record. The electronic device may then sort the video types by their counted numbers of occurrences and select, in descending order of occurrences, the first number of video types as the target video types. The first number may be flexibly set, for example, to any integer from 1 to 5, such as 2, which is not limited in this embodiment.
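For illustration, a minimal sketch of this counting-and-selection step is given below; the viewing record format and the type tags are assumed for the example.

```python
# Sketch of S103-5: count, for each video type, how many viewed items carry that
# type tag, then take the most frequent types as the target video types.
from collections import Counter

# hypothetical viewing record: each entry is (video title, video type tag)
viewing_record = [
    ("episode 12", "cartoon"),
    ("finals replay", "game"),
    ("episode 13", "cartoon"),
    ("season opener", "sports"),
]

FIRST_NUMBER = 2  # how many target video types to keep; flexible, e.g. 1-5

type_counts = Counter(tag for _, tag in viewing_record)
target_video_types = [tag for tag, _ in type_counts.most_common(FIRST_NUMBER)]
print(target_video_types)  # ['cartoon', 'game']
```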
S103-6, determining a second image library corresponding to the target video type from at least two second image libraries included in the expression image library.
S103-7, searching the expression image belonging to the target semantic type from the determined second image library as an image to be processed.
After determining the target video type, the electronic device may search a second image library having a video type tag for representing the target video type from the expression image library, and then search an expression image belonging to the target semantic type from the searched second image library, where the searched expression image belonging to the target semantic type is the image to be processed.
Through the flow shown in fig. 8, expression images that better match the user's needs can be selected for superimposing the barrage text information input by the user, so that barrage expression images meeting those needs are provided; this can further improve the user experience and increase user stickiness.
Optionally, in this embodiment, the image to be processed may also be acquired by means other than searching the expression image library. Based on this, step S103 can also be implemented by the flow shown in fig. 10.
S103-8, capturing a video picture from the video file which is currently played.
S103-9, identifying the expression image from the video picture.
S103-10, determining the expression image belonging to the target semantic type from the identified expression images as an image to be processed.
In this embodiment, the client 210 is a video playing application or an application with a video playing function. When the terminal device 200 runs the client 210, the user corresponding to the client 210 is watching a video file; at this time, a video picture in the video file currently played by the client 210 may be captured, and expression images may be recognized from the video picture.
In one possible manner, the video frame may be taken from the played content of the currently played video file, or may be taken from all the content of the currently played video file (i.e., including the played content and the unplayed content), which is not limited in this embodiment.
Alternatively, the expression image recognized from the video picture may be an expression image transmitted by the user and displayed in the form of a bullet screen, or may be an image similar to the expression image contained in the video picture, which is not limited in this embodiment.
After the expression images are recognized from the captured video frames, each recognized expression image can be processed by the trained image classification model to determine its semantic type label. It will be appreciated that, in step S103-10, the possible semantic types of the recognized expression images are the same as the possible semantic types of the barrage text information in step S102; in other words, the semantic types that the trained image classification model can distinguish are the same as those that the trained text classification model can distinguish. In this way, expression images can be associated with barrage text information.
After the semantic type labels of the recognized expression images are obtained, the expression images whose semantic type label is the target semantic type are determined from the recognized expression images, and the determined expression images can be used as the images to be processed.
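A minimal sketch of this filtering step is given below; classify_expression stands in for the trained image classification model and is an assumption rather than a component defined in this embodiment.

```python
# Sketch of S103-10: classify each expression image recognised from the captured
# video frames and keep those whose predicted semantic type equals the target type.
from typing import Callable, List


def select_images_to_process(
    recognised_images: List[str],
    target_type: str,
    classify_expression: Callable[[str], str],
) -> List[str]:
    images_to_process = []
    for image in recognised_images:
        semantic_type_label = classify_expression(image)   # model inference
        if semantic_type_label == target_type:
            images_to_process.append(image)
    return images_to_process


# toy stand-in for the trained image classification model
fake_model = lambda image: "A1" if image.endswith("smile.png") else "A2"
print(select_images_to_process(["frame1_smile.png", "frame2_cry.png"], "A1", fake_model))
```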
In this way, the determined images to be processed are images that appear in the video file the user is watching, which increases the probability that the barrage expression images generated from them will be selected and used by the user, and further increases user stickiness.
Referring to fig. 2 again, considering that the number of the expression images belonging to each semantic type in the expression image library is large, in S104, a portion may be selected from the respective images to be processed determined in S103, and barrage text information may be added to the portion of the images to be processed, thereby reducing the data processing amount. Based on this, there may be different embodiments of step S104.
In one embodiment, the selection may be made based on the determined heat information of the image to be processed. In this embodiment, step S104 may include the steps shown in fig. 11, which are described in detail below.
S104-1, acquiring heat information of each image to be processed.
The heat information may be any information capable of reflecting the preference degree of the user to the image to be processed, for example, the click times, click rates, collection times, etc. of the user to the image to be processed in the second duration. It is understood that the user herein may be any user capable of logging into the server 100. The second duration may be flexibly set, for example, may be 15 days, 0.5 month, 1 quarter, 1 year, etc., which is not limited in this embodiment.
S104-2, determining the to-be-processed image with the heat information meeting the first condition as a target image from the to-be-processed images.
The first condition may be flexibly set; for example, it may be that the heat information reaches a preset heat value. In one example, if the heat information is the number of clicks, the first condition may be that the number of clicks reaches a first number. Taking a second duration of 1 month as an example, the first number may be set to 500-1000 clicks, such as 800 clicks. The present embodiment is not limited thereto.
In another example, if the heat information is a click rate, the first condition may be that the click rate reaches a target proportion, which may be, for example, a proportion of more than 30%, such as 32% or 40%, which is not limited by the present embodiment.
In another example, if the heat information is the collection number, the first condition may be that the collection number reaches a second number; for example, with a second duration of 1 month, the second number may be 500-1000, such as 600, which is not limited in this embodiment.
In yet another example, the heat information of the image to be processed in step S104-1 may characterize the heat of the image to be processed in the played content of the video file currently played. In this case, step S104-1 may be implemented by the following procedure:
Acquiring the occurrence times of the image to be processed in the played content of the currently played video file;
and obtaining the heat information of the image to be processed according to the occurrence times.
In the implementation process, target image frames whose similarity to the image to be processed reaches a similarity threshold can be identified from all image frames contained in the played content of the currently played video file, and the number of identified target image frames can be used as the number of occurrences of the image to be processed in that played content. The number of occurrences may be used directly as the heat information of the image to be processed. Correspondingly, in this case, the first condition may be that the number of occurrences reaches a third number, which may be set according to statistical data, for example, 300-500.
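A rough sketch of this occurrence count follows; the per-frame similarity measure used here (one minus the normalised mean absolute pixel difference) is a placeholder assumption, not the similarity computation prescribed by the embodiment.

```python
# Count frames of the played content whose similarity to the image to be processed
# reaches the threshold, and use that count as the image's heat information.
import numpy as np


def count_occurrences(image: np.ndarray, played_frames, similarity_threshold: float = 0.9) -> int:
    occurrences = 0
    for frame in played_frames:
        if frame.shape != image.shape:
            continue  # a real system would first locate/crop the candidate region
        similarity = 1.0 - np.mean(np.abs(frame.astype(float) - image.astype(float))) / 255.0
        if similarity >= similarity_threshold:
            occurrences += 1
    return occurrences


image_to_process = np.zeros((32, 32, 3), dtype=np.uint8)
frames = [np.zeros((32, 32, 3), dtype=np.uint8), np.full((32, 32, 3), 255, dtype=np.uint8)]
print(count_occurrences(image_to_process, frames))  # 1: only the identical frame matches
```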
In the implementation process, among the images to be processed determined in S103, the image to be processed whose heat information satisfies the first condition is the target image.
S104-3, adding the barrage text information into the target image to obtain a barrage expression image.
After the electronic equipment determines the target image, the acquired barrage text information can be added into the target image, so that the barrage expression image with the barrage text information is obtained.
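As a rough illustration only, the following sketch uses the Pillow library to draw barrage text onto a target image. The fixed text position and default font are simplifying assumptions; the embodiment may instead place the text in a detected target area, as described later.

```python
# Minimal sketch of S104-3: overlay the barrage text onto the target image to
# produce a barrage expression image.
from PIL import Image, ImageDraw


def add_barrage_text(image_path: str, barrage_text: str, output_path: str,
                     position=(10, 10)) -> None:
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    draw.text(position, barrage_text, fill=(255, 255, 255))  # default bitmap font
    image.save(output_path)


# example usage (paths are placeholders):
# add_barrage_text("target.png", "txt-3", "barrage_expression.png")
```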
Through the flow shown in fig. 11, expression images with higher heat information that belong to the target semantic type can be selected for adding the barrage text information input by the user, which reduces the data processing amount, increases the probability that the generated barrage expression images will be used by the user, and improves user stickiness.
Optionally, S104 may be implemented in another embodiment. In detail, the expression images in the expression image library may carry content tags, where a content tag indicates the content of an expression image. For example, if an expression image is derived from a certain television show, its content tags may be the name of the television show, the names of characters in the show, the names of the actors playing those characters, and so on.
In this embodiment, the manner in which the electronic device determines the content tag of the emoticon may be various. In an optional manner, when the electronic device crawls an expression image, a content label is added to the expression image according to the source information of the expression image; in another alternative, the electronic device may add a corresponding content tag to any of the emoticons in response to an operation for that emoticon. The corresponding content tab may be a user-selected tab.
Referring to fig. 12, step S104 in this embodiment may be implemented by the flow shown in fig. 12, which is described in detail below.
S104-4, title bar information on the current interface is obtained.
In this embodiment, if the user sends the emoticon on the current interface of the client 210, it indicates that the sending action of the user is performed on the content displayed on the current interface, so that the target image for adding the barrage text information can be selected from the images to be processed according to the content displayed on the current interface.
In some scenarios, the client 210 may be used to access a Web (World Wide Web) page, which typically has title bar (title) information that typically has some relevance to what the Web page displays. Based on this, if the current interface of the client 210 is a web page, the title bar information on the current interface may be acquired to select a target image from the images to be processed based on the title bar information.
In other scenarios, the client 210 is a video playing application or an application with a video playing function. In this case, if the current interface of the client 210 is a video playing interface, that interface may have title bar information, which is usually the name of the video information currently being played. For example, the video playing interface of a video playing application shown in fig. 13 includes a video playing window 1301 in which video information is displayed. A title bar 1302 is displayed on one side of the video playing window 1301, and the text displayed in it is the title bar information. It will be appreciated that the title bar 1302 may be located at other positions on the video playing interface, which is not limited in this embodiment.
In this scenario, the barrage text information sent by the user at the client 210 is sent in response to the content of the video information, and the title bar information on the current interface (the video playing interface) is highly relevant to that video information; therefore, the title bar information on the current interface can be obtained as a basis for selecting the images to be processed to which the barrage text information is added.
S104-5, respectively determining the similarity between the content label of each image to be processed and the title bar information.
In this embodiment, the title bar information and the content tag are text information, so that the title bar information and the content tag can be subjected to word segmentation in the manner described above, and word vectors of word segmentation results of the title bar information and word vectors of word segmentation results of the content tag can be obtained by using the word2vec model, so that the title bar information and the content tag can be converted into vectors.
The electronic device may then calculate the similarity between the vector of the title bar information and the vector of each content tag, for example as a Euclidean distance or a cosine distance. The present embodiment is not limited thereto.
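The following sketch illustrates only the vector comparison: the two-dimensional toy word-vector table stands in for the word2vec model, and averaging word vectors is one simple assumption for turning segmented text into a single vector.

```python
# Sketch of S104-5: vectorise the title bar information and each content tag,
# then compare them with cosine similarity.
import numpy as np

WORD_VECTORS = {                       # made-up toy word vectors
    "drama": np.array([0.9, 0.1]),
    "palace": np.array([0.8, 0.3]),
    "game": np.array([0.1, 0.9]),
}


def text_to_vector(words):
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    return np.mean(vectors, axis=0) if vectors else np.zeros(2)


def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


title_bar_words = ["palace", "drama"]          # segmented title bar information
content_tag_words = [["drama"], ["game"]]      # segmented content tags of two images
for tag in content_tag_words:
    print(cosine_similarity(text_to_vector(title_bar_words), text_to_vector(tag)))
```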
S104-6, determining the to-be-processed image with the similarity meeting the second condition with the title bar information from the to-be-processed images as a target image.
S104-7, adding the barrage text information into the target image to obtain the barrage expression image.
In implementation, the electronic device may sort the images to be processed by their similarity to the title bar information and select, in descending order of similarity, a second number of images to be processed; the selected second number of images to be processed are the target images. The second number may be flexibly set, for example, to 1, 2, 3, or 5, which is not limited in this embodiment.
After determining the target images, the electronic device may add the acquired barrage text information to each of the determined target images, so as to obtain a second number of barrage expression images with barrage text information.
Through the flow shown in fig. 12, barrage expression images that the user is more likely to select can be generated more accurately, thereby further improving the user experience and increasing user stickiness.
Referring again to fig. 2, in S104, for each image to be processed, a target location for adding bullet screen text information may be determined from the image to be processed before bullet screen text information is added to the image to be processed. In one approach, bullet screen text information may be added to a fixed location in the image to be processed.
In another approach, the target position may be determined by:
identifying a target area with a target size from the image to be processed, wherein the difference between any two pixel values in the target area is smaller than a pixel threshold; and adding the barrage text information to the target area of the image to be processed to obtain the barrage expression image.
Illustratively, the target size and the pixel threshold may be flexibly set as needed. For example, the target size may be 10×10 or 10×8, and the pixel threshold may be set to a relatively small value, for example 0.1-0.5, if the pixel values in the target area are required to be substantially the same. It should be understood that the range 0.1-0.5 is merely exemplary, and the present embodiment is not limited thereto.
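A minimal sketch of such a target-area search is given below, assuming a greyscale image represented as a numpy array; the exhaustive sliding-window scan and the example sizes are illustrative choices, not the embodiment's required implementation.

```python
# Slide a window of the target size over the image and return the first window in
# which the spread between any two pixel values is below the pixel threshold,
# i.e. a near-uniform region suitable for overlaying text.
import numpy as np


def find_target_area(gray_image: np.ndarray, target_h: int = 10, target_w: int = 10,
                     pixel_threshold: float = 0.5):
    h, w = gray_image.shape
    for top in range(h - target_h + 1):
        for left in range(w - target_w + 1):
            window = gray_image[top:top + target_h, left:left + target_w]
            if window.max() - window.min() < pixel_threshold:   # nearly uniform region
                return top, left
    return None  # fall back to a fixed position when no such area exists


img = np.zeros((20, 20), dtype=float)
img[0:5, 0:5] = 100.0         # a bright block in the top-left corner
print(find_target_area(img))  # (0, 5): first flat 10x10 window to the right of the block
```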
Compared with the prior art in which only barrage information in text form can be sent, generating barrages in the form of expression images for the user to select enriches the barrage forms, can increase the user's enthusiasm for participating in barrage interaction, and improves user stickiness.
In this case, after S104 shown in fig. 2 is performed, the expression image generation method provided by the present embodiment may further include the steps shown in fig. 14.
S105, displaying the generated barrage expression image in the video playing interface.
Optionally, in the implementation process, the barrage expression images may be displayed in descending order of the heat information (e.g., click rate, number of clicks, or collection number) of the expression images used to generate them.
For example, referring to fig. 15, a schematic diagram of a video playing interface 1500 of a video playing application is provided. The interface has a video playing window 1501 and a text input box 1502. The user U1 inputs text information txt-4, whose content is "around view", in the text input box 1502. If the client 210 detects the text information txt-4, txt-4 is recognized by the above-mentioned trained FastText model to determine that txt-4 has a type tag B1, and the expression image library is searched for the expression images having the type tag B1 as the images to be processed. For example, the images to be processed img1, img2, …, imgN are determined, where N is a positive integer. Then, the server 100 may select 3 images to be processed, for example img1, img2, and img5, as the target images in descending order of the click rates of img1 to imgN.
The server 100 recognizes a blank pixel region of the target size from each of img1, img2, and img5, and adds the text information txt-4 to that blank pixel region. In this way, the three images in the target expression image presentation area 1503 of fig. 15 can be obtained for the user to select, where the target expression image presentation area 1503 is located in the video playing interface 1500.
S106, responding to the selection operation, and determining the barrage expression image corresponding to the selection operation from the barrage expression images currently displayed.
S107, displaying a barrage expression image corresponding to the selection operation in the barrage.
Taking the scenario shown in fig. 15 as an example, a user may select any one of three bullet screen expression images displayed on the client 210, for example, select a first bullet screen expression image, and then the identification information of the selected bullet screen image may be sent to the server 100, so that the server 100 adds the bullet screen expression image corresponding to the identification information to the video stream, and displays the bullet screen expression image on the video playing interface of the client 210, so that the first bullet screen expression image selected by the user may be viewed in the bullet screen. In another alternative, the client 210 may be, for example, an instant messaging client, in which case, after the user inputs the barrage text information in the text input box of the client 210, the barrage text information is sent to the server 100, and the processing flow of the server 100 on the barrage text information is similar to that of the previous implementation, which is different in that the generated barrage expression image may be displayed in the information interaction interface of the client 210, and the barrage expression image selected by the user may also be displayed in the information interaction interface of the client 210.
Optionally, in the embodiment of the present application, after the user inputs a piece of barrage text information (such as txt-3 described above), the electronic device may count, as a third number, the number of times each barrage expression image generated based on txt-3 has been sent. When the third number of any barrage expression image reaches a fourth number, the correspondence between txt-3 and that barrage expression image can be cached. Thus, when the user next inputs txt-3, the electronic device can determine, according to the cached correspondence, the barrage expression image corresponding to txt-3 and display it for the user to select. Here the third number refers to the counted number of times of sending, and the fourth number can be flexibly set, for example, to 5-8, such as 6. The present embodiment is not limited thereto.
By caching the corresponding relation, the time for providing the selectable barrage expression images for the user can be shortened, and the user experience is improved.
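A toy sketch of this caching behaviour is given below; the data structures and names are assumptions, and the threshold simply mirrors the example range above.

```python
# Count how many times each barrage expression image generated from a given barrage
# text has been sent (the "third number"); once the count reaches the threshold
# (the "fourth number"), cache the text -> image correspondence.
from collections import defaultdict

FOURTH_NUMBER = 6                       # threshold, e.g. 5-8 sends
send_counts = defaultdict(int)          # (barrage_text, image_id) -> times sent
cached_correspondence = {}              # barrage_text -> image_id


def record_send(barrage_text: str, image_id: str) -> None:
    send_counts[(barrage_text, image_id)] += 1
    if send_counts[(barrage_text, image_id)] >= FOURTH_NUMBER:
        cached_correspondence[barrage_text] = image_id


for _ in range(6):
    record_send("txt-3", "img-001")
print(cached_correspondence)  # {'txt-3': 'img-001'} after the sixth send
```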
Optionally, in the embodiment of the present application, the electronic device may further, in response to a user operation, add the barrage expression image corresponding to that operation to the user's favorites. Alternatively, after the user inputs the barrage text information, the barrage text information may be displayed directly on the interface in response to the user operation, without generating a barrage expression image. The present embodiment is not limited thereto.
Referring to fig. 16, a flowchart of an expression image generating method according to another embodiment of the present application is shown, which may be executed by the client 210 shown in fig. 1. The detailed description is as follows.
In step S1601, input barrage text information is acquired.
The implementation procedure of step S1601 is similar to step S101 in the above embodiment.
Step S1602, displaying a barrage expression image on a video playing interface, wherein the barrage expression image comprises barrage text information.
In step S1603, in response to the selection operation, a barrage expression image corresponding to the selection operation is determined from the currently displayed barrage expression images.
Step S1604, displaying a barrage expression image corresponding to the selection operation in the barrage.
The detailed implementation process of step S1602 to step S1604 is similar to that of step S105 to step S107 in the above embodiment, and will not be described herein. The video playing interface refers to an interface of the client 210 for playing video. In step S1602, the bullet screen expression image displayed on the video playback interface may be generated by the expression image generation method described in the above embodiment.
Referring to fig. 17, a block diagram of an expression image generating apparatus 1700 according to an embodiment of the present application is shown, where the apparatus 1700 may be applied to an electronic device, which may be the server 100 or the terminal device 200 shown in fig. 1. When the electronic device is a terminal device 200, the apparatus 1700 may be included in the client 210 or may be independent of the client 210 and may interact with the client 210.
The apparatus 1700 may include: a first acquisition module 1710, an identification module 1720, a second acquisition module 1730, and an image generation module 1740.
The first obtaining module 1710 is configured to obtain the input barrage text information.
The identifying module 1720 is configured to identify a target semantic type to which the content of the barrage text information belongs.
The second obtaining module 1730 is configured to obtain an expression image belonging to the target semantic type as an image to be processed.
Optionally, the second obtaining module 1730 may be specifically configured to search an expression image belonging to the target semantic type from the expression image library as the image to be processed.
Optionally, in an embodiment of the present application, the apparatus 1700 may further include an image adding module, where the image adding module is configured to: before the second obtaining module 1730 searches the expression image belonging to the target semantic type from the expression image library as the image to be processed, training an image classification model according to the first expression image and the semantic type label of the first expression image; acquiring a second expression image; acquiring prediction type information output by the trained image classification model according to the second expression image as a semantic type label of the second expression image; and adding a second expression image with a semantic type label to the expression image library.
Alternatively, in one implementation manner of the embodiment of the present application, the expression image library may include at least two first image libraries corresponding to different rights information. In this case, the manner in which the second obtaining module 1730 searches the expression image library for the expression image belonging to the target semantic type as the image to be processed may be:
acquiring authority information of current account information; determining a first image library corresponding to the authority information of the current account information from the at least two first image libraries; and searching the expression image belonging to the target semantic type from the determined first image library as an image to be processed.
Alternatively, in another implementation manner of the embodiment of the present application, the expression image library may include at least two second image libraries corresponding to different video types. In this case, the manner in which the second obtaining module 1730 searches the expression image library for the expression image belonging to the target semantic type as the image to be processed may be:
acquiring video watching records of the current account information in a target time before the current moment; determining a target video type from the video viewing record; determining a second image library corresponding to the target video type from the at least two second image libraries; and searching the expression image belonging to the target semantic type from the determined second image library as the image to be processed.
Optionally, the second obtaining module 1730 may be specifically further configured to: intercepting a video picture from a video file which is currently played, identifying an expression image from the video picture, and determining the expression image belonging to the target semantic type from the identified expression image as an image to be processed.
The image generating module 1740 is configured to add the barrage text information to the image to be processed to generate a barrage expression image.
Optionally, in an implementation manner of the embodiment of the present application, the image generating module 1740 adds the barrage text information to the image to be processed, and a manner of generating a barrage expression image may be:
acquiring heat information of each image to be processed; determining the to-be-processed image with the heat information meeting the first condition as a target image from the to-be-processed images; and adding the barrage text information into the target image to obtain the barrage expression image.
Alternatively, in another implementation manner of the embodiment of the present application, the expression images in the expression image library may have content tags. In this case, the manner in which the image generating module 1740 adds the barrage text information to the image to be processed to generate the barrage expression image may be:
Acquiring title bar information on a current interface; respectively determining the similarity between the content label of each image to be processed and the title bar information; determining the to-be-processed image with the similarity with the title bar information meeting a second condition as a target image from the to-be-processed images; and adding the barrage text information into the target image to obtain the barrage expression image.
Optionally, in the embodiment of the present application, the image generating module 1740 may add the barrage text information to the image to be processed to obtain the barrage expression image in the following manner: identifying a target area with a target size from the image to be processed, wherein the difference between any two pixel values in the target area is smaller than a pixel threshold; and adding the barrage text information to the target area of the image to be processed to obtain the barrage expression image.
Optionally, in an embodiment of the present application, the apparatus 1700 may further include a display module, where the display module is configured to:
displaying the generated barrage expression images in a video playing interface; in response to a selection operation, determining the barrage expression image corresponding to the selection operation from the currently displayed barrage expression images; and displaying the barrage expression image corresponding to the selection operation in the barrage.
Further, the display module can display the generated barrage expression images in the video playing interface according to the order of the click rate from high to low.
Referring to fig. 18, a block diagram of an expression image generating apparatus 1800 according to another embodiment of the present application is shown, where the apparatus 1800 may be applied to an electronic device, which may be the terminal device 200 shown in fig. 1. The apparatus 1800 may include: a first acquisition module 1810, a first display module 1820, a selection module 1830, and a second display module 1840.
The first obtaining module 1810 is configured to obtain the input barrage text information.
The first display module 1820 is configured to display a barrage expression image on a video playing interface, where the barrage expression image includes the barrage text information.
The selection module 1830 is configured to determine, in response to a selection operation, a barrage expression image corresponding to the selection operation from currently displayed barrage expression images.
The second display module 1840 is configured to display a barrage expression image corresponding to the selection operation in the barrage.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In the several embodiments provided herein, the illustrated or discussed coupling or direct coupling or communication connection of the modules to each other may be through some interfaces, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other forms.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
Referring to fig. 19, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 1900 may be the server 100 or the terminal device 200 described above, to which the present embodiment is not limited.
The electronic device 1900 in this application may include one or more of the following components: a processor 1910, a memory 1920, and one or more programs, wherein the one or more programs may be stored in the memory 1920 and configured to be executed by the one or more processors 1910, the one or more programs configured to perform the methods as described by the foregoing method embodiments.
The processor 1910 may include one or more processing cores. The processor 1910 connects various parts of the electronic device 1900 through various interfaces and lines, and performs various functions of the electronic device 1900 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1920 and by invoking data stored in the memory 1920. Alternatively, the processor 1910 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1910 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It will be appreciated that the modem may also be implemented by a separate communication chip rather than being integrated into the processor 1910.
The memory 1920 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 1920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1920 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and so on. The data storage area may store data created by the electronic device 1900 in use (such as expression images), and the like.
Referring to fig. 20, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 2000 has stored therein program code that can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 2000 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 2000 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 2000 has storage space for program code 2010 to perform any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 2010 may be compressed, for example, in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. A method for generating an expression image, comprising:
acquiring input barrage text information;
identifying a target semantic type to which the content of the barrage text information belongs;
acquiring an expression image belonging to the target semantic type as an image to be processed;
adding the barrage text information into the image to be processed to generate a barrage expression image;
the obtaining the expression image belonging to the target semantic type as the image to be processed comprises the following steps: searching an expression image belonging to the target semantic type from an expression image library as the image to be processed, wherein the expression image in the expression image library is divided into at least two semantic types; the method for searching the expression image belonging to the target semantic type from the expression image library as the image to be processed comprises the following steps: acquiring video watching records of the current account information in a target time before the current moment; counting the occurrence times of video information belonging to each video type in the video watching record, and sequentially selecting a first number of video types as target video types according to the sequence from the large occurrence times to the small occurrence times; determining a second image library corresponding to the target video type from at least two second image libraries; and searching the expression image belonging to the target semantic type from the determined second image library as the image to be processed.
2. The method of claim 1, wherein the adding the bullet screen text information to the image to be processed to generate a bullet screen expression image comprises:
acquiring heat information of each image to be processed;
determining the to-be-processed image with the heat information meeting the first condition as a target image from the to-be-processed images;
and adding the barrage text information into the target image to obtain the barrage expression image.
3. The method of claim 2, wherein the acquiring the heat information of each image to be processed comprises:
acquiring the occurrence times of the image to be processed in the played content of the currently played video file;
and obtaining the heat information of the image to be processed according to the occurrence times.
4. The method of claim 1, wherein the emoticons in the emoticon library have content tags, and wherein the adding the barrage text information to the image to be processed generates a barrage emoticon comprises:
acquiring title bar information on a current interface;
respectively determining the similarity between the content label of each image to be processed and the title bar information;
Determining the to-be-processed image with the similarity with the title bar information meeting a second condition as a target image from the to-be-processed images;
and adding the barrage text information into the target image to obtain the barrage expression image.
5. The method according to any one of claims 1-4, wherein adding the barrage text information to the image to be processed to obtain the barrage expression image includes:
identifying a target area with a target size from the image to be processed, wherein the difference value of any two pixel values in the target area is smaller than a pixel threshold value;
and adding the barrage text information to the target area of the image to be processed to obtain the barrage expression image.
6. The method of claim 1, wherein the emoticon comprises a first emoticon having a semantic type tag;
before searching the expression image belonging to the target semantic type from the expression image library as an image to be processed, the method further comprises:
training an image classification model according to the first expression image and the semantic type label of the first expression image;
Acquiring a second expression image;
acquiring prediction type information output by the trained image classification model according to the second expression image as a semantic type label of the second expression image;
and adding a second expression image with a semantic type label to the expression image library.
7. A method for generating an expression image, comprising:
acquiring input barrage text information;
displaying a barrage expression image on a video playing interface, wherein the barrage expression image comprises the barrage text information, and the barrage expression image is generated according to the method of any one of claims 1-6;
responding to a selection operation, and determining a barrage expression image corresponding to the selection operation from currently displayed barrage expression images;
and displaying the barrage expression image corresponding to the selection operation in the barrage.
8. An expression image generating apparatus, characterized by comprising:
the first acquisition module is used for acquiring input barrage text information;
the identification module is used for identifying the target semantic type to which the content of the barrage text information belongs;
the second acquisition module is used for acquiring the expression image belonging to the target semantic type as an image to be processed;
The image generation module is used for adding the barrage text information into the image to be processed to generate a barrage expression image;
the second acquisition module is further used for searching an expression image belonging to the target semantic type from an expression image library to serve as the image to be processed, and the expression image in the expression image library is divided into at least two semantic types; the expression image library comprises at least two second image libraries corresponding to different video types, and the second acquisition module is specifically used for acquiring video watching records of current account information in a target time before the current moment; counting the occurrence times of video information belonging to each video type in the video watching record, and sequentially selecting a first number of video types as target video types according to the sequence from the large occurrence times to the small occurrence times; determining a second image library corresponding to the target video type from at least two second image libraries; and searching the expression image belonging to the target semantic type from the determined second image library as the image to be processed.
9. An expression image generating apparatus, characterized by comprising:
the first acquisition module is used for acquiring input barrage text information;
the first display module is used for displaying a barrage expression image on a video playing interface, wherein the barrage expression image comprises the barrage text information, and the barrage expression image is generated according to the method of any one of claims 1-6;
the selecting module is used for responding to the selecting operation and determining a barrage expression image corresponding to the selecting operation from the barrage expression images displayed currently;
and the second display module is used for displaying the barrage expression image corresponding to the selection operation in the barrage.
10. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-7.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a program code, which is callable by a processor for performing the method according to any one of claims 1-7.
CN202010193410.8A 2020-03-18 2020-03-18 Expression image generation method and device and electronic equipment Active CN111372141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010193410.8A CN111372141B (en) 2020-03-18 2020-03-18 Expression image generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010193410.8A CN111372141B (en) 2020-03-18 2020-03-18 Expression image generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111372141A CN111372141A (en) 2020-07-03
CN111372141B true CN111372141B (en) 2024-01-05

Family

ID=71210706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010193410.8A Active CN111372141B (en) 2020-03-18 2020-03-18 Expression image generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111372141B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898521A (en) * 2020-07-28 2020-11-06 海南中金德航科技股份有限公司 Face image recognition retrieval system
CN112149599B (en) * 2020-09-29 2024-03-08 网易(杭州)网络有限公司 Expression tracking method and device, storage medium and electronic equipment
CN113469378B (en) * 2021-05-31 2023-11-24 烟台杰瑞石油服务集团股份有限公司 Maintenance method and maintenance equipment
CN114466204B (en) * 2021-12-15 2024-03-15 北京快乐茄信息技术有限公司 Video bullet screen display method and device, electronic equipment and storage medium
CN115469791A (en) * 2022-07-13 2022-12-13 中国建筑西南设计研究院有限公司 BIM-based multi-window linkage display method, device, equipment and readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933783A (en) * 2016-05-16 2016-09-07 北京三快在线科技有限公司 Bullet screen play method and device and terminal equipment
CN108401175A (en) * 2017-12-20 2018-08-14 广州虎牙信息科技有限公司 A kind of processing method, device, storage medium and the electronic equipment of barrage message
CN108833989A (en) * 2018-06-20 2018-11-16 聚好看科技股份有限公司 A kind of method and apparatus that barrage is generated and shown
CN110569354A (en) * 2019-07-22 2019-12-13 中国农业大学 Barrage emotion analysis method and device
CN110719525A (en) * 2019-08-28 2020-01-21 咪咕文化科技有限公司 Bullet screen expression package generation method, electronic equipment and readable storage medium
CN110889379A (en) * 2019-11-29 2020-03-17 深圳先进技术研究院 Expression package generation method and device and terminal equipment

Also Published As

Publication number Publication date
CN111372141A (en) 2020-07-03

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant