WO2021104097A1 - Meme generation method and apparatus, and terminal device - Google Patents

Meme generation method and apparatus, and terminal device

Info

Publication number
WO2021104097A1
WO2021104097A1 (PCT/CN2020/129209)
Authority
WO
WIPO (PCT)
Prior art keywords
image
emoticon
package
emoticon package
face
Prior art date
Application number
PCT/CN2020/129209
Other languages
French (fr)
Chinese (zh)
Inventor
乔宇
李英
孟锝斌
彭小江
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Publication of WO2021104097A1 publication Critical patent/WO2021104097A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application belongs to the technical field of video applications, and in particular relates to an emoticon package generation method, device, terminal device, and computer-readable storage medium.
  • Emoticons are a way of using pictures to express feelings.
  • The earliest emoticons were mostly designed by professionals, such as emoji and QQ emoticons.
  • As emoticons developed, picture-plus-text emoticon packages became popular; however, because producing an emoticon package requires manually extracting images or adding text information, the process is time-consuming and labor-intensive, and generating emoticon packages from character videos is inefficient.
  • In view of this, the embodiments of the present application provide an emoticon package generation method, device, and terminal device to solve the prior-art problem that extracting face images from a character video and generating corresponding emoticon packages is inefficient.
  • In a first aspect, an embodiment of the present application provides an emoticon package generation method, including: acquiring at least one portrait image from a character video to be processed, where the portrait image includes a face image; calculating the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library; determining a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and extracting text information of the target emoticon package image and integrating the text information with the emoticon package material to generate a target emoticon package.
  • Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:
  • when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:
  • when the expression similarity between the first face image and the first emoticon package image is greater than or equal to the first preset similarity threshold, acquiring attribute information of the first face image in the character video; acquiring multiple frames of images corresponding to the first face image from the character video according to the attribute information; and calculating the expression similarity between each frame of image and the first emoticon package image;
  • when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in a third image and a second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in the third image and the second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • Optionally, calculating the expression similarity between the face image and the emoticon package images in the emoticon package image library includes:
  • extracting facial expression features from the face image through a preset deep neural network, and comparing the facial expression features with the expression features of the emoticon package image to obtain the expression similarity.
  • In a second aspect, an embodiment of the present application provides an emoticon package generation device, including:
  • an acquisition module, configured to acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image;
  • a calculation module, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;
  • a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and
  • a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
  • In a third aspect, an embodiment of the present application further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of the method in the first aspect when executed by a processor.
  • The emoticon package generation method, device, and terminal device provided by the embodiments of the present application have the following beneficial effects:
  • A portrait image containing a face image is extracted from the character video to be processed, and the expression similarity between the face image and the emoticon package images in a preset emoticon package image library is calculated.
  • A target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate a target emoticon package.
  • Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos.
  • Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
  • FIG. 1 is a schematic diagram of the implementation process of an emoticon package generation method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the implementation process of an emoticon package generation method provided by another embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an emoticon package generation device provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • 300 - emoticon package generation device; 310 - acquisition module; 320 - calculation module; 330 - determining module; 340 - generating module; 400 - terminal device; 410 - memory; 420 - processor; 430 - computer program.
  • As used herein, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • As shown in FIG. 1, an embodiment of the present application provides an emoticon package generation method, which may include the following S101 to S104.
  • S101: Acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image.
  • In this embodiment, the execution subject of the emoticon package generation method may take each frame of the character video as a portrait image, or may extract frames from the character video at a preset step size (for example, 1 or 2) and use the extracted frames as portrait images.
  • The portrait images are arranged in their playback order in the character video.
  • It should be understood that the preset step size can be set according to actual needs and is not specifically limited here.
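  • For illustration, a minimal sketch of this sampling step is given below, assuming the OpenCV library; the function name and the default step are illustrative choices, not the patent's.

```python
# Hedged sketch: sample every `step`-th frame of a character video with OpenCV.
import cv2

def sample_portrait_frames(video_path: str, step: int = 2):
    """Return every `step`-th frame of the video, in playback order."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```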
  • It should be noted that the execution subject may receive, in real time, an emoticon package generation request sent by a user through an electronic device, and the character video may be a video included in the received request.
  • It is understandable that the character video may be pre-stored in the electronic device, may be a video saved on a website for playback, or may be recorded in real time through the electronic device.
  • When the character video is played or recorded, face recognition technology is used to process the video and extract at least one portrait image containing a face image.
  • A portrait image may also include background, text, and so on.
  • The face image may contain one person's face or multiple people's faces.
  • When the electronic device is a terminal device, this meets a user's need to extract a specific face image from a video; when the electronic device is a server, running the face-image extraction apparatus on the electronic device meets the face-image extraction needs of platforms such as video websites.
  • Specifically, a face detection algorithm is used to detect face images among the multiple portrait images, and the face images can be marked to facilitate extraction of facial expression features.
  • It should be noted that a portrait image may contain one or more faces, or no face; the face detection algorithm performs face detection on at least one portrait image in the character video to obtain a corresponding face detection result.
  • The face detection result indicates whether a face is displayed in the portrait image.
  • The face detection algorithm may be a multi-task convolutional neural network (MTCNN), a multi-task neural network model for face detection tasks.
  • The model mainly uses three cascaded networks and the idea of candidate boxes plus classifiers to perform fast and efficient face detection.
  • The cascaded networks are P-Net, which quickly generates candidate windows; R-Net, which performs high-precision candidate window filtering and selection; and O-Net, which generates the final bounding boxes and face key points.
  • The faces in the character video are detected by the multi-task convolutional neural network to obtain the corresponding face images.
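  • A minimal sketch of this detection step is shown below, assuming the third-party facenet-pytorch package as one available MTCNN implementation (the P-Net/R-Net/O-Net cascade runs inside it); it is an illustration, not the patent's code.

```python
# Hedged sketch: detect all faces in an image with an off-the-shelf MTCNN.
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=True)  # keep every detected face, not just the best one

def detect_faces(image_path: str):
    """Return face bounding boxes (Nx4, [x1, y1, x2, y2]) and confidences."""
    img = Image.open(image_path).convert("RGB")
    boxes, probs = mtcnn.detect(img)  # both are None if no face is found
    return boxes, probs
```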
  • S102: Calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library.
  • Specifically, when computing the expression similarity between the face images in the portrait images and the emoticon package images, a deep neural network is used to extract the facial expression features from the face image.
  • The deep neural network may be pre-trained on databases such as ImageNet, face recognition data, or facial expression data, and the extracted facial expression features are compared with the pre-processed emoticon package features in the emoticon package image library; the emoticon package features may likewise be computed for newly input emoticon packages.
  • The features compared may be convolutional-layer features or fully connected-layer features of the deep neural network, and the network's classification probabilities may also be used as features.
  • The feature comparison may be based on cosine similarity, or the Euclidean distance may be computed after feature normalization; the computed result is used as the expression similarity.
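  • The two comparison options named above can be sketched as follows, assuming the features are 1-D NumPy vectors; the mapping of normalized Euclidean distance to a similarity score is one reasonable convention, not mandated by the text.

```python
# Hedged sketch of the two similarity options: cosine similarity, or
# Euclidean distance computed after L2 normalization.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_euclidean_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # For unit vectors the distance lies in [0, 2]; map it into [0, 1].
    return 1.0 - float(np.linalg.norm(a - b)) / 2.0
```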
  • The emoticon package image library may be downloaded or collected from the Internet, or may consist of emoticon package images manually added or created by the user, classified according to the text information, format, and other attributes of the images. A text extraction method, such as OCR text recognition software, is used to obtain the text in an emoticon package; if an emoticon package contains no text, its name and format can be obtained, or the text surrounding it can be extracted, to form its text information. The emoticon package image library is then built from the emoticon packages and their corresponding text information.
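  • As a hedged illustration of this text-extraction step, the sketch below uses the pytesseract wrapper around the Tesseract OCR engine (any OCR tool could stand in); the language packs and the fallback-to-name behavior are assumptions, not the patent's.

```python
# Hedged sketch: OCR the caption burned into an emoticon image, falling back
# to the image's name when no text is recognized.
import pytesseract
from PIL import Image

def extract_emoticon_text(image_path: str, fallback_name: str = "") -> str:
    text = pytesseract.image_to_string(Image.open(image_path),
                                       lang="chi_sim+eng")  # assumes installed packs
    text = text.strip()
    return text if text else fallback_name
```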
  • S103: Determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image.
  • In this embodiment, the calculated expression similarities can be marked or sorted, or compared one by one, to obtain the one or more with the largest values.
  • The determined target emoticon package image, emoticon package material, and corresponding expression similarity can also be associated, so that the target emoticon package image and emoticon package material can be found quickly.
  • S104: Extract text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
  • The emoticon package image whose similarity meets the condition is used as the target emoticon package image, its text information is extracted, the face image in the portrait image is extracted as emoticon package material, and the text information is integrated with the material by adding subtitles or by naming, to generate the target emoticon package.
  • The face images in the multiple portrait images are each compared with the emoticon packages in the emoticon package image library, and the emoticon package images and face images whose expression similarity exceeds a preset threshold are selected, which achieves accurate matching between the target emoticon package image and the emoticon package material and also improves the efficiency of emoticon package generation.
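  • An illustrative selection pass over all face/emoticon pairs might look like the sketch below; the cosine scoring and the 0.5 default threshold are assumptions carried over from the comparison step above, not values fixed by the patent.

```python
# Hedged sketch: keep (face, emoticon) pairs whose similarity meets the
# preset threshold, sorted best-first.
import numpy as np

def pick_matches(face_feats, library_feats, threshold: float = 0.5):
    matches = []
    for i, f in enumerate(face_feats):
        for j, e in enumerate(library_feats):
            score = float(np.dot(f, e) /
                          (np.linalg.norm(f) * np.linalg.norm(e)))
            if score >= threshold:
                matches.append((i, j, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)
```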
  • In some embodiments, determining the target emoticon package image and the emoticon package material based on the expression similarity includes:
  • when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • That is, the emoticon package image whose similarity meets the threshold is used as the target emoticon package image, which can be understood as an emoticon package template from which the text information is extracted.
  • This precisely screens out the emoticon package images and corresponding face images that meet the requirements, so that new emoticon packages can be generated quickly; the generated emoticon packages are saved in the emoticon package image library, where users can download or forward them, thereby improving user activity.
  • FIG. 2 shows a schematic diagram of an implementation process of a method for generating an emoticon package according to another embodiment of the present application, wherein the above S103 includes the following S201 to S204.
  • For example, the expression similarity may take a value in [0, 100], and the first preset similarity threshold may be set to 50; if the expression similarity between the first face image and the first emoticon package image is 55, the condition is met, and the attribute information of the first face image in the character video is obtained.
  • The attribute information includes the position and time of the first face image in the entire video, which facilitates subsequently intercepting the relevant video clips from the character video based on this information and effectively prevents background information from interfering with the normal extraction of face images.
  • Since a portrait image includes not only face images but also background information, subtitles, and so on, this improves the accuracy of extracting emoticon package material.
  • S202: Acquire multiple frames of images corresponding to the first face image from the character video according to the attribute information.
  • In this embodiment, the first face image may appear one or more times in the character video.
  • According to the attribute information, the corresponding multiple frames of images, that is, a video clip, are intercepted from the character video.
  • Face regions can be detected in the obtained video frames to obtain a face frame set, and then the face images, that is, the image areas within the face frames, are extracted from the video clip based on this set; the expression similarity between each frame's face image and the first emoticon package image is then calculated so as to select emoticon package material that meets the requirements.
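  • A hedged sketch of this per-frame screening over an intercepted clip follows; it assumes the per-frame face features have already been extracted, and the threshold value is illustrative.

```python
# Hedged sketch: keep the indices of clip frames whose face meets the second
# preset threshold. One surviving frame yields static material; a run of
# adjacent surviving frames can serve as dynamic (animated) material.
import numpy as np

def screen_clip(clip_face_feats, emoticon_feat, second_threshold: float = 0.5):
    keep = []
    for i, f in enumerate(clip_face_feats):
        score = float(np.dot(f, emoticon_feat) /
                      (np.linalg.norm(f) * np.linalg.norm(emoticon_feat)))
        if score >= second_threshold:
            keep.append(i)
    return keep
```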
  • The above execution subject may then screen the multiple frames of images based on their corresponding expression similarities.
  • In practice, the expression similarity may be represented by a value within [0, 100]: the smaller the value, the lower the similarity; the larger the value, the higher the similarity.
  • For example, the first preset similarity threshold may be set to 50.
  • Specifically, the attribute information of the first face image in the character video includes the time and position of the first face image; the multiple frames of images corresponding to the first face image can be located according to this information, and a video clip covering a certain period (for example, 3 seconds) before and after that time in the character video is obtained.
  • The images around the face position in the character video are taken to form a condensed video clip, which removes irrelevant background information from the clip.
  • For a frame of image whose expression similarity is greater than or equal to the second preset similarity threshold, the face image in that frame is used as static emoticon package material, that is, an emoticon package picture.
  • The size of the image region around the face position can be set to four times the size of the original face image, which makes it easy to distinguish the face image from the background information.
  • The first preset similarity threshold and the second preset similarity threshold may be the same or different; they are set according to actual conditions and are not specifically limited here.
  • It should be noted that one frame of image may contain one or more face regions, or no face region; when there are one or more face regions, a corresponding number of face frames can be obtained.
  • The face frame set is a set of at least one face frame, where a face frame is a rectangular frame surrounding a face region. Since a face frame can be characterized by coordinate information, the face frame set may specifically include at least one piece of coordinate information, each piece of which determines one face frame.
  • In order to extract the complete expression for the emoticon package material, the face frame can be expanded outward to obtain a new rectangle, and the image block enclosed by the new rectangle is taken out as the face image.
  • In practice, a pre-trained face detection model can be used to detect the face regions from the acquired multiple frames of images, so as to quickly and accurately extract the emoticon package material.
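  • The expansion step can be sketched as below: grow the detected face frame around its center (a scale factor of 2 per side gives roughly the four-times-the-face region mentioned earlier) and clamp it to the image bounds; the array layout and function name are assumptions for this illustration.

```python
# Hedged sketch: expand a face frame outward and crop the enclosed block.
def expand_face_crop(image, box, scale: float = 2.0):
    """image: HxWxC NumPy array; box: (x1, y1, x2, y2) face frame."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return image[ny1:ny2, nx1:nx2]
```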
  • When the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, the first emoticon package image is used as the target emoticon package image, and the face images in the first image and the second image are used as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • In this case, the expression similarities between the first image and the first emoticon package image and between the second image and the first emoticon package image both meet the condition, that is, each is greater than or equal to the second preset similarity threshold.
  • Therefore, the face image in the first image and the face image in the second image can be used together as dynamic emoticon package material.
  • The face images in three such frames can likewise be used as dynamic emoticon package material; that is, dynamic emoticon packages can be generated, which expands the types of emoticon packages that can be generated and adds to the fun of using them.
  • When the expression similarity between the face image in a third image and a second emoticon package image is the largest, the second emoticon package image is used as the target emoticon package image, and the face image in the third image is used as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • That is, the second emoticon package image is taken as the target emoticon package image, and the face image in the third image is used as static emoticon package material, namely an emoticon package picture.
  • When the expression similarity between the face image in the third image and the second emoticon package image is the largest, the second emoticon package image is used as the target emoticon package image, and the face images in the third image and a fourth image are used as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • For ease of description, the expression similarity between the face image in the portrait image and the first emoticon package image is recorded as the first expression similarity, and the expression similarity between the face image in the portrait image and the second emoticon package image is recorded as the second expression similarity.
  • When the first expression similarity is less than the second expression similarity, the second emoticon package image corresponding to the second expression similarity is selected as the target emoticon package image.
  • The continuous face images in the frames before and after that image can then be used as dynamic emoticon package material to improve the display effect of the emoticon package.
  • It should be noted that, in order to improve the display effect of the target emoticon package, the user can edit it as desired or add prop effects, such as hats, mushroom heads, and other props, and/or add artistic text, watermarks, and so on.
  • The target emoticon package is made into a preset format and stored in the emoticon package image library, for example in interfaces and chat tools, for users to use.
  • The preset format is set as needed; for example, it can be the GIF format.
  • The GIF format can store multiple images in one file; the saved images can be read out and displayed in sequence on the screen to form a simple animation, improving user operability and experience.
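  • A minimal sketch of packaging the selected frames as an animated GIF is shown below, using Pillow; the output file name, frame duration, and loop settings are illustrative assumptions.

```python
# Hedged sketch: write a sequence of frames to a looping animated GIF.
from PIL import Image

def frames_to_gif(frame_paths, out_path="target_emoticon.gif",
                  ms_per_frame=100):
    frames = [Image.open(p) for p in frame_paths]
    frames[0].save(
        out_path,
        save_all=True,              # write every frame, not just the first
        append_images=frames[1:],
        duration=ms_per_frame,      # display time per frame, in milliseconds
        loop=0,                     # 0 = loop forever
    )
```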
  • In the embodiments of the present application, a portrait image containing a face image is extracted from the character video to be processed, and the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library is calculated.
  • A target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate the target emoticon package.
  • Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos.
  • Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
  • FIG. 3 shows a schematic structural diagram of an emoticon package generation device 300 provided by an embodiment of the present application. As shown in FIG. 3, the device includes:
  • the obtaining module 310 is configured to obtain at least one portrait image from a character video to be processed, where the portrait image includes a face image;
  • the calculation module 320 is configured to calculate the expression similarity between the face image in the portrait image and the emoticon package image in the preset emoticon package image library;
  • the determining module 330 is configured to determine a target emoticon pack image and emoticon pack material based on the expression similarity, wherein the target emoticon pack image belongs to the emoticon pack image library, and the emoticon pack material belongs to the portrait image;
  • the generating module 340 is configured to extract text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
  • Optionally, the determining module 330 is specifically configured to: when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, use the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • the determining module 330 specifically includes:
  • a first acquiring unit, configured to acquire the attribute information of the first face image in the character video when the expression similarity between the first face image and the first emoticon package image is greater than or equal to a first preset similarity threshold;
  • a second acquiring unit, configured to acquire multiple frames of images corresponding to the first face image from the character video according to the attribute information;
  • a calculation unit, configured to calculate the expression similarity between each frame of image and the first emoticon package image; and
  • a first material determining unit, configured to: when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, use the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
  • the determining module 330 may further include:
  • a second material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, use the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • the determining module 330 may further include:
  • a third material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, use the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • the determining module 330 may further include:
  • a fourth material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in the third image and the second emoticon package image is the largest, use the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • Optionally, the acquisition module 310 is further configured to extract facial expression features from the face image through a preset deep neural network, and the calculation module 320 compares the facial expression features with the expression features of the emoticon package image to obtain the expression similarity.
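  • For orientation, the four modules of FIG. 3 can be sketched as plain Python classes as below; the method names are placeholders chosen for this illustration, since the patent specifies behavior, not an API.

```python
# Hedged structural sketch of the apparatus in FIG. 3; bodies are stubs.
class AcquisitionModule:          # 310
    def get_portrait_images(self, video_path): ...

class CalculationModule:          # 320
    def expression_similarity(self, face_image, emoticon_image): ...

class DeterminingModule:          # 330
    def pick_target_and_material(self, similarities): ...

class GeneratingModule:           # 340
    def generate(self, target_emoticon_image, material): ...
```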
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • As shown in FIG. 4, the terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and runnable on the processor 420; the processor 420 implements the aforementioned emoticon package generation method when executing the computer program 430.
  • The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device; the embodiments of this application impose no restrictions on the specific type of terminal device.
  • The terminal device 400 may include, but is not limited to, a processor 420 and a memory 410. Those skilled in the art can understand that FIG. 4 is only an example of the terminal device 400 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown in the figure, a combination of certain components, or different components, and may, for example, also include input and output devices.
  • The processor 420 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or memory of the terminal device 400. In other embodiments, the memory 410 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program; it may also be used to temporarily store data that has been or will be output.
  • the embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.
  • The embodiments of the present application also provide a computer program product; when the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be realized.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The computer program can be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • In some cases, computer-readable media cannot be electrical carrier signals and telecommunications signals.
  • the disclosed apparatus/network equipment and method may be implemented in other ways.
  • the device/network device embodiments described above are merely illustrative.
  • For example, the division of the modules or units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

A meme generation method and apparatus, and a terminal device, relating to the technical field of video applications. The method comprises: acquiring at least one portrait image from a figure video to be processed, wherein the portrait image includes a facial image (S101); calculating expression similarity between the facial image in the portrait image and a meme image in a preset meme image library (S102); determining a target meme image and meme material on the basis of the expression similarity, wherein the target meme image belongs to the meme image library, and the meme material belongs to the portrait image (S103); and extracting text information from the target meme image, and integrating the text information with the meme material to generate a target meme (S104). The method improves the efficiency of meme generation and the accuracy of text information matching.

Description

Emoticon package generation method, device and terminal device

Technical field

This application belongs to the technical field of video applications, and in particular relates to an emoticon package generation method, device, terminal device, and computer-readable storage medium.

Background

With the popularization of the Internet and the gradual development of mobile communications, more and more people are using instant messaging tools. To liven up the chat atmosphere, emoticons are used as a way of expressing feelings with pictures. The earliest emoticons were mostly designed by professionals, such as emoji and QQ emoticons. As emoticons developed, picture-plus-text emoticon packages became popular; however, because producing an emoticon package requires manually extracting images or adding text information, the process is time-consuming and labor-intensive, and generating emoticon packages from character videos is inefficient.

Technical problem

In view of this, the embodiments of the present application provide an emoticon package generation method, device, and terminal device to solve the prior-art problem that extracting face images from a character video and generating corresponding emoticon packages is inefficient.

Technical solution
In a first aspect, an embodiment of the present application provides an emoticon package generation method, including:

acquiring at least one portrait image from a character video to be processed, where the portrait image includes a face image;

calculating the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;

determining a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and

extracting text information of the target emoticon package image, and integrating the text information with the emoticon package material to generate a target emoticon package.
Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:

when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.

Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:

when the expression similarity between the first face image and the first emoticon package image is greater than or equal to the first preset similarity threshold, acquiring attribute information of the first face image in the character video;

acquiring multiple frames of images corresponding to the first face image from the character video according to the attribute information;

calculating the expression similarity between each frame of image and the first emoticon package image; and

when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.

Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in a third image and a second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.

Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in the third image and the second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
Optionally, calculating the expression similarity between the face image and the emoticon package images in the emoticon package image library includes:

extracting facial expression features from the face image through a preset deep neural network; and

comparing the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
In a second aspect, an embodiment of the present application further provides an emoticon package generation device, including:

an acquisition module, configured to acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image;

a calculation module, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;

a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and

a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
In a third aspect, an embodiment of the present application further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of the method in the first aspect when executed by a processor.
Beneficial effects

The emoticon package generation method, device, and terminal device provided by the embodiments of the present application have the following beneficial effects:

In the embodiments of the present application, a portrait image containing a face image is extracted from the character video to be processed; the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library is calculated; a target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate a target emoticon package. Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos. Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
Description of the drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings needed in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of the implementation process of an emoticon package generation method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the implementation process of an emoticon package generation method provided by another embodiment of the present application;

FIG. 3 is a schematic structural diagram of an emoticon package generation device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Description of the main component symbols:

300 - emoticon package generation device; 310 - acquisition module; 320 - calculation module; 330 - determining module; 340 - generating module; 400 - terminal device; 410 - memory; 420 - processor; 430 - computer program.
Embodiments of the present invention

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary details do not obscure the description of this application.

It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification of this application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the specification of this application and the appended claims, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", and so on are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

In order to illustrate the technical solutions of the present invention, the following specific embodiments are used for description.
如图1所示,图1示出了本申请实施例提供的表情包生成方法,所述表情包生成方法可包括如下S101至S104。As shown in FIG. 1, FIG. 1 shows a method for generating an emoticon package provided by an embodiment of the present application, and the method for generating an emoticon package may include the following S101 to S104.
S101:从待处理的人物视频中获取至少一张人像图像,所述人像图像中包含人脸图像;S101: Obtain at least one portrait image from a character video to be processed, where the portrait image includes a face image;
在本实施例中,表情包生成方法的执行主体可以将上述人物视频中的各帧图像组成人像图像,或者,上述执行主体可以基于预设步长(例如1或2等),从上述人物视频中提取出图像,并将提取出的图像组成人像图像,其中,人像图像中的图像是按照在上述人物视频中的播放先后顺序排列的,应当理解,预设步长可以根据实际需要设置,此处不做具体限定。In this embodiment, the execution subject of the emoticon package generation method may compose each frame image in the above-mentioned character video into a portrait image, or the above-mentioned execution subject may be based on a preset step size (for example, 1 or 2, etc.) from the above-mentioned character video. The image is extracted from the image, and the extracted images are formed into a portrait image. The images in the portrait image are arranged in the order of playback in the above-mentioned character video. It should be understood that the preset step length can be set according to actual needs. There are no specific restrictions.
It should be noted that the execution subject may receive, in real time, an emoticon package generation request sent by a user through an electronic device, and the character video may be a video contained in the received emoticon package generation request.
It is understandable that the character video may be pre-stored in the electronic device, may be a video saved on a website for playback, or may be recorded in real time by the electronic device. When the character video is played or recorded, face recognition technology is used to process the portrait images in the video, and at least one portrait image carrying a face image is extracted; the portrait image may also include background, text, and the like, and the face image may contain the face of one person or the faces of several people. When the electronic device is a terminal device, this meets the user's need to extract a specific face image from a video; when the electronic device is a server, running the apparatus for extracting face images from character videos on that server meets the face image extraction needs of platforms such as video websites.
Specifically, a face detection algorithm is used to detect face images from the multiple portrait images, and the face images can be marked, which facilitates the subsequent extraction of facial expression features.
It should be noted that a portrait image may contain one or more faces, or no face at all. The face detection algorithm performs face detection on the at least one portrait image of the character video to obtain a corresponding face detection result, which indicates whether a face is displayed in that portrait image. The face detection algorithm may be a multi-task convolutional neural network (MTCNN), a multi-task neural network model for face detection tasks. The model mainly uses three cascaded networks together with the idea of candidate boxes plus classifiers to perform fast and efficient face detection: P-Net, which quickly generates candidate windows; R-Net, which performs high-precision filtering and selection of the candidate windows; and O-Net, which generates the final bounding boxes and face key points. The faces in the character video are detected by the multi-task convolutional neural network to obtain the corresponding face images.
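As a hedged sketch of this detection step, the cascaded MTCNN could be invoked as below, assuming the third-party facenet-pytorch package; the embodiment does not prescribe a specific implementation, and the helper name is illustrative.

```python
from facenet_pytorch import MTCNN  # bundles the P-Net/R-Net/O-Net cascade
from PIL import Image

detector = MTCNN(keep_all=True)  # keep every face found in a frame

def detect_faces(frame_path):
    """Return (bounding box, confidence) pairs for all faces in one frame."""
    img = Image.open(frame_path).convert("RGB")
    boxes, probs = detector.detect(img)  # boxes: N x 4 arrays [x1, y1, x2, y2]
    return [] if boxes is None else list(zip(boxes, probs))
```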
S102: Calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library.
Specifically, when calculating the expression similarity between the face images in the multiple portrait images and the emoticon package images, a deep neural network is used to extract the facial expression features of each face image. The deep neural network may be pre-trained on databases such as ImageNet, face recognition data, or facial expression data. The extracted facial expression features are compared with the features of the pre-processed emoticon packages in the emoticon package image library, where an emoticon package feature may also come from a newly input emoticon package. The compared features may be convolutional-layer features or fully-connected-layer features of the deep neural network, and the classification probabilities of the network may also serve as features, yielding an emoticon package feature and a face image feature for each pair. The feature comparison can be computed as cosine similarity between the features, or as Euclidean distance after feature normalization, and the computed result is taken as the expression similarity. Using the aforementioned face detection algorithm, at least one portrait image containing a face image is obtained, and the expression similarity between each portrait image and each emoticon package image in the preset library is calculated; based on the multiple similarities, the emoticon package images meeting the requirements can be screened out, thereby improving the accuracy of matching the face image of the portrait image with the emoticon package image.
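A minimal sketch of the two comparison strategies named above follows; the feature extractor itself (the pre-trained deep network) is assumed rather than shown, and the mapping into [0, 100] mirrors the similarity range used in the later embodiments.

```python
import numpy as np

def expression_similarity(face_feat, meme_feat, method="cosine"):
    """Compare two expression feature vectors; return a score in [0, 100]."""
    f = np.asarray(face_feat, dtype=np.float64)
    m = np.asarray(meme_feat, dtype=np.float64)
    if method == "cosine":
        score = np.dot(f, m) / (np.linalg.norm(f) * np.linalg.norm(m))
    else:
        # Euclidean distance after L2 normalization; for unit vectors the
        # distance lies in [0, 2], so map it to a similarity in [0, 1]
        f, m = f / np.linalg.norm(f), m / np.linalg.norm(m)
        score = 1.0 - np.linalg.norm(f - m) / 2.0
    return 100.0 * max(0.0, score)
```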
It should be noted that, before S102 is performed, the emoticon package image library may be built from emoticon package images downloaded or collected from the Internet, or manually added or created by the user, classified according to their text information, format, and so on. The text in an emoticon package is obtained by text extraction, for example with OCR text recognition software; if an emoticon package contains no text, the name and format of the package can be obtained, or the text around the package can be extracted, to form its text information. The emoticon package image library is then established from the emoticon packages and their corresponding text information.
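One way to build such a library entry is sketched below, assuming the pytesseract OCR binding (which requires the Tesseract binary and language packs to be installed); the fallback to the file name when the image carries no embedded text follows the description above.

```python
import os
import pytesseract
from PIL import Image

def build_library_entry(image_path):
    """Return one library record: image path, text information and format."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img, lang="chi_sim+eng").strip()
    if not text:  # no text inside the meme: use its name and format instead
        text = os.path.splitext(os.path.basename(image_path))[0]
    return {"path": image_path, "text": text, "format": img.format}
```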
S103: Determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image.
Specifically, after the expression similarity between each face image and each emoticon package image in the library is calculated, the computed similarities may be marked or sorted, or compared one by one to find the one or more pairs with the largest similarity values. One or more of the computed similarities may meet the above condition, and correspondingly one or more emoticon package images in the library may be selected, so as to determine the target emoticon package image in the library and the face image in the portrait image, the latter serving as the emoticon package material for generating a new emoticon package. In addition, the determined target emoticon package image, the emoticon package material, and the corresponding expression similarity may be associated with one another, so that the target emoticon package image and the material can be quickly looked up.
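A small sketch of the marking and sorting step, with purely illustrative container names: the computed (face image, emoticon package image, similarity) triples are sorted, and every pair tying for the maximum value is kept, matching the "one or more" wording above.

```python
def rank_matches(scores):
    """scores: list of (face_image_id, meme_image_id, similarity) triples."""
    ranked = sorted(scores, key=lambda s: s[2], reverse=True)
    if not ranked:
        return []
    best = ranked[0][2]
    # keep every pair that ties for the largest similarity value
    return [s for s in ranked if s[2] == best]
```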
S104: Extract the text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
Specifically, the video is opened, the automatic emoticon package production command is started, and the face detection algorithm runs in the background, defining the detected faces as face image 1, face image 2, ..., face image N. Expression similarity with the emoticon package images in the library can be computed as soon as face image 1 is detected, or after a set detection time or a set number of detected face images. Taking the detected face image 1 as an example, the expression similarity between the recognized face in face image 1 and the faces in the emoticon package image library is calculated; the emoticon package image with the largest similarity can be selected as the target emoticon package image, its text information is extracted, the face image in the portrait image is extracted as the emoticon package material, and the text information is integrated with the material by subtitle addition or naming to generate the target emoticon package. In addition, the face images in the multiple portrait images are each compared with the emoticon packages in the library, and the emoticon package images and face images whose similarity exceeds a preset threshold are selected, which allows the target emoticon package image and the material to be matched precisely and also improves the efficiency of emoticon package generation.
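The subtitle-style integration of the extracted text with the material might be realized as below, assuming Pillow; the font file path is an assumption that must exist on the target system, and the placement is illustrative.

```python
from PIL import Image, ImageDraw, ImageFont

def add_caption(material_path, text, out_path, font_path="simhei.ttf"):
    """Overlay the extracted text near the bottom of the material image."""
    img = Image.open(material_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, size=max(14, img.height // 10))
    draw.text((img.width // 10, int(img.height * 0.85)), text,
              fill="white", font=font)
    img.save(out_path)
```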
Further, the determining of the target emoticon package image and the emoticon package material based on the expression similarity includes:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, where the first face image is any face image in the portrait images, and the first emoticon package image is any emoticon package image in the emoticon package image library.
Specifically, taking face image a as an example, when the expression similarity between face image a and an emoticon package image in the library is greater than or equal to the first preset similarity threshold, that emoticon package image serves as the target emoticon package image, which can be understood as an emoticon package template used for extracting the text information. Emoticon package images meeting the requirements and the corresponding face images are thus screened out precisely, so that new emoticon packages can be generated quickly and saved in the emoticon package image library for users to download or forward, thereby increasing user activity.
As shown in FIG. 2, FIG. 2 illustrates a schematic flowchart of an implementation of the emoticon package generation method provided by another embodiment of the present application, where the above S103 includes the following S201 to S204.
S201: When the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, obtain attribute information of the first face image in the character video.
Specifically, the expression similarity can be set to a value in [0, 100], and the first preset similarity threshold can be set to 50, so that an expression similarity of 55 between the first face image and the first emoticon package image meets the above condition. The attribute information of the first face image in the character video is then obtained, including the position, time, and the like of the first face image within the whole video, which facilitates the subsequent interception of related video clips from the character video based on the attribute information and effectively prevents background information from interfering with the normal extraction of face images.
It should be noted that there is a one-to-one expression similarity between each face image in the portrait images and each emoticon package image in the library, and the face images whose similarity is greater than or equal to the first preset similarity threshold are selected. Since a portrait image includes the face image, background information, subtitles, and so on, this improves the accuracy of extracting the emoticon package material.
S202: Obtain, according to the attribute information, multiple frames of images corresponding to the first face image from the character video.
In this embodiment, the first face image may appear once or several times in the character video. According to the attribute information of the first face image, the corresponding multiple frames of images, that is, a video clip, are intercepted from the character video. In other words, face detection and tracking are performed on the first face image, and the video frames containing the face image are extracted from the character video according to the detection and tracking results, so as to guarantee the completeness of the extracted emoticon package material.
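A sketch of cutting the clip around the attribute time follows, assuming OpenCV; the 3-second window matches the example given in a later embodiment and is not a fixed requirement.

```python
import cv2

def frames_around(video_path, t_seconds, window=3.0):
    """Return the frames within +/- `window` seconds of the given time."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    start = max(0, int((t_seconds - window) * fps))
    end = int((t_seconds + window) * fps)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    frames = []
    for _ in range(start, end + 1):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```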
S203: Calculate the expression similarity between each frame of the images and the first emoticon package image.
In this embodiment, after the corresponding multiple frames of images are obtained from the character video, face regions can be detected from the obtained video frames, and a face frame set is obtained after the face regions are detected. Based on the face frame set, face images, that is, images containing the regions inside the face frames, are extracted from the obtained video clip, and the facial expression in each frame is compared with the first emoticon package image to obtain the corresponding expression similarity, so that emoticon package material meeting the requirements can be selected.
S204: When the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, take the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, where the first image is any one of the multiple frames of images.
In this embodiment, the execution subject can rely on the expression similarities corresponding to the multiple frames of images, which may be expressed as values within [0, 100]; in practice, a smaller value indicates a lower expression similarity, and a larger value a higher one. When the expression similarity between a first image among the portrait images and a first emoticon package image stored in the library is greater than or equal to the first preset similarity threshold (for example, 50), the attribute information of the first face image in the character video is obtained, including the time and position at which the first face image appears. Based on this attribute information, the multiple frames of images corresponding to the first face image can be located, a video clip covering a certain period (for example, 3 seconds) before and after that time is obtained, and the images around the face position are taken to form a condensed clip, which removes irrelevant background information from the clip. The expression similarity between each frame and the first emoticon package image is then calculated; when a frame whose similarity is greater than or equal to the second preset similarity threshold exists, the face image in that frame is used as a static emoticon package material, that is, an emoticon package picture. The image region around the face position can be set to four times the size of the original face image, which makes it easy to distinguish the face image from the background information. It should be noted that the first preset similarity threshold and the second preset similarity threshold may be the same or different, are set according to actual conditions, and are not specifically limited here.
It is understandable that a frame of image may contain one or more face regions, or no face region; when one or more face regions exist, a corresponding number of face frames, and hence of face images, can be obtained. The aforementioned face frame set is a set of at least one face frame, each being a rectangular frame surrounding a face region; since a face frame can be characterized by coordinate information, the face frame set may specifically include at least one piece of coordinate information, each of which determines one face frame. In order to extract the complete expression for the emoticon package material, the face frame can be expanded outward to obtain a new rectangle, and the image block enclosed by that rectangle is taken out to obtain the face image. For example, a pre-trained face detection model can be used to detect the face regions in the acquired multiple frames of images, so as to extract the emoticon package material quickly and accurately.
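The outward expansion of the face frame might be implemented as follows; scaling each side by 2 yields a region roughly four times the area of the original face frame, in line with the size mentioned above, though the exact scale is a design choice.

```python
def expand_and_crop(frame, box, scale=2.0):
    """frame: H x W x C array; box: (x1, y1, x2, y2) face rectangle."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return frame[ny1:ny2, nx1:nx2]  # clipped to the frame boundaries
```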
Further, after S203, the method also includes:
when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, taking the first emoticon package image as the target emoticon package image, and taking the face image in the first image and the face image in the second image as dynamic emoticon package material, where the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
In this embodiment, taking the selection of three frames of images as an example, when the expression similarities of the first image and the second image with the first emoticon package image both meet the condition, that is, both are greater than or equal to the second preset similarity threshold, the face image in the first image and the face image in the second image can be used as dynamic emoticon package material. If the similarities of the first image and the third image with the first emoticon package image meet the condition while that of the second image does not, the face images in all three frames can still be used as dynamic emoticon package material. Dynamic emoticon packages can thus be generated, which broadens the types of emoticon packages produced and adds to the fun of using them.
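A sketch of collecting adjacent qualifying frames into a dynamic material follows; the tolerance of one sub-threshold frame inside a run reflects the three-frame example above and is an assumption rather than a requirement.

```python
def dynamic_runs(similarities, threshold, gap=1):
    """similarities: per-frame scores; return index runs usable as animations."""
    runs, current, misses = [], [], 0
    for i, score in enumerate(similarities):
        if score >= threshold:
            current.append(i)
            misses = 0
        elif current and misses < gap:
            current.append(i)  # tolerate one sub-threshold frame mid-run
            misses += 1
        else:
            if len(current) > 1:
                runs.append(current)
            current, misses = [], 0
    if len(current) > 1:
        runs.append(current)
    return runs
```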
Further, after S203, the method also includes:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image as a static emoticon package material, where the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
In this embodiment, taking the selection of three frames of images as an example, when the face image in the third image among the multiple frames has the largest expression similarity with the second emoticon package image, the second emoticon package image is taken as the target emoticon package image, and the face image in the third image is taken as the static emoticon package material, that is, the emoticon package picture. By selecting the emoticon package image with the largest expression similarity and the face image in the corresponding frame, the emoticon package image to be generated can be extracted accurately, so that the generated emoticon package expresses the intended emotion accurately.
Further, after S203, the method also includes:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image together with a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
In this embodiment, taking a library containing two emoticon package images as an example, the expression similarity between the face image in the portrait image and the first emoticon package image is recorded as a first expression similarity, and that with the second emoticon package image as a second expression similarity. When the first expression similarity is smaller than the second, the second emoticon package image corresponding to the second expression similarity is selected as the target emoticon package image. Among the selected multiple frames, there may be a frame whose face image has the largest similarity with the emoticon package image, and the consecutive face images in the frames before and after it can be used as dynamic emoticon package material, so as to improve the display effect of the emoticon package.
In another embodiment, in order to improve the display effect of the target emoticon package, the user can edit the target emoticon package according to his own wishes or add prop effects, for example props such as hats or mushroom heads, and/or add some word art, watermarks, and the like. The target emoticon package is made into a preset format and stored in an emoticon package image library accessible from a user-operable interface, a chat tool, or the like. The preset format is set as needed; for example, it can be the GIF format, which can store multiple images, so that the multiple images saved in one file can be read out and displayed on the screen to form a simple animation, improving the user's operability and experience.
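Saving the dynamic material in the preset GIF format could look as below, assuming Pillow; the frame duration and loop count are illustrative.

```python
from PIL import Image

def save_gif(frame_paths, out_path, ms_per_frame=120):
    """Assemble the material frames into a looping GIF animation."""
    frames = [Image.open(p).convert("P", palette=Image.ADAPTIVE)
              for p in frame_paths]
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=ms_per_frame, loop=0)
```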
In the embodiments of the present application, portrait images containing face images are extracted from the character video to be processed, the expression similarity between the face images in the portrait images and the emoticon package images in the preset emoticon package image library is calculated, the target emoticon package image and the emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait images, and the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate the target emoticon package. Since the present invention can determine the target emoticon package image and the emoticon package material according to the expression similarity and automatically integrate the text information of the emoticon package image with the material, the user does not need to select emoticon package images manually, which reduces the user's operational burden and allows users to make their own emoticon packages from videos quickly and easily. Extracting face images or video clips from character videos as emoticon package material according to the expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The following apparatus corresponds to the emoticon package generation method described in the foregoing embodiments.
Please refer to FIG. 3, which shows a schematic structural diagram of an emoticon package generation apparatus 300 provided by the present application. As shown in FIG. 3, the apparatus includes:
an obtaining module 310, configured to obtain at least one portrait image from a character video to be processed, where the portrait image contains a face image;
a calculation module 320, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;
a determining module 330, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
a generating module 340, configured to extract the text information of the target emoticon package image and integrate the text information with the emoticon package material to generate the target emoticon package.
Optionally, the determining module 330 is specifically configured to: when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, take the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, where the first face image is any face image in the portrait images, and the first emoticon package image is any emoticon package image in the emoticon package image library.
Optionally, the determining module 330 specifically includes:
a first obtaining unit, configured to obtain attribute information of a first face image in the character video when the expression similarity between the first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold;
a second obtaining unit, configured to obtain, according to the attribute information, multiple frames of images corresponding to the first face image from the character video;
a calculation unit, configured to calculate the expression similarity between each frame of the images and the first emoticon package image; and
a first material determining unit, configured to, when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, take the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, where the first image is any one of the multiple frames of images.
Optionally, the determining module 330 may further include:
a second material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, take the first emoticon package image as the target emoticon package image, and take the face image in the first image and the face image in the second image as dynamic emoticon package material, where the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
Optionally, the determining module 330 may further include:
a third material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, take the second emoticon package image as the target emoticon package image and the face image in the third image as a static emoticon package material, where the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
Optionally, the determining module 330 may further include:
a fourth material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, take the second emoticon package image as the target emoticon package image, and take the face image in the third image together with a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, and at least one of the at least two frames is adjacent to the first image.
Optionally, the obtaining module 310 is further configured to extract the facial expression features of the face image through a preset deep neural network, and the calculation module 320 compares the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
Please refer to FIG. 4, which is a schematic structural diagram of a terminal device further provided by an embodiment of the present application. As shown in FIG. 4, the terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and executable on the processor 420; the processor 420 implements the aforementioned emoticon package generation method when executing the computer program 430.
The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device; the embodiments of the present application impose no restriction on the specific type of the terminal device.
The terminal device 400 may include, but is not limited to, the processor 420 and the memory 410. Those skilled in the art can understand that FIG. 4 is merely an example of the terminal device 400 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or use different components, and may, for example, also include input and output devices.
The so-called processor 420 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or any conventional processor.
In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or memory of the terminal device 400. In other embodiments, the memory 410 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used to store the operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, since the information interaction and execution processes between the above emoticon package generation apparatus/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment part and will not be repeated here.
Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be realized in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.
An embodiment of the present application provides a computer program product; when the computer program product runs on a mobile terminal, the mobile terminal, upon execution, can realize the steps in each of the foregoing method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can be accomplished by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, executable file form, some intermediate form, or the like. The computer-readable medium may at least include: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electric carrier signals and telecommunications signals.
In the above embodiments, the description of each embodiment has its own focus. For parts not described in detail or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present application.
Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; for example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included within the protection scope of the present application.

Claims (10)

1. A method for generating an emoticon package, characterized by comprising:
obtaining at least one portrait image from a character video to be processed, wherein the portrait image contains a face image;
calculating an expression similarity between the face image in the portrait image and emoticon package images in a preset emoticon package image library;
determining a target emoticon package image and emoticon package material based on the expression similarity, wherein the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
extracting text information of the target emoticon package image, and integrating the text information with the emoticon package material to generate a target emoticon package.
2. The emoticon package generation method according to claim 1, characterized in that the determining of the target emoticon package image and the emoticon package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, wherein the first face image is any face image in the portrait image, and the first emoticon package image is any emoticon package image in the emoticon package image library.
3. The emoticon package generation method according to claim 1, characterized in that the determining of the target emoticon package image and the emoticon package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, obtaining attribute information of the first face image in the character video;
obtaining, according to the attribute information, multiple frames of images corresponding to the first face image from the character video;
calculating the expression similarity between each frame of the images and the first emoticon package image; and
when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, wherein the first image is any one of the multiple frames of images.
4. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, taking the first emoticon package image as the target emoticon package image, and taking the face image in the first image and the face image in the second image as dynamic emoticon package material, wherein the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
5. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image as a static emoticon package material, wherein the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
6. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image together with a fourth image as dynamic emoticon package material, wherein the fourth image is at least two consecutive frames among the multiple frames of images, and at least one of the at least two frames is adjacent to the first image.
7. The emoticon package generation method according to any one of claims 1 to 6, characterized in that the calculating of the expression similarity between the face image and the emoticon package images in the emoticon package image library comprises:
extracting facial expression features of the face image through a preset deep neural network; and
comparing the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
8. An apparatus for generating an emoticon package, characterized by comprising:
an obtaining module, configured to obtain at least one portrait image from a character video to be processed, the portrait image containing a face image;
a calculation module, configured to calculate an expression similarity between the face image in the portrait image and emoticon package images in a preset emoticon package image library;
a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, wherein the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the emoticon package generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the emoticon package generation method according to any one of claims 1 to 7.
PCT/CN2020/129209 2019-11-29 2020-11-17 Meme generation method and apparatus, and terminal device WO2021104097A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911197094.5 2019-11-29
CN201911197094.5A CN110889379B (en) 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021104097A1 true WO2021104097A1 (en) 2021-06-03

Family

ID=69749402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129209 WO2021104097A1 (en) 2019-11-29 2020-11-17 Meme generation method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN110889379B (en)
WO (1) WO2021104097A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889379B (en) * 2019-11-29 2024-02-20 深圳先进技术研究院 Expression package generation method and device and terminal equipment
CN111372141B (en) * 2020-03-18 2024-01-05 腾讯科技(深圳)有限公司 Expression image generation method and device and electronic equipment
CN111586466B (en) * 2020-05-08 2021-05-28 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
CN111768481A (en) * 2020-05-19 2020-10-13 北京奇艺世纪科技有限公司 Expression package generation method and device
CN111753131A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Expression package generation method and device, electronic device and medium
CN111881776B (en) * 2020-07-07 2023-07-07 腾讯科技(深圳)有限公司 Dynamic expression acquisition method and device, storage medium and electronic equipment
CN113436297A (en) * 2021-07-15 2021-09-24 维沃移动通信有限公司 Picture processing method and electronic equipment
CN117150063B (en) * 2023-10-26 2024-02-06 深圳慢云智能科技有限公司 Image generation method and system based on scene recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369196B (en) * 2017-06-30 2021-08-24 Oppo广东移动通信有限公司 Expression package manufacturing method and device, storage medium and electronic equipment
US10593087B2 (en) * 2017-10-23 2020-03-17 Paypal, Inc. System and method for generating emoji mashups with machine learning
CN109508399A (en) * 2018-11-20 2019-03-22 维沃移动通信有限公司 A kind of facial expression image processing method, mobile terminal
CN110321845B (en) * 2019-07-04 2021-06-18 北京奇艺世纪科技有限公司 Method and device for extracting emotion packets from video and electronic equipment
CN110458916A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Expression packet automatic generation method, device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803909A (en) * 2017-02-21 2017-06-06 腾讯科技(深圳)有限公司 The generation method and terminal of a kind of video file
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
US20190354791A1 (en) * 2018-05-17 2019-11-21 Idemia Identity & Security France Character recognition method
CN110162670A (en) * 2019-05-27 2019-08-23 北京字节跳动网络技术有限公司 Method and apparatus for generating expression packet
CN110889379A (en) * 2019-11-29 2020-03-17 深圳先进技术研究院 Expression package generation method and device and terminal equipment

Also Published As

Publication number Publication date
CN110889379A (en) 2020-03-17
CN110889379B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2021104097A1 (en) Meme generation method and apparatus, and terminal device
CN108833973B (en) Video feature extraction method and device and computer equipment
US20220350842A1 (en) Video tag determination method, terminal, and storage medium
WO2019041521A1 (en) Apparatus and method for extracting user keyword, and computer-readable storage medium
CN111489290B (en) Face image super-resolution reconstruction method and device and terminal equipment
WO2019153504A1 (en) Group creation method and terminal thereof
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN108898082B (en) Picture processing method, picture processing device and terminal equipment
CN111814770A (en) Content keyword extraction method of news video, terminal device and medium
CN111209970A (en) Video classification method and device, storage medium and server
WO2020259449A1 (en) Method and device for generating short video
JP2021034003A (en) Human object recognition method, apparatus, electronic device, storage medium, and program
CN112532882B (en) Image display method and device
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
WO2020135756A1 (en) Video segment extraction method, apparatus and device, and computer-readable storage medium
CN111818385B (en) Video processing method, video processing device and terminal equipment
WO2021135286A1 (en) Video processing method, video searching method, terminal device, and computer-readable storage medium
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
CN109886239B (en) Portrait clustering method, device and system
CN111128233A (en) Recording detection method and device, electronic equipment and storage medium
CN110232267B (en) Business card display method and device, electronic equipment and storage medium
CN108932704B (en) Picture processing method, picture processing device and terminal equipment
WO2023173659A1 (en) Face matching method and apparatus, electronic device, storage medium, computer program product, and computer program
CN115544214A (en) Event processing method and device and computer readable storage medium
CN113361486A (en) Multi-pose face recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20893238

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20893238

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 19/01/2023)
