CN110889379A - Expression package generation method and device and terminal equipment

Info

Publication number: CN110889379A (application CN201911197094.5A; granted as CN110889379B)
Authority: CN (China)
Prior art keywords: image, expression, expression package, similarity
Legal status: Granted (Active)
Original language: Chinese (zh)
Inventors: 乔宇, 李英, 孟锝斌, 彭小江
Assignee: Shenzhen Institute of Advanced Technology of CAS
Application filed by Shenzhen Institute of Advanced Technology of CAS; priority to CN201911197094.5A and to PCT/CN2020/129209 (WO2021104097A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application is applicable to the technical field of visual applications and provides an expression package generation method comprising the following steps: obtaining at least one portrait image from a character video to be processed, the portrait image comprising a face image; calculating the expression similarity between the face image in the portrait image and the expression package images in a preset expression package image library; determining a target expression package image and an expression package material based on the expression similarity, where the target expression package image belongs to the expression package image library and the expression package material belongs to the portrait image; extracting the text information of the target expression package image; and integrating the text information with the expression package material to generate a target expression package. The application also provides an expression package generation device and a terminal device. The method improves the efficiency of expression package generation and the accuracy of text information matching.

Description

Expression package generation method and device and terminal equipment
Technical Field
The application belongs to the technical field of video application, and particularly relates to an expression package generation method, an expression package generation device, a terminal device, and a computer-readable storage medium.
Background
With the popularization of networks and the development of mobile communication, more and more people use instant messaging tools. Expression packages, which convey emotion through pictures, are one way to liven up a chat. The earliest expression packages were mostly designed by professionals, such as emoji and QQ emoticons. As expression packages developed, those combining pictures with text became popular. However, because expression packages are made by manually extracting images or manually adding text information, production is time-consuming and labor-intensive, and the efficiency of generating expression packages from character videos is low.
Disclosure of Invention
In view of this, embodiments of the present application provide an expression package generation method, an expression package generation device, and a terminal device, so as to solve the prior-art problem that extracting a face image from a character video and generating a corresponding expression package is inefficient.
In a first aspect, an embodiment of the present application provides an expression package generation method, including:
acquiring at least one portrait image from a character video to be processed, wherein the portrait image comprises a face image;
calculating expression similarity between a face image in the portrait image and an expression package image in a preset expression package image library;
determining a target expression package image and an expression package material based on the expression similarity, wherein the target expression package image belongs to the expression package image library, and the expression package material belongs to the portrait image;
extracting text information of the target expression package image, and integrating the text information and the expression package materials to generate a target expression package.
Optionally, the determining a target expression package image and expression package materials based on the expression similarity includes:
when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, taking the first expression package image as the target expression package image, and taking the first face image as a static expression package material, wherein the first face image is any face image in the portrait image, and the first expression package image is any expression package image in the expression package image library.
Optionally, the determining a target expression package image and expression package materials based on the expression similarity includes:
when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, acquiring attribute information of the first face image in the character video;
acquiring multi-frame images corresponding to the first face image from the character video according to the attribute information;
calculating the expression similarity between each frame of image and the first expression package image;
and when the expression similarity between the face image in a first image and the first expression package image is greater than or equal to a second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image as a static expression package material, wherein the first image is any one of the multi-frame images.
Optionally, after calculating the expression similarity between each frame of image and the first expression package image, the method further includes:
when the expression similarity between a face image in a second image and the first expression package image is greater than or equal to the second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image and the face image in the second image as dynamic expression package materials, wherein the second image is at least one frame of image adjacent to the first image in the multi-frame images.
Optionally, after calculating the expression similarity between each frame of image and the first expression package image, the method further includes:
when the expression similarity between a face image in a third image and a second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image as a static expression package material, wherein the third image is at least one frame of image adjacent to the first image in the multi-frame images.
Optionally, after calculating the expression similarity between each frame of image and the first expression package image, the method further includes:
when the expression similarity between the face image in the third image and the second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image and a fourth image as dynamic expression package materials, wherein the fourth image is at least two continuous frames of images in the multi-frame images, and at least one frame of the at least two frames of images is adjacent to the first image.
Optionally, the calculating the expression similarity between the face image in the portrait image and an expression package image in the expression package image library includes:
extracting facial expression features of the face image through a preset deep neural network;
and comparing the facial expression features with the expression features of the expression package image to obtain the expression similarity.
In a second aspect, an embodiment of the present application further provides an expression package generating device, including:
an acquisition module, used for acquiring at least one portrait image from a character video to be processed, wherein the portrait image comprises a face image;
the computing module is used for computing the expression similarity between the face image in the portrait image and the expression package image in a preset expression package image library;
a determination module, used for determining a target expression package image and expression package materials based on the expression similarity, wherein the target expression package image belongs to the expression package image library, and the expression package materials belong to the portrait image;
and the generation module is used for extracting the text information of the target expression package image and integrating the text information and the expression package materials to generate the target expression package.
In a third aspect, an embodiment of the present application further provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the method in the first aspect.
The expression package generation method and device and the terminal device provided by the present application have the following beneficial effects:
in the embodiment of the application, a portrait image is extracted from a character video to be processed, the portrait image contains a face image, expression similarity between the face image in the portrait image and an expression package image in a preset expression package image library is calculated, a target expression package image and expression package materials are determined based on the expression similarity, the target expression package image belongs to the expression package image library, the expression package materials belong to the portrait image, text information of the target expression package image is extracted, and the text information and the expression package materials are integrated to generate a target expression package. According to the method and the device, the target expression package image and the expression package material can be determined according to the expression similarity, and the text information of the expression package image and the expression package material are automatically integrated, so that a user does not need to manually select the expression package image, the operation burden of the user is reduced, and the user can rapidly and easily make the expression package in the video. Facial images or video clips are extracted from the character video according to the expression similarity and serve as the materials of the expression packages, the expression packages are generated based on expression similarity calculation, and the generation efficiency of the expression packages is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of an expression package generation method provided in an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of an expression package generation method according to another embodiment of the present application;
fig. 3 is a schematic structural diagram of an expression package generation apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Description of the main element symbols:
300-expression package generating means; 310-an acquisition module; 320-a calculation module; 330-a determination module; 340-a generation module; 400-a terminal device; 410-a memory; 420-a processor; 430-computer program.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
In order to illustrate the technical means of the present application, the following specific embodiments are given.
As shown in fig. 1, fig. 1 illustrates an expression package generation method provided in an embodiment of the present application, where the expression package generation method may include the following steps S101 to S104.
S101: acquiring at least one portrait image from a character video to be processed, wherein the portrait image comprises a face image;
in this embodiment, the execution subject of the expression package generation method may combine each frame of image in the character video into a portrait image, or the execution subject may extract an image from the character video based on a preset step length (e.g., 1 or 2), and combine the extracted image into a portrait image, where the images in the portrait image are arranged according to a playing sequence in the character video, it should be understood that the preset step length may be set according to an actual need, and is not specifically limited herein.
The execution subject may receive, in real time, an expression package generation request sent by a user through an electronic device, and the character video may be a video carried in the expression package generation request received by the execution subject.
It can be understood that the character video may be pre-stored in the electronic device, may be a video stored on a website for playing, or may be recorded in real time through the electronic device. When the character video is played or recorded, the images in the character video are processed using face recognition technology, and at least one portrait image containing a face image is extracted; the portrait image includes background, text, and the like, and may contain one face or a plurality of faces. When the electronic device is a terminal device, the need of a user to extract a specific face image from a video can be met; when the electronic device is a server, the apparatus that extracts face images from the character video runs in the electronic device, which can meet the face-image extraction needs of platforms such as video websites.
Specifically, a face detection algorithm is adopted to detect the portrait images containing faces from the plurality of portrait images; the detected face images may be marked, which facilitates the subsequent extraction of facial expression features from the face images.
It should be noted that one or more faces may exist in a portrait image, or no face may exist in it. A face detection algorithm is adopted to perform face detection on at least one portrait image in the character video to obtain a corresponding face detection result, where the face detection result indicates whether a face is displayed in the portrait image. The face detection algorithm may be a Multi-task Cascaded Convolutional Neural Network (MTCNN), a multi-task neural network model for face detection tasks. The model mainly adopts the idea of three cascaded networks with candidate-box classifiers for fast and efficient face detection: a P-Net that quickly generates candidate windows, an R-Net that performs high-precision filtering and selection of the candidate windows, and an O-Net that generates the final bounding boxes and face key points. Faces in the character video are detected through the multi-task cascaded convolutional neural network to obtain the corresponding face images.
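One public implementation of the cascade described above is the open-source mtcnn package; the following sketch assumes it (the embodiment itself does not prescribe a library):

```python
# A hedged detection sketch using the open-source `mtcnn` package, which
# wraps the P-Net / R-Net / O-Net cascade named in the description.
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def detect_faces(frame_bgr):
    """Return a list of (box, keypoints): box is [x, y, w, h]."""
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MTCNN expects RGB
    results = detector.detect_faces(frame_rgb)
    return [(r["box"], r["keypoints"]) for r in results]
```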
S102: calculating expression similarity between a face image in the portrait image and an expression package image in a preset expression package image library;
specifically, when the expression similarity calculation is performed on the face images in the multiple face images and the expression package image, a deep neural network is adopted to extract the facial expression features in the face images, the deep neural network can be trained in advance on databases such as ImageNet, face recognition data or facial expression data, and the extracted facial expression features are compared with the pre-processed expression packages in the expression package image library, wherein the expression package features can be input new expression package features, the feature comparison is convolutional layer features or full-link layer features of the deep neural network, and classification probabilities of the neural network can also be adopted as features. And obtaining the characteristics of the expression packet and the characteristics of the face picture. The feature comparison can be calculated based on the feature cosine similarity, or can be calculated by Euclidean distance after feature normalization, and the calculated result is used as the expression similarity. The face detection algorithm is adopted to obtain at least one portrait image containing a face image, expression similarity between each portrait image and an expression package image in a preset expression package image library is calculated, and expression package images meeting requirements can be screened out according to the expression similarities, so that the matching accuracy of the face image of the portrait image and the expression package images is improved.
Before executing the above S102, the expression package image library may be built. Expression package images may be downloaded or collected from the Internet, or manually added or created by the user, and are classified according to their text information, format, and the like. The text in an expression package is obtained by text extraction, for example with OCR character recognition software; if an expression package contains no text, its name and format may be used, or the text around the expression package may be extracted, to form the expression package's text information. The expression package image library is then established from the expression packages and their corresponding text information.
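As a sketch of building such a library, the following assumes Tesseract (via pytesseract) as the OCR software and falls back to the file name when an image carries no embedded text; the function name and folder layout are illustrative:

```python
# A library-building sketch; pytesseract (which needs a local Tesseract
# install with chi_sim data) is an assumed stand-in for "OCR software".
import os
from PIL import Image
import pytesseract

def build_emoticon_library(folder: str):
    """Return a list of {path, text} records for every template image."""
    library = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        img = Image.open(path)
        text = pytesseract.image_to_string(img, lang="chi_sim").strip()
        if not text:                          # no embedded caption:
            text = os.path.splitext(name)[0]  # use the template's name instead
        library.append({"path": path, "text": text})
    return library
```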
S103: determining a target expression package image and an expression package material based on the expression similarity, wherein the target expression package image belongs to the expression package image library, and the expression package material belongs to the portrait image;
specifically, after the expression similarity between each face image and the expression package image in the expression package image library is calculated, the calculated expression similarities may be marked or sorted, or the expression similarities may be compared one by one to obtain one or more expression similarity values with the largest corresponding value. One or more expression similarity degrees obtained through calculation meet the conditions, one or more expression package images in the corresponding expression package image library are also one or more, so that a target expression package image in the expression package image library and a face image in the portrait image are determined, and the face image is used as an expression package material for generating a new expression package. In addition, the determined target expression package image, the determined expression package material and the corresponding expression similarity can be correlated, so that the target expression package image and the expression package material can be quickly found out.
S104: extracting text information of the target expression package image, and integrating the text information and the expression package materials to generate a target expression package.
Specifically, the video is opened and an automatic expression package making command is started; a face detection algorithm runs in the background and defines the detected faces as face image 1, face image 2, and so on. Expression similarity calculation with the expression package images in the expression package image library may be performed as soon as face image 1 is detected, or after a set detection time or a set number of detected face images. Taking the detected face image 1 as an example: expression similarity calculation is performed between the face in face image 1 and the expression package images in the expression package image library to obtain the corresponding expression similarities; the expression package image with the largest expression similarity is selected as the target expression package image; the text information of the target expression package image is extracted; the face image in the portrait image is extracted as the expression package material; and the text information is integrated with the expression package material, by adding subtitles or by naming, to generate the target expression package. In addition, by performing expression similarity calculation between the face images in the plurality of portrait images and the expression packages in the expression package image library, and selecting the expression package images and face images whose expression similarities exceed a preset threshold, the target expression package image can be matched with the expression package material accurately, and the efficiency of expression package generation is also improved.
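The integration step of S104, in its subtitle-adding form, can be sketched with Pillow; the font file, strip height, and layout are illustrative assumptions:

```python
# An integration sketch: append the template's caption below the face crop.
# The font path is an assumption; any CJK-capable TrueType font works.
from PIL import Image, ImageDraw, ImageFont

def integrate(face_img: Image.Image, text: str, font_path: str = "simhei.ttf"):
    """Return the face crop with a subtitle strip appended below it."""
    strip = 40                                   # height of the caption strip
    canvas = Image.new("RGB", (face_img.width, face_img.height + strip), "white")
    canvas.paste(face_img, (0, 0))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, 28)
    w = draw.textlength(text, font=font)         # centre the caption
    draw.text(((face_img.width - w) / 2, face_img.height + 4), text,
              fill="black", font=font)
    return canvas
```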
Further, the determining the target expression package image and the expression package material based on the expression similarity includes:
when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, taking the first expression package image as the target expression package image, and taking the first face image as a static expression package material, wherein the first face image is any face image in the portrait image, and the first expression package image is any expression package image in the expression package image library.
Specifically, taking face image A as an example, when the expression similarity between face image A and an expression package image in the expression package image library is greater than or equal to the first preset similarity threshold, that expression package image is taken as the target expression package image. The target expression package image can be understood as an expression package template, which is used for extracting text information. Because the qualifying expression package images and the corresponding face images are screened out accurately, a new expression package can be generated quickly; the generated expression package is stored in the expression package image library, where it can be downloaded or forwarded by users, improving user activity.
As shown in fig. 2, fig. 2 is a schematic diagram illustrating an implementation flow of an expression package generation method according to another embodiment of the present application, where the above S103 includes the following S201 to S204.
S201: when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, acquiring attribute information of the first face image in the character video;
specifically, the expression similarity can be set to be a numerical value in [0,100], a first preset similarity threshold can be set to be 50, the expression similarity between the first face image and the first expression package image is 55, the above conditions are met, then the attribute information of the first face image in the character video is obtained, the attribute information includes the position, time and the like of the first face image in the whole video, the relevant video segment can be conveniently intercepted from the character video subsequently according to the attribute information, and the situation that the background information interferes with the normal extraction of the face image is effectively avoided.
It should be noted that each face image in the portrait images has a one-to-one expression similarity with each expression package image in the expression package image library, and face images whose expression similarity is greater than or equal to the first preset similarity threshold are selected. Since a portrait image includes the face image, background information, subtitles, and the like, this improves the accuracy of extracting the expression package material.
S202: acquiring multi-frame images corresponding to the first face image from the character video according to the attribute information;
in this embodiment, the first face image may appear in the person video one or more times, and a plurality of corresponding frames of images, that is, video films, are captured from the person video according to the attribute information of the first face image.
S203: calculating the expression similarity between each frame of image and the first expression package image;
in this embodiment, after acquiring a plurality of corresponding frames of images from a character video, a face region may be detected from the acquired video frames, a face frame set may be obtained after the face region is detected, then a face image is extracted from the acquired video segment based on the face frame set, that is, an image including a region in the face frame is extracted, and the facial expression of each frame of image and the first expression package image are subjected to expression similarity calculation to obtain corresponding expression similarity, so as to select an expression package material meeting the requirement.
S204: and when the expression similarity between the face image in the first image and the first expression package image is greater than or equal to a second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image as a static expression package material, wherein the first image is any one of the multi-frame images.
In this embodiment, the execution subject operates on the expression similarities corresponding to the multi-frame images. The expression similarity may be represented by a value in [0, 100], where in practice a smaller value indicates a lower expression similarity and a larger value indicates a higher one. When the expression similarity between the first image among the portrait images and the first expression package image previously stored in the expression package image library is greater than or equal to the first preset similarity threshold (for example, the first preset similarity threshold may be set to 50), the attribute information of the first face image in the character video is acquired, the attribute information including the time and position of the first face image. According to the attribute information, the multi-frame images corresponding to the first face image can be located: a video segment of the character video within a certain time (e.g., 3 seconds) before and after that time is acquired, and the images around the face position are taken to form a simplified video segment, from which irrelevant background information can be removed. The expression similarity between each frame of image and the first expression package image is calculated; when there is a frame whose expression similarity is greater than or equal to the second preset similarity threshold, the face image in that frame is taken as a static expression package material, that is, an expression package image. The size of the image around the face position may be set to four times that of the original face image, which makes it easier to distinguish the face image from the background information. It should be noted that the first preset similarity threshold and the second preset similarity threshold may be the same or different; they are set according to the actual situation and are not specifically limited here.
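Locating the multi-frame images from the attribute information can be sketched as follows; OpenCV seeking is an assumed implementation, and the 3-second window follows the example above:

```python
# A segment-extraction sketch: seek to the recorded timestamp and collect the
# frames within a window around it (function name and defaults assumed).
import cv2

def segment_frames(video_path: str, time_ms: float, window_ms: float = 3000.0):
    """Return the frames within [time_ms - window_ms, time_ms + window_ms]."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, max(0.0, time_ms - window_ms))
    frames = []
    while cap.get(cv2.CAP_PROP_POS_MSEC) <= time_ms + window_ms:
        ok, frame = cap.read()
        if not ok:
            break                     # segment runs past the end of the video
        frames.append(frame)
    cap.release()
    return frames
```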
It can be understood that one frame of image may include one or more face regions, or may include none. When one or more face regions exist, a corresponding number of face frames can be obtained, and then a corresponding number of face images, where the face frame set is a set of at least one face frame, and a face frame is a rectangular box surrounding a face region. In order to extract the complete expression for the expression package material, the face frame may be expanded outward to obtain a new rectangle, and the image block surrounded by the new rectangle is taken out to obtain the face image. For example, a pre-trained face detection model may be adopted to detect face regions from the acquired multi-frame images, so that the expression package material can be extracted quickly and accurately.
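The outward expansion of the face frame can be sketched as follows; the scale factor of 2 per side (about four times the original area, matching the example above) is an assumption:

```python
# A box-expansion sketch: grow the detected face frame around its centre and
# clamp the new rectangle to the frame boundaries.
def expand_box(box, frame_w, frame_h, scale: float = 2.0):
    """Grow [x, y, w, h] by `scale` per side around its centre, clamped."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * scale, h * scale              # 2x per side ~= 4x the area
    x0 = int(max(0, cx - nw / 2)); y0 = int(max(0, cy - nh / 2))
    x1 = int(min(frame_w, cx + nw / 2)); y1 = int(min(frame_h, cy + nh / 2))
    return x0, y0, x1, y1
```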
Further, after S203, the method further includes:
and when the expression similarity between the face image in the second image and the first expression package image is greater than or equal to the second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image and the face image in the second image as dynamic expression package materials, wherein the second image is at least one frame of image adjacent to the first image in the multi-frame images.
In this embodiment, taking the selection of three frames of images as an example, when the expression similarities between the first image and the first expression package image and between the second image and the first expression package image both meet the condition, that is, both are greater than or equal to the second preset similarity threshold, the face image in the first image and the face image in the second image may be used as dynamic expression package materials. If the expression similarities between the first image and the first expression package image and between the third image and the first expression package image meet the condition, but the expression similarity between the second image and the first expression package image does not, the face images in the three frames of images may still be used as dynamic expression package materials. A dynamic expression package can thus be generated, which expands the types of expression packages generated and adds to the fun of using them.
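Collecting dynamic expression package material from consecutive qualifying frames, one of the variants described above, can be sketched as follows; the names and the threshold default are illustrative:

```python
# A dynamic-material sketch: keep the longest run of adjacent frames whose
# similarity to the target template stays above the second threshold.
# `similarity_fn` is any pairwise scorer on the [0, 100] scale.
def dynamic_material(frames, template, similarity_fn, threshold: float = 50.0):
    """Return the longest run of consecutive frames meeting the threshold."""
    best, run = [], []
    for frame in frames:
        if similarity_fn(frame, template) >= threshold:
            run.append(frame)
            if len(run) > len(best):
                best = run[:]
        else:
            run = []                  # a gap ends the current run
    return best
```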
Further, after S203, the method further includes:
and when the expression similarity between the face image in the third image and the second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image as a static expression package material, wherein the third image is at least one frame of image adjacent to the first image in the multi-frame images.
In this embodiment, taking the selection of three frames of images as an example, when the expression similarity between the face image in the third image of the multi-frame images and the second expression package image is the largest, the second expression package image is taken as the target expression package image, and the face image in the third image is taken as a static expression package material, that is, an expression package image. By selecting the expression package image with the largest expression similarity and the face image in the corresponding frame, the expression package image to be generated can be extracted accurately, so that the generated expression package expresses the emotion accurately.
Further, after S203, the method further includes:
and when the expression similarity between the face image in the third image and the second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image and a fourth image as dynamic expression package materials, wherein the fourth image is at least two continuous frames of images in the multi-frame images, and at least one frame of the at least two frames of images is adjacent to the first image.
In this embodiment, taking two expression package images in the expression package image library as an example: the expression similarity between a face image in the portrait image and the first expression package image is recorded as the first expression similarity, and the expression similarity between the face image and the second expression package image is recorded as the second expression similarity; when the first expression similarity is smaller than the second expression similarity, the second expression package image corresponding to the second expression similarity is selected as the target expression package image. Among the selected multi-frame images, the frame whose face image has the largest similarity with the expression package image is found, and the continuous face images in the frames before and after it can be used as dynamic expression package materials, which improves the display effect of the expression package.
In another embodiment, in order to improve the display effect of the target expression package, the user may edit the target expression package or add prop effects, such as a hat or a mushroom head, and/or add artistic text, watermarks, and the like, according to his or her own intention, and save the target expression package in a preset format to an expression package image library, a user-operable interface, a chat tool, or the like. The preset format is set as needed; for example, it may be the GIF format, which can store a plurality of images in one file and read and display them on screen in sequence, forming a simple animation, thereby improving the user's operability and experience.
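The GIF export mentioned above can be sketched with Pillow's multi-frame save; the frame duration and loop values are illustrative assumptions, and the frames are assumed to be Pillow images:

```python
# A GIF-export sketch: store the dynamic material frames as a looping GIF.
from PIL import Image

def save_gif(frames, out_path: str = "emoticon.gif", duration_ms: int = 120):
    """Store the material frames as a looping GIF in the preset format."""
    first, *rest = [f.convert("P", palette=Image.ADAPTIVE) for f in frames]
    first.save(out_path, save_all=True, append_images=rest,
               duration=duration_ms, loop=0)   # loop=0 means loop forever
```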
In the embodiments of the present application, a portrait image containing a face image is extracted from a character video to be processed; the expression similarity between the face image in the portrait image and the expression package images in a preset expression package image library is calculated; a target expression package image and an expression package material are determined based on the expression similarity, where the target expression package image belongs to the expression package image library and the expression package material belongs to the portrait image; and the text information of the target expression package image is extracted and integrated with the expression package material to generate a target expression package. Because the target expression package image and the expression package material can be determined from the expression similarity, and the text information of the expression package image is automatically integrated with the expression package material, the user does not need to select an expression package image manually. This reduces the user's operation burden and lets the user quickly and easily make expression packages from a video. Face images or video clips are extracted from the character video according to the expression similarity to serve as expression package materials, and expression packages are generated based on the expression similarity calculation, which improves the efficiency of expression package generation.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The following describes an apparatus corresponding to the expression package generation method of the above embodiments.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an expression package generating device 300 provided by the present application, as shown in fig. 3, including:
an acquisition module 310, configured to acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image;
a calculating module 320, configured to calculate expression similarity between a face image in the portrait image and an expression package image in a preset expression package image library;
a determination module 330, configured to determine a target expression package image and an expression package material based on the expression similarity, where the target expression package image belongs to the expression package image library, and the expression package material belongs to the portrait image;
and a generation module 340, configured to extract the text information of the target expression package image, and integrate the text information with the expression package material to generate the target expression package.
Optionally, the determination module 330 is specifically configured to: when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, take the first expression package image as the target expression package image, and take the first face image as a static expression package material, where the first face image is any face image in the portrait image, and the first expression package image is any expression package image in the expression package image library.
Optionally, the determining module 330 specifically includes:
a first acquiring unit, used for acquiring attribute information of a first face image in the character video when the expression similarity between the first face image and a first expression package image is greater than or equal to a first preset similarity threshold;
a second acquiring unit, configured to acquire, from the character video, the multi-frame images corresponding to the first face image according to the attribute information;
the calculating unit is used for calculating the expression similarity between each frame of image and the first expression package image;
and the first material determining unit is used for taking the first expression package image as the target expression package image and taking the face image in the first image as a static expression package material when the expression similarity between the face image in the first image and the first expression package image is greater than or equal to a second preset similarity threshold, wherein the first image is any one of the multi-frame images.
Optionally, the determining module 330 may further include:
and a second material determining unit, configured to, after the expression similarity between each frame of image and the first expression package image is calculated by the calculating unit, when the expression similarity between a face image in a second image and the first expression package image is greater than or equal to a second preset similarity threshold, take the first expression package image as the target expression package image, and take the face image in the first image and the face image in the second image as dynamic expression package materials, where the second image is at least one frame of image adjacent to the first image in the multi-frame images.
Optionally, the determining module 330 may further include:
and the third material determining unit is used for taking the second expression package image as the target expression package image and taking the face image in the third image as a static expression package material when the expression similarity between the face image in the third image and the second expression package image is maximum after the expression similarity between each frame of image and the first expression package image is calculated by the calculating unit, wherein the third image is at least one frame image adjacent to the first image in the multi-frame images.
Optionally, the determining module 330 may further include:
and a fourth material determining unit, configured to, after the expression similarity between each frame of image and the first expression package image is obtained through calculation by the calculating unit, when the expression similarity between a face image in a third image and a second expression package image is the maximum, take the second expression package image as the target expression package image, and take a face image in the third image and a fourth image as dynamic expression package materials, where the fourth image is at least two consecutive frames of images in the multiple frames of images, and at least one of the at least two frames of images is adjacent to the first image.
Optionally, the obtaining module 310 is further configured to extract facial expression features of the facial image through a preset deep neural network, and the calculating module 320 performs feature comparison between the facial expression features and the expression features of the expression package image to obtain the expression similarity.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application, as shown in fig. 4, a terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and executable on the processor 420, and when the processor 420 executes the computer program 430, the method for generating an expression package is implemented.
The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other terminal devices, and the specific type of the terminal device is not limited in this embodiment of the present application.
The terminal device 400 may include, but is not limited to, a processor 420, a memory 410. Those skilled in the art will appreciate that fig. 4 is merely an example of the terminal device 400, and does not constitute a limitation of the terminal device 400, and may include more or less components than those shown, or combine some components, or different components, such as may also include input/output devices, etc.
The processor 420 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 410 may in some embodiments be an internal storage unit of the terminal device 400, such as a hard disk or a memory of the terminal device 400. The memory 410 may also, in other embodiments, be an external storage device of the terminal device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 410 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction, execution process, and the like between the above expression package generation devices/units is based on the same concept as that of the method embodiment of the present application, specific functions and technical effects thereof may be specifically referred to a part of the method embodiment, and details are not described here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above apparatus, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product, which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not be electrical carrier signals or telecommunication signals in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An expression package generation method, comprising:
acquiring at least one portrait image from a person video to be processed, wherein the portrait image comprises a face image;
calculating expression similarity between a face image in the portrait image and an expression package image in a preset expression package image library;
determining a target expression package image and an expression package material based on the expression similarity, wherein the target expression package image belongs to the expression package image library, and the expression package material belongs to the portrait image;
extracting text information of the target expression package image, and integrating the text information with the expression package material to generate a target expression package.
2. The expression package generation method according to claim 1, wherein the determining a target expression package image and an expression package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, taking the first expression package image as the target expression package image, and taking the first face image as a static expression package material, wherein the first face image is any one of the face images in the portrait image, and the first expression package image is any one of the expression package images in the expression package image library.
3. The expression package generation method according to claim 1, wherein the determining a target expression package image and an expression package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first expression package image is greater than or equal to a first preset similarity threshold, acquiring attribute information of the first face image in the person video;
acquiring, from the person video according to the attribute information, a plurality of frames of images corresponding to the first face image;
calculating the expression similarity between each frame of image and the first expression package image; and
when the expression similarity between the face image in a first image and the first expression package image is greater than or equal to a second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image as a static expression package material, wherein the first image is any one of the plurality of frames of images.
4. The expression package generation method according to claim 3, wherein, after the calculating the expression similarity between each frame of image and the first expression package image, the method further comprises:
when the expression similarity between the face image in a second image and the first expression package image is greater than or equal to the second preset similarity threshold, taking the first expression package image as the target expression package image, and taking the face image in the first image and the face image in the second image as a dynamic expression package material, wherein the second image is at least one frame adjacent to the first image among the plurality of frames of images.
5. The expression package generation method according to claim 3, wherein, after the calculating the expression similarity between each frame of image and the first expression package image, the method further comprises:
when the expression similarity between the face image in a third image and a second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image as a static expression package material, wherein the third image is at least one frame adjacent to the first image among the plurality of frames of images.
6. The expression package generation method according to claim 3, wherein, after the calculating the expression similarity between each frame of image and the first expression package image, the method further comprises:
when the expression similarity between the face image in a third image and a second expression package image is the largest, taking the second expression package image as the target expression package image, and taking the face image in the third image and a fourth image as a dynamic expression package material, wherein the fourth image is at least two consecutive frames among the plurality of frames of images, at least one of which is adjacent to the first image.
7. The expression package generation method according to any one of claims 1 to 6, wherein the calculating the expression similarity between the face image in the portrait image and an expression package image in the preset expression package image library comprises:
extracting facial expression features of the face image through a preset deep neural network; and
comparing the facial expression features with the expression features of the expression package image to obtain the expression similarity.
8. An expression package generation apparatus, comprising:
an acquisition module, configured to acquire at least one portrait image from a person video to be processed, wherein the portrait image comprises a face image;
a computing module, configured to compute the expression similarity between the face image in the portrait image and an expression package image in a preset expression package image library;
a determining module, configured to determine a target expression package image and an expression package material based on the expression similarity, wherein the target expression package image belongs to the expression package image library, and the expression package material belongs to the portrait image; and
a generation module, configured to extract text information of the target expression package image and integrate the text information with the expression package material to generate a target expression package.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the expression package generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the expression package generation method according to any one of claims 1 to 7.
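
For illustration only, the following is a minimal Python sketch of the pipeline recited in claims 1 and 7. The claims specify only that facial expression features are extracted by a preset deep neural network and compared against the features of the expression package images, so everything concrete below is an assumption: the Haar-cascade face detector, the `model` callable, cosine similarity as the comparison, the threshold value, and all function names are hypothetical stand-ins, not the patented implementation.

# Hypothetical sketch of claims 1 and 7; the detector, model, threshold,
# and all names below are assumed stand-ins, not the patented method.
import cv2
import numpy as np

FIRST_SIMILARITY_THRESHOLD = 0.80  # assumed value of the "first preset similarity threshold" (claim 2)

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def acquire_portrait_images(video_path, frame_step=10):
    """Acquire portrait images (frames containing a face) from the person video."""
    portraits = []
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % frame_step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
                portraits.append((frame_idx, frame[y:y + h, x:x + w]))
        frame_idx += 1
    cap.release()
    return portraits

def expression_features(face_image, model):
    """Extract facial expression features with a preset deep neural network.
    The network is unspecified in the claims; `model` is assumed to map a
    normalized 224x224 face crop to a 1-D feature vector."""
    face = cv2.resize(face_image, (224, 224)).astype(np.float32) / 255.0
    return model(face)

def cosine_similarity(a, b):
    """One plausible reading of 'comparing the features' in claim 7."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def generate_expression_packages(video_path, package_library, model):
    """package_library: iterable of (package_features, text_info) pairs."""
    results = []
    for frame_idx, face in acquire_portrait_images(video_path):
        features = expression_features(face, model)
        for package_features, text_info in package_library:
            if cosine_similarity(features, package_features) >= FIRST_SIMILARITY_THRESHOLD:
                # Integrate the target package's text with the portrait material (claim 1).
                package = face.copy()
                cv2.putText(package, text_info, (10, package.shape[0] - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
                results.append((frame_idx, package))
    return results

Claims 3 to 6 extend this static case: once a first image passes the second threshold, frames adjacent to it in the person video could be collected in the same way and emitted together as dynamic expression package material (e.g. a GIF) rather than a single still.
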
CN201911197094.5A 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment Active CN110889379B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911197094.5A CN110889379B (en) 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment
PCT/CN2020/129209 WO2021104097A1 (en) 2019-11-29 2020-11-17 Meme generation method and apparatus, and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911197094.5A CN110889379B (en) 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN110889379A true CN110889379A (en) 2020-03-17
CN110889379B CN110889379B (en) 2024-02-20

Family

ID=69749402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197094.5A Active CN110889379B (en) 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment

Country Status (2)

Country Link
CN (1) CN110889379B (en)
WO (1) WO2021104097A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372141A (en) * 2020-03-18 2020-07-03 腾讯科技(深圳)有限公司 Expression image generation method and device and electronic equipment
CN111586466A (en) * 2020-05-08 2020-08-25 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
CN111768481A (en) * 2020-05-19 2020-10-13 北京奇艺世纪科技有限公司 Expression package generation method and device
CN111881776A (en) * 2020-07-07 2020-11-03 腾讯科技(深圳)有限公司 Dynamic expression obtaining method and device, storage medium and electronic equipment
WO2021104097A1 (en) * 2019-11-29 2021-06-03 深圳先进技术研究院 Meme generation method and apparatus, and terminal device
WO2023284640A1 (en) * 2021-07-15 2023-01-19 维沃移动通信有限公司 Picture processing method and electronic device
CN117150063A (en) * 2023-10-26 2023-12-01 深圳慢云智能科技有限公司 Image generation method and system based on scene recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369196A (en) * 2017-06-30 2017-11-21 广东欧珀移动通信有限公司 Expression, which packs, makees method, apparatus, storage medium and electronic equipment
CN109508399A (en) * 2018-11-20 2019-03-22 维沃移动通信有限公司 A kind of facial expression image processing method, mobile terminal
US20190122403A1 (en) * 2017-10-23 2019-04-25 Paypal, Inc. System and method for generating emoji mashups with machine learning
CN110321845A (en) * 2019-07-04 2019-10-11 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment for extracting expression packet from video
CN110458916A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Expression packet automatic generation method, device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803909A (en) * 2017-02-21 2017-06-06 腾讯科技(深圳)有限公司 The generation method and terminal of a kind of video file
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
FR3081244B1 (en) * 2018-05-17 2020-05-29 Idemia Identity & Security France CHARACTER RECOGNITION PROCESS
CN110162670B (en) * 2019-05-27 2020-05-08 北京字节跳动网络技术有限公司 Method and device for generating expression package
CN110889379B (en) * 2019-11-29 2024-02-20 深圳先进技术研究院 Expression package generation method and device and terminal equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369196A (en) * 2017-06-30 2017-11-21 广东欧珀移动通信有限公司 Expression, which packs, makees method, apparatus, storage medium and electronic equipment
US20190122403A1 (en) * 2017-10-23 2019-04-25 Paypal, Inc. System and method for generating emoji mashups with machine learning
CN109508399A (en) * 2018-11-20 2019-03-22 维沃移动通信有限公司 A kind of facial expression image processing method, mobile terminal
CN110321845A (en) * 2019-07-04 2019-10-11 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment for extracting expression packet from video
CN110458916A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Expression packet automatic generation method, device, computer equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021104097A1 (en) * 2019-11-29 2021-06-03 深圳先进技术研究院 Meme generation method and apparatus, and terminal device
CN111372141A (en) * 2020-03-18 2020-07-03 腾讯科技(深圳)有限公司 Expression image generation method and device and electronic equipment
CN111372141B (en) * 2020-03-18 2024-01-05 腾讯科技(深圳)有限公司 Expression image generation method and device and electronic equipment
CN111586466A (en) * 2020-05-08 2020-08-25 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
CN111768481A (en) * 2020-05-19 2020-10-13 北京奇艺世纪科技有限公司 Expression package generation method and device
CN111881776A (en) * 2020-07-07 2020-11-03 腾讯科技(深圳)有限公司 Dynamic expression obtaining method and device, storage medium and electronic equipment
CN111881776B (en) * 2020-07-07 2023-07-07 腾讯科技(深圳)有限公司 Dynamic expression acquisition method and device, storage medium and electronic equipment
WO2023284640A1 (en) * 2021-07-15 2023-01-19 维沃移动通信有限公司 Picture processing method and electronic device
CN117150063A (en) * 2023-10-26 2023-12-01 深圳慢云智能科技有限公司 Image generation method and system based on scene recognition
CN117150063B (en) * 2023-10-26 2024-02-06 深圳慢云智能科技有限公司 Image generation method and system based on scene recognition

Also Published As

Publication number Publication date
CN110889379B (en) 2024-02-20
WO2021104097A1 (en) 2021-06-03

Similar Documents

Publication Publication Date Title
CN110889379B (en) Expression package generation method and device and terminal equipment
WO2020063319A1 (en) Dynamic emoticon-generating method, computer-readable storage medium and computer device
CN111612873B (en) GIF picture generation method and device and electronic equipment
CN111241340B (en) Video tag determining method, device, terminal and storage medium
CN111489290B (en) Face image super-resolution reconstruction method and device and terminal equipment
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN112532882B (en) Image display method and device
CN113434716B (en) Cross-modal information retrieval method and device
JP2020127194A (en) Computer system and program
CN109961403B (en) Photo adjusting method and device, storage medium and electronic equipment
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
CN113722541A (en) Video fingerprint generation method and device, electronic equipment and storage medium
CN110717452B (en) Image recognition method, device, terminal and computer readable storage medium
CN108776959B (en) Image processing method and device and terminal equipment
CN111063006A (en) Image-based literary work generation method, device, equipment and storage medium
CN113225451B (en) Image processing method and device and electronic equipment
CN113676734A (en) Image compression method and image compression device
CN111325656B (en) Image processing method, image processing device and terminal equipment
CN111984173B (en) Expression package generation method and device
CN110942065B (en) Text box selection method, text box selection device, terminal equipment and computer readable storage medium
CN112764601B (en) Information display method and device and electronic equipment
CN111107259B (en) Image acquisition method and device and electronic equipment
CN115359524A (en) Construction method of face feature library, and person style image identification method and device
CN117453635A (en) Image deletion method, device, electronic equipment and readable storage medium
CN113703901A (en) Graphic code display method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant