WO2021104097A1 - Meme generation method and apparatus, and terminal device - Google Patents

Meme generation method and apparatus, and terminal device

Info

Publication number
WO2021104097A1
WO2021104097A1 (PCT/CN2020/129209)
Authority
WO
WIPO (PCT)
Prior art keywords
image
emoticon
package
emoticon package
face
Prior art date
Application number
PCT/CN2020/129209
Other languages
French (fr)
Chinese (zh)
Inventor
乔宇
李英
孟锝斌
彭小江
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Publication of WO2021104097A1 publication Critical patent/WO2021104097A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application belongs to the technical field of video applications, and in particular relates to an emoticon package generation method, device, terminal device, and computer-readable storage medium.
  • Emoticons are a way of using pictures to express feelings.
  • The earliest emoticons were mostly designed by professionals, such as emoji and QQ emoticons.
  • As emoticons developed, picture-plus-text emoticon packages became popular; however, because producing an emoticon package requires manually extracting images or adding text information, the process is time-consuming and labor-intensive, and generating emoticon packages from character videos is inefficient.
  • In view of this, the embodiments of the present application provide an emoticon package generation method, device, and terminal device to solve the prior-art problem that extracting face images from a character video and generating corresponding emoticon packages is inefficient.
  • In a first aspect, an embodiment of the present application provides an emoticon package generation method, including: acquiring at least one portrait image from a character video to be processed, where the portrait image includes a face image; calculating the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library; determining a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and extracting text information of the target emoticon package image and integrating the text information with the emoticon package material to generate a target emoticon package.
  • Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:
  • when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:
  • when the expression similarity between the first face image and the first emoticon package image is greater than or equal to the first preset similarity threshold, acquiring attribute information of the first face image in the character video; acquiring multiple frames of images corresponding to the first face image from the character video according to the attribute information; and calculating the expression similarity between each frame of image and the first emoticon package image;
  • when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in a third image and a second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:
  • when the expression similarity between the face image in the third image and the second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • Optionally, calculating the expression similarity between the face image and the emoticon package images in the emoticon package image library includes:
  • extracting facial expression features from the face image through a preset deep neural network, and comparing the facial expression features with the expression features of the emoticon package image to obtain the expression similarity.
  • In a second aspect, an embodiment of the present application provides an emoticon package generation device, including:
  • an acquisition module, configured to acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image;
  • a calculation module, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;
  • a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and
  • a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
  • In a third aspect, an embodiment of the present application further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of the method in the first aspect when executed by a processor.
  • The emoticon package generation method, device, and terminal device provided by the embodiments of the present application have the following beneficial effects:
  • A portrait image containing a face image is extracted from the character video to be processed, and the expression similarity between the face image and the emoticon package images in a preset emoticon package image library is calculated.
  • A target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate a target emoticon package.
  • Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos.
  • Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
  • FIG. 1 is a schematic diagram of the implementation process of an emoticon package generation method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the implementation process of an emoticon package generation method provided by another embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an emoticon package generation device provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • 300 - emoticon package generation device; 310 - acquisition module; 320 - calculation module; 330 - determining module; 340 - generating module; 400 - terminal device; 410 - memory; 420 - processor; 430 - computer program.
  • As used herein, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • As shown in FIG. 1, an embodiment of the present application provides an emoticon package generation method, which may include the following S101 to S104.
  • S101: Acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image.
  • In this embodiment, the execution subject of the emoticon package generation method may take each frame of the character video as a portrait image, or may extract frames from the character video at a preset step size (for example, 1 or 2) and use the extracted frames as portrait images.
  • The portrait images are arranged in their playback order in the character video.
  • It should be understood that the preset step size can be set according to actual needs and is not specifically limited here.
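  • For illustration, a minimal sketch of this sampling step is given below, assuming the OpenCV library; the function name and the default step are illustrative choices, not the patent's.

```python
# Hedged sketch: sample every `step`-th frame of a character video with OpenCV.
import cv2

def sample_portrait_frames(video_path: str, step: int = 2):
    """Return every `step`-th frame of the video, in playback order."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```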
  • It should be noted that the execution subject may receive, in real time, an emoticon package generation request sent by a user through an electronic device, and the character video may be a video included in the received request.
  • It is understandable that the character video may be pre-stored in the electronic device, may be a video saved on a website for playback, or may be recorded in real time through the electronic device.
  • When the character video is played or recorded, face recognition technology is used to process the video and extract at least one portrait image containing a face image.
  • A portrait image may also include background, text, and so on.
  • The face image may contain one person's face or multiple people's faces.
  • When the electronic device is a terminal device, this meets a user's need to extract a specific face image from a video; when the electronic device is a server, running the face-image extraction apparatus on the electronic device meets the face-image extraction needs of platforms such as video websites.
  • Specifically, a face detection algorithm is used to detect face images among the multiple portrait images, and the face images can be marked to facilitate extraction of facial expression features.
  • It should be noted that a portrait image may contain one or more faces, or no face; the face detection algorithm performs face detection on at least one portrait image in the character video to obtain a corresponding face detection result.
  • The face detection result indicates whether a face is displayed in the portrait image.
  • The face detection algorithm may be a multi-task convolutional neural network (MTCNN), a multi-task neural network model for face detection tasks.
  • The model mainly uses three cascaded networks and the idea of candidate boxes plus classifiers to perform fast and efficient face detection.
  • The cascaded networks are P-Net, which quickly generates candidate windows; R-Net, which performs high-precision candidate window filtering and selection; and O-Net, which generates the final bounding boxes and face key points.
  • The faces in the character video are detected by the multi-task convolutional neural network to obtain the corresponding face images.
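  • A minimal sketch of this detection step is shown below, assuming the third-party facenet-pytorch package as one available MTCNN implementation (the P-Net/R-Net/O-Net cascade runs inside it); it is an illustration, not the patent's code.

```python
# Hedged sketch: detect all faces in an image with an off-the-shelf MTCNN.
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=True)  # keep every detected face, not just the best one

def detect_faces(image_path: str):
    """Return face bounding boxes (Nx4, [x1, y1, x2, y2]) and confidences."""
    img = Image.open(image_path).convert("RGB")
    boxes, probs = mtcnn.detect(img)  # both are None if no face is found
    return boxes, probs
```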
  • S102: Calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library.
  • Specifically, when computing the expression similarity between the face images in the portrait images and the emoticon package images, a deep neural network is used to extract the facial expression features from the face image.
  • The deep neural network may be pre-trained on databases such as ImageNet, face recognition data, or facial expression data, and the extracted facial expression features are compared with the pre-processed emoticon package features in the emoticon package image library; the emoticon package features may likewise be computed for newly input emoticon packages.
  • The features compared may be convolutional-layer features or fully connected-layer features of the deep neural network, and the network's classification probabilities may also be used as features.
  • The feature comparison may be based on cosine similarity, or the Euclidean distance may be computed after feature normalization; the computed result is used as the expression similarity.
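  • The two comparison options named above can be sketched as follows, assuming the features are 1-D NumPy vectors; the mapping of normalized Euclidean distance to a similarity score is one reasonable convention, not mandated by the text.

```python
# Hedged sketch of the two similarity options: cosine similarity, or
# Euclidean distance computed after L2 normalization.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_euclidean_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # For unit vectors the distance lies in [0, 2]; map it into [0, 1].
    return 1.0 - float(np.linalg.norm(a - b)) / 2.0
```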
  • The emoticon package image library may be downloaded or collected from the Internet, or may consist of emoticon package images manually added or created by the user, classified according to the text information, format, and other attributes of the images. A text extraction method, such as OCR text recognition software, is used to obtain the text in an emoticon package; if an emoticon package contains no text, its name and format can be obtained, or the text surrounding it can be extracted, to form its text information. The emoticon package image library is then built from the emoticon packages and their corresponding text information.
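  • As a hedged illustration of this text-extraction step, the sketch below uses the pytesseract wrapper around the Tesseract OCR engine (any OCR tool could stand in); the language packs and the fallback-to-name behavior are assumptions, not the patent's.

```python
# Hedged sketch: OCR the caption burned into an emoticon image, falling back
# to the image's name when no text is recognized.
import pytesseract
from PIL import Image

def extract_emoticon_text(image_path: str, fallback_name: str = "") -> str:
    text = pytesseract.image_to_string(Image.open(image_path),
                                       lang="chi_sim+eng")  # assumes installed packs
    text = text.strip()
    return text if text else fallback_name
```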
  • S103: Determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image.
  • In this embodiment, the calculated expression similarities can be marked or sorted, or compared one by one, to obtain the one or more with the largest values.
  • The determined target emoticon package image, emoticon package material, and corresponding expression similarity can also be associated, so that the target emoticon package image and emoticon package material can be found quickly.
  • S104: Extract text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
  • The emoticon package image whose similarity meets the condition is used as the target emoticon package image, its text information is extracted, the face image in the portrait image is extracted as emoticon package material, and the text information is integrated with the material by adding subtitles or by naming, to generate the target emoticon package.
  • The face images in the multiple portrait images are each compared with the emoticon packages in the emoticon package image library, and the emoticon package images and face images whose expression similarity exceeds a preset threshold are selected, which achieves accurate matching between the target emoticon package image and the emoticon package material and also improves the efficiency of emoticon package generation.
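  • An illustrative selection pass over all face/emoticon pairs might look like the sketch below; the cosine scoring and the 0.5 default threshold are assumptions carried over from the comparison step above, not values fixed by the patent.

```python
# Hedged sketch: keep (face, emoticon) pairs whose similarity meets the
# preset threshold, sorted best-first.
import numpy as np

def pick_matches(face_feats, library_feats, threshold: float = 0.5):
    matches = []
    for i, f in enumerate(face_feats):
        for j, e in enumerate(library_feats):
            score = float(np.dot(f, e) /
                          (np.linalg.norm(f) * np.linalg.norm(e)))
            if score >= threshold:
                matches.append((i, j, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)
```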
  • In some embodiments, determining the target emoticon package image and the emoticon package material based on the expression similarity includes:
  • when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • That is, the emoticon package image whose similarity meets the threshold is used as the target emoticon package image, which can be understood as an emoticon package template from which the text information is extracted.
  • This precisely screens out the emoticon package images and corresponding face images that meet the requirements, so that new emoticon packages can be generated quickly; the generated emoticon packages are saved in the emoticon package image library, where users can download or forward them, thereby improving user activity.
  • FIG. 2 shows a schematic diagram of an implementation process of a method for generating an emoticon package according to another embodiment of the present application, wherein the above S103 includes the following S201 to S204.
  • For example, the expression similarity may take a value in [0, 100], and the first preset similarity threshold may be set to 50; if the expression similarity between the first face image and the first emoticon package image is 55, the condition is met, and the attribute information of the first face image in the character video is obtained.
  • The attribute information includes the position and time of the first face image in the entire video, which facilitates subsequently intercepting the relevant video clips from the character video based on this information and effectively prevents background information from interfering with the normal extraction of face images.
  • Since a portrait image includes not only face images but also background information, subtitles, and so on, this improves the accuracy of extracting emoticon package material.
  • S202: Acquire multiple frames of images corresponding to the first face image from the character video according to the attribute information.
  • In this embodiment, the first face image may appear one or more times in the character video.
  • According to the attribute information, the corresponding multiple frames of images, that is, a video clip, are intercepted from the character video.
  • Face regions can be detected in the obtained video frames to obtain a face frame set, and then the face images, that is, the image areas within the face frames, are extracted from the video clip based on this set; the expression similarity between each frame's face image and the first emoticon package image is then calculated so as to select emoticon package material that meets the requirements.
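  • A hedged sketch of this per-frame screening over an intercepted clip follows; it assumes the per-frame face features have already been extracted, and the threshold value is illustrative.

```python
# Hedged sketch: keep the indices of clip frames whose face meets the second
# preset threshold. One surviving frame yields static material; a run of
# adjacent surviving frames can serve as dynamic (animated) material.
import numpy as np

def screen_clip(clip_face_feats, emoticon_feat, second_threshold: float = 0.5):
    keep = []
    for i, f in enumerate(clip_face_feats):
        score = float(np.dot(f, emoticon_feat) /
                      (np.linalg.norm(f) * np.linalg.norm(emoticon_feat)))
        if score >= second_threshold:
            keep.append(i)
    return keep
```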
  • The above execution subject may then screen the multiple frames of images based on their corresponding expression similarities.
  • In practice, the expression similarity may be represented by a value within [0, 100]: the smaller the value, the lower the similarity; the larger the value, the higher the similarity.
  • For example, the first preset similarity threshold may be set to 50.
  • Specifically, the attribute information of the first face image in the character video includes the time and position of the first face image; the multiple frames of images corresponding to the first face image can be located according to this information, and a video clip covering a certain period (for example, 3 seconds) before and after that time in the character video is obtained.
  • The images around the face position in the character video are taken to form a condensed video clip, which removes irrelevant background information from the clip.
  • For a frame of image whose expression similarity is greater than or equal to the second preset similarity threshold, the face image in that frame is used as static emoticon package material, that is, an emoticon package picture.
  • The size of the image region around the face position can be set to four times the size of the original face image, which makes it easy to distinguish the face image from the background information.
  • The first preset similarity threshold and the second preset similarity threshold may be the same or different; they are set according to actual conditions and are not specifically limited here.
  • It should be noted that one frame of image may contain one or more face regions, or no face region; when there are one or more face regions, a corresponding number of face frames can be obtained.
  • The face frame set is a set of at least one face frame, where a face frame is a rectangular frame surrounding a face region. Since a face frame can be characterized by coordinate information, the face frame set may specifically include at least one piece of coordinate information, each piece of which determines one face frame.
  • In order to extract the complete expression for the emoticon package material, the face frame can be expanded outward to obtain a new rectangle, and the image block enclosed by the new rectangle is taken out as the face image.
  • In practice, a pre-trained face detection model can be used to detect the face regions from the acquired multiple frames of images, so as to quickly and accurately extract the emoticon package material.
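  • The expansion step can be sketched as below: grow the detected face frame around its center (a scale factor of 2 per side gives roughly the four-times-the-face region mentioned earlier) and clamp it to the image bounds; the array layout and function name are assumptions for this illustration.

```python
# Hedged sketch: expand a face frame outward and crop the enclosed block.
def expand_face_crop(image, box, scale: float = 2.0):
    """image: HxWxC NumPy array; box: (x1, y1, x2, y2) face frame."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return image[ny1:ny2, nx1:nx2]
```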
  • When the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, the first emoticon package image is used as the target emoticon package image, and the face images in the first image and the second image are used as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • In this case, the expression similarities between the first image and the first emoticon package image and between the second image and the first emoticon package image both meet the condition, that is, each is greater than or equal to the second preset similarity threshold.
  • Therefore, the face image in the first image and the face image in the second image can be used together as dynamic emoticon package material.
  • The face images in three such frames can likewise be used as dynamic emoticon package material; that is, dynamic emoticon packages can be generated, which expands the types of emoticon packages that can be generated and adds to the fun of using them.
  • When the expression similarity between the face image in a third image and a second emoticon package image is the largest, the second emoticon package image is used as the target emoticon package image, and the face image in the third image is used as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • That is, the second emoticon package image is taken as the target emoticon package image, and the face image in the third image is used as static emoticon package material, namely an emoticon package picture.
  • When the expression similarity between the face image in the third image and the second emoticon package image is the largest, the second emoticon package image is used as the target emoticon package image, and the face images in the third image and a fourth image are used as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • For ease of description, the expression similarity between the face image in the portrait image and the first emoticon package image is recorded as the first expression similarity, and the expression similarity between the face image in the portrait image and the second emoticon package image is recorded as the second expression similarity.
  • When the first expression similarity is less than the second expression similarity, the second emoticon package image corresponding to the second expression similarity is selected as the target emoticon package image.
  • The continuous face images in the frames before and after that image can then be used as dynamic emoticon package material to improve the display effect of the emoticon package.
  • It should be noted that, in order to improve the display effect of the target emoticon package, the user can edit it as desired or add prop effects, such as hats, mushroom heads, and other props, and/or add artistic text, watermarks, and so on.
  • The target emoticon package is made into a preset format and stored in the emoticon package image library, for example in interfaces and chat tools, for users to use.
  • The preset format is set as needed; for example, it can be the GIF format.
  • The GIF format can store multiple images in one file; the saved images can be read out and displayed in sequence on the screen to form a simple animation, improving user operability and experience.
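  • A minimal sketch of packaging the selected frames as an animated GIF is shown below, using Pillow; the output file name, frame duration, and loop settings are illustrative assumptions.

```python
# Hedged sketch: write a sequence of frames to a looping animated GIF.
from PIL import Image

def frames_to_gif(frame_paths, out_path="target_emoticon.gif",
                  ms_per_frame=100):
    frames = [Image.open(p) for p in frame_paths]
    frames[0].save(
        out_path,
        save_all=True,              # write every frame, not just the first
        append_images=frames[1:],
        duration=ms_per_frame,      # display time per frame, in milliseconds
        loop=0,                     # 0 = loop forever
    )
```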
  • In the embodiments of the present application, a portrait image containing a face image is extracted from the character video to be processed, and the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library is calculated.
  • A target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate the target emoticon package.
  • Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos.
  • Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
  • FIG. 3 shows a schematic structural diagram of an emoticon package generation device 300 provided by an embodiment of the present application. As shown in FIG. 3, the device includes:
  • the obtaining module 310 is configured to obtain at least one portrait image from a character video to be processed, where the portrait image includes a face image;
  • the calculation module 320 is configured to calculate the expression similarity between the face image in the portrait image and the emoticon package image in the preset emoticon package image library;
  • the determining module 330 is configured to determine a target emoticon pack image and emoticon pack material based on the expression similarity, wherein the target emoticon pack image belongs to the emoticon pack image library, and the emoticon pack material belongs to the portrait image;
  • the generating module 340 is configured to extract text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
  • Optionally, the determining module 330 is specifically configured to: when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, use the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.
  • the determining module 330 specifically includes:
  • a first acquiring unit, configured to acquire the attribute information of the first face image in the character video when the expression similarity between the first face image and the first emoticon package image is greater than or equal to a first preset similarity threshold;
  • a second acquiring unit, configured to acquire multiple frames of images corresponding to the first face image from the character video according to the attribute information;
  • a calculation unit, configured to calculate the expression similarity between each frame of image and the first emoticon package image; and
  • a first material determining unit, configured to: when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, use the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
  • the determining module 330 may further include:
  • a second material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, use the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.
  • the determining module 330 may further include:
  • a third material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, use the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.
  • the determining module 330 may further include:
  • a fourth material determining unit, configured to: after the calculation unit calculates the expression similarity between each frame of image and the first emoticon package image, when the expression similarity between the face image in the third image and the second emoticon package image is the largest, use the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
  • Optionally, the acquisition module 310 is further configured to extract facial expression features from the face image through a preset deep neural network, and the calculation module 320 compares the facial expression features with the expression features of the emoticon package image to obtain the expression similarity.
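  • For orientation, the four modules of FIG. 3 can be sketched as plain Python classes as below; the method names are placeholders chosen for this illustration, since the patent specifies behavior, not an API.

```python
# Hedged structural sketch of the apparatus in FIG. 3; bodies are stubs.
class AcquisitionModule:          # 310
    def get_portrait_images(self, video_path): ...

class CalculationModule:          # 320
    def expression_similarity(self, face_image, emoticon_image): ...

class DeterminingModule:          # 330
    def pick_target_and_material(self, similarities): ...

class GeneratingModule:           # 340
    def generate(self, target_emoticon_image, material): ...
```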
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • As shown in FIG. 4, the terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and runnable on the processor 420; the processor 420 implements the aforementioned emoticon package generation method when executing the computer program 430.
  • The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device; the embodiments of this application impose no restrictions on the specific type of terminal device.
  • The terminal device 400 may include, but is not limited to, a processor 420 and a memory 410. Those skilled in the art can understand that FIG. 4 is only an example of the terminal device 400 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown in the figure, a combination of certain components, or different components, and may, for example, also include input and output devices.
  • The processor 420 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or memory of the terminal device 400. In other embodiments, the memory 410 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program; it may also be used to temporarily store data that has been or will be output.
  • the embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.
  • The embodiments of the present application also provide a computer program product; when the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be realized.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The computer program can be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • In some cases, computer-readable media cannot be electrical carrier signals and telecommunications signals.
  • the disclosed apparatus/network equipment and method may be implemented in other ways.
  • the device/network device embodiments described above are merely illustrative.
  • For example, the division of the modules or units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

A meme generation method and apparatus, and a terminal device, relating to the technical field of video applications. The method comprises: acquiring at least one portrait image from a figure video to be processed, wherein the portrait image includes a facial image (S101); calculating expression similarity between the facial image in the portrait image and a meme image in a preset meme image library (S102); determining a target meme image and meme material on the basis of the expression similarity, wherein the target meme image belongs to the meme image library, and the meme material belongs to the portrait image (S103); and extracting text information from the target meme image, and integrating the text information with the meme material to generate a target meme (S104). The method improves the efficiency of meme generation and the accuracy of text information matching.

Description

Emoticon package generation method, device and terminal device

Technical field

This application belongs to the technical field of video applications, and in particular relates to an emoticon package generation method, device, terminal device, and computer-readable storage medium.

Background

With the popularization of the Internet and the gradual development of mobile communications, more and more people are using instant messaging tools. To liven up the chat atmosphere, emoticons are used as a way of expressing feelings with pictures. The earliest emoticons were mostly designed by professionals, such as emoji and QQ emoticons. As emoticons developed, picture-plus-text emoticon packages became popular; however, because producing an emoticon package requires manually extracting images or adding text information, the process is time-consuming and labor-intensive, and generating emoticon packages from character videos is inefficient.

Technical problem

In view of this, the embodiments of the present application provide an emoticon package generation method, device, and terminal device to solve the prior-art problem that extracting face images from a character video and generating corresponding emoticon packages is inefficient.

Technical solution
In a first aspect, an embodiment of the present application provides an emoticon package generation method, including:

acquiring at least one portrait image from a character video to be processed, where the portrait image includes a face image;

calculating the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;

determining a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and

extracting text information of the target emoticon package image, and integrating the text information with the emoticon package material to generate a target emoticon package.
Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:

when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, using the first emoticon package image as the target emoticon package image and the first face image as static emoticon package material, where the first face image is any face image in the portrait images and the first emoticon package image is any emoticon package image in the emoticon package image library.

Optionally, determining the target emoticon package image and emoticon package material based on the expression similarity includes:

when the expression similarity between the first face image and the first emoticon package image is greater than or equal to the first preset similarity threshold, acquiring attribute information of the first face image in the character video;

acquiring multiple frames of images corresponding to the first face image from the character video according to the attribute information;

calculating the expression similarity between each frame of image and the first emoticon package image; and

when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face image in the first image as static emoticon package material, where the first image is any one of the multiple frames of images.
Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, using the first emoticon package image as the target emoticon package image and the face images in the first image and the second image as dynamic emoticon package material, where the second image is at least one frame adjacent to the first image among the multiple frames of images.

Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in a third image and a second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face image in the third image as static emoticon package material, where the third image is at least one frame adjacent to the first image among the multiple frames of images.

Optionally, after calculating the expression similarity between each frame of image and the first emoticon package image, the method further includes:

when the expression similarity between the face image in the third image and the second emoticon package image is the largest, using the second emoticon package image as the target emoticon package image and the face images in the third image and a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
Optionally, calculating the expression similarity between the face image and the emoticon package images in the emoticon package image library includes:

extracting facial expression features from the face image through a preset deep neural network; and

comparing the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
In a second aspect, an embodiment of the present application further provides an emoticon package generation device, including:

an acquisition module, configured to acquire at least one portrait image from a character video to be processed, where the portrait image includes a face image;

a calculation module, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;

a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and

a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
In a third aspect, an embodiment of the present application further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of the method in the first aspect when executed by a processor.
Beneficial effects

The emoticon package generation method, device, and terminal device provided by the embodiments of the present application have the following beneficial effects:

In the embodiments of the present application, a portrait image containing a face image is extracted from the character video to be processed; the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library is calculated; a target emoticon package image and emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait image; and the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate a target emoticon package. Since the present invention determines the target emoticon package image and emoticon package material according to the expression similarity and automatically integrates the text information of the emoticon package image with the emoticon package material, the user does not need to manually select an emoticon package image, which reduces the user's operational burden and allows users to quickly and easily make their own emoticon packages from videos. Extracting face images or video clips from the character video as emoticon package material according to expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
Description of the drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings needed in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of the implementation process of an emoticon package generation method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the implementation process of an emoticon package generation method provided by another embodiment of the present application;

FIG. 3 is a schematic structural diagram of an emoticon package generation device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Description of the main component symbols:

300 - emoticon package generation device; 310 - acquisition module; 320 - calculation module; 330 - determining module; 340 - generating module; 400 - terminal device; 410 - memory; 420 - processor; 430 - computer program.
Embodiments of the present invention

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary details do not obscure the description of this application.

It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification of this application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the specification of this application and the appended claims, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", and so on are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

In order to illustrate the technical solutions of the present invention, the following specific embodiments are used for description.
如图1所示,图1示出了本申请实施例提供的表情包生成方法,所述表情包生成方法可包括如下S101至S104。As shown in FIG. 1, FIG. 1 shows a method for generating an emoticon package provided by an embodiment of the present application, and the method for generating an emoticon package may include the following S101 to S104.
S101:从待处理的人物视频中获取至少一张人像图像,所述人像图像中包含人脸图像;S101: Obtain at least one portrait image from a character video to be processed, where the portrait image includes a face image;
在本实施例中,表情包生成方法的执行主体可以将上述人物视频中的各帧图像组成人像图像,或者,上述执行主体可以基于预设步长(例如1或2等),从上述人物视频中提取出图像,并将提取出的图像组成人像图像,其中,人像图像中的图像是按照在上述人物视频中的播放先后顺序排列的,应当理解,预设步长可以根据实际需要设置,此处不做具体限定。In this embodiment, the execution subject of the emoticon package generation method may compose each frame image in the above-mentioned character video into a portrait image, or the above-mentioned execution subject may be based on a preset step size (for example, 1 or 2, etc.) from the above-mentioned character video. The image is extracted from the image, and the extracted images are formed into a portrait image. The images in the portrait image are arranged in the order of playback in the above-mentioned character video. It should be understood that the preset step length can be set according to actual needs. There are no specific restrictions.
It should be noted that the execution subject may receive, in real time, an emoticon package generation request sent by a user through an electronic device, and the character video may be a video contained in the received emoticon package generation request.
It is understandable that the character video may be pre-stored in the electronic device, may be a video saved on a website for playback, or may be recorded in real time by the electronic device. When the character video is played or recorded, face recognition technology is used to process the portrait images in the video, and at least one portrait image carrying a face image is extracted; the portrait image may also include background, text, and the like, and the face image may contain the face of one person or the faces of several people. When the electronic device is a terminal device, this meets the user's need to extract a specific face image from a video; when the electronic device is a server, running the apparatus for extracting face images from character videos on that server meets the face image extraction needs of platforms such as video websites.
Specifically, a face detection algorithm is used to detect face images from the multiple portrait images, and the face images can be marked, which facilitates the subsequent extraction of facial expression features.
It should be noted that a portrait image may contain one or more faces, or no face at all. The face detection algorithm performs face detection on the at least one portrait image of the character video to obtain a corresponding face detection result, which indicates whether a face is displayed in that portrait image. The face detection algorithm may be a multi-task convolutional neural network (MTCNN), a multi-task neural network model for face detection tasks. The model mainly uses three cascaded networks together with the idea of candidate boxes plus classifiers to perform fast and efficient face detection: P-Net, which quickly generates candidate windows; R-Net, which performs high-precision filtering and selection of the candidate windows; and O-Net, which generates the final bounding boxes and face key points. The faces in the character video are detected by the multi-task convolutional neural network to obtain the corresponding face images.
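As a hedged sketch of this detection step, the cascaded MTCNN could be invoked as below, assuming the third-party facenet-pytorch package; the embodiment does not prescribe a specific implementation, and the helper name is illustrative.

```python
from facenet_pytorch import MTCNN  # bundles the P-Net/R-Net/O-Net cascade
from PIL import Image

detector = MTCNN(keep_all=True)  # keep every face found in a frame

def detect_faces(frame_path):
    """Return (bounding box, confidence) pairs for all faces in one frame."""
    img = Image.open(frame_path).convert("RGB")
    boxes, probs = detector.detect(img)  # boxes: N x 4 arrays [x1, y1, x2, y2]
    return [] if boxes is None else list(zip(boxes, probs))
```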
S102: Calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library.
Specifically, when calculating the expression similarity between the face images in the multiple portrait images and the emoticon package images, a deep neural network is used to extract the facial expression features of each face image. The deep neural network may be pre-trained on databases such as ImageNet, face recognition data, or facial expression data. The extracted facial expression features are compared with the features of the pre-processed emoticon packages in the emoticon package image library, where an emoticon package feature may also come from a newly input emoticon package. The compared features may be convolutional-layer features or fully-connected-layer features of the deep neural network, and the classification probabilities of the network may also serve as features, yielding an emoticon package feature and a face image feature for each pair. The feature comparison can be computed as cosine similarity between the features, or as Euclidean distance after feature normalization, and the computed result is taken as the expression similarity. Using the aforementioned face detection algorithm, at least one portrait image containing a face image is obtained, and the expression similarity between each portrait image and each emoticon package image in the preset library is calculated; based on the multiple similarities, the emoticon package images meeting the requirements can be screened out, thereby improving the accuracy of matching the face image of the portrait image with the emoticon package image.
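A minimal sketch of the two comparison strategies named above follows; the feature extractor itself (the pre-trained deep network) is assumed rather than shown, and the mapping into [0, 100] mirrors the similarity range used in the later embodiments.

```python
import numpy as np

def expression_similarity(face_feat, meme_feat, method="cosine"):
    """Compare two expression feature vectors; return a score in [0, 100]."""
    f = np.asarray(face_feat, dtype=np.float64)
    m = np.asarray(meme_feat, dtype=np.float64)
    if method == "cosine":
        score = np.dot(f, m) / (np.linalg.norm(f) * np.linalg.norm(m))
    else:
        # Euclidean distance after L2 normalization; for unit vectors the
        # distance lies in [0, 2], so map it to a similarity in [0, 1]
        f, m = f / np.linalg.norm(f), m / np.linalg.norm(m)
        score = 1.0 - np.linalg.norm(f - m) / 2.0
    return 100.0 * max(0.0, score)
```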
It should be noted that, before S102 is performed, the emoticon package image library may be built from emoticon package images downloaded or collected from the Internet, or manually added or created by the user, classified according to their text information, format, and so on. The text in an emoticon package is obtained by text extraction, for example with OCR text recognition software; if an emoticon package contains no text, the name and format of the package can be obtained, or the text around the package can be extracted, to form its text information. The emoticon package image library is then established from the emoticon packages and their corresponding text information.
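One way to build such a library entry is sketched below, assuming the pytesseract OCR binding (which requires the Tesseract binary and language packs to be installed); the fallback to the file name when the image carries no embedded text follows the description above.

```python
import os
import pytesseract
from PIL import Image

def build_library_entry(image_path):
    """Return one library record: image path, text information and format."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img, lang="chi_sim+eng").strip()
    if not text:  # no text inside the meme: use its name and format instead
        text = os.path.splitext(os.path.basename(image_path))[0]
    return {"path": image_path, "text": text, "format": img.format}
```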
S103: Determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image.
Specifically, after the expression similarity between each face image and each emoticon package image in the library is calculated, the computed similarities may be marked or sorted, or compared one by one to find the one or more pairs with the largest similarity values. One or more of the computed similarities may meet the above condition, and correspondingly one or more emoticon package images in the library may be selected, so as to determine the target emoticon package image in the library and the face image in the portrait image, the latter serving as the emoticon package material for generating a new emoticon package. In addition, the determined target emoticon package image, the emoticon package material, and the corresponding expression similarity may be associated with one another, so that the target emoticon package image and the material can be quickly looked up.
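A small sketch of the marking and sorting step, with purely illustrative container names: the computed (face image, emoticon package image, similarity) triples are sorted, and every pair tying for the maximum value is kept, matching the "one or more" wording above.

```python
def rank_matches(scores):
    """scores: list of (face_image_id, meme_image_id, similarity) triples."""
    ranked = sorted(scores, key=lambda s: s[2], reverse=True)
    if not ranked:
        return []
    best = ranked[0][2]
    # keep every pair that ties for the largest similarity value
    return [s for s in ranked if s[2] == best]
```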
S104: Extract the text information of the target emoticon package image, and integrate the text information with the emoticon package material to generate a target emoticon package.
Specifically, the video is opened, the automatic emoticon package production command is started, and the face detection algorithm runs in the background, defining the detected faces as face image 1, face image 2, ..., face image N. Expression similarity with the emoticon package images in the library can be computed as soon as face image 1 is detected, or after a set detection time or a set number of detected face images. Taking the detected face image 1 as an example, the expression similarity between the recognized face in face image 1 and the faces in the emoticon package image library is calculated; the emoticon package image with the largest similarity can be selected as the target emoticon package image, its text information is extracted, the face image in the portrait image is extracted as the emoticon package material, and the text information is integrated with the material by subtitle addition or naming to generate the target emoticon package. In addition, the face images in the multiple portrait images are each compared with the emoticon packages in the library, and the emoticon package images and face images whose similarity exceeds a preset threshold are selected, which allows the target emoticon package image and the material to be matched precisely and also improves the efficiency of emoticon package generation.
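The subtitle-style integration of the extracted text with the material might be realized as below, assuming Pillow; the font file path is an assumption that must exist on the target system, and the placement is illustrative.

```python
from PIL import Image, ImageDraw, ImageFont

def add_caption(material_path, text, out_path, font_path="simhei.ttf"):
    """Overlay the extracted text near the bottom of the material image."""
    img = Image.open(material_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, size=max(14, img.height // 10))
    draw.text((img.width // 10, int(img.height * 0.85)), text,
              fill="white", font=font)
    img.save(out_path)
```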
Further, the determining of the target emoticon package image and the emoticon package material based on the expression similarity includes:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, where the first face image is any face image in the portrait images, and the first emoticon package image is any emoticon package image in the emoticon package image library.
Specifically, taking face image a as an example, when the expression similarity between face image a and an emoticon package image in the library is greater than or equal to the first preset similarity threshold, that emoticon package image serves as the target emoticon package image, which can be understood as an emoticon package template used for extracting the text information. Emoticon package images meeting the requirements and the corresponding face images are thus screened out precisely, so that new emoticon packages can be generated quickly and saved in the emoticon package image library for users to download or forward, thereby increasing user activity.
As shown in FIG. 2, FIG. 2 illustrates a schematic flowchart of an implementation of the emoticon package generation method provided by another embodiment of the present application, where the above S103 includes the following S201 to S204.
S201: When the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, obtain attribute information of the first face image in the character video.
Specifically, the expression similarity can be set to a value in [0, 100], and the first preset similarity threshold can be set to 50, so that an expression similarity of 55 between the first face image and the first emoticon package image meets the above condition. The attribute information of the first face image in the character video is then obtained, including the position, time, and the like of the first face image within the whole video, which facilitates the subsequent interception of related video clips from the character video based on the attribute information and effectively prevents background information from interfering with the normal extraction of face images.
It should be noted that there is a one-to-one expression similarity between each face image in the portrait images and each emoticon package image in the library, and the face images whose similarity is greater than or equal to the first preset similarity threshold are selected. Since a portrait image includes the face image, background information, subtitles, and so on, this improves the accuracy of extracting the emoticon package material.
S202: Obtain, according to the attribute information, multiple frames of images corresponding to the first face image from the character video.
In this embodiment, the first face image may appear once or several times in the character video. According to the attribute information of the first face image, the corresponding multiple frames of images, that is, a video clip, are intercepted from the character video. In other words, face detection and tracking are performed on the first face image, and the video frames containing the face image are extracted from the character video according to the detection and tracking results, so as to guarantee the completeness of the extracted emoticon package material.
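A sketch of cutting the clip around the attribute time follows, assuming OpenCV; the 3-second window matches the example given in a later embodiment and is not a fixed requirement.

```python
import cv2

def frames_around(video_path, t_seconds, window=3.0):
    """Return the frames within +/- `window` seconds of the given time."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    start = max(0, int((t_seconds - window) * fps))
    end = int((t_seconds + window) * fps)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    frames = []
    for _ in range(start, end + 1):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```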
S203: Calculate the expression similarity between each frame of the images and the first emoticon package image.
In this embodiment, after the corresponding multiple frames of images are obtained from the character video, face regions can be detected from the obtained video frames, and a face frame set is obtained after the face regions are detected. Based on the face frame set, face images, that is, images containing the regions inside the face frames, are extracted from the obtained video clip, and the facial expression in each frame is compared with the first emoticon package image to obtain the corresponding expression similarity, so that emoticon package material meeting the requirements can be selected.
S204: When the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, take the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, where the first image is any one of the multiple frames of images.
In this embodiment, the execution subject can rely on the expression similarities corresponding to the multiple frames of images, which may be expressed as values within [0, 100]; in practice, a smaller value indicates a lower expression similarity, and a larger value a higher one. When the expression similarity between a first image among the portrait images and a first emoticon package image stored in the library is greater than or equal to the first preset similarity threshold (for example, 50), the attribute information of the first face image in the character video is obtained, including the time and position at which the first face image appears. Based on this attribute information, the multiple frames of images corresponding to the first face image can be located, a video clip covering a certain period (for example, 3 seconds) before and after that time is obtained, and the images around the face position are taken to form a condensed clip, which removes irrelevant background information from the clip. The expression similarity between each frame and the first emoticon package image is then calculated; when a frame whose similarity is greater than or equal to the second preset similarity threshold exists, the face image in that frame is used as a static emoticon package material, that is, an emoticon package picture. The image region around the face position can be set to four times the size of the original face image, which makes it easy to distinguish the face image from the background information. It should be noted that the first preset similarity threshold and the second preset similarity threshold may be the same or different, are set according to actual conditions, and are not specifically limited here.
It is understandable that a frame of image may contain one or more face regions, or no face region; when one or more face regions exist, a corresponding number of face frames, and hence of face images, can be obtained. The aforementioned face frame set is a set of at least one face frame, each being a rectangular frame surrounding a face region; since a face frame can be characterized by coordinate information, the face frame set may specifically include at least one piece of coordinate information, each of which determines one face frame. In order to extract the complete expression for the emoticon package material, the face frame can be expanded outward to obtain a new rectangle, and the image block enclosed by that rectangle is taken out to obtain the face image. For example, a pre-trained face detection model can be used to detect the face regions in the acquired multiple frames of images, so as to extract the emoticon package material quickly and accurately.
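The outward expansion of the face frame might be implemented as follows; scaling each side by 2 yields a region roughly four times the area of the original face frame, in line with the size mentioned above, though the exact scale is a design choice.

```python
def expand_and_crop(frame, box, scale=2.0):
    """frame: H x W x C array; box: (x1, y1, x2, y2) face rectangle."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return frame[ny1:ny2, nx1:nx2]  # clipped to the frame boundaries
```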
Further, after S203, the method also includes:
when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, taking the first emoticon package image as the target emoticon package image, and taking the face image in the first image and the face image in the second image as dynamic emoticon package material, where the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
In this embodiment, taking the selection of three frames of images as an example, when the expression similarities of the first image and the second image with the first emoticon package image both meet the condition, that is, both are greater than or equal to the second preset similarity threshold, the face image in the first image and the face image in the second image can be used as dynamic emoticon package material. If the similarities of the first image and the third image with the first emoticon package image meet the condition while that of the second image does not, the face images in all three frames can still be used as dynamic emoticon package material. Dynamic emoticon packages can thus be generated, which broadens the types of emoticon packages produced and adds to the fun of using them.
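A sketch of collecting adjacent qualifying frames into a dynamic material follows; the tolerance of one sub-threshold frame inside a run reflects the three-frame example above and is an assumption rather than a requirement.

```python
def dynamic_runs(similarities, threshold, gap=1):
    """similarities: per-frame scores; return index runs usable as animations."""
    runs, current, misses = [], [], 0
    for i, score in enumerate(similarities):
        if score >= threshold:
            current.append(i)
            misses = 0
        elif current and misses < gap:
            current.append(i)  # tolerate one sub-threshold frame mid-run
            misses += 1
        else:
            if len(current) > 1:
                runs.append(current)
            current, misses = [], 0
    if len(current) > 1:
        runs.append(current)
    return runs
```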
Further, after S203, the method also includes:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image as a static emoticon package material, where the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
In this embodiment, taking the selection of three frames of images as an example, when the face image in the third image among the multiple frames has the largest expression similarity with the second emoticon package image, the second emoticon package image is taken as the target emoticon package image, and the face image in the third image is taken as the static emoticon package material, that is, the emoticon package picture. By selecting the emoticon package image with the largest expression similarity and the face image in the corresponding frame, the emoticon package image to be generated can be extracted accurately, so that the generated emoticon package expresses the intended emotion accurately.
Further, after S203, the method also includes:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image together with a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, at least one of which is adjacent to the first image.
In this embodiment, taking a library containing two emoticon package images as an example, the expression similarity between the face image in the portrait image and the first emoticon package image is recorded as a first expression similarity, and that with the second emoticon package image as a second expression similarity. When the first expression similarity is smaller than the second, the second emoticon package image corresponding to the second expression similarity is selected as the target emoticon package image. Among the selected multiple frames, there may be a frame whose face image has the largest similarity with the emoticon package image, and the consecutive face images in the frames before and after it can be used as dynamic emoticon package material, so as to improve the display effect of the emoticon package.
In another embodiment, in order to improve the display effect of the target emoticon package, the user can edit the target emoticon package according to his own wishes or add prop effects, for example props such as hats or mushroom heads, and/or add some word art, watermarks, and the like. The target emoticon package is made into a preset format and stored in an emoticon package image library accessible from a user-operable interface, a chat tool, or the like. The preset format is set as needed; for example, it can be the GIF format, which can store multiple images, so that the multiple images saved in one file can be read out and displayed on the screen to form a simple animation, improving the user's operability and experience.
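Saving the dynamic material in the preset GIF format could look as below, assuming Pillow; the frame duration and loop count are illustrative.

```python
from PIL import Image

def save_gif(frame_paths, out_path, ms_per_frame=120):
    """Assemble the material frames into a looping GIF animation."""
    frames = [Image.open(p).convert("P", palette=Image.ADAPTIVE)
              for p in frame_paths]
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=ms_per_frame, loop=0)
```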
In the embodiments of the present application, portrait images containing face images are extracted from the character video to be processed, the expression similarity between the face images in the portrait images and the emoticon package images in the preset emoticon package image library is calculated, the target emoticon package image and the emoticon package material are determined based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library and the emoticon package material belongs to the portrait images, and the text information of the target emoticon package image is extracted and integrated with the emoticon package material to generate the target emoticon package. Since the present invention can determine the target emoticon package image and the emoticon package material according to the expression similarity and automatically integrate the text information of the emoticon package image with the material, the user does not need to select emoticon package images manually, which reduces the user's operational burden and allows users to make their own emoticon packages from videos quickly and easily. Extracting face images or video clips from character videos as emoticon package material according to the expression similarity realizes emoticon package generation based on expression similarity calculation and improves the efficiency of emoticon package generation.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The following apparatus corresponds to the emoticon package generation method described in the foregoing embodiments.
Please refer to FIG. 3, which shows a schematic structural diagram of an emoticon package generation apparatus 300 provided by the present application. As shown in FIG. 3, the apparatus includes:
an obtaining module 310, configured to obtain at least one portrait image from a character video to be processed, where the portrait image contains a face image;
a calculation module 320, configured to calculate the expression similarity between the face image in the portrait image and the emoticon package images in a preset emoticon package image library;
a determining module 330, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, where the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
a generating module 340, configured to extract the text information of the target emoticon package image and integrate the text information with the emoticon package material to generate the target emoticon package.
Optionally, the determining module 330 is specifically configured to: when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, take the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, where the first face image is any face image in the portrait images, and the first emoticon package image is any emoticon package image in the emoticon package image library.
Optionally, the determining module 330 specifically includes:
a first obtaining unit, configured to obtain attribute information of a first face image in the character video when the expression similarity between the first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold;
a second obtaining unit, configured to obtain, according to the attribute information, multiple frames of images corresponding to the first face image from the character video;
a calculation unit, configured to calculate the expression similarity between each frame of the images and the first emoticon package image; and
a first material determining unit, configured to, when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, take the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, where the first image is any one of the multiple frames of images.
Optionally, the determining module 330 may further include:
a second material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, take the first emoticon package image as the target emoticon package image, and take the face image in the first image and the face image in the second image as dynamic emoticon package material, where the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
Optionally, the determining module 330 may further include:
a third material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, take the second emoticon package image as the target emoticon package image and the face image in the third image as a static emoticon package material, where the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
Optionally, the determining module 330 may further include:
a fourth material determining unit, configured to, after the calculation unit calculates the expression similarity between each frame of the images and the first emoticon package image, when the expression similarity between the face image in a third image and a second emoticon package image is the largest, take the second emoticon package image as the target emoticon package image, and take the face image in the third image together with a fourth image as dynamic emoticon package material, where the fourth image is at least two consecutive frames among the multiple frames of images, and at least one of the at least two frames is adjacent to the first image.
Optionally, the obtaining module 310 is further configured to extract the facial expression features of the face image through a preset deep neural network, and the calculation module 320 compares the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
Please refer to FIG. 4, which is a schematic structural diagram of a terminal device further provided by an embodiment of the present application. As shown in FIG. 4, the terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and executable on the processor 420; the processor 420 implements the aforementioned emoticon package generation method when executing the computer program 430.
The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device; the embodiments of the present application impose no restriction on the specific type of the terminal device.
The terminal device 400 may include, but is not limited to, the processor 420 and the memory 410. Those skilled in the art can understand that FIG. 4 is merely an example of the terminal device 400 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or use different components, and may, for example, also include input and output devices.
The so-called processor 420 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or any conventional processor.
In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or memory of the terminal device 400. In other embodiments, the memory 410 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used to store the operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, since the information interaction and execution processes between the above emoticon package generation apparatus/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment part and will not be repeated here.
Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be realized in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.
An embodiment of the present application provides a computer program product; when the computer program product runs on a mobile terminal, the mobile terminal, upon execution, can realize the steps in each of the foregoing method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can be accomplished by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, executable file form, some intermediate form, or the like. The computer-readable medium may at least include: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electric carrier signals and telecommunications signals.
In the above embodiments, the description of each embodiment has its own focus. For parts not described in detail or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present application.
Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; for example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included within the protection scope of the present application.

Claims (10)

1. A method for generating an emoticon package, characterized by comprising:
obtaining at least one portrait image from a character video to be processed, wherein the portrait image contains a face image;
calculating an expression similarity between the face image in the portrait image and emoticon package images in a preset emoticon package image library;
determining a target emoticon package image and emoticon package material based on the expression similarity, wherein the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
extracting text information of the target emoticon package image, and integrating the text information with the emoticon package material to generate a target emoticon package.
2. The emoticon package generation method according to claim 1, characterized in that the determining of the target emoticon package image and the emoticon package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the first face image as a static emoticon package material, wherein the first face image is any face image in the portrait image, and the first emoticon package image is any emoticon package image in the emoticon package image library.
3. The emoticon package generation method according to claim 1, characterized in that the determining of the target emoticon package image and the emoticon package material based on the expression similarity comprises:
when the expression similarity between a first face image and a first emoticon package image is greater than or equal to a first preset similarity threshold, obtaining attribute information of the first face image in the character video;
obtaining, according to the attribute information, multiple frames of images corresponding to the first face image from the character video;
calculating the expression similarity between each frame of the images and the first emoticon package image; and
when the expression similarity between the face image in a first image and the first emoticon package image is greater than or equal to a second preset similarity threshold, taking the first emoticon package image as the target emoticon package image and the face image in the first image as a static emoticon package material, wherein the first image is any one of the multiple frames of images.
4. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a second image and the first emoticon package image is greater than or equal to the second preset similarity threshold, taking the first emoticon package image as the target emoticon package image, and taking the face image in the first image and the face image in the second image as dynamic emoticon package material, wherein the second image is at least one frame of image adjacent to the first image among the multiple frames of images.
5. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image as a static emoticon package material, wherein the third image is at least one frame of image adjacent to the first image among the multiple frames of images.
6. The emoticon package generation method according to claim 3, characterized in that, after the calculating of the expression similarity between each frame of the images and the first emoticon package image, the method further comprises:
when the expression similarity between the face image in a third image and a second emoticon package image is the largest, taking the second emoticon package image as the target emoticon package image, and taking the face image in the third image together with a fourth image as dynamic emoticon package material, wherein the fourth image is at least two consecutive frames among the multiple frames of images, and at least one of the at least two frames is adjacent to the first image.
7. The emoticon package generation method according to any one of claims 1 to 6, characterized in that the calculating of the expression similarity between the face image and the emoticon package images in the emoticon package image library comprises:
extracting facial expression features of the face image through a preset deep neural network; and
comparing the facial expression features with the expression features of the emoticon package images to obtain the expression similarity.
8. An apparatus for generating an emoticon package, characterized by comprising:
an obtaining module, configured to obtain at least one portrait image from a character video to be processed, the portrait image containing a face image;
a calculation module, configured to calculate an expression similarity between the face image in the portrait image and emoticon package images in a preset emoticon package image library;
a determining module, configured to determine a target emoticon package image and emoticon package material based on the expression similarity, wherein the target emoticon package image belongs to the emoticon package image library, and the emoticon package material belongs to the portrait image; and
a generating module, configured to extract text information of the target emoticon package image and integrate the text information with the emoticon package material to generate a target emoticon package.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the emoticon package generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the emoticon package generation method according to any one of claims 1 to 7.
PCT/CN2020/129209 2019-11-29 2020-11-17 Meme generation method and apparatus, and terminal device WO2021104097A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911197094.5 2019-11-29
CN201911197094.5A CN110889379B (en) 2019-11-29 2019-11-29 Expression package generation method and device and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021104097A1 true WO2021104097A1 (en) 2021-06-03

Family

ID=69749402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129209 WO2021104097A1 (en) 2019-11-29 2020-11-17 Meme generation method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN110889379B (en)
WO (1) WO2021104097A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889379B (en) * 2019-11-29 2024-02-20 深圳先进技术研究院 Expression package generation method and device and terminal equipment
CN111372141B (en) * 2020-03-18 2024-01-05 腾讯科技(深圳)有限公司 Expression image generation method and device and electronic equipment
CN111586466B (en) * 2020-05-08 2021-05-28 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
CN111768481A (en) * 2020-05-19 2020-10-13 北京奇艺世纪科技有限公司 Expression package generation method and device
CN111753131A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Expression package generation method and device, electronic device and medium
CN111881776B (en) * 2020-07-07 2023-07-07 腾讯科技(深圳)有限公司 Dynamic expression acquisition method and device, storage medium and electronic equipment
CN113436297A (en) * 2021-07-15 2021-09-24 维沃移动通信有限公司 Picture processing method and electronic equipment
CN117150063B (en) * 2023-10-26 2024-02-06 深圳慢云智能科技有限公司 Image generation method and system based on scene recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369196B (en) * 2017-06-30 2021-08-24 Oppo广东移动通信有限公司 Expression package manufacturing method and device, storage medium and electronic equipment
US10593087B2 (en) * 2017-10-23 2020-03-17 Paypal, Inc. System and method for generating emoji mashups with machine learning
CN109508399A (en) * 2018-11-20 2019-03-22 维沃移动通信有限公司 A kind of facial expression image processing method, mobile terminal
CN110321845B (en) * 2019-07-04 2021-06-18 北京奇艺世纪科技有限公司 Method and device for extracting emotion packets from video and electronic equipment
CN110458916A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Expression packet automatic generation method, device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803909A (en) * 2017-02-21 2017-06-06 腾讯科技(深圳)有限公司 The generation method and terminal of a kind of video file
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
US20190354791A1 (en) * 2018-05-17 2019-11-21 Idemia Identity & Security France Character recognition method
CN110162670A (en) * 2019-05-27 2019-08-23 北京字节跳动网络技术有限公司 Method and apparatus for generating expression packet
CN110889379A (en) * 2019-11-29 2020-03-17 深圳先进技术研究院 Expression package generation method and device and terminal equipment

Also Published As

Publication number Publication date
CN110889379A (en) 2020-03-17
CN110889379B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2021104097A1 (en) Meme generation method and apparatus, and terminal device
CN108833973B (en) Video feature extraction method and device and computer equipment
US20220350842A1 (en) Video tag determination method, terminal, and storage medium
WO2019041521A1 (en) Apparatus and method for extracting user keyword, and computer-readable storage medium
CN111489290B (en) Face image super-resolution reconstruction method and device and terminal equipment
WO2019153504A1 (en) Group creation method and terminal thereof
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN108898082B (en) Picture processing method, picture processing device and terminal equipment
CN111814770A (en) Content keyword extraction method of news video, terminal device and medium
CN111209970A (en) Video classification method and device, storage medium and server
WO2020259449A1 (en) Method and device for generating short video
JP2021034003A (en) Human object recognition method, apparatus, electronic device, storage medium, and program
CN112532882B (en) Image display method and device
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
WO2020135756A1 (en) Video segment extraction method, apparatus and device, and computer-readable storage medium
CN111818385B (en) Video processing method, video processing device and terminal equipment
WO2021135286A1 (en) Video processing method, video searching method, terminal device, and computer-readable storage medium
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
CN109886239B (en) Portrait clustering method, device and system
CN111128233A (en) Recording detection method and device, electronic equipment and storage medium
CN110232267B (en) Business card display method and device, electronic equipment and storage medium
CN108932704B (en) Picture processing method, picture processing device and terminal equipment
WO2023173659A1 (en) Face matching method and apparatus, electronic device, storage medium, computer program product, and computer program
CN115544214A (en) Event processing method and device and computer readable storage medium
CN113361486A (en) Multi-pose face recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20893238

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20893238

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 19/01/2023)
