CN111461962A - Image processing method, electronic equipment and computer readable storage medium


Info

Publication number
CN111461962A
Authority
CN
China
Prior art keywords
image
frame
frames
key
processed
Prior art date
Legal status
Pending
Application number
CN202010232138.XA
Other languages
Chinese (zh)
Inventor
赵琦
王科
张健
颜忠伟
Current Assignee
MIGU Culture Technology Co Ltd
Original Assignee
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by MIGU Culture Technology Co Ltd filed Critical MIGU Culture Technology Co Ltd
Priority to CN202010232138.XA priority Critical patent/CN111461962A/en
Publication of CN111461962A publication Critical patent/CN111461962A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method, an electronic device and a computer readable storage medium, wherein the image processing method comprises the following steps: extracting key frames from a video to be processed; selecting a target image area in the key frame, wherein the target image area comprises a face image area; and carrying out exaggeration processing on the face image area to generate a cartoon image. According to the embodiment of the invention, the key frame is extracted from the video to be processed, the region containing the face image in the key frame is selected as the target image region, and the face image region is subjected to exaggeration processing to be matched with the story plot, so that manual creation is not needed, the creation efficiency of the cartoon can be effectively improved, and the creation cost of the cartoon can be reduced.

Description

Image processing method, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an image processing method, an electronic device, and a computer-readable storage medium.
Background
As a popular form of artistic expression, cartoons are enjoyed by more and more people. However, the threshold for cartoon creation is relatively high: the creator needs good drawing, composition and narrative skills, and creating a cartoon is very time-consuming work. At present, cartoons are mostly drawn manually by creators, with some computer software used to assist creation. However, some videos update frequently; for football matches, for example, match videos are generated faster than cartoons can be created from them, so the cartoons are not updated often.
Disclosure of Invention
The invention provides an image processing method, electronic equipment and a computer-readable storage medium, which aim to solve the problem of low creation speed of cartoons.
An embodiment of the present invention provides an image processing method, including:
extracting key frames from a video to be processed;
selecting a target image area in the key frame, wherein the target image area comprises a face image area;
and carrying out exaggeration processing on the face image area to generate a cartoon image.
Optionally, the extracting key frames from the video to be processed includes:
dividing the video to be processed into at least one shot sequence, wherein the shot sequence comprises at least one frame of image;
performing frame pixel detection on the at least one shot sequence to determine candidate frames;
and extracting the key frame from the candidate frames.
Optionally, segmenting the video to be processed into at least one shot sequence includes:
acquiring a total frame difference between two adjacent frames of images in the video to be processed;
judging whether two adjacent frames of images belong to the same shot sequence or not according to the total frame difference, and obtaining a judgment result;
and dividing the video to be processed into different shot sequences according to the judgment result.
Optionally, the performing frame pixel detection on the at least one shot sequence and determining candidate frames includes:
extracting macro scene description feature vectors of all image frames in a target shot sequence;
and taking the frame with the macro scene description feature vector smaller than a first threshold value as the candidate frame.
Optionally, the extracting the key frame from the candidate frames includes:
calculating a frame difference between two adjacent frames in the candidate frames;
and if the frame difference is larger than a second threshold value, taking the image frame with the later display time in the two adjacent frames as the key frame.
Optionally, the key frame includes an abrupt change frame, and the selecting a target image region in the key frame includes:
extracting key areas in the abrupt change frames, wherein the key areas comprise face image areas;
and filtering the key area to obtain the target image area.
Optionally, the performing an exaggeration process on the face image area to generate a cartoon image includes:
selecting a target part image to be processed in the face image area;
and performing exaggeration processing on the target part image to generate a cartoon image.
Optionally, before the performing the exaggeration process on the target part image, the method further includes:
determining a deformation central point and a deformation radius of the target part image;
and determining an exaggerated deformation range of the target part image according to the deformation central point and the deformation radius.
Optionally, the performing an exaggeration process on the target part image to generate a cartoon image includes:
according to the mapping relation:

Q = P + B(t) · (Q_i − P_i), where t = ‖P − P_i‖ / R_i (clipped to 1)

generating an exaggeration-processed cartoon image;

wherein P is a point of the original image of the face image area, and Q is the corresponding point of the cartoon image after the face image area is subjected to the exaggeration processing; P_i is a feature point of the target part image in the original image, and Q_i is the corresponding feature point of the target part image after the exaggeration processing; R_i is the deformation radius of the target part image; B is the deformation basis function, B(t) = (1 − t)^2, where t is the distance from P to the feature point P_i normalized by the deformation radius R_i.
In accordance with another aspect of the present invention, there is provided an electronic apparatus including: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the image processing method described above when executing the computer program.
According to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the image processing method described above.
According to the embodiment of the invention, the key frame is extracted from the video to be processed, the region containing the face image in the key frame is selected as the target image region, and the face image region is subjected to exaggeration processing to be matched with the story plot, so that manual creation is not needed, the creation efficiency of the cartoon can be effectively improved, and the creation cost of the cartoon can be reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of an image processing method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of an image processing method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating key regions extracted according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an exaggerated treatment of an eye according to an embodiment of the present invention;
FIG. 5 is a schematic view of a mouth exaggeration process according to an embodiment of the present invention;
FIG. 6 is one of the schematic diagrams of a dialog box according to an embodiment of the invention;
FIG. 7 is a second schematic diagram of a dialog box according to an embodiment of the present invention;
FIG. 8 is a third flowchart illustrating an image processing method according to an embodiment of the invention;
FIG. 9 is a schematic diagram showing a configuration of an image processing apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of an implementation structure of the electronic device according to the embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention. In addition, the terms "system" and "network" are often used interchangeably herein.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, from which B can be determined. It should also be understood that determining B from A does not mean determining B only from A; B may also be determined from A and/or other information.
In the embodiment of the present invention, the electronic device may be a mobile phone, or another device capable of sending or receiving wireless signals, including user equipment, a Personal Digital Assistant (PDA), a wireless modem, a wireless communication apparatus, a handheld apparatus, a laptop computer, a cordless phone, a Wireless Local Loop (WLL) station, Customer Premises Equipment (CPE) or a mobile intelligent hotspot capable of converting a mobile signal into a WiFi signal, an intelligent appliance, or another device capable of autonomously communicating with a mobile communication network without human operation.
The embodiment of the invention provides an image processing method, which solves the problem of low creation speed of a cartoon in the prior art.
As shown in fig. 1, the image processing method includes:
Step 101, extracting key frames from a video to be processed.
The video to be processed is the video material that needs to be converted into cartoon images. A video segment typically includes a plurality of shots; a football match video, for example, may include sequences of long shots, medium shots, short shots, close-ups, replay shots, off-scene shots, and the like. A shot sequence is composed of a huge number of frames, and detecting it frame by frame would require processing too much data. Therefore, to increase video processing efficiency and reduce unnecessary data processing, key frames are extracted from the video to be processed; a key frame is a meaningful and representative frame in the video to be processed.
Step 102, selecting a target image area in the key frame, wherein the target image area comprises a face image area.
The target image area is an area needing cartoon conversion. Because a video frame contains a large amount of picture information, the cartoon image only needs to intercept important areas, such as the face area of a person, or key areas like the football and the goal in a football match. The target image region may be extracted using a deep learning algorithm. In the embodiment of the present invention, the target image area is an area containing a face image, so that when the target image area is converted into a cartoon image, the face image area can be subjected to exaggeration processing.
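As an illustration of this step only, the following minimal sketch selects candidate face image areas in a key frame, assuming OpenCV is available and using its bundled Haar cascade as a simple stand-in for the deep learning detector mentioned above; the function name and parameters are assumptions, not part of the disclosure.

import cv2

def select_face_regions(key_frame):
    """Return bounding boxes (x, y, w, h) of face image areas in a key frame.
    A Haar cascade stands in for the deep learning detector described above."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)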
Step 103, performing exaggeration processing on the face image area to generate a cartoon image.
In this embodiment, exaggeration processing refers to exaggerating the features of the face region: key parts such as the eyes and mouth in the face area are exaggeratedly expressed, and corresponding cartoon portraits are drawn to match the story line, for example to express special emotions such as surprise or anger.
According to the embodiment of the invention, the key frame is extracted from the video to be processed, the region containing the face image in the key frame is selected as the target image region, and the face image region is subjected to exaggeration processing to be matched with the story plot, so that manual creation is not needed, the creation efficiency of the cartoon can be effectively improved, and the creation cost of the cartoon can be reduced.
Optionally, as shown in fig. 2, the step 101 includes:
Step 201, dividing the video to be processed into at least one shot sequence, wherein the shot sequence comprises at least one frame of image.
Some videos have a particular structure. A football match video, for example, is composed of specific shot scenes: the scene changes between different shots are not complicated, and the rhythm of the match produces many abrupt shot changes. Accordingly, a shot segmentation method based on pixel comparison may be adopted to segment the video to be processed into at least one shot sequence, including:
acquiring a total frame difference between two adjacent frames of images in the video to be processed; judging whether two adjacent frames of images belong to the same shot sequence or not according to the total frame difference, and obtaining a judgment result; and dividing the video to be processed into different shot sequences according to the judgment result.
Specifically, according to the formula:

E(k, k+1) = (1 / (H × W)) × Σ_{x=1..W} Σ_{y=1..H} | I_k(x, y) − I_{k+1}(x, y) |

the total frame difference between two adjacent frames of images in the video to be processed is calculated, where E(k, k+1) denotes the total frame difference between the k-th frame and the (k+1)-th frame; I_k(x, y) denotes the gray value of the k-th frame at (x, y), and I_{k+1}(x, y) denotes the gray value of the (k+1)-th frame at (x, y); H denotes the frame height, and W denotes the frame width.

The total frame difference is compared with a third threshold: if the total frame difference is greater than the third threshold, the two adjacent frames of images belong to different shot sequences; if the total frame difference is smaller than or equal to the third threshold, the two adjacent frames of images belong to the same shot sequence.

The result of comparing the total frame difference with the third threshold can be expressed by the following formula:

R_n(x, y) = 1 if E(k, k+1) > T, and R_n(x, y) = 0 otherwise,

where R_n(x, y) denotes the judgment result of comparing E(k, k+1) with the third threshold T. According to R_n(x, y), the video to be processed is roughly cut into different shots. Different shots represent different shot languages, so generation of the cartoon language can be assisted based on these; for example, a close-up shot of a player in a football match can assist in generating that player's inner monologue in the cartoon, and so on.
Step 202, performing frame pixel detection on the at least one shot sequence, and determining a candidate frame.
The candidate frames are candidate materials for generating the cartoon, and the key frames need to be extracted from the candidate frames. In particular, a higher threshold may be used for frame pixel detection for the at least one shot sequence.
Further, when the candidate frames are determined, extracting the macro scene description feature vectors of all the image frames in the target shot sequence; and taking the frame with the macro scene description feature vector smaller than a first threshold value as the candidate frame.
In this embodiment, after the video to be processed is divided into at least one shot sequence, when a certain shot sequence among them is being processed, that shot sequence is the target shot sequence. Frame pixel detection needs to be performed separately for every shot sequence. For example, the video to be processed is segmented into two shot sequences A = {f_1, f_2, …, f_n} and B = {g_1, g_2, …, g_n}. When determining candidate frames, frame pixel detection is performed on shot sequence A and shot sequence B respectively: the macro scene description feature vectors of all image frames in shot sequence A and shot sequence B are extracted, and all frames smaller than the first threshold are marked as candidate frames, such as f_i, 0 < i < n.
Step 203, extracting the key frame from the candidate frames.
The key frames are meaningful and representative frames in the video to be processed; in key frame extraction, a plurality of frames are selected from different shot materials as the material for subsequent cartoon generation. The key frames may include abrupt change frames and gradual change frames; in the embodiment of the present invention, the key frames are the abrupt change frames. Taking a football match as an example, the match video is composed of a series of shot sequences: changes between shots are abrupt changes, while changes within a shot are gradual changes. A match contains many uneventful moments, which appear in the video as gradual change frames, whereas the highlight moments generally appear in abrupt change frames, and it is the abrupt change frames that are needed as material for generating the cartoon.
Specifically, when the key frame is extracted from the candidate frames, calculating a frame difference between two adjacent frames in the candidate frames; and if the frame difference is larger than a second threshold value, taking the image frame with the later display time in the two adjacent frames as the key frame.
It should be noted that the second threshold is a lower threshold; the second threshold is smaller than the first threshold. Taking the key frame as an abrupt change frame as an example, when the candidate frames are detected using the second threshold, every two adjacent frames among the candidate frames are compared and the frame difference is calculated. If the frame difference is greater than the second threshold, the frame ending the comparison, i.e., the image frame with the later display time, is taken as an abrupt change frame. If the frame difference is smaller than or equal to the second threshold but the accumulated frame difference is greater than the second threshold, the last frame at which the accumulated frame difference exceeds the second threshold is taken as an end frame, and all frames from the first compared frame to the end frame are gradual change frames. All abrupt change frames among the candidate frames are extracted by this method.
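The double-threshold selection can be sketched as follows, assuming the candidate frames have already been extracted as (index, grayscale image) pairs; the function name extract_key_frames and the accumulation logic follow the rule described above and are illustrative, not the patented implementation.

import numpy as np

def extract_key_frames(candidate_frames, t2):
    """candidate_frames: list of (frame_index, grayscale image) pairs.
    Returns the indices of abrupt change frames (the key frames)."""
    key_frames = []
    accumulated = 0.0
    for (_, prev), (idx, cur) in zip(candidate_frames, candidate_frames[1:]):
        diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32)).mean()
        if diff > t2:
            key_frames.append(idx)   # keep the later of the two adjacent frames
            accumulated = 0.0
        else:
            accumulated += diff      # gradual change: accumulate small differences
            if accumulated > t2:
                accumulated = 0.0    # frames up to here form a gradual transition
    return key_frames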
In this embodiment, the key frames are extracted from the video to be processed using a double-threshold method that distinguishes abrupt change frames from gradual change frames. The abrupt change frames contain the highlight moments in the video, so using them as the key frames for generating cartoon images allows the exciting and meaningful parts of a video clip to be converted into cartoon images, effectively improving cartoon creation efficiency.
Optionally, the key frames comprise abrupt change frames, and the step 102 comprises:
extracting key areas in the abrupt change frames, wherein the key areas comprise face image areas; and filtering the key areas to obtain the target image area.
A key region is extracted from the abrupt change frame: pixel points whose pixel values are greater than or equal to a fourth threshold in the abrupt change frame form the key region, and the key region comprises a face image region. The key region is then filtered to obtain the target image region; the filtered target image region is cartoon image material comprising a face image region.
Since the image frame contains a large amount of picture information, the cartoon only needs to intercept important areas, such as the face image area shown by the frame selection area 31 in Fig. 3. The key area can be extracted using a deep learning algorithm. Taking a football match video as an example, the players' faces, the football, the goal and the like are generally defined as key areas in the cartoon, i.e., the key areas are framed and selected using a deep learning algorithm. The key area is larger than the target image area; therefore, the key area is extracted first, and then the target image area is extracted from the key area.
For the extraction of the key area, an artificial intelligence method can be adopted, in which the extraction result is obtained by learning from labeled samples. If a plurality of key areas exist in one frame of image, all the key areas are selected. Specifically, the method includes the following steps (a code sketch follows the list):
(1) setting a fourth threshold value M, wherein in a mutation frame comprising a face image region, all pixel points with pixel values larger than or equal to the fourth threshold value M are regarded as key regions, and the key regions comprise the face image region;
(2) filtering the key area in the abrupt change frame with a filter of a first size; the first size can be a smaller size, such as N × N (N is a step value, e.g., 3), and the filter slides over the key area from left to right and from top to bottom with step size N; the filtered image area is the first image area;
(3) filtering the first image area with a filter of a second size to obtain the target image area; the second size is a larger size, specifically larger than the first size. Filtering the first image area with the second-size filter further narrows the image range, yielding the target image area;
(4) when the key frames comprise a plurality of abrupt change frames, traversing all the abrupt change frames and determining the target image areas and their size ranges in all the abrupt change frames.
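The code sketch referenced above, assuming NumPy and OpenCV; the fourth threshold m and the filter sizes n and n2 are illustrative assumptions, and normalized box filters stand in for the stepped sliding filter described in steps (2) and (3).

import cv2
import numpy as np

def extract_target_region(frame_gray, m=200, n=3, n2=9):
    """Step (1): pixels with value >= m form the key-region mask.
    Steps (2) and (3): a first, smaller n x n filter and then a second,
    larger n2 x n2 filter narrow the mask down to the target image area.
    Step (4) would repeat this over every abrupt change frame."""
    mask = (frame_gray >= m).astype(np.float32)
    first = cv2.boxFilter(mask, -1, (n, n))      # first image area
    second = cv2.boxFilter(first, -1, (n2, n2))  # narrows to the target area
    ys, xs = np.where(second > 0.5)
    if xs.size == 0:
        return None
    # bounding box of the surviving region = target image area and its size range
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())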
Optionally, the step 103 includes: selecting a target part image to be processed in the face image area; and performing exaggeration processing on the target part image to generate a cartoon image.
In this embodiment, the target portion image may be an important portion such as an eye and a mouth. Taking a football game as an example, the comics related to the football game generally exaggerate the eyes or the mouth of a player and draw corresponding comic portraits to match with story lines so as to express some special emotions such as surprise, anger and the like.
Specifically, before the target part image is subjected to the exaggeration processing, a deformation center point and a deformation radius of the target part image may first be determined; the exaggerated deformation range of the target part image is then determined from the deformation center point and the deformation radius. For example, to exaggerate the eyes, a circle is drawn with the eyeball as the center, and the image within the circle is exaggeratedly deformed, with the circle's radius as the exaggeration range.
Specifically, when the target part image is subjected to the exaggeration processing to generate the cartoon image, according to the mapping relation:
Q = P + B(t) · (Q_i − P_i), where t = ‖P − P_i‖ / R_i (clipped to 1)

generating an exaggeration-processed cartoon image;

wherein P is a point of the original image of the face image area, and Q is the corresponding point of the cartoon image after the face image area is subjected to the exaggeration processing; P_i is a feature point of the target part image in the original image, and Q_i is the corresponding feature point of the target part image after the exaggeration processing; R_i is the deformation radius of the target part image; B is the deformation basis function, B(t) = (1 − t)^2, where t is the distance from P to the feature point P_i normalized by the deformation radius R_i.
As shown in Fig. 4, taking the exaggeration of the eyes as an example, the above formula can be regarded as mapping the player's face image P to the exaggerated face image Q, where P_i is an eyeball-related feature point in the original image, Q_i is the corresponding feature point after the exaggerated deformation, and R_i is the exaggeration radius: a circle is drawn with the eyeball as the center and the distance from the eyeball to the canthus as the radius. B is the deformation basis function, used to dynamically adjust the deformation range.
When P = P_i, t = 0 and B reaches its maximum value 1; as P moves away from P_i, the value of the deformation basis function gradually decreases; when ‖P − P_i‖ > R_i, t = 1 and B takes the value 0.
It should be noted that, to realize a smooth transition in the transformed image, only the image within the deformation range is exaggeratedly deformed based on the above formula; through the deformation basis function, the closer a pixel is to the deformation center point, the larger its deformation, and the farther it is from the deformation center point, the smaller its deformation.
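A minimal sketch of this local warp, assuming OpenCV and NumPy; it applies the reconstructed mapping Q = P + B(t)(Q_i − P_i) by backward mapping with cv2.remap, which is one common way to realize such a deformation and is an assumption rather than the patented implementation.

import cv2
import numpy as np

def exaggerate_part(img, p_i, q_i, r_i):
    """Warp the image inside the circle of radius r_i around feature point
    p_i so that p_i is displaced toward q_i, attenuated by the deformation
    basis function B(t) = (1 - t)^2 with t = ||P - p_i|| / r_i (t clipped
    to 1, so pixels outside the deformation range stay unchanged)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    dist = np.sqrt((xs - p_i[0]) ** 2 + (ys - p_i[1]) ** 2)
    t = np.clip(dist / r_i, 0.0, 1.0)
    b = (1.0 - t) ** 2                    # deformation basis function
    dx, dy = q_i[0] - p_i[0], q_i[1] - p_i[1]
    # backward mapping: each output pixel samples its pre-image, which
    # approximates the forward mapping Q = P + B(t)(Q_i - P_i)
    map_x = xs - b * dx
    map_y = ys - b * dy
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

For example, exaggerating an eye would pass the eyeball center as p_i, the displaced eyeball position as q_i, and the eyeball-to-canthus distance as r_i.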
Optionally, as shown in Fig. 5, the mouth may also be exaggeratedly deformed. Similarly, exaggerating the mouth is regarded as an image mapping in which the player's face image P becomes the face image Q after exaggeration: with the center point between the two mouth corners as the circle center and the distance from that center to a mouth corner as the radius, the image within the circular range is exaggeratedly deformed. The mapping relation and principle of the exaggerated deformation are similar to the steps of the eye exaggeration and are not repeated here.
Optionally, after the exaggeration processing is performed on the face image region, the method further includes: adding a cartoon dialog box to the exaggeration-processed cartoon image. According to the set cartoon script, an elliptical dialog box or a radial dialog box can be selected to add text content to the cartoon image. Taking the conversion of a football match video into a cartoon image as an example, an elliptical dialog box can be selected for character dialogue and placed at the corner of the character's mouth, while a radial dialog box can be selected for close-up shots of shooting and goals, likewise placed at the corner of the character's mouth. The text in the dialog box is supplemented according to the cartoon script content; the forms of the dialog boxes are shown in Figs. 6 and 7.
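A sketch of adding an elliptical dialog box with OpenCV drawing primitives; the placement, sizes and styling are illustrative assumptions, and a real layout would follow the cartoon script.

import cv2

def add_elliptical_dialog(img, center, axes, text):
    """Draw a white elliptical dialog box with a black border near a
    character's mouth corner and put the script text inside it."""
    cv2.ellipse(img, center, axes, 0, 0, 360, (255, 255, 255), -1)  # fill
    cv2.ellipse(img, center, axes, 0, 0, 360, (0, 0, 0), 2)         # border
    cv2.putText(img, text, (center[0] - axes[0] + 10, center[1]),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
    return img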
Specifically, as shown in Fig. 8, the image processing method according to the embodiment of the present invention includes: performing shot segmentation on the video to be processed, extracting key frames, selecting a target image area from the key frames, performing exaggeration processing on the face image area in the target image area, adding a cartoon dialog box and supplementary content, and generating the final cartoon image. It should be noted that if a key frame does not contain a face image area, or contains other important areas besides the face image area, the key frame or those other important areas are processed with an image stylization algorithm and converted into a cartoon style to form the cartoon image.
As shown in fig. 9, an embodiment of the present invention also provides an image processing apparatus including:
an extracting module 910, configured to extract a key frame from a video to be processed;
a selecting module 920, configured to select a target image region in the key frame, where the target image region includes a face image region;
and the processing module 930 is configured to perform an exaggeration process on the face image region to generate a cartoon image.
Optionally, the extracting module 910 includes:
the shot segmentation unit is used for segmenting the video to be processed into at least one shot sequence, and the shot sequence comprises at least one frame of image;
a determining unit, configured to perform frame pixel detection on the at least one shot sequence, and determine a candidate frame;
a first extraction unit, configured to extract the key frame from the candidate frames.
Optionally, the lens division unit includes:
the acquiring subunit is used for acquiring the total frame difference between two adjacent frames of images in the video to be processed;
the judging subunit is used for judging whether the two adjacent frames of images belong to the same shot sequence or not according to the total frame difference and obtaining a judgment result;
and the segmentation subunit is used for segmenting the video to be processed into different shot sequences according to the judgment result.
Optionally, the determining unit is specifically configured to:
extracting macro scene description feature vectors of all image frames in a target shot sequence;
and taking the frame with the macro scene description feature vector smaller than a first threshold value as the candidate frame.
Optionally, the first extraction unit is specifically configured to:
calculating a frame difference between two adjacent frames in the candidate frames;
and if the frame difference is larger than a second threshold value, taking the image frame with the later display time in the two adjacent frames as the key frame.
Optionally, the key frames include abrupt change frames, and the selecting module 920 includes:
the second extraction unit is used for extracting a key area in the abrupt change frame, wherein the key area comprises a face image area;
and the filtering unit is used for filtering the key area to obtain the target image area.
Optionally, the processing module 930 includes:
the selection unit is used for selecting a target part image to be processed in the face image area;
and the processing unit is used for carrying out exaggeration processing on the target part image to generate a cartoon image.
Optionally, the apparatus further comprises:
the first determining module is used for determining a deformation central point and a deformation radius of the target part image;
and the second determining module is used for determining the exaggerated deformation range of the target part image according to the deformation central point and the deformation radius.
Optionally, the processing unit is specifically configured to:
according to the mapping relation:
Q = P + B(t) · (Q_i − P_i), where t = ‖P − P_i‖ / R_i (clipped to 1)

generating an exaggeration-processed cartoon image;

wherein P is a point of the original image of the face image region, and Q is the corresponding point of the cartoon image after the face image region is subjected to the exaggeration processing; P_i is a feature point of the target part image in the original image, and Q_i is the corresponding feature point of the target part image after the exaggeration processing; R_i is the deformation radius of the target part image; B is the deformation basis function, B(t) = (1 − t)^2, where t is the distance from P to the feature point P_i normalized by the deformation radius R_i.
According to the embodiment of the invention, the key frame is extracted from the video to be processed, the region containing the face image in the key frame is selected as the target image region, and the face image region is subjected to exaggeration processing to be matched with the story plot, so that manual creation is not needed, the creation efficiency of the cartoon can be effectively improved, and the creation cost of the cartoon can be reduced.
As shown in fig. 10, an embodiment of the present invention further provides an electronic device, which includes a processor 110, a memory 120, and a computer program stored on the memory 120 and executable on the processor 110, where the processor 110 implements the steps of the image processing method when executing the computer program. Specifically, the processor 110 is configured to extract a key frame from the video to be processed;
selecting a target image area in the key frame, wherein the target image area comprises a face image area;
and carrying out exaggeration processing on the face image area to generate a cartoon image.
Optionally, when the processor 110 extracts a key frame from the video to be processed, the following steps are implemented:
dividing the video to be processed into at least one shot sequence, wherein the shot sequence comprises at least one frame of image;
performing frame pixel detection on the at least one shot sequence to determine candidate frames;
and extracting the key frame from the candidate frames.
Optionally, when the processor 110 segments the video to be processed into at least one shot sequence, the following steps are implemented:
acquiring a total frame difference between two adjacent frames of images in the video to be processed;
judging whether two adjacent frames of images belong to the same shot sequence or not according to the total frame difference, and obtaining a judgment result;
and dividing the video to be processed into different shot sequences according to the judgment result.
Optionally, the processor 110 performs frame pixel detection on the at least one shot sequence, and when determining a candidate frame, implements the following steps:
extracting macro scene description feature vectors of all image frames in a target shot sequence;
and taking the frame with the macro scene description feature vector smaller than a first threshold value as the candidate frame.
Optionally, when the processor 110 extracts the key frame from the candidate frames, the following steps are implemented:
calculating a frame difference between two adjacent frames in the candidate frames;
and if the frame difference is larger than a second threshold value, taking the image frame with the later display time in the two adjacent frames as the key frame.
Optionally, the key frame includes an abrupt change frame, and the processor 110 implements the following steps when selecting the target image region in the key frame:
extracting key areas in the abrupt change frames, wherein the key areas comprise face image areas;
and filtering the key area to obtain the target image area.
Optionally, the processor 110 performs an exaggeration process on the face image area to generate a cartoon image, and implements the following steps:
selecting a target part image to be processed in the face image area;
and performing exaggeration processing on the target part image to generate a cartoon image.
Optionally, the processor 110 is further configured to:
determining a deformation central point and a deformation radius of the target part image;
and determining an exaggerated deformation range of the target part image according to the deformation central point and the deformation radius.
Optionally, when the processor 110 performs an exaggeration process on the target part image to generate a cartoon image, the following steps are implemented:
according to the mapping relation:
Q = P + B(t) · (Q_i − P_i), where t = ‖P − P_i‖ / R_i (clipped to 1)

generating an exaggeration-processed cartoon image;

wherein P is a point of the original image of the face image area, and Q is the corresponding point of the cartoon image after the face image area is subjected to the exaggeration processing; P_i is a feature point of the target part image in the original image, and Q_i is the corresponding feature point of the target part image after the exaggeration processing; R_i is the deformation radius of the target part image; B is the deformation basis function, B(t) = (1 − t)^2, where t is the distance from P to the feature point P_i normalized by the deformation radius R_i.
The bus architecture may include any number of interconnected buses and bridges, linking together various circuits including one or more processors represented by the processor 110 and memory represented by the memory 120. The bus architecture may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface. The processor 110 is responsible for managing the bus architecture and general processing, and the memory 120 may store data used by the processor 110 in performing operations.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments may be performed by hardware, or by associated hardware instructed by a computer program that includes instructions for performing some or all of the steps of the above methods; the computer program may be stored in a readable storage medium, which may be any form of storage medium.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the image processing method described above. And the same technical effect can be achieved, and in order to avoid repetition, the description is omitted.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the transceiving method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the preferred embodiments of the present invention have been described, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims (11)

1. An image processing method, comprising:
extracting key frames from a video to be processed;
selecting a target image area in the key frame, wherein the target image area comprises a face image area;
and carrying out exaggeration processing on the face image area to generate a cartoon image.
2. The image processing method according to claim 1, wherein said extracting key frames from the video to be processed comprises:
dividing the video to be processed into at least one shot sequence, wherein the shot sequence comprises at least one frame of image;
performing frame pixel detection on the at least one shot sequence to determine candidate frames;
and extracting the key frame from the candidate frames.
3. The image processing method according to claim 2, wherein the segmenting the video to be processed into at least one shot sequence comprises:
acquiring a total frame difference between two adjacent frames of images in the video to be processed;
judging whether two adjacent frames of images belong to the same shot sequence or not according to the total frame difference, and obtaining a judgment result;
and dividing the video to be processed into different shot sequences according to the judgment result.
4. The image processing method according to claim 2, wherein the performing frame pixel detection on the at least one shot sequence to determine candidate frames comprises:
extracting macro scene description feature vectors of all image frames in a target shot sequence;
and taking the frame with the macro scene description feature vector smaller than a first threshold value as the candidate frame.
5. The method according to claim 2, wherein said extracting the key frame from the candidate frames comprises:
calculating a frame difference between two adjacent frames in the candidate frames;
and if the frame difference is larger than a second threshold value, taking the image frame with the later display time in the two adjacent frames as the key frame.
6. The image processing method according to claim 1, wherein the key frame comprises an abrupt change frame, and the selecting a target image region in the key frame comprises:
extracting key areas in the abrupt change frames, wherein the key areas comprise face image areas;
and filtering the key area to obtain the target image area.
7. The image processing method according to claim 1, wherein the generating a cartoon image by performing an exaggeration process on the face image region includes:
selecting a target part image to be processed in the face image area;
and performing exaggeration processing on the target part image to generate a cartoon image.
8. The image processing method according to claim 7, wherein before the subjecting the target part image to the exaggeration processing, the method further comprises:
determining a deformation central point and a deformation radius of the target part image;
and determining an exaggerated deformation range of the target part image according to the deformation central point and the deformation radius.
9. The image processing method according to claim 7, wherein the generating a comic image by performing an exaggeration process on the target portion image includes:
according to the mapping relation:
Q = P + B(t) · (Q_i − P_i), where t = ‖P − P_i‖ / R_i (clipped to 1)

generating an exaggeration-processed cartoon image;

wherein P is a point of the original image of the face image area, and Q is the corresponding point of the cartoon image after the face image area is subjected to the exaggeration processing; P_i is a feature point of the target part image in the original image, and Q_i is the corresponding feature point of the target part image after the exaggeration processing; R_i is the deformation radius of the target part image; B is the deformation basis function, B(t) = (1 − t)^2, where t is the distance from P to the feature point P_i normalized by the deformation radius R_i.
10. An electronic device, comprising: processor, memory and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the image processing method according to any of claims 1 to 9 when executing the computer program.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1-9.
CN202010232138.XA 2020-03-27 2020-03-27 Image processing method, electronic equipment and computer readable storage medium Pending CN111461962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010232138.XA CN111461962A (en) 2020-03-27 2020-03-27 Image processing method, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN111461962A (en) 2020-07-28

Family

ID=71678313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010232138.XA Pending CN111461962A (en) 2020-03-27 2020-03-27 Image processing method, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111461962A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030163315A1 (en) * 2002-02-25 2003-08-28 Koninklijke Philips Electronics N.V. Method and system for generating caricaturized talking heads
CN104200505A (en) * 2014-08-27 2014-12-10 西安理工大学 Cartoon-type animation generation method for human face video image
CN104867161A (en) * 2015-05-14 2015-08-26 国家电网公司 Video-processing method and device
CN105049875A (en) * 2015-07-24 2015-11-11 上海上大海润信息系统有限公司 Accurate key frame extraction method based on mixed features and sudden change detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Ying, "Research and Implementation of Video-based Cartoon Face Animation", China Masters' Theses Full-text Database *

Similar Documents

Publication Publication Date Title
CN111385644A (en) Video processing method, electronic equipment and computer readable storage medium
US10733421B2 (en) Method for processing video, electronic device and storage medium
Märki et al. Bilateral space video segmentation
CN112232425B (en) Image processing method, device, storage medium and electronic equipment
CN108520223B (en) Video image segmentation method, segmentation device, storage medium and terminal equipment
US8175376B2 (en) Framework for image thumbnailing based on visual similarity
Ye et al. Co-saliency detection via co-salient object discovery and recovery
US8244044B2 (en) Feature selection and extraction
Liu et al. Interactive image segmentation based on level sets of probabilities
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
US8879835B2 (en) Fast adaptive edge-aware matting
US10249029B2 (en) Reconstruction of missing regions of images
Meng et al. Weakly supervised semantic segmentation by a class-level multiple group cosegmentation and foreground fusion strategy
GB2523330A (en) Method, apparatus and computer program product for segmentation of objects in media content
US20210158593A1 (en) Pose selection and animation of characters using video data and training techniques
TW202105327A (en) Image processing method, processor, electronic equipment and computer readable storage medium thereof
Le et al. Object removal from complex videos using a few annotations
Zhang et al. Detecting and removing visual distractors for video aesthetic enhancement
CN116363261A (en) Training method of image editing model, image editing method and device
WO2023024653A1 (en) Image processing method, image processing apparatus, electronic device and storage medium
Zhao et al. Cartoon image processing: a survey
Baghel et al. Image conditioned keyframe-based video summarization using object detection
CN108961314B (en) Moving image generation method, moving image generation device, electronic device, and computer-readable storage medium
Tao et al. Video decolorization using visual proximity coherence optimization
Xiao et al. Interactive deep colorization and its application for image compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200728