CN111277893B - Video processing method and device, readable medium and electronic equipment - Google Patents

Info

Publication number
CN111277893B
Authority
CN
China
Prior art keywords
target
image frame
face region
video
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010089115.8A
Other languages
Chinese (zh)
Other versions
CN111277893A (en
Inventor
王兢业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010089115.8A priority Critical patent/CN111277893B/en
Publication of CN111277893A publication Critical patent/CN111277893A/en
Application granted granted Critical
Publication of CN111277893B publication Critical patent/CN111277893B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The disclosure relates to a video processing method, a video processing device, a readable medium and an electronic device. The method comprises: detecting a face region in a first image frame of an original video; enlarging and rotating the face region to obtain a second image frame; and outputting the second image frame for video rendering. With this technical solution, enlarging and rotating the face region in the first image frame of the original video yields a corresponding second image frame, which is output for video rendering. During rendering, consecutive image frames form a dynamic video effect in which the user sees the face enlarged and the head shaking. This increases the variety of effects that the video can present and meets the user's needs.

Description

Video processing method and device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of video technologies, and in particular, to a video processing method, an apparatus, a readable medium, and an electronic device.
Background
With the growing popularity of terminal devices, terminals such as mobile phones and tablet computers have become an indispensable part of people's daily life and work, and people can take photos and videos with them. With the continuous development of computer technology, terminals can implement an increasingly rich set of functions, and effects can be added to videos shot by users to attract them. In the related art, however, video is processed in a single way and the resulting effect is monotonous, which cannot meet users' needs.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a video processing method, the method comprising:
detecting a face region in a first image frame of an original video;
enlarging and rotating the face region to obtain a second image frame;
outputting the second image frame for video rendering.
In a second aspect, the present disclosure provides a video processing apparatus, the apparatus comprising:
a first detection module for detecting a face region in a first image frame of an original video;
an execution module for enlarging and rotating the face region to obtain a second image frame;
a first output module for outputting the second image frame for video rendering.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of the first aspect of the present disclosure.
With this technical solution, enlarging and rotating the face region in the first image frame of the original video yields a corresponding second image frame, which is output for video rendering. During rendering, consecutive image frames form a dynamic video effect in which the user sees the face enlarged and the head shaking. This increases the variety of effects that the video can present and meets the user's needs.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
In the drawings:
Fig. 1 is a flowchart illustrating a video processing method according to an exemplary embodiment.
Fig. 2a is a schematic diagram illustrating a first image frame according to an exemplary embodiment.
Fig. 2b is a schematic diagram of a second image frame obtained after the face region in Fig. 2a is enlarged and rotated.
Fig. 3 is a flowchart illustrating a method of enlarging a face region according to an exemplary embodiment.
Fig. 4a is a schematic diagram illustrating a first image frame according to an exemplary embodiment.
Fig. 4b is a schematic diagram of the face region detected in Fig. 4a.
Fig. 4c is a schematic diagram of the corrected face region.
Fig. 5 is a block diagram illustrating a video processing apparatus according to an example embodiment.
Fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart illustrating a video processing method according to an exemplary embodiment, which may be applied to a terminal, such as a smart phone, a tablet computer, a Personal Computer (PC), a notebook computer, and the like. As shown in fig. 1, the method may include S101-S103.
In S101, a face region in a first image frame of an original video is detected.
For example, the original video may be a real-time video captured by a camera on the terminal, and the first image frame may refer to an image frame currently captured by the camera. The original video may also be a video pre-stored in the terminal, and the first image frame may be any image frame in the original video, for example, an image frame to be displayed during playing of the original video. Fig. 2a is a schematic diagram of an exemplary first image frame. The manner of detecting the face region from the first image frame may refer to the related art. The number of faces appearing in the first image frame may be one or more, and accordingly, the number of face regions detected in this step may be one or more. It should be noted that fig. 2a only shows an example of a face region, but does not limit the embodiments of the present disclosure.
In S102, the face region is enlarged and rotated, resulting in a second image frame.
In the present disclosure, enlarging and rotating the face region in the first image frame yields another image, which is the second image frame. Fig. 2b is a schematic diagram of a second image frame obtained by enlarging and rotating the face region in Fig. 2a. In this step, the detected face region may be enlarged and then rotated, rotated and then enlarged, or enlarged and rotated simultaneously; the present disclosure does not limit the order.
If a plurality of face regions are detected in S101, some face regions in the plurality of face regions may be enlarged and rotated in this step, or all the face regions may be enlarged and rotated at the same time.
In S103, a second image frame is output for video rendering.
In this step, the second image frame obtained by enlarging and rotating the face region in the first image frame is output for video rendering. During rendering, consecutive image frames form a dynamic video effect: by controlling the rotation direction and rotation angle of the face region in each image frame, the user sees the face enlarged and the head shaking.
For example, taking a real-time video shot by a camera as an original video as an example, processing of a first image frame shot by the camera is performed in real time, so that a second image frame can be output in real time on a shooting interface for video rendering, and a user can see video effects of face enlargement and head shaking on the shooting interface. For example, if the original video is a video pre-stored in the terminal, the processing of the first image frame in the original video is performed dynamically, and therefore, the second image frame can be output on the playing interface of the original video for video rendering, so that the user can see the video effect of enlarging the face and shaking the head on the playing interface.
With this technical solution, enlarging and rotating the face region in the first image frame of the original video yields a corresponding second image frame, which is output for video rendering. During rendering, consecutive image frames form a dynamic video effect in which the user sees the face enlarged and the head shaking. This increases the variety of effects that the video can present and meets the user's needs.
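Steps S101 to S103 can be read as a per-frame loop. The following is a minimal sketch of that control flow only, not the implementation of the disclosure: the callables `detect_faces`, `enlarge_and_rotate`, and `render` are hypothetical placeholders supplied by the caller, and frames are treated as opaque objects.

```python
from typing import Any, Callable, Iterable, List


def process_video(
    frames: Iterable[Any],
    detect_faces: Callable[[Any], List[Any]],
    enlarge_and_rotate: Callable[[Any, Any, int], Any],
    render: Callable[[Any], None],
) -> None:
    """Run S101 to S103 on every frame of the original video."""
    for index, first_frame in enumerate(frames):
        # S101: detect zero or more face regions in the first image frame.
        face_regions = detect_faces(first_frame)
        # S102: enlarge and rotate each detected region to build the second frame.
        second_frame = first_frame
        for region in face_regions:
            second_frame = enlarge_and_rotate(second_frame, region, index)
        # S103: output the second image frame for video rendering.
        render(second_frame)
```

When no face region is detected, the loop body in S102 does not run and the first image frame is rendered unchanged.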
In the present disclosure, before the face area is enlarged and rotated, the target trigger event may be detected first, and in response to detecting the target trigger event, the step of enlarging and rotating the face area may be performed. Wherein the target trigger event may include a target expression and/or a target action. That is, in the present disclosure, the target trigger event serves as a prerequisite for turning on the operation of enlarging and rotating the face region.
In one embodiment, the target trigger event may include a target expression. The target expression is, for example, a smile. Whether a person in the first image frame is smiling may be detected by smile recognition in the related art; if so, the target trigger event is considered detected, and the face region is then enlarged and rotated. If there are multiple face regions, only the face region of the person who is smiling may be enlarged and rotated.
In another embodiment, the target trigger event may include a target action. The target action may be a preset gesture made by a person in the image, or an operation performed by the terminal user on the terminal, such as clicking the screen or shaking the terminal. For example, image recognition may be performed on the first image frame to detect whether a person in it makes a preset gesture, such as a V-shaped gesture; if so, the target trigger event is considered detected, and the face region is then enlarged and rotated. If multiple face regions are detected, only the face region of the person who made the preset gesture may be enlarged and rotated, for example. Operations performed on the terminal by the user can be detected by sensors integrated in the terminal. If an operation such as clicking the screen or shaking the terminal is detected, the target trigger event is considered detected, and the face region is then enlarged and rotated. For example, if multiple face regions are detected and a target action of the user clicking the screen or shaking the terminal is detected, all of the face regions may be enlarged and rotated.
In yet another embodiment, the target trigger event may include a target expression and a target action. For example, a target trigger event may be considered to be generated when one of a target expression or a target action is detected. For example, it is detected that a person in the first image frame makes a smiling expression, or it is detected that the user clicks the screen, and the process of enlarging and rotating the face area may be performed.
Through the technical scheme, on one hand, the operation of amplifying and rotating the face area can be automatically started after the target trigger event is detected, namely, the video effects of amplifying the face and shaking the head can be automatically presented after the target trigger event is detected. On the other hand, the target trigger event can be a target expression and/or a target action made by the user, so that the video effects of amplifying the human face and shaking the head are presented after the target trigger event is detected, the interactivity with the user can be enhanced, and the user experience is improved.
In addition, if no target triggering event is detected, for example, no target expression and target action are detected, the human face area in the first image frame may not be enlarged and rotated, and the first image frame is directly output for video rendering.
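The gating described above can be sketched as a simple predicate. This is an illustration under assumptions: the `FrameEvents` record and its field names are hypothetical, and how each flag is obtained (smile recognition, gesture recognition, terminal sensors) is left to the related art.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FrameEvents:
    """Hypothetical per-frame detection results."""
    smiling: bool = False         # target expression seen in the frame
    gesture: bool = False         # preset gesture (e.g. V sign) seen in the frame
    screen_clicked: bool = False  # terminal operation reported by the terminal
    shaken: bool = False          # terminal operation reported by sensors


def target_trigger_detected(events: FrameEvents) -> bool:
    """A trigger is detected when a target expression or a target action occurs."""
    target_expression = events.smiling
    target_action = events.gesture or events.screen_clicked or events.shaken
    return target_expression or target_action


def select_output(first_frame: Any, events: FrameEvents,
                  transform: Callable[[Any], Any]) -> Any:
    """Apply the enlarge-and-rotate transform only when a trigger is detected."""
    if target_trigger_detected(events):
        return transform(first_frame)
    # No trigger: output the first image frame directly.
    return first_frame
```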
Fig. 3 is a flowchart illustrating a method of enlarging a face region according to an exemplary embodiment. As shown in Fig. 3, the enlarging method may include S301 to S303.
In S301, the inclination angle of the face in the first image frame with respect to the first target direction is determined.
The first target direction may be the horizontal axis direction or the vertical axis direction of the terminal. If the terminal is currently in portrait orientation, the width direction of the terminal may be taken as the horizontal axis direction and the length direction as the vertical axis direction; if the terminal is currently in landscape orientation, the length direction of the terminal may be taken as the horizontal axis direction and the width direction as the vertical axis direction. For example, the horizontal axis direction may be regarded as the x axis, and the vertical axis direction may be regarded as the y axis.
Fig. 4a shows a schematic view of a first image frame. As shown in Fig. 4a, the user's head may be tilted while the video is being shot. If the x axis is taken as the first target direction, the inclination angle of the face relative to the x axis is α; if the y axis is taken as the first target direction, the inclination angle of the face relative to the y axis is θ.
In S302, when the inclination angle satisfies a preset condition, the face region is subjected to position correction according to the inclination angle, so as to obtain a corrected face region.
In this step, the inclination angle satisfying the preset condition may indicate that the inclination angle of the face with respect to the y-axis is small. If the x axis is taken as the first target direction, the preset condition may be that the inclination angle is greater than a preset first angle threshold; if the y-axis is the first target direction, the preset condition may be that the tilt angle is smaller than a preset second angle threshold. The first angle threshold and the second angle threshold may be pre-calibrated.
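The preset condition can be written as a small check. This is a sketch only: the threshold values below are hypothetical defaults (the text says only that both thresholds may be pre-calibrated), and angles are assumed to be in degrees.

```python
def tilt_satisfies_condition(
    angle_deg: float,
    axis: str,
    first_threshold_deg: float = 60.0,   # hypothetical calibration for the x axis
    second_threshold_deg: float = 30.0,  # hypothetical calibration for the y axis
) -> bool:
    """Check the preset condition for the chosen first target direction."""
    if axis == "x":
        # Angle measured against the x axis must exceed the first threshold.
        return angle_deg > first_threshold_deg
    if axis == "y":
        # Angle measured against the y axis must stay below the second threshold.
        return angle_deg < second_threshold_deg
    raise ValueError("axis must be 'x' or 'y'")
```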
It should be noted that, as described above, the user's head may be tilted while shooting a video, whereas the face region detected by face segmentation in S101 is upright, as shown in Fig. 4b. The position of the detected face region therefore needs to be corrected so that the corrected face region matches the actual tilt angle of the user's face during shooting.
In an alternative embodiment, as shown in Fig. 4b, the detected face region is located and represented by the position information of the four vertices of a quadrilateral frame. For example, the upper left corner position information of the quadrilateral frame is denoted as T0, the lower right corner position information as T1, the lower left corner position information as T2, and the upper right corner position information as T3. T0, T1, T2, and T3 may be represented by two-dimensional coordinates and may be determined according to the resolution of the quadrilateral frame. For example, if the resolution of the quadrilateral frame is 128 × 128, T0 is (0.0, 0.0), T1 is (128.0, 128.0), T2 is (0.0, 128.0), and T3 is (128.0, 0.0).
For example, with the y-axis as the first target direction, the position of the face region is corrected according to the inclination angle θ of the face with respect to the y-axis, and the position correction can be performed by the following formula:
[Equation images BDA0002383118760000081 to BDA0002383118760000084: position-correction formulas for P0, P1, P2 and P3; not available as text]
As shown in Fig. 4c, P0 represents the position information of the upper left corner of the corrected face region, P1 represents the position information of the lower right corner, P2 represents the position information of the lower left corner, and P3 represents the position information of the upper right corner. In this way, the corrected face region is consistent with the actual inclination angle of the user's face when the video is shot.
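The correction formulas themselves appear only as figure references in this text, so they cannot be reproduced here. As an illustration only, one plausible reading is that each vertex T0 to T3 of the upright frame is rotated by θ about the frame center to give P0 to P3. The sketch below implements that assumption; the function names, the choice of pivot, and the sign convention are not taken from the source.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate_point(p: Point, center: Point, angle_rad: float) -> Point:
    """Rotate p about center; with y pointing down, a positive angle looks clockwise."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return (center[0] + cos_a * dx - sin_a * dy,
            center[1] + sin_a * dx + cos_a * dy)


def correct_quad(t0: Point, t1: Point, t2: Point, t3: Point,
                 theta_rad: float) -> Tuple[Point, Point, Point, Point]:
    """Assumed correction: rotate the four vertices by theta about the frame center."""
    # T0 is the upper left and T1 the lower right corner, so the midpoint of
    # that diagonal is the center of the upright frame.
    center = ((t0[0] + t1[0]) / 2.0, (t0[1] + t1[1]) / 2.0)
    p0, p1, p2, p3 = (rotate_point(t, center, theta_rad) for t in (t0, t1, t2, t3))
    return p0, p1, p2, p3
```

With θ = 0 the frame is returned unchanged, which matches the case in which the face is not tilted.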
In S303, the corrected face region is enlarged according to the magnification.
In this disclosure, when the corrected face region is enlarged, the central point position information of the corrected face region may be determined first.
Each of P0, P1, P2 and P3 may be represented by two-dimensional coordinates, for example, $P_0 = (p_{0,x}, p_{0,y})$, $P_1 = (p_{1,x}, p_{1,y})$, $P_2 = (p_{2,x}, p_{2,y})$, $P_3 = (p_{3,x}, p_{3,y})$. Illustratively, the center point position information of the corrected face region is $(c_x, c_y)$, which may be determined by the following formulas:
[Equation images BDA0002383118760000085 and BDA0002383118760000086: formulas for $c_x$ and $c_y$; not available as text]
The corrected face region is then enlarged according to the magnification and the center point position information.
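The center-point formulas appear only as figure references in this text. A common choice, assumed here purely for illustration, is the mean of the four corrected vertices; the function name `quad_center` is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def quad_center(p0: Point, p1: Point, p2: Point, p3: Point) -> Point:
    """Assumed center (c_x, c_y): the average of the four vertex coordinates."""
    vertices = (p0, p1, p2, p3)
    c_x = sum(p[0] for p in vertices) / 4.0
    c_y = sum(p[1] for p in vertices) / 4.0
    return (c_x, c_y)
```

For a rectangle this coincides with the midpoint of either diagonal, so the result does not depend on which of the two conventions the original formulas use in that case.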
Illustratively, for each pixel in the corrected face region, denote its position information as $Q_0 = (q_{0,x}, q_{0,y})$ and its position information after enlargement as $Q_1 = (q_{1,x}, q_{1,y})$. $Q_1$ can be determined by the following formulas:
$$q_{1,x} = c_x + k \cdot (q_{0,x} - c_x)$$
$$q_{1,y} = c_y + k \cdot (q_{0,y} - c_y)$$
where $k$ represents the magnification.
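The enlargement formulas scale each pixel's offset from the center point by k. A minimal sketch of that mapping follows; the function name `enlarge_point` is hypothetical, and the worked values are chosen only to make the arithmetic easy to follow.

```python
from typing import Tuple

Point = Tuple[float, float]


def enlarge_point(q0: Point, center: Point, k: float) -> Point:
    """Map Q0 to Q1 by scaling its offset from the center point by k."""
    q1_x = center[0] + k * (q0[0] - center[0])
    q1_y = center[1] + k * (q0[1] - center[1])
    return (q1_x, q1_y)


# Worked example: center (64, 64), k = 2. The pixel at (70, 60) lies 6 to the
# right of and 4 above the center, so it moves to 12 right and 8 above.
print(enlarge_point((70.0, 60.0), (64.0, 64.0), 2.0))  # (76.0, 56.0)
```

The center point maps to itself for any k, and k = 1 leaves every pixel in place.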
In the present disclosure, when the face area is enlarged, in one embodiment, the magnification k may be a preset value for each image frame in the original video.
Preferably, in another embodiment, the magnification k may be dynamically varied. As described above, the steps of zooming in and rotating the face region may be performed after detecting the target trigger event. Based on this, the magnification k may be dynamically varied according to the detected target trigger event. Illustratively, enlarging the face region may further include: determining event characteristic information of a target trigger event; and determining the magnification according to the event characteristic information.
As set forth above, the target trigger event may include a target expression and/or a target action, and accordingly, the event characteristic information may include: the expression characteristic information of the target expression and/or the action characteristic information of the target action.
The expression characteristic information may indicate an expression type and an expression degree, for example, a smile and the degree of the smile. In the present disclosure, the magnification may correspond to the expression degree; for example, the magnification is positively correlated with the expression degree, so that the greater the expression degree, the greater the magnification. Illustratively, if the target expression is a smile, the greater the degree of the smile, the greater the magnification, that is, the more pronounced the big-head effect. If a plurality of face regions are detected in S101, the magnification of each face region may be determined according to the expression degree of the corresponding person, so the magnifications of the face regions may differ.
The action characteristic information may indicate an action type and an action amplitude, for example, shaking the terminal and the shaking amplitude. In the present disclosure, the magnification may correspond to the action amplitude; for example, the magnification is positively correlated with the action amplitude, so that the larger the action amplitude, the larger the magnification. For example, if the target action is shaking the terminal, the larger the shaking amplitude, the larger the magnification, that is, the more pronounced the big-head effect.
When the event characteristic information includes both expression characteristic information and action characteristic information, the magnification may be determined from both. For example, if the user both smiles and shakes the terminal, the magnification may be determined as a weighted sum of the magnifications corresponding to the smile degree and the shaking amplitude, respectively.
Through the technical scheme, the magnification factor can be dynamically changed according to the detected target trigger event, so that the interactivity with the user can be further improved, the video conversion effect is enriched, and the user experience is enhanced.
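The dependence of the magnification on the trigger event can be sketched as follows. The text specifies only a positive correlation and a weighted sum, so everything concrete here is an assumption for illustration: the smile degree and shaking amplitude are taken to be normalized to [0, 1], the mapping is linear, and the range `k_min` to `k_max` and the weights are hypothetical values.

```python
from typing import Optional


def magnification(
    smile_degree: Optional[float] = None,
    shake_amplitude: Optional[float] = None,
    k_min: float = 1.0,     # hypothetical magnification at level 0
    k_max: float = 2.0,     # hypothetical magnification at level 1
    w_smile: float = 0.5,   # hypothetical weight of the expression term
    w_shake: float = 0.5,   # hypothetical weight of the action term
) -> float:
    """Return a magnification k that grows with the detected event strength."""

    def to_k(level: float) -> float:
        # Clamp to [0, 1], then interpolate linearly (positive correlation).
        level = min(max(level, 0.0), 1.0)
        return k_min + (k_max - k_min) * level

    if smile_degree is not None and shake_amplitude is not None:
        # Both features present: weighted sum of the two magnifications.
        return w_smile * to_k(smile_degree) + w_shake * to_k(shake_amplitude)
    if smile_degree is not None:
        return to_k(smile_degree)
    if shake_amplitude is not None:
        return to_k(shake_amplitude)
    return 1.0  # no trigger information: leave the face region at its size
```

When several face regions are detected, the same function could be called once per face with that person's smile degree, which yields a different magnification per region.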
In the present disclosure, rotating the face region may include: rotating the enlarged face region by the target rotation angle relative to a second target direction.
The second target direction may be the horizontal axis direction or the vertical axis direction of the terminal. As described above, the horizontal axis direction and the vertical axis direction may be determined from attitude information indicating whether the terminal is currently in landscape or portrait orientation. Taking the vertical axis direction as the second target direction and continuing the example in S303, for each pixel in the enlarged face region, denote its position information as $Q_1 = (q_{1,x}, q_{1,y})$ and its position information after rotation as $Q_2 = (q_{2,x}, q_{2,y})$. $Q_2$ can be determined by the following formula:
[Equation image BDA0002383118760000101: formula for $Q_2$; not available as text]
Where β represents the target rotation angle.
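The rotation formula appears only as a figure reference in this text. As an illustration, the sketch below assumes a standard two-dimensional rotation by β about the face center and chains it after the enlargement of S303; the function names and the choice of pivot are assumptions, not the formula of the disclosure.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def enlarge(q: Point, center: Point, k: float) -> Point:
    """Scale the offset of q from the center by k (the S303 mapping)."""
    return (center[0] + k * (q[0] - center[0]),
            center[1] + k * (q[1] - center[1]))


def rotate(q: Point, center: Point, beta_rad: float) -> Point:
    """Assumed rotation: turn q about the center by beta (radians)."""
    dx, dy = q[0] - center[0], q[1] - center[1]
    cos_b, sin_b = math.cos(beta_rad), math.sin(beta_rad)
    return (center[0] + cos_b * dx - sin_b * dy,
            center[1] + sin_b * dx + cos_b * dy)


def enlarge_then_rotate(q0: Point, center: Point, k: float, beta_rad: float) -> Point:
    """Map Q0 to Q1 by enlargement, then Q1 to Q2 by rotation."""
    return rotate(enlarge(q0, center, k), center, beta_rad)
```

Under this assumption, uniform scaling and rotation about the same center commute, which is consistent with the statement that the face region may be enlarged first, rotated first, or both at once.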
In one embodiment, the target rotation angle may be determined according to a timestamp corresponding to the first image frame. For example, the target rotation angle may be a trigonometric function with respect to the time stamp t, and the target rotation angle β may be determined by the following formula:
$$\beta = \sin t$$
Because the value of β alternates between positive and negative as the timestamp changes (for example, the face region rotates to the right when β is positive and to the left when β is negative), a video effect of shaking the head left and right is presented during video rendering.
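The sign change of β over time can be seen in a short sketch. The units are assumptions, since the text does not state them: the timestamp t is taken to be in seconds and β in radians.

```python
import math


def target_rotation_angle(t: float) -> float:
    """Target rotation angle for timestamp t (assumed seconds, result in radians)."""
    return math.sin(t)


def rotation_direction(beta: float) -> str:
    """Positive beta turns the face region right, negative beta turns it left."""
    if beta > 0:
        return "right"
    if beta < 0:
        return "left"
    return "none"


# Sample a few timestamps: the direction flips as sin(t) changes sign.
for t in (1.0, 2.0, 4.0, 5.0):
    beta = target_rotation_angle(t)
    print(t, round(beta, 3), rotation_direction(beta))
```

With these units one full left-right cycle takes 2π seconds and the angle never exceeds 1 radian, which is one motivation for the adjustable amplitude and frequency.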
Preferably, in another embodiment, the head-shaking amplitude and head-shaking frequency in the video effect are dynamically adjustable. As mentioned above, the step of enlarging and rotating the face region may be performed after the target trigger event is detected; on this basis, the target rotation angle β may also be dynamically adjusted according to the detected target trigger event. For example, rotating the face region may further include: determining event characteristic information of the target trigger event; and determining the target rotation angle according to the event characteristic information and the timestamp corresponding to the first image frame. The target trigger event and the event characteristic information are described above. For example, the target rotation angle β may be determined by the following formula:
β=a·sin bt
where a may represent a target rotation amplitude and b may represent a target rotation frequency. The values of a and b can be determined according to the event characteristic information of the target trigger event, namely, the values are dynamically adjusted according to the expression characteristic information of the target expression and/or the action characteristic information of the target action made by the user.
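As a minimal Python sketch of β = a·sin bt, assuming a numeric timestamp and an angle in radians (the function name `dynamic_rotation_angle` is hypothetical):

```python
import math


def dynamic_rotation_angle(t, amplitude, frequency):
    """Return beta = a * sin(b * t).

    amplitude (a) bounds how far the face region swings;
    frequency (b) controls how quickly it swings back and forth.
    """
    return amplitude * math.sin(frequency * t)
```

With this form, |β| never exceeds a, and increasing b shortens the period of the swing, which is 2π/b.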
For example, the expression characteristic information may indicate an expression type and an expression degree, e.g., a smile and a degree of smiling. In this disclosure, the values of the target rotation amplitude and the target rotation frequency may each correspond to the expression degree; for example, both may be positively correlated with the expression degree, that is, the greater the expression degree, the larger the target rotation amplitude and the target rotation frequency. Illustratively, if the target expression is a smile, the greater the degree of smiling, the larger the target rotation amplitude and the target rotation frequency, and thus the larger the head-shaking amplitude and head-shaking frequency of the character in the rendered video effect.
For example, the action characteristic information may indicate an action type and an action amplitude, e.g., shaking the terminal and the shaking amplitude. In the present disclosure, the values of the target rotation amplitude and the target rotation frequency may each correspond to the action amplitude; for example, both may be positively correlated with the action amplitude, that is, the larger the action amplitude, the larger the target rotation amplitude and the target rotation frequency. Illustratively, if the target action is shaking the terminal, the larger the shaking amplitude, the larger the target rotation amplitude and the target rotation frequency, and thus the larger the head-shaking amplitude and head-shaking frequency of the character in the rendered video effect.
For example, when the event feature information includes expression feature information and motion feature information, the value of the target rotation amplitude may be determined according to a weighted summation result of target rotation amplitudes corresponding to the expression degree and the motion amplitude, respectively, and the value of the target rotation frequency may be determined according to a weighted summation result of target rotation frequencies corresponding to the expression degree and the motion amplitude, respectively.
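The weighted summation described above can be sketched in Python as follows. The text does not specify how a degree or amplitude is mapped to a rotation parameter, nor which weights are used, so the linear mapping, the value ranges, the equal default weights, and the assumption that inputs are normalized to [0, 1] are all illustrative choices, and every name is hypothetical.

```python
def linear_map(level, low, high):
    """Map a normalized level in [0, 1] to [low, high]; increasing in level."""
    level = min(max(level, 0.0), 1.0)
    return low + (high - low) * level


def rotation_parameters(expression_degree, motion_amplitude,
                        w_expression=0.5, w_motion=0.5):
    """Return (a, b): target rotation amplitude and target rotation frequency.

    Each input yields its own candidate amplitude and frequency (positively
    correlated with the input); the candidates are then combined by a
    weighted sum. Ranges and weights are assumptions for illustration.
    """
    a_expr = linear_map(expression_degree, 0.1, 0.6)
    a_motion = linear_map(motion_amplitude, 0.1, 0.6)
    b_expr = linear_map(expression_degree, 1.0, 6.0)
    b_motion = linear_map(motion_amplitude, 1.0, 6.0)
    a = w_expression * a_expr + w_motion * a_motion
    b = w_expression * b_expr + w_motion * b_motion
    return a, b
```

For instance, `rotation_parameters(0.8, 0.2)` gives a larger amplitude and frequency than `rotation_parameters(0.2, 0.2)`, reflecting the positive correlation with the expression degree.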
Through the technical scheme, when the target rotation angle is determined, the values of the target rotation amplitude and the target rotation frequency can be dynamically adjusted according to the detected target trigger event. Therefore, the interactivity with the user can be further improved, the video conversion effect is enriched, and the user experience is enhanced.
Based on the same inventive concept, the disclosure also provides a video processing device. Fig. 5 is a block diagram illustrating a video processing apparatus according to an exemplary embodiment, and as shown in fig. 5, the apparatus 500 may include:
a first detection module 501, configured to detect a face region in a first image frame of an original video;
an executing module 502, configured to amplify and rotate the face region to obtain a second image frame;
a first output module 503, configured to output the second image frame for video rendering.
By adopting the device, after the face area in the first image frame of the original video is amplified and rotated, the corresponding second image frame can be obtained, the second image frame is output to perform video rendering, the continuous image frames form a dynamic video effect during rendering, and a user can see the dynamic video effect of amplifying the face and shaking the head. Therefore, the diversity of the effect which can be presented by the video is improved, and the use requirement of a user is met.
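The three-module structure of the apparatus 500 can be sketched in Python as a small class that wires the modules together. This is only a structural illustration: the modules are passed in as plain callables, the frame and face-region types are left abstract, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VideoProcessingApparatus:
    # First detection module 501: frame -> face region.
    detect_face_region: Callable[[Any], Any]
    # Execution module 502: (frame, face region) -> second image frame.
    enlarge_and_rotate: Callable[[Any, Any], Any]
    # First output module 503: hands a frame to the renderer.
    output_frame: Callable[[Any], None]

    def process(self, first_image_frame: Any) -> Any:
        """Run one first image frame through the three modules in order."""
        face_region = self.detect_face_region(first_image_frame)
        second_image_frame = self.enlarge_and_rotate(first_image_frame, face_region)
        self.output_frame(second_image_frame)
        return second_image_frame
```

Calling `process` on each frame of the original video in turn produces the sequence of second image frames that forms the dynamic effect during rendering.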
Optionally, the apparatus 500 may further include:
a second detection module, configured to detect a target trigger event before the execution module 502 amplifies and rotates the face region;
the executing module 502 is configured to execute the enlarging and rotating of the face region in response to the second detecting module detecting the target triggering event.
Optionally, the apparatus 500 may further include:
a second output module, configured to output the first image frame for video rendering if the target trigger event is not detected.
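The gating behavior of the second detection module and the second output module can be sketched in Python as follows. The detection result is assumed to be available as a boolean, and `select_output_frame` and its parameters are hypothetical names.

```python
def select_output_frame(first_image_frame, trigger_detected, enlarge_and_rotate):
    """Choose the frame to render.

    If the target trigger event was detected, the face region is enlarged
    and rotated and the resulting second image frame is returned;
    otherwise the first image frame is passed through unchanged.
    """
    if trigger_detected:
        return enlarge_and_rotate(first_image_frame)
    return first_image_frame
```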
Optionally, the executing module 502 may be configured to determine an inclination angle of the face in the first image frame with respect to a first target direction; under the condition that the inclination angle meets a preset condition, carrying out position correction on the face region according to the inclination angle to obtain a corrected face region; and amplifying the corrected human face area according to the amplification factor.
Optionally, the executing module 502 enlarges and rotates the face region after the second detecting module detects the target triggering event; correspondingly, the execution module 502 may be further configured to determine event characteristic information of the target trigger event; and determining the magnification according to the event characteristic information.
Optionally, the executing module 502 may be configured to rotate the enlarged face region according to a target rotation angle relative to a second target direction.
Optionally, the executing module 502 may be further configured to determine the target rotation angle according to a timestamp corresponding to the first image frame.
Optionally, the executing module 502 enlarges and rotates the face region after the second detecting module detects the target triggering event; correspondingly, the execution module 502 may be further configured to determine event characteristic information of the target trigger event; and determining the target rotation angle according to the event characteristic information and the timestamp corresponding to the first image frame.
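The correct-then-enlarge behavior of the execution module (determine the inclination angle, correct the face region if a preset condition is met, then enlarge by the magnification factor) can be sketched in Python on a list of pixel positions. The preset condition is not spelled out in this section, so the sketch assumes it means the absolute inclination exceeds a threshold; the threshold value, the pivot at the region center, the sign convention, and all names are likewise assumptions.

```python
import math


def correct_and_enlarge(points, center, tilt_angle, magnification,
                        tilt_threshold=math.radians(5)):
    """Undo the face tilt if it is large enough, then scale about `center`.

    points: iterable of (x, y) positions in the face region.
    tilt_angle: inclination of the face relative to the first target
        direction, in radians (sign convention assumed).
    """
    # Assumed preset condition: only correct when the tilt is noticeable.
    correction = -tilt_angle if abs(tilt_angle) > tilt_threshold else 0.0
    cos_c = math.cos(correction)
    sin_c = math.sin(correction)
    result = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        # Rotate by the correction angle, then scale by the magnification.
        rx = dx * cos_c - dy * sin_c
        ry = dx * sin_c + dy * cos_c
        result.append((center[0] + magnification * rx,
                       center[1] + magnification * ry))
    return result
```

A magnification determined from the event characteristic information could be passed in as the `magnification` argument.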
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some implementations, clients may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: detecting a face region in a first image frame of an original video; amplifying and rotating the face area to obtain a second image frame; outputting the second image frame for video rendering.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module does not constitute a limitation of the module itself; for example, the first detection module may also be described as a "face region detection module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, a video processing method, the method comprising: detecting a face region in a first image frame of an original video; amplifying and rotating the face area to obtain a second image frame; outputting the second image frame for video rendering.
Example 2 provides the method of example 1, prior to the step of magnifying and rotating the face region, further comprising: detecting a target trigger event; and in response to detecting the target trigger event, performing the steps of enlarging and rotating the face region.
Example 3 provides the method of example 2, further comprising, in accordance with one or more embodiments of the present disclosure: if the target trigger event is not detected, outputting the first image frame for video rendering.
Example 4 provides the method of example 2 or example 3, the target triggering event comprising: a target expression and/or a target action.
Example 5 provides the method of example 1, the raw video being a real-time video captured by a camera, in accordance with one or more embodiments of the present disclosure.
Example 6 provides the method of example 1, the enlarging the face region, comprising: determining an inclination angle of a face in the first image frame relative to a first target direction; under the condition that the inclination angle meets a preset condition, carrying out position correction on the face region according to the inclination angle to obtain a corrected face region; and amplifying the corrected human face area according to the amplification factor.
Example 7 provides the method of example 6, the step of magnifying and rotating the face region being performed after detecting a target trigger event, in accordance with one or more embodiments of the present disclosure; correspondingly, the amplifying the face region further includes: determining event characteristic information of the target trigger event; and determining the magnification according to the event characteristic information.
Example 8 provides the method of example 1, in accordance with one or more embodiments of the present disclosure, wherein the rotating the face region includes: rotating the enlarged face region according to a target rotation angle relative to a second target direction.
Example 9 provides the method of example 8, the rotating the face region further comprising: determining the target rotation angle according to the timestamp corresponding to the first image frame.
Example 10 provides the method of example 8, the step of magnifying and rotating the face region is performed after detecting a target trigger event, according to one or more embodiments of the present disclosure; correspondingly, the rotating the face region further includes: determining event characteristic information of the target trigger event; and determining the target rotation angle according to the event characteristic information and the timestamp corresponding to the first image frame.
Example 11 provides the method of example 7 or example 10, the target trigger event comprising: a target expression and/or a target action; accordingly, the event characteristic information includes: expression characteristic information of the target expression and/or action characteristic information of the target action.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, a video processing apparatus, the apparatus comprising: the first detection module is used for detecting a face area in a first image frame of an original video; the execution module is used for amplifying and rotating the face area to obtain a second image frame; a first output module for outputting the second image frame for video rendering.
Example 13 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the methods of examples 1-11, in accordance with one or more embodiments of the present disclosure.
Example 14 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to implement the steps of the methods of examples 1 to 11.
The foregoing description is only illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (10)

1. A method of video processing, the method comprising:
detecting a face region in a first image frame of an original video;
amplifying and rotating the face area to obtain a second image frame;
outputting the second image frame for video rendering;
the rotating the face region includes:
rotating the amplified face region according to a target rotation angle relative to a second target direction;
the rotating the face region further comprises: determining the target rotation angle according to the timestamp corresponding to the first image frame;
or, the step of magnifying and rotating the face region is performed after a target trigger event is detected; correspondingly, the rotating the face region further includes: determining event characteristic information of the target trigger event; determining the target rotation angle according to the event characteristic information and the timestamp corresponding to the first image frame;
the amplifying the human face area comprises:
determining an inclination angle of a face in the first image frame relative to a first target direction;
under the condition that the inclination angle meets a preset condition, carrying out position correction on the face region according to the inclination angle to obtain a corrected face region;
and amplifying the corrected human face area according to the amplification factor.
2. The method of claim 1, wherein prior to the step of magnifying and rotating the face region, the method further comprises:
detecting a target trigger event;
and in response to detecting the target trigger event, performing the steps of enlarging and rotating the face region.
3. The method of claim 2, further comprising:
if the target trigger event is not detected, outputting the first image frame for video rendering.
4. The method of claim 2 or 3, wherein the target triggering event comprises: a target expression and/or a target action.
5. The method of claim 1, wherein the original video is a real-time video captured by a camera.
6. The method of claim 1, wherein the step of magnifying and rotating the face region is performed after detecting a target trigger event; correspondingly,
the amplifying the human face area further comprises:
determining event characteristic information of the target trigger event;
and determining the magnification according to the event characteristic information.
7. The method of claim 1 or 6, wherein the target triggering event comprises: a target expression and/or a target action; correspondingly,
the event characteristic information comprises: the expression characteristic information of the target expression and/or the action characteristic information of the target action.
8. A video processing apparatus, characterized in that the apparatus comprises:
the first detection module is used for detecting a face area in a first image frame of an original video;
the execution module is used for amplifying and rotating the face area to obtain a second image frame;
a first output module for outputting the second image frame for video rendering;
the execution module is used for rotating the amplified face region according to a target rotation angle relative to a second target direction;
the execution module is used for determining the target rotation angle according to the timestamp corresponding to the first image frame;
or the executing module enlarges and rotates the face region after detecting a target triggering event; correspondingly, the execution module is used for determining event characteristic information of the target trigger event; determining the target rotation angle according to the event characteristic information and the timestamp corresponding to the first image frame;
the execution module is used for determining the inclination angle of the face in the first image frame relative to a first target direction; under the condition that the inclination angle meets a preset condition, carrying out position correction on the face region according to the inclination angle to obtain a corrected face region; and amplifying the corrected human face area according to the amplification factor.
9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.
CN202010089115.8A 2020-02-12 2020-02-12 Video processing method and device, readable medium and electronic equipment Active CN111277893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010089115.8A CN111277893B (en) 2020-02-12 2020-02-12 Video processing method and device, readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010089115.8A CN111277893B (en) 2020-02-12 2020-02-12 Video processing method and device, readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111277893A CN111277893A (en) 2020-06-12
CN111277893B true CN111277893B (en) 2021-06-25

Family

ID=70999414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010089115.8A Active CN111277893B (en) 2020-02-12 2020-02-12 Video processing method and device, readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111277893B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035105A (en) * 2020-09-16 2020-12-04 北京思明启创科技有限公司 Rendering method, device and equipment of visualization area and storage medium
CN112188260A (en) * 2020-10-26 2021-01-05 咪咕文化科技有限公司 Video sharing method, electronic device and readable storage medium
CN112764845B (en) * 2020-12-30 2022-09-16 北京字跳网络技术有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN112887796B (en) * 2021-02-10 2022-07-22 北京字跳网络技术有限公司 Video generation method, device, equipment and medium
CN114793274A (en) * 2021-11-25 2022-07-26 北京萌特博智能机器人科技有限公司 Data fusion method and device based on video projection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101247482A (en) * 2007-05-16 2008-08-20 北京思比科微电子技术有限公司 Method and device for implementing dynamic image processing
CN101452582A (en) * 2008-12-18 2009-06-10 北京中星微电子有限公司 Method and device for implementing three-dimensional video specific action
CN106231415A (en) * 2016-08-18 2016-12-14 北京奇虎科技有限公司 A kind of interactive method and device adding face's specially good effect in net cast
CN108958610A (en) * 2018-07-27 2018-12-07 北京微播视界科技有限公司 Special efficacy generation method, device and electronic equipment based on face
CN109165570A (en) * 2018-08-03 2019-01-08 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110378839A (en) * 2019-06-28 2019-10-25 北京字节跳动网络技术有限公司 Face image processing process, device, medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035413B (en) * 2017-09-01 2021-12-14 深圳市云之梦科技有限公司 Virtual fitting method and system for image deformation
CN111344715B (en) * 2017-09-13 2024-04-09 皇家飞利浦有限公司 Object recognition system and method
CN110162670B (en) * 2019-05-27 2020-05-08 北京字节跳动网络技术有限公司 Method and device for generating expression package

Also Published As

Publication number Publication date
CN111277893A (en) 2020-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant