CN111836072A - Video processing method, device, equipment and storage medium - Google Patents

Video processing method, device, equipment and storage medium

Info

Publication number
CN111836072A
CN111836072A
Authority
CN
China
Prior art keywords
frame
feature points
given
frames
feature point
Prior art date
Legal status
Granted
Application number
CN202010434409.XA
Other languages
Chinese (zh)
Other versions
CN111836072B (en)
Inventor
张修宝
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010434409.XA priority Critical patent/CN111836072B/en
Publication of CN111836072A publication Critical patent/CN111836072A/en
Application granted granted Critical
Publication of CN111836072B publication Critical patent/CN111836072B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a video processing method, apparatus, device, and storage medium. The method described herein includes obtaining feature point information for a given portion of an object in each of a plurality of frames of a video. The feature point information indicates positions of a plurality of feature points of the given portion in the plurality of frames, respectively. The method also includes determining, based on the feature point information, an inter-frame position change of the positional relationship of the plurality of feature points of the given portion across the plurality of frames. The method also includes selecting, from the plurality of frames, a plurality of target frames that present different poses of the given portion based at least on the inter-frame position change. With this scheme, frames presenting differentiated poses of the given portion can be selected conveniently and quickly on the basis of the feature point information, by measuring how the positional relationship of the feature points changes between frames.

Description

Video processing method, device, equipment and storage medium
Technical Field
The present disclosure relates generally to the field of computer vision, and more particularly to video processing methods, apparatus, devices, and computer-readable storage media.
Background
Gesture and motion detection of objects is a technique commonly used in the field of human-computer interaction. By detecting the posture of a specific part of the object, various controls can be realized. For example, by detecting a plurality of facial gestures of a driver, the driver can be reminded to pay attention to driving safety, and safety supervision is realized; by detecting various facial gestures of the viewer, the screen angle of the display device can be automatically adjusted, and so on.
Currently, there are mainly two schemes for detecting the pose of an object. One approach is to determine a particular gesture by detecting motion of a particular part using conventional motion sensors (e.g., angle, displacement sensors, etc.). However, this solution has high requirements for hardware devices and is not suitable for many scenarios. Another approach is based on video processing techniques to detect different poses of a particular part of an object from each frame of a video by capturing a video of the particular object. The scheme has low requirement on hardware and is more suitable for different scenes.
Disclosure of Invention
According to some embodiments of the present disclosure, a video processing scheme is provided.
In a first aspect of the disclosure, a method for video processing is provided. The method includes obtaining feature point information for a given portion of an object in each of a plurality of frames of a video. The feature point information indicates positions of a plurality of feature points of the given portion in the plurality of frames, respectively. The method also includes determining, based on the feature point information, an inter-frame position change of the positional relationship of the plurality of feature points of the given portion across the plurality of frames. The method also includes selecting, from the plurality of frames, a plurality of target frames that present different poses of the given portion based at least on the inter-frame position change.
In a second aspect of the disclosure, an apparatus for video processing is provided. The apparatus includes an acquisition module configured to acquire feature point information of a given portion of an object in each of a plurality of frames of a video. The feature point information indicates positions of a plurality of feature points of the given portion in the plurality of frames, respectively. The apparatus also includes a determination module configured to determine, based on the feature point information, an inter-frame position change of the positional relationship of the plurality of feature points of the given portion across the plurality of frames. The apparatus also includes a selection module configured to select, from the plurality of frames, a plurality of target frames exhibiting different poses of the given portion based at least on the inter-frame position change.
In a third aspect of the present disclosure, there is provided an electronic device comprising a memory and a processor, wherein the memory is for storing computer-executable instructions that are executed by the processor to implement a method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided having computer-executable instructions stored thereon, wherein the computer-executable instructions are executed by a processor to implement a method according to the first aspect of the present disclosure.
According to various embodiments of the present disclosure, frames presenting differentiated poses of a given portion can be selected conveniently and quickly on the basis of feature point information, by measuring how the positional relationship of the feature points changes between frames, with low process complexity and low consumption of computing resources.
Drawings
Features, advantages, and other aspects of various implementations of the disclosure will become more apparent with reference to the following detailed description when taken in conjunction with the accompanying drawings. Several implementations of the present disclosure are illustrated herein by way of example, and not by way of limitation, in the figures of the accompanying drawings:
FIG. 1 illustrates an example environment for video processing according to an embodiment of this disclosure;
FIG. 2 shows a flow diagram of a process for video processing according to an embodiment of the present disclosure;
FIG. 3 illustrates some examples of feature points of a frame according to embodiments of the present disclosure;
FIG. 4 shows another example of feature points of a frame according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a process for selecting a target frame, according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of a process for selecting a target frame based on an inter-frame position change, according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an apparatus for video processing according to an embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of a computing device in which one or more embodiments of the present disclosure may be implemented.
Detailed Description
Preferred implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. While a preferred implementation of the present disclosure is shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the implementations set forth herein. Rather, these implementations are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "include" and variations thereof as used herein is meant to be inclusive in an open-ended manner, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example implementation" and "one implementation" mean "at least one example implementation". The term "another implementation" means "at least one additional implementation". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.
When detecting the posture of a specific portion based on video processing, because video capture is continuous, the posture of the specific portion differs little over a short period of time, and the same posture tends to recur over a long period of time. Therefore, it is desirable to screen out from the video a set of frames that exhibit differentiated poses for processing by subsequent tasks.
In some solutions, it is proposed to detect a specific part of an object (e.g., a face) from each frame of a video; estimate the orientation of the face in each frame to determine its three attitude angles, namely the yaw angle, the pitch angle, and the roll angle; and then select frames exhibiting different poses according to the spacing between the pose angles. However, estimating the face angles requires designing a complex model, which is expensive in terms of computing resources and time and is not suitable for devices with limited computing power.
Embodiments of the present disclosure propose a video processing scheme, in particular for selecting from a video a plurality of target frames that present different poses of a given portion of an object. The scheme uses the feature point information of the given portion of the object in a plurality of frames of the video to determine how the positional relationship of the plurality of feature points of the given portion changes across the plurality of frames (referred to as the "inter-frame position change"). Target frames exhibiting different poses of the given portion are then selected from the frames of the video based at least on the inter-frame position change.
In this scheme, on the basis of the feature point information, frames presenting differentiated poses can be selected conveniently and quickly by measuring how the positional relationship of the feature points changes between frames; the process has low complexity and is easy to implement. In addition, the feature point information may already have been extracted during object identification in an earlier stage, so that no additional processing of the video frames is needed when selecting the target frames. This makes the whole video processing more efficient and less demanding of computing resources, so that it can be implemented even on a device with limited computing resources.
Some example embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates an example environment 100 for video processing according to an embodiment of this disclosure. As shown in fig. 1, the example environment 100 includes a computing device 110 configured to process a video 120. The video 120 is a video related to the object 102 and is composed of a plurality of frames 122-1, 122-2, 122-3, …, 122-N, where N is an integer greater than 1. For ease of discussion, frames 122-1, 122-2, 122-3, …, 122-N are collectively or individually referred to herein as frames 122.
Computing device 110 may be a terminal device or a server device. The terminal device may be, for example, various portable or fixed terminals, such as a smartphone, a tablet, a desktop computer, a notebook computer, an in-vehicle device, a navigation device, a multimedia player device, a smart speaker, a smart wearable device (such as a smart watch, smart glasses), and so forth. The server device may be, for example, a centralized server, a distributed server, a mainframe, an edge computing device, a cloud, and so on.
The computing device 110 includes a detection processing module 130 configured to detect whether a given portion 132 of the object 102 is present in a frame 122 of the video 120 and to determine the area in which the given portion 132 of the object 102 is located in the frame 122. In some cases, not every frame 122 of the video 120 presents the object 102 or the given portion 132 of the object 102. Such frames may be filtered out of the frames of the video 120 based on the detection performed by the detection processing module 130.
Computing device 110 also includes a frame selection module 140 configured to select a set of frames, e.g., frames 122-1, 122-5, 122-8, …, 122-M, from the frames 122 of the video 120, such that the selected frames present different poses, i.e., poses having a degree of difference, of the given portion 132 of the object 102. The selected frames 122 are referred to herein as the target frames 122. Note that the frame numbers 122-1, 122-5, 122-8, …, 122-M listed here are examples only and do not mean that the target frames must be selected from the video 120 in this order.
The given portion 132 under consideration may be a portion of the object 102 having certain characteristics. In one example, the given portion 132 includes a face of the object 102, such as a face of a person. The selection of the target frame 122 will be discussed in more detail below in conjunction with the flow diagram of FIG. 2.
Fig. 2 shows a flow diagram of a process 200 for video processing according to an embodiment of the present disclosure. Process 200 may be implemented by computing device 110 of fig. 1, and in particular at frame selection module 140 of computing device 110, to screen out multiple target frames from multiple frames of a video that present different poses of a given portion 132 of an object.
At block 210, the computing device 110 obtains feature point information for the given portion 132 of the object 102 in each of the plurality of frames 122 of the video 120. The feature point information of each frame 122 indicates the positions of a plurality of feature points of the given portion 132 of the object 102 in that frame.
The object 102 may be any object whose pose is variable. The given portion 132 of the object 102 of interest includes a face of the object, such as the face of a person. In the example embodiments below, for discussion purposes, the object 102 is described as a person and the given portion 132 as the person's face. Some embodiments of the present disclosure may also be similarly applied to the selection of poses of other objects and other parts, such as the torso of a human body, and the like.
Herein, feature points, also referred to as keypoints, of a given portion of an object refer to feature locations that help identify or detect the given portion. For a face, the basic feature points include five feature points that identify the facial features, namely two eye feature points (e.g., the eye centers), a nose feature point (e.g., the nose tip), and two mouth corner feature points. Fig. 3 shows different locations of the five basic feature points of the face in some frames 122, including a left-eye feature point (denoted "A"), a right-eye feature point (denoted "B"), a nose feature point (denoted "P"), a left-mouth-corner feature point (denoted "C"), and a right-mouth-corner feature point (denoted "D"). The different positional relationships of these feature points affect the determination of the pose of the face, as discussed further below in connection with the examples of fig. 3.
In addition to the five basic feature points, the number of feature points may be further expanded to identify a given portion more finely. For example, the feature points may also include further feature points that identify the outlines of the facial features. Beyond the facial features themselves, the feature points of the face may include two or more fine feature points that identify the eyebrows, and/or a plurality of outer contour feature points that identify the outline of the face. Fig. 4 gives an example of 68 feature points identifying a face in a frame 122. It should be understood that other numbers of feature points are also possible. Note that, regardless of the total number of possible feature points, as the pose of the face changes, e.g., when the face is deflected significantly to the left or right, not all feature points may be locatable in frames that exhibit such facial poses.
A frame 122 may be considered an image. Thus, in each frame 122, the position of each feature point may be represented as the coordinate position of the feature point in the two-dimensional coordinate system of the frame 122. The two-dimensional coordinate system includes a horizontal axis, which may correspond to the horizontal direction of the frame 122, and a vertical axis, which corresponds to the vertical direction of the frame 122. The origin of the coordinate system may be located, for example, at one of the four vertices of the frame 122 (e.g., the lower left corner), or at the center of the frame 122. The choice of coordinate system is flexible, as long as a consistent coordinate system is maintained across the multiple frames 122. In the following examples, for discussion purposes, the lower left corner of the frame is mainly used as the origin of the coordinate system.
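Purely as an illustration, the per-frame feature point information described above can be pictured as a mapping from feature-point names to (x, y) coordinates in the frame's coordinate system. The names, structure, and coordinate values in the sketch below are hypothetical and are not specified by the disclosure.

```python
from typing import Dict, Tuple

# Hypothetical representation of per-frame feature point information:
# each of the five basic facial feature points is mapped to its (x, y)
# coordinate in the frame's two-dimensional coordinate system, with the
# origin at the lower-left corner of the frame.
FeaturePoints = Dict[str, Tuple[float, float]]

frame_feature_points: FeaturePoints = {
    "left_eye": (210.0, 305.0),     # point A
    "right_eye": (290.0, 306.0),    # point B
    "nose": (251.0, 262.0),         # point P (nose tip)
    "left_mouth": (222.0, 214.0),   # point C
    "right_mouth": (281.0, 215.0),  # point D
}
```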
In some embodiments, the frames 122 of interest in the video 120 may be those frames determined to render a given portion 132 of the object 102. Frames representing a given portion 132 of object 102 may be filtered, for example, by an object detection process (e.g., performed by detection processing module 130 in computing device 110). In some embodiments, the feature point information of a given portion 132 of the object 102 in each frame 122 may have been determined when detecting the given portion 132 from the frames 122, since determining such feature point information is typically a step in object detection. In some embodiments, the computing device 110 may also again locate feature points in each frame 122 if the feature point information cannot be directly obtained from other processes.
Since the frame rate of the frames 122 may be high, and the pose of the given portion 132 of the object 102 will typically change more slowly than the frame rate, in some embodiments the computing device 110 may also sample a number of frames 122 at predetermined intervals from the frames comprised in the video 120 and select target frames from the sampled frames. That is, it may not be necessary to check every frame of the video 120 one by one for its suitability as a target frame 122. Sampling at predetermined intervals can reduce the processing load of the frame selection process without missing poses of the given portion 132.
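A minimal sketch of such interval-based sampling is shown below; the interval value is an arbitrary example and not a value from the disclosure.

```python
def sample_frames(frames, interval=5):
    """Sample candidate frames at a predetermined interval (here every 5th
    frame, an arbitrary example) instead of evaluating every frame."""
    return frames[::interval]
```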
At block 220, the computing device 110 determines, based on the feature point information, an inter-frame position change of the positional relationship of the plurality of feature points of the given portion 132 across the plurality of frames 122. The feature point information reflects the positions of the plurality of feature points of the given portion 132 in the respective frames, and a change in the posture of the given portion 132 tends to cause a change in the positional relationship of these feature points. Determining how the positional relationship of the feature points changes between the frames 122 therefore facilitates the selection of target frames from among the frames 122. Accordingly, at block 230, the computing device 110 selects, from the plurality of frames, a plurality of target frames that exhibit different poses of the given portion 132 based at least on the inter-frame position change.
Different poses of a given portion 132 may result from different movement states of the given portion 132. For example, rotation of the face from side to side and/or up and down may result in different poses of the face. In particular, turning the face to the left or right by different angles in the horizontal direction (side face) may be considered different side-face postures; tilting the face up or down by different angles in the vertical direction (head up or head down) may be considered different pitch postures; and leaning the face toward the left or right shoulder, which changes its angle in both the horizontal and vertical directions (head tilt), results in different head-tilt postures.
Since the face may change over a continuous range of angles in the horizontal and vertical directions, if the angle change is small the pose change is not visually perceptible, yet multiple frames 122 of the video 120 may still capture such poses. Conversely, if the face does not rotate for a period of time, the frames 122 captured during that period will present the same pose or poses with little difference. In embodiments of the present disclosure, the plurality of target frames 122 selected by the computing device 110 from the video 120 are expected to present different, distinctive poses of the given portion 132.
In embodiments of the present disclosure, the exact angle by which the given portion 132 is deflected in a certain direction need not be determined when selecting target frames 122; it is only required that the poses of the given portion 132 in the selected frames be visually, perceptibly different from one another. The inter-frame position change of the positional relationship of the plurality of feature points across the frames 122 provides a basis for selecting such target frames. In some embodiments, if the positional relationship of the plurality of feature points changes greatly between two frames, this means that the pose of the given portion 132 differs greatly between the two frames, and thus both frames may be selected as target frames.
In some embodiments, the frames 122 of the video 120 under consideration may be evaluated on a frame-by-frame basis as to whether each frame is suitable for selection as a target frame. In evaluating each frame 122, the computing device determines the inter-frame position change of the positional relationship of the plurality of feature points between the current frame 122 and each of the target frames that have already been selected. The current frame 122 may be selected as a target frame if the inter-frame position change determined for it exceeds a predetermined change threshold. If the inter-frame position change determined for the current frame 122 does not exceed the predetermined change threshold, the current frame 122 is not selected as a target frame and may be discarded. By performing a similar determination for each frame, a plurality of target frames having different poses can be selected from the frames 122 of the video 120.
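The frame-by-frame selection logic described above might look roughly as follows. This is a sketch only: the helper `position_change` (returning the inter-frame position change between two frames) and the threshold are assumed, and keeping the first evaluated frame is one plausible initialization rather than a rule stated by the disclosure.

```python
def select_target_frames(frames, position_change, change_threshold):
    """Sketch of the frame-by-frame selection described above: a frame is kept
    as a target frame only if its inter-frame position change relative to every
    previously selected target frame exceeds the change threshold."""
    targets = []
    for frame in frames:
        if not targets:
            targets.append(frame)  # nothing to compare against yet
        elif all(position_change(frame, t) > change_threshold for t in targets):
            targets.append(frame)
        # otherwise the frame is discarded
    return targets
```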
In some embodiments, in selecting the target frame 122, the computing device 110 may determine an intra-frame positional relationship within each frame 122 of a plurality of feature points of the given portion 132 in addition to the inter-frame positional relationship. The intra-frame positional relationship may also be determined based on the acquired feature point information. For each frame 122, the intra-frame positional relationship may indicate information about the pose of the given portion 132 within the frame to some extent, thereby facilitating determination of whether the current frame is suitable as a target frame.
How to select target frames based on the inter-frame position change and the intra-frame positional relationship determined from the feature point information will be described in more detail below with reference to fig. 5 and 6. Fig. 5 illustrates an example of a process 500 for selecting a target frame, in accordance with some embodiments of the present disclosure. In process 500, computing device 110 measures, for each frame of the plurality of frames 122 of video 120, whether it is suitable to be selected as a target frame.
Specifically, at block 510, the computing device 110 selects a given frame 122 from the unmeasured frames 122 of the plurality of frames 122. The computing device 110 may proceed in order from the beginning of the video 120, for example by first selecting the first frame 122 of the video 120. As mentioned above, in some embodiments the computing device 110 may sample some frames 122 from the video 120 at predetermined intervals for determining whether they are suitable as target frames. In other examples, the computing device 110 need not proceed in chronological order, but may randomly select the given frame 122 to be measured next.
At block 520, the computing device 110 determines an intra-frame positional relationship of a plurality of feature points of the given portion 132 within the given frame 122. The intra-frame positional relationship indicates the relative positioning of the plurality of feature points within the given frame 122. As mentioned above, within each frame 122 the locations of the feature points may be represented as coordinate locations in a two-dimensional coordinate system. Thus, in some embodiments, the computing device 110 determines an offset, in a predetermined direction, of a predetermined feature point of the plurality of feature points relative to one or more reference feature points of the plurality of feature points. The predetermined direction may include the horizontal direction or the vertical direction. Observing the offset of a predetermined feature point relative to other feature points in the horizontal or vertical direction within a given frame 122 reflects the deflection of the given portion 132 in the horizontal or vertical direction.
In some embodiments where the given portion 132 is a face, the predetermined feature points to be observed may include nose feature points P, particularly nose tip feature points, because the positional relationship of the nose feature points relative to other feature points may change more significantly as the face moves in the horizontal and/or vertical directions. In some embodiments, in determining the intra-frame positional relationship, the computing device 110 may determine the positioning of the nose feature point P with respect to the area formed by the two eye feature points AB and the two mouth corner feature points CD. The region formed by the two eye feature points AB and the two mouth corner feature points CD may be regarded as one quadrangular region. The computing device 110 may determine whether the nose feature point P exceeds this region in the horizontal or vertical direction.
For example, if the face is deflected to the left or right by a particularly large angle in the horizontal direction, i.e., the side-face angle is large, the nose feature point P may fall outside the quadrilateral region, for example to its left or right. As depicted in frame 122-3 in fig. 3, the nose feature point P falls outside the area formed by the two eye feature points AB and the two mouth corner feature points CD because of the large leftward deflection of the face. Similarly, if the face is deflected upward or downward by a particularly large angle in the vertical direction, i.e., the head-up or head-down angle is large, the nose feature point P may fall outside the quadrilateral region, for example above or below it. In contrast, if the deflection angle of the face in the horizontal or vertical direction is not particularly large, the nose feature point P remains within the quadrilateral region. As depicted in frame 122-1 in fig. 3, the nose feature point P is within the area formed by the two eye feature points AB and the two mouth corner feature points CD. Although the quadrilateral areas are not depicted in frames 122-2 and 122-4 in fig. 3, it can be seen that the nose feature point P is within the corresponding quadrilateral area in those frames.
There are many methods that can be used to determine whether the nose feature point P falls outside the area formed by the two eye feature points AB and the two mouth corner feature points CD. In one embodiment, the computing device 110 may use an area-based method: the area of the quadrilateral region (A, B, C, D) formed by the two eye feature points and the two mouth corner feature points is compared with the sum of the areas of the triangles formed by the nose feature point P and each pair of adjacent vertices, i.e., the sum of the areas of (P, A, B), (P, B, C), (P, C, D), and (P, D, A). If the area of the quadrilateral region (A, B, C, D) is equal to the sum of the areas of the four triangles, the nose feature point P does not fall outside the region formed by the two eye feature points AB and the two mouth corner feature points CD; otherwise, the nose feature point P is outside the region.
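A minimal sketch of this area test is given below, assuming the four vertices are supplied in order around a convex quadrilateral and that feature points are (x, y) tuples; the example coordinates are made up for illustration.

```python
def tri_area(p, q, r):
    """Absolute area of triangle (p, q, r) via the cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def point_in_quad(p, quad, rel_tol=1e-6):
    """Area test: for a convex quadrilateral whose vertices are given in order
    around its boundary, a point p lies inside (or on) it exactly when the
    areas of the four triangles formed by p and each edge sum to the area of
    the quadrilateral itself."""
    v0, v1, v2, v3 = quad
    quad_area = tri_area(v0, v1, v2) + tri_area(v0, v2, v3)  # split along a diagonal
    tri_sum = sum(tri_area(p, quad[i], quad[(i + 1) % 4]) for i in range(4))
    return abs(tri_sum - quad_area) <= rel_tol * max(quad_area, 1.0)


# Illustrative coordinates only: eyes A, B and mouth corners C, D, with the
# quadrilateral traversed in boundary order A-B-D-C.
A, B = (210.0, 305.0), (290.0, 306.0)
C, D = (222.0, 214.0), (281.0, 215.0)
P = (251.0, 262.0)  # nose tip
print(point_in_quad(P, (A, B, D, C)))  # True for this roughly frontal layout
```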
Of course, in addition to the area method, the determination may also be made by comparing the abscissa and ordinate values of the nose feature point P with those of the two eye feature points AB and the two mouth corner feature points CD, and so on. Embodiments of the present disclosure are not limited in this respect.
In some embodiments, in determining the intra-frame positional relationship, if it is determined that the nose feature point P is outside the area formed by the two eye feature points AB and the two mouth corner feature points CD, the computing device 110 may further determine, for the nose feature point P: whether its abscissa value is smaller than the abscissa values of both the left-eye feature point A and the left-mouth-corner feature point C (if the face is deflected leftward); whether its abscissa value is larger than the abscissa values of both the right-eye feature point B and the right-mouth-corner feature point D (if the face is deflected rightward); whether its ordinate value is larger than the ordinate values of the two eye feature points AB in the vertical direction; and so on. Further, in some embodiments, computing device 110 may also determine the extent to which the coordinates of the nose feature point P exceed or fall below those of these feature points, e.g., the difference between the abscissa of the nose feature point P and the abscissas of the left-eye feature point A and the left-mouth-corner feature point C.
In some embodiments, in addition to considering the offset of the nose feature point P with respect to the left and right eye feature points AB and the left and right mouth corner feature points CD in the horizontal or vertical direction, the relative offsets between other feature points may be considered. For example, the offset of the nose feature point P relative to one or more outer contour feature points of the face may be considered, in particular outer contour feature points at approximately the same height as the nose feature point in the vertical direction. In general, when the face is oriented straight ahead (i.e., a frontal face), the offsets between the nose feature point P and two left-right symmetric outer contour feature points are substantially equal because of the left-right symmetry of the face. Two outer contour feature points that are symmetric to each other in the left-right direction, as used herein, means two outer contour feature points at substantially the same height in the vertical direction. As the face deflects to the left or right, the horizontal offsets of the nose feature point P from these two symmetric outer contour feature points change; for example, the distance to one outer contour feature point becomes larger while the distance to the other becomes smaller. Accordingly, the computing device 110 may determine the ratio of the two horizontal distances (differences in abscissa) between the nose feature point P and the two symmetric outer contour feature points as the intra-frame positional relationship of these feature points within the given frame 122.
In the vertical direction, the offset of the nose feature point P (e.g., the nose tip feature point) with respect to other feature points (e.g., the mouth center feature point and the nose root feature point) may also be considered; for example, the ratio of the two vertical distances (differences in ordinate) from the nose tip feature point to the mouth center feature point and to the nose root feature point may be determined as the intra-frame positional relationship of these feature points within the given frame 122.
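A small sketch of these two distance-ratio computations follows; the argument names and the guard against division by zero are illustrative choices, not part of the disclosure.

```python
def horizontal_offset_ratio(nose, left_contour, right_contour):
    """Ratio of the horizontal distances from the nose feature point to a pair
    of left/right symmetric outer-contour points; roughly 1 for a frontal face
    and drifting away from 1 as the face turns left or right."""
    left_dist = abs(nose[0] - left_contour[0])
    right_dist = abs(nose[0] - right_contour[0])
    return left_dist / max(right_dist, 1e-6)  # guard against division by zero


def vertical_offset_ratio(nose_tip, mouth_center, nose_root):
    """Ratio of the vertical distances from the nose tip to the mouth-center
    and nose-root points, used analogously for up/down head movement."""
    down_dist = abs(nose_tip[1] - mouth_center[1])
    up_dist = abs(nose_tip[1] - nose_root[1])
    return down_dist / max(up_dist, 1e-6)
```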
In further examples, other feature points besides the nose feature point P, in particular, the nose tip feature point, such as a mouth corner feature point, a mouth center feature point, a feature point of the nose bridge, and so forth, may additionally or alternatively be considered, and the offset between these feature points with respect to other reference feature points may be determined as the intra-frame positional relationship of the plurality of feature points within the given frame 122. In general, intra-frame positional relationships are determined to indicate the relative positioning of a plurality of feature points within a particular frame 122.
It should be understood that, although five basic feature points are taken as an example in some of the examples above to discuss how to determine the positional relationship between feature points, this does not mean that the acquired feature point information indicates only the positions of these five feature points within the respective frames 122. In some embodiments, the feature point information may indicate more feature points of the facial features, but to facilitate determining the intra-frame positional relationship, for parts such as the left and right eyes, the nose, and the mouth, a representative feature point may be selected from among the related feature points, or computed as their average, to represent the corresponding part. Many variations of these aspects are possible. In the embodiments described below, whenever feature points of the facial features are mentioned, such variations are also applicable and will not be described again.
After determining the intra-frame positional relationship, at block 530, the computing device 110 determines whether the intra-frame positional relationship of the plurality of feature points of the given portion 132 within the current given frame 122 indicates a significant deflection pose. Since the intra-frame positional relationship may only indicate the pose of the given portion 132 within one frame 122 and may not be directly usable for determining pose changes relative to other frames, in some embodiments the intra-frame positional relationship is used to roughly determine whether the given portion 132 is in a significant deflection pose within the current frame 122. A significant deflection pose, as referred to herein, means that the given portion 132 is deflected significantly in the horizontal or vertical direction.
In some embodiments, if the intra-frame positional relationship indicates that the nose feature point P falls outside the region formed by the two eye feature points AB and the two mouth corner feature points CD in the horizontal or vertical direction, the computing device 110 determines that the given portion 132 is in a significant deflection pose within the current given frame 122, because such a relative positional relationship of these feature points occurs when the face is deflected to the left, to the right, upward, or downward by a particularly large angle, as depicted in frame 122-3 of fig. 3. In some embodiments, if the intra-frame positional relationship indicates that the nose feature point P falls outside the above-described region, the computing device 110 further determines whether the intra-frame positional relationship also indicates that the abscissa of the nose feature point P is smaller than the abscissa values of both the left-eye feature point A and the left-mouth-corner feature point C (if the face is deflected to the left), or larger than the abscissa values of both the right-eye feature point B and the right-mouth-corner feature point D (if the face is deflected to the right). If so, meaning that the face is deflected to the left or right by a large angle, the computing device 110 may determine that the given portion 132 is in a significant deflection pose within the current given frame 122.
In some embodiments, if the intra-frame positional relationship indicates an offset between the nose feature point P and one or more outer contour feature points of the face, such as the ratio of the two horizontal distances (differences in abscissa) between the nose feature point P and two symmetric outer contour feature points, the computing device 110 may also compare the determined ratio with a predetermined threshold. For example, when the face faces straight ahead, the ratio of the two distances is substantially equal to 1; as the face deflects to the left or right, this ratio gradually deviates from 1 (becoming greater or less than 1, depending on how the face deflects and which of the two distances is chosen as the denominator). Thus, a threshold value may be set. If the ratio of the two distances is greater than (or less than) the threshold, it may be determined that the given portion 132 (e.g., the face) is deflected to a very large degree, and the computing device 110 may determine that the given portion 132 is in a significant deflection pose. Similarly, if the intra-frame positional relationship indicates the ratio of the two vertical distances (differences in ordinate) from the nose tip feature point to the mouth center feature point and to the nose root feature point, the computing device 110 may likewise determine whether the given portion 132 (e.g., the face) is deflected by a significant angle, i.e., is in a significant deflection pose, by comparing this ratio to a predetermined threshold (which may differ from the threshold associated with the outer contour feature points).
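Combining the checks of block 530, a rough classification of a "significant deflection pose" might be sketched as below. It reuses the `point_in_quad`, `horizontal_offset_ratio`, and `vertical_offset_ratio` sketches from above; the dictionary keys and the bound values are assumptions, since the disclosure does not give concrete thresholds.

```python
def is_significant_deflection(pts, h_bounds=(0.5, 2.0), v_bounds=(0.5, 2.0)):
    """Rough check for a 'significant deflection pose' (block 530): the nose
    feature point lies outside the eye/mouth-corner quadrilateral, or one of
    the distance ratios deviates strongly from 1. The dictionary keys and the
    bound values are illustrative assumptions."""
    A, B = pts["left_eye"], pts["right_eye"]
    C, D = pts["left_mouth"], pts["right_mouth"]
    P = pts["nose"]
    if not point_in_quad(P, (A, B, D, C)):  # nose outside the region
        return True
    r_h = horizontal_offset_ratio(P, pts["left_contour"], pts["right_contour"])
    r_v = vertical_offset_ratio(P, pts["mouth_center"], pts["nose_root"])
    if not (h_bounds[0] <= r_h <= h_bounds[1]):
        return True
    if not (v_bounds[0] <= r_v <= v_bounds[1]):
        return True
    return False
```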
If the computing device 110 determines that the given portion 132 is not in a significant deflection pose, i.e., the given portion 132 has a small deflection angle, the computing device 110 may further determine, at block 540, whether the given frame 122 is suitable for selection as a target frame based on the inter-frame position change. How to determine whether a given frame 122 is suitable to be selected as a target frame based on the inter-frame position change will be discussed in detail below with reference to process 600 of fig. 6.
If it is determined that the given portion 132 is in a significant deflection pose, at block 550 the computing device 110 further determines whether a significant deflection pose has already been detected in the frames 122 of the video 120 measured so far. Because significant deflection poses may be repeated in the video 120, it is not desirable to always select repeated poses as target frames. Therefore, if the computing device 110 has not detected a significant deflection pose in any of the previously measured frames 122, the computing device 110 selects the current given frame as a target frame at block 560. If the computing device 110 has already detected a significant deflection pose in a previous frame 122, meaning that the current frame 122 is not suitable to be selected again, the computing device 110 discards the current given frame at block 570.
In some embodiments, although not depicted in process 500 of fig. 5, the computing device 110 may also select multiple frames 122 in significant deflection poses as target frames according to the degree of significant deflection indicated by the intra-frame positional relationship. As mentioned above, the intra-frame positional relationship may indicate that the abscissa value of the nose feature point P is smaller than the abscissa values of both the left-eye feature point A and the left-mouth-corner feature point C (if the face is deflected leftward), together with the corresponding differences between the abscissas; or, similarly, that the abscissa value of the nose feature point P is larger than the abscissa values of both the right-eye feature point B and the right-mouth-corner feature point D (if the face is deflected rightward), together with the corresponding differences. In this case, computing device 110 may treat such abscissa differences as a measure of the degree of significant deflection, and thereby select multiple frames 122 in significant deflection poses as target frames such that the target frames have different degrees of significant deflection. Other frames 122 with similar degrees of significant deflection will no longer be selected as target frames.
After the determination at block 540, process 500 proceeds to block 580 regardless of the result. Likewise, after blocks 560 and 570, process 500 proceeds to block 580. At block 580, the computing device 110 determines whether there are any unmeasured frames among the plurality of frames 122 of the video 120, i.e., frames for which it has not yet been determined whether they can be selected as target frames. If there are unmeasured frames, the computing device 110 returns to block 510 and repeats process 500. By making the determination frame by frame, the computing device 110 can evaluate each frame 122 of the plurality of frames 122 of the video 120 and select a plurality of target frames therefrom. When no unmeasured frames remain, process 500 ends.
The following continues with reference to process 600 of fig. 6 to discuss specifically how, at block 540 of process 500, it is determined whether a given frame 122 is suitable for selection as a target frame based on the inter-frame position change when the given portion 132 is in a non-significant deflection pose.
Given portion 132 may still deflect over a large range of angles in the horizontal and/or vertical directions when given portion 132 is in an insignificant deflection attitude. Accordingly, it may be desirable to select multiple frames from these ranges of angles that exhibit different poses of given portion 132, in which case the selection may refer to the inter-frame positional relationship for given portion 132 between the current given frame 122 and the previously selected target frame.
In some embodiments, for the current given frame 122, the inter-frame position change indicates how the positional relationship of the feature points of the given portion 132 changes between the current given frame 122 and each previously selected target frame, and may include a vertical deflection change and/or a horizontal deflection change of the given portion 132 across the given frame 122 and the selected target frames. A vertical deflection change refers to the change between the vertical deflection metric of the given portion 132 in the given frame 122 and its vertical deflection metric in a selected target frame; similarly, a horizontal deflection change refers to the change between the horizontal deflection metric of the given portion 132 in the given frame 122 and its horizontal deflection metric in a selected target frame.
An exemplary embodiment of how to measure the vertical deflection metric and the horizontal deflection metric within a frame to determine the vertical deflection change and the horizontal deflection change between frames is given in fig. 6. Note that in the example of fig. 6, vertical deflection changes and horizontal deflection changes are determined simultaneously to indicate inter-frame position changes. However, in other embodiments, the computing device 110 may only consider and calculate vertical deflection changes or horizontal deflection changes. If this is the case, in some steps of process 600, some parameter values associated with another deflection change will not need to be determined, and some steps associated with another deflection change may be omitted.
At block 610, for a given frame 122, if a vertical deflection metric for a given portion 132 in the given frame 122 is to be determined, the computing device 110 calculates a first angle from an angle formed by two line segments of a predetermined feature point of the given portion 132 respectively connected to a first pair of reference feature points, and calculates a second angle from the predetermined feature point to an angle formed by two line segments of a second pair of reference feature points; if a horizontal deflection metric is to be determined for a given portion 132 in a given frame 122, computing device 110 calculates a third angle from the angle formed by the two line segments connecting the predetermined feature point to the third pair of reference feature points, respectively, and calculates a fourth angle from the angle formed by the two line segments connecting the predetermined feature point to the fourth pair of reference feature points.
In embodiments where the given portion 132 is a face, the selection of the specific feature point and the reference feature points for the vertical deflection change is based on the following observation: if the face moves in the vertical direction, the ratio between the angles formed at a feature point on the left-right symmetry axis of the face by the lines connecting it to pairs of feature points on the two sides of the axis changes. In some embodiments, the specific feature point may be selected as the nose feature point P, the first pair of reference feature points may be selected as the left and right eye feature points AB, and the second pair of reference feature points may be selected as the left and right mouth corner feature points CD.
For example, in the example frames 122-1, 122-2, and 122-4 of fig. 3, the computing device 110 may calculate the included angle α1 (corresponding to the "first angle") formed by the line segments connecting the nose feature point P to the left and right eye feature points A and B, and the included angle α2 (corresponding to the "second angle") formed by the line segments connecting the nose feature point P to the left and right mouth corner feature points C and D. The calculation of the angles may be based on the positions of these feature points, for example their coordinate values in the two-dimensional coordinate system. As can be seen from the example of fig. 3, the ratio of the two angles α1 and α2 varies with the angle of deflection of the face in the vertical direction.
In addition to the nose feature points, the left and right eye feature points, and the left and right mouth corner feature points, the computing device 110 may also select any other feature points that are on the left and right axes of symmetry of the face (if the feature point information indicates the location of these feature points in a given frame 122) when determining the vertical deflection metric. The specific feature points that may be selected include a tip feature point, a feature point on the bridge of the nose (as shown in fig. 4), and so forth. In addition to the left and right eye feature points, the first pair of reference feature points may be selected as a pair of feature points that are located above a specific feature point (e.g., a nose feature point) in the vertical direction and are in left-right symmetrical positions, such as left and right eyebrow center feature points, or a pair of symmetrical outer contour feature points in the upper half of the face. The second pair of reference feature points may be selected as a pair of feature points that are located vertically below a specific feature point (e.g., a nose feature point) and at left-right symmetrical positions, such as a pair of symmetrical feature points on the mouth, or a pair of symmetrical outer contour feature points on the lower half of the face.
For the case where the given portion 132 is a face, the selection of the specific feature point and the reference feature points for the horizontal deflection change is based on the following observation: if the face moves in the horizontal direction, the ratio between the angles formed at a feature point on the up-down symmetry axis of the face by the lines connecting it to the feature points on its left side and on its right side changes. In some embodiments, the specific feature point may be selected as the nose feature point P, the third pair of reference feature points may be selected as the left-eye feature point A and the left-mouth-corner feature point C, and the fourth pair of reference feature points may be selected as the right-eye feature point B and the right-mouth-corner feature point D.
For example, in the example frames 122-1, 122-2, and 122-4 of fig. 3, the computing device 110 may calculate the included angle β1 (corresponding to the "third angle") formed by the line segments connecting the nose feature point P to the left-eye feature point A and the left-mouth-corner feature point C, and the included angle β2 (corresponding to the "fourth angle") formed by the line segments connecting the nose feature point P to the right-eye feature point B and the right-mouth-corner feature point D. The calculation of the angles may be based on the positions of these feature points, for example their coordinate values in the two-dimensional coordinate system. As can be seen from the example of fig. 3, the ratio of the two angles β1 and β2 varies with the degree of deflection of the face in the horizontal direction.
Similarly, in addition to the feature points mentioned above, the computing device 110 may also select any other feature point on the up-down symmetry axis of the face when determining the horizontal deflection metric (if the feature point information indicates the location of that feature point in the given frame 122). The specific feature points that may be selected include the nose tip feature point, a feature point on the bridge of the nose (as shown in fig. 4), and so on. The third pair of reference feature points may alternatively be selected as a pair of feature points located on the left side of the specific feature point (e.g., the nose feature point) in the horizontal direction (their abscissa values being smaller than that of the specific feature point) and at different positions above and below the specific feature point, such as the center feature point of the left eyebrow and a feature point on the left half of the mouth, or a pair of outer contour feature points on the upper and lower halves of the face. The fourth pair of reference feature points may be selected as a pair of feature points located on the right side of the specific feature point (e.g., the nose feature point) in the horizontal direction (their abscissa values being larger than that of the specific feature point) and at different positions above and below the specific feature point, such as the center feature point of the right eyebrow and a feature point on the right half of the mouth, or a pair of outer contour feature points on the upper and lower halves of the face.
At block 620, the computing device 110 determines the ratio of the first angle to the second angle, and the ratio of the third angle to the fourth angle. For example, for the vertical deflection metric of a given frame 122, in the example of FIG. 3 the computing device 110 determines

m = α1 / α2,

where m represents the ratio of the first angle α1 to the second angle α2. For the horizontal deflection metric of the given frame 122, in the example of FIG. 3 the computing device 110 determines

n = β1 / β2,

where n represents the ratio of the third angle β1 to the fourth angle β2.
At block 630, the computing device 110 determines a vertical deflection metric and a horizontal deflection metric for the given portion 132 within the given frame 122 based on the determined ratios. In some embodiments, the computing device 110 may determine the vertical deflection metric directly as the ratio of the first angle to the second angle, e.g.,

vertical deflection metric = m = α1 / α2,

and/or may determine the horizontal deflection metric directly as the ratio of the third angle to the fourth angle, e.g.,

horizontal deflection metric = n = β1 / β2.
In other embodiments, the vertical deflection metric and/or the horizontal deflection metric may also be determined as a weighted value of the ratio of the corresponding angles, and/or as that ratio summed with a predetermined offset.
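A minimal sketch of how such a metric might be formed is given below; the plain ratio is taken as the default, and the weight and offset are purely illustrative tuning parameters rather than values prescribed by the present disclosure.

```python
# Placeholder angle values; in practice they come from the angle
# computation on the frame's feature points described above.
alpha1, alpha2 = 80.0, 70.0   # first and second angles (vertical case)
beta1, beta2 = 75.0, 72.0     # third and fourth angles (horizontal case)

def deflection_metric(angle_a, angle_b, weight=1.0, offset=0.0):
    """Ratio of two angles, optionally weighted and shifted by an offset;
    with the defaults it reduces to the plain ratio."""
    return weight * (angle_a / angle_b) + offset

m = deflection_metric(alpha1, alpha2)   # vertical deflection metric
n = deflection_metric(beta1, beta2)     # horizontal deflection metric
```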
At block 640, the computing device 110 determines a vertical deflection change of the given portion 132 across the given frame 122 and the selected target frame based on the difference between the vertical deflection metric of the given portion 132 within the given frame 122 and the vertical deflection metric of the given portion 132 within the selected target frame, and determines a horizontal deflection change of the given portion 132 across the given frame 122 and the selected target frame based on the difference between the horizontal deflection metric of the given portion 132 within the given frame 122 and the horizontal deflection metric of the given portion within the selected target frame. The vertical deflection change and/or the horizontal deflection change may be taken directly as the value of the corresponding difference, or may be obtained by weighting the corresponding difference and/or summing it with a predetermined bias.
The computing device 110 may save the vertical deflection metric and the horizontal deflection metric of the given portion 132 for each previously selected target frame. For example, assume that a first list (denoted as list1) records the vertical deflection metrics determined from the previously selected target frames, and a second list (denoted as list2) records the horizontal deflection metrics determined from the previously selected target frames. The computing device 110 calculates the difference between the vertical deflection metric of the given portion 132 within the given frame 122 and each of the vertical deflection metrics in the first list list1, and calculates the difference between the horizontal deflection metric of the given portion 132 within the given frame 122 and each of the horizontal deflection metrics in the second list list2. The two lists of differences can be expressed as list1_diff = [Δ11, Δ12, …, Δ1n] and list2_diff = [Δ21, Δ22, …, Δ2n], where each Δ indicates the difference between the vertical or horizontal deflection metrics of two frames.
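The bookkeeping described above could look roughly as follows; the list names follow the text, while the numeric values and the use of absolute differences are illustrative assumptions.

```python
# Deflection metrics recorded for previously selected target frames.
list1 = [1.10, 0.85, 1.30]   # vertical deflection metrics
list2 = [0.95, 1.20, 0.80]   # horizontal deflection metrics

# Metrics of the current given frame 122.
m, n = 1.05, 1.15

# Differences against every previously selected target frame.
list1_diff = [abs(m - v) for v in list1]   # [Δ11, Δ12, ..., Δ1n]
list2_diff = [abs(n - h) for h in list2]   # [Δ21, Δ22, ..., Δ2n]
```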
In some embodiments, if a previous target frame was selected because the intra-frame positional relationship indicated a significant deflection pose, the computing device 110 may also record the vertical deflection metric and the horizontal deflection metric of the given portion 132 within that target frame in the first list and the second list, respectively. In some embodiments, if the current given frame 122 is the first frame selected for measurement from the video 120, the computing device 110 may directly determine that frame as a target frame and record the vertical deflection metric and the horizontal deflection metric of the given portion 132 within that frame into the first list and the second list, respectively.
At block 650, the computing device 110 determines whether the vertical deflection change exceeds a first change threshold. For example, the computing device 110 determines whether any of the values in the difference list list1_diff = [Δ11, Δ12, …, Δ1n] exceeds the first change threshold. The first change threshold may be configured empirically and/or according to system requirements. If the vertical deflection change exceeds the first change threshold, meaning that the given portion 132 exhibits a sufficiently significant vertical deflection change in the given frame 122 relative to an existing target frame, then at block 660 the computing device 110 selects the given frame 122 as a target frame.
If the vertical deflection change does not exceed the first change threshold,at block 670, the computing device 110 determines whether the change in horizontal deflection exceeds a second change threshold. For example, the computing device 110 determines the difference list2_ diff [ [ Δ ] ]2122,…,Δ2n]Whether any of the values exceeds a second variation threshold. The second variation threshold may be configured empirically and/or according to system requirements. The second variation threshold may or may not be the same as the first variation threshold. If the horizontal deflection change exceeds the second change threshold, which means that the given portion 132 has a more significant horizontal deflection change in the given frame 122 relative to the existing target frame, accordingly, at block 680, the computing device 110 selects the given frame 122 as the target frame.
If the vertical deflection change does not exceed the first change threshold and the horizontal deflection change does not exceed the second change threshold, at block 690, computing device 110 drops the current given frame 122.
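The decision made in blocks 650 to 690 can be sketched as below; the threshold values are hypothetical, since the text only requires that they be configured empirically and/or according to system requirements.

```python
FIRST_CHANGE_THRESHOLD = 0.15    # illustrative value
SECOND_CHANGE_THRESHOLD = 0.15   # may equal or differ from the first

def should_select(list1_diff, list2_diff):
    """Return True if the given frame should be kept as a target frame."""
    if any(d > FIRST_CHANGE_THRESHOLD for d in list1_diff):
        return True   # block 660: significant vertical deflection change
    if any(d > SECOND_CHANGE_THRESHOLD for d in list2_diff):
        return True   # block 680: significant horizontal deflection change
    return False      # block 690: drop the given frame
```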
After determining to select the given frame 122 as a target frame or to drop the given frame 122, the computing device 110 returns to block 580 of the process 500 to determine whether any frames 122 that have not yet been measured remain to be processed.
The above describes a process of selecting target frames from among the plurality of frames 122 of the video 120 based on the inter-frame positional change and the intra-frame positional relationship. Through such a process, target frames presenting different poses of the given portion 132 can be selected without requiring accurate estimation of the deflection angles of the given portion 132 in the horizontal and vertical directions, which saves computational resources and improves overall frame-selection efficiency.
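Putting these pieces together, the overall loop might be sketched as follows. This is a simplified illustration that tracks only the two deflection metrics per frame and omits the intra-frame significant-pose shortcut described earlier; the threshold defaults and example values are assumptions.

```python
def select_target_frames(frame_metrics, thr1=0.15, thr2=0.15):
    """frame_metrics: iterable of (m, n) deflection-metric pairs, one per
    measured frame. Returns the indices of the frames kept as target frames.
    The first measured frame is kept directly, as described above."""
    targets, list1, list2 = [], [], []
    for i, (m, n) in enumerate(frame_metrics):
        if not targets:
            keep = True
        else:
            keep = (any(abs(m - v) > thr1 for v in list1) or
                    any(abs(n - h) > thr2 for h in list2))
        if keep:
            targets.append(i)
            list1.append(m)   # remember the metrics of the new target frame
            list2.append(n)
        # otherwise the frame is dropped
    return targets

# Example: frames whose pose barely changes are filtered out.
print(select_target_frames([(1.0, 1.0), (1.02, 1.01), (1.4, 1.0), (1.4, 1.5)]))
# -> [0, 2, 3]
```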
Examples of the method according to the present disclosure have been described above in detail with reference to FIGS. 2 to 6; in the following, the implementation of the corresponding apparatus will be described.
Fig. 7 illustrates a block diagram of an apparatus 700 for video processing according to some embodiments of the present disclosure. The apparatus 700 may be embodied as or included in the computing device 110.
As shown, the apparatus 700 includes an obtaining module 710 configured to obtain feature point information of a given portion of an object in each of a plurality of frames of a video, the feature point information indicating positions of a plurality of feature points of the given portion in the plurality of frames, respectively. The apparatus 700 further includes a determining module 720 configured to determine, based on the feature point information, an inter-frame position change of a positional relationship of the plurality of feature points of the given portion between the plurality of frames. The apparatus 700 further includes a selection module 730 configured to select, from the plurality of frames, a plurality of target frames that present different poses of the given portion based at least on the inter-frame position change.
In some embodiments, the determining module 720 includes an inter-frame positional relationship determination module configured to determine, for a given frame of the plurality of frames, the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame if at least one frame of the plurality of frames has been selected as a target frame. In some embodiments, the selection module 730 includes a comparison-based selection module configured to select the given frame as a target frame if the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame exceeds a change threshold.
In some embodiments, the selection module 730 includes: an intra-frame positional relationship determination module configured to determine, based on the feature point information, respective intra-frame positional relationships of the plurality of feature points of the given portion within the plurality of frames; and a fine selection module configured to select the plurality of target frames based on the intra-frame positional relationships and the inter-frame position change.
In some embodiments, the fine selection module comprises: for a given frame of the plurality of frames, a pose determination module configured to determine that the given portion is in a significant deflection pose or a non-significant deflection pose within the given frame based on an intra-frame positional relationship of the plurality of feature points of the given portion within the given frame; a significant-deflection-pose-based selection module configured to select the given frame as a target frame if it is determined that the given portion is in a significant deflection pose within the given frame; and a non-significant-deflection-pose-based selection module configured to determine whether the given frame is suitable to be selected as a target frame based on the inter-frame position change if it is determined that the given portion is in a non-significant deflection pose within the given frame.
In some embodiments, the intra-frame positional relationship determination module comprises: for a given frame of the plurality of frames, an offset condition determination module configured to determine an offset condition, in a predetermined direction, of a predetermined feature point of the plurality of feature points relative to one or more reference feature points of the plurality of feature points.
In some embodiments, the given portion comprises a face, the predetermined feature points comprise nose feature points, the reference feature points comprise two eye feature points and two mouth corner feature points of the face, and the predetermined direction is a horizontal or vertical direction. In some embodiments, the offset condition determination module comprises: a region relation determination module configured to determine whether the nose feature point is beyond or within a region formed by the two eye feature points and the two mouth corner feature points in a horizontal or vertical direction.
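One plausible reading of this region test is sketched below; the bounding-extent interpretation and the coordinate layout are assumptions, not a definitive implementation of the present disclosure.

```python
def nose_outside_region(nose, left_eye, right_eye, left_mouth, right_mouth,
                        direction="horizontal"):
    """Return True if the nose feature point lies beyond the extent spanned
    by the two eye and two mouth-corner feature points along the chosen
    direction (x for horizontal, y for vertical)."""
    refs = (left_eye, right_eye, left_mouth, right_mouth)
    idx = 0 if direction == "horizontal" else 1
    lo = min(p[idx] for p in refs)
    hi = max(p[idx] for p in refs)
    return not (lo <= nose[idx] <= hi)

# Example: a nose far to the left of both eyes and mouth corners suggests a
# strongly turned (significant-deflection) face pose.
print(nose_outside_region((20, 55), (35, 40), (65, 40), (40, 70), (60, 70)))
# -> True
```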
In some embodiments, the inter-frame position relationship determination module comprises: a first angle determination module configured to calculate, for a given frame, a first angle of an included angle formed by two line segments respectively connected from a predetermined feature point of the plurality of feature points to a first pair of reference feature points of the plurality of feature points; a second angle determination module configured to calculate, for a given frame, a second angle of an included angle formed by two line segments respectively connected from the predetermined feature points to a second pair of reference feature points among the plurality of feature points; a vertical deflection metric determination module configured to determine a vertical deflection metric for the given portion within the given frame based on a ratio of the first angle to the second angle; and a vertical deflection change determination module configured to determine a vertical deflection change of the given portion across the given frame and the selected target frame based on a difference between a vertical deflection metric of the given portion within the given frame and a vertical deflection metric of the given portion within the selected target frame.
In some embodiments, the given portion comprises a face, the predetermined feature points comprise nose feature points, and the first pair of reference feature points comprises two eye feature points, and the second pair of reference points comprises two mouth corner feature points.
In some embodiments, the inter-frame position relationship determination module comprises: a third angle determination module configured to calculate, for a given frame, a third angle of an included angle formed by two line segments respectively connected from a predetermined feature point of the plurality of feature points to a third pair of reference feature points of the plurality of feature points; a fourth angle determination module configured to calculate, for the given frame, a fourth angle of an angle formed by two line segments respectively connected from the predetermined feature point to a fourth pair of reference feature points among the plurality of feature points; a horizontal deflection metric determination module configured to determine a horizontal deflection metric for the given portion within the given frame based on a ratio of the third angle to the fourth angle; and a horizontal deflection change determination module configured to determine a horizontal deflection change of the given portion across the given frame and the selected target frame based on a difference between a horizontal deflection metric of the given portion within the given frame and a horizontal deflection metric of the given portion within the selected target frame.
In some embodiments, the given portion comprises a face, the predetermined feature points comprise nose feature points, and the third pair of reference feature points comprise left eye feature points and left mouth corner feature points, and the fourth pair of reference points comprise right eye feature points and right mouth corner feature points.
FIG. 8 illustrates a block diagram of a computing device/server 800 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 800 illustrated in FIG. 8 is merely exemplary and should not be construed as limiting in any way the functionality and scope of the embodiments described herein. The computing device 110 of FIG. 1 may be implemented as or included in the computing device/server 800.
As shown in fig. 8, computing device/server 800 is in the form of a general purpose computing device. Components of computing device/server 800 may include, but are not limited to, one or more processors or processing units 810, memory 820, storage 830, one or more communication units 840, one or more input devices 850, and one or more output devices 860. The processing unit 810 may be a real or virtual processor and can perform various processes according to programs stored in the memory 820. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of computing device/server 800.
Computing device/server 800 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing device/server 800 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. The memory 820 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Storage device 830 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that may be capable of being used to store information and/or data (e.g., training data for training) and that may be accessed within computing device/server 800.
Computing device/server 800 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 820 may include a computer program product 825 having one or more program modules configured to perform the various methods or acts of the various embodiments of the disclosure.
Communication unit 840 enables communication with other computing devices over a communication medium. Additionally, the functionality of the components of computing device/server 800 may be implemented in a single computing cluster or multiple computing machines capable of communicating over a communications connection. Thus, computing device/server 800 may operate in a networked environment using logical connections to one or more other servers, network Personal Computers (PCs), or another network node.
The input device 850 may be one or more input devices such as a mouse, keyboard, trackball, or the like. The output device 860 may be one or more output devices such as a display, speakers, printer, or the like. Through the communication unit 840, the computing device/server 800 may also communicate, as desired, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device/server 800, or with any device (e.g., a network card, a modem, etc.) that enables the computing device/server 800 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium having stored thereon computer-executable instructions is provided, wherein the computer-executable instructions are executed by a processor to implement the above-described method. According to an exemplary implementation of the present disclosure, there is also provided a computer program product, tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions, which are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices and computer program products implemented in accordance with the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing has described implementations of the present disclosure, and the above description is illustrative, not exhaustive, and not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen in order to best explain the principles of various implementations, the practical application, or improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand various implementations disclosed herein.

Claims (13)

1. A method for video processing, comprising:
acquiring feature point information of a given portion of an object in a plurality of frames of a video, wherein the feature point information respectively indicates positions of a plurality of feature points of the given portion in the plurality of frames;
determining, based on the feature point information, an inter-frame position change of a positional relationship of the plurality of feature points of the given portion between the plurality of frames; and
selecting, from the plurality of frames, a plurality of target frames that assume different poses of the given portion based at least on the inter-frame position change.
2. The method of claim 1, wherein determining the inter-frame position change comprises: for a given frame of the plurality of frames,
determining the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame if at least one of the plurality of frames has been selected as a target frame, and
wherein selecting the plurality of target frames comprises:
selecting the given frame as a target frame if the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame exceeds a change threshold.
3. The method of claim 1, wherein selecting the plurality of target frames comprises:
determining, based on the feature point information, respective intra-frame positional relationships of the plurality of feature points of the given portion within the plurality of frames; and
selecting the plurality of target frames based on the intra-frame positional relationship and the inter-frame positional change.
4. The method of claim 3, wherein selecting the plurality of target frames based on the intra-frame positional relationship and the inter-frame positional change comprises:
for a given frame of the plurality of frames,
determining that the given portion is in a significant deflection pose or a non-significant deflection pose within the given frame based on an intra-frame positional relationship of the plurality of feature points of the given portion within the given frame;
selecting the given frame as a target frame if it is determined that the given portion is in the significant deflection pose within the given frame; and
determining whether the given frame is suitable to be selected as a target frame based on the inter-frame position change if it is determined that the given portion is in the non-significant deflection pose within the given frame.
5. The method of claim 3, wherein determining the intra-frame positional relationship comprises:
for a given frame of the plurality of frames,
determining an offset condition, in a predetermined direction, of a predetermined feature point of the plurality of feature points relative to one or more reference feature points of the plurality of feature points.
6. The method of claim 5, wherein the given portion comprises a face, the predetermined feature points comprise nose feature points, the reference feature points comprise two eye feature points and two mouth corner feature points of the face, the predetermined direction is a horizontal or vertical direction, and wherein determining the offset condition comprises:
determining whether the nose feature point is beyond or within an area formed by the two eye feature points and the two mouth corner feature points in a horizontal or vertical direction.
7. The method of claim 2, wherein determining the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame comprises:
calculating, for the given frame, a first angle of an included angle formed by two line segments respectively connecting a predetermined feature point of the plurality of feature points to a first pair of reference feature points of the plurality of feature points;
calculating, for the given frame, a second angle of an included angle formed by two line segments respectively connected from the predetermined feature points to a second pair of reference feature points among the plurality of feature points;
determining a vertical deflection metric for the given portion within the given frame based on a ratio of the first angle to the second angle; and
determining a vertical deflection change of the given portion across the given frame and the selected target frame based on a difference between the vertical deflection metric of the given portion within the given frame and a vertical deflection metric of the given portion within the selected target frame.
8. The method of claim 7, wherein the given portion comprises a face, the predetermined feature points comprise nose feature points, and
wherein the first pair of reference feature points comprises two eye feature points and the second pair of reference points comprises two mouth corner feature points.
9. The method of claim 6, wherein determining the inter-frame position change of the positional relationship of the plurality of feature points between the given frame and the selected target frame comprises:
calculating, for the given frame, a third angle of an included angle formed from two line segments connecting predetermined feature points of the plurality of feature points to a third pair of reference feature points of the plurality of feature points, respectively;
calculating, for the given frame, a fourth angle of an included angle formed by two line segments respectively connected from the predetermined feature point to a fourth pair of reference feature points among the plurality of feature points;
determining a horizontal deflection metric for the given portion within the given frame based on a ratio of the third angle to the fourth angle; and
determining a horizontal deflection change of the given portion across the given frame and the selected target frame based on a difference between the horizontal deflection metric of the given portion within the given frame and a horizontal deflection metric of the given portion within the selected target frame.
10. The method of claim 9, wherein the given portion comprises a face, the predetermined feature points comprise nose feature points, and
wherein the third pair of reference feature points comprises a left-eye feature point and a left-mouth-corner feature point, and the fourth pair of reference points comprises a right-eye feature point and a right-mouth-corner feature point.
11. An apparatus for video processing, comprising:
an acquisition module configured to acquire feature point information of a given portion of an object in each of a plurality of frames of a video, the feature point information indicating positions of a plurality of feature points of the given portion in the plurality of frames, respectively;
a determination module configured to determine, based on the feature point information, an inter-frame position change of a positional relationship of the plurality of feature points of the given portion between the plurality of frames; and
a selection module configured to select, from the plurality of frames, a plurality of target frames that assume different poses of the given portion based at least on the inter-frame position change.
12. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions configured, when executed by the processor, to implement the method of any one of claims 1 to 10.
13. A computer-readable storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions are executed by a processor to implement the method of any one of claims 1 to 10.
CN202010434409.XA 2020-05-21 2020-05-21 Video processing method, device, equipment and storage medium Active CN111836072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010434409.XA CN111836072B (en) 2020-05-21 2020-05-21 Video processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010434409.XA CN111836072B (en) 2020-05-21 2020-05-21 Video processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111836072A true CN111836072A (en) 2020-10-27
CN111836072B CN111836072B (en) 2022-09-13

Family

ID=72913420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010434409.XA Active CN111836072B (en) 2020-05-21 2020-05-21 Video processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111836072B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827757A (en) * 2021-01-29 2022-07-29 深圳市万普拉斯科技有限公司 Video frame selection method, video time-shrinking processing method and device and computer equipment

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499128A (en) * 2008-01-30 2009-08-05 中国科学院自动化研究所 Three-dimensional human face action detecting and tracing method based on video stream
CN101582063A (en) * 2008-05-13 2009-11-18 华为技术有限公司 Video service system, video service device and extraction method for key frame thereof
CN103210651A (en) * 2010-11-15 2013-07-17 华为技术有限公司 Method and system for video summarization
CN103810711A (en) * 2014-03-03 2014-05-21 郑州日兴电子科技有限公司 Keyframe extracting method and system for monitoring system videos
CN107077595A (en) * 2014-09-08 2017-08-18 谷歌公司 Selection and presentation representative frame are for video preview
CN107844779A (en) * 2017-11-21 2018-03-27 重庆邮电大学 A kind of video key frame extracting method
US20180089851A1 (en) * 2016-09-26 2018-03-29 Asustek Computer Inc. Distance measuring device for human body features and method thereof
CN108520250A (en) * 2018-04-19 2018-09-11 北京工业大学 A kind of human motion sequence extraction method of key frame
CN109697416A (en) * 2018-12-14 2019-04-30 腾讯科技(深圳)有限公司 A kind of video data handling procedure and relevant apparatus
CN109949412A (en) * 2019-03-26 2019-06-28 腾讯科技(深圳)有限公司 A kind of three dimensional object method for reconstructing and device
CN110992392A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Key frame selection method and device based on motion state
CN111027417A (en) * 2019-11-21 2020-04-17 复旦大学 Gait recognition method and gait evaluation system based on human body key point detection algorithm
CN111104816A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target object posture recognition method and device and camera

Also Published As

Publication number Publication date
CN111836072B (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant