Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other where no conflict arises. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice interaction application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
When the terminal devices 101, 102, 103 are hardware, an image capturing device may be mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user may capture video using an image capture device on the terminal device 101, 102, 103.
The terminal devices 101, 102, and 103 may perform processing such as face detection on frames in a video being played by the terminal devices or a video being recorded by the user; they may also analyze the face detection result (e.g., the position information of the face detection frame) and update the position of the face detection frame.
The server 105 may be a server that provides various services, such as a video processing server for storing, managing, or analyzing videos uploaded by the terminal devices 101, 102, 103. The video processing server may store a large amount of video and may transmit the video to the terminal apparatuses 101, 102, 103.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the method for generating information provided in the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for generating information is generally disposed in the terminal devices 101, 102, and 103.
It is noted that in the case where the terminal devices 101, 102, 103 can implement the related functions of the server 105, the server 105 may not be provided in the system architecture 100.
It should be further noted that the server 105 may also perform processing such as face detection on the stored videos or on the videos uploaded by the terminal devices 101, 102, and 103, and return the processing results to the terminal devices 101, 102, and 103. In that case, the method for generating information provided in the embodiments of the present application may also be executed by the server 105, and accordingly, the apparatus for generating information may also be provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present application is shown. The method for generating information comprises the following steps:
Step 201, acquiring position information of a first face detection frame obtained by performing face detection in advance on a current frame of a target video, and acquiring pre-stored position information of a second face detection frame in a frame previous to the current frame.
In the present embodiment, the execution body of the method for generating information (for example, the terminal devices 101, 102, 103 shown in fig. 1) may record or play a video. The played video may be a video pre-stored locally, or a video obtained from a server (e.g., the server 105 shown in fig. 1) via a wired or wireless connection. When recording a video, the execution body may be equipped or connected with an image capture device (e.g., a camera). It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
In this embodiment, the execution body may acquire position information of the first face detection frame obtained by performing face detection in advance on the current frame of the target video, and acquire pre-stored position information of the second face detection frame in the frame previous to the current frame. The target video may be a video currently being played, or a video being recorded by the user; it is not limited here.
Here, the current frame of the target video may be a frame of the target video whose face detection frame is to be subjected to position update. As an example, the execution subject may sequentially perform face detection on each frame in the target video in the order of the timestamps of the frames, and after performing face detection on each frame except for the first frame, may perform position correction on the obtained face detection frame. The frame to be subjected to the position correction of the face detection frame at present may be referred to as a current frame of the target video. Take the following two scenarios as examples:
In one scenario, the target video may be a video being played by the execution body. In the process of playing the target video, the execution body may perform face detection on each frame to be played, one by one, to obtain the position information of the face detection frame of that frame. For each non-first frame, after the position information of the face detection frame is obtained, that position information may be corrected before the frame is played. The frame currently awaiting face detection frame position correction is the current frame.
In another scenario, the target video may be a video that the execution body is recording. In the recording process, the execution body may perform face detection on each captured frame, one by one, to obtain the position information of the face detection frame of that frame. After the first frame is captured, face detection is performed on each subsequently captured frame, position correction is applied to the obtained face detection frame, and the frame is then displayed. The latest captured frame on which face detection frame position correction has not yet been performed is the current frame.
It should be noted that the execution body may perform face detection on the frames of the target video in various ways. As an example, the execution body may store a face detection model trained in advance. The execution body may input a frame of the target video into the pre-trained face detection model to obtain the position information of the face detection frame of that frame. The face detection model is used to detect the region where a face object in an image is located (which may be represented by a face detection frame; the face detection frame may be a rectangular frame). In practice, the face detection model may output the position information of the face detection frame. Here, the face detection model may be obtained by performing supervised training on an existing convolutional neural network, using a machine learning method, on a sample set (comprising face images and labels indicating the positions of face object regions). Various existing structures may be used for the convolutional neural network, such as DenseBox, VGGNet, ResNet, SegNet, and the like. The machine learning method and the supervised training method are well-known technologies that are widely researched and applied at present, and are not described again here.
It is to be noted that the position information of the face detection frame may be information for indicating and uniquely determining the position of the face detection frame in the frame.
Alternatively, the position information of the face detection box may include coordinates of four vertices of the face detection box.
Optionally, the position information of the face detection frame may include the coordinates of a pair of diagonal vertices of the face detection frame, such as the coordinates of the top-left vertex and the coordinates of the bottom-right vertex.
Alternatively, the position information of the face detection frame may include the coordinates of any vertex of the face detection frame together with the length and width of the face detection frame.
It should be noted that the position information is not limited to the above list, and may include other information that can be used to indicate and uniquely determine the position of the face detection frame.
Step 202, determining the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information.
In this embodiment, the execution body may determine the Intersection over Union (IoU, also referred to here as the intersection ratio) of the first face detection frame and the second face detection frame based on the acquired position information of the first face detection frame and of the second face detection frame.
In practice, the intersection ratio of two rectangles is the ratio of the area of the region where the two rectangles intersect to the area of the region of their union. The union area equals the sum of the areas of the two rectangles minus the area of their intersection. The intersection ratio is therefore a number in the interval [0, 1].
In this embodiment, the position of a face detection frame in a frame can be determined from its position information. Thus, the coordinates of each vertex of the first face detection frame in the current frame can be determined from the position information of the first face detection frame, and the coordinates of each vertex of the second face detection frame in the previous frame can be determined from the position information of the second face detection frame. As an example, if the position information of a face detection frame includes the coordinates of one vertex (e.g., the top-left vertex) and the length and width of the frame, then adding the length to the abscissa of the top-left vertex yields the top-right vertex, adding the width to the ordinate yields the bottom-left vertex, and adding both yields the bottom-right vertex.
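The vertex derivation above can be sketched as follows (a minimal illustration; the function name and the convention that the length runs along the x-axis and the width along the y-axis are assumptions for this example):

```python
def vertices_from_anchor(x, y, length, width):
    """Derive all four vertex coordinates from the top-left vertex
    plus the length (along x) and width (along y) of the frame."""
    return {
        "top_left": (x, y),
        "top_right": (x + length, y),
        "bottom_left": (x, y + width),
        "bottom_right": (x + length, y + width),
    }
```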
In this embodiment, the vertex coordinates of the first face detection frame and of the second face detection frame are thus obtained. The length and width of the rectangle in which the first and second face detection frames intersect can then be determined from these vertex coordinates, giving the area of the intersecting rectangle (which may be referred to as the intersection area). Next, the sum of the areas of the first and second face detection frames (the total area) may be calculated, and the difference between the total area and the intersection area (the union area) may be calculated. Finally, the ratio of the intersection area to the union area may be determined as the intersection ratio of the first face detection frame and the second face detection frame.
It should be noted that the intersection ratio calculation method is a well-known technique widely studied and applied at present, and is not described herein again.
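The calculation described above can be sketched as follows (a minimal sketch assuming each box is given by its top-left and bottom-right corners as a `(x1, y1, x2, y2)` tuple; the function name and tuple layout are illustrative, not part of the original method):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned rectangles.
    Each box is (x1, y1, x2, y2). Returns a value in [0, 1]."""
    # Width/height of the intersecting rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter_area = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union area = sum of the two areas minus the intersection area.
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0
```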
Step 203, determining the weight of the acquired position information of each face detection frame based on the intersection ratio.
In this embodiment, the execution body may determine the weight of the position information of the first face detection frame and the weight of the position information of the second face detection frame based on the intersection ratio determined in step 202. See in particular the following steps:
In the first step, the intersection ratio may be substituted into a pre-established formula, and the calculation result determined as the weight of the position information of the second face detection frame. The pre-established formula may be any formula satisfying the following preset conditions, and is not limited here: the larger the intersection ratio, the larger the calculation result of the formula; the smaller the intersection ratio, the smaller the calculation result; when the intersection ratio is 0, the calculation result is 0; and when the intersection ratio is 1, the calculation result is 1.
In the second step, a difference value between a preset value (e.g., 1) and the weight of the position information of the second face detection frame may be determined as the weight of the position information of the first face detection frame.
The order of determining the weight of the position information of the first face detection frame and the weight of the position information of the second face detection frame is not limited here. The execution body may modify the pre-established formula so as to determine the weight of the position information of the first face detection frame first, and then the weight of the position information of the second face detection frame.
In some optional implementations of the embodiment, the execution body may perform the exponentiation operation with the cross-over ratio as a base number and a first preset value (e.g., 6, or 3, etc.) as an exponent. Here, the first preset value may be determined by a skilled person based on a large number of data statistics and experiments. Then, the execution subject may determine the calculation result of the exponentiation as the weight of the position information of the second face detection frame, and determine a difference between a second preset value (e.g., 1) and the determined weight of the position information of the second face detection frame as the weight of the position information of the first face detection frame.
In some optional implementations of the embodiment, the execution body may perform an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value (e.g., 1) as the exponent. The reciprocal of the exponentiation result may then be determined as the weight of the position information of the second face detection frame, and the difference between the second preset value and that weight may be determined as the weight of the position information of the first face detection frame.
The execution body may also determine the weight of each piece of acquired position information in other manners, not limited to the above implementations. For example, the exponentiation may be performed with a preset value (e.g., 2 or 3) as the base and the difference between the reciprocal of the intersection ratio and the second preset value (e.g., 1) as the exponent. The reciprocal of the exponentiation result may then be determined as the weight of the position information of the second face detection frame, and the difference between the second preset value and that weight as the weight of the position information of the first face detection frame.
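The two weighting schemes above might be sketched as follows (the function names and the default exponent are illustrative choices; both satisfy the preset conditions of the first step, in that the weight of the previous frame's box is 0 when the IoU is 0 and 1 when the IoU is 1):

```python
import math

def weights_power(iou_value, k=6):
    """Power scheme: w_prev = IoU**k, w_curr = 1 - w_prev.
    k is the "first preset value"; 6 is only an illustrative choice."""
    w_prev = iou_value ** k
    return 1.0 - w_prev, w_prev

def weights_exponential(iou_value):
    """Exponential scheme: w_prev = 1 / e**(1/IoU - 1).
    As IoU -> 1 the previous box dominates (slow motion, smooth more);
    as IoU -> 0 the new detection dominates (fast motion, follow it)."""
    if iou_value <= 0.0:
        return 1.0, 0.0  # no overlap: trust the new detection entirely
    w_prev = 1.0 / math.exp(1.0 / iou_value - 1.0)
    return 1.0 - w_prev, w_prev
```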
In a conventional method, the average of the coordinates of corresponding vertices (e.g., the two top-left vertices) of the face detection frames in the previous frame and the current frame is generally used as the corrected coordinates of that vertex in the current frame, which yields the corrected coordinates of each vertex of the current frame. With this approach, when the face object moves fast, the face detection frame cannot follow its motion, the dragging effect is strong, and the accuracy is low. In the present application, by contrast, the position of the face detection frame in the current frame is corrected using weights determined from the intersection ratio. A larger intersection ratio indicates that the face object is moving more slowly; a smaller intersection ratio indicates that it is moving faster. Different weights can therefore be computed for different intersection ratios, which reduces the dragging effect and improves the timeliness and accuracy of the face detection frame.
Another conventional method determines the weight of each vertex coordinate of the face detection frame from the distance between the coordinates of the corresponding vertices (e.g., the two top-left vertices) in the previous frame and the current frame. In that method, however, the weights of the individual vertex coordinates are independent, so the face detection frame cannot be considered as a whole, and the smoothing effect is poor. In the present application, the entire area of the face detection frame is taken into account in determining the intersection ratio, and the vertex coordinates within the same face detection frame share the same weight, so the face detection frame is treated as a whole and the smoothing effect is improved.
Step 204, determining target position information of the first face detection frame based on the determined weight and the acquired position information, so as to update the position of the first face detection frame.
In this embodiment, the execution subject described above may determine the target position information of the first face detection frame based on the determined weight and the acquired position information to update the position of the first face detection frame. Here, the execution body may correct the position information of the first face detection frame based on the determined weight. That is, the vertex coordinates of the first face detection frame are corrected.
In some optional implementations of the present embodiment, the position information of the face detection frame may include the coordinates of the four vertices of the face detection frame. In this case, the execution body may correct the coordinates of the first face detection frame. Specifically, for each vertex, the following steps may be performed (the top-left vertex is described here as an example; the remaining vertices are handled in the same way):
First, weight the abscissa of the top-left vertex of the first face detection frame and the abscissa of the top-left vertex of the second face detection frame. That is, multiply the abscissa of the top-left vertex of the first face detection frame by the weight of the position information of the first face detection frame to obtain a first value, and multiply the abscissa of the top-left vertex of the second face detection frame by the weight of the position information of the second face detection frame to obtain a second value. The sum of the first value and the second value is determined as the abscissa of the corrected top-left vertex of the first face detection frame.
Second, weight the ordinate of the top-left vertex of the first face detection frame and the ordinate of the top-left vertex of the second face detection frame. That is, multiply the ordinate of the top-left vertex of the first face detection frame by the weight of the position information of the first face detection frame to obtain a third value, and multiply the ordinate of the top-left vertex of the second face detection frame by the weight of the position information of the second face detection frame to obtain a fourth value. The sum of the third value and the fourth value is determined as the ordinate of the corrected top-left vertex of the first face detection frame.
Third, combine the abscissa and the ordinate obtained in the first two steps into the coordinates of the corrected top-left vertex of the first face detection frame.
After the coordinates of each vertex of the first face detection frame are corrected, the execution body may combine the corrected vertex coordinates into the target position information. The position of the first face detection frame can thus be updated.
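The per-vertex correction above amounts to a weighted average with a single pair of weights shared by all vertices, which might be sketched as follows (an illustrative helper; the list-of-tuples layout is an assumption of this example):

```python
def blend_boxes(curr_vertices, prev_vertices, w_curr, w_prev):
    """Weighted average of corresponding vertex coordinates.
    Both boxes are lists of (x, y) tuples in the same vertex order;
    the same pair of weights applies to every vertex, so the frame
    is corrected as a whole."""
    return [
        (w_curr * cx + w_prev * px, w_curr * cy + w_prev * py)
        for (cx, cy), (px, py) in zip(curr_vertices, prev_vertices)
    ]
```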
In some optional implementations of the embodiment, the position information of the first face detection frame includes the specified diagonal vertex coordinates of the first face detection frame, and the position information of the second face detection frame includes the specified diagonal vertex coordinates of the second face detection frame. The specified diagonal vertex coordinates of the first face detection frame may include the coordinates of a first vertex (e.g., the top-left vertex) and of a second vertex (e.g., the bottom-right vertex); those of the second face detection frame may include the coordinates of a third vertex (e.g., the top-left vertex) and of a fourth vertex (e.g., the bottom-right vertex). In this case, the execution body may use the weight of the position information of the first face detection frame as the weight of the specified diagonal vertex coordinates of the first face detection frame, use the weight of the position information of the second face detection frame as the weight of the specified diagonal vertex coordinates of the second face detection frame, and determine the result of the weighted calculation between the two sets of specified diagonal vertex coordinates as the target diagonal vertex coordinates of the first face detection frame, thereby updating the position of the first face detection frame.
Optionally, the target diagonal vertex coordinates of the first face detection box may be calculated according to the following operation sequence:
First, a result of weighted calculation of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate may be determined as a first target abscissa;
next, a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate may be determined as a first target ordinate;
Next, a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate may be determined as a second target abscissa;
Next, a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate may be determined as a second target ordinate;
Finally, coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate may be determined as target diagonal vertex coordinates of the first face detection frame. Since the coordinates of a set of diagonal vertices are known, the location of the rectangular box can be uniquely determined. Therefore, the position of the first face detection frame can be updated.
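The four weighted calculations above might be sketched as follows (an illustrative helper; the nested-tuple layout and the assumption that the diagonal runs top-left to bottom-right are choices made for this example):

```python
def update_diagonal(curr_diag, prev_diag, w_curr, w_prev):
    """Target diagonal vertex coordinates of the first (current) box.
    curr_diag / prev_diag: ((x1, y1), (x2, y2)) -- the specified
    diagonal vertices (e.g., top-left and bottom-right)."""
    (cx1, cy1), (cx2, cy2) = curr_diag
    (px1, py1), (px2, py2) = prev_diag
    tx1 = w_curr * cx1 + w_prev * px1  # first target abscissa
    ty1 = w_curr * cy1 + w_prev * py1  # first target ordinate
    tx2 = w_curr * cx2 + w_prev * px2  # second target abscissa
    ty2 = w_curr * cy2 + w_prev * py2  # second target ordinate
    return (tx1, ty1), (tx2, ty2)
```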
It should be noted that, in this implementation, other operation sequences may also be used to calculate the target diagonal vertex coordinates of the first face detection frame, which are not described again here.
It should be noted that, in this implementation, after the target diagonal vertex coordinates are calculated, another pair of diagonal vertex coordinates of the first face detection box may also be calculated according to the target diagonal vertex coordinates. Thereby obtaining coordinates of four vertices of the first face detection box.
In some optional implementations of the present embodiment, the position information of the face detection frame may include the coordinates of any vertex of the face detection frame together with its length and width. In this case, the execution body may first determine, from the coordinates of that vertex and the length and width, the coordinates of the diagonally opposite vertex (or, alternatively, the coordinates of the remaining three vertices). The target position information of the first face detection frame may then be determined using the operations described in the above two implementations, thereby updating the position of the first face detection frame.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 3, the user records a target video using the self-timer mode of the terminal device 301.
After capturing the first frame, the terminal device performs face detection on the first frame by using the stored face detection model, and obtains position information 302 of the face detection frame in the first frame.
After capturing the second frame, the terminal device performs face detection on the second frame by using the stored face detection model, and obtains the position information 303 of the face detection frame of the second frame. Meanwhile, the position information 302 of the face detection frame in the first frame is acquired. Next, the intersection ratio of the two face detection frames may be determined based on the position information 302 and the position information 303. Thereafter, the weight of the position information 302 and the weight of the position information 303 may be determined based on the intersection ratio. Finally, target position information 304 of the face detection frame of the second frame (i.e., its final position information) may be determined based on the determined weights and the position information 302 and 303.
After capturing the third frame, the terminal device performs face detection on the third frame by using the stored face detection model, and obtains the position information 305 of the face detection frame of the third frame. At the same time, the updated position information of the face detection frame in the second frame (i.e., the target position information 304) is acquired. Next, the intersection ratio of the two face detection frames may be determined based on the target position information 304 and the position information 305. Thereafter, the weight of the target position information 304 and the weight of the position information 305 may be determined based on the intersection ratio. Finally, target position information 306 of the face detection frame of the third frame (i.e., its final position information) may be determined based on the determined weights and the target position information 304 and position information 305.
And so on. Finally, the terminal device 301 may obtain the position information of the face detection frame in each frame in the recorded video.
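The frame-by-frame procedure above might be sketched as follows (an illustrative sketch: the IoU helper is repeated for self-containment, the power weighting with exponent 6 stands in for any formula satisfying the preset conditions, and the raw detector outputs are taken as given):

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def smooth_detections(raw_boxes):
    """Blend each frame's raw detection with the previous frame's
    *updated* (target) box, as in the fig. 3 scenario."""
    smoothed = []
    prev = None
    for box in raw_boxes:
        if prev is not None:
            w_prev = _iou(box, prev) ** 6  # illustrative weighting formula
            w_curr = 1.0 - w_prev
            box = tuple(w_curr * c + w_prev * p for c, p in zip(box, prev))
        smoothed.append(box)
        prev = box  # the corrected box feeds the next frame
    return smoothed
```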
In the method provided by the above embodiment of the present application, the pre-generated position information of the first face detection frame of the current frame of the target video and the position information of the second face detection frame of the previous frame are acquired, so that the intersection ratio of the first face detection frame and the second face detection frame can be determined from the acquired position information. The weight of the position information of each acquired face detection frame is then determined based on the intersection ratio. Finally, target position information of the first face detection frame may be determined based on the determined weights and the acquired position information, so as to update the position of the first face detection frame. In this way, the position of the face detection frame of the later frame can be adjusted based on the intersection ratio of the face detection frames of the two adjacent frames. Because the position of the face detection frame of the later frame takes into account the position of the face detection frame of the previous frame, and the entire area of the previous frame's face detection frame is considered rather than individual coordinates, the jitter of the face detection frame in the video is reduced, and the smoothing effect and movement stability of the face detection frame in the video are improved.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
Step 401, obtaining position information of a first face detection frame obtained after performing face detection on a current frame of a target video in advance, and obtaining position information of a second face detection frame in a previous frame of the current frame stored in advance.
In the present embodiment, the execution subject of the method for generating information (for example, the terminal devices 101, 102, 103 shown in fig. 1) may acquire position information of a first face detection frame obtained after face detection is performed in advance on a current frame of a target video, and acquire position information of a second face detection frame obtained after face detection is performed in advance on the frame previous to the current frame.
In this embodiment, the position information of the first face detection frame may include designated diagonal vertex coordinates (e.g., coordinates of an upper left vertex and a lower right vertex) of the first face detection frame, and the position information of the second face detection frame may include designated diagonal vertex coordinates of the second face detection frame.
Step 402, determining the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information.
In this embodiment, the executing body may determine, by using the position information of the first face detection frame, coordinates of each remaining vertex of the first face detection frame in the current frame, so as to obtain coordinates of each vertex of the first face detection frame. Similarly, the coordinates of each vertex of the second face detection frame in the previous frame of the current frame can be determined through the position information of the second face detection frame. Then, the length and the width of the rectangle in which the first face detection frame and the second face detection frame intersect may be determined by using the vertex coordinates of the first face detection frame and the vertex coordinates of the second face detection frame. Further, the area of the intersecting rectangle (which may be referred to as the intersection area) can be obtained. Then, the sum of the areas of the first face detection frame and the second face detection frame (which may be referred to as the total area) may be calculated, and the difference between the total area and the intersection area (which may be referred to as the union area) may be calculated. Finally, the ratio of the intersection area to the union area may be determined as the intersection ratio of the first face detection frame and the second face detection frame.
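The computation of step 402 can be sketched as follows. This is an illustrative Python sketch; the function name `intersection_ratio` is a hypothetical label, and each box is given by its specified diagonal vertex coordinates (top-left and bottom-right).

```python
def intersection_ratio(box1, box2):
    """Intersection ratio (IoU) of two axis-aligned boxes, each given as
    (x_top_left, y_top_left, x_bottom_right, y_bottom_right)."""
    # Length and width of the intersecting rectangle, clamped to zero
    # when the two boxes do not overlap.
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    intersection = inter_w * inter_h
    # Total area is the sum of both box areas; the union area is the
    # total area minus the intersection area.
    total = ((box1[2] - box1[0]) * (box1[3] - box1[1])
             + (box2[2] - box2[0]) * (box2[3] - box2[1]))
    union = total - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two unit-offset 2x2 boxes overlap in a 1x1 rectangle, giving an intersection area of 1 and a union area of 7.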
Step 403, performing a power operation by using a natural constant as a base number and a difference between the reciprocal of the intersection ratio and a second preset value as an exponent.
In this embodiment, the execution body may perform a power operation with a natural constant as a base number and a difference between the reciprocal of the intersection ratio and a second preset value (e.g., 1) as an exponent.
Step 404, determining the reciprocal of the result of the power operation as the weight of the position information of the second face detection frame, and determining the difference between the second preset value and the weight as the weight of the position information of the first face detection frame.
In this embodiment, the execution subject may determine the reciprocal of the result of the power operation as the weight of the position information of the second face detection frame, and determine the difference between the second preset value (for example, 1) and the determined weight as the weight of the position information of the first face detection frame.
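The weight computation of steps 403 and 404 can be sketched as follows. This is an illustrative Python sketch; the function name `detection_frame_weights` is a hypothetical label, and the second preset value defaults to 1 as in the example above.

```python
import math

def detection_frame_weights(intersection_ratio, second_preset=1.0):
    """Step 403: raise e to the power (1/ratio - second_preset).
    Step 404: the reciprocal of that result is the weight of the second
    (previous-frame) box; the current-frame weight is second_preset
    minus the previous-frame weight."""
    power = math.exp(1.0 / intersection_ratio - second_preset)
    w_prev = 1.0 / power             # weight of the second face detection frame
    w_curr = second_preset - w_prev  # weight of the first face detection frame
    return w_curr, w_prev
```

With this form, an intersection ratio of 1 (no motion) gives the previous frame full weight, and the previous-frame weight decays rapidly as the ratio shrinks, which is the behavior discussed below fig. 4.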
Step 405, using the weight of the position information of the first face detection frame as the weight of the specified diagonal vertex coordinates of the first face detection frame, using the weight of the position information of the second face detection frame as the weight of the specified diagonal vertex coordinates of the second face detection frame, and determining the weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as the target diagonal vertex coordinates of the first face detection frame so as to update the position of the first face detection frame.
In the present embodiment, the execution subject may take the weight of the position information of the first face detection frame as the weight of the specified diagonal vertex coordinates of the first face detection frame, take the weight of the position information of the second face detection frame as the weight of the specified diagonal vertex coordinates of the second face detection frame, and determine the weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as the target diagonal vertex coordinates of the first face detection frame, so as to update the position of the first face detection frame. The specified diagonal vertex coordinates of the first face detection frame may include coordinates of a first vertex (e.g., the top-left vertex) and coordinates of a second vertex (e.g., the bottom-right vertex). The specified diagonal vertex coordinates of the second face detection frame may include coordinates of a third vertex (e.g., the top-left vertex) and coordinates of a fourth vertex (e.g., the bottom-right vertex).
Specifically, the result of the weighted calculation of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate described above may be first determined as the first target abscissa. Next, a result of weighting calculation of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate may be determined as the first target ordinate. Next, the result of the weighted calculation of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate may be determined as the second target abscissa. Next, the result of the weighted calculation of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate may be determined as the second target ordinate. Finally, coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate may be determined as target diagonal vertex coordinates of the first face detection frame. Since the coordinates of a set of diagonal vertices are known, the location of the rectangular box can be uniquely determined. Therefore, the position of the first face detection frame can be updated.
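The coordinate-wise weighting of step 405 can be sketched as follows. This is an illustrative Python sketch; the function name `update_box_position` is a hypothetical label, and each box is passed as its specified diagonal vertex coordinates.

```python
def update_box_position(curr_box, prev_box, w_curr, w_prev):
    """Blend the specified diagonal vertices coordinate by coordinate.
    curr_box holds the first and second vertex coordinates; prev_box
    holds the third and fourth vertex coordinates."""
    (x1, y1, x2, y2) = curr_box
    (x3, y3, x4, y4) = prev_box
    return (w_curr * x1 + w_prev * x3,   # first target abscissa
            w_curr * y1 + w_prev * y3,   # first target ordinate
            w_curr * x2 + w_prev * x4,   # second target abscissa
            w_curr * y2 + w_prev * y4)   # second target ordinate
```

Since a set of diagonal vertices uniquely determines a rectangle, the returned coordinates fully specify the updated position of the first face detection frame.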
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating information in the present embodiment highlights the step of determining the weights of the face detection frames of the current frame and the previous frame respectively. When the intersection ratio of the first face detection frame and the second face detection frame is small, the moving amplitude of the face object from the previous frame to the current frame is large. At this time, the weight determined by the method of the present embodiment is larger for the position information of the first face detection frame (the face detection frame of the current frame) and smaller for the position information of the second face detection frame (the face detection frame of the previous frame). When the intersection ratio of the first face detection frame and the second face detection frame is large, the moving amplitude of the face object from the previous frame to the current frame is small. At this time, the weight determined by the method of this embodiment is smaller for the position information of the first face detection frame and larger for the position information of the second face detection frame. Therefore, the face detection frame can move smoothly, the shake of the face detection frame in the video is further reduced, and the smoothing effect and the moving stability of the face detection frame in the video are further improved.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating information according to the present embodiment includes: an obtaining unit 501 configured to obtain position information of a first face detection frame obtained after a current frame of a target video is subjected to face detection in advance, and obtain position information of a second face detection frame stored in advance in a previous frame of the current frame; a first determination unit 502 configured to determine an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; a second determining unit 503 configured to determine a weight of the acquired position information of each face detection frame based on the intersection ratio; an updating unit 504 configured to determine target position information of the first face detection frame based on the determined weight and the acquired position information to update a position of the first face detection frame.
In some optional implementations of the present embodiment, the second determining unit 503 may include a first operation module and a first determining module (not shown in the figure). The first operation module may be configured to perform a power operation with the intersection ratio as a base number and a first preset value as an exponent. The first determining module may be configured to determine the calculation result of the power operation as the weight of the position information of the second face detection frame, and determine the difference between a second preset value and the weight as the weight of the position information of the first face detection frame.
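The first optional implementation above raises the intersection ratio itself to a preset power instead of using the exponential form. A minimal Python sketch follows; the function name and the default of 2.0 for the first preset value are illustrative assumptions, not values fixed by the embodiment.

```python
def detection_frame_weights_power(intersection_ratio,
                                  first_preset=2.0, second_preset=1.0):
    """Raise the intersection ratio to the first preset value; the result
    is the previous-frame weight, and the current-frame weight is the
    second preset value minus that result."""
    w_prev = intersection_ratio ** first_preset
    w_curr = second_preset - w_prev
    return w_curr, w_prev
```

As with the exponential variant, a larger intersection ratio shifts weight toward the previous frame's box, so heavily overlapping detections are smoothed more strongly.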
In some optional implementations of the present embodiment, the second determining unit 503 may include a second operation module and a second determining module (not shown in the figure). The second operation module may be configured to perform a power operation with a natural constant as a base number and a difference between the reciprocal of the intersection ratio and a second preset value as an exponent. The second determining module may be configured to determine the reciprocal of the result of the power operation as the weight of the position information of the second face detection frame, and determine the difference between the second preset value and the weight as the weight of the position information of the first face detection frame.
In some optional implementations of the embodiment, the position information of the first face detection box may include specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box may include specified diagonal vertex coordinates of the second face detection box. And, the update unit 504 may be further configured to: and determining a weight of position information of the first face detection frame as a weight of a designated diagonal vertex coordinate of the first face detection frame, a weight of position information of the second face detection frame as a weight of a designated diagonal vertex coordinate of the second face detection frame, and a result of a weighted calculation of the designated diagonal vertex coordinate of the first face detection frame and the designated diagonal vertex coordinate of the second face detection frame as a target diagonal vertex coordinate of the first face detection frame, so as to update the position of the first face detection frame.
In some optional implementations of the embodiment, the specified diagonal vertex coordinates of the first face detection box may include a first vertex coordinate and a second vertex coordinate, and the specified diagonal vertex coordinates of the second face detection box may include a third vertex coordinate and a fourth vertex coordinate. And, the update unit 504 may be further configured to: determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa; determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate; determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa; determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate; and determining coordinates consisting of the first target abscissa and the first target ordinate, and coordinates consisting of the second target abscissa and the second target ordinate, as target diagonal vertex coordinates of the first face detection frame.
In the apparatus provided by the above embodiment of the present application, the obtaining unit 501 obtains the position information of the first face detection frame of the current frame of the target video and the position information of the second face detection frame of the previous frame, which are generated in advance, so that the first determining unit 502 may determine the intersection ratio of the first face detection frame and the second face detection frame based on the obtained position information. Then, the second determining unit 503 determines the weight of the obtained position information of each face detection frame based on the intersection ratio. Finally, the updating unit 504 may determine the target position information of the first face detection frame based on the determined weights and the obtained position information to update the position of the first face detection frame. Therefore, the position of the face detection frame of the later frame can be adjusted based on the intersection ratio of the face detection frames of the two adjacent frames. Because the position of the face detection frame of the later frame takes into account the position of the face detection frame of the previous frame, and the whole area of the face detection frame of the previous frame is considered rather than a single coordinate, the shake of the face detection frame in the video is reduced, and the smoothing effect and the moving stability of the face detection frame in the video are improved.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a first determination unit, a second determination unit, and an update unit. The names of these units do not in some cases form a limitation on the unit itself, and for example, the updating unit may also be described as a "unit that updates the position of the first face detection frame".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may be present separately and not assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire position information of a first face detection frame obtained after face detection is performed on a current frame of a target video in advance, and acquire position information of a second face detection frame obtained after face detection is performed on a previous frame of the current frame in advance; determine the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; determine the weight of the acquired position information of each face detection frame based on the intersection ratio; and determine target position information of the first face detection frame based on the determined weights and the acquired position information to update the position of the first face detection frame.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.