CN109308469B - Method and apparatus for generating information


Info

Publication number
CN109308469B
Authority
CN
China
Prior art keywords
face detection
detection frame
position information
weight
frame
Prior art date
Legal status
Active
Application number
CN201811110674.1A
Other languages
Chinese (zh)
Other versions
CN109308469A (en)
Inventor
吴兴龙
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811110674.1A priority Critical patent/CN109308469B/en
Priority to PCT/CN2018/115974 priority patent/WO2020056903A1/en
Publication of CN109308469A publication Critical patent/CN109308469A/en
Application granted granted Critical
Publication of CN109308469B publication Critical patent/CN109308469B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating information. One embodiment of the method comprises: acquiring position information of a first face detection frame obtained by performing face detection in advance on a current frame of a target video, and acquiring position information of a second face detection frame obtained by performing face detection in advance on the frame previous to the current frame; determining an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; determining the weight of the acquired position information of each face detection frame based on the intersection ratio; and determining target position information of the first face detection frame based on the determined weights and the acquired position information, so as to update the position of the first face detection frame. This embodiment improves the smoothing effect of the face detection frame.

Description

Method and apparatus for generating information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating information.
Background
Face detection refers to the process of searching any given image according to a certain strategy to determine whether it contains a face object and, if so, returning the position and size of the face object; the returned result can be presented in the image in the form of a face detection frame.
When face detection is performed on a face object in a video, a face detection frame is generated in each frame. The related approach is to perform face detection directly on every frame to obtain, for each frame, a face detection frame indicating the face object.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, where the method includes: acquiring position information of a first face detection frame obtained after face detection is performed on a current frame of a target video in advance, and acquiring position information of a second face detection frame in a previous frame of the current frame which is stored in advance; determining the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; determining the weight of the acquired position information of each face detection frame based on the intersection ratio; target position information of the first face detection frame is determined based on the determined weight and the acquired position information to update the position of the first face detection frame.
In some embodiments, determining the weight of the acquired position information of each face detection frame based on the intersection ratio includes: performing an exponentiation with the intersection ratio as the base and a first preset value as the exponent; determining the result of the exponentiation as the weight of the position information of the second face detection frame, and determining the difference between a second preset value and that weight as the weight of the position information of the first face detection frame.
In some embodiments, determining the weight of the acquired position information of each face detection frame based on the intersection ratio includes: performing an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value as the exponent; determining the reciprocal of the result of the exponentiation as the weight of the position information of the second face detection frame, and determining the difference between the second preset value and that weight as the weight of the position information of the first face detection frame.
In some embodiments, the position information of the first face detection box includes specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box includes specified diagonal vertex coordinates of the second face detection box; and determining target position information of the first face detection frame based on the determined weight and the acquired position information to update the position of the first face detection frame, including: and determining a weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame so as to update the position of the first face detection frame.
In some embodiments, the specified diagonal vertex coordinates of the first face detection box comprise first vertex coordinates and second vertex coordinates, and the specified diagonal vertex coordinates of the second face detection box comprise third vertex coordinates and fourth vertex coordinates; and determining a weighted calculation result of the specified diagonal vertex coordinates of the first face detection box and the specified diagonal vertex coordinates of the second face detection box as target diagonal vertex coordinates of the first face detection box, including: determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa; determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate; determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa; determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate; and determining coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate as target diagonal vertex coordinates of the first face detection frame.
In a second aspect, an embodiment of the present application provides an apparatus for generating information, where the apparatus includes: an acquisition unit configured to acquire position information of a first face detection frame obtained after a current frame of a target video is subjected to face detection in advance, and acquire position information of a second face detection frame in a previous frame of the current frame stored in advance; a first determination unit configured to determine an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; a second determination unit configured to determine a weight of the acquired position information of each face detection frame based on the intersection ratio; an updating unit configured to determine target position information of the first face detection frame based on the determined weight and the acquired position information to update a position of the first face detection frame.
In some embodiments, the second determining unit comprises: a first operation module configured to perform an exponentiation with the intersection ratio as the base and a first preset value as the exponent; a first determination module configured to determine the result of the exponentiation as the weight of the position information of the second face detection frame, and determine the difference between a second preset value and the weight as the weight of the position information of the first face detection frame.
In some embodiments, the second determining unit comprises: a second operation module configured to perform an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value as the exponent; a second determination module configured to determine the reciprocal of the result of the exponentiation as the weight of the position information of the second face detection frame, and determine the difference between the second preset value and the weight as the weight of the position information of the first face detection frame.
In some embodiments, the position information of the first face detection box includes specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box includes specified diagonal vertex coordinates of the second face detection box; and an update unit further configured to: and determining a weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame so as to update the position of the first face detection frame.
In some embodiments, the specified diagonal vertex coordinates of the first face detection box comprise first vertex coordinates and second vertex coordinates, and the specified diagonal vertex coordinates of the second face detection box comprise third vertex coordinates and fourth vertex coordinates; and an update unit further configured to: determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa; determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate; determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa; determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate; and determining coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate as target diagonal vertex coordinates of the first face detection frame.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the embodiments of the first aspect described above.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method according to any one of the embodiments of the first aspect.
According to the method and apparatus for generating information, the position information of the first face detection frame of the current frame of the target video and the position information of the second face detection frame of the previous frame are acquired, so that the intersection ratio of the first face detection frame and the second face detection frame can be determined based on the acquired position information. Then, the weight of the acquired position information of each face detection frame is determined based on the intersection ratio. Finally, target position information of the first face detection frame can be determined based on the determined weights and the acquired position information, so as to update the position of the first face detection frame. In this way, the position of the face detection frame of the later frame is adjusted based on the intersection ratio of the face detection frames of the previous and current frames. The position of the face detection frame of the later frame thus takes the position of the face detection frame of the previous frame into account, and considers the whole area of the previous frame's face detection frame rather than individual coordinates, which reduces jitter of the face detection frame in the video and improves the smoothness and stability of its movement.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating information according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for generating information according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below in conjunction with the embodiments and with reference to the accompanying drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice interaction application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminal devices 101, 102, 103 are hardware, an image capturing device may be mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user may capture video using an image capture device on the terminal device 101, 102, 103.
The terminal devices 101, 102, and 103 may perform processing such as face detection on frames in a video played by the terminal devices or a video recorded by a user; the face detection result (e.g., the position information of the face detection frame) may also be analyzed and the like, and the position of the face detection frame may be updated.
The server 105 may be a server that provides various services, such as a video processing server for storing, managing, or analyzing videos uploaded by the terminal devices 101, 102, 103. The video processing server may store a large amount of video and may transmit the video to the terminal apparatuses 101, 102, 103.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for generating information provided in the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for generating information is generally disposed in the terminal devices 101, 102, and 103.
It is noted that in the case where the terminal devices 101, 102, 103 can implement the related functions of the server 105, the server 105 may not be provided in the system architecture 100.
It should be further noted that the server 105 may also perform processing such as face detection on the stored video or the video uploaded by the terminal devices 101, 102, and 103, and return the processing result to the terminal devices 101, 102, and 103. In this case, the method for generating information provided in the embodiments of the present application may also be executed by the server 105, and accordingly, the apparatus for generating information may also be provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present application is shown. The method for generating information comprises the following steps:
Step 201, obtaining position information of a first face detection frame obtained after performing face detection on a current frame of a target video in advance, and obtaining position information of a second face detection frame in a previous frame of the current frame stored in advance.
In the present embodiment, the execution subject of the method for generating information (for example, the terminal apparatuses 101, 102, 103 shown in fig. 1) may perform recording or playing of a video. The played video can be a video pre-stored locally, or a video obtained from a server (e.g., server 105 shown in fig. 1) via a wired connection or a wireless connection. Here, when recording a video, the execution main body may be mounted with or connected to an image capture device (e.g., a camera). It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra-Wideband) connection, and other wireless connection means now known or developed in the future.
In this embodiment, the execution subject may obtain position information of a first face detection frame obtained after performing face detection on a current frame of the target video in advance, and obtain position information of a second face detection frame stored in advance in a previous frame of the current frame. The target video may be a video currently being played or a video being recorded by a user. And is not limited herein.
Here, the current frame of the target video may be a frame of the target video whose face detection frame is to be subjected to position update. As an example, the execution subject may sequentially perform face detection on each frame in the target video in the order of the timestamps of the frames, and after performing face detection on each frame except for the first frame, may perform position correction on the obtained face detection frame. The frame to be subjected to the position correction of the face detection frame at present may be referred to as a current frame of the target video. Take the following two scenarios as examples:
In one scenario, the target video may be a video being played by the executing entity. In the process of playing the target video, the execution main body can carry out face detection on each frame to be played one by one to obtain the position information of the face detection frame of the frame. When the frame is a non-first frame, after the position information of the face detection frame is obtained, the position information of the face detection frame of the frame can be corrected, and then the frame is played. The frame to be subjected to the face detection frame position correction at the current time may be the current frame.
In another scenario, the target video may be a video that the execution subject is recording. In the recording process of the target video, the execution main body can carry out face detection on each captured frame one by one to obtain the position information of the face detection frame of the frame. After the first frame is captured, once face detection has been performed on each subsequently captured frame, position correction can be performed on the obtained face detection frame, and then the frame is displayed. The latest frame obtained at the current time on which the position correction of the face detection frame has not yet been performed may be the current frame.
It should be noted that the execution subject may perform face detection on the frames of the target video in various ways. As an example, the execution subject may store a face detection model trained in advance. The execution subject can input a frame of the target video into the pre-trained face detection model to obtain the position information of the face detection frame of that frame. The face detection model may be used to detect the region where a face object in an image is located (which may be represented by a face detection frame, where the face detection frame may be a rectangular frame). In practice, the face detection model may output position information of the face detection frame. Here, the face detection model may be obtained by performing supervised training on an existing convolutional neural network based on a sample set (including face images and labels indicating the positions of face object regions) by using a machine learning method. Various existing structures can be used for the convolutional neural network, such as DenseBox, VGGNet, ResNet, SegNet, and the like. It should be noted that the machine learning method and the supervised training method are well-known technologies that are widely researched and applied at present, and are not described herein again.
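The patent does not prescribe a specific detector, so the following minimal sketch is illustrative only: it uses OpenCV's bundled Haar cascade as a stand-in for the CNN-based face detection model described above; the function name, the cascade file, and the single-face assumption are not part of the patent.

```python
# Illustrative stand-in for the face detection model: OpenCV's Haar
# cascade rather than the CNN-based detector described in the patent.
import cv2

def detect_face_box(frame):
    """Return (x, y, w, h) of the first detected face, or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return int(x), int(y), int(w), int(h)
```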
It is to be noted that the position information of the face detection frame may be information for indicating and uniquely determining the position of the face detection frame in the frame.
Alternatively, the position information of the face detection box may include coordinates of four vertices of the face detection box.
Optionally, the position information of the face detection frame may include coordinates of any diagonal vertex of the face detection frame. Such as the coordinates of the top left vertex and the coordinates of the bottom right vertex.
alternatively, the position information of the face detection box may include coordinates of any vertex of the face detection box and the length and width of the face detection box.
It should be noted that the position information is not limited to the above list, and may include other information that can be used to indicate and uniquely determine the position of the face detection frame.
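For illustration, a minimal Python sketch of converting between the representations listed above, assuming image coordinates with the origin at the upper-left corner and y increasing downward (the helper names and the axis convention are assumptions, not taken from the patent):

```python
# Convert (vertex + length/width) position information into the other
# two forms listed above; the x-right, y-down convention is an assumption.
def diagonal_vertices_from_xywh(x, y, box_w, box_h):
    """(upper-left x, upper-left y, width, height) -> the two diagonal vertices."""
    return (x, y), (x + box_w, y + box_h)      # upper-left and lower-right

def four_vertices(upper_left, lower_right):
    """Diagonal vertices -> coordinates of all four vertices."""
    (x1, y1), (x2, y2) = upper_left, lower_right
    return [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
```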
Step 202, determining the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information.
In this embodiment, the execution subject may determine the intersection-over-union (IoU) ratio of the first face detection frame and the second face detection frame based on the acquired position information of the first face detection frame and the acquired position information of the second face detection frame.
In practice, the intersection ratio of two rectangles is the ratio of the area of the region where the two rectangles intersect to the area of their union. Here, the area of the union of the two rectangles equals the sum of their areas minus the area of their intersection. In practice, the intersection ratio is a number in the interval [0, 1].
In this embodiment, the position of a face detection frame in a frame can be determined from its position information. Therefore, the coordinates of each vertex of the first face detection frame in the current frame can be determined from the position information of the first face detection frame, and the coordinates of each vertex of the second face detection frame in the previous frame can be determined from the position information of the second face detection frame. As an example, if the position information of a face detection frame includes the coordinates of one vertex (e.g., the upper-left vertex) together with the length and width of the frame, the length can be added to the abscissa of the upper-left vertex and the width to its ordinate to obtain the coordinates of the upper-right, lower-left, and lower-right vertices.
In this embodiment, the vertex coordinates of the first face detection frame and of the second face detection frame are thus obtained. The length and width of the rectangle in which the first and second face detection frames intersect can therefore be determined from these vertex coordinates, and the area of the intersecting rectangle (which may be referred to as the intersection area) obtained. Then, the sum of the areas of the first and second face detection frames (which may be referred to as the total area) can be calculated, and the difference between the total area and the intersection area (which may be referred to as the union area) derived. Finally, the ratio of the intersection area to the union area may be determined as the intersection ratio of the first face detection frame and the second face detection frame.
It should be noted that the intersection ratio calculation method is a well-known technique widely studied and applied at present, and is not described herein again.
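As an illustration, a minimal sketch of the intersection-ratio computation described in step 202, with each box given as its two diagonal vertices ((x1, y1), (x2, y2)); the function name is an assumption:

```python
def intersection_ratio(box_a, box_b):
    """Intersection-over-union of two axis-aligned rectangles, a value in [0, 1]."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    # length and width of the intersecting rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter_area = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union_area = area_a + area_b - inter_area   # total area minus intersection area
    return inter_area / union_area if union_area > 0 else 0.0
```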
Step 203, determining the weight of the acquired position information of each face detection frame based on the intersection ratio.
In this embodiment, the executing entity may determine the weight of the position information of the first face detection frame and the weight of the position information of the second face detection frame based on the intersection ratio determined in step 202. See in particular the following steps:
In the first step, the intersection ratio may be substituted into a pre-established formula, and the calculation result is determined as the weight of the position information of the second face detection frame. The pre-established formula may be any of various formulas satisfying preset conditions, which are not limited here. The preset conditions include: the larger the intersection ratio, the larger the result of the formula; the smaller the intersection ratio, the smaller the result of the formula; when the intersection ratio is 0, the result is 0; and when the intersection ratio is 1, the result is 1.
In the second step, a difference value between a preset value (e.g., 1) and the weight of the position information of the second face detection frame may be determined as the weight of the position information of the first face detection frame.
The order of determining the weight of the position information of the first face detection frame and the weight of the position information of the second face detection frame is not limited here. The execution body may modify the pre-established formula so as to determine the weight of the position information of the first face detection frame first and then determine the weight of the position information of the second face detection frame.
In some optional implementations of the embodiment, the execution body may perform the exponentiation with the intersection ratio as the base and a first preset value (e.g., 6, or 3, etc.) as the exponent. Here, the first preset value may be determined by a skilled person based on a large number of data statistics and experiments. Then, the execution subject may determine the result of the exponentiation as the weight of the position information of the second face detection frame, and determine the difference between a second preset value (e.g., 1) and the determined weight of the position information of the second face detection frame as the weight of the position information of the first face detection frame.
In some optional implementations of the embodiment, the execution subject may perform an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value (e.g., 1) as the exponent. Then, the reciprocal of the result of the exponentiation may be determined as the weight of the position information of the second face detection frame, and the difference between the second preset value and the determined weight of the position information of the second face detection frame may be determined as the weight of the position information of the first face detection frame.
The execution body may also determine the weight of each acquired piece of position information in other manners, and is not limited to the above implementations. For example, the exponentiation may be performed with a predetermined value (e.g., 2 or 3) as the base and the difference between the reciprocal of the intersection ratio and a second preset value (e.g., 1) as the exponent. Then, the reciprocal of the result of the exponentiation may be determined as the weight of the position information of the second face detection frame, and the difference between the second preset value and that weight may be determined as the weight of the position information of the first face detection frame.
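The two weighting schemes above can be sketched as follows; the exponent 6 and the value 1 are only examples of the first and second preset values, and the guard for a zero intersection ratio is an added assumption rather than part of the patent:

```python
import math

def weights_power(iou, first_preset=6, second_preset=1):
    """Previous-frame weight is iou ** first_preset; the current frame gets the rest."""
    w_prev = iou ** first_preset
    return second_preset - w_prev, w_prev       # (weight_current, weight_previous)

def weights_exponential(iou, second_preset=1):
    """Previous-frame weight is 1 / e ** (1 / iou - second_preset)."""
    if iou <= 0:                                # no overlap: keep the new detection
        return second_preset, 0.0
    w_prev = 1.0 / math.exp(1.0 / iou - second_preset)
    return second_preset - w_prev, w_prev
```

Both schemes satisfy the preset conditions above: the previous-frame weight grows from 0 to 1 as the intersection ratio grows from 0 to 1, and the two weights always sum to the second preset value.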
In the conventional method, the average of the coordinates of the corresponding vertices (for example, the two upper-left vertices) of the face detection frames in the previous frame and the current frame is generally used as the corrected coordinates of that vertex (the upper-left vertex) in the current frame, yielding corrected coordinates for each vertex of the current frame. In this approach, when the face object moves fast, the face detection frame cannot keep up with the motion of the face object; the dragging effect is strong and the accuracy is low. In the present application, by contrast, the position of the face detection frame in the current frame is corrected using weights determined based on the intersection ratio. The larger the intersection ratio, the slower the face object moves; the smaller the intersection ratio, the faster the face object moves. Different weights can therefore be calculated for different intersection ratios, which reduces the dragging effect and improves the timeliness and accuracy of the face detection frame.
There is also a conventional method that determines the weight of the vertex coordinates of the face detection frame from the distance between the coordinates of the corresponding vertices (for example, the two upper-left vertices) of the face detection frames in the previous frame and the current frame. In that method, however, the weights of the coordinates of the respective vertices are independent of one another, so the face detection frame cannot be considered as a whole and the smoothing effect is poor. In the present application, the entire area of the face detection frame is taken into account when determining the intersection ratio, and the vertex coordinates of the same face detection frame share the same weight, so the face detection frame is treated as a whole and the smoothing effect is improved.
Step 204, determining target position information of the first face detection frame based on the determined weight and the acquired position information, so as to update the position of the first face detection frame.
In this embodiment, the execution subject described above may determine the target position information of the first face detection frame based on the determined weight and the acquired position information to update the position of the first face detection frame. Here, the execution body may correct the position information of the first face detection frame based on the determined weight. That is, the vertex coordinates of the first face detection frame are corrected.
In some optional implementations of the present embodiment, the position information of the face detection box may include the coordinates of the four vertices of the face detection box. In this case, the execution body may correct the coordinates of the first face detection frame. Specifically, for each vertex, the following steps may be performed (the upper-left vertex is described here as an example; the remaining vertices are handled in the same way):
In the first step, the abscissa of the upper-left vertex of the first face detection frame and the abscissa of the upper-left vertex of the second face detection frame are weighted. That is, the abscissa of the upper-left vertex of the first face detection frame is multiplied by the weight of the position information of the first face detection frame to obtain a first value, and the abscissa of the upper-left vertex of the second face detection frame is multiplied by the weight of the position information of the second face detection frame to obtain a second value. The sum of the first value and the second value is determined as the abscissa of the upper-left vertex of the corrected first face detection frame.
In the second step, the ordinate of the upper-left vertex of the first face detection frame and the ordinate of the upper-left vertex of the second face detection frame are weighted. That is, the ordinate of the upper-left vertex of the first face detection frame is multiplied by the weight of the position information of the first face detection frame to obtain a third value, and the ordinate of the upper-left vertex of the second face detection frame is multiplied by the weight of the position information of the second face detection frame to obtain a fourth value. The sum of the third value and the fourth value is determined as the ordinate of the upper-left vertex of the corrected first face detection frame.
In the third step, the abscissa and ordinate obtained in the first two steps are combined into the coordinates of the upper-left vertex of the corrected first face detection frame.
After the coordinates of each vertex of the first face detection frame are corrected, the electronic device may combine the corrected coordinates of the vertices into the target position information. Thus, the position of the first face detection frame can be updated.
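A minimal sketch of this per-vertex correction, assuming the two weights sum to 1 as described above (the function names are illustrative, not taken from the patent):

```python
def correct_vertex(curr_xy, prev_xy, w_curr, w_prev):
    """Weighted sum of the corresponding vertex coordinates of the two frames."""
    return (w_curr * curr_xy[0] + w_prev * prev_xy[0],
            w_curr * curr_xy[1] + w_prev * prev_xy[1])

def correct_four_vertices(curr_vertices, prev_vertices, w_curr, w_prev):
    """Apply the same pair of weights to every vertex of the face detection frame."""
    return [correct_vertex(c, p, w_curr, w_prev)
            for c, p in zip(curr_vertices, prev_vertices)]
```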
In some optional implementations of the embodiment, the position information of the first face detection box includes specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box includes specified diagonal vertex coordinates of the second face detection box. The coordinates of the designated diagonal vertex of the first face detection box may include coordinates of a first vertex (e.g., top left vertex) and coordinates of a second vertex (e.g., bottom right vertex). The specified diagonal vertex coordinates of the second face detection box described above may include coordinates of a third vertex (e.g., an upper left vertex) and coordinates of a fourth vertex (e.g., a lower right vertex). In this case, the execution agent may update the position of the first face detection frame by determining a weight of the position information of the first face detection frame as a weight of the designated diagonal vertex coordinates of the first face detection frame, determining a weight of the position information of the second face detection frame as a weight of the designated diagonal vertex coordinates of the second face detection frame, and determining a result of weighting calculation between the designated diagonal vertex coordinates of the first face detection frame and the designated diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame.
Optionally, the target diagonal vertex coordinates of the first face detection box may be calculated according to the following operation sequence:
First, a result of weighted calculation of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate may be determined as a first target abscissa;
next, a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate may be determined as a first target ordinate;
Next, a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate may be determined as a second target abscissa;
Next, a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate may be determined as a second target ordinate;
Finally, coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate may be determined as target diagonal vertex coordinates of the first face detection frame. Since the coordinates of a set of diagonal vertices are known, the location of the rectangular box can be uniquely determined. Therefore, the position of the first face detection frame can be updated.
It should be noted that, in this implementation, other operation sequences may also be used to calculate the target diagonal vertex coordinates of the first face detection box, which are not described here again.
It should be noted that, in this implementation, after the target diagonal vertex coordinates are calculated, another pair of diagonal vertex coordinates of the first face detection box may also be calculated according to the target diagonal vertex coordinates. Thereby obtaining coordinates of four vertices of the first face detection box.
In some optional implementations of the present embodiment, the position information of the face detection box may include coordinates of any vertex of the face detection box and a length and a width of the face detection box. In this case, the execution body may first specify coordinates of a diagonal vertex of the vertex based on the coordinates of the vertex, the length, and the width. Alternatively, the coordinates of the remaining three vertices are determined. Then, the target position information of the first face detection frame may be determined by using the operation procedures described in the above two implementation manners. Therefore, the position of the first face detection frame is updated.
With continued reference to fig. 3, a schematic diagram of an application scenario of the method for generating information according to the present embodiment is shown. In the application scenario of fig. 3, the user records a target video using the selfie mode of the terminal device 301.
After capturing the first frame, the terminal device performs face detection on the first frame by using the stored face detection model, and obtains position information 302 of the face detection frame in the first frame.
After capturing the second frame, the terminal device performs face detection on the second frame by using the stored face detection model and acquires the position information 303 of the face detection frame of the second frame. Meanwhile, the position information 302 of the face detection frame in the first frame is acquired. Next, the intersection ratio of the first face detection frame and the second face detection frame may be determined based on the position information 302 and the position information 303. Thereafter, the weights of the acquired position information 302 and position information 303 may be determined based on the intersection ratio. Finally, target position information 304 of the face detection frame of the second frame (i.e., the final position information of the face detection frame of the second frame) may be determined based on the determined weights and the acquired position information 302 and position information 303.
After capturing the third frame, the terminal device performs face detection on the third frame by using the stored face detection model and acquires the position information 305 of the face detection frame of the third frame. At the same time, the updated position information of the face detection frame in the second frame (i.e., the target position information 304) is acquired. Next, the intersection ratio of the second face detection frame and the third face detection frame may be determined based on the target position information 304 and the position information 305. Thereafter, the weights of the acquired target position information 304 and position information 305 may be determined based on the intersection ratio. Finally, target position information 306 of the face detection frame of the third frame (i.e., the final position information of the face detection frame of the third frame) may be determined based on the determined weights and the acquired target position information 304 and position information 305.
And so on. Finally, the terminal device 301 may obtain the position information of the face detection frame in each frame in the recorded video.
In the method provided by the above embodiment of the present application, the previously generated position information of the first face detection frame of the current frame of the target video and the position information of the second face detection frame of the previous frame are acquired, so that the intersection ratio of the first face detection frame and the second face detection frame can be determined based on the acquired position information. Then, the weight of the acquired position information of each face detection frame is determined based on the intersection ratio. Finally, target position information of the first face detection frame can be determined based on the determined weights and the acquired position information, so as to update the position of the first face detection frame. In this way, the position of the face detection frame of the later frame is adjusted based on the intersection ratio of the face detection frames of the previous and current frames. The position of the face detection frame of the later frame takes the position of the face detection frame of the previous frame into account, and considers the whole area of the previous frame's face detection frame rather than individual coordinates, which reduces jitter of the face detection frame in the video and improves the smoothing effect and the stability of its movement.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
Step 401, obtaining position information of a first face detection frame obtained after performing face detection on a current frame of a target video in advance, and obtaining position information of a second face detection frame in a previous frame of the current frame stored in advance.
In the present embodiment, the execution subject of the method for generating information (for example, the terminal devices 101, 102, 103 shown in fig. 1) may acquire position information of a first face detection frame obtained after face detection is performed in advance on a current frame of a target video, and acquire position information of a second face detection frame obtained after face detection is performed in advance on the frame previous to the current frame.
In this embodiment, the position information of the first face detection frame may include designated diagonal vertex coordinates (e.g., coordinates of an upper left vertex and a lower right vertex) of the first face detection frame, and the position information of the second face detection frame may include designated diagonal vertex coordinates of the second face detection frame.
Step 402, determining the intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information.
In this embodiment, the executing body may determine, from the position information of the first face detection frame, the coordinates of the remaining vertices of the first face detection frame in the current frame, so as to obtain the coordinates of each vertex of the first face detection frame. Similarly, the coordinates of each vertex of the second face detection frame in the previous frame can be determined from the position information of the second face detection frame. Then, the length and width of the rectangle in which the first and second face detection frames intersect are determined from the vertex coordinates of the two frames, and the area of the intersecting rectangle (which may be referred to as the intersection area) is obtained. Then, the sum of the areas of the first and second face detection frames (which may be referred to as the total area) may be calculated, and the difference between the total area and the intersection area (which may be referred to as the union area) may then be calculated. Finally, the ratio of the intersection area to the union area may be determined as the intersection ratio of the first face detection frame and the second face detection frame.
Step 403, performing an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value as the exponent.
In this embodiment, the execution body may perform an exponentiation with the natural constant e as the base and the difference between the reciprocal of the intersection ratio and a second preset value (e.g., 1) as the exponent.
Step 404, determining the reciprocal of the result of the exponentiation as the weight of the position information of the second face detection frame, and determining the difference between the second preset value and the weight as the weight of the position information of the first face detection frame.
In this embodiment, the execution subject may determine the reciprocal of the result of the exponentiation as the weight of the position information of the second face detection frame, and determine the difference between the second preset value (for example, 1) and the determined weight as the weight of the position information of the first face detection frame.
Step 405, using the weight of the position information of the first face detection frame as the weight of the specified diagonal vertex coordinates of the first face detection frame, using the weight of the position information of the second face detection frame as the weight of the specified diagonal vertex coordinates of the second face detection frame, and determining the weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as the target diagonal vertex coordinates of the first face detection frame so as to update the position of the first face detection frame.
In the present embodiment, the execution subject may take the weight of the position information of the first face detection frame as the weight of the specified diagonal vertex coordinates of the first face detection frame. And taking the weight of the position information of the second face detection frame as the weight of the appointed diagonal vertex coordinate of the second face detection frame. And determining a weighting calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame so as to update the position of the first face detection frame. The coordinates of the designated diagonal vertex of the first face detection box may include coordinates of a first vertex (e.g., top left vertex) and coordinates of a second vertex (e.g., bottom right vertex). The specified diagonal vertex coordinates of the second face detection box described above may include coordinates of a third vertex (e.g., an upper left vertex) and coordinates of a fourth vertex (e.g., a lower right vertex).
Specifically, the result of the weighted calculation of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate described above may be first determined as the first target abscissa. Next, a result of weighting calculation of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate may be determined as the first target ordinate. Next, the result of the weighted calculation of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate may be determined as the second target abscissa. Next, the result of the weighted calculation of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate may be determined as the second target ordinate. Finally, coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate may be determined as target diagonal vertex coordinates of the first face detection frame. Since the coordinates of a set of diagonal vertices are known, the location of the rectangular box can be uniquely determined. Therefore, the position of the first face detection frame can be updated.
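Putting steps 402 to 405 together, a minimal end-to-end sketch for one pair of frames, reusing the intersection_ratio and weights_exponential helpers sketched earlier; the boxes are the specified diagonal vertices, and the function name is an assumption:

```python
def smooth_current_box(curr_box, prev_box):
    """Return the target diagonal vertex coordinates of the current-frame box."""
    # Step 402: intersection ratio of the two face detection frames
    iou = intersection_ratio(curr_box, prev_box)
    # Steps 403-404: weights from the exponential scheme
    w_curr, w_prev = weights_exponential(iou)
    # Step 405: weighted diagonal vertex coordinates become the target position
    (cx1, cy1), (cx2, cy2) = curr_box
    (px1, py1), (px2, py2) = prev_box
    return ((w_curr * cx1 + w_prev * px1, w_curr * cy1 + w_prev * py1),
            (w_curr * cx2 + w_prev * px2, w_curr * cy2 + w_prev * py2))
```

As in the scenario of fig. 3, the returned target box would then serve as the previous-frame box when the next frame is processed.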
as can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating information in the present embodiment highlights the step of determining the weight of the face detection box of the current frame and the previous frame respectively. When the intersection of the first face detection frame and the second face detection frame is small, the moving amplitude of the face object from the previous frame to the current frame is large. At this time, the weight determined by the method of the present embodiment is larger for the position information of the first face detection frame (the face detection frame of the current frame) and smaller for the position information of the second face detection frame (the face detection frame of the previous frame). When the intersection of the first face detection frame and the second face detection frame is large, the moving amplitude of the face object from the previous frame to the current frame is small. The weight determined by the method of this embodiment is smaller for the position information of the first face detection frame, and larger for the position information of the second face detection frame. Therefore, the face detection frame can move smoothly, the shake of the face detection frame in the video is further reduced, and the smooth effect and the moving stability of the face detection frame in the video are further improved.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating information of the present embodiment includes: an obtaining unit 501 configured to obtain position information of a first face detection frame obtained by performing face detection in advance on a current frame of a target video, and to obtain position information of a second face detection frame stored in advance in a previous frame of the current frame; a first determination unit 502 configured to determine an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; a second determining unit 503 configured to determine a weight of the acquired position information of each face detection frame based on the intersection ratio; and an updating unit 504 configured to determine target position information of the first face detection frame based on the determined weight and the acquired position information, so as to update the position of the first face detection frame.
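For reference, the intersection ratio (intersection over union) that the first determination unit 502 derives from the two boxes' position information could be sketched as follows; the function name and the (x1, y1, x2, y2) corner-tuple layout are illustrative assumptions rather than details specified by the application.

# Minimal sketch of the intersection ratio (intersection over union) between
# two axis-aligned boxes given as (x1, y1, x2, y2) corner tuples.
def intersection_ratio(box_a, box_b):
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0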
In some optional implementations of the present embodiment, the second determining unit 503 may include a first operation module and a first determination module (not shown in the figure). The first operation module may be configured to perform a power operation with the cross-over ratio as the base and a first preset value as the exponent. The first determination module may be configured to determine the result of the power operation as the weight of the position information of the second face detection frame, and to determine the difference between a second preset value and this weight as the weight of the position information of the first face detection frame.
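A minimal sketch of this power-operation weighting follows; the default preset values of 1.0 are an assumption made for illustration and are not fixed by the application.

# Sketch of the power-operation weighting: the previous-frame weight is the
# intersection ratio raised to a first preset value, and the current-frame
# weight is a second preset value minus that result.
def power_weights(iou, first_preset=1.0, second_preset=1.0):
    w_prev = iou ** first_preset     # weight of the second (previous-frame) box
    w_curr = second_preset - w_prev  # weight of the first (current-frame) box
    return w_curr, w_prev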
In some optional implementations of the present embodiment, the second determining unit 503 may include a second operation module and a second determination module (not shown in the figure). The second operation module may be configured to perform a power operation with a natural constant as the base and the difference between the reciprocal of the cross-over ratio and a second preset value as the exponent. The second determination module may be configured to determine the reciprocal of the result of the power operation as the weight of the position information of the second face detection frame, and to determine the difference between the second preset value and this weight as the weight of the position information of the first face detection frame.
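A corresponding sketch of this exponential weighting follows; the default of 1.0 for the second preset value and the guard against a zero intersection ratio are illustrative assumptions, not behavior specified by the application.

import math

# Sketch of the exponential weighting: e is raised to (1/iou - second_preset),
# the previous-frame weight is the reciprocal of that result, and the
# current-frame weight is second_preset minus it.
def exp_weights(iou, second_preset=1.0):
    if iou <= 0.0:
        return second_preset, 0.0                        # no overlap: keep the new detection
    w_prev = 1.0 / math.exp(1.0 / iou - second_preset)   # weight of the previous-frame box
    w_curr = second_preset - w_prev                      # weight of the current-frame box
    return w_curr, w_prev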
In some optional implementations of the embodiment, the position information of the first face detection box may include specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box may include specified diagonal vertex coordinates of the second face detection box. And, the update unit 504 may be further configured to: and determining a weight of position information of the first face detection frame as a weight of a designated diagonal vertex coordinate of the first face detection frame, a weight of position information of the second face detection frame as a weight of a designated diagonal vertex coordinate of the second face detection frame, and a result of a weighted calculation of the designated diagonal vertex coordinate of the first face detection frame and the designated diagonal vertex coordinate of the second face detection frame as a target diagonal vertex coordinate of the first face detection frame, so as to update the position of the first face detection frame.
In some optional implementations of the embodiment, the specified diagonal vertex coordinates of the first face detection box may include a first vertex coordinate and a second vertex coordinate, and the specified diagonal vertex coordinates of the second face detection box may include a third vertex coordinate and a fourth vertex coordinate. And, the update unit 504 may be further configured to: determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa; determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate; determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa; determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate; and determining coordinates consisting of the first target abscissa and the first target ordinate, and coordinates consisting of the second target abscissa and the second target ordinate, as target diagonal vertex coordinates of the first face detection frame.
In the apparatus provided by the above embodiment of the present application, the obtaining unit 501 obtains the pre-generated position information of the first face detection frame of the current frame of the target video and of the second face detection frame of the previous frame, so that the first determination unit 502 can determine the intersection ratio of the first face detection frame and the second face detection frame based on the obtained position information. Then, the second determining unit 503 can determine the weight of the obtained position information of each face detection frame based on the intersection ratio. Finally, the updating unit 504 can determine the target position information of the first face detection frame based on the determined weights and the obtained position information, so as to update the position of the first face detection frame. In this way, the position of the face detection frame of the later frame is adjusted based on the intersection ratio of the face detection frames of the two adjacent frames. Because the position of the face detection frame of the later frame takes into account the position of the face detection frame of the previous frame, and considers the whole area of the previous frame's face detection frame rather than a single coordinate, the shake of the face detection frame in the video is reduced, and the smoothing effect and movement stability of the face detection frame in the video are improved.
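To make the overall flow concrete, the following end-to-end sketch in Python combines the steps above using the power-operation weighting variant; the function name, the (x1, y1, x2, y2) box layout, and the preset values of 1.0 are illustrative assumptions rather than details fixed by the application.

# End-to-end sketch: compute the intersection ratio of the current-frame and
# previous-frame boxes, derive the weights (power-operation variant), and
# blend the diagonal vertex coordinates to get the updated box.
def smooth_face_box(curr_box, prev_box, first_preset=1.0, second_preset=1.0):
    # Step 1: intersection ratio (intersection over union) of the two boxes.
    ix1, iy1 = max(curr_box[0], prev_box[0]), max(curr_box[1], prev_box[1])
    ix2, iy2 = min(curr_box[2], prev_box[2]), min(curr_box[3], prev_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_c = (curr_box[2] - curr_box[0]) * (curr_box[3] - curr_box[1])
    area_p = (prev_box[2] - prev_box[0]) * (prev_box[3] - prev_box[1])
    union = area_c + area_p - inter
    iou = inter / union if union > 0 else 0.0
    # Step 2: weights; the previous-frame weight grows with the intersection ratio.
    w_prev = iou ** first_preset
    w_curr = second_preset - w_prev
    # Step 3: weighted blend of the diagonal vertex coordinates.
    return tuple(w_curr * c + w_prev * p for c, p in zip(curr_box, prev_box))

For example, with a previous-frame box of (100, 100, 200, 200) and a current-frame detection of (110, 110, 210, 210), the intersection ratio is about 0.68, so under these assumed preset values the updated box comes out at roughly (103, 103, 203, 203), i.e. it stays close to the previous frame rather than jumping to the new detection.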
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an obtaining unit, a first determination unit, a second determination unit, and an updating unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the updating unit may also be described as "a unit that updates the position of the first face detection frame".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire position information of a first face detection frame obtained by performing face detection in advance on a current frame of a target video, and acquire position information of a second face detection frame obtained by performing face detection in advance on a previous frame of the current frame; determine an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information; determine a weight of the acquired position information of each face detection frame based on the intersection ratio; and determine target position information of the first face detection frame based on the determined weights and the acquired position information to update the position of the first face detection frame.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for generating information, comprising:
Acquiring position information of a first face detection frame obtained after face detection is performed on a current frame of a target video in advance, and acquiring position information of a second face detection frame in a previous frame of the current frame, which is stored in advance;
Determining an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information;
Determining the weight of the acquired position information of each face detection frame based on the cross-over ratio, wherein the weight of the position information of the second face detection frame is positively correlated with the cross-over ratio, and the weight of the position information of the first face detection frame is negatively correlated with the cross-over ratio;
Determining target position information of the first face detection frame based on the determined weight and the acquired position information to update a position of the first face detection frame.
2. The method for generating information according to claim 1, wherein the determining a weight of the acquired position information of each face detection frame based on the intersection ratio includes:
Taking the cross-over ratio as a base number and taking a first preset numerical value as an exponent to perform power operation;
And determining the calculation result of the exponentiation as the weight of the position information of the second face detection frame, and determining the difference value between a second preset numerical value and the weight as the weight of the position information of the first face detection frame.
3. The method for generating information according to claim 1, wherein the determining a weight of the acquired position information of each face detection frame based on the intersection ratio includes:
Taking a natural constant as a base number, and taking the difference between the reciprocal of the cross-over ratio and a second preset numerical value as an exponent to perform power operation;
And determining the reciprocal of the power operation calculation result as the weight of the position information of the second face detection frame, and determining the difference value between the second preset numerical value and the weight as the weight of the position information of the first face detection frame.
4. The method for generating information according to claim 1, wherein the position information of the first face detection box includes specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box includes specified diagonal vertex coordinates of the second face detection box; and
The determining target position information of the first face detection frame based on the determined weight and the acquired position information to update the position of the first face detection frame includes:
Determining a weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame, so as to update the position of the first face detection frame.
5. The method for generating information of claim 4, wherein the specified diagonal vertex coordinates of the first face detection box comprise first vertex coordinates and second vertex coordinates, and the specified diagonal vertex coordinates of the second face detection box comprise third vertex coordinates and fourth vertex coordinates; and
Determining a result of weighting calculation of the designated diagonal vertex coordinates of the first face detection box and the designated diagonal vertex coordinates of the second face detection box as target diagonal vertex coordinates of the first face detection box, including:
determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa;
Determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate;
Determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa;
Determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate;
And determining coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate as target diagonal vertex coordinates of the first face detection frame.
6. An apparatus for generating information, comprising:
An acquisition unit configured to acquire position information of a first face detection frame obtained after face detection is performed in advance on a current frame of a target video, and to acquire position information of a second face detection frame stored in advance in a previous frame of the current frame;
A first determination unit configured to determine an intersection ratio of the first face detection frame and the second face detection frame based on the acquired position information;
A second determination unit configured to determine a weight of the acquired position information of each face detection frame based on the cross-over ratio, wherein the weight of the position information of the second face detection frame is positively correlated with the cross-over ratio, and the weight of the position information of the first face detection frame is negatively correlated with the cross-over ratio;
An updating unit configured to determine target position information of the first face detection frame based on the determined weight and the acquired position information to update a position of the first face detection frame.
7. The apparatus for generating information according to claim 6, wherein the second determining unit includes:
A first operation module configured to perform a power operation with the cross-over ratio as a base number and a first preset numerical value as an exponent;
A first determination module configured to determine a calculation result of the exponentiation as a weight of the position information of the second face detection frame, and determine a difference value between a second preset value and the weight as a weight of the position information of the first face detection frame.
8. The apparatus for generating information according to claim 6, wherein the second determining unit includes:
A second operation module configured to perform a power operation with a natural constant as a base number and a difference between a reciprocal of the cross-over ratio and a second preset value as an exponent;
A second determination module configured to determine a reciprocal of a result of the exponentiation calculation as a weight of the position information of the second face detection frame, and determine a difference value between the second preset value and the weight as a weight of the position information of the first face detection frame.
9. The apparatus for generating information according to claim 6, wherein the position information of the first face detection box includes specified diagonal vertex coordinates of the first face detection box, and the position information of the second face detection box includes specified diagonal vertex coordinates of the second face detection box; and
The update unit, further configured to:
Determining a weighted calculation result of the specified diagonal vertex coordinates of the first face detection frame and the specified diagonal vertex coordinates of the second face detection frame as target diagonal vertex coordinates of the first face detection frame, so as to update the position of the first face detection frame.
10. The apparatus for generating information of claim 9, wherein the specified diagonal vertex coordinates of the first face detection box comprise first vertex coordinates and second vertex coordinates, and the specified diagonal vertex coordinates of the second face detection box comprise third vertex coordinates and fourth vertex coordinates; and
The update unit, further configured to:
Determining a weighted calculation result of the abscissa of the first vertex coordinate and the abscissa of the third vertex coordinate as a first target abscissa;
Determining a weighted calculation result of the ordinate of the first vertex coordinate and the ordinate of the third vertex coordinate as a first target ordinate;
Determining a weighted calculation result of the abscissa of the second vertex coordinate and the abscissa of the fourth vertex coordinate as a second target abscissa;
Determining a weighted calculation result of the ordinate of the second vertex coordinate and the ordinate of the fourth vertex coordinate as a second target ordinate;
And determining coordinates formed by the first target abscissa and the first target ordinate and coordinates formed by the second target abscissa and the second target ordinate as target diagonal vertex coordinates of the first face detection frame.
11. An electronic device, comprising:
One or more processors;
A storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.