WO2020052062A1 - Detection method and device - Google Patents

Detection method and device

Info

Publication number
WO2020052062A1
Authority
WO
WIPO (PCT)
Prior art keywords
mouth opening
determining
current frame
face object
face
Prior art date
Application number
PCT/CN2018/115973
Other languages
English (en)
French (fr)
Inventor
邓启力
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020052062A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular, to a detection method and device.
  • with the development of computer technology, face keypoint detection usually needs to be performed on frames in a video. Then, according to the face keypoint detection results, the facial expressions of the face objects in the video are determined.
  • when detecting the mouth opening and closing state, the related approach is usually to determine that the mouth is open when the mouth opening distance is above a certain threshold, and to determine that the mouth is closed when the mouth opening distance is below that threshold.
  • the embodiments of the present application provide a detection method and device.
  • an embodiment of the present application provides a detection method.
  • the method includes: obtaining a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; determining, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; determining a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and determining the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • determining the target threshold based on the predetermined mouth opening and closing state of the face object in the previous frame of the current frame includes: in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, determining a preset first threshold as the target threshold; and in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, determining a preset second threshold as the target threshold, where the first threshold is smaller than the second threshold.
  • determining the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold includes: in response to determining that the mouth opening distance is greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is the open state; and in response to determining that the mouth opening distance is not greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is the closed state.
  • the method further includes: in response to determining that there is no previous frame of the current frame, using a preset initial threshold as the target threshold, and determining the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • the method further comprises: in response to determining that the mouth opening and closing state of the face object in the current frame is an open state, obtaining a target special effect, and displaying the target special effect at the mouth position of the face object in the current frame.
  • an embodiment of the present application provides a detection device, the device including: an obtaining unit configured to obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; a first determining unit configured to determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; a second determining unit configured to determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and a third determining unit configured to determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • the second determining unit includes: a first determining module configured to determine a preset first threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state; and a second determining module configured to determine a preset second threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, where the first threshold is smaller than the second threshold.
  • the third determining unit includes: a third determining module configured to determine, in response to determining that the mouth opening distance is greater than the target threshold, that the mouth opening and closing state of the face object in the current frame is the open state; and a fourth determining module configured to determine, in response to determining that the mouth opening distance is not greater than the target threshold, that the mouth opening and closing state of the face object in the current frame is the closed state.
  • the apparatus further includes: a fourth determining unit configured to, in response to determining that there is no previous frame of the current frame, use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • the apparatus further includes: a display unit configured to, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, obtain a target special effect and display the target special effect at the mouth position of the face object in the current frame.
  • an embodiment of the present application provides an electronic device including: one or more processors; and a storage device storing one or more programs thereon, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the embodiments of the first aspect described above.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method of any one of the embodiments of the first aspect described above is implemented.
  • the detection method and device provided in the embodiments of the present application obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the mouth opening distance of the face object in the current frame can be determined based on the face keypoint detection result. Then, based on the mouth opening and closing state of the face object in the previous frame, a target threshold can be determined, so that the mouth opening and closing state of the face object in the current frame can be determined based on a comparison between the target threshold and the determined mouth opening distance. In this way, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Thereby, the accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a detection method according to the present application;
  • FIG. 3 is a schematic diagram of an application scenario of the detection method according to the present application;
  • FIG. 4 is a flowchart of yet another embodiment of the detection method according to the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of a detection device according to the present application;
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which the detection method or detection device of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as voice interaction applications, shopping applications, search applications, instant communication tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • the terminal devices 101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
  • if the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
  • when the terminal devices 101, 102, and 103 are hardware, an image acquisition device may also be installed thereon.
  • the image acquisition device can be various devices that can implement the function of acquiring images, such as cameras, sensors, and so on. Users can use the image capture device on the terminal devices 101, 102, 103 to capture video.
  • Terminal devices 101, 102, and 103 can perform face detection, face keypoint detection, and other processing on frames in the videos they play or the videos recorded by users; they can also analyze and calculate the face keypoint detection results to determine the mouth opening distance of the face object in each frame of the video; and they can further select a target threshold based on the mouth opening and closing state in a given frame, so as to use the target threshold and the mouth opening distance in the next frame to detect the mouth opening and closing state of the face object in that frame and obtain a detection result.
  • the server 105 may be a server providing various services, such as a video processing server for storing, managing, or analyzing videos uploaded by the terminal devices 101, 102, and 103.
  • the video processing server can store a large number of videos, and can send videos to the terminal devices 101, 102, and 103.
  • the server 105 may be hardware or software.
  • the server can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • the server can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • the detection methods provided in the embodiments of the present application are generally executed by the terminal devices 101, 102, and 103, and accordingly, the detection devices are generally disposed in the terminal devices 101, 102, and 103.
  • in the case where the terminal devices 101, 102, and 103 can implement the relevant functions of the server 105, the server 105 may not be set in the system architecture 100.
  • the server 105 can also perform face detection, face keypoint detection, mouth opening and closing state detection, and other processing on the videos it stores or the videos uploaded by the terminal devices 101, 102, and 103, and return the processing results to the terminal devices 101, 102, and 103.
  • the detection method provided in the embodiment of the present application may also be executed by the server 105, and accordingly, the detection device may also be set in the server 105.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • with continued reference to FIG. 2, a flow 200 of an embodiment of the detection method according to the present application is shown. The detection method includes the following steps:
  • Step 201 Obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
  • an execution subject of the detection method can record or play a video.
  • the video that it plays may be a video stored locally in advance, or a video obtained from a server (such as the server 105 shown in FIG. 1) through a wired connection or a wireless connection.
  • the above-mentioned execution body may be installed or connected with an image acquisition device (for example, a camera).
  • wireless connection methods may include, but are not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connection methods now known or developed in the future.
  • the execution subject may obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of the target video.
  • the target video may be a video currently being played or a video being recorded by a user. It is not limited here.
  • the above-mentioned face keypoint detection result may include the positions of the face keypoints (which can be represented by coordinates).
  • in practice, the face keypoints may be key points in the face (for example, points with semantic information, or points that affect the facial contour or the shape of the facial features).
  • the face keypoint detection results may include the coordinates of the center position of the upper lip, the coordinates of the center position of the lower lip, and the like.
  • the current frame of the target video may be the frame of the target video in which the mouth opening and closing state of the face object is to be detected.
  • the above-mentioned execution subject may sequentially detect the mouth opening and closing state of the face object in each frame of the target video in the order of the timestamps of the frames.
  • the frame currently to be detected for the mouth opening and closing state can be referred to as the current frame of the target video.
  • the target video may be a video being played by the execution subject.
  • during the playing of the target video, the above-mentioned executing subject may perform face keypoint detection on each frame to be played one by one to obtain the face keypoint detection result of the face object in the frame, detect the mouth opening and closing state of the face object in the frame accordingly, and then play the frame. The frame to be played at the current moment may be the current frame.
  • the target video may be a video being recorded by the above-mentioned execution subject.
  • during the recording of the target video, the above-mentioned executing subject may perform face keypoint detection on each captured frame one by one to obtain the face keypoint detection result of the face object in the frame, detect the mouth opening and closing state of the face object in the frame accordingly, and then display the frame. The latest frame acquired at the current moment may be the current frame.
  • the keypoint detection of the face can be performed in various ways.
  • a face keypoint detection model for face keypoint detection on an image may be stored in the execution subject in advance.
  • for each frame of the target video, the frame can be input into the above-mentioned face keypoint detection model to obtain a face keypoint detection result.
  • the face keypoint detection model can be obtained by supervised training of an existing convolutional neural network based on a sample set using a machine learning method.
  • the convolutional neural network can use various existing structures, such as DenseBox, VGGNet, ResNet, SegNet, and so on. It should be noted that the above-mentioned machine learning method and supervised training method are well-known technologies that are widely studied and applied at present, and will not be repeated here.
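  • as a hedged illustration of the preceding two bullets, the sketch below shows one minimal way to train such a keypoint-regression network with supervision. The ResNet-18 backbone, the keypoint count, the mean-squared-error loss, and the optimizer settings are all assumptions made for illustration; the text only states that an existing convolutional neural network (for example DenseBox, VGGNet, ResNet, or SegNet) can be trained on a sample set.

```python
import torch
import torch.nn as nn
import torchvision

NUM_KEYPOINTS = 68  # assumed keypoint count; the patent does not fix a number

# ResNet-18 regressing 2*K values (an x/y pair per keypoint); ResNet is one of
# the backbone families the text names, chosen here purely for illustration.
model = torchvision.models.resnet18(num_classes=2 * NUM_KEYPOINTS)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, target_coords: torch.Tensor) -> float:
    """One supervised step: images (B, 3, H, W), target_coords (B, 2*K)."""
    optimizer.zero_grad()
    loss = criterion(model(images), target_coords)
    loss.backward()
    optimizer.step()
    return loss.item()
```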
  • a face detection model for performing face detection on an image may also be stored in the execution subject in advance.
  • the execution subject may first input the frame into the face detection model to obtain a face detection result (for example, a result indicating the position of the region where the face object is located, that is, the position of the face detection frame).
  • a screenshot can be taken of the area where the face object is located to obtain a face image.
  • the face image can be input to a face keypoint detection model to obtain a face keypoint detection result.
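  • a rough sketch of this optional two-stage pipeline follows; the face detector and keypoint model are passed in as hypothetical callables, and the PIL-style crop call is an assumption, since the text does not fix any concrete model interface.

```python
from typing import Callable, Optional

def detect_keypoints(frame,
                     face_detector: Callable,
                     keypoint_model: Callable) -> Optional[dict]:
    """Two-stage detection: locate the face region, then detect keypoints on it."""
    box = face_detector(frame)         # assumed to return (left, top, right, bottom) or None
    if box is None:
        return None                    # no face object in this frame
    face_image = frame.crop(box)       # "screenshot" of the region where the face is located
    return keypoint_model(face_image)  # face keypoint detection result, e.g. name -> (x, y)
```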
  • Step 202 Determine a mouth opening distance of a face object in a current frame based on a detection result of a face keypoint.
  • the above-mentioned executing subject may first determine a scaling ratio of the face object based on the face keypoint detection result. For example, the distance from the forehead coordinates to the chin coordinates in the face keypoint detection result may be calculated, the ratio of this distance to a preset distance may then be determined, and this ratio may be used as the scaling ratio. Then, since the face keypoint detection result may include the coordinates of the center position of the upper lip of the face object and the coordinates of the center position of the lower lip, the execution subject may calculate the distance between these two coordinates and divide this distance by the above scaling ratio to determine the mouth opening distance. It should be noted that other distances can also be used to determine the scaling ratio, which is not limited here. For example, the ratio of the distance between the left and right corners of the mouth to another preset distance can be used as the scaling ratio.
  • the above-mentioned executing subject may also perform face detection in advance, scale the region after determining the region where the face object is located, and then perform face keypoint detection. In this case, the distance between the coordinates of the center position of the upper lip and the coordinates of the center position of the lower lip in the face keypoint detection result is the mouth opening distance.
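  • the calculation in the two preceding bullets can be summarized in the following sketch; the keypoint names and the reference distance are assumptions for illustration, since the text only requires upper-lip-center and lower-lip-center coordinates plus some pair of points (here forehead and chin) for the scaling ratio.

```python
import math

REFERENCE_DISTANCE = 100.0  # assumed preset distance; not specified in the patent

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mouth_opening_distance(keypoints):
    """keypoints: mapping from hypothetical keypoint names to (x, y) coordinates."""
    # scaling ratio: forehead-to-chin distance relative to a preset distance
    scale = euclidean(keypoints["forehead"], keypoints["chin"]) / REFERENCE_DISTANCE
    # lip gap divided by the scaling ratio gives a size-normalized opening distance
    lip_gap = euclidean(keypoints["upper_lip_center"], keypoints["lower_lip_center"])
    return lip_gap / scale
```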
  • Step 203 Determine a target threshold based on a predetermined mouth opening and closing state of a face object in a previous frame of the current frame.
  • since the execution subject can sequentially detect the mouth opening and closing state of the face objects in the frames of the target video, the execution subject has, when performing mouth opening and closing state detection on the current frame, already determined the detection result of the mouth opening and closing state of the face object in the previous frame of the current frame.
  • the above-mentioned execution subject may determine the target threshold based on a predetermined state of mouth opening and closing of a face object in a previous frame of the current frame.
  • the target threshold may be a threshold selected by the execution subject, based on the mouth opening and closing state of the face object in the previous frame, from a plurality of thresholds preset by the user.
  • different mouth opening and closing states of the face object in the previous frame correspond to different target thresholds.
  • the face object in the previous frame and the current frame may be the face of the same person.
  • for example, when a user records a selfie video, the face objects in the current frame and the previous frame are both the user's face.
  • in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, the preset first threshold may be determined as the target threshold.
  • in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, a preset second threshold may be determined as the target threshold.
  • the first threshold may be smaller than the second threshold.
  • a technician may also set multiple thresholds of different sizes in advance based on a large amount of data statistics and experiments.
  • in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, the execution subject may use any one of the multiple thresholds that is greater than a preset intermediate value as the target threshold.
  • in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, the execution subject may use any one of the multiple thresholds that is smaller than the preset intermediate value as the target threshold.
  • here, the preset intermediate value may be the average of the multiple thresholds, or may be any value greater than the minimum of the multiple thresholds and smaller than the maximum of the multiple thresholds.
  • in previous approaches, a single threshold is usually set: when the mouth opening distance is greater than the threshold, the mouth is considered open; when the mouth opening distance is less than the threshold, the mouth is considered closed. With such approaches, when the mouth opening distance is near the threshold, the detection result jumps back and forth, resulting in poor stability and accuracy of the detection result. By contrast, with the approach in this embodiment of determining the target threshold based on the mouth opening and closing state of the face object in the previous frame, selecting different thresholds avoids frequent jumps in the detection result and improves the stability and accuracy of the detection result.
  • if there is no previous frame of the current frame, that is, the current frame is the first frame of the target video, the execution subject may use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold. For example, if the mouth opening distance is greater than the target threshold, the mouth may be determined to be open; if the mouth opening distance is not greater than the target threshold, the mouth may be determined to be closed.
  • the above-mentioned initial threshold may be set according to actual requirements.
  • the initial threshold may be larger than the first threshold and smaller than the second threshold.
  • Step 204 Determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
  • the above-mentioned executing subject may determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance determined in step 202 and the target threshold determined in step 203.
  • in response to determining that the mouth opening distance is greater than the target threshold, the executing subject may determine that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is not greater than the target threshold, the executing subject may determine that the mouth opening and closing state of the face object in the current frame is the closed state.
  • alternatively, in response to determining that the mouth opening distance is greater than the target threshold, the executing subject may determine that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is less than the target threshold, the executing subject may determine that the mouth opening and closing state of the face object in the current frame is the closed state. In response to determining that the mouth opening distance is equal to the target threshold, the executing subject may use the mouth opening and closing state of the previous frame as the mouth opening and closing state of the current frame.
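  • taken together, steps 203 and 204 amount to a small hysteresis state machine, sketched below. The 0.2 first threshold echoes the example given later in this application; the other constants are assumed values, kept consistent with the stated ordering (the initial threshold greater than the first and smaller than the second). The tie branch implements the optional variant that keeps the previous state when the distance equals the target threshold.

```python
from typing import Optional

FIRST_THRESHOLD = 0.2     # previous frame open: lower bar, so the state tends to stay open
SECOND_THRESHOLD = 0.3    # previous frame closed: higher bar must be cleared to open (assumed)
INITIAL_THRESHOLD = 0.25  # first frame, when no previous state exists (assumed)

def detect_mouth_state(opening_distance: float, prev_state: Optional[str]) -> str:
    """Return 'open' or 'closed' for the current frame given the previous frame's state."""
    if prev_state is None:            # current frame is the first frame of the target video
        target = INITIAL_THRESHOLD
    elif prev_state == "open":
        target = FIRST_THRESHOLD
    else:
        target = SECOND_THRESHOLD
    if opening_distance > target:
        return "open"
    if opening_distance < target:
        return "closed"
    # optional variant: on an exact tie, keep the previous frame's state
    return prev_state if prev_state is not None else "closed"
```

  • because the target threshold moves with the previous state, a distance hovering near any single value no longer flips the result every frame, which is exactly the jitter the single-threshold approach suffers from.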
  • in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, the execution subject obtains a target special effect (such as a sticker for the mouth) and displays the target special effect at the mouth position of the face object in the current frame.
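  • as one hedged way to display the target special effect at the mouth position, the sketch below pastes an RGBA sticker onto the frame with Pillow; the sticker asset and the mouth-center coordinates are assumed inputs (for example, the midpoint of the upper- and lower-lip-center keypoints).

```python
from PIL import Image

def overlay_mouth_sticker(frame: Image.Image, sticker: Image.Image,
                          mouth_center) -> Image.Image:
    """Paste an RGBA sticker centered on the mouth position of the current frame."""
    x, y = mouth_center
    top_left = (int(x - sticker.width / 2), int(y - sticker.height / 2))
    frame.paste(sticker, top_left, sticker)  # the sticker's alpha channel acts as the mask
    return frame
```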
  • FIG. 3 is a schematic diagram of an application scenario of the detection method according to this embodiment.
  • a user uses the selfie mode of the terminal device 301 to record a target video.
  • after capturing the current frame, the terminal device uses the stored face keypoint detection model to perform face keypoint detection on the current frame, and obtains a face keypoint detection result 302.
  • the terminal device 301 determines the mouth opening distance 303 of the face object in the current frame based on the face keypoint detection result 302.
  • the terminal device 301 obtains the mouth opening and closing state 304 of the face object in the previous frame, so that the target threshold 305 can be determined.
  • finally, based on a comparison between the target threshold 305 and the mouth opening distance 303, the terminal device 301 can determine the mouth opening and closing state 306 of the face object in the current frame.
  • the method provided by the foregoing embodiment of the present application obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the mouth opening distance of the face object in the current frame can be determined based on the face keypoint detection result. Then, based on the mouth opening and closing state of the face object in the previous frame, a target threshold can be determined, so that the mouth opening and closing state of the face object in the current frame can be determined based on a comparison between the target threshold and the determined mouth opening distance. Thus, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Therefore, by selecting different target thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
  • FIG. 4 illustrates a flow 400 of yet another embodiment of a detection method.
  • the process 400 of the detection method includes the following steps:
  • Step 401 Obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
  • in this embodiment, the execution subject of the detection method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
  • before face keypoint detection is performed on the face object in the current frame, face detection may also be performed on the current frame in advance, so as to determine the region where the face object is located.
  • after the region where the face object is located is determined, the region may also be scaled so that the size (e.g., length) of the region is the same as a preset size (e.g., length).
  • Step 402 Determine a mouth opening distance of a face object in a current frame based on a detection result of a face keypoint.
  • since the face keypoint detection result may include the coordinates of the center position of the upper lip of the face object and the coordinates of the center position of the lower lip, the execution subject may calculate the distance between these two coordinates and determine that distance as the mouth opening distance.
  • Step 403 In response to determining that the mouth opening and closing state of the face object in the previous frame is an open state, determine a preset first threshold value as a target threshold value.
  • in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, the execution subject may determine the preset first threshold as the target threshold (for example, 0.2).
  • Step 404 In response to determining that the mouth opening and closing state of the face object in the previous frame is a closed state, determine a preset second threshold as the target threshold.
  • in this embodiment, in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, the execution subject may determine the preset second threshold as the target threshold.
  • the first threshold may be smaller than the second threshold.
  • Step 405 Determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
  • the electronic device may determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold. Specifically, in response to determining that the mouth opening distance is greater than the target threshold, it may be determined that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is not greater than the target threshold, it may be determined that the mouth opening and closing state of the face object in the current frame is the closed state.
  • Step 406 In response to determining that the mouth opening and closing state of the face object in the current frame is an open state, obtain a target special effect, and display the target special effect at the mouth position of the face object in the current frame.
  • in this embodiment, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, the execution subject may obtain a target special effect (such as a sticker for the mouth) and display the target special effect at the mouth position of the face object in the current frame.
  • as can be seen, compared with the embodiment corresponding to FIG. 2, the flow 400 of the detection method in this embodiment involves the step of detecting the mouth opening and closing state by setting double thresholds. Therefore, the solution described in this embodiment can determine the target threshold based on the mouth opening and closing state of the face object in the previous frame; by selecting different thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result are improved. In addition, it also involves the step of displaying the target special effect after determining that the mouth opening and closing state is the open state, which can enrich the presentation forms of the video.
  • with further reference to FIG. 5, as an implementation of the methods shown in the above figures, this application provides an embodiment of a detection device.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device may be specifically applied to various electronic devices.
  • the detection device 500 includes: an obtaining unit 501 configured to obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; a first determining unit 502 configured to determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; a second determining unit 503 configured to determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and a third determining unit 504 configured to determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • the foregoing second determination unit 503 may include a first determination module and a second determination module (not shown in the figure).
  • the first determining module may be configured to determine a preset first threshold value as a target threshold value in response to determining that the mouth opening and closing state of the face object in the previous frame is an open state.
  • the second determining module may be configured to determine a preset second threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, wherein the first threshold is smaller than the second threshold.
  • the third determining unit 504 may include a third determining module and a fourth determining module (not shown in the figure).
  • the third determination module may be configured to determine that the mouth opening and closing state of the face object in the current frame is an open state in response to determining that the mouth opening distance is greater than the target threshold.
  • the fourth determination module may be configured to determine, in response to determining that the mouth opening distance is not greater than the target threshold, the mouth opening and closing state of the face object in the current frame is a closed state.
  • the apparatus may further include a fourth determining unit (not shown in the figure).
  • the fourth determining unit may be configured to, in response to determining that there is no previous frame of the current frame, use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
  • the device may further include a display unit (not shown in the figure).
  • the display unit may be configured to obtain the target special effect in response to determining that the mouth opening and closing state of the face object in the current frame is an open state, and display the target special effect at the mouth position of the face object in the current frame.
  • in the apparatus provided by the foregoing embodiment of the present application, the obtaining unit 501 obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the first determining unit 502 can determine the mouth opening distance of the face object in the current frame based on the face keypoint detection result. Then the second determining unit 503 can determine the target threshold based on the mouth opening and closing state of the face object in the previous frame, so that the third determining unit 504 can determine the mouth opening and closing state of the face object in the current frame based on the comparison of the target threshold and the determined mouth opening distance.
  • thus, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Therefore, by selecting different target thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as necessary.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage section 608 as necessary.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from a removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
  • in some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described units may also be provided in a processor; for example, a processor may be described as including an obtaining unit, a first determining unit, a second determining unit, and a third determining unit.
  • the names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the obtaining unit may also be described as "a unit that obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in the current frame of a target video."
  • the present application also provides a computer-readable medium, which may be included in the device described in the foregoing embodiments; or may exist alone without being assembled into the device.
  • the computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to: obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; determine, based on the face keypoint detection result, the mouth opening distance of the face object in the current frame; determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A detection method and device, the method comprising: obtaining a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video (201); determining, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame (202); determining a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame (203); and determining the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold (204). This improves the accuracy of the detection result of the mouth opening and closing state of the face object in the video.

Description

Detection method and device
This patent application claims priority to Chinese Patent Application No. 201811075036.0, filed on September 14, 2018 by the applicant 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.) and entitled "检测方法和装置" ("Detection Method and Device"), the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a detection method and device.
Background
With the development of computer technology, face keypoint detection usually needs to be performed on frames in a video. Then, according to the face keypoint detection results, the facial expressions and the like of the face objects in the video are determined.
When detecting the mouth opening and closing state, the related approach is usually to determine that the mouth is in the open state when the mouth opening distance is above a certain threshold, and to determine that the mouth is in the closed state when the mouth opening distance is below that threshold.
Summary
Embodiments of the present application propose a detection method and device.
In a first aspect, an embodiment of the present application provides a detection method, the method including: obtaining a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; determining, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; determining a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and determining the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
In some embodiments, determining the target threshold based on the predetermined mouth opening and closing state of the face object in the previous frame of the current frame includes: in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, determining a preset first threshold as the target threshold; and in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, determining a preset second threshold as the target threshold, where the first threshold is smaller than the second threshold.
In some embodiments, determining the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold includes: in response to determining that the mouth opening distance is greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is the open state; and in response to determining that the mouth opening distance is not greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is the closed state.
In some embodiments, the method further includes: in response to determining that there is no previous frame of the current frame, using a preset initial threshold as the target threshold, and determining the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
In some embodiments, the method further includes: in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, obtaining a target special effect and displaying the target special effect at the mouth position of the face object in the current frame.
In a second aspect, an embodiment of the present application provides a detection device, the device including: an obtaining unit configured to obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; a first determining unit configured to determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; a second determining unit configured to determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and a third determining unit configured to determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
In some embodiments, the second determining unit includes: a first determining module configured to determine a preset first threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state; and a second determining module configured to determine a preset second threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, where the first threshold is smaller than the second threshold.
In some embodiments, the third determining unit includes: a third determining module configured to determine that the mouth opening and closing state of the face object in the current frame is the open state in response to determining that the mouth opening distance is greater than the target threshold; and a fourth determining module configured to determine that the mouth opening and closing state of the face object in the current frame is the closed state in response to determining that the mouth opening distance is not greater than the target threshold.
In some embodiments, the device further includes: a fourth determining unit configured to, in response to determining that there is no previous frame of the current frame, use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
In some embodiments, the device further includes: a display unit configured to, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, obtain a target special effect and display the target special effect at the mouth position of the face object in the current frame.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs thereon, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the embodiments of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any one of the embodiments of the first aspect.
The detection method and device provided in the embodiments of the present application obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the mouth opening distance of the face object in the current frame can be determined based on the face keypoint detection result. Then, based on the mouth opening and closing state of the face object in the previous frame, a target threshold can be determined, so that the mouth opening and closing state of the face object in the current frame can be determined based on a comparison between the target threshold and the determined mouth opening distance. Thus, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Thereby, the accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
FIG. 2 is a flowchart of an embodiment of a detection method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the detection method according to the present application;
FIG. 4 is a flowchart of yet another embodiment of the detection method according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a detection device according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description
The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit the invention. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which the detection method or detection device of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as voice interaction applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
When the terminal devices 101, 102, and 103 are hardware, an image acquisition device may also be installed thereon. The image acquisition device may be any device capable of acquiring images, such as a camera, a sensor, and the like. A user may use the image acquisition device on the terminal devices 101, 102, 103 to capture video.
The terminal devices 101, 102, and 103 may perform face detection, face keypoint detection, and other processing on frames in the videos they play or the videos recorded by users; they may also analyze and calculate the face keypoint detection results to determine the mouth opening distance of the face object in each frame of the video; and they may further select a target threshold based on the mouth opening and closing state in a given frame, so as to use the target threshold and the mouth opening distance in the next frame to detect the mouth opening and closing state of the face object in that frame and obtain a detection result.
The server 105 may be a server providing various services, for example, a video processing server for storing, managing, or analyzing videos uploaded by the terminal devices 101, 102, and 103. The video processing server may store a large number of videos, and may send videos to the terminal devices 101, 102, and 103.
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster consisting of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the detection method provided in the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103; accordingly, the detection device is generally disposed in the terminal devices 101, 102, and 103.
It should be pointed out that, in the case where the terminal devices 101, 102, and 103 can implement the relevant functions of the server 105, the server 105 may not be provided in the system architecture 100.
It should also be pointed out that the server 105 may also perform face detection, face keypoint detection, mouth opening and closing state detection, and other processing on the videos it stores or the videos uploaded by the terminal devices 101, 102, and 103, and return the processing results to the terminal devices 101, 102, and 103. In this case, the detection method provided in the embodiments of the present application may also be executed by the server 105; accordingly, the detection device may also be disposed in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of the detection method according to the present application is shown. The detection method includes the following steps:
Step 201: Obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
In this embodiment, the execution subject of the detection method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) can record or play a video. The video it plays may be a video stored locally in advance, or a video obtained from a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection. Here, when recording a video, the above execution subject may be installed or connected with an image acquisition device (for example, a camera). It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other wireless connection methods now known or developed in the future.
In this embodiment, the above execution subject may obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video. The target video may be a video currently being played, or a video being recorded by a user; this is not limited here. The face keypoint detection result may include the positions of the face keypoints (which can be represented by coordinates). In practice, the face keypoints may be key points in a face (for example, points with semantic information, or points that affect the facial contour or the shape of the facial features). The face keypoint detection result may include the coordinates of the center position of the upper lip, the coordinates of the center position of the lower lip, and the like.
Here, the current frame of the target video may be the frame of the target video in which the mouth opening and closing state of the face object is to be detected. As an example, the above execution subject may detect the mouth opening and closing state of the face object in each frame of the target video in the order of the timestamps of the frames. The frame for which mouth opening and closing state detection is currently to be performed may be referred to as the current frame of the target video. Take the following two scenarios as examples:
In one scenario, the target video may be a video being played by the above execution subject. During the playing of the target video, the above execution subject may perform face keypoint detection on each frame to be played one by one to obtain the face keypoint detection result of the face object in the frame, detect the mouth opening and closing state of the face object in the frame accordingly, and then play the frame. The frame to be played at the current moment may be the current frame.
In another scenario, the target video may be a video being recorded by the above execution subject. During the recording of the target video, the above execution subject may perform face keypoint detection on each captured frame one by one to obtain the face keypoint detection result of the face object in the frame, detect the mouth opening and closing state of the face object in the frame accordingly, and then display the frame. The latest frame acquired at the current moment may be the current frame.
It should be noted that face keypoint detection can be performed in various ways. For example, a face keypoint detection model for performing face keypoint detection on images may be stored in the above execution subject in advance. For each frame of the target video, the frame can be input into the face keypoint detection model to obtain a face keypoint detection result. Here, the face keypoint detection model may be obtained by supervised training of an existing convolutional neural network based on a sample set using a machine learning method. The convolutional neural network may use various existing structures, such as DenseBox, VGGNet, ResNet, SegNet, and so on. It should be noted that the above machine learning method and supervised training method are well-known technologies that are currently widely studied and applied, and will not be repeated here.
In some optional implementations of this embodiment, a face detection model for performing face detection on images may also be stored in the above execution subject in advance. In this case, when mouth opening and closing state detection is to be performed on a frame, the above execution subject may first input the frame into the face detection model to obtain a face detection result (for example, a result indicating the position of the region where the face object is located, that is, the position of the face detection frame). Then, a screenshot of the region where the face object is located can be taken to obtain a face image. After that, the face image can be input into the face keypoint detection model to obtain a face keypoint detection result.
Step 202: Determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame.
In this embodiment, the above execution subject may first determine a scaling ratio of the face object based on the face keypoint detection result. For example, the distance from the forehead coordinates to the chin coordinates in the face keypoint detection result may be calculated, the ratio of this distance to a preset distance may then be determined, and this ratio may be used as the scaling ratio. Then, since the face keypoint detection result may include the coordinates of the center position of the upper lip of the face object and the coordinates of the center position of the lower lip, the above execution subject may calculate the distance between these two coordinates and divide this distance by the above scaling ratio to determine the mouth opening distance. It should be pointed out that other distances can also be used to determine the scaling ratio, which is not limited here. For example, the ratio of the distance between the left and right corners of the mouth to another preset distance can be used as the scaling ratio.
It should be noted that the above execution subject may also perform face detection in advance, scale the region after determining the region where the face object is located, and then perform face keypoint detection. In this case, the distance between the coordinates of the center position of the upper lip and the coordinates of the center position of the lower lip in the face keypoint detection result is the mouth opening distance.
Step 203: Determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame.
In this embodiment, since the above execution subject can sequentially detect the mouth opening and closing state of the face objects in the frames of the target video, the above execution subject has, when performing mouth opening and closing state detection on the current frame, already determined the detection result of the mouth opening and closing state of the face object in the previous frame of the current frame. At this point, the above execution subject may determine the target threshold based on the predetermined mouth opening and closing state of the face object in the previous frame of the current frame. Here, the target threshold may be a threshold selected by the above execution subject, based on the mouth opening and closing state of the face object in the previous frame, from a plurality of thresholds preset by the user. Different mouth opening and closing states of the face object in the previous frame correspond to different target thresholds.
It should be noted that the face objects in the previous frame and the current frame here may be the face of the same person. For example, when a user records a selfie video, the face objects in the current frame and the previous frame are both the user's face.
In some optional implementations of this embodiment, in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, a preset first threshold may be determined as the target threshold. In response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, a preset second threshold may be determined as the target threshold, where the first threshold may be smaller than the second threshold. It should be noted that, in this implementation, a technician may set the two thresholds (the first threshold and the second threshold) in advance based on a large amount of data statistics and experiments.
In some optional implementations of this embodiment, a technician may set multiple thresholds of different sizes in advance based on a large amount of data statistics and experiments. In response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, the above execution subject may use any one of the multiple thresholds that is greater than a preset intermediate value as the target threshold. In response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, the above execution subject may use any one of the multiple thresholds that is smaller than the preset intermediate value as the target threshold. Here, the preset intermediate value may be the average of the multiple thresholds, or may be any value greater than the minimum of the multiple thresholds and smaller than the maximum of the multiple thresholds.
In previous approaches, a single threshold is usually set. When the mouth opening distance is greater than this threshold, the mouth is considered open; if the mouth opening distance is less than this threshold, the mouth is considered closed. With such approaches, when the mouth opening distance is near the threshold, the detection result jumps back and forth, resulting in poor stability and accuracy of the detection result. By contrast, with the approach in this embodiment of determining the target threshold based on the mouth opening and closing state of the face object in the previous frame, selecting different thresholds avoids frequent jumps in the detection result and improves the stability and accuracy of the detection result.
In some optional implementations of this embodiment, if there is no previous frame of the current frame, that is, the current frame is the first frame of the target video, the above execution subject may use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold. For example, if the mouth opening distance is greater than the target threshold, the mouth may be determined to be in the open state; if the mouth opening distance is not greater than the target threshold, the mouth may be determined to be in the closed state. Here, the initial threshold may be set according to actual requirements.
In some optional implementations of this embodiment, the initial threshold may be greater than the first threshold and smaller than the second threshold.
Step 204: Determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
In this embodiment, the above execution subject may determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance determined in step 202 and the target threshold determined in step 203.
In some optional implementations of this embodiment, in response to determining that the mouth opening distance is greater than the target threshold, the above execution subject may determine that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is not greater than the target threshold, the above execution subject may determine that the mouth opening and closing state of the face object in the current frame is the closed state.
In some optional implementations of this embodiment, in response to determining that the mouth opening distance is greater than the target threshold, the above execution subject may determine that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is less than the target threshold, the above execution subject may determine that the mouth opening and closing state of the face object in the current frame is the closed state. In response to determining that the mouth opening distance is equal to the target threshold, the above execution subject may use the mouth opening and closing state of the previous frame as the mouth opening and closing state of the current frame.
In some optional implementations of this embodiment, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, the above execution subject obtains a target special effect (for example, a sticker for the mouth) and displays the target special effect at the mouth position of the face object in the current frame.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the detection method according to this embodiment. In the application scenario of FIG. 3, a user uses the selfie mode of the terminal device 301 to record a target video. After capturing the current frame, the terminal device uses its stored face keypoint detection model to perform face keypoint detection on the current frame and obtains a face keypoint detection result 302. Then, the terminal device 301 determines the mouth opening distance 303 of the face object in the current frame based on the face keypoint detection result 302. Next, the terminal device 301 obtains the mouth opening and closing state 304 of the face object in the previous frame, so that the target threshold 305 can be determined. Finally, based on a comparison between the target threshold 305 and the mouth opening distance 303, the terminal device 301 can determine the mouth opening and closing state 306 of the face object in the current frame.
The method provided by the above embodiment of the present application obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the mouth opening distance of the face object in the current frame can be determined based on the face keypoint detection result. Then, based on the mouth opening and closing state of the face object in the previous frame, a target threshold can be determined, so that the mouth opening and closing state of the face object in the current frame can be determined based on a comparison between the target threshold and the determined mouth opening distance. Thus, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Therefore, by selecting different target thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of the detection method is shown. The flow 400 of the detection method includes the following steps:
Step 401: Obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
In this embodiment, the execution subject of the detection method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video.
In this embodiment, before face keypoint detection is performed on the face object in the current frame, face detection may also be performed on the current frame in advance, so as to determine the region where the face object is located. In addition, after the region where the face object is located is determined, the region may also be scaled so that the size (for example, the length) of the region is the same as a preset size (for example, a preset length).
Step 402: Determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame.
In this embodiment, since the face keypoint detection result may include the coordinates of the center position of the upper lip of the face object and the coordinates of the center position of the lower lip, the above execution subject may calculate the distance between these two coordinates and determine that distance as the mouth opening distance.
Step 403: In response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, determine a preset first threshold as the target threshold.
In this embodiment, in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state, the above execution subject may determine the preset first threshold as the target threshold (for example, 0.2).
Step 404: In response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, determine a preset second threshold as the target threshold.
In this embodiment, in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, the above execution subject may determine the preset second threshold as the target threshold, where the first threshold may be smaller than the second threshold.
Step 405: Determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
In this embodiment, the above electronic device may determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold. Specifically, in response to determining that the mouth opening distance is greater than the target threshold, it may be determined that the mouth opening and closing state of the face object in the current frame is the open state. In response to determining that the mouth opening distance is not greater than the target threshold, it may be determined that the mouth opening and closing state of the face object in the current frame is the closed state.
Step 406: In response to determining that the mouth opening and closing state of the face object in the current frame is the open state, obtain a target special effect and display the target special effect at the mouth position of the face object in the current frame.
In this embodiment, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, the above execution subject may obtain a target special effect (for example, a sticker for the mouth) and display the target special effect at the mouth position of the face object in the current frame.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the detection method in this embodiment involves the step of detecting the mouth opening and closing state by setting double thresholds. Therefore, the solution described in this embodiment can determine the target threshold based on the mouth opening and closing state of the face object in the previous frame; by selecting different thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result are improved. In addition, it also involves the step of displaying the target special effect after determining that the mouth opening and closing state is the open state, which can enrich the presentation forms of the video.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a detection device. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device may be specifically applied to various electronic devices.
As shown in FIG. 5, the detection device 500 of this embodiment includes: an obtaining unit 501 configured to obtain a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; a first determining unit 502 configured to determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame; a second determining unit 503 configured to determine a target threshold based on a predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and a third determining unit 504 configured to determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
In some optional implementations of this embodiment, the second determining unit 503 may include a first determining module and a second determining module (not shown in the figure). The first determining module may be configured to determine a preset first threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the open state. The second determining module may be configured to determine a preset second threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is the closed state, where the first threshold is smaller than the second threshold.
In some optional implementations of this embodiment, the third determining unit 504 may include a third determining module and a fourth determining module (not shown in the figure). The third determining module may be configured to determine that the mouth opening and closing state of the face object in the current frame is the open state in response to determining that the mouth opening distance is greater than the target threshold. The fourth determining module may be configured to determine that the mouth opening and closing state of the face object in the current frame is the closed state in response to determining that the mouth opening distance is not greater than the target threshold.
In some optional implementations of this embodiment, the device may further include a fourth determining unit (not shown in the figure). The fourth determining unit may be configured to, in response to determining that there is no previous frame of the current frame, use a preset initial threshold as the target threshold and determine the mouth opening and closing state of the face object in the current frame based on a comparison between the mouth opening distance and the target threshold.
In some optional implementations of this embodiment, the device may further include a display unit (not shown in the figure). The display unit may be configured to, in response to determining that the mouth opening and closing state of the face object in the current frame is the open state, obtain a target special effect and display the target special effect at the mouth position of the face object in the current frame.
In the device provided by the above embodiment of the present application, the obtaining unit 501 obtains a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video, so that the first determining unit 502 can determine the mouth opening distance of the face object in the current frame based on the face keypoint detection result. Then the second determining unit 503 can determine a target threshold based on the mouth opening and closing state of the face object in the previous frame, so that the third determining unit 504 can determine the mouth opening and closing state of the face object in the current frame based on a comparison between the target threshold and the determined mouth opening distance. Thus, the target threshold compared against the mouth opening distance is determined based on the mouth opening and closing state of the face object in the previous frame; that is, the effect of the mouth opening and closing state of the face object in the previous frame on that of the face object in the current frame is taken into account. Therefore, by selecting different target thresholds, frequent jumps in the detection result can be avoided, and the stability and accuracy of the detection result of the mouth opening and closing state of the face object in the video can be improved.
下面参考图6,其示出了适于用来实现本申请实施例的电子设备的计算机系统600的结构示意图。图6示出的电子设备仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图6所示,计算机系统600包括中央处理单元(CPU)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储部分608加载到随机访问存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有系统600操作所需的各种程序和数据。CPU 601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a first determining unit, a second determining unit, and a third determining unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video".
As another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video; determine, based on the face keypoint detection result, the mouth opening distance of the face object in the current frame; determine a target threshold based on the predetermined mouth opening and closing state of the face object in the previous frame of the current frame; and determine, based on the comparison between the mouth opening distance and the target threshold, the mouth opening and closing state of the face object in the current frame.
The above description is merely a description of the preferred embodiments of the present application and of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.

Claims (12)

  1. A detection method, comprising:
    acquiring a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video;
    determining, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame;
    determining a target threshold based on a predetermined mouth opening and closing state of the face object in a previous frame of the current frame; and
    determining, based on a comparison between the mouth opening distance and the target threshold, a mouth opening and closing state of the face object in the current frame.
  2. The detection method according to claim 1, wherein the determining a target threshold based on a predetermined mouth opening and closing state of the face object in a previous frame of the current frame comprises:
    in response to determining that the mouth opening and closing state of the face object in the previous frame is an open state, determining a preset first threshold as the target threshold; and
    in response to determining that the mouth opening and closing state of the face object in the previous frame is a closed state, determining a preset second threshold as the target threshold, wherein the first threshold is less than the second threshold.
  3. The detection method according to claim 1, wherein the determining, based on a comparison between the mouth opening distance and the target threshold, a mouth opening and closing state of the face object in the current frame comprises:
    in response to determining that the mouth opening distance is greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is an open state; and
    in response to determining that the mouth opening distance is not greater than the target threshold, determining that the mouth opening and closing state of the face object in the current frame is a closed state.
  4. The detection method according to claim 1, wherein the method further comprises:
    in response to determining that no previous frame of the current frame exists, taking a preset initial threshold as the target threshold, and determining the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
  5. The detection method according to claim 1, wherein the method further comprises:
    in response to determining that the mouth opening and closing state of the face object in the current frame is an open state, acquiring a target special effect, and displaying the target special effect at a mouth position of the face object in the current frame.
  6. A detection apparatus, comprising:
    an acquisition unit configured to acquire a face keypoint detection result obtained by performing face keypoint detection on a face object in a current frame of a target video;
    a first determining unit configured to determine, based on the face keypoint detection result, a mouth opening distance of the face object in the current frame;
    a second determining unit configured to determine a target threshold based on a predetermined mouth opening and closing state of the face object in a previous frame of the current frame; and
    a third determining unit configured to determine, based on a comparison between the mouth opening distance and the target threshold, a mouth opening and closing state of the face object in the current frame.
  7. The detection apparatus according to claim 6, wherein the second determining unit comprises:
    a first determining module configured to determine a preset first threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is an open state; and
    a second determining module configured to determine a preset second threshold as the target threshold in response to determining that the mouth opening and closing state of the face object in the previous frame is a closed state, wherein the first threshold is less than the second threshold.
  8. The detection apparatus according to claim 6, wherein the third determining unit comprises:
    a third determining module configured to determine that the mouth opening and closing state of the face object in the current frame is an open state in response to determining that the mouth opening distance is greater than the target threshold; and
    a fourth determining module configured to determine that the mouth opening and closing state of the face object in the current frame is a closed state in response to determining that the mouth opening distance is not greater than the target threshold.
  9. The detection apparatus according to claim 6, wherein the apparatus further comprises:
    a fourth determining unit configured to, in response to determining that no previous frame of the current frame exists, take a preset initial threshold as the target threshold, and determine the mouth opening and closing state of the face object in the current frame based on the comparison between the mouth opening distance and the target threshold.
  10. The detection apparatus according to claim 6, wherein the apparatus further comprises:
    a display unit configured to, in response to determining that the mouth opening and closing state of the face object in the current frame is an open state, acquire a target special effect and display the target special effect at a mouth position of the face object in the current frame.
  11. An electronic device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-5.
  12. A computer-readable medium having a computer program stored thereon, wherein, when the program is executed by a processor, the method according to any one of claims 1-5 is implemented.
PCT/CN2018/115973 2018-09-14 2018-11-16 Detection method and device WO2020052062A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811075036.0A CN109271929B (zh) 2018-09-14 2018-09-14 Detection method and device
CN201811075036.0 2018-09-14

Publications (1)

Publication Number Publication Date
WO2020052062A1 true WO2020052062A1 (zh) 2020-03-19

Family

ID=65189111

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115973 WO2020052062A1 (zh) 2018-09-14 2018-11-16 Detection method and device

Country Status (2)

Country Link
CN (1) CN109271929B (zh)
WO (1) WO2020052062A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008922B (zh) * 2019-04-12 2023-04-18 腾讯科技(深圳)有限公司 Image processing method, device, apparatus, and medium for terminal device
CN110188712B (zh) * 2019-06-03 2021-10-12 北京字节跳动网络技术有限公司 Method and apparatus for processing images

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951730A (zh) * 2014-03-26 2015-09-30 联想(北京)有限公司 Lip movement detection method, apparatus, and electronic device
CN105989329A (zh) * 2014-12-11 2016-10-05 由田新技股份有限公司 Method and apparatus for detecting a person using a handheld device
CN106650624A (zh) * 2016-11-15 2017-05-10 东软集团股份有限公司 Face tracking method and apparatus
CN106709400A (zh) * 2015-11-12 2017-05-24 阿里巴巴集团控股有限公司 Method, apparatus, and client for recognizing an open or closed state of a sensory organ
CN106897658A (zh) * 2015-12-18 2017-06-27 腾讯科技(深圳)有限公司 Method and apparatus for face liveness discrimination
CN107358153A (zh) * 2017-06-02 2017-11-17 广州视源电子科技股份有限公司 Mouth movement detection method and apparatus, and liveness recognition method and system
CN107368777A (zh) * 2017-06-02 2017-11-21 广州视源电子科技股份有限公司 Smile action detection method and apparatus, and liveness recognition method and system
US20180048860A1 (en) * 2016-08-12 2018-02-15 Line Corporation Method and system for measuring quality of video call
JP2018085001A (ja) 2016-11-24 2018-05-31 キヤノンマーケティングジャパン株式会社 Information processing apparatus, information processing method, and program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794464B (zh) * 2015-05-13 2019-06-07 上海依图网络科技有限公司 Liveness detection method based on relative attributes
WO2017000213A1 (zh) * 2015-06-30 2017-01-05 北京旷视科技有限公司 Liveness detection method and device, and computer program product

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898529A (zh) * 2020-07-29 2020-11-06 北京字节跳动网络技术有限公司 Face detection method and apparatus, electronic device, and computer-readable medium
CN114359673A (zh) * 2022-01-10 2022-04-15 北京林业大学 Few-shot smoke detection method, apparatus and device based on metric learning
CN114359673B (zh) 2022-01-10 2024-04-09 北京林业大学 Few-shot smoke detection method, apparatus and device based on metric learning

Also Published As

Publication number Publication date
CN109271929B (zh) 2020-08-04
CN109271929A (zh) 2019-01-25

Similar Documents

Publication Publication Date Title
US10438077B2 (en) Face liveness detection method, terminal, server and storage medium
WO2020056903A1 (zh) 用于生成信息的方法和装置
US20230005265A1 (en) Systems and methods for generating media content
WO2020052062A1 (zh) 检测方法和装置
US10706892B2 (en) Method and apparatus for finding and using video portions that are relevant to adjacent still images
WO2019242222A1 (zh) 用于生成信息的方法和装置
TWI253860B (en) Method for generating a slide show of an image
JP2022523606A (ja) Gating model for video analysis
WO2020024484A1 (zh) 用于输出数据的方法和装置
EP3195601B1 (en) Method of providing visual sound image and electronic device implementing the same
US9934820B2 (en) Mobile device video personalization
US11196962B2 (en) Method and a device for a video call based on a virtual image
JP7209851B2 (ja) 画像変形の制御方法、装置およびハードウェア装置
WO2020215722A1 (zh) 视频处理方法和装置、电子设备及计算机可读存储介质
WO2021254502A1 (zh) 目标对象显示方法、装置及电子设备
WO2021047069A1 (zh) 人脸识别方法和电子终端设备
WO2021169616A1 (zh) 非活体人脸的检测方法、装置、计算机设备及存储介质
WO2018095252A1 (zh) 视频录制方法及装置
WO2021190625A1 (zh) 拍摄方法和设备
WO2021179719A1 (zh) 人脸活体检测方法、装置、介质及电子设备
US11144766B2 (en) Method for fast visual data annotation
JPWO2015178234A1 (ja) Image search system and search screen display method
US20200402253A1 (en) Head pose estimation
WO2021073204A1 (zh) 对象的显示方法、装置、电子设备及计算机可读存储介质
KR20140033667A (ko) Apparatus and method for object-based video editing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18932987

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18932987

Country of ref document: EP

Kind code of ref document: A1