WO2017088727A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus Download PDF

Info

Publication number
WO2017088727A1
Authority
WO
WIPO (PCT)
Prior art keywords
mouth
frame
state
face
facial features
Prior art date
Application number
PCT/CN2016/106752
Other languages
French (fr)
Chinese (zh)
Inventor
汪铖杰
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2017088727A1 publication Critical patent/WO2017088727A1/en
Priority to US15/680,976 priority Critical patent/US10360441B2/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V40/172: Classification, e.g. identification

Definitions

  • the present application relates to the field of communications technologies, and in particular, to an image processing method and apparatus.
  • Facial recognition is also known as face recognition, face-image recognition, and countenance recognition.
  • Compared with fingerprint scanning or iris recognition, facial recognition is convenient to use, intuitive, highly accurate, and difficult to counterfeit, and is therefore more readily accepted by users.
  • The embodiments of the present application provide an image processing method and apparatus, which can improve recognition accuracy and the recognition effect.
  • An embodiment of the present application provides an image processing method, including: acquiring video data; extracting frames having facial features from the video data; determining a mouth position from the frames to obtain a mouth image; analyzing the mouth image to obtain mouth features; identifying a mouth state according to the mouth features by using a preset rule; and identifying a mouth motion of the corresponding face in the video data based on the identified mouth state.
  • an image processing apparatus including:
  • An acquiring unit configured to acquire video data, and extract a frame having facial features from the video data
  • a determining unit configured to determine a mouth position from the frames to obtain a mouth image
  • An analyzing unit configured to analyze the mouth image to obtain a mouth feature
  • An identifier unit configured to identify a mouth state according to the mouth feature by using a preset rule
  • an identifying unit configured to identify a mouth motion of the corresponding face in the video data based on the identified mouth state.
  • In the embodiments of the present application, after video data is acquired, frames having facial features are extracted from the video data; the mouth position is then determined from the extracted frames to obtain a mouth image, which is analyzed to obtain mouth features; finally, using a preset rule, the mouth state is identified according to the mouth features as the basis for judging whether the mouth is moving, thereby recognizing mouth motion. Because this scheme depends little on the precision of facial-feature key-point positioning, it is more stable than existing schemes: even if the face shakes in the video, the recognition result is not greatly affected. In short, the scheme can greatly improve recognition accuracy and the recognition effect.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2a is another flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2b is a schematic diagram of a face-coordinate rectangular frame in an image processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 4 is another schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • facial recognition is widely used.
  • facial recognition technology may be applied to data security, or facial recognition technology may be used for face capture and tracking.
  • In facial recognition, recognition of the mouth is one of the most important parts. For example, by judging whether a face in video data exhibits mouth motion, it is possible to infer the facial expression of the object or determine whether the object is talking, and so on.
  • In the existing art, a facial-feature key-point positioning technique is generally used: a plurality of points are used to locate the mouth in each frame of the video image sequence, the inner area of the mouth is then computed from these point coordinates, and finally the change of this area across frames is used to determine whether the face in the video exhibits mouth motion.
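The area-based prior scheme can be sketched as follows; a minimal Python illustration, where the shoelace-formula helper and the relative change threshold are assumptions for demonstration rather than values from the patent:

```python
def polygon_area(points):
    """Shoelace formula for the area enclosed by the located mouth
    key points (the inner-mouth region the prior scheme measures)."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def area_change_indicates_motion(areas, rel_threshold=0.3):
    """The prior scheme reports mouth motion when the per-frame
    inner-mouth area changes enough; the threshold is illustrative."""
    lo, hi = min(areas), max(areas)
    return hi > 0 and (hi - lo) / hi > rel_threshold
```

As the description notes, this approach degrades when key-point positioning is imprecise, for example under face shake, since small coordinate errors directly perturb the computed area.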
  • the embodiment of the present application provides an image processing method and apparatus. The details will be described separately below.
  • the image processing apparatus may be specifically integrated in a device such as a terminal or a server.
  • The terminal may include a mobile phone, a tablet computer, a notebook computer, or a personal computer (PC).
  • An image processing method includes: acquiring video data, and extracting frames having facial features from the video data; determining a mouth position from each frame to obtain a mouth image; analyzing the mouth image to obtain mouth features; identifying a mouth state according to the mouth features by using a preset rule; and, based on the identified mouth state, identifying a mouth motion of the corresponding face in the video data.
  • the specific process of the image processing method may include the following steps:
  • The facial features may include eyebrows, eyes, nose, and/or mouth; if these features are present in a frame image, that frame may be considered a frame having facial features.
  • the mouth image can be determined by the following method.
  • For example, face detection may be performed on the frame to obtain a face-coordinate rectangular frame, facial-feature key-point positioning may be performed according to the face-coordinate rectangular frame to obtain key points of the facial features, and the coordinate positions of the facial features may then be determined according to those key points.
  • Face key points, also known as key facial feature points, refer to areas of the face with distinctive features, such as the corners of the eyes or the corners of the mouth.
  • Facial-feature key points are a subset of the face key points, mainly used to locate the facial features.
  • For example, the facial-feature key points can be obtained by performing facial-feature key-point positioning within the face-coordinate rectangular frame.
  • For example, the key point of the nose area can be determined as the midpoint of the line connecting the two nostrils, that is, the nose-lip center point.
  • the key points of the mouth area can be determined by locating the two corner points of the mouth.
  • the position of the mouth is determined based on the coordinate position of the facial features to obtain a mouth image.
  • the mouth position may be determined according to the coordinate position of the facial features, and then the image corresponding to the mouth position is intercepted or captured from the frame image to obtain a mouth image.
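The "intercept or capture" step can be sketched as a simple crop, assuming a grayscale NumPy frame and two mouth-corner key points; the padding factor is an assumption for illustration, not a value from the patent:

```python
import numpy as np

def crop_mouth(frame, left_corner, right_corner, pad=0.4):
    """Cut the mouth image out of the frame using the two mouth-corner
    key points, expanding the box by pad * mouth width so the lips
    are fully covered."""
    (x1, y1), (x2, y2) = left_corner, right_corner
    margin = int(pad * (x2 - x1))
    h, w = frame.shape
    xa, xb = max(0, x1 - margin), min(w, x2 + margin)
    ya, yb = max(0, min(y1, y2) - margin), min(h, max(y1, y2) + margin)
    return frame[ya:yb, xa:xb]
```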
  • a texture feature can be specifically extracted from the mouth image to obtain a mouth feature.
  • the texture feature may include a histogram of oriented gradient (HOG) feature, a local binary pattern (LBP) feature, or a Gabor feature.
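Of the texture features named (HOG, LBP, Gabor), LBP is compact enough to sketch directly. Below is a minimal 3x3 LBP histogram over a grayscale NumPy mouth image; it is an illustrative implementation rather than the patent's, and HOG or Gabor descriptors would be drop-in alternatives:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 3x3 local binary pattern: each interior pixel is encoded by
    comparing its 8 neighbors against it, and the codes are pooled into a
    normalized 256-bin histogram used as the mouth feature vector."""
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()
```

The normalized histogram then serves as the mouth feature handed to the classifier in the next step.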
  • the mouth state is identified according to the mouth feature.
  • the preset rule may be set according to the requirements of the actual application.
  • the mouth feature may be classified by a regression device or a classifier, and then the mouth state is identified based on the classification, and the like.
  • Using the preset rule to identify the mouth state according to the mouth feature may include the following steps:
  • the mouth features are classified using a regression or classifier.
  • For example, the mouth features can be classified by a support vector machine (SVM), or by other regressors or classifiers such as linear regression or a random forest, and so on.
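The text names an SVM, linear regression, or random forest; as a dependency-free stand-in, the sketch below learns the same kind of linear decision boundary over mouth-feature vectors with a simple perceptron. The classifier choice and training loop here are assumptions for illustration only; a real system would use an SVM as stated:

```python
import numpy as np

def train_linear_classifier(X, y, epochs=100, lr=0.1):
    """Perceptron stand-in for the SVM: learns a hyperplane separating
    open-mouth (+1) from closed-mouth (-1) feature vectors."""
    w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified: nudge the hyperplane
                w += lr * yi * xi
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sign(Xb @ w)
```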
  • If the classification result is the open-mouth state, an open-state flag is set for the frame;
  • if the classification result is the closed-mouth state, a closed-state flag is set for the frame;
  • if the classification result is neither the open-mouth state nor the closed-mouth state, it can be determined that the mouth state is a fuzzy state, and no flag needs to be set, that is, neither the open-state flag nor the closed-state flag is set.
  • The frames may be processed in parallel, or a loop operation may be adopted: the frame currently requiring mouth-state identification is determined first, the operations of steps 102 to 104 are performed on it, and after that frame is processed, the process returns to determine the next frame requiring mouth-state identification, until all frames having facial features in the video data have been processed (that is, until mouth-state identification is complete).
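The loop variant just described amounts to the following per-frame driver; the three callables are placeholders for steps 102 to 104, not names from the patent:

```python
def identify_states(frames, locate_mouth, extract_feature, classify_state):
    """Process each frame with facial features in turn: locate the mouth
    image, extract the mouth feature, and record the resulting state flag
    (or None for the fuzzy state), until every frame is processed."""
    flags = []
    for frame in frames:
        mouth = locate_mouth(frame)            # step 102: mouth image
        feature = extract_feature(mouth)       # step 103: mouth feature
        flags.append(classify_state(feature))  # step 104: state flag
    return flags
```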
  • One face or multiple faces may appear in the video data, and one frame may include one or more faces; different faces can be distinguished by their facial features.
  • The corresponding frames may be extracted from the video data according to the facial features of the target face, to obtain a target frame set. For example, if mouth-motion analysis of face A is required, the facial features of face A may be used to extract from the video data all frames containing face A, to obtain the target frame set, and so on. Identifying the mouth motion of the corresponding face in the video data based on the identified mouth state may include the following steps:
  • a mouth motion analysis request triggered by a user by clicking or sliding a trigger key may be received, and the like.
  • For example, the facial features of the target face may be acquired according to the target face, and frames having those facial features may then be extracted from the video data to obtain a target frame set.
  • For example, the frames having the facial features of the target face can be acquired from the frames whose mouth states were identified by performing steps 102 to 104 described above.
  • For example, suppose the target frame set includes four frames: frame 1, frame 2, frame 3, and frame 4, where frame 1 and frame 2 carry the open-state flag, frame 3 carries no flag, and frame 4 carries the closed-state flag. It can then be determined that the frames in the target frame set carry both the open-state flag and the closed-state flag, and step S4 is performed. Otherwise, if frame 1, frame 2, frame 3, and frame 4 carry no flags, or carry only the open-state flag or only the closed-state flag, it can be determined that the frames in the target frame set do not carry both the open-state flag and the closed-state flag, and step S5 is performed.
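The decision above reduces to a set-membership test over the per-frame flags; a minimal sketch, with illustrative flag names:

```python
def has_mouth_motion(frame_flags):
    """A target face is judged to have mouth motion only when its frame
    set carries both an open-state flag and a closed-state flag; frames
    in the fuzzy state (None) are ignored."""
    present = {flag for flag in frame_flags if flag is not None}
    return "open" in present and "closed" in present
```

With the example above, frames 1 and 2 open, frame 3 unflagged, and frame 4 closed, the set contains both flags, so mouth motion is reported.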
  • In summary, in this embodiment, after video data is acquired, frames having facial features are extracted from the video data; the mouth position is determined from the extracted frames to obtain a mouth image, which is analyzed to obtain mouth features; then, using a preset rule, the mouth state is identified according to the mouth features as the basis for judging whether the mouth is moving, thereby recognizing mouth motion. Because this scheme depends little on the precision of facial-feature key-point positioning results, it is more stable than the existing solution: even if the face shakes in the video, the recognition result is not greatly affected. In short, the solution can greatly improve recognition accuracy and the recognition effect.
  • In this embodiment, the image processing apparatus is integrated in a terminal, and the mouth state of the face in each frame is identified by a loop operation, as an example.
  • an image processing method may be as follows:
  • the terminal acquires video data, and performs face detection on the video data to extract a frame having facial features.
  • the facial features may include eyebrows, eyes, nose and/or mouth, and the like. If these features are present in a certain frame image, the frame can be considered to be a frame having facial features.
  • For example, the first frame, the second frame, and the third frame are extracted.
  • the terminal determines, according to the extracted frame with the facial feature, a frame that needs to be identified by the mouth state.
  • For example, the frames may be processed in sequence: the first frame is determined to be the frame currently requiring mouth-state identification, and steps 203 to 209 are performed; then the second frame is determined to be the frame requiring mouth-state identification, and steps 203 to 209 are performed; the third frame follows, and so on.
  • FIG. 2b is a schematic diagram of a rectangular frame of the face coordinate in the image processing method provided by the embodiment of the present application.
  • The terminal performs facial-feature key-point positioning according to the face-coordinate rectangular frame to obtain key points of the facial features, and determines the coordinate positions of the facial features according to those key points.
  • Face key points, also known as key facial feature points, refer to areas of the face with distinctive features, such as the corners of the eyes or the corners of the mouth.
  • Facial-feature key points are a subset of the face key points, mainly used to locate the facial features.
  • The facial-feature key points can be obtained in various ways, which may be determined according to the requirements of the actual application.
  • For example, the key point of the nose area can be determined as the midpoint of the line connecting the two nostrils, that is, the nose-lip center point.
  • The key points of the mouth area can be determined by locating the two corner points of the mouth.
  • the terminal determines a mouth position according to a coordinate position of the facial features to obtain a mouth image.
  • the terminal may determine the position of the mouth according to the coordinate position of the facial features of the face, and then intercept or capture an image corresponding to the position of the mouth from the image of the frame to obtain a mouth image.
  • the terminal extracts a texture feature from the mouth image to obtain a mouth feature.
  • the texture feature may include: an HOG feature, an LBP feature, or a Gabor feature.
  • the terminal classifies the mouth features by using an SVM.
  • the terminal identifies the state of the mouth according to the classification result.
  • For example, the details may be as follows:
  • If the classification result is the open-mouth state, an open-state flag is set for the frame;
  • if the classification result is the closed-mouth state, a closed-state flag is set for the frame;
  • if the classification result is neither the open-mouth state nor the closed-mouth state, the mouth state is determined to be a fuzzy state, and neither the open-state flag nor the closed-state flag is set.
  • The terminal determines whether all frames having facial features have had their mouth states identified; if yes, step 210 is performed, and if not, the process returns to step 202 to process the next frame.
  • the terminal identifies, according to the identifier, a mouth motion of the corresponding face in the video data. For example, it can be as follows:
  • the terminal receives a mouth motion analysis request, and the mouth motion analysis request indicates a target face that needs to perform mouth motion analysis.
  • a mouth motion analysis request triggered by a user by clicking or sliding a trigger key may be received, and the like.
  • the terminal extracts a frame corresponding to the target face from the video data according to the target face, to obtain a target frame set.
  • the terminal may acquire a facial feature of the target face according to the target face, and then extract a frame having the facial feature of the target facial from the video data according to the facial feature of the target facial to obtain a target frame set.
  • For example, after all frames having facial features are extracted from the video data and their mouth states are identified, the frames having the facial features of the target face can be extracted from all of those identified frames.
  • The terminal determines whether the frames in the target frame set carry both the open-state flag and the closed-state flag. If yes, S4 is executed; if not, S5 is executed.
  • For example, suppose the target frame set includes four frames: frame 1, frame 2, frame 3, and frame 4, where frame 1 and frame 2 carry the open-state flag, frame 3 carries no flag, and frame 4 carries the closed-state flag. It can then be determined that the frames in the target frame set carry both the open-state flag and the closed-state flag, and step S4 is performed. Otherwise, if frame 1, frame 2, frame 3, and frame 4 carry no flags, or carry only the open-state flag or only the closed-state flag, it can be determined that the frames in the target frame set do not carry both flags, and step S5 is performed.
  • In summary, in this embodiment, after video data is acquired, frames having facial features are extracted from the video data; the mouth position is determined from the extracted frames to obtain a mouth image; texture features are extracted from the mouth image and classified by an SVM; and the mouth state is identified based on the classification result as the basis for judging whether the mouth is moving, thereby recognizing mouth motion. Because this scheme depends little on the precision of the facial-feature key points, it is more stable than the existing schemes: even if the face shakes in the video, the recognition result is not greatly affected. In short, the scheme can greatly improve recognition accuracy and the recognition effect.
  • the embodiment of the present application further provides an image processing apparatus.
  • The image processing apparatus may include: an obtaining unit 301, a determining unit 302, an analyzing unit 303, a marking unit 304, and an identifying unit 305.
  • the obtaining unit 301 is configured to acquire video data, and extract a frame having facial features from the video data.
  • the obtaining unit 301 may be specifically configured to read video data that needs to be recognized by the face, and extract a frame having facial features from the video data by using a face recognition technology.
  • the facial features may include eyebrows, eyes, nose and/or mouth, and the like. If these features are present in a certain frame image, it can be considered as a frame having facial features.
  • the determining unit 302 is configured to determine the mouth position from each frame to obtain a mouth image.
  • the determining unit 302 can include a positioning subunit and a determining subunit.
  • the positioning sub-unit is configured to locate facial features in each frame to obtain a coordinate position of the facial features.
  • The positioning sub-unit can be used to perform face detection on each frame to obtain a face-coordinate rectangular frame, perform facial-feature key-point positioning according to the face-coordinate rectangular frame to obtain key points of the facial features, and determine the coordinate positions of the facial features according to those key points.
  • the key points of the facial features can be obtained, and the manner of obtaining the key points of the facial features can be various, which can be determined according to the requirements of the actual application.
  • For example, the key point of the nose area can be determined as the midpoint of the line connecting the two nostrils, that is, the nose-lip center point.
  • the key points of the mouth area can be determined by determining the two corner points of the mouth.
  • the determining subunit is configured to determine a mouth position according to a coordinate position of the facial features to obtain a mouth image.
  • the determining subunit may be specifically configured to determine a mouth position according to a coordinate position of the facial features, and then intercept or capture an image corresponding to the mouth position from the frame image to obtain a mouth image.
  • the analyzing unit 303 is configured to analyze the mouth image to obtain a mouth feature.
  • the analyzing unit 303 is specifically configured to extract a texture feature from the mouth image to obtain a mouth feature.
  • the texture feature may include an HOG feature, an LBP feature, or a Gabor feature.
  • the marking unit 304 is configured to identify the mouth state according to the mouth feature by using a preset rule.
  • the preset rule may be set according to requirements of an actual application.
  • the identifier unit may include: a classification subunit and an identifier subunit.
  • the classification subunit is configured to classify the mouth features by using a regression or classifier.
  • the classification sub-unit may be specifically used to classify the mouth features by using the SVM, or may also use a linear regression, a random forest or other regression or classifier to classify the mouth features, and the like.
  • the identifier subunit is configured to identify the mouth state according to the classification result. For example, it can be as follows:
  • If the classification result is the open-mouth state, an open-state flag is set for the frame;
  • if the classification result is the closed-mouth state, a closed-state flag is set for the frame;
  • if the classification result is neither the open-mouth state nor the closed-mouth state, the mouth state is determined to be a fuzzy state, and neither the open-state flag nor the closed-state flag is set.
  • The identifying unit 305 is configured to identify a mouth motion of the corresponding face in the video data based on the identified mouth state. For example, the details can be as follows:
  • the mouth motion analysis request indicating a target face that needs to perform mouth motion analysis
  • For example, suppose the target frame set includes four frames: frame 1, frame 2, frame 3, and frame 4, where frame 1 and frame 2 carry the open-state flag, frame 3 carries no flag, and frame 4 carries the closed-state flag. It can then be determined that the frames in the target frame set carry both the open-state flag and the closed-state flag, so the target face is in a mouth-motion state. Otherwise, if frame 1, frame 2, frame 3, and frame 4 carry no flags, or carry only the open-state flag or only the closed-state flag, it can be determined that the frames in the target frame set do not carry both flags, and the target face is determined to have no mouth motion.
  • the image processing device may be specifically integrated in a device such as a terminal or a server, and the terminal may include a device such as a mobile phone, a tablet computer, a notebook computer, or a PC.
  • Each of the foregoing units may be implemented as a separate entity, or the units may be combined arbitrarily and implemented as one or more entities.
  • For specific implementations of the foregoing units, refer to the foregoing method embodiments; details are not described herein again.
  • In the image processing apparatus of this embodiment, after the obtaining unit 301 acquires video data, it extracts frames having facial features from the video data; the determining unit 302 then determines the mouth position from the extracted frames to obtain a mouth image; the analyzing unit 303 analyzes the mouth image to obtain mouth features; and the marking unit 304 uses a preset rule to identify the mouth state according to the mouth features, as the basis for the identifying unit 305 to recognize the mouth motion of the corresponding face in the video data.
  • FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes a processor 401, a non-volatile computer readable memory 402, a display unit 403, and a network communication interface 404. These components communicate over bus 405.
  • a plurality of program modules are stored in the memory 402, including an operating system 406, a network communication module 407, and an application 408.
  • the processor 401 can read various modules (not shown) included in the application in the memory 402 to perform various functional applications and data processing of image processing.
  • There may be one or more processors 401 in this embodiment, and the processor 401 may be a CPU, a processing unit/module, an ASIC, a logic module, or a programmable gate array.
  • the operating system 406 can be: a Windows operating system, an Android operating system, or an Apple iPhone OS operating system.
  • Application 408 can include an image processing module 409.
  • The image processing module 409 may include: a computer-executable instruction set 409-1 formed by the obtaining unit 301, the determining unit 302, the analyzing unit 303, the marking unit 304, and the identifying unit 305 in FIG. 3, and corresponding metadata and heuristic algorithms 409-2. These sets of computer-executable instructions may be executed by the processor 401 to perform the functions of the method illustrated in FIG. 1 or FIG. 2a, or of the image processing apparatus illustrated in FIG. 3.
  • the network communication interface 404 cooperates with the network communication module 407 to complete transmission and reception of various network signals of the image processing apparatus.
  • the display unit 403 has a display panel for completing input and display of related information.
  • In some embodiments, the network communication interface 404 and the network communication module 407 may be omitted.
  • A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are an image processing method and apparatus. The image processing method comprises: acquiring video data, and extracting frames having face features from the video data; determining a mouth position from each frame to obtain a mouth image; analyzing the mouth image to obtain mouth features; identifying a mouth state according to the mouth features by utilizing a preset rule; and identifying a mouth action of a corresponding face in the video data on the basis of the identified mouth state. The solution can increase the identification accuracy and improve the identification effect.

Description

Image processing method and apparatus
This application claims priority to Chinese Patent Application No. 201510827420.1, filed with the Chinese Patent Office on November 25, 2015 and entitled "Facial recognition method and apparatus", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of communications technologies, and in particular, to an image processing method and apparatus.
Background
With the development of communication technology, a variety of biometric recognition technologies have emerged, and facial recognition is one of them. Facial recognition is also known as face recognition, face-image recognition, and countenance recognition. Compared with fingerprint scanning or iris recognition, facial recognition is convenient to use, intuitive, highly accurate, and difficult to counterfeit, and is therefore more readily accepted by users.
Summary of the invention
The embodiments of the present application provide an image processing method and apparatus, which can improve recognition accuracy and the recognition effect.
An embodiment of this application provides an image processing method, including:
acquiring video data;
extracting frames having facial features from the video data;
determining a mouth position in each of the frames to obtain a mouth image;
analyzing the mouth image to obtain mouth features;
labeling the mouth state according to the mouth features by using a preset rule; and
recognizing a mouth action of the corresponding face in the video data based on the labeled mouth state.
Correspondingly, an embodiment of this application further provides an image processing apparatus, including:
an acquiring unit, configured to acquire video data and extract frames having facial features from the video data;
a determining unit, configured to determine a mouth position in each of the frames to obtain a mouth image;
an analyzing unit, configured to analyze the mouth image to obtain mouth features;
a labeling unit, configured to label the mouth state according to the mouth features by using a preset rule; and
a recognizing unit, configured to recognize a mouth action of the corresponding face in the video data based on the labeled mouth state.
In the embodiments of this application, after video data is acquired, frames having facial features are extracted from the video data; the mouth position is then determined in each extracted frame to obtain a mouth image, which is analyzed to obtain mouth features; a preset rule is then used to label the mouth state according to the mouth features, and the labels serve as the basis for judging whether the mouth is moving, thereby recognizing the mouth action. Because this solution depends only weakly on the precision of facial landmark localization, it is more stable than existing solutions: even if the face shakes in the video, the recognition result is not greatly affected. In short, this solution can greatly improve recognition accuracy and the overall recognition result.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Clearly, the drawings described below show only some embodiments of this application, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an image processing method according to an embodiment of this application;
FIG. 2a is another flowchart of an image processing method according to an embodiment of this application;
FIG. 2b is a schematic diagram of a rectangular face-coordinate box in an image processing method according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application; and
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Clearly, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by a person skilled in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
In an embodiment of this application, facial recognition has a wide range of applications; for example, it can be applied to data security, or used for face capture and tracking. Within facial recognition, recognizing the mouth is an important part: by judging whether a face in video data makes a mouth-opening movement, the facial expression of the subject can be judged, or it can be determined whether the subject is speaking, and so on. In this embodiment, when judging whether a face in video data makes a mouth-opening movement, facial landmark localization is generally used: multiple points locate the mouth in each face image of the video sequence, the internal area of the mouth is computed from these point coordinates, and finally the change in this area determines whether the face in the video makes a mouth-opening movement.
In that approach, if the face in the video shakes, landmark localization may fail or deviate substantially, which makes the computed internal mouth area wrong and ultimately causes detection of the mouth-opening movement to fail. In other words, the accuracy of that approach is not high and its recognition result is poor.
To improve the accuracy and result of facial recognition, the embodiments of this application provide an image processing method and apparatus, which are described in detail below.
This embodiment is described from the perspective of an image processing apparatus, which may be integrated in a device such as a terminal or a server. The terminal may be a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), or the like.
An image processing method includes: acquiring video data and extracting frames having facial features from the video data; determining a mouth position in each frame to obtain a mouth image; analyzing the mouth image to obtain mouth features; labeling the mouth state according to the mouth features by using a preset rule; and recognizing a mouth action of the corresponding face in the video data based on the labels.
As shown in FIG. 1, the image processing method may include the following steps:
101. Acquire video data, and extract frames having facial features from the video data.
For example, the video data on which facial recognition is to be performed may be read, and frames having facial features may be extracted from the video data by using face detection technology.
The facial features may include eyebrows, eyes, a nose, and/or a mouth. If a frame contains these features, it can be regarded as a frame having facial features.
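Step 101 can be sketched as a simple filter over decoded frames. In the sketch below, `has_face` is a hypothetical stand-in for whatever real face detector is used (the source does not prescribe one), and the frames are toy records rather than real images:

```python
# Minimal sketch of step 101: scan decoded video frames and keep those in
# which a face detector fires. `has_face` is a hypothetical predicate standing
# in for a real detector; here it just checks a flag on a toy frame record.

def extract_face_frames(frames, has_face):
    """Return (index, frame) pairs for frames containing facial features."""
    return [(i, f) for i, f in enumerate(frames) if has_face(f)]

# Toy data: five frames, of which the first three contain a face.
frames = [{"face": True}, {"face": True}, {"face": True},
          {"face": False}, {"face": False}]
kept = extract_face_frames(frames, lambda f: f["face"])
print([i for i, _ in kept])  # indices of the frames kept for later steps
```

The kept frames (here, indices 0 to 2) are the ones passed on to steps 102 to 104.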
102. Determine the mouth position in each frame to obtain a mouth image.
In this embodiment, the mouth image may be determined as follows.
(1) Locate the facial features in the frame to obtain their coordinate positions.
For example, face detection may be performed on the frame to obtain a rectangular face-coordinate box; facial landmark localization is performed within this box to obtain the facial landmarks; and the coordinate positions of the facial features are then determined from these landmarks.
Face key points, also called face key feature points, are regions of the face with distinctive characteristics, such as the corners of the eyes or the corners of the mouth. Facial-feature landmarks are a subset of the face key points and are mainly used to recognize the facial features.
The facial-feature landmarks can be derived from the rectangular face-coordinate box in a number of ways. For example, the key point of the nose region may be taken as the midpoint of the line connecting the centers of the two nostrils, that is, the nose-lip center point; the key points of the mouth region may be determined by locating the two mouth corners.
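The nose-lip center point described above is plain midpoint geometry. A minimal sketch, assuming the two nostril-center coordinates have already been produced by some landmark detector (the coordinate values below are illustrative):

```python
# Sketch of the landmark geometry from the text: the nose-lip center point is
# the midpoint of the line connecting the two nostril centers. Input
# coordinates are assumed outputs of a landmark detector, in (x, y) pixels.

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

left_nostril, right_nostril = (48.0, 60.0), (56.0, 60.0)  # example coordinates
nose_lip_center = midpoint(left_nostril, right_nostril)
print(nose_lip_center)  # (52.0, 60.0)
```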
(2) Determine the mouth position according to the coordinate positions of the facial features to obtain the mouth image.
For example, the mouth position may be determined from the coordinate positions of the facial features, and the image patch corresponding to the mouth position may then be cropped from the frame to obtain the mouth image.
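One way to realize this crop is to build a bounding box from the two mouth-corner landmarks and cut it out of the frame. A hedged sketch: the margin value is an illustrative assumption (the source only says the mouth region is cropped), and the frame is modeled as a row-major list of rows rather than a real image buffer:

```python
# Sketch of step (2): derive a bounding box from the two mouth-corner
# landmarks (with a margin, an assumed parameter) and crop it from the frame.

def mouth_bbox(left_corner, right_corner, margin=2):
    xs = (left_corner[0], right_corner[0])
    ys = (left_corner[1], right_corner[1])
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

def crop(frame, box):
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]

frame = [[(y, x) for x in range(20)] for y in range(20)]  # toy 20x20 "image"
box = mouth_bbox((6, 10), (13, 10))
mouth_image = crop(frame, box)
print(box, len(mouth_image), len(mouth_image[0]))  # (4, 8, 15, 12) 5 12
```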
103. Analyze the mouth image to obtain mouth features.
For example, texture features may be extracted from the mouth image as the mouth features.
The texture features may include histogram of oriented gradients (HOG) features, local binary pattern (LBP) features, Gabor features, or the like.
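Of the texture features named above, LBP is the simplest to illustrate. The sketch below implements the basic 8-neighbor variant only (real systems typically histogram these codes over the patch to form the feature vector; the patent does not specify which variant is used):

```python
# Basic 8-neighbor local binary pattern (LBP): for each interior pixel,
# neighbors greater than or equal to the center each set one bit of an
# 8-bit code. A histogram of these codes can serve as the mouth feature
# vector fed to the classifier in the next step.

def lbp_codes(img):
    """img: 2-D list of grayscale values; returns LBP codes for interior pixels."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise from top-left
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes

img = [[10, 10, 10],
       [10, 50, 10],
       [10, 10, 90]]
print(lbp_codes(img))  # one interior pixel; only the brighter neighbor sets a bit
```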
104. Use a preset rule to label the mouth state according to the mouth features.
The preset rule may be set according to the needs of the actual application. For example, a regressor or a classifier may be used to classify the mouth features, and the mouth state may then be labeled based on the classification result. Labeling the mouth state according to the mouth features by using the preset rule may include the following steps:
(1) Classify the mouth features by using a regressor or a classifier.
For example, a support vector machine (SVM) may be used to classify the mouth features; alternatively, other regressors or classifiers, such as a linear regressor or a random forest, may be used.
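At inference time a trained linear-kernel SVM reduces to a decision function of the form sign(w·x + b). The sketch below shows only that decision step, not training; the weights, bias, and feature vectors are made-up values, not a trained model:

```python
# Illustrative stand-in for the classification step: evaluating a linear
# decision function w . x + b over the mouth feature vector. All numbers
# below are hypothetical, for demonstration only.

def linear_decision(weights, bias, features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score  # > 0 -> "open mouth" class, < 0 -> "closed mouth" class

w, b = [0.8, -0.5, 0.3], -0.2     # hypothetical trained parameters
open_like = [1.0, 0.1, 0.9]       # hypothetical feature vectors
closed_like = [0.1, 0.9, 0.1]
print(linear_decision(w, b, open_like) > 0,
      linear_decision(w, b, closed_like) > 0)  # True False
```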
(2) Label the mouth state according to the classification result. For example:
If the classification result indicates that the mouth state is the open state, an open-mouth flag is set for the frame.
If the classification result indicates that the mouth state is the closed state, a closed-mouth flag is set for the frame.
Note that if the classification result cannot determine whether the mouth state is open or closed, the mouth state can be regarded as ambiguous; in that case no flag is set, i.e. neither the open-mouth flag nor the closed-mouth flag is set.
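The three-way labeling rule above can be sketched as a mapping from a classifier score to a per-frame flag. The score margin used to declare a frame ambiguous is an illustrative assumption; the source only requires that ambiguous frames receive no flag:

```python
# Sketch of the labeling rule: map a classifier score to a per-frame flag,
# leaving ambiguous frames unflagged. The margin threshold is assumed.

OPEN, CLOSED, UNFLAGGED = "open", "closed", None

def label_frame(score, margin=0.25):
    if score >= margin:
        return OPEN
    if score <= -margin:
        return CLOSED
    return UNFLAGGED  # ambiguous: set neither flag

print([label_frame(s) for s in (0.9, -0.7, 0.1)])  # ['open', 'closed', None]
```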
In addition, when labeling the mouth state of each frame having facial features in the video data, the frames may be processed in parallel, or in a loop: first determine the frame whose mouth state currently needs labeling, then perform steps 102 to 104; after that frame has been processed, return to determining the next frame to label, until all frames having facial features in the video data have been processed (that is, labeled with a mouth state).
105. Recognize the mouth action of the corresponding face in the video data based on the labeled mouth states.
One face or multiple faces may appear in the video data, and a single frame may contain one or more faces; different faces can be distinguished by their facial features. That is, the frames corresponding to a target face can be extracted from the video data by its facial features to obtain a target frame set. For example, if mouth-action analysis is required for face A, all frames containing face A can be extracted from the video data according to the facial features of face A to obtain the target frame set, and so on. Recognizing the mouth action of the corresponding face in the video data based on the labels includes the following steps:
S1. Receive a mouth-action analysis request, which indicates the target face on which mouth-action analysis is to be performed.
For example, a mouth-action analysis request triggered by the user clicking or sliding a trigger key may be received.
S2. Extract the frames corresponding to the target face from the video data to obtain a target frame set.
For example, the facial features of the target face may be acquired, and frames having those facial features may then be extracted from the video data to obtain the target frame set.
In this step, the frames having the facial features of the target face can be taken from the mouth-state-labeled frames obtained by performing steps 102 to 104 in a loop.
S3. Determine whether the target frame set contains both frames with the open-mouth flag and frames with the closed-mouth flag; if so, perform S4; otherwise, perform S5.
For example, suppose the target frame set includes four frames: frame 1 and frame 2 carry the open-mouth flag, frame 3 carries no flag, and frame 4 carries the closed-mouth flag. In this case the set contains both open-mouth and closed-mouth flags, so step S4 is performed. Otherwise, if frames 1 to 4 carry no flags, or carry only open-mouth flags or only closed-mouth flags, the set does not contain both flags, so step S5 is performed.
S4. When the target frame set contains both open-mouth and closed-mouth flags, determine that the target face exhibits a mouth-opening movement.
S5. When the target frame set does not contain both open-mouth and closed-mouth flags, determine that the target face does not exhibit a mouth-opening movement.
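Steps S3 to S5 reduce to a set-membership check over the per-frame flags, which can be sketched as:

```python
# Sketch of steps S3-S5: a mouth-opening movement is reported only when the
# target frame set contains at least one open-mouth flag AND at least one
# closed-mouth flag; unflagged (ambiguous) frames are ignored.

def has_mouth_motion(frame_flags):
    flags = set(f for f in frame_flags if f is not None)
    return "open" in flags and "closed" in flags

# The four-frame example from the text: frames 1-2 open, frame 3 unflagged,
# frame 4 closed -> both flags present, so motion is detected.
print(has_mouth_motion(["open", "open", None, "closed"]))  # True
print(has_mouth_motion(["open", "open", None, None]))      # False
```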
As can be seen from the above, in this embodiment, after video data is acquired, frames having facial features are extracted from it; the mouth position is determined in each extracted frame to obtain a mouth image, which is analyzed to obtain mouth features; a preset rule then labels the mouth state according to the mouth features, and the labels serve as the basis for judging whether the mouth is moving, thereby recognizing the mouth action. Because this solution depends only weakly on the precision of facial landmark localization, it is more stable than existing solutions: even if the face shakes in the video, the recognition result is not greatly affected. In short, this solution can greatly improve recognition accuracy and the overall recognition result.
The method described in the above embodiment is further illustrated below with an example.
In this embodiment, the description takes as an example a case in which the image processing apparatus is integrated in a terminal and the mouth state of the face in each frame is labeled in a loop.
As shown in FIG. 2a, an image processing method may proceed as follows:
201. The terminal acquires video data and performs face detection on it to extract frames having facial features.
The facial features may include eyebrows, eyes, a nose, and/or a mouth. If a frame contains these features, it can be regarded as a frame having facial features.
For example, if face detection determines that the first, second, and third frames of the video data have facial features while the fourth and fifth frames do not, the first, second, and third frames are extracted.
202. The terminal determines, from the extracted frames having facial features, the frame whose mouth state currently needs labeling.
For example, if the extracted frames are the first, second, and third frames, their mouth states may be labeled in turn: first the first frame is taken as the frame to be labeled and steps 203 to 209 are performed; then the second frame is taken as the frame to be labeled and steps 203 to 209 are performed again; then the third frame; and so on.
203. The terminal performs face detection on the frame whose mouth state currently needs labeling to obtain a rectangular face-coordinate box; see FIG. 2b, a schematic diagram of the rectangular face-coordinate box in the image processing method provided by this embodiment.
204. The terminal performs facial landmark localization within the rectangular face-coordinate box to obtain the facial landmarks, and determines the coordinate positions of the facial features from these landmarks.
Face key points, also called face key feature points, are regions of the face with distinctive characteristics, such as the corners of the eyes or the corners of the mouth. Facial-feature landmarks are a subset of the face key points and are mainly used to recognize the facial features.
The facial-feature landmarks can be derived from the rectangular face-coordinate box in a number of ways, depending on the needs of the actual application. For example, the key point of the nose region may be taken as the midpoint of the line connecting the centers of the two nostrils, that is, the nose-lip center point; the key points of the mouth region may be determined by locating the two mouth corners.
205. The terminal determines the mouth position from the coordinate positions of the facial features to obtain a mouth image.
For example, the terminal may determine the mouth position from the coordinate positions of the facial features and then crop the image patch corresponding to the mouth position from the frame to obtain the mouth image.
206. The terminal extracts texture features from the mouth image to obtain mouth features.
The texture features may include HOG features, LBP features, Gabor features, or the like.
207. The terminal classifies the mouth features by using an SVM.
Note that besides an SVM, other regressors or classifiers, such as a linear regressor or a random forest, may be used to classify the mouth features.
208. The terminal labels the mouth state according to the classification result. For example:
If the classification result indicates that the mouth state is the open state, the open-mouth flag is set for the frame.
If the classification result indicates that the mouth state is the closed state, the closed-mouth flag is set for the frame.
Note that if the classification result cannot determine whether the mouth state is open or closed, the mouth state can be regarded as ambiguous; in that case no flag is set, i.e. neither the open-mouth flag nor the closed-mouth flag is set.
209. The terminal determines whether all frames having facial features in the video data have been processed; if so, step 210 is performed; otherwise, the flow returns to step 202.
For example, if only the first, second, and third frames of the video data have facial features, then after the first frame has been labeled, the second and third frames remain unprocessed, so the flow must return to step 202 to label the mouth state of the second frame; once the second and third frames have also been labeled, step 210 is performed.
210. The terminal recognizes the mouth action of the corresponding face in the video data based on the labels. For example:
S1. The terminal receives a mouth-action analysis request, which indicates the target face on which mouth-action analysis is to be performed.
For example, a mouth-action analysis request triggered by the user clicking or sliding a trigger key may be received.
S2. The terminal extracts the frames corresponding to the target face from the video data to obtain a target frame set.
For example, the terminal may acquire the facial features of the target face and then extract frames having those facial features from the video data to obtain the target frame set.
Through steps 202 to 209, all frames having facial features can be extracted from the video data and labeled with their mouth states. In this embodiment, the frames having the facial features of the target face are taken from all of these labeled frames.
S3. The terminal determines whether the target frame set contains both frames with the open-mouth flag and frames with the closed-mouth flag; if so, S4 is performed; otherwise, S5 is performed.
For example, suppose the target frame set includes four frames: frame 1 and frame 2 carry the open-mouth flag, frame 3 carries no flag, and frame 4 carries the closed-mouth flag. In this case the set contains both open-mouth and closed-mouth flags, so step S4 is performed. Otherwise, if frames 1 to 4 carry no flags, or carry only open-mouth flags or only closed-mouth flags, the set does not contain both flags, so step S5 is performed.
S4. When the terminal determines that the target frame set contains both open-mouth and closed-mouth flags, it determines that the target face exhibits a mouth-opening movement.
S5. When the terminal determines that the target frame set does not contain both open-mouth and closed-mouth flags, it determines that the target face does not exhibit a mouth-opening movement.
As can be seen from the above, in this embodiment, after video data is acquired, frames having facial features are extracted from it; the mouth position is determined in each extracted frame to obtain a mouth image; texture features are extracted from the mouth image and classified by an SVM; and the mouth state is labeled based on the classification result, serving as the basis for judging whether the mouth is moving, thereby recognizing the mouth action. Because this solution depends only weakly on the precision of facial landmark localization, it is more stable than existing solutions: even if the face shakes in the video, the recognition result is not greatly affected. In short, this solution can greatly improve recognition accuracy and the overall recognition result.
本申请实施例还提供一种图像处理装置,如图3所示,该图像处理装置可以包括:获取单元301、确定单元302、分析单元303、标识单元304和识别单元305。The embodiment of the present application further provides an image processing apparatus. As shown in FIG. 3, the image processing apparatus may include: an obtaining unit 301, a determining unit 302, an analyzing unit 303, an identifying unit 304, and an identifying unit 305.
获取单元301,用于获取视频数据,并从该视频数据中提取具有面部特征的帧。The obtaining unit 301 is configured to acquire video data, and extract a frame having facial features from the video data.
例如,获取单元301,具体可以用于读取需要进行面部识别的视频数据,并利用人脸识别技术从该视频数据中提取具有面部特征的帧。For example, the obtaining unit 301 may be specifically configured to read video data that needs to be recognized by the face, and extract a frame having facial features from the video data by using a face recognition technology.
其中,该面部特征可以包括眉毛、眼睛、鼻子和/或嘴巴等。若某一帧图像中具有这些特征,则可以认为是具有面部特征的帧。Wherein, the facial features may include eyebrows, eyes, nose and/or mouth, and the like. If these features are present in a certain frame image, it can be considered as a frame having facial features.
确定单元302,用于从各帧中确定出嘴部位置,得到嘴部图像。The determining unit 302 is configured to determine the mouth position from each frame to obtain a mouth image.
例如,该确定单元302可以包括定位子单元和确定子单元。For example, the determining unit 302 can include a positioning subunit and a determining subunit.
该定位子单元,用于对各帧中的面部五官进行定位,得到面部五官的坐标位置。例如,该定位子单元,可以用于对各帧进行面部检测,得到面部坐标矩形框,根据该面部坐标矩形框进行五官关键点定位,得到五官关键点,根据该五官关键点确定面部五官的坐标位置。The positioning sub-unit is configured to locate facial features in each frame to obtain a coordinate position of the facial features. For example, the positioning sub-unit can be used for performing face detection on each frame to obtain a rectangular frame of the face coordinate, and performing a five-point key point positioning according to the rectangular frame of the face coordinate to obtain a key point of the facial features, and determining the coordinates of the facial features according to the key points of the facial features. position.
其中,根据面部坐标矩形框进行五官关键点定位,得到五官关键点的方式可以有多种,具体可以根据实际应用的需求而定,比如,可以将人脸鼻子区域的关键点确定为两个鼻孔中心连线的中点处,即鼻唇中心点。可以通过确定两个嘴角点来嘴部区域的关键点。Among them, according to the rectangular coordinate frame of the face coordinate, the key points of the facial features can be obtained, and the manner of obtaining the key points of the facial features can be various, which can be determined according to the requirements of the actual application. For example, the key points of the nose area can be determined as two nostrils. The midpoint of the center line, the center point of the nose and lips. The key points of the mouth area can be determined by determining the two corner points of the mouth.
The determining subunit is configured to determine the mouth position according to the coordinate positions of the facial features, to obtain the mouth image.
For example, the determining subunit may be specifically configured to determine the mouth position according to the coordinate positions of the facial features, and then crop or extract the image corresponding to that mouth position from the frame, to obtain the mouth image.
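One plausible way to realize the "crop the mouth position" step is to build a padded bounding box around the two mouth-corner points and slice it out of the frame. This is a hedged sketch under assumed conventions (frame modeled as a list of pixel rows; the 0.4 margin is illustrative, not from the patent).

```python
def mouth_box(left_corner, right_corner, margin=0.4):
    """Axis-aligned box around the mouth corners, padded on every side
    by `margin` times the mouth width. Returns (left, top, right, bottom)."""
    (x1, y1), (x2, y2) = left_corner, right_corner
    width = x2 - x1
    pad = int(width * margin)
    top = min(y1, y2) - pad
    bottom = max(y1, y2) + pad
    return (x1 - pad, top, x2 + pad, bottom)

def crop(frame, box):
    """Slice a region out of a frame given as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]

# 200x200 dummy frame of zeros, mouth corners at hypothetical coordinates:
frame = [[0] * 200 for _ in range(200)]
box = mouth_box((80, 150), (120, 152))
print(box)  # (64, 134, 136, 168)
mouth_image = crop(frame, box)
```

In a real pipeline the box would additionally be clamped to the frame boundaries; that detail is omitted here for brevity.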
The analyzing unit 303 is configured to analyze the mouth image to obtain mouth features.
For example, the analyzing unit 303 may be specifically configured to extract texture features from the mouth image as the mouth features.
The texture features may include HOG (histogram of oriented gradients) features, LBP (local binary pattern) features, Gabor features, and the like.
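Of the texture features listed, LBP is simple enough to sketch in full: each pixel is re-coded as an 8-bit number whose bits record whether each of its 8 neighbors is at least as bright as the center, and a histogram of these codes over the mouth image serves as the feature vector. This is a minimal illustration of the basic operator, not the patent's feature extractor.

```python
def lbp_code(img, y, x):
    """Basic 8-neighbor LBP code of pixel (y, x) of a 2-D grayscale grid."""
    center = img[y][x]
    # neighbors enumerated clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit  # set bit if neighbor >= center
    return code

img = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
# neighbors >= 6 are 9, 7, 8 -> bits 1, 3, 4 -> 2 + 8 + 16 = 26
print(lbp_code(img, 1, 1))  # 26
```

Collecting `lbp_code` over every interior pixel of the mouth image and histogramming the 256 possible codes yields the kind of texture descriptor the classifier in the next step would consume.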
The marking unit 304 is configured to label the mouth state according to the mouth features by using a preset rule.
The preset rule may be set according to the needs of the actual application. For example, the marking unit may include a classification subunit and a marking subunit.
The classification subunit is configured to classify the mouth features by using a regressor or a classifier.
For example, the classification subunit may be specifically configured to classify the mouth features by using an SVM (support vector machine), or by using another regressor or classifier such as a linear regressor or a random forest.
The marking subunit is configured to label the mouth state according to the classification result, for example, as follows:
if it is determined from the classification result that the mouth state is an open state, an open-mouth flag is set for the frame;
if it is determined from the classification result that the mouth state is a closed state, a closed-mouth flag is set for the frame.
It should be noted that if the classification result cannot determine whether the mouth state is open or closed, the mouth state may be regarded as ambiguous; in that case no flag is set for the frame, i.e., neither the open-mouth flag nor the closed-mouth flag is set.
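The three-way flagging rule above can be sketched as follows, assuming the regressor or classifier yields a real-valued score (higher meaning more "open"). Scores inside an ambiguity band yield no flag at all, matching the ambiguous ("fuzzy") state described above. The flag names and thresholds are hypothetical, not taken from the patent.

```python
OPEN, CLOSED = "open_flag", "closed_flag"

def mark_frame(score, open_thresh=0.6, closed_thresh=0.4):
    """Return the flag to set for a frame, or None for the ambiguous state."""
    if score >= open_thresh:
        return OPEN      # confident open mouth: set the open-mouth flag
    if score <= closed_thresh:
        return CLOSED    # confident closed mouth: set the closed-mouth flag
    return None          # ambiguous: set neither flag

print([mark_frame(s) for s in (0.9, 0.5, 0.1)])
# ['open_flag', None, 'closed_flag']
```

With a hard classifier (e.g., an SVM decision without scores), the ambiguous case could instead be triggered by a rejection option or a low decision-function magnitude; the patent leaves this choice open.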
(5) Identifying unit 305;
The identifying unit 305 is configured to determine, based on the flags, the mouth motion of the corresponding face in the video data. For example, this may be done as follows:
receive a mouth motion analysis request, the request indicating a target face on which mouth motion analysis needs to be performed;
extract the frames corresponding to the target face from the video data, to obtain a target frame set;
determine whether the target frame set contains both frames carrying the open-mouth flag and frames carrying the closed-mouth flag;
if so, determine that the target face exhibits a mouth-opening motion;
if not, determine that the target face does not exhibit a mouth-opening motion.
For example, suppose the target frame set includes four frames: frame 1, frame 2, frame 3, and frame 4, where frame 1 and frame 2 carry the open-mouth flag, frame 3 carries no flag, and frame 4 carries the closed-mouth flag. The target frame set then contains both the open-mouth flag and the closed-mouth flag, so the target face is determined to exhibit a mouth-opening motion. Conversely, if none of frames 1 to 4 carries a flag, or if only the open-mouth flag (or only the closed-mouth flag) is present, the target frame set does not contain both flags, and the target face is determined not to exhibit a mouth-opening motion.
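The identifying unit's decision reduces to a set test over the flags of the target frame set: a mouth-opening motion exists exactly when both an open-mouth flag and a closed-mouth flag occur somewhere in the set. A self-contained sketch, with hypothetical string flag values and `None` for unflagged frames:

```python
def has_mouth_motion(frame_flags):
    """True iff the target frame set carries both flag kinds.

    `frame_flags`: iterable of 'open_flag', 'closed_flag', or None
    (one entry per frame of the target frame set).
    """
    flags = set(frame_flags) - {None}   # ignore unflagged (ambiguous) frames
    return "open_flag" in flags and "closed_flag" in flags

# The four-frame example from the text: frames 1-2 open, frame 3 unflagged,
# frame 4 closed -> a mouth-opening motion is detected.
print(has_mouth_motion(["open_flag", "open_flag", None, "closed_flag"]))  # True
# Only open-mouth flags present -> no motion.
print(has_mouth_motion(["open_flag", "open_flag", None, None]))           # False
```

Note that this rule deliberately ignores frame order and ambiguous frames, which is what makes the scheme robust to occasional misdetections in shaky video.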
The image processing apparatus may be integrated in a device such as a terminal or a server; the terminal may be, for example, a mobile phone, a tablet computer, a notebook computer, or a PC.
In a specific implementation, the foregoing units may be implemented as separate entities, or combined in any manner and implemented as one or several entities. For the specific implementation of each unit, refer to the foregoing method embodiments; details are not repeated here.
As can be seen from the above, after acquiring video data, the image processing apparatus of this embodiment extracts frames containing facial features from the video data; the determining unit 302 then determines the mouth position in the extracted frames to obtain mouth images; the analyzing unit 303 analyzes the mouth images to obtain mouth features; and the marking unit 304 labels the mouth state according to the mouth features by using a preset rule, providing the basis on which the identifying unit 305 judges whether the mouth is moving, thereby recognizing the mouth motion. Because this scheme depends only weakly on the precision of facial keypoint localization, it is more stable than existing schemes: even if the face shakes in the video, the recognition result is not greatly affected. In short, the scheme can greatly improve recognition accuracy and the recognition effect.
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes a processor 401, a non-volatile computer-readable memory 402, a display unit 403, and a network communication interface 404. These components communicate over a bus 405.
In this embodiment, the memory 402 stores a plurality of program modules, including an operating system 406, a network communication module 407, and an application program 408.
The processor 401 can read the modules (not shown in the figure) included in the application program in the memory 402 to execute the various functional applications of image processing and perform data processing. There may be one or more processors 401, and each may be a CPU, a processing unit/module, an ASIC, a logic module, a programmable gate array, or the like.
The operating system 406 may be, for example, a Windows, Android, or Apple iPhone OS operating system.
The application program 408 may include an image processing module 409. The image processing module 409 may include a set of computer-executable instructions 409-1 formed by the obtaining unit 301, the determining unit 302, the analyzing unit 303, the marking unit 304, and the identifying unit 305 of FIG. 3, together with the corresponding metadata and heuristic algorithms 409-2. These computer-executable instruction sets can be executed by the processor 401 to perform the method shown in FIG. 1 or FIG. 2a, or the functions of the image processing apparatus shown in FIG. 3.
In this embodiment, the network communication interface 404 cooperates with the network communication module 407 to send and receive the various network signals of the image processing apparatus.
The display unit 403 has a display panel for the input and display of related information.
If the image processing apparatus has no communication requirement, the network communication interface 404 and the network communication module 407 may be omitted.
A person of ordinary skill in the art can understand that all or part of the steps of the methods in the foregoing embodiments may be performed by related hardware instructed by a program, and the program may be stored in a computer-readable storage medium. The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The image processing method and apparatus provided in the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific implementation and the scope of application. In conclusion, the contents of this specification should not be construed as limiting the present application.

Claims (15)

  1. An image processing method, comprising:
    obtaining video data;
    extracting frames having facial features from the video data;
    determining a mouth position in each of the frames to obtain a mouth image;
    analyzing the mouth image to obtain mouth features;
    labeling a mouth state according to the mouth features by using a preset rule; and
    identifying a mouth motion of the corresponding face in the video data based on the labeled mouth state.
  2. The method according to claim 1, wherein the determining a mouth position in each of the frames to obtain a mouth image comprises:
    locating facial features in each of the frames to obtain coordinate positions of the facial features; and
    determining the mouth position according to the coordinate positions of the facial features to obtain the mouth image.
  3. The method according to claim 2, wherein the locating facial features in each of the frames to obtain coordinate positions of the facial features comprises:
    performing face detection on each of the frames to obtain a rectangular face bounding box;
    performing facial keypoint localization according to the rectangular face bounding box to obtain facial keypoints; and
    determining the coordinate positions of the facial features according to the facial keypoints.
  4. The method according to any one of claims 1 to 3, wherein the analyzing the mouth image to obtain mouth features comprises:
    extracting texture features from the mouth image as the mouth features.
  5. The method according to any one of claims 1 to 3, wherein the labeling a mouth state according to the mouth features by using a preset rule comprises:
    classifying the mouth features by using a regressor or a classifier; and
    labeling the mouth state according to a classification result.
  6. The method according to claim 5, wherein the labeling the mouth state according to a classification result comprises:
    if it is determined according to the classification result that the mouth state is an open state, setting an open-mouth flag for the frame; and
    if it is determined according to the classification result that the mouth state is a closed state, setting a closed-mouth flag for the frame.
  7. The method according to claim 6, wherein the identifying a mouth motion of the corresponding face in the video data based on the labeled mouth state comprises:
    receiving a mouth motion analysis request, the mouth motion analysis request indicating a target face on which mouth motion analysis needs to be performed;
    extracting frames corresponding to the target face from the frames to obtain a target frame set;
    determining whether the target frame set contains both a frame with the open-mouth flag set and a frame with the closed-mouth flag set;
    if so, determining that the target face exhibits a mouth-opening motion; and
    if not, determining that the target face does not exhibit a mouth-opening motion.
  8. An image processing apparatus, comprising:
    an obtaining unit, configured to obtain video data and extract frames having facial features from the video data;
    a determining unit, configured to determine a mouth position in each of the frames to obtain a mouth image;
    an analyzing unit, configured to analyze the mouth image to obtain mouth features;
    a marking unit, configured to label a mouth state according to the mouth features by using a preset rule; and
    an identifying unit, configured to identify a mouth motion of the corresponding face in the video data based on the labeled mouth state.
  9. The apparatus according to claim 8, wherein the determining unit comprises a positioning subunit and a determining subunit;
    the positioning subunit is configured to locate facial features in each of the frames to obtain coordinate positions of the facial features; and
    the determining subunit is configured to determine the mouth position according to the coordinate positions of the facial features to obtain the mouth image.
  10. The apparatus according to claim 9, wherein
    the positioning subunit is further configured to perform face detection on each of the frames to obtain a rectangular face bounding box, perform facial keypoint localization according to the rectangular face bounding box to obtain facial keypoints, and determine the coordinate positions of the facial features according to the facial keypoints.
  11. The apparatus according to any one of claims 8 to 10, wherein
    the analyzing unit is configured to extract texture features from the mouth image as the mouth features.
  12. The apparatus according to any one of claims 8 to 10, wherein the marking unit comprises a classification subunit and a marking subunit;
    the classification subunit is configured to classify the mouth features by using a regressor or a classifier; and
    the marking subunit is configured to label the mouth state according to a classification result.
  13. The apparatus according to claim 12, wherein
    the marking subunit is further configured to: if it is determined according to the classification result that the mouth state is an open state, set an open-mouth flag for the frame; and if it is determined according to the classification result that the mouth state is a closed state, set a closed-mouth flag for the frame.
  14. The apparatus according to claim 13, wherein the identifying unit is further configured to:
    receive a mouth motion analysis request, the mouth motion analysis request indicating a target face on which mouth motion analysis needs to be performed;
    extract frames corresponding to the target face from the frames to obtain a target frame set;
    determine whether the target frame set contains both a frame with the open-mouth flag set and a frame with the closed-mouth flag set;
    if so, determine that the target face exhibits a mouth-opening motion; and
    if not, determine that the target face does not exhibit a mouth-opening motion.
  15. A computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to perform the method according to any one of claims 1 to 7.
PCT/CN2016/106752 2015-11-25 2016-11-22 Image processing method and apparatus WO2017088727A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/680,976 US10360441B2 (en) 2015-11-25 2017-08-18 Image processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510827420.1 2015-11-25
CN201510827420.1A CN106778450B (en) 2015-11-25 2015-11-25 Face recognition method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/079163 Continuation WO2017107345A1 (en) 2015-11-25 2016-04-13 Image processing method and apparatus

Related Child Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2016/079163 Continuation WO2017107345A1 (en) 2015-11-25 2016-04-13 Image processing method and apparatus
US15/680,976 Continuation US10360441B2 (en) 2015-11-25 2017-08-18 Image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017088727A1 true WO2017088727A1 (en) 2017-06-01

Family

ID=58763013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106752 WO2017088727A1 (en) 2015-11-25 2016-11-22 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106778450B (en)
WO (1) WO2017088727A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330370B (en) * 2017-06-02 2020-06-19 广州视源电子科技股份有限公司 Forehead wrinkle action detection method and device and living body identification method and system
CN107368777A (en) * 2017-06-02 2017-11-21 广州视源电子科技股份有限公司 A kind of smile motion detection method and device and vivo identification method and system
CN107330914B (en) * 2017-06-02 2021-02-02 广州视源电子科技股份有限公司 Human face part motion detection method and device and living body identification method and system
CN107358155A (en) * 2017-06-02 2017-11-17 广州视源电子科技股份有限公司 A kind of funny face motion detection method and device and vivo identification method and system
CN107609474B (en) * 2017-08-07 2020-05-01 深圳市科迈爱康科技有限公司 Limb action recognition method and device, robot and storage medium
CN107992813A (en) * 2017-11-27 2018-05-04 北京搜狗科技发展有限公司 A kind of lip condition detection method and device
CN112826486A (en) * 2019-11-25 2021-05-25 虹软科技股份有限公司 Heart rate estimation method and device and electronic equipment applying same
CN111666820B (en) * 2020-05-11 2023-06-20 北京中广上洋科技股份有限公司 Speech state recognition method and device, storage medium and terminal
CN114299596B (en) * 2022-03-09 2022-06-07 深圳联和智慧科技有限公司 Smart city face recognition matching method and system and cloud platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1421816A (en) * 2001-11-23 2003-06-04 纬创资通股份有限公司 Wireless recognition apparatus for fingerprint and method thereof
CN1439997A (en) * 2002-02-22 2003-09-03 杭州中正生物认证技术有限公司 Fingerprint identifying method and system
CN104637246A (en) * 2015-02-02 2015-05-20 合肥工业大学 Driver multi-behavior early warning system and danger evaluation method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877056A (en) * 2009-12-21 2010-11-03 北京中星微电子有限公司 Facial expression recognition method and system, and training method and system of expression classifier
CN102097003B (en) * 2010-12-31 2014-03-19 北京星河易达科技有限公司 Intelligent traffic safety system and terminal
US9159321B2 (en) * 2012-02-27 2015-10-13 Hong Kong Baptist University Lip-password based speaker verification system
CN104951730B (en) * 2014-03-26 2018-08-31 联想(北京)有限公司 A kind of lip moves detection method, device and electronic equipment
CN104134058B (en) * 2014-07-21 2017-07-11 成都万维图新信息技术有限公司 A kind of face image processing process


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451564A (en) * 2017-07-31 2017-12-08 上海爱优威软件开发有限公司 A kind of human face action control method and system
CN109034064A (en) * 2018-07-26 2018-12-18 长沙舍同智能科技有限责任公司 Near-infrared face identification method, device and realization device
CN109034064B (en) * 2018-07-26 2021-01-08 长沙舍同智能科技有限责任公司 Near-infrared face recognition method, device and implementation device
CN109815806A (en) * 2018-12-19 2019-05-28 平安科技(深圳)有限公司 Face identification method and device, computer equipment, computer storage medium
CN111382624A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Action recognition method, device, equipment and readable storage medium
CN111382624B (en) * 2018-12-28 2023-08-11 杭州海康威视数字技术股份有限公司 Action recognition method, device, equipment and readable storage medium
CN110544200A (en) * 2019-08-30 2019-12-06 北京宠拍科技有限公司 method for realizing mouth interchange between human and cat in video
CN110544200B (en) * 2019-08-30 2024-05-24 北京神州数码云科信息技术有限公司 Method for realizing mouth exchange between person and cat in video
CN111611850A (en) * 2020-04-09 2020-09-01 吴子华 Seat use state analysis processing method, system and storage medium

Also Published As

Publication number Publication date
CN106778450B (en) 2020-04-24
CN106778450A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
WO2017088727A1 (en) Image processing method and apparatus
US10956719B2 (en) Depth image based face anti-spoofing
WO2019232866A1 (en) Human eye model training method, human eye recognition method, apparatus, device and medium
Carcagnì et al. Facial expression recognition and histograms of oriented gradients: a comprehensive study
WO2019232862A1 (en) Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
US8913798B2 (en) System for recognizing disguised face using gabor feature and SVM classifier and method thereof
US9405962B2 (en) Method for on-the-fly learning of facial artifacts for facial emotion recognition
US10706267B2 (en) Compact models for object recognition
Samangouei et al. Attribute-based continuous user authentication on mobile devices
US10733279B2 (en) Multiple-tiered facial recognition
US9575566B2 (en) Technologies for robust two-dimensional gesture recognition
GB2500321A (en) Dealing with occluding features in face detection methods
Smith-Creasey et al. Continuous face authentication scheme for mobile devices with tracking and liveness detection
EP2370932B1 (en) Method, apparatus and computer program product for providing face pose estimation
Findling et al. Towards face unlock: on the difficulty of reliably detecting faces on mobile phones
US10360441B2 (en) Image processing method and apparatus
Dave et al. Face recognition in mobile phones
US10296782B2 (en) Processing device and method for face detection
Kawulok Energy-based blob analysis for improving precision of skin segmentation
Han et al. Efficient eye-blinking detection on smartphones: A hybrid approach based on deep learning
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
Findling et al. Towards pan shot face unlock: Using biometric face information from different perspectives to unlock mobile devices
CN110363187B (en) Face recognition method, face recognition device, machine readable medium and equipment
Karappa et al. Detection of sign-language content in video through polar motion profiles
US11074676B2 (en) Correction of misaligned eyes in images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16867950

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/11/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16867950

Country of ref document: EP

Kind code of ref document: A1