WO2017107345A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2017107345A1
Authority
WO
WIPO (PCT)
Prior art keywords
lip
map
region
feature
face
Prior art date
Application number
PCT/CN2016/079163
Other languages
French (fr)
Chinese (zh)
Inventor
倪辉
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2017107345A1 publication Critical patent/WO2017107345A1/en
Priority to US15/680,976 priority Critical patent/US10360441B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • The present invention relates to the field of Internet technologies, in particular to video image processing, and more particularly to an image processing method and apparatus.
  • Some Internet scenarios involve the process of lip movement recognition.
  • In an identity authentication scenario, for example, in order to prevent an illegitimate user from passing off a static picture as live video, it is usually necessary to record a video of the user speaking and then perform lip movement recognition on that video to confirm the identity of a legitimate user.
  • One prior-art solution for performing lip movement recognition on an image is to calculate the area of the lip region in each frame of the video and to confirm whether lip movement occurs from the difference in lip-region area between frames.
  • Another solution is to extract the lip opening and closing state in each frame of the video and to detect whether lip movement occurs according to the opening and closing amplitude.
  • Both prior-art solutions rely on the amplitude of the lip change: if the lip changes only slightly, neither the change in lip-region area nor the lip opening and closing amplitude is obvious enough, which affects the accuracy of the lip movement recognition result and the practicability of the prior-art solutions.
  • Embodiments of the present invention provide an image processing method and apparatus that recognize lip movement from the lip variation of the images over a time span, which avoids the influence of the lip variation amplitude, improves the accuracy of the recognition result, and improves the practicability of image processing.
  • A first aspect of the embodiments of the present invention provides an image processing method, which may include:
  • detecting a face region in each frame of image included in a to-be-processed video, and locating a lip region from the face region;
  • extracting feature column pixels of the lip region from each frame of image to construct a lip change map;
  • performing lip movement recognition according to a texture feature of the lip change map to obtain a recognition result.
  • Optionally, the detecting a face region in each frame of image included in the to-be-processed video and locating the lip region from the face region includes:
  • parsing the to-be-processed video to obtain at least one frame of image;
  • detecting the face region in each frame of image by using a face detection algorithm;
  • locating the lip region from the face region by using a face registration algorithm.
  • Optionally, the extracting feature column pixels of the lip region from each frame of image to construct the lip change map includes:
  • intercepting a lip region map from each frame of image;
  • extracting a feature column pixel map from the lip region map;
  • splicing the extracted feature column pixel maps in the chronological order of the frames to obtain the lip change map.
  • Optionally, the extracting a feature column pixel map from the lip region map includes:
  • determining a preset position in the lip region map;
  • drawing a vertical axis through the preset position;
  • extracting, as the feature column pixel map, the column of pixels of the lip region map that lies on the vertical axis.
  • Optionally, the preset position is the position of the central pixel of the lip region map.
  • Optionally, the performing lip movement recognition according to the texture feature of the lip change map to obtain the recognition result includes:
  • calculating the texture feature of the lip change map, the texture feature including an LBP feature and/or an HOG feature;
  • classifying the texture feature by using a preset classification algorithm to obtain a lip movement recognition result, the recognition result being either that lip movement occurs or that no lip movement occurs.
  • a second aspect of the embodiments of the present invention provides an image processing apparatus, which may include:
  • a positioning unit configured to detect a face area in each frame image included in the to-be-processed video, and locate a lip area from the face area;
  • a building unit, configured to extract feature column pixels of the lip region from each frame of image to construct a lip change map;
  • a lip motion recognition unit configured to perform lip motion recognition according to the texture feature of the lip change map to obtain a recognition result.
  • the positioning unit comprises:
  • a parsing unit configured to parse the video to be processed to obtain at least one frame of image
  • a face detecting unit configured to detect a face region in each frame image by using a face detection algorithm
  • a face registration unit is configured to locate a lip region from the face region using a face registration algorithm.
  • the building unit comprises:
  • An intercepting unit for intercepting a lip region map in each frame image
  • An extracting unit configured to extract a feature column pixmap from the lip region map
  • the splicing processing unit is configured to perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
  • the extracting unit comprises:
  • a position determining unit configured to determine a preset position in the lip area map
  • a vertical axis determining unit for drawing a vertical axis along the preset position
  • the feature column pixel extracting unit is configured to extract a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map as a feature column pixel map.
  • the preset position is a central pixel point position of the lip region map.
  • the lip movement recognition unit comprises:
  • a calculating unit configured to calculate a texture feature of the lip variation map, the texture feature comprising an LBP (Local Binary Patterns) feature and/or a HOG (Histogram of Oriented Gradient) feature;
  • a classification unit, configured to classify the texture feature by using a preset classification algorithm to obtain a lip movement recognition result, the recognition result being either that lip movement occurs or that no lip movement occurs.
  • In the embodiments of the present invention, face region detection and lip region localization are performed on each frame of image included in the video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip change map. Because the lip change map is drawn from every frame, it reflects as a whole the time span covered by the frames; performing lip movement recognition on the texture feature of the lip change map therefore recognizes lip movement from the lip variation over that time span, which avoids the influence of the lip variation amplitude and yields higher recognition efficiency and a more accurate recognition result.
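The summary above can be sketched end to end as follows. This is a hedged NumPy sketch on synthetic data, not the patented implementation: face detection and registration are assumed to have already produced a cropped lip region map per frame, and the texture feature here is a plain 8-neighbour LBP histogram rather than the unspecified LBP/HOG variant of the embodiments.

```python
import numpy as np

def lip_change_map(lip_regions):
    # S102: take the centre feature column of each (pre-cropped) lip region
    # map and splice the columns in chronological order into an H x N image.
    return np.stack([r[:, r.shape[1] // 2] for r in lip_regions], axis=1)

def lbp_feature(img):
    # S103/s31: minimal 8-neighbour LBP texture feature, returned as a
    # 256-bin histogram of the per-pixel LBP codes (border pixels skipped).
    img = img.astype(int)
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c)
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        code += (n >= c).astype(int) << bit
    return np.bincount(code.ravel(), minlength=256)

# Synthetic "video": ten identical 20x12 lip region maps, i.e. a lip that
# does not move; the resulting change map has no horizontal variation.
frames = [np.tile(np.arange(20)[:, None], (1, 12)) for _ in range(10)]
cmap = lip_change_map(frames)       # 20 x 10 lip change map
feature = lbp_feature(cmap)         # texture feature fed to the classifier
```

The texture feature would then be passed to the preset classification algorithm of step s32.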
  • FIG. 1 is a flowchart of an image processing method according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of an Internet device according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
  • In the embodiments of the present invention, face region detection and lip region localization are performed on each frame of image included in the video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip change map. Because the lip change map is drawn from every frame, it reflects as a whole the time span covered by the frames; performing lip movement recognition on the texture feature of the lip change map therefore recognizes lip movement from the lip variation over that time span, which avoids the influence of the lip variation amplitude and yields high recognition efficiency and an accurate recognition result.
  • The image processing method of the embodiments of the present invention can be applied to many Internet scenarios. For example, in a voice input scenario, the voice acquisition process can be controlled through lip movement recognition on a video of the user speaking; in an identity authentication scenario, the identity of a legitimate user can be confirmed through lip movement recognition on a video of the user speaking, preventing an illegitimate user from passing off a static picture as live video.
  • the image processing apparatus of the embodiment of the present invention can be applied to various devices in an Internet scenario, for example, can be applied to a terminal or applied to a server.
  • an embodiment of the present invention provides an image processing method. Referring to FIG. 1, the method may perform the following steps S101-S103.
  • S101 Detect a face region in each frame image included in the to-be-processed video, and locate a lip region from the face region.
  • the video to be processed may be a video recorded in real time.
  • the terminal may record the user's speaking video as a to-be-processed video in real time.
  • the to-be-processed video may also be the received real-time video.
  • the server may receive the user-speaking video recorded by the terminal in real time as the to-be-processed video.
  • Face detection technology refers to scanning a given image with a certain strategy to determine whether it contains a human face and, if so, determining the position, size, and posture of the face in the image.
  • Face registration technology refers to using a certain algorithm to precisely delineate the facial contours, such as those of the nose and lips, based on the position, size, and posture of the detected face.
  • Step S101 of the present embodiment involves both face detection technology and face registration technology; specifically, performing step S101 comprises the following steps s11-s13:
  • s11: Parse the to-be-processed video to obtain at least one frame of image.
  • A video is composed of frames of images arranged in chronological order, so the to-be-processed video can be de-framed to obtain the individual frame images.
  • s12: Detect the face region in each frame of image by using a face detection algorithm.
  • The face detection algorithm may include, but is not limited to, a PCA (Principal Component Analysis) algorithm, an elastic-model-based method, a hidden Markov model, and the like. The face detection algorithm determines the face region, which gives the position, size, and posture of the face in each frame of image.
  • s13: Locate the lip region from the face region by using a face registration algorithm.
  • The face registration algorithm may include, but is not limited to, a Lasso regression registration algorithm, a wavelet domain algorithm, and the like. Based on the face position, size, and posture given by the face region in each frame of image, the face registration algorithm can accurately locate the lip region.
  • S102: Extract feature column pixels of the lip region from each frame of image to construct a lip change map.
  • The lip change map is required to reflect the lip change over the whole time span. Since a video is composed of frames of images in chronological order, and it is those frames across the time span that dynamically reflect the change of the lip, this step constructs the lip change map from the changing characteristics of the lip region in each frame of image.
  • Specifically, performing step S102 comprises the following steps s21-s23:
  • s21: Intercept a lip region map from each frame of image. Since the lip region has already been accurately located in each frame of image, the lip region map can be intercepted directly: a first lip region map is intercepted from the first frame of image, a second lip region map from the second frame of image, and so on.
  • s22: Extract a feature column pixel map from the lip region map. A feature column pixel is a column of pixels in a frame of image that reflects the characteristics of the lip change, and the image formed by that feature column is called a feature column pixel map.
  • Specifically, performing step s22 comprises the following steps ss221-ss223:
  • ss221: Determine a preset position in the lip region map. The preset position may be the position of any pixel in the lip region map; because the centre of the lip changes most obviously when the lip moves, in the embodiments of the present invention the preset position is preferably the position of the central pixel of the lip region map.
  • ss222: Draw a vertical axis through the preset position.
  • ss223: Extract, as the feature column pixel map, the column of pixels of the lip region map that lies on the vertical axis.
  • That is, the feature column pixel map is extracted longitudinally along the preset position; it can be understood that if the preset position is the position of the central pixel of the lip region map, the extracted feature column pixel map is the column of pixels at the centre of the lip region.
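Steps ss221-ss223 can be sketched as follows, assuming the lip region map is a NumPy array and the preset position is the central pixel column, as in the preferred embodiment:

```python
import numpy as np

def feature_column(lip_region_map):
    # ss221: preset position = horizontal centre of the lip region map.
    # ss222-ss223: the vertical axis through that position selects one
    # full column of pixels as the feature column pixel map.
    h, w = lip_region_map.shape[:2]
    return lip_region_map[:, w // 2]

lip = np.arange(12).reshape(3, 4)   # toy 3x4 "lip region map"
col = feature_column(lip)           # the column at index 2
```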
  • s23: Splice the extracted feature column pixel maps in the chronological order of the frames to obtain the lip change map.
  • In a specific implementation, the feature column pixel map is extracted at the same preset position in each frame of image, so the lip change map obtained by splicing the feature column pixel maps extracted from the frames reflects the change of the lip at that preset position.
  • Taking the central pixel position as an example: the centre-column pixel map of the lip region extracted from the first frame of image may be called the first centre-column pixel map, the one extracted from the second frame of image the second centre-column pixel map, and so on.
  • The splicing process of step s23 may then be: horizontally splice the second centre-column pixel map onto the first centre-column pixel map, then the third onto the second, and so on, until the centre-column pixel maps of all frames together form a lip change map that reflects the change at the centre of the lip.
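Step s23 can be sketched as below, assuming each frame's lip region map is already cropped to a NumPy array; the centre column of each frame is spliced left to right in chronological order:

```python
import numpy as np

def build_lip_change_map(lip_region_maps):
    # Extract the centre feature column of every frame and splice the
    # columns horizontally, oldest frame on the left, to form an H x N
    # lip change map (N = number of frames).
    cols = [m[:, m.shape[1] // 2] for m in lip_region_maps]
    return np.stack(cols, axis=1)

# Three toy frames whose centre columns are constant 0, 1 and 2.
frames = [np.full((4, 5), t) for t in range(3)]
change_map = build_lip_change_map(frames)
```

Column j of the resulting map is the centre column of frame j, so horizontal variation in the map corresponds to lip change over time.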
  • S103 Perform lip movement recognition according to the texture feature of the lip change map to obtain a recognition result.
  • Lip movement recognition is the process of confirming whether or not lip movement occurs.
  • the method specifically performs the following steps s31-s32 when performing step S103:
  • s31: Calculate the texture feature of the lip change map; the texture feature includes, but is not limited to, an LBP feature and/or an HOG feature.
  • The LBP feature effectively describes and measures the local texture information of an image, and has significant advantages such as rotation invariance and gray-scale invariance.
  • the LBP algorithm can be used to calculate the LBP feature of the lip change map.
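A minimal LBP computation can be sketched as below. This is a plain 8-neighbour LBP over a NumPy array; the document does not specify which LBP variant is used, so this is an illustrative assumption:

```python
import numpy as np

def lbp_map(img):
    # For each interior pixel, compare its 8 neighbours with the centre:
    # bit = 1 where neighbour >= centre, giving an 8-bit code per pixel.
    img = img.astype(int)
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c)
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        code += (n >= c).astype(int) << bit
    return code

def lbp_texture_feature(img, bins=256):
    # Texture feature: histogram of LBP codes over the lip change map.
    return np.bincount(lbp_map(img).ravel(), minlength=bins)

flat = np.full((5, 5), 7)   # a flat patch: every neighbour >= centre
codes = lbp_map(flat)       # every interior pixel gets code 255
```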
  • the HOG feature is a feature descriptor for performing object detection in image processing; in the process of performing step s31, the HOG algorithm may be used to calculate the HOG feature of the lip change map.
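A simplified HOG sketch is shown below: gradient orientation histograms per cell, without the overlapping-block normalization of the full HOG descriptor. The cell size and bin count are illustrative assumptions, not parameters stated in the document:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    # Central-difference gradients, unsigned orientation in [0, 180),
    # and one magnitude-weighted orientation histogram per cell.
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i+cell, j:j+cell].ravel()
            m = mag[i:i+cell, j:j+cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

edge = np.zeros((16, 16))
edge[:, 8:] = 255           # a vertical edge: all gradient energy at 0 deg
f = hog_features(edge)      # 4 cells x 9 bins = 36-dimensional feature
```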
  • the texture feature may also include other features such as SIFT features, so the method may also use other algorithms to calculate the texture features of the lip variation map during the execution of step s31.
  • s32: Classify the texture feature by using a preset classification algorithm to obtain a lip movement recognition result; the recognition result is either that lip movement occurs or that no lip movement occurs.
  • the preset classification algorithm may include, but is not limited to, a Bayesian algorithm, a logistic regression algorithm, and an SVM (Support Vector Machine) algorithm.
  • In a specific implementation, the texture feature is fed into the SVM classifier as an input parameter, and the SVM classifier outputs the classification result (that is, the lip movement recognition result).
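The final classification step can be sketched with an assumed pre-trained linear decision function. In practice the weights would come from offline SVM training on labelled lip change maps; `w` and `b` below are illustrative values, not trained ones:

```python
import numpy as np

def classify_lip_motion(texture_feature, w, b):
    # Linear SVM-style decision: f(x) = w.x + b; a positive score is
    # classified as "lip movement", otherwise "no lip movement".
    score = float(np.dot(w, texture_feature) + b)
    return "lip movement" if score > 0 else "no lip movement"

w = np.array([0.5, -0.25])   # illustrative trained weights
b = -0.1                     # illustrative bias
moving = classify_lip_motion(np.array([1.0, 0.2]), w, b)
static = classify_lip_motion(np.array([0.1, 0.8]), w, b)
```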
  • With the image processing method of this embodiment, face region detection and lip region localization are performed on each frame of image included in the video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip change map. Because the lip change map is drawn from every frame, it reflects as a whole the time span covered by the frames; performing lip movement recognition on the texture feature of the lip change map therefore recognizes lip movement from the lip variation over that time span, which avoids the influence of the lip variation amplitude and yields high recognition efficiency and an accurate recognition result.
  • the embodiment of the present invention further provides an Internet device, which may be a terminal or a server.
  • The internal structure of the Internet device may include, but is not limited to, a processor, a user interface, a network interface, and a memory.
  • the processor, the user interface, the network interface, and the memory in the Internet device can be connected by a bus or other means.
  • a bus connection is taken as an example.
  • the user interface is a medium for realizing interaction and information exchange between the user and the Internet device, and the specific embodiment thereof may include a display for output and a keyboard for input, etc.
  • the keyboard here can be either a physical keyboard or a touch screen virtual keyboard, or a keyboard combining physical and touch screen virtual.
  • The processor of the Internet device may be a CPU (Central Processing Unit).
  • The memory is the storage device of the Internet device and stores programs and data. It can be understood that the memory here may be a high-speed RAM memory or a non-volatile memory such as at least one magnetic disk memory; optionally, it may also be at least one storage device located remotely from the foregoing processor.
  • the memory provides a storage space that stores an operating system of the Internet device and also stores an image processing device.
  • the Internet device can execute the corresponding steps of the method flow shown in FIG. 1 by running the image processing device in the memory.
  • the image processing apparatus operates as follows:
  • the locating unit 101 is configured to detect a face area in each frame image included in the to-be-processed video, and locate a lip area from the face area.
  • The building unit 102 is configured to extract feature column pixels of the lip region from each frame of image to construct a lip change map.
  • the lip motion recognition unit 103 is configured to perform lip motion recognition according to the texture feature of the lip change map to obtain a recognition result.
  • the image processing apparatus runs the following unit in the process of running the positioning unit 101:
  • the parsing unit 1001 is configured to parse the video to be processed to obtain at least one frame of image.
  • The face detecting unit 1002 is configured to detect a face region in each frame of image by using a face detection algorithm.
  • the face registration unit 1003 is configured to locate a lip region from the face region by using a face registration algorithm.
  • the image processing apparatus runs the following units in the process of running the building unit 102:
  • the intercepting unit 2001 is configured to intercept a lip region map in each frame image.
  • the extracting unit 2002 is configured to extract a feature column pixmap from the lip region map.
  • the splicing processing unit 2003 is configured to perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
  • the image processing apparatus runs the following unit in the process of running the extracting unit 2002:
  • the position determining unit 2221 is configured to determine a preset position in the lip area map; preferably, the preset position is a central pixel point position of the lip area map.
  • the vertical axis determining unit 2222 is configured to draw a vertical axis along the preset position.
  • the feature column pixel extracting unit 2223 is configured to extract, as a feature column pixmap, a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map.
  • The image processing apparatus runs the following units in the process of running the lip motion recognition unit 103:
  • the calculating unit 3001 is configured to calculate a texture feature of the lip variation map, the texture feature including an LBP feature and/or an HOG feature.
  • The classification unit 3002 is configured to classify the texture feature by using a preset classification algorithm to obtain a lip movement recognition result, where the recognition result is either that lip movement occurs or that no lip movement occurs.
  • In the embodiment of the present invention, by running the image processing apparatus, face region detection and lip region localization are performed on each frame of image included in the video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip change map. Because the lip change map is drawn from every frame, it reflects as a whole the time span covered by the frames; performing lip movement recognition on the texture feature of the lip change map therefore recognizes lip movement from the lip variation over that time span, which avoids the influence of the lip variation amplitude and yields high recognition efficiency and an accurate recognition result.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an image processing method and apparatus. The method may comprise: detecting a face area in each frame of images contained in a video to be processed, and positioning a lip area from the face area (S101); extracting a feature column pixel of the lip area from each frame of images to construct a lip variation graph (S102); and performing lip movement recognition according to a texture feature of the lip variation graph to obtain a recognition result (S103). A lip movement is recognised according to a lip variation of an image over a time span, which can avoid the impact of amplitude of the lip variation, improve the accuracy of a recognition result, and improve the practicability of image processing.

Description

Image processing method and apparatus
This application claims priority to Chinese Patent Application No. 201510996643.0, entitled "Image processing method and apparatus" and filed on December 26, 2015, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of Internet technologies, in particular to video image processing, and more particularly to an image processing method and apparatus.
Background
Some Internet scenarios involve the process of lip movement recognition. In an identity authentication scenario, for example, in order to prevent an illegitimate user from passing off a static picture as live video, it is usually necessary to record a video of the user speaking and then perform lip movement recognition on that video to confirm the identity of a legitimate user. One prior-art solution for performing lip movement recognition on an image is to calculate the area of the lip region in each frame of the video and to confirm whether lip movement occurs from the difference in lip-region area between frames. Another solution is to extract the lip opening and closing state in each frame and to detect lip movement according to the opening and closing amplitude. Both prior-art solutions rely on the amplitude of the lip change: if the lip changes only slightly, neither the change in lip-region area nor the opening and closing amplitude is obvious enough, which affects the accuracy of the lip movement recognition result and the practicability of the prior-art solutions.
Summary of the invention
Embodiments of the present invention provide an image processing method and apparatus that recognize lip movement from the lip variation of the images over a time span, which avoids the influence of the lip variation amplitude, improves the accuracy of the recognition result, and improves the practicability of image processing.
A first aspect of the embodiments of the present invention provides an image processing method, which may include:
detecting a face region in each frame of image included in a to-be-processed video, and locating a lip region from the face region;
extracting feature column pixels of the lip region from each frame of image to construct a lip change map;
performing lip movement recognition according to a texture feature of the lip change map to obtain a recognition result.
Optionally, the detecting a face region in each frame of image included in the to-be-processed video and locating the lip region from the face region includes:
parsing the to-be-processed video to obtain at least one frame of image;
detecting the face region in each frame of image by using a face detection algorithm;
locating the lip region from the face region by using a face registration algorithm.
Optionally, the extracting feature column pixels of the lip region from each frame of image to construct the lip change map includes:
intercepting a lip region map from each frame of image;
extracting a feature column pixel map from the lip region map;
splicing the extracted feature column pixel maps in the chronological order of the frames to obtain the lip change map.
Optionally, the extracting a feature column pixel map from the lip region map includes:
determining a preset position in the lip region map;
drawing a vertical axis through the preset position;
extracting, as the feature column pixel map, the column of pixels of the lip region map that lies on the vertical axis.
Optionally, the preset position is the position of the central pixel of the lip region map.
Optionally, the performing lip movement recognition according to the texture feature of the lip change map to obtain the recognition result includes:
calculating the texture feature of the lip change map, the texture feature including an LBP feature and/or an HOG feature;
classifying the texture feature by using a preset classification algorithm to obtain a lip movement recognition result, the recognition result being either that lip movement occurs or that no lip movement occurs.
本发明实施例第二方面提供一种图像处理装置,可包括:A second aspect of the embodiments of the present invention provides an image processing apparatus, which may include:
定位单元,用于在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域;a positioning unit, configured to detect a face area in each frame image included in the to-be-processed video, and locate a lip area from the face area;
构建单元,用于从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图; a building unit, configured to extract a feature column pixel of the lip region from the image of each frame to construct a lip variation map;
唇动识别单元,用于根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果。a lip motion recognition unit configured to perform lip motion recognition according to the texture feature of the lip change map to obtain a recognition result.
优选地,所述定位单元包括:Preferably, the positioning unit comprises:
解析单元,用于对待处理视频进行解析获得至少一帧图像;a parsing unit, configured to parse the video to be processed to obtain at least one frame of image;
人脸检测单元,用于采用人脸检测算法在每一帧图像中检测人脸区域;a face detecting unit, configured to detect a face region in each frame image by using a face detection algorithm;
人脸配准单元,用于采用人脸配准算法从所述人脸区域中定位唇部区域。A face registration unit is configured to locate a lip region from the face region using a face registration algorithm.
优选地,所述构建单元包括:Preferably, the building unit comprises:
截取单元,用于在每一帧图像中截取唇部区域图;An intercepting unit for intercepting a lip region map in each frame image;
提取单元,用于从所述唇部区域图中提取特征列像素图;An extracting unit, configured to extract a feature column pixmap from the lip region map;
拼接处理单元,用于按照每一帧图像的时间顺序对所提取的特征列像素图进行拼接处理,获得唇部变化图。The splicing processing unit is configured to perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
优选地,所述提取单元包括:Preferably, the extracting unit comprises:
位置确定单元,用于在所述唇部区域图中确定预设位置;a position determining unit, configured to determine a preset position in the lip area map;
纵轴确定单元,用于沿所述预设位置绘制纵轴;a vertical axis determining unit for drawing a vertical axis along the preset position;
特征列像素提取单元,用于提取由所述唇部区域图中位于所述纵轴的所有像素点构成的一列像素图作为特征列像素图。The feature column pixel extracting unit is configured to extract a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map as a feature column pixel map.
优选地,所述预设位置为所述唇部区域图的中心像素点位置。Preferably, the preset position is a central pixel point position of the lip region map.
优选地,所述唇动识别单元包括:Preferably, the lip movement recognition unit comprises:
计算单元,用于计算所述唇部变化图的纹理特征,所述纹理特征包括LBP(Local Binary Patterns,局部二值模式)特征和/或HOG(Histogram of Oriented Gradient,方向梯度直方图)特征;a calculating unit, configured to calculate a texture feature of the lip variation map, the texture feature comprising an LBP (Local Binary Patterns) feature and/or a HOG (Histogram of Oriented Gradient) feature;
分类单元，用于采用预设分类算法对所述纹理特征进行分类，获得唇动识别结果，所述识别结果包括：发生唇动或未发生唇动。And a classification unit, configured to classify the texture features using a preset classification algorithm to obtain a lip movement recognition result, the recognition result including: lip movement occurred or no lip movement occurred.
实施本发明实施例,具有如下有益效果:Embodiments of the present invention have the following beneficial effects:
本发明实施例中，对视频所包含的每一帧图像进行人脸区域检测及唇部区域定位，并且从每一帧图像中提取唇部区域的特征列像素构建唇部变化图，由于唇部变化图来自于每一帧图像，这使得唇部变化图能够整体反映各图像组成的时间跨度；通过唇部变化图的纹理特征进行唇动识别获得识别结果，也就是依据时间跨度上的唇部变化识别唇动，能够避免唇部变化幅度的影响，识别效率较高且识别结果准确度较高。In the embodiments of the present invention, face region detection and lip region localization are performed on each frame of image included in a video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip variation map. Because the lip variation map is drawn from every frame of image, it reflects, as a whole, the time span made up by those images. Lip movement recognition is then performed on the texture features of the lip variation map to obtain a recognition result; that is, lip movement is recognized from lip changes over the time span, which avoids the influence of the amplitude of lip change, so the recognition efficiency is high and the recognition result is accurate.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from these drawings without creative effort.
图1为本发明实施例提供的一种图像处理方法的流程图;FIG. 1 is a flowchart of an image processing method according to an embodiment of the present invention;
图2为本发明实施例提供的一种互联网设备的结构示意图;2 is a schematic structural diagram of an Internet device according to an embodiment of the present invention;
图3为本发明实施例提供的一种图像处理装置的结构示意图。FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
具体实施方式 DETAILED DESCRIPTION
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
本发明实施例中，对视频所包含的每一帧图像进行人脸区域检测及唇部区域定位，并且从每一帧图像中提取唇部区域的特征列像素构建唇部变化图，由于唇部变化图来自于每一帧图像，这使得唇部变化图能够整体反映各图像组成的时间跨度；通过唇部变化图的纹理特征进行唇动识别获得识别结果，也就是依据时间跨度上的唇部变化识别唇动，能够避免唇部变化幅度的影响，识别效率较高且识别结果准确度较高。In the embodiments of the present invention, face region detection and lip region localization are performed on each frame of image included in a video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip variation map. Because the lip variation map is drawn from every frame of image, it reflects, as a whole, the time span made up by those images. Lip movement recognition is then performed on the texture features of the lip variation map to obtain a recognition result; that is, lip movement is recognized from lip changes over the time span, which avoids the influence of the amplitude of lip change, so the recognition efficiency is high and the recognition result is accurate.
本发明实施例的图像处理方法可以被应用于许多互联网场景中，例如：在语音输入场景中，可通过对用户说话视频进行唇动识别来控制语音的获取过程；再如：在身份认证场景中，可通过对用户说话视频进行唇动识别来确认合法用户身份，避免非法用户采用静态图片混淆视听；等等。同理，本发明实施例的图像处理装置可以被应用于互联网场景中的各个设备中，例如：可被应用于终端中，或者被应用于服务器中。 The image processing method of the embodiments of the present invention may be applied in many Internet scenarios. For example, in a voice input scenario, lip movement recognition may be performed on a video of the user speaking to control the voice acquisition process; as another example, in an identity authentication scenario, lip movement recognition may be performed on a video of the user speaking to confirm the identity of a legitimate user and prevent an illegitimate user from passing off a static picture as live video; and so on. Likewise, the image processing apparatus of the embodiments of the present invention may be applied in various devices in Internet scenarios, for example, in a terminal or in a server.
基于上述描述,本发明实施例提供了一种图像处理方法,请参见图1,该方法可执行以下步骤S101-S103。Based on the foregoing description, an embodiment of the present invention provides an image processing method. Referring to FIG. 1, the method may perform the following steps S101-S103.
S101,在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域。S101: Detect a face region in each frame image included in the to-be-processed video, and locate a lip region from the face region.
待处理视频可以是实时录制的视频，例如：用户向终端发起语音输入请求时，终端可实时录制用户说话视频作为待处理视频。待处理视频也可以是接收到的实时视频，例如：服务器对终端侧用户进行身份认证时，服务器可接收终端实时录制的用户说话视频作为待处理视频。人脸检测技术是指采用一定的策略扫描确定所给定的图像中是否含有人脸，在确定含有后能够确定人脸在图像中的位置、大小和姿态。人脸配准技术是指采用一定的算法依据人脸的位置、大小和姿态清晰分辨出人脸的眼、鼻、唇部等轮廓。本实施例的方法在执行步骤S101的过程中具体涉及人脸检测技术和人脸配准技术；具体地，该方法在执行步骤S101时执行如下步骤s11-s13：The to-be-processed video may be a video recorded in real time; for example, when a user initiates a voice input request to a terminal, the terminal may record, in real time, a video of the user speaking as the to-be-processed video. The to-be-processed video may also be received real-time video; for example, when a server authenticates a terminal-side user, the server may receive, as the to-be-processed video, a video of the user speaking recorded by the terminal in real time. Face detection refers to scanning a given image with a certain strategy to determine whether it contains a face and, if so, determining the position, size and posture of the face in the image. Face registration refers to using a certain algorithm to clearly distinguish contours such as the eyes, nose and lips of a face according to the position, size and posture of the face. The method of this embodiment involves face detection and face registration in performing step S101; specifically, the method performs the following steps s11-s13 when performing step S101:
s11，对待处理视频进行解析获得至少一帧图像。视频是由一帧一帧图像按照时间顺序构成的，因此，对待处理视频进行分帧处理即可得到一帧一帧图像。s11: Parse the to-be-processed video to obtain at least one frame of image. A video consists of frames of images arranged in chronological order; therefore, splitting the to-be-processed video into frames yields the individual frame images.
s12,采用人脸检测算法在每一帧图像中检测人脸区域。S12, using a face detection algorithm to detect a face region in each frame image.
人脸检测算法可包括但不限于:PCA(Principal Component Analysis,基于主成分分析)算法、基于弹性模型的方法、隐马尔可夫模型方法(Hidden Markov Model)等等。针对视频分帧处理获得的每一帧图像,采用人脸检测算法可确定出人脸区域,该人脸区域用于展示人脸在每一帧图像中的位置、大小及姿态。The face detection algorithm may include, but is not limited to, a PCA (Principal Component Analysis) algorithm, an elastic model based method, a Hidden Markov Model, and the like. For each frame image obtained by the video framing process, a face detection algorithm can be used to determine a face area, which is used to display the position, size and posture of the face in each frame image.
s13,采用人脸配准算法从所述人脸区域中定位唇部区域。S13, using a face registration algorithm to locate a lip region from the face region.
人脸配准算法可包括但不限于：Lasso整脸回归配准算法、小波域算法等等。针对每一帧图像中的人脸区域所展示的人脸位置、大小及姿态，采用人脸配准算法可精确定位唇部区域。The face registration algorithm may include, but is not limited to, a Lasso whole-face regression registration algorithm, a wavelet-domain algorithm, and the like. Based on the position, size and posture of the face shown by the face region in each frame of image, a face registration algorithm can accurately locate the lip region.
S102,从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图。S102. Extract a feature column of the lip region from the image of each frame to construct a lip variation map.
所述唇部变化图要求从时间跨度上整体反映唇部变化。由于视频是由一帧一帧图像按照时间顺序构成，并且视频在该各帧图像组成的时间跨度上能够动态反映唇部变化情况，因此，本步骤可以采用每一帧图像中的唇部区域的变化特征来构建唇部变化图。具体实现中，该方法在执行步骤S102时具体执行如下步骤s21-s23：The lip variation map is required to reflect lip changes as a whole over a time span. Since a video is composed of frames of images in chronological order, and dynamically reflects lip changes over the time span made up by those frames, this step can construct the lip variation map from the changing characteristics of the lip region in each frame of image. In a specific implementation, the method performs the following steps s21-s23 when performing step S102:
s21,在每一帧图像中截取唇部区域图。由于已从每一帧图像中精确定位唇部区域,本步骤s21中可从每一帧图像中直接截取唇部区域图,那么,第一帧图像中可截取到第一幅唇部区域图,第二帧图像中可截取到第二幅唇部区域图,以此类推。S21, the lip region map is intercepted in each frame image. Since the lip region has been accurately positioned from each frame image, the lip region map can be directly intercepted from each frame image in this step s21, and then the first lip region map can be intercepted in the first frame image. A second lip area map can be captured in the second frame image, and so on.
s22,从所述唇部区域图中提取特征列像素图。S22, extracting a feature column pixmap from the lip region map.
特征列像素点是指一帧图像中能够反映唇部变化特点的一列像素点,该特征列像素点形成的图像称为特征列像素图。具体实现中,该方法在执行步骤s22时具体执行如下步骤ss221-ss223:The feature column pixel refers to a column of pixels in a frame image that can reflect the characteristics of the lip change, and the image formed by the feature column is referred to as a feature column pixel map. In a specific implementation, the method performs the following steps ss221-ss223 when performing step s22:
ss221,在所述唇部区域图中确定预设位置。Ss221, determining a preset position in the lip area map.
所述预设位置可以为唇部区域图中任意像素点的位置，由于唇动时唇部中央的变化最为明显，因此，本发明实施例优选地，所述预设位置为所述唇部区域图的中心像素点位置。The preset position may be the position of any pixel in the lip region map. Since the change at the center of the lip is most obvious during lip movement, in the embodiments of the present invention the preset position is preferably the position of the central pixel of the lip region map.
ss222,沿所述预设位置绘制纵轴。Ss222, drawing a vertical axis along the preset position.
ss223,提取由所述唇部区域图中位于所述纵轴的所有像素点构成的一列像素图作为特征列像素图。Ss223 extracts a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map as a feature column pixel map.
唇动时唇部变化的直接表现为唇部张开，这属于唇部的纵向变化，因此步骤ss222-ss223中，可以沿预设位置纵向提取特征列像素图；可以理解的是，若该预设位置为唇部区域图的中心像素点位置，所提取的特征列像素图即为唇部区域中央的一列像素图。A lip movement shows directly as the lips opening, which is a longitudinal (vertical) change of the lip. Therefore, in steps ss222-ss223, the feature column pixel map can be extracted vertically along the preset position. It can be understood that, if the preset position is the position of the central pixel of the lip region map, the extracted feature column pixel map is the column of pixels at the center of the lip region.
s23,按照每一帧图像的时间顺序对所提取的特征列像素图进行拼接处理,获得唇部变化图。S23: Perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
经过上述步骤s22可以从每一帧图像中的预设位置提取出特征列像素图，步骤s23将从各帧图像提取到的特征列像素图拼接后获得的唇部变化图，也反映了唇部的预设位置处的变化情况。以预设位置为唇部区域图的中心像素点位置为例：从第一帧图像中提取到唇部区域中央列像素图，可称为第一中央列像素图；从第二帧图像中也提取到唇部区域中央列像素图，可称为第二中央列像素图；以此类推；那么，本步骤s23中的拼接处理可以为：将第二中央列像素图横向拼接于第一中央列像素图之后，将第三中央列像素图横向拼接于第二中央列像素图之后，以此类推从而形成唇部变化图，此唇部变化图反映了唇部中央的变化情况。Through the above step s22, a feature column pixel map can be extracted from the preset position in each frame of image. The lip variation map obtained in step s23 by splicing the feature column pixel maps extracted from the frames therefore also reflects the changes at the preset position of the lip. Taking the preset position being the position of the central pixel of the lip region map as an example: the central-column pixel map of the lip region extracted from the first frame of image may be called the first central-column pixel map; the central-column pixel map of the lip region extracted from the second frame of image may be called the second central-column pixel map; and so on. The splicing process in step s23 may then be: horizontally splice the second central-column pixel map after the first central-column pixel map, horizontally splice the third central-column pixel map after the second central-column pixel map, and so on, thereby forming the lip variation map, which reflects the changes at the center of the lip.
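Steps s21-s23 amount to taking one column of pixels from each lip crop and stacking the columns in chronological order. A minimal numpy sketch, with the frame crops and their shapes assumed purely for illustration:

```python
# Minimal sketch of steps s21-s23: take the centre column of each
# lip region map (ss221-ss223) and splice the columns left to right
# in chronological order (s23). Input shapes are assumed.
import numpy as np

def feature_column(lip_map):
    """ss223: the column of pixels on the vertical axis through the
    preset position (here: the centre pixel of the lip region map)."""
    return lip_map[:, lip_map.shape[1] // 2]

def lip_variation_map(lip_maps):
    """s23: stack one feature column per frame, left to right."""
    return np.stack([feature_column(m) for m in lip_maps], axis=1)

# Five 8x6 grayscale lip crops yield an 8x5 lip variation map; the
# t-th column of the map comes from the t-th frame.
crops = [np.full((8, 6), t, dtype=np.uint8) for t in range(5)]
assert lip_variation_map(crops).shape == (8, 5)
```

Because each column comes from one frame, the horizontal axis of the resulting map is time and the vertical axis is the lip's vertical extent, which is what lets the map capture opening and closing over the whole time span.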
S103,根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果。S103: Perform lip movement recognition according to the texture feature of the lip change map to obtain a recognition result.
唇动识别是确认是否发生唇动的过程。该方法在执行步骤S103时具体执行如下步骤s31-s32:Lip recognition is the process of confirming whether or not lip movement occurs. The method specifically performs the following steps s31-s32 when performing step S103:
s31,计算所述唇部变化图的纹理特征,所述纹理特征包括但不限于:LBP特征和/或HOG特征。S31. Calculate texture features of the lip variation map, including but not limited to: LBP features and/or HOG features.
LBP特征可有效描述和度量图像局部的纹理信息,具备旋转不变性和灰度不变性等显著的优点;该方法在执行步骤s31的过程中,可以采用LBP算法来计算唇部变化图的LBP特征。HOG特征是一种在图像处理中用于进行物体检测的特征描述子;该方法在执行步骤s31的过程中,可以采用HOG算法来计算唇部变化图的HOG特征。可以理解的是,所述纹理特征还可包括诸如SIFT特征等其他特征,因此该方法在执行步骤s31的过程中还可采用其他算法来计算唇部变化图的纹理特征。The LBP feature can effectively describe and measure the local texture information of the image, and has significant advantages such as rotation invariance and gray invariance. In the process of performing step s31, the LBP algorithm can be used to calculate the LBP feature of the lip change map. . The HOG feature is a feature descriptor for performing object detection in image processing; in the process of performing step s31, the HOG algorithm may be used to calculate the HOG feature of the lip change map. It can be understood that the texture feature may also include other features such as SIFT features, so the method may also use other algorithms to calculate the texture features of the lip variation map during the execution of step s31.
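As one concrete, illustrative form of step s31, a basic 8-neighbour LBP can be computed directly over the lip variation map; libraries such as scikit-image provide equivalent routines, and the 256-bin histogram below is an assumed way of turning the LBP codes into a fixed-length feature vector.

```python
# Basic 8-neighbour LBP over a 2-D grayscale image (one possible
# texture feature for step s31). Each interior pixel is compared
# with its 8 neighbours, clockwise from the top-left, to form an
# 8-bit code; the normalised code histogram is the feature vector.
import numpy as np

def lbp_image(img):
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:],    img[2:, 1:-1],
                  img[2:, :-2],  img[1:-1, :-2]]
    code = np.zeros(c.shape, dtype=np.int32)
    for bit, n in enumerate(neighbours):
        code += (n >= c).astype(np.int32) << bit
    return code

def lbp_histogram(img):
    """256-bin normalised histogram of LBP codes."""
    hist = np.bincount(lbp_image(img).ravel(), minlength=256)
    return hist / hist.sum()
```

The histogram (optionally computed per block of the lip variation map and concatenated) can then serve as the input to the classifier of step s32.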
s32，采用预设分类算法对所述纹理特征进行分类，获得唇动识别结果，所述识别结果包括：发生唇动或未发生唇动。s32: Classify the texture features using a preset classification algorithm to obtain a lip movement recognition result, the recognition result including: lip movement occurred or no lip movement occurred.
所述预设分类算法可包括但不限于:贝叶斯算法、逻辑回归算法及SVM(Support Vector Machine,支持向量机)算法。以SVM算法为例,将所述纹理特征作为输入参数代入SVM算法分类器中,则SVM算法分类器则可以输出分类结果(即唇动识别结果)。The preset classification algorithm may include, but is not limited to, a Bayesian algorithm, a logistic regression algorithm, and an SVM (Support Vector Machine) algorithm. Taking the SVM algorithm as an example, the texture feature is substituted into the SVM algorithm classifier as an input parameter, and the SVM algorithm classifier can output the classification result (ie, the lip recognition result).
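A hedged sketch of step s32 using scikit-learn's SVC, one of the classifiers the text names; the 16-dimensional feature vectors and their labels are synthetic stand-ins for real LBP/HOG features of lip variation maps.

```python
# Step s32 sketched with an SVM classifier. The "texture features"
# and labels here are synthetic placeholders: class 1 = "lip
# movement occurred", class 0 = "no lip movement occurred".
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 16)),   # still lips
               rng.normal(1.0, 0.1, size=(20, 16))])  # moving lips
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict(rng.normal(1.0, 0.1, size=(1, 16)))  # a "moving" sample
```

In a real system X would hold the texture-feature vectors of many labelled lip variation maps, and the classifier's output would be the lip movement recognition result.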
本发明实施例通过运行图像处理方法，对视频所包含的每一帧图像进行人脸区域检测及唇部区域定位，并且从每一帧图像中提取唇部区域的特征列像素构建唇部变化图，由于唇部变化图来自于每一帧图像，这使得唇部变化图能够整体反映各图像组成的时间跨度；通过唇部变化图的纹理特征进行唇动识别获得识别结果，也就是依据时间跨度上的唇部变化识别唇动，能够避免唇部变化幅度的影响，识别效率较高且识别结果准确度较高。 In the embodiments of the present invention, by running the image processing method, face region detection and lip region localization are performed on each frame of image included in a video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip variation map. Because the lip variation map is drawn from every frame of image, it reflects, as a whole, the time span made up by those images. Lip movement recognition is then performed on the texture features of the lip variation map to obtain a recognition result; that is, lip movement is recognized from lip changes over the time span, which avoids the influence of the amplitude of lip change, so the recognition efficiency is high and the recognition result is accurate.
基于上述实施例所示的图像处理方法，本发明实施例还提供了一种互联网设备，该互联网设备可以为终端或服务器；请参见图2，该互联网设备的内部结构可包括但不限于：处理器、用户接口、网络接口及存储器。其中，互联网设备内的处理器、用户接口、网络接口及存储器可通过总线或其他方式连接，在本发明实施例所示图2中以通过总线连接为例。Based on the image processing method shown in the above embodiment, an embodiment of the present invention further provides an Internet device, which may be a terminal or a server. Referring to FIG. 2, the internal structure of the Internet device may include, but is not limited to, a processor, a user interface, a network interface and a memory. The processor, user interface, network interface and memory in the Internet device may be connected by a bus or in other manners; in FIG. 2 of the embodiment of the present invention, a bus connection is taken as an example.
其中，用户接口是实现用户与该互联网设备进行交互和信息交换的媒介，其具体体现可以包括用于输出的显示屏（Display）以及用于输入的键盘（Keyboard）等等，需要说明的是，此处的键盘既可以为实体键盘，也可以为触屏虚拟键盘，还可以为实体与触屏虚拟相结合的键盘。处理器（或称CPU（Central Processing Unit，中央处理器））是互联网设备的计算核心以及控制核心，其可以解析互联网设备内的各类指令以及处理各类数据。存储器（Memory）是互联网设备中的记忆设备，用于存放程序和数据。可以理解的是，此处的存储器可以是高速RAM存储器，也可以是非不稳定的存储器（non-volatile memory），例如至少一个磁盘存储器；可选的还可以是至少一个位于远离前述处理器的存储装置。存储器提供存储空间，该存储空间存储了互联网设备的操作系统，还存储了图像处理装置。The user interface is the medium through which the user interacts and exchanges information with the Internet device; concretely, it may include a display used for output, a keyboard used for input, and so on. It should be noted that the keyboard here may be a physical keyboard, a touch-screen virtual keyboard, or a keyboard combining a physical keyboard with a touch-screen virtual keyboard. The processor (or CPU, Central Processing Unit) is the computing core and control core of the Internet device, capable of parsing various instructions in the Internet device and processing various data. The memory is the storage device of the Internet device, used to store programs and data. It can be understood that the memory here may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory; optionally, it may also be at least one storage apparatus located away from the aforementioned processor. The memory provides storage space that stores the operating system of the Internet device and also stores the image processing apparatus.
在本发明实施例中,互联网设备通过运行存储器中的图像处理装置可以执行上述图1所示方法流程的相应步骤。请一并参见图3,该图像处理装置运行如下单元:In the embodiment of the present invention, the Internet device can execute the corresponding steps of the method flow shown in FIG. 1 by running the image processing device in the memory. Referring to FIG. 3 together, the image processing apparatus operates as follows:
定位单元101,用于在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域。The locating unit 101 is configured to detect a face area in each frame image included in the to-be-processed video, and locate a lip area from the face area.
构建单元102,用于从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图。The building unit 102 is configured to extract a feature column pixel construction lip change map of the lip region from each frame image.
唇动识别单元103,用于根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果。The lip motion recognition unit 103 is configured to perform lip motion recognition according to the texture feature of the lip change map to obtain a recognition result.
具体实现中,该图像处理装置在运行定位单元101的过程中,具体运行如下单元:In a specific implementation, the image processing apparatus runs the following unit in the process of running the positioning unit 101:
解析单元1001,用于对待处理视频进行解析获得至少一帧图像。The parsing unit 1001 is configured to parse the video to be processed to obtain at least one frame of image.
人脸检测单元1002，用于采用人脸检测算法在每一帧图像中检测人脸区域。The face detection unit 1002 is configured to detect a face region in each frame of image using a face detection algorithm.
人脸配准单元1003,用于采用人脸配准算法从所述人脸区域中定位唇部区域。The face registration unit 1003 is configured to locate a lip region from the face region by using a face registration algorithm.
具体实现中,该图像处理装置在运行构建单元102的过程中,具体运行如下单元:In a specific implementation, the image processing apparatus runs the following units in the process of running the building unit 102:
截取单元2001,用于在每一帧图像中截取唇部区域图。The intercepting unit 2001 is configured to intercept a lip region map in each frame image.
提取单元2002,用于从所述唇部区域图中提取特征列像素图。The extracting unit 2002 is configured to extract a feature column pixmap from the lip region map.
拼接处理单元2003,用于按照每一帧图像的时间顺序对所提取的特征列像素图进行拼接处理,获得唇部变化图。The splicing processing unit 2003 is configured to perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
具体实现中,该图像处理装置在运行提取单元2002的过程中,具体运行如下单元:In a specific implementation, the image processing apparatus runs the following unit in the process of running the extracting unit 2002:
位置确定单元2221,用于在所述唇部区域图中确定预设位置;优选地,所述预设位置为所述唇部区域图的中心像素点位置。The position determining unit 2221 is configured to determine a preset position in the lip area map; preferably, the preset position is a central pixel point position of the lip area map.
纵轴确定单元2222,用于沿所述预设位置绘制纵轴。The vertical axis determining unit 2222 is configured to draw a vertical axis along the preset position.
特征列像素提取单元2223,用于提取由所述唇部区域图中位于所述纵轴的所有像素点构成的一列像素图作为特征列像素图。The feature column pixel extracting unit 2223 is configured to extract, as a feature column pixmap, a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map.
具体实现中,该图像处理装置在运行唇动识别单元103的过程中,具体运行如下单元:In a specific implementation, the image processing apparatus runs the following unit in the process of running the lip recognition unit 103:
计算单元3001,用于计算所述唇部变化图的纹理特征,所述纹理特征包括LBP特征和/或HOG特征。The calculating unit 3001 is configured to calculate a texture feature of the lip variation map, the texture feature including an LBP feature and/or an HOG feature.
分类单元3002，用于采用预设分类算法对所述纹理特征进行分类，获得唇动识别结果，所述识别结果包括：发生唇动或未发生唇动。The classification unit 3002 is configured to classify the texture features using a preset classification algorithm to obtain a lip movement recognition result, the recognition result including: lip movement occurred or no lip movement occurred.
与图1所示的方法同理，本发明实施例通过运行图像处理装置，对视频所包含的每一帧图像进行人脸区域检测及唇部区域定位，并且从每一帧图像中提取唇部区域的特征列像素构建唇部变化图，由于唇部变化图来自于每一帧图像，这使得唇部变化图能够整体反映各图像组成的时间跨度；通过唇部变化图的纹理特征进行唇动识别获得识别结果，也就是依据时间跨度上的唇部变化识别唇动，能够避免唇部变化幅度的影响，识别效率较高且识别结果准确度较高。In the same way as the method shown in FIG. 1, in the embodiments of the present invention, by running the image processing apparatus, face region detection and lip region localization are performed on each frame of image included in a video, and feature column pixels of the lip region are extracted from each frame of image to construct a lip variation map. Because the lip variation map is drawn from every frame of image, it reflects, as a whole, the time span made up by those images. Lip movement recognition is then performed on the texture features of the lip variation map to obtain a recognition result; that is, lip movement is recognized from lip changes over the time span, which avoids the influence of the amplitude of lip change, so the recognition efficiency is high and the recognition result is accurate.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，所述的程序可存储于一计算机可读取存储介质中，该程序在执行时，可包括如上述各方法的实施例的流程。其中，所述的存储介质可为磁碟、光盘、只读存储记忆体（Read-Only Memory，ROM）或随机存储记忆体（Random Access Memory，RAM）等。Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
以上所揭露的仅为本发明较佳实施例而已，当然不能以此来限定本发明之权利范围，因此依本发明权利要求所作的等同变化，仍属本发明所涵盖的范围。 The above disclosure is merely preferred embodiments of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; therefore, equivalent variations made according to the claims of the present invention still fall within the scope covered by the present invention.

Claims (18)

  1. 一种图像处理方法,其特征在于,包括:An image processing method, comprising:
    在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域;Detecting a face area in each frame image included in the to-be-processed video, and locating a lip area from the face area;
    从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图;Extracting a feature column pixel of the lip region from each of the frame images to construct a lip variation map;
    根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果。The lip motion recognition is performed according to the texture feature of the lip change map, and the recognition result is obtained.
  2. 如权利要求1所述的方法,其特征在于,所述在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域,包括:The method according to claim 1, wherein the detecting a face region in each frame image included in the video to be processed and locating the lip region from the face region comprises:
    对待处理视频进行解析获得至少一帧图像;Parsing the processed video to obtain at least one frame of image;
    采用人脸检测算法在每一帧图像中检测人脸区域;A face detection algorithm is used to detect a face region in each frame image;
    采用人脸配准算法从所述人脸区域中定位唇部区域。A face registration algorithm is used to locate the lip region from the face region.
  3. 如权利要求2所述的方法,其特征在于,所述从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图,包括:The method of claim 2, wherein the extracting the characteristic column of the lip region from the image of each frame to construct a lip variation map comprises:
    在每一帧图像中截取唇部区域图;Intercepting a lip region map in each frame of image;
    从所述唇部区域图中提取特征列像素图;Extracting a feature column pixmap from the lip region map;
    按照每一帧图像的时间顺序对所提取的特征列像素图进行拼接处理,获得唇部变化图。The extracted feature column pixel map is spliced according to the chronological order of each frame image to obtain a lip change map.
  4. 如权利要求3所述的方法,其特征在于,所述从所述唇部区域图中提取特征列像素图,包括:The method of claim 3, wherein the extracting the feature column pixmap from the lip region map comprises:
    在所述唇部区域图中确定预设位置;Determining a preset position in the lip region map;
    沿所述预设位置绘制纵轴;Drawing a vertical axis along the preset position;
    提取由所述唇部区域图中位于所述纵轴的所有像素点构成的一列像素图作为特征列像素图。A column of pixel maps composed of all the pixels located on the vertical axis in the lip region map is extracted as a feature column pixmap.
  5. 如权利要求4所述的方法，其特征在于，所述预设位置为所述唇部区域图的中心像素点位置。 The method of claim 5, wherein the preset position is a central pixel point position of the lip region map.
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果,包括:The method according to any one of claims 1 to 5, wherein the lip movement recognition is performed according to the texture feature of the lip change map, and the recognition result is obtained, including:
    计算所述唇部变化图的纹理特征,所述纹理特征包括LBP特征和/或HOG特征;Calculating a texture feature of the lip variation map, the texture feature comprising an LBP feature and/or an HOG feature;
    采用预设分类算法对所述纹理特征进行分类，获得唇动识别结果，所述识别结果包括：发生唇动或未发生唇动。The texture features are classified using a preset classification algorithm to obtain a lip movement recognition result, and the recognition result includes: lip movement occurred or no lip movement occurred.
  7. 一种图像处理装置,其特征在于,包括:An image processing apparatus, comprising:
    定位单元,用于在待处理视频所包含的每一帧图像中检测人脸区域,并从所述人脸区域中定位唇部区域;a positioning unit, configured to detect a face area in each frame image included in the to-be-processed video, and locate a lip area from the face area;
    构建单元,用于从所述每一帧图像中提取唇部区域的特征列像素构建唇部变化图;a building unit, configured to extract a feature column pixel of the lip region from the image of each frame to construct a lip variation map;
    唇动识别单元,用于根据所述唇部变化图的纹理特征进行唇动识别,获得识别结果。a lip motion recognition unit configured to perform lip motion recognition according to the texture feature of the lip change map to obtain a recognition result.
  8. 如权利要求7所述的装置,其特征在于,所述定位单元包括:The device according to claim 7, wherein the positioning unit comprises:
    解析单元,用于对待处理视频进行解析获得至少一帧图像;a parsing unit, configured to parse the video to be processed to obtain at least one frame of image;
    人脸检测单元,用于采用人脸检测算法在每一帧图像中检测人脸区域;a face detecting unit, configured to detect a face region in each frame image by using a face detection algorithm;
    人脸配准单元,用于采用人脸配准算法从所述人脸区域中定位唇部区域。A face registration unit is configured to locate a lip region from the face region using a face registration algorithm.
  9. 如权利要求8所述的装置,其特征在于,所述构建单元包括:The apparatus of claim 8 wherein said building unit comprises:
    截取单元,用于在每一帧图像中截取唇部区域图;An intercepting unit for intercepting a lip region map in each frame image;
    提取单元,用于从所述唇部区域图中提取特征列像素图;An extracting unit, configured to extract a feature column pixmap from the lip region map;
    拼接处理单元,用于按照每一帧图像的时间顺序对所提取的特征列像素图进行拼接处理,获得唇部变化图。The splicing processing unit is configured to perform splicing processing on the extracted feature column pixmap according to the chronological order of each frame image to obtain a lip change map.
  10. 如权利要求9所述的装置,其特征在于,所述提取单元包括:The apparatus according to claim 9, wherein said extracting unit comprises:
    位置确定单元,用于在所述唇部区域图中确定预设位置; a position determining unit, configured to determine a preset position in the lip area map;
    纵轴确定单元,用于沿所述预设位置绘制纵轴;a vertical axis determining unit for drawing a vertical axis along the preset position;
    特征列像素提取单元,用于提取由所述唇部区域图中位于所述纵轴的所有像素点构成的一列像素图作为特征列像素图。The feature column pixel extracting unit is configured to extract a column of pixel maps composed of all the pixels located on the vertical axis in the lip region map as a feature column pixel map.
  11. 如权利要求10所述的装置，其特征在于，所述预设位置为所述唇部区域图的中心像素点位置。The device of claim 10, wherein the preset position is a central pixel point position of the lip region map.
  12. 如权利要求7-11任一项所述的装置,其特征在于,所述唇动识别单元包括:The device according to any one of claims 7 to 11, wherein the lip movement recognition unit comprises:
    计算单元,用于计算所述唇部变化图的纹理特征,所述纹理特征包括LBP特征和/或HOG特征;a calculating unit, configured to calculate a texture feature of the lip variation map, the texture feature comprising an LBP feature and/or an HOG feature;
    分类单元，用于采用预设分类算法对所述纹理特征进行分类，获得唇动识别结果，所述识别结果包括：发生唇动或未发生唇动。And a classification unit, configured to classify the texture features using a preset classification algorithm to obtain a lip movement recognition result, the recognition result including: lip movement occurred or no lip movement occurred.
  13. An Internet device, comprising:
    a memory storing a set of program code; and
    a processor, configured to execute the program code to perform the following operations:
    detecting a face region in each frame image of a to-be-processed video, and locating a lip region within the face region;
    extracting feature column pixels of the lip region from each frame image to construct a lip change map;
    performing lip movement recognition according to texture features of the lip change map to obtain a recognition result.
  14. The Internet device according to claim 13, wherein detecting a face region in each frame image of the to-be-processed video and locating a lip region within the face region comprises:
    parsing the to-be-processed video to obtain at least one frame image;
    detecting a face region in each frame image using a face detection algorithm;
    locating the lip region within the face region using a face registration algorithm.
  15. The Internet device according to claim 14, wherein extracting feature column pixels of the lip region from each frame image to construct a lip change map comprises:
    capturing a lip region map from each frame image;
    extracting a feature column pixel map from the lip region map;
    splicing the extracted feature column pixel maps in the chronological order of the frame images to obtain the lip change map.
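The splicing step of claim 15 can be sketched by stacking one extracted pixel column per frame, in chronological order, into a single image whose width equals the number of frames. This is a hedged illustration under the assumption that each frame's lip region map is a NumPy array; the names are not from the patent:

```python
import numpy as np

def build_lip_change_map(lip_region_maps) -> np.ndarray:
    """Splice the center pixel column of each lip region map, in
    chronological order, into a lip change map (one column per frame)."""
    columns = [m[:, m.shape[1] // 2] for m in lip_region_maps]
    return np.stack(columns, axis=1)

# Three 4x5 frames (filled with their frame index) give a 4x3 change map,
# whose columns read left to right in time order.
frames = [np.full((4, 5), t) for t in range(3)]
change_map = build_lip_change_map(frames)
```

Because each frame contributes exactly one column, temporal lip movement appears as horizontal texture variation in the change map, which is what the later texture-feature step exploits.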
  16. The Internet device according to claim 15, wherein extracting a feature column pixel map from the lip region map comprises:
    determining a preset position in the lip region map;
    drawing a vertical axis through the preset position;
    extracting, as the feature column pixel map, the column of pixels located on the vertical axis in the lip region map.
  17. The Internet device according to claim 16, wherein the preset position is the center pixel position of the lip region map.
  18. The Internet device according to any one of claims 13 to 17, wherein performing lip movement recognition according to the texture features of the lip change map to obtain a recognition result comprises:
    calculating texture features of the lip change map, the texture features comprising LBP features and/or HOG features;
    classifying the texture features using a preset classification algorithm to obtain a lip movement recognition result, the recognition result indicating either that lip movement has occurred or that no lip movement has occurred.
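The texture step of claim 18 can be illustrated with a basic 8-neighbour LBP histogram computed over the lip change map; the "preset classification algorithm" (which the patent leaves open, e.g. an SVM) would then consume this descriptor. This is a generic LBP sketch, not the patent's exact formulation:

```python
import numpy as np

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Basic LBP: compare each interior pixel with its 8 neighbours,
    pack the comparison bits into an 8-bit code, and histogram the codes."""
    center = gray[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neigh = gray[1 + dy:gray.shape[0] - 1 + dy,
                     1 + dx:gray.shape[1] - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()  # normalised 256-bin texture descriptor

# Example: a normalised 256-bin descriptor for a 5x5 patch.
descriptor = lbp_histogram(np.arange(25, dtype=float).reshape(5, 5))
```

An HOG descriptor, the patent's stated alternative, would similarly be flattened into a fixed-length vector before classification.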
PCT/CN2016/079163 2015-11-25 2016-04-13 Image processing method and apparatus WO2017107345A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/680,976 US10360441B2 (en) 2015-11-25 2017-08-18 Image processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510996643.0A CN106919891B (en) 2015-12-26 2015-12-26 Image processing method and device
CN201510996643.0 2015-12-26

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106752 Continuation WO2017088727A1 (en) 2015-11-25 2016-11-22 Image processing method and apparatus

Related Child Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2016/106752 Continuation WO2017088727A1 (en) 2015-11-25 2016-11-22 Image processing method and apparatus
US15/680,976 Continuation US10360441B2 (en) 2015-11-25 2017-08-18 Image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017107345A1 true WO2017107345A1 (en) 2017-06-29

Family

ID=59088924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/079163 WO2017107345A1 (en) 2015-11-25 2016-04-13 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106919891B (en)
WO (1) WO2017107345A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679449B (en) * 2017-08-17 2018-08-03 平安科技(深圳)有限公司 Lip motion method for catching, device and storage medium
CN108763897A (en) * 2018-05-22 2018-11-06 平安科技(深圳)有限公司 Method of calibration, terminal device and the medium of identity legitimacy
CN109460713B (en) * 2018-10-16 2021-03-30 京东数字科技控股有限公司 Identification method, device and equipment for animal parturition
CN111259711A (en) * 2018-12-03 2020-06-09 北京嘀嘀无限科技发展有限公司 Lip movement identification method and system
CN111931662A (en) * 2020-08-12 2020-11-13 中国工商银行股份有限公司 Lip reading identification system and method and self-service terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6421453B1 (en) * 1998-05-15 2002-07-16 International Business Machines Corporation Apparatus and methods for user recognition employing behavioral passwords
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN104200146A (en) * 2014-08-29 2014-12-10 华侨大学 Identity verifying method with video human face and digital lip movement password combined
CN104361276A (en) * 2014-11-18 2015-02-18 新开普电子股份有限公司 Multi-mode biometric authentication method and multi-mode biometric authentication system
CN104838339A (en) * 2013-01-07 2015-08-12 日立麦克赛尔株式会社 Portable terminal device and information processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1839410B (en) * 2003-07-18 2015-05-20 佳能株式会社 Image processor, imaging apparatus and image processing method
JP2006259900A (en) * 2005-03-15 2006-09-28 Omron Corp Image processing system, image processor and processing method, recording medium, and program
US9110501B2 (en) * 2012-04-17 2015-08-18 Samsung Electronics Co., Ltd. Method and apparatus for detecting talking segments in a video sequence using visual cues
CN104331160A (en) * 2014-10-30 2015-02-04 重庆邮电大学 Lip state recognition-based intelligent wheelchair human-computer interaction system and method


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966654A (en) * 2021-03-29 2021-06-15 深圳市优必选科技股份有限公司 Lip movement detection method and device, terminal equipment and computer readable storage medium
CN112966654B (en) * 2021-03-29 2023-12-19 深圳市优必选科技股份有限公司 Lip movement detection method, lip movement detection device, terminal equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN106919891B (en) 2019-08-23
CN106919891A (en) 2017-07-04

Similar Documents

Publication Publication Date Title
US10438077B2 (en) Face liveness detection method, terminal, server and storage medium
WO2017107345A1 (en) Image processing method and apparatus
US11727663B2 (en) Method and apparatus for detecting face key point, computer device and storage medium
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
CN109902630B (en) Attention judging method, device, system, equipment and storage medium
US10636152B2 (en) System and method of hybrid tracking for match moving
US10769496B2 (en) Logo detection
US20180260643A1 (en) Verification method and system
WO2017088727A1 (en) Image processing method and apparatus
US10360441B2 (en) Image processing method and apparatus
WO2019242672A1 (en) Method, device and system for target tracking
CN106778453B (en) Method and device for detecting glasses wearing in face image
WO2020056903A1 (en) Information generating method and device
JP2011123529A (en) Information processing apparatus, information processing method, and program
WO2020007191A1 (en) Method and apparatus for living body recognition and detection, and medium and electronic device
WO2020052062A1 (en) Detection method and device
US20230306792A1 (en) Spoof Detection Based on Challenge Response Analysis
WO2020164277A1 (en) Monitoring method and apparatus based on audio and video linkage, and terminal device and medium
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
US20230237837A1 (en) Extracting Facial Imagery from Online Sessions
US20220207917A1 (en) Facial expression image processing method and apparatus, and electronic device
WO2024131291A1 (en) Face liveness detection method and apparatus, device, and storage medium
US20220122341A1 (en) Target detection method and apparatus, electronic device, and computer storage medium
US10140727B2 (en) Image target relative position determining method, device, and system thereof
US11481940B2 (en) Structural facial modifications in images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877159

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 13/11/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16877159

Country of ref document: EP

Kind code of ref document: A1