WO2022198751A1 - Rapid facial detection method based on multi-layer preprocessing - Google Patents

Rapid facial detection method based on multi-layer preprocessing

Info

Publication number
WO2022198751A1
WO2022198751A1 · PCT/CN2021/091026 · CN2021091026W
Authority
WO
WIPO (PCT)
Prior art keywords
frame
coordinates
skin color
measured
pixel
Prior art date
Application number
PCT/CN2021/091026
Other languages
French (fr)
Chinese (zh)
Inventor
张晖 (Zhang Hui)
叶子皓 (Ye Zihao)
赵海涛 (Zhao Haitao)
孙雁飞 (Sun Yanfei)
朱洪波 (Zhu Hongbo)
Original Assignee
南京邮电大学 (Nanjing University of Posts and Telecommunications)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学 (Nanjing University of Posts and Telecommunications)
Priority to JP2022512825A (published as JP7335018B2)
Publication of WO2022198751A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Definitions

  • The present application relates to the field of target detection, and in particular to a method for detecting faces quickly and accurately through multi-layer preprocessing.
  • Face recognition technology is widely used in monitoring, security, personnel management, and media production. It comprises two parts: face detection and face discrimination. Face detection finds the positions of all faces in an image, while face discrimination determines whether two faces belong to the same person. Face detection is the basis of face recognition technology, because subsequent processing is possible only after the positions of all faces have been found.
  • Face detection, as a subfield of object detection, already has many mature algorithms, such as Haar cascade classifiers, which combine digital image features with classification algorithms, and convolutional neural networks from the field of deep learning.
  • The convolutional neural network, one of the most advanced algorithms at present, performs very well on the face detection problem.
  • Properly designed and fully trained convolutional neural networks can detect faces with high accuracy under various lighting conditions, viewing angles, and even partial occlusion.
  • Exemplary embodiments of the present application provide a fast face detection method based on multi-layer preprocessing, which combines several image processing methods with convolutional neural network technology and aims to solve the problem that convolutional neural network inference is relatively slow.
  • A method for fast face detection based on multi-layer preprocessing is provided, with the following specific operation steps:
  • S102: use the elliptical skin color model to judge, pixel by pixel, whether each pixel of the image obtained in S101 is a skin color pixel, so as to obtain the skin color region, where a pixel is judged to be a skin color pixel when its blue chrominance and red chrominance components satisfy the elliptical skin color model;
  • S104: perform effective search position filtering on the processed skin color region obtained in S103 to get the effective search positions, extract the contours of the effective search positions with a contour extraction technique, and generate one frame to be tested for each contour;
  • S105: use a convolutional neural network with a face detection function to detect the frames to be tested obtained in S104 one by one, and output the face positioning coordinates within each frame;
  • S106: determine the coordinates of the face positioning frame from the coordinates of the frame to be tested and the face positioning coordinates within it.
  • The elliptical skin color model requires Cr(13Cr-10Cb-2900)+Cb(13Cb-1388)+295972 ≤ 0, where Cb and Cr represent the blue and red chrominance components of the pixel, respectively.
  • The step of performing effective search position filtering on the processed skin color region includes:
  • performing effective search position filtering on the processed skin color region using a filter matrix, where the pixel values in the processed skin color region, in the filter matrix, and in the effective search position map satisfy $dst(i,j)=1$ if $\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\,f(x,y)\ge t\cdot area$, and $dst(i,j)=0$ otherwise,
  • where dst(i,j) is the pixel value at coordinate (i,j) of the effective search position map dst, src(i+x,j+y) is the pixel value at coordinate (i+x,j+y) of the skin color region src, and f(x,y) is the pixel value at coordinate (x,y) of the filter matrix f,
  • the filter matrix f has size (2a+1)×(2b+1),
  • its center coordinate is (0,0),
  • t is the preset effective search rate (ESR) threshold,
  • and area is the number of pixels with value 1 in the filter matrix f.
  • The upper left corner coordinates (left, top) and lower right corner coordinates (right, bottom) of the frame to be tested are (left, top) = (left′ - b, top′ - a) and (right, bottom) = (right′ + b, bottom′ + a),
  • where (left′, top′) and (right′, bottom′) are the coordinates of the upper left and lower right corners of the contour's circumscribed rectangle, respectively.
  • The effective search rate is defined as the ratio of the area of the skin color region within the frame to be tested to the area of the frame to be tested.
  • The step of converting the image to be detected from the RGB color space to the YCbCr color space includes:
  • performing the color space conversion on the image to be detected using the RGB-to-YCbCr conversion formula (Formula 1 in the description below),
  • where Y, Cb, and Cr represent the luminance, blue chrominance, and red chrominance components of a pixel, respectively, and R, G, and B represent its red, green, and blue components.
  • The step of performing morphological processing on the skin color region includes: removing isolated skin color points and thin line structures through an opening operation.
  • the step of performing morphological processing on the skin color region further includes: filling holes and bridging gaps through a closing operation.
  • The frames to be tested include at least a frame A to be tested and a frame B to be tested, and S104 further includes:
  • merging the frames A and B, where frames A and B are merged if the area of the frame C obtained by merging them is less than or equal to the sum of the areas of frames A and B; otherwise, frames A and B are not merged.
  • The upper left corner coordinates (l_C, t_C) and lower right corner coordinates (r_C, b_C) of the frame C to be tested are (l_C, t_C) = (min(l_A, l_B), min(t_A, t_B)) and (r_C, b_C) = (max(r_A, r_B), max(b_A, b_B)),
  • where (l_A, t_A) and (r_A, b_A) are the coordinates of the upper left and lower right corners of frame A, respectively,
  • and (l_B, t_B) and (r_B, b_B) are the coordinates of the upper left and lower right corners of frame B, respectively.
  • The coordinates of the upper left and lower right corners of the face positioning frame in S106 are (l, t) = (l_C + l′, t_C + t′) and (r, b) = (r_C + r′, b_C + b′), respectively,
  • where (l_C, t_C) and (r_C, b_C) are the coordinates of the upper left and lower right corners of the frame C to be tested, and (l′, t′) and (r′, b′) are the coordinates, output by the convolutional neural network, of the upper left and lower right corners of a face located within frame C.
  • a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the method described in any of the foregoing embodiments are implemented.
  • The present application retains the high accuracy of the face detection convolutional neural network while using multi-layer preprocessing to reduce the size of the area the network must search, thereby greatly improving its running speed.
  • FIG. 1 is a flowchart of a fast face detection method based on multi-layer preprocessing according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of effective search position filtering (ESPF) according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of generating a frame to be tested according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of merging frames to be tested according to an embodiment of the present application.
  • Convolutional neural networks can detect faces with high accuracy under various lighting conditions, viewing angles, and even partial occlusion.
  • However, convolutional neural networks also have a shortcoming: fast inference depends heavily on GPUs with powerful floating-point capability. Constrained by cost, size, and power, small edge devices can hardly support fast convolutional neural network inference.
  • a method for fast face detection based on multi-layer preprocessing specifically includes the following operation steps.
  • S101: perform color space conversion on the input image (the image to be detected), from the default RGB color space to the YCbCr color space. YCbCr separates the luminance of a color from its chrominance, which makes it better suited to classifying colors under different lighting conditions.
  • S102: use the elliptical skin color model to judge, pixel by pixel, whether each pixel of the image obtained in S101 is a skin color pixel, so as to obtain the skin color region, where a pixel is judged to be a skin color pixel when its blue chrominance and red chrominance components satisfy the elliptical skin color model.
  • Statistics over large numbers of skin color samples show that skin color in YCbCr space approximately follows an elliptical cylindrical distribution; that is, its distribution in the CbCr plane is close to an ellipse.
  • With a plane rectangular coordinate system whose horizontal axis is Cr and vertical axis is Cb, the skin color ellipse has center (155, 113), semi-major axis 30, semi-minor axis 20, and a 45° counterclockwise inclination. The skin color condition is therefore:
  • $$\frac{\left[(Cr-155)\cos 45^\circ+(Cb-113)\sin 45^\circ\right]^2}{30^2}+\frac{\left[(Cb-113)\cos 45^\circ-(Cr-155)\sin 45^\circ\right]^2}{20^2}\le 1$$
  • A pixel whose Cb and Cr components satisfy this condition can be considered a skin color pixel.
  • Judging every pixel with Formula 3 (the expanded form of this condition) yields the skin color region (or skin color mask).
  • Morphological operations are a family of image processing techniques for the shape features of binarized images.
  • The basic idea is to modify pixel values in the image using a structuring element of a specific shape together with a rule, achieving effects such as noise removal, hole filling, burr trimming, and edge smoothing, in preparation for further image analysis and target recognition.
  • Basic morphological operations include erosion and dilation. Erosion removes fine structures such as noise and burrs, while dilation fills holes and gaps.
  • During erosion, the structuring element slides over the input image pixel by pixel; the input image pixels that face the 1-valued positions of the structuring element are called the corresponding pixels.
  • At each position, the minimum of the corresponding pixels is written to the output pixel facing the structuring element's anchor point: $dst(i,j)=\min_{(x,y):\,E(x,y)=1} src(i+x,\,j+y)$ (Formula 4),
  • where dst, src, and E represent the output image, input image, and structuring element, respectively.
  • The structuring element takes its anchor point as the coordinate center, (i, j) is the current anchor position, and (x, y) is an offset relative to the anchor. Formula 4 shows that during erosion the output pixel at the anchor position is 1 only when the 1-valued region of the structuring element is completely covered by the 1-valued region of the input image. This shrinks the contours of the 1-valued regions, which visually appear eroded.
  • The dilation operation is similar to erosion, except that the minimum becomes the maximum: $dst(i,j)=\max_{(x,y):\,E(x,y)=1} src(i+x,\,j+y)$ (Formula 5).
  • Formula 5 shows that during dilation the output pixel at the anchor position is 0 only when the 1-valued region of the structuring element is completely covered by the 0-valued region of the input image. This expands the contours of the 1-valued regions, which visually appear inflated. Used alone, erosion and dilation can change the area of the skin color regions substantially.
  • Opening and closing apply erosion and dilation successively with the same structuring element.
  • The opening operation erodes first and then dilates; it can break small connections and remove noise.
  • The closing operation dilates first and then erodes; it can connect nearby regions and fill holes. Morphological processing is performed on the obtained skin color region: isolated skin color points and thin line structures are removed by opening, and small holes in the skin color regions are filled and small gaps bridged by closing.
  • Opening and closing have little effect on the area of the skin color regions while still removing noise and filling holes.
  • Applying opening and then closing to the skin color mask obtained in S102 yields the final skin color mask.
  • S104: perform effective search position filtering on the processed skin color region obtained in S103 to get the effective search positions, extract the contours of the effective search positions with a contour extraction technique, and generate one frame to be tested for each contour.
  • Effective Search Position Filtering (ESPF) is applied to the final skin color region to obtain all effective search position pixels.
  • ESPF is a special image filtering operation that uses an ellipse-shaped filter matrix and a filtering rule based on the Effective Search Rate (ESR).
  • The effective search rate is defined as the ratio of the skin color area A_s inside the frame to be tested to the area A_r of that frame: ESR = A_s / A_r.
  • The ESPF computation can be expressed as:
  • $$dst(i,j)=\begin{cases}1, & \displaystyle\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\,f(x,y)\ \ge\ t\cdot area\\[4pt] 0, & \text{otherwise}\end{cases}$$
  • where dst, src, and f are the output image, input image, and filter matrix, respectively; the filter matrix has size (2a+1)×(2b+1) and center coordinate (0,0); t is the preset ESR threshold; and area is the number of 1-valued pixels in the filter matrix.
  • The filter matrix used in ESPF filtering is an ellipse matrix in which the 1 values form a regular ellipse inscribed in the rectangle, as shown in the filter matrix of FIG. 2.
  • The output image of ESPF filtering is the effective search position map; contour extraction is then applied to it, and one frame to be tested is generated per contour.
  • The frame to be tested is obtained by expanding the contour's circumscribed rectangle outward by a fixed distance. The four sides of the circumscribed rectangle are tangent to the contour, and each side is parallel to the corresponding image edge. The expansion distance equals half the filter matrix size.
  • Each frame to be tested obtained after ESPF filtering therefore has a relatively high ESR.
  • Non-face skin color parts, such as small skin color areas and long narrow skin color areas, are eliminated by ESPF filtering, and the problem of connected skin color regions is also solved.
  • S105: use a convolutional neural network with a face detection function to detect the frames to be tested obtained in S104 one by one, and output the face positioning coordinates within each frame.
  • FIG. 4 shows the effect of merging frames to be tested: two pairs of heavily overlapping frames are merged, further reducing the area the convolutional neural network must search and improving search efficiency.
  • Finally, the convolutional neural network outputs the coordinates of all face positioning frames inside a frame to be tested, relative to that frame. If the upper left and lower right corners of the frame to be tested are (l_C, t_C) and (r_C, b_C), and the network outputs a face positioning frame with upper left and lower right corners (l′, t′) and (r′, b′), then the actual corners of that face positioning frame are (l, t) = (l_C + l′, t_C + t′) and (r, b) = (r_C + r′, b_C + b′).
  • Although the steps in the flowchart of FIG. 1 are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on their order, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages; these need not be completed at the same moment, may be executed at different times, and their order of execution is not necessarily sequential: they may be performed in turn or alternately with other steps, or with sub-steps or stages of other steps.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a facial detection method, comprising: performing color space conversion on an input original image; extracting a skin color region from the image using an elliptical skin color model; correcting the skin color region by means of morphological operations; generating candidate detection boxes by means of an effective search position filtering method; merging candidate boxes that overlap excessively; detecting the candidate boxes one by one with a convolutional neural network; and calculating the coordinates of the final face positioning boxes.

Description

A Fast Face Detection Method Based on Multi-Layer Preprocessing
Related Applications
This application claims priority to Chinese patent application No. 2021103222047, entitled "A Fast Face Detection Method Based on Multi-layer Preprocessing" and filed with the Chinese Patent Office on March 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of target detection, and in particular to a method for detecting faces quickly and accurately through multi-layer preprocessing.
Background
Face recognition technology is widely used in monitoring, security, personnel management, and media production. It comprises two parts: face detection and face discrimination. Face detection finds the positions of all faces in an image, while face discrimination determines whether two faces belong to the same person. Face detection is the basis of face recognition technology, because subsequent processing is possible only after the positions of all faces have been found.
Face detection, as a subfield of object detection, already has many mature algorithms, such as Haar cascade classifiers, which combine digital image features with classification algorithms, and convolutional neural networks from the field of deep learning. The convolutional neural network, one of the most advanced algorithms at present, performs very well on the face detection problem. Properly designed and fully trained convolutional neural networks can detect faces with high accuracy under various lighting conditions, viewing angles, and even partial occlusion.
Summary
Exemplary embodiments of the present application provide a fast face detection method based on multi-layer preprocessing, which combines several image processing methods with convolutional neural network technology and aims to solve the problem that convolutional neural network inference is relatively slow.
In one aspect of the present application, a fast face detection method based on multi-layer preprocessing is provided, with the following specific operation steps:
S101: convert the image to be detected from the RGB color space to the YCbCr color space;
S102: use the elliptical skin color model to judge, pixel by pixel, whether each pixel of the image obtained in S101 is a skin color pixel, so as to obtain the skin color region, where a pixel is judged to be a skin color pixel when its blue chrominance and red chrominance components satisfy the elliptical skin color model;
S103: perform morphological processing on the skin color region obtained in S102 to obtain a processed skin color region;
S104: perform effective search position filtering on the processed skin color region obtained in S103 to get the effective search positions, extract the contours of the effective search positions with a contour extraction technique, and generate one frame to be tested for each contour;
S105: use a convolutional neural network with a face detection function to detect the frames to be tested obtained in S104 one by one, and output the face positioning coordinates within each frame;
S106: determine the coordinates of the face positioning frame from the coordinates of the frame to be tested and the face positioning coordinates within it.
In one embodiment, the elliptical skin color model requires:
Cr(13Cr-10Cb-2900)+Cb(13Cb-1388)+295972 ≤ 0
where Cb represents the blue chrominance component of the pixel, and Cr represents the red chrominance component of the pixel.
In one embodiment, the step of performing effective search position filtering on the processed skin color region includes:
performing effective search position filtering on the processed skin color region using a filter matrix, where the pixel values in the processed skin color region, the pixel values in the filter matrix, and the pixel values in the effective search positions satisfy the following formula:
$$dst(i,j)=\begin{cases}1, & \displaystyle\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\,f(x,y)\ \ge\ t\cdot area\\[4pt] 0, & \text{otherwise}\end{cases}$$
where dst(i,j) is the pixel value at coordinate (i,j) of the effective search position map dst, src(i+x,j+y) is the pixel value at coordinate (i+x,j+y) of the skin color region src, and f(x,y) is the pixel value at coordinate (x,y) of the filter matrix f; the filter matrix f has size (2a+1)×(2b+1) and center coordinate (0,0); t is the preset effective search rate (ESR) threshold; and area is the number of pixels with value 1 in the filter matrix f.
In one embodiment, the upper left corner coordinates (left, top) and the lower right corner coordinates (right, bottom) of the frame to be tested are:
(left, top) = (left′ - b, top′ - a)
(right, bottom) = (right′ + b, bottom′ + a)
where (left′, top′) and (right′, bottom′) are the coordinates of the upper left and lower right corners of the contour's circumscribed rectangle, respectively.
In one embodiment, the effective search rate is defined as the ratio of the area of the skin color region within the frame to be tested to the area of the frame to be tested.
In one embodiment, the step of converting the image to be detected from the RGB color space to the YCbCr color space includes:
performing the color space conversion on the image to be detected using the following formula:
$$\begin{bmatrix}Y\\ Cb\\ Cr\end{bmatrix}=\begin{bmatrix}0.299 & 0.587 & 0.114\\ -0.169 & -0.331 & 0.500\\ 0.500 & -0.419 & -0.081\end{bmatrix}\begin{bmatrix}R\\ G\\ B\end{bmatrix}+\begin{bmatrix}0\\ 128\\ 128\end{bmatrix}$$
where Y, Cb, and Cr represent the luminance, blue chrominance, and red chrominance components of the pixel, respectively, and R, G, and B represent the red, green, and blue components of the pixel, respectively.
In one embodiment, the step of performing morphological processing on the skin color region includes: removing isolated skin color points and thin line structures through an opening operation.
In one embodiment, the step of performing morphological processing on the skin color region further includes: filling holes and bridging gaps through a closing operation.
In one embodiment, the frames to be tested include at least a frame A to be tested and a frame B to be tested, and S104 further includes:
merging the frames A and B, where frames A and B are merged if the area of the frame C obtained by merging them is less than or equal to the sum of the areas of frames A and B; otherwise, frames A and B are not merged.
In one embodiment, the upper left corner coordinates (l_C, t_C) and the lower right corner coordinates (r_C, b_C) of the frame C to be tested are:
(l_C, t_C) = (min(l_A, l_B), min(t_A, t_B))
(r_C, b_C) = (max(r_A, r_B), max(b_A, b_B))
where (l_A, t_A) and (r_A, b_A) are the coordinates of the upper left and lower right corners of frame A, respectively, and (l_B, t_B) and (r_B, b_B) are the coordinates of the upper left and lower right corners of frame B, respectively.
In one embodiment, the coordinates of the upper left and lower right corners of the face positioning frame in S106 are, respectively:
(l, t) = (l_C + l′, t_C + t′)
(r, b) = (r_C + r′, b_C + b′)
where (l_C, t_C) and (r_C, b_C) are the coordinates of the upper left and lower right corners of the frame C to be tested, and (l′, t′) and (r′, b′) are the coordinates, output by the convolutional neural network, of the upper left and lower right corners of a face located within frame C.
In one embodiment, the effective search rate is defined as the ratio of the area of the skin color region within the frame to be tested to the area of the frame to be tested.
In another aspect of the present application, a computer device is provided, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the steps of the method of any of the above embodiments.
In yet another aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method of any of the above embodiments.
Beneficial effects: the present application retains the high accuracy of the face detection convolutional neural network while using multi-layer preprocessing to reduce the size of the area the network must search, thereby greatly improving its running speed.
Brief Description of the Drawings
FIG. 1 is a flowchart of a fast face detection method based on multi-layer preprocessing according to an embodiment of the present application.
FIG. 2 is a schematic diagram of effective search position filtering (ESPF) according to an embodiment of the present application.
FIG. 3 is a schematic diagram of generating frames to be tested according to an embodiment of the present application.
FIG. 4 is a schematic diagram of merging frames to be tested according to an embodiment of the present application.
Detailed Description
As mentioned above, various properly designed and fully trained convolutional neural networks can detect faces very accurately under various lighting conditions, viewing angles, and even partial occlusion. However, convolutional neural networks also have a shortcoming: fast inference depends heavily on GPUs with powerful floating-point capability. Constrained by cost, size, and power, small edge devices can hardly support fast convolutional neural network inference.
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.
The technical solution of the present application is further elaborated below with reference to the accompanying drawings and specific embodiments.
In the embodiment shown in FIG. 1, a fast face detection method based on multi-layer preprocessing specifically includes the following operation steps.
S101: perform color space conversion on the input image (the image to be detected), from the default RGB color space to the YCbCr color space. YCbCr separates the luminance of a color from its chrominance, so it is better suited to classifying colors under different lighting conditions.
Since most image and video encodings in computing are based on the RGB color space, using YCbCr requires first converting from the RGB color space to the YCbCr color space. Because the human eye is not equally sensitive to red, green, and blue, different weights must be given to the red, green, and blue components when computing the luminance Y. The specific conversion formula is:
$$\begin{bmatrix}Y\\ Cb\\ Cr\end{bmatrix}=\begin{bmatrix}0.299 & 0.587 & 0.114\\ -0.169 & -0.331 & 0.500\\ 0.500 & -0.419 & -0.081\end{bmatrix}\begin{bmatrix}R\\ G\\ B\end{bmatrix}+\begin{bmatrix}0\\ 128\\ 128\end{bmatrix}\qquad\text{(Formula 1)}$$
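As a concrete illustration of S101, a minimal NumPy sketch follows. The patent's exact coefficient matrix sits behind a formula image, so the standard full-range BT.601 coefficients are assumed here; OpenCV users can equivalently call cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb), noting that OpenCV orders the channels Y, Cr, Cb.

```python
import numpy as np

def rgb_to_ycbcr(img_rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to full-range YCbCr (assumed BT.601)."""
    m = np.array([[ 0.299,  0.587,  0.114],   # Y weights
                  [-0.169, -0.331,  0.500],   # Cb weights
                  [ 0.500, -0.419, -0.081]])  # Cr weights
    ycbcr = img_rgb.astype(np.float64) @ m.T
    ycbcr[..., 1:] += 128.0                   # shift chroma into [0, 255]
    return np.clip(ycbcr, 0, 255).astype(np.uint8)
```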
S102: use the elliptical skin color model to judge, pixel by pixel, whether each pixel of the image obtained in S101 is a skin color pixel, so as to obtain the skin color region; a pixel is judged to be a skin color pixel when its blue chrominance and red chrominance components satisfy the elliptical skin color model.
Statistics over large numbers of skin color samples show that skin color in YCbCr space approximately follows an elliptical cylindrical distribution; that is, its distribution in the CbCr plane is close to an ellipse. According to statistical research, if a plane rectangular coordinate system is set up with Cr on the horizontal axis and Cb on the vertical axis, the skin color ellipse has center (155, 113), semi-major axis 30, semi-minor axis 20, and a 45° counterclockwise inclination. The skin color ellipse condition is therefore:
$$\frac{\left[(Cr-155)\cos 45^\circ+(Cb-113)\sin 45^\circ\right]^2}{30^2}+\frac{\left[(Cb-113)\cos 45^\circ-(Cr-155)\sin 45^\circ\right]^2}{20^2}\le 1\qquad\text{(Formula 2)}$$
With the skin color ellipse model, a pixel whose blue chrominance Cb and red chrominance Cr components form a point inside the skin color ellipse is judged to be a skin color pixel; otherwise it is a non-skin pixel. Simplifying Formula 2 gives the condition for a pixel to be a skin color pixel:
Cr(13Cr-10Cb-2900)+Cb(13Cb-1388)+295972 ≤ 0 (Formula 3).
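The expansion from Formula 2 to Formula 3 can be checked directly (a worked verification, reading the lengths 30 and 20 as semi-axis lengths). Let x = Cr - 155 and y = Cb - 113; with cos 45° = sin 45° = 1/√2, Formula 2 becomes

$$\frac{(x+y)^2}{2\cdot 30^2}+\frac{(x-y)^2}{2\cdot 20^2}\le 1.$$

Multiplying both sides by 7200 gives 4(x+y)² + 9(x-y)² ≤ 7200, i.e. 13x² - 10xy + 13y² ≤ 7200. Substituting x and y back and collecting terms gives Cr(13Cr-10Cb-2900) + Cb(13Cb-1388) + (312325 - 175150 + 165997 - 7200) ≤ 0, whose constant evaluates to 295972, matching Formula 3.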
In S101 the RGB image was converted to YCbCr space; now, any pixel whose Cb and Cr components satisfy Formula 3 is regarded as a skin color pixel. Judging every pixel of the input image with Formula 3 yields the skin color region (or skin color mask).
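A vectorized sketch of this per-pixel test, assuming the (Y, Cb, Cr) channel order produced by the conversion above:

```python
import numpy as np

def skin_mask(ycbcr: np.ndarray) -> np.ndarray:
    """Binary skin mask from Formula 3, evaluated over all pixels at once."""
    cb = ycbcr[..., 1].astype(np.int64)
    cr = ycbcr[..., 2].astype(np.int64)
    skin = cr * (13 * cr - 10 * cb - 2900) + cb * (13 * cb - 1388) + 295972 <= 0
    return skin.astype(np.uint8)  # 1 = skin color pixel, 0 = non-skin
```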
S103: perform morphological processing on the skin color region obtained in S102 to obtain a processed skin color region.
Morphological operations are a family of image processing techniques for the shape features of binarized images. The basic idea is to modify pixel values using a structuring element of a specific shape together with a rule, achieving effects such as noise removal, hole filling, burr trimming, and edge smoothing, in preparation for further image analysis and target recognition. The basic morphological operations are erosion and dilation: erosion removes fine structures such as noise and burrs, while dilation fills holes and gaps. During erosion, the structuring element slides over the input image pixel by pixel; the input image pixels that face the 1-valued positions of the structuring element are called the corresponding pixels, and at each position the minimum of the corresponding pixels is written to the output pixel facing the structuring element's anchor point, expressed as:
$$dst(i,j)=\min_{(x,y):\,E(x,y)=1} src(i+x,\,j+y)\qquad\text{(Formula 4)}$$
where dst, src, and E represent the output image, input image, and structuring element, respectively; the structuring element takes its anchor point as the coordinate center, (i, j) is the current anchor position, and (x, y) is an offset relative to the anchor. Formula 4 shows that during erosion the output pixel at the anchor position is 1 only when the 1-valued region of the structuring element is completely covered by the 1-valued region of the input image. This shrinks the contours of the 1-valued regions, which visually appear eroded. The dilation operation is similar to erosion, except that the minimum becomes the maximum:
$$dst(i,j)=\max_{(x,y):\,E(x,y)=1} src(i+x,\,j+y)\qquad\text{(Formula 5)}$$
Formula 5 shows that during dilation the output pixel at the anchor position is 0 only when the 1-valued region of the structuring element is completely covered by the 0-valued region of the input image. This expands the contours of the 1-valued regions, which visually appear inflated. Used alone, erosion and dilation can change the area of the skin color regions substantially.
To remove noise and fill holes without affecting the size of the skin color regions, opening and closing are used. Opening erodes and then dilates the image with the same structuring element; it breaks thin connections and removes noise. Closing dilates first and then erodes; it connects nearby regions and fills holes. The skin color region is processed morphologically: isolated skin color points and thin line structures are removed by opening, and small holes in the skin color regions are filled and narrow gaps bridged by closing. Opening and closing barely change the area of the skin color regions while still removing noise and filling holes. Applying opening and then closing to the skin color mask obtained in S102 yields the final skin color mask.
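An OpenCV sketch of this step; the 5x5 elliptical structuring element is an assumed size, since the text does not fix one:

```python
import cv2
import numpy as np

def clean_skin_mask(mask: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Opening removes isolated skin points and thin lines; closing then
    fills small holes and bridges narrow gaps (S103)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```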
S104: perform effective search position filtering on the processed skin color region obtained in S103 to get the effective search positions, extract the contours of the effective search positions with a contour extraction technique, and generate one frame to be tested for each contour.
Effective Search Position Filtering (ESPF) is applied to the final skin color region to obtain all effective search position pixels. ESPF is a special image filtering operation that uses an ellipse-shaped filter matrix and a filtering rule based on the Effective Search Rate (ESR). The effective search rate is defined as the ratio of the skin color area A_s inside the frame to be tested to the area A_r of the frame, as follows:
$$ESR=\frac{A_s}{A_r}\qquad\text{(Formula 6)}$$
The ESPF computation can be expressed as:
$$dst(i,j)=\begin{cases}1, & \displaystyle\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\,f(x,y)\ \ge\ t\cdot area\\[4pt] 0, & \text{otherwise}\end{cases}\qquad\text{(Formula 7)}$$
where dst, src, and f are the output image, input image, and filter matrix, respectively. The filter matrix has size (2a+1)×(2b+1) and center coordinate (0, 0); t is the preset ESR threshold; and area is the number of 1-valued pixels in the filter matrix. The filter matrix used in ESPF filtering is an ellipse matrix in which the 1 values form a regular ellipse inscribed in the rectangle, as shown in the filter matrix of FIG. 2.
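To make the filtering rule concrete, here is a sketch of ESPF as a thresholded correlation over a binary (0/1) skin mask. The elliptical window is built with height 2a+1 and width 2b+1, matching the b-horizontal and a-vertical expansion of Formula 8 below; treating the border as non-skin is an assumption the text does not spell out.

```python
import cv2
import numpy as np

def espf(mask: np.ndarray, a: int, b: int, t: float) -> np.ndarray:
    """Effective Search Position Filtering (Formula 7) on a 0/1 skin mask."""
    # Elliptical filter matrix f: width 2b+1, height 2a+1, values in {0, 1}.
    f = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * b + 1, 2 * a + 1))
    area = float(f.sum())  # number of 1-valued pixels in f
    # filter2D correlates mask with f, so each output pixel holds
    # sum(src(i+x, j+y) * f(x, y)); pixels outside the image count as 0.
    response = cv2.filter2D(mask.astype(np.float32), -1,
                            f.astype(np.float32),
                            borderType=cv2.BORDER_CONSTANT)
    return (response >= t * area).astype(np.uint8)
```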
As shown in FIG. 2, the output image of ESPF filtering is the effective search position map. Contour extraction is then applied to the effective search positions, and one frame to be tested is generated per contour. The frame to be tested is obtained by expanding the contour's circumscribed rectangle outward by a fixed distance; the four sides of the circumscribed rectangle are tangent to the contour, and each side is parallel to the corresponding image edge. The expansion distance equals half the filter matrix size: if the upper left and lower right corners of the contour's circumscribed rectangle are (left′, top′) and (right′, bottom′) and the filter matrix size is (2a+1)×(2b+1), then the corners of the expanded frame to be tested are:
(left, top) = (left′ - b, top′ - a)
(right, bottom) = (right′ + b, bottom′ + a)   (Formula 8)
The finally generated frames to be tested are shown in FIG. 3. Each frame obtained after ESPF filtering has a relatively high ESR. At this point, non-face skin color parts such as small skin color areas and long narrow skin color areas have been eliminated by ESPF filtering, and the problem of connected skin color regions has also been solved.
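The contour-to-frame step can be sketched with OpenCV's contour extraction; clamping to the image bounds is a small practical safeguard added here, not something the text prescribes:

```python
import cv2
import numpy as np

def boxes_from_positions(esp: np.ndarray, a: int, b: int) -> list:
    """One frame to be tested per contour of the effective search position map,
    expanded by half the filter size: b horizontally, a vertically (Formula 8)."""
    h, w = esp.shape[:2]
    contours, _ = cv2.findContours(esp, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, bw, bh = cv2.boundingRect(c)  # circumscribed rectangle (left', top', w, h)
        left, top = max(x - b, 0), max(y - a, 0)
        right, bottom = min(x + bw - 1 + b, w - 1), min(y + bh - 1 + a, h - 1)
        boxes.append((left, top, right, bottom))
    return boxes
```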
S105: use a convolutional neural network with a face detection function to detect the frames to be tested obtained in S104 one by one, and output the face positioning coordinates within each frame.
First, check whether any frames to be tested can be merged, and merge all such frames to obtain the final frames to be tested. Merging two frames A and B means replacing them with a larger frame C; C should completely cover A and B while having the smallest possible area, so its upper left and lower right corners are:
(l_C, t_C) = (min(l_A, l_B), min(t_A, t_B))
(r_C, b_C) = (max(r_A, r_B), max(b_A, b_B))   (Formula 9)
Merging must also satisfy the condition that the total area does not increase, i.e. S_C ≤ S_A + S_B, where the area of a frame is S = (r - l)(b - t). FIG. 4 shows the effect of merging frames to be tested: two pairs of heavily overlapping frames are merged, further reducing the area the convolutional neural network must search and improving search efficiency.
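A minimal sketch of the merge rule on (l, t, r, b) boxes, combining Formula 9 with the area test S_C ≤ S_A + S_B; applying it repeatedly over all pairs until no merge succeeds yields the final frames:

```python
def try_merge(box_a, box_b):
    """Return the union box C of Formula 9 if it is no larger than the two
    areas combined (S_C <= S_A + S_B); otherwise return None."""
    area = lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1])  # S = (r - l)(b - t)
    c = (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
         max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
    return c if area(c) <= area(box_a) + area(box_b) else None
```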
S106: determine the coordinates of the face positioning frame from the coordinates of the frame to be tested and the face positioning coordinates within it, to obtain the face detection result.
The convolutional neural network with a face detection function detects each final frame to be tested one by one and outputs the face positioning coordinates within it; these output coordinates are relative to the frame to be tested.
The convolutional neural network outputs the coordinates of all face positioning frames inside a frame to be tested relative to that frame. If the upper left and lower right corners of the frame to be tested are (l_C, t_C) and (r_C, b_C), and the network outputs a face positioning frame with upper left and lower right corners (l′, t′) and (r′, b′), then the actual corners of that face positioning frame are:
(l, t) = (l_C + l′, t_C + t′)
(r, b) = (r_C + r′, b_C + b′)   (Formula 10)
The actual coordinates of each face positioning frame in the image are computed from the frame-to-be-tested coordinates and the face positioning coordinates inside it, and are output as the final face detection result.
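Putting the steps together, here is an end-to-end sketch built from the helper functions in the snippets above. detect_faces is a hypothetical stand-in for any CNN face detector returning (l′, t′, r′, b′) boxes relative to the crop, and the a, b, t defaults are illustrative assumptions rather than values fixed by the text:

```python
import cv2
import numpy as np

def detect(img_bgr: np.ndarray, detect_faces, a: int = 10, b: int = 10, t: float = 0.6):
    """S101-S106: preprocess, generate and merge frames, then run the CNN per frame."""
    # S101: BGR -> YCrCb, then reorder channels to (Y, Cb, Cr).
    ycbcr = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)[..., [0, 2, 1]]
    mask = skin_mask(ycbcr)                  # S102, Formula 3
    mask = clean_skin_mask(mask)             # S103, opening + closing
    esp = espf(mask, a, b, t)                # S104, Formula 7
    boxes = boxes_from_positions(esp, a, b)  # S104, Formula 8
    merged = True                            # merge frames until stable
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                c = try_merge(boxes[i], boxes[j])
                if c is not None:
                    boxes[j] = c
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    results = []
    for (l_c, t_c, r_c, b_c) in boxes:       # S105 + S106
        crop = img_bgr[t_c:b_c + 1, l_c:r_c + 1]
        for (l, t_, r, b_) in detect_faces(crop):
            results.append((l_c + l, t_c + t_, l_c + r, t_c + b_))  # Formula 10
    return results
```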
It should be understood that although the steps in the flowchart of FIG. 1 are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on their order, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages; these need not be completed at the same moment and may be executed at different times, and their order of execution is not necessarily sequential: they may be performed in turn or alternately with other steps, or with sub-steps or stages of other steps.
A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

  1. A face detection method, comprising:
    S101: converting an image to be detected from the RGB color space to the YCbCr color space;
    S102: using an elliptical skin color model to judge, pixel by pixel, whether each pixel of the image obtained in S101 is a skin color pixel, so as to obtain a skin color region, wherein a pixel is judged to be a skin color pixel when its blue chrominance and red chrominance components satisfy the elliptical skin color model;
    S103: performing morphological processing on the skin color region obtained in S102 to obtain a processed skin color region;
    S104: performing effective search position filtering on the processed skin color region obtained in S103 to obtain effective search positions, extracting contours of the effective search positions with a contour extraction technique, and generating one frame to be tested for each contour;
    S105: using a convolutional neural network with a face detection function to detect the frames to be tested obtained in S104 one by one, and outputting face positioning coordinates within each frame to be tested;
    S106: determining coordinates of a face positioning frame according to the coordinates of the frame to be tested and the face positioning coordinates within it.
  2. The method according to claim 1, wherein the elliptical skin color model requires:
    Cr(13Cr-10Cb-2900)+Cb(13Cb-1388)+295972 ≤ 0
    wherein Cb denotes the blue chrominance component of the pixel, and Cr denotes the red chrominance component of the pixel.
  3. The method according to claim 1, wherein the step of performing effective search position filtering on the processed skin color region comprises:
    performing effective search position filtering on the processed skin color region using a filter matrix, wherein pixel values in the processed skin color region, pixel values in the filter matrix, and pixel values in the effective search positions satisfy the following formula:
    dst(i,j) = 1, if ( Σ(x=-a..a) Σ(y=-b..b) src(i+x,j+y)·f(x,y) ) / area ≥ t
    dst(i,j) = 0, otherwise
    wherein dst(i,j) denotes the pixel value at coordinates (i,j) of the effective search positions dst; src(i+x,j+y) denotes the pixel value at coordinates (i+x,j+y) of the skin color region src; f(x,y) denotes the value at coordinates (x,y) of the filter matrix f, which has size (2a+1)×(2b+1) and center coordinates (0,0); t denotes a preset threshold of the effective search rate ESR; and area denotes the number of pixels of the filter matrix f whose value is 1.
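One vectorized reading of this formula, sketched under the assumption that the filter matrix f is all ones (the claim fixes only its size and that its entries are binary): the inner double sum is a correlation of src with f, which cv2.filter2D computes directly.

```python
import cv2
import numpy as np

def effective_search_filter(src, a, b, t):
    # src: binary skin mask with values 0/1; t: ESR threshold in [0, 1]
    f = np.ones((2 * a + 1, 2 * b + 1), dtype=np.float32)  # filter matrix f
    area = int(f.sum())  # number of 1-valued pixels in f
    # Correlation: acc(i, j) = sum over x, y of src(i+x, j+y) * f(x, y)
    acc = cv2.filter2D(src.astype(np.float32), -1, f,
                       borderType=cv2.BORDER_CONSTANT)
    # dst(i, j) = 1 where the effective search rate reaches the threshold
    return (acc / area >= t).astype(np.uint8)
```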
  4. The method according to claim 3, wherein the upper-left corner coordinates (left, top) and lower-right corner coordinates (right, bottom) of the frame to be measured are:
    (left, top) = (left′-b, top′-a)
    (right, bottom) = (right′+b, bottom′+a)
    wherein (left′, top′) and (right′, bottom′) denote the upper-left and lower-right corner coordinates, respectively, of the contour's bounding rectangle.
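In code, the claim-4 expansion is a single offset by the filter half-sizes; clamping to the image bounds, shown here, is an assumed practical addition rather than part of the claim.

```python
def expand_box(left, top, right, bottom, a, b, width, height):
    # Grow the contour's bounding rectangle by (b, a) on each side so the
    # frame to be measured covers the filter's full support, then clamp.
    return (max(left - b, 0), max(top - a, 0),
            min(right + b, width - 1), min(bottom + a, height - 1))
```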
  5. The method according to claim 3, wherein the effective search rate is defined as the ratio of the area of the skin color region within the frame to be measured to the area of the frame to be measured.
  6. The method according to claim 1, wherein the step of converting the image to be detected from the RGB color space to the YCbCr color space comprises:
    performing the color space conversion on the image to be detected using the following formulas:
    Y = 0.299R + 0.587G + 0.114B
    Cb = -0.1687R - 0.3313G + 0.5B + 128
    Cr = 0.5R - 0.4187G - 0.0813B + 128
    wherein Y, Cb, and Cr denote the luminance, blue chrominance, and red chrominance components of the pixel, respectively, and R, G, and B denote the red, green, and blue components of the pixel, respectively.
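A short sketch of the conversion as reconstructed above; the coefficients are the common BT.601 full-range (JPEG) values, which is an assumption, since the original formula is given in a figure that is not reproduced here.

```python
import numpy as np

# Assumed BT.601 full-range conversion matrix; rows produce Y, Cb, Cr
M = np.array([[ 0.299,   0.587,   0.114 ],
              [-0.1687, -0.3313,  0.5   ],
              [ 0.5,    -0.4187, -0.0813]])

def rgb_to_ycbcr(rgb):
    # rgb: H x W x 3 array with R, G, B channels in [0, 255]
    ycbcr = rgb.astype(np.float64) @ M.T
    ycbcr[..., 1:] += 128.0  # shift Cb and Cr into [0, 255]
    return ycbcr
```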
  7. The method according to claim 1, wherein the step of performing morphological processing on the skin color region comprises:
    removing isolated skin color points and thin line structures by an opening operation.
  8. The method according to claim 7, wherein the step of performing morphological processing on the skin color region further comprises:
    filling holes and bridging gaps by a closing operation.
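A brief OpenCV sketch of claims 7 and 8 together; the 5×5 elliptical structuring element is an assumed choice, as the claims do not fix a kernel.

```python
import cv2

def clean_mask(skin_mask):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    # Opening (claim 7) removes isolated skin color points and thin lines
    opened = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, kernel)
    # Closing (claim 8) fills holes and bridges narrow gaps
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```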
  9. The method according to claim 1, wherein the frames to be measured comprise at least a frame to be measured A and a frame to be measured B, and S104 further comprises:
    merging the frames to be measured A and B, wherein the frames A and B are merged if the area of a frame to be measured C obtained by merging them is less than or equal to the sum of the areas of the frames A and B; otherwise, the frames A and B are not merged.
  10. The method according to claim 9, wherein the upper-left corner coordinates (l_C, t_C) and lower-right corner coordinates (r_C, b_C) of the frame to be measured C are:
    (l_C, t_C) = (min(l_A, l_B), min(t_A, t_B))
    (r_C, b_C) = (max(r_A, r_B), max(b_A, b_B))
    wherein (l_A, t_A) and (r_A, b_A) denote the upper-left and lower-right corner coordinates of the frame to be measured A, respectively, and (l_B, t_B) and (r_B, b_B) denote the upper-left and lower-right corner coordinates of the frame to be measured B, respectively.
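The claim-9 criterion and the claim-10 corner formulas fit in a few lines. This sketch assumes frames as (left, top, right, bottom) tuples and returns None when the frames should stay separate.

```python
def merge_frames(frame_a, frame_b):
    la, ta, ra, ba = frame_a
    lb, tb, rb, bb = frame_b
    # Claim 10: candidate frame C is the bounding box of A and B
    lc, tc = min(la, lb), min(ta, tb)
    rc, bc = max(ra, rb), max(ba, bb)
    def area(l, t, r, b):
        return (r - l) * (b - t)
    # Claim 9: merge only if C is no larger than A and B combined
    if area(lc, tc, rc, bc) <= area(*frame_a) + area(*frame_b):
        return (lc, tc, rc, bc)
    return None  # keep A and B as separate frames
```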
  11. The method according to claim 9, wherein the upper-left corner coordinates (l, t) and lower-right corner coordinates (r, b) of the face positioning frame are:
    (l, t) = (l_C+l′, t_C+t′)
    (r, b) = (r_C+r′, b_C+b′)
    wherein (l_C, t_C) and (r_C, b_C) denote the upper-left and lower-right corner coordinates of the frame to be measured C, respectively, and (l′, t′) and (r′, b′) denote the upper-left and lower-right corner coordinates, output by the convolutional neural network, of any face positioning frame within the frame to be measured C.
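Transcribed literally, claim 11 offsets the CNN's upper-left output by C's upper-left corner and its lower-right output by C's lower-right corner; note that under the more common convention of crop-relative coordinates both offsets would be (l_C, t_C), so the last line of this sketch follows the claim as written.

```python
def face_frame_in_image(frame_c, face_in_c):
    # frame_c: (l_C, t_C, r_C, b_C); face_in_c: (l', t', r', b') from the CNN
    lc, tc, rc, bc = frame_c
    l, t, r, b = face_in_c
    return (lc + l, tc + t,  # upper-left corner, per claim 11
            rc + r, bc + b)  # lower-right corner, per claim 11 as written
```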
  12. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 11.
  13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 11.
PCT/CN2021/091026 2021-03-25 2021-04-29 Rapid facial detection method based on multi-layer preprocessing WO2022198751A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022512825A JP7335018B2 (en) 2021-03-25 2021-04-29 A Fast Face Detection Method Based on Multilayer Preprocessing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110322204.7 2021-03-25
CN202110322204.7A CN113204991B (en) 2021-03-25 2021-03-25 Rapid face detection method based on multilayer preprocessing

Publications (1)

Publication Number Publication Date
WO2022198751A1 true WO2022198751A1 (en) 2022-09-29

Family

ID=77025720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091026 WO2022198751A1 (en) 2021-03-25 2021-04-29 Rapid facial detection method based on multi-layer preprocessing

Country Status (3)

Country Link
JP (1) JP7335018B2 (en)
CN (1) CN113204991B (en)
WO (1) WO2022198751A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694233B (en) * 2022-06-01 2022-08-23 成都信息工程大学 Multi-feature-based method for positioning human face in examination room monitoring video image

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100354875C (en) * 2005-09-29 2007-12-12 上海交通大学 Red eye moving method based on human face detection
US20080107341A1 (en) * 2006-11-02 2008-05-08 Juwei Lu Method And Apparatus For Detecting Faces In Digital Images
CN102324025B (en) * 2011-09-06 2013-03-20 北京航空航天大学 Human face detection and tracking method based on Gaussian skin color model and feature analysis
CN104331690B (en) * 2014-11-17 2017-08-29 成都品果科技有限公司 A kind of colour of skin method for detecting human face and system based on single image
CN108230331A (en) * 2017-09-30 2018-06-29 深圳市商汤科技有限公司 Image processing method and device, electronic equipment, computer storage media
CN109961016B (en) * 2019-02-26 2022-10-14 南京邮电大学 Multi-gesture accurate segmentation method for smart home scene

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040034611A1 (en) * 2002-08-13 2004-02-19 Samsung Electronics Co., Ltd. Face recognition method using artificial neural network and apparatus thereof
CN103632132A (en) * 2012-12-11 2014-03-12 广西工学院 Face detection and recognition method based on skin color segmentation and template matching
CN106485222A (en) * 2016-10-10 2017-03-08 上海电机学院 A kind of method for detecting human face being layered based on the colour of skin
CN110706295A (en) * 2019-09-10 2020-01-17 中国平安人寿保险股份有限公司 Face detection method, face detection device and computer-readable storage medium
CN111191532A (en) * 2019-12-18 2020-05-22 深圳供电局有限公司 Face recognition method and device based on construction area and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QU SHIRU, XIONG BO: "A New and Better Face Detection Algorithm Using Gabor Filter and Neural Network", JOURNAL OF NORTHWESTERN POLYTECHNICAL UNIVERSITY, XIBEI GONGYE DAXUE, SHAANXI, CN, vol. 29, no. 5, 31 October 2011 (2011-10-31), CN, pages 690-694, XP055969052, ISSN: 1000-2758 *
ZHU ZHENGPING, SUN CUAN-QING, WANG YANG-PING: "Face Detection Method based on Complexion and Template Matching", ZIDONGHUA YU YIQI YIBIAO - PROCESS AUTOMATION INSTRUMENTATION, CHONGQING GONGYE ZIDONGHUA YIBIAO YANJIUSUO, CN, no. 6, 31 December 2008 (2008-12-31), CN, XP055969051, ISSN: 1001-9227 *

Also Published As

Publication number Publication date
JP7335018B2 (en) 2023-08-29
CN113204991A (en) 2021-08-03
JP2023522501A (en) 2023-05-31
CN113204991B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
Li et al. Multi-angle head pose classification when wearing the mask for face recognition under the COVID-19 coronavirus epidemic
US10783354B2 (en) Facial image processing method and apparatus, and storage medium
CA2867365C (en) Method, system and computer storage medium for face detection
CN110084135B (en) Face recognition method, device, computer equipment and storage medium
Yan et al. One extended OTSU flame image recognition method using RGBL and stripe segmentation
US8358813B2 (en) Image preprocessing
CN110163842B (en) Building crack detection method and device, computer equipment and storage medium
WO2018040756A1 (en) Vehicle body colour identification method and device
US8358812B2 (en) Image Preprocessing
JP6932402B2 (en) Multi-gesture fine division method for smart home scenes
US8244004B2 (en) Image preprocessing
Hajraoui et al. Face detection algorithm based on skin detection, watershed method and gabor filters
Kalbkhani et al. An efficient algorithm for lip segmentation in color face images based on local information
WO2022198751A1 (en) Rapid facial detection method based on multi-layer preprocessing
CN111709305B (en) Face age identification method based on local image block
Daithankar et al. Analysis of skin color models for face detection
Parente et al. Assessing facial image accordance to ISO/ICAO requirements
Yi et al. Face detection method based on skin color segmentation and facial component localization
CN114038030A (en) Image tampering identification method, device and computer storage medium
Das et al. A novel approach towards detecting faces and gender using skin segmentation and template matching
Alrjebi et al. Two directional multiple colour fusion for face recognition
Huang et al. Eye detection based on skin color analysis with different poses under varying illumination environment
CN111209922B (en) Image color system style marking method, device, equipment and medium based on svm and opencv
Zhang et al. A method of facial wearable items recognition
WO2024066050A1 (en) Facial recognition method and apparatus based on visual template and pyramid strategy

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022512825

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932368

Country of ref document: EP

Kind code of ref document: A1