CN112036257A - Non-perception face image acquisition method and system - Google Patents

Non-perception face image acquisition method and system

Info

Publication number
CN112036257A
Authority
CN
China
Prior art keywords
ptz camera
human body
image
face
bounding box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010789776.1A
Other languages
Chinese (zh)
Inventor
刘守印
方书雅
胡骞鹤
方冠男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202010789776.1A priority Critical patent/CN112036257A/en
Publication of CN112036257A publication Critical patent/CN112036257A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for collecting imperceptible face images. The method comprises: step S1, shooting a panoramic image of an area to be detected, detecting the positions of human bodies in the panoramic image with a human body detection method, and obtaining human body bounding box vectors bi; step S2, transmitting each human body bounding box vector bi to a pre-trained PTZ camera motion model, which outputs, for an input bounding box vector bi, the corresponding PTZ camera motion parameter vector with which a frontal face image of the person can be shot; and step S3, the PTZ camera moves its orientation and changes its focal length according to the motion parameter vector, aims at the person at the corresponding position to shoot, screens face angles through face detection and a head pose recognition algorithm, and obtains the frontal face image of the person. The invention adopts a two-stage detection method combining human body detection and face angle screening, with high face detection efficiency and strong practicability.

Description

Non-perception face image acquisition method and system
Technical Field
The invention relates to the field of deep learning and computer vision, in particular to a method and a system for collecting an imperceptible face image.
Background
In some scenarios, such as students attending class in a classroom or members of an organization meeting in a room, the identities of the people in the room need to be verified. When everyone faces the same direction, for example toward one wall of the room (the blackboard or projection screen), the face images of all people in the room can be collected by arranging cameras at suitable positions, and imperceptible class attendance can be realized by applying face recognition algorithms.
In actual classroom and conference environments, however, people are densely seated and human poses vary. Existing face image acquisition methods have a high miss rate and extract face images of poor quality (low resolution, non-frontal faces, etc.); because they do not account for the diversity of face poses, it is difficult to extract face images suitable for face recognition, and the face recognition rate is therefore low.
Dual cameras are mainly applied in the surveillance and security field, where they can track pedestrians or suspicious individuals. Typically, a dual-camera setup consists of a wide-angle camera (also called a panoramic camera) and a PTZ (Pan-Tilt-Zoom) camera, used respectively for monitoring the whole area and for tracking and capturing details; its advantage is that the two cameras cooperate with a division of labor. However, dual cameras are rarely applied to person identity recognition. Among related research on using dual-camera classroom inspection to identify students in class, reference document CN110647842A discloses an invention patent entitled "A dual-camera classroom inspection method and system", which controls a second camera to monitor the student in front of each desk at fixed points by acquiring the coordinates of the center point of each desktop. Its shortcomings are that the information captured by the panoramic camera is not fully utilized, it is difficult to obtain the face and head images of all students, omissions occur, multiple cruise routes must be additionally designed, efficiency is low, and detection resources and time are wasted.
Disclosure of Invention
The invention aims to overcome the defects of the background art and provides an imperceptible face image acquisition method and system that miss no detections and can acquire high-definition frontal face pictures of all people in the area to be detected.
To achieve the above purpose, the invention adopts the following technical scheme:
A method for collecting imperceptible face images comprises the following steps:
step S1, shooting a panoramic image of the area to be detected from a fixed position at a fixed angle and a fixed focal length, detecting the positions of human bodies in the panoramic image, and obtaining human body bounding box vectors bi = [y1i, x1i, y2i, x2i], wherein (x1i, y1i) are the image coordinates of the top-left vertex of the human bounding box, (x2i, y2i) are the image coordinates of the bottom-right vertex, and i is the person index;
step S2, transmitting each human body bounding box vector bi to a pre-trained PTZ camera motion model, which outputs, for an input bounding box vector bi, the corresponding PTZ camera motion parameter vector [pi, ti, zi] with which a frontal face image can be shot, wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle it moves in the vertical direction, and zi is the zoom factor of the PTZ camera;
and step S3, the PTZ camera moves its orientation and changes its focal length according to the motion parameter vector, shoots the person at the corresponding position, and acquires the frontal face image.
Preferably, in step S1, the method for detecting human body positions in the panoramic image can be as follows: a Mask R-CNN target detection algorithm is used to detect the human bounding boxes in the panoramic image.
Preferably, in step S2, the PTZ camera motion model may adopt a multiple linear regression model: multiple linear regression is performed on a number of manually acquired human bounding box vectors and the corresponding PTZ camera motion parameter vectors with which frontal face images can be shot, yielding the regression coefficients relating the two. The manually acquired bounding box vectors are obtained from panoramic images of the area to be detected shot at the same position, the same angle, and the same focal length as in step S1.
Preferably, the multiple linear regression model that may be employed by the PTZ camera motion model is as follows:
[pi, ti, zi] = [y1i, x1i, y2i, x2i]·β + β0  (Formula 1)
wherein [pi, ti, zi] represents the PTZ camera motion parameter vector, bi = [y1i, x1i, y2i, x2i] represents the human bounding box vector, i is the person index, and β and β0 are the regression coefficients.
Preferably, in step S3, a frontal face image may be obtained by using face detection and head pose recognition algorithms: the video stream captured by the PTZ camera is processed at regular intervals to detect face pictures in the video frames, the face angle in each face picture is then obtained by a head pose recognition algorithm, and frontal face images are screened out based on a preset face angle criterion.
Preferably, the MTCNN algorithm may be used to detect a face picture in the video picture.
Preferably, the face angle in the face picture can be obtained by adopting an FSA-Net head posture recognition algorithm.
An imperceptible face image acquisition system comprises a panoramic camera, a PTZ camera and a server; the server comprises a human body detection module, a PTZ camera motion control module and a face angle screening module. The panoramic camera is used for inputting a panoramic image of the area to be detected, shot from a fixed position at a fixed angle and a fixed focal length, into the human body detection module. The human body detection module is used for detecting human body positions in the panoramic image, acquiring human body bounding box vectors bi = [y1i, x1i, y2i, x2i], and inputting them to the PTZ camera motion control module, wherein (x1i, y1i) are the image coordinates of the top-left vertex of the human bounding box, (x2i, y2i) are the image coordinates of the bottom-right vertex, and i is the person index. The PTZ camera motion control module includes a PTZ camera motion model that implements the mapping from a human bounding box vector bi to the PTZ camera motion parameter vector [pi, ti, zi] of the corresponding frontal face image, wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle it moves in the vertical direction, and zi is the zoom factor of the PTZ camera. The PTZ camera motion control module is used for acquiring, for each input human bounding box vector bi, the corresponding PTZ camera motion parameter vector [pi, ti, zi] with which a frontal face image can be shot, and for controlling the PTZ camera to move its orientation and change its focal length according to that vector. The PTZ camera is used for outputting the captured video stream to the face angle screening module, and its range of motion covers the panorama. The face angle screening module is used for extracting frontal face images from the video stream.
Preferably, the panoramic camera and the PTZ camera may be installed in a classroom or conference room with the persons to be inspected facing in the same direction, and the panoramic camera takes a panoramic view covering all the persons to be inspected.
Preferably, the human body detection module is further configured to sort the acquired human bounding box vectors bi, and the PTZ camera motion control module is used for controlling the PTZ camera to, following the person index order, successively move its orientation and change its focal length according to the corresponding PTZ camera motion parameter vectors [pi, ti, zi] and shoot the person to be detected at each corresponding position.
Preferably, the face angle screening module may include a face detection module and a head pose recognition module, the face detection module being used for screening the input video stream at regular intervals to detect face pictures in the video frames, and the head pose recognition module being used for obtaining the face angle in each face picture and screening out frontal face images based on a preset criterion.
Preferably, the imperceptible face image acquisition system can be applied to personnel counting in classrooms, meeting rooms, or outdoor squares.
The invention detects human body positions in a panoramic image with a target detection algorithm to obtain human body bounding box vectors bi; establishes a PTZ camera motion model with a multiple linear regression algorithm to realize the mapping from the bounding box vectors bi to the corresponding PTZ camera motion parameter vectors [pi, ti, zi], wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle it moves in the vertical direction, and zi is the zoom factor of the PTZ camera; controls the PTZ camera to move, changing its orientation and focal length to shoot the person at the corresponding position; and then obtains the face angle in each face image with a head pose recognition algorithm and screens out frontal face images.
The invention makes full use of the information in the panoramic image: person positions obtained through human body detection solve the missed-detection problem, the PTZ camera solves the problems of acquired face images being too small and of low definition, and screening by recognized face angle solves the problem of face pose diversity. The invention adopts a two-stage detection method combining human body detection and face angle screening, with high detection efficiency and strong practicability.
The invention will become more apparent from the following description when taken in conjunction with the accompanying drawings, which illustrate embodiments of the invention.
Drawings
Fig. 1 shows a human bounding box and its human bounding box vector in a panoramic image in embodiments 1 and 2 of the present invention.
Fig. 2 is a schematic diagram of the person index positions of the human bounding box vectors before the "S"-type person index sorting in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the person index positions of the human bounding box vectors after the "S"-type person index sorting in embodiment 1 of the present invention.
Fig. 4 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
Embodiment 1: a method for collecting imperceptible face images, comprising:
Step S1: a fixedly installed panoramic camera shoots a panoramic picture of the classroom at a fixed angle and a fixed focal length; the classroom is the area to be detected. The people to be detected in the classroom face the same direction, and the panoramic image shot by the panoramic camera covers all of them.
A Mask R-CNN target detection algorithm detects human body positions in the panoramic image. The panoramic image is input to the Mask R-CNN detector, which outputs detection results over 80 object classes; each result includes a class label and the object's bounding box information. Human bounding box vectors bi = [y1i, x1i, y2i, x2i] with the class label "person" are screened from the Mask R-CNN output, as shown in Fig. 1, wherein (x1i, y1i) are the image coordinates of the top-left vertex of the human bounding box, (x2i, y2i) are the image coordinates of the bottom-right vertex, and i is the person index.
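The class-label screening described above can be sketched as follows; the tuple-based result format is a simplified, hypothetical stand-in for the detector's actual output structure:

```python
def person_boxes(results, label="person"):
    """Keep only 'person' detections from a multi-class detector output.

    `results` is assumed to be a list of (class_label, bounding_box)
    pairs, where bounding_box is [y1, x1, y2, x2]; this format is a
    simplification of Mask R-CNN's real output, for illustration only.
    """
    return [box for cls, box in results if cls == label]
```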
The screened human bounding boxes are then given an "S"-type person index ordering according to their positions in space. Before sorting, the positions corresponding to the person indices of the bounding box vectors bi output by Mask R-CNN are shown in Fig. 2; the coordinates in the figure correspond to the panoramic image coordinates, and the numbers 1 to 13 are the serial numbers corresponding to person index i. The sorting method is as follows: first, all bounding boxes are sorted in ascending order of the vertical coordinate of their top-left vertex; then the bounding boxes are divided into rows by the difference of those vertical coordinates. If the difference between the top-left vertical coordinates of two bounding boxes is less than 90 px, they belong to the same row; otherwise a new row is started. Odd rows are then sorted in ascending order of the horizontal coordinate of the top-left vertex, and even rows in descending order. After sorting, the position distribution of all human bounding box vectors bi in the coordinate graph follows an "S" shape from top to bottom, as shown in Fig. 3.
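The "S"-type ordering can be sketched as follows (function and parameter names are illustrative; the 90 px row threshold is the one given in this embodiment):

```python
def serpentine_sort(boxes, row_gap=90):
    """Sort [y1, x1, y2, x2] bounding boxes into an S-shaped order.

    Boxes whose top-left y coordinates differ by less than row_gap
    pixels are grouped into the same row; odd rows run left to right,
    even rows right to left (row_gap = 90 px per the embodiment).
    """
    boxes = sorted(boxes, key=lambda b: b[0])        # ascending by top-left y
    rows, current = [], [boxes[0]]
    for b in boxes[1:]:
        if b[0] - current[0][0] < row_gap:           # same row as current group
            current.append(b)
        else:                                        # gap too large: new row
            rows.append(current)
            current = [b]
    rows.append(current)
    ordered = []
    for k, row in enumerate(rows):
        # within a row, sort by top-left x; reverse direction on even rows
        row.sort(key=lambda b: b[1], reverse=(k % 2 == 1))
        ordered.extend(row)
    return ordered
```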
Step S2: the sorted human bounding box vectors bi are transmitted to a pre-trained PTZ camera motion model, which outputs, for an input bounding box vector bi, the corresponding PTZ camera motion parameter vector [pi, ti, zi] with which a frontal face image can be shot, wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle it moves in the vertical direction, and zi is the zoom factor of the PTZ camera.
The PTZ camera motion model is pre-trained as follows. The model adopts a multiple linear regression model:
[pi, ti, zi] = [y1i, x1i, y2i, x2i]·β + β0  (Formula 1)
Multiple linear regression is performed on a number of manually acquired human bounding box vectors and the corresponding PTZ camera motion parameter vectors with which frontal face images were shot, yielding the regression coefficients β and β0 relating the two. The manually acquired bounding box vectors are obtained from panoramic images of the classroom shot by the same panoramic camera at the same angle and the same focal length as in step S1.
During training, one or more testers sit at various positions in the classroom, including its corners. The human bounding box vector bj = [y1j, x1j, y2j, x2j] of each tester is obtained as in step S1. The PTZ camera is manually controlled to move to the tester's position and zoom to a frontal face image, and the camera motion parameter vector [pj, tj, zj] at that moment is recorded, forming a sample pair with the corresponding bounding box vector. The experiment requires 10 to 20 sample groups; the regression coefficients in Formula 1 can then be solved by a multiple linear regression algorithm, realizing the PTZ camera motion model for this classroom.
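Solving the regression coefficients from the 10 to 20 recorded sample pairs amounts to an ordinary least-squares fit. A minimal sketch with NumPy, assuming the intercept is carried as an extra all-ones column (function names are illustrative):

```python
import numpy as np

def fit_ptz_model(boxes, ptz_params):
    """Fit the multiple linear regression [p, t, z] = [b, 1] @ beta.

    boxes:      (n, 4) array of [y1, x1, y2, x2] bounding-box vectors
    ptz_params: (n, 3) array of manually recorded [p, t, z] vectors
    Returns a (5, 3) coefficient matrix; the last row is the intercept.
    """
    X = np.hstack([np.asarray(boxes, float),
                   np.ones((len(boxes), 1))])        # append intercept column
    beta, *_ = np.linalg.lstsq(X, np.asarray(ptz_params, float), rcond=None)
    return beta

def predict_ptz(beta, box):
    """Map one bounding-box vector to a [p, t, z] motion parameter vector."""
    return np.append(np.asarray(box, float), 1.0) @ beta
```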
Step S3: following the person index order, the PTZ camera successively moves its orientation and changes its focal length according to the corresponding motion parameter vectors [pi, ti, zi] and shoots the person at each corresponding position. The video stream shot by the PTZ camera is screened at regular intervals: the MTCNN algorithm detects face pictures in the video frames, and the FSA-Net head pose recognition algorithm then obtains the three-dimensional head pose vector of each face picture, comprising the three angle values yaw, pitch and roll. This head pose vector is taken as the face angle and compared with preset thresholds; if the face angle lies within the threshold range, a suitable frontal face image has been detected.
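The threshold comparison on the yaw/pitch/roll vector can be sketched as follows; the 20-degree defaults are purely illustrative, since the text only states that preset thresholds are used:

```python
def is_frontal(yaw, pitch, roll, max_yaw=20.0, max_pitch=20.0, max_roll=20.0):
    """Return True if the head-pose angles (degrees) lie within thresholds.

    The 20-degree defaults are assumptions for illustration; the patent
    only specifies comparison of the pose vector with preset thresholds.
    """
    return (abs(yaw) <= max_yaw
            and abs(pitch) <= max_pitch
            and abs(roll) <= max_roll)
```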
Embodiment 2: as shown in Fig. 4, an imperceptible face image acquisition system comprises a panoramic camera, a PTZ camera and a server, wherein the server comprises a human body detection module, a PTZ camera motion control module and a face angle screening module.
The panoramic camera serves as the master camera and the PTZ camera as the slave camera, forming a master-slave dual-camera pair. Both are installed at the front of a classroom in which the people to be detected face the same direction. The panoramic camera is mounted at a fixed position in the classroom, at a fixed angle and a fixed focal length, and the panoramic image it shoots covers all the people to be detected.
The PTZ camera can be controlled by the server to move in the horizontal and vertical directions and to change its focal length; it is used to capture a high-definition face image of each person to be detected, and its range of motion covers the panorama shot by the panoramic camera.
In this embodiment, the face of a student in a corner or the back row of the classroom occupies no less than 25 × 20 px in the panoramic image, and the face image is clear. The PTZ camera's range of motion covers every corner of the classroom, with no blind areas. Face images of corner and back-row students captured by the PTZ camera can be magnified to 100 × 100 px.
The server runs Ubuntu 16.04, and Python 3.5 is chosen as the development language. The server accesses frame pictures from the panoramic camera and the PTZ camera over a network protocol using the Python interface provided by OpenCV (a computer vision library); in this embodiment, RTSP (Real-Time Streaming Protocol) is selected. Image acquisition is implemented by calling OpenCV's VideoCapture function, passing in the camera's IP address, user name, password, port number, channel number and other information. The channel numbers of the panoramic camera and the PTZ camera are 1 and 2 respectively; their other device information is the same.
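Opening a channel with OpenCV's VideoCapture then reduces to assembling an RTSP URL from the device information listed above. The path template below is a common vendor convention and an assumption, as are all the device details shown:

```python
def rtsp_url(ip, user, password, port, channel):
    """Assemble an RTSP stream URL for cv2.VideoCapture.

    The "/Streaming/Channels/<channel>01" path template is a common
    vendor convention and an assumption here; real devices vary.
    """
    return f"rtsp://{user}:{password}@{ip}:{port}/Streaming/Channels/{channel}01"

# hypothetical usage:
# import cv2
# cap = cv2.VideoCapture(rtsp_url("192.168.1.64", "admin", "secret", 554, 1))
# ok, frame = cap.read()  # one frame from the panoramic camera (channel 1)
```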
The panoramic camera inputs the captured panoramic picture into the human body detection module. The human body detection module detects human body positions in the panoramic image with the Mask R-CNN target detection algorithm of step S1 in embodiment 1, obtaining the human bounding box vectors bi = [y1i, x1i, y2i, x2i] of all people to be detected, as shown in Fig. 1, wherein (x1i, y1i) are the image coordinates of the top-left vertex of the human bounding box, (x2i, y2i) are the image coordinates of the bottom-right vertex, and i is the person index. The screened human bounding boxes are then "S"-sorted by spatial position using the same person index sorting method as in step S1 of embodiment 1, yielding a new person index order.
The human body detection module inputs the sorted human bounding box vectors bi of all people to be detected into the PTZ camera motion control module. The PTZ camera motion control module includes a PTZ camera motion model that implements the mapping from a human bounding box vector bi to the PTZ camera motion parameter vector [pi, ti, zi] of the corresponding frontal face image, wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle it moves in the vertical direction, and zi is the zoom factor of the PTZ camera. The PTZ camera motion model of this embodiment is the same multiple linear regression model as in embodiment 1; it is pre-trained by the same method as in embodiment 1, yielding the regression coefficients relating the human bounding box vectors to the PTZ camera motion parameter vectors.
The PTZ camera motion control module is used for acquiring, for each input human bounding box vector bi, the corresponding PTZ camera motion parameter vector [pi, ti, zi] with which a frontal face image can be shot, and for controlling the PTZ camera to, following the person index order, successively move its orientation and change its focal length according to those vectors and shoot the person to be detected at each corresponding position. The module passes each PTZ camera motion parameter vector into the PTZ camera's low-level control function and sends commands and motion parameter values to the camera through a low-level communication function, realizing the motion control function.
The PTZ camera outputs the captured video stream to the face angle screening module, which extracts frontal face images from it. The face angle screening module comprises a face detection module and a head pose recognition module. The face detection module screens the input video stream at regular intervals and uses the MTCNN algorithm to detect face pictures in the video frames. The head pose recognition module obtains the face angle in each face picture and screens out frontal face images: the FSA-Net head pose recognition algorithm yields the three-dimensional head pose vector of the face picture, comprising the three angle values yaw, pitch and roll; this vector is taken as the face angle and compared with preset thresholds, and if it lies within the threshold range, a suitable frontal face image is considered detected.
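The interaction of the modules above can be sketched as a single acquisition round. All classes and callables here are hypothetical stand-ins for the modules described in this embodiment, not the patent's actual interfaces:

```python
class FakePTZ:
    """Minimal stand-in for the PTZ camera, keyed by (p, t, z) pose."""
    def __init__(self, frames):
        self.frames = frames          # maps a (p, t, z) pose to a frame
        self.pose = None
    def move(self, p, t, z):
        self.pose = (p, t, z)         # steer orientation and focal length
    def capture(self):
        return self.frames[self.pose]

def acquisition_round(boxes, motion_model, ptz, is_frontal):
    """Visit each sorted bounding box, steer the PTZ camera, and keep
    only frames whose face passes the frontal-pose screen."""
    frontal_faces = []
    for b in boxes:
        p, t, z = motion_model(b)     # bounding box -> motion parameters
        ptz.move(p, t, z)
        frame = ptz.capture()
        if is_frontal(frame):         # face detection + pose screening
            frontal_faces.append(frame)
    return frontal_faces
```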
The present invention has been described in connection with the preferred embodiments, but it is not limited to the embodiments disclosed above; it is intended to cover various modifications and equivalent combinations made in accordance with the spirit of the present invention.

Claims (12)

1. A method for collecting an imperceptible face image is characterized by comprising the following steps:
step S1, shooting a panoramic image of the area to be detected from a fixed position at a fixed angle and a fixed focal length, detecting the positions of human bodies in the panoramic image, and obtaining human body bounding box vectors bi = [y1i, x1i, y2i, x2i], wherein (x1i, y1i) represents the image coordinates of the top-left vertex of the human bounding box, (x2i, y2i) represents the image coordinates of the bottom-right vertex, and i is the person index;
step S2, transmitting each human body bounding box vector bi to a pre-trained PTZ camera motion model, the PTZ camera motion model being used for outputting, for an input human bounding box vector bi, the corresponding PTZ camera motion parameter vector [pi, ti, zi] with which a frontal face image can be shot, wherein pi is the angle the PTZ camera moves in the horizontal direction, ti is the angle the PTZ camera moves in the vertical direction, and zi is the zoom factor of the PTZ camera;
and step S3, the PTZ camera moving its orientation and changing its focal length according to the PTZ camera motion parameter vector, shooting the person at the corresponding position, and acquiring the frontal face image.
2. The method for capturing the non-perceptual human face image as claimed in claim 1, wherein in the step S1, the method for detecting the human body position in the panoramic image is as follows: and detecting the human body boundary box in the panoramic image by adopting a Mask R-CNN target detection algorithm.
3. The method for capturing the sensorless human face image according to claim 1, wherein in step S2, the PTZ camera motion model uses a multiple linear regression model to perform multiple linear regression on the manually captured multiple human body bounding box vectors and the corresponding PTZ camera motion parameter vectors capable of capturing the front face image, so as to obtain regression coefficients of the human body bounding box vectors and the PTZ camera motion parameter vectors, and the manually captured multiple human body bounding box vectors are obtained from the panoramic image of the region to be detected captured at the same position, the same angle, and the same focal length as in step S1.
4. The method of claim 3, wherein the PTZ camera motion model uses a multiple linear regression model as follows:
[pi, ti, zi] = [y1i, x1i, y2i, x2i]·β + β0
wherein [pi, ti, zi] represents the PTZ camera motion parameter vector, bi = [y1i, x1i, y2i, x2i] represents the human bounding box vector, i is the person index, and β and β0 are the regression coefficients.
5. The method according to claim 1, wherein in step S3, a frontal face image is obtained by using face detection and head pose recognition algorithms: the video stream taken by the PTZ camera is processed at regular intervals to detect face pictures in the video frames, the face angle in each face picture is then obtained by a head pose recognition algorithm, and frontal face images are screened out based on a preset face angle criterion.
6. The method as claimed in claim 5, wherein the MTCNN algorithm is used to detect the face picture in the video picture.
7. The method according to claim 5, wherein the face angle in the face picture is obtained by using an FSA-Net head pose recognition algorithm.
8. A non-perception human face image acquisition system is characterized by comprising a panoramic camera, a PTZ camera and a server; the server comprises a human body detection module, a PTZ camera motion control module and a human face angle screening module;
the panoramic camera is used for inputting a panoramic image of the region to be detected, captured from a fixed position at a fixed angle and a fixed focal length, into the human body detection module;
the human body detection module is used for detecting the human body positions in the panoramic image, acquiring the human body bounding box vectors b_i = [y_{1i}, x_{1i}, y_{2i}, x_{2i}], and inputting them to the PTZ camera motion control module, wherein (x_{1i}, y_{1i}) are the image coordinates of the top left vertex of the human body bounding box, (x_{2i}, y_{2i}) are the image coordinates of the bottom right vertex, and i is the person index;
the PTZ camera motion control module comprises a PTZ camera motion model that maps a human body bounding box vector b_i to the corresponding PTZ camera motion parameter vector [p_i, t_i, z_i] for capturing a front face image, wherein p_i is the pan angle of the PTZ camera in the horizontal direction, t_i is the tilt angle in the vertical direction, and z_i is the zoom factor; the PTZ camera motion control module is used for obtaining, for each input human body bounding box vector b_i, the PTZ camera motion parameter vector [p_i, t_i, z_i] with which a front face image can be captured, and for controlling the PTZ camera to move its orientation and change its focal length according to that vector;
the PTZ camera is used for outputting a shot video stream to the face angle screening module, and the moving range of the PTZ camera covers the panorama;
the face angle screening module is used for extracting the front face images from the video stream.
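The data flow of the system in claim 8 can be sketched as a single control loop over the detected bodies. Every function parameter here (`detect_bodies`, `ptz_motion_params`, `move_ptz_camera`, `capture_video`, `extract_frontal_faces`) is a hypothetical placeholder for the corresponding module, not an API named by the patent:

```python
def acquire_frontal_faces(panorama, detect_bodies, ptz_motion_params,
                          move_ptz_camera, capture_video,
                          extract_frontal_faces):
    """Sketch of the claim-8 pipeline: for each detected body, aim the
    PTZ camera via the motion model, then screen the resulting video
    stream for frontal faces."""
    faces = []
    for body_box in detect_bodies(panorama):        # human body detection module
        p, t, z = ptz_motion_params(body_box)       # motion model (claim 4)
        move_ptz_camera(p, t, z)                    # pan/tilt/zoom to the person
        stream = capture_video()                    # PTZ camera video stream
        faces.extend(extract_frontal_faces(stream)) # face angle screening module
    return faces
```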
9. The non-perception face image acquisition system according to claim 8, wherein the panoramic camera and the PTZ camera are installed in a classroom or meeting room in which the persons to be detected face the same direction, and the panoramic camera captures a panorama covering all of the persons to be detected.
10. The system according to claim 8, wherein the human body detection module is further configured to sort the acquired human body bounding box vectors b_i by spatial position into a person index order, and the PTZ camera motion control module is configured to control the PTZ camera, in that person index order, to move its orientation and change its focal length according to the corresponding PTZ camera motion parameter vector [p_i, t_i, z_i], so as to aim at the position of each person to be detected in turn for shooting.
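The spatial ordering in claim 10 could be implemented as a row-major sort of the bounding boxes. The row-grouping tolerance below is an illustrative assumption; the patent does not specify how ties between rows are resolved:

```python
def order_by_spatial_position(boxes, row_tol=50):
    """Sort bounding boxes [y1, x1, y2, x2] roughly top-to-bottom, then
    left-to-right within each row. Boxes whose top edges lie within
    `row_tol` pixels of the first box in a row are grouped into that row
    (the tolerance value is an assumption, not from the patent)."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: (b[0], b[1]))
    rows, current = [], [ordered[0]]
    for b in ordered[1:]:
        if abs(b[0] - current[0][0]) <= row_tol:
            current.append(b)          # same row: top edges are close
        else:
            rows.append(current)       # start a new row
            current = [b]
    rows.append(current)
    # Within each row, order left-to-right by the x coordinate.
    return [b for row in rows for b in sorted(row, key=lambda b: b[1])]
```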
11. The system according to claim 8, wherein the face angle screening module comprises a face detection module and a head pose recognition module; the face detection module samples the input video stream at regular intervals to detect face images in the video frames, and the head pose recognition module estimates the face angle in each face image and screens out the front face images according to a preset criterion.
12. The system according to claim 8, wherein the system is applied to people counting in classrooms, meeting rooms, or outdoor plazas.
CN202010789776.1A 2020-08-07 2020-08-07 Non-perception face image acquisition method and system Pending CN112036257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789776.1A CN112036257A (en) 2020-08-07 2020-08-07 Non-perception face image acquisition method and system

Publications (1)

Publication Number Publication Date
CN112036257A true CN112036257A (en) 2020-12-04

Family

ID=73582728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789776.1A Pending CN112036257A (en) 2020-08-07 2020-08-07 Non-perception face image acquisition method and system

Country Status (1)

Country Link
CN (1) CN112036257A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101707671A (en) * 2009-11-30 2010-05-12 杭州普维光电技术有限公司 Panoramic camera and PTZ camera combined control method and panoramic camera and PTZ camera combined control device
KR20110051308A (en) * 2009-11-10 2011-05-18 엘지전자 주식회사 Apparatus and method for photographing face image
CN102148965A (en) * 2011-05-09 2011-08-10 上海芯启电子科技有限公司 Video monitoring system for multi-target tracking close-up shooting
CN102592146A (en) * 2011-12-28 2012-07-18 浙江大学 Face detection and camera tripod control method applied to video monitoring
US20130188070A1 (en) * 2012-01-19 2013-07-25 Electronics And Telecommunications Research Institute Apparatus and method for acquiring face image using multiple cameras so as to identify human located at remote site
US20160323517A1 (en) * 2015-04-29 2016-11-03 Protruly Vision Technology Group CO.,LTD Method and system for tracking moving trajectory based on human features
CN108419014A (en) * 2018-03-20 2018-08-17 北京天睿空间科技股份有限公司 The method for capturing face using panoramic camera and the linkage of Duo Tai candid cameras
US20190007623A1 (en) * 2017-06-30 2019-01-03 Polycom, Inc. People detection method for auto-framing and tracking in a video conference
CN110266940A (en) * 2019-05-29 2019-09-20 昆明理工大学 A kind of face-video camera active pose collaboration face faces image acquiring method
CN110647842A (en) * 2019-09-20 2020-01-03 重庆大学 Double-camera classroom inspection method and system
WO2020078440A1 (en) * 2018-10-18 2020-04-23 北京中科虹霸科技有限公司 Apparatus for collecting high-definition facial images and method for automatic pitch adjustment of camera gimbal


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
胡骞鹤: "基于教室监控视频的学生位置检测和人数估计算法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 12, pages 157 - 160 *
胡骞鹤: "基于教室监控视频的学生位置检测和人脸图像捕获算法", 《计算机与现代化》, no. 292 *
陈凯;祖莉;欧屹;: "基于YOLOv3与ResNet50的摄影机器人人脸识别跟踪系统", 计算机与现代化, no. 04 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633343A (en) * 2020-12-16 2021-04-09 国网江苏省电力有限公司检修分公司 Power equipment terminal strip wiring checking method and device
CN112633343B (en) * 2020-12-16 2024-04-19 国网江苏省电力有限公司检修分公司 Method and device for checking wiring of power equipment terminal strip
CN112766093B (en) * 2021-01-05 2024-03-05 卓望数码技术(深圳)有限公司 Panoramic picture, video detection method and device and readable storage medium
CN112766093A (en) * 2021-01-05 2021-05-07 卓望数码技术(深圳)有限公司 Panoramic picture, video detection method and device and readable storage medium
CN112836660A (en) * 2021-02-08 2021-05-25 上海卓繁信息技术股份有限公司 Face library generation method and device for monitoring field and electronic equipment
CN112836660B (en) * 2021-02-08 2024-05-28 上海卓繁信息技术股份有限公司 Face library generation method and device for monitoring field and electronic equipment
CN113591703A (en) * 2021-07-30 2021-11-02 山东建筑大学 Method for positioning personnel in classroom and classroom integrated management system
CN113591703B (en) * 2021-07-30 2023-11-28 山东建筑大学 Method for locating personnel in classroom and classroom integrated management system
CN113963453B (en) * 2021-09-03 2024-04-05 福建星网物联信息系统有限公司 Classroom attendance checking method and system based on double-camera face recognition technology
CN113963453A (en) * 2021-09-03 2022-01-21 福建星网物联信息系统有限公司 Classroom attendance checking method and system based on double-camera face recognition technology
CN114495154A (en) * 2021-12-30 2022-05-13 北京城市网邻信息技术有限公司 Human body detection method, device and apparatus in panorama and storage medium
CN114973310A (en) * 2022-04-06 2022-08-30 国网智慧能源交通技术创新中心(苏州)有限公司 Passive human body positioning method and system based on infrared thermal imaging
CN114973310B (en) * 2022-04-06 2024-06-07 国网智慧能源交通技术创新中心(苏州)有限公司 Passive human body positioning method and system based on infrared thermal imaging
CN115457644A (en) * 2022-11-10 2022-12-09 成都智元汇信息技术股份有限公司 Method and device for obtaining image recognition of target based on extended space mapping

Similar Documents

Publication Publication Date Title
CN112036257A (en) Non-perception face image acquisition method and system
CN103716594B (en) Panorama splicing linkage method and device based on moving target detecting
CN109284737A (en) A kind of students ' behavior analysis and identifying system for wisdom classroom
CN109887040A (en) The moving target actively perceive method and system of facing video monitoring
CN110545378B (en) Intelligent recognition shooting system and method for multi-person scene
Voit et al. Neural network-based head pose estimation and multi-view fusion
CN111144356B (en) Teacher sight following method and device for remote teaching
CN106650671A (en) Human face identification method, apparatus and system
CN111507592B (en) Evaluation method for active modification behaviors of prisoners
CN110837784A (en) Examination room peeping cheating detection system based on human head characteristics
CN110414381A (en) Tracing type face identification system
CN114693746A (en) Intelligent monitoring system and method based on identity recognition and cross-camera target tracking
CN110933316A (en) Teacher tracking teaching system based on double-camera interactive mode
CN113705510A (en) Target identification tracking method, device, equipment and storage medium
CN107122698A (en) A kind of real-time attendance statistical method of cinema based on convolutional neural networks
CN113989608A (en) Student experiment classroom behavior identification method based on top vision
Ronzhin et al. A video monitoring model with a distributed camera system for the smart space
KR20100098708A (en) Device for helping the capture of images
JP6819194B2 (en) Information processing systems, information processing equipment and programs
CN103155002B (en) For the method and apparatus identifying virtual vision information in the picture
CN107578031A (en) A kind of detection of pedestrian's head and shoulder and appearance method for catching based on principal and subordinate's video camera
CN111970434A (en) Multi-camera multi-target athlete tracking shooting video generation system and method
CN112766033B (en) Method for estimating common attention targets of downlinks in scene based on multi-view camera
CN111241926A (en) Attendance checking and learning condition analysis method, system, equipment and readable storage medium
CN113382304B (en) Video stitching method based on artificial intelligence technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination