CN109872279B - Intelligent cloud platform face recognition and local encryption method based on neural network - Google Patents
Intelligent cloud platform face recognition and local encryption method based on neural network
- Publication number
- CN109872279B CN109872279B CN201811567034.3A CN201811567034A CN109872279B CN 109872279 B CN109872279 B CN 109872279B CN 201811567034 A CN201811567034 A CN 201811567034A CN 109872279 B CN109872279 B CN 109872279B
- Authority
- CN
- China
- Prior art keywords
- face
- neural network
- candidate
- cloud platform
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
- Collating Specific Patterns (AREA)
Abstract
The invention discloses an intelligent cloud platform face recognition and local encryption method based on a neural network. The method processes the original image with an image pyramid; establishes a face/non-face classifier that outputs a large number of candidate results for face classification, bounding box regression and face position; performs bounding box regression to refine the candidate windows and discard a large number of overlapping windows; performs landmark positioning, deciding which candidate windows to keep while displaying the five facial key point positions; and encrypts the data with a chaotic logistic map and an RC4 stream cipher. By adopting a local encryption mode, the invention reduces the time needed to encrypt a picture and greatly improves encryption efficiency.
Description
Technical Field
The invention relates to a face recognition and local encryption method, in particular to an intelligent cloud platform face recognition and local encryption method based on a neural network.
Background
With the rapid development of internet and cloud storage technologies, the wide application of big data and the popularization of intelligent terminals, the concepts of synchronization and backup have gradually entered people's work and life. Users store and manage data on various intelligent terminal devices and, to guard against data loss on those devices, wish to back the data up to a safe place so that it can be accessed conveniently from different terminals. Intelligent cloud services synchronize data such as address books, short messages and photos between the mobile phone and the cloud. However, as personal information data grows, cloud data can be leaked through hacker attacks and other causes, posing serious risks to personal privacy; personal information therefore needs to be encrypted before being stored on the intelligent cloud platform.
Most pictures on a mobile phone are stored in high definition, and fully encrypting every high-definition picture greatly increases the computational load on the cloud. To solve the privacy problem, face detection is performed on the pictures and only the face regions are locally encrypted, greatly reducing the encryption time and the cloud's computational load. The template matching method commonly used in traditional face recognition technology is relatively inflexible: the template cannot adapt to changes in the face, and in real life a face may be occluded by illumination, glasses, a hat or pose, or may show an exaggerated expression, all of which directly affect the final detection. A face detection technique based on a neural network is therefore designed; only the detected face region is encrypted and stored in the cloud, effectively solving the problem of the privacy security of photos in the cloud.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent cloud platform face recognition and local encryption method based on a neural network that improves encryption efficiency.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an intelligent cloud platform face recognition and local encryption method based on a neural network is characterized by comprising the following steps:
the method comprises the following steps: processing the original image by adopting an image pyramid;
step two: establishing a face/non-face classifier, and outputting a large number of candidate results for face classification, bounding box regression and face position;
step three: regression of the bounding box, refining of the candidate window and discarding of a large number of overlapped windows;
step four: landmark positioning, namely deciding which candidate windows to keep while displaying the five facial key point positions;
step five: the data is encrypted by a chaotic logistic map and an RC4 stream cipher.
Further, in the first step, the image pyramid setting scales are respectively 0.80,0.57,0.40,0.28,0.20,0.14,0.10,0.07,0.05,0.04, and 0.02.
Further, in the second step, the face/non-face classifier specifically provides a large number of candidate windows, so as to exhaust all the candidate boxes which may be faces, and computes a bounding box regression vector;
the exhaustive face/non-face decision uses the cross entropy loss function
L_i^det = -( y_i^det · log(p_i) + (1 - y_i^det) · log(1 - p_i) )
to realize the classification task;
where p_i represents the probability that the i-th candidate box is a face, and y_i^det represents the real label corresponding to the i-th candidate box, wherein 0: non-face, 1: face; a box is taken as a candidate only when y_i^det = 1.
Further, in the third step, the bounding box regression network has three convolutional layers and a fully connected layer and is used for screening out negative samples; the bounding box regression plays the transitional role of an intermediate network, rejecting some non-face windows, computing bounding box regression vectors, and providing suitable candidate boxes for the next network layer.
Further, the fourth step is specifically that the landmark positioning is used to find the 5 landmark points on the output face, with the landmark loss formula:
L_i^landmark = || ŷ_i^landmark - y_i^landmark ||_2^2
The regression loss is calculated using the Euclidean distance, where ŷ_i^landmark is the value predicted by the neural network and y_i^landmark is the real landmark coordinate value; a total of 5 points at the two eyes, the nose and the two mouth corners are located, each point containing two values (x, y), so the predicted and real values belong to ten-tuples; samples are categorized by the Intersection-over-Union (IoU) ratio:
0-0.3: non-human face
0.3-0.4: landmark
0.4-0.65: a part of face
0.65-1.00: a human face.
Further, the fifth step is specifically that
The chaotic logistic map (CLM) is one of the most popular chaotic systems; consider the general form of the CLM function
x_{n+1} = λ · x_n · (1 - x_n), x_n ∈ (0, 1);
let λ = 4, then the above formula becomes
x_{n+1} = 4 · x_n · (1 - x_n).
And then encrypted by an RC4 stream cipher.
Further, the RC4 stream cipher encryption is implemented by converting the external key into an initial value X0;
generating a pseudo-random number stream from the CLM function using the initial value X0;
and exclusive-ORing the byte stream of the plain image with the pseudo-random number stream to perform the encryption process.
Further, after the second step, the third step and the fourth step, non-maximum suppression is respectively performed for calibration.
Further, the non-maximum suppression specifically includes sorting scores of all frames, and selecting a highest score and a frame corresponding to the highest score;
traversing the other boxes, and deleting a box if its overlap area (IoU) with the current highest-scoring box is larger than a certain threshold;
and continuing to select one with the highest score from the unprocessed boxes, and repeating the process.
Compared with the prior art, the invention has the following advantages and effects: the method adopts MTCNN, detecting all the faces in a picture and the 5 feature points at the two eyes, the nose and the two mouth corners of each face through a cascade of 3 convolutional neural networks, thereby generating a face detection area and 5 feature point positions; the image is then efficiently encrypted within the detection area using a chaotic logistic map and an RC4 stream cipher. Experiments prove that, by adopting a local encryption mode, the size of the image to be encrypted is reduced, computer memory usage is lowered, picture encryption is accelerated, the time to encrypt a whole picture is reduced, and the encryption efficiency is greatly improved.
Drawings
Fig. 1 is a flow chart of an intelligent cloud platform face recognition and local encryption method based on a neural network.
Detailed Description
The present invention is further illustrated by the following specific examples, which illustrate the present invention and are not to be construed as limiting it.
An intelligent cloud platform face recognition and local encryption method based on a neural network comprises the following steps:
the method comprises the following steps: processing the original image by adopting an image pyramid;
the image pyramid is a kind of multi-scale representation of an image, and is an effective but conceptually simple structure to interpret an image in multi-resolution. A pyramid of an image is a series of image sets of progressively lower resolution arranged in a pyramid shape and derived from the same original image. The image set is obtained by down-sampling in steps, the sampling being stopped until a certain termination condition is reached. If the images at one level are compared to a pyramid, the higher the level, the smaller the image, and the lower the resolution. The reason for using the image pyramid here is to solve the scale-unfriendliness, i.e. to scale the image so that the smallest size that can be detected is equal to the smallest face size we want to detect, where we set scales of 0.80,0.57,0.40,0.28,0.20,0.14,0.10,0.07,0.05,0.04,0.02, respectively. (scales are set here based on the size of the image and the minimum detection frame size, the impact factor is calculated.).
Step two: establishing a face/non-face classifier, and outputting a large number of results which may be face classification, bounding box regression and face position;
MTCNN (Multi-Task Cascaded Convolutional Neural Networks) is a face detection framework composed of three CNNs (Convolutional Neural Networks) in a cascade. From coarse proposal to final output, the stages are the Proposal Network (P-Net), the Refine Network (R-Net) and the Output Network (O-Net), which together realize face/non-face classification, bounding box regression, landmark positioning and related functions. The P-Net uses a fully convolutional neural network to obtain candidate windows of the face region and the regression vectors of their bounding boxes, and then merges highly overlapped candidate boxes through Non-Maximum Suppression (NMS), obtaining a large number of candidate windows and boundary regression vectors.
The pictures containing the large number of candidate boxes determined by P-Net are then passed to R-Net for training. R-Net removes most false-positive candidate boxes by bounding box regression and NMS. R-Net has one more fully connected layer than P-Net and therefore suppresses falsely detected candidate windows better. O-Net serves the same function as R-Net but has one more convolutional layer, so its processing result is more precise. While removing the overlapping candidate windows, it also detects and locates the five facial key points (landmarks).
The face/non-face classifier, a shallow CNN, quickly generates candidate windows. P-Net (Proposal Network) has three convolutional layers; it is lightweight and exists to exhaust the boxes that may be faces. P-Net provides a large number of candidate windows in order to exhaust all candidate boxes that may be faces and to compute a bounding box regression vector. The exhaustive face/non-face decision uses the cross entropy loss function. Cross entropy is an important concept in Shannon information theory, mainly used for measuring the difference between two probability distributions, and it can serve as a loss function in neural networks (machine learning). Assuming p represents the distribution of real labels and q the distribution of the trained model's predicted labels, the cross entropy loss can measure the similarity between p and q and thereby realize the classification task. This is a face/non-face classification task using the cross entropy loss, where p_i represents the probability that the i-th candidate box is a face and y_i^det represents the real label corresponding to the i-th candidate box (0: non-face; 1: face; a box is taken as a candidate only when y_i^det = 1). Cross entropy formula:
L_i^det = -( y_i^det · log(p_i) + (1 - y_i^det) · log(1 - p_i) )
the convolution kernel of each layer of the network is 3 x 3, the size of the input image is 12 x 3(12 is the length and width of the input image, 3 is the color channel), a large number of results which can be the classification of the human face, the regression of the bounding box and the position of the human face are output and transmitted to the R-NET for the second stage.
Step three: regression of the bounding box, refining of the candidate window and discarding of a large number of overlapped windows;
Bounding box regression refines the candidate windows with a more complex CNN and discards a large number of overlapping windows. R-Net (Refine Network) has three convolutional layers and a fully connected layer, and exists to screen out negative samples. R-Net has one more fully connected layer than P-Net; it plays the transitional role of an intermediate network, rejecting some non-face windows, computing bounding box regression vectors, and providing suitable candidate boxes for the next network layer.
The convolution kernel size of the first and second layers is 3 × 3, that of the third layer is 2 × 2, and the final fully connected layer has 128 neurons. The input size is 24 × 24 × 3 (24 is the length and width of the input image, 3 is the number of color channels); the output, with most non-faces discarded, gives face classification, bounding box regression and face position results, which are passed to O-Net for the third stage.
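A sketch of how a bounding box regression vector refines a candidate box (the (dx1, dy1, dx2, dy2) parameterization of offsets relative to the box width and height is the common MTCNN convention and an assumption here, not spelled out in the patent):

```python
def refine_box(box, reg):
    """Apply a regression vector to a candidate box.
    box: (x1, y1, x2, y2); reg: corner offsets relative to box width/height."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = reg
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)
```

For example, a regression vector of (0.1, 0.1, -0.1, -0.1) shrinks a 10 × 10 box by one pixel on each side.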
Step four: landmark positioning, namely deciding which candidate windows to keep while displaying the five facial key point positions;
and (4) landmark positioning, using a larger CNN to realize the elimination of a candidate frame, and simultaneously displaying five facial key point positions. Output Net (O-NET), which has one more layer of convolution than R-NET, is used to find 5 landmark points on the output face, and the landmark point calculation formula:
the regression loss is calculated using the euclidean distance, where,in order to predict the value of the neural network,is the real coordinate value of the landmark. A total of 5 points (both eyes, nose and mouth corners) are located, each point containing two values (x, y), so,belonging to a ten-tuple. In the course of the training process,and yiIntersection-over-Union ratio of (IoU):
0-0.3: non-human face
0.3-0.4: landmark
0.4-0.65: a part of face
0.65-1.00: human face
O-Net has four convolutional layers and a fully connected layer; the convolution kernels of the first, second and third layers are 3 × 3, the kernel of the fourth layer is 2 × 2, and the final fully connected layer has 256 neurons. The input size is 48 × 48 × 3 (48 is the length and width of the input image, 3 is the number of color channels), and the output gives the accurate face classification, bounding box regression and face position results.
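The landmark regression loss described above can be sketched as follows (plain Python; the ten-tuple ordering (x1, y1, …, x5, y5) is an assumption about how the coordinates are packed):

```python
def landmark_loss(pred, truth):
    """Squared Euclidean distance between predicted and real landmark
    ten-tuples (x1, y1, ..., x5, y5) for the eyes, nose and mouth corners."""
    assert len(pred) == 10 and len(truth) == 10
    return sum((a - b) ** 2 for a, b in zip(pred, truth))
```

The loss is zero only when all five predicted points coincide with the ground truth.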
The three networks connect face classification, bounding box regression and face key point position in parallel on the last full connection layer in a multi-task mode, so that the network performance is improved in the training process while multiple tasks are parallel end to end.
Step five: the data is encrypted by a chaotic logistic map and an RC4 stream cipher.
Chaos is a ubiquitous phenomenon in deterministic non-linear systems, exhibiting extreme sensitivity to initial conditions and random-like behavior. The chaotic logistic map (CLM) is one of the most popular chaotic systems; consider the general form of the CLM function
x_{n+1} = λ · x_n · (1 - x_n), x_n ∈ (0, 1);
let λ = 4, then the above formula becomes
x_{n+1} = 4 · x_n · (1 - x_n).
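A minimal sketch of iterating the logistic map with λ = 4 (pure Python; the iteration count and initial value below are illustrative):

```python
def logistic_map(x0, n, lam=4.0):
    """Iterate x_{n+1} = lam * x_n * (1 - x_n) and return the trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(lam * xs[-1] * (1.0 - xs[-1]))
    return xs
```

Starting from x0 = 0.25 the map lands on the fixed point 0.75; generic initial values instead yield a chaotic trajectory in (0, 1), which is what the encryption step exploits.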
The data is then encrypted by an RC4 stream cipher. Stream ciphers are symmetric cryptographic algorithms that produce ciphertext from a plaintext input stream bit by bit or byte by byte. RC4 is used in the SSL/TLS (Secure Sockets Layer / Transport Layer Security) standards established for communication between web browsers and servers, as well as in the WEP (Wired Equivalent Privacy) protocol and the newer Wi-Fi Protected Access (WPA) protocol, which are part of the IEEE 802.11 wireless local area network standard. Digital image transmission via the internet requires security protection.
The specific process of the RC4 stream cipher encryption is
Converting the external key to an initial value X0;
generating a pseudo-random number on the CLM function using the initial value X0;
the byte stream of the plain image and the pseudo-random number stream are exclusive-ORed to perform the encryption process.
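The three steps above can be sketched as follows. This is a simplified stand-in, not the patent's exact scheme: the key-to-X0 conversion and the mapping of chaotic values to key bytes are assumptions, and the chaotic keystream here takes the place of RC4's internal state machine.

```python
def key_to_x0(key: bytes) -> float:
    """Convert an external key to an initial value X0 in (0, 1).
    Hypothetical mapping; the patent does not specify one."""
    n = int.from_bytes(key, "big")
    return (n % 99991) / 99991.0 or 0.5  # avoid X0 == 0

def keystream(x0: float, length: int):
    """Generate pseudo-random bytes by iterating the CLM x -> 4x(1 - x)."""
    x, out = x0, []
    for _ in range(length):
        x = 4.0 * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR the plain-image byte stream with the pseudo-random stream."""
    ks = keystream(key_to_x0(key), len(data))
    return bytes(b ^ k for b, k in zip(data, ks))
```

Because XOR is an involution, applying `xor_encrypt` a second time with the same key recovers the plain bytes, so the same routine serves for decryption.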
After step two, step three and step four, non-maximum suppression is respectively performed for calibration.
Non-Maximum Suppression (NMS) means suppressing elements that are not maxima, and can be understood as a local maximum search. "Local" here denotes a neighborhood, which has two variable parameters: its dimension and its size. The general NMS algorithm is not discussed here; it is used to extract the highest-scoring window in object detection. For example, in pedestrian detection, features are extracted from sliding windows and each window receives a score after classification. Sliding windows, however, produce many windows that contain or mostly overlap other windows. NMS is then used to select the window with the highest score (highest probability of being a pedestrian) in each neighborhood and to suppress the low-scoring windows. NMS has very important applications in the field of computer vision, such as video target tracking, data mining, 3D reconstruction, target recognition and texture analysis.
NMS algorithm flow in face detection:
sort the scores of all boxes, select the highest score and its corresponding box
Traverse the remaining boxes; if the overlap area (IoU) with the current highest-scoring box is greater than a certain threshold, delete that box.
Continue to select one with the highest score from the unprocessed boxes, and repeat the above process.
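The flow above can be sketched in Python (greedy NMS; the 0.5 IoU threshold is an illustrative default, not a value fixed by the patent):

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)           # highest remaining score
        keep.append(best)
        order = [i for i in order     # drop boxes overlapping the best one
                 if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Two nearly coincident boxes collapse to the higher-scoring one, while a disjoint box elsewhere in the image survives.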
The method adopts MTCNN, detecting all the faces in a picture and the 5 feature points at the two eyes, the nose and the two mouth corners of each face through a cascade of 3 convolutional neural networks, thereby generating a face detection area and 5 feature point positions; the image is then efficiently encrypted within the detection area using a chaotic logistic map and an RC4 stream cipher. As shown in fig. 1, experiments prove that, by adopting a local encryption mode, the size of the image to be encrypted is reduced, computer memory usage is lowered, picture encryption is accelerated, the time to encrypt a whole picture is reduced, and the encryption efficiency is greatly improved.
The above description of the present invention is intended to be illustrative. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.
Claims (6)
1. An intelligent cloud platform face recognition and local encryption method based on a neural network is characterized by comprising the following steps:
the method comprises the following steps: processing the original image by adopting an image pyramid;
step two: establishing a face/non-face classifier, and outputting candidate results for face classification, bounding box regression and face position;
the second step is specifically that
The face/non-face classifier provides a candidate window, aims to exhaust all possible candidate frames of the face and calculates a regression vector of a bounding box;
the exhaustive face/non-face decision uses the cross entropy loss function
L_i^det = -( y_i^det · log(p_i) + (1 - y_i^det) · log(1 - p_i) )
to realize the classification task;
where p_i represents the probability that the i-th candidate box is a face, and y_i^det represents the real label corresponding to the i-th candidate box, wherein 0: non-face, 1: face; a box is taken as a candidate only when y_i^det = 1;
step three: regression of the bounding box, refining of the candidate window and discarding of a large number of overlapped windows;
step four: landmark positioning, namely deciding which candidate windows to keep while displaying the five facial key point positions;
the fourth step is specifically that
Landmark positioning is used to find the 5 landmark points on the output face, with the landmark loss formula:
L_i^landmark = || ŷ_i^landmark - y_i^landmark ||_2^2
the regression loss is calculated using the Euclidean distance, where ŷ_i^landmark is the value predicted by the neural network and y_i^landmark is the real landmark coordinate value; a total of 5 points at the two eyes, the nose and the two mouth corners are located, each point containing two values (x, y), belonging to ten-tuples;
0-0.3: non-human face
0.3-0.4: landmark
0.4-0.65: a part of face
0.65-1.00: a human face;
step five: encrypting data through a chaotic logistic map and an RC4 stream cipher;
the fifth step is specifically that
The chaotic logistic map (CLM) is one of the most popular chaotic systems; consider the general form of the CLM function
x_{n+1} = λ · x_n · (1 - x_n);
let λ = 4, then the above formula becomes
x_{n+1} = 4 · x_n · (1 - x_n);
And then encrypted by an RC4 stream cipher.
2. The intelligent cloud platform face recognition and local encryption method based on the neural network as claimed in claim 1, wherein: in the first step, the setting scales of the image pyramid are respectively 0.80,0.57,0.40,0.28,0.20,0.14,0.10,0.07,0.05,0.04 and 0.02.
3. The intelligent cloud platform face recognition and local encryption method based on the neural network as claimed in claim 1, wherein: the third step is specifically that
The bounding box regression network has three convolutional layers and a fully connected layer and is used for screening out negative samples; the bounding box regression plays the transitional role of the intermediate network, rejecting some non-face windows, computing a bounding box regression vector, and providing suitable candidate boxes for the next network layer.
4. The intelligent cloud platform face recognition and local encryption method based on the neural network as claimed in claim 3, wherein: the specific process of the RC4 stream cipher encryption is
Converting the external key to an initial value X0;
generating a pseudo-random number on the CLM function using the initial value X0;
the byte stream of the plain image and the pseudo-random number stream are exclusive-ORed to perform the encryption process.
5. The intelligent cloud platform face recognition and local encryption method based on the neural network as claimed in claim 1, wherein: non-maximum suppression is respectively performed for calibration after the second step, the third step and the fourth step.
6. The intelligent cloud platform face recognition and local encryption method based on the neural network as claimed in claim 5, wherein: the non-maximum suppression is specifically
Sorting the scores of all the frames, and selecting the highest score and the frame corresponding to the highest score;
traversing the remaining boxes, and if the overlap area IoU with the current highest-scoring box is greater than a certain threshold, deleting the box;
and continuing to select one with the highest score from the unprocessed boxes, and repeating the process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811567034.3A CN109872279B (en) | 2018-12-19 | 2018-12-19 | Intelligent cloud platform face recognition and local encryption method based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109872279A CN109872279A (en) | 2019-06-11 |
CN109872279B true CN109872279B (en) | 2020-06-05 |
Family
ID=66917173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811567034.3A Active CN109872279B (en) | 2018-12-19 | 2018-12-19 | Intelligent cloud platform face recognition and local encryption method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109872279B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427761A (en) * | 2019-07-08 | 2019-11-08 | 维沃移动通信有限公司 | A kind of reminding method and terminal device |
CN112949545B (en) * | 2021-03-17 | 2022-12-30 | 中国工商银行股份有限公司 | Method, apparatus, computing device and medium for recognizing face image |
CN113095212A (en) * | 2021-04-08 | 2021-07-09 | 武汉理工大学 | Face recognition method and system based on local sorting algorithm encryption |
CN116467730B (en) * | 2023-06-16 | 2023-08-15 | 北京东联世纪科技股份有限公司 | Intelligent park digital operation and maintenance management system based on CIM architecture |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107748858A (en) * | 2017-06-15 | 2018-03-02 | 华南理工大学 | A kind of multi-pose eye locating method based on concatenated convolutional neutral net |
CN109034210B (en) * | 2018-07-04 | 2021-10-12 | 国家新闻出版广电总局广播科学研究院 | Target detection method based on super-feature fusion and multi-scale pyramid network |
- 2018-12-19: application CN201811567034.3A filed; patent CN109872279B granted (status: Active)
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
2023-06-19 | TR01 | Transfer of patent right | Patentee after: SHENZHEN LEWEI INNOVATION TECHNOLOGY CO.,LTD. (23C, Haiyige, Huijing Haoyuan, No. 10 Kexing Road, Yuehai Street, Nanshan District, Shenzhen, Guangdong, 518000); Patentee before: DONGGUAN UNIVERSITY OF TECHNOLOGY (No. 1 University Road, Songshan Science and Technology Industrial Park, Dongguan, Guangdong, 523000)