CN111212275A - Vehicle-mounted monitoring device, monitoring system and monitoring method based on expression recognition - Google Patents
- Publication number: CN111212275A (application CN202010145340.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- monitoring
- camera
- training
- lstm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/56—Cameras or camera modules comprising electronic image sensors; Control thereof provided with illuminating means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
Abstract
The invention relates to a vehicle-mounted monitoring device, and to a monitoring system and monitoring method based on expression recognition. The vehicle-mounted monitoring device offers good concealment, a large monitoring field of view and a good monitoring effect. The expression-recognition-based monitoring system and method enable real-time monitoring from a mobile terminal, make the monitoring data difficult to lose, and allow the in-vehicle monitoring device to be controlled through the mobile terminal. Face detection and suspicious-expression recognition rely on an expression recognition model trained with a deep learning algorithm, so potential intruders can be recognized automatically with high accuracy. Compared with other algorithms, the Adaboost face position detection model is computationally efficient and suitable for real-time operation. Compared with a conventional CNN used to classify expressions, the LSTM network preserves temporal order and concentrates on continuous expression changes: the detection result of each frame influences the next frame, and a suspicion judgment can be output at every moment, improving both the reliability of suspicious-expression judgment and the timeliness of alarms.
Description
Technical Field
The invention belongs to the field of security and protection, and particularly relates to a vehicle-mounted monitoring device, a monitoring system and a monitoring method based on expression recognition.
Background
A traditional vehicle-mounted monitoring device tends to block the driver's line of sight once installed, and its narrow field of view prevents all-round monitoring of conditions inside and outside the vehicle. Its abrupt appearance and poor concealment also make it easy for criminals to discover and destroy. As car ownership in China grows, criminal cases of vehicle damage and theft increase day by day and seriously threaten people's property. A vehicle-mounted monitoring device with better concealment and a better monitoring effect is therefore urgently needed.
Disclosure of Invention
The invention provides a vehicle-mounted monitoring device with better concealment and monitoring effect. The invention further provides a monitoring system and a monitoring method based on expression recognition that can monitor intrusion behavior in real time and give intelligent early warning.
The technical scheme adopted by the invention is as follows:
the vehicle-mounted monitoring device is of a bilateral symmetry structure and comprises a device body, a left camera arm, a right camera arm, a front camera device and a rearview mirror, wherein cavities matched with the corresponding camera arms are respectively formed in the left end and the right end of the device body, the left camera arm and the right camera arm are telescopically installed in the device body, and a driving mechanism for driving the two camera arms to stretch is installed in the middle position in the device body; the front camera device is fixed in the middle of the front end face of the device body, and the rearview mirror is fixed on the rear end face of the device body.
Furthermore, the left camera arm and the right camera arm respectively comprise a stepping motor, a motor base, a first rotating shaft, a sleeve and a camera module, the sleeve is slidably arranged in the device body and in a cavity at the corresponding end, the motor base is fixed at the end part of the inner side of the sleeve, the stepping motor is fixed on the motor base, and an output shaft of the stepping motor penetrates through the motor base and is fixedly connected with the first rotating shaft in the sleeve;
the camera module comprises a camera fixing box, a first knob, a second knob, a first camera, a first LED light supplementing lamp, a second photosensitive sensor and a stop block, wherein the first LED light supplementing lamp comprises a plurality of first LED lamp beads, the stop block, the first knob, the camera fixing box and the second knob are sequentially and fixedly connected, the stop block is matched with the end face of the corresponding end of the device body, the first camera is fixed in the middle of the front end face of the camera fixing box, the second photosensitive sensor and the plurality of first LED lamp beads are uniformly fixed on the front end face of the camera fixing box in the circumferential direction around the first camera, and the second knob is rotatably connected with the outer end of the first rotating shaft;
the driving mechanism comprises a steering engine, a gear, a first rack and a second rack, the steering engine is fixed at an intermediate position in the device body, the gear is fixed on an output shaft of the steering engine, the two racks are horizontally arranged in the device body and are respectively meshed with the gear, the left end of the first rack is integrally connected with a sleeve of the left camera shooting arm, the second rack is integrally connected with a sleeve of the right camera shooting arm, and grooves used for containing the opposite racks are respectively formed in the two sleeves.
Furthermore, the left camera arm and the right camera arm respectively comprise a second rotating shaft, the second rotating shaft is arranged in the sleeve and is rotationally connected with the first rotating shaft through a pin shaft to form a joint body capable of rotating forwards and backwards, the left side and the right side of the front end surface of the device body are respectively provided with a semicircular guide groove for the corresponding joint body to rotate forwards, and the two sleeves are correspondingly provided with holes; the second knob is rotatably connected with the outer end of the second rotating shaft.
Further, the front camera device comprises a front shell, a second camera, a second LED light supplement lamp and a third photosensitive sensor; the second LED light supplement lamp comprises a plurality of second LED lamp beads; the front shell is integrally fixed in the middle of the front end face of the device body, the second camera is fixed in the middle of the front end face of the front shell, and the third photosensitive sensor and the second LED lamp beads are evenly fixed on the front end face of the front shell around the circumference of the second camera. The rearview mirror is an anti-glare rearview mirror; a first photosensitive sensor is fixed in the middle of its upper portion, a left concealed camera is fixed on its upper left, and a right concealed camera on its upper right.
The monitoring system based on expression recognition comprises a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal provided with a monitoring APP, and the vehicle-mounted monitoring device described above; the steering engine and the two stepping motors are each equipped with a drive module, each drive module communicates with a serial port of the Raspberry Pi, each photosensitive sensor communicates with a serial port of the Raspberry Pi, the Beidou GPS module communicates with a serial port of the Raspberry Pi, and each camera and each LED light supplement lamp communicates with a serial port of the Raspberry Pi; through a 5G/4G dual-band wireless network card and a 5G/4G dual-band wireless router, the Raspberry Pi communicates with the cloud server in real time over the TCP/IP and HTTP protocols and uploads monitoring video and GPS data in real time; the mobile terminal communicates with the cloud server in real time over a 5G/4G mobile network;
the cloud server includes: the communication module is used for realizing the two-way communication between the mobile terminal and the raspberry pi; the suspicious expression recognition module is used for judging the suspicious degree of the facial expression of the monitoring video; the risk early warning module is used for sending early warning to the mobile terminal when the suspicious expression identification module judges the suspicious expression with high confidence coefficient, and the early warning mode comprises short message prompt, APP popup and push prompt; the cloud data management module is used for storing and managing historical monitoring videos, video clips with suspicious human faces and GPS data;
the mobile terminal monitoring APP is used for managing and caching the cloud monitoring videos and the video segments of suspicious human faces, sending control instructions to the raspberry, and visually displaying monitoring pictures, the video segments of suspicious people, historical vehicle GPS tracks and the current position.
The monitoring method based on expression recognition comprises the following steps:
step 1), collecting a monitoring video;
step 2), training an expression recognition model in advance;
and 3) applying the expression recognition model to perform expression recognition on the monitoring video, and sending out early warning when the confidence coefficient of suspicious expression recognition is high.
Further, in step 2), the method for training the expression recognition model includes:
step a, inputting an AFEW data set, taking images with a set proportion in the AFEW data set as a training set, and taking the rest images in the AFEW data set as a verification set;
b, training a VGG16 network;
before VGG16 network training, the model parameters of VGGFace are loaded as pre-training weights by a transfer learning method; training of the VGG16 network begins on the training set, feature extraction produces a feature value array, per-frame picture labels and video labels, and a Softmax layer is added after the VGG16 network for classification, the reference for classification being that each frame image carries the same label as its source video;
the VGG16 network performs convolution with 3 × 3 kernels at a stride of 1 and pools over 2 × 2 windows; the network consists, in order, of two convolution layers Conv3-64 and one pooling layer pool1/2; two convolution layers Conv3-128 and one pooling layer pool1/2; three convolution layers Conv3-256 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; and two fully connected layers FC-4096;

the pooling layer pool1/2 is a max-pooling layer with a 2 × 2 window and a stride of 2;
the linear output is processed with a Softmax loss function; if the classifier has c classes in total, the Softmax value $S_i$ of the current element is the ratio of the exponential of the current element to the sum of the exponentials of all elements:

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{c} e^{V_j}}$$

where $V_i$ is the output of the unit in front of the classifier, i is the class index, and e is the natural constant; the Softmax loss function corresponding to $S_i$ is

$$L_i = -\log S_i$$

the closer $L_i$ is to 0, the more accurate the classification result;
step c, LSTM network training;
the VGG16 trained in step b, with the added Softmax layer and the 2nd fully connected layer removed, is connected in front of the LSTM network; the micro-expression sequence extracted from each video frame, as a conventional CNN model would extract it, is passed through VGG16 to compute the feature values of the micro-expression images, which serve as the input of the LSTM and are trained in the LSTM network; when training the LSTM model, every 12 video frames are selected as one segment for iteration;

the LSTM network structure connects each VGG16 to a single-layer LSTM with 128 hidden units; the network is composed of LSTM recurrent units, each comprising a forget gate, an input gate and an output gate; the forget gate function $f_t$ decides which information the network discards, and its output value lies between 0 and 1:
$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$

where t denotes the current time step, $x_t$ is the network input at the current time, and $h_{t-1}$ is the output of the LSTM at the previous time step; $W_f$ and $U_f$ are weight coefficient parameters determined by training and learning;

the input gate function $i_t$ decides which new information the network adds:

$$i_t = \sigma(W_i x_t + U_i h_{t-1})$$

where $W_i$ and $U_i$ are weight coefficient parameters determined by training and learning;

the candidate state is

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1})$$

where $W_c$ and $U_c$ are weight coefficient parameters determined by training and learning, and the new information added after the update operation yields the cell state

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

finally, the output gate function $o_t$ can be determined as

$$o_t = \sigma(W_o x_t + U_o h_{t-1})$$

where $W_o$ and $U_o$ are weight coefficient parameters determined by training and learning, and ⊙ denotes element-wise (point-wise) multiplication; the output value of the LSTM network at the current time is

$$h_t = o_t \odot \tanh(c_t)$$
Finally, sequentially stacking all layers of the LSTM network to increase depth and add the LSTM network to the whole LSTM system model, adopting the LSTM hidden state in the first layer of the 1 st network as LSTM input in the layer 1, and sequentially and circularly connecting all layers of networks to form a long-time dynamic network; effective information in the facial expression feature sequence can be continuously updated and output to the next layer of network;
d. sending the verification set into a VGG16 and an LSTM network, and outputting a test result;
the verification set is sent into the trained VGG16 network, and the fully connected layer FC in VGG16 gives the test result;
e. calculating the accuracy of the model test result;
the validation-set accuracy of the model is denoted ACC:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of positive samples predicted as positive, i.e. correct predictions of the current class; TN is the number of negative samples predicted as negative, i.e. correct predictions of other classes; FP is the number of negative samples wrongly predicted as positive, i.e. other classes mistaken for the current class; FN is the number of positive samples wrongly predicted as negative, i.e. missed predictions of the current class; ACC is required to be higher than 95%.
Further, in step a, 70% of images in the AFEW data set are used as a training set, and 30% of images are used as a verification set.
Further, in step 3), performing expression recognition on the monitoring video by using the expression recognition model, including:
step A, inputting a monitoring video image to be detected;
the monitoring video is divided into individual frames, every 12 consecutive frames are taken as one segment, and each frame is detected in turn;
step B, face position detection using the strong classifier obtained by Adaboost algorithm training together with skin color segmentation;

face detection is divided into three parts: sample classifier training, skin color segmentation and face detection; sample classifier training loads face samples and non-face samples and trains the classifier with the Adaboost algorithm; skin color segmentation detects and segments the skin-color regions of the picture, so that most non-skin regions are excluded during face detection; face detection cascades the strong classifiers trained with the Adaboost algorithm, further verifies faces within the skin-color regions obtained in the skin color segmentation step, and locates the position of the face;
step C, feature extraction and classification with the VGG16 and LSTM networks;

the face region obtained by the Adaboost algorithm is cropped and sent into the trained VGG16 and single-layer LSTM network, and the result determined by the model weights is output by the fully connected layer FC of dimension 128; the result set comprises anger, contempt, fear, happiness, sadness, surprise and neutral;
step D, outputting a classification result;
the classification results output by the fully connected layer FC in step C are classified again according to the needs of suspicious expression identification: anger, contempt, fear and surprise are classified as suspicious expressions, happiness and sadness as sub-suspicious expressions, and neutral as normal expressions; the final output classification set comprises suspicious expressions, sub-suspicious expressions and normal expressions; suspicious expressions are also called suspicious expressions with high confidence.
Further, in step B, the face position detection steps are: first, the image to be detected is scanned with a 20 × 20 detection window, the same size as the training samples; the window traverses the image from left to right and from top to bottom, each 20 × 20 region is detected, and possible face regions are marked; then a scale parameter enlarges the detection window by the given factor, and the enlarged window traverses the image again in a loop that exits once the detection window exceeds half the size of the original image; the overlapping face regions detected in each traversal are merged.
The invention has the beneficial effects that:
the vehicle-mounted monitoring device has the advantages of good concealment, large monitoring visual field and good monitoring effect. The monitoring system and the monitoring method based on expression recognition can realize real-time monitoring of the mobile terminal, the monitoring data is not easy to lose, the monitoring view of the monitoring device in the vehicle can be adjusted through the mobile terminal, and the operation is convenient. The face detection and suspicious expression recognition are based on a deep learning algorithm to train an expression recognition model, potential invasion objects can be automatically recognized, and the recognition accuracy rate is high. Compared with other algorithms, the Adaboost face position detection model has better calculation efficiency and meets the requirement of real-time operation. Compared with the traditional CNN network for classifying expressions, the LSTM network with guaranteed time sequence is used, the LSTM network is more concentrated on continuous expression changes, the detection result of each frame can influence the next frame, and suspicious judgment results can be output at each moment, so that the reliability of suspicious expression judgment and the timeliness of alarming are improved.
Drawings
FIG. 1 is a schematic structural diagram of a vehicle-mounted monitoring device according to the present invention;
FIG. 2 is a left side view of FIG. 1;
FIG. 3 is a schematic diagram of the internal structure of the vehicle-mounted monitoring device (the camera arm is in an extended state);
FIG. 4 is a view from the B-B direction of FIG. 3;
fig. 5 is a schematic diagram of the internal structure of the vehicle-mounted monitoring device (camera arms in the retracted state);
FIG. 6 is a front view of the in-vehicle monitoring device;
FIG. 7 is a rear view of FIG. 6;
FIG. 8 is a top view of FIG. 6;
FIG. 9 is a schematic diagram of a monitoring system according to the present invention;
FIG. 10 is a flow chart of the operation of the monitoring system;
FIG. 11 is a workflow diagram of a cloud server;
FIG. 12 is a short message warning flowchart;
FIG. 13 is a flowchart illustrating monitoring APP usage by a mobile terminal;
FIG. 14 is a flow chart of a monitoring method based on expression recognition according to the present invention;
FIG. 15 is a diagram of a suspicious facial expression recognition network model;
reference numerals: 1-a device body, 2-a left camera arm, 3-a right camera arm, 4-a front camera device, 5-a rear view mirror, 6-a first photosensitive sensor, 7-a left concealed camera, 8-a right concealed camera, 9-a steering engine, 10-a gear, 11-a camera fixing box, 12-a first knob, 13-a second knob, 14-a first camera, 15-a first LED light supplement lamp, 16-a second photosensitive sensor, 17-a stop block, 18-a stepping motor, 19-a motor base, 20-a first rotating shaft, 21-a sleeve, 22-a pin shaft, 23-a second rotating shaft, 24-a first rack, 25-a second rack, 26-a front shell, 27-a second camera and 28-a second LED light supplement lamp, 29-third light sensitive sensor.
Detailed Description
The vehicle-mounted monitoring device, the monitoring system based on expression recognition and the monitoring method of the invention are further described in detail with reference to the accompanying drawings and specific embodiments.
The vehicle-mounted monitoring device shown in fig. 1 and 2 has a bilaterally symmetrical structure, and includes a device body 1, a left camera arm 2, a right camera arm 3, a front camera 4, and a rearview mirror 5. The left end and the right end of the device body 1 are respectively provided with a cavity matched with the corresponding camera shooting arms, the left camera shooting arm 2 and the right camera shooting arm 3 are telescopically installed in the device body 1, and a driving mechanism for driving the two camera shooting arms to stretch is installed at the middle position in the device body 1. The front camera 4 is fixed at the middle position of the front end face of the device body 1, and the rear view mirror 5 is fixed on the rear end face of the device body 1.
Specifically, as shown in fig. 3 to 5, each of the left camera arm 2 and the right camera arm 3 includes a stepping motor 18, a motor base 19, a first rotating shaft 20, a sleeve 21, a second rotating shaft 23 and a camera module. The sleeve 21 is slidably arranged in the device body 1 within the cavity at the corresponding end (the sleeve 21 matches that cavity), the motor base 19 is fixed at the inner end of the sleeve 21 by screws, the stepping motor 18 is fixed on the motor base 19 by screws, and the output shaft of the stepping motor 18 passes through the motor base 19 and is fixedly connected to the first rotating shaft 20 in the sleeve 21 by a cone-point set screw. The second rotating shaft 23 is arranged in the sleeve 21 and is rotatably connected with the first rotating shaft 20 through a pin 22, forming a joint that can rotate forwards and backwards; semicircular guide grooves allowing the corresponding joint to rotate forwards are formed on the left and right sides of the front end face of the device body 1, and the two sleeves 21 are provided with corresponding openings.
Referring to fig. 3 and 6, the camera module comprises a camera fixing box 11, a first knob 12, a second knob 13, a first camera 14, a first LED light supplement lamp 15, a second photosensitive sensor 16 and a stop block 17. The first LED light supplement lamp 15 comprises a plurality of first LED lamp beads. The stop block 17, the first knob 12, the camera fixing box 11 and the second knob 13 are fixedly connected in sequence, and the stop block 17 matches the end face of the corresponding end of the device body 1. The first camera 14 is fixed in the middle of the front end face of the camera fixing box 11, and the second photosensitive sensor 16 and the first LED lamp beads are evenly fixed on that front end face around the circumference of the first camera 14. The second knob 13 is rotatably connected with the outer end of the second rotating shaft 23; specifically, a short shaft is integrally formed at the outer end of the second rotating shaft 23, and this short shaft is rotatably connected with the second knob 13.
As shown in fig. 3, the driving mechanism includes a steering engine 9, a gear 10, a first rack 24 and a second rack 25, the steering engine 9 is fixed at a middle position in the device body 1, the gear 10 is fixed on an output shaft of the steering engine 9, the two racks are horizontally arranged in the device body 1 and are respectively meshed with the gear 10, the left end of the first rack 24 is integrally connected with a sleeve 21 of the left camera arm 2, the second rack 25 is integrally connected with a sleeve 21 of the right camera arm 3, and grooves for accommodating the opposite racks are respectively formed in the two sleeves 21.
As shown in fig. 7, the front camera device 4 includes a front housing 26, a second camera 27, a second LED light supplement lamp 28 and a third photosensitive sensor 29; the second LED light supplement lamp 28 comprises a plurality of second LED lamp beads. The front housing 26 is integrally fixed in the middle of the front end face of the device body 1, the second camera 27 is fixed in the middle of the front end face of the front housing 26, and the third photosensitive sensor 29 and the second LED lamp beads are evenly fixed on the front end face of the front housing 26 around the circumference of the second camera 27. The rearview mirror 5 is an anti-glare rearview mirror; a first photosensitive sensor 6 is fixed in the middle of its upper portion, a left concealed camera 7 on its upper left, and a right concealed camera 8 on its upper right. The two concealed cameras photograph the passengers in the vehicle and the scene behind it; hidden behind the mirror glass, they are generally difficult to discover and notice, enabling covert monitoring and preventing the monitoring equipment from being found and destroyed. The second camera 27 captures images in front of the automobile.
The outer shell of the vehicle-mounted monitoring device (the device body 1 and the front housing 26) is made of ABS engineering plastic and can be chosen in a color matching the interior trim of the automobile. The device looks like an ordinary vehicle interior rearview mirror and can serve as one in everyday use. In the initial state, the two camera arms are retracted inside the device body 1, see fig. 5. During monitoring, the steering engine 9 can drive the two camera arms to extend synchronously, see fig. 2. The stepping motors 18 of the two camera arms can drive the corresponding camera modules to rotate about their axes, and the camera modules can also be rotated by hand using the knobs on them. In addition, the second rotating shaft 23 can be rotated outwards to adjust the views on the left and right sides, as shown in figs. 1 and 8. The invention can thus covertly monitor conditions inside and outside the vehicle with an adjustable monitoring field of view, a wide viewing angle and no blind spots in the captured scene. In normal use, monitoring is performed through the left and right camera modules and the second camera 27 at the front end; in abnormal situations, and when the left and right camera modules are retracted, monitoring is performed through the second camera 27 and the two concealed cameras.
As shown in fig. 9, the monitoring system based on expression recognition comprises a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal equipped with the monitoring APP (such as a mobile phone or tablet), and the vehicle-mounted monitoring device described above. The steering engine 9 and the two stepping motors 18 each have a matching drive module; each drive module, each photosensitive sensor, the Beidou GPS module, each camera and each LED light supplement lamp communicates with the Raspberry Pi through its serial ports. The Raspberry Pi connects a 5G/4G dual-band wireless network card over USB and, through a 5G/4G dual-band wireless router forwarding TCP/IP and HTTP traffic, communicates with the cloud server in real time, uploading monitoring video and GPS data as they are captured. (The Raspberry Pi, the Beidou GPS module, the 5G/4G dual-band wireless router and the drive modules are installed detachably and concealed in the vehicle; an external box can also be made and installed at a designated position, such as above the rearview mirror.) The mobile terminal communicates with the cloud server in real time over the 5G/4G mobile network.
The cloud server includes: a communication module for two-way communication with the mobile terminal and the Raspberry Pi; a suspicious expression recognition module for judging how suspicious the facial expressions in the monitoring video are; a risk early warning module that sends an early warning to the mobile terminal when the suspicious expression recognition module identifies a suspicious expression with high confidence, the warning taking the form of a short message, an APP pop-up window and a push notification; and a cloud data management module for storing and managing historical monitoring videos, video clips containing suspicious faces, and GPS data. Referring to figs. 11 and 12, on receiving notice of a high-confidence suspicious expression, the risk early warning module initializes and calls the authentication data and other data needed by the cloud short-message API (such as the destination mobile phone number and the message content), sends the request with its data packet to the operator server providing the short-message service, and ends the short-message prompt after receiving confirmation of successful sending. In practice, the destination mobile phone number is stored in a user information database and bound to a user identification code determined through the Raspberry Pi, so the user's number is uniquely determined. Referring to fig. 13, the mobile terminal APP listens for a suspicious-person prompt from the cloud server and, when one arrives, pops up a message window, giving the user the double prompt of a short message plus a message window so that the warning is not overlooked. The APP can also switch pages at the user's choice, fetch GPS data, video streams and video clips containing suspicious persons, and let the user decide whether to save clips locally.
The mobile terminal monitoring APP is used to manage and cache the cloud monitoring videos and the video clips containing suspicious faces, to send control instructions to the Raspberry Pi, and to visually display the monitoring picture, the video clips of suspicious persons, the vehicle's historical GPS track and its current position. Through the monitoring APP on the mobile terminal, the user can control the steering engine 9 and the two stepping motors 18 and switch each LED light supplement lamp on and off, thereby extending, retracting and rotating the left and right camera modules, see fig. 10.
Realization of the anti-glare rearview mirror function: the third photosensitive sensor 29 at the front end and the first photosensitive sensor 6 at the rear end together measure the light levels for automatic anti-glare. The rearview mirror 5 in this embodiment is a 4-level grayscale intelligent anti-glare lens consisting mainly of upper and lower glass substrates, upper and lower conductive layers and a liquid crystal layer, with a reflective film plated on the lower glass substrate. When light strikes the rearview mirror and the brightness behind exceeds the brightness in front, the Raspberry Pi outputs a voltage to the conductive layer of the anti-glare lens, changing the transmittance of the lens's liquid crystal layer: the higher the voltage, the lower the transmittance. Even strong light striking the rearview mirror is thus reflected to the driver's eyes dimmed rather than dazzling, achieving automatic anti-glare.
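As a rough illustration of this control loop, the sketch below compares the two light readings on the Raspberry Pi and drives the lens voltage with PWM. The `read_brightness` helper, the GPIO pin number and the gain are hypothetical stand-ins; the patent does not specify the wiring.

```python
# Minimal sketch of the automatic anti-glare loop, assuming the two
# photosensitive sensors are exposed through a hypothetical ADC helper
# and the lens conductive layer is driven via PWM on one GPIO pin.
import time
import RPi.GPIO as GPIO

LENS_PWM_PIN = 18  # hypothetical pin wired to the lens driver

def read_brightness(sensor_name):
    """Hypothetical read of a photosensitive sensor, returning [0, 1].
    Stubbed here; the real wiring (e.g. through an ADC) is not in the patent."""
    return 0.5

GPIO.setmode(GPIO.BCM)
GPIO.setup(LENS_PWM_PIN, GPIO.OUT)
pwm = GPIO.PWM(LENS_PWM_PIN, 1000)  # 1 kHz PWM on the conductive layer
pwm.start(0)

try:
    while True:
        front = read_brightness("third_sensor")  # sensor 29, front
        rear = read_brightness("first_sensor")   # sensor 6, rear
        duty = 0.0
        if rear > front:
            # Higher duty -> higher voltage -> lower LC transmittance.
            # The gain of 400 is an arbitrary illustrative mapping.
            duty = min(100.0, (rear - front) * 400.0)
        pwm.ChangeDutyCycle(duty)
        time.sleep(0.2)
finally:
    pwm.stop()
    GPIO.cleanup()
```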
The working process of the monitoring system based on expression recognition is as follows:
as shown in fig. 10, the raspberry group constructs a Socket communication data packet based on a TCP protocol from the acquired GPS data, constructs an HTTP protocol data packet from all camera surveillance videos, and transmits the HTTP protocol data packet to the cloud server. Socket communication requires a Socket server to be created at a cloud server and a Socket client to be created at a raspberry. The lifecycle of the Socket server side is to initialize a Socket environment, create a Socket server, specify a port number for Socket communication of the cloud server side, bind the port number and an IP address corresponding to the cloud server side, monitor a communication channel, receive a request sent by a client side, communicate with the client side for receiving and sending messages, and finally close a network environment and Socket connection after communication is completed. The lifecycle of the Socket client is to initialize a Socket environment, create the Socket client, connect with the IP address and the open port number of the cloud server, communicate with the server for message transmission and reception, and finally close the network environment and Socket connection after the communication is completed. Socket communication transmits Socket communication data packets by receiving and sending messages, and the content of the data packets is GPS data in a format specified by an NMEA-0183 protocol. And in HTTP protocol communication, a ffmpeg program is required to be run on a raspberry of the monitoring device end to recode the video stream into a h264 format for video compression, and the video stream is uploaded to a cloud server as a data packet. And the cloud server receives the video stream data, decompresses and stores the video stream data locally. And the raspberry periodically intercepts a scene in the video stream for light and shade judgment, and when the brightness of the scene is judged to be lower than a set value, the LED light supplement lamp corresponding to the camera in the scene is turned on.
As shown in fig. 11, the cloud server receives and stores the data transmitted in real time by the Raspberry Pi, forwards the GPS data to the mobile terminal monitoring APP, and performs suspicious-expression recognition on the received monitoring video; when the suspicious expression recognition module returns a high-confidence suspicious-expression judgment, the risk early warning module is called to give an early warning.
The monitoring method based on expression recognition comprises the following steps:
step 1), collecting monitoring videos.
And step 2), training an expression recognition model in advance.
And 3) carrying out expression recognition on the monitoring video by applying the expression recognition model, and sending out early warning when the confidence coefficient of suspicious expression recognition is high.
Specifically, as shown in fig. 14, in step 2), the method for training the expression recognition model includes:
step a, inputting the AFEW data set, taking a set proportion of its images as the training set and the remaining images as the verification set. The AFEW data set is provided by the EmotiW international emotion recognition challenge. Unlike pictures taken in an ordinary laboratory, its facial expression images are selected from naturally filmed cinema and television works, covering varied video scenes with head occlusion, illumination changes, different rotation angles and motion blur, similar to the shooting environment of vehicle-mounted security monitoring equipment. Training on the AFEW data set is therefore conducive to model convergence.
B, training a VGG16 network;
before VGG16 training, the model parameters of VGGFace are loaded as pre-training weights by adopting a transfer learning method. Then, training of the VGG16 network is started by using the training set, a feature value array, picture labels and video labels of each frame are generated after feature extraction, then a Softmax layer is added behind the VGG16 network for classification operation, and the label of each picture is the same as that of each video frame for classification reference.
The VGG16 network performs convolution with 3 × 3 kernels at a stride of 1 and pools over 2 × 2 windows. The network consists, in order, of two convolution layers Conv3-64 and one pooling layer pool1/2; two convolution layers Conv3-128 and one pooling layer pool1/2; three convolution layers Conv3-256 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; and two fully connected layers FC-4096. The pooling layer pool1/2 is a max-pooling layer with a 2 × 2 window and a stride of 2, used to reduce the number of weight connections.
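A minimal PyTorch sketch of the convolution/pooling stack listed above (torchvision also ships this layout as `vgg16`); layer comments follow the Conv3-N notation of the text, and the 7-class head matches the seven expressions used later.

```python
# Sketch of the VGG16 stack described above, in PyTorch. Each Conv3-N is a
# 3x3 convolution with stride 1 and padding 1; each pool1/2 is 2x2 max
# pooling with stride 2. Input is assumed to be 3x224x224.
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # pool1/2
    return layers

vgg16_features = nn.Sequential(
    *conv_block(3, 64, 2),     # two Conv3-64 + pool1/2
    *conv_block(64, 128, 2),   # two Conv3-128 + pool1/2
    *conv_block(128, 256, 3),  # three Conv3-256 + pool1/2
    *conv_block(256, 512, 3),  # three Conv3-512 + pool1/2
    *conv_block(512, 512, 3),  # three Conv3-512 + pool1/2
)

vgg16_classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),  # FC-4096
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC-4096
    nn.Linear(4096, 7),  # added head; Softmax is applied inside the loss
)
```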
The Softmax layer handles the multi-class problem and processes the linear output with a Softmax loss function. If the classifier distinguishes c classes in total, the Softmax value $S_i$ of the current element is the ratio of its exponential to the sum of the exponentials of all elements:

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{c} e^{V_j}}$$

where $V_i$ is the output of the unit in front of the classifier, i is the class index, and e is the natural constant. The Softmax loss function corresponding to $S_i$ is

$$L_i = -\log S_i$$

The closer $L_i$ is to 0, the more accurate the classification result; the classifier is therefore adjusted according to $L_i$ to reach the best classification state.
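As a numeric illustration of the two formulas above, the sketch below computes $S_i$ and $L_i$ for a toy 7-class output vector (values chosen arbitrarily for the example).

```python
# Toy illustration of the Softmax value S_i and loss L_i defined above,
# for a hypothetical 7-class classifier output vector V.
import numpy as np

V = np.array([2.0, 0.5, 0.1, -1.2, 0.3, 0.0, 1.1])  # classifier outputs, c = 7

S = np.exp(V) / np.exp(V).sum()  # S_i = e^{V_i} / sum_j e^{V_j}
true_class = 0                   # assume the ground-truth label is class 0
L = -np.log(S[true_class])       # L_i = -log S_i; near 0 when S_i is near 1

print(S.round(3), round(float(L), 3))
```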
Step c, LSTM network training;
step b, training the finished VGG16, removing the added Softmax layer and the layer 2 full connection layer, and connecting the VGG16 in front of the LSTM network. After the video micro expression sequence extracted from each frame of video by adopting the traditional CNN model is calculated by VGG16 to obtain the characteristic value of the video micro expression image, the characteristic value is used as the input of the LSTM and is trained in the LSTM network. When training the LSTM model, every 12 frames of video images are selected as a segment to iterate.
The LSTM network structure connects each VGG16 to a single-layer LSTM with 128 hidden units, i.e. the fully connected layer of the LSTM network has dimension 128. The network is composed of LSTM recurrent units, each containing a forget gate, an input gate and an output gate. The forget gate function $f_t$ decides which information the network discards; its output value lies between 0 and 1:
$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$

where t denotes the current time step, $x_t$ is the network input at the current time, and $h_{t-1}$ is the output of the LSTM at the previous time step. $W_f$ and $U_f$ are weight coefficient parameters determined after training and learning.

The input gate function $i_t$ decides which new information the network adds:

$$i_t = \sigma(W_i x_t + U_i h_{t-1})$$

where $W_i$ and $U_i$ are weight coefficient parameters determined after training and learning.

The candidate state is

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1})$$

where $W_c$ and $U_c$ are weight coefficient parameters determined after training and learning, and the new information added after the update operation yields the cell state

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Finally, the output gate function $o_t$ can be determined as

$$o_t = \sigma(W_o x_t + U_o h_{t-1})$$

where $W_o$ and $U_o$ are weight coefficient parameters determined after training and learning, and ⊙ denotes element-wise (point-wise) multiplication. The output value of the LSTM network at the current time is

$$h_t = o_t \odot \tanh(c_t)$$
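The gate equations above translate line for line into code. A numpy sketch of one recurrent step follows, with biases omitted to match the formulas and random untrained weights standing in for the learned parameters.

```python
# One LSTM recurrent step implementing the gate equations above in numpy.
# W_* act on the input x_t, U_* on the previous output h_{t-1};
# "*" is the element-wise product (the ⊙ of the text).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U):
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)      # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev)  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                 # cell update
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)      # output gate
    h_t = o_t * np.tanh(c_t)                           # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t

# Toy dimensions: 4096-d frame feature input, 128 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.01, size=(128, 4096)) for k in "fico"}
U = {k: rng.normal(scale=0.01, size=(128, 128)) for k in "fico"}
h, c = np.zeros(128), np.zeros(128)
h, c = lstm_step(rng.normal(size=4096), h, c, W, U)
```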
Finally, the LSTM layers are stacked in sequence to increase depth and added to the overall LSTM system model: the hidden state of LSTM layer l−1 is used as the LSTM input of layer l, and the layers are connected in turn, forming a long-duration dynamic network. The effective information of the facial expression feature sequence accumulated in the memory cells (fed with the VGG16 features) is continuously updated and output to the next layer of the network.
Step d: sending the verification set into the VGG16 and LSTM networks and outputting the test result.

The verification set is sent into the trained VGG16 network, and the fully connected layer FC in VGG16 gives the test result.
Step e: calculating the accuracy of the model test result.
representing the validation set accuracy of the model by ACC
Wherein, TP is the number of positive classes predicted by the positive class, namely the number of correct models predicted; TN is the number of predicting the negative class as the negative class number, namely the number of predicting other classes of the model correctly; the FP predicts the negative class as the false positive class number, namely the model predicts the error of other classes as the number of the class; the FN predicts the positive class as the negative class number, and the number of the prediction errors of the model is the number of the prediction errors of the model.
In the present invention, ACC is required to be higher than 95%.
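A small sketch of the ACC computation and the 95% acceptance gate; the confusion counts are hypothetical, for illustration only.

```python
# Validation accuracy ACC = (TP + TN) / (TP + TN + FP + FN), checked
# against the 95% threshold required by the invention.
def acc(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

score = acc(tp=480, tn=460, fp=30, fn=10)  # hypothetical confusion counts
print(f"ACC = {score:.3f}, passes 95% gate: {score > 0.95}")
```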
As shown in fig. 15, in step 3), performing expression recognition on the monitored video stream by using the expression recognition model includes:
step A, inputting a monitoring video image to be detected;
and (3) dividing the monitoring video stream to obtain images of each frame, and detecting each frame by taking each 12 frames of video frames as a section.
Step B: face position detection using the strong classifier obtained by Adaboost algorithm training together with skin color segmentation.

Face detection is divided into three parts: sample classifier training, skin color segmentation and face detection. Sample classifier training loads face samples and non-face samples and trains the classifier with the Adaboost algorithm. Skin color segmentation detects and segments the skin-color regions of the picture, so that most non-skin regions are excluded during face detection. Face detection cascades the strong classifiers trained with the Adaboost algorithm, further verifies faces within the skin-color regions obtained in the segmentation step, and locates the position of the face.
The face position detection steps are as follows. First, the image to be detected is scanned with a 20 × 20 detection window, the same size as the training samples. The window traverses the image from left to right and from top to bottom, each 20 × 20 region is detected, and possible face regions are marked. Then a scale parameter enlarges the detection window by the given factor, and the enlarged window traverses the image again; the loop exits once the detection window exceeds half the size of the original image. The overlapping face regions detected in each traversal are merged.
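In outline, this multi-scale scan looks like the sketch below. `classify_window` is a hypothetical stand-in for the cascaded Adaboost strong classifier (OpenCV's `cv2.CascadeClassifier.detectMultiScale` packages the same scheme: window scanning, a scale factor and merging of overlapping hits).

```python
# Sketch of the multi-scale sliding-window face scan described above.
import cv2

def classify_window(patch):
    """Hypothetical stand-in for the cascaded Adaboost strong classifier."""
    return False  # a real implementation would run the trained cascade here

def detect_faces(gray, scale=1.25, base=20, step=4):
    h, w = gray.shape
    hits, win = [], base
    while win <= min(h, w) // 2:  # exit once past half the image size
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                patch = cv2.resize(gray[y:y + win, x:x + win], (base, base))
                if classify_window(patch):
                    hits.append([x, y, win, win])
        win = int(win * scale)    # enlarge the detection window by `scale`
    if not hits:
        return []
    merged, _ = cv2.groupRectangles(hits, groupThreshold=1)  # merge overlaps
    return merged
```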
Step C: feature extraction and classification with the VGG16 and LSTM networks.

The face region obtained by the Adaboost algorithm is cropped and sent into the trained VGG16 and single-layer LSTM network, and the result determined by the model weights is output by the fully connected layer FC of dimension 128. The result set is anger, contempt, fear, happiness, sadness, surprise and neutral.
Step D, outputting a classification result;
and C, the classification result output by the full connection layer FC in the step C is reclassified according to the requirement of suspicious expression identification, the angry, slight, fear and surprise expressions are classified as suspicious expressions, the happy and sad expressions are classified as secondary suspicious expressions, and the neutral expressions are classified as normal expressions. And finally, outputting a classification result set including suspicious expressions, sub-expressions and normal expressions. Suspicious expressions, also known as suspicious expressions with high confidence levels.
The above description is only one embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any alternative or modification that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention should be covered by the protection scope of the present invention.
Claims (10)
1. The vehicle-mounted monitoring device is characterized by being of a bilateral symmetry structure and comprising a device body (1), a left camera arm (2), a right camera arm (3), a front camera device (4) and a rearview mirror (5), wherein cavities matched with the corresponding camera arms are respectively formed in the left end and the right end of the device body (1), the left camera arm (2) and the right camera arm (3) are telescopically installed in the device body (1), and a driving mechanism for driving the two camera arms to stretch is installed in the middle of the inside of the device body (1); the front camera device (4) is fixed in the middle of the front end face of the device body (1), and the rear view mirror (5) is fixed on the rear end face of the device body (1).
2. The vehicle-mounted monitoring device according to claim 1, wherein each of the left camera arm (2) and the right camera arm (3) comprises a stepping motor (18), a motor base (19), a first rotating shaft (20), a sleeve (21) and a camera module, the sleeve (21) is slidably arranged in the device body (1) and corresponds to the end cavity, the motor base (19) is fixed at the inner end of the sleeve (21), the stepping motor (18) is fixed on the motor base (19), and an output shaft of the stepping motor (18) penetrates through the motor base (19) and is fixedly connected with the first rotating shaft (20) in the sleeve (21);
the camera module comprises a camera fixing box (11), a first knob (12), a second knob (13), a first camera (14), a first LED light supplement lamp (15), a second photosensitive sensor (16) and a stop block (17), wherein the first LED light supplement lamp (15) comprises a plurality of first LED lamp beads, the stop block (17), the first knob (12), the camera fixing box (11) and the second knob (13) are sequentially and fixedly connected, the stop block (17) is matched with the end face of the corresponding end of the device body (1), the first camera (14) is fixed in the middle of the front end face of the camera fixing box (11), the second photosensitive sensor (16) and the first LED lamp beads are uniformly fixed on the front end face of the camera fixing box (11) in the circumferential direction around the first camera (14), and the second knob (13) is rotatably connected with the outer end of the first rotating shaft (20);
the driving mechanism comprises a steering engine (9), a gear (10), a first rack (24) and a second rack (25), the steering engine (9) is fixed at the middle position in the device body (1), the gear (10) is fixed on an output shaft of the steering engine (9), the two racks are horizontally arranged in the device body (1), the two racks are respectively meshed with the gear (10), the left end of the first rack (24) is integrally connected with a sleeve (21) of the left camera shooting arm (2), the second rack (25) is integrally connected with a sleeve (21) of the right camera shooting arm (3), and grooves used for containing the opposite racks are respectively formed in the two sleeves (21).
3. The vehicle-mounted monitoring device according to claim 2, wherein the left camera arm (2) and the right camera arm (3) further comprise second rotating shafts (23), the second rotating shafts (23) are arranged in the sleeves (21) and are rotatably connected with the first rotating shafts (20) through pin shafts (22) to form a joint body capable of rotating forwards and backwards, semicircular guide grooves for forward rotation of the corresponding joint body are formed in the left side and the right side of the front end face of the device body (1), and the two sleeves (21) are correspondingly provided with holes; the second knob (13) is rotatably connected with the outer end of the second rotating shaft (23).
4. The vehicle-mounted monitoring device according to any one of claims 1 to 3, wherein the front camera device (4) comprises a front shell (26), a second camera (27), a second LED light supplement lamp (28) and a third photosensitive sensor (29), the second LED light supplement lamp (28) comprises a plurality of second LED lamp beads, the front shell (26) is integrally fixed at the middle position of the front end face of the device body (1), the second camera (27) is fixed at the middle position of the front end face of the front shell (26), and the third photosensitive sensor (29) and the plurality of second LED lamp beads are circumferentially and uniformly fixed on the front end face of the front shell (26) around the second camera (27); the rearview mirror (5) is an anti-glare rearview mirror, a first photosensitive sensor (6) is fixed at the middle position of the upper portion of the rearview mirror (5), a left hidden camera (7) is fixed on the left side of the upper portion of the rearview mirror (5), and a right hidden camera (8) is fixed on the right side of the upper portion of the rearview mirror (5).
5. The monitoring system based on expression recognition is characterized by comprising a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal provided with a monitoring APP and the vehicle-mounted monitoring device of claim 4, wherein the steering engine (9) and the two stepping motors (18) are each provided with a matching driving module, and each driving module, each photosensitive sensor, the Beidou GPS module, each camera and each LED light supplement lamp communicates with the Raspberry Pi through a serial port; the Raspberry Pi communicates with the cloud server in real time through a 5G/4G dual-band wireless network card and a 5G/4G dual-band wireless router using the TCP/IP and HTTP protocols, and uploads monitoring videos and GPS data in real time (an illustrative upload sketch follows this claim); the mobile terminal communicates with the cloud server in real time through a 5G/4G mobile network;
the cloud server includes: the communication module is used for realizing the two-way communication between the mobile terminal and the raspberry pi; the suspicious expression recognition module is used for judging the suspicious degree of the facial expression of the monitoring video; the risk early warning module is used for sending early warning to the mobile terminal when the suspicious expression identification module judges the suspicious expression with high confidence coefficient, and the early warning mode comprises short message prompt, APP popup and push prompt; the cloud data management module is used for storing and managing historical monitoring videos, video clips with suspicious human faces and GPS data;
the mobile terminal monitoring APP is used for managing and caching the cloud monitoring videos and the video segments of suspicious human faces, sending control instructions to the raspberry, and visually displaying monitoring pictures, the video segments of suspicious people, historical vehicle GPS tracks and the current position.
6. The monitoring method based on expression recognition is characterized by comprising the following steps of:
step 1), collecting a monitoring video;
step 2), training an expression recognition model in advance;
and step 3), applying the expression recognition model to perform expression recognition on the monitoring video, and sending out an early warning when a suspicious expression is recognized with a high confidence level.
7. The monitoring method based on expression recognition of claim 6, wherein in the step 2), the method for training the expression recognition model comprises:
step a, inputting the AFEW data set, taking a set proportion of the images in the AFEW data set as the training set, and taking the remaining images as the verification set;
step b, training the VGG16 network;
before VGG16 network training, the model parameters of VGGFace are loaded as pre-training weights by a transfer learning method; training of the VGG16 network is then started with the training set, and feature extraction generates a feature value array, picture labels and video labels for each frame; a Softmax layer is added after the VGG16 network for the classification operation, the classification criterion being that the label of each picture is the same as that of the video frame it is extracted from;
the VGG16 network uses a convolution kernel of 3 x 3 to carry out convolution operation, the convolution sampling scale is 1 x 1, and the pooling interval is 2 x 2; the network is sequentially provided with two layers of convolution Conv3-64 and one layer of pooling pool 1/2; two layers of convolution Conv3-128, one layer of pooling pool 1/2; three layers of convolution Conv3-256, one layer of pooling pool 1/2; three layers of convolution Conv3-512, one layer of pooling pool 1/2; three layers of convolution Conv3-512, one layer of pooling pool 1/2; two full connecting layers FC-4096;
the pooling layer pool1/2 is the largest pooling layer with a window size of 2 × 2 and a step size of 2;
the linear output is processed with a Softmax loss function; if the total number of classes of the classifier is c, the Softmax value S_i of the current element is the ratio of the exponential of the current element to the sum of the exponentials of all elements:

S_i = e^{V_i} / Σ_{j=1}^{c} e^{V_j}

where V_i is the i-th element of the linear output, i is the category index, and e is the natural constant;

the Softmax loss function corresponding to S_i is

L_i = −log(S_i)

and the closer L_i is to 0, the more accurate the classification result;
step c, LSTM network training;
after the training in step b is completed, the added Softmax layer and the second fully connected layer are removed from VGG16, and VGG16 is connected in front of the LSTM network; the video micro-expression sequence extracted frame by frame with the conventional CNN model is passed through VGG16 to compute the micro-expression image feature values, which are then used as the input of the LSTM and trained in the LSTM network; when training the LSTM model, every 12 video frames are selected as one segment for iteration (a sketch of this VGG16 + LSTM pipeline follows this claim);
the LSTM network structure is that each VGG16 is connected to a single-layer LSTM network with 128 hidden units; the network consists of LSTM recurrent units, each containing a forget gate, an input gate and an output gate; the forget gate function f_t determines which information the network discards, and its output value lies between 0 and 1:
f_t = σ(W_f x_t + U_f h_{t-1})

where t denotes the current time, x_t is the network input at the current time, and h_{t-1} is the LSTM network output at the previous time; W_f and U_f are weight coefficient parameters determined through training and learning;

the input gate function i_t decides which new information the network adds:

i_t = σ(W_i x_t + U_i h_{t-1})

where W_i and U_i are weight coefficient parameters determined through training and learning;

the candidate state is

c̃_t = tanh(W_c x_t + U_c h_{t-1})

where W_c and U_c are weight coefficient parameters determined through training and learning;

the new cell state after the update operation is c_t:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

finally, the output gate function o_t is determined as

o_t = σ(W_o x_t + U_o h_{t-1})

where W_o and U_o are weight coefficient parameters determined through training and learning, and ⊙ denotes the pointwise (element-wise) multiplication operation;

the output value of the LSTM network at the current time is h_t:

h_t = o_t ⊙ tanh(c_t)
finally, the layers of the LSTM network are stacked in sequence to increase depth and added to the overall LSTM system model; the hidden state of the LSTM in each layer is used as the input of the LSTM in the next layer, and the layers are connected in sequence to form a long-duration dynamic network, so that effective information in the facial expression feature sequence can be continuously updated and output to the next network layer;
step d, sending the verification set into the VGG16 and LSTM networks, and outputting the test result;
the verification set is sent into the trained VGG16 network, and the fully connected layer FC in VGG16 gives the test result;
step e, calculating the accuracy of the model test result;
the validation set accuracy of the model is denoted ACC:

ACC = (TP + TN) / (TP + TN + FP + FN)

where TP (true positives) is the number of positive samples predicted as positive, i.e. the number of correct predictions of the current class; TN (true negatives) is the number of negative samples predicted as negative, i.e. the number of correct predictions of the other classes; FP (false positives) is the number of negative samples wrongly predicted as positive, i.e. the number of other-class samples predicted as the current class; and FN (false negatives) is the number of positive samples wrongly predicted as negative, i.e. the number of current-class samples predicted incorrectly; the ACC is required to be higher than 95% (illustrative sketches of the networks and these computations follow this claim).
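The following PyTorch sketch is a minimal, non-authoritative rendering of the VGG16 layer stack and added Softmax classifier described in step b; the 224 × 224 input size and the 7-class output are assumptions based on standard VGG16 usage and on the result set in claim 9.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, n_convs: int) -> list:
    """n_convs 3x3 convolutions (stride 1) followed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))   # pool1/2
    return layers

class VGG16Expression(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),      # two Conv3-64    + pool1/2
            *conv_block(64, 128, 2),    # two Conv3-128   + pool1/2
            *conv_block(128, 256, 3),   # three Conv3-256 + pool1/2
            *conv_block(256, 512, 3),   # three Conv3-512 + pool1/2
            *conv_block(512, 512, 3),   # three Conv3-512 + pool1/2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),  # FC-4096
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC-4096
            nn.Linear(4096, num_classes),   # added classification head
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, 224, 224); returns class probabilities via Softmax
        return torch.softmax(self.classifier(self.features(x)), dim=1)
```

Loading the VGGFace parameters as pre-training weights would be done with `load_state_dict` on a compatible checkpoint; that detail is outside this sketch.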
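A short numeric sketch of the Softmax value S_i, the loss L_i, and the ACC formula of step e, using made-up logits and confusion counts:

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())          # subtract max for numerical stability
    return e / e.sum()               # S_i = e^{V_i} / sum_j e^{V_j}

V = np.array([2.0, 1.0, 0.1])        # example linear outputs, c = 3 classes
S = softmax(V)
L = -np.log(S)                       # L_i = -log(S_i); near 0 when S_i ~ 1
print(S.round(3), L.round(3))

# ACC on made-up confusion counts:
TP, TN, FP, FN = 95, 880, 10, 15
ACC = (TP + TN) / (TP + TN + FP + FN)
print(f"ACC = {ACC:.3f}")            # 0.975 -> exceeds the 95% requirement
```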
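And a minimal sketch of the truncated-VGG16 plus single-layer LSTM pipeline of step c, where 12-frame segments feed a 128-unit LSTM; the 4096-dimensional per-frame feature is an assumption (the output of the remaining FC-4096 layer after the Softmax layer and second FC layer are removed).

```python
import torch
import torch.nn as nn

class VGGLSTMExpression(nn.Module):
    """Truncated VGG16 per-frame features feeding a 128-unit LSTM."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 4096,
                 hidden: int = 128, num_classes: int = 7):
        super().__init__()
        self.backbone = backbone     # VGG16 with Softmax + 2nd FC removed
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)   # final FC classifier

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (N, 12, 3, H, W), one 12-frame segment per sample
        n, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))   # (N*12, feat_dim)
        out, _ = self.lstm(feats.view(n, t, -1))     # (N, 12, hidden)
        return self.fc(out[:, -1])                   # logits from last step
```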
8. The method for monitoring based on expression recognition according to claim 7, wherein 70% of images in the AFEW data set are used as a training set and 30% of images are used as a verification set in step a.
9. The monitoring method based on expression recognition of claim 7 or 8, wherein in the step 3), the expression recognition of the monitoring video by applying the expression recognition model comprises:
step A, inputting a monitoring video image to be detected;
the monitoring video is divided into individual frames, every 12 video frames are taken as one segment, and each frame is detected separately;
step B, performing face position detection with the strong classifier obtained by Adaboost algorithm training and with skin color segmentation;
the face detection is divided into three parts: training a sample classifier, segmenting skin color and detecting human face; training a sample classifier to load a human face sample and a non-human face sample, and training the classifier by using an Adaboost algorithm; the skin color segmentation acquires a skin color area in the picture in a skin color area detection and segmentation mode, and excludes most non-skin color areas during face detection; the human face detection is to cascade strong classifiers trained by using an Adaboost algorithm, further carry out human face verification on the skin color area obtained in the skin color segmentation step, and position the position of the human face;
step C, performing feature extraction and classification with the VGG16 and LSTM networks;
the face region obtained by the Adaboost algorithm is cropped and sent into the trained VGG16 and single-layer LSTM networks; the result determined by the model weights is output by the fully connected layer FC with dimension 128, and the result set comprises anger, contempt, fear, happiness, sadness, surprise and neutral;
step D, outputting the classification result;
the classification result output by the fully connected layer FC in step C is reclassified according to the requirements of suspicious expression identification: the anger, contempt, fear and surprise expressions are classified as suspicious expressions, the happy and sad expressions are classified as secondary suspicious expressions, and the neutral expression is classified as a normal expression; finally, the output classification result set comprises suspicious expressions, secondary suspicious expressions and normal expressions; suspicious expressions are also referred to as suspicious expressions with a high confidence level.
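A minimal sketch of the step-B pipeline, skin color segmentation followed by face verification with an Adaboost-trained cascade, using OpenCV; the YCrCb skin thresholds are common illustrative values, not taken from the patent.

```python
import cv2

def detect_faces(frame):
    """Skin-color pre-filtering, then cascade (Adaboost) face verification."""
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # skin mask
    masked = cv2.bitwise_and(frame, frame, mask=skin)  # drop non-skin areas
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return cascade.detectMultiScale(gray, 1.1, 3)      # (x, y, w, h) boxes
```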
10. The monitoring method based on expression recognition according to claim 9, wherein in the step B, the step of detecting the face position is as follows: first, a 20 × 20 detection window is used, its size being consistent with the 20 × 20 training samples; then the image to be detected is traversed from left to right and from top to bottom, each 20 × 20 region is detected, and possible face regions are marked; then a scale parameter is set to enlarge the detection window by that factor, and the enlarged window repeatedly traverses the image in a loop until the detection window exceeds half the size of the original image, at which point the loop exits; after each traversal, the overlapping detected face regions are merged once.
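The multi-scale traversal of claim 10 corresponds closely to OpenCV's `detectMultiScale`; the following is a minimal sketch under that assumption, with illustrative parameter values and a hypothetical input file name.

```python
import cv2

img = cv2.imread("frame.jpg")        # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

faces = cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,                  # the per-pass window enlargement
    minNeighbors=3,                   # merges overlapping detections
    minSize=(20, 20),                 # initial 20 x 20 detection window
    maxSize=(gray.shape[1] // 2, gray.shape[0] // 2),  # stop at half size
)
print(faces)                          # merged (x, y, w, h) face regions
```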
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010145340.9A CN111212275B (en) | 2020-03-05 | 2020-03-05 | Monitoring system and monitoring method based on expression recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010145340.9A CN111212275B (en) | 2020-03-05 | 2020-03-05 | Monitoring system and monitoring method based on expression recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111212275A true CN111212275A (en) | 2020-05-29 |
CN111212275B CN111212275B (en) | 2024-06-28 |
Family
ID=70786161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010145340.9A Active CN111212275B (en) | 2020-03-05 | 2020-03-05 | Monitoring system and monitoring method based on expression recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111212275B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100201816A1 (en) * | 2009-02-06 | 2010-08-12 | Lee Ethan J | Multi-display mirror system and method for expanded view around a vehicle |
US20110080481A1 (en) * | 2009-10-05 | 2011-04-07 | Bellingham David W | Automobile Rear View Mirror Assembly for Housing a Camera System and a Retractable Universal Mount |
CN204915479U (en) * | 2015-06-08 | 2015-12-30 | 芜湖瑞泰精密机械有限公司 | Infrared rear -view mirror of car |
CN207638752U (en) * | 2017-12-07 | 2018-07-20 | 重庆互兴科技有限公司 | A kind of 360 degree of adjustment automobile data recorder |
CN109955786A (en) * | 2019-03-27 | 2019-07-02 | 苏州清研微视电子科技有限公司 | Vehicle-mounted active safety monitoring device |
CN110422120A (en) * | 2019-08-16 | 2019-11-08 | 深圳市尼欧科技有限公司 | Intelligent back vision mirror and its vehicle with automatic telescopic camera |
CN211019056U (en) * | 2020-03-05 | 2020-07-14 | 南京工程学院 | Vehicle-mounted monitoring device |
Non-Patent Citations (1)
Title |
---|
朱光谦, 李明诚: "解析汽车电动后视镜" [Analysis of automotive electric rearview mirrors], 《汽车维修与保养》 (Auto Maintenance & Repair), 1 June 2013 (2013-06-01) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112070011A (en) * | 2020-09-08 | 2020-12-11 | 安徽兰臣信息科技有限公司 | Noninductive face recognition camera shooting snapshot machine for finding lost children |
CN113486867A (en) * | 2021-09-07 | 2021-10-08 | 北京世纪好未来教育科技有限公司 | Face micro-expression recognition method and device, electronic equipment and storage medium |
CN113486867B (en) * | 2021-09-07 | 2021-12-14 | 北京世纪好未来教育科技有限公司 | Face micro-expression recognition method and device, electronic equipment and storage medium |
CN114664084A (en) * | 2022-03-02 | 2022-06-24 | 河南职业技术学院 | Intelligent transportation system with face recognition function |
Also Published As
Publication number | Publication date |
---|---|
CN111212275B (en) | 2024-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11488398B2 (en) | Detecting illegal use of phone to prevent the driver from getting a fine | |
CN111212275A (en) | Vehicle-mounted monitoring device, monitoring system and monitoring method based on expression recognition | |
US11205068B2 (en) | Surveillance camera system looking at passing cars | |
TWI754887B (en) | Method, device and electronic equipment for living detection and storage medium thereof | |
US11193312B1 (en) | Child safety lock | |
US10744936B1 (en) | Using camera data to automatically change the tint of transparent materials | |
US11109152B2 (en) | Optimize the audio capture during conference call in cars | |
US11970156B1 (en) | Parking assistance using a stereo camera and an added light source | |
WO2020226696A1 (en) | System and method of generating a video dataset with varying fatigue levels by transfer learning | |
CN110291499A (en) | Use the system and method for the Computational frame that the Driver Vision of complete convolution framework pays attention to | |
CN110678873A (en) | Attention detection method based on cascade neural network, computer device and computer readable storage medium | |
US11645779B1 (en) | Using vehicle cameras for automatically determining approach angles onto driveways | |
CN106559645A (en) | Based on the monitoring method of video camera, system and device | |
CN111783654B (en) | Vehicle weight identification method and device and electronic equipment | |
CN111160237A (en) | Head pose estimation method and apparatus, electronic device, and storage medium | |
CN113065645A (en) | Twin attention network, image processing method and device | |
CN109948509A (en) | Obj State monitoring method, device and electronic equipment | |
CN211019056U (en) | Vehicle-mounted monitoring device | |
US11840253B1 (en) | Vision based, in-vehicle, remote command assist | |
US11531197B1 (en) | Cleaning system to remove debris from a lens | |
WO2021189321A1 (en) | Image processing method and device | |
US11951833B1 (en) | Infotainment system permission control while driving using in-cabin monitoring | |
CN113421191B (en) | Image processing method, device, equipment and storage medium | |
CN115147818A (en) | Method and device for identifying mobile phone playing behaviors | |
CN113283286A (en) | Driver abnormal behavior detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |