CN111212275A - Vehicle-mounted monitoring device, monitoring system and monitoring method based on expression recognition


Info

Publication number
CN111212275A
CN111212275A (application CN202010145340.9A)
Authority
CN
China
Prior art keywords
network
monitoring
camera
training
lstm
Prior art date
Legal status
Granted
Application number
CN202010145340.9A
Other languages
Chinese (zh)
Other versions
CN111212275B (en)
Inventor
杨雪
张乃欣
夏细明
陈巍
刘静
马培立
张天程
曹哲奇
刘家乐
汪晟
卢欣
杨承斌
Current Assignee
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanjing Institute of Technology
Priority to CN202010145340.9A
Publication of CN111212275A
Application granted
Publication of CN111212275B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/56Cameras or camera modules comprising electronic image sensors; Control thereof provided with illuminating means
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a vehicle-mounted monitoring device, and a monitoring system and monitoring method based on expression recognition. The vehicle-mounted monitoring device offers good concealment, a large monitoring field of view and good monitoring performance. The monitoring system and method enable real-time monitoring from a mobile terminal, make the monitoring data hard to lose, and allow the in-vehicle monitoring device to be controlled from the mobile terminal. Face detection and suspicious-expression recognition rely on an expression recognition model trained with a deep learning algorithm, so potential intruders can be recognized automatically with high accuracy. Compared with other algorithms, the Adaboost face-position detection model is more computationally efficient and suits real-time operation. Compared with a conventional CNN for expression classification, the time-sequence-aware LSTM network concentrates on continuous expression changes: the detection result of each frame influences the next frame, and a suspicion judgment can be output at every moment, which improves both the reliability of suspicious-expression judgment and the timeliness of alarms.

Description

Vehicle-mounted monitoring device, monitoring system and monitoring method based on expression recognition
Technical Field
The invention belongs to the field of security and protection, and particularly relates to a vehicle-mounted monitoring device, a monitoring system and a monitoring method based on expression recognition.
Background
A traditional vehicle-mounted monitoring device tends to block the driver's line of sight once installed, and its shooting field of view is small, so the conditions inside and outside the vehicle cannot be monitored comprehensively. In addition, its abrupt appearance and poor concealment make it easy for criminals to discover and destroy. As car ownership in China grows, criminal cases involving vehicle damage and theft increase day by day and seriously threaten people's property. A vehicle-mounted monitoring device with better concealment and monitoring performance is therefore urgently needed.
Disclosure of Invention
The invention provides a vehicle-mounted monitoring device with better concealment and monitoring effect. The invention also provides a monitoring system and a monitoring method based on expression recognition, which can monitor and intelligently early warn infringement behaviors in real time.
The technical scheme adopted by the invention is as follows:
the vehicle-mounted monitoring device is of a bilateral symmetry structure and comprises a device body, a left camera arm, a right camera arm, a front camera device and a rearview mirror, wherein cavities matched with the corresponding camera arms are respectively formed in the left end and the right end of the device body, the left camera arm and the right camera arm are telescopically installed in the device body, and a driving mechanism for driving the two camera arms to stretch is installed in the middle position in the device body; the front camera device is fixed in the middle of the front end face of the device body, and the rearview mirror is fixed on the rear end face of the device body.
Furthermore, the left camera arm and the right camera arm respectively comprise a stepping motor, a motor base, a first rotating shaft, a sleeve and a camera module, the sleeve is slidably arranged in the device body and in a cavity at the corresponding end, the motor base is fixed at the end part of the inner side of the sleeve, the stepping motor is fixed on the motor base, and an output shaft of the stepping motor penetrates through the motor base and is fixedly connected with the first rotating shaft in the sleeve;
the camera module comprises a camera fixing box, a first knob, a second knob, a first camera, a first LED light supplementing lamp, a second photosensitive sensor and a stop block, wherein the first LED light supplementing lamp comprises a plurality of first LED lamp beads, the stop block, the first knob, the camera fixing box and the second knob are sequentially and fixedly connected, the stop block is matched with the end face of the corresponding end of the device body, the first camera is fixed in the middle of the front end face of the camera fixing box, the second photosensitive sensor and the plurality of first LED lamp beads are uniformly fixed on the front end face of the camera fixing box in the circumferential direction around the first camera, and the second knob is rotatably connected with the outer end of the first rotating shaft;
the driving mechanism comprises a steering engine, a gear, a first rack and a second rack, the steering engine is fixed at an intermediate position in the device body, the gear is fixed on an output shaft of the steering engine, the two racks are horizontally arranged in the device body and are respectively meshed with the gear, the left end of the first rack is integrally connected with a sleeve of the left camera shooting arm, the second rack is integrally connected with a sleeve of the right camera shooting arm, and grooves used for containing the opposite racks are respectively formed in the two sleeves.
Furthermore, the left camera arm and the right camera arm respectively comprise a second rotating shaft, the second rotating shaft is arranged in the sleeve and is rotationally connected with the first rotating shaft through a pin shaft to form a joint body capable of rotating forwards and backwards, the left side and the right side of the front end surface of the device body are respectively provided with a semicircular guide groove for the corresponding joint body to rotate forwards, and the two sleeves are correspondingly provided with holes; the second knob is rotatably connected with the outer end of the second rotating shaft.
Further, the front camera device comprises a front shell, a second camera, a second LED light supplement lamp and a third photosensitive sensor, the second LED light supplement lamp comprises a plurality of second LED lamp beads, the front shell is integrally fixed in the middle of the front end face of the device body, the second camera is fixed in the middle of the front end face of the front shell, and the third photosensitive sensor and the second LED lamp beads are uniformly fixed on the front end face of the front shell in the circumferential direction around the second camera; the rearview mirror is an anti-glare rearview mirror, a first photosensitive sensor is fixed in the middle of the upper portion of the rearview mirror, a left concealed camera is fixed on the left of the upper portion, and a right concealed camera is fixed on the right of the upper portion.
The monitoring system based on expression recognition comprises a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal provided with a monitoring APP and the vehicle-mounted monitoring device described above. The steering engine and the two stepping motors are each matched with a driving module, and each driving module, each photosensitive sensor, the Beidou GPS module, each camera and each LED light supplement lamp communicates with the Raspberry Pi over its serial ports. The Raspberry Pi communicates with the cloud server in real time through a 5G/4G dual-band wireless network card and a 5G/4G dual-band wireless router using the TCP/IP and HTTP protocols, and uploads monitoring video and GPS data in real time; the mobile terminal communicates with the cloud server in real time over a 5G/4G mobile network;
the cloud server includes: a communication module for two-way communication with the mobile terminal and the Raspberry Pi; a suspicious expression recognition module for judging the degree of suspicion of facial expressions in the monitoring video; a risk early-warning module for sending an early warning to the mobile terminal when the suspicious expression recognition module identifies a suspicious expression with high confidence, the warning modes including short-message prompt, APP pop-up and push notification; and a cloud data management module for storing and managing historical monitoring videos, video clips containing suspicious faces and GPS data;
the mobile terminal monitoring APP is used for managing and caching the cloud monitoring videos and the video segments of suspicious human faces, sending control instructions to the raspberry, and visually displaying monitoring pictures, the video segments of suspicious people, historical vehicle GPS tracks and the current position.
The monitoring method based on expression recognition comprises the following steps:
step 1), collecting a monitoring video;
step 2), training an expression recognition model in advance;
and 3) applying the expression recognition model to perform expression recognition on the monitoring video, and sending out early warning when the confidence coefficient of suspicious expression recognition is high.
Further, in step 2), the method for training the expression recognition model includes:
step a, inputting an AFEW data set, taking images with a set proportion in the AFEW data set as a training set, and taking the rest images in the AFEW data set as a verification set;
b, training a VGG16 network;
before VGG16 network training, a transfer learning method is adopted: the model parameters of VGGFace are loaded as pre-training weights; training of the VGG16 network is then started with the training set, a feature-value array, picture labels and video labels of each frame are generated after feature extraction, and a Softmax layer is added behind the VGG16 network for classification, the classification reference being that each picture carries the same label as the video frame it comes from;
the VGG16 network uses 3 × 3 convolution kernels with a stride of 1 and 2 × 2 pooling windows; the network is provided, in order, with two convolution layers Conv3-64 and one pooling layer pool1/2; two convolution layers Conv3-128 and one pooling layer pool1/2; three convolution layers Conv3-256 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; and two fully connected layers FC-4096;
the pooling layer pool1/2 is a max pooling layer with a 2 × 2 window and a stride of 2;
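As an illustration of the stack just described, a minimal PyTorch sketch is given below; the 224 × 224 RGB input size and the way the added classification head is attached are assumptions, since the text does not prescribe an implementation:

```python
import torch.nn as nn

def build_vgg16(num_classes: int = 7) -> nn.Sequential:
    """VGG16-style stack as described: 3x3 convolutions (stride 1), 2x2 max pooling."""
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))   # pool1/2
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    layers += [nn.Flatten(),                      # 512 x 7 x 7 for 224 x 224 input
               nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),  # FC-4096
               nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC-4096
               nn.Linear(4096, num_classes)]      # added head; Softmax applied in the loss
    return nn.Sequential(*layers)
```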
the linear outputs are processed with a Softmax loss function; if the classifier has c classes in total, the ratio of the exponential of the current element to the sum of the exponentials of all elements is the Softmax value S_i of the current element:

S_i = e^{V_i} / Σ_{j=1}^{c} e^{V_j}

where V_i is the output of the unit preceding the classifier, i is the category index, and e is the natural constant;

the Softmax loss function corresponding to S_i is L_i:

L_i = -log(S_i) = -log( e^{V_i} / Σ_{j=1}^{c} e^{V_j} )

the closer L_i is to 0, the more accurate the classification result; S_j denotes the Softmax value of the j-th element;
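A small numerical sketch of these two formulas (NumPy; the scores are illustrative):

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    """S_i = e^{V_i} / sum_j e^{V_j}, computed with the usual max-shift for stability."""
    e = np.exp(v - v.max())
    return e / e.sum()

def softmax_loss(v: np.ndarray, label: int) -> float:
    """L_i = -log S_i for the true class; the closer to 0, the better the fit."""
    return float(-np.log(softmax(v)[label]))

scores = np.array([2.0, 1.0, 0.1])     # outputs V_i for c = 3 classes
print(softmax(scores))                 # Softmax values S_i
print(softmax_loss(scores, label=0))   # loss when class 0 is the true label
```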
step c, LSTM network training;
after training in step b is complete, the added Softmax layer and the second fully connected layer are removed from VGG16, and VGG16 is connected in front of the LSTM network; the per-frame video micro-expression sequence extracted by the conventional CNN pipeline is passed through VGG16 to obtain micro-expression image feature values, which serve as the input of the LSTM and are trained in the LSTM network; when training the LSTM model, every 12 video frames are taken as one segment for iteration;
the LSTM network structure connects each VGG16 to a single-layer LSTM with 128 hidden units; the network is composed of LSTM recurrent units, each containing a forget gate, an input gate and an output gate; the forget gate function f_t decides which information the network discards, and its output value lies between 0 and 1:
f_t = σ(W_f x_t + U_f h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, σ is the sigmoid function, and W_f and U_f are weight coefficient parameters determined by training;
the input gate function i_t decides what new information the network adds:

i_t = σ(W_i x_t + U_i h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, and W_i and U_i are weight coefficient parameters determined by training;
the candidate cell state c̃_t describes the latest state of the current network:

c̃_t = tanh(W_c x_t + U_c h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, and W_c and U_c are weight coefficient parameters determined by training;
the new information retained after the update operation is c_t:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
finally, the output gate function o_t is determined as:

o_t = σ(W_o x_t + U_o h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, W_o and U_o are weight coefficient parameters determined by training, and ⊙ denotes element-wise (pointwise) multiplication;
and the output value of the LSTM network at the current time is h_t:

h_t = o_t ⊙ tanh(c_t)
finally, the LSTM layers are stacked in sequence to increase depth and added to the overall LSTM system model: the hidden state of layer l-1 serves as the LSTM input of layer l, and the layers are connected in turn to form a long-duration dynamic network, so that the effective information in the facial expression feature sequence is continuously updated and passed on to the next layer;
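The gate equations can be collected into one recurrence step; the NumPy sketch below mirrors them one-to-one (bias terms are omitted, as in the formulas above):

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM time step. W and U are dicts of weight matrices
    for the 'f', 'i', 'c' and 'o' gates, determined by training."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)        # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)        # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev)    # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                   # updated cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)        # output gate
    h_t = o_t * np.tanh(c_t)                             # output at time t
    return h_t, c_t
```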
step d, sending the verification set into the VGG16 and LSTM networks, and outputting a test result;
the verification set is sent into the trained VGG16 network, and the test result is given by the fully connected layer FC in VGG16;
e. calculating the accuracy of the model test result;
the validation-set accuracy of the model is denoted ACC:

ACC = (TP + TN) / (TP + TN + FP + FN)

where TP (true positives) is the number of positive samples predicted as positive, i.e. correct predictions of the current class; TN (true negatives) is the number of negative samples predicted as negative, i.e. correct predictions of the other classes; FP (false positives) is the number of negative samples wrongly predicted as positive, i.e. other classes mistaken for the current class; and FN (false negatives) is the number of positive samples wrongly predicted as negative, i.e. missed detections of the current class; ACC is required to exceed 95%.
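For reference, the metric reduces to a one-line computation over the four confusion counts (the counts below are illustrative):

```python
def validation_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts; the method requires ACC above 95% before use.
print(validation_accuracy(tp=480, tn=490, fp=15, fn=15))  # 0.97
```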
Further, in step a, 70% of images in the AFEW data set are used as a training set, and 30% of images are used as a verification set.
Further, in step 3), performing expression recognition on the monitoring video by using the expression recognition model, including:
step A, inputting a monitoring video image to be detected;
the monitoring video is split into individual frames, and every 12 video frames are taken as one segment; each frame is detected separately;
step B, detecting the face position using the strong classifier trained by the Adaboost algorithm together with skin color segmentation;
face detection is divided into three parts: sample classifier training, skin color segmentation and face detection; sample classifier training loads face samples and non-face samples and trains the classifier with the Adaboost algorithm; skin color segmentation detects and segments the skin-color regions in the picture, excluding most non-skin regions before face detection; face detection cascades the strong classifiers trained by the Adaboost algorithm, further verifies faces within the skin-color regions obtained in the segmentation step, and locates the face position;
step C, feature extraction and classification with the VGG16 and LSTM networks;
the face region obtained by the Adaboost algorithm is cropped and fed into the trained VGG16 and single-layer LSTM network; the result judged by the model weights is output by the fully connected layer FC of dimension 128, and the result set comprises anger, contempt, fear, happiness, sadness, surprise and neutrality;
step D, outputting a classification result;
the classification result output by the fully connected layer FC in step C is re-classified according to the needs of suspicious expression recognition: anger, contempt, fear and surprise are classified as suspicious expressions; happiness and sadness as sub-suspicious expressions; and neutrality as normal expressions; the final output classification set comprises suspicious, sub-suspicious and normal expressions, where a suspicious expression is also called a suspicious expression with high confidence.
Further, in step B, the face position is detected as follows: first, the image to be detected is scanned with a 20 × 20 detection window, the same size as the training samples; the window traverses the image from left to right and from top to bottom, detecting 20 × 20 regions and marking possible face regions; then a scale parameter is set to enlarge the detection window by that factor, and the enlarged window repeatedly traverses the image in a loop, which exits once the detection window exceeds half the size of the original image; after each traversal, overlapping detected face regions are merged.
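This scan-and-enlarge loop with overlap merging is what OpenCV's cascade detector performs internally; the sketch below uses OpenCV's stock Haar cascade as a stand-in for the patent's own Adaboost-trained strong classifier, with scaleFactor playing the role of the scale parameter and minSize matching the 20 × 20 window:

```python
import cv2

# Stock frontal-face cascade (also Adaboost-trained) as a stand-in classifier
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return face rectangles; overlapping detections are merged internally."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.2,
                                    minNeighbors=3, minSize=(20, 20))
```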
The invention has the beneficial effects that:
the vehicle-mounted monitoring device has the advantages of good concealment, large monitoring visual field and good monitoring effect. The monitoring system and the monitoring method based on expression recognition can realize real-time monitoring of the mobile terminal, the monitoring data is not easy to lose, the monitoring view of the monitoring device in the vehicle can be adjusted through the mobile terminal, and the operation is convenient. The face detection and suspicious expression recognition are based on a deep learning algorithm to train an expression recognition model, potential invasion objects can be automatically recognized, and the recognition accuracy rate is high. Compared with other algorithms, the Adaboost face position detection model has better calculation efficiency and meets the requirement of real-time operation. Compared with the traditional CNN network for classifying expressions, the LSTM network with guaranteed time sequence is used, the LSTM network is more concentrated on continuous expression changes, the detection result of each frame can influence the next frame, and suspicious judgment results can be output at each moment, so that the reliability of suspicious expression judgment and the timeliness of alarming are improved.
Drawings
FIG. 1 is a schematic structural diagram of a vehicle-mounted monitoring device according to the present invention;
FIG. 2 is a left side view of FIG. 1;
FIG. 3 is a schematic diagram of the internal structure of the vehicle-mounted monitoring device (the camera arm is in an extended state);
FIG. 4 is a view from the B-B direction of FIG. 3;
fig. 5 is a schematic diagram of the internal structure of the vehicle-mounted monitoring device (a storage state of the camera arm);
FIG. 6 is a front view of the in-vehicle monitoring device;
FIG. 7 is a rear view of FIG. 6;
FIG. 8 is a top view of FIG. 6;
FIG. 9 is a schematic diagram of a monitoring system according to the present invention;
FIG. 10 is a flow chart of the operation of the monitoring system;
FIG. 11 is a workflow diagram of a cloud server;
FIG. 12 is a short message warning flowchart;
FIG. 13 is a flowchart illustrating monitoring APP usage by a mobile terminal;
FIG. 14 is a flow chart of a monitoring method based on expression recognition according to the present invention;
FIG. 15 is a diagram of a suspicious facial expression recognition network model;
reference numerals: 1-a device body, 2-a left camera arm, 3-a right camera arm, 4-a front camera device, 5-a rear view mirror, 6-a first photosensitive sensor, 7-a left concealed camera, 8-a right concealed camera, 9-a steering engine, 10-a gear, 11-a camera fixing box, 12-a first knob, 13-a second knob, 14-a first camera, 15-a first LED light supplement lamp, 16-a second photosensitive sensor, 17-a stop block, 18-a stepping motor, 19-a motor base, 20-a first rotating shaft, 21-a sleeve, 22-a pin shaft, 23-a second rotating shaft, 24-a first rack, 25-a second rack, 26-a front shell, 27-a second camera and 28-a second LED light supplement lamp, 29-third light sensitive sensor.
Detailed Description
The vehicle-mounted monitoring device, the monitoring system based on expression recognition and the monitoring method of the invention are further described in detail with reference to the accompanying drawings and specific embodiments.
The vehicle-mounted monitoring device shown in fig. 1 and 2 has a bilaterally symmetrical structure, and includes a device body 1, a left camera arm 2, a right camera arm 3, a front camera 4, and a rearview mirror 5. The left end and the right end of the device body 1 are respectively provided with a cavity matched with the corresponding camera shooting arms, the left camera shooting arm 2 and the right camera shooting arm 3 are telescopically installed in the device body 1, and a driving mechanism for driving the two camera shooting arms to stretch is installed at the middle position in the device body 1. The front camera 4 is fixed at the middle position of the front end face of the device body 1, and the rear view mirror 5 is fixed on the rear end face of the device body 1.
Specifically, as shown in fig. 3 to 5, each of the left camera arm 2 and the right camera arm 3 includes a stepping motor 18, a motor base 19, a first rotating shaft 20, a sleeve 21, a second rotating shaft 23 and a camera module. The sleeve 21 is slidably disposed in the cavity at the corresponding end of the device body 1 (the sleeve 21 matches that cavity), the motor base 19 is fixed at the inner end of the sleeve 21 by screws, the stepping motor 18 is fixed on the motor base 19 by screws, and the output shaft of the stepping motor 18 passes through the motor base 19 and is fixedly connected to the first rotating shaft 20 inside the sleeve 21 by a cone-point set screw. The second rotating shaft 23 is arranged in the sleeve 21 and is rotatably connected with the first rotating shaft 20 through a pin 22, forming a joint that can rotate forwards and backwards; semicircular guide grooves allowing the corresponding joint to rotate forwards are formed in the left and right sides of the front end face of the device body 1, and the two sleeves 21 are provided with corresponding holes.
Referring to fig. 3 and 6, the camera module comprises a camera fixing box 11, a first knob 12, a second knob 13, a first camera 14, a first LED light supplement lamp 15, a second photosensitive sensor 16 and a stop block 17; the first LED light supplement lamp 15 comprises a plurality of first LED lamp beads. The stop block 17, the first knob 12, the camera fixing box 11 and the second knob 13 are fixedly connected in sequence, and the stop block 17 matches the end face of the corresponding end of the device body 1. The first camera 14 is fixed in the middle of the front end face of the camera fixing box 11, and the second photosensitive sensor 16 and the first LED lamp beads are evenly fixed on that front end face circumferentially around the first camera 14. The second knob 13 is rotatably connected with the outer end of the second rotating shaft 23; specifically, a short shaft is integrally provided at the outer end of the second rotating shaft 23, and this short shaft is rotatably connected with the second knob 13.
As shown in fig. 3, the driving mechanism includes a steering engine 9, a gear 10, a first rack 24 and a second rack 25, the steering engine 9 is fixed at a middle position in the device body 1, the gear 10 is fixed on an output shaft of the steering engine 9, the two racks are horizontally arranged in the device body 1 and are respectively meshed with the gear 10, the left end of the first rack 24 is integrally connected with a sleeve 21 of the left camera arm 2, the second rack 25 is integrally connected with a sleeve 21 of the right camera arm 3, and grooves for accommodating the opposite racks are respectively formed in the two sleeves 21.
As shown in fig. 7, the front camera device 4 comprises a front housing 26, a second camera 27, a second LED light supplement lamp 28 and a third photosensitive sensor 29; the second LED light supplement lamp 28 comprises a plurality of second LED lamp beads. The front housing 26 is integrally fixed in the middle of the front end face of the device body 1, the second camera 27 is fixed in the middle of the front end face of the front housing 26, and the third photosensitive sensor 29 and the second LED lamp beads are evenly fixed on the front end face of the front housing 26 circumferentially around the second camera 27. The rearview mirror 5 is an anti-glare rearview mirror; a first photosensitive sensor 6 is fixed in the middle of its upper portion, a left concealed camera 7 is fixed on the left of its upper portion, and a right concealed camera 8 on the right. The two concealed cameras shoot the passengers in the car and the scene behind it; because they are hidden behind the mirror glass, they are generally hard to notice, enabling covert monitoring and preventing the monitoring equipment from being discovered and destroyed. The second camera 27 shoots the scene in front of the automobile.
The outer shell of the vehicle-mounted monitoring device (the device body 1 and the front housing 26) is made of ABS engineering plastic and can be chosen in a color matching the car interior. The device looks like an ordinary interior rearview mirror and can serve as one at ordinary times. In the initial state the two camera arms are retracted inside the device body 1, see fig. 5. During monitoring, the steering engine 9 drives the two camera arms to extend synchronously, see fig. 2. Each camera arm's stepping motor 18 can rotate the corresponding camera module about its axis, and the module can also be rotated manually by its knobs. In addition, the second rotating shaft 23 can be rotated outward to adjust the left and right fields of view, as shown in figs. 1 and 8. The invention can thus covertly monitor the conditions inside and outside the vehicle, with an adjustable field of view, a wide viewing angle and no blind spot. In normal use, monitoring is performed through the left and right camera modules and the front-end second camera 27; in abnormal situations, or when the left and right camera modules are retracted, monitoring is performed through the second camera 27 and the two concealed cameras.
As shown in fig. 9, the monitoring system based on expression recognition comprises a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal equipped with the monitoring APP (such as a mobile phone or tablet) and the vehicle-mounted monitoring device described above. The steering engine 9 and the two stepping motors 18 are each matched with a driving module; each driving module, each photosensitive sensor, the Beidou GPS module, each camera and each LED light supplement lamp communicates with the Raspberry Pi over its serial ports. The Raspberry Pi connects a 5G/4G dual-band wireless network card via USB and, with TCP/IP and HTTP traffic forwarded by a 5G/4G dual-band wireless router, communicates with the cloud server in real time, uploading monitoring video and GPS data continuously (the Raspberry Pi, Beidou GPS module, 5G/4G dual-band wireless router and driving modules are installed detachably and concealed in the vehicle; an enclosure can also be made and installed at a designated position in the vehicle, such as above the rearview mirror). The mobile terminal communicates with the cloud server in real time over a 5G/4G mobile network.
The cloud server includes: a communication module for two-way communication with the mobile terminal and the Raspberry Pi; a suspicious expression recognition module for judging the degree of suspicion of facial expressions in the monitoring video; a risk early-warning module for sending an early warning to the mobile terminal when the suspicious expression recognition module identifies a suspicious expression with high confidence, the warning modes including short-message prompt, APP pop-up and push notification; and a cloud data management module for storing and managing historical monitoring videos, video clips containing suspicious faces and GPS data. Referring to figs. 11 and 12, on receiving a high-confidence suspicious-expression notification, the risk early-warning module initializes and retrieves the authentication data and other data needed by the cloud SMS API (such as the destination phone number and message content), sends a request carrying the data packet to the operator server providing the SMS service, and ends the SMS step after receiving a success acknowledgement. In practice the destination phone number is stored in a user-information database and bound to a user identification code determined through the Raspberry Pi, so the user's phone number is uniquely determined. Referring to fig. 13, the mobile-terminal APP listens for suspicious-person prompts from the cloud server and pops up a message window when one arrives, so the user receives both an SMS and an in-app prompt and cannot easily miss the warning. The APP can also switch pages according to the user's selection, fetch GPS data, video streams and video segments containing suspicious persons, and let the user choose whether to save video segments locally.
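The SMS step could look roughly like the sketch below; the gateway URL, token and request fields are hypothetical stand-ins for whichever operator API is actually used:

```python
import requests  # hypothetical HTTP SMS gateway; endpoint and fields are illustrative

def send_sms_alert(api_url: str, token: str, phone: str, text: str) -> bool:
    """Risk early-warning step: authenticate, post the alert, confirm delivery."""
    resp = requests.post(api_url,
                         json={"token": token, "to": phone, "content": text},
                         timeout=10)
    return resp.ok  # a success acknowledgement ends the SMS step

# send_sms_alert("https://sms.example.com/send", "TOKEN", "+8613800000000",
#                "Suspicious person detected near your vehicle")
```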
The mobile-terminal monitoring APP manages and caches the cloud monitoring videos and the video segments containing suspicious faces, sends control instructions to the Raspberry Pi, and visually displays the monitoring picture, video segments of suspicious persons, the vehicle's historical GPS track and its current position. Through the monitoring APP the user can control the action of the steering engine 9 and the two stepping motors 18, and switch each LED light supplement lamp on and off, realizing the extension, retraction and rotation of the left and right camera modules, see fig. 10.
Realization of the anti-glare rearview mirror function: the third photosensitive sensor 29 at the front end and the first photosensitive sensor 6 at the rear end together measure the light levels for automatic anti-glare. The rearview mirror 5 in this embodiment is a 4-level grey-scale intelligent anti-glare lens consisting mainly of upper and lower glass substrates, upper and lower conductive layers and a liquid crystal layer, with a reflective film plated on the lower glass substrate. When light strikes the mirror and the rear light is brighter than the front light, the Raspberry Pi outputs a voltage on the conductive layer of the anti-glare lens, changing the light transmittance of the liquid crystal layer: the higher the voltage, the lower the transmittance. Even strong light striking the mirror then appears dim when reflected into the driver's eyes, achieving automatic anti-glare.
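A sketch of the level-selection logic, assuming the two lux readings are already available; the mapping from brightness ratio to drive level is illustrative, only the principle (brighter rear light, higher voltage, lower transmittance) comes from the text:

```python
def antiglare_level(front_lux: float, rear_lux: float, levels: int = 4) -> int:
    """Pick one of the 4 grey-scale drive levels for the liquid crystal layer."""
    if rear_lux <= front_lux:
        return 0                            # no glare: full transmittance
    ratio = rear_lux / max(front_lux, 1e-6)
    return min(levels, int(ratio))          # brighter rear light -> higher level
```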
The working process of the monitoring system based on expression recognition is as follows:
as shown in fig. 10, the raspberry group constructs a Socket communication data packet based on a TCP protocol from the acquired GPS data, constructs an HTTP protocol data packet from all camera surveillance videos, and transmits the HTTP protocol data packet to the cloud server. Socket communication requires a Socket server to be created at a cloud server and a Socket client to be created at a raspberry. The lifecycle of the Socket server side is to initialize a Socket environment, create a Socket server, specify a port number for Socket communication of the cloud server side, bind the port number and an IP address corresponding to the cloud server side, monitor a communication channel, receive a request sent by a client side, communicate with the client side for receiving and sending messages, and finally close a network environment and Socket connection after communication is completed. The lifecycle of the Socket client is to initialize a Socket environment, create the Socket client, connect with the IP address and the open port number of the cloud server, communicate with the server for message transmission and reception, and finally close the network environment and Socket connection after the communication is completed. Socket communication transmits Socket communication data packets by receiving and sending messages, and the content of the data packets is GPS data in a format specified by an NMEA-0183 protocol. And in HTTP protocol communication, a ffmpeg program is required to be run on a raspberry of the monitoring device end to recode the video stream into a h264 format for video compression, and the video stream is uploaded to a cloud server as a data packet. And the cloud server receives the video stream data, decompresses and stores the video stream data locally. And the raspberry periodically intercepts a scene in the video stream for light and shade judgment, and when the brightness of the scene is judged to be lower than a set value, the LED light supplement lamp corresponding to the camera in the scene is turned on.
As shown in fig. 11, the cloud server receives and stores the data transmitted in real time by the Raspberry Pi, forwards the GPS data to the mobile-terminal monitoring APP, performs suspicious expression recognition on the received monitoring video, and calls the risk early-warning module when the suspicious expression recognition module returns a high-confidence suspicious-expression judgment.
The monitoring method based on expression recognition comprises the following steps:
step 1), collecting monitoring videos.
And step 2), training an expression recognition model in advance.
And 3) carrying out expression recognition on the monitoring video by applying the expression recognition model, and sending out early warning when the confidence coefficient of suspicious expression recognition is high.
Specifically, as shown in fig. 14, in step 2), the method for training the expression recognition model includes:
step a, inputting the AFEW data set, taking a set proportion of its images as the training set and the rest as the verification set; the AFEW data set is provided by the EmotiW international emotion recognition challenge. Unlike pictures shot in a laboratory, its facial expression images are selected from naturally shot film and television works and cover varied video scenes with head occlusions, illumination changes, different rotation angles and motion blur, which resembles the shooting environment of vehicle-mounted security monitoring equipment. Training on the AFEW data set therefore benefits model convergence.
B, training a VGG16 network;
before VGG16 training, the model parameters of VGGFace are loaded as pre-training weights by adopting a transfer learning method. Then, training of the VGG16 network is started by using the training set, a feature value array, picture labels and video labels of each frame are generated after feature extraction, then a Softmax layer is added behind the VGG16 network for classification operation, and the label of each picture is the same as that of each video frame for classification reference.
The VGG16 network uses 3 × 3 convolution kernels with a stride of 1 and 2 × 2 pooling windows. The network consists, in order, of two convolution layers Conv3-64 and one pooling layer pool1/2; two convolution layers Conv3-128 and one pooling layer pool1/2; three convolution layers Conv3-256 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; and two fully connected layers FC-4096. Pooling layer pool1/2 is a max pooling layer with a 2 × 2 window and a stride of 2, used to reduce the number of weight connections.
The Softmax layer handles the multi-class problem, processing the linear outputs with a Softmax loss function. If the classifier has c classes in total, the ratio of the exponential of the current element to the sum of the exponentials of all elements is the Softmax value S_i of the current element:

S_i = e^{V_i} / Σ_{j=1}^{c} e^{V_j}

where V_i is the output of the unit preceding the classifier, i is the category index, and e is the natural constant.

The Softmax loss function corresponding to S_i is L_i:

L_i = -log(S_i) = -log( e^{V_i} / Σ_{j=1}^{c} e^{V_j} )

The closer L_i is to 0, the more accurate the classification result, so the classification is adjusted according to L_i to reach the best state; S_j denotes the Softmax value of the j-th element.
Step c, LSTM network training;
step b, training the finished VGG16, removing the added Softmax layer and the layer 2 full connection layer, and connecting the VGG16 in front of the LSTM network. After the video micro expression sequence extracted from each frame of video by adopting the traditional CNN model is calculated by VGG16 to obtain the characteristic value of the video micro expression image, the characteristic value is used as the input of the LSTM and is trained in the LSTM network. When training the LSTM model, every 12 frames of video images are selected as a segment to iterate.
The LSTM network structure connects each VGG16 to a single-layer LSTM with 128 hidden units, i.e. the fully connected layer dimension of the LSTM network is 128. The network is composed of LSTM recurrent units, each containing a forget gate, an input gate and an output gate. The forget gate function f_t decides which information the network discards, and its output value lies between 0 and 1:
f_t = σ(W_f x_t + U_f h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, and W_f and U_f are weight coefficient parameters determined by training.
The input gate function i_t decides what new information the network adds:

i_t = σ(W_i x_t + U_i h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, and W_i and U_i are weight coefficient parameters determined by training.
The candidate cell state c̃_t describes the latest state of the current network:

c̃_t = tanh(W_c x_t + U_c h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, and W_c and U_c are weight coefficient parameters determined by training.
The new information retained after the update operation is c_t:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
Finally, the output gate function o_t is determined as:

o_t = σ(W_o x_t + U_o h_{t-1})

where t denotes the current time, x_t is the network input at the current time, h_{t-1} is the LSTM network output at the previous time, W_o and U_o are weight coefficient parameters determined by training, and ⊙ denotes element-wise (pointwise) multiplication.
The output value of the LSTM network at the current time is h_t:

h_t = o_t ⊙ tanh(c_t)
Finally, the LSTM layers are stacked in sequence to increase depth and added to the overall LSTM system model: the hidden state of layer l-1 serves as the LSTM input of layer l, and the layers are connected in turn, forming a long-duration dynamic network. The effective information of the facial expression feature sequence stacked in the memory unit (i.e. the network memory from VGG16) is continuously updated and output to the next layer.
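In a framework such as PyTorch, the 128-unit LSTM and its stacking are a single call; the input dimensions below (4096-dimensional frame features, 12-frame segments, 7 classes) are illustrative:

```python
import torch
import torch.nn as nn

# hidden_size=128 per the text; num_layers > 1 stacks cells so each layer's
# hidden state feeds the layer above it.
lstm = nn.LSTM(input_size=4096, hidden_size=128, num_layers=2, batch_first=True)
head = nn.Linear(128, 7)               # FC of dimension 128 -> 7 expression classes

segment = torch.randn(1, 12, 4096)     # one 12-frame feature segment
out, _ = lstm(segment)                 # (1, 12, 128): one output per frame
logits = head(out[:, -1])              # classify from the final time step
print(logits.shape)                    # torch.Size([1, 7])
```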
Step d: the verification set is sent into the VGG16 and LSTM networks and the test result is output;
The verification set is sent into the trained VGG16 network, and the test result is given by the fully connected layer FC in VGG16.
Step e: calculating the accuracy of the model test result;
The validation-set accuracy of the model is denoted ACC:

ACC = (TP + TN) / (TP + TN + FP + FN)

where TP is the number of positive samples predicted as positive, i.e. correct predictions of the current class; TN is the number of negative samples predicted as negative, i.e. correct predictions of the other classes; FP is the number of negative samples wrongly predicted as positive, i.e. other classes mistaken for the current class; and FN is the number of positive samples wrongly predicted as negative, i.e. missed detections of the current class.
In the present invention, ACC is required to exceed 95%.
As shown in fig. 15, in step 3), performing expression recognition on the monitored video stream by using the expression recognition model includes:
step A, inputting a monitoring video image to be detected;
and (3) dividing the monitoring video stream to obtain images of each frame, and detecting each frame by taking each 12 frames of video frames as a section.
Step B, detecting the face position using the strong classifier trained by the Adaboost algorithm together with skin color segmentation;
the face detection is divided into three parts: sample classifier training, skin color segmentation and face detection. The sample classifier training mainly loads a face sample and a non-face sample, and trains the classifier by using an Adaboost algorithm. The skin color segmentation is mainly to obtain a skin color region in a picture by means of skin color region detection and segmentation, and to exclude most non-skin color regions during face detection. And the human face detection is to cascade strong classifiers trained by using an Adaboost algorithm, further carry out human face verification on the skin color area obtained in the skin color segmentation step and position the position of the human face.
The face position is detected as follows: first, the image to be detected is scanned with a 20 × 20 detection window, the same size as the training samples. The window traverses the image from left to right and from top to bottom, detecting 20 × 20 regions and marking possible face regions. Then a scale parameter is set to enlarge the detection window by that factor, and the enlarged window repeatedly traverses the image in a loop, which exits once the detection window exceeds half the size of the original image. After each traversal, overlapping detected face regions are merged.
Step C, feature extraction and classification with the VGG16 and LSTM networks;
after the face region obtained by the Adaboost algorithm is cut, the face region is sent to the VGG16 and the single-layer LSTM network after training is completed, and the result judged by the model weight is output by the full-connection layer FC with the dimension of 128. The result set is angry, slight, frightened, happy, sad, surprised and neutral.
Step D, outputting a classification result;
and C, the classification result output by the full connection layer FC in the step C is reclassified according to the requirement of suspicious expression identification, the angry, slight, fear and surprise expressions are classified as suspicious expressions, the happy and sad expressions are classified as secondary suspicious expressions, and the neutral expressions are classified as normal expressions. And finally, outputting a classification result set including suspicious expressions, sub-expressions and normal expressions. Suspicious expressions, also known as suspicious expressions with high confidence levels.
The above description is only an embodiment of the present invention, and the scope of the invention is not limited thereto; any alternative or substitution that can readily be conceived by those skilled in the art within the technical scope of the invention shall fall within the scope of the invention.

Claims (10)

1. The vehicle-mounted monitoring device is characterized by being of a bilateral symmetry structure and comprising a device body (1), a left camera arm (2), a right camera arm (3), a front camera device (4) and a rearview mirror (5), wherein cavities matched with the corresponding camera arms are respectively formed in the left end and the right end of the device body (1), the left camera arm (2) and the right camera arm (3) are telescopically installed in the device body (1), and a driving mechanism for driving the two camera arms to stretch is installed in the middle of the inside of the device body (1); the front camera device (4) is fixed in the middle of the front end face of the device body (1), and the rear view mirror (5) is fixed on the rear end face of the device body (1).
2. The vehicle-mounted monitoring device according to claim 1, wherein each of the left camera arm (2) and the right camera arm (3) comprises a stepping motor (18), a motor base (19), a first rotating shaft (20), a sleeve (21) and a camera module, the sleeve (21) is slidably arranged in the device body (1) and corresponds to the end cavity, the motor base (19) is fixed at the inner end of the sleeve (21), the stepping motor (18) is fixed on the motor base (19), and an output shaft of the stepping motor (18) penetrates through the motor base (19) and is fixedly connected with the first rotating shaft (20) in the sleeve (21);
the camera module comprises a camera fixing box (11), a first knob (12), a second knob (13), a first camera (14), a first LED light supplement lamp (15), a second photosensitive sensor (16) and a stop block (17), wherein the first LED light supplement lamp (15) comprises a plurality of first LED lamp beads, the stop block (17), the first knob (12), the camera fixing box (11) and the second knob (13) are sequentially and fixedly connected, the stop block (17) is matched with the end face of the corresponding end of the device body (1), the first camera (14) is fixed in the middle of the front end face of the camera fixing box (11), the second photosensitive sensor (16) and the first LED lamp beads are uniformly fixed on the front end face of the camera fixing box (11) in the circumferential direction around the first camera (14), and the second knob (13) is rotatably connected with the outer end of the first rotating shaft (20);
the driving mechanism comprises a steering engine (9), a gear (10), a first rack (24) and a second rack (25), the steering engine (9) is fixed at the middle position in the device body (1), the gear (10) is fixed on an output shaft of the steering engine (9), the two racks are horizontally arranged in the device body (1), the two racks are respectively meshed with the gear (10), the left end of the first rack (24) is integrally connected with a sleeve (21) of the left camera shooting arm (2), the second rack (25) is integrally connected with a sleeve (21) of the right camera shooting arm (3), and grooves used for containing the opposite racks are respectively formed in the two sleeves (21).
3. The vehicle-mounted monitoring device according to claim 2, wherein the left camera arm (2) and the right camera arm (3) further comprise second rotating shafts (23), the second rotating shafts (23) are arranged in the sleeves (21) and are rotatably connected with the first rotating shafts (20) through pin shafts (22) to form a joint body capable of rotating forwards and backwards, semicircular guide grooves for forward rotation of the corresponding joint body are formed in the left side and the right side of the front end face of the device body (1), and the two sleeves (21) are correspondingly provided with holes; the second knob (13) is rotatably connected with the outer end of the second rotating shaft (23).
4. The vehicle-mounted monitoring device according to any one of claims 1 to 3, wherein the front camera device (4) comprises a front shell (26), a second camera (27), a second LED light supplement lamp (28) and a third photosensitive sensor (29), the second LED light supplement lamp (28) comprises a plurality of second LED lamp beads, the front shell (26) is integrally fixed at the middle position of the front end face of the device body (1), the second camera (27) is fixed at the middle position of the front end face of the front shell (26), and the third photosensitive sensor (29) and the plurality of second LED lamp beads are circumferentially and uniformly fixed on the front end face of the front shell (26) around the second camera (27); the rearview mirror (5) is an anti-glare rearview mirror, a first photosensitive sensor (6) is fixed in the middle of the upper portion of the rearview mirror (5), a left concealed camera (7) is fixed on the left of the upper portion, and a right concealed camera (8) is fixed on the right of the upper portion.
5. The monitoring system based on expression recognition is characterized by comprising a cloud server, a Raspberry Pi, a Beidou GPS module, a mobile terminal provided with a monitoring APP and the vehicle-mounted monitoring device of claim 4, wherein the steering engine (9) and the two stepping motors (18) are each equipped with a driving module, and each driving module, each photosensitive sensor, the Beidou GPS module, each camera and each LED light supplement lamp communicates with the Raspberry Pi through a serial port; the Raspberry Pi communicates with the cloud server in real time via a 5G/4G dual-band wireless network card and a 5G/4G dual-band wireless router using the TCP/IP and HTTP protocols, and uploads monitoring videos and GPS data in real time; the mobile terminal communicates with the cloud server in real time through a 5G/4G mobile network;
the cloud server comprises: a communication module for realizing two-way communication between the mobile terminal and the Raspberry Pi; a suspicious expression recognition module for judging the suspicious degree of facial expressions in the monitoring video; a risk early warning module for sending an early warning to the mobile terminal when the suspicious expression recognition module identifies a suspicious expression with high confidence, the early warning modes including short message prompt, APP popup and push notification; and a cloud data management module for storing and managing historical monitoring videos, video clips containing suspicious faces and GPS data;
the mobile terminal monitoring APP is used for managing and caching the cloud monitoring videos and the video segments containing suspicious faces, sending control instructions to the Raspberry Pi, and visually displaying the monitoring picture, the video segments of suspicious persons, the historical vehicle GPS track and the current position.
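As an illustration of the data path in claim 5, here is a minimal sketch of the Raspberry Pi side uploading one monitoring-video segment together with its Beidou/GPS fix to the cloud server over HTTP. The endpoint URL, field names and device identifier are illustrative assumptions, not details from the patent.

```python
# Hypothetical upload helper for the Raspberry Pi side of claim 5.
# CLOUD_URL and all field names are assumptions for illustration.
import requests

CLOUD_URL = "https://cloud.example.com/api/upload"  # hypothetical endpoint

def upload_segment(video_path: str, lat: float, lon: float, device_id: str) -> bool:
    """POST one video segment plus its GPS fix; True on HTTP 200."""
    with open(video_path, "rb") as f:
        resp = requests.post(
            CLOUD_URL,
            files={"video": f},
            data={"device_id": device_id, "lat": lat, "lon": lon},
            timeout=30,
        )
    return resp.status_code == 200
```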
6. The monitoring method based on expression recognition is characterized by comprising the following steps:
step 1), collecting a monitoring video;
step 2), training an expression recognition model in advance;
step 3), applying the expression recognition model to perform expression recognition on the monitoring video, and sending out an early warning when a suspicious expression is recognized with high confidence.
7. The monitoring method based on expression recognition of claim 6, wherein in the step 2), the method for training the expression recognition model comprises:
step a, inputting the AFEW data set, taking a set proportion of the images in the AFEW data set as the training set and the remaining images as the verification set;
step b, training the VGG16 network;
before the VGG16 network is trained, a transfer learning method is adopted to load the model parameters of VGGFace as pre-training weights; training of the VGG16 network then starts on the training set, and after feature extraction a feature value array, picture labels and video labels are generated for each frame; a Softmax layer is added behind the VGG16 network for the classification operation, the classification rule being that each frame image carries the same label as the video it was extracted from;
the VGG16 network performs convolution operations with 3 × 3 convolution kernels, a convolution stride of 1 × 1 and a pooling window of 2 × 2; the network is sequentially provided with two convolution layers Conv3-64 and one pooling layer pool1/2; two convolution layers Conv3-128 and one pooling layer pool1/2; three convolution layers Conv3-256 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; three convolution layers Conv3-512 and one pooling layer pool1/2; and two fully connected layers FC-4096;
the pooling layer pool1/2 is a max pooling layer with a 2 × 2 window and a stride of 2;
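The layer list above translates directly into code; the following is a sketch of that VGG16 layout in PyTorch, with the extra Softmax classification layer appended as described. The framework choice and the 224 × 224 input size are assumptions, since the claim names neither.

```python
# Sketch of the claimed VGG16 layout: 3x3 convolutions (stride 1),
# 2x2/stride-2 max pooling, two FC-4096 layers, plus the added classifier.
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # pool1/2
    return layers

class VGG16Expr(nn.Module):
    def __init__(self, num_classes=7):  # seven expression classes
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),     # two Conv3-64
            *conv_block(64, 128, 2),   # two Conv3-128
            *conv_block(128, 256, 3),  # three Conv3-256
            *conv_block(256, 512, 3),  # three Conv3-512
            *conv_block(512, 512, 3),  # three Conv3-512
        )
        self.fc = nn.Sequential(       # two FC-4096 layers
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        # added classification layer; Softmax itself is applied in the loss
        self.classifier = nn.Linear(4096, num_classes)

    def forward(self, x):  # x: (N, 3, 224, 224)
        return self.classifier(self.fc(self.features(x)))
```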
processing the linear output with a Softmax loss function; if the total number of classes of the classifier is $c$, the Softmax value $S_i$ of the current element is the ratio of the exponential of the current element to the sum of the exponentials of all elements:

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{c} e^{V_j}}$$

where $V_i$ is the linear output for class $i$, $i$ is the class index, and $e$ denotes the natural constant;

the Softmax loss function corresponding to $S_i$ is $L_i$:

$$L_i = -\log(S_j)$$

the closer $L_i$ is to 0, the more accurate the classification result; $S_j$ denotes the Softmax value of the current element;
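A small numpy sketch of the two formulas above, reading 'the current element' as the true class of the sample; the scores below are made-up values.

```python
import numpy as np

def softmax(v):
    """S_i = exp(V_i) / sum_{j=1}^{c} exp(V_j), computed stably."""
    e = np.exp(v - v.max())
    return e / e.sum()

def softmax_loss(v, true_class):
    """L_i = -log(S of the true class); closer to 0 means a better prediction."""
    return -np.log(softmax(v)[true_class])

scores = np.array([2.0, 1.0, 0.1])  # linear outputs V for c = 3 classes
print(softmax(scores))               # approx [0.659, 0.242, 0.099]
print(softmax_loss(scores, 0))       # approx 0.417
```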
step c, LSTM network training;
after training is complete, the added Softmax layer and the second fully connected layer are removed from the VGG16 network, and the VGG16 network is connected in front of the LSTM network; the video micro-expression sequence extracted from each video frame using the conventional CNN model is passed through VGG16 to compute the micro-expression image feature values, which are then used as the input of the LSTM and trained in the LSTM network; when the LSTM model is trained, every 12 video frames are selected as one segment for iteration;
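The 12-frame segmentation is simple to sketch; the helper below assumes each frame has already been reduced to its VGG16 feature vector.

```python
def twelve_frame_segments(frames, seg_len=12):
    """Yield consecutive 12-frame segments; a trailing remainder is dropped."""
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        yield frames[start:start + seg_len]
```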
the LSTM network structure is that each VGG16 is connected to a single-layer LSTM network with 128 hidden-layer units; the network consists of LSTM recurrent units, each recurrent unit comprising a forget gate, an input gate and an output gate; the forget gate function $f_t$ determines which information the network discards, and its output value lies between 0 and 1:
$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$

where $t$ denotes the current time, $x_t$ is the network input at the current time, $h_{t-1}$ is the output value of the LSTM network at the previous time, $\sigma(\cdot)$ is the sigmoid activation function, and $W_f$ and $U_f$ are weight coefficient parameters determined through training and learning;
the input gate function $i_t$ decides which new information the network adds:

$$i_t = \sigma(W_i x_t + U_i h_{t-1})$$

where $t$ denotes the current time, $x_t$ is the network input at the current time, $h_{t-1}$ is the output value of the LSTM network at the previous time, and $W_i$ and $U_i$ are weight coefficient parameters determined through training and learning;
the updated candidate state $\tilde{C}_t$, which describes the latest state of the current network, is

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1})$$

where $t$ denotes the current time, $x_t$ is the network input at the current time, $h_{t-1}$ is the output value of the LSTM network at the previous time, and $W_c$ and $U_c$ are weight coefficient parameters determined through training and learning;
the new cell state obtained after the update operation is $C_t$:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
finally, the output gate function $o_t$ can be determined as

$$o_t = \sigma(W_o x_t + U_o h_{t-1})$$

where $t$ denotes the current time, $x_t$ is the network input at the current time, $h_{t-1}$ is the output value of the LSTM network at the previous time, $W_o$ and $U_o$ are weight coefficient parameters determined through training and learning, and $\odot$ denotes element-wise (pointwise) multiplication;

the output value of the LSTM network at the current time is $h_t$:

$$h_t = o_t \odot \tanh(C_t)$$
finally, the layers of the LSTM network are stacked in sequence to increase depth and added to the whole LSTM system model; the hidden state of each LSTM layer serves as the input of the next layer, and the layers are connected in sequence to form a long-duration dynamic network, so that effective information in the facial expression feature sequence can be continuously updated and output to the next layer of the network;
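The gate equations above translate directly into code. The following numpy sketch performs one step of the recurrent unit using exactly those equations (bias terms omitted, as in the claim); all weight matrices are assumed to come from training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM step; W and U are dicts keyed 'f', 'i', 'c', 'o'."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)      # forget gate f_t
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)      # input gate i_t
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev)  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                 # state update; * is the ⊙ product
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)      # output gate o_t
    h_t = o_t * np.tanh(c_t)                           # current output h_t
    return h_t, c_t
```

For the stacked variant described above, h_t of one layer simply becomes x_t of the next layer at the same time step.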
step d, sending the verification set into the VGG16 and LSTM networks and outputting the test results;
the verification set is sent into the trained VGG16 network, and the test result is given by the fully connected layer FC in VGG16;
step e, calculating the accuracy of the model test results;
the validation set accuracy of the model is denoted by ACC:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of positive-class samples predicted as positive, i.e. the number the model predicts correctly for this class; TN is the number of negative-class samples predicted as negative, i.e. the number of other classes the model predicts correctly; FP is the number of negative-class samples wrongly predicted as positive, i.e. samples of other classes mistakenly predicted as this class; FN is the number of positive-class samples wrongly predicted as negative (missed), i.e. the number of prediction errors for the current class; the ACC is required to be higher than 95%.
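A quick check of the ACC formula on made-up counts (not experimental results from the patent):

```python
def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

assert accuracy(tp=96, tn=870, fp=20, fn=14) > 0.95  # meets the 95% requirement
```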
8. The monitoring method based on expression recognition according to claim 7, wherein in step a, 70% of the images in the AFEW data set are used as the training set and 30% as the verification set.
9. The monitoring method based on expression recognition of claim 7 or 8, wherein in the step 3), the expression recognition of the monitoring video by applying the expression recognition model comprises:
step A, inputting a monitoring video image to be detected;
the monitoring video is divided into individual frames, every 12 video frames are taken as one segment, and each frame is detected separately;
step B, performing face position detection using the strong classifier obtained by Adaboost algorithm training together with skin color segmentation;
the face detection is divided into three parts: sample classifier training, skin color segmentation and face detection; sample classifier training loads face samples and non-face samples and trains the classifier with the Adaboost algorithm; skin color segmentation obtains the skin color areas in the picture by detecting and segmenting skin color regions, excluding most non-skin areas from face detection; face detection cascades the strong classifiers trained with the Adaboost algorithm, performs further face verification on the skin color areas obtained in the skin color segmentation step, and locates the position of the face;
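A sketch of this pipeline using OpenCV, whose Haar cascades are Adaboost-trained cascades of classifiers; skin pixels are thresholded in YCrCb space, and the Cr/Cb bounds are common heuristics rather than values given in the patent.

```python
import cv2

# Adaboost-trained cascade shipped with OpenCV (stands in for the claimed classifier)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(bgr):
    # 1) skin colour segmentation: keep only skin-coloured regions
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    masked = cv2.bitwise_and(bgr, bgr, mask=skin)
    # 2) the cascaded classifier verifies faces inside the skin regions
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```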
step C, performing feature extraction and classification with the VGG16 and LSTM networks;
the face region obtained by the Adaboost algorithm is cropped and sent into the trained VGG16 and single-layer LSTM networks; the result judged by the model weights is output by a fully connected layer FC with a dimension of 128, and the result set comprises anger, contempt, fear, happiness, sadness, surprise and neutral;
step D, outputting a classification result;
the classification results output by the fully connected layer FC in step C are classified again according to the requirements of suspicious expression recognition: the anger, contempt, fear and surprise expressions are classified as suspicious expressions, the happiness and sadness expressions as secondary suspicious expressions, and the neutral expression as a normal expression; the final output classification result set comprises suspicious expressions, secondary suspicious expressions and normal expressions; suspicious expressions are also referred to as high-confidence suspicious expressions.
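The re-classification of step D is a fixed mapping over the seven FC outputs; a minimal sketch follows (the English class names are assumptions):

```python
# Mapping from the seven expression classes to the three suspicion levels.
SUSPICION = {
    "anger": "suspicious", "contempt": "suspicious",
    "fear": "suspicious", "surprise": "suspicious",
    "happiness": "secondary suspicious", "sadness": "secondary suspicious",
    "neutral": "normal",
}

def suspicion_level(expression: str) -> str:
    return SUSPICION[expression]
```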
10. The monitoring method based on expression recognition according to claim 9, wherein in step B, the step of detecting the face position is: firstly, the image to be detected is divided into 20 × 20 samples, with the detection window size consistent with the sample size; then the image is traversed from left to right and from top to bottom, 20 × 20 regions are detected, and possible face regions are marked; then a scale parameter is set to enlarge the detection window by that factor, and the enlarged window repeatedly traverses the image in a loop until the detection window exceeds half the size of the original image, at which point the loop exits; the overlapping face regions detected are merged once after each traversal.
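A sketch of the window traversal in claim 10; the stride of one window width and the scale factor of 1.2 are illustrative assumptions, and the merging of overlapping hits is left to a grouping routine such as cv2.groupRectangles.

```python
def sliding_windows(img_w, img_h, scale=1.2, base=20):
    """Yield (x, y, size) candidate regions, enlarging the window each pass."""
    win = base
    while win <= min(img_w, img_h) // 2:   # exit once past half the image size
        for y in range(0, img_h - win + 1, win):        # top to bottom
            for x in range(0, img_w - win + 1, win):    # left to right
                yield x, y, win
        win = int(win * scale)             # enlarge the detection window
```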
CN202010145340.9A 2020-03-05 2020-03-05 Monitoring system and monitoring method based on expression recognition Active CN111212275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010145340.9A CN111212275B (en) 2020-03-05 2020-03-05 Monitoring system and monitoring method based on expression recognition

Publications (2)

Publication Number Publication Date
CN111212275A true CN111212275A (en) 2020-05-29
CN111212275B CN111212275B (en) 2024-06-28

Family

ID=70786161

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100201816A1 (en) * 2009-02-06 2010-08-12 Lee Ethan J Multi-display mirror system and method for expanded view around a vehicle
US20110080481A1 (en) * 2009-10-05 2011-04-07 Bellingham David W Automobile Rear View Mirror Assembly for Housing a Camera System and a Retractable Universal Mount
CN204915479U (en) * 2015-06-08 2015-12-30 芜湖瑞泰精密机械有限公司 Infrared rear -view mirror of car
CN207638752U (en) * 2017-12-07 2018-07-20 重庆互兴科技有限公司 A kind of 360 degree of adjustment automobile data recorder
CN109955786A (en) * 2019-03-27 2019-07-02 苏州清研微视电子科技有限公司 Vehicle-mounted active safety monitoring device
CN110422120A (en) * 2019-08-16 2019-11-08 深圳市尼欧科技有限公司 Intelligent back vision mirror and its vehicle with automatic telescopic camera
CN211019056U (en) * 2020-03-05 2020-07-14 南京工程学院 Vehicle-mounted monitoring device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU Guangqian, LI Mingcheng: "Analysis of Automotive Electric Rearview Mirrors" (解析汽车电动后视镜), Automobile Maintenance & Repair (《汽车维修与保养》), 1 June 2013 (2013-06-01) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070011A (en) * 2020-09-08 2020-12-11 安徽兰臣信息科技有限公司 Noninductive face recognition camera shooting snapshot machine for finding lost children
CN113486867A (en) * 2021-09-07 2021-10-08 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN113486867B (en) * 2021-09-07 2021-12-14 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN114664084A (en) * 2022-03-02 2022-06-24 河南职业技术学院 Intelligent transportation system with face recognition function

Similar Documents

Publication Publication Date Title
US11488398B2 (en) Detecting illegal use of phone to prevent the driver from getting a fine
CN111212275A (en) Vehicle-mounted monitoring device, monitoring system and monitoring method based on expression recognition
US11205068B2 (en) Surveillance camera system looking at passing cars
TWI754887B (en) Method, device and electronic equipment for living detection and storage medium thereof
US11193312B1 (en) Child safety lock
US10744936B1 (en) Using camera data to automatically change the tint of transparent materials
US11109152B2 (en) Optimize the audio capture during conference call in cars
US11970156B1 (en) Parking assistance using a stereo camera and an added light source
WO2020226696A1 (en) System and method of generating a video dataset with varying fatigue levels by transfer learning
CN110291499A (en) Use the system and method for the Computational frame that the Driver Vision of complete convolution framework pays attention to
CN110678873A (en) Attention detection method based on cascade neural network, computer device and computer readable storage medium
US11645779B1 (en) Using vehicle cameras for automatically determining approach angles onto driveways
CN106559645A (en) Based on the monitoring method of video camera, system and device
CN111783654B (en) Vehicle weight identification method and device and electronic equipment
CN111160237A (en) Head pose estimation method and apparatus, electronic device, and storage medium
CN113065645A (en) Twin attention network, image processing method and device
CN109948509A (en) Obj State monitoring method, device and electronic equipment
CN211019056U (en) Vehicle-mounted monitoring device
US11840253B1 (en) Vision based, in-vehicle, remote command assist
US11531197B1 (en) Cleaning system to remove debris from a lens
WO2021189321A1 (en) Image processing method and device
US11951833B1 (en) Infotainment system permission control while driving using in-cabin monitoring
CN113421191B (en) Image processing method, device, equipment and storage medium
CN115147818A (en) Method and device for identifying mobile phone playing behaviors
CN113283286A (en) Driver abnormal behavior detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant