CN117218324A - Camera regulation and control system and method based on artificial intelligence - Google Patents


Info

Publication number
CN117218324A
CN117218324A (application number CN202311341381.5A)
Authority
CN
China
Prior art keywords
module
audio
target
face
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311341381.5A
Other languages
Chinese (zh)
Inventor
张利新
马建功
范文培
王瑞民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Sohoo Technology Co ltd
Original Assignee
Guangdong Sohoo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Sohoo Technology Co ltd filed Critical Guangdong Sohoo Technology Co ltd
Priority to CN202311341381.5A priority Critical patent/CN117218324A/en
Publication of CN117218324A publication Critical patent/CN117218324A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a camera regulation and control system and method based on artificial intelligence. The system comprises a control module, a memory and an early warning module; the control module is bidirectionally connected with the memory, and the early warning module is bidirectionally connected with the memory. The memory comprises a route memory, in which data generated by the early warning module and historical data for route prediction are stored; the control module constructs a route prediction model from the historical data, and the route prediction model is stored in the route memory. The beneficial effects are that, by predicting the escape route of a target person, the security centers that the escape route passes through are identified and automatically reminded to prepare for departure, so that the security center nearest to the escape route can respond and act in time while the target person is escaping, and security personnel can be deployed more rapidly.

Description

Camera regulation and control system and method based on artificial intelligence
Technical Field
The application relates to the technical field of video monitoring, in particular to a camera regulation and control system and method based on artificial intelligence.
Background
Cameras for video monitoring are installed to improve the security level: with intelligent algorithms they can detect specific abnormal conditions or activities and, once an abnormality is found, automatically trigger an alarm, which has a certain deterrent effect. However, conventional video monitoring cameras have the following technical problems:
1. Most systems automatically remind the security center closest to the target person only after the target person has been identified; the target person, however, may change the escape route or choose a longer one, and if the security center dispatches personnel based only on the target person's current position, the opportunity to catch the target person may be missed;
2. In a scene where a multi-person conversation appears in the image, the audio of the target person cannot be acquired separately, which makes analysis of the target person in the image very difficult;
3. Usually only behavioral abnormality is considered while the influence of emotion on behavior is ignored, which may reduce accuracy in abnormality identification; potential threats are then not perceived in time, alarms are not triggered or measures are not taken promptly, and alarm accuracy is greatly reduced.
Disclosure of Invention
The application aims to provide a camera regulation and control system and method based on artificial intelligence so as to solve the problems in the background technology.
In order to achieve the above purpose, the present application provides the following technical solutions: the camera regulation and control system based on artificial intelligence comprises a control module, a memory and an early warning module, wherein the control module is in bidirectional connection with the memory, and the early warning module is in bidirectional connection with the memory.
The early warning module comprises a positioning sub-module and a scene extraction sub-module.
The memory comprises a route memory, and data generated by the early warning module and historical data for predicting routes are stored in the route memory.
The historical data includes initial position information of a plurality of historical target persons, an escape route, and a two-dimensional model of an escape scene.
And the control module constructs a route prediction model according to the historical data, and the route prediction model is stored in the route memory.
The system further comprises an audio module, a face recognition module and a rotating module, wherein the audio module is bidirectionally connected with the memory, the face recognition module is bidirectionally connected with the memory, and the output end of the control module is connected with the input end of the rotating module.
The memory also comprises an audio memory and a face memory.
The audio module comprises an audio acquisition sub-module, an audio separation sub-module, an audio identification sub-module and an audio matching sub-module, wherein the audio acquisition sub-module consists of a microphone array.
The face recognition module comprises a face acquisition sub-module, a face division sub-module, a feature extraction sub-module and a lip recognition sub-module.
The audio data and the sensitive words generated by the audio module are stored in the audio memory, and the face data generated by the face recognition module are stored in the face memory.
The rotating module is connected with the camera.
The face recognition module further comprises an emotion classification sub-module, and a plurality of face data with emotion labels are stored in the face memory.
The system also comprises a behavior recognition module which is connected with the memory in a bidirectional way.
The memory also includes a behavior memory.
The behavior recognition module comprises a motion trail construction sub-module, a human skeleton extraction sub-module, a human skeleton sequence construction sub-module and a human skeleton sequence analysis sub-module.
And the abnormal behavior data generated by the behavior recognition module and human skeleton sequence templates corresponding to various abnormal behaviors are stored in the behavior memory.
The system further comprises a warning module, wherein the output end of the control module is connected with the input end of the warning module, and the warning module comprises a horn sub-module and a warning lamp sub-module.
The camera regulation and control method comprises a rotation method, a target locking method, an abnormality judgment method and an early warning method.
The working flow of the control module of the application is as follows:
step S1: the rotation method is performed.
Step S2: and executing the target locking method to lock the target person.
Step S3: and executing the abnormality judgment method.
Step S4: and judging whether the target person is abnormal or not.
Step S5: executing the early warning method and starting the warning module.
The early warning method comprises the following steps:
Step S101: determining target position information of the target person and marking it on a map.
Step S102: extracting the characteristic information of the scene where the target person is located, and generating a target two-dimensional model.
Step S103: acquiring a plurality of target routes along which the target person may escape, and marking them on the map.
Step S104: measuring the distances between the predicted target routes and the security center positions on the map.
Step S105: acquiring a plurality of target security centers meeting the distance requirement according to a set distance threshold.
Step S106: sending information to the plurality of target security centers to prepare for departure.
And the control module controls the warning module to start when the early warning method is executed.
The rotation method specifically comprises: acquiring audio data through the microphone array; transmitting the audio data through the audio memory to the control module for processing and analysis; obtaining the microphone that collects the audio data with the largest sound energy; determining the azimuth information of that microphone as the azimuth from which the audio data was emitted; sending the azimuth to the control module through the audio memory; and the control module sending a command to the rotation module, so that the camera turns toward the position where the audio data occurred.
The control module controls the rotation module to rotate at regular intervals.
The target locking method comprises the following steps:
Step S201: the audio separation sub-module separates the audio data into a plurality of single-person audio data and determines the plurality of single-person audio data as an audio data set.
Step S202: the audio recognition sub-module recognizes whether the audio data set contains sensitive words, and the single-person audio data containing a sensitive word is determined as the target audio data.
Step S203: the audio recognition sub-module extracts a time domain and a frequency domain of the target audio data to obtain a plurality of audio feature vectors, and processes the plurality of audio feature vectors to obtain a first feature vector, wherein the first feature vector is a one-dimensional array, and the first feature vector represents information of the target audio data.
Step S204: acquiring the face data in the image through the face acquisition sub-module, dividing the face data into a plurality of face data through the face division sub-module, and determining the plurality of face data as a face data set.
Step S205: the feature extraction sub-module extracts the facial-feature information of each of the plurality of face data; a plurality of face feature vectors are extracted from each face data and processed to obtain a second feature vector, wherein the second feature vector is a one-dimensional array representing the face information of that face data.
Step S206: and recognizing lip actions of the plurality of face data through the lip recognition sub-module to obtain a plurality of lip data, and determining the plurality of lip data as a lip data set.
The lip recognition sub-module extracts the shape, change speed and movement track of each of the plurality of lip data; a plurality of lip feature vectors are extracted from each lip data and processed to obtain a third feature vector, wherein the third feature vector is a one-dimensional array representing the information obtained by recognizing the lip actions in the image.
Step S207: the first feature vector, the plurality of second feature vectors, and the plurality of third feature vectors are input into the audio matching submodule.
The audio matching sub-module comprises a sound matching model and a lip synchronization model. The distance between the first feature vector and each of the second feature vectors is calculated through the sound matching model to obtain sound matching scores, and the similarity between the first feature vector and each of the third feature vectors is calculated through the lip synchronization model to obtain lip synchronization scores. The face data scoring highest on both the sound matching score and the lip synchronization score is determined as the target face data, and the person corresponding to the target face data is determined as the target person.
The abnormality judgment method includes the steps of:
step S301: and judging whether the emotion of the target person is abnormal or not.
Step S302: and judging whether the behavior of the target person is abnormal or not.
The step S301 is performed simultaneously with the step S302.
The step S4 specifically includes: if both the emotion and the behavior of the target person are abnormal, executing the early warning method and starting the warning module; if only one of the emotion and the behavior of the target person is abnormal, the target person is regarded as normal, and the rotation method is executed.
The step S301 includes the steps of:
S301.1: extracting a plurality of feature vectors from the target face data through the feature extraction sub-module, and processing the plurality of feature vectors to obtain a fourth feature vector, wherein the fourth feature vector is a one-dimensional array.
S301.2: carrying out model training on the face data with emotion labels, and processing the plurality of feature vectors obtained in the training process to obtain a feature template vector, wherein the feature template vector is a one-dimensional array; in the emotion classification sub-module, the fourth feature vector is multiplied by the feature template vector.
The numerical value generated by multiplying the fourth feature vector by the feature template vector falls in one of the subintervals [0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6), [6, 7), whose classification results are anger, contempt, disgust, fear, happiness, sadness and surprise respectively.
If the numerical value generated by multiplying the fourth feature vector by the feature template vector falls in either of the intervals [0, 1) or [2, 3), the emotion of the target person is abnormal.
The step S302 includes the steps of:
S302.1: the motion trail construction sub-module tracks the target face data through a target tracking algorithm and constructs the motion trail of the target person.
S302.2: the human skeleton extraction sub-module extracts the human body key points of the target person from the image by using a convolutional neural network, and connects the human body key points belonging to the same target person by using part affinity fields, so as to extract the human skeleton of the target person.
S302.3: the human skeleton of the target person is matched to the motion trail of the target person through the human skeleton sequence construction sub-module, so as to construct the human skeleton sequence of the target person.
S302.4: behavioral abnormality analysis is performed on the target person through the human skeleton sequence analysis sub-module; the human skeleton sequence of the target person is matched against the human skeleton sequence templates corresponding to various abnormal behaviors by using a dynamic time warping algorithm to obtain a matching score.
If the matching score is larger than a preset threshold, the behavior of the target person is abnormal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Compared with the prior art, the application has the beneficial effects that:
1. by predicting the escape route of the target person, security centers through which the escape route passes are identified, and the security centers are automatically reminded to make preparation for departure, so that the security center nearest to the escape route can timely respond and take action in the escape process of the target person, and the security centers can deploy security personnel more rapidly.
2. By establishing the correspondence between faces and audio, judging whether sensitive words exist in the audio, and locking the corresponding face as the target person, the behavior of the target person in the monitoring video can be analyzed and understood more comprehensively, providing more accurate security analysis and alarms.
3. Whether the emotion and the behavior of the target person are simultaneously abnormal is judged through expression recognition and analysis of the target person and behavioral abnormality recognition of the target person, and an alarm is then triggered or corresponding measures are taken, which improves the intelligence and accuracy of video monitoring and greatly improves the accuracy of security monitoring alarms.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below are merely exemplary, and that those of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a block diagram of a system of the present application.
FIG. 2 is a flow chart of the operation of the control module of the present application.
FIG. 3 is a flow chart of the early warning method of the present application.
Fig. 4 is a circuit diagram of a control module according to the present application.
Fig. 5 is a circuit diagram of an audio acquisition sub-module of the present application.
In the figures: 1. control module; 2. memory; 21. route memory; 22. audio memory; 23. face memory; 24. behavior memory; 3. early warning module; 31. positioning sub-module; 32. scene extraction sub-module; 4. audio module; 41. audio acquisition sub-module; 42. audio separation sub-module; 43. audio recognition sub-module; 44. audio matching sub-module; 5. face recognition module; 51. face acquisition sub-module; 52. face division sub-module; 53. feature extraction sub-module; 54. lip recognition sub-module; 55. emotion classification sub-module; 6. behavior recognition module; 61. motion trail construction sub-module; 62. human skeleton extraction sub-module; 63. human skeleton sequence construction sub-module; 64. human skeleton sequence analysis sub-module; 7. rotation module; 8. warning module; 81. horn sub-module; 82. warning lamp sub-module.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus consistent with some aspects of the disclosure as detailed in the appended claims.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, the system for regulating and controlling a camera based on artificial intelligence according to the embodiment of the application comprises a control module 1, a memory 2, an early warning module 3, an audio module 4, a face recognition module 5, a behavior recognition module 6, a rotation module 7 and a warning module 8, wherein the control module 1 is in bidirectional connection with the memory 2, the early warning module 3 is in bidirectional connection with the memory 2, the audio module 4 is in bidirectional connection with the memory 2, the face recognition module 5 is in bidirectional connection with the memory 2, the behavior recognition module 6 is in bidirectional connection with the memory 2, the output end of the control module 1 is connected with the input end of the rotation module 7, and the output end of the control module 1 is connected with the input end of the warning module 8.
The memory 2 is in bidirectional connection with the control module 1, and the early warning module 3, the audio module 4, the face recognition module 5 and the behavior recognition module 6 realize connection communication through the memory 2.
The memory 2 includes a route memory 21, an audio memory 22, a face memory 23, and a behavior memory 24, where data generated by the early warning module 3 and historical data for predicting routes are stored in the route memory 21, audio data generated by the audio module 4 and sensitive words are stored in the audio memory 22, face data generated by the face recognition module 5 and a plurality of face data with emotion labels are stored in the face memory 23, and abnormal behavior data generated by the behavior recognition module 6 and human skeleton sequence templates of various abnormal behaviors are stored in the behavior memory 24.
The historical data for the predicted route includes initial location information of a plurality of historical target persons, an escape route, and a two-dimensional model of an escape scene.
The control module 1 retrieves the initial position information, the escape routes and the two-dimensional models of the escape scenes from the route memory 21, trains a deep learning network on this historical data to construct a route prediction model, and stores the route prediction model in the route memory 21.
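As an illustration only: the patent does not specify the network architecture, so the following is a minimal training sketch under assumptions of our own. A small PyTorch multilayer perceptron maps an initial position plus a flattened two-dimensional scene grid to a fixed number of route waypoints; GRID, K and the loader interface are hypothetical.

```python
import torch
import torch.nn as nn

GRID = 32   # assumed resolution of the rasterized escape-scene model
K = 8       # assumed number of predicted waypoints per escape route

class RoutePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + GRID * GRID, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, K * 2),           # K (x, y) waypoints
        )

    def forward(self, pos, scene):
        # pos: (batch, 2) initial positions; scene: (batch, GRID, GRID) models
        x = torch.cat([pos, scene.flatten(1)], dim=1)
        return self.net(x).view(-1, K, 2)

def train_route_model(model, loader, epochs=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()                   # regress historical route waypoints
    for _ in range(epochs):
        for pos, scene, route in loader:     # historical samples from route memory
            opt.zero_grad()
            loss_fn(model(pos, scene), route).backward()
            opt.step()
    return model
```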
The early warning module 3 includes a positioning sub-module 31 and a scene extraction sub-module 32.
The audio module 4 comprises an audio acquisition sub-module 41, an audio separation sub-module 42, an audio identification sub-module 43 and an audio matching sub-module 44, wherein the audio acquisition sub-module 41 is composed of a microphone array.
The face recognition module 5 comprises a face acquisition sub-module 51, a face division sub-module 52, a feature extraction sub-module 53, a lip recognition sub-module 54 and an emotion classification sub-module 55.
The behavior recognition module 6 comprises a motion trail construction sub-module 61, a human skeleton extraction sub-module 62, a human skeleton sequence construction sub-module 63 and a human skeleton sequence analysis sub-module 64.
The rotating module 7 is connected with the camera, and the camera is driven to rotate together when the rotating module 7 is started.
The warning module 8 comprises a speaker sub-module 81 and a warning light sub-module 82.
Referring to fig. 2-3, the camera regulation and control method based on artificial intelligence comprises a rotation method, a target locking method, an abnormality judgment method and an early warning method; the early warning method is based on a geographic information system and is used for reminding security personnel to prepare for departure.
The workflow of the control module 1 of the present application is as follows:
step S1: the rotation method is performed.
Step S2: and executing the target locking method to lock the target person.
Step S3: and executing the abnormality judgment method.
Step S4: and judging whether the target person is abnormal or not.
Step S5: executing the early warning method and starting the warning module 8.
The early warning method is designed on the basis of a geographic information system and is used for reminding security personnel to prepare for departure.
The early warning method comprises the following steps:
step S101: the location sub-module 31 determines the location information of the target person, determines the location information as target location information, sends the target location information to the geographic information system, and marks the target location information on a map.
The positioning sub-module 31 may be a Global Positioning System (GPS).
Step S102: the scene extraction sub-module 32 extracts information such as geographic features and road networks of the scene where the target person is located through map data of the target position information in the geographic information system, and generates a corresponding target two-dimensional model.
Step S103: the control module 1 invokes the target position information, the target two-dimensional model and the route prediction model, takes the target position information and the target two-dimensional model as inputs of the route prediction model, outputs a plurality of target routes for the target person to possibly escape, sends the plurality of target routes to the geographic information system, and marks the plurality of target routes on the map.
Step S104: and the geographic information system performs distance measurement on the predicted target routes and security center positions marked in the geographic information system.
Step S105: the geographic information system marks a plurality of security centers meeting the distance requirements of a plurality of target routes according to a set distance threshold, and the security center meeting the distance requirements is determined as a target security center.
Step S106: the geographic information system sends information of a plurality of target security centers to the control module 1, and the control module 1 sends information ready for departure to the plurality of target security centers in a wireless communication mode according to the information of the target security centers.
The early warning method ensures that the security centers closest to the plurality of target routes can respond and take action in time, so that security personnel can be deployed more rapidly. This working mode solves the prior-art problem that the security center closest to the target person is reminded only after the target person is identified, whereas the target person may change the escape route or choose a longer one, so that a security center dispatching personnel based only on the target person's position may miss the opportunity to catch the target person.
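A minimal sketch of steps S104-S106 under assumptions of our own: routes and security centers are given in planar map coordinates, Euclidean distance stands in for the geographic information system's own measurement, and the function and parameter names are hypothetical.

```python
import math

def target_security_centers(routes, centers, threshold):
    """routes: list of [(x, y), ...] waypoints; centers: {name: (x, y)}."""
    targets = set()
    for name, (cx, cy) in centers.items():
        for route in routes:
            # a center qualifies if any waypoint of any route is within threshold
            if any(math.hypot(px - cx, py - cy) <= threshold for px, py in route):
                targets.add(name)
                break
    return targets

def notify_centers(targets, send):
    for name in targets:                      # step S106: wireless notification
        send(name, "prepare for departure")
```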
The rotation method is as follows: audio data are acquired through the microphone array of the audio acquisition sub-module 41 and transmitted through the audio memory 22 to the control module 1 for processing and analysis; the microphone that acquires the audio data with the largest sound energy is obtained, and its azimuth information is determined as the azimuth from which the audio data was emitted; the azimuth is sent to the control module 1 through the audio memory 22, and the control module 1 sends a command to the rotation module 7, so that the camera turns toward the position where the audio data occurred.
The rotation module 7 can rotate 360 degrees, realizing omnidirectional monitoring without dead angles, covering a wider area and providing more comprehensive monitoring information. The control module 1 also controls the rotation module 7 to rotate at regular intervals, so as to prevent abnormal persons from appearing in the camera's blind angles while the audio acquisition sub-module 41 has not collected any audio there.
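A minimal sketch of the energy-based direction selection, assuming one synchronized capture window per microphone and a fixed, known bearing per array element; the frame layout and the pan command are assumptions, not the patent's interface.

```python
import numpy as np

MIC_BEARINGS = [0, 60, 120, 180, 240, 300]  # assumed azimuth (degrees) per microphone

def loudest_bearing(frames):
    """frames: array of shape (n_mics, n_samples) for one capture window."""
    energy = (frames.astype(np.float64) ** 2).sum(axis=1)  # per-microphone energy
    return MIC_BEARINGS[int(np.argmax(energy))]

def rotate_camera(frames, pan_to):
    pan_to(loudest_bearing(frames))  # command the rotation module to that azimuth
```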
The target locking method comprises the following steps:
Step S201: the audio separation sub-module 42 separates the audio data into a plurality of single-person audio data, and determines the plurality of single-person audio data as an audio data set.
Step S202: the audio recognition sub-module 43 recognizes whether the audio data set contains sensitive words, and the single-person audio data containing a sensitive word is determined as the target audio data.
Step S203: the audio recognition sub-module 43 extracts time-domain and frequency-domain features of the target audio data as Mel-frequency cepstral coefficients to obtain a plurality of audio feature vectors, and averages the audio feature vectors to obtain a first feature vector, wherein the first feature vector is a one-dimensional array representing the information of the target audio data.
Step S204: the face data in the image is acquired through the face acquisition sub-module 51, the face data is divided into a plurality of face data through the face division sub-module 52, and the plurality of face data are determined to be a face data set.
Step S205: the feature extraction submodule 53 extracts five-element information of a plurality of face data through a convolutional neural network model, each face data extracts a plurality of face feature vectors, a second feature vector is obtained after the face feature vectors are subjected to mean pooling, the second feature vector is a one-dimensional array, and the second feature vector represents face information of the face data.
Step S206: and the lip recognition sub-module 54 recognizes the lip actions of a plurality of face data to obtain a plurality of lip data, and determines the lip data as a lip data set.
The lip recognition submodule 54 extracts the shape, the change speed and the motion track of a plurality of lip data through a convolutional neural network model, each lip data is extracted to obtain a plurality of lip feature vectors, the lip feature vectors are subjected to mean pooling to obtain a third feature vector, the third feature vector is a one-dimensional array, and the third feature vector represents information obtained by recognizing lip actions in an image.
Step S207: the first feature vector, the plurality of second feature vectors, and the plurality of third feature vectors are input into the audio matching submodule 44.
The audio matching sub-module 44 comprises a sound matching model and a lip synchronization model, where the sound matching model may be a neural network model, a Bayesian network model or a support vector machine model. The sound matching model calculates the distance between the first feature vector and each of the second feature vectors to obtain sound matching scores; the smaller the distance, the higher the sound matching score and the higher the degree of matching between the audio and the face.
The lip synchronization model calculates the similarity between the first feature vector and each of the third feature vectors through cosine similarity to obtain lip synchronization scores; the higher the lip synchronization score, the higher the similarity and the better the synchronization between the lips and the audio.
The face data scoring highest on both the sound matching score and the lip synchronization score is determined as the target face data, and the person corresponding to the target face data is determined as the target person, so that the audio of the target person can be obtained independently in a scene where a multi-person conversation appears in the image. This working mode solves the prior-art problem that, in such a scene, the audio of the target person cannot be acquired independently, which makes analysis of the target person in the image very difficult.
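A minimal sketch of the matching step under assumptions of our own: the three kinds of feature vectors are taken to share one embedding dimension (the patent leaves this open), the sound score is a simple inverse-distance transform rather than a learned model, and the equal weighting of the two scores is arbitrary.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def lock_target(first_vec, second_vecs, third_vecs, w=0.5):
    """Return the index of the face whose combined score is highest."""
    sound = [1.0 / (1.0 + np.linalg.norm(first_vec - v)) for v in second_vecs]
    lips = [cosine(first_vec, v) for v in third_vecs]
    combined = [w * s + (1.0 - w) * l for s, l in zip(sound, lips)]
    return int(np.argmax(combined))   # index of the target face data
```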
The abnormality judgment method includes the steps of:
step S301: and judging whether the emotion of the target person is abnormal or not.
S301.1: and extracting a plurality of feature vectors from the target face data through the feature extraction submodule 53, and carrying out mean pooling to obtain a fourth feature vector, wherein the fourth feature vector is a one-dimensional array.
S301.2: model training is performed on the face data with the emotion labels by using a deep learning algorithm, a feature template vector is obtained by carrying out mean pooling on a plurality of feature vectors obtained in the training process, the feature template vector is a one-dimensional array, and in the emotion classification sub-module 55, the fourth feature vector is multiplied by the feature module vector.
The numerical value generated by multiplying the fourth feature vector and the feature module vector is any subinterval of [0,1 ], [1,2 ], [2,3 ], [3,4 ], [4,5 ], [5,6 ], [6, 7), and the classification result of the subinterval is anger, light, aversion, fear, happiness, sadness and surprise.
The emotion of the target person is either anger or aversion, namely the numerical value generated by multiplying the fourth eigenvector and the eigenvector is in any interval of [0, 1), [2, 3), and the emotion of the target person is abnormal.
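A minimal sketch of this interval-based classification, assuming the multiplication of the two vectors is their inner product and that the result falls in [0, 7) as the description implies; the boundary clipping is a defensive addition of our own.

```python
import numpy as np

EMOTIONS = ["anger", "contempt", "disgust", "fear",
            "happiness", "sadness", "surprise"]

def classify_emotion(fourth_vec, template_vec):
    value = float(np.dot(fourth_vec, template_vec))  # assumed to lie in [0, 7)
    label = EMOTIONS[min(max(int(value), 0), 6)]     # bin into the seven subintervals
    abnormal = (0.0 <= value < 1.0) or (2.0 <= value < 3.0)  # anger or disgust
    return label, abnormal
```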
Step S302: and judging whether the behavior of the target person is abnormal or not.
S302.1: the motion trajectory construction sub-module 61 tracks the target face data through a target tracking algorithm (KCF) to construct a motion trajectory of the target person.
S302.2: the two-dimensional human body posture estimation needs to find human body key points to establish a behavior characteristic model, the joint points of each human body part serve as key points to well reflect the action posture of a human body, and the human body skeleton extraction submodule 62 extracts the human body key points of the target person from an image by using a convolutional neural network.
Each segment of body skeleton corresponds to a partial affinity domain diagram, the partial affinity domain is a set of two-dimensional vector fields, the size of the partial affinity domain is consistent with that of the original diagram, each point of the human body key points is a two-dimensional vector, namely a horizontal direction component and a vertical direction component, the two-dimensional vector represents the position and the direction of each segment of body skeleton, and the human body key points belonging to the whole target person can be connected by utilizing the partial affinity domain to extract the human body skeleton of the target person.
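A minimal sketch of part-affinity-field association, a simplified variant of the OpenPose-style step described above: a candidate limb between two keypoints is scored by averaging the dot product between the field vectors sampled along the segment and the segment's unit direction, and high-scoring pairs are connected greedily. The (2, H, W) field layout and the threshold are assumptions.

```python
import numpy as np

def limb_score(paf, p1, p2, n_samples=10):
    """paf: (2, H, W) part affinity field; p1, p2: (x, y) keypoint candidates."""
    d = np.subtract(p2, p1, dtype=np.float64)
    norm = np.linalg.norm(d)
    if norm < 1e-6:
        return 0.0
    u = d / norm                                 # unit direction of the limb
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):   # sample points along the segment
        x, y = (np.asarray(p1) + t * d).astype(int)
        score += paf[0, y, x] * u[0] + paf[1, y, x] * u[1]
    return score / n_samples

def connect_limbs(paf, starts, ends, thresh=0.4):
    """Greedily pair keypoint candidates whose limb score exceeds the threshold."""
    pairs = sorted(((limb_score(paf, a, b), a, b) for a in starts for b in ends),
                   reverse=True)
    used_a, used_b, limbs = set(), set(), []
    for s, a, b in pairs:
        if s >= thresh and a not in used_a and b not in used_b:
            limbs.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return limbs
```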
S302.3: the human skeleton of the target person is matched with the motion trail of the target person through the human skeleton sequence constructing submodule 63, so that the human skeleton sequence of the target person is constructed.
S302.4: the analysis sub-module 64 of human skeleton sequence analyzes the behavior abnormality of the target person, and matches the human skeleton sequence of the target person with human skeleton sequence templates corresponding to various abnormal behaviors by using a dynamic time warping algorithm to obtain a matching score, wherein the matching score is larger than a matching threshold, and the behavior abnormality of the target person is stored in the behavior memory 24 in advance.
Step S301 and step S302 are performed simultaneously.
The step S4 specifically includes: if both the emotion and the behavior of the target person are abnormal, the early warning method is executed; if only one of the emotion and the behavior of the target person is abnormal, the rotation method is executed.
By judging, through expression recognition and analysis of the target person and behavioral abnormality recognition of the target person, whether the emotion and the behavior of the target person are simultaneously abnormal, and only then executing the early warning method and starting the warning module 8, the intelligence and accuracy of video monitoring are improved and the accuracy of security monitoring alarms is greatly improved. This working mode solves the prior-art problem that only behavioral abnormality is usually considered while the influence of emotion on behavior is ignored, which may reduce accuracy in abnormality identification, so that potential threats are not perceived in time, alarms are not triggered or measures are not taken promptly, and alarm accuracy is greatly reduced.
The camera is remotely connected with a user terminal, so that a user can view the video monitoring in real time on a mobile phone or a computer. The distance threshold used in the early warning method is set at the user terminal and stored in the route memory 21. Even when the camera does not recognize anyone as abnormal and the warning module 8 is therefore not started, the user can remotely start the warning module 8 to drive away a suspicious person.
Referring to fig. 4-5, the camera regulation system and method based on artificial intelligence is a circuit diagram of the control module 1 and the audio acquisition sub-module 41.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope of the disclosure being indicated by the following claims.

Claims (8)

1. The camera regulation and control system based on artificial intelligence is characterized by comprising a control module (1), a memory (2) and an early warning module (3), wherein the control module (1) is in bidirectional connection with the memory (2), and the early warning module (3) is in bidirectional connection with the memory (2);
the early warning module (3) comprises a positioning sub-module (31) and a scene extraction sub-module (32);
the memory (2) comprises a route memory (21), and the data generated by the early warning module (3) and the historical data for predicting the route are stored in the route memory (21);
the historical data comprises initial position information of a plurality of historical target characters, an escape route and a two-dimensional model of an escape scene;
the control module (1) constructs a route prediction model from the historical data, the route prediction model being stored in the route memory (21).
2. The camera regulation and control system based on artificial intelligence according to claim 1, further comprising an audio module (4), a face recognition module (5) and a rotation module (7), wherein the audio module (4) is bidirectionally connected with the memory (2), the face recognition module (5) is bidirectionally connected with the memory (2), and an output end of the control module (1) is connected with an input end of the rotation module (7);
the memory (2) also comprises an audio memory (22) and a face memory (23);
the audio module (4) comprises an audio acquisition sub-module (41), an audio separation sub-module (42), an audio identification sub-module (43) and an audio matching sub-module (44), wherein the audio acquisition sub-module (41) consists of a microphone array;
the face recognition module (5) comprises a face acquisition sub-module (51), a face division sub-module (52), a feature extraction sub-module (53) and a lip recognition sub-module (54);
the audio data and sensitive words generated by the audio module (4) are stored in the audio memory (22), and the face data generated by the face recognition module (5) are stored in the face memory (23);
the rotating module (7) is connected with the camera.
3. An artificial intelligence based camera regulation system according to claim 2, wherein the face recognition module (5) further comprises an emotion classification sub-module (55), and a plurality of face data with emotion labels are stored in the face memory (23).
4. The camera regulation and control system based on artificial intelligence according to claim 1, further comprising a behavior recognition module (6), wherein the behavior recognition module (6) is bi-directionally connected with the memory (2);
the memory (2) further comprises a behavior memory (24);
the behavior recognition module (6) comprises a motion trail construction sub-module (61), a human skeleton extraction sub-module (62), a human skeleton sequence construction sub-module (63) and a human skeleton sequence analysis sub-module (64);
abnormal behavior data generated by the behavior recognition module (6) and human skeleton sequence templates corresponding to various abnormal behaviors are stored in the behavior memory (24).
5. The camera regulation and control system based on artificial intelligence according to claim 1, further comprising a warning module (8), wherein the output end of the control module (1) is connected with the input end of the warning module (8), and the warning module (8) comprises a horn sub-module (81) and a warning lamp sub-module (82).
6. The camera regulation and control method based on artificial intelligence, applied to the camera regulation and control system based on artificial intelligence according to any one of claims 1 to 5, characterized by comprising a rotation method, a target locking method, an abnormality judgment method and an early warning method;
the working flow of the control module (1) is as follows:
step S1: performing the rotation method;
step S2: executing the target locking method to lock the target person;
step S3: executing the abnormality judgment method;
step S4: judging whether the target person is abnormal or not;
step S5: executing the early warning method and starting the warning module (8);
the early warning method comprises the following steps:
step S101: determining target position information of the target person and marking the target position information on a map;
step S102: extracting characteristic information of a scene where the target person is located, and generating a target two-dimensional model;
step S103: acquiring a plurality of target routes which the target person possibly escapes from, and marking the target routes on the map;
step S104: measuring the distances between the predicted target routes and the security center position on the map;
step S105: acquiring a plurality of target security centers meeting the distance requirement according to the set distance threshold;
step S106: sending information for preparing departure to a plurality of target security centers;
and when the early warning method is executed, the control module (1) controls the warning module (8) to start.
7. The camera regulation and control method based on artificial intelligence according to claim 6, wherein the rotation method specifically comprises: acquiring audio data through the microphone array, transmitting the audio data through the audio memory (22) to the control module (1) for processing and analysis, obtaining the microphone that acquires the audio data with the largest sound energy, determining the azimuth information of that microphone as the azimuth from which the audio data was emitted, sending the azimuth to the control module (1) through the audio memory (22), and the control module (1) sending a command to the rotation module (7) so that the camera turns toward the position where the audio data occurred;
the control module (1) controls the rotation module (7) to rotate at fixed time;
the target locking method comprises the following steps:
step S201: the audio separation sub-module (42) separates the audio data into a plurality of single-person audio data, and determines the plurality of single-person audio data as an audio data set;
step S202: the audio recognition sub-module (43) recognizes whether the audio data set contains sensitive words, and the single-person audio data containing a sensitive word is determined as the target audio data;
step S203: the audio recognition sub-module (43) extracts a time domain and a frequency domain of the target audio data to obtain a plurality of audio feature vectors, and processes the plurality of audio feature vectors to obtain a first feature vector, wherein the first feature vector is a one-dimensional array, and the first feature vector represents information of the target audio data;
step S204: acquiring face data in an image through the face acquisition submodule (51), dividing the face data into a plurality of face data through the face dividing submodule (52), and determining the plurality of face data as a face data set;
step S205: the feature extraction sub-module (53) extracts the facial-feature information of each of the plurality of face data; a plurality of face feature vectors are extracted from each face data and processed to obtain a second feature vector, wherein the second feature vector is a one-dimensional array representing the face information of that face data;
step S206: the lip recognition sub-module (54) recognizes the lip actions of a plurality of face data to obtain a plurality of lip data, and the lip data are determined to be a lip data set;
the lip recognition sub-module (54) extracts the shape, the change speed and the movement track of a plurality of lip data respectively, each lip data extracts a plurality of lip feature vectors, the lip feature vectors are processed to obtain a third feature vector, the third feature vector is a one-dimensional array, and the third feature vector represents information obtained by recognizing lip actions in an image;
step S207: inputting the first feature vector, the plurality of second feature vectors, and the plurality of third feature vectors into the audio matching sub-module (44);
the audio matching sub-module (44) comprises a sound matching model and a lip synchronization model; the distance between the first feature vector and each of the second feature vectors is calculated through the sound matching model to obtain sound matching scores, and the similarity between the first feature vector and each of the third feature vectors is calculated through the lip synchronization model to obtain lip synchronization scores; the face data scoring highest on both the sound matching score and the lip synchronization score is determined as the target face data, and the person corresponding to the target face data is determined as the target person;
the abnormality judgment method includes the steps of:
step S301: judging whether the emotion of the target person is abnormal or not;
step S302: judging whether the behavior of the target person is abnormal or not;
the step S301 is performed simultaneously with the step S302;
the step S4 specifically includes: if both the emotion and the behavior of the target person are abnormal, executing the early warning method and starting the warning module (8); if only one of the emotion and the behavior of the target person is abnormal, the target person is regarded as normal, and the rotation method is executed.
8. The camera control method based on artificial intelligence according to claim 7, wherein the step S301 comprises the steps of:
S301.1: extracting a plurality of feature vectors from the target face data through the feature extraction sub-module (53), and processing the plurality of feature vectors to obtain a fourth feature vector, wherein the fourth feature vector is a one-dimensional array;
S301.2: performing model training on the plurality of face data with emotion labels, and processing the plurality of feature vectors obtained in the training process to obtain a feature template vector, wherein the feature template vector is a one-dimensional array; in the emotion classification sub-module (55), the fourth feature vector is multiplied by the feature template vector;
the numerical value generated by multiplying the fourth feature vector by the feature template vector falls in one of the subintervals [0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6), [6, 7), whose classification results are anger, contempt, disgust, fear, happiness, sadness and surprise respectively;
if the numerical value generated by multiplying the fourth feature vector by the feature template vector falls in either of the intervals [0, 1) or [2, 3), the emotion of the target person is abnormal;
the step S302 includes the steps of:
S302.1: the motion trail construction sub-module (61) tracks the target face data through a target tracking algorithm to construct the motion trail of the target person;
S302.2: the human skeleton extraction sub-module (62) extracts the human body key points of the target person from an image by using a convolutional neural network, and connects the human body key points belonging to the same target person by using part affinity fields, so as to extract the human skeleton of the target person;
S302.3: matching the human skeleton of the target person to the motion trail of the target person through the human skeleton sequence construction sub-module (63) to construct the human skeleton sequence of the target person;
S302.4: performing behavioral abnormality analysis on the target person through the human skeleton sequence analysis sub-module (64), and matching the human skeleton sequence of the target person against the human skeleton sequence templates corresponding to various abnormal behaviors by using a dynamic time warping algorithm to obtain a matching score;
if the matching score is larger than a preset threshold, the behavior of the target person is abnormal.
CN202311341381.5A 2023-10-17 2023-10-17 Camera regulation and control system and method based on artificial intelligence Pending CN117218324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311341381.5A CN117218324A (en) 2023-10-17 2023-10-17 Camera regulation and control system and method based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN117218324A (en) 2023-12-12

Family

ID=89037260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311341381.5A Pending CN117218324A (en) 2023-10-17 2023-10-17 Camera regulation and control system and method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN117218324A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090324010A1 (en) * 2008-06-26 2009-12-31 Billy Hou Neural network-controlled automatic tracking and recognizing system and method
CN104954673A (en) * 2015-06-11 2015-09-30 广东欧珀移动通信有限公司 Camera rotating control method and user terminal
CN109460702A (en) * 2018-09-14 2019-03-12 华南理工大学 Passenger's abnormal behaviour recognition methods based on human skeleton sequence
CN109672853A (en) * 2018-09-25 2019-04-23 深圳壹账通智能科技有限公司 Method for early warning, device, equipment and computer storage medium based on video monitoring
US20190251970A1 (en) * 2018-02-15 2019-08-15 DMAI, Inc. System and method for disambiguating a source of sound based on detected lip movement
KR20190119863A (en) * 2018-04-13 2019-10-23 인하대학교 산학협력단 Video-based human emotion recognition using semi-supervised learning and multimodal networks
CN111127830A (en) * 2018-11-01 2020-05-08 奇酷互联网络科技(深圳)有限公司 Alarm method, alarm system and readable storage medium based on monitoring equipment
CN111476174A (en) * 2020-04-09 2020-07-31 北方工业大学 Face image-based emotion recognition method and device
CN112132315A (en) * 2020-08-18 2020-12-25 华为技术有限公司 Escape route prediction method and deployment and control platform of target object
CN113313062A (en) * 2021-06-18 2021-08-27 京东方科技集团股份有限公司 Path acquisition method, device, system, electronic equipment and storage medium
CN115131405A (en) * 2022-07-07 2022-09-30 沈阳航空航天大学 Speaker tracking method and system based on multi-mode information
CN116052077A (en) * 2022-12-30 2023-05-02 惠州学院 Image recognition alarm method and system based on artificial intelligence


Similar Documents

Publication Publication Date Title
KR102196686B1 (en) Identity authentication method and device
CN111601074A (en) Security monitoring method and device, robot and storage medium
KR20190049401A (en) Identity authentication method, terminal equipment and computer readable storage medium
US20170255832A1 (en) Method and System for Detecting Actions in Videos
CN109241829B (en) Behavior identification method and device based on space-time attention convolutional neural network
CN109214261B (en) Method and system for training neural networks to classify objects or events
CN109934127B (en) Pedestrian identification and tracking method based on video image and wireless signal
JP2001092974A (en) Speaker recognizing method, device for executing the same, method and device for confirming audio generation
CN111968152B (en) Dynamic identity recognition method and device
CN112307868B (en) Image recognition method, electronic device, and computer-readable medium
US11315349B2 (en) Method, apparatus and device for identifying passenger state in unmanned vehicle, and storage medium
McCoy et al. Ensemble deep learning for sustainable multimodal uav classification
CN110599129A (en) Campus attendance checking method, device, identification terminal and system based on image tracking
Sarin et al. Cnn-based multimodal touchless biometric recognition system using gait and speech
CN112668493B (en) Reloading pedestrian re-identification, positioning and tracking system based on GAN and deep learning
Foytik et al. Tracking and recognizing multiple faces using Kalman filter and ModularPCA
CN110800053A (en) Method and apparatus for obtaining event indications based on audio data
CN117218324A (en) Camera regulation and control system and method based on artificial intelligence
CN111695404A (en) Pedestrian falling detection method and device, electronic equipment and storage medium
CN111476198A (en) Gait recognition method, device and system based on artificial intelligence, storage medium and server
CN114360058B (en) Cross-view gait recognition method based on walking view prediction
CN116168438A (en) Key point detection method and device and electronic equipment
CN114333079A (en) Method, device, equipment and storage medium for generating alarm event
CN114663796A (en) Target person continuous tracking method, device and system
Yang et al. AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination