CN109672853B - Early warning method, device and equipment based on video monitoring and computer storage medium

Info

Publication number: CN109672853B
Application number: CN201811120943.2A
Authority: CN
Inventors: 夏新
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2018-09-25
Filing date: 2018-09-25
Publication date: 2022-05-17
Anticipated expiration: 2038-09-25
Also published as: CN109672853A

Abstract

The invention discloses a video monitoring-based early warning method, which comprises the following steps: receiving audio data of a camera at a monitoring site; extracting features of the audio data and judging whether a case occurs according to the features; when a case occurs, determining the sound source direction of the audio data; and aligning the camera at the monitoring site to the sound source direction to shoot a video of the sound source direction and generate early warning information. The invention also discloses an early warning device based on video monitoring, early warning equipment based on video monitoring and a computer storage medium. According to the invention, whether a case occurs is judged from the features of the audio data of the monitoring site, and when a case occurs, a video of the sound source direction of the case is shot and early warning information is generated.

Description

Early warning method, device and equipment based on video monitoring and computer storage medium

Technical Field

The invention relates to the technical field of computers, in particular to an early warning method based on video monitoring, a video monitoring device, video monitoring equipment and a computer storage medium.

Background

With the development of image processing technology and network technology, a public security system deploys a large number of cameras at street intersections in the jurisdiction range, and image data transmitted by the cameras becomes an important source for solving a case and obtaining evidence. In the prior art, videos transmitted by a high-definition camera are monitored and analyzed in real time, cases are discovered in real time, and early warning information is sent, but due to the fact that the number of the cameras is large and the data volume of the high-definition videos is huge, the problems of low processing efficiency, low information timeliness and the like are faced.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide an early warning method based on video monitoring, an early warning device based on video monitoring, early warning equipment based on video monitoring and a computer storage medium, and aims to solve the technical problem that in the prior art, the field monitoring processing efficiency of a high-definition camera is low.

In order to achieve the above object, the present invention provides a video monitoring-based early warning method, which comprises the following steps:

receiving audio data of a camera in a monitoring site;

extracting the characteristics of the audio data, and judging whether a case occurs according to the characteristics;

when a case is judged to occur, determining the sound source direction of the audio data;

and aligning the camera of the monitoring site to the sound source direction, and shooting the video of the sound source direction to generate early warning information.

Preferably, the step of extracting the feature of the audio data and determining whether there is a case according to the feature includes:

determining a signal-to-noise ratio of the audio data;

when the signal-to-noise ratio of the audio data exceeds a preset signal-to-noise ratio threshold value, converting the audio data into character information;

judging whether the character information contains preset keywords or not;

and when the text information contains the preset keywords, judging that a case occurs.

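Preferably, the step of extracting the feature of the audio data and determining whether there is a case according to the feature includes:

determining a noise power of the audio data;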

and when the noise power of the audio data exceeds a preset noise power threshold value, judging that a case occurs.

Preferably, after the step of determining the sound source direction of the audio data when a case occurs, the method further includes:

and adjusting the shooting parameters of the camera on the monitoring site, wherein the definition of the shooting parameters after adjustment is higher than that of the shooting parameters before adjustment.

Preferably, after the step of adjusting the shooting parameters of the camera in the monitoring site, in which the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment, the method further includes:

acquiring a network identifier of an adjacent monitoring camera, wherein the distance between the adjacent monitoring camera and the camera of the monitoring site is within a preset distance range;

and sending a message to the adjacent monitoring camera according to the network identifier so as to inform the adjacent monitoring camera to adjust the shooting parameters, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment.

Preferably, after the step of sending a message to the adjacent monitoring camera according to the network identifier to notify the adjacent monitoring camera to adjust the shooting parameters, in which the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment, the method further includes:

determining the duration time of the case according to the time of the case occurrence and the current time;

and when the case duration is greater than a preset case duration threshold, recovering the shooting parameters of the monitoring field camera and the adjacent monitoring camera.

Preferably, the step of aligning the camera of the monitoring site with the sound source direction and shooting the video of the sound source direction to generate early warning information includes:

acquiring image data of the monitoring field camera;

recognizing a face picture from the image data, and extracting face features from the face picture;

judging whether the face features are matched with preset face features or not;

when the face features are matched with preset face features, acquiring early warning grade information corresponding to the preset face features;

and generating early warning information containing the early warning grade information.

In addition, in order to achieve the above object, the present invention further provides a video monitoring-based early warning apparatus, which includes:

the receiving module is used for receiving audio data of a camera in a monitoring site;

the characteristic processing module is used for extracting the characteristics of the audio data and judging whether a case occurs according to the characteristics;

the sound source estimation module is used for determining the sound source direction of the audio data when a case is judged to occur;

and the early warning module is used for aligning the camera of the monitoring site to the sound source direction and shooting the video of the sound source direction so as to generate early warning information.

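In addition, to achieve the above object, the present invention further provides video monitoring-based early warning equipment, including: a memory, a processor, and a video monitoring-based early warning processing program stored in the memory and executable on the processor, wherein the video monitoring-based early warning processing program, when executed by the processor, implements the steps of the video monitoring-based early warning method described above.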

In addition, in order to achieve the above object, the present invention further provides a computer storage medium, wherein the computer storage medium stores an early warning processing program based on video monitoring, and the early warning processing program based on video monitoring implements the steps of the early warning processing method based on video monitoring when being executed by a processor.

The early warning method based on video monitoring, the early warning device based on video monitoring, the early warning equipment based on video monitoring and the computer storage medium provided by the embodiment of the invention receive the audio data of the camera of the monitoring site, extract the features of the audio data, and judge whether a case occurs according to the features. When a case is judged to occur, the sound source direction of the audio data is determined, the camera of the monitoring site is aligned to the sound source direction, and the video of the sound source direction is shot to generate early warning information. Because whether a case occurs is judged from the features of the audio data of the monitoring site, and the video of the sound source direction is shot and early warning information is generated only when a case occurs, more accurate monitoring video related to the case is obtained in time, and the processing efficiency of the on-site monitoring video can be improved.

Drawings

FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a first embodiment of the video surveillance-based early warning method according to the present invention;

FIG. 3 is a flowchart illustrating a second embodiment of the video surveillance-based early warning method according to the present invention;

FIG. 4 is a schematic flow chart of a video surveillance-based early warning method according to a third embodiment of the present invention;

FIG. 5 is a schematic functional block diagram of an embodiment of the video surveillance-based early warning apparatus according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in FIG. 1, FIG. 1 is a schematic structural diagram of a server (also called an event processing device, where the event processing device may be formed by a single event processing apparatus, or may be formed by combining other apparatuses with the event processing apparatus) in a hardware operating environment according to an embodiment of the present invention.

The server in the embodiment of the invention refers to a computer that manages resources and provides services for users; servers are generally divided into file servers, database servers and application servers. A computer or computer system running such software is also referred to as a server. Compared with a common PC (personal computer), a server has higher requirements on stability, security, performance and the like. As shown in FIG. 1, the server may include a processor 1001 (for example, a Central Processing Unit, CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002, as well as hardware such as a chipset, a disk system and network hardware. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WiFi, Wireless Fidelity, interface). The memory 1005 may be a Random Access Memory (RAM) or a non-volatile memory (e.g., a disk memory). The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.

Optionally, the server may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, and a WiFi module. The input unit may be, for example, a display screen or a touch screen. Besides WiFi, the wireless network interface may be Bluetooth, a probe, or 3G/4G/5G networking base station equipment (the digit indicates the generation of the cellular mobile communication network, and the letter G stands for generation). Those skilled in the art will appreciate that the server architecture shown in FIG. 1 is not limiting, and the server may include more or fewer components than those shown, combine some components, or arrange the components differently.

As shown in FIG. 1, the computer software product is stored in a storage medium (also called a computer storage medium, computer medium, readable storage medium, computer readable storage medium, or direct storage medium, such as a RAM, a magnetic disk, or an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention. As a computer storage medium, the memory 1005 may include an operating system, a network communication module, a user interface module, and a computer program.

In the server shown in fig. 1, the network interface 1004 is mainly used for connecting to a background database and performing data communication with the background database; the user interface 1003 is mainly used for connecting a client (called a user side or a terminal, which may be a fixed terminal or a mobile terminal, and is not described herein again) and performing data communication with the client; and the processor 1001 may be configured to call the computer program stored in the memory 1005 and perform the steps of the event processing method provided by the following embodiment of the present invention.

Referring to fig. 2, a first embodiment of the present invention provides a video monitoring-based early warning method, including:

Step S10, receiving the audio data of the camera in the monitoring site.

The camera of the monitoring site is connected to the dedicated network of the monitoring system through its own network connection module or a dedicated networking device, and transmits the shot video data to a server of the monitoring system in real time. When the server receives the video data transmitted by the camera, it separates the video data into audio stream compression coded data and video stream compression coded data according to the video packaging format, and then further decompresses and decrypts the audio stream compression coded data to obtain an audio code stream, namely the voice data to be processed.

Step S20, extracting the characteristics of the audio data, and judging whether any case happens according to the characteristics.

According to the type of the monitoring place and the case scene to be identified, firstly determining the condition of case occurrence, and then determining the characteristics of the audio data which need to be extracted for judging the case occurrence.

For example, when the monitoring place is a street or a residential area, criminal offences are unlikely when there are many people and vehicles and the ambient background noise is high, and are more likely at night when there are few people and vehicles and the ambient background noise is low, so the latter scene can be used as the case scene to be identified. In this case scene, the condition of case occurrence is that voice containing a case keyword is detected, and the feature of the audio data to be extracted is correspondingly the feature of the voice data contained in the audio data.

The voice data, i.e., the human voice data, can be recognized from the audio data by a machine learning method. First, various human voice recordings are collected as the positive sample data source for model training, and non-human sounds such as vehicle sounds and noise are collected as the negative sample data source. The spectral features of the positive and negative sample data sources are extracted with a Mel cepstrum algorithm and input into a neural network model for training, and a prediction model is generated from the training result. When audio data transmitted by a camera in a monitoring place is obtained, the audio data is divided into frames, the spectral feature of each frame is extracted with the Mel cepstrum algorithm and input into the prediction model, and whether each frame of audio data contains human voice is judged from the prediction result.
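The framing and per-frame prediction described above might be organized as in the following minimal sketch. The function names, the frame length and the hop size are assumptions made for illustration; `extract_features` and `predict_is_voice` are hypothetical stand-ins for the Mel cepstrum feature extraction and the trained prediction model, neither of which is specified in detail here.

```python
from typing import Callable, List, Sequence


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut a signal into (possibly overlapping) frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frames_with_voice(samples: Sequence[float],
                      frame_len: int,
                      hop: int,
                      extract_features: Callable,
                      predict_is_voice: Callable) -> List[bool]:
    """Return one flag per frame: True if the (assumed) model says the frame holds human voice."""
    # extract_features stands in for the Mel cepstrum step,
    # predict_is_voice for the trained prediction model.
    return [bool(predict_is_voice(extract_features(frame)))
            for frame in split_frames(samples, frame_len, hop)]
```

In a real deployment the two callables would wrap an actual feature extractor and classifier; any simple function (for example a frame-energy threshold) can be substituted to exercise the control flow.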

The signal-to-noise ratio of the audio data recognized as containing human voice is then calculated. When the signal-to-noise ratio exceeds a preset signal-to-noise ratio threshold, the audio data is converted into text by recognition with an acoustic model and a language model, and the resulting text is matched against preset case keywords. When the converted text matches a preset case keyword, it is judged that a case occurs.
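The signal-to-noise gate followed by keyword matching can be sketched as below. This is only an illustration under stated assumptions: how the signal and noise powers are estimated is not specified here, so they are passed in as numbers; `transcribe` is a hypothetical callable standing in for the acoustic-model and language-model recognizer; the 10 dB default threshold is an arbitrary example value.

```python
import math
from typing import Callable, Sequence


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)


def contains_keyword(text: str, keywords: Sequence[str]) -> bool:
    """True if any preset case keyword appears in the recognized text."""
    return any(keyword in text for keyword in keywords)


def judge_case_by_speech(signal_power: float,
                         noise_power: float,
                         audio: bytes,
                         transcribe: Callable[[bytes], str],
                         keywords: Sequence[str],
                         snr_threshold_db: float = 10.0) -> bool:
    """Run speech recognition only when the SNR exceeds the threshold (assumed value)."""
    if snr_db(signal_power, noise_power) <= snr_threshold_db:
        return False  # too noisy: skip recognition, no case is judged
    return contains_keyword(transcribe(audio), keywords)
```

Gating on the signal-to-noise ratio first avoids running the comparatively expensive recognizer on audio that is unlikely to yield usable text.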

When the monitoring site is a relatively quiet site such as a warehouse or a machine room, the noise power is usually maintained at a low and stable level, and it tends to increase when a case such as a theft or a fire occurs, so a rise in the noise level can be used as the condition of case occurrence. Specifically, the noise power of the audio data transmitted by the camera in the monitoring place can be calculated, and when the noise power of the audio data exceeds a preset noise power threshold value, a case is judged to occur.
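A minimal sketch of the noise power check follows. It assumes that the noise power is taken as the mean squared amplitude of the decoded samples, which is one common definition but is not stated in the text; the function names and the threshold are illustrative.

```python
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the samples (assumed definition of noise power)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def judge_case_by_noise(samples: Sequence[float], power_threshold: float) -> bool:
    """A case is judged to occur when the power exceeds the preset threshold."""
    return mean_power(samples) > power_threshold
```

For a quiet site, the threshold would normally be chosen somewhat above the power level observed during ordinary operation.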

Step S30, when a case occurs, determining a sound source direction of the audio data.

In order to obtain the sound source direction estimation with high accuracy, the microphone array can be used for sound source direction estimation, namely, the microphone array is used for receiving sound signals, and multiple paths of sound signals are analyzed and processed to determine the plane or space coordinates of one or more sound sources in a space domain, so that the positions of the sound sources are obtained, and the directions of the sound sources are obtained. Wherein, a microphone array containing a plurality of microphones is installed on a camera of a monitored place, or a plurality of cameras can be arranged on the monitored place, and the microphone array is formed by the microphones on the plurality of cameras.

Sound source localization methods based on microphone arrays mainly have three categories, which are respectively: a method based on steerable beam forming, a method based on high resolution spectral estimation and a method based on time delay estimation. The method based on the time delay estimation is preferably used because the method is not limited by the microphone array structure and the calculation amount is small.

The method based on time delay estimation comprises two steps: the first step is to calculate the time difference from the sound source to each microphone pair, and the second step is to estimate the sound source position from the time differences of the microphone pairs and the positions of the microphones. In the first step, various correlation algorithms can be used to calculate the time difference, such as the GCC (Generalized Cross-Correlation) algorithm, the LMS (Least Mean Square) algorithm, and the PHAT-GCC (Phase Transform weighted GCC) algorithm, with the PHAT-GCC algorithm preferred.

The steps of calculating the time difference from the sound source to each microphone pair using the PHAT-GCC algorithm are: transforming the two time-domain signals received by each microphone pair into frequency-domain signals through the fast Fourier transform; obtaining a cross-power spectrum function from the frequency-domain signals and the PHAT weighting function; applying the inverse fast Fourier transform to the cross-power spectrum to obtain a cross-correlation function; and performing peak detection on the cross-correlation function to obtain the time difference.
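The PHAT-GCC steps above can be illustrated with the following self-contained sketch. It uses a small pure-Python radix-2 FFT so that it runs without third-party libraries; a production system would more likely use an optimized FFT. The function names, the zero-padding strategy and the sign convention (a positive result means the second signal lags the first) are assumptions of this sketch.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spec: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the conjugate trick."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def gcc_phat_delay(sig_a: Sequence[float], sig_b: Sequence[float], sample_rate: float) -> float:
    """Estimated delay in seconds of sig_b relative to sig_a (positive if sig_b lags)."""
    n = 1
    while n < len(sig_a) + len(sig_b):  # zero-pad to avoid circular wrap-around
        n *= 2
    spec_a = fft(list(sig_a) + [0.0] * (n - len(sig_a)))
    spec_b = fft(list(sig_b) + [0.0] * (n - len(sig_b)))
    cross = []
    for a, b in zip(spec_a, spec_b):
        c = b * a.conjugate()          # cross-power spectrum
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)  # PHAT weighting: keep phase only
    corr = [v.real for v in ifft(cross)]               # cross-correlation function
    peak = max(range(n), key=lambda i: corr[i])        # peak detection
    lag = peak if peak < n // 2 else peak - n          # map upper half to negative lags
    return lag / sample_rate
```

Because the PHAT weighting discards magnitude and keeps only phase, the correlation peak stays sharp even when the two channels differ in level.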

After a plurality of time differences from a sound source to a plurality of microphone pairs are obtained by using a PHAT-GCC algorithm, the position of the sound source can be estimated by adopting a weighted least square method according to the position and the time differences of each microphone pair. In the same coordinate system, the horizontal line where the microphone array is located is taken as a reference, the center of the microphone array is taken as an origin, and the direction of the sound source is determined according to the position of the sound source, wherein the direction of the sound source can be represented by an included angle between the horizontal line where the microphone array is located and the sound source.
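As an illustration of turning time differences into a position and then a direction, the sketch below minimizes a weighted sum of squared time-difference residuals. No particular solver is specified in the text, so this sketch swaps in a brute-force grid search over the same weighted least-squares cost; the speed of sound, the search extent, the grid step and all names are assumptions. The microphone array center is taken as the origin, and the angle is measured from the horizontal axis of the array.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def predicted_tdoa(src: Point, mic_a: Point, mic_b: Point) -> float:
    """Arrival time at mic_b minus arrival time at mic_a for a source at src."""
    return (math.dist(src, mic_b) - math.dist(src, mic_a)) / SPEED_OF_SOUND


def locate_source(pairs: Sequence[Tuple[Point, Point]],
                  tdoas: Sequence[float],
                  weights: Sequence[float],
                  extent: float = 10.0,
                  step: float = 0.25) -> Point:
    """Grid point minimizing the weighted squared TDOA residuals (brute-force stand-in)."""
    best, best_cost = (0.0, 0.0), float("inf")
    count = int(round(2 * extent / step))
    for ix in range(count + 1):
        for iy in range(count + 1):
            src = (-extent + ix * step, -extent + iy * step)
            cost = sum(w * (t - predicted_tdoa(src, a, b)) ** 2
                       for (a, b), t, w in zip(pairs, tdoas, weights))
            if cost < best_cost:
                best, best_cost = src, cost
    return best


def direction_deg(src: Point) -> float:
    """Angle between the array's horizontal axis and the source, array center at origin."""
    return math.degrees(math.atan2(src[1], src[0]))
```

A grid search is easy to verify but scales poorly; an iterative or closed-form weighted least-squares solver would be the natural replacement in practice.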

Step S40, aligning the camera of the monitoring site to the sound source direction, and shooting the video of the sound source direction to generate early warning information.

The angle through which the camera must rotate is calculated from the current angle of the camera and the angle of the sound source direction obtained in the previous step, and the camera is controlled to rotate by that angle so that it is aligned with the sound source direction.
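The rotation calculation can be sketched as follows. The sketch assumes that both angles are expressed in degrees in the same reference frame and that the camera can pan freely, so the shortest signed rotation is chosen; a camera with a limited pan range would need additional handling. The function name is illustrative.

```python
def rotation_needed_deg(current_deg: float, target_deg: float) -> float:
    """Signed shortest rotation in degrees, in the range [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0
```

For example, a camera currently at 350 degrees that must face 10 degrees rotates by +20 degrees rather than by -340 degrees.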

In this embodiment, the server performs the processing of these steps, generates the warning information, and sends the warning information to the display terminal.

In this embodiment, whether a case occurs is determined according to the characteristics of the audio data of the monitoring site, when the case occurs, a video of the sound source direction of the case occurrence is shot, and early warning information is generated.

Further, referring to fig. 3, a second embodiment of the present invention provides an early warning method based on video monitoring based on the first embodiment, where the embodiment further includes, after step S30:

Step S50, adjusting the shooting parameters of the camera of the monitoring site, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment.

The video data shot by the cameras arranged in each monitoring place is transmitted to the server of the monitoring system in real time. When the volume of video data is very large, it puts great pressure on the storage resources and storage performance of the server, and massive video data contains much redundant information. The shooting parameters of the cameras in the monitoring places can therefore be set to parameters that meet a general definition requirement; when a case is judged to occur from the features of the audio data, the shooting parameters of the camera in the monitoring place where the case occurs are adjusted so that the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment.

When the server judges that a case occurs, it sends a message to the camera through the network to instruct the camera to adjust the shooting parameters. Specifically, shooting parameters of different levels can be stored on the server in advance, where a higher level corresponds to a higher definition. When the server judges that a case occurs at a monitoring place, it obtains the current shooting parameter level of the camera of that monitoring place and sends the shooting parameters of a higher level to the camera in a message to instruct the camera to adjust the shooting parameters. Alternatively, the server can send a message containing only high-definition indication information to the camera of the monitoring place where the case occurs, and the camera adjusts the relevant parameters by itself on receiving the indication information, so that the definition corresponding to the adjusted shooting parameters is higher.
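The level-based variant could look like the following sketch. The level table, the parameter names (resolution, frame rate), the message layout and the choice of the next level up are all hypothetical; the text only requires that the level sent has a higher definition than the current one.

```python
from typing import Dict, Optional

# Hypothetical levels stored on the server; a higher level means higher definition.
SHOOTING_LEVELS: Dict[int, dict] = {
    1: {"resolution": (1280, 720), "fps": 15},
    2: {"resolution": (1920, 1080), "fps": 25},
    3: {"resolution": (3840, 2160), "fps": 30},
}


def build_adjust_message(camera_id: str, current_level: int) -> Optional[dict]:
    """Message telling the camera to move to the next higher level, or None if already at the top."""
    higher = [level for level in SHOOTING_LEVELS if level > current_level]
    if not higher:
        return None
    next_level = min(higher)  # assumption: step up one level at a time
    return {"camera_id": camera_id,
            "level": next_level,
            "params": SHOOTING_LEVELS[next_level]}
```

Stepping up one level at a time is a design choice of this sketch; jumping directly to the highest level would equally satisfy the description.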

Further, since the geographical position of the related person in the case may change during the continuous process of the case, in order to obtain a more comprehensive and accurate clue of the case, a high-definition shooting mode needs to be started for the camera near the place where the case occurs in time. Specifically, the network identifier of a neighboring monitoring camera, which is within a preset distance range from the camera in the monitoring field, can be acquired, and an adjustment instruction message is sent to the neighboring monitoring camera according to the network identifier to notify the neighboring monitoring camera to adjust the shooting parameters, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment. The adjustment instruction message may be transmitted by a server or a camera of a case place.

It should be noted that there are various methods for acquiring the positioning information of the camera in the monitoring location. One method is as follows: a camera comprising a positioning module, such as a camera comprising a GPS module, is installed in a monitoring place, and software running on the camera can acquire positioning information of the camera in real time through the GPS module and send the positioning information to a server through a network. The other method comprises the following steps: and installing positioning equipment in the monitoring place, connecting the positioning equipment with a camera in the monitoring place to provide positioning information for the camera, and sending the positioning information to a server by software running on the camera through a network.
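Given positioning information for each camera, the adjacent cameras within the preset distance range could be selected as in the sketch below. It assumes that positions are latitude and longitude pairs in degrees, that the network identifier is a string such as an IP address, and that great-circle (haversine) distance on a spherical Earth is accurate enough at these ranges; all names are illustrative.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def adjacent_cameras(positions: Dict[str, Tuple[float, float]],
                     origin_id: str,
                     max_distance_m: float) -> List[str]:
    """Network identifiers of cameras within max_distance_m of the origin camera."""
    lat0, lon0 = positions[origin_id]
    return sorted(cam_id for cam_id, (lat, lon) in positions.items()
                  if cam_id != origin_id
                  and haversine_m(lat0, lon0, lat, lon) <= max_distance_m)
```

The returned identifiers are the ones to which the adjustment instruction message would be sent.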

After the camera on the monitoring site adjusts the shooting parameters into high-definition shooting parameters according to the indication, the high-definition shooting video data volume is large, so that more network transmission bandwidth and server storage resources are occupied, and the shooting parameters of the camera on the monitoring site need to be recovered when the case is finished. The case duration time can be determined according to the case occurrence time and the current time, and when the case duration time is larger than a preset case duration time threshold value, shooting parameters of the monitoring field camera and the adjacent monitoring camera are recovered.
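The duration check and parameter recovery can be sketched as follows. Timestamps are assumed to be seconds on a common clock, and camera state is modeled as a hypothetical mapping from camera identifier to shooting parameter level; neither representation is prescribed by the text.

```python
from typing import Dict, Iterable


def case_expired(case_start_s: float, now_s: float, max_duration_s: float) -> bool:
    """True when the case duration exceeds the preset duration threshold."""
    return (now_s - case_start_s) > max_duration_s


def restore_parameters(current: Dict[str, int],
                       defaults: Dict[str, int],
                       camera_ids: Iterable[str]) -> Dict[str, int]:
    """Return a copy of the camera levels with the listed cameras reset to their defaults."""
    updated = dict(current)
    for cam_id in camera_ids:
        updated[cam_id] = defaults[cam_id]
    return updated
```

The list of cameras to restore would contain the monitoring site camera together with the adjacent cameras that were notified earlier.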

In the embodiment, the shooting parameters of the camera in the monitoring place where the case happens are adjusted to the shooting parameters corresponding to the high definition, so that more accurate video information related to the case is obtained in time, and the case detection is facilitated.

Further, referring to fig. 4, a third embodiment of the present invention provides a video monitoring-based early warning method based on the first embodiment or the second embodiment, where the present embodiment includes, in step S40:

Step S60, acquiring the image data of the monitoring field camera.

The wanted notices issued by public security authorities at various levels include the names and recent clear photos of wanted persons; the public security system database stores the names and recent clear photos of persons released after serving their sentences who have serious prior criminal records; and relatives of missing persons can publish clear photos of the missing persons through the network. Face features extracted from such photos can serve as the preset face features.

Step S70, recognizing a face image from the image data, and extracting face features from the face image.

The face features in the face picture are extracted using a face feature recognition algorithm. The face features include feature information of the eyes, nose, mouth, eyebrows and the like, specifically the positions, shapes and sizes of the facial features, their relative proportions or relative positions, and the skin color information or contour information of the face. The face feature recognition algorithms that can be adopted include methods based on geometric features, face recognition based on Principal Component Analysis (PCA), Local Binary Patterns (LBP) for extracting local features, Histogram of Oriented Gradients (HOG), Gabor wavelet transform, etc.; the algorithm for extracting face features is not limited here.

Step S80, judging whether the face features are matched with preset face features.

To judge whether the face features in the face picture recognized from the image data match the preset face features, the similarity between the two sets of face features is calculated, and they are considered to match when the calculated similarity exceeds a preset threshold value.
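A similarity-threshold match can be sketched as below. The similarity measure is not specified in the text; cosine similarity between feature vectors is used here purely as an assumption, and the function names, identifiers and threshold are illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_preset(features: Sequence[float],
                 presets: Dict[str, Sequence[float]],
                 threshold: float) -> Optional[str]:
    """Identifier of the most similar preset whose similarity exceeds the threshold, else None."""
    best_id, best_score = None, threshold
    for preset_id, preset in presets.items():
        score = cosine_similarity(features, preset)
        if score > best_score:
            best_id, best_score = preset_id, score
    return best_id
```

Returning the best-scoring preset rather than the first one above the threshold avoids depending on the order in which presets are stored.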

Step S90, when the face features are matched with preset face features, acquiring early warning grade information corresponding to the preset face features.

Different preset face features correspond to different people, and the social influence and degree of harm of the cases in which different people may be involved differ, so different preset face features can correspond to different early warning levels. For example, persons wanted by the public security system can correspond to a high early warning level, released persons with serious prior criminal records can correspond to a medium early warning level, and missing persons can correspond to a low early warning level.

Step S100, generating early warning information containing the early warning grade information.

In this embodiment, the face features of the face picture obtained at the monitoring site where a case occurs are matched with the preset face features, and the early warning level information included in the early warning information is determined according to the early warning level of the preset face features during matching, so that the early warning efficiency is improved.
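The mapping from a matched person category to an early warning level, and the assembly of the early warning information, might be sketched as follows. The three levels follow the example given in this embodiment; the category keys, the field names and the dictionary layout of the early warning information are hypothetical.

```python
from typing import Optional

# Levels follow the example in the text; the category keys are illustrative names.
WARNING_LEVEL_BY_CATEGORY = {
    "wanted": "high",
    "released_serious_record": "medium",
    "missing": "low",
}


def build_warning(camera_id: str, matched_category: Optional[str] = None) -> dict:
    """Early warning information; the level is included only when a preset face matched."""
    info = {"camera_id": camera_id, "event": "case_detected"}
    if matched_category is not None:
        info["warning_level"] = WARNING_LEVEL_BY_CATEGORY[matched_category]
    return info
```

When no preset face features match, the sketch still produces early warning information, only without a level, which mirrors the first embodiment.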

Referring to fig. 5, the present invention further provides a video monitoring-based early warning apparatus, including:

the receiving module 10 is used for receiving audio data of a camera in a monitoring site;

the feature processing module 20 is configured to extract features of the audio data, and determine whether a case occurs according to the features;

a sound source estimation module 30, configured to determine a sound source direction of the audio data when it is determined that a case occurs;

and the early warning module 40 is used for aligning the camera of the monitoring site with the sound source direction and shooting the video of the sound source direction to generate early warning information.

Optionally, the feature processing module 20 includes:

the signal-to-noise ratio calculation unit is used for determining the signal-to-noise ratio of the audio data;

the voice recognition unit is used for converting the audio data into character information when the signal-to-noise ratio of the audio data exceeds a preset signal-to-noise ratio threshold;

and the judging unit is used for judging whether the character information contains preset keywords or not, and judging that a case occurs when the character information contains the preset keywords.

Optionally, the feature processing module 20 includes:

a noise power calculation unit, configured to determine the noise power of the audio data;

and the judging unit is used for judging that a case occurs when the noise power of the audio data exceeds a preset noise power threshold value.
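
The noise-power variant reduces to a single threshold comparison. A minimal sketch, assuming the noise power is taken as the mean-square value of the sample buffer (the source does not specify the estimator) and using the hypothetical names `noise_power` and `case_from_noise`:

```python
from typing import Sequence


def noise_power(samples: Sequence[float]) -> float:
    """Mean-square power of the samples (0.0 for an empty buffer)."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def case_from_noise(samples: Sequence[float], power_threshold: float) -> bool:
    """Judge that a case occurs when the noise power exceeds the preset threshold."""
    return noise_power(samples) > power_threshold
```

Unlike the keyword route, this check needs no speech recognition, so it can run on every incoming buffer.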

Optionally, the video monitoring-based early warning apparatus further includes:

the shooting parameter adjusting module is used for adjusting the shooting parameters of the camera on the monitoring site, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment;

the acquisition module is used for acquiring the network identifier of each adjacent monitoring camera whose distance from the camera of the monitoring site is within a preset distance range;

the sending module is used for sending a message to the adjacent monitoring camera according to the network identifier so as to notify the adjacent monitoring camera to adjust its shooting parameters, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment;

the timing module is used for determining the case duration according to the case occurrence time and the current time;

the shooting parameter adjusting module is further used for recovering the shooting parameters of the monitoring field camera and the adjacent monitoring camera when the case duration is larger than a preset case duration threshold.
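
The interplay of the adjusting, acquisition, sending, and timing modules can be sketched as below. This is an illustration under assumptions, not the patented design: camera positions are treated as planar coordinates in metres, definition is modelled as a simple label (`"720p"` or `"1080p"`), and the `send` callable stands in for whatever network message the sending module actually uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

HIGH, NORMAL = "1080p", "720p"  # hypothetical definition settings


@dataclass
class Camera:
    network_id: str
    position: Tuple[float, float]  # planar coordinates in metres (assumed)
    definition: str = NORMAL


class ShootingParameterController:
    def __init__(self, cameras: List[Camera], max_distance_m: float,
                 max_case_duration_s: float,
                 send: Callable[[str, str], None]) -> None:
        self.cameras: Dict[str, Camera] = {c.network_id: c for c in cameras}
        self.max_distance_m = max_distance_m
        self.max_case_duration_s = max_case_duration_s
        self.send = send  # delivers (network_id, definition) to a camera
        self.case_start: Optional[float] = None
        self.site_id: Optional[str] = None
        self.adjacent: List[str] = []

    def adjacent_ids(self, site_id: str) -> List[str]:
        """Acquisition module: identifiers of cameras within the preset distance."""
        site = self.cameras[site_id]
        return [cid for cid, cam in self.cameras.items()
                if cid != site_id
                and math.dist(site.position, cam.position) <= self.max_distance_m]

    def on_case(self, site_id: str, now: float) -> None:
        """Raise definition on the site camera and notify adjacent cameras."""
        self.case_start, self.site_id = now, site_id
        self.adjacent = self.adjacent_ids(site_id)
        self.cameras[site_id].definition = HIGH
        for cid in self.adjacent:
            self.send(cid, HIGH)

    def on_tick(self, now: float) -> None:
        """Timing module: restore parameters once the case duration exceeds the threshold."""
        if self.case_start is None or self.site_id is None:
            return
        if now - self.case_start > self.max_case_duration_s:
            self.cameras[self.site_id].definition = NORMAL
            for cid in self.adjacent:
                self.send(cid, NORMAL)
            self.case_start, self.site_id, self.adjacent = None, None, []
```

Restoring the parameters after the duration threshold keeps the high-definition data volume limited to the period around a case.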

Optionally, the early warning module 40 includes:

the acquisition unit is used for acquiring the image data of the monitoring field camera;

the face recognition unit is used for recognizing a face picture from the image data and extracting face features from the face picture;

the face recognition unit is further used for judging whether the face features match preset face features, and acquiring the early warning level information corresponding to the preset face features when the face features match the preset face features;

and the early warning unit is used for generating early warning information containing the early warning level information.

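The invention also provides an early warning device based on video monitoring, which comprises: a memory, a processor, a camera, and an early warning processing program based on video monitoring that is stored in the memory and executable on the processor, wherein the early warning processing program based on video monitoring, when executed by the processor, implements the steps of the early warning method based on video monitoring described above.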

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores an early warning processing program based on video monitoring, and the early warning processing program based on video monitoring implements the steps of the early warning method based on video monitoring when being executed by a processor.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or system that comprises the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. The early warning method based on video monitoring is characterized by comprising the following steps of:

receiving audio data of a camera in a monitoring site;

extracting the characteristics of the audio data, and judging whether a case occurs according to the characteristics, wherein when the environmental background noise in the audio data is smaller than a preset threshold value, the voice data is identified from the audio data, and the signal to noise ratio of the audio data containing the voice data is determined;

when the signal-to-noise ratio of the audio data exceeds a preset signal-to-noise ratio threshold value, converting the audio data into character information;

judging whether the character information contains preset keywords or not;

when the character information contains the preset keywords, judging that a case occurs;

2. The video surveillance-based early warning method as claimed in claim 1, wherein the step of extracting the features of the audio data and determining whether there is a case according to the features comprises:

determining a noise power of the audio data;

3. The video surveillance-based early warning method of claim 1, wherein the step of determining the sound source direction of the audio data when a case occurs further comprises:

and adjusting the shooting parameters of the camera on the monitoring site, wherein the definition of the adjusted shooting parameters is higher than that of the shooting parameters before adjustment.

4. The video surveillance-based early warning method according to claim 3, wherein the step of adjusting the shooting parameters of the camera at the surveillance site, the definition of the adjusted shooting parameters being higher than the definition of the shooting parameters before the adjustment further comprises:

5. The video surveillance-based early warning method of claim 4, wherein the step of sending a message to the adjacent monitoring camera according to the network identifier to notify the adjacent monitoring camera to adjust the shooting parameters, the definition of the adjusted shooting parameters being higher than that of the shooting parameters before adjustment, further comprises:

6. The video surveillance-based warning method according to any one of claims 1 to 5, wherein the step of aiming a camera of the surveillance site at the sound source direction to shoot a video of the sound source direction and generating warning information comprises:

acquiring image data of the monitoring field camera;

judging whether the face features are matched with preset face features or not;

7. The early warning device based on video monitoring is characterized by comprising the following components:

the receiving module is used for receiving audio data of a camera on a monitoring site;

the characteristic processing module is used for extracting the characteristics of the audio data and judging whether a case occurs according to the characteristics, wherein when the environmental background noise in the audio data is smaller than a preset threshold value, the voice data is identified from the audio data, and the signal-to-noise ratio of the audio data containing the voice data is determined; when the signal-to-noise ratio of the audio data exceeds a preset signal-to-noise ratio threshold value, converting the audio data into character information; judging whether the character information contains preset keywords or not; when the character information contains the preset keywords, judging that a case occurs;

8. An early warning device based on video monitoring, characterized in that the early warning device based on video monitoring comprises: a memory, a processor, a camera and a video surveillance based warning processing program stored on the memory and executable on the processor, the video surveillance based warning processing program when executed by the processor implementing the steps of the video surveillance based warning method as claimed in any one of claims 1 to 6.

9. A computer-readable storage medium, wherein the computer-readable storage medium stores thereon a video surveillance-based warning processing program, and the video surveillance-based warning processing program, when executed by a processor, implements the steps of the video surveillance-based warning method according to any one of claims 1 to 6.