CN111126411B - Abnormal behavior identification method and device - Google Patents

Abnormal behavior identification method and device

Info

Publication number
CN111126411B
CN111126411B (application CN201911083346.1A)
Authority
CN
China
Prior art keywords
probability
image
map
rgb
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911083346.1A
Other languages
Chinese (zh)
Other versions
CN111126411A (en)
Inventor
李中振
潘华东
殷俊
张兴明
彭志蓉
高美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201911083346.1A priority Critical patent/CN111126411B/en
Publication of CN111126411A publication Critical patent/CN111126411A/en
Application granted granted Critical
Publication of CN111126411B publication Critical patent/CN111126411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an abnormal behavior identification method and device. The method comprises: acquiring image features of an RGB map, an optical flow map and a disparity map of a video image to be processed; determining a first probability, a second probability and a third probability that abnormal behavior exists in the video image according to the image features of the RGB map, the optical flow map and the disparity map, respectively; determining a target probability that the video image has abnormal behavior according to the first probability, the second probability and the third probability and the preset weights of the RGB map, the optical flow map and the disparity map; and determining that abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold. This solves the problem in the related art of a high false recognition rate when identifying abnormal pedestrian behavior: the three data modalities jointly provide the spatial, temporal and motion-speed information of the abnormal behavior, so that whether abnormal behavior exists in the video image can be judged more accurately.

Description

Abnormal behavior identification method and device
Technical Field
The invention relates to the field of image processing, in particular to an abnormal behavior identification method and device.
Background
In recent years, intelligent monitoring technology has attracted increasing attention from researchers and practitioners because of security requirements. Installing video monitoring systems in public places makes it possible to handle special events promptly, to record what happens on site for later review, and to provide objective and compelling evidence for investigation. At present, the video data produced by monitoring cameras can only be reviewed manually, which is time-consuming, labor-intensive and inefficient; important details are easily missed, so the video resources are not used efficiently or to high quality. Since network monitoring cameras are widely deployed, a fight detection and recognition algorithm can be deployed on them: cameras installed in public places can detect whether a fight is occurring and raise an alarm, greatly reducing the manpower and financial resources required, helping to locate and handle fighting events quickly, and playing an important role in maintaining social stability and protecting public safety. With the development of deep learning, more and more fields are combining it to good effect, such as face recognition, human body detection and pedestrian re-identification. A deep convolutional neural network can learn image features automatically, eliminating the limitations of manual feature selection, and its weight sharing reduces the number of parameters. The long short-term memory (LSTM) network is a neural network commonly used for sequence learning; in an LSTM, an input gate, an output gate and a forget gate control the transmitted state, so that important information is remembered over long periods and unimportant information is forgotten.
The related art proposes a pedestrian abnormal behavior identification method based on 3D convolution, which transfers the idea of the lightweight 2D convolutional network MobileNet to a 3D network, reducing the computational cost while maintaining recognition performance; it also adopts an adaptive layer and a sparse temporal sampling strategy to reduce the large amount of redundant information and blur noise contained in consecutive frames. However, because only the RGB data format is used, false alarms are easily generated.
No effective solution has yet been proposed for the problem in the related art of the high false recognition rate when identifying abnormal pedestrian behavior.
Disclosure of Invention
The embodiments of the invention provide an abnormal behavior identification method and device, which at least solve the problem in the related art of the high false recognition rate when identifying abnormal pedestrian behavior.
According to an embodiment of the present invention, there is provided an abnormal behavior recognition method including:
acquiring an RGB map of a video image to be processed, and processing the RGB map to obtain an optical flow map and a disparity map, respectively;
determining image features of the RGB map, the optical flow map and the disparity map, respectively;
determining a first probability, a second probability and a third probability that abnormal behavior exists in the video image according to the image features of the RGB map, the optical flow map and the disparity map, respectively;
determining a target probability that the video image has the abnormal behavior according to the first probability, the second probability and the third probability and the preset weights of the RGB map, the optical flow map and the disparity map; and
determining that the abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold.
Optionally, before the RGB map is processed to obtain the optical flow map and the disparity map, the method further includes:
detecting the number of target objects in the video image and the distance between the target objects; and
determining that the number of target objects is greater than 1 and the distance between the target objects is less than a preset distance threshold.
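The pre-check above can be illustrated with a minimal sketch. All names, the use of 2D box centers as target positions, and the Euclidean distance metric are assumptions for illustration; the patent does not prescribe a particular distance measure.

```python
from itertools import combinations
import math

def should_run_recognition(centers, distance_threshold):
    """Return True only when more than one target exists and some pair of
    targets is closer than the preset distance threshold.

    centers: list of (x, y) target positions detected in the frame.
    """
    if len(centers) <= 1:
        return False  # no person, or a single person: recognition is skipped
    # any pair of targets within the dangerous distance triggers recognition
    return any(math.dist(a, b) < distance_threshold
               for a, b in combinations(centers, 2))
```

Under this gate, the heavier three-stream recognition network only runs on frames where an interaction between people is geometrically possible.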
Optionally, determining the image features of the RGB map, the optical flow map and the disparity map respectively includes:
inputting the RGB map, the optical flow map and the disparity map into a pre-trained first target neural network model, respectively, to obtain the image features of the RGB map, the optical flow map and the disparity map output by the first target neural network model.
Optionally, determining the first probability, the second probability and the third probability that abnormal behavior exists in the video image according to the image features of the RGB map, the optical flow map and the disparity map includes:
inputting the image features of the RGB map, the optical flow map and the disparity map into a pre-trained second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the second target neural network model.
Optionally, inputting the image features of the RGB map, the optical flow map and the disparity map into the pre-trained second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the second target neural network model includes:
inputting the image features of the RGB map, the optical flow map and the disparity map into at least one LSTM layer of the second target neural network model, respectively, to obtain the image features output by the at least one LSTM layer; and
inputting the image features of the RGB map, the optical flow map and the disparity map output by the at least one LSTM layer into a softmax layer of the second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the softmax layer.
Optionally, before determining the image features of the RGB map, the optical flow map and the disparity map respectively, the method further includes:
acquiring the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a first predetermined number of binocular image acquisition devices; and
training a first original neural network model with the first predetermined number of RGB maps, optical flow maps and disparity maps, wherein the first predetermined number of RGB maps, optical flow maps and disparity maps are input to the first original neural network model, respectively, and the loss function over the image features of the RGB maps, optical flow maps and disparity maps output by the trained first target neural network model satisfies a first predetermined convergence condition.
Optionally, before determining the first probability, the second probability and the third probability that abnormal behavior exists in the video image according to the image features of the RGB map, the optical flow map and the disparity map, the method further includes:
acquiring the image features of the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a second predetermined number of binocular image acquisition devices; and
training a second original neural network model with the image features of the second predetermined number of RGB maps, optical flow maps and disparity maps, wherein the image features of the second predetermined number of RGB maps, optical flow maps and disparity maps are input to the second original neural network model, respectively, and the loss function over the probability of abnormal behavior in the video image output by the trained second target neural network model satisfies a second predetermined convergence condition.
Optionally, the target probability that the video image has the abnormal behavior is determined from the first probability, the second probability and the third probability and the preset weights of the RGB map, the optical flow map and the disparity map as follows:
y = (w₁x₁ + w₂x₂ + w₃x₃) / (w₁ + w₂ + w₃),
where w₁, w₂ and w₃ are the weights of the RGB map, the optical flow map and the disparity map, respectively, and x₁, x₂ and x₃ are the first probability, the second probability and the third probability, respectively.
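A minimal sketch of this weighted fusion and the subsequent threshold decision follows. The function names, default weights and default threshold are illustrative assumptions; the patent only specifies that the weights and threshold are preset.

```python
def fuse_probabilities(x1, x2, x3, w1, w2, w3):
    """Weighted average of the per-stream probabilities:
    y = (w1*x1 + w2*x2 + w3*x3) / (w1 + w2 + w3)."""
    return (w1 * x1 + w2 * x2 + w3 * x3) / (w1 + w2 + w3)

def has_abnormal_behavior(x1, x2, x3, w1=1.0, w2=1.0, w3=1.0, threshold=0.5):
    # abnormal behavior is reported when the fused target probability
    # reaches the preset threshold
    return fuse_probabilities(x1, x2, x3, w1, w2, w3) >= threshold
```

Dividing by the sum of the weights keeps the fused value a probability in [0, 1] regardless of how the three preset weights are scaled.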
According to another embodiment of the present invention, there is also provided an abnormal behavior recognition apparatus including:
an acquisition module, configured to acquire an RGB map of a video image to be processed and process the RGB map to obtain an optical flow map and a disparity map, respectively;
a first determining module, configured to determine image features of the RGB map, the optical flow map and the disparity map, respectively;
a second determining module, configured to determine a first probability, a second probability and a third probability that abnormal behavior exists in the video image according to the image features of the RGB map, the optical flow map and the disparity map, respectively;
a third determining module, configured to determine, according to the first probability, the second probability and the third probability and the preset weights of the RGB map, the optical flow map and the disparity map, a target probability that the video image has the abnormal behavior; and
a fourth determining module, configured to determine that the abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold.
Optionally, the apparatus further comprises:
a detection module, configured to detect the number of target objects in the video image and the distance between the target objects; and
a fifth determining module, configured to determine that the number of target objects is greater than 1 and the distance between the target objects is less than a preset distance threshold.
Optionally, the first determining module is further configured to input the RGB map, the optical flow map and the disparity map into a pre-trained first target neural network model, respectively, to obtain the image features of the RGB map, the optical flow map and the disparity map output by the first target neural network model.
Optionally, the second determining module is further configured to input the image features of the RGB map, the optical flow map and the disparity map into a pre-trained second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the second target neural network model.
Optionally, the second determining module includes:
a first input sub-module, configured to input the image features of the RGB map, the optical flow map and the disparity map into at least one LSTM layer of the second target neural network model, respectively, to obtain the image features output by the at least one LSTM layer; and
a second input sub-module, configured to input the image features of the RGB map, the optical flow map and the disparity map output by the at least one LSTM layer into a softmax layer of the second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the softmax layer.
Optionally, the apparatus comprises:
a first acquisition module, configured to acquire the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a first predetermined number of binocular image acquisition devices; and
a first training module, configured to train a first original neural network model with the first predetermined number of RGB maps, optical flow maps and disparity maps, wherein the first predetermined number of RGB maps, optical flow maps and disparity maps are input to the first original neural network model, respectively, and the loss function over the image features of the RGB maps, optical flow maps and disparity maps output by the trained first target neural network model satisfies a first predetermined convergence condition.
Optionally, the apparatus further comprises:
a second acquisition module, configured to acquire the image features of the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a second predetermined number of binocular image acquisition devices; and
a second training module, configured to train a second original neural network model with the image features of the second predetermined number of RGB maps, optical flow maps and disparity maps, wherein the image features of the second predetermined number of RGB maps, optical flow maps and disparity maps are input to the second original neural network model, respectively, and the loss function over the probability of abnormal behavior in the video image output by the trained second target neural network model satisfies a second predetermined convergence condition.
Optionally, the third determining module is further configured to determine the target probability that the video image has the abnormal behavior from the first probability, the second probability and the third probability and the preset weights of the RGB map, the optical flow map and the disparity map as follows:
y = (w₁x₁ + w₂x₂ + w₃x₃) / (w₁ + w₂ + w₃),
where w₁, w₂ and w₃ are the weights of the RGB map, the optical flow map and the disparity map, respectively, and x₁, x₂ and x₃ are the first probability, the second probability and the third probability, respectively.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
With the method, the image features of the RGB map, the optical flow map and the disparity map of the video image to be processed are acquired; the first probability, the second probability and the third probability that abnormal behavior exists in the video image are determined from the image features of the RGB map, the optical flow map and the disparity map, respectively; the target probability that the video image has the abnormal behavior is determined from the first, second and third probabilities and the preset weights of the RGB map, the optical flow map and the disparity map; and the abnormal behavior is determined to exist in the video image when the target probability is greater than or equal to the preset threshold. This solves the problem in the related art of the high false recognition rate when identifying abnormal pedestrian behavior: the RGB map, the optical flow map and the disparity map are three different data modalities that together provide the spatial, temporal and motion-speed information of the abnormal behavior, so that whether abnormal behavior exists in the video image can be judged more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal of an abnormal behavior identification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of abnormal behavior identification according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of abnormal behavior recognition according to an embodiment of the present invention;
fig. 4 is a block diagram of an abnormal behavior recognition apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided in the first embodiment of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of the hardware structure of a mobile terminal for the abnormal behavior identification method according to an embodiment of the present invention. As shown in fig. 1, a mobile terminal 10 may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data; optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the abnormal behavior identification method in the embodiment of the present invention; the processor 102 executes the computer programs stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, an abnormal behavior identification method running on the above mobile terminal or network architecture is provided. Fig. 2 is a flowchart of the abnormal behavior identification method according to an embodiment of the present invention; as shown in fig. 2, the flow includes the following steps:
step S202, acquiring an RGB image of a video image to be processed, and processing the RGB image respectively to obtain a light flow graph and a parallax map;
in the embodiment of the invention, the video image to be processed can be acquired by a binocular acquisition device, and the RGB image is a RGB image.
Step S204, determining the image characteristics of the RGB map, the light flow map and the parallax map respectively;
step S206, determining a first probability, a second probability and a third probability of abnormal behaviors in the video image according to the RGB image, the light flow graph and the image characteristics of the disparity map respectively;
step S208, determining the target probability of the video image with the abnormal behavior according to the first probability, the second probability, the third probability, the preset weight of the RGB image, the preset weight of the light flow image and the preset weight of the parallax image;
step S210, determining that the abnormal behaviors exist in the video image under the condition that the target probability is larger than or equal to a preset threshold value.
Through steps S202 to S210, the image features of the RGB map, the optical flow map and the disparity map of the video image to be processed are acquired; the first, second and third probabilities that abnormal behavior exists in the video image are determined from the image features of the RGB map, the optical flow map and the disparity map, respectively; the target probability that the video image has the abnormal behavior is determined from these probabilities and the preset weights of the RGB map, the optical flow map and the disparity map; and the abnormal behavior is determined to exist when the target probability is greater than or equal to the preset threshold. This solves the problem in the related art of the high false recognition rate when identifying abnormal pedestrian behavior: the three data modalities comprehensively provide the spatial, temporal and motion-speed information of the abnormal behavior, so that whether abnormal behavior exists can be judged more accurately.
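Steps S202 to S210 can be sketched end to end as follows. The feature extractor, per-stream classifier and the optical flow/disparity computations are passed in as stand-in callables, because the patent realizes them with trained CNN and LSTM models; every name here is a hypothetical placeholder, not part of the patent.

```python
def recognize_abnormal_behavior(rgb_map, extract_features, stream_probability,
                                make_flow_map, make_disparity_map,
                                weights=(1.0, 1.0, 1.0), threshold=0.5):
    """Sketch of S202-S210: derive the three modalities, score each stream,
    fuse with preset weights, and threshold the fused probability."""
    flow_map = make_flow_map(rgb_map)            # S202: derive the optical flow map
    disparity_map = make_disparity_map(rgb_map)  # S202: derive the disparity map
    streams = (rgb_map, flow_map, disparity_map)
    # S204 / S206: image features, then a per-stream abnormality probability
    probs = [stream_probability(extract_features(s)) for s in streams]
    # S208: weighted fusion of the first, second and third probabilities
    w1, w2, w3 = weights
    target = (w1 * probs[0] + w2 * probs[1] + w3 * probs[2]) / (w1 + w2 + w3)
    # S210: abnormal behavior exists iff the target probability reaches the threshold
    return target >= threshold
```

In a real deployment, the callables would wrap the trained first (feature) and second (classification) network models described below.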
In the embodiment of the invention, before the RGB map is processed to obtain the optical flow map and the disparity map, the number of target objects in the video image and the distance between the target objects are detected, and it is determined that the number of target objects is greater than 1 and the distance between the target objects is less than a preset distance threshold.
In the embodiment of the invention, the number of target objects and the distance between them are detected in the video image; when the number of target objects is equal to 1, or the number of target objects is greater than 1 but the distance between them is greater than or equal to the preset distance threshold, it is determined that no abnormal behavior exists in the video image.
In the embodiment of the invention, step S204 may specifically include: inputting the RGB map, the optical flow map and the disparity map into a pre-trained first target neural network model, respectively, to obtain the image features of the RGB map, the optical flow map and the disparity map output by the first target neural network model.
In the embodiment of the invention, step S206 may specifically include: inputting the image features of the RGB map, the optical flow map and the disparity map into a pre-trained second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the second target neural network model.
Further, the image features of the RGB map, the optical flow map and the disparity map are input into at least one LSTM layer of the second target neural network model, respectively, to obtain the image features output by the at least one LSTM layer; the image features of the RGB map, the optical flow map and the disparity map output by the at least one LSTM layer are then input into a softmax layer of the second target neural network model, respectively, to obtain the first probability, the second probability and the third probability output by the softmax layer.
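The final softmax step can be illustrated in isolation. The sketch maps one stream's output scores (assumed here to be two classes, normal vs. abnormal) to a probability distribution; the dummy score values are placeholders, since the real scores come from the learned LSTM and classifier layers described above.

```python
import math

def softmax(scores):
    """Standard softmax: exponentiate and normalize so outputs sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# probability of the "abnormal" class for one stream, on dummy class scores
probs = softmax([0.2, 1.4])
p_abnormal = probs[1]
```

Each of the three streams produces such a probability independently; the three abnormal-class probabilities are then the x₁, x₂, x₃ fed into the weighted fusion.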
In the embodiment of the invention, before the image features of the RGB map, the optical flow map and the disparity map are determined, training of the first target neural network model is completed. Specifically, the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a first predetermined number of binocular image acquisition devices are acquired, and a first original neural network model is trained with them, wherein the first predetermined number of RGB maps, optical flow maps and disparity maps are input to the first original neural network model, respectively, and the loss function over the image features of the RGB maps, optical flow maps and disparity maps output by the trained first target neural network model satisfies a first predetermined convergence condition.
In the embodiment of the invention, before the first probability, the second probability and the third probability that abnormal behavior exists in the video image are determined from the image features of the RGB map, the optical flow map and the disparity map, training of the second target neural network model must be completed. Specifically, the image features of the RGB maps, optical flow maps and disparity maps corresponding to video images acquired by a second predetermined number of binocular image acquisition devices are acquired, and a second original neural network model is trained with them, wherein the image features of the second predetermined number of RGB maps, optical flow maps and disparity maps are input to the second original neural network model, respectively, and the loss function over the probability of abnormal behavior in the video image output by the trained second target neural network model satisfies a second predetermined convergence condition.
Further, the target probability that the video image contains the abnormal behavior is determined from the first probability, the second probability, the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map by:
y = (w1·x1 + w2·x2 + w3·x3) / (w1 + w2 + w3), wherein w1, w2, and w3 are the weights of the RGB map, the optical flow map, and the disparity map, respectively, and x1, x2, and x3 are the first probability, the second probability, and the third probability, respectively.
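The weighted average above can be sketched as a small helper (the function name and example values are illustrative, not from the patent):

```python
def fuse_probabilities(probs, weights):
    """Weighted average of per-stream probabilities:
    y = (w1*x1 + w2*x2 + w3*x3) / (w1 + w2 + w3)."""
    assert len(probs) == len(weights)
    return sum(w * x for w, x in zip(weights, probs)) / sum(weights)

# Example: RGB, optical flow, and disparity streams with equal weights.
y = fuse_probabilities([0.9, 0.8, 0.7], [1.0, 1.0, 1.0])  # -> 0.8
```

Because the sum of weights normalizes the result, y always stays within [0, 1] as long as each per-stream probability does.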
The embodiment of the invention divides the identification of abnormal behaviors such as fighting into two stages: a detection stage and a recognition stage. In the detection stage, a CNN that automatically extracts image features detects pedestrian targets; if no person is detected in the video, only a single person is present, or the distance between any two persons exceeds a threshold (a safe distance), the video is judged to contain no fighting behavior. If two or more people are present and their distance is below the threshold (a dangerous distance), the corresponding region is extracted, the data of the three modalities (the optical flow map, the disparity map, and the RGB map) are fed into a three-stream network model, and the outputs are finally fused to identify whether fighting behavior exists. Because the RGB map, the optical flow map, and the disparity map are all used when training the network model, the three different data modalities jointly provide spatial, temporal, and motion-speed information about the fighting behavior, so whether it is fighting behavior can be judged more accurately. In the recognition process, three parallel deep CNN-LSTM fusion network models perform feature extraction and judge whether the behavior is fighting, which avoids manual feature selection. The method comprises the following steps:
and acquiring a public data set, and collecting an RGB (red, green and blue) image, a disparity map and an optical flow map of the binocular camera as input data forms of a network.
Fig. 3 is a schematic diagram of abnormal behavior recognition according to an embodiment of the present invention. As shown in fig. 3, a CNN network for pedestrian detection is built in the detection stage, and three parallel CNN+LSTM network models for fighting-behavior recognition are built in the recognition stage; these network models may be identical or may be adjusted for the different data modalities. The CNN network detects the number of people in the video image: if there is only one person, or the distance between persons is greater than the threshold, it is determined that no abnormal fighting behavior exists in the video image; if the distance between multiple persons is smaller than the threshold, the optical flow map, RGB map, and disparity map of the video image are respectively input into the CNN+LSTM network models, and whether abnormal fighting behavior exists in the video image is determined from the recognition results.
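The detection-stage gate described above (skip recognition unless at least two detected people are closer than the threshold) might look like the following sketch; the bounding-box format, centre-distance metric, and function name are assumptions for illustration, not from the patent:

```python
import math

def needs_recognition(boxes, distance_threshold):
    """Detection-stage gate: True only if at least two people are detected
    and some pair of bounding-box centres is closer than the threshold."""
    if len(boxes) < 2:
        return False  # no person or a single person: no fighting possible
    centres = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            if math.dist(centres[i], centres[j]) < distance_threshold:
                return True  # dangerous distance: run the three-stream model
    return False  # everyone beyond the safe distance
```

Only frames for which this gate returns True would be handed to the three-stream CNN+LSTM recognition stage, which is what makes the two-stage design efficient.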
The target detection network CNN is trained with a public object detection data set; the behavior recognition CNN+LSTM three-stream network model is trained with a public behavior recognition data set, using the data of the different modalities respectively; and the model weight files are saved.
The trained network models are then fine-tuned with the collected data and tested after fine-tuning.
The above process is explained in detail below.
Real fighting videos in various scenes are collected with a binocular camera, a public behavior recognition data set is prepared, and the RGB maps, disparity maps, and optical flow maps are obtained.
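How a binocular pair yields a disparity map can be illustrated with a toy scanline block-matching sketch; real systems use calibrated stereo matching (e.g. semi-global matching), so everything below, including the function name and parameters, is illustrative only:

```python
import numpy as np

def scanline_disparity(left, right, max_disp=16, block=5):
    """Toy SAD block matching: for each pixel of the left image, find the
    horizontal shift d whose block in the right image minimizes the sum of
    absolute differences. The resulting per-pixel shift is the disparity."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
            best, best_d = None, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()  # SAD matching cost
                if best is None or cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Nearby objects produce larger disparities, which is why the disparity modality adds depth information that the RGB and optical flow streams lack.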
The TensorFlow deep learning framework is used in the PyCharm integrated environment to build the target detection network model YOLO-V3 and three CNN+LSTM fusion networks in parallel. The CNN adopts the VGG16 structure with the last three fully connected layers removed and replaced by four LSTM layers of 128 units each, and a softmax classifier forms the last layer. VGG16 mainly extracts the spatial features of the images, while the LSTM layers extract the temporal features of the behavior and simultaneously enhance the spatial features of the image sequence.
The YOLO-V3 and VGG16+LSTM fusion networks are connected in a branching form: if YOLO-V3 detects no person, only a single person, or multiple persons whose distances exceed the threshold (a safe distance), the absence of fighting behavior is output directly; otherwise, the detected persons and regions are sent to the parallel VGG+LSTM networks in the three data modalities (RGB map, disparity map, and optical flow map) to identify fighting behavior. For the recognition stage, a three-stream network model is designed in which three VGG+LSTM networks process the three different data modalities and are fused in parallel.
The deep network models are pre-trained on a GPU using the COCO data set together with the public behavior recognition data sets Kinetics (300,000 videos, 400 classes) and KTH; the network model weights are obtained and the weight files are saved.
The obtained weight files are loaded, the collected data set is expanded with online data augmentation and input into the trained neural network models, and the weights are fine-tuned. After fine-tuning, the weights are saved and the model is evaluated on a test set.
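Online data augmentation, as opposed to enlarging the stored data set, generates a new variant of each clip on the fly every epoch. A minimal sketch follows; the specific transforms (horizontal flip, brightness jitter) and the clip layout `(T, H, W)` are illustrative choices, not taken from the patent:

```python
import numpy as np

def augment_online(frames, rng):
    """Produce a randomly augmented copy of a clip each time it is drawn:
    optional horizontal flip plus a uniform brightness shift, with values
    clipped back to the valid 0..255 range."""
    clip = frames.copy()
    if rng.random() < 0.5:
        clip = clip[:, :, ::-1]                      # flip along width
    clip = np.clip(clip + rng.integers(-20, 21), 0, 255)
    return clip
```

Because a fresh variant is sampled per epoch, the effective data set grows without any extra storage, which helps the fine-tuning step generalize from a small collected set.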
According to the embodiment of the invention, the abnormal fighting behavior is detected and recognized in two stages: the CNN neural network detects the target in the detection stage, and clearly non-abnormal cases are filtered out directly, which improves judgment efficiency. Through the network model, the three different data modalities of the RGB map, the optical flow map, and the disparity map are processed respectively to perform the behavior recognition task; the optical flow map captures the motion information of the fighting process, so the accuracy is higher and both missed and false alarms are reduced.
Example 2
An embodiment of the present invention further provides an abnormal behavior recognition apparatus, and fig. 4 is a block diagram of the abnormal behavior recognition apparatus according to the embodiment of the present invention, as shown in fig. 4, including:
the acquisition module 42 is configured to acquire an RGB map of a video image to be processed and process the RGB map to obtain an optical flow map and a disparity map;
a first determining module 44, configured to determine image features of the RGB map, the optical flow map, and the disparity map respectively;
a second determining module 46, configured to determine a first probability, a second probability, and a third probability of abnormal behavior existing in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively;
a third determining module 48, configured to determine a target probability that the video image has the abnormal behavior according to the first probability, the second probability, and the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map;
a fourth determining module 410, configured to determine that the abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold.
Optionally, the apparatus further comprises:
the detection module is used for detecting the number of target objects in the video image and the distance between the target objects;
a fifth determining module, configured to determine that the number of the target objects is greater than 1 and the distance between the target objects is smaller than a preset threshold.
Optionally, the first determining module 44 is further configured to input the RGB map, the optical flow map, and the disparity map into a pre-trained first target neural network model respectively, to obtain the image features of the RGB map, the optical flow map, and the disparity map output by the first target neural network model.
Optionally, the second determining module 46 is further configured to input the image features of the RGB map, the optical flow map, and the disparity map into a pre-trained second target neural network model respectively, to obtain the first probability, the second probability, and the third probability output by the second target neural network model.
Optionally, the second determining module 46 includes:
the first input submodule is used for respectively inputting the image features of the RGB map, the optical flow map, and the disparity map into at least one LSTM layer of the second target neural network model to obtain the image features output by the at least one LSTM layer;
a second input submodule, configured to input the image features output by the at least one LSTM layer for the RGB map, the optical flow map, and the disparity map into a softmax layer of the second target neural network model, respectively, to obtain the first probability, the second probability, and the third probability output by the softmax layer.
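The softmax layer converts each stream's LSTM output into class probabilities. A minimal numpy sketch follows; the 2-class layout with class 1 meaning "fighting" is an assumption for illustration, not specified by the patent:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)   # shift for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stream_probability(logits):
    """Probability of the 'abnormal' class for one stream, given the
    2-class logits produced after the LSTM layers (class 1 = fighting)."""
    return softmax(np.asarray(logits, dtype=float))[..., 1]
```

Applying `stream_probability` to the RGB, optical flow, and disparity streams yields exactly the first, second, and third probabilities that the fusion formula then combines.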
Optionally, the apparatus comprises:
the first acquisition module is used for acquiring the RGB maps, optical flow maps, and disparity maps corresponding to video images acquired by a first predetermined number of binocular image acquisition devices;
the first training module is configured to train a first original neural network model with the first predetermined number of RGB maps, optical flow maps, and disparity maps, wherein the maps are respectively input into the first original neural network model and the loss function over the image features of the RGB maps, optical flow maps, and disparity maps output by the trained first target neural network model satisfies a first predetermined convergence condition.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring the image features of the RGB maps, optical flow maps, and disparity maps corresponding to video images acquired by a second predetermined number of binocular image acquisition devices;
the second training module is used for training a second original neural network model with these image features, wherein the image features of the second predetermined number of RGB maps, optical flow maps, and disparity maps are respectively input into the second original neural network model, and the loss function over the probability of abnormal behavior in the video image output by the trained second target neural network model satisfies a second predetermined convergence condition.
Optionally, the third determining module 48 is further configured to determine the target probability that the video image has the abnormal behavior according to the preset weights of the RGB map, the optical flow map, and the disparity map and the first probability, the second probability, and the third probability, by:
y = (w1·x1 + w2·x2 + w3·x3) / (w1 + w2 + w3),
wherein w1, w2, and w3 are the weights of the RGB map, the optical flow map, and the disparity map, respectively, and x1, x2, and x3 are the first probability, the second probability, and the third probability, respectively.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring an RGB map of a video image to be processed, and processing the RGB map to obtain an optical flow map and a disparity map;
S2, determining image features of the RGB map, the optical flow map, and the disparity map respectively;
S3, determining a first probability, a second probability, and a third probability of abnormal behavior in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively;
S4, determining the target probability that the video image has the abnormal behavior according to the first probability, the second probability, and the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map;
S5, determining that the abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold.
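Steps S3 to S5 can be condensed into one hypothetical helper; the parameter names, default weights, and default threshold are all illustrative, not prescribed by the patent:

```python
def recognize_abnormal(p_rgb, p_flow, p_disp,
                       w_rgb=1.0, w_flow=1.0, w_disp=1.0, threshold=0.5):
    """Fuse the three per-stream probabilities with preset weights (S4)
    and compare the target probability against the preset threshold (S5).
    Returns (abnormal?, target_probability)."""
    target = (w_rgb * p_rgb + w_flow * p_flow + w_disp * p_disp) \
             / (w_rgb + w_flow + w_disp)
    return target >= threshold, target
```

Raising the weight of, say, the optical flow stream biases the decision toward motion evidence without retraining any of the three networks, which is the practical appeal of late fusion.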
Optionally, in this embodiment, the storage medium may include but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring an RGB map of a video image to be processed, and processing the RGB map to obtain an optical flow map and a disparity map;
S2, determining image features of the RGB map, the optical flow map, and the disparity map respectively;
S3, determining a first probability, a second probability, and a third probability of abnormal behavior in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively;
S4, determining the target probability that the video image has the abnormal behavior according to the first probability, the second probability, and the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map;
S5, determining that the abnormal behavior exists in the video image when the target probability is greater than or equal to a preset threshold.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized in a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed out of the order given, or implemented separately as individual integrated circuit modules, or multiple modules or steps may be implemented as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An abnormal behavior recognition method, comprising:
a detection stage and an identification stage;
the detection phase comprises:
detecting the number of target objects in a video image to be processed and the distance between the target objects;
extracting a target area of the video image under the condition that the number of the target objects is larger than 1 and the distance between the target objects is smaller than a preset threshold value;
acquiring an RGB map of the video image, and processing the RGB map to obtain an optical flow map and a disparity map;
the identification phase comprises:
determining image features of the RGB map, the optical flow map, and the disparity map respectively;
determining a first probability, a second probability, and a third probability of abnormal behavior existing in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively;
determining a target probability that the video image has the abnormal behavior according to the first probability, the second probability, and the third probability, and preset weights of the RGB map, the optical flow map, and the disparity map;
and determining that the abnormal behaviors exist in the video image under the condition that the target probability is greater than or equal to a preset threshold value.
2. The method of claim 1, wherein determining the image features of the RGB map, the optical flow map, and the disparity map, respectively, comprises:
inputting the RGB map, the optical flow map, and the disparity map into a pre-trained first target neural network model respectively, to obtain the image features of the RGB map, the optical flow map, and the disparity map output by the first target neural network model.
3. The method of claim 1, wherein determining the first probability, the second probability, and the third probability of abnormal behavior in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively, comprises:
inputting the image features of the RGB map, the optical flow map, and the disparity map into a pre-trained second target neural network model respectively, to obtain the first probability, the second probability, and the third probability output by the second target neural network model.
4. The method of claim 3, wherein inputting the image features of the RGB map, the optical flow map, and the disparity map into the pre-trained second target neural network model to obtain the first probability, the second probability, and the third probability output by the second target neural network model comprises:
inputting the image features of the RGB map, the optical flow map, and the disparity map into at least one LSTM layer of the second target neural network model respectively, to obtain the image features output by the at least one LSTM layer;
inputting the image features output by the at least one LSTM layer for the RGB map, the optical flow map, and the disparity map into a softmax layer of the second target neural network model, respectively, to obtain the first probability, the second probability, and the third probability output by the softmax layer.
5. The method of claim 1, wherein prior to determining the image features of the RGB map, the optical flow map, and the disparity map, respectively, the method further comprises:
acquiring the RGB maps, optical flow maps, and disparity maps corresponding to video images acquired by a first predetermined number of binocular image acquisition devices;
training a first original neural network model with the first predetermined number of RGB maps, optical flow maps, and disparity maps, wherein the maps are respectively input into the first original neural network model, and the loss function over the image features of the RGB maps, optical flow maps, and disparity maps output by the trained first target neural network model satisfies a first predetermined convergence condition.
6. The method of claim 1, wherein prior to determining the first probability, the second probability, and the third probability of abnormal behavior in the video image, the method further comprises:
acquiring the image features of the RGB maps, optical flow maps, and disparity maps corresponding to video images acquired by a second predetermined number of binocular image acquisition devices;
training a second original neural network model with these image features, wherein the image features of the second predetermined number of RGB maps, optical flow maps, and disparity maps are respectively input into the second original neural network model, and the loss function over the probability of abnormal behavior in the video image output by the trained second target neural network model satisfies a second predetermined convergence condition.
7. The method according to any one of claims 1 to 6, wherein the target probability that the video image has the abnormal behavior is determined according to the first probability, the second probability, the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map by:
y = (w1·x1 + w2·x2 + w3·x3) / (w1 + w2 + w3),
wherein w1, w2, and w3 are the weights of the RGB map, the optical flow map, and the disparity map, respectively, and x1, x2, and x3 are the first probability, the second probability, and the third probability, respectively.
8. An abnormal behavior recognition apparatus, comprising:
the detection module is used for detecting the number of target objects in a video image to be processed and the distance between the target objects in a detection stage; extracting a target area of the video image under the condition that the number of the target objects is larger than 1 and the distance between the target objects is smaller than a preset threshold value;
the acquisition module is used for acquiring an RGB map of the video image and processing the RGB map to obtain an optical flow map and a disparity map;
a first determining module, configured to determine image features of the RGB map, the optical flow map, and the disparity map respectively;
a second determining module, configured to determine a first probability, a second probability, and a third probability of abnormal behavior existing in the video image according to the image features of the RGB map, the optical flow map, and the disparity map, respectively;
a third determining module, configured to determine a target probability that the video image has the abnormal behavior according to the first probability, the second probability, and the third probability, and the preset weights of the RGB map, the optical flow map, and the disparity map;
and the fourth determining module is used for determining that the abnormal behaviors exist in the video image under the condition that the target probability is greater than or equal to a preset threshold value.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any one of claims 1 to 7 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has a computer program stored therein, and the processor is configured to execute the computer program to perform the method of any of claims 1 to 7.
CN201911083346.1A 2019-11-07 2019-11-07 Abnormal behavior identification method and device Active CN111126411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911083346.1A CN111126411B (en) 2019-11-07 2019-11-07 Abnormal behavior identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911083346.1A CN111126411B (en) 2019-11-07 2019-11-07 Abnormal behavior identification method and device

Publications (2)

Publication Number Publication Date
CN111126411A CN111126411A (en) 2020-05-08
CN111126411B true CN111126411B (en) 2023-04-07

Family

ID=70495695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911083346.1A Active CN111126411B (en) 2019-11-07 2019-11-07 Abnormal behavior identification method and device

Country Status (1)

Country Link
CN (1) CN111126411B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818929B (en) * 2021-02-26 2023-04-18 济南博观智能科技有限公司 Method and device for detecting people fighting, electronic equipment and storage medium
CN114095753A (en) * 2021-11-17 2022-02-25 中国建设银行股份有限公司 Video stream processing method, apparatus, device, medium, and program product
CN116503815B (en) * 2023-06-21 2024-01-30 宝德计算机系统股份有限公司 Big data-based computer vision processing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854027A (en) * 2013-10-23 2014-06-11 北京邮电大学 Crowd behavior identification method
EP3232371A1 (en) * 2016-04-15 2017-10-18 Ricoh Company, Ltd. Object recognition method, object recognition device, and classifier training method
CN108596040A (en) * 2018-03-29 2018-09-28 中山大学 A kind of channels in series fusion pedestrian detection method based on binocular vision
CN110334678A (en) * 2019-07-12 2019-10-15 哈尔滨理工大学 A kind of pedestrian detection method of view-based access control model fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033601B (en) * 2015-03-09 2019-01-18 株式会社理光 The method and apparatus for detecting abnormal case

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854027A (en) * 2013-10-23 2014-06-11 北京邮电大学 Crowd behavior identification method
EP3232371A1 (en) * 2016-04-15 2017-10-18 Ricoh Company, Ltd. Object recognition method, object recognition device, and classifier training method
CN108596040A (en) * 2018-03-29 2018-09-28 中山大学 A kind of channels in series fusion pedestrian detection method based on binocular vision
CN110334678A (en) * 2019-07-12 2019-10-15 哈尔滨理工大学 A kind of pedestrian detection method of view-based access control model fusion

Also Published As

Publication number Publication date
CN111126411A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN108388888B (en) Vehicle identification method and device and storage medium
CN110390262B (en) Video analysis method, device, server and storage medium
CN108596277B (en) Vehicle identity recognition method and device and storage medium
CN104200671B (en) A kind of virtual bayonet socket management method based on large data platform and system
CN111126411B (en) Abnormal behavior identification method and device
WO2021042682A1 (en) Method, apparatus and system for recognizing transformer substation foreign mattter, and electronic device and storage medium
CN109299683B (en) Security protection evaluation system based on face recognition and behavior big data
CN111898581B (en) Animal detection method, apparatus, electronic device, and readable storage medium
CN112052815B (en) Behavior detection method and device and electronic equipment
CN113052029A (en) Abnormal behavior supervision method and device based on action recognition and storage medium
CN110659391A (en) Video detection method and device
CN111241932A (en) Automobile exhibition room passenger flow detection and analysis system, method and storage medium
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
CN110969215A (en) Clustering method and device, storage medium and electronic device
Gupta et al. CrowdVAS-Net: A deep-CNN based framework to detect abnormal crowd-motion behavior in videos for predicting crowd disaster
CN107330414A (en) Act of violence monitoring method
CN111177469A (en) Face retrieval method and face retrieval device
CN112990057A (en) Human body posture recognition method and device and electronic equipment
CN115223246A (en) Personnel violation identification method, device, equipment and storage medium
CN108171135A (en) Method for detecting human face, device and computer readable storage medium
CN106612385B (en) Video detecting method and video detecting device
CN111860457A (en) Fighting behavior recognition early warning method and recognition early warning system thereof
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN113627321A (en) Image identification method and device based on artificial intelligence and computer equipment
CN106803937B (en) Double-camera video monitoring method, system and monitoring device with text log

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant