WO2020248386A1 - Video analysis method and apparatus, computer device and storage medium - Google Patents

Video analysis method and apparatus, computer device and storage medium

Info

Publication number
WO2020248386A1
Authority
WO
WIPO (PCT)
Prior art keywords: target object, abnormal, image, video, video image
Prior art date
Application number
PCT/CN2019/103373
Other languages
French (fr)
Chinese (zh)
Inventor
盖超
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date: 2019-06-14
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020248386A1 publication Critical patent/WO2020248386A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • This application relates to the field of image recognition technology, and in particular to a video analysis method, device, computer equipment and storage medium.
  • the first aspect of the present application provides a video analysis method, the method includes:
  • Detecting the target object in the video image to obtain the category of the target object includes:
  • Tracking the target object in the video image to obtain the state of the target object includes:
  • If the target object does not appear in the detection range in the current video frame, it is determined that the target object is abnormal.
  • Judging whether the business scenario is abnormal includes:
  • When the abnormal model outputs the abnormal scene corresponding to the image to be recognized, it is confirmed that the business scene is abnormal.
  • The key information includes the time and place at which the business scene is abnormal, and a picture file intercepted from the video image when the business scene is abnormal.
  • the method further includes:
  • the third-party business platform includes a public security system and a traffic control system.
  • the method further includes:
  • a second aspect of the present application provides a video analysis device, the device includes:
  • a receiving module, used to receive the video image collected by the camera;
  • a detection module, used to detect the target object in the video image to obtain the category of the target object;
  • a tracking module, used to track the target object in the video image to obtain the state of the target object;
  • an analysis module, configured to analyze and obtain the business scene contained in the video image according to the category of the target object and the state of the target object;
  • a judgment module, used to judge whether the business scenario is abnormal; and
  • a processing module, configured to record the key information when the business scene in the video image is abnormal.
  • A third aspect of the present application provides a computer device. The computer device includes a processor and a memory, and the processor is configured to implement the video analysis method when executing computer-readable instructions stored in the memory.
  • A fourth aspect of the present application provides a non-volatile readable storage medium having computer-readable instructions stored thereon; when the computer-readable instructions are executed by a processor, the video analysis method is implemented.
  • The video analysis method, apparatus, computer device, and storage medium described in this application can analyze a video image to obtain the business scene contained in the video image, determine whether the business scene is abnormal, and, when the business scene is abnormal, record the key information at the time of the abnormality. The key information can then be sent to a corresponding third-party platform so that the abnormality can be handled in time.
  • FIG. 1 is a flowchart of a video analysis method provided in Embodiment 1 of the present application.
  • FIG. 2 is a functional module diagram of a preferred embodiment of the video analysis device provided in Embodiment 2 of the present application.
  • Fig. 3 is a schematic diagram of a computer device provided in Embodiment 3 of the present application.
  • the video analysis method of the embodiment of the present application is applied in a hardware environment composed of at least one computer device and a mobile terminal connected to the computer device through a network.
  • Networks include but are not limited to: wide area network, metropolitan area network or local area network.
  • The video analysis method in the embodiments of the present application may be executed by the computer device, by the mobile terminal, or jointly by the computer device and the mobile terminal.
  • For a computer device that needs to perform the video analysis method, the video analysis function provided by the method of this application can be integrated directly on the computer device, or a client for implementing the method of this application can be installed on it.
  • The method provided in this application can also run on a computer or other device in the form of a software development kit (SDK), with the video analysis function exposed as an SDK interface; the computer or other device can then realize the video analysis function through the provided interface.
  • FIG. 1 is a flowchart of a video analysis method provided in Embodiment 1 of the present application. According to different needs, the execution order in this flowchart can be changed, and some steps can be omitted.
  • Step S1: Receive the video image collected by the camera.
  • the video image is collected by a camera, and the camera is installed in different business scenarios.
  • the business scenario describes a scenario that requires target object detection and/or video analysis.
  • For example, the business scenario may be an intelligent traffic business scenario that involves recognizing traffic accidents, congestion, vehicle speed detection, traffic flow prediction, loss of vehicle control, vehicle trajectories, intrusion by pedestrians or bicycles, violations of traffic laws, thrown or dropped objects, and the like.
  • The business scenario may also be a smart park business scenario that involves identifying personnel intrusion, left-behind objects, lost property monitoring, license plate analysis, vehicle trajectories, traffic flow analysis, pedestrian flow analysis, fireworks or smoke, and the like.
  • The business scenario may also be a ferry monitoring business scenario that involves detecting illegal vessels, overloading, dense crowds, whether life jackets are worn, persons falling into the water, and the like.
  • The business scenario may further be a scenario such as unmanned driving, a financial scenario, equipment login, or airport and public area monitoring.
  • The cameras may be of different models and specifications from different manufacturers; the video analysis method can uniformly process and analyze video images captured by such cameras.
  • the video analysis method further includes:
  • Specifically, the video image may be decoded by a graphics processing unit (GPU) to obtain each frame of the video image; a minimal decoding sketch follows.
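  • The sketch uses OpenCV to read frames from a file or stream; whether decoding actually runs on a GPU depends on how OpenCV/FFmpeg were built, so the GPU aspect is an assumption here, not a detail from the application.

```python
# Minimal frame-decoding sketch (assumes OpenCV is available; GPU-accelerated
# decoding depends on the local OpenCV/FFmpeg build and is not guaranteed here).
import cv2

def decode_frames(source):
    """Yield decoded frames from a video file path or a camera/RTSP URL."""
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open video source: {source}")
    try:
        while True:
            ok, frame = cap.read()   # frame is a BGR numpy array
            if not ok:
                break
            yield frame
    finally:
        cap.release()

# Usage (hypothetical source):
# for frame in decode_frames("rtsp://camera-host/stream"):
#     process(frame)
```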
  • Step S2: Detect the target object in the video image to obtain the category of the target object.
  • the target objects in the video image include people, animals, vehicles, buildings, smoke and so on.
  • detecting the target object in the video image to obtain the target object category includes:
  • the target object in the video image includes a static target object and a moving target object.
  • The stationary target object can be identified through a template-based detection method. Specifically, this includes determining the contour of the target object's shape in the video image and matching that contour against a pre-stored template file.
  • For example, if the contour of the target object's shape is determined to be a rectangle, the rectangle is feature-matched against a pre-stored door template file to identify the target object, as sketched below.
  • In this example, the template file of the door is rectangular.
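  • The sketch assumes OpenCV; the Otsu thresholding, the shape-similarity threshold, and the rectangular door template are illustrative choices rather than details taken from the application.

```python
# Template-based detection sketch: compare object contours in a frame against
# a pre-stored template contour (e.g., a rectangular door template).
import cv2

def matches_template(frame_gray, template_gray, max_distance=0.1):
    """Return True if any contour in the frame resembles the template contour."""
    _, frame_bin = cv2.threshold(frame_gray, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, tmpl_bin = cv2.threshold(template_gray, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    frame_contours, _ = cv2.findContours(frame_bin, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)
    tmpl_contours, _ = cv2.findContours(tmpl_bin, cv2.RETR_EXTERNAL,
                                        cv2.CHAIN_APPROX_SIMPLE)
    if not tmpl_contours:
        return False
    tmpl = max(tmpl_contours, key=cv2.contourArea)
    for contour in frame_contours:
        # cv2.matchShapes returns 0 for identical shapes; smaller is more similar.
        if cv2.matchShapes(contour, tmpl, cv2.CONTOURS_MATCH_I1, 0.0) < max_distance:
            return True
    return False
```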
  • If the target object in the video image is a moving target object, it can be identified by at least one of the background difference method, the frame difference method, and the optical flow method.
  • The background difference method performs background modeling on a relatively fixed scene in the video image and obtains the moving target object from the difference between the current image and the background model during detection; the frame difference method compares corresponding pixel positions between adjacent frames to obtain the position of the moving target object; and the optical flow method uses time-varying optical flow vector characteristics to detect the moving target object in the video image.
  • The methods for detecting static target objects and moving target objects in a video image are not limited to the above enumeration; any method suitable for detecting a target object in a video image can be applied here. The first two are illustrated in the sketch below.
  • The methods for detecting stationary target objects and moving target objects in a video image in this embodiment are all existing technologies and will not be described in detail herein.
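  • The sketch assumes OpenCV; the history length, difference threshold, and minimum region area are illustrative parameters rather than values specified by the application.

```python
# Moving-object detection sketch: background subtraction and frame differencing.
# Parameters (history, thresholds, minimum area) are illustrative assumptions.
import cv2

bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

def moving_regions_bg(frame, min_area=500):
    """Background-difference method: foreground mask vs. a learned background model."""
    mask = bg_subtractor.apply(frame)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def moving_regions_frame_diff(prev_gray, curr_gray, thresh=25, min_area=500):
    """Frame-difference method: compare corresponding pixels of adjacent frames."""
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```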
  • For example, when the target object in the video image is recognized as a car, it may be determined that the category of the target object is "vehicle".
  • Detection and classification of the target object is a very basic task in vision technology; its purpose is to track certain objects of interest in the scene, and it includes conventional target object detection, person detection, vehicle detection, and so on.
  • Specifically, the basic attributes of the target object in the video image can be obtained by decomposing the target object in the video image, where the basic attributes include color, motion track, shape, structure, and so on. The obtained basic attributes are then compared with the basic attributes of target objects pre-stored in the database, so that the target object in the video image can be accurately identified.
  • The database stores a table that maps the basic attributes of target objects to target object categories.
  • Determining the category of the target object specifically includes: obtaining the basic attributes of the target object in the video image by decomposing the target object in the video image; comparing the obtained basic attributes with the basic attributes of target objects stored in the database in advance; and, when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying the attribute-to-category table stored in the database to obtain the category of the target object. A minimal lookup sketch follows.
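  • The attribute names and the in-memory stand-in for the database table below are hypothetical; a real implementation would query whatever schema the database actually uses.

```python
# Sketch of matching extracted basic attributes against a stored attribute-to-category table.
# The attribute schema and example entries are assumptions for illustration only.

ATTRIBUTE_TABLE = [
    # (basic attributes, target object category)
    ({"shape": "rectangle", "moving": False}, "door"),
    ({"shape": "box-like", "moving": True},  "vehicle"),
    ({"shape": "upright",  "moving": True},  "pedestrian"),
]

def classify_target(attributes):
    """Return the category whose stored attributes match the extracted ones, else None."""
    for stored_attributes, category in ATTRIBUTE_TABLE:
        if all(attributes.get(key) == value for key, value in stored_attributes.items()):
            return category
    return None

# Usage:
# category = classify_target({"shape": "rectangle", "moving": False})  # -> "door"
```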
  • Step S3: Track the target object in the video image to obtain the state of the target object.
  • the state of the target object can be determined by tracking the target object in the video image.
  • In this embodiment, the method for tracking the target object in the video image includes:
  • a) determining the target object in the current video frame; b) acquiring the image area of the target object in the preceding video frames and the image features of that image area, where the preceding video frames are the k video frames before the current video frame and k is a positive integer; c) performing motion estimation on the target object according to its image area in the preceding video frames to determine the prediction area of the target object in the current video frame; d) determining the detection range of the target object in the current video frame according to the prediction area; and
  • e) determining whether the target object appears within the detection range of the current video frame: if the target object does not appear within the detection range, the state of the target object is determined to be abnormal; if the target object appears within
  • the detection range of the current video frame, the image area of the target object in the current video frame is determined, that is, the state of the target object is normal.
  • Estimating, comparing, and detecting the current video frame on the basis of the first k video frames requires a small amount of calculation, can solve the problem of target objects being occasionally lost or occluded in the video, and yields higher detection accuracy. A simplified tracking sketch follows.
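  • The sketch assumes a constant-velocity motion model over the previous k frame centers and a fixed-size detection window; both are simplifications for illustration, not details specified by the application.

```python
# Tracking sketch: estimate motion from the previous k frames, predict a detection
# window in the current frame, and flag the target as abnormal if it is not found there.
import numpy as np

def predict_center(prev_centers):
    """Constant-velocity prediction from the centers observed in the previous k frames."""
    pts = np.asarray(prev_centers, dtype=float)
    if len(pts) < 2:
        return pts[-1]
    velocity = np.mean(np.diff(pts, axis=0), axis=0)   # average per-frame displacement
    return pts[-1] + velocity

def track_target(prev_centers, detections, window=80.0):
    """Return (state, matched_center); 'detections' are candidate centers in the current frame."""
    predicted = predict_center(prev_centers)
    for center in detections:
        if np.linalg.norm(np.asarray(center, dtype=float) - predicted) <= window:
            return "normal", center       # target found inside the detection range
    return "abnormal", None               # target not found: state is abnormal

# Usage:
# state, center = track_target([(100, 50), (110, 52), (120, 55)], [(131, 57)])
```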
  • Step S4: Analyze the business scene contained in the video image according to the category of the target object and the state of the target object.
  • In this embodiment, the category of the target object can be obtained from the detection result and the state of the target object can be determined from the tracking result, so that the business scene contained in the video image can be analyzed.
  • For example, if the category of the target object is a car and the car does not appear in the detection range of the current video frame, it can be determined that the state of the car is abnormal; if the car is in a congested state, it can be learned that the business scene contained in the video image is an intelligent transportation business scene.
  • As another example, if the category of the target object is a pedestrian and the pedestrian does not appear in the detection range of the current video frame, it can be determined that the state of the pedestrian is abnormal; if the pedestrian has fallen down, it can be learned that the business scene contained in the video image is an intelligent traffic business scene.
  • As a further example, if the category of the target object obtained from the detection result is a door and the door does not appear in the detection range of the current video, it can be confirmed that the state of the door is abnormal; if the door is kept open, it can be judged that the business scene contained in the video image is a smart security business scene.
  • Step S5: Determine whether the business scene in the video image is abnormal.
  • When the business scene in the video image is abnormal, step S6 is entered; when the business scene in the video image is not abnormal, the process ends.
  • In this embodiment, the video image may be input to a pre-trained abnormality model, and whether the business scene in the video image is abnormal can be determined according to the abnormality model. Specifically, when it is determined that the target object is abnormal, the current video frame is extracted as an abnormal image; the abnormal image is imported as an image to be recognized into the pre-trained abnormality model, where the abnormality model is used to characterize the correspondence between images to be recognized and abnormal scenes; and when the abnormality model outputs the abnormal scene corresponding to the image to be recognized, it is confirmed that the business scene is abnormal. The abnormality model includes abnormal models corresponding to different business scenarios.
  • For example, when the business scene is an intelligent transportation business scene, the abnormal models corresponding to the intelligent transportation business scene include a traffic accident model, a traffic congestion model, a traffic violation model, and the like.
  • When the business scene is a smart park business scene, the abnormal models corresponding to the smart park business scene include a left-behind object model, a personnel intrusion model, and the like.
  • When the business scene is a ferry monitoring business scene, the abnormal models corresponding to it include an overloading model, a falling-into-water model, an illegal vessel model, and the like.
  • For example, when a traffic congestion abnormality is to be detected, the current video frame is extracted as an abnormal image, and the abnormal image is imported as an image to be recognized into a pre-trained traffic congestion model.
  • When the traffic congestion model outputs the traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is abnormal; when the traffic congestion model does not output a traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is normal. A minimal inference sketch under these assumptions is given below.
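  • The sketch assumes PyTorch and a hypothetical serialized congestion classifier saved as "congestion_model.pt"; the file name, class labels, and input size are assumptions rather than details from the application.

```python
# Inference sketch: feed the extracted abnormal frame into a pre-trained anomaly model
# (here a hypothetical traffic-congestion classifier).
import torch
import cv2

model = torch.jit.load("congestion_model.pt")   # hypothetical serialized model
model.eval()
CLASS_NAMES = ["normal", "traffic_congestion"]  # assumed output classes

def recognize_scene(frame_bgr):
    """Return the scene label predicted for one decoded BGR frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (224, 224))
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        logits = model(tensor)
    return CLASS_NAMES[int(logits.argmax(dim=1))]

# The business scene is confirmed abnormal only when the model outputs an abnormal scene:
# if recognize_scene(current_frame) == "traffic_congestion":
#     ...record key information...
```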
  • The above-mentioned abnormality model is a machine learning model trained on a picture sample set.
  • The picture samples include picture samples of abnormal business scenes and picture samples of normal business scenes.
  • The machine learning model is an artificial intelligence algorithm model capable of image recognition, and includes, for example, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a deep neural network (DNN) model.
  • The convolutional neural network (CNN) model is a multi-layer neural network that can continuously reduce the dimensionality of an image recognition problem with a huge amount of data so that it can finally be trained. Therefore, the machine learning model in the embodiment of the present application may be a CNN model.
  • The ResNet network proposes a residual learning framework that reduces the burden of network training. Such a network is substantially deeper than previously used networks and solves the problem, seen in other neural networks, of accuracy decreasing as the network deepens.
  • The machine learning model may be the ResNet model, a convolutional neural network (CNN) architecture. It should be noted that this is only an example; other machine learning models capable of image recognition are also applicable to this application and will not be repeated here. A fine-tuning sketch follows.
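  • The sketch fine-tunes a ResNet backbone on labeled abnormal/normal scene pictures; the dataset layout, epoch count, and learning rate are illustrative assumptions, not the application's training procedure.

```python
# Training sketch: fine-tune a ResNet classifier on abnormal vs. normal scene pictures.
# Dataset directory layout, epochs and learning rate are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Expects e.g. scene_samples/normal/*.jpg and scene_samples/abnormal/*.jpg (hypothetical paths)
dataset = datasets.ImageFolder("scene_samples", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

model = models.resnet18()   # pretrained weights may optionally be loaded here
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```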
  • Step S6: When the business scene in the video image is abnormal, record the key information at the time the business scene is abnormal.
  • In this embodiment, the key information includes the time and place at which the business scene is abnormal, and a picture file intercepted from the video image when the business scene is abnormal, for example as recorded in the sketch below.
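  • The recording sketch below assumes an output directory and a JSON-lines record format; both are illustrative choices, not requirements of the application.

```python
# Key-information recording sketch: time, place, and an intercepted picture file.
import cv2
import json
import time
from pathlib import Path

def record_key_information(frame, scene, location, out_dir="abnormal_records"):
    """Save the abnormal frame as a picture file and append a key-information record."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    picture_path = str(Path(out_dir) / f"{scene}_{timestamp}.jpg")
    cv2.imwrite(picture_path, frame)             # picture file intercepted from the video
    record = {"time": timestamp, "place": location,
              "scene": scene, "picture_file": picture_path}
    with open(Path(out_dir) / "key_information.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```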
  • Preferably, the video analysis method further includes sending the recorded key information to a third-party business platform.
  • The third-party business platform includes a public security system, a traffic control system, and the like.
  • Sending the key information helps the third-party business platform obtain, in time, the key information about an abnormality occurring in a business scenario, so that the abnormality can be handled in time. One way such a hand-off could look is sketched below.
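  • The sketch assumes an HTTP endpoint on the third-party platform; the URL, token, and payload format are hypothetical, since the application does not prescribe a particular transport.

```python
# Sending sketch: push recorded key information to a third-party business platform.
import requests

def send_key_information(record, endpoint="https://third-party.example/api/alerts",
                         token="REPLACE_ME"):
    """POST the key-information record and its picture file to a (hypothetical) endpoint."""
    with open(record["picture_file"], "rb") as picture:
        response = requests.post(
            endpoint,
            data={"time": record["time"], "place": record["place"], "scene": record["scene"]},
            files={"picture": picture},
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    response.raise_for_status()
    return response.status_code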
  • Preferably, the video analysis method further includes displaying the key information when the business scenario is abnormal. Specifically, information such as the picture, time, and place of the abnormal business scene is displayed on a display screen.
  • In summary, the video analysis method includes receiving a video image collected by a camera; detecting a target object in the video image to obtain the category of the target object; tracking the target object in the video image to obtain the state of the target object; analyzing, according to the category of the target object and the state of the target object, the business scene contained in the video image; judging whether the business scene is abnormal; and, when the business scene in the video image is abnormal,
  • recording the key information at the time the business scene is abnormal. The method can thus analyze in real time whether the business scene corresponding to the video image is abnormal and, when the business scene is confirmed to be abnormal, record the key information at that moment. The key information can then be sent to the corresponding third-party platform so that the abnormality can be handled in time.
  • FIG. 2 is a diagram of functional modules in a preferred embodiment of the video analysis device of this application.
  • the video analysis device 20 runs in a computer device.
  • the video analysis device 20 may include multiple functional modules composed of computer-readable instruction code segments.
  • The instruction codes of each computer-readable instruction code segment in the video analysis device 20 may be stored in a memory and executed by at least one processor to perform the video analysis function (see FIG. 1 and the related description for details).
  • the video analysis device 20 can be divided into multiple functional modules according to the functions it performs.
  • the functional modules may include: a receiving module 201, a detection module 202, a tracking module 203, an analysis module 204, a judgment module 205, and a processing module 206.
  • A module referred to in this application is a series of computer-readable instruction code segments that can be executed by at least one processor to complete a fixed function, and that are stored in a memory. The functions of each module are detailed below; a sketch of how the modules could be composed follows this paragraph.
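  • In the sketch below, the class and method names are illustrative only and are not taken from the application; it simply shows one way the six functional modules could be wired into a processing loop.

```python
# Composition sketch: one way the six functional modules could be wired together.
class VideoAnalysisDevice:
    def __init__(self, receiver, detector, tracker, analyzer, judge, processor):
        self.receiver = receiver      # receiving module 201
        self.detector = detector      # detection module 202
        self.tracker = tracker        # tracking module 203
        self.analyzer = analyzer      # analysis module 204
        self.judge = judge            # judgment module 205
        self.processor = processor    # processing module 206

    def run_once(self):
        frame = self.receiver.receive()                    # video image from the camera
        category = self.detector.detect(frame)             # target object category
        state = self.tracker.track(frame)                  # target object state
        scene = self.analyzer.analyze(category, state)     # business scene
        if self.judge.is_abnormal(scene, frame):           # abnormal?
            self.processor.record_key_information(scene, frame)
```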
  • the receiving module 201 is used to receive video images collected by a camera.
  • the video image is collected by a camera, and the camera is installed in different business scenarios.
  • the business scenario describes a scenario that requires target object detection and/or video analysis.
  • For example, the business scenario may be an intelligent traffic business scenario that involves recognizing traffic accidents, congestion, vehicle speed detection, traffic flow prediction, loss of vehicle control, vehicle trajectories, intrusion by pedestrians or bicycles, violations of traffic laws, thrown or dropped objects, and the like.
  • The business scenario may also be a smart park business scenario that involves identifying personnel intrusion, left-behind objects, lost property monitoring, license plate analysis, vehicle trajectories, traffic flow analysis, pedestrian flow analysis, fireworks or smoke, and the like.
  • The business scenario may also be a ferry monitoring business scenario that involves detecting illegal vessels, overloading, dense crowds, whether life jackets are worn, persons falling into the water, and the like.
  • the camera and the computer device are connected through a wired or wireless network communication.
  • the camera sends the collected video images to the computer device through a wired or wireless network.
  • The business scenario may further be a scenario such as unmanned driving, a financial scenario, equipment login, or airport and public area monitoring.
  • The cameras may be of different models and specifications from different manufacturers;
  • the video analysis device 20 can uniformly process and analyze video images captured by such cameras.
  • the video analysis device 20 may also decode the video image.
  • the video image may be video decoded by a graphics processing unit (GPU) to obtain each frame of the video image.
  • the detection module 202 is used to detect the target object in the video image to obtain the target object category.
  • the target objects in the video image include people, animals, vehicles, buildings, smoke and so on.
  • detecting the target object in the video image to obtain the target object category includes:
  • the target object in the video image includes a static target object and a moving target object.
  • The stationary target object can be identified through a template-based detection method. Specifically, this includes determining the contour of the target object's shape in the video image and performing feature matching between that contour and a pre-stored template file.
  • If the target object in the video image is a moving target object, it can be identified by at least one of the background difference method, the frame difference method, and the optical flow method.
  • The background difference method performs background modeling on a relatively fixed scene in the video image and obtains the moving target object from the difference between the current image and the background model during detection.
  • The frame difference method compares corresponding pixel positions between adjacent frames to obtain the position of the moving target object.
  • The optical flow method uses time-varying optical flow vector characteristics to detect the moving target object in the video image.
  • The methods for detecting the stationary target object and the moving target object in the video image are not limited to those listed above; any method suitable for detecting a target object in a video image can be applied here.
  • the methods for detecting stationary target objects and moving target objects in a video image in this embodiment are all existing technologies, and will not be described in detail herein.
  • For example, when the target object in the video image is recognized as a car, it may be determined that the category of the target object is "vehicle".
  • Detection and classification of the target object is a very basic task in vision technology; its purpose is to track certain objects of interest in the scene, and it includes conventional target object detection, person detection, vehicle detection, and so on.
  • Specifically, the basic attributes of the target object in the video image can be obtained by decomposing the target object in the video image, where the basic attributes include color, motion track, shape, structure, and so on. The obtained basic attributes are then compared with the basic attributes of target objects pre-stored in the database, so that the target object in the video image can be accurately identified.
  • The database stores a table that maps the basic attributes of target objects to target object categories.
  • Determining the category of the target object specifically includes: obtaining the basic attributes of the target object in the video image by decomposing the target object in the video image; comparing the obtained basic attributes with the basic attributes of target objects stored in the database in advance; and, when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying the attribute-to-category table stored in the database to obtain the category of the target object.
  • the tracking module 203 is configured to track the target object in the video image to obtain the state of the target object.
  • the state of the target object can be determined by tracking the target object in the video image.
  • the method for tracking the target object in the video image includes:
  • Steps a) through d) are the same as in Embodiment 1: the target object in the current video frame is determined, its image area and image features in the preceding k video frames are acquired, motion estimation is performed to determine a prediction area in the current video frame, and the detection range is determined from the prediction area.
  • e) It is determined whether the target object appears within the detection range of the current video frame: if the target object does not appear within the detection range, the state of the target object is determined to be abnormal; if the target object appears within the detection range of the current video frame, the image area of the target object in the current video frame is determined.
  • The preceding video frames refer to the k video frames before the current video frame.
  • Estimating, comparing, and detecting the current video frame on the basis of the first k video frames requires a small amount of calculation, can solve the problem of target objects being occasionally lost or occluded in the video, and yields higher detection accuracy.
  • the analysis module 204 is configured to analyze and obtain the business scenario contained in the video image according to the category of the target object and the state of the target object.
  • the category of the target object can be obtained according to the detection result, and the state of the target object can be determined according to the tracking result, so that the business scene contained in the video image can be analyzed.
  • For example, if the category of the target object is a car and the car does not appear in the detection range of the current video frame, it can be determined that the state of the car is abnormal; if the car is in a congested state, it can be learned that the business scene contained in the video image is an intelligent transportation business scene.
  • As another example, if the category of the target object is a pedestrian and the pedestrian does not appear in the detection range of the current video frame, it can be determined that the state of the pedestrian is abnormal; if the pedestrian has fallen down, it can be learned that the business scene contained in the video image is an intelligent traffic business scene.
  • the judgment module 205 is used to judge whether the business scene in the video image is abnormal. When the business scene in the video image is abnormal, the key information when the business scene is abnormal is recorded.
  • In this embodiment, the video image may be input to a pre-trained abnormality model, and whether the business scene in the video image is abnormal can be determined according to the abnormality model. Specifically, when it is determined that the target object is abnormal, the current video frame is extracted as an abnormal image; the abnormal image is imported as an image to be recognized into the pre-trained abnormality model, where the abnormality model is used to characterize the correspondence between images to be recognized and abnormal scenes; and when the abnormality model outputs the abnormal scene corresponding to the image to be recognized, it is confirmed that the business scene is abnormal. The abnormality model includes abnormal models corresponding to different business scenarios.
  • For example, when the business scene is an intelligent transportation business scene, the abnormal models corresponding to the intelligent transportation business scene include a traffic accident model, a traffic congestion model, a traffic violation model, and the like.
  • When the business scene is a smart park business scene, the abnormal models corresponding to the smart park business scene include a left-behind object model, a personnel intrusion model, and the like.
  • When the business scene is a ferry monitoring business scene, the abnormal models corresponding to it include an overloading model, a falling-into-water model, an illegal vessel model, and the like.
  • For example, when a traffic congestion abnormality is to be detected, the current video frame is extracted as an abnormal image, and the abnormal image is imported as an image to be recognized into a pre-trained traffic congestion model.
  • When the traffic congestion model outputs the traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is abnormal; when the traffic congestion model does not output a traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is normal.
  • The above-mentioned abnormality model is a machine learning model trained on a picture sample set.
  • The picture samples include picture samples of abnormal business scenes and picture samples of normal business scenes.
  • The machine learning model is an artificial intelligence algorithm model capable of image recognition, and includes, for example, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a deep neural network (DNN) model.
  • The convolutional neural network (CNN) model is a multi-layer neural network that can continuously reduce the dimensionality of an image recognition problem with a huge amount of data so that it can finally be trained. Therefore, the machine learning model in the embodiment of the present application may be a CNN model.
  • The ResNet network proposes a residual learning framework that reduces the burden of network training. Such a network is substantially deeper than previously used networks and solves the problem, seen in other neural networks, of accuracy decreasing as the network deepens.
  • The machine learning model may be the ResNet model, a convolutional neural network (CNN) architecture. It should be noted that this is only an example; other machine learning models capable of image recognition are also applicable to this application and will not be repeated here.
  • the processing module 206 is configured to record key information when the business scene is abnormal when the business scene in the video image is abnormal.
  • The key information includes the time and place at which the business scene is abnormal, and a picture file intercepted from the video image when the business scene is abnormal.
  • the video analysis device 20 can also send the recorded key information to a third-party service platform.
  • the third-party service platform includes a public security system, a traffic control system, etc.
  • Sending the key information helps the third-party business platform obtain, in time, the key information about an abnormality occurring in a business scenario, so that the abnormality can be handled in time.
  • The video analysis device 20 can also display the key information when the business scene is abnormal. Specifically, the display screen shows the time and place at which the business scene is abnormal and the picture file intercepted from the video image when the business scene is abnormal.
  • the video analysis device 20 includes a receiving module 201, a detection module 202, a tracking module 203, an analysis module 204, a judgment module 205, and a processing module 206.
  • The receiving module 201 is used to receive a video image collected by a camera;
  • the detection module 202 is used to detect a target object in the video image to obtain the category of the target object;
  • the tracking module 203 is used to track the target object in the video image
  • to obtain the state of the target object;
  • the analysis module 204 is configured to analyze and obtain the business scene contained in the video image according to the category of the target object and the state of the target object;
  • the judgment module 205 is used to determine whether the business scene is abnormal; and the processing module 206 is used to record the key information when the business scene in the video image is abnormal.
  • the aforementioned integrated unit implemented in the form of a software function module may be stored in a non-volatile readable storage medium.
  • The above-mentioned software function module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute parts of the methods described in the various embodiments of this application.
  • FIG. 3 is a schematic diagram of a computer device provided in Embodiment 3 of this application.
  • the computer device 3 includes: a database 31, a memory 32, at least one processor 33, computer readable instructions 34 stored in the memory 32 and executable on the at least one processor 33, and at least one communication bus 35 .
  • the computer-readable instructions 34 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 32 and executed by the at least one processor 33 Execute to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 34 in the computer device 3.
  • The computer device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • The schematic diagram in FIG. 3 is only an example of the computer device 3 and does not constitute a limitation on the computer device 3; the computer device 3 may include more or fewer components than shown, combine certain components, or have different components.
  • the computer device 3 may also include input and output devices, network access devices, buses, etc.
  • the database (Database) 31 is a warehouse built on the computer device 3 to organize, store and manage data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 31 is used to store the video images and the like.
  • The at least one processor 33 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The processor 33 may be a microprocessor, or the processor 33 may be any conventional processor or the like.
  • The processor 33 is the control center of the computer device 3 and connects the various parts of the entire computer device 3 through various interfaces and lines.
  • The memory 32 may be used to store the computer-readable instructions 34 and/or modules/units; the processor 33 runs or executes the computer-readable instructions and/or modules/units stored in the memory 32 and calls the data stored in the memory 32 to realize the various functions of the computer device 3.
  • The memory 32 may mainly include a program storage area and a data storage area.
  • The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like;
  • the data storage area may store data (such as audio data) created according to the use of the computer device 3, and the like.
  • The memory 32 may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • Computer readable instruction codes are stored in the memory 32, and the at least one processor 33 can call the computer readable instruction codes stored in the memory 32 to perform related functions.
  • The various modules described in FIG. 2 (the receiving module 201, detection module 202, tracking module 203, analysis module 204, judgment module 205, and processing module 206) are computer-readable instruction codes stored in the memory 32 and executed by the at least one processor 33, so as to realize the functions of the various modules and achieve the purpose of video analysis.
  • the receiving module 201 is used to receive video images collected by a camera
  • the detection module 202 is configured to detect a target object in the video image to obtain the target object category;
  • the tracking module 203 is configured to track the target object in the video image to obtain the state of the target object;
  • the analysis module 204 is configured to analyze and obtain the business scene contained in the video image according to the category of the target object and the state of the target object;
  • the judgment module 205 is used to judge whether the business scenario is abnormal.
  • the processing module 206 is configured to record key information when the business scene is abnormal when the business scene in the video image is abnormal.
  • If the integrated module/unit of the computer device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a non-volatile readable storage medium.
  • All or part of the processes in the methods of the above embodiments of this application can also be implemented by instructing relevant hardware through a computer program.
  • the computer program can be stored in a non-volatile readable storage medium.
  • the computer program includes computer readable instruction code
  • the computer readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
  • The non-volatile readable medium may include any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), and the like.
  • the computer device 3 may also include a power source (such as a battery) for supplying power to various components.
  • The power source may be logically connected to the at least one processor 33 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the power supply may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power supply status indicators and other arbitrary components.
  • the computer device 3 may also include a Bluetooth module, a Wi-Fi module, etc., which will not be repeated here.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.

Abstract

A video analysis method, comprising: receiving a video image collected by a camera; detecting a target object in the video image to obtain the type of target object; tracking the target object in the video image to obtain a state of the target object; performing, according to the type of target object and the state of the target object, analysis to obtain a business scene included in the video image; determining whether the business scene is abnormal; and when the business scene in the video image is abnormal, recording key information when the business scene is abnormal. Further provided in the present application are a video analysis apparatus, a computer device and a storage medium. By means of the present application, the key information in the video image when an abnormal event occurs can be acquired, and the abnormal event can be timely processed.

Description

Video analysis method, apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 14, 2019, with application number 201910517477.X and invention title "Video analysis method, apparatus, server, and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of image recognition technology, and in particular to a video analysis method, apparatus, computer device, and storage medium.
Background
With the continuous development of video surveillance technology, video surveillance is now widely used in projects such as smart cities, digital cities, smart parks, intelligent transportation, and ferry monitoring. The Internet of Things is the foundation of a smart city, and video surveillance is its core. However, when analyzing surveillance video, a user currently has to replay the video and view the video images frame by frame to find abnormal events, which takes a great deal of time and manpower.
Summary
In view of the above, it is necessary to provide a video analysis method, apparatus, computer device, and storage medium that can obtain, in time, the key information present when an abnormal event occurs in a video image.
A first aspect of the present application provides a video analysis method, the method including:
receiving a video image collected by a camera;
detecting a target object in the video image to obtain the category of the target object;
tracking the target object in the video image to obtain the state of the target object;
analyzing, according to the category of the target object and the state of the target object, the business scene contained in the video image;
judging whether the business scene is abnormal; and
when the business scene in the video image is abnormal, recording the key information at the time the business scene is abnormal.
Preferably, detecting the target object in the video image to obtain the category of the target object includes:
obtaining the basic attributes of the target object in the video image by decomposing the target object in the video image;
comparing the obtained basic attributes with the basic attributes of target objects pre-stored in a database; and
when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying the table of basic attributes and target object categories stored in the database to obtain the category of the target object.
Preferably, tracking the target object in the video image to obtain the state of the target object includes:
determining the target object in the current video frame;
acquiring the image area of the target object in the preceding video frames and the image features of the image area, where the preceding video frames are the k video frames before the current video frame and k is a positive integer;
performing motion estimation on the target object according to the image area of the target object in the preceding video frames to determine the prediction area of the target object in the current video frame;
determining the detection range of the target object in the current video frame according to the prediction area;
judging whether the target object appears within the detection range of the current video frame;
if the target object appears within the detection range of the current video frame, determining the image area of the target object in the current video frame; and
if the target object does not appear within the detection range of the current video frame, determining that the target object is abnormal.
Preferably, judging whether the business scene is abnormal includes:
when it is determined that the target object is abnormal, extracting the current video frame as an abnormal image;
importing the abnormal image as an image to be recognized into a pre-trained abnormality model, where the abnormality model is used to characterize the correspondence between images to be recognized and abnormal scenes; and
when the abnormality model outputs the abnormal scene corresponding to the image to be recognized, confirming that the business scene is abnormal.
Preferably, the key information includes the time and place at which the business scene is abnormal, and a picture file intercepted from the video image when the business scene is abnormal.
Preferably, the method further includes:
sending the recorded key information to a third-party business platform, where the third-party business platform includes a public security system and a traffic control system.
Preferably, after receiving the video image collected by the camera, the method further includes:
decoding the video image.
A second aspect of the present application provides a video analysis apparatus, the apparatus including:
a receiving module, used to receive a video image collected by a camera;
a detection module, used to detect a target object in the video image to obtain the category of the target object;
a tracking module, used to track the target object in the video image to obtain the state of the target object;
an analysis module, configured to analyze and obtain, according to the category of the target object and the state of the target object, the business scene contained in the video image;
a judgment module, used to judge whether the business scene is abnormal; and
a processing module, configured to record the key information when the business scene in the video image is abnormal.
A third aspect of the present application provides a computer device. The computer device includes a processor and a memory, and the processor is configured to implement the video analysis method when executing computer-readable instructions stored in the memory.
A fourth aspect of the present application provides a non-volatile readable storage medium having computer-readable instructions stored thereon; when the computer-readable instructions are executed by a processor, the video analysis method is implemented.
The video analysis method, apparatus, computer device, and storage medium described in this application can analyze a video image to obtain the business scene contained in the video image, determine whether the business scene is abnormal, and, when the business scene is abnormal, record the key information at the time of the abnormality. The key information can then be sent to a corresponding third-party platform so that the abnormality can be handled in time.
附图说明Description of the drawings
图1是本申请实施例一提供的视频分析方法的流程图。FIG. 1 is a flowchart of a video analysis method provided in Embodiment 1 of the present application.
图2是本申请实施例二提供的本申请视频分析装置较佳实施例中的功能模块图。2 is a diagram of functional modules in a preferred embodiment of the video analysis device of this application provided in the second embodiment of this application.
图3是本申请实施例三提供的计算机设备的示意图。Fig. 3 is a schematic diagram of a computer device provided in Embodiment 3 of the present application.
如下具体实施方式将结合上述附图进一步说明本申请。The following specific embodiments will further illustrate this application in conjunction with the above-mentioned drawings.
具体实施方式Detailed ways
为了能够更清楚地理解本申请的上述目的、特征和优点,下面结合附图和具体实施例对本申请进行详细描述。需要说明的是,在不冲突的情况下,本申请的实施例及实施例中的特征可以相互组合。In order to be able to understand the above objectives, features and advantages of the application more clearly, the application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the application and the features in the embodiments can be combined with each other if there is no conflict.
在下面的描述中阐述了很多具体细节以便于充分理解本申请,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In the following description, many specific details are set forth in order to fully understand the present application. The described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the description of the application herein are only for the purpose of describing specific embodiments, and are not intended to limit the application.
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”和“第三”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", and "third" in the specification and claims of this application and the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence. In addition, the term "including" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally includes unlisted steps or units, or optionally also includes Other steps or units inherent to these processes, methods, products or equipment.
本申请实施例的视频分析方法应用在由至少一个计算机设备和通过网络与所述计算机设备进行连接的移动终端所构成的硬件环境中。网络包括但不限于:广域网、城域网或局域网。本申请实施例的视频分析方法可以由计算机设备来执行,也可以由移动终端来执行;还可以是由计算机设备和移动终 端共同执行。The video analysis method of the embodiment of the present application is applied in a hardware environment composed of at least one computer device and a mobile terminal connected to the computer device through a network. Networks include but are not limited to: wide area network, metropolitan area network or local area network. The video analysis method in the embodiments of the present application may be executed by a computer device or a mobile terminal; it may also be executed by the computer device and the mobile terminal.
对于需要执行视频分析方法的计算机设备，可以直接在计算机设备上集成本申请的方法所提供的视频分析功能，或者安装用于实现本申请的方法的客户端。再如，本申请所提供的方法还可以以软件开发工具包（Software Development Kit，SDK）的形式运行在计算机等设备上，以SDK的形式提供视频分析功能的接口，计算机或其他设备通过提供的接口即可实现视频分析功能。For a computer device that needs to perform the video analysis method, the video analysis function provided by the method of this application can be integrated directly on the computer device, or a client for implementing the method of this application can be installed on it. As another example, the method provided by this application can also run on a computer or other device in the form of a software development kit (Software Development Kit, SDK), providing an interface for the video analysis function in the form of an SDK; the computer or other device can then implement the video analysis function through the provided interface.
实施例一Embodiment 1
图1是本申请实施例一提供的视频分析方法的流程图。根据不同的需求,该流程图中的执行顺序可以改变,某些步骤可以省略。FIG. 1 is a flowchart of a video analysis method provided in Embodiment 1 of the present application. According to different needs, the execution order in this flowchart can be changed, and some steps can be omitted.
步骤S1,接收摄像头采集的视频图像。Step S1, receiving the video image collected by the camera.
在本实施方式中，通过摄像头采集视频图像，所述摄像头被安装在不同的业务场景中。所述业务场景描述的是需要进行目标对象侦测和/或视频分析的场景。例如，所述业务场景为识别交通事故、拥堵、车速检测、车流预测、车辆失控、车辆行驶轨迹、人员或自行车闯入、违反交通法规、抛洒物等的智能交通业务场景，所述业务场景还可以是识别人员入侵、遗留物、遗失物监测、车牌分析、车辆行驶轨迹、车流分析、人流分析、烟火或烟雾等的智慧园区业务场景，所述业务场景还可以是非法船只、超载、密集人群检测、是否穿救生衣、落水等的渡口监测业务场景。In this embodiment, video images are collected by cameras installed in different business scenarios. A business scenario describes a scene that requires target object detection and/or video analysis. For example, the business scenario may be an intelligent transportation scenario for recognizing traffic accidents, congestion, vehicle speed detection, traffic flow prediction, out-of-control vehicles, vehicle trajectories, intrusion by pedestrians or bicycles, traffic violations, thrown or spilled objects and the like; it may also be a smart park scenario for recognizing personnel intrusion, left-behind objects, lost-property monitoring, license plate analysis, vehicle trajectories, traffic flow analysis, pedestrian flow analysis, fireworks or smoke and the like; or it may be a ferry monitoring scenario for detecting illegal vessels, overloading, dense crowds, whether life jackets are worn, people falling into the water and the like.
所述业务场景还可以是无人驾驶、金融场景、设备登录、机场及公共区域的监控等场景。The business scenarios may also be scenarios such as unmanned driving, financial scenarios, equipment login, airport and public area monitoring.
在本实施方式中,所述摄像头可以是不同厂商出厂的不同型号规格的摄像头,所述视频分析方法可以实现统一处理并分析不同厂商出厂的不同型号规格的摄像头拍摄的视频图像。In this embodiment, the cameras may be cameras of different models and specifications manufactured by different manufacturers, and the video analysis method can realize unified processing and analysis of video images taken by cameras of different models and specifications manufactured by different manufacturers.
优选地,接收摄像头采集的视频图像后,所述视频分析方法还包括:Preferably, after receiving the video image collected by the camera, the video analysis method further includes:
对所述视频图像进行解码的步骤。The step of decoding the video image.
具体地,可以通过图形处理器(GPU)对所述视频图像进行视频解码,以得到所述视频图像中的每帧图像。Specifically, the video image may be video decoded by a graphics processing unit (GPU) to obtain each frame of the video image.
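Purely as an illustration of this decoding step, the sketch below assumes OpenCV is used to pull and decode frames from the camera stream; the library choice, the stream URL and the use of software decoding are assumptions and are not specified by the embodiment.

```python
# Minimal frame-extraction sketch (assumed OpenCV backend); in the embodiment the
# decoding itself may be offloaded to a GPU, which is handled by the codec backend.
import cv2

def decode_frames(stream_url):
    """Yield decoded frames from a camera stream or video file."""
    capture = cv2.VideoCapture(stream_url)   # e.g. an RTSP URL pushed by the camera
    try:
        while True:
            ok, frame = capture.read()       # decode the next frame
            if not ok:                       # stream ended or decoding failed
                break
            yield frame                      # one frame as a BGR image array
    finally:
        capture.release()
```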
步骤S2,检测所述视频图像中的目标对象得到所述目标对象的类别。Step S2: Detect the target object in the video image to obtain the target object category.
在本实施方式中,所述视频图像中的目标对象包括人物、动物、交通工具、建筑物、烟雾等。In this embodiment, the target objects in the video image include people, animals, vehicles, buildings, smoke and so on.
具体地,所述检测所述视频图像中的目标对象得到所述目标对象的类别包括:Specifically, detecting the target object in the video image to obtain the target object category includes:
(1)识别所述视频图像中的目标对象;(1) Identify the target object in the video image;
在本实施方式中,所述视频图像中的目标对象包括静止目标对象和运动目标对象。In this embodiment, the target object in the video image includes a static target object and a moving target object.
当所述视频图像中的目标对象为静止目标对象时，可以通过基于模板的检测方法来识别所述静止目标对象。具体包括：确定所述视频图像中的目标对象形状的轮廓，将所述目标对象形状的轮廓与预存的模板文件进行特征匹配。When the target object in the video image is a stationary target object, the stationary target object can be identified through a template-based detection method. Specifically, this includes: determining the contour of the target object shape in the video image, and matching the contour of the target object shape with a pre-stored template file.
例如,当所述视频图像中的目标对象为一扇门,可以确定所述目标对象形状的轮廓为一矩形,将所述矩形与预存的门的模板文件进行特征匹配来识别所述目标对象。其中,所述门的模板文件为矩形。For example, when the target object in the video image is a door, the outline of the shape of the target object can be determined to be a rectangle, and the rectangle is feature-matched with a pre-stored door template file to identify the target object. Wherein, the template file of the door is rectangular.
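As a rough, non-authoritative illustration of this template-based detection, the sketch below extracts contours from a frame and compares each one against a stored template contour with OpenCV shape matching; the threshold value and the way the template contour is obtained are assumptions.

```python
# Hypothetical contour-vs-template matching for a stationary object such as a door;
# cv2.matchShapes returns a dissimilarity score, so smaller means more similar.
import cv2

def matches_template(gray_image, template_contour, threshold=0.15):
    _, binary = cv2.threshold(gray_image, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        score = cv2.matchShapes(contour, template_contour,
                                cv2.CONTOURS_MATCH_I1, 0.0)
        if score < threshold:                # small score: shapes are similar
            return True
    return False
```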
当所述视频图像中的目标对象为运动目标对象时，可以通过背景差法、帧差法、光流法中的至少一种进行识别。所述背景差法是对视频图像中相对较为固定的场景进行背景建模，检测时由当前图像与背景模型之差得到所述运动目标对象；所述帧差法是通过对视频序列中相邻帧之间对应位置像素点进行比较来获取运动目标对象的位置；所述光流法是利用时间变化的光流矢量特性，对所述视频图像中的运动目标对象进行检测。When the target object in the video image is a moving target object, it can be identified by at least one of the background difference method, the frame difference method and the optical flow method. The background difference method performs background modeling on a relatively fixed scene in the video image, and during detection the moving target object is obtained from the difference between the current image and the background model; the frame difference method compares pixels at corresponding positions in adjacent frames of the video sequence to obtain the position of the moving target object; the optical flow method uses the time-varying characteristics of optical flow vectors to detect the moving target object in the video image.
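To make the moving-object branch concrete, here is a minimal frame-difference sketch covering one of the three approaches listed above; the difference threshold and minimum region area are illustrative values, not parameters taken from the embodiment.

```python
# Illustrative frame-difference detector: pixels that change strongly between two
# adjacent frames are grouped into regions and treated as moving target objects.
import cv2

def moving_object_boxes(prev_frame, curr_frame, diff_threshold=25, min_area=500):
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)             # per-pixel change
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep only regions large enough to plausibly be a target object.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```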
可以理解的是，上述检测视频图像中的静止目标对象和运动目标对象的方法不限于上述列举的，任何适应于检测出视频图像中的目标对象的方法均可应用于此。另外，本实施例中的所述检测视频图像中的静止目标对象和运动目标对象的方法均为现有技术，本文在此不再详细介绍。It can be understood that the above methods for detecting stationary target objects and moving target objects in a video image are not limited to those listed above, and any method suitable for detecting target objects in a video image can be applied here. In addition, the methods for detecting stationary target objects and moving target objects in a video image in this embodiment are all existing technologies and will not be described in detail herein.
(2)确定所述目标对象的类别。(2) Determine the category of the target object.
例如,当识别所述视频图像中的目标对象为汽车时,可以确定所述目标对象的类别为交通工具。所述目标对象的检测和分类是视觉技术中一个非常基础的任务,其目的就是跟踪场景中感兴趣的一些物体,包括常规的目标对象检测、人员检测以及车辆检测等等。For example, when the target object in the video image is recognized as a car, it may be determined that the target object is a vehicle. The detection and classification of the target object is a very basic task in vision technology, and its purpose is to track some objects of interest in the scene, including conventional target object detection, person detection, vehicle detection, and so on.
在本实施方式中，可以通过分解所述视频图像中的目标对象，获取所述视频图像中的目标对象的基本属性，其中，所述基本属性包括颜色、运动轨迹、形状、结构等，再将所述获取的基本属性与预先存储在数据库中的目标对象的基本属性进行比对，从而准确地识别出所述视频图像中的目标对象。所述数据库中存储有目标对象的基本属性与目标对象类别对应表。In this embodiment, the basic attributes of the target object in the video image can be obtained by decomposing the target object in the video image, where the basic attributes include color, motion trajectory, shape, structure and the like; the obtained basic attributes are then compared with the basic attributes of target objects pre-stored in a database, so that the target object in the video image is accurately identified. The database stores a table of correspondence between basic attributes of target objects and target object categories.
所述确定所述目标对象的类别具体包括：通过分解所述视频图像中的目标对象，获取所述视频图像中的目标对象的基本属性；将获取的所述基本属性与预先存储在数据库中的目标对象的基本属性进行比对；当获取的所述基本属性与所述数据库中的目标对象的基本属性一致时，查询数据库中存储的基本属性与目标对象类别对应表以得到所述目标对象的类别。Determining the category of the target object specifically includes: obtaining the basic attributes of the target object in the video image by decomposing the target object in the video image; comparing the obtained basic attributes with the basic attributes of target objects pre-stored in the database; and, when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying the correspondence table of basic attributes and target object categories stored in the database to obtain the category of the target object.
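The attribute-lookup step could be organised along the lines of the following sketch; the attribute keys and the correspondence table are hypothetical stand-ins for the database table described above.

```python
# Hypothetical attribute-to-category lookup; in the embodiment the correspondence
# table is stored in a database rather than in memory.
CATEGORY_TABLE = [
    ({"shape": "rectangular", "motion": "static"}, "door"),
    ({"shape": "box-like", "motion": "moving"}, "vehicle"),
]

def classify(target_attributes):
    """Return the category whose stored basic attributes all match the target's."""
    for stored_attributes, category in CATEGORY_TABLE:
        if all(target_attributes.get(key) == value
               for key, value in stored_attributes.items()):
            return category
    return "unknown"
```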
步骤S3,跟踪所述视频图像中的目标对象得到所述目标对象的状态。Step S3, tracking the target object in the video image to obtain the state of the target object.
在目标对象检测完成以后,需要针对每个检测到的目标对象来计算其运动轨迹,从而实现跟踪所述视频图像中的目标对象。在本实施方式中,通过跟踪所述视频图像中的目标对象可以确定所述目标对象的状态。After the target object detection is completed, it is necessary to calculate the motion trajectory of each detected target object, so as to realize the tracking of the target object in the video image. In this embodiment, the state of the target object can be determined by tracking the target object in the video image.
所述跟踪所述视频图像中的目标对象的方法包括:The method for tracking the target object in the video image includes:
a)确定当前视频帧中的目标对象。a) Determine the target object in the current video frame.
b)获取目标对象在前序视频帧中的图像区域以及所述图像区域的图像特征,其中,所述前序视频帧为当前视频帧之前的k个视频帧,k为正整数。b) Obtain the image area of the target object in the previous video frame and the image characteristics of the image area, wherein the previous video frame is k video frames before the current video frame, and k is a positive integer.
c)根据所述目标对象在前序视频帧中的图像区域,对所述目标对象进行运动估计,确定所述目标对象在当前视频帧的预测区域。c) Perform motion estimation on the target object according to the image area of the target object in the previous video frame, and determine the prediction area of the target object in the current video frame.
d)根据所述预测区域确定目标对象在当前视频帧中的检测范围。d) Determine the detection range of the target object in the current video frame according to the prediction area.
e)判断所述目标对象是否出现在当前视频帧中的检测范围，若所述目标对象没有出现在当前视频帧中的检测范围，确定所述目标对象的状态异常；若所述目标对象出现在当前视频帧中的检测范围，确定所述目标对象在当前视频帧中的图像区域，即所述目标对象的状态正常。e) Determine whether the target object appears within the detection range in the current video frame; if the target object does not appear within the detection range in the current video frame, determine that the state of the target object is abnormal; if the target object appears within the detection range in the current video frame, determine the image area of the target object in the current video frame, that is, the state of the target object is normal.
由于前序视频帧是指当前视频帧之前的k个视频帧，通过这前k个视频帧来对当前视频帧进行预估和对比检测，计算量较小，并且能够解决视频中目标对象偶尔丢失或者遮挡的问题，检测精度较高。Since the preceding video frames refer to the k video frames before the current video frame, using these first k video frames to estimate and compare against the current video frame requires a small amount of computation, can solve the problem of the target object in the video occasionally being lost or occluded, and yields a high detection accuracy.
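The following sketch is one possible reading of steps a) to e): the centre of the target object in the previous k frames is extrapolated, a detection range is opened around the prediction, and the target is declared abnormal when no detection falls inside that range. The constant-velocity extrapolation and the window size are assumptions.

```python
# Simplified tracking sketch for steps a)-e); positions are (x, y) centres.
def predict_center(history):
    """history: centres of the target in the previous k frames, oldest first."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)          # constant-velocity extrapolation

def track_state(history, detections, window=80):
    """detections: candidate centres found in the current frame."""
    px, py = predict_center(history)
    for (dx, dy) in detections:
        if abs(dx - px) <= window and abs(dy - py) <= window:
            return "normal", (dx, dy)          # target found inside the detection range
    return "abnormal", None                    # target missing from the detection range
```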
步骤S4,根据所述目标对象的类别和所述目标对象的状态分析得到所述视频图像中包含的业务场景。Step S4: Analyze the business scene contained in the video image according to the category of the target object and the state of the target object.
在本实施方式中,根据检测结果可以获得目标对象的类别,根据跟踪结果可以确定所述目标对象的状态,从而可以分析得到所述视频图像中包含的业务场景。In this embodiment, the category of the target object can be obtained according to the detection result, and the state of the target object can be determined according to the tracking result, so that the business scene contained in the video image can be analyzed.
例如，根据检测结果可以获得所述目标对象的类别为汽车，所述汽车没有出现在当前视频帧中的检测范围，则可以确定所述汽车的状态异常，如所述汽车处于拥堵状态，则可得知所述视频图像中包括的业务场景为智能交通业务场景。For example, if the category of the target object obtained from the detection result is a car and the car does not appear within the detection range in the current video frame, it can be determined that the state of the car is abnormal; if the car is in a congested state, it can be known that the business scene included in the video image is an intelligent transportation business scene.
又如根据检测结果可以获得所述目标对象的类别为行人，所述行人没有出现在当前视频帧中的检测范围，则可以确定所述行人的状态异常。如所述行人摔倒，则可得知所述视频图像中包括的业务场景为智能交通业务场景。As another example, if the category of the target object obtained from the detection result is a pedestrian and the pedestrian does not appear within the detection range in the current video frame, it can be determined that the state of the pedestrian is abnormal. If the pedestrian falls down, it can be known that the business scene included in the video image is an intelligent transportation business scene.
又如根据检测结果可以获得所述目标对象的类别为门，所述门没有出现在当前视频中的检测范围，则可以确认所述门的状态异常，如保持打开状态可判断所述视频图像中包括的业务场景为智能安保业务场景。As another example, if the category of the target object obtained from the detection result is a door and the door does not appear within the detection range in the current video, it can be confirmed that the state of the door is abnormal; if the door remains open, it can be judged that the business scene included in the video image is a smart security business scene.
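One way to express the category-plus-state analysis illustrated by these examples is a simple rule table; the rules below are made up for illustration and do not exhaust the scenes the embodiment mentions.

```python
# Hypothetical rule table mapping (target category, abnormal state) to a scene.
SCENE_RULES = {
    ("vehicle", "congested"): "intelligent transportation",
    ("pedestrian", "fallen"): "intelligent transportation",
    ("door", "left open"): "smart security",
}

def infer_scene(category, state):
    return SCENE_RULES.get((category, state), "unclassified")
```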
步骤S5,判断所述视频图像中的业务场景是否出现异常。当所述视频图像中的业务场景出现异常时,进入步骤S6;当所述视频图像中的业务场景没有出现异常时,结束流程。Step S5: Determine whether the business scene in the video image is abnormal. When the business scene in the video image is abnormal, step S6 is entered; when the business scene in the video image is not abnormal, the process ends.
在本实施方式中，通过所述目标对象的类别和所述目标对象的状态分析可判断所述视频图像中的业务场景是否出现异常。例如，通过判断所述目标对象是否出现在当前视频帧中的检测范围，若所述目标对象没有出现在当前视频帧中的检测范围，确定所述目标对象的状态异常，即所述目标对象对应的业务场景也出现异常。In this embodiment, whether the business scene in the video image is abnormal can be determined by analyzing the category of the target object and the state of the target object. For example, by judging whether the target object appears within the detection range in the current video frame, if the target object does not appear within the detection range in the current video frame, it is determined that the state of the target object is abnormal, that is, the business scene corresponding to the target object is also abnormal.
在其他实施方式中，可以将所述视频图像输入预先训练好的异常模型，并根据所述异常模型判断所述视频图像中的业务场景是否异常。具体地，当确定所述目标对象异常时，提取所述当前视频帧作为异常图像；将所述异常图像作为待识别图像导入预先训练好的异常模型中，其中，所述异常模型用于表征待识别图像与异常场景之间的对应关系；当所述异常模型输出与所述待识别图像对应的异常场景时，确认所述业务场景出现异常。所述异常模型包括不同的业务场景对应的异常模型。例如，当所述业务场景为智能交通业务场景时，所述智能交通业务场景对应的异常模型包括交通事故模型、交通拥堵模型及违法违规模型等；当所述业务场景为智慧园区业务场景时，所述智慧园区业务场景对应的异常模型包括随身物品遗留模型、人员入侵模型等；当所述业务场景为渡口监测业务场景时，所述渡口监测业务场景对应的异常模型包括超载模型、落水模型、非法船只模型等。In other embodiments, the video image may be input into a pre-trained anomaly model, and whether the business scene in the video image is abnormal is determined according to the anomaly model. Specifically, when it is determined that the target object is abnormal, the current video frame is extracted as an abnormal image; the abnormal image is imported as an image to be recognized into a pre-trained anomaly model, where the anomaly model is used to characterize the correspondence between images to be recognized and abnormal scenes; when the anomaly model outputs an abnormal scene corresponding to the image to be recognized, it is confirmed that the business scene is abnormal. The anomaly model includes anomaly models corresponding to different business scenarios. For example, when the business scenario is an intelligent transportation business scenario, the anomaly models corresponding to it include a traffic accident model, a traffic congestion model, a violation model and the like; when the business scenario is a smart park business scenario, the anomaly models corresponding to it include a left-behind belongings model, a personnel intrusion model and the like; when the business scenario is a ferry monitoring business scenario, the anomaly models corresponding to it include an overloading model, a falling-into-water model, an illegal vessel model and the like.
举例而言，当所述视频图像中当前视频帧出现交通拥堵时，提取所述当前视频帧作为异常图像，并将所述异常图像作为待识别图像导入预先训练好的交通拥堵模型中，当所述交通拥堵模型输出与所述待识别图像对应的交通拥堵场景时，确认所述视频图像对应的智能交通业务场景中出现异常；当所述交通拥堵模型没有输出与所述待识别图像对应的交通拥堵场景时，确认所述视频图像对应的智能交通业务场景中正常。For example, when traffic congestion appears in the current video frame of the video image, the current video frame is extracted as an abnormal image, and the abnormal image is imported as an image to be recognized into a pre-trained traffic congestion model; when the traffic congestion model outputs a traffic congestion scene corresponding to the image to be recognized, it is confirmed that an abnormality occurs in the intelligent transportation business scene corresponding to the video image; when the traffic congestion model does not output a traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is normal.
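A minimal inference sketch for one such scene-specific anomaly model is given below, assuming a PyTorch image classifier; the input size, class ordering and the use of PyTorch are assumptions rather than details given in the embodiment.

```python
# Hypothetical inference step for a traffic-congestion anomaly model.
import torch
import torchvision.transforms as T
from PIL import Image

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def is_congested(abnormal_image_path, model):
    """Return True when the model classifies the abnormal image as congestion."""
    image = preprocess(Image.open(abnormal_image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        logits = model(image)
    return logits.argmax(dim=1).item() == 1    # assumed: class 1 = congestion scene
```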
上述异常模型为根据图片样本集训练的机器学习模型。所述图片样本包括异常业务场景图片样本和正常业务场景图片样本。所述机器学习模型为可以进行图像识别的人工智能算法模型，包括：卷积神经网络模型CNN、循环神经网络模型RNN和深度神经网络模型DNN。其中，卷积神经网络模型CNN是一种多层神经网络，可以将数据量庞大的图像识别问题不断降维，最终使其能够被训练，因此，本申请实施例中的机器学习模型可以为CNN模型。The above anomaly model is a machine learning model trained on a picture sample set. The picture samples include picture samples of abnormal business scenes and picture samples of normal business scenes. The machine learning model is an artificial intelligence algorithm model capable of image recognition, including a convolutional neural network model (CNN), a recurrent neural network model (RNN) and a deep neural network model (DNN). Among them, the convolutional neural network model CNN is a multi-layer neural network that can continuously reduce the dimensionality of image recognition problems with huge amounts of data so that they can eventually be trained; therefore, the machine learning model in the embodiments of the present application may be a CNN model.
在CNN网络结构的演化上，出现了许多CNN网络，包括LeNet、AlexNet、VGGNet、GoogleNet和ResNet。其中，ResNet网络提出了一种减轻网络训练负担的残差学习框架，这种网络比以前使用过的网络本质上层次更深，解决了其他神经网络随着网络加深，准确率下降的问题。在本实施方式中，所述机器学习模型可以是卷积神经网络模型CNN中的ResNet模型。需要说明的是，此处仅是举例说明，其他可以进行图像识别的机器学习模型同样适用于本申请，此处不进行赘述。In the evolution of CNN network structures, many CNN networks have appeared, including LeNet, AlexNet, VGGNet, GoogleNet and ResNet. Among them, the ResNet network proposes a residual learning framework that reduces the burden of network training; such a network is substantially deeper than previously used networks and solves the problem that the accuracy of other neural networks decreases as the network deepens. In this embodiment, the machine learning model may be the ResNet model among convolutional neural network models (CNN). It should be noted that this is only an example; other machine learning models capable of image recognition are also applicable to this application and will not be repeated here.
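A minimal training sketch for such a model, assuming a torchvision ResNet-18 fine-tuned on normal and abnormal scene pictures arranged in class folders, is shown below; the folder layout, epoch count and hyper-parameters are placeholders.

```python
# Illustrative training loop for a two-class (normal / abnormal) ResNet classifier.
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("samples/", transform=transform)  # normal/, abnormal/
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = torchvision.models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 2)    # two output classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```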
步骤S6,当所述视频图像中的业务场景出现异常时,记录所述业务场景出现异常时的关键信息。Step S6: When the business scene in the video image is abnormal, record the key information when the business scene is abnormal.
在本实施方式中,所述关键信息包括所述业务场景出现异常的时间、地点、及截取的所述视频图像中所述业务场景出现异常时的图片文件等。In this embodiment, the key information includes the time and place when the business scene is abnormal, and the picture file when the business scene is abnormal in the intercepted video image.
进一步地，所述视频分析方法还包括，将记录的关键信息发送至第三方业务平台。所述第三方业务平台包括公安系统、交通管制系统等。通过将所述记录的关键信息发送至所述第三方业务平台，可以帮助所述第三方业务平台及时获取业务场景中出现异常时的关键信息，从而及时处理所述异常。Further, the video analysis method also includes sending the recorded key information to a third-party business platform. The third-party business platform includes a public security system, a traffic control system and the like. Sending the recorded key information to the third-party business platform helps the third-party business platform obtain in time the key information of when an abnormality occurs in a business scene, so as to handle the abnormality in time.
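As an illustration only, recording and forwarding the key information could look like the sketch below; the endpoint URL, the field names and the use of the requests library are assumptions, since the embodiment only states that the key information is recorded and sent to a third-party business platform.

```python
# Hypothetical recording-and-forwarding of key information for an abnormal scene.
import json
import time
import cv2
import requests

def report_anomaly(frame, camera_location, endpoint="https://example.invalid/alerts"):
    snapshot_path = "anomaly_%d.jpg" % int(time.time())
    cv2.imwrite(snapshot_path, frame)                    # picture file of the abnormal frame
    key_info = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "location": camera_location,
        "snapshot": snapshot_path,
    }
    with open(snapshot_path.replace(".jpg", ".json"), "w") as f:
        json.dump(key_info, f)                           # local record of the key information
    requests.post(endpoint, json=key_info, timeout=5)    # forward to the third-party platform
```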
进一步地，所述视频分析方法还包括：展示所述业务场景出现异常时的关键信息。具体地，在所述显示屏中显示所述业务场景的异常图片、时间、地点等信息。Further, the video analysis method also includes: displaying the key information of when the business scene is abnormal. Specifically, information such as the abnormal picture, time and location of the business scene is displayed on the display screen.
综上所述，本申请提供的视频分析方法，包括接收摄像头采集的视频图像；检测所述视频图像中的目标对象得到所述目标对象的类别；跟踪所述视频图像中的目标对象得到所述目标对象的状态；根据所述目标对象的类别和所述目标对象的状态分析得到所述视频图像中包含的业务场景；判断所述业务场景是否出现异常；及当所述视频图像中的业务场景出现异常时，记录所述业务场景出现异常时的关键信息。可以实时分析所述视频图像对应的业务场景是否出现异常，并在确认所述业务场景出现异常时，记录所述业务场景出现异常时的关键信息。从而可以将所述关键信息发送至对应的第三方平台，以及时处理所述异常。In summary, the video analysis method provided by this application includes: receiving a video image collected by a camera; detecting the target object in the video image to obtain the category of the target object; tracking the target object in the video image to obtain the state of the target object; analyzing the category of the target object and the state of the target object to obtain the business scene contained in the video image; judging whether the business scene is abnormal; and, when the business scene in the video image is abnormal, recording the key information of when the business scene is abnormal. Whether the business scene corresponding to the video image is abnormal can be analyzed in real time, and when it is confirmed that the business scene is abnormal, the key information of the abnormality is recorded, so that the key information can be sent to the corresponding third-party platform and the abnormality can be handled in time.
实施例二Embodiment 2
图2为本申请视频分析装置较佳实施例中的功能模块图。FIG. 2 is a diagram of functional modules in a preferred embodiment of the video analysis device of this application.
在一些实施例中,所述视频分析装置20运行于计算机设备中。所述视频分析装置20可以包括多个由计算机可读指令代码段所组成的功能模块。所述视频分析装置20中的各个计算机可读指令代码段的指令代码可以存储于存储器中,并由至少一个处理器所执行,以执行(详见图1及其相关描述)视频分析功能。In some embodiments, the video analysis device 20 runs in a computer device. The video analysis device 20 may include multiple functional modules composed of computer-readable instruction code segments. The instruction codes of each computer-readable instruction code segment in the video analysis device 20 can be stored in a memory and executed by at least one processor to perform (see FIG. 1 and related descriptions for details) video analysis functions.
本实施例中,所述视频分析装置20根据其所执行的功能,可以被划分为多个功能模块。所述功能模块可以包括:接收模块201、检测模块202、跟踪模块203、分析模块204、判断模块205及处理模块206。本申请所称的模块是指一种能够被至少一个处理器所执行并且能够完成固定功能的一系列计算机可读指令代码段,其存储在存储器中。在一些实施例中,关于各模块的功能将在后续的实施例中详述。In this embodiment, the video analysis device 20 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: a receiving module 201, a detection module 202, a tracking module 203, an analysis module 204, a judgment module 205, and a processing module 206. The module referred to in this application refers to a series of computer-readable instruction code segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In some embodiments, the functions of each module will be detailed in subsequent embodiments.
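For orientation only, the division into modules could be composed roughly as in the sketch below; the constructor arguments stand in for the processing described in Embodiment 1 and are not part of the claimed apparatus.

```python
# Schematic composition of the six functional modules of the video analysis device.
class VideoAnalysisDevice:
    def __init__(self, detector, tracker, scene_rules, anomaly_model, reporter):
        self.detector = detector              # detection module 202
        self.tracker = tracker                # tracking module 203
        self.scene_rules = scene_rules        # analysis module 204
        self.anomaly_model = anomaly_model    # judgment module 205
        self.reporter = reporter              # processing module 206

    def process(self, frame):                 # frames supplied by receiving module 201
        category = self.detector(frame)
        state = self.tracker(frame)
        scene = self.scene_rules.get((category, state))
        if scene is not None and self.anomaly_model(frame):
            self.reporter(scene, frame)       # record key information for the abnormal scene
```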
所述接收模块201用于接收摄像头采集的视频图像。The receiving module 201 is used to receive video images collected by a camera.
在本实施方式中，通过摄像头采集视频图像，所述摄像头被安装在不同的业务场景中。所述业务场景描述的是需要进行目标对象侦测和/或视频分析的场景。例如，所述业务场景为识别交通事故、拥堵、车速检测、车流预测、车辆失控、车辆行驶轨迹、人员或自行车闯入、违反交通法规、抛洒物等的智能交通业务场景，所述业务场景还可以是识别人员入侵、遗留物、遗失物监测、车牌分析、车辆行驶轨迹、车流分析、人流分析、烟火或烟雾等的智慧园区业务场景，所述业务场景还可以是非法船只、超载、密集人群检测、是否穿救生衣、落水等的渡口监测业务场景。In this embodiment, video images are collected by cameras installed in different business scenarios. A business scenario describes a scene that requires target object detection and/or video analysis. For example, the business scenario may be an intelligent transportation scenario for recognizing traffic accidents, congestion, vehicle speed detection, traffic flow prediction, out-of-control vehicles, vehicle trajectories, intrusion by pedestrians or bicycles, traffic violations, thrown or spilled objects and the like; it may also be a smart park scenario for recognizing personnel intrusion, left-behind objects, lost-property monitoring, license plate analysis, vehicle trajectories, traffic flow analysis, pedestrian flow analysis, fireworks or smoke and the like; or it may be a ferry monitoring scenario for detecting illegal vessels, overloading, dense crowds, whether life jackets are worn, people falling into the water and the like.
在本实施方式中,所述摄像头与所述计算机设备之间通过有线或无线网络通信连接。所述摄像头将采集的视频图像通过有线或无线网络发送至所述计算机设备。In this embodiment, the camera and the computer device are connected through a wired or wireless network communication. The camera sends the collected video images to the computer device through a wired or wireless network.
所述业务场景还可以是无人驾驶、金融场景、设备登录、机场及公共区域的监控等场景。The business scenarios may also be scenarios such as unmanned driving, financial scenarios, equipment login, airport and public area monitoring.
在本实施方式中，所述摄像头可以是不同厂商出厂的不同型号规格的摄像头，所述视频分析装置20可以实现统一处理并分析不同厂商出厂的不同型号规格的摄像头拍摄的视频图像。In this embodiment, the cameras may be cameras of different models and specifications manufactured by different manufacturers, and the video analysis device 20 can uniformly process and analyze video images captured by cameras of different models and specifications manufactured by different manufacturers.
优选地,接收摄像头采集的视频图像后,所述视频分析装置20还可以对所述视频图像进行解码。Preferably, after receiving the video image collected by the camera, the video analysis device 20 may also decode the video image.
具体地,可以通过图形处理器(GPU)对所述视频图像进行视频解码,以得到所述视频图像中的每帧图像。Specifically, the video image may be video decoded by a graphics processing unit (GPU) to obtain each frame of the video image.
所述检测模块202用于检测所述视频图像中的目标对象得到所述目标对象的类别。The detection module 202 is used to detect the target object in the video image to obtain the target object category.
在本实施方式中,所述视频图像中的目标对象包括人物、动物、交通工具、建筑物、烟雾等。In this embodiment, the target objects in the video image include people, animals, vehicles, buildings, smoke and so on.
具体地,所述检测所述视频图像中的目标对象得到所述目标对象的类别包括:Specifically, detecting the target object in the video image to obtain the target object category includes:
(1)识别所述视频图像中的目标对象;(1) Identify the target object in the video image;
在本实施方式中,所述视频图像中的目标对象包括静止目标对象和运动目标对象。In this embodiment, the target object in the video image includes a static target object and a moving target object.
当所述视频图像中的目标对象为静止目标对象时,可以通过基于模板的检测方法来识别所述静止目标对象。具体包括:确定所述视频图像中的目标对象形状的轮廓,将所述目标对象形状的轮廓与预存的模板文件进行特征匹配。When the target object in the video image is a stationary target object, the stationary target object can be identified through a template-based detection method. Specifically, it includes: determining the contour of the target object shape in the video image, and performing feature matching between the contour of the target object shape and a pre-stored template file.
当所述视频图像中的目标对象为运动目标对象时，可以通过背景差法、帧差法、光流法中的至少一种进行识别。所述背景差法是对视频图像中相对较为固定的场景进行背景建模，检测时由当前图像与背景模型之差得到所述运动目标对象；所述帧差法是通过对视频序列中相邻帧之间对应位置像素点进行比较来获取运动目标对象的位置；所述光流法是利用时间变化的光流矢量特性，对所述视频图像中的运动目标对象进行检测。When the target object in the video image is a moving target object, it can be identified by at least one of the background difference method, the frame difference method and the optical flow method. The background difference method performs background modeling on a relatively fixed scene in the video image, and during detection the moving target object is obtained from the difference between the current image and the background model; the frame difference method compares pixels at corresponding positions in adjacent frames of the video sequence to obtain the position of the moving target object; the optical flow method uses the time-varying characteristics of optical flow vectors to detect the moving target object in the video image.
在本实施方式中，上述检测视频图像中的静止目标对象和运动目标对象的方法不限于上述列举的，任何适应于检测出视频图像中的目标对象的方法均可应用于此。另外，本实施例中的所述检测视频图像中的静止目标对象和运动目标对象的方法均为现有技术，本文在此不再详细介绍。In this embodiment, the above methods for detecting stationary target objects and moving target objects in a video image are not limited to those listed above, and any method suitable for detecting target objects in a video image can be applied here. In addition, the methods for detecting stationary target objects and moving target objects in a video image in this embodiment are all existing technologies and will not be described in detail herein.
(2)确定所述目标对象的类别。(2) Determine the category of the target object.
例如,当识别所述视频图像中的目标对象为汽车时,可以确定所述目标对象的类别为交通工具。所述目标对象的检测和分类是视觉技术中一个非常基础的任务,其目的就是跟踪场景中感兴趣的一些物体,包括常规的目标对象检测、人员检测以及车辆检测等等。For example, when the target object in the video image is recognized as a car, it may be determined that the target object is a vehicle. The detection and classification of the target object is a very basic task in vision technology, and its purpose is to track some objects of interest in the scene, including conventional target object detection, person detection, vehicle detection, and so on.
在本实施方式中，可以通过分解所述视频图像中的目标对象，获取所述视频图像中的目标对象的基本属性，其中，所述基本属性包括颜色、运动轨迹、形状、结构等，再将所述获取的基本属性与预先存储在数据库中的目标对象的基本属性进行比对，从而准确地识别出所述视频图像中的目标对象。所述数据库中存储有目标对象的基本属性与目标对象类别对应表。In this embodiment, the basic attributes of the target object in the video image can be obtained by decomposing the target object in the video image, where the basic attributes include color, motion trajectory, shape, structure and the like; the obtained basic attributes are then compared with the basic attributes of target objects pre-stored in a database, so that the target object in the video image is accurately identified. The database stores a table of correspondence between basic attributes of target objects and target object categories.
所述确定所述目标对象的类别具体包括：通过分解所述视频图像中的目标对象，获取所述视频图像中的目标对象的基本属性；将获取的所述基本属性与预先存储在数据库中的目标对象的基本属性进行比对；当获取的所述基本属性与所述数据库中的目标对象的基本属性一致时，查询数据库中存储的基本属性与目标对象类别对应表以得到所述目标对象的类别。Determining the category of the target object specifically includes: obtaining the basic attributes of the target object in the video image by decomposing the target object in the video image; comparing the obtained basic attributes with the basic attributes of target objects pre-stored in the database; and, when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying the correspondence table of basic attributes and target object categories stored in the database to obtain the category of the target object.
所述跟踪模块203用于跟踪所述视频图像中的目标对象得到所述目标对象的状态。The tracking module 203 is configured to track the target object in the video image to obtain the state of the target object.
在目标对象检测完成以后,需要针对每个检测到的目标对象来计算其运动轨迹,从而实现跟踪所述视频图像中的目标对象。在本实施方式中,通过跟踪所述视频图像中的目标对象可以确定所述目标对象的状态。After the target object detection is completed, it is necessary to calculate the motion trajectory of each detected target object, so as to realize the tracking of the target object in the video image. In this embodiment, the state of the target object can be determined by tracking the target object in the video image.
所述跟踪所述视频图像中的目标对象的方法包括:The method for tracking the target object in the video image includes:
a)确定当前视频帧中的目标对象。a) Determine the target object in the current video frame.
b)获取目标对象在前序视频帧中的图像区域以及所述图像区域的图像特征,其中,所述前序视频帧为当前视频帧之前的k个视频帧,k为正整数。b) Obtain the image area of the target object in the previous video frame and the image characteristics of the image area, wherein the previous video frame is k video frames before the current video frame, and k is a positive integer.
c)根据所述目标对象在前序视频帧中的图像区域,对所述目标对象进行运动估计,确定所述目标对象在当前视频帧的预测区域。c) Perform motion estimation on the target object according to the image area of the target object in the previous video frame, and determine the prediction area of the target object in the current video frame.
d)根据所述预测区域确定目标对象在当前视频帧中的检测范围。d) Determine the detection range of the target object in the current video frame according to the prediction area.
e)判断所述目标对象是否出现在当前视频帧中的检测范围，若所述目标对象没有出现在当前视频帧中的检测范围，确定所述目标对象的状态异常；若所述目标对象出现在当前视频帧中的检测范围，确定所述目标对象在当前视频帧中的图像区域。e) Determine whether the target object appears within the detection range in the current video frame; if the target object does not appear within the detection range in the current video frame, determine that the state of the target object is abnormal; if the target object appears within the detection range in the current video frame, determine the image area of the target object in the current video frame.
由于前序视频帧是指当前视频帧之前的k个视频帧，通过这前k个视频帧来对当前视频帧进行预估和对比检测，计算量较小，并且能够解决视频中目标对象偶尔丢失或者遮挡的问题，检测精度较高。Since the preceding video frames refer to the k video frames before the current video frame, using these first k video frames to estimate and compare against the current video frame requires a small amount of computation, can solve the problem of the target object in the video occasionally being lost or occluded, and yields a high detection accuracy.
所述分析模块204用于根据所述目标对象的类别和所述目标对象的状态分析得到所述视频图像中包含的业务场景。The analysis module 204 is configured to analyze and obtain the business scenario contained in the video image according to the category of the target object and the state of the target object.
在本实施方式中,根据检测结果可以获得目标对象的类别,根据跟踪结果可以确定所述目标对象的状态,从而可以分析得到所述视频图像中包含的业务场景。In this embodiment, the category of the target object can be obtained according to the detection result, and the state of the target object can be determined according to the tracking result, so that the business scene contained in the video image can be analyzed.
例如，根据检测结果可以获得所述目标对象为汽车，所述汽车没有出现在当前视频帧中的检测范围，则可以确定所述汽车的状态异常，如所述汽车处于拥堵状态，则可得知所述视频图像中包括的业务场景为智能交通业务场景。For example, if the target object obtained from the detection result is a car and the car does not appear within the detection range in the current video frame, it can be determined that the state of the car is abnormal; if the car is in a congested state, it can be known that the business scene included in the video image is an intelligent transportation business scene.
又如根据检测结果可以获得所述目标对象的类别为行人，所述行人没有出现在当前视频帧中的检测范围，则可以确定所述行人的状态异常。如所述行人摔倒，则可得知所述视频图像中包括的业务场景为智能交通业务场景。As another example, if the category of the target object obtained from the detection result is a pedestrian and the pedestrian does not appear within the detection range in the current video frame, it can be determined that the state of the pedestrian is abnormal. If the pedestrian falls down, it can be known that the business scene included in the video image is an intelligent transportation business scene.
所述判断模块205用于判断所述视频图像中的业务场景是否出现异常。当所述视频图像中的业务场景出现异常时,记录所述业务场景出现异常时的关键信息。The judgment module 205 is used to judge whether the business scene in the video image is abnormal. When the business scene in the video image is abnormal, the key information when the business scene is abnormal is recorded.
在本实施方式中，通过所述目标对象的类别和所述目标对象的状态分析可判断所述视频图像中的业务场景是否出现异常。例如，通过判断所述目标对象是否出现在当前视频帧中的检测范围，若所述目标对象没有出现在当前视频帧中的检测范围，确定所述目标对象的状态异常，即所述目标对象对应的业务场景也出现异常。In this embodiment, whether the business scene in the video image is abnormal can be determined by analyzing the category of the target object and the state of the target object. For example, by judging whether the target object appears within the detection range in the current video frame, if the target object does not appear within the detection range in the current video frame, it is determined that the state of the target object is abnormal, that is, the business scene corresponding to the target object is also abnormal.
在其他实施方式中，可以将所述视频图像输入预先训练好的异常模型，并根据所述异常模型判断所述视频图像中的业务场景是否异常。具体地，当确定所述目标对象异常时，提取所述当前视频帧作为异常图像；将所述异常图像作为待识别图像导入预先训练好的异常模型中，其中，所述异常模型用于表征待识别图像与异常场景之间的对应关系；当所述异常模型输出与所述待识别图像对应的异常场景时，确认所述业务场景出现异常。所述异常模型包括不同的业务场景对应的异常模型。例如，当所述业务场景为智能交通业务场景时，所述智能交通业务场景对应的异常模型包括交通事故模型、交通拥堵模型及违法违规模型等；当所述业务场景为智慧园区业务场景时，所述智慧园区业务场景对应的异常模型包括随身物品遗留模型、人员入侵模型等；当所述业务场景为渡口监测业务场景时，所述渡口监测业务场景对应的异常模型包括超载模型、落水模型、非法船只模型等。In other embodiments, the video image may be input into a pre-trained anomaly model, and whether the business scene in the video image is abnormal is determined according to the anomaly model. Specifically, when it is determined that the target object is abnormal, the current video frame is extracted as an abnormal image; the abnormal image is imported as an image to be recognized into a pre-trained anomaly model, where the anomaly model is used to characterize the correspondence between images to be recognized and abnormal scenes; when the anomaly model outputs an abnormal scene corresponding to the image to be recognized, it is confirmed that the business scene is abnormal. The anomaly model includes anomaly models corresponding to different business scenarios. For example, when the business scenario is an intelligent transportation business scenario, the anomaly models corresponding to it include a traffic accident model, a traffic congestion model, a violation model and the like; when the business scenario is a smart park business scenario, the anomaly models corresponding to it include a left-behind belongings model, a personnel intrusion model and the like; when the business scenario is a ferry monitoring business scenario, the anomaly models corresponding to it include an overloading model, a falling-into-water model, an illegal vessel model and the like.
举例而言，当所述视频图像中当前视频帧出现交通拥堵时，提取所述当前视频帧作为异常图像，并将所述异常图像作为待识别图像导入预先训练好的交通拥堵模型中，当所述交通拥堵模型输出与所述待识别图像对应的交通拥堵场景时，确认所述视频图像对应的智能交通业务场景中出现异常；当所述交通拥堵模型没有输出与所述待识别图像对应的交通拥堵场景时，确认所述视频图像对应的智能交通业务场景中正常。For example, when traffic congestion appears in the current video frame of the video image, the current video frame is extracted as an abnormal image, and the abnormal image is imported as an image to be recognized into a pre-trained traffic congestion model; when the traffic congestion model outputs a traffic congestion scene corresponding to the image to be recognized, it is confirmed that an abnormality occurs in the intelligent transportation business scene corresponding to the video image; when the traffic congestion model does not output a traffic congestion scene corresponding to the image to be recognized, it is confirmed that the intelligent transportation business scene corresponding to the video image is normal.
上述异常模型为根据图片样本集训练的机器学习模型。所述图片样本包括异常业务场景图片样本和正常业务场景图片样本。所述机器学习模型为可以进行图像识别的人工智能算法模型，包括：卷积神经网络模型CNN、循环神经网络模型RNN和深度神经网络模型DNN。其中，卷积神经网络模型CNN是一种多层神经网络，可以将数据量庞大的图像识别问题不断降维，最终使其能够被训练，因此，本申请实施例中的机器学习模型可以为CNN模型。The above anomaly model is a machine learning model trained on a picture sample set. The picture samples include picture samples of abnormal business scenes and picture samples of normal business scenes. The machine learning model is an artificial intelligence algorithm model capable of image recognition, including a convolutional neural network model (CNN), a recurrent neural network model (RNN) and a deep neural network model (DNN). Among them, the convolutional neural network model CNN is a multi-layer neural network that can continuously reduce the dimensionality of image recognition problems with huge amounts of data so that they can eventually be trained; therefore, the machine learning model in the embodiments of the present application may be a CNN model.
在CNN网络结构的演化上，出现了许多CNN网络，包括LeNet、AlexNet、VGGNet、GoogleNet和ResNet。其中，ResNet网络提出了一种减轻网络训练负担的残差学习框架，这种网络比以前使用过的网络本质上层次更深，解决了其他神经网络随着网络加深，准确率下降的问题。在本实施方式中，所述机器学习模型可以是卷积神经网络模型CNN中的ResNet模型。需要说明的是，此处仅是举例说明，其他可以进行图像识别的机器学习模型同样适用于本申请，此处不进行赘述。In the evolution of CNN network structures, many CNN networks have appeared, including LeNet, AlexNet, VGGNet, GoogleNet and ResNet. Among them, the ResNet network proposes a residual learning framework that reduces the burden of network training; such a network is substantially deeper than previously used networks and solves the problem that the accuracy of other neural networks decreases as the network deepens. In this embodiment, the machine learning model may be the ResNet model among convolutional neural network models (CNN). It should be noted that this is only an example; other machine learning models capable of image recognition are also applicable to this application and will not be repeated here.
所述处理模块206用于当所述视频图像中的业务场景出现异常时,记录所述业务场景出现异常时的关键信息。The processing module 206 is configured to record key information when the business scene is abnormal when the business scene in the video image is abnormal.
在本实施方式中,所述关键信息包括所述业务场景出现异常的时间、地点、及截取的所述视频图像中所述业务场景出现异常时的图片文件等。In this embodiment, the key information includes the time and place when the business scene is abnormal, and the picture file when the business scene is abnormal in the intercepted video image.
进一步地，所述视频分析装置20还可以将记录的关键信息发送至第三方业务平台。所述第三方业务平台包括公安系统、交通管制系统等。通过将所述记录的关键信息发送至所述第三方业务平台，可以帮助所述第三方业务平台及时获取业务场景中出现异常时的关键信息，从而及时处理所述异常。Further, the video analysis device 20 can also send the recorded key information to a third-party business platform. The third-party business platform includes a public security system, a traffic control system and the like. Sending the recorded key information to the third-party business platform helps the third-party business platform obtain in time the key information of when an abnormality occurs in a business scene, so as to handle the abnormality in time.
进一步地，所述视频分析装置20还可以展示所述业务场景出现异常时的关键信息。具体地，在显示屏中显示所述业务场景出现异常的时间、地点、及截取的所述视频图像中所述业务场景出现异常时的图片文件等。Further, the video analysis device 20 can also display the key information of when the business scene is abnormal. Specifically, the time and location when the business scene is abnormal, and the picture file captured from the video image when the business scene is abnormal, are displayed on the display screen.
综上所述，本申请提供的视频分析装置20，包括接收模块201、检测模块202、跟踪模块203、分析模块204、判断模块205及处理模块206。所述接收模块201用于接收摄像头采集的视频图像；所述检测模块202用于检测所述视频图像中的目标对象得到所述目标对象的类别；所述跟踪模块203用于跟踪所述视频图像中的目标对象得到所述目标对象的状态；所述分析模块204用于根据所述目标对象的类别和所述目标对象的状态分析得到所述视频图像中包含的业务场景；所述判断模块205用于判断所述业务场景是否出现异常；及所述处理模块206用于当所述视频图像中的业务场景出现异常时，记录所述业务场景出现异常时的关键信息。可以实时分析所述视频图像对应的业务场景是否出现异常，并在确认所述业务场景出现异常时，记录所述业务场景出现异常时的关键信息。从而可以将所述关键信息发送至对应的第三方平台，以及时处理所述异常。In summary, the video analysis device 20 provided by this application includes a receiving module 201, a detection module 202, a tracking module 203, an analysis module 204, a judgment module 205 and a processing module 206. The receiving module 201 is used to receive a video image collected by a camera; the detection module 202 is used to detect the target object in the video image to obtain the category of the target object; the tracking module 203 is used to track the target object in the video image to obtain the state of the target object; the analysis module 204 is used to analyze the category of the target object and the state of the target object to obtain the business scene contained in the video image; the judgment module 205 is used to judge whether the business scene is abnormal; and the processing module 206 is used to record, when the business scene in the video image is abnormal, the key information of when the business scene is abnormal. Whether the business scene corresponding to the video image is abnormal can be analyzed in real time, and when it is confirmed that the business scene is abnormal, the key information of the abnormality is recorded, so that the key information can be sent to the corresponding third-party platform and the abnormality can be handled in time.
上述以软件功能模块的形式实现的集成的单元，可以存储在一个非易失性可读存储介质中。上述软件功能模块存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，双屏设备，或者网络设备等）或处理器（processor）执行本申请各个实施例所述方法的部分。The above integrated units implemented in the form of software functional modules may be stored in a non-volatile readable storage medium. The above software functional modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a dual-screen device, a network device or the like) or a processor to execute part of the methods described in the embodiments of this application.
实施例三Embodiment 3
图3为本申请实施例三提供的计算机设备的示意图。FIG. 3 is a schematic diagram of a computer device provided in Embodiment 3 of this application.
所述计算机设备3包括:数据库31、存储器32、至少一个处理器33、存储在所述存储器32中并可在所述至少一个处理器33上运行的计算机可读指令34及至少一条通讯总线35。The computer device 3 includes: a database 31, a memory 32, at least one processor 33, computer readable instructions 34 stored in the memory 32 and executable on the at least one processor 33, and at least one communication bus 35 .
所述至少一个处理器33执行所述计算机可读指令34时实现上述视频分析方法实施例中的步骤。When the at least one processor 33 executes the computer-readable instructions 34, the steps in the foregoing video analysis method embodiment are implemented.
示例性的，所述计算机可读指令34可以被分割成一个或多个模块/单元，所述一个或者多个模块/单元被存储在所述存储器32中，并由所述至少一个处理器33执行，以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段，该指令段用于描述所述计算机可读指令34在所述计算机设备3中的执行过程。Exemplarily, the computer-readable instructions 34 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 32 and executed by the at least one processor 33 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 34 in the computer device 3.
所述计算机设备3是一种能够按照事先设定或存储的指令，自动进行数值计算和/或信息处理的设备，其硬件包括但不限于微处理器、专用集成电路（Application Specific Integrated Circuit，ASIC）、可编程门阵列（Field-Programmable Gate Array，FPGA）、数字处理器（Digital Signal Processor，DSP）、嵌入式设备等。本领域技术人员可以理解，所述示意图3仅仅是计算机设备3的示例，并不构成对计算机设备3的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件，例如所述计算机设备3还可以包括输入输出设备、网络接入设备、总线等。The computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device and the like. Those skilled in the art can understand that the schematic FIG. 3 is only an example of the computer device 3 and does not constitute a limitation on the computer device 3; it may include more or fewer components than shown, or combine certain components, or have different components; for example, the computer device 3 may also include input and output devices, network access devices, a bus and the like.
所述数据库(Database)31是按照数据结构来组织、存储和管理数据的建立在所述计算机设备3上的仓库。数据库通常分为层次式数据库、网络式 数据库和关系式数据库三种。在本实施方式中,所述数据库31用于存储所述视频图像等。The database (Database) 31 is a warehouse built on the computer device 3 to organize, store and manage data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 31 is used to store the video images and the like.
所述至少一个处理器33可以是中央处理单元（Central Processing Unit，CPU），还可以是其他通用处理器、数字信号处理器（Digital Signal Processor，DSP）、专用集成电路（Application Specific Integrated Circuit，ASIC）、现成可编程门阵列（Field-Programmable Gate Array，FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。该处理器33可以是微处理器或者该处理器33也可以是任何常规的处理器等，所述处理器33是所述计算机设备3的控制中心，利用各种接口和线路连接整个计算机设备3的各个部分。The at least one processor 33 may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 33 may be a microprocessor, or the processor 33 may also be any conventional processor; the processor 33 is the control center of the computer device 3 and connects the various parts of the entire computer device 3 using various interfaces and lines.
所述存储器32可用于存储所述计算机可读指令34和/或模块/单元，所述处理器33通过运行或执行存储在所述存储器32内的计算机可读指令和/或模块/单元，以及调用存储在存储器32内的数据，实现所述计算机设备3的各种功能。所述存储器32可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需的应用程序（比如声音播放功能、图像播放功能等）等；存储数据区可存储根据计算机设备3的使用所创建的数据（比如音频数据等）等。此外，存储器32还可以包括非易失性存储器，例如硬盘、内存、插接式硬盘、智能存储卡（Smart Media Card，SMC）、安全数字（Secure Digital，SD）卡、闪存卡（Flash Card）、至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。The memory 32 may be used to store the computer-readable instructions 34 and/or modules/units. The processor 33 realizes the various functions of the computer device 3 by running or executing the computer-readable instructions and/or modules/units stored in the memory 32 and by calling the data stored in the memory 32. The memory 32 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created according to the use of the computer device 3 (such as audio data, etc.). In addition, the memory 32 may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
所述存储器32中存储有计算机可读指令代码,且所述至少一个处理器33可调用所述存储器32中存储的计算机可读指令代码以执行相关的功能。例如,图2中所述的各个模块(接收模块201、检测模块202、跟踪模块203、分析模块204、判断模块205及处理模块206)是存储在所述存储器32中的计算机可读指令代码,并由所述至少一个处理器33所执行,从而实现所述各个模块的功能以达到视频分析目的。Computer readable instruction codes are stored in the memory 32, and the at least one processor 33 can call the computer readable instruction codes stored in the memory 32 to perform related functions. For example, the various modules (receiving module 201, detection module 202, tracking module 203, analysis module 204, judgment module 205, and processing module 206) described in FIG. 2 are computer-readable instruction codes stored in the memory 32, It is executed by the at least one processor 33 to realize the functions of the various modules to achieve the purpose of video analysis.
所述接收模块201用于接收摄像头采集的视频图像;The receiving module 201 is used to receive video images collected by a camera;
所述检测模块202用于检测所述视频图像中的目标对象得到所述目标对象的类别;The detection module 202 is configured to detect a target object in the video image to obtain the target object category;
所述跟踪模块203用于跟踪所述视频图像中的目标对象得到所述目标对象的状态;The tracking module 203 is configured to track the target object in the video image to obtain the state of the target object;
所述分析模块204用于根据所述目标对象的类别和所述目标对象的状态分析得到所述视频图像中包含的业务场景;The analysis module 204 is configured to analyze and obtain the business scene contained in the video image according to the category of the target object and the state of the target object;
所述判断模块205用于判断所述业务场景是否出现异常;及The judgment module 205 is used to judge whether the business scenario is abnormal; and
所述处理模块206用于当所述视频图像中的业务场景出现异常时,记录所述业务场景出现异常时的关键信息。The processing module 206 is configured to record key information when the business scene is abnormal when the business scene in the video image is abnormal.
所述计算机设备3集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个非易失性可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以 通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述非易失性可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)等。If the integrated module/unit of the computer device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile readable storage medium. Based on this understanding, this application implements all or part of the processes in the above-mentioned embodiments and methods, and can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile readable storage medium. When the computer program is executed by the processor, it can implement the steps of the foregoing method embodiments. Wherein, the computer program includes computer readable instruction code, and the computer readable instruction code may be in the form of source code, object code, executable file, or some intermediate form. The non-volatile readable medium may include: any entity or device capable of carrying the computer readable instruction code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) etc.
尽管未示出，所述计算机设备3还可以包括给各个部件供电的电源（比如电池），优选的，电源可以通过电源管理系统与所述至少一个处理器33逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。所述计算机设备3还可以包括蓝牙模块、Wi-Fi模块等，在此不再赘述。Although not shown, the computer device 3 may also include a power supply (such as a battery) that supplies power to the various components. Preferably, the power supply may be logically connected to the at least one processor 33 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The power supply may also include any components such as one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator. The computer device 3 may also include a Bluetooth module, a Wi-Fi module, etc., which will not be repeated here.
应该了解,所述实施例仅为说明之用,在专利申请范围上并不受此结构的限制。It should be understood that the described embodiments are for illustrative purposes only, and are not limited by this structure in the scope of the patent application.
在本申请所提供的几个实施例中,应该理解到,所揭露的电子设备和方法,可以通过其它的方式实现。例如,以上所描述的电子设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In the several embodiments provided in this application, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other division methods in actual implementation.
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.
对于本领域技术人员而言，显然本申请不限于上述示范性实施例的细节，而且在不背离本申请的精神或基本特征的情况下，能够以其他的具体形式实现本申请。因此，无论从哪一点来看，均应将实施例看作是示范性的，而且是非限制性的，本申请的范围由所附权利要求而不是上述说明限定，因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外，显然“包括”一词不排除其他单元或步骤，单数不排除复数。系统权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一，第二等词语用来表示名称，而并不表示任何特定的顺序。For those skilled in the art, it is obvious that this application is not limited to the details of the above exemplary embodiments, and this application can be implemented in other specific forms without departing from the spirit or basic characteristics of this application. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-limiting; the scope of this application is defined by the appended claims rather than the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalent elements of the claims are included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, it is obvious that the word "including" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims may also be implemented by one unit or device through software or hardware. Words such as first and second are used to denote names and do not denote any specific order.
最后应说明的是，以上实施例仅用以说明本申请的技术方案而非限制，尽管参照较佳实施例对本申请进行了详细说明，本领域的普通技术人员应当理解，可以对本申请的技术方案进行修改或等同替换，而不脱离本申请技术方案的精神范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and are not limiting. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application can be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. A video analysis method, wherein the method comprises:
    receiving a video image collected by a camera;
    detecting a target object in the video image to obtain a category of the target object;
    tracking the target object in the video image to obtain a state of the target object;
    analyzing, according to the category of the target object and the state of the target object, a business scene contained in the video image;
    determining whether the business scene is abnormal; and
    when the business scene in the video image is abnormal, recording key information of when the business scene is abnormal.
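Note (illustrative, not part of the claims): the method of claim 1 can be sketched as the following minimal Python pipeline. The helper callables detect_category, track_state, infer_scene, is_abnormal and record are hypothetical placeholders standing in for the detection, tracking, scene-analysis and recording components described in the claims, and OpenCV is assumed only as one possible way to receive frames from a camera.

import datetime

import cv2  # OpenCV is assumed here only as one way to receive/decode camera frames


def analyze_stream(source, detect_category, track_state, infer_scene, is_abnormal, record):
    """Sketch of the claimed pipeline: receive video images, detect the target
    object's category, track its state, analyse the business scene and record
    key information when the scene is abnormal."""
    cap = cv2.VideoCapture(source)                # receive video images collected by the camera
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            category = detect_category(frame)     # category of the target object
            state = track_state(frame)            # state of the target object
            scene = infer_scene(category, state)  # business scene contained in the image
            if is_abnormal(scene):                # judge whether the business scene is abnormal
                record({                          # key information when the scene is abnormal
                    "time": datetime.datetime.now().isoformat(),
                    "scene": scene,
                    "frame": frame,
                })
    finally:
        cap.release()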
  2. The video analysis method of claim 1, wherein the detecting a target object in the video image to obtain a category of the target object comprises:
    obtaining basic attributes of the target object in the video image by decomposing the target object in the video image;
    comparing the obtained basic attributes with basic attributes of target objects pre-stored in a database;
    when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying a correspondence table of basic attributes and target object categories stored in the database to obtain the category of the target object.
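Note (illustrative): one possible reading of the attribute-based classification of claim 2 is a lookup from extracted basic attributes to a category, as sketched below. The attribute tuples, the in-memory ATTRIBUTE_CATEGORY_TABLE and extract_basic_attributes are assumptions standing in for the database and correspondence table that the claim refers to.

# Hypothetical correspondence table between basic attributes and target object
# categories; in claim 2 this table is stored in a database.
ATTRIBUTE_CATEGORY_TABLE = {
    ("four_wheels", "license_plate"): "vehicle",
    ("two_legs", "upright_posture"): "pedestrian",
}


def classify_target(target_region, extract_basic_attributes):
    """Decompose the target to get its basic attributes, compare them with the
    stored attributes, and return the corresponding category on a match."""
    attributes = tuple(extract_basic_attributes(target_region))
    return ATTRIBUTE_CATEGORY_TABLE.get(attributes)  # None when no stored attributes match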
  3. The video analysis method of claim 1, wherein the tracking the target object in the video image to obtain a state of the target object comprises:
    determining the target object in a current video frame;
    obtaining an image area of the target object in preceding video frames and image features of the image area, wherein the preceding video frames are the k video frames before the current video frame, and k is a positive integer;
    performing motion estimation on the target object according to the image area of the target object in the preceding video frames, and determining a prediction area of the target object in the current video frame;
    determining a detection range of the target object in the current video frame according to the prediction area;
    determining whether the target object appears within the detection range in the current video frame;
    if the target object appears within the detection range in the current video frame, determining an image area of the target object in the current video frame;
    if the target object does not appear within the detection range in the current video frame, determining that the target object is abnormal.
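Note (illustrative): the tracking steps of claim 3 amount to predicting where the target should appear from its image areas in the preceding k frames and flagging it as abnormal when no detection falls inside that range. The sketch below uses a simple constant-velocity motion estimate; the margin value and helper names are assumptions, not requirements of the claim.

import numpy as np


def predict_region(prev_boxes):
    """Constant-velocity motion estimate from the image areas (x, y, w, h)
    of the target in the preceding k video frames."""
    boxes = np.asarray(prev_boxes, dtype=float)
    step = (boxes[-1] - boxes[0]) / max(len(boxes) - 1, 1)  # average per-frame displacement
    return boxes[-1] + step                                 # prediction area in the current frame


def track_step(prev_boxes, detections, margin=0.5):
    """Return the target's image area in the current frame, or None when the
    target does not appear within the detection range (i.e. it is abnormal)."""
    pred = predict_region(prev_boxes)
    cx, cy = pred[0] + pred[2] / 2.0, pred[1] + pred[3] / 2.0
    # Detection range: the prediction area enlarged by a relative margin.
    reach = (1.0 + margin) * max(pred[2], pred[3])
    for box in detections:                                  # candidate areas detected in the current frame
        bx, by = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
        if abs(bx - cx) <= reach and abs(by - cy) <= reach:
            return box
    return None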
  4. The video analysis method of claim 3, wherein the determining whether the business scene is abnormal comprises:
    when it is determined that the target object is abnormal, extracting the current video frame as an abnormal image;
    importing the abnormal image, as an image to be recognized, into a pre-trained anomaly model, wherein the anomaly model is used to characterize a correspondence between images to be recognized and abnormal scenes;
    when the anomaly model outputs an abnormal scene corresponding to the image to be recognized, confirming that the business scene is abnormal.
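Note (illustrative): the claims do not fix the architecture of the pre-trained anomaly model of claim 4, only that it maps an image to be recognized to a corresponding abnormal scene. The sketch below assumes a generic classifier interface; predict_fn, the scene labels and the confidence threshold are hypothetical.

import numpy as np


class AnomalyModel:
    """Assumed interface: maps an image to be recognized to the abnormal scene
    it corresponds to, or to None when no abnormal scene is recognized."""

    def __init__(self, predict_fn, scene_labels, threshold=0.5):
        self.predict_fn = predict_fn        # e.g. the predict method of a loaded classifier
        self.scene_labels = scene_labels    # hypothetical labels such as ["collision", "intrusion"]
        self.threshold = threshold

    def __call__(self, abnormal_image):
        scores = np.asarray(self.predict_fn(abnormal_image))
        best = int(scores.argmax())
        if scores[best] < self.threshold:   # not confident enough to output an abnormal scene
            return None
        return self.scene_labels[best]      # an output here confirms the business scene is abnormal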
  5. The video analysis method of claim 1, wherein the key information comprises the time and place at which the business scene is abnormal, and a picture file captured from the video image when the business scene is abnormal.
  6. The video analysis method of claim 5, wherein the method further comprises:
    sending the recorded key information to a third-party business platform, wherein the third-party business platform comprises a public security system and a traffic control system.
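Note (illustrative): claims 5 and 6 describe the key information (time, place, captured picture file) and its forwarding to a third-party business platform. Below is a minimal sketch assuming a JSON payload sent over HTTP; the endpoint URL and field names are invented for illustration.

import datetime
import json
import urllib.request


def build_key_info(location, picture_path):
    """Key information when the business scene is abnormal: time, place and
    the picture file captured from the video image."""
    return {
        "time": datetime.datetime.now().isoformat(),
        "location": location,
        "picture_file": picture_path,
    }


def send_key_info(key_info, endpoint="https://platform.example.invalid/report"):
    """Forward the recorded key information to a third-party business platform
    (the endpoint is a hypothetical HTTP interface)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(key_info).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(request)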
  7. The video analysis method of claim 1, wherein, after the video image collected by the camera is received, the method further comprises:
    decoding the video image.
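Note (illustrative): claim 7 adds a decoding step after the video image is received. If the camera delivers encoded frames (for example JPEG payloads over a network, which is an assumption and not part of the claim), decoding could be done as follows with OpenCV.

import cv2
import numpy as np


def decode_received_frame(encoded_bytes):
    """Decode an encoded video image received from the camera into a BGR array;
    returns None when the payload cannot be decoded."""
    buffer = np.frombuffer(encoded_bytes, dtype=np.uint8)
    return cv2.imdecode(buffer, cv2.IMREAD_COLOR)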
  8. A video analysis apparatus, wherein the apparatus comprises:
    a receiving module, configured to receive a video image collected by a camera;
    a detection module, configured to detect a target object in the video image to obtain a category of the target object;
    a tracking module, configured to track the target object in the video image to obtain a state of the target object;
    an analysis module, configured to analyze, according to the category of the target object and the state of the target object, a business scene contained in the video image;
    a judgment module, configured to determine whether the business scene is abnormal; and
    a processing module, configured to record, when the business scene in the video image is abnormal, key information of when the business scene is abnormal.
  9. A computer device, wherein the computer device comprises a processor and a memory, the memory stores at least one computer-readable instruction, and the processor executes the at least one computer-readable instruction to implement the following steps:
    receiving a video image collected by a camera;
    detecting a target object in the video image to obtain a category of the target object;
    tracking the target object in the video image to obtain a state of the target object;
    analyzing, according to the category of the target object and the state of the target object, a business scene contained in the video image;
    determining whether the business scene is abnormal; and
    when the business scene in the video image is abnormal, recording key information of when the business scene is abnormal.
  10. The computer device of claim 9, wherein, when executing the at least one computer-readable instruction to detect the target object in the video image to obtain the category of the target object, the processor implements:
    obtaining basic attributes of the target object in the video image by decomposing the target object in the video image;
    comparing the obtained basic attributes with basic attributes of target objects pre-stored in a database;
    when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying a correspondence table of basic attributes and target object categories stored in the database to obtain the category of the target object.
  11. The computer device of claim 9, wherein, when executing the at least one computer-readable instruction to track the target object in the video image to obtain the state of the target object, the processor implements:
    determining the target object in a current video frame;
    obtaining an image area of the target object in preceding video frames and image features of the image area, wherein the preceding video frames are the k video frames before the current video frame, and k is a positive integer;
    performing motion estimation on the target object according to the image area of the target object in the preceding video frames, and determining a prediction area of the target object in the current video frame;
    determining a detection range of the target object in the current video frame according to the prediction area;
    determining whether the target object appears within the detection range in the current video frame;
    if the target object appears within the detection range in the current video frame, determining an image area of the target object in the current video frame;
    if the target object does not appear within the detection range in the current video frame, determining that the target object is abnormal.
  12. The computer device of claim 11, wherein, when executing the at least one computer-readable instruction to determine whether the business scene is abnormal, the processor implements:
    when it is determined that the target object is abnormal, extracting the current video frame as an abnormal image;
    importing the abnormal image, as an image to be recognized, into a pre-trained anomaly model, wherein the anomaly model is used to characterize a correspondence between images to be recognized and abnormal scenes;
    when the anomaly model outputs an abnormal scene corresponding to the image to be recognized, confirming that the business scene is abnormal.
  13. The computer device of claim 9, wherein the processor, when executing the at least one computer-readable instruction, further implements the following step:
    sending the recorded key information to a third-party business platform, wherein the third-party business platform comprises a public security system and a traffic control system.
  14. The computer device of claim 9, wherein the processor, when executing the at least one computer-readable instruction, further implements the following step after the video image collected by the camera is received:
    decoding the video image.
  15. A non-volatile readable storage medium storing at least one computer-readable instruction, wherein the at least one computer-readable instruction, when executed by a processor, implements the following steps:
    receiving a video image collected by a camera;
    detecting a target object in the video image to obtain a category of the target object;
    tracking the target object in the video image to obtain a state of the target object;
    analyzing, according to the category of the target object and the state of the target object, a business scene contained in the video image;
    determining whether the business scene is abnormal; and
    when the business scene in the video image is abnormal, recording key information of when the business scene is abnormal.
  16. The storage medium of claim 15, wherein, when executed by the processor to detect the target object in the video image to obtain the category of the target object, the at least one computer-readable instruction implements:
    obtaining basic attributes of the target object in the video image by decomposing the target object in the video image;
    comparing the obtained basic attributes with basic attributes of target objects pre-stored in a database;
    when the obtained basic attributes are consistent with the basic attributes of a target object in the database, querying a correspondence table of basic attributes and target object categories stored in the database to obtain the category of the target object.
  17. The storage medium of claim 15, wherein, when executed by the processor to track the target object in the video image to obtain the state of the target object, the at least one computer-readable instruction implements:
    determining the target object in a current video frame;
    obtaining an image area of the target object in preceding video frames and image features of the image area, wherein the preceding video frames are the k video frames before the current video frame, and k is a positive integer;
    performing motion estimation on the target object according to the image area of the target object in the preceding video frames, and determining a prediction area of the target object in the current video frame;
    determining a detection range of the target object in the current video frame according to the prediction area;
    determining whether the target object appears within the detection range in the current video frame;
    if the target object appears within the detection range in the current video frame, determining an image area of the target object in the current video frame;
    if the target object does not appear within the detection range in the current video frame, determining that the target object is abnormal.
  18. The storage medium of claim 17, wherein, when executed by the processor to determine whether the business scene is abnormal, the at least one computer-readable instruction implements:
    when it is determined that the target object is abnormal, extracting the current video frame as an abnormal image;
    importing the abnormal image, as an image to be recognized, into a pre-trained anomaly model, wherein the anomaly model is used to characterize a correspondence between images to be recognized and abnormal scenes;
    when the anomaly model outputs an abnormal scene corresponding to the image to be recognized, confirming that the business scene is abnormal.
  19. The storage medium of claim 15, wherein the at least one computer-readable instruction, when executed by the processor, further implements the following step:
    sending the recorded key information to a third-party business platform, wherein the third-party business platform comprises a public security system and a traffic control system.
  20. The storage medium of claim 15, wherein the at least one computer-readable instruction, when executed by the processor, further implements the following step after the video image collected by the camera is received:
    decoding the video image.
PCT/CN2019/103373 2019-06-14 2019-08-29 Video analysis method and apparatus, computer device and storage medium WO2020248386A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910517477.X 2019-06-14
CN201910517477.XA CN110390262B (en) 2019-06-14 2019-06-14 Video analysis method, device, server and storage medium

Publications (1)

Publication Number Publication Date
WO2020248386A1 true WO2020248386A1 (en) 2020-12-17

Family

ID=68285438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103373 WO2020248386A1 (en) 2019-06-14 2019-08-29 Video analysis method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110390262B (en)
WO (1) WO2020248386A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339894A (en) * 2020-02-20 2020-06-26 支付宝(杭州)信息技术有限公司 Data processing and risk identification method, device, equipment and medium
CN111652043A (en) * 2020-04-15 2020-09-11 北京三快在线科技有限公司 Object state identification method and device, image acquisition equipment and storage medium
CN113552123A (en) * 2020-04-17 2021-10-26 华为技术有限公司 Visual inspection method and visual inspection device
CN111680610A (en) * 2020-06-03 2020-09-18 合肥中科类脑智能技术有限公司 Construction scene abnormity monitoring method and device
CN111783591B (en) * 2020-06-23 2024-04-26 北京百度网讯科技有限公司 Abnormality detection method, abnormality detection device, abnormality detection apparatus, and recording medium
CN111832492A (en) * 2020-07-16 2020-10-27 平安科技(深圳)有限公司 Method and device for distinguishing static traffic abnormality, computer equipment and storage medium
CN112804489B (en) * 2020-12-31 2023-02-17 重庆文理学院 Intelligent construction site management system and method based on Internet +
CN113891072B (en) * 2021-12-08 2022-02-11 北京拙河科技有限公司 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data
CN114792368A (en) * 2022-04-28 2022-07-26 上海兴容信息技术有限公司 Method and system for intelligently judging store compliance
CN116708899B (en) * 2022-06-30 2024-01-23 北京生数科技有限公司 Video processing method, device and storage medium applied to virtual image synthesis
CN115834621A (en) * 2022-11-16 2023-03-21 山东新一代信息产业技术研究院有限公司 Accident quick-place device and method based on artificial intelligence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8200011B2 (en) * 2007-09-27 2012-06-12 Behavioral Recognition Systems, Inc. Context processor for video analysis system
CN108830204B (en) * 2018-06-01 2021-10-19 中国科学技术大学 Method for detecting abnormality in target-oriented surveillance video

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194318A (en) * 2017-04-24 2017-09-22 北京航空航天大学 The scene recognition method of target detection auxiliary
CN107346415A (en) * 2017-06-08 2017-11-14 小草数语(北京)科技有限公司 Method of video image processing, device and monitoring device
US20190095716A1 (en) * 2017-09-26 2019-03-28 Ambient AI, Inc Systems and methods for intelligent and interpretive analysis of video image data using machine learning
CN109063667A (en) * 2018-08-14 2018-12-21 视云融聚(广州)科技有限公司 A kind of video identification method optimizing and method for pushing based on scene
CN109598885A (en) * 2018-12-21 2019-04-09 广东中安金狮科创有限公司 Monitoring system and its alarm method

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633126A (en) * 2020-12-18 2021-04-09 联通物联网有限责任公司 Video processing method and device
CN112711994A (en) * 2020-12-21 2021-04-27 航天信息股份有限公司 Method and system for detecting illegal operation behaviors based on scene recognition
CN112634329A (en) * 2020-12-26 2021-04-09 西安电子科技大学 Scene target activity prediction method and device based on space-time and-or graph
CN112634329B (en) * 2020-12-26 2024-02-13 西安电子科技大学 Scene target activity prediction method and device based on space-time and or graph
CN112749636A (en) * 2020-12-29 2021-05-04 精英数智科技股份有限公司 Method, device and system for monitoring water exploration and drainage of coal mine and storage medium
CN112749636B (en) * 2020-12-29 2023-10-31 精英数智科技股份有限公司 Monitoring method, device and system for water drainage detection of coal mine and storage medium
CN112991280A (en) * 2021-03-03 2021-06-18 望知科技(深圳)有限公司 Visual detection method and system and electronic equipment
CN113065456A (en) * 2021-03-30 2021-07-02 上海商汤智能科技有限公司 Information prompting method and device, electronic equipment and computer storage medium
CN113378005B (en) * 2021-06-03 2023-06-02 北京百度网讯科技有限公司 Event processing method, device, electronic equipment and storage medium
CN113378005A (en) * 2021-06-03 2021-09-10 北京百度网讯科技有限公司 Event processing method and device, electronic equipment and storage medium
CN113361468A (en) * 2021-06-30 2021-09-07 北京百度网讯科技有限公司 Business quality inspection method, device, equipment and storage medium
CN113422935A (en) * 2021-07-06 2021-09-21 城云科技(中国)有限公司 Video stream processing method, device and system
CN113705370B (en) * 2021-08-09 2023-06-30 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
CN113705370A (en) * 2021-08-09 2021-11-26 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behavior of live broadcast room, electronic equipment and storage medium
CN113763860A (en) * 2021-09-14 2021-12-07 杭州海康消防科技有限公司 Display color determination method and device, electronic equipment and storage medium
WO2023040151A1 (en) * 2021-09-17 2023-03-23 上海商汤智能科技有限公司 Algorithm application element generating method and apparatus, electronic device, computer readable storage medium, and computer program product
CN113992890A (en) * 2021-10-22 2022-01-28 北京明略昭辉科技有限公司 Monitoring method, monitoring device, storage medium and electronic equipment
CN114205565B (en) * 2022-02-15 2022-07-29 云丁网络技术(北京)有限公司 Monitoring video distribution method and system
CN114205565A (en) * 2022-02-15 2022-03-18 云丁网络技术(北京)有限公司 Monitoring video distribution method and system
CN114378862A (en) * 2022-03-02 2022-04-22 北京云迹科技股份有限公司 Robot abnormity automatic repairing method and device based on cloud platform and robot
CN114378862B (en) * 2022-03-02 2024-05-10 北京云迹科技股份有限公司 Cloud platform-based automatic robot abnormality repairing method and device and robot
CN114682520A (en) * 2022-04-12 2022-07-01 浪潮软件集团有限公司 Substandard product sorting device based on domestic CPU and artificial intelligence accelerator card
CN116953416A (en) * 2023-09-19 2023-10-27 英迪格(天津)电气有限公司 Monitoring system for running state of railway power transformation and distribution device
CN116953416B (en) * 2023-09-19 2023-12-08 英迪格(天津)电气有限公司 Monitoring system for running state of railway power transformation and distribution device
CN117079079B (en) * 2023-09-27 2024-03-15 中电科新型智慧城市研究院有限公司 Training method of video anomaly detection model, video anomaly detection method and system
CN117079079A (en) * 2023-09-27 2023-11-17 中电科新型智慧城市研究院有限公司 Training method of video anomaly detection model, video anomaly detection method and system

Also Published As

Publication number Publication date
CN110390262B (en) 2023-06-30
CN110390262A (en) 2019-10-29

Similar Documents

Publication Publication Date Title
WO2020248386A1 (en) Video analysis method and apparatus, computer device and storage medium
US11840239B2 (en) Multiple exposure event determination
US10706330B2 (en) Methods and systems for accurately recognizing vehicle license plates
WO2021135879A1 (en) Vehicle data monitoring method and apparatus, computer device, and storage medium
US10552687B2 (en) Visual monitoring of queues using auxillary devices
CN109191829B (en) road safety monitoring method and system, and computer readable storage medium
US20230153698A1 (en) Methods and systems for accurately recognizing vehicle license plates
CN109360362A (en) A kind of railway video monitoring recognition methods, system and computer-readable medium
CN102902960B (en) Leave-behind object detection method based on Gaussian modelling and target contour
US20160210759A1 (en) System and method of detecting moving objects
CN112233428B (en) Traffic flow prediction method, device, storage medium and equipment
CN111079621A (en) Method and device for detecting object, electronic equipment and storage medium
WO2021022698A1 (en) Following detection method and apparatus, and electronic device and storage medium
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN113380021A (en) Vehicle state detection method, device, server and computer-readable storage medium
CN114663871A (en) Image recognition method, training method, device, system and storage medium
Jiao et al. Traffic behavior recognition from traffic videos under occlusion condition: a Kalman filter approach
CN103761345A (en) Video retrieval method based on OCR character recognition technology
CN114693722B (en) Vehicle driving behavior detection method, detection device and detection equipment
EP4071728A1 (en) Artificial intelligence model integration and deployment for providing a service
CN115019242A (en) Abnormal event detection method and device for traffic scene and processing equipment
CN112153341B (en) Task supervision method, device and system, electronic equipment and storage medium
Huang et al. A bus crowdedness sensing system using deep-learning based object detection
Maaloul Video-based algorithms for accident detections
CN115240406B (en) Road congestion management method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933143

Country of ref document: EP

Kind code of ref document: A1
