CN113891072B - Video monitoring and anomaly analysis system and method based on hundred million-level pixel data - Google Patents

Video monitoring and anomaly analysis system and method based on hundred million-level pixel data

Info

Publication number
CN113891072B
CN113891072B (application CN202111490304.7A)
Authority
CN
China
Prior art keywords
video
image data
array
anomaly
inner ring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111490304.7A
Other languages
Chinese (zh)
Other versions
CN113891072A (en)
Inventor
袁潮
温建伟
邓迪旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhuohe Technology Co Ltd
Original Assignee
Beijing Zhuohe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuohe Technology Co Ltd filed Critical Beijing Zhuohe Technology Co Ltd
Priority to CN202111490304.7A priority Critical patent/CN113891072B/en
Publication of CN113891072A publication Critical patent/CN113891072A/en
Application granted granted Critical
Publication of CN113891072B publication Critical patent/CN113891072B/en
Priority to PCT/CN2022/134568 priority patent/WO2023103819A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The invention provides a video monitoring and anomaly analysis system and method based on hundred million-level pixel data, and belongs to the technical field of video monitoring and anomaly identification. The video monitoring system comprises a plurality of image acquisition arrays and an annular array storage module; the image data collected by the image acquisition arrays are stored in an outer ring memory; image data containing the same target object among the acquired image data are stored in an inner ring memory; video monitoring is performed based on the image data stored in the outer ring memory; and based on the result of the video monitoring, it is determined whether to clear the inner ring memory. The video anomaly analysis system comprises a first video anomaly analysis model and a second video anomaly analysis model; the first video anomaly analysis model performs first anomaly identification on the image data stored in the outer ring storage space, and the second video anomaly analysis model performs second anomaly identification on the image data stored in the inner ring storage space. The invention achieves rapid video monitoring and accurate anomaly identification for hundred million-level pixel data.

Description

Video monitoring and anomaly analysis system and method based on hundred million-level pixel data
Technical Field
The invention belongs to the technical field of video monitoring and anomaly identification, and particularly relates to a video monitoring and anomaly analysis system and method based on hundred million-level pixel data, computer terminal equipment for realizing the method and a storage medium.
Background
At present, intelligent video monitoring technology is widely applied to monitoring scenes such as public safety monitoring, industrial site monitoring, residential area monitoring and traffic state monitoring, supporting functions such as crime prevention, traffic control, accident prevention and detection, and the care of the elderly, children, the sick and the disabled. Compared with traditional monitoring, multi-camera real-time video image splicing technology has several advantages. The first is long-range identification: the recognition range of traditional monitoring is 5-8 meters, whereas multi-camera real-time video image splicing can identify and analyze targets within 100 meters. The second is that multiple targets can be identified simultaneously.
Thanks to the development of microelectronics, image sensors, which have replaced film cameras, have evolved towards smaller pixels and larger array sizes in pursuit of greater image resolution. The pixel count of current video capture data can already reach tens of millions or even hundreds of millions.
The use and combination of image sensors with hundred-million-pixel or even higher resolution has led to an explosion in data storage and processing, and conventional video data fusion processing methods are no longer suitable. In particular, for video monitoring scenes with hundred million-level pixel data, problems such as disordered data storage and data storage overflow have gradually become prominent, and have become important factors restricting the processing speed of video monitoring data and the accuracy of video anomaly identification.
In this regard, the prior art has not proposed a fast and accurate solution for video monitoring and anomaly analysis for giga-level pixel data.
Disclosure of Invention
In order to solve the technical problems, the invention provides a video monitoring and anomaly analysis system and method based on hundred million-level pixel data, a computer terminal device for realizing the method and a storage medium.
In a first aspect of the present invention, a video monitoring system based on giga-level pixel data is presented, the video monitoring system comprising a plurality of image capturing arrays of different resolutions, each image capturing array comprising a plurality of video sensors;
the video monitoring system further comprises:
an annular array storage module comprising an outer ring storage space and an inner ring storage space;
the outer ring storage space comprises a first number of outer ring memories, and the inner ring storage space comprises a second number of inner ring memories;
storing the image data acquired by each of said image acquisition arrays in said outer ring memory;
storing, in said inner ring memory, image data containing the same target object among the image data acquired by at least two different image acquisition arrays;
performing video monitoring based on the image data stored in the outer ring memory;
determining whether to clear the inner ring memory based on a result of the video monitoring.
Specifically, the annular array storage modules are all annular stacks, that is, the outer ring storage space and the inner ring storage space are both formed by annular stacks.
The first number of outer-loop memories form an outer-loop memory array, namely an outer-loop stack array;
the second number of inner ring memories form an inner ring memory array, namely an inner ring stack array;
the outer ring memory array and the inner ring memory array are annular memory stacks combined in the same annular stack memory module.
It is worth pointing out that, unlike the prior art, which generally adopts queue storage, the technical scheme of the invention adopts stack storage, which avoids the problem of untimely identification caused by the inability to obtain the latest data in video monitoring; in particular, annular stack storage further avoids the problem of data storage overflow in scenes that generate hundred million-level pixel data.
Further, the target object comprises a target area and a target person;
at least two image acquisition arrays with different resolutions exist for carrying out video monitoring on the same target area, and the resolution of at least one image acquisition array is not lower than hundred million pixels.
When the target person is detected in the target area, the highest-resolution image data acquired by the image acquisition arrays is stored in the inner ring memory.
When the target person is detected in the target area, the lowest-resolution image data acquired by the image acquisition arrays is stored in the outer ring memory;
performing video monitoring based on the image data stored in the outer ring memory to determine whether to clear the inner ring memory specifically comprises:
if no target person with abnormal behavior exists, emptying the inner ring memory.
Based on this technical scheme, the invention avoids storing excessive invalid data in the inner ring memory and reduces data storage pressure; at the same time, stack space is freed up in a timely manner for critical real-time data.
In a second aspect of the invention, a video anomaly analysis system based on giga-level pixel data is provided, which is connected to the video monitoring system of the first aspect.
The video anomaly analysis system comprises a first video anomaly analysis model and a second video anomaly analysis model;
performing first anomaly identification on the image data stored in the outer ring storage space by adopting the first video anomaly analysis model;
executing second anomaly identification on the image data stored in the inner ring storage space by adopting the second video anomaly analysis model;
the precision of the first video abnormity analysis model is lower than that of the second video abnormity analysis model;
and when the output result of the first abnormality recognition model is yes, the second abnormality recognition is started.
Specifically, the first video anomaly analysis model is an action anomaly identification model;
the second video anomaly analysis model is an image frame semantic anomaly identification model which comprises region pixel value variation analysis, region edge variation analysis and image text semantic analysis.
In this technical scheme, unlike the prior art, a two-stage video anomaly identification model is adopted. First, a lower-precision first video anomaly analysis model performs first anomaly identification on the image data stored in the outer ring storage space; at this stage, only action anomaly identification is performed, so a result is obtained quickly. Then, a higher-precision second video anomaly analysis model performs second anomaly identification on the image data stored in the inner ring storage space; this second anomaly identification comprises region pixel value change analysis, region edge change analysis and image text semantic analysis, so that fine details can be accurately identified and an accurate judgment obtained. Combining the two stages balances speed and accuracy.
In a third aspect of the present invention, a method for monitoring video based on hundred million-level pixel data is provided, wherein after video images are collected by a plurality of image collecting arrays comprising hundred million-level image sensors, the video images are monitored and analyzed.
In a main step, the method comprises the steps of:
S810: starting a first image acquisition array to obtain a plurality of video images with a first resolution;
S820: storing the plurality of video images to an outer ring memory array;
S830: identifying a plurality of sub video images containing the same target object in the plurality of video images;
S840: storing the plurality of sub video images to an inner ring memory array;
S850: performing video monitoring based on the video images stored in the outer ring memory;
S860: determining whether to clear the inner ring memory based on a result of the video monitoring.
Wherein the target object comprises a target person and a target area;
at least two image acquisition arrays with different resolutions exist in the same target area for video monitoring, and the resolution of at least one image acquisition array is not lower than hundred million pixels;
and when the target person is monitored in the target area, storing the image data acquired by the image acquisition array with the highest resolution into the inner ring memory array.
In a fourth aspect of the present invention, there is provided a video anomaly analysis method based on giga-level pixel data, the method comprising:
S100: storing image data containing the same target object, among the image data stored in at least two different outer ring memories, into an inner ring memory array;
S101: performing first anomaly identification on the image data stored in the outer ring memory array by adopting a first video anomaly analysis model;
S102: judging whether the result of the first anomaly identification is positive; if so, proceeding to step S103; otherwise, judging whether the inner ring memory is full; if the stack is full, clearing the inner ring memory array and returning to step S100;
S103: performing second anomaly identification on the image data stored in the inner ring memory array by adopting a second video anomaly analysis model;
the outer ring memory array and the inner ring memory array are annular memory stacks combined in the same annular array memory module; the accuracy of the first video anomaly analysis model is lower than that of the second video anomaly analysis model.
In a fifth aspect of the present invention, a terminal device is provided, which may for example be a data interaction device. The terminal device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; the computer program may be a data interaction program, and when the processor executes the computer program, all or part of the steps of the method of the third aspect or the fourth aspect are implemented.
In a sixth aspect of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements all or part of the steps of the method of the third or fourth aspect.
In this technical scheme, stack storage is adopted so that the last-in, first-out property of a stack is fully utilized: the video data read first from the stack is always the data closest to the current processing time node. By adopting the annular stack form, overflow during data storage is avoided; and actively emptying the stack when it is full further ensures that critical real-time data always has corresponding stack storage space.
Further advantages of the invention will be apparent in the detailed description section in conjunction with the drawings attached hereto.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a video surveillance system based on giga-level pixel data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an external view of a camera array used in the video surveillance system of FIG. 1;
FIG. 3 is a further exploded schematic view of a portion of the internal structure of the camera array of FIG. 2;
FIGS. 4(A)-4(B) are schematic layout views of a flange structure for fixing the camera of FIG. 1;
FIG. 5 is a block diagram of a video anomaly analysis system based on hundred million levels of pixel data according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for video surveillance based on giga-level pixel data according to an embodiment of the present invention;
FIG. 7 is a flow chart of a method for video anomaly analysis based on hundred million levels of pixel data according to an embodiment of the present invention;
FIG. 8 is a block diagram of a computer device implementing all or part of the steps of the method of FIG. 6 or FIG. 7.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
First, partial term meanings and mathematical symbolic expression meanings of various embodiments of the present invention are described.
Array: an element array formed by arranging a plurality of elements together according to a certain shape or rule.
In the invention, each image acquisition array is formed by arranging a plurality of video sensor elements (such as photosensitive chips) according to a certain rule, and the number of pixels for video acquisition can be increased by the plurality of video sensor elements;
Annular (circular) array storage: a plurality of memories arranged to form a circular array.
Stack: a special storage structure for linear tables. Unlike a queue, which is first-in, first-out (FIFO), a stack can insert and delete data only at one fixed end (the top), while the other end is closed; that is, stack operations are last-in, first-out (LIFO).
Annular stack: a plurality of stacks formed into a ring array in which data is then stored.
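Such an annular stack can be illustrated with a short Python sketch (an illustrative model only; the class name RingStack and the use of a fixed-capacity deque are assumptions of this sketch, not structures defined by the invention): a push onto a full stack silently discards the oldest element instead of overflowing, and a pop always returns the newest element (last-in, first-out).

    from collections import deque

    class RingStack:
        """Fixed-capacity LIFO store: pushing onto a full stack discards the
        oldest element instead of overflowing (illustrative model)."""
        def __init__(self, capacity):
            self._buf = deque(maxlen=capacity)  # deque drops the oldest item when full

        def push(self, item):
            self._buf.append(item)

        def pop(self):
            # LIFO read: the newest element, i.e. the data closest to the
            # current processing time node, is returned first
            return self._buf.pop() if self._buf else None

        def is_full(self):
            return len(self._buf) == self._buf.maxlen

        def clear(self):
            self._buf.clear()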
On this basis, various embodiments of the present invention are described.
Fig. 1 is a schematic structural diagram of a video surveillance system based on giga-level pixel data according to an embodiment of the present invention.
In fig. 1, the video monitoring system based on giga-level pixel data comprises a plurality of image acquisition arrays with different resolutions, each image acquisition array comprising a plurality of video sensors;
the video monitoring system further comprises:
an annular array storage module comprising an outer ring storage space and an inner ring storage space;
the outer ring storage space includes a first number of outer ring memories and the inner ring storage space includes a second number of inner ring memories.
In fig. 1, the first number is greater than the second number; the first number is not less than the number of image acquisition arrays; and the image data acquired by each image acquisition array is stored in the outer ring memory corresponding to that image acquisition array.
Specifically, the annular array storage modules are all annular stacks, that is, the outer ring storage space and the inner ring storage space are both formed by annular stacks.
The first number of outer-loop memories form an outer-loop memory array, namely an outer-loop stack array;
the second number of inner ring memories form an inner ring memory array, namely an inner ring stack array;
the outer ring memory array and the inner ring memory array are annular memory stacks combined in the same annular stack memory module.
It is worth pointing out that, unlike the prior art, which generally adopts queue storage (a queue is first-in, first-out), the technical scheme of the invention adopts stack storage, which avoids the problem of untimely identification caused by the inability to obtain the latest data in video monitoring; in particular, annular stack storage further avoids the problem of data storage overflow (queue overflow) in scenes that generate hundred million-level pixel data.
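Building on the RingStack sketch above, the annular array storage module can be modeled as an outer ring stack array holding one stack per image acquisition array, together with a smaller inner ring stack array (an illustrative sketch only; the class and method names are assumptions):

    class AnnularArrayStorage:
        """Illustrative model of the annular array storage module: an outer ring
        stack array (one ring stack per image acquisition array) and a smaller
        inner ring stack array for frames showing the same target object."""
        def __init__(self, num_arrays, num_inner, capacity):
            # first number of outer ring memories >= number of acquisition arrays
            self.outer = [RingStack(capacity) for _ in range(num_arrays)]
            # second number of inner ring memories, smaller than the first number
            self.inner = [RingStack(capacity) for _ in range(num_inner)]

        def store_outer(self, array_id, frame):
            # each acquisition array writes to its own outer ring memory
            self.outer[array_id].push(frame)

        def store_inner(self, slot, frame):
            self.inner[slot].push(frame)

        def latest_outer(self, array_id):
            # LIFO read: the frame closest to the current processing time node
            return self.outer[array_id].pop()

        def clear_inner(self):
            for stack in self.inner:
                stack.clear()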
In fig. 1, the video sensors of the plurality of image capturing arrays are image capturing sensors, at least one of which has a resolution greater than 100 million pixels;
preferably, each of the plurality of image sensors has a resolution greater than 100 million pixels.
As a more specific example, each image sensor is a hundred-million-pixel wide-field-of-view camera; the camera field of view can reach 180-360 degrees, the pixel count can exceed 100 million, and the resolution can be expanded further as required.
Specifically, see the schematic external view of fig. 2 and the partially disassembled schematic view of fig. 3.
The camera is composed of a plurality of lenses. The video pictures shot by the lenses are aligned in the vertical direction while covering different angles in the horizontal direction, and the pictures shot by adjacent lenses have a certain overlapping area, which ensures that there are enough features for the video pictures of adjacent lenses to be spliced and fused.
Referring to fig. 2, the camera is integrally composed of a bottom cavity, a support frame for fixing a flange and the like inside, and an upper cover. The front part of the bottom cavity is designed to be in a circular arc shape, circular holes are reserved according to the number of lenses, the diameter of each hole is determined according to the visual range of different lenses, and the larger the visual range is, the larger the hole is. The outer side of the hole is provided with a platform for sticking glass, and the outer side of the glass is fixed by a circular ring.
Referring further to fig. 3, the inner flange support is used to position the lens modules. In the wide-field camera, all lenses are at the same height in the field of view, but in the structural design a 2-layer arrangement is adopted to reduce the volume of the camera. The lenses are distributed alternately on the upper and lower support frames, whose angle and position relationship is determined through positioning holes. The flange positioning surface of each support frame is determined according to the focal length of the lens, which mainly ensures that 1) the video pictures of adjacent lenses have a certain overlapping area, and 2) the pictures of all lenses on the upper and lower support frames remain consistent in the horizontal direction.
The foremost end of the lower support frame is provided with a reinforcing piece for strengthening the frame. Each lens is equipped with a module consisting of a mounting flange, a camera photosensitive device circuit board and a video processing chip circuit board; the two circuit boards are fixed to each other with copper posts, and both carry connecting terminals joined by a flat ribbon cable. Fig. 4(A)-4(B) are schematic layout views of a flange structure for fixing the camera of fig. 1. Screw holes are provided on the front and around the periphery of the flange so that the flange can be fixed to a bracket under different conditions, and a round hole for fixing the lens is reserved in the middle of the front surface of the flange.
In a specific embodiment, each lens of the camera achieves a resolution of 4K or 8K. Since the fields of view of the lenses are on the same horizontal line, the photosensitive component can be placed horizontally or vertically to obtain cameras with different vertical field angles; that is, the flange in fig. 4(A) can be placed horizontally as shown, or rotated by 90° as in fig. 4(B).
When the photosensitive component is vertically placed, although the vertical field angle is increased, the horizontal field angle of a single photosensitive component is correspondingly reduced, so that more lenses are needed when the whole camera is required to reach the same field angle (for example, 180 degrees); therefore, the actual arrangement needs to be determined according to the number of shots.
In practical application, the target object comprises a target area;
at least two image acquisition arrays with different resolutions exist for carrying out video monitoring on the same target area, and the resolution of at least one image acquisition array is not lower than hundred million pixels.
In another application scenario, the target object comprises a target person;
and when the target person is monitored in the target area, storing the image data with the highest resolution acquired by the image acquisition array into the inner ring memory.
When the target person is monitored in the target area, storing the image data with the lowest resolution ratio acquired by the image acquisition array into the outer ring memory;
performing video monitoring based on the image data stored in the outer loop memory to determine whether to clear the inner loop memory, specifically comprising:
and if the target person with abnormal behavior does not exist, emptying the inner ring memory.
Reference is next made to fig. 5.
Fig. 5 is a schematic structural diagram of a video anomaly analysis system based on giga-level pixel data according to an embodiment of the present invention.
In fig. 5, the video anomaly analysis system is connected to the video monitoring system described in fig. 1.
The video anomaly analysis system comprises a first video anomaly analysis model and a second video anomaly analysis model;
performing first anomaly identification on the image data stored in the outer ring storage space by adopting the first video anomaly analysis model;
executing second anomaly identification on the image data stored in the inner ring storage space by adopting the second video anomaly analysis model;
the precision of the first video anomaly analysis model is lower than that of the second video anomaly analysis model;
and when the output result of the first video anomaly analysis model is positive, the second anomaly identification is started.
Specifically, the first video anomaly analysis model is an action anomaly identification model;
the second video anomaly analysis model is an image frame semantic anomaly identification model which comprises region pixel value variation analysis, region edge variation analysis and image text semantic analysis.
The embodiment of fig. 5 adopts a two-stage video anomaly identification model. First, a lower-precision first video anomaly analysis model performs first anomaly identification on the image data stored in the outer ring storage space; at this stage, only action anomaly identification is performed, so a result is obtained quickly. Then, a higher-precision second video anomaly analysis model performs second anomaly identification on the image data stored in the inner ring storage space; this second anomaly identification comprises region pixel value change analysis, region edge change analysis and image text semantic analysis, so that fine details can be accurately identified and an accurate judgment obtained. Combining the two stages balances speed and accuracy.
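Assuming the storage model sketched earlier, the two-stage identification can be expressed as follows (illustrative only; coarse_model and fine_model are placeholders for the action-anomaly model and the semantic-anomaly model):

    def two_stage_anomaly_check(storage, coarse_model, fine_model):
        """Two-stage identification sketch: a fast, lower-precision model screens
        an outer-ring frame; only on a positive result does the higher-precision
        model examine the inner-ring frames."""
        outer_frame = storage.latest_outer(0)
        if outer_frame is None or not coarse_model(outer_frame):  # first anomaly identification
            return False                                          # fast negative result
        inner_frames = [f for f in (s.pop() for s in storage.inner) if f is not None]
        # second anomaly identification: pixel-value, edge and text-semantic analysis
        return any(fine_model(f) for f in inner_frames)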
For specific implementation of the first video anomaly analysis model, i.e., the motion anomaly recognition model, and the second video anomaly analysis model, i.e., the semantic recognition model, reference may be made to related prior art.
The related prior art on action anomaly recognition is mature; for details, see:
Zhu Hua. Research and Implementation of a Video Abnormal Event Detection Algorithm Based on the Da Vinci Platform [D]. University of Electronic Technology, 2016;
Research on Video Abnormal Event Detection Algorithms [D]. 2015;
Yi Chengxiang. Research on Detection Algorithms for Abnormal Urban Traffic Events Based on Video Monitoring [D]. Shenyang University.
The above prior art is introduced as part of the present embodiment to facilitate understanding of the present invention.
Video anomaly analysis based on semantic recognition has higher precision than action recognition; this embodiment is briefly expanded here in combination with the prior art so that those skilled in the art can better understand the application.
Taking a construction site panoramic monitoring scene as an example, the basic principle of semantic recognition and analysis is as follows. In order to realize hidden-danger scene recognition and intelligent processing of general construction-safety scene data, the construction scene is divided, on the basis of an analysis of its constituent factors, into two types: unsafe behavior scenes and unsafe state scenes. Object semantics, spatial relationship semantics, scene semantics and behavior semantics are extracted by a deep learning method in combination with the division of image semantic levels; the correspondence between the image semantic information and the general scene data is then obtained according to construction-safety general scene data theory, so that data on unsafe behaviors and unsafe scenes can be collected automatically and scene data processing efficiency is improved.
From the perspective of computer vision, construction scene image semantics can be divided into three levels according to the level of understanding: bottom-layer features, middle-layer semantics and high-level vision.
Bottom-layer features are low-level visual information that can be obtained directly from the image and are the most direct and objective description of the visual characteristics of the image. Middle-layer semantics are represented by a visual bag-of-words model or semantic topics; they are feature information derived from the low-level visual features and are used for content-based representation of the image. High-level vision refers to the semantic information obtained when people perform high-level abstract cognition on images, and it usually contains higher-level, more abstract semantics than the lower levels. There is a semantic gap between the bottom-layer features and the high-level vision understood by people; that is, when judging the similarity of images, understanding cannot be established only on bottom-layer characteristics such as texture, color and shape, but must also be built on a description of object semantics.
From low level to high level, the semantic levels are, in order: feature semantics (texture, color and shape information, region pixel value change information and region edge change information), object semantics, spatial relationship semantics, scene semantics, behavior semantics and emotion semantics. In combination with the description of actual unsafe object states and unsafe behavior scenes, the image semantics here mainly involve four levels: object semantics, spatial relationship semantics, scene semantics and behavior semantics.
A CNN deep learning structure is selected to extract object semantics; the method comprises four processes: information input, preprocessing, feature extraction and selection, and classification decision learning. The CNN can process two-dimensional images directly, performing image feature extraction and dimension reduction step by step: the convolution layers extract image features through convolution kernels, the sampling (pooling) layers reduce the dimension of the image features, and classification is carried out through a fully connected layer and a classification layer.
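A minimal PyTorch sketch of this convolution, pooling and fully connected classification structure is given below (the layer sizes, the assumed 224x224 input crop and the number of object classes are illustrative assumptions, not values taken from the embodiment):

    import torch.nn as nn

    class ObjectSemanticsCNN(nn.Module):
        """Convolution layers extract image features, pooling (sampling) layers
        reduce their dimension, and a fully connected head makes the
        classification decision (illustrative sketch)."""
        def __init__(self, num_classes=10):  # number of object classes is assumed
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),  # sampling layer: dimension reduction
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 56 * 56, num_classes),  # assumes 224x224 input crops
            )

        def forward(self, x):
            return self.classifier(self.features(x))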
For the spatial relationship semantics, on the basis of object recognition, in order to locate each object region, the region is approximately estimated using the Minimum Bounding Rectangle (MBR) method; the processing yields a series of rectangles, each corresponding to the closed curve of an object outline. The orientation relationship between objects can then be determined by establishing a direction relation matrix. In this way, a formal description of the spatial positions of the objects in 8 directions (up/down/left/right/up-left/down-left/up-right/down-right) and of the adjacency and overlap relationships between objects can be obtained intuitively.
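The minimum bounding rectangle estimation and a coarse 8-direction relation can be sketched as follows (illustrative only; the centre-based direction rule and the function names are simplifications assumed for this sketch):

    import numpy as np

    def minimum_bounding_rectangle(mask):
        """MBR of a non-empty binary object mask, as (top, left, bottom, right)."""
        ys, xs = np.nonzero(mask)
        return ys.min(), xs.min(), ys.max(), xs.max()

    def direction_relation(rect_a, rect_b):
        """Coarse direction of rect_b relative to rect_a (one of the 8 directions
        or 'same'), plus whether the two rectangles overlap."""
        ay, ax = (rect_a[0] + rect_a[2]) / 2, (rect_a[1] + rect_a[3]) / 2
        by, bx = (rect_b[0] + rect_b[2]) / 2, (rect_b[1] + rect_b[3]) / 2
        vertical = "up" if by < ay else "down" if by > ay else ""
        horizontal = "left" if bx < ax else "right" if bx > ax else ""
        overlap = not (rect_a[2] < rect_b[0] or rect_b[2] < rect_a[0]
                       or rect_a[3] < rect_b[1] or rect_b[3] < rect_a[1])
        return ("-".join(p for p in (vertical, horizontal) if p) or "same"), overlap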
The extraction of the behavior semantics mainly comprises 2 steps of recognizing the action posture of a person and describing the human body behavior by natural language.
Step 1: the action posture of a person in the construction scene is recognized using a DBN (Deep Belief Network). First, the background is modeled with a Gaussian mixture model and the foreground is extracted by background subtraction, yielding a binary image of the moving foreground; the binary image is then input, multi-level features of various behaviors are obtained through deep learning, and finally the trained DBN recognizes the various action postures;
in the behavior recognition process, the DBN is a probabilistic model comprising a plurality of hidden layers: the output of each lower-layer RBM (Restricted Boltzmann Machine) is used as the input data for training the next RBM, a group of RBMs is obtained through greedy layer-wise learning, and this group of RBMs forms the DBN;
and 2, describing the behaviors of the target human body in the scene by using a natural language in combination with scene semantics. Since the character scene and behavior state related to behavior semantics are specific, on one hand, extraction of behavior semantics must rely on a relatively complete knowledge base, and a support system of the knowledge base needs to have certain reasoning capability, for example, if the recognized character acts as lying down, and the behavior semantics can be described as that a worker sleeps on a construction rack (there is a risk factor) in combination with scene semantics of the worker on the construction rack.
Fig. 6 is a flow chart of a video monitoring method based on giga-level pixel data according to an embodiment of the present invention.
Fig. 6 shows a method for video monitoring based on giga-level pixel data. After video images are collected by a plurality of image acquisition arrays comprising giga-level image sensors, the video images are monitored and analyzed through the following steps:
S810: starting a first image acquisition array to obtain a plurality of video images with a first resolution;
S820: storing the plurality of video images to an outer ring memory array;
S830: identifying a plurality of sub video images containing the same target object in the plurality of video images;
S840: storing the plurality of sub video images to an inner ring memory array;
S850: performing video monitoring based on the video images stored in the outer ring memory;
S860: determining whether to clear the inner ring memory based on a result of the video monitoring.
The target object comprises a target person and a target area;
at least two image acquisition arrays with different resolutions perform video monitoring on the same target area, and the resolution of at least one image acquisition array is not lower than hundred million pixels;
and when the target person is detected in the target area, the image data acquired by the higher-resolution image acquisition array is stored in the inner ring memory array.
Performing video monitoring based on the image data stored in the outer ring memory to determine whether to clear the inner ring memory specifically comprises:
if no target person with abnormal behavior exists, emptying the inner ring memory.
Fig. 7 is a flowchart illustrating a method for analyzing video anomalies based on giga-level pixel data according to an embodiment of the present invention.
The video anomaly analysis method based on hundred million-level pixel data shown in fig. 7 includes steps S100 to S103, each implemented as follows:
S100: storing image data containing the same target object, among the image data stored in at least two different outer ring memories, into an inner ring memory array;
S101: performing first anomaly identification on the image data stored in the outer ring memory array by adopting a first video anomaly analysis model;
S102: judging whether the result of the first anomaly identification is positive; if so, proceeding to step S103; otherwise, judging whether the inner ring memory is full; if the stack is full, clearing the inner ring memory array and returning to step S100;
S103: performing second anomaly identification on the image data stored in the inner ring memory array by adopting a second video anomaly analysis model;
the outer ring memory array and the inner ring memory array are annular memory stacks combined in the same annular array memory module; the accuracy of the first video anomaly analysis model is lower than that of the second video anomaly analysis model.
The first video anomaly analysis model is an action anomaly identification model;
the second video anomaly analysis model is an image frame semantic anomaly identification model which comprises region pixel value variation analysis, region edge variation analysis and image text semantic analysis.
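The control flow of steps S100 to S103, including the clear-and-restart branch taken when the first identification result is negative and the inner ring stacks are full, can be sketched as follows (illustrative only; gather_same_target, coarse_model and fine_model are placeholder callables, and storage is the annular storage model sketched earlier):

    def anomaly_analysis_loop(storage, gather_same_target, coarse_model, fine_model):
        while True:
            frames = gather_same_target(storage)                          # S100
            for slot, frame in enumerate(frames[:len(storage.inner)]):
                storage.store_inner(slot, frame)
            outer_frame = storage.latest_outer(0)                         # S101
            if outer_frame is not None and coarse_model(outer_frame):     # S102: result positive
                inner = [f for f in (s.pop() for s in storage.inner) if f is not None]
                return any(fine_model(f) for f in inner)                  # S103
            if all(s.is_full() for s in storage.inner):                   # S102: inner ring full
                storage.clear_inner()                                     # clear, return to S100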
It should be noted that the methods and processes described in fig. 6 or fig. 7 can be implemented automatically by computer program instructions. Thus, referring to fig. 8, an electronic computer device is provided, which may be a data interaction device, comprising a bus, a processor, and a memory for storing a computer program comprising program instructions, the processor being configured to execute the stored program instructions.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and an input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement the steps of the aforementioned method examples. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
The invention adopts stack storage, making full use of the last-in, first-out property of a stack so that the video data read first from the stack is always the data closest to the current processing time node; adopting the annular stack form avoids overflow during data storage; and actively emptying the stack when it is full further ensures that critical real-time data always has corresponding stack storage space;
in addition, the video anomaly analysis adopts a two-stage video anomaly identification model. First, a lower-precision first video anomaly analysis model performs first anomaly identification on the image data stored in the outer ring storage space; at this stage, only action anomaly identification is performed, so a result is obtained quickly. Then, a higher-precision second video anomaly analysis model performs second anomaly identification on the image data stored in the inner ring storage space; this second anomaly identification comprises region pixel value change analysis, region edge change analysis and image text semantic analysis, so that fine details can be accurately identified and an accurate judgment obtained. Combining the two stages balances speed and accuracy.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
The present invention is not limited to the specific module structure described in the prior art. The prior art mentioned in the background section can be used as part of the invention to understand the meaning of some technical features or parameters. The scope of the present invention is defined by the claims.

Claims (10)

1. A video monitoring system based on giga-level pixel data, the video monitoring system comprising a plurality of image capture arrays of different resolutions, each image capture array comprising a plurality of video sensors;
characterized in that:
the video monitoring system further comprises:
an annular array storage module comprising an outer ring storage space and an inner ring storage space;
the outer ring storage space comprises a first number of outer ring memories, and the inner ring storage space comprises a second number of inner ring memories;
storing the image data acquired by each of said image acquisition arrays in said outer ring memory;
storing, in said inner ring memory, image data containing the same target object among the image data acquired by at least two different image acquisition arrays;
performing video monitoring based on the image data stored in the outer ring memory;
determining whether to clear the inner ring memory based on a result of the video monitoring.
2. A video surveillance system based on giga-level pixel data as claimed in claim 1, characterized by:
the target object comprises a target area;
at least two image acquisition arrays with different resolutions exist for carrying out video monitoring on the same target area, and the resolution of at least one image acquisition array is not lower than hundred million pixels.
3. A video surveillance system based on giga-level pixel data as claimed in claim 2, characterized by:
the target object comprises a target person;
and when the target person is monitored in the target area, storing the image data with the highest resolution acquired by the image acquisition array into the inner ring memory.
4. A video surveillance system based on giga-level pixel data as claimed in claim 2 or 3, characterized by:
the target object comprises a target person;
when the target person is monitored in the target area, storing the image data with the lowest resolution acquired by the image acquisition array into the outer ring memory;
performing video monitoring based on the image data stored in the outer ring memory to determine whether to clear the inner ring memory specifically comprises:
if no target person with abnormal behavior exists, emptying the inner ring memory.
5. A video surveillance system based on giga-level pixel data as claimed in any one of claims 1-3, characterized by:
the first number is greater than the second number;
the first number is not less than the number of image acquisition arrays;
and storing the image data acquired by each image acquisition array into an outer ring memory corresponding to the image acquisition array.
6. A video anomaly analysis system based on giga-level pixel data, said video anomaly analysis system being connected to a video surveillance system according to any one of claims 1-5, characterized by:
the video anomaly analysis system comprises a first video anomaly analysis model and a second video anomaly analysis model;
performing first anomaly identification on the image data stored in the outer ring storage space by adopting the first video anomaly analysis model;
executing second anomaly identification on the image data stored in the inner ring storage space by adopting the second video anomaly analysis model;
the precision of the first video anomaly analysis model is lower than that of the second video anomaly analysis model;
and when the output result of the first video anomaly analysis model is positive, starting the second anomaly identification.
7. The system of claim 6, wherein the video anomaly analysis system comprises:
the first video anomaly analysis model is an action anomaly identification model;
the second video anomaly analysis model is an image frame semantic anomaly identification model which comprises region pixel value variation analysis, region edge variation analysis and image text semantic analysis.
8. A video monitoring method based on hundred million levels of pixel data, after a plurality of image acquisition arrays including hundred million levels of image sensors are used for acquiring video images, the video images are monitored and analyzed, and the method is characterized by comprising the following steps:
S810: starting a first image acquisition array to obtain a plurality of video images with a first resolution;
S820: storing the plurality of video images to an outer ring memory array;
S830: identifying a plurality of sub video images containing the same target object in the plurality of video images;
S840: storing the plurality of sub video images to an inner ring memory array;
S850: performing video monitoring based on the video images stored in the outer ring memory;
S860: determining whether to clear the inner ring memory based on a result of the video monitoring.
9. The method of claim 8, wherein:
the target object comprises a target person and a target area;
at least two image acquisition arrays with different resolutions exist in the same target area for video monitoring, and the resolution of at least one image acquisition array is not lower than hundred million pixels;
and when the target person is monitored in the target area, storing the image data acquired by the image acquisition array with the highest resolution into the inner ring memory array.
10. A method for video anomaly analysis based on giga-level pixel data, the method comprising: S100: storing image data containing the same target object, among the image data stored in at least two different outer ring memories, into an inner ring memory array;
S101: performing first anomaly identification on the image data stored in the outer ring memory array by adopting a first video anomaly analysis model;
S102: judging whether the result of the first anomaly identification is positive; if so, proceeding to step S103; otherwise, judging whether the inner ring memory is full; if the stack is full, clearing the inner ring memory array and returning to step S100;
S103: performing second anomaly identification on the image data stored in the inner ring memory array by adopting a second video anomaly analysis model;
the outer ring memory array and the inner ring memory array are annular memory stacks combined in the same annular array memory module; the accuracy of the first video anomaly analysis model is lower than that of the second video anomaly analysis model.
CN202111490304.7A 2021-12-08 2021-12-08 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data Active CN113891072B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111490304.7A CN113891072B (en) 2021-12-08 2021-12-08 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data
PCT/CN2022/134568 WO2023103819A1 (en) 2021-12-08 2022-11-28 Video monitoring system and method based on hundred-megapixel data and video anomaly analysis system and method based on hundred-megapixel data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111490304.7A CN113891072B (en) 2021-12-08 2021-12-08 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data

Publications (2)

Publication Number Publication Date
CN113891072A (en) 2022-01-04
CN113891072B (en) 2022-02-11

Family

ID=79016566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111490304.7A Active CN113891072B (en) 2021-12-08 2021-12-08 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data

Country Status (2)

Country Link
CN (1) CN113891072B (en)
WO (1) WO2023103819A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891072B (en) * 2021-12-08 2022-02-11 Beijing Zhuohe Technology Co., Ltd. Video monitoring and anomaly analysis system and method based on hundred million-level pixel data
CN114363530B (en) * 2022-01-18 2022-08-30 Beijing Zhuohe Technology Co., Ltd. Cloud video monitoring method and monitoring platform based on hundred million-level pixel equipment
CN115171217B (en) * 2022-07-27 2023-03-03 Beijing Zhuohe Technology Co., Ltd. Action recognition method and system under dynamic background
CN117523451A (en) * 2023-11-20 2024-02-06 Guangxi Guiguan Electric Power Co., Ltd. Video data analysis system and method based on intelligent security technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4177510A (en) * 1973-11-30 1979-12-04 Compagnie Internationale pour l'Informatique, CII Honeywell Bull Protection of data in an information multiprocessing system by implementing a concept of rings to represent the different levels of privileges among processes
US4297743A (en) * 1973-11-30 1981-10-27 Compagnie Honeywell Bull Call and stack mechanism for procedures executing in different rings
JPH06153202A (en) * 1992-10-29 1994-05-31 F M T:Kk Abnormality monitoring device
WO2012082127A1 (en) * 2010-12-16 2012-06-21 Massachusetts Institute Of Technology Imaging system for immersive surveillance
CN108234926A (en) * 2016-12-16 2018-06-29 北京迪科达科技有限公司 A kind of human behavior discriminance analysis system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8228364B2 (en) * 2008-01-29 2012-07-24 Enforcement Video, Llc Omnidirectional camera for use in police car event recording
CN202663490U (en) * 2012-07-10 2013-01-09 上海中铁通信信号国际工程有限公司 Video monitoring stage processing device
US11670147B2 (en) * 2016-02-26 2023-06-06 Iomniscient Pty Ltd Method and apparatus for conducting surveillance
US11394879B2 (en) * 2018-04-04 2022-07-19 Sri International Methods for enhanced imaging based on semantic processing and dynamic scene modeling
CN110389712B (en) * 2018-04-20 2022-12-23 杭州海康存储科技有限公司 Data writing method and device, solid state disk and computer readable storage medium
CN110390262B (en) * 2019-06-14 2023-06-30 平安科技(深圳)有限公司 Video analysis method, device, server and storage medium
CN110505534B (en) * 2019-08-26 2022-03-08 腾讯科技(深圳)有限公司 Monitoring video processing method, device and storage medium
CN111914661A (en) * 2020-07-06 2020-11-10 广东技术师范大学 Abnormal behavior recognition method, target abnormal recognition method, device, and medium
CN113891072B (en) * 2021-12-08 2022-02-11 北京拙河科技有限公司 Video monitoring and anomaly analysis system and method based on hundred million-level pixel data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4177510A (en) * 1973-11-30 1979-12-04 Compagnie Internationale pour l'Informatique, CII Honeywell Bull Protection of data in an information multiprocessing system by implementing a concept of rings to represent the different levels of privileges among processes
US4297743A (en) * 1973-11-30 1981-10-27 Compagnie Honeywell Bull Call and stack mechanism for procedures executing in different rings
JPH06153202A (en) * 1992-10-29 1994-05-31 F M T:Kk Abnormality monitoring device
WO2012082127A1 (en) * 2010-12-16 2012-06-21 Massachusetts Institute Of Technology Imaging system for immersive surveillance
CN108234926A (en) * 2016-12-16 2018-06-29 北京迪科达科技有限公司 A kind of human behavior discriminance analysis system

Also Published As

Publication number Publication date
WO2023103819A1 (en) 2023-06-15
CN113891072A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN113891072B (en) Video monitoring and anomaly analysis system and method based on hundred million-level pixel data
US10949995B2 (en) Image capture direction recognition method and server, surveillance method and system and image capture device
JP6425856B1 (en) Video recording method, server, system and storage medium
CN106778867B (en) Target detection method and device, and neural network training method and device
JP5227911B2 (en) Surveillance video retrieval device and surveillance system
CN109635783B (en) Video monitoring method, device, terminal and medium
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US20070052807A1 (en) System and method for user monitoring interface of 3-D video streams from multiple cameras
Benito-Picazo et al. Deep learning-based video surveillance system managed by low cost hardware and panoramic cameras
WO2022135511A1 (en) Method and apparatus for positioning moving object, and electronic device and storage medium
KR101530255B1 (en) Cctv system having auto tracking function of moving target
JP6551226B2 (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
US9600712B2 (en) Method and apparatus for processing digital images using face recognition
EP4050305A1 (en) Visual positioning method and device
US11062469B2 (en) 4D tracking utilizing depth data from multiple 3D cameras
EP3704864A1 (en) Methods and systems for generating video synopsis
CN110072078A (en) Monitor camera, the control method of monitor camera and storage medium
JP2007188294A (en) Method for detecting moving object candidate by image processing, moving object detection method for detecting moving object from moving object candidate, moving object detection apparatus, and moving object detection program
CN113362441A (en) Three-dimensional reconstruction method and device, computer equipment and storage medium
Qiu et al. Pedestrian detection and counting method based on YOLOv5+ DeepSORT
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN114461078B (en) Man-machine interaction method based on artificial intelligence
KR102411612B1 (en) Thermal imaging monitoring system using multiple cameras
CN113762017B (en) Action recognition method, device, equipment and storage medium
CN113989643B (en) Pipeline state detection method, device, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant