CN111090773A - Digital retina architecture and software architecture method and system - Google Patents

Digital retina architecture and software architecture method and system

Info

Publication number
CN111090773A
Authority
CN
China
Prior art keywords
video
stream
target
retrieval
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910804261.1A
Other languages
Chinese (zh)
Other versions
CN111090773B (en)
Inventor
贾惠柱
李源
杨长水
齐峰
解晓东
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority: CN201910804261.1A
Publication of CN111090773A
Application granted
Publication of CN111090773B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval using metadata automatically derived from the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/85 Coding or decoding using pre-processing or post-processing specially adapted for video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The invention relates to the fields of security monitoring and artificial intelligence, and in particular to a digital retina architecture and a software architecture method and system. The method comprises the following steps: accessing a video stream; converting the video stream into a video condensed stream and a feature stream at the front end; transmitting the video condensed stream and the feature stream to the cloud; and the cloud storing the received video condensed stream and feature stream, receiving an offline retrieval instruction from the terminal and returning the offline retrieval result to the terminal for display, and/or receiving a real-time tracking instruction from the terminal and returning the real-time tracking result to the terminal for display. The invention changes the architecture of existing video monitoring systems, turns video monitoring into intelligent monitoring, and addresses the problems of intelligent analysis and system application of large-scale surveillance video.

Description

Digital retina architecture and software architecture method and system
Technical Field
The invention relates to the field of security monitoring, in particular to a digital retina architecture and software architecture method and system.
Background
Currently deployed video monitoring systems adopt H.264, a technical standard from more than ten years ago. Their data compression efficiency is low, their construction cost is high, and their application effect is poor, which manifests mainly as follows:
1) Early standards compress inefficiently. While guaranteeing video quality, the hundreds of millions of cameras deployed in China incur enormous storage costs; since storage space is insufficient in many places, video is often over-compressed, so that a large number of video images are severely degraded and key people and vehicles cannot be seen clearly when a case or safety accident occurs;
2) Surveillance video is difficult to network. Many provinces and cities have deployed more than a million cameras, but because these cameras use old coding standards, only hundreds of video channels can be transmitted in real time under existing communication bandwidth, and most surveillance video cannot be used effectively;
3) Even highly dense camera deployments cannot cover the full scene. Although cameras in some areas are deployed at high density, full scene coverage is still impossible; the information captured by ground cameras within a covered area is limited, and the video data collected around the clock is highly redundant, making globally valuable information hard to extract and causing enormous information waste;
4) Massive video is difficult to retrieve. Traditional video monitoring systems achieve event playback and evidence collection by having monitoring personnel look through historical video; this manual playback-and-forensics mode is inefficient, and although image retrieval technology has developed rapidly, its industrial application, especially large-scale application in the security field, remains an open problem;
5) Precise video analysis is lacking. In practical applications by public security and similar departments, video monitoring suffers from slow retrieval and difficult analysis. Finding important, valuable clues in massive video, such as rapidly identifying and locating targets and mining their movement trajectories, shortens event-handling time, reduces the workload of law-enforcement personnel, and improves work efficiency.
Disclosure of Invention
The embodiments of the invention provide a digital retina architecture and a software architecture method and system that change the architecture of existing video monitoring systems, turn video monitoring into intelligent monitoring, and address the problems of intelligent analysis and system application of large-scale surveillance video.
According to a first aspect of embodiments of the present invention, a digital retina architecture and software architecture method includes:
accessing a video stream;
processing and converting the video stream into a video condensed stream and a feature stream at the front-end monitoring device;
matching the video condensed stream with the feature stream, encapsulating them, and transmitting them to a cloud server;
the cloud server storing the received encapsulated video condensed stream and feature stream, and
receiving an offline retrieval instruction from the terminal and returning the offline retrieval result to the terminal for display; and/or
receiving a real-time tracking instruction from the terminal and returning the real-time tracking result to the terminal for display.
Converting the video stream into the video condensed stream and the feature stream at the front-end monitoring device specifically includes:
compressing the video stream to obtain the video condensed stream;
performing input adaptation and system scheduling on the video stream, followed by preprocessing;
and performing intelligent computation on the preprocessed video stream to obtain the feature stream of the video stream.
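The three front-end steps above (compression, preprocessing, intelligent computation) can be sketched as follows. This is a minimal illustration; every class and function name here is hypothetical rather than taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class DualStream:
    """Front-end output: a condensed video stream plus the
    feature stream extracted from the same input frames."""
    condensed: list   # re-encoded, low-bitrate frames
    features: list    # per-frame feature records

def compress(frame):
    # Stand-in for transcoding into the condensed stream.
    return {"frame_id": frame["id"], "bitrate": "low"}

def preprocess(frame):
    # Stand-in for input adaptation and system scheduling.
    return {**frame, "normalized": True}

def extract_features(frame):
    # Stand-in for the "intelligent computation": detection,
    # tracking, structured-information extraction.
    return {"frame_id": frame["id"], "targets": frame.get("targets", [])}

def convert(video_stream):
    condensed = [compress(f) for f in video_stream]
    features = [extract_features(preprocess(f)) for f in video_stream]
    return DualStream(condensed, features)

stream = [{"id": 0, "targets": ["person"]}, {"id": 1, "targets": ["car"]}]
dual = convert(stream)   # one condensed record and one feature record per frame
```

The key architectural point is that both outputs are derived from the same frames, so each feature record can later be matched back to its condensed video segment.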
The cloud server storing the received encapsulated video condensed stream and feature stream specifically comprises:
the dual-stream intelligent interaction middleware of the cloud server receiving the encapsulated video condensed stream and feature stream and storing each in its corresponding database.
Receiving an offline retrieval instruction from the terminal and returning the offline retrieval result to the terminal for display specifically includes:
receiving the offline retrieval instruction from the terminal, and sending the interactive target of the instruction to the corresponding calculation engine through the application intelligent interaction middleware;
the calculation engine computing the target features and sending them to the retrieval engine through the application intelligent interaction middleware;
and the retrieval engine searching the corresponding database and sending the retrieval result and its associated video to the terminal for display.
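A hedged sketch of the offline retrieval path just described: the calculation engine turns the interactive target into features, and the retrieval engine finds the closest stored record and its associated video. The toy feature hash and L1 distance below are illustrative stand-ins, not the patent's actual algorithms:

```python
def compute_target_features(interactive_target):
    # Calculation engine stand-in: hash the query target's
    # characters into a short numeric feature vector.
    return [float(ord(c) % 7) for c in interactive_target][:4]

def retrieve(features, database):
    # Retrieval engine stand-in: nearest stored record by L1 distance,
    # returned together with its associated video.
    def dist(rec):
        return sum(abs(a - b) for a, b in zip(rec["features"], features))
    best = min(database, key=dist)
    return best, best["video"]

db = [
    {"features": [1.0, 2.0, 3.0, 4.0], "video": "cam01_0800.mp4"},
    {"features": [5.0, 5.0, 5.0, 5.0], "video": "cam02_0930.mp4"},
]
feats = compute_target_features("red-car")
result, video = retrieve(feats, db)   # nearest record and its video
```

In a real deployment the feature vectors would come from a learned model and the search from an indexed feature library; the flow of query → features → database match → associated video is what the patent's steps describe.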
Receiving a real-time tracking instruction from the terminal and returning the real-time tracking result to the terminal for display specifically includes:
the terminal sending the interactive target of the real-time tracking instruction to the calculation engine through the application intelligent interaction middleware of the cloud server; the calculation engine computing the target features from the interactive target and sending them to the front-end monitoring device through the dual-stream intelligent interaction middleware;
and the front-end monitoring device matching the received target features against detected targets in real time, and sending the matching result to the terminal for display through the dual-stream intelligent interaction middleware and the application intelligent interaction middleware.
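The real-time matching step at the front end might look like the following sketch, where cosine similarity stands in for whatever matcher the device actually uses (the patent does not specify one):

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_realtime(target_feature, detections, threshold=0.9):
    # Front-end matching: compare the pushed-down target feature
    # against the features of targets detected in the live stream.
    return [d for d in detections
            if cosine(target_feature, d["feature"]) >= threshold]

target = [0.6, 0.8]                              # feature from the cloud
live = [
    {"id": "t1", "feature": [0.59, 0.81]},       # near-duplicate: should match
    {"id": "t2", "feature": [0.9, -0.1]},        # different target
]
hits = match_realtime(target, live)
```

Only the matches (not the raw video) travel back up through the middleware, which is what keeps the tracking loop lightweight.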
A digital retina architecture and software architecture system comprises a front-end monitoring device, a cloud server, and a terminal, wherein:
the front-end monitoring device is used to convert the video stream into a video condensed stream and a feature stream, receive target features sent by the cloud server, and send the matching result to the cloud server after real-time matching;
the cloud server is used to receive the video condensed stream, the feature stream, and the real-time matching results from the front-end monitoring device, receive an offline retrieval instruction and/or a real-time tracking instruction from the terminal, and return the results of offline retrieval and/or real-time tracking to the terminal for display;
and the terminal is used to send the offline retrieval instruction and/or real-time tracking instruction and receive the offline retrieval result and/or the real-time matching result.
The cloud server comprises a dual-stream intelligent interaction middleware module, an application intelligent interaction middleware module, a calculation engine module, a retrieval engine module, and a database module, wherein:
the dual-stream intelligent interaction middleware module receives the video condensed stream, the feature stream, and the real-time matching results sent by the front end, stores the video condensed stream and feature stream in the database module, sends the real-time matching results to the application intelligent interaction middleware module, receives target features from the calculation engine module, and sends them to the front-end monitoring device for real-time matching;
the application intelligent interaction middleware module receives the offline retrieval instruction and/or real-time tracking instruction from the terminal; for real-time tracking, it has the calculation engine module convert the interactive target into target features and sends them to the dual-stream intelligent interaction middleware module; for offline retrieval, it sends the interactive target to the calculation engine module, receives the resulting target features, and sends them to the retrieval engine module; it then receives the offline retrieval result returned by the retrieval engine module and/or the real-time matching result returned by the dual-stream intelligent interaction middleware module, and sends these results to the terminal;
the calculation engine module receives the interactive target for offline retrieval and/or real-time tracking, converts it into target features, and sends the target features for offline retrieval and/or real-time tracking onward through the corresponding middleware;
the retrieval engine module receives the target features from the application intelligent interaction middleware module and calls up the associated retrieval results from the database module according to the target IP found there;
and the database module stores the video condensed stream and the feature stream.
The front-end monitoring device comprises an intelligent converter module. The intelligent converter module performs input adaptation and system scheduling on the video and then preprocesses it, obtains the feature stream of the video stream through intelligent computation, compresses the preprocessed video stream to obtain the video condensed stream, and matches, encapsulates, and transmits the video condensed stream and feature stream to the cloud server. During intelligent computation, the intelligent converter module sends the results of matching target features for real-time tracking to the dual-stream intelligent interaction middleware, and the retrieval results are sent to the terminal through the dual-stream intelligent interaction middleware and the application intelligent interaction middleware.
The database module comprises a video library, a fusion visual feature library, a structured library, and a picture library;
the video library is used to store the video condensed stream;
the fusion visual feature library is used to store the fused visual feature data of the feature stream;
the structured library is used to store the structured data of the feature stream;
the picture library is used to store the picture data of the feature stream.
The calculation engine module comprises a structuring engine submodule and a fusion visual feature storage engine submodule;
the structuring engine submodule performs structuring calculations to obtain the structured information of the target features;
the fusion visual feature storage engine submodule computes pictures to obtain the visual feature information of the target features.
The retrieval engine module comprises a feature storage engine submodule and a video storage and distribution engine submodule;
the feature storage engine submodule searches the feature library to obtain a target IP and calls up the retrieval results;
the video storage and distribution engine submodule calls up videos, i.e. the videos associated with the target IP retrieved from the corresponding databases by the structuring engine submodule, the fusion visual feature storage engine submodule, and the feature storage engine submodule.
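The division of labor between the two retrieval submodules — feature-library lookup yields a target IP, then the video storage and distribution engine pulls the videos stored for that IP — can be sketched as below; the library contents and IP values are invented for illustration:

```python
FEATURE_LIBRARY = {
    # feature key -> target IP (the camera/source that produced it)
    "feat-001": "10.0.0.21",
    "feat-002": "10.0.0.35",
}

VIDEO_LIBRARY = {
    # target IP -> condensed videos stored for that source
    "10.0.0.21": ["10.0.0.21/2019-08-01_08.mp4"],
    "10.0.0.35": ["10.0.0.35/2019-08-01_09.mp4"],
}

def lookup_target_ip(feature_key):
    # Feature storage engine: map a matched feature to its source IP.
    return FEATURE_LIBRARY.get(feature_key)

def fetch_videos(target_ip):
    # Video storage and distribution engine: pull videos for that IP.
    return VIDEO_LIBRARY.get(target_ip, [])

ip = lookup_target_ip("feat-001")
videos = fetch_videos(ip)
```

Separating the two lookups means the feature index can be rebuilt or sharded independently of the much larger video store, which matches the module split the patent describes.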
The front-end monitoring device further comprises an intelligent conversion box. The intelligent conversion box accesses the video stream; decodes, encodes, detects, and tracks it to obtain a video condensed stream and target pictures; and encapsulates and transmits these to the intelligent converter, which completes the intelligent computation and sends the result to the cloud server.
The technical scheme provided by the embodiments of the invention has the following beneficial effects: the invention supports adding an intelligent converter to the front-end monitoring device, so that real-time transcoding and intelligent analysis of large-scale surveillance video are performed at the front end; the processing results and the condensed code stream are synchronized, then efficiently aggregated and stored in the cloud server, which provides statistical analysis and application display of the metadata.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a digital retina architecture and software architecture method of the present invention;
FIG. 2 is a schematic diagram of a data processing flow of the intelligent converter;
FIG. 3 is a schematic diagram of a data processing flow for a smart converter box;
fig. 4 is a schematic diagram of a cloud server data processing flow;
FIG. 5 is a diagram of a digital retina architecture and software architecture system according to the present invention.
Detailed Description
Example one
A digital retina architecture and software architecture method comprises:
accessing a video stream;
processing and converting the video stream into a video condensed stream and a feature stream at the front-end monitoring device;
transmitting the video condensed stream and the feature stream to a cloud server;
the cloud server storing the received video condensed stream and feature stream, and
receiving an offline retrieval instruction from the terminal and returning the offline retrieval result to the terminal for display, and/or
receiving a real-time tracking instruction from the terminal and returning the real-time tracking result to the terminal for display.
Example two
As shown in fig. 1, the present invention provides a digital retina architecture and software architecture method, comprising:
the intelligent converter of the front-end monitoring equipment faces to the intelligent analysis and calculation of the stock cameras, and uniformly converts H.264 video streams output by the multiple paths of stock cameras into video concentrated streams and feature streams (double streams for short), wherein the feature streams comprise results obtained by detecting, tracking, extracting structured information and fusing visual feature calculation of human and vehicle targets in a monitoring video scene;
the intelligent converter of the front-end monitoring equipment encapsulates the video concentrated stream and the feature stream according to a double-stream interaction protocol and transmits the encapsulated video concentrated stream and the feature stream to the cloud server for storage, preferably, the video concentrated stream is stored in a video library, and the feature stream is respectively stored in a fusion visual feature library, a structural library and a picture library according to features.
Preferably, as shown in fig. 2, the intelligent converter of the front-end monitoring device supports access to multiple IP surveillance video streams (the number of channels depends on the computation task) and compresses the video streams to obtain condensed video streams. The intelligent converter detects and tracks human and vehicle targets in the surveillance scene, extracts structured information, computes fused visual features, and transcodes video; according to dynamically configured control instructions from the cloud server, it can also upgrade the video analysis model and perform face detection, pedestrian behavior analysis, crowd density analysis, and similar functions. The intelligent converter receives real-time camera video or stored offline video files, performs input adaptation and system scheduling, then decodes, encodes, detects, and tracks the video, obtains the structured feature stream of each video channel through intelligent computation, matches it with the video stream, encapsulates the result, and transmits it to the cloud server. Here, detection locates the regions of targets in a data frame, and tracking follows the change of a target's position across related frames.
Preferably, as shown in fig. 3, the intelligent conversion box of the front-end monitoring device is connected to one camera and converts its H.264 video stream into a video condensed stream and target pictures, where a target picture is the result of detecting, tracking, and selecting the best view of human and vehicle targets in the surveillance scene. These are encapsulated according to a "front-end interaction protocol" and transmitted to the intelligent converter, which completes the intelligent computation to obtain the structured feature stream of each video, matches it with the condensed video stream, encapsulates the result, and transmits it to the cloud server.
According to control instructions from the front-end monitoring device, the intelligent conversion box can be dynamically configured to upgrade its two video analysis models: detection of human and vehicle targets in the surveillance scene, and tracking.
The cloud server receives service requests from the terminal business applications, searches the corresponding database, completes data statistics, analysis, computation, and similar operations, and returns the results to the terminal for display.
Preferably, the cloud server exchanges data with the terminal through the application intelligent interaction middleware: the middleware receives instructions from the application terminal and sends the results of executing those instructions back to the terminal.
Preferably, as shown in fig. 4, the cloud middleware server is an important component of the digital retina demonstration system. It is responsible for receiving, storing, retrieving, and distributing the features and video, and for the software definition of the front-end camera network; externally, it outputs support data for business applications and system demonstration, retrieval and query results, and the retrieved video images. Depending on the user's actual cloud-server environment, the intelligent middleware can be deployed inside a client system as sub-components, or form a standalone intelligent analysis middleware server that stores and aggregates feature information from the intelligent converters and intelligent conversion boxes/cameras/chips and serves data according to the needs of the terminal application system.
Preferably, the cloud server receives an offline retrieval instruction from the terminal; the interactive target of the instruction is passed through the application intelligent interaction middleware to the corresponding calculation engine, which computes the target features; the target features are passed through the application intelligent interaction middleware to the feature storage retrieval engine, which searches the corresponding database to obtain a retrieval IP; the retrieval IP is used to call up the retrieval results from the video library and the feature library, and the retrieval results and their associated videos are sent to the terminal for display.
Preferably, when the application terminal selects real-time tracking, it sends a real-time tracking instruction and passes the interactive target of the instruction to the application intelligent interaction middleware of the cloud server. The middleware sends the interactive target to the corresponding calculation engine, which computes the target features and passes them to the dual-stream intelligent interaction middleware; the dual-stream intelligent interaction middleware transmits the target features to the intelligent converter of the front-end monitoring device, which matches them against detected targets in real time and returns the real-time matching result. Preferably, the interactive target may be a picture, a structured feature, an unstructured feature, and the like, and the target features may be structured features, feature vectors, and the like.
Preferably, the cloud server comprises a fused visual feature engine for computing pictures and visual features; a target structuring engine for computing structured features; a feature storage retrieval engine for retrieving and calling up features; and a video storage and distribution engine for calling up videos from the video library. For example, when the application terminal sends a picture retrieval instruction, the picture is passed through the application intelligent interaction middleware to the fused visual feature engine, which converts the picture into the feature values of the target features; the feature values are passed through the application intelligent interaction middleware to the feature storage and distribution engine, which searches the visual feature library to obtain a target IP, uses the target IP to call up the video and pictures as the retrieval result, and sends the retrieval result to the application terminal through the application intelligent interaction middleware.
When the application terminal sends a structured retrieval instruction, the instruction is passed through the application intelligent interaction middleware to the target structuring engine, which converts it into the structured information of the target features; the structured information is passed through the application intelligent interaction middleware to the feature storage and distribution engine, which searches the structured library to obtain a target IP or a statistical result; videos and pictures are called up by the target IP, the statistical result or the retrieved videos and pictures serve as the retrieval result, and the retrieval result is sent to the application terminal through the application intelligent interaction middleware.
The interactive target may be a target attribute, a target image, or a statistical attribute.
Statistical attributes include statistics such as people counts or vehicle counts.
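A statistical attribute such as a people or vehicle count reduces to a query over the structured library; a minimal sketch follows, with the record schema invented for illustration:

```python
def count_targets(structured_records, target_class):
    # Structured-library query stand-in: count records whose class
    # matches, e.g. a people count or a vehicle count statistic.
    return sum(1 for r in structured_records if r["class"] == target_class)

records = [
    {"class": "person", "camera": "c1"},
    {"class": "vehicle", "camera": "c1"},
    {"class": "person", "camera": "c2"},
]
people = count_targets(records, "person")      # 2
vehicles = count_targets(records, "vehicle")   # 1
```

Because the statistic is computed over stored structured data rather than raw video, it can be answered by the cloud without touching the video library at all.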
As shown in fig. 5, the present invention provides a digital retina architecture and software architecture system comprising a front-end monitoring device, a cloud server, and a terminal, wherein:
the front-end monitoring device converts the video stream into a video condensed stream and a feature stream, receives the target features of a real-time tracking instruction from the cloud server, performs tracking computation, and transmits the result to the cloud server;
the cloud server receives the video condensed stream, the feature stream, and the real-time tracking results from the front-end monitoring device, receives an offline retrieval instruction and/or a real-time tracking instruction from the terminal, and returns the results of offline retrieval and/or real-time tracking to the terminal for display;
and the terminal sends the offline retrieval instruction and/or real-time tracking instruction and receives the results of offline retrieval and/or real-time tracking.
Preferably, the cloud server comprises a double-current intelligent interaction middleware module, an application intelligent interaction middleware module, a calculation engine module, a retrieval engine module and a database module,
the double-flow intelligent interaction middleware module receives the video concentrated flow and the characteristic flow transmitted by the intelligent converter and the retrieval result of real-time tracking, stores the video concentrated flow and the characteristic flow in the database module, transmits the retrieval result of real-time tracking to the application intelligent interaction middleware module, receives the target characteristic transmitted by the calculation engine module, and transmits the interaction target of the application intelligent interaction middleware module to the intelligent converter for real-time matching detection;
the application intelligent interaction middleware module receives an offline retrieval instruction or/and a real-time tracking instruction of the terminal, converts an interaction target of the real-time tracking instruction into a target characteristic through calculation of the calculation engine module and then transmits the target characteristic to the double-current intelligent interaction middleware module, transmits the interaction target of the offline retrieval instruction to the calculation engine module, receives the target characteristic of the calculation engine module, transmits the target characteristic to the retrieval engine module, receives a retrieval result obtained through offline retrieval or/and real-time tracking, and transmits the result to the terminal;
the calculation engine module is used for receiving the interactive target searched offline or/and tracked in real time, converting the interactive target into target characteristics, transmitting the target characteristics tracked in real time to the dual-flow intelligent interaction middleware, and transmitting the target characteristics searched offline to the dual-flow intelligent interaction middleware;
the retrieval engine module is used for receiving the target features from the application intelligent interaction middleware module, and fetching the related retrieval result in the database module according to the target IP retrieved from the database module;
the database module is used for storing the video condensed stream and the feature stream.
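The routing performed by the dual-stream intelligent interaction middleware module described above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation; all class and method names (`DualStreamMiddleware`, `on_streams`, `Recorder`, etc.) are hypothetical.

```python
class DualStreamMiddleware:
    """Routes condensed-stream/feature-stream data between the front end,
    the database module, and the application intelligent interaction middleware."""

    def __init__(self, database, app_middleware, front_end):
        self.database = database
        self.app_middleware = app_middleware
        self.front_end = front_end

    def on_streams(self, condensed_stream, feature_stream):
        # Persist both streams via the database module.
        self.database.store(condensed_stream, feature_stream)

    def on_match_result(self, result):
        # Real-time tracking match results flow up to the application middleware.
        self.app_middleware.deliver(result)

    def on_target_features(self, features):
        # Target features from the calculation engine flow down to the
        # front-end monitoring device for real-time matching.
        self.front_end.match(features)


class Recorder:
    """Stand-in collaborator that records every call it receives."""
    def __init__(self):
        self.calls = []
    def store(self, condensed, features):
        self.calls.append(("store", condensed, features))
    def deliver(self, result):
        self.calls.append(("deliver", result))
    def match(self, features):
        self.calls.append(("match", features))


db, app, fe = Recorder(), Recorder(), Recorder()
mw = DualStreamMiddleware(db, app, fe)
mw.on_streams("condensed-stream", "feature-stream")
mw.on_match_result("match-42")
mw.on_target_features([0.1, 0.9])
```

The point of the sketch is the fan-out: stream data goes only to storage, match results only upward to the application side, and target features only downward to the front end.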
Preferably, the front-end monitoring device includes an intelligent converter module. The intelligent converter module performs input adaptation and system scheduling on the video and then carries out decoding, encoding, detection and tracking; it obtains the feature stream of the video stream through intelligent operations and compresses the video stream to obtain the video condensed stream. The intelligent converter of the front-end monitoring device encapsulates the video condensed stream and the feature stream according to the dual-stream interaction protocol and transmits them to the cloud server for storage.
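As a toy illustration of this convert-then-encapsulate step, the sketch below uses zlib compression as a stand-in for video condensation, a JSON detection list as a stand-in for the feature stream, and a simple length-prefixed layout as a stand-in for the dual-stream interaction protocol; none of these concrete choices come from the patent.

```python
import json
import zlib

def intelligent_convert(raw_video: bytes):
    # Placeholder detect/track output standing in for the intelligent operations.
    detections = [{"frame": 0, "targets": ["person", "vehicle"]}]
    feature_stream = json.dumps(detections).encode()
    # zlib compression stands in for producing the video condensed stream.
    condensed_stream = zlib.compress(raw_video)
    return condensed_stream, feature_stream

def encapsulate(condensed: bytes, features: bytes) -> bytes:
    # Toy dual-stream "protocol": two 4-byte big-endian length prefixes,
    # followed by the two payloads back to back.
    header = len(condensed).to_bytes(4, "big") + len(features).to_bytes(4, "big")
    return header + condensed + features

def decapsulate(packet: bytes):
    c_len = int.from_bytes(packet[:4], "big")
    f_len = int.from_bytes(packet[4:8], "big")
    return packet[8:8 + c_len], packet[8 + c_len:8 + c_len + f_len]

condensed, features = intelligent_convert(b"H264-frame-data" * 50)
packet = encapsulate(condensed, features)
assert decapsulate(packet) == (condensed, features)
```

The length-prefixed layout keeps the two streams independently recoverable on the cloud side, which is the property the dual-stream design relies on.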
Preferably, the database module comprises a video library, a fusion visual feature library, a structured library and a picture library;
the video library is used for storing the video condensed stream;
the fusion visual feature library is used for storing fusion visual feature data of the feature stream;
the structured library is used for storing structured data of the feature stream;
the picture library is used for storing picture data of the feature stream.
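A minimal sketch of how one feature-stream record could fan out across the four libraries listed above; the field names (`fused_features`, `structured`, `picture`) and record layout are illustrative assumptions, not the patent's schema.

```python
class DatabaseModule:
    """Four stores mirroring the video, fusion visual feature,
    structured, and picture libraries."""
    def __init__(self):
        self.video_library = []
        self.fusion_feature_library = []
        self.structured_library = []
        self.picture_library = []

    def store_condensed_stream(self, stream: bytes):
        self.video_library.append(stream)

    def store_feature_record(self, record: dict):
        # One feature-stream record fans out into the three feature-side libraries.
        self.fusion_feature_library.append(record["fused_features"])
        self.structured_library.append(record["structured"])
        self.picture_library.append(record["picture"])

db = DatabaseModule()
db.store_condensed_stream(b"condensed-bytes")
db.store_feature_record({
    "fused_features": [0.12, 0.98, 0.33],
    "structured": {"type": "vehicle", "camera": "cam-07"},
    "picture": b"jpeg-bytes",
})
```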
Preferably, the calculation engine module comprises a structuring engine submodule and a fusion visual feature storage engine submodule;
the structuring engine submodule is used for performing structuring computation to obtain the structured information of the target features;
the fusion visual feature storage engine submodule is used for performing computation on pictures and similar data to obtain the visual feature information of the target features;
the retrieval engine module comprises a feature storage engine submodule and a video storage distribution engine submodule,
the feature storage engine submodule is used for retrieving the feature library to obtain the target IP and fetching the corresponding retrieval result;
the video storage and distribution engine submodule is used for fetching videos, namely the videos related to the target IP retrieved in the databases corresponding to the structuring engine submodule, the fusion visual feature storage engine submodule and the feature storage engine submodule.
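The cooperation of the two retrieval-engine submodules might look like the following sketch: the feature storage engine matches query features against a feature library to find a target IP, and the video storage/distribution engine then pulls the videos associated with that target IP. Cosine similarity as the matching metric and the in-memory dictionaries keyed by target IP are assumptions for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the feature library and video library.
FEATURE_LIBRARY = {
    "10.0.0.5": [0.9, 0.1, 0.0],
    "10.0.0.9": [0.0, 0.2, 0.98],
}
VIDEO_LIBRARY = {
    "10.0.0.5": ["clip_0005.mp4"],
    "10.0.0.9": ["clip_0009.mp4", "clip_0012.mp4"],
}

def feature_storage_engine(target_features):
    # Return the target IP whose stored features best match the query.
    return max(FEATURE_LIBRARY,
               key=lambda ip: cosine(FEATURE_LIBRARY[ip], target_features))

def video_distribution_engine(target_ip):
    # Fetch the videos associated with the retrieved target IP.
    return VIDEO_LIBRARY.get(target_ip, [])

ip = feature_storage_engine([0.1, 0.25, 0.95])
videos = video_distribution_engine(ip)
```

Splitting "find the target" from "fetch its videos" mirrors the two-submodule division described above.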
Preferably, the front-end monitoring device further comprises an intelligent conversion box. The intelligent conversion box is connected to one camera and converts the H.264 video stream into a video condensed stream and a target picture, where the target picture is the result of detecting, tracking and selecting the optimal shot of person and vehicle targets in the monitored video scene. The result is then encapsulated according to the front-end interaction protocol and transmitted to the intelligent converter, and the intelligent converter completes the intelligent operations and transmits the results to the cloud server.
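The detect-track-optimize step that yields the target picture can be illustrated as best-shot selection over a track: among all crops of one tracked target, keep the highest-quality one. Using a detection-confidence score as the quality criterion is an assumption for illustration.

```python
def best_target_picture(track):
    """track: list of {"frame": int, "crop": bytes, "score": float} entries
    for one tracked person or vehicle target."""
    return max(track, key=lambda shot: shot["score"])

track = [
    {"frame": 3, "crop": b"blurry", "score": 0.41},
    {"frame": 7, "crop": b"frontal", "score": 0.93},
    {"frame": 9, "crop": b"occluded", "score": 0.58},
]
best = best_target_picture(track)  # the frame-7 crop has the highest score
```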
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A digital retina architecture and software architecture method, comprising:
accessing a video stream;
processing the video stream at the front-end monitoring device and converting it into a video condensed stream and a feature stream;
matching the video condensed stream with the feature stream, encapsulating them, and transmitting them to a cloud server;
the cloud server stores the received encapsulated video condensed stream and feature stream, and
receiving an offline retrieval instruction from the terminal, and returning the result obtained by offline retrieval to the terminal for display; and/or
receiving a real-time tracking instruction from the terminal, and returning the result obtained by real-time tracking to the terminal for display.
2. The method according to claim 1, wherein converting the video stream into the video condensed stream and the feature stream at the front-end monitoring device comprises:
compressing the video stream to obtain the video condensed stream;
carrying out input adaptation and system scheduling on the video stream, and then carrying out preprocessing;
carrying out intelligent operations on the preprocessed video stream to obtain the feature stream of the video stream.
3. The method of claim 2, wherein the cloud server storing the received encapsulated video condensed stream and feature stream comprises:
the dual-stream intelligent interaction middleware of the cloud server receiving the encapsulated video condensed stream and feature stream and storing them in the corresponding databases.
4. The method according to claim 3, wherein receiving an offline retrieval instruction from the terminal and returning the result obtained by offline retrieval to the terminal for display specifically comprises:
receiving an offline retrieval instruction from the terminal, and sending the interaction target of the offline retrieval instruction to the corresponding calculation engine through the application intelligent interaction middleware;
calculating the target features with the calculation engine, and sending the target features to the retrieval engine through the application intelligent interaction middleware;
retrieving, by the retrieval engine, in the corresponding database, and sending the retrieval result and the video associated with the retrieval result to the terminal for display.
5. The digital retina architecture and software architecture method of claim 4,
receiving a real-time tracking instruction of the terminal, and returning a result obtained by real-time tracking to the terminal for displaying, wherein the method specifically comprises the following steps:
the terminal sends the interaction target of the real-time tracking instruction to the calculation engine through the application intelligent interaction middleware of the cloud server, and the calculation engine sends the target features obtained by computing on the interaction target to the front-end monitoring device through the dual-stream intelligent interaction middleware;
the front-end monitoring device matches the received target features against detected targets in real time, and sends the resulting match results to the terminal for display through the dual-stream intelligent interaction middleware and the application intelligent interaction middleware.
6. A digital retina architecture and software architecture system, comprising: a front-end monitoring device, a cloud server and a terminal,
the front-end monitoring device is used for converting the video stream into a video condensed stream and a feature stream, receiving the target features sent by the cloud server, and sending the match result to the cloud server after real-time matching;
the cloud server is used for receiving the video condensed stream, the feature stream and the real-time tracking match results from the front-end monitoring device, receiving an offline retrieval instruction and/or a real-time tracking instruction from the terminal, and returning the retrieval results obtained by offline retrieval and/or real-time tracking to the terminal for display;
the terminal is used for sending the offline retrieval instruction and/or the real-time tracking instruction, and receiving the retrieval result of offline retrieval and/or the match result of real-time tracking.
7. The digital retina architecture and software architecture system of claim 6,
the cloud server comprises a dual-stream intelligent interaction middleware module, an application intelligent interaction middleware module, a calculation engine module, a retrieval engine module and a database module,
the dual-stream intelligent interaction middleware module is used for receiving the video condensed stream, the feature stream and the real-time tracking match result sent by the front end, storing the video condensed stream and the feature stream in the database module, sending the real-time tracking match result to the application intelligent interaction middleware module, receiving the target features transmitted by the calculation engine module, and sending the interaction target of the application intelligent interaction middleware module to the front-end monitoring device for real-time matching;
the application intelligent interaction middleware module is used for receiving an offline retrieval instruction and/or a real-time tracking instruction from the terminal, converting the interaction target of a real-time tracking instruction into target features through the calculation engine module and sending them to the dual-stream intelligent interaction middleware module, sending the interaction target of an offline retrieval instruction to the calculation engine module, receiving the target features from the calculation engine module and sending them to the retrieval engine module, receiving the offline retrieval result returned by the retrieval engine module and/or the real-time tracking match result returned by the dual-stream intelligent interaction middleware module, and sending the retrieval result and/or the match result to the terminal;
the calculation engine module is used for receiving the interaction target for offline retrieval and/or real-time tracking, converting the interaction target into target features, and sending the target features for offline retrieval and/or real-time tracking to the dual-stream intelligent interaction middleware;
the retrieval engine module is used for receiving the target features from the application intelligent interaction middleware module, and fetching the related retrieval result in the database module according to the target IP retrieved from the database module;
the database module is used for storing the video condensed stream and the feature stream.
8. The digital retina architecture and software architecture system of claim 7,
the front-end monitoring device comprises an intelligent converter module; the intelligent converter module performs preprocessing after input adaptation and system scheduling of the video, obtains the feature stream of the video stream through intelligent operations, and compresses the preprocessed video stream to obtain the video condensed stream; the video condensed stream and the feature stream are matched, encapsulated and transmitted to the cloud server; during the intelligent operations, the intelligent converter module sends the match result of real-time matching against the tracked target features to the dual-stream intelligent interaction middleware, and the retrieval result is sent to the terminal through the dual-stream intelligent interaction middleware and the application intelligent interaction middleware.
9. The digital retina architecture and software architecture system of claim 8,
the database module comprises a video library, a fusion visual feature library, a structural library and a picture library;
the video library is used for storing the video condensed stream;
the fusion visual feature library is used for storing fusion visual feature data of the feature stream;
the structured library is used for storing structured data of the feature stream;
the picture library is used for storing picture data of the feature stream;
the calculation engine module comprises a structuring engine submodule and a fusion visual feature storage engine submodule;
the structuring engine submodule is used for performing structuring computation to obtain the structured information of the target features;
the fusion visual feature storage engine submodule is used for performing computation on pictures to obtain the visual feature information of the target features;
the retrieval engine module comprises a feature storage engine submodule and a video storage distribution engine submodule,
the feature storage engine submodule is used for retrieving the feature library to obtain the target IP and fetching the corresponding retrieval result;
the video storage and distribution engine submodule is used for fetching videos, namely the videos related to the target IP retrieved in the databases corresponding to the structuring engine submodule, the fusion visual feature storage engine submodule and the feature storage engine submodule.
10. The system according to claim 9, wherein the front-end monitoring device further comprises an intelligent conversion box; the intelligent conversion box accesses the video stream, decodes, encodes, detects and tracks it to obtain a video condensed stream and a target picture, and encapsulates and transmits them to the intelligent converter; the intelligent converter performs the intelligent operations and sends the results to the cloud server.
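The real-time tracking flow claimed above (claim 5) can be summarized in a toy end-to-end sketch: terminal to application middleware, to calculation engine, to dual-stream middleware, to front end, and the match result back to the terminal. The hash-based "feature" and the exact-match comparison below are placeholders for the actual calculation engine and front-end matching, chosen only so the flow is executable.

```python
def compute_features(interaction_target: str) -> int:
    # Placeholder calculation engine: reduce the target to a small integer "feature".
    return sum(interaction_target.encode()) % 256

def front_end_match(target_feature: int, detected_features: list) -> list:
    # Placeholder real-time matching on the front-end monitoring device.
    return [f for f in detected_features if f == target_feature]

def realtime_track(interaction_target: str, detected_features: list) -> dict:
    feature = compute_features(interaction_target)         # calculation engine
    matches = front_end_match(feature, detected_features)  # front-end matching
    return {"target": interaction_target, "matches": matches}  # back to terminal

# sum(b"vehicle-A") % 256 == 78, so 78 in the detected list is a match.
result = realtime_track("vehicle-A", [17, 78, 99])
```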
CN201910804261.1A 2019-08-28 2019-08-28 Digital retina system structure and software architecture method and system Active CN111090773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910804261.1A CN111090773B (en) 2019-08-28 2019-08-28 Digital retina system structure and software architecture method and system

Publications (2)

Publication Number Publication Date
CN111090773A true CN111090773A (en) 2020-05-01
CN111090773B CN111090773B (en) 2023-02-07

Family

ID=70394095

Country Status (1)

Country Link
CN (1) CN111090773B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112929351A (en) * 2021-01-28 2021-06-08 鹏城实验室 Communication method based on digital video film system
CN113382235A (en) * 2021-08-11 2021-09-10 浙江智慧视频安防创新中心有限公司 Digital retina video processing method and device, electronic equipment and storage medium
CN114257817A (en) * 2022-03-01 2022-03-29 浙江智慧视频安防创新中心有限公司 Encoding method and decoding method of multitask digital retina characteristic stream
WO2022193382A1 (en) * 2021-03-17 2022-09-22 北京大学 End-edge-cloud coordination system and method based on digital retina, and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100166054A1 (en) * 2008-12-31 2010-07-01 General Instrument Corporation Hybrid video encoder including real-time and off-line video encoders
CN104252539A (en) * 2014-09-24 2014-12-31 深圳市傲天智能系统有限公司 Intelligent video analysis and management method and system based on search engine
CN106375721A (en) * 2016-09-14 2017-02-01 重庆邮电大学 Smart video monitoring system based on cloud platform
CN107318000A (en) * 2017-06-22 2017-11-03 重庆邮电大学 A kind of wireless video monitoring system based on cloud platform
US20180308330A1 (en) * 2017-04-20 2018-10-25 David Lee Selinger Automatic threat detection based on video frame delta information in compressed video streams


Similar Documents

Publication Publication Date Title
CN111090773B (en) Digital retina system structure and software architecture method and system
CN102752574B (en) Video monitoring system and method
US9740940B2 (en) Event triggered location based participatory surveillance
US8537219B2 (en) Identifying spatial locations of events within video image data
CN107318000A (en) A kind of wireless video monitoring system based on cloud platform
CN101631237A (en) Video monitoring data storing and managing system
CN102819528A (en) Method and device for generating video abstraction
CN111092926B (en) Digital retina multivariate data rapid association method
US8553778B2 (en) Coding scheme for identifying spatial locations of events within video image data
CN108769576B (en) Intelligent video processing method and system
CN113810489A (en) Industrial internet control system and method
CN113239792A (en) Big data analysis processing system and method
CN112804188B (en) Scalable vision computing system
CN115103157A (en) Video analysis method and device based on edge cloud cooperation, electronic equipment and medium
CN111259839A (en) Target object behavior monitoring method, device, equipment, system and storage medium
Li et al. A city monitoring system based on real-time communication interaction module and intelligent visual information collection system
CN116010652B (en) Unstructured video data processing method and system
CN109994217B (en) Method and device for checking pathological file
CN112312070A (en) Digital retina cloud software scheduling method
CN111314350A (en) Image storage system, storage method, calling system and calling method
CN110505481B (en) Method for improving low-loss coding efficiency of video POI (point of interest) by eye movement monitoring
CN116382813B (en) Video real-time processing AI engine system for smart city management
CN112307259A (en) Digital retina cloud application interaction method
KR102550117B1 (en) Method and System for Video Encoding Based on Object Detection Tracking
CN110855930B (en) Intelligent identification method and system for network equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant