CN113467740B

CN113467740B - Video monitoring array display optimization method and device based on joint coding

Info

Publication number: CN113467740B
Application number: CN202110802969.0A
Authority: CN
Inventors: 孙国强; 刘保臣; 杨志刚
Original assignee: Qingdao Bo Tian Tian Tong Information Technology Co ltd
Current assignee: Qingdao Bo Tian Tian Tong Information Technology Co ltd
Priority date: 2021-07-15
Filing date: 2021-07-15
Publication date: 2024-02-02
Anticipated expiration: 2041-07-15
Also published as: CN113467740A

Abstract

The invention discloses a video monitoring array display optimization method and device based on joint coding, belonging to the technical field of artificial intelligence. The method comprises: constructing a global encoder and a local encoder; constructing, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder; calculating a similarity score with a bilinear similarity function between the representation of the current monitoring sequence and each candidate item, and obtaining from the similarity score of each item the probability that the corresponding monitoring picture appears next; and optimizing the display ordering of the video monitoring array based on the probability that each monitoring picture appears next. By constructing the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder, the behavior of the monitoring personnel is analyzed intuitively, and the recurrent neural network structure then automatically captures and summarizes the optimized behavior of the monitoring personnel.

Description

Video monitoring array display optimization method and device based on joint coding

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a video monitoring array display optimization method and device based on joint coding.

Background

In recent years, with the development of technology and the progress of society, video monitoring has developed rapidly and is increasingly applied in both traditional and non-traditional security fields. The video monitoring system is one of the most important security measures in use today. As the number of video monitoring points grows, the number of video feeds to be monitored far exceeds the number that the screens of a command center can display. Supervisors can poll the feeds manually, but the workload is high, the efficiency is low, and effective management and control are difficult to achieve. With the development of computer vision and artificial intelligence, the polling mechanism of intelligent monitoring systems has reduced the workload of supervisors to some extent and improved the efficiency of security management, but current polling mechanisms can cause serious information loss.

Existing techniques for ordering the display of a monitoring camera array follow two main approaches: ordering by fixed rules and ordering by abnormal pictures. Abnormal-picture ordering can take two forms. The first calculates video weights by image comparison: a weight is calculated for each terminal from the difference between successive frames of that single video acquisition terminal, and the weights are then used to screen the video streams and determine their playing order on the monitor screen. This method works well for polling monitoring pictures that remain in a long-term "dynamic-static" state, but has little effect on monitoring pictures that change continuously. The second uses background extraction to judge whether people, abnormal equipment and the like have intruded, and polls the affected cameras with priority. This method places high demands on moving-object detection, and its false alarm rate is high because of the working environment of the cameras and the limitations of moving-object detection. Fixed-rule ordering polls a fixed set of pictures at fixed intervals according to the existing experience of the monitoring personnel. It requires the personnel to be familiar with the areas and times at which risks tend to occur, and because the polling order is fixed, it cannot poll high-risk areas at the appropriate time and place as they change across time periods.

Both video monitoring array ordering methods have significant drawbacks: polling based on fixed rules depends heavily on the experience of the monitoring personnel and cannot accurately poll different risk areas at different times; polling based on abnormal images is limited by the accuracy of intelligent image analysis, has a high false alarm rate, and interferes with the monitoring personnel's judgment of risk.

In theory, ordering the display of a video monitoring array can be classified as a recommendation problem and can be addressed effectively with a recommendation algorithm. The monitoring system log does not focus on information about the monitoring operator, because the display order of the monitoring cameras matters more to the monitoring system than operator information. The only information that can be used effectively is therefore the operator's viewing sequence and the corresponding viewing times. In this case, traditional recommendation methods often predict the next picture to display inaccurately, and their recommendations suffer from lag and repetition; a session-based recommendation system can effectively solve these problems.

Disclosure of Invention

The invention provides a video monitoring array display optimization method and device based on joint coding. A joint coding monitoring strategy recommendation model containing a global encoder and a local encoder is constructed to analyze the behavior of the monitoring personnel intuitively, and a recurrent neural network structure then automatically captures and summarizes the optimized behavior of the monitoring personnel.

The specific technical scheme provided by the invention is as follows:

In one aspect, the invention provides a video monitoring array display optimization method based on joint coding, comprising the following steps:

constructing a global encoder by taking the whole monitoring sequence as the input of the global encoder and the behavior characteristics of the monitoring personnel in the monitoring sequence as its output;

constructing a local encoder by using an object-level attention mechanism to dynamically select and linearly combine different parts of the input sequence;

constructing, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder;

calculating a similarity score with a bilinear similarity function between the representation of the current monitoring sequence and each candidate item, and obtaining from the similarity score of each item the probability that the corresponding monitoring picture appears next;

optimizing the display ordering of the video monitoring array based on the probability that each monitoring picture appears next.

Optionally, constructing the global encoder specifically includes:

grouping the data set by operation object organization and sorting each group by operation time, so that each object organization, arranged in time order, corresponds to one sequence, wherein the data set comprises a user name, an operation object, the operation object organization and the operation time;

using the sorted data set, calculating the reset gate $r_t$ according to the formula $r_t = \sigma(W_r x_t + U_r h_{t-1})$, where $\sigma$ is the Sigmoid activation function, $x_t$ is the t-th input of the global encoder, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $W_r$ and $U_r$ are weight matrices;

calculating the candidate behavior $\hat{h}_t$ according to the formula $\hat{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))$, where $r_t$ is the reset gate, $h_{t-1}$ is the (t-1)-th output of the global encoder, $x_t$ is the t-th input of the global encoder, $W$ and $U$ are weight matrices, and $\odot$ is the Hadamard product;

calculating the update gate $z_t$ according to the formula $z_t = \sigma(W_z x_t + U_z h_{t-1})$, where $\sigma$ is the Sigmoid activation function, $x_t$ is the t-th input of the global encoder, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $W_z$ and $U_z$ are weight matrices;

calculating the relation between the candidate behavior $\hat{h}_t$ and the previous behavior $h_{t-1}$ according to the formula $h_t = (1 - z_t) h_{t-1} + z_t \hat{h}_t$, where $z_t$ is the update gate, $\hat{h}_t$ is the candidate behavior, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $h_t$ is the operation sequence feature output by the global encoder.
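The gate computations above can be illustrated with a small pure-Python sketch. It assumes the standard GRU form (Sigmoid gates, a tanh candidate state, and an interpolation controlled by the update gate); the function names `gru_step` and `encode` and the dictionary layout of the parameters are illustrative assumptions, not part of the patent.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def gru_step(x_t, h_prev, p):
    """One GRU update; p holds the matrices W_r, U_r, W_z, U_z, W, U."""
    r_t = sigmoid(add(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev)))  # reset gate
    z_t = sigmoid(add(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev)))  # update gate
    gated = [r * h for r, h in zip(r_t, h_prev)]                          # Hadamard product
    h_hat = [math.tanh(a) for a in add(matvec(p["W"], x_t), matvec(p["U"], gated))]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_hat)]

def encode(sequence, p, hidden_size):
    """Run the GRU over a whole sequence and return every hidden state."""
    h = [0.0] * hidden_size
    states = []
    for x_t in sequence:
        h = gru_step(x_t, h, p)
        states.append(h)
    return states
```

With all weights set to zero, both gates equal 0.5 and the candidate state is zero, so each step simply halves the previous hidden state; this makes the interpolation role of the update gate easy to check by hand.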

Optionally, constructing the local encoder specifically includes:

calculating the similarity between the global encoder hidden layer output $h_t$ and the local encoder hidden layer vector representation $h_j$ according to the formula $q(h_t, h_j) = v^T \sigma(A_1 h_t + A_2 h_j)$, where matrix $A_1$ converts $h_t$ into a latent space, matrix $A_2$ converts $h_j$ into the latent space, $\sigma$ is the Sigmoid activation function, and $v^T$ is a dimension conversion matrix;

calculating the weighting factor $\alpha_{tj}$ according to the formula $\alpha_{tj} = q(h_t, h_j)$, where $h_t$ is the global encoder hidden layer output and $h_j$ is the local encoder hidden layer vector representation;

calculating the intention coefficient of the monitoring personnel in the monitoring sequence according to the formula $c_t^l = \sum_{j=1}^{t} \alpha_{tj} h_j$, where $\alpha_{tj}$ is the weighting factor and $h_j$ is the local encoder hidden layer vector representation.
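A minimal sketch of the attention computation of the local encoder follows. It assumes that the weighting factor is the unnormalized alignment score $v^T \sigma(A_1 h_t + A_2 h_j)$ and that $v$ can be treated as a vector; the function name `attention_context` is hypothetical.

```python
import math

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def attention_context(h_t, hidden_states, A1, A2, v):
    """Return (alphas, c_local) for the alignment v^T sigmoid(A1 h_t + A2 h_j)."""
    left = matvec(A1, h_t)  # project the final hidden state into the latent space
    alphas = []
    for h_j in hidden_states:
        right = matvec(A2, h_j)  # project each earlier hidden state
        act = [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(left, right)]
        alphas.append(sum(vi * ai for vi, ai in zip(v, act)))
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states: the intention representation of the session.
    c_local = [sum(a * h_j[d] for a, h_j in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, c_local
```

Because the weights depend on both the final hidden state and each earlier hidden state, items that resemble the operator's most recent behavior can receive larger weights than items viewed in passing.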

Optionally, constructing the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder specifically includes:

constructing, with a deep-learning recurrent neural network structure, the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder, wherein the global encoder is used to summarize the whole monitoring sequence and the local encoder is used to adaptively select the important items in the current session.

Optionally, in the process of constructing the joint coding monitoring strategy recommendation model, the global encoder hidden state $h_t^g$ is integrated into $c_t$ to provide a sequential behavior representation for the model. The hidden state $h_t^g$ of the global encoder plays a different role from the hidden state $h_t^l$ of the local encoder: $h_t^l$ is used to calculate the attention weights with the previous hidden states, whereas $h_t^g$ is used to encode the entire sequence behavior.

In another aspect, the invention also provides a video monitoring array display optimization device based on joint coding, comprising:

a global construction module, configured to construct a global encoder by taking the whole monitoring sequence as the input of the global encoder and the behavior characteristics of the monitoring personnel in the monitoring sequence as its output;

a local construction module, configured to construct a local encoder by using an object-level attention mechanism to dynamically select and linearly combine different parts of the input sequence;

a model construction module, configured to construct, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder;

a similarity calculation module, configured to calculate a similarity score with a bilinear similarity function between the representation of the current monitoring sequence and each candidate item, and to obtain from the similarity score of each item the probability that the corresponding monitoring picture appears next;

a display ordering module, configured to optimize the display ordering of the video monitoring array based on the probability that each monitoring picture appears next.

Optionally, the global building module is specifically configured to:

Optionally, the local construction module is specifically configured to:

Optionally, the model building module is specifically configured to:

The beneficial effects of the invention are as follows:

The video monitoring array display optimization method based on joint coding provided by the embodiment of the invention comprises: constructing a global encoder by taking the whole monitoring sequence as the input of the global encoder and the behavior characteristics of the monitoring personnel in the monitoring sequence as its output; constructing a local encoder by using an object-level attention mechanism to dynamically select and linearly combine different parts of the input sequence; constructing, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder; calculating a similarity score with a bilinear similarity function between the representation of the current monitoring sequence and each candidate item, and obtaining from the similarity score of each item the probability that the corresponding monitoring picture appears next; and optimizing the display ordering of the video monitoring array based on the probability that each monitoring picture appears next. By constructing the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder, the behavior of the monitoring personnel is analyzed intuitively, and the recurrent neural network structure then automatically captures and summarizes the optimized behavior of the monitoring personnel.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a video surveillance array display optimization method based on joint coding according to an embodiment of the present invention;

FIG. 2 is a block diagram of a video surveillance array display optimization device based on joint coding according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a global encoder provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a local encoder according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a joint coding monitoring policy recommendation model according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.

The following will describe in detail a video surveillance array display optimization method and apparatus based on joint coding according to an embodiment of the present invention with reference to fig. 1 to fig. 5.

Referring to fig. 1, fig. 3, fig. 4 and fig. 5, the video monitoring array display optimization method based on joint coding provided by the embodiment of the invention includes:

Step 100: constructing the global encoder by taking the whole monitoring sequence as the input of the global encoder and the behavior characteristics of the monitoring personnel in the monitoring sequence as its output;

Specifically, referring to FIG. 3, the data set is grouped by operation object organization and each group is sorted by operation time, so that each object organization, arranged in time order, corresponds to one sequence, wherein the data set comprises a user name, an operation object organization and the operation time;

The update gate $z_t$ is calculated according to the formula $z_t = \sigma(W_z x_t + U_z h_{t-1})$, where $\sigma$ is the Sigmoid activation function, $x_t$ is the t-th input of the global encoder, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $W_z$ and $U_z$ are weight matrices;

The whole monitoring sequence is used as the input of the global encoder, and the behavior characteristics of the monitoring personnel in the sequence are used as its output. The data set contains 14 data items, including the operating user, operating user IP, operating user MAC, operating user organization, operation service, operation action, operation object type, operation object organization, description, operation time, operation result, new value and original value.

The data set is grouped by operation object organization and each group is sorted by operation time, so that each object organization, arranged in time order, corresponds to one sequence. In the global encoder, the input data is divided into batches of fixed size for training. The batch size determines the number of samples used in one training iteration and affects how well the model is optimized, the parameter settings of the model input layer, and the training speed. An appropriate batch size is chosen to find the best balance between memory efficiency and memory capacity.
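The preprocessing described above can be sketched as follows. The record field names (`object_org`, `object`, `time`) and the ISO-8601 timestamp strings are assumptions made for illustration; the patent does not specify a log schema.

```python
from collections import defaultdict

def build_sequences(records):
    """Group log records by operation object organization, then sort by time."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["object_org"]].append(rec)
    # One time-ordered sequence of operated objects per organization.
    return {
        org: [r["object"] for r in sorted(recs, key=lambda r: r["time"])]
        for org, recs in groups.items()
    }

def batches(items, batch_size):
    """Yield consecutive fixed-size batches; the last batch may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

ISO-8601 timestamps sort correctly as plain strings, which keeps the sketch free of date parsing; a real log would need whatever parsing its time format requires.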

According to the embodiment of the invention, the global encoder splits the user's sequence data set according to the characteristics of the user's click sequence, taking the previous click as input and the next click as output. The correspondence between input and output preserves the relation between the data and solves the problem that sequences of very different lengths are difficult to model.
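One common way to realize this splitting is to turn each sequence into (prefix, next click) training pairs, where the last element of the prefix is the previous click. The sketch below assumes that interpretation; `prefix_target_pairs` is a hypothetical helper name.

```python
def prefix_target_pairs(sequence):
    """Split one click sequence into (clicks so far, next click) pairs."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]
```

For example, the sequence `["cam3", "cam7", "cam1"]` yields two pairs: `(["cam3"], "cam7")` and `(["cam3", "cam7"], "cam1")`. Every sequence, however long, is reduced to samples of the same shape.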

Step 200: dynamically selecting and linearly combining different parts of the input sequence by adopting an object-level attention mechanism to construct a local encoder;

Because the global encoder's vectorized summary of the whole monitoring sequence can hardly capture the intention of the monitoring personnel accurately, a video-monitoring-oriented local encoder is designed on this basis; the local encoder has the advantage of adaptively capturing the intention of the monitoring personnel.

Referring to fig. 4, in the construction process of the local encoder, data sets are grouped according to operation object organizations, and the grouped data sets are ordered according to operation time, one object organization arranged according to time sequence corresponds to a sequence, wherein the data sets include a user name, an operation object organization, and an operation time. In the construction process of the local encoder, the adopted data set is the same as the data set adopted in the construction process of the global encoder, and the data preprocessing mode is the same as that of the global encoder.

Referring to FIG. 4, the similarity between the global encoder hidden layer output $h_t$ and the local encoder hidden layer vector representation $h_j$ is calculated according to the formula $q(h_t, h_j) = v^T \sigma(A_1 h_t + A_2 h_j)$, where matrix $A_1$ converts $h_t$ into a latent space, matrix $A_2$ converts $h_j$ into the latent space, $\sigma$ is the Sigmoid activation function, and $v^T$ is a dimension conversion matrix;

Step 300: constructing, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder;

Referring to FIG. 5, the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder is constructed with a deep-learning recurrent neural network structure, wherein the global encoder is used to summarize the whole monitoring sequence and the local encoder is used to adaptively select the important items in the current session.

In the process of constructing the joint coding monitoring strategy recommendation model, the global encoder hidden state $h_t^g$ is integrated into $c_t$ to provide a sequential behavior representation for the model. The hidden state $h_t^g$ of the global encoder plays a different role from the hidden state $h_t^l$ of the local encoder: $h_t^l$ is used to calculate the attention weights with the previous hidden states, whereas $h_t^g$ is used to encode the entire sequence behavior.

The embodiment of the invention uses a deep-learning recurrent neural network structure to construct the joint coding monitoring strategy recommendation model containing the global encoder and the local encoder. For session-based camera surveillance tasks, the global encoder is used to summarize the entire surveillance sequence, while the local encoder adaptively selects the important items in the current session. The sequential behavior helps to extract the main purpose of the user in the current session. Thus, embodiments of the present invention use the representation of the sequence behavior together with the previous hidden states to calculate the attention weight of each user click.
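How the two encoders might be combined can be sketched as below. The sketch assumes that the joint representation is the concatenation of the last hidden state (global summary) and the attention-weighted sum of all hidden states (local intention); the patent text says only that the global state is integrated into $c_t$, so the concatenation and the name `joint_representation` are assumptions.

```python
def joint_representation(hidden_states, alphas):
    """Build c_t from encoder hidden states and attention weights."""
    h_global = hidden_states[-1]  # last hidden state summarizes the whole sequence
    dim = len(h_global)
    # Attention-weighted sum of all hidden states (local intention).
    c_local = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return h_global + c_local  # list concatenation: length is 2 * dim
```

Under this assumption the joint representation is twice as long as a single hidden state, which is one reason a dimension conversion step is needed before it can be compared with item embeddings.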

Step 400: calculating a similarity score by using the representation form of the current monitoring sequence and a bilinear similarity function between each candidate item, and obtaining a probability value of the next occurrence of the corresponding monitoring picture according to the similarity score of each item;

Step 500: optimizing the display ordering of the video monitoring array based on the probability that each monitoring picture appears next.
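Step 500 can be illustrated with a short sketch that fills a display grid in descending order of predicted probability. The grid layout, the tie-breaking by camera name, and the function name `arrange_display` are assumptions for illustration; the patent does not prescribe how the ranked pictures are laid out.

```python
def arrange_display(probabilities, rows, cols):
    """Place the most probable cameras first in a rows x cols display grid."""
    # Sort by descending probability; break ties by camera name for determinism.
    ranked = sorted(probabilities, key=lambda cam: (-probabilities[cam], cam))
    visible = ranked[:rows * cols]
    return [visible[r * cols:(r + 1) * cols] for r in range(rows)]
```

Cameras that do not fit in the grid are simply left off the wall in this sketch; a deployment could instead rotate them in on the next polling interval.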

Polling (round inspection) means that the pictures of all installed cameras are displayed on a screen in camera order, switching to the picture of the next camera every few seconds or minutes. Polling removes the need to switch pictures by manual clicking and is generally suitable for night duty in community security rooms and electronic patrol in shopping-mall security rooms. The polling strategy refers to the display order of the polled cameras and the switching interval. A recurrent neural network (RNN) is a class of recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all of its nodes in a chain. Recurrent neural networks have memory, share parameters and are Turing complete, which gives them certain advantages in learning the nonlinear characteristics of sequences.

The embodiment of the invention uses a bilinear decoding scheme, which reduces the number of parameters and improves the performance of the model. The similarity score $S_i$ is calculated with a bilinear similarity function between the representation of the current monitoring sequence and each candidate item according to the formula $S_i = emb_i^T B c_t$, where $B$ is a dimension conversion matrix that converts $c_t$ into the same dimension as the embedding $emb_i$ of the embedding layer. Finally, the similarity score of each item is input to the softmax layer to obtain the probability that the corresponding camera picture appears next.
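A minimal sketch of the bilinear decoding step follows, assuming that each candidate item has an embedding vector and that the score is the inner product of that embedding with $B c_t$. The function names are illustrative.

```python
import math

def bilinear_scores(embeddings, B, c_t):
    """Score every candidate item as emb_i^T (B c_t)."""
    Bc = [sum(b * c for b, c in zip(row, c_t)) for row in B]  # project c_t once
    return [sum(e * x for e, x in zip(emb, Bc)) for emb in embeddings]

def softmax(scores):
    """Convert scores to probabilities; subtract the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Projecting $c_t$ once and reusing the result for every candidate is what keeps the decoder cheap: the only decoder parameters are the entries of $B$, independent of the number of cameras.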

The embodiment of the invention uses the operation log of the monitoring system to learn the behavior of operators automatically and provides a video monitoring array display optimization method based on a global-local joint coding model, which overcomes the defects that the conventional polling mechanism depends heavily on the experience of the monitoring personnel and cannot poll the monitored area accurately. The global encoder summarizes the whole operation sequence; a GRU is used as its main unit because the GRU has lower computational complexity and better scalability, which allows longer operation sequences to be summarized. The local encoder adaptively selects the important items in the operation sequence and thereby captures the operator's main purpose.

The video monitoring array display optimization method based on joint coding provided by the embodiment of the invention can be evaluated with two indexes, Recall@20 and MRR@20. Recall is calculated as $\mathrm{Recall} = \frac{TP}{TP + FN}$, where TP is the number of positive samples predicted as positive and FN is the number of positive samples predicted as negative. Recall@20 is the proportion of cases in which the correct item appears among the first 20 items when the model ranks all items by predicted score.
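Recall@k can be computed with a few lines of Python. The sketch assumes one correct target per test case and a ranked list of candidates for each case; the function name `recall_at_k` is illustrative.

```python
def recall_at_k(ranked_lists, targets, k=20):
    """Fraction of cases whose target appears in the top-k ranked items."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)
```

With a single target per case, a hit counts as a true positive and a miss as a false negative, so this fraction matches the TP / (TP + FN) definition.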

MRR (mean reciprocal rank) is an indicator for measuring the effectiveness of search algorithms and is widely used for problems that allow multiple results to be returned: the model assigns a confidence score to each returned result and ranks results with higher scores first. Specifically, MRR is the average, over all queries, of the reciprocal rank of the first correct answer (if the correct item is ranked outside the top 20, its reciprocal rank is 0).

The MRR may be calculated using the following formula: $\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$, where $Q$ is the sample query set, $|Q|$ is the number of queries in $Q$, and $\mathrm{rank}_i$ is the rank of the first correct answer in the i-th query.
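The MRR@k calculation, including the rule that a target ranked outside the top k contributes 0, can be sketched as follows. The function name `mrr_at_k` and the one-target-per-query setup are assumptions for illustration.

```python
def mrr_at_k(ranked_lists, targets, k=20):
    """Mean reciprocal rank of the target, counting 0 when it is outside the top k."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top = ranked[:k]
        if target in top:
            total += 1.0 / (top.index(target) + 1)  # ranks are 1-based
    return total / len(targets)
```

For three queries whose targets are ranked first, second and not at all, the reciprocal ranks are 1, 0.5 and 0, giving an MRR of 0.5.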

The video monitoring array display optimization method based on joint coding provided by the embodiment of the invention achieves a Recall@20 of 48% and an MRR@20 of 22%, which is clearly better than the traditional methods in the same scenario.

Based on the same inventive concept, referring to fig. 2, an embodiment of the present invention further provides a video surveillance array display optimization device based on joint coding, including:

a global construction module 110, configured to construct a global encoder by taking the entire monitoring sequence as an input of the global encoder, and taking the behavior characteristics of the monitoring personnel in the monitoring sequence as an output of the global encoder;

a local construction module 120 for dynamically selecting and linearly combining different portions of the input sequence to construct a local encoder using an object-level attention mechanism;

a model building module 130, configured to construct, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing the global encoder and the local encoder;

a similarity calculation module 140, configured to calculate a similarity score using the representation of the current monitoring sequence and a bilinear similarity function between each candidate item, and obtain a probability value of the next occurrence of the corresponding monitoring picture according to the similarity score of each item;

the display ranking module 150 is configured to optimize the video surveillance array display ranking based on the probability value that each surveillance picture appears next.

Optionally, the global building module 110 is specifically configured to:

Optionally, the local construction module 120 is specifically configured to:

Optionally, the model building module 130 is specifically configured to:

constructing, with a deep-learning recurrent neural network structure, a joint coding monitoring strategy recommendation model containing a global encoder and a local encoder, wherein the global encoder is used to summarize the whole monitoring sequence and the local encoder is used to adaptively select the important items in the current session. In the process of constructing the joint coding monitoring strategy recommendation model, the global encoder hidden state $h_t^g$ is integrated into $c_t$ to provide a sequential behavior representation for the model. The hidden state $h_t^g$ of the global encoder plays a different role from the hidden state $h_t^l$ of the local encoder: $h_t^l$ is used to calculate the attention weights with the previous hidden states, whereas $h_t^g$ is used to encode the entire sequence behavior.

The embodiment of the invention also provides a video monitoring array display optimization device based on joint coding. The device uses the operation log of the monitoring system to construct a joint coding model, uses the global encoder to summarize the operation sequence, and uses the local encoder to adaptively select the important items in the operation sequence and capture the operator's main purpose. It can effectively overcome the defects that the conventional polling mechanism depends heavily on the experience of the monitoring personnel and cannot poll the monitored area accurately.

It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims and the equivalents thereof, the present invention is also intended to include such modifications and variations.

Claims

1. A video monitoring array display optimization method based on joint coding, characterized by comprising the following steps:

optimizing video surveillance array display ordering based on probability values that each surveillance picture appears next;

constructing the global encoder specifically includes:

calculating the relation between the candidate behavior $\hat{h}_t$ and the previous behavior $h_{t-1}$ according to the formula $h_t = (1 - z_t) h_{t-1} + z_t \hat{h}_t$, where $z_t$ is the update gate, $\hat{h}_t$ is the candidate behavior, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $h_t$ is the operation sequence feature output by the global encoder;

constructing the local encoder specifically includes:

calculating the similarity between the global encoder hidden layer output $h_t$ and the local encoder hidden layer vector representation $h_j$ according to the formula $q(h_t, h_j) = v^T \sigma(A_1 h_t + A_2 h_j)$, where matrix $A_1$ converts $h_t$ into a latent space, matrix $A_2$ converts $h_j$ into the latent space, $\sigma$ is the Sigmoid activation function, and $v^T$ is a dimension conversion matrix;

calculating the intention coefficient of the monitoring personnel in the monitoring sequence according to the formula $c_t^l = \sum_{j=1}^{t} \alpha_{tj} h_j$, where $\alpha_{tj}$ is the weighting factor and $h_j$ is the local encoder hidden layer vector representation;

the construction of the joint coding monitoring strategy recommendation model containing the global coder and the local coder specifically comprises the following steps:

2. The video surveillance array display optimization method of claim 1, wherein, in the process of constructing the joint coding monitoring strategy recommendation model, the global encoder hidden state $h_t^g$ is integrated into $c_t$ to provide a sequential behavior representation for the model; the hidden state $h_t^g$ of the global encoder plays a different role from the hidden state $h_t^l$ of the local encoder: $h_t^l$ is used to calculate the attention weights with the previous hidden states, whereas $h_t^g$ is used to encode the entire sequence behavior.

3. A video monitoring array display optimization device based on joint coding, characterized in that the video monitoring array display optimization device comprises:

the display ordering module is used for optimizing the display ordering of the video monitoring array based on the probability value of each monitoring picture appearing next;

the global construction module is specifically configured to:

using the sorted data set, calculating the reset gate $r_t$ according to the formula $r_t = \sigma(W_r x_t + U_r h_{t-1})$, where $\sigma$ is the Sigmoid activation function, $x_t$ is the t-th input of the global encoder, $h_{t-1}$ is the (t-1)-th output of the global encoder, and $W_r$ and $U_r$ are weight matrices;

the local construction module is specifically configured to:

the model construction module is specifically used for:

4. The video surveillance array display optimization apparatus of claim 3, wherein, in the process of constructing the joint coding monitoring strategy recommendation model, the global encoder hidden state $h_t^g$ is integrated into $c_t$ to provide a sequential behavior representation for the model; the hidden state $h_t^g$ of the global encoder plays a different role from the hidden state $h_t^l$ of the local encoder: $h_t^l$ is used to calculate the attention weights with the previous hidden states, whereas $h_t^g$ is used to encode the entire sequence behavior.