CN113051980A - Video processing method, device, system and computer readable storage medium


Info

Publication number: CN113051980A
Application number: CN201911382234.6A
Authority: CN (China)
Prior art keywords: information, group behavior, behavior index, image block, video
Legal status: Pending (assumed status; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 黄骞, 王建华, 王昊
Current assignee: Huawei Technologies Co Ltd
Original assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems for receiving images from a plurality of remote sources

Abstract

The embodiments of the present application provide a video processing method, device, system, and computer-readable storage medium, applicable to intelligent monitoring systems in the field of artificial intelligence (AI), for analyzing group motion indexes of a target area. The method comprises: acquiring position information of a target area; acquiring, from first-area video information according to the target area position information, first position information and a first group behavior index corresponding to a first image block; acquiring, from second-area video information according to the target area position information, second position information and a second group behavior index corresponding to a second image block; and fusing the first group behavior index and the second group behavior index based on the first position information and the second position information to obtain a third group behavior index of the target area. The resulting group behavior analysis has global state awareness and can be applied to scenarios such as intelligent transportation and intelligent monitoring.

Description

Video processing method, device, system and computer readable storage medium
Technical Field
The embodiments of the present application relate to the field of data processing technologies, and in particular, to a video processing method, device, system, and computer-readable storage medium.
Background
With the development of computer vision technology, group detection technology has emerged. Group detection can analyze group behaviors and identify abnormal group behavior, and has important application value in fields such as intelligent driving, intelligent transportation, and intelligent security.
In the prior art, group detection typically adopts either a single-member analysis method or a whole-scene analysis method. The single-member method analyzes the behavior of each individual in a scene and then aggregates the individuals to obtain a group behavior analysis result; however, because the individual is the basic unit of analysis, this method is inefficient and slow, and is therefore poorly suited to group analysis. The whole-scene method extracts global features from the entire scene image and performs behavior analysis on the extracted features to obtain a group behavior analysis result. However, because it analyzes a single scene image, it lacks global state awareness and is difficult to apply in practice across scenarios.
Disclosure of Invention
The embodiments of the present application provide a video processing method, device, system, and computer-readable storage medium whose group behavior analysis has global state awareness and can be applied in various scenarios.
In a first aspect, an embodiment of the present application provides a video processing method, including:
acquiring target area position information, wherein the target area position information is used for indicating a geographical position corresponding to the target area;
acquiring first position information corresponding to a first image block and a first group behavior index of the first image block from first area video information according to the target area position information, wherein the first position information is used for indicating a first geographical position corresponding to the first image block;
The first-area video information may include position information of a plurality of image blocks and a group behavior index corresponding to each image block, where the image blocks are obtained by segmenting a monitored image. From the first-area video information, first position information of a first image block belonging to the target area and the first group behavior index of that image block are acquired; that is, the coverage of the first image block falls within the target area.
acquiring second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, wherein the second position information is used for indicating a second geographical position corresponding to the second image block;
The second-area video information may include position information of a plurality of image blocks and a group behavior index corresponding to each image block, where the image blocks are obtained by segmenting a monitored image. From the second-area video information, second position information of a second image block belonging to the target area and the second group behavior index of that image block are acquired; that is, the coverage of the second image block falls within the target area. The first region and the second region may be identical, different, or partially overlapping; likewise, the first image block and the second image block may be identical, different, or partially overlapping. The first position information and the second position information are obtained based on the same division standard.
Group behavior indexes of the relevant position areas are extracted from video information of different areas, and a third group behavior index corresponding to the target area is obtained based on the first position information, the second position information, the first group behavior index, and the second group behavior index; the resulting group behavior analysis has global state awareness and can be applied in various scenarios. The third group behavior index may be a group behavior index of the target area as a whole, for example obtained from all of the image blocks; or it may comprise group behavior indexes of several partial areas within the target area, each obtained from a subset of the image blocks; or it may comprise a group behavior index corresponding to each target position.
In one possible design, the obtaining a third group behavior index corresponding to the target area based on the first location information, the second location information, the first group behavior index, and the second group behavior index includes:
and determining a group behavior index corresponding to a first target position according to the first group behavior index and determining a group behavior index corresponding to a second target position according to the second group behavior index based on first position information and second position information indicating different geographic positions.
This process amounts to stitching: based on the above operation, the group behavior indexes of the image blocks covering the target area can be selected from the same or different video information sources and stitched together, realizing global state awareness.
Based on first position information and second position information indicating the same geographic position, weighting is performed on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position; this avoids inaccurate recognition results caused by the shooting angle or sharpness of any single monitored image;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
This process amounts to fusion: the group behavior indexes of at least two image blocks corresponding to the same position in the target area can be selected from the same or different video information sources and fused, so that the index at that position is determined from multiple image blocks, avoiding the inaccuracy of any individual image block caused by its field of view or sharpness.
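To make the stitch-and-fuse step concrete, the following is a minimal sketch in Python; the record format, the function name, and the weighted-average fusion rule are illustrative assumptions rather than the patented implementation:

```python
from collections import defaultdict

def combine_indexes(block_records):
    """Stitch-and-fuse sketch. `block_records` is an iterable of
    (position_code, index, weight) tuples gathered from one or more
    regional video information sources. Indexes at different positions
    are stitched side by side; indexes at the same position are fused
    by weighted averaging."""
    acc = defaultdict(lambda: [0.0, 0.0])  # position -> [weighted sum, weight sum]
    for pos, idx, w in block_records:
        acc[pos][0] += w * idx
        acc[pos][1] += w
    # One fused index per position; the dictionary as a whole is the
    # stitched group behavior map of the target area.
    return {pos: s / ws for pos, (s, ws) in acc.items() if ws > 0}
```

For example, a position observed by two cameras is averaged according to the weights, while a position observed by only one camera passes through unchanged.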
In one possible design, the performing, based on first location information and second location information indicating a same geographic location, a weighting process based on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target location includes:
determining a weight coefficient of the first group behavior index according to an actual geographic range covered by the first image block and a preset geographic range, wherein the preset geographic range is a geographic range corresponding to the first image block in the target area; the target area comprises a plurality of preset geographic ranges, the first image block is obtained by segmenting the first image according to the plurality of preset geographic ranges and the geographic range corresponding to the first image, and the actual geographic range covered by the first image block is part or all of the preset geographic range;
determining a weight coefficient of the second group behavior index according to the actual geographic range covered by the second image block and the preset geographic range, wherein the first image block and the second image block correspond to the same preset geographic range; the second image block is obtained by segmenting the second image according to a plurality of preset geographic ranges and geographic ranges corresponding to the second image, wherein the preset geographic ranges are included in the target area, and the actual geographic range covered by the second image block is part or all of the preset geographic range;
and performing weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index to obtain a group behavior index corresponding to a third target position.
The weight coefficient may be, for example, the ratio of the actual geographic range covered by an image block to the area of the preset geographic range. After a monitored image is segmented, some image blocks are incomplete; an incomplete block receives a low weight and a complete block a high weight, which preserves the accuracy of the fusion.
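Under this area-ratio interpretation, the weight computation and the fusion of two indexes could look like the sketch below; the function names are illustrative assumptions:

```python
def weight_coefficient(covered_area, preset_area):
    """Weight of a block's group behavior index: the fraction of the
    preset geographic range actually covered by the image block. A
    complete block gets 1.0; an edge block covering half the range gets 0.5."""
    return covered_area / preset_area

def fuse_two(index1, w1, index2, w2):
    """Weighted fusion of two indexes observed for the same preset range."""
    return (w1 * index1 + w2 * index2) / (w1 + w2)
```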
In one possible design, the first image block is an image block obtained by processing a first image according to a global discrete grid, the first location information is first encoding information of a first grid in the global discrete grid corresponding to the first image block, and the first encoding information is used to indicate the first geographic location corresponding to the first grid;
the second image block is an image block obtained by processing a second image according to the global discrete grid, the second location information is second coded information of a second grid in the global discrete grid corresponding to the second image block, and the second coded information is used for indicating the second geographic location corresponding to the second grid.
The monitored image is segmented based on the monitored region and the geographic positions indicated by the grids of the global discrete grid, so that each image block corresponds to one grid cell and the geographic position indicated by an image block coincides with that of its grid cell. Monitored images captured by different monitoring devices are thus divided according to a unified spatio-temporal reference, and the resulting image blocks acquire native spatio-temporal correlation through the grid cells. Because the group behavior indexes are associated with grid cells of the global discrete grid, they can be rapidly stitched and fused through the grid to obtain the group behavior index of the target area. The data can therefore be organized, managed, and applied in a unified way, supporting wide-area multi-view group feature fusion and multi-scale aggregation analysis across various scenarios.
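The patent does not fix a particular global discrete grid scheme. As a stand-in, the sketch below encodes a position into a quadtree-style cell code whose length is the scale level, which is enough to illustrate how a single code can indicate both a geographic position and a scale level:

```python
def grid_code(lon, lat, level):
    """Quadtree-style code of the grid cell containing (lon, lat).
    Each digit refines the cell by quartering it; the code length is
    the scale level, and every code prefix identifies an ancestor cell."""
    code = ""
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    for _ in range(level):
        mid_lon, mid_lat = (west + east) / 2.0, (south + north) / 2.0
        quad = (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)
        code += str(quad)
        west, east = (mid_lon, east) if lon >= mid_lon else (west, mid_lon)
        south, north = (mid_lat, north) if lat >= mid_lat else (south, mid_lat)
    return code
```

Two image blocks from different cameras that map to the same code refer to the same ground cell, which is what makes direct stitching and fusion possible.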
In a possible design, the obtaining, according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position, and the group behavior index corresponding to the third target position, a third group behavior index corresponding to the target area includes:
obtaining a grid at a first scale level in the global discrete grid covered by the target area according to the first coded information and the second coded information, where the first coded information further indicates the scale level of the first grid and the second coded information further indicates the scale level of the second grid;
obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior index corresponding to the grid at the first scale level; the area corresponding to the grid corresponding to the second scale level is equal to the sum of the areas of the preset number of grids corresponding to the first scale level;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid of the second scale level.
This process amounts to aggregation, in which grid cells at the same scale level are merged to obtain cells at a larger scale. Aggregation realizes multi-scale recognition: during recognition, a group behavior analysis result at any scale level can be obtained according to actual requirements, so the method can be applied across various scenarios without changing devices or software.
In a possible design, the obtaining, according to the group behavior index of the target position corresponding to the grid at the first scale level, the group behavior index corresponding to the grid at the second scale level includes:
and averaging the group behavior indexes of the target positions corresponding to the grids at the first scale level to obtain the group behavior index corresponding to the grid at the second scale level. By averaging, the group behavior index corresponding to the grid at the second scale level can be obtained quickly and accurately.
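A minimal sketch of this aggregation step, reusing the quadtree-style codes assumed earlier (where the parent cell code is simply the child code with its last digit removed):

```python
from collections import defaultdict

def aggregate_to_parent(indexes):
    """Merge grid cells at one scale level into the next coarser level
    by averaging the group behavior indexes of each parent's children.
    `indexes` maps a grid cell code to its group behavior index."""
    buckets = defaultdict(list)
    for code, idx in indexes.items():
        buckets[code[:-1]].append(idx)  # parent = code without last digit
    return {parent: sum(v) / len(v) for parent, v in buckets.items()}
```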
In one possible design, the method further includes:
acquiring the first area video information from first monitoring equipment, wherein the first area video information also comprises a video image;
and acquiring the second area video information from second monitoring equipment, wherein the second area video information also comprises video images.
When the regional video information is generated by the monitoring device, the device's software can be upgraded so that it generates the regional video information itself, improving the timeliness of video processing. When the group behavior analysis result indicates an anomaly, the specific abnormal event can be obtained by playing back the monitored video, so the anomaly can be quickly located through visual display. The regional video information comprises position information of a plurality of image blocks and the group behavior index corresponding to each image block, where the image blocks are obtained by segmenting monitored images captured by the monitoring device. For example, the position information may be the encoded information of a grid cell in a global discrete grid.
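As one illustration of what such regional video information might contain, the structure below groups per-block position codes and indexes with an optional video frame for playback; the structure and field names are assumptions for illustration, not a format specified by the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class BlockRecord:
    grid_code: str             # position information of one image block
    indexes: Dict[str, float]  # e.g. {"density": 0.03, "kinetic_energy": 1.7}

@dataclass
class RegionalVideoInfo:
    device_id: str
    timestamp: float
    blocks: List[BlockRecord] = field(default_factory=list)
    frame_jpeg: Optional[bytes] = None  # monitored image, kept for playback
```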
In one possible design, the method further includes:
acquiring a first video stream and first indication information used for indicating the position and the view field information of a first monitoring device from the first monitoring device, and acquiring first area video information according to the first video stream and the first indication information;
and acquiring a second video stream and second indication information used for indicating the geographical position and the view field information of a second monitoring device from the second monitoring device, and acquiring second area video information according to the second video stream and the second indication information.
When the regional video information is generated by the video processing device, no software change is needed on the monitoring device, reducing its software requirements. The video processing device may determine the position of the monitored area from the monitoring device's position and field-of-view information; alternatively, the monitoring device determines the position of its monitored area from its own position and field-of-view information and sends that position to the video processing device.
In one possible design, the first group behavior index includes at least one of: a population density estimate, population kinetic energy, population motion direction entropy, or population distance potential energy;
the second group behavior index includes at least one of: a population density estimate, population kinetic energy, population motion direction entropy, or population distance potential energy.
Based on the population density estimate, population kinetic energy, population motion direction entropy, or population distance potential energy, ordered group motion, disordered group motion, abrupt group changes, and the like can be recognized, yielding group behavior analysis results from multiple angles and thus providing the analysis results required by various scenarios.
In one possible design, if the third group behavior index indicates a group behavior anomaly, the method further includes:
and sending prompt information to the terminal equipment, wherein the prompt information is used for indicating that the group behavior is abnormal.
The terminal device can present the results on a large visual display; when group behavior is abnormal, it can additionally raise an alarm through sound, light, or the like, so that the relevant staff can discover and handle the event immediately, achieving precise and proactive prevention and control.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
the acquisition module is used for acquiring target area position information, and the target area position information is used for indicating a geographic position corresponding to the target area;
the processing module is used for acquiring first position information corresponding to a first image block and a first group behavior index of the first image block from first area video information according to the target area position information, wherein the first position information is used for indicating a first geographical position corresponding to the first image block;
the processing module is further configured to obtain second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, where the second position information is used to indicate a second geographic position corresponding to the second image block;
the processing module is further configured to obtain a third group behavior index corresponding to the target area based on the first location information, the second location information, the first group behavior index, and the second group behavior index.
In one possible design, the processing module is specifically configured to:
determining a group behavior index corresponding to a first target position according to the first group behavior index and determining a group behavior index corresponding to a second target position according to the second group behavior index based on first position information and second position information indicating different geographic positions;
based on first position information and second position information which indicate the same geographic position, carrying out weighting processing according to the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
In one possible design, the processing module is specifically configured to: determining a weight coefficient of the first group behavior index according to an actual geographic range covered by the first image block and a preset geographic range, wherein the preset geographic range is a geographic range corresponding to the first image block in the target area;
determining a weight coefficient of the second group behavior index according to the actual geographic range covered by the second image block and the preset geographic range, wherein the first image block and the second image block correspond to the same preset geographic range;
and performing weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index to obtain a group behavior index corresponding to a third target position.
In one possible design, the first image block is an image block obtained by processing a first image according to a global discrete grid, the first location information is first encoding information of a first grid in the global discrete grid corresponding to the first image block, and the first encoding information is used to indicate the first geographic location corresponding to the first grid;
the second image block is an image block obtained by processing a second image according to the global discrete grid, the second location information is second coded information of a second grid in the global discrete grid corresponding to the second image block, and the second coded information is used for indicating the second geographic location corresponding to the second grid.
In one possible design, the processing module is specifically configured to: obtain a grid at a first scale level in the global discrete grid covered by the target area according to the first coded information and the second coded information, where the first coded information further indicates the scale level of the first grid and the second coded information further indicates the scale level of the second grid;
obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior index corresponding to the grid at the first scale level; the area corresponding to the grid corresponding to the second scale level is equal to the sum of the areas of the preset number of grids corresponding to the first scale level;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid of the second scale level.
In one possible design, the processing module is specifically configured to: and averaging the group behavior indexes of the target positions corresponding to the grids at the first scale level to obtain the group behavior indexes corresponding to the grids at the second scale level.
In one possible design, the obtaining module is further configured to:
acquiring the first area video information from first monitoring equipment;
and acquiring the second area video information from second monitoring equipment.
In one possible design, the obtaining module is further configured to: acquiring a first video stream and first indication information for indicating position and field-of-view information of a first monitoring device from the first monitoring device;
acquiring a second video stream and second indication information used for indicating the geographical position and the visual field information of a second monitoring device from the second monitoring device;
the processing module is further configured to: acquiring the first area video information according to the first video stream and the first indication information;
and acquiring the second area video information according to the second video stream and the second indication information.
In one possible design, the first group behavior index includes at least one of: a population density estimate, population kinetic energy, population motion direction entropy, or population distance potential energy;
the second group behavior index includes at least one of: a population density estimate, population kinetic energy, population motion direction entropy, or population distance potential energy.
In one possible design, the video processing apparatus further includes:
and the sending module is used for sending prompt information to terminal equipment if the third group behavior index indicates that the group behavior is abnormal, wherein the prompt information is used for indicating that the group behavior is abnormal.
In a third aspect, an embodiment of the present application provides a video processing apparatus, including: a memory for storing a computer program and a processor for calling and executing the computer program from the memory, such that the processor executes the computer program to perform the video processing method as set forth in the first aspect or various possible designs of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where a computer program is stored, and the computer program, when executed, is capable of implementing the video processing method according to the first aspect or various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, including a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a communication device in which the chip is installed performs a video processing method as described in the first aspect or various possible designs of the first aspect.
In a sixth aspect, embodiments of the present application further provide a computer program product, where the computer program product includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the video processing method according to the first aspect or various possible designs of the first aspect.
In a seventh aspect, an embodiment of the present application further provides a video processing system, where the system includes: the system comprises video processing equipment, first monitoring equipment and second monitoring equipment; wherein
The first monitoring equipment is used for acquiring video information of a first area;
the second monitoring equipment is used for acquiring video information of a second area;
the video processing apparatus is configured to perform the video processing method as set forth in the first aspect or various possible designs of the first aspect.
In an eighth aspect, an embodiment of the present application further provides a video processing system, where the system includes: the system comprises video processing equipment, first monitoring equipment and second monitoring equipment; wherein
The first monitoring device is used for sending a first video stream and first indication information to the video processing device, the first indication information is used for indicating the position and the view field information of the first monitoring device, and the first area video information is information determined according to the first video stream and the first indication information;
the second monitoring device is configured to send a second video stream and second indication information to the video processing device, where the second indication information is used to indicate position and field of view information of the second monitoring device, and the second area video information is information determined according to the second video stream and the second indication information;
the video processing apparatus is configured to perform the video processing method as set forth in the first aspect or various possible designs of the first aspect.
In the seventh or eighth aspect described above, the first and second monitoring devices are devices having a shooting function, and the first and second monitoring devices may be monitoring cameras, monitoring terminals, or the like. In a possible implementation, the first monitoring device and the second monitoring device may also be the same monitoring device.
The video processing device may be a cloud server, or may be an end-side host, such as a host in a local area network, or may be a terminal device having a processing function, or may also be a monitoring device having a processing function, and the monitoring device is in wired or wireless connection with other monitoring devices;
the first area video information comprises position information of a plurality of first image blocks and group behavior indexes corresponding to the first image blocks; the second area video information includes position information of a plurality of second image blocks and group behavior indexes corresponding to the second image blocks.
In the video processing method provided by this embodiment, target area position information indicating the geographic position of the target area is acquired; first position information and a first group behavior index of a first image block are acquired from first-area video information according to the target area position information, the first position information indicating a first geographic position corresponding to the first image block; second position information and a second group behavior index of a second image block are acquired likewise from second-area video information; and a third group behavior index corresponding to the target area is obtained based on the first position information, the second position information, the first group behavior index, and the second group behavior index. Because the group behavior index of the target area is obtained through position information, native spatio-temporal correlation is achieved and fast stitching and fusion are supported. The method therefore has global state awareness, can obtain group behavior indexes in any scenario, and provides an accurate basis for visualization, early warning, decision analysis, and information feedback in different service scenarios.
Drawings
Fig. 1 is a schematic diagram of a video processing system according to an embodiment of the present application;
fig. 2 is a flowchart of a video processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a global discrete grid according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an intelligent driving scenario system provided in an embodiment of the present application;
fig. 5 is a flowchart of a video processing method according to an embodiment of the present application;
fig. 6 is a schematic view of a monitoring scenario provided in an embodiment of the present application;
fig. 7A is a schematic view of a monitoring image segmentation of a monitoring device a according to an embodiment of the present application;
fig. 7B is a schematic diagram illustrating mapping of image blocks of a monitoring image of the monitoring device a onto a grid according to the embodiment of the present application;
fig. 8A is a schematic view of a monitoring image segmentation of a monitoring device B according to an embodiment of the present application;
fig. 8B is a schematic diagram illustrating mapping image blocks of a monitoring image of a monitoring device B to a grid according to an embodiment of the present application;
FIG. 9 is a schematic illustration of a process for stitching fusion according to an embodiment of the present application;
FIG. 10 is a schematic illustration of an aggregation process provided by an embodiment of the present application;
fig. 11 is a schematic diagram of a video processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
The network architecture and service scenarios described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit them. As those of ordinary skill in the art will appreciate, with the evolution of network architectures and the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.
Fig. 1 is a schematic diagram of a video processing system according to an embodiment of the present application. As shown in fig. 1, the video processing system provided in this embodiment includes: video processing equipment, monitoring equipment and terminal equipment. The video processing equipment can acquire group behaviors in the video, and can perform analysis, early warning, visualization and other applications according to the group behaviors aiming at service scene requirements.
The number of the monitoring devices can be one or more than one. When the number of the monitoring devices is two or more, in some scenarios, the areas photographed by each monitoring device are different, or there is overlap of the areas photographed by some monitoring devices. The monitoring equipment can be equipment for a security system, can monitor people in public places, and can also be equipment for an intelligent transportation system, and is used for monitoring traffic flow and people. The monitoring device provided in this embodiment may also be a device for monitoring group movements in other scenarios, and this embodiment is not particularly limited herein.
After the monitoring device captures a monitored image of a monitored area, it may generate regional video information from the monitored image and the position of the captured area and send that information to the video processing device; alternatively, the monitoring device sends monitoring operation information to the video processing device, where the monitoring operation information includes the monitored image and indication information indicating the device's position and field-of-view information. The video processing device then obtains the regional video information from the monitored image and the indication information.
In this embodiment, the video processing device may be a cloud server, or may be an end-side host, for example, a host in a local area network, or may be a terminal device having a processing function, or may also be a monitoring device having a processing function, where the monitoring device is connected to other monitoring devices in a wired or wireless manner. The implementation manner of the video processing device is not particularly limited in this embodiment, and as long as the video processing device has a processing function, devices capable of implementing the video processing method in the embodiment of the present application are all in the protection scope of the embodiment of the present application.
Meanwhile, the monitoring device can send the regional video information or monitoring operation information to the cloud server alone, or to both the cloud server and an end-side host, i.e. to multiple video processing devices. The embodiments of the present application do not particularly limit where the monitoring device sends the regional video information or monitoring operation information.
In a possible implementation manner, the monitoring image may be divided into a plurality of image blocks, and the geographic position corresponding to each image block and the group behavior index represented by the image block are obtained. The video region information may include the geographic location and group behavior index corresponding to each image block.
The image blocks of the monitoring images of the plurality of monitoring devices can be mapped to a geographical position, so that the monitoring images can be mapped to the same space and time. The video processing equipment can accurately acquire the group behavior index of a certain position according to the video region information with spatial relevance so as to increase the global perception capability.
To achieve awareness of the global state of the monitored area, so that the group behavior indexes can be applied in any scenario, the group behavior indexes of multiple image blocks are fused for the same geographic position and stitched for different geographic positions, based on the geographic position corresponding to each image block. This yields an accurate group behavior index for a whole area, i.e. a global group behavior index, which serves as an accurate basis for visualization, early warning, decision analysis, and information feedback in different service scenarios.
Wherein the group behavior index includes, but is not limited to, at least one of: population density estimation, population kinetic energy, population motion direction entropy or population distance potential energy. For ease of understanding, each behavior index will be described in detail below. The following acquisition manners of group density estimation, group kinetic energy, group motion direction entropy, group distance potential energy, and the like are only one exemplary manner of acquiring a group behavior, and in a specific implementation manner, the group behavior may be acquired in other manners, which is not limited in this embodiment.
Population density estimation: the population density is estimated by computing the corner density within the foreground mask. Corners within the foreground mask of each image block are detected and counted; the population density is the ratio of the number of corners to the number of foreground pixels. The corners may be Features from Accelerated Segment Test (FAST) corners. Taking FAST corners as an example, the population density estimate may be expressed by formula (1):

$$\rho = \frac{N_{FAST}}{N_{ROI}} \qquad (1)$$

where $\rho$ is the population density estimate, $N_{FAST}$ is the number of FAST corner points in the image block, and $N_{ROI}$ is the number of foreground pixels in the image block.
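As an illustrative sketch (not the patented implementation), formula (1) can be computed with OpenCV as follows; the grayscale block and the binary foreground mask are assumed to come from an upstream background-subtraction step:

```python
import cv2
import numpy as np

def population_density(block_gray, fg_mask):
    """Population density of one image block per formula (1): the ratio
    of FAST corners inside the foreground mask to foreground pixel count."""
    fast = cv2.FastFeatureDetector_create()     # FAST corner detector
    corners = fast.detect(block_gray, fg_mask)  # detect only inside the mask
    n_roi = int(np.count_nonzero(fg_mask))      # number of foreground pixels
    return len(corners) / n_roi if n_roi else 0.0
```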
Group kinetic energy: the speed and intensity of group motion are represented by motion vectors derived from the optical flow field, and the total group motion energy is represented by computing the optical flow energy of the FAST corners within the foreground mask. The group kinetic energy of each image block, i.e. the average kinetic energy of group motion, can be expressed by formula (2):

$$E = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\, m_i V_i^2 \qquad (2)$$

where $E$ is the group kinetic energy, $N$ is the total number of motion vectors, i.e. the number of FAST corners, $m_i$ is the quality weight of the $i$-th motion vector (usually set to a constant value), and $V_i$ is the velocity magnitude of the $i$-th motion vector.
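A sketch of formula (2) using sparse Lucas-Kanade optical flow at the FAST corners; velocities in pixels per frame and a constant quality weight are assumptions for illustration:

```python
import cv2
import numpy as np

def group_kinetic_energy(prev_gray, curr_gray, corners, m=1.0):
    """Average group kinetic energy of one image block per formula (2).
    `corners` is an (N, 1, 2) float32 array of FAST corner positions in
    the previous frame; `m` is the constant quality weight m_i."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   corners, None)
    ok = status.ravel() == 1
    v = (next_pts[ok] - corners[ok]).reshape(-1, 2)  # motion vectors V_i
    speed_sq = (v ** 2).sum(axis=1)                  # |V_i|^2
    return float(np.mean(0.5 * m * speed_sq)) if len(speed_sq) else 0.0
```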
Population motion direction entropy: measures the dispersion and disorder of the population's motion directions. It is extracted from optical flow vectors in three steps: direction histogram calculation, direction probability distribution calculation, and direction entropy calculation.

Direction histogram calculation: the FAST corners within the foreground mask are taken as feature points. Let the motion vector of feature point $i$ be $V_i = (v_x, v_y)$; its motion direction angle can be expressed by formula (3):

$$A_i = \arctan\left(\frac{v_y}{v_x}\right) \qquad (3)$$

where $A_i$ is the motion direction angle, i.e. the direction of the feature point's optical flow vector. The optical flow directions of the feature points are counted into a histogram with 45-degree bins; each element of the histogram is the number of feature-point optical flow vectors in the image block that fall into that direction bin, which yields the group motion direction distribution.
Direction probability distribution calculation: the distribution probability of each direction is computed from the direction histogram, expressed by formula (4):

$$P_i = \frac{h_i}{N} \qquad (4)$$

where $P_i$ is the distribution probability of direction bin $i$, $h_i$ is the number of corners in bin $i$ of the direction histogram, and $N$ is the total number of motion vectors.
Direction entropy calculation: the group motion direction entropy is computed from the direction probability distribution, expressed by formula (5):

$$O = -\sum_i P_i \log P_i \qquad (5)$$

where $O$ is the group motion direction entropy and $P_i$ is the distribution probability.
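Formulas (3) through (5) can be computed together as in the sketch below; `np.arctan2` is used instead of a plain arctangent so that all four quadrants map to distinct 45-degree bins:

```python
import numpy as np

def direction_entropy(flow_vectors):
    """Group motion direction entropy per formulas (3)-(5).
    `flow_vectors` is an (N, 2) array of (vx, vy) optical flow vectors."""
    angles = np.degrees(np.arctan2(flow_vectors[:, 1], flow_vectors[:, 0]))
    hist, _ = np.histogram(angles, bins=8, range=(-180.0, 180.0))  # 45-degree bins
    total = hist.sum()
    if total == 0:
        return 0.0
    p = hist / total               # formula (4): P_i = h_i / N
    p = p[p > 0]                   # empty bins contribute nothing (0 log 0 := 0)
    return float(-(p * np.log(p)).sum())  # formula (5)
```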
Group distance potential energy: measures the degree of aggregation among individuals in the population. The relative distance among individuals is represented by the average distance between corner points, and the group distance potential energy is extracted by computing the pairwise Euclidean distances between corner points. The group distance potential energy in an image block may be expressed by formula (6):

$$D = \frac{2}{N(N-1)}\sum_{i<j} C_{ij} \qquad (6)$$

where $C_{ij}$ is the Euclidean distance between corner points $i$ and $j$, and $N$ is the total number of corner points in the image block.
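And a sketch of formula (6) as the mean of all pairwise corner distances:

```python
import numpy as np

def distance_potential(points):
    """Group distance potential energy per formula (6): the mean pairwise
    Euclidean distance C_ij between corner points. `points` is (N, 2)."""
    n = len(points)
    if n < 2:
        return 0.0
    diff = points[:, None, :] - points[None, :, :]  # (N, N, 2) differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise distances C_ij
    iu = np.triu_indices(n, k=1)                    # each unordered pair once
    return float(dist[iu].mean())
```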
The video processing device can obtain the group behavior analysis result according to the global group behavior index and send the group behavior analysis result to the terminal device. Or when the group behavior analysis result indicates that the group behavior is abnormal, the analysis result is sent to the terminal equipment. The terminal device may be, for example, a device having a sound and/or image display function, such as a mobile phone, a vehicle-mounted terminal, a computer, and an advertisement screen, and the terminal device is not particularly limited in this embodiment. The terminal equipment can display the group behavior analysis result and give alarm information when the group behavior is abnormal.
The following describes the video processing method provided in the embodiments of the present application in detail, taking the embodiment shown in fig. 2 as an example and in conjunction with the embodiment shown in fig. 1. Fig. 2 is a flowchart of a video processing method according to an embodiment of the present application. The method is described with the video processing device of fig. 1 implemented as a cloud server as the execution body; when the video processing device is an end-side host or a terminal device, the implementation is similar and is not repeated here. As shown in fig. 2, the method includes:
s201, obtaining position information of a target area, wherein the position information of the target area is used for indicating a geographical position corresponding to the target area.
In some application scenarios, some target areas need to be monitored to obtain group behaviors in the target areas. The scene may be, for example, a traffic scene, or a security scene, and the like, and the implementation manner of the application scene is not particularly limited in this embodiment.
Taking a security scenario as an example, the target area is a key sensitive area where group events are likely to occur, such as an airport, station, dock, subway station, security checkpoint, exhibition, park, or public security checkpoint.
When the target area needs to be monitored, its position information is acquired. The target area position information can be any information indicating the geographic position corresponding to the target area, i.e. the position of the geographic range covered by the target area. For example, it may be latitude and longitude information: the latitudes and longitudes of the target area's boundary points, or the latitude and longitude of its center point together with a range value around the center. The target area position information may also be position information indicated by grid cells of a global discrete grid.
S202, acquiring first position information corresponding to a first image block and a first group behavior index of the first image block from first area video information according to the target area position information, wherein the first position information is used for indicating a first geographical position corresponding to the first image block.
S203, acquiring second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, wherein the second position information is used for indicating a second geographical position corresponding to the second image block.
After the position information of the target area is determined, image blocks of which the coverage areas belong to the target area are determined from the area video information corresponding to the monitoring equipment, so that splicing and fusion processing are performed on group behavior indexes corresponding to the image blocks.
The plurality of regional video information may correspond to one or two or more monitoring devices. That is, one monitoring device may correspond to one regional video information, or to two or more regional video information.
The regional video information may be generated by a monitoring device and the server then receives the regional video information from the monitoring device. The regional video information may also be generated by a cloud server that obtains a video stream from the monitoring device and then obtains the regional video information from the video stream.
When the regional video information is generated by the monitoring device, the device's software can be upgraded so that it generates the regional video information itself, improving the timeliness of video processing. When the regional video information is generated by the cloud server, no software change is needed on the monitoring device, reducing its software requirements. Whether the regional video information is generated by the monitoring device or by the cloud server is not specially limited in this embodiment and can be chosen according to the specific application scenario.
The process of generating the video information of the area by the monitoring equipment and the cloud server is similar, namely, the monitoring image is extracted from the video stream, then the actual geographic position of the monitoring image is determined according to the view field and the position of the monitoring equipment, and then the video information of the area is generated according to the actual geographic position. For the cloud server to generate the regional video information, the monitoring device may send the video stream and indication information for indicating the geographic location and the view field information of the monitoring device to the cloud server, where the view field information includes the view axis direction and the view field size.
In the process of generating regional video information, the position of a camera of the monitoring device can be determined according to a coordinate calibration method of the monitoring device, and a person skilled in the art can understand that the position is a three-dimensional position in a three-dimensional space, and then an actual geographic range corresponding to a video image shot by the monitoring device is determined according to the position of the camera, the direction of a visual axis and the size of a visual field.
And according to the actual geographic range corresponding to the video image, segmenting the video image to obtain at least one image block. Specifically, the actual geographic range may be divided into a plurality of small areas, each area corresponds to one image block, and then a geographic position corresponding to the image block, that is, a geographic position corresponding to the small area, is determined, where the geographic position may be a position of a center of the small area or a position corresponding to a range of the small area.
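The pixel-to-ground correspondence that this segmentation requires can be illustrated with a planar homography; all coordinates below are placeholders, and in practice the mapping would be derived from the calibrated camera position, visual-axis direction, and field of view:

```python
import cv2
import numpy as np

# Four ground control points: map coordinates (metres) and their pixels.
geo_pts = np.float32([[0, 0], [20, 0], [20, 40], [0, 40]])
img_pts = np.float32([[0, 719], [1279, 719], [1000, 300], [280, 300]])
H = cv2.getPerspectiveTransform(geo_pts, img_pts)  # ground -> image pixels

def crop_block(frame, cell_geo):
    """Crop the image block whose ground footprint is `cell_geo`,
    a (4, 2) float32 array of the grid cell's corner map coordinates."""
    px = cv2.perspectiveTransform(cell_geo.reshape(1, 4, 2), H)[0]
    x, y, w, h = cv2.boundingRect(px.astype(np.int32))
    return frame[y:y + h, x:x + w]
```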
After the image block is obtained through division, the group behavior index represented by the image block is obtained. The group behavior index includes, for example, one or more of the group density estimates, group kinetic energy, group motion direction entropy, or group distance potential energy described above. The embodiment does not particularly limit the implementation manner of the group behavior index.
After the position information of the target area is obtained, image blocks belonging to the target area, namely the image blocks with the coverage ranges being subsets of the coverage range of the target area, are obtained according to the position information of the target area, so that the group behavior indexes indicated by each image block are processed to obtain the group behavior indexes corresponding to the target area.
In this embodiment, for convenience of description, two area video information among the plurality of area video information will be described as an example. Specifically, first position information corresponding to the first image block and a first group behavior index of the first image block are acquired from the first area video information according to the target area position information, and the first position information is used for indicating a first geographical position corresponding to the first image block.
Correspondingly, second position information corresponding to the second image block and a second group behavior index of the second image block are obtained from the second-area video information according to the position information of the target area, and the second position information is used for indicating a second geographic position corresponding to the second image block.
Taking the first image block as an example, the first video area information may include position information corresponding to a plurality of image blocks and a group behavior index of each image block. According to the position information of the target area and the video information of the first area, first position information of a first image block with a coverage area belonging to the coverage area of the target area and a first group behavior index of the first image block can be acquired.
The implementation for the second image block is similar to that for the first image block, and is not described herein again.
S204, obtaining a third group behavior index corresponding to the target area based on the first position information, the second position information, the first group behavior index and the second group behavior index.
The group behavior index of each geographic position is obtained according to the group behavior index corresponding to each piece of position information.
The description continues with the first position information and the second position information as an example. When the first position information and the second position information indicate different geographic positions, the group behavior index corresponding to a first target position is determined according to the first group behavior index, and the group behavior index corresponding to a second target position is determined according to the second group behavior index, where the first target position is the position in the target area coinciding with the first geographic position, and the second target position is the position in the target area coinciding with the second geographic position.
And when the first position information and the second position information indicate the same geographic position, performing weighting processing based on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to the third target position.
In one possible implementation, the weighting coefficients may be determined according to the positions of the image blocks in the monitored image, for example, the weighting coefficients corresponding to the image blocks located at non-edge positions of the monitored image are greater than the weighting coefficients corresponding to the image blocks located at edge positions of the monitored image.
In another possible implementation manner, different monitoring devices correspond to different weighting coefficients, for example, a weighting coefficient of a first group behavior index corresponding to a first monitoring device is a, and a weighting coefficient of a second group behavior index corresponding to a second monitoring device is B.
In another possible implementation manner, a weight coefficient of a first group behavior index is determined according to an actual geographic range covered by the first image block and a first preset geographic range, where the first geographic position is a position used for indicating the first preset geographic range; determining a weight coefficient of a second group behavior index according to an actual geographic range covered by a second image block and a second preset geographic range, wherein the second geographic position is used for indicating the second preset geographic range, and the first preset geographic range and the second preset geographic range are the same preset geographic range; and performing weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index to obtain a group behavior index corresponding to the third target position.
Specifically, the target area may be divided into a plurality of preset geographic ranges, the area of each preset geographic range may be the same or different, and the video image is divided by using the preset geographic range as a reference to obtain a plurality of image blocks. The image blocks included in each video image correspond to a preset geographic range, and the actual geographic range covered by each image block is part or all of the preset geographic range. For example, the actual geographic range covered by the image block located in the middle of the video image is the whole of the preset geographic range, and the actual geographic range covered by the image block located at the edge of the video image is a part of the preset geographic range. The geographic location of each image block can be understood as the geographic location of the pre-divided preset geographic range.
Therefore, the first image block corresponds to a first preset geographic range, and the weight coefficient of the first group behavior index is determined according to the ratio of the actual geographic range covered by the first image block to the first preset geographic range; and the second image block corresponds to a second preset geographic range, and the weight coefficient of the second group behavior index is determined according to the ratio of the actual geographic range covered by the second image block to the second preset geographic range. The first preset geographic range and the second preset geographic range are the same preset geographic range.
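A minimal sketch of this weighting follows, assuming scalar indexes and taking the weight of each block as the fraction of the shared preset geographic range it actually covers; renormalizing so the two weights sum to one is an implementation choice, not something the text prescribes.

```python
def fuse_same_position(index_a, area_a, index_b, area_b, preset_area):
    """Weighted fusion of two group behavior indexes whose image blocks
    map to the same preset geographic range.

    area_a, area_b -- actual geographic area covered by each image block
    preset_area    -- area of the shared preset geographic range
    """
    w_a = area_a / preset_area  # ratio of covered range to preset range
    w_b = area_b / preset_area
    return (w_a * index_a + w_b * index_b) / (w_a + w_b)

# A block covering the whole range outweighs an edge block covering 40% of it:
print(fuse_same_position(0.8, 100.0, 0.2, 40.0, 100.0))  # ~0.63
```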
After the group behavior indexes corresponding to the target positions are obtained, a third group behavior index corresponding to the target area is obtained according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
For example, the group behavior index corresponding to each target location may be weighted or averaged to obtain a third group behavior index in the target area. The group may be a traffic flow, a crowd, or other movable group, and this embodiment is not particularly limited herein.
Taking application of this embodiment to the security field as an example: important, sensitive places prone to group incidents, such as airports, stations, docks, subway stations, security checkpoints, exhibitions, and parks, are intelligently monitored in real time, enabling video analysis and early warning of group emergencies. Group behavior analysis is performed on the monitoring and identification results, which can realize identification of ordered crowd movement, disordered crowd movement, sudden crowd changes, and the like.
Identifying ordered crowd movement: ordered crowd movement, such as group events like assemblies, parades, and attempts to storm various institutions, is identified through factors such as crowd density, motion kinetic energy, and motion direction consistency. For example, when the crowd density in the target area is greater than the threshold 0.5, the crowd average kinetic energy is greater than the threshold 0.5, and the crowd motion direction entropy is less than the threshold 0.3, it is determined that ordered crowd movement exists in the area and that group events such as assemblies and parades may be occurring.
Identifying disordered crowd movement: disordered crowd movement, such as violent crowd conflicts, stampede accidents, crowds fleeing or scattering, and loitering, is identified through factors such as crowd density, motion kinetic energy, and motion direction consistency, and a corresponding early warning is issued. For example, when the crowd density in the target area is greater than the threshold 0.5, the crowd average kinetic energy is greater than the threshold 0.5, and the crowd motion direction entropy is greater than the threshold 0.5, it is determined that disordered crowd movement exists in the area and that abnormal events such as violent conflict, trampling, or panicked scattering may be occurring.
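The two rules above reduce to threshold checks. The sketch below uses the thresholds quoted in the text on normalized indexes; the function name and return strings are illustrative.

```python
def classify_crowd(density, kinetic_energy, direction_entropy):
    """Threshold rules quoted in the text, on normalized indexes."""
    if density > 0.5 and kinetic_energy > 0.5:
        if direction_entropy < 0.3:
            return "ordered movement (possible assembly or parade)"
        if direction_entropy > 0.5:
            return "disordered movement (possible conflict or stampede)"
    return "no abnormal group movement detected"

print(classify_crowd(0.7, 0.6, 0.2))  # ordered movement
print(classify_crowd(0.7, 0.6, 0.8))  # disordered movement
```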
Those skilled in the art will understand that the group behavior analysis result may be obtained from a single monitoring image or from a plurality of temporally consecutive monitoring images. When the result is obtained from a plurality of temporally consecutive monitoring images, sudden crowd changes can be identified; for example, sudden changes in crowd movement characteristics, such as abrupt changes of direction or sudden rushing and running, can be identified through factors such as crowd motion kinetic energy, motion direction consistency, and crowd distance potential energy.
If the third group behavior index indicates that the group behavior is abnormal, prompt information may be sent to the terminal device, the prompt information being used to indicate that the group behavior is abnormal. The terminal device can present the information on a large visual display, and when the group behavior is abnormal it can also give an alarm by sound, light, or the like, so that the relevant staff can discover and handle the incident at the first moment, achieving the goals of precise and active prevention and control.
In the video processing method provided by this embodiment, target area position information indicating the geographic position corresponding to the target area is acquired; first position information corresponding to a first image block and a first group behavior index of the first image block are acquired from first area video information according to the target area position information, the first position information indicating a first geographic position corresponding to the first image block; second position information corresponding to a second image block and a second group behavior index of the second image block are acquired from second area video information according to the target area position information, the second position information indicating a second geographic position corresponding to the second image block; and a third group behavior index corresponding to the target area is obtained based on the first position information, the second position information, the first group behavior index, and the second group behavior index. Because the group behavior index of the target area is obtained through position information, preliminary spatio-temporal correlation is achieved, fast stitching and fusion can be supported, global situation awareness is attained, and group behavior indexes across scenes can be obtained, providing an accurate basis for visualization, early warning, decision analysis, and information feedback in different service scenarios.
In a possible implementation manner, the monitoring image may be divided by a global discrete grid to obtain a plurality of image blocks. Specifically, based on the global discrete grid, the monitoring images of different monitoring devices are mapped into the grid of the global discrete grid with a uniform space-time reference, so as to obtain regional video information, where the regional video information is information with fused space-time attributes, and the regional video information may include, for example, position information of each image block and a group behavior index represented by each image block.
Since the monitoring images of multiple monitoring devices can be mapped onto the same global discrete grid, that is, onto the same spatial and temporal reference, these monitoring devices can be referred to as spatio-temporally correlated monitoring devices. The cloud server can accurately acquire the group behavior at a given position from the spatially correlated monitoring devices, increasing its global awareness capability.
A global discrete grid divides earth space, through a certain subdivision model, into a multi-level system of discrete patches that are similar in area and shape, infinitely subdividable, seamless, and non-overlapping, forming a hierarchical recursive subdivision of space and a multi-scale nesting relation among the subdivision patches in earth space. This embodiment does not particularly limit the subdivision granularity or the grid size. Here, a grid refers to one cell obtained by the subdivision. Each multi-scale grid or grid volume can be assigned a unique code serving as a position identifier of a region of earth space; all information and data on the earth can fall into one or more grids and be assigned the corresponding grid codes, realizing unified organization, computation, and serving of earth big data.
Fig. 3 is a schematic diagram of a global discrete grid according to an embodiment of the present application. As shown in fig. 3, the discrete grids become smaller with successive subdivisions of the geospatial plane. The embodiment of fig. 3 uses, for example, an equal longitude-latitude quadtree subdivision: the plane of earth space is first divided into four equal parts centered on the intersection of the prime meridian and the equator, which form the level-0 grid. Each grid is then further subdivided by the quadtree method to obtain the next-level grid, and the process is repeated until the subdivision level of the required granularity is reached. During subdivision, each grid forms a mapping relation with actual geographic space and is assigned a unique grid code based on a space-filling curve. Those skilled in the art will appreciate that actual geographic spaces located on the same grid correspond to the same grid code, since they are mapped to the same grid.
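As a concrete illustration of such a subdivision, the sketch below derives a quadtree code for an equal longitude-latitude grid. The digit-string encoding is a stand-in assumption; the embodiment only requires that each grid receive a unique code, for example one based on a space-filling curve, with the scale level recoverable from the code (here, from its length).

```python
def grid_code(lon, lat, level):
    """Quadtree code for an equal longitude-latitude grid (illustrative).

    Level 0 splits the plane of earth space into four quadrants about
    the intersection of the prime meridian and the equator; each further
    level splits every grid into four. Returns a string of quadrant
    digits 0-3 whose length also reveals the scale level.
    """
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    digits = []
    for _ in range(level + 1):
        lon_mid = (lon_lo + lon_hi) / 2
        lat_mid = (lat_lo + lat_hi) / 2
        quad = (0 if lon < lon_mid else 1) + (0 if lat >= lat_mid else 2)
        digits.append(str(quad))
        lon_lo, lon_hi = (lon_lo, lon_mid) if lon < lon_mid else (lon_mid, lon_hi)
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
    return "".join(digits)

# Five-digit (level-4) code for a point in Beijing:
print(grid_code(116.39, 39.91, 4))  # "13010"
```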
In this embodiment, the monitoring images captured by different monitoring devices are segmented based on the global discrete grid to obtain image blocks, each corresponding to one grid of the global discrete grid, so that the image blocks all correspond to the same global discrete grid, that is, lie on the same spatio-temporal reference. The group behavior index represented by each image block is acquired, and the indexes are fused, stitched, aggregated, and otherwise processed to obtain the group behavior analysis result.
Thus, taking the first image block and the second image block as an example, the first image block is an image block obtained by processing the first image according to the global discrete grid, the first position information is first encoding information of the first grid in the global discrete grid corresponding to the first image block, and the first encoding information is used to indicate a first geographic position corresponding to the first grid; the second image block is an image block obtained by processing a second image according to the global discrete grid, the second location information is second coded information of a second grid in the global discrete grid corresponding to the second image block, and the second coded information is used for indicating a second geographic location corresponding to the second grid.
The video processing system provided by the embodiments of the present application can be applied to fields such as intelligent driving, intelligent transportation, and intelligent security. This embodiment takes application to intelligent driving as an example: building on intelligent monitoring of people and vehicles in public places, traffic operation states and people-flow characteristics are comprehensively sensed, traffic operation big data are collected and analyzed in real time, and traffic flow prediction and early warning, traffic accident monitoring and early warning, and intelligent scheduling are realized.
Fig. 4 is a schematic view of an intelligent driving scene system provided in an embodiment of the present application. As shown in fig. 4, the intelligent driving scene system is one possible example of the video processing system described above. The intelligent driving scene system comprises an intelligent automobile, a traffic monitoring camera, and a cloud server.
The intelligent automobile comprises an on-board computer, a display device and other sensing devices that can interact with it, a V2X device through which the automobile exchanges information with the outside, and a vehicle-mounted camera. The vehicle-mounted camera can transmit the monitoring image, and/or the area video information obtained from the monitoring image and the global discrete grid, to the cloud server via V2X.
The traffic monitoring camera is a camera which is arranged on a road or each intersection and is used for monitoring traffic conditions. The traffic monitoring camera can transmit the monitoring image and/or the regional video information obtained according to the monitoring image and the global discrete grid to the cloud server.
The cloud server analyzes the traffic environment situation by combining road network information according to data sent by the vehicle-mounted camera and the traffic monitoring camera, and transmits an analysis result back to the intelligent automobile through V2X so as to perform automatic driving control, and the display equipment of the intelligent automobile can also display the analysis result.
For examples in which the video processing system is applied to other fields or scenarios, devices may be added or removed as needed based on the system architecture shown in fig. 1, and this embodiment is not limited herein.
The following describes the video processing method provided in the embodiments of the present application in detail with reference to fig. 5, taking as an example the group behavior of vehicles in the intelligent driving scene system of the embodiment of fig. 4, with image blocks divided by a global discrete grid. Implementations for other scenarios are similar and are not described herein again.
Fig. 5 is a flowchart of a video processing method according to an embodiment of the present application. As shown in fig. 5, the method includes:
S501, obtaining target area position information, where the target area position information is used to indicate a geographic position corresponding to the target area;
S502, obtaining, from first area video information according to the target area position information, first encoding information of a first grid in a global discrete grid corresponding to a first image block and a first group behavior index of the first image block, where the first encoding information is used to indicate a first geographic position corresponding to the first grid;
S503, obtaining, from second area video information according to the target area position information, second encoding information of a second grid in the global discrete grid corresponding to a second image block and a second group behavior index of the second image block, where the second encoding information is used to indicate a second geographic position corresponding to the second grid;
the manner of obtaining the position information of the target area may refer to S201 in the embodiment of fig. 2, which is not described herein again.
In this embodiment, for convenience of description, two monitoring devices are taken as an example for description, and when there are more than two monitoring devices, the implementation manner is similar, and details of this embodiment are not described herein again. Fig. 6 is a schematic view of a monitoring scenario provided in the embodiment of the present application. As shown in fig. 6, the positions and the fields of view of the two monitoring devices are different, but there is an overlapping area in the position ranges photographed by the two monitoring devices, that is, there are some areas covered by the two monitoring devices at the same time.
For monitoring device A, the video image is segmented according to the actual geographic range of the video image captured by monitoring device A and the grids of the global discrete grid, to obtain at least one image block. For convenience of description in this embodiment, the grids of the global discrete grid are illustrated as squares.
In a specific implementation process, when the video image captured by monitoring device A is segmented, the area covered by each grid of the global discrete grid is used as the reference, and the perspective rule that nearer objects appear larger and farther objects appear smaller is taken into account, so that the size of a segmented image block is negatively correlated with the target distance. The target distance is the distance between the photographed position of the image block and the monitoring device: the closer the photographed position is to the monitoring device, the larger the segmented image block, and the farther away, the smaller. The coverage of each segmented image block is a subset of the coverage of its corresponding grid in the global discrete grid.
In a possible implementation, the grid size of the global discrete grid, that is, the granularity of the subdivision level, is determined by the accuracy required by the specific application and the monitoring range of the camera. For example, a traffic surveillance camera with a wide monitoring range uses coarser-grained grids, while a vehicle-mounted camera with a narrower monitoring range uses finer-grained grids, finally forming a multi-dimensional grid system.
After the monitoring image is divided into a plurality of image blocks according to the global discrete grid, the first group behavior index represented by each image block is acquired. In a possible implementation, an instance-segmentation deep learning algorithm is used to compute the foreground regions of interest belonging to surrounding vehicles and pedestrians in each image block, and within these foreground regions the vehicle density, vehicle average kinetic energy, pedestrian density, pedestrian average kinetic energy, pedestrian distance potential energy, and the like can be obtained through the group density estimation, group kinetic energy, group motion direction entropy, or group distance potential energy described above.
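A minimal sketch of deriving one such index per image block follows, assuming the instance-segmentation model has already produced a boolean foreground mask and that density is taken as the foreground fraction of each block; both assumptions are illustrative stand-ins for whatever definitions an implementation adopts.

```python
import numpy as np

def per_block_density(foreground_mask, block_slices):
    """For each image block, take density as the foreground fraction of
    the block reported by an instance-segmentation model.

    foreground_mask -- (H, W) boolean vehicle/pedestrian mask (assumed
                       output of a segmentation model, not shown here)
    block_slices    -- dict mapping grid codes to (row_slice, col_slice)
    """
    return {code: float(foreground_mask[rs, cs].mean())
            for code, (rs, cs) in block_slices.items()}

mask = np.zeros((90, 160), dtype=bool)
mask[30:60, 40:80] = True  # pretend the model found vehicles here
blocks = {"Code1": (slice(0, 45), slice(0, 80)),
          "Code2": (slice(45, 90), slice(80, 160))}
print(per_block_density(mask, blocks))  # {'Code1': 0.166..., 'Code2': 0.0}
```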
Fig. 7A is a schematic diagram of the segmentation of a monitoring image of monitoring device A according to an embodiment of the present application. The obtained first area video information, comprising the encoding information of the grid corresponding to each image block and the group behavior index represented by each image block, may be as shown in Table 1 below.
Table 1

Encoding information | Vehicle density | Vehicle average kinetic energy | Vehicle motion direction entropy | Pedestrian density
Code1  | ρ-A1  | E-A1  | O-A1  | ρ-A1
Code2  | ρ-A2  | E-A2  | O-A2  | ρ-A2
...    | ...   | ...   | ...   | ...
Code15 | ρ-A15 | E-A15 | O-A15 | ρ-A15
Fig. 7B is a schematic diagram of the mapping of the image blocks of the monitoring image of monitoring device A onto the grid according to an embodiment of the present application. As shown in fig. 7A, the monitoring image is divided into 11 image blocks: 4 in the 1st row, 4 in the 2nd row, and 3 in the 3rd row. In this embodiment, the grid corresponding to each image block and the grid's encoding information are shown in a simplified, schematic manner; for example, the 1st image block in the 1st row corresponds to the grid C1 (Code1) in fig. 7B, and so on, which is not described herein again. Those skilled in the art will understand that these examples are merely schematic representations for ease of understanding; precision issues are not considered in this illustration of the segmentation, whereas a specific implementation takes precision and similar issues into account for accurate segmentation.
Fig. 8A is a schematic diagram of the segmentation of a monitoring image of monitoring device B according to an embodiment of the present application. The obtained second area video information, comprising the encoding information of the grid corresponding to each image block and the group behavior index represented by each image block, may be as shown in Table 2 below.
Table 2

Encoding information | Vehicle density | Vehicle average kinetic energy | Vehicle motion direction entropy | Pedestrian density
Code7  | ρ-B7  | E-B7  | O-B7  | ρ-B7
Code8  | ρ-B8  | E-B8  | O-B8  | ρ-B8
...    | ...   | ...   | ...   | ...
Code18 | ρ-B18 | E-B18 | O-B18 | ρ-B18
Fig. 8B is a schematic diagram of the mapping of the image blocks of the monitoring image of monitoring device B onto the grid according to an embodiment of the present application. As shown in fig. 8A, the monitoring image is divided into 11 image blocks: 4 in the 1st row, 4 in the 2nd row, and 3 in the 3rd row. The grid corresponding to each image block and the grid's encoding information are again shown schematically; for example, the 1st image block in the 1st row corresponds to the grid C7 (Code7) in fig. 8B, and so on, which is not described herein again. As above, these examples are schematic, and a specific implementation takes precision issues into account for accurate segmentation.
If the regional video information is generated by the monitoring equipment, the monitoring equipment sends the regional video information to the cloud server through transmission modes such as the Internet of things. The monitoring equipment can transmit the video information of the area to the cloud server in real time.
If the area video information is generated by the cloud server, the monitoring equipment sends video streams and indication information used for indicating the position and the view field information of the monitoring equipment to the cloud server in real time, the cloud server extracts monitoring images from the video streams, and then the area video information is generated according to the indication information and the global discrete grid.
Therefore, according to the coverage range corresponding to the target area position information, first encoding information of a first grid in the global discrete grid corresponding to the first image block and a first group behavior index of the first image block are acquired from the first area video information, and second encoding information of a second grid in the global discrete grid corresponding to the second image block and a second group behavior index of the second image block are acquired from the second area video information. Wherein the geographic locations indicated by the first mesh and the second mesh are covered by the target area.
For example, a list of coding information of a mesh covered by the target region may be obtained, and then it is determined whether the coding information of the mesh in the video information of the first region belongs to the list of coding information, if so, the first coding information and the first group behavior index are obtained, and it is determined whether the coding information of the mesh in the video information of the second region belongs to the list of coding information, if so, the second coding information and the second group behavior index are obtained.
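This membership check is straightforward; the sketch below assumes area video information is represented as a mapping from grid codes to (for brevity) scalar group behavior indexes, in the spirit of Tables 1 and 2.

```python
def select_blocks(area_video_info, target_codes):
    """Pick out the entries whose grid code falls inside the target area.

    area_video_info -- dict mapping grid codes to group behavior indexes
                       (scalars here for brevity)
    target_codes    -- codes of the grids covered by the target area
    """
    target = set(target_codes)
    return {code: idx for code, idx in area_video_info.items()
            if code in target}

area_a = {"Code1": 0.8, "Code7": 0.6}
area_b = {"Code7": 0.5, "Code18": 0.4}
codes_in_target = ["Code1", "Code7"]
print(select_blocks(area_a, codes_in_target))  # {'Code1': 0.8, 'Code7': 0.6}
print(select_blocks(area_b, codes_in_target))  # {'Code7': 0.5}
```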
In this embodiment, for convenience of explanation, it is assumed that all the image blocks divided by the monitoring image captured by the first monitoring device are first image blocks meeting the requirement, and all the image blocks divided by the monitoring image captured by the second monitoring device are second image blocks meeting the requirement. The requirement means that the area covered by the image block belongs to the target area.
S504, based on first coded information and second coded information indicating different grids, determining a group behavior index corresponding to a first grid according to the first group behavior index, and determining a group behavior index corresponding to a second grid according to the second group behavior index;
S505, based on first encoding information and second encoding information indicating the same grid, performing weighting processing according to the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third grid;
Because the regions covered by different monitoring devices differ, stitching and fusion must be performed on the basis of a unified global discrete grid: grids that are repeatedly acquired are fused, and grids that are acquired only once are stitched.
Therefore, whether image blocks correspond to the same grid is determined from the encoding information of the grid onto which each image block is mapped in the global discrete grid. For example, with continued reference to figs. 7B and 8B, the encoding information Code1 (C1) corresponds to a single image block, from the monitoring image captured by monitoring device A. The encoding information Code7 (C7) corresponds to two image blocks: the first image block from the monitoring image captured by monitoring device A and the second image block from the monitoring image captured by monitoring device B. Specifically, C7, C8, C10, and C11 appear in both fig. 7B and fig. 8B, that is, each of these four grids corresponds to two image blocks.
In a specific implementation process, the cloud server can determine whether image blocks correspond to the same grid based on the area video information shown in Tables 1 and 2. For example, the encoding information Code7 exists in both Table 1 and Table 2, so two image blocks correspond to Code7; the encoding information Code1 exists in Table 1 but not in Table 2, so a single image block corresponds to Code1. The group behavior indexes represented by the image blocks corresponding to Code1 and Code7 can be as shown in Table 3. The implementation for other encoding information is similar and is not described herein again.
Table 3

Encoding information | Vehicle density | Vehicle average kinetic energy | Vehicle motion direction entropy | Pedestrian density
Code1 | ρ-A1 | E-A1 | O-A1 | ρ-A1
Code7 | ρ-A7 | E-A7 | O-A7 | ρ-A7
Code7 | ρ-B7 | E-B7 | O-B7 | ρ-B7
In a specific implementation process, the input area video information corresponding to monitoring device A and monitoring device B is stitched and fused based on the unified global discrete grid. Each group motion index of vehicles and pedestrians is fused separately: grid regions repeatedly acquired by different monitoring devices take a weighted average, while grid region data acquired only once are migrated directly onto the global discrete grid, realizing stitching of the three-dimensional dynamic environment over the complete spatio-temporal domain.
Fig. 9 is a schematic diagram of the stitching and fusion process provided in an embodiment of the present application. As shown in fig. 9, stitching can be understood as one image block's group behavior index corresponding to one grid of the global discrete grid; that is, the first group behavior index is used as the group behavior index corresponding to the first grid, and the second group behavior index as the group behavior index corresponding to the second grid.
For example, for the grids corresponding to the coded information C1, C2, C12, C13, etc., which correspond to the group behavior index represented by only one image block, the group behavior index is directly used as the group behavior index corresponding to the grid in the global discrete grid.
The fusion process may be understood as a group behavior index represented by at least two image blocks corresponding to a grid in the global discrete grid, and the group behavior index represented by the at least two image blocks is weighted to obtain the group behavior index corresponding to the grid in the global discrete grid. Based on the first coding information and the second coding information indicating the same grid, the weighting processing is performed according to the first group behavior index and the second group behavior index, so as to obtain the group behavior index corresponding to the third grid.
The weighting in this embodiment may be performed between first and second group behavior indexes of the same type; for example, the two vehicle density values corresponding to the same grid are weighted together, and the two vehicle average kinetic energy values corresponding to the same grid are weighted together. The weighting coefficient may be, for example, the ratio of the area covered by each image block to the area covered by its corresponding grid. As can be seen from figs. 7A and 8A, segmentation of the monitoring image leaves some incomplete image blocks; incomplete image blocks receive a low weight and complete image blocks a high weight, ensuring the accuracy of the fusion.
As shown in fig. 9, since the grids corresponding to the coded information C7, C8, C10, and C11 correspond to two image blocks, the first group behavior index and the second group behavior index are weighted to obtain the group behavior index corresponding to each grid.
And obtaining a third group behavior index corresponding to the target region according to the group behavior index corresponding to the first grid, the group behavior index corresponding to the second grid and the group behavior index corresponding to the third grid. For example, the group behavior indexes corresponding to the grids covered by the target region may be averaged to obtain the third group behavior index, or the group behavior indexes corresponding to the grids may be directly used as the third group behavior index without processing.
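Putting S504 and S505 together, the sketch below stitches grids acquired by one device and fuses grids acquired by both with a weighted average; fixed equal weights stand in for the area-ratio weights described above, and scalar indexes stand in for the per-type index vectors.

```python
def splice_and_fuse(info_a, info_b, weight_a=0.5, weight_b=0.5):
    """Stitch grids acquired by a single device; fuse grids acquired by
    both devices with a weighted average.

    info_a, info_b -- dicts mapping grid codes to scalar indexes
    """
    fused = {}
    for code in set(info_a) | set(info_b):
        if code in info_a and code in info_b:  # repeatedly acquired: fuse
            fused[code] = ((weight_a * info_a[code] + weight_b * info_b[code])
                           / (weight_a + weight_b))
        else:                                  # acquired once: stitch
            fused[code] = info_a.get(code, info_b.get(code))
    return fused

a = {"Code1": 0.8, "Code7": 0.6}
b = {"Code7": 0.4, "Code18": 0.3}
print(splice_and_fuse(a, b))  # Code7 fused to 0.5; Code1 and Code18 carried over
```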
In an alternative implementation, the aggregation process may also be performed, i.e., the following S506 to S508 are performed.
S506, obtaining a mesh of a first scale level in the global discrete mesh covered by the target region according to the first coding information and the second coding information, where the first coding information is further used to indicate the scale level of the first mesh, and the second coding information is further used to indicate the scale level of the second mesh;
S507, obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior index corresponding to the grid at the first scale level, where the area corresponding to the grid at the second scale level is equal to the sum of the areas of the preset number of grids at the first scale level;
S508, obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid at the second scale level.
In particular, the coded information is also used to indicate a scale level of the mesh, which may be understood as the area of the region covered by the mesh. For example, the scale level may be set in a prefix of the encoded information.
Fig. 10 is a schematic diagram of an aggregation process provided in an embodiment of the present application. As shown in figs. 9 and 10, the grids involved in the stitching and fusion process belong to the same scale level; that is, in the embodiment of the present application, grids belonging to the same scale level are stitched and fused. As shown in fig. 10, aggregation refers to merging grids at the same scale level to obtain a grid at a larger scale.
In the example shown in fig. 9, the mesh is at a first scale level, and every adjacent 4 meshes of the first scale level are aggregated to obtain a mesh at a second scale level. And the area corresponding to the grid corresponding to the second scale level is equal to the sum of the areas of the grids of the preset number corresponding to the first scale level.
In the example shown in fig. 10, grids of the same fill pattern are aggregated to obtain an aggregated grid. In the aggregation process, the group behavior indexes corresponding to at least two grids at the first scale level are averaged to obtain the group behavior index corresponding to the grid at the second scale level. The average value may also be a weighted average value, and the weighting coefficient is not particularly limited in this embodiment.
The third group behavior index corresponding to the target region may include group behavior indexes corresponding to the grids at the second scale level, or may perform weighted average on the group behavior indexes corresponding to the grids at the second scale level to obtain the third group behavior index.
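A minimal sketch of the aggregation step follows, assuming the digit-string codes illustrated earlier, under which the parent of a grid is obtained by dropping the last code digit; equal-weight averaging of the children is one of the options the text allows.

```python
def aggregate(fine_grids, parent_of):
    """Aggregate group behavior indexes from one scale level to the next
    coarser one by averaging the children of each parent grid.

    fine_grids -- dict mapping grid codes to scalar indexes
    parent_of  -- function mapping a grid code to its parent's code
    """
    sums, counts = {}, {}
    for code, value in fine_grids.items():
        parent = parent_of(code)
        sums[parent] = sums.get(parent, 0.0) + value
        counts[parent] = counts.get(parent, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

fine = {"120": 0.4, "121": 0.6, "122": 0.8, "123": 0.2}
print(aggregate(fine, lambda code: code[:-1]))  # {'12': 0.5}
```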
Those skilled in the art will understand that if S506 to S508 are not executed, the group behavior analysis processing may be performed according to the group behavior indexes corresponding to the grids at the first scale level; if S506 to S508 are executed, it may be performed according to the group behavior indexes corresponding to the grids at the second scale level.
In a specific implementation process, normalization processing may be performed on the finally obtained group behavior index corresponding to the grid at the first scale level or the group behavior index corresponding to the grid at the second scale level, and the environment situation of the grid region is determined by a threshold determination method based on the normalized data.
For example, when the vehicle density corresponding to each of a preset number of grids in a continuous grid area is greater than the threshold 0.5 and the vehicle average kinetic energy is less than the threshold 0.3, the road segment is judged to be congested. When the vehicle density corresponding to each of the preset number of grids is greater than the threshold 0.5 and the group average kinetic energy is greater than the threshold 0.3 but less than 0.5, the road segment is judged to be moving slowly; further, if the group motion direction entropy is greater than the threshold 0.5, it is judged that the segment is moving slowly with many vehicles detouring, that a traffic accident is likely, and the abnormal grid area is calibrated. When the vehicle density corresponding to each of the preset number of grids is less than the threshold 0.5, the group average kinetic energy is greater than 0.5, and the group motion direction entropy is greater than 0.5, it is judged that abnormal driving such as detouring or wrong-way driving is occurring on the segment, possibly indicating abnormal traffic events such as wrong-way driving or disordered traffic order at an intersection, and the abnormal grid area is marked. When the vehicle density corresponding to each of the preset number of grids is less than the threshold 0.5, the group average kinetic energy is greater than the threshold 0.5, and the group motion direction entropy is less than the threshold 0.5, the traffic is judged to be flowing smoothly. When the pedestrian density at an intersection is greater than the threshold 0.5, the pedestrian average kinetic energy is greater than the threshold 0.5, the pedestrian motion direction entropy is less than the threshold 0.3, and the pedestrian distance potential energy is less than the threshold 0.5, it is judged that pedestrians are crossing the road at the intersection. When the pedestrian density at the intersection is less than the threshold 0.5 but greater than the threshold 0.3 and the pedestrian motion direction entropy is greater than the threshold 0.5, it is judged that abnormal traffic events such as disordered traffic order may exist at the intersection.
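The vehicle-side rules above reduce to threshold checks in the same way as the crowd rules; the sketch below encodes a subset of them on normalized indexes, with the function name and return strings as illustrative assumptions.

```python
def classify_road(density, kinetic_energy, direction_entropy):
    """A subset of the road-segment threshold rules quoted above."""
    if density > 0.5:
        if kinetic_energy < 0.3:
            return "congested"
        if kinetic_energy < 0.5:
            return ("slow, many detours, accident likely"
                    if direction_entropy > 0.5 else "slow-moving")
    elif kinetic_energy > 0.5:
        return ("abnormal driving (detours or wrong-way)"
                if direction_entropy > 0.5 else "smooth traffic")
    return "no judgment"

print(classify_road(0.7, 0.2, 0.1))  # congested
print(classify_road(0.3, 0.6, 0.2))  # smooth traffic
```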
By constructing a global discrete grid and obtaining the group behavior index corresponding to each grid, the specific position of a group abnormal event can be located and its type predicted at the first moment it occurs; the corresponding monitoring video can be popped up quickly on a large screen or on monitoring terminal equipment, and abnormal-event alarm information can be sent, so that the relevant staff can discover and handle the event at the first moment, achieving the goals of precise and active prevention and control.
When the method is applied to the field of intelligent driving, grid-based awareness of the group situation of the global environment can be realized: traffic events are discovered in real time, early warning is provided to vehicles at the first moment, and intelligent path planning is assisted, making vehicle path planning more intelligent and proactive. Because the encoding information corresponding to each grid contains accurate geographic position information, grids with abnormal group motion indexes can be located precisely; and because a global discrete grid is used, single-map monitoring can be realized, so that the real-time road network traffic conditions and traffic situations such as accidents and potential safety hazards can be displayed visually on the vehicle-mounted terminal.
Fig. 11 is a schematic diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus 110 provided in the present embodiment includes: an acquisition module 1101 and a processing module 1102. Optionally, the video processing device 110 further includes a sending module 1103.
An obtaining module 1101, configured to obtain target area location information, where the target area location information is used to indicate a geographic location corresponding to the target area;
the processing module 1102 is configured to acquire, from first area video information according to the target area location information, first location information corresponding to a first image block and a first group behavior index of the first image block, where the first location information is used to indicate a first geographic location corresponding to the first image block;
the processing module 1102 is further configured to obtain second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, where the second position information is used to indicate a second geographic position corresponding to the second image block;
the processing module 1102 is further configured to obtain a third group behavior index corresponding to the target area based on the first location information, the second location information, the first group behavior index, and the second group behavior index.
In a possible implementation manner, the processing module 1102 is specifically configured to:
determining a group behavior index corresponding to a first target position according to the first group behavior index and determining a group behavior index corresponding to a second target position according to the second group behavior index based on first position information and second position information indicating different geographic positions;
based on first position information and second position information which indicate the same geographic position, carrying out weighting processing according to the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
In a possible implementation manner, the processing module 1102 is specifically configured to:
determining a weight coefficient of the first group behavior index according to an actual geographic range covered by the first image block and a preset geographic range, wherein the preset geographic range is a geographic range corresponding to the first image block in the target area;
determining a weight coefficient of the second group behavior index according to the actual geographic range covered by the second image block and the preset geographic range, wherein the first image block and the second image block correspond to the same preset geographic range;
and performing weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index to obtain a group behavior index corresponding to a third target position.
In a possible implementation manner, the first image block is an image block obtained by processing a first image according to a global discrete grid, the first location information is first encoding information of a first grid in the global discrete grid corresponding to the first image block, and the first encoding information is used to indicate the first geographic location corresponding to the first grid;
the second image block is an image block obtained by processing a second image according to the global discrete grid, the second location information is second coded information of a second grid in the global discrete grid corresponding to the second image block, and the second coded information is used for indicating the second geographic location corresponding to the second grid.
In a possible implementation manner, the processing module 1102 is specifically configured to:
obtaining a mesh of a first scale level in the global discrete mesh covered by the target region according to the first coded information and the second coded information, where the first coded information is further used for indicating the scale level of the first mesh, and the second coded information is further used for indicating the scale level of the second mesh;
obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior index corresponding to the grid at the first scale level; the area corresponding to the grid corresponding to the second scale level is equal to the sum of the areas of the preset number of grids corresponding to the first scale level;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid of the second scale level.
In a possible implementation manner, the processing module 1102 is specifically configured to: and averaging the group behavior indexes of the target positions corresponding to the grids at the first scale level to obtain the group behavior indexes corresponding to the grids at the second scale level.
In a possible implementation manner, the obtaining module 1101 is further configured to:
acquiring the first area video information from first monitoring equipment;
and acquiring the second area video information from second monitoring equipment.
In a possible implementation manner, the obtaining module 1101 is further configured to: acquiring a first video stream and first indication information for indicating position and field-of-view information of a first monitoring device from the first monitoring device;
acquiring a second video stream and second indication information used for indicating the geographical position and the visual field information of a second monitoring device from the second monitoring device;
the processing module 1102 is further configured to: acquiring the first area video information according to the first video stream and the first indication information;
and acquiring the second area video information according to the second video stream and the second indication information.
In one possible implementation, the first group behavior indicator includes at least one of: estimating population density, population kinetic energy, population motion direction entropy or population distance potential energy;
the second population behavior indicator comprises at least one of: population density estimation, population kinetic energy, population motion direction entropy or population distance potential energy.
In a possible implementation manner, the video processing device 110 further includes:
a sending module 1103, configured to send, if the third group behavior index indicates that the group behavior is abnormal, prompt information to a terminal device, where the prompt information is used to indicate that the group behavior is abnormal.
The video processing device provided in this embodiment may be used to execute the method described above, and the implementation principle and technical effect are similar, which is not described herein again.
Fig. 12 is a schematic diagram of a video processing apparatus according to an embodiment of the present application. As shown in fig. 12, the video processing apparatus 120 includes: a processor 1201 and a memory 1202; wherein
A memory 1202 for storing a computer program;
a processor 1201 for executing the computer program stored in the memory to implement the video processing method performed by the video processing device in the above embodiments. Reference may be made in particular to the preceding description of the method embodiments shown in fig. 2 and fig. 5.
Alternatively, the memory 1202 may be separate or integrated with the processor 1201.
When the memory 1202 is a separate device from the processor 1201, the video processing apparatus 120 may further include: a bus 1203 for connecting the memory 1202 and the processor 1201.
The video processing device 120 may further include a receiving interface 1204 and a sending interface 1205, where the receiving interface 1204 is configured to receive information such as regional video information and/or video streams sent by the monitoring device, and the sending interface 1205 is configured to send prompt information and the like to the terminal device.
In a possible implementation, when the area video information is generated by the video processing device, the obtaining module 1101 and the processing module 1102 shown in fig. 11 may be integrated in the processor 1201; when the area video information is generated by the monitoring device, the obtaining module 1101 may be integrated in the receiving interface 1204. The sending module 1103 may be integrated in the sending interface 1205.
The video processing device provided in the embodiment of the present application may be configured to execute the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Embodiments of the present application also provide a computer-readable storage medium, which includes a computer program, and the computer program is used for implementing the method executed by the above video processing device.
The embodiment of the present application further provides a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a communication device installed with the chip executes the method implemented by the above video processing device.
Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method implemented by the video processing device in the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.

Claims (24)

1. A video processing method, comprising:
acquiring target area position information, wherein the target area position information is used for indicating a geographical position corresponding to the target area;
acquiring first position information corresponding to a first image block and a first group behavior index of the first image block from first area video information according to the target area position information, wherein the first position information is used for indicating a first geographical position corresponding to the first image block;
acquiring second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, wherein the second position information is used for indicating a second geographical position corresponding to the second image block;
and obtaining a third group behavior index corresponding to the target area based on the first position information, the second position information, the first group behavior index and the second group behavior index (an illustrative fusion sketch follows the claims).
2. The method according to claim 1, wherein the obtaining a third group behavior index corresponding to the target area based on the first position information, the second position information, the first group behavior index and the second group behavior index comprises:
when the first position information and the second position information indicate different geographical positions, determining a group behavior index corresponding to a first target position according to the first group behavior index, and determining a group behavior index corresponding to a second target position according to the second group behavior index;
when the first position information and the second position information indicate a same geographical position, performing weighting processing on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position;
and obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
3. The method according to claim 2, wherein, when the first position information and the second position information indicate a same geographical position, the performing weighting processing on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position comprises:
determining a weight coefficient of the first group behavior index according to an actual geographic range covered by the first image block and a preset geographic range, wherein the preset geographic range is a geographic range corresponding to the first image block in the target area;
determining a weight coefficient of the second group behavior index according to the actual geographic range covered by the second image block and the preset geographic range, wherein the first image block and the second image block correspond to the same preset geographic range;
and performing weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index, to obtain the group behavior index corresponding to the third target position.
4. The method according to claim 2 or 3, wherein the first image block is an image block obtained by processing a first image according to a global discrete grid, and the first position information is first encoded information of a first grid, in the global discrete grid, corresponding to the first image block, the first encoded information being used for indicating the first geographical position corresponding to the first grid;
the second image block is an image block obtained by processing a second image according to the global discrete grid, the second position information is second encoded information of a second grid, in the global discrete grid, corresponding to the second image block, and the second encoded information is used for indicating the second geographical position corresponding to the second grid (a grid-encoding sketch follows the claims).
5. The method according to claim 4, wherein the obtaining a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position comprises:
obtaining grids of a first scale level, in the global discrete grid, covered by the target area according to the first encoded information and the second encoded information, wherein the first encoded information is further used for indicating the scale level of the first grid, and the second encoded information is further used for indicating the scale level of the second grid;
obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior indexes corresponding to the grids at the first scale level, wherein the area of a grid at the second scale level is equal to the sum of the areas of a preset number of grids at the first scale level;
and obtaining the third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid at the second scale level (a scale-aggregation sketch follows the claims).
6. The method according to claim 5, wherein the obtaining a group behavior index corresponding to a grid at a second scale level according to the group behavior indexes corresponding to the grids at the first scale level comprises:
averaging the group behavior indexes of the target positions corresponding to the grids at the first scale level to obtain the group behavior index corresponding to the grid at the second scale level.
7. The method according to any one of claims 1 to 6, further comprising:
acquiring the first area video information from a first monitoring device;
and acquiring the second area video information from a second monitoring device.
8. The method according to any one of claims 1 to 6, further comprising:
acquiring, from a first monitoring device, a first video stream and first indication information used for indicating the geographical position and field-of-view information of the first monitoring device, and acquiring the first area video information according to the first video stream and the first indication information;
and acquiring, from a second monitoring device, a second video stream and second indication information used for indicating the geographical position and field-of-view information of the second monitoring device, and acquiring the second area video information according to the second video stream and the second indication information (a georegistration sketch follows the claims).
9. The method according to any one of claims 1 to 8, wherein the first group behavior index comprises at least one of: a population density estimate, population kinetic energy, population motion direction entropy or population distance potential energy;
the second group behavior index comprises at least one of: a population density estimate, population kinetic energy, population motion direction entropy or population distance potential energy (an indicator sketch follows the claims).
10. The method according to any one of claims 1 to 9, wherein, if the third group behavior index indicates a group behavior anomaly, the method further comprises:
sending prompt information to a terminal device, wherein the prompt information is used for indicating that the group behavior is abnormal (an alerting sketch follows the claims).
11. A video processing apparatus, comprising:
an acquisition module, configured to acquire target area position information, wherein the target area position information is used for indicating a geographical position corresponding to a target area;
a processing module, configured to acquire first position information corresponding to a first image block and a first group behavior index of the first image block from first area video information according to the target area position information, wherein the first position information is used for indicating a first geographical position corresponding to the first image block;
the processing module is further configured to acquire second position information corresponding to a second image block and a second group behavior index of the second image block from second area video information according to the target area position information, wherein the second position information is used for indicating a second geographical position corresponding to the second image block;
the processing module is further configured to obtain a third group behavior index corresponding to the target area based on the first position information, the second position information, the first group behavior index and the second group behavior index.
12. The apparatus according to claim 11, wherein the processing module is specifically configured to:
when the first position information and the second position information indicate different geographical positions, determine a group behavior index corresponding to a first target position according to the first group behavior index, and determine a group behavior index corresponding to a second target position according to the second group behavior index;
when the first position information and the second position information indicate a same geographical position, perform weighting processing on the first group behavior index and the second group behavior index to obtain a group behavior index corresponding to a third target position;
and obtain a third group behavior index corresponding to the target area according to the group behavior index corresponding to the first target position, the group behavior index corresponding to the second target position and the group behavior index corresponding to the third target position.
13. The apparatus according to claim 12, wherein the processing module is specifically configured to:
determine a weight coefficient of the first group behavior index according to an actual geographic range covered by the first image block and a preset geographic range, wherein the preset geographic range is a geographic range corresponding to the first image block in the target area;
determine a weight coefficient of the second group behavior index according to the actual geographic range covered by the second image block and the preset geographic range, wherein the first image block and the second image block correspond to the same preset geographic range;
and perform weighting processing on the first group behavior index and the second group behavior index according to the weight coefficient of the first group behavior index and the weight coefficient of the second group behavior index, to obtain the group behavior index corresponding to the third target position.
14. The apparatus according to claim 12 or 13, wherein the first image block is an image block obtained by processing a first image according to a global discrete grid, and the first position information is first encoded information of a first grid, in the global discrete grid, corresponding to the first image block, the first encoded information being used for indicating the first geographical position corresponding to the first grid;
the second image block is an image block obtained by processing a second image according to the global discrete grid, the second position information is second encoded information of a second grid, in the global discrete grid, corresponding to the second image block, and the second encoded information is used for indicating the second geographical position corresponding to the second grid.
15. The apparatus according to claim 14, wherein the processing module is specifically configured to:
obtain grids of a first scale level, in the global discrete grid, covered by the target area according to the first encoded information and the second encoded information, wherein the first encoded information is further used for indicating the scale level of the first grid, and the second encoded information is further used for indicating the scale level of the second grid;
obtain a group behavior index corresponding to a grid at a second scale level according to the group behavior indexes corresponding to the grids at the first scale level, wherein the area of a grid at the second scale level is equal to the sum of the areas of a preset number of grids at the first scale level;
and obtain the third group behavior index corresponding to the target area according to the group behavior index corresponding to the grid at the second scale level.
16. The apparatus according to claim 15, wherein the processing module is specifically configured to:
average the group behavior indexes of the target positions corresponding to the grids at the first scale level to obtain the group behavior index corresponding to the grid at the second scale level.
17. The apparatus according to any one of claims 11 to 16, wherein the acquisition module is further configured to:
acquire the first area video information from a first monitoring device;
and acquire the second area video information from a second monitoring device.
18. The apparatus according to any one of claims 11 to 16, wherein the acquisition module is further configured to:
acquire, from a first monitoring device, a first video stream and first indication information used for indicating the geographical position and field-of-view information of the first monitoring device;
and acquire, from a second monitoring device, a second video stream and second indication information used for indicating the geographical position and field-of-view information of the second monitoring device;
the processing module is further configured to: acquire the first area video information according to the first video stream and the first indication information;
and acquire the second area video information according to the second video stream and the second indication information.
19. The apparatus according to any one of claims 11 to 18, wherein the first group behavior index comprises at least one of: a population density estimate, population kinetic energy, population motion direction entropy or population distance potential energy;
the second group behavior index comprises at least one of: a population density estimate, population kinetic energy, population motion direction entropy or population distance potential energy.
20. The apparatus according to any one of claims 11 to 19, further comprising:
a sending module, configured to send prompt information to a terminal device if the third group behavior index indicates that the group behavior is abnormal, wherein the prompt information is used for indicating that the group behavior is abnormal.
21. A video processing apparatus, comprising: a memory, configured to store a computer program; and a processor, configured to call the computer program from the memory and run it, so as to perform the video processing method according to any one of claims 1 to 10.
22. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the video processing method according to any one of claims 1 to 10.
23. A video processing system, comprising a video processing device, a first monitoring device and a second monitoring device, wherein:
the first monitoring device is configured to acquire first area video information;
the second monitoring device is configured to acquire second area video information;
and the video processing device is configured to perform the video processing method according to any one of claims 1 to 7, 9 and 10.
24. A video processing system, comprising a video processing device, a first monitoring device and a second monitoring device, wherein:
the first monitoring device is configured to send a first video stream and first indication information to the video processing device, the first indication information being used for indicating the geographical position and field-of-view information of the first monitoring device, and first area video information being information determined according to the first video stream and the first indication information;
the second monitoring device is configured to send a second video stream and second indication information to the video processing device, the second indication information being used for indicating the geographical position and field-of-view information of the second monitoring device, and second area video information being information determined according to the second video stream and the second indication information;
and the video processing device is configured to perform the video processing method according to any one of claims 1 to 6 and 8 to 10.
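The Python sketches below are editorial illustrations only: they are not part of the claims and not the patented implementation, and every identifier in them (Block, fuse, grid_code, and so on) is invented for illustration. This first sketch covers the fusion of claims 1 to 3: per-image-block group behavior indices from two cameras are keyed by grid position, indices for cells seen by only one camera pass through unchanged, and indices for the same cell are combined with weights derived from how much of the preset cell each image block actually covers (one plausible reading of claim 3's "actual geographic range" versus "preset geographic range").

```python
from dataclasses import dataclass

@dataclass
class Block:
    position: str    # grid code of the geographic cell this image block maps to
    index: float     # group behavior index of the block (e.g. a density estimate)
    coverage: float  # fraction of the preset cell actually covered by the block

def fuse(first_blocks: list[Block], second_blocks: list[Block]) -> dict[str, float]:
    """Merge per-block indices from two cameras into one index per grid cell."""
    second_by_pos = {b.position: b for b in second_blocks}
    fused: dict[str, float] = {}
    for b1 in first_blocks:
        b2 = second_by_pos.pop(b1.position, None)
        if b2 is None:
            # Different geographical positions: pass the index through (claim 2).
            fused[b1.position] = b1.index
        else:
            # Same geographical position: weight each camera's index by the share
            # of the preset cell its image block actually covers (claim 3).
            total = b1.coverage + b2.coverage
            fused[b1.position] = (b1.coverage * b1.index + b2.coverage * b2.index) / total
    for b2 in second_by_pos.values():
        fused[b2.position] = b2.index
    return fused
```

For instance, fuse([Block("0312", 0.8, 0.9)], [Block("0312", 0.6, 0.3), Block("0313", 0.4, 1.0)]) returns {"0312": 0.75, "0313": 0.4}: the better-covered view dominates the shared cell, and the cell seen by one camera only is passed through.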
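Claim 4 does not fix a particular global discrete grid. As a stand-in, the sketch below uses an equal-angle quadtree: each level splits the current cell into four, so a code string names a cell (its geographical position) and its length is the scale level, the two things claim 4 asks the encoded information to carry. A real deployment would presumably use an established global discrete grid scheme instead.

```python
def grid_code(lat: float, lon: float, level: int) -> str:
    """Encode a latitude/longitude into a quadtree cell code at a scale level."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(level):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quad = 0
        if lat >= lat_mid:           # northern half of the current cell
            quad |= 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:           # eastern half of the current cell
            quad |= 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        digits.append(str(quad))
    return "".join(digits)
```

Nearby points such as grid_code(39.9, 116.4, 8) and grid_code(39.9001, 116.4001, 8) yield the same 8-digit code, which is what lets blocks from different cameras be matched by position in the fusion sketch above.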
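Claims 5 and 6 move indices from a fine scale level to a coarser one. With the quadtree codes above, a cell's parent is its code with the last digit removed, so claim 6's averaging becomes a short group-by; that parent-child rule is a property of the stand-in grid, not something the claims prescribe.

```python
from collections import defaultdict
from statistics import mean

def coarsen(indices: dict[str, float]) -> dict[str, float]:
    """Aggregate per-cell indices one scale level up (claims 5 and 6).

    The parent's index is the average of the indices of its available
    children, matching the averaging step of claim 6.
    """
    children = defaultdict(list)
    for code, idx in indices.items():
        children[code[:-1]].append(idx)
    return {parent: mean(vals) for parent, vals in children.items()}
```

For example, coarsen({"0312": 0.75, "0313": 0.4}) returns {"031": 0.575}; applying the function repeatedly walks up the scale levels until the grid matches the extent of the target area.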
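Claim 8 derives area video information from a raw stream plus indication information about the camera's geographical position and field of view. The sketch reuses Block and grid_code from the sketches above under a deliberately crude assumption: the camera looks straight down on a square footprint spanning span_deg degrees, so each cell of a per-block index map can be georegistered by linear interpolation. A real system would project through the full camera pose and optics, and would compute per-block coverage rather than the fixed 1.0 used here.

```python
import numpy as np

def georegister_blocks(index_map: np.ndarray, cam_lat: float, cam_lon: float,
                       span_deg: float, level: int) -> list[Block]:
    """Assign a grid code to every image block of one camera frame.

    index_map holds one group behavior index per image block (H x W);
    the nadir-view, square-footprint geometry is an illustrative assumption.
    """
    h, w = index_map.shape
    blocks = []
    for i in range(h):
        for j in range(w):
            # Row 0 is the top of the frame, taken here as the northern edge.
            lat = cam_lat + span_deg * ((h - 1 - i) / max(h - 1, 1) - 0.5)
            lon = cam_lon + span_deg * (j / max(w - 1, 1) - 0.5)
            blocks.append(Block(grid_code(lat, lon, level),
                                float(index_map[i, j]),
                                1.0))  # coverage fixed for simplicity
    return blocks
```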
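Claim 9 names the candidate indices but not their formulas. The sketch computes two of them from the 2-D velocity vectors of the people tracked in one image block, using definitions common in the crowd-analysis literature (unit mass per person for kinetic energy, Shannon entropy over eight heading sectors for direction entropy); both formulas are assumptions, not the patent's.

```python
import numpy as np

def block_indicators(velocities: np.ndarray) -> dict[str, float]:
    """Compute two claim-9 indices for one image block.

    velocities has shape (N, 2): one 2-D velocity vector (e.g. in pixels
    per frame) per tracked person.
    """
    if len(velocities) == 0:
        return {"kinetic_energy": 0.0, "direction_entropy": 0.0}
    # Population kinetic energy: mean of 1/2 * |v|^2, taking unit mass per person.
    kinetic_energy = float(0.5 * (velocities ** 2).sum(axis=1).mean())
    # Motion direction entropy: Shannon entropy of headings over 8 sectors.
    headings = np.arctan2(velocities[:, 1], velocities[:, 0])
    counts, _ = np.histogram(headings, bins=8, range=(-np.pi, np.pi))
    probs = counts[counts > 0] / counts.sum()
    direction_entropy = float(-(probs * np.log2(probs)).sum())
    return {"kinetic_energy": kinetic_energy, "direction_entropy": direction_entropy}
```

Uniform motion gives near-zero entropy, while people scattering in all directions push it toward the 3-bit maximum, which is why such a term is a plausible anomaly signal for claim 10.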
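Finally, claim 10's alerting step is transport-agnostic: the threshold rule and the notify callable below are placeholders for whatever decision logic and terminal-device channel a deployment actually uses.

```python
from typing import Callable

def check_and_alert(area_index: float, threshold: float,
                    notify: Callable[[str], None]) -> bool:
    """Send prompt information when the fused index indicates an anomaly.

    The fixed threshold is an assumed stand-in for the anomaly decision,
    which the claims leave unspecified.
    """
    if area_index > threshold:
        notify("prompt: group behavior abnormal in target area")
        return True
    return False
```

Something like check_and_alert(max(fused.values()), 0.9, print) would be the last step of the pipeline formed by the sketches above.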
CN201911382234.6A (priority date 2019-12-27, filing date 2019-12-27): Video processing method, device, system and computer readable storage medium. Published as CN113051980A, status Pending.

Priority Applications (1)

Application Number: CN201911382234.6A (published as CN113051980A)
Priority Date: 2019-12-27 · Filing Date: 2019-12-27
Title: Video processing method, device, system and computer readable storage medium

Applications Claiming Priority (1)

Application Number: CN201911382234.6A (published as CN113051980A)
Priority Date: 2019-12-27 · Filing Date: 2019-12-27
Title: Video processing method, device, system and computer readable storage medium

Publications (1)

Publication Number: CN113051980A · Publication Date: 2021-06-29

Family

ID: 76507123

Family Applications (1)

Application Number: CN201911382234.6A (CN113051980A, pending)
Priority Date: 2019-12-27 · Filing Date: 2019-12-27
Title: Video processing method, device, system and computer readable storage medium

Country Status (1)

Country: CN · Publication: CN113051980A

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505713A (en) * 2021-07-16 2021-10-15 上海塞嘉电子科技有限公司 Intelligent video analysis method and system based on airport security management platform
CN113807349A (en) * 2021-09-06 2021-12-17 海南大学 Multi-view target identification method and system based on Internet of things
CN113807349B (en) * 2021-09-06 2023-06-20 海南大学 Multi-view target identification method and system based on Internet of things
WO2023161651A3 (en) * 2022-02-28 2023-11-09 Vodafone Group Services Limited Monitoring device and method of monitoring an area
CN115223102A (en) * 2022-09-08 2022-10-21 枫树谷(成都)科技有限责任公司 Real-time crowd density fusion sensing method and model based on camera cluster
CN115223102B (en) * 2022-09-08 2022-12-16 枫树谷(成都)科技有限责任公司 Real-time crowd density fusion sensing method and model based on camera cluster

Similar Documents

Publication Title
CN113051980A (en) Video processing method, device, system and computer readable storage medium
Tian et al. An automatic car accident detection method based on cooperative vehicle infrastructure systems
US9396548B2 (en) Multi-cue object detection and analysis
KR102052114B1 (en) Object change detection system for high definition electronic map upgrade and method thereof
Steenberghen et al. Spatial clustering of events on a network
JP5980148B2 (en) How to measure parking occupancy from digital camera images
JP6144656B2 (en) System and method for warning a driver that visual recognition of a pedestrian may be difficult
KR102103834B1 (en) Object change detection system for high definition electronic map upgrade and method thereof
Shirowzhan et al. Data mining for recognition of spatial distribution patterns of building heights using airborne lidar data
US20140334689A1 (en) Infrastructure assessment via imaging sources
KR20180046798A (en) Method and apparatus for real time traffic information provision
CN109785637B (en) Analysis and evaluation method and device for vehicle violation
Park et al. Computer vision–based estimation of flood depth in flooded-vehicle images
KR101937940B1 (en) Method of deciding cpted cctv position by big data
CN112950717A (en) Space calibration method and system
Zhao et al. Automated traffic surveillance system with aerial camera arrays imagery: Macroscopic data collection with vehicle tracking
CN112562005A (en) Space calibration method and system
JP2020160840A (en) Road surface defect detecting apparatus, road surface defect detecting method, road surface defect detecting program
JP2023036054A (en) Target counting method, apparatus, device, and storage medium
Hu et al. Turning traffic surveillance cameras into intelligent sensors for traffic density estimation
Zhang et al. Automated visibility field evaluation of traffic sign based on 3d Lidar point clouds
CN113743151A (en) Method and device for detecting road surface sprinkled object and storage medium
Ardö et al. Enhancements of traffic micro simulation models using video analysis
CN115982306B (en) Method and device for identifying retrograde behavior of target object
CN109145715B (en) Air-based pedestrian boundary-crossing detection method, device and system for rail transit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination