Disclosure of Invention
In order to solve the above problems, the present invention provides a cloud computing platform data storage method based on big data analysis, including:
performing segmentation processing on the video based on the action of the pedestrian in the video to obtain an action change video segment set; dividing the action change video segment set into a plurality of action change video segment sub-sets based on the change of the action of the person in the time sequence, and for the action change video segments in each action change video segment sub-set: classifying the action change video segments in the subset based on the similarity of the environmental information of any two action change video segments in the subset, and adding the action change video segments meeting preset conditions into a first video segment subset; the similarity of any two action change video segments in the first video segment subset meets a preset condition;
dividing each action change video segment in the first video segment subset into a plurality of video blocks based on an image block dividing method, wherein the method for calculating the first compression coefficient of the video blocks at the same position in all action change video segments in the first video segment subset comprises the following steps: calculating the first compression coefficient of the video blocks at that position based on the similarity of the environmental information between the video blocks at that position, and on the variation trend of that similarity relative to the similarity of the environmental information between the video blocks at other positions; and compressing the action change video segments in the first video segment subset based on the first compression coefficients;
and storing the compressed action change video segment.
Further, a single action video segment set is obtained after the video is segmented, wherein each single action video segment in the set corresponds to one video segment category, and the video segment categories include standing, walking and running; in each frame image of a single action video segment, the action category of more than a preset threshold number of pedestrians is consistent with the video segment category of that single action video segment;
the action change video segment in the action change video segment set is a video between two single action video segments when the video segment category of the single action video segment changes.
Further, the method for calculating the similarity of the two motion change video segments comprises the following steps: sequentially inputting the two motion change video segments into a neural network to obtain two environment characteristic vectors, and obtaining similarity according to the distance between the two environment characteristic vectors; wherein the environmental information refers to information other than pedestrians.
Further, after classifying the motion change video segments in one motion change video segment subset, a second video segment subset is obtained, and if the similarity of two motion change video segments in the motion change video segment subset meets a preset condition, the two motion change video segments are classified into the second video segment subset.
Further, the method for calculating the first compression coefficient comprises the following steps:
dividing each action change video segment in the first video segment subset into N video blocks, wherein the n-th video block of each action change video segment in the first video segment subset forms a video block set x_n, and n has a value range of [1, N]; computing the similarity of any two video blocks in the set x_n to obtain a similarity set y_n; calculating the mean and variance of all the similarities in the set y_n, and taking the ratio of the similarity mean to the similarity variance as the value a_n;
setting a sliding window, and sliding over the video block set x_n with each video block in the set x_n as the center; obtaining a similarity matrix based on the similarity of any two video blocks in the sliding window, and obtaining b_n based on the eigenvalues of each similarity matrix; the product of a_n and b_n is β_n;
obtaining coordinate points S_nm = (S_nij, S_mij), where S_nij represents the similarity between the n-th video block of the i-th action change video segment and the n-th video block of the j-th action change video segment in the first video segment subset, and S_mij represents the similarity between the m-th video block of the i-th action change video segment and the m-th video block of the j-th action change video segment in the first video segment subset; the video block set x_m is obtained in the same manner as the video block set x_n, m has a value range of [1, N], and m ≠ n; clustering the coordinate points to obtain a plurality of coordinate point categories, obtaining a graph structure based on the video blocks corresponding to the coordinate points in all the coordinate point categories, dividing the graph structure into sub-graph structures, and calculating the synergy degree γ_n of the n-th video block based on the weights in the sub-graph structure that includes the n-th video block;
based on a_n, β_n and γ_n, obtaining the first compression coefficient of the n-th video block in each action change video segment.
Further, the clustering specifically is:
converting each two-dimensional coordinate point (S_nij, S_mij) into a four-dimensional coordinate point (n, m, S_nij, S_mij), and clustering the four-dimensional coordinate points to obtain a plurality of coordinate point categories, wherein (n, m) is the same for all the four-dimensional coordinate points within each coordinate point category.
Further, the method for acquiring the graph structure comprises the following steps:
acquiring video blocks corresponding to coordinate points in all coordinate point categories, regarding each video block as a node, and fusing the same nodes after connecting the nodes corresponding to each coordinate point category, wherein the weight between the two connected nodes is 1; selecting two connected nodes as two original points, searching adjacent points connected with the original points, obtaining adjacent coordinate point categories of the original points and the adjacent points connected with the original points, obtaining the initial coordinate point categories of the two original points, calculating category weights between the adjacent coordinate point categories and the initial coordinate point categories, and updating the weights between the two original points based on the maximum category weights.
Further, the method for calculating the category weight includes: for two coordinate point categories, acquiring the number of action change video segments corresponding to all the coordinate points in each category, counting each repeatedly appearing action change video segment only once; the ratio of the number of action change video segments corresponding to each coordinate point category to the total number of action change video segments in the first video segment subset is the video proportion of that category; the mean of the video proportions of the two coordinate point categories is δ; the action change video segments corresponding to each coordinate point category are acquired from all the coordinate points in that category, the intersection-over-union ratio ε of the action change video segments corresponding to the two categories is calculated, and the product of δ and ε is taken as the category weight.
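The category weight computation above can be sketched as follows. This is a minimal illustration, not the invention's implementation: representing a coordinate point category by the set of IDs of its action change video segments, and the function name, are assumptions.

```python
# Hypothetical sketch of the category weight: each coordinate point category
# is represented by the set of IDs of the action change video segments its
# coordinate points correspond to (a set, so repeats count only once).

def category_weight(segments_a, segments_b, total_segments):
    """Weight between two coordinate point categories.

    segments_a, segments_b: sets of segment IDs for the two categories.
    total_segments: total action change video segments in the first subset.
    """
    # Video proportion of each category.
    ratio_a = len(segments_a) / total_segments
    ratio_b = len(segments_b) / total_segments
    delta = (ratio_a + ratio_b) / 2          # mean of the two proportions
    # Intersection-over-union of the two categories' segment sets.
    union = segments_a | segments_b
    epsilon = len(segments_a & segments_b) / len(union) if union else 0.0
    return delta * epsilon                   # the category weight
```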
Further, updating the weight between the two original points based on the maximum category weight means updating the weight between the two original points with the product of the maximum category weight and the video proportion of the initial coordinate point category.
Further, the video segments in the single action video segment set and the second video segment subset are compressed according to a preset second compression coefficient.
The invention has the beneficial effects that:
1. The method compresses the old video data stored on the cloud platform, reducing the required storage space and facilitating the storage of newly obtained data; at the same time, data with the same information are stored together, which facilitates data retrieval and traversal.
2. The invention compresses data according to the actions of pedestrians, reducing the storage of useless data while ensuring that data which influence pedestrian actions and carry important information are retained.
3. On the basis of the invention, the video compressed by the invention can be compressed again by the prior art, so the storage space can ultimately be greatly reduced; in addition, since the video data volume is reduced after the compression of the invention, the calculation amount of re-compressing the video with the prior art is also reduced.
Detailed Description
In order that those skilled in the art will better understand the present invention, the following detailed description will be given with reference to the accompanying examples.
Because the old and outdated traffic data is not important but cannot be completely deleted, the invention mainly aims to realize the compressed storage of the old and outdated data stored in the cloud.
Example (b):
in this embodiment, only video data stored in the last quarter of the cloud platform is compressed and stored, and an implementation flow of the present invention is shown in fig. 1, and the main idea is as follows:
when pedestrians have the same behaviors and actions in the collected videos, the videos contain similar information; such a video is called a single action video, and these videos need to be compressed and stored in a unified manner;
when the action of a pedestrian in the video changes, for example from standing to walking, the video in which the pedestrian's action changes is called an action change video. For action change videos, the similarity of the environment in which the pedestrians are located needs to be detected: if the environmental similarity is large, the environment plays a determining role in the change of pedestrian action, and these video sequences need to be stored together and compressed; if the similarity is small, the images or videos need to be divided into different blocks for analysis, storage and compression.
In the first step, segmentation processing is performed on the video based on the actions of pedestrians in the video to obtain a single action video segment set and an action change video segment set; specifically:
An OpenPose network is adopted to obtain the two-dimensional skeleton key points of each pedestrian in each frame image of the video; the change sequence of each pedestrian's two-dimensional skeleton key points over a section of video is obtained and input into a TCN network to obtain the action category of each pedestrian in that video sequence. Acquiring pedestrian behavior categories with a TCN network is a conventional technique, and its details are not described in the present invention.
By detecting the actions of pedestrians in the video, the action categories of different pedestrians at different moments can be obtained. For a given moment, the percentage of pedestrians performing each action category is acquired, namely the action proportion of that category; the action category with the largest action proportion, i.e. the one performed by the most pedestrians, is acquired. The present invention regards the action category with the largest action proportion as the action category of all pedestrians at that moment, referred to simply as the action category.
If, over a continuous period of time, the action proportion of the action category at each moment is greater than 0.8 and the action category is the same, the video sequence over that period is a single action video segment, and the corresponding action category is the video segment category of that single action video segment. That is, if at every moment in a period more than 80% of the pedestrians in the video sequence have the action category of walking, the video sequence is a single action video segment whose video segment category is walking.
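The segmentation rule above can be sketched as follows, assuming the per-frame, per-pedestrian action labels are already available from the OpenPose/TCN stage; all names here are hypothetical.

```python
from collections import Counter

# Illustrative sketch: frames is a list where each element is the list of
# per-pedestrian action labels detected in that frame.

def dominant_action(labels):
    """Return (category, proportion) of the most common action in one frame."""
    cat, count = Counter(labels).most_common(1)[0]
    return cat, count / len(labels)

def single_action_segments(frames, threshold=0.8):
    """Yield (start, end, category) runs where the dominant action's
    proportion exceeds the threshold and the category stays the same."""
    segments, run_start, run_cat = [], None, None
    for t, labels in enumerate(frames):
        cat, ratio = dominant_action(labels)
        ok = ratio > threshold
        if ok and cat == run_cat:
            continue                        # the current run keeps going
        if run_cat is not None:
            segments.append((run_start, t - 1, run_cat))  # close the run
        run_start, run_cat = (t, cat) if ok else (None, None)
    if run_cat is not None:
        segments.append((run_start, len(frames) - 1, run_cat))
    return segments
```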
When the video segment category of the single action video segments changes, the video sequence between the two single action video segments with different video segment categories is acquired; this video sequence is an action change video segment. When the video segment category changes from walking to running, the video segment category of the action change video segment is (walking, running).
Thus, a single action video segment set and an action change video segment set are obtained.
In the second step, the action change video segment set is divided into a plurality of action change video segment subsets based on the change of pedestrian action over the time sequence, i.e. the video segment category; the pedestrians in each action change video segment subset have the same action change, so the video segments within a subset carry the same information and need to be compressed.
Performing the following operations on the motion change video segments in each subset of motion change video segments:
a) Calculating the similarity of any two action change video segments in the subset based on the environmental information in them. Specifically, two action change video segments, made the same length by frame interpolation or frame deletion, are sequentially input into a neural network to obtain two environment feature vectors; two action change video segments with the same environmental information yield consistent environment feature vectors, while segments with different environmental information yield inconsistent vectors. The distance l between the two environment feature vectors is calculated, and e^{-l} is taken as the similarity. The environmental information refers to information other than pedestrians, and includes roads, traffic flow, signal lights, road signs, electronic billboards, roadblocks and the like.
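The similarity computation can be illustrated as follows; this is a minimal sketch in which plain lists stand in for the neural network's environment feature vectors.

```python
import math

def environment_similarity(vec_a, vec_b):
    """Similarity e^{-l} of two environment feature vectors."""
    l = math.dist(vec_a, vec_b)   # distance l between the feature vectors
    return math.exp(-l)           # identical environments give similarity 1
```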
b) Classifying the action change video segments based on the similarity to obtain a first video segment subset and a second video segment subset; specifically, if the similarity of two motion change video segments is greater than the similarity threshold, the two motion change video segments are classified into a second video segment subset, the motion change video segments included in the second video segment subset are removed from the motion change video segment subset, and the remaining motion change video segments constitute a first video segment subset.
c) Each action change video segment in the first video segment subset is divided into a plurality of video blocks according to the same division rule based on an image block dividing method; in this embodiment, each action change video segment is divided into 16 video blocks, i.e. each image in the action change video segment is divided into 16 image blocks. The division need not be into equal parts; an implementer can determine the division rule according to actual conditions. The environmental information in a video block can cause the actions of pedestrians to change. Although the similarity of any two action change video segments in the first video segment subset is not large, there may still exist video blocks with high environmental similarity; since such video blocks correspond to the same pedestrian behavior change, they induce the switching and changing of pedestrian behavior, and such video blocks should be stored together and compressed. For video blocks that do not induce a change in pedestrian behavior, the information they contain is not important, and they should be compressed to a somewhat greater degree.
The first compression coefficient of the video blocks at a given position is calculated based on the similarity of the environmental information between the video blocks at that position, and on the variation trend of that similarity relative to the similarity of the environmental information between the video blocks at other positions; the action change video segments in the first video segment subset are then compressed based on the first compression coefficients. The method for calculating the first compression coefficient of the video blocks at the same position in each action change video segment is as follows:
1) Each action change video segment in the first video segment subset is divided into N video blocks, wherein the n-th video block of each action change video segment in the first video segment subset forms a video block set x_n, and n has a value range of [1, N]; the similarity of any two video blocks in the set x_n is computed to obtain a similarity set y_n; the mean and variance of the similarities in the set y_n are calculated, and their ratio is a_n;
When the mean value of the similarity is large and the variance of the similarity is small, the result shows that the environmental similarity of the motion change video segments is large and the difference of the environmental similarities of different motion change video segments is small, which indicates that the motion induction factor of the video block is large and can determine the change of the behavior of the pedestrian; when the mean value of the similarity is relatively small and the variance of the similarity is small, it is indicated that the environmental similarity of the motion change video segments is small but the difference of the environmental similarity of different motion change video segments is not large, which indicates that the motion induction factor of the video block is small and the change of the behavior of the pedestrian cannot be determined, but in view of the fact that the difference of the environmental similarity is not large, the environment synergy is required to be combined to further determine how the video block determines the change of the behavior of the pedestrian; when the mean value of the similarity is small and the variance of the similarity is large, it is shown that the environmental similarity of the motion change video segments is small and the difference of the environmental similarity of different motion change video segments is large, which indicates that the motion induction factor of the video block is small and the change of the pedestrian behavior cannot be determined.
For example, if a video block contains a traffic light that controls pedestrian crossing and the light works normally, the environmental similarity of all action change video segments in that video block is large, i.e. the mean is large and the variance is small, and the video block independently determines the behavior change of pedestrians. If the video block contains a parking space, shared bicycles or an electronic billboard, the environmental similarity of all action change video segments in the block varies between large and small, i.e. the mean is small and the variance is large, indicating that the video block does not determine the behavior change of pedestrians. If the video block contains only part of the traffic flow information or part of the signal light information, that partial traffic flow or partial signal light alone cannot fully determine pedestrian behavior; but because pedestrian behavior changes have some correlation, the environmental similarities of all action change video segments in the block differ in magnitude while fluctuating little, i.e. the mean is relatively small and the variance is small, and whether the video block determines the behavior change of pedestrians can only be calculated by combining its partial traffic flow with the traffic flow of other video blocks in cooperation.
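A minimal sketch of computing a_n, under the assumption that video block similarity is, as above, e^{-l} over environment feature vectors; the vectors and function names here are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

def a_coefficient(x_n):
    """x_n: environment feature vectors of the n-th video block, one per
    action change video segment. Returns mean / variance of the pairwise
    similarity set y_n (assumes the similarities are not all identical,
    so the variance is nonzero)."""
    sims = [math.exp(-math.dist(u, v)) for u, v in combinations(x_n, 2)]
    return mean(sims) / pvariance(sims)
```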
2) A sliding window of size 1 × 11 is set, and the window is slid over the set x_n with each video block in the set x_n as the center; a similarity matrix is obtained based on the similarity of any two video blocks in the sliding window. Specifically, the similarity of any two video blocks is calculated, and a similarity is set to 0 when it is smaller than a certain threshold; all the similarities form a similarity matrix whose diagonal elements are zero. The eigenvalues of the Laplacian matrix of each similarity matrix are obtained; based on all the eigenvalues of the Laplacian matrix corresponding to each similarity matrix, the variance of the eigenvalues is calculated, and the induction value e^{-c} is computed from the sum c of the eigenvalue variance and the maximum eigenvalue. The larger the induction value, the more uniform the similarity change of the video blocks within the sliding window; the smaller the value, the more drastic the similarity change, in which case the video blocks within the sliding window interfere more with the calculation of the action induction factor and are less important to it. One induction value is obtained each time the window slides, so a plurality of induction values are obtained after the sliding is finished; these induction values are normalized, and the mean of the Top-D induction values is b_n. The product of a_n and b_n is β_n, the action induction factor of the n-th video block; β_n needs to be normalized. An implementer can determine the value of D according to actual conditions; in this embodiment, D is one tenth of the total number of action change video segments in the first video segment subset.
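The induction value of step 2) can be sketched as follows. Two details are assumptions, since the original does not specify them: the similarity threshold value, and the use of the standard unnormalized graph Laplacian.

```python
import numpy as np

def induction_value(sim_matrix, threshold=0.1):
    """e^{-c} for one window position, where c is the eigenvalue variance
    plus the maximum eigenvalue of the Laplacian of the window's
    similarity matrix (threshold value is an assumed parameter)."""
    S = np.array(sim_matrix, dtype=float)
    S[S < threshold] = 0.0            # similarities below the threshold -> 0
    np.fill_diagonal(S, 0.0)          # diagonal elements are zero
    L = np.diag(S.sum(axis=1)) - S    # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)   # Laplacian is symmetric
    c = eigvals.var() + eigvals.max()
    return float(np.exp(-c))

def b_coefficient(induction_values, top_d):
    """Mean of the Top-D induction values after normalization."""
    v = np.array(induction_values, dtype=float)
    v = v / v.sum()                           # normalize the values
    return float(np.sort(v)[-top_d:].mean())  # mean of the D largest
```

β_n is then the product of a_n and this b_n.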
3) Coordinate points S_nm = (S_nij, S_mij) are obtained, where S_nij is the similarity between the n-th video block of the i-th action change video segment and the n-th video block of the j-th action change video segment in the first video segment subset, and S_mij is the similarity between the m-th video block of the i-th action change video segment and the m-th video block of the j-th action change video segment; the video block set x_m is obtained in the same manner as the video block set x_n, m has a value range of [1, N], and m ≠ n. The coordinate points are clustered to obtain a plurality of coordinate point categories: specifically, each two-dimensional coordinate point (S_nij, S_mij) is converted into a four-dimensional coordinate point (n, m, S_nij, S_mij), and mean shift clustering is carried out on the four-dimensional coordinate points to obtain a plurality of coordinate point categories, wherein (n, m) is the same for all the four-dimensional coordinate points within each category. The number of action change video segments corresponding to all the coordinate points in each coordinate point category is acquired, counting each repeatedly appearing action change video segment only once; the ratio of this number to the total number of action change video segments in the first video segment subset is the video proportion of that category. Coordinate point categories whose video proportion is smaller than a preset video proportion threshold are deleted;
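A sketch of this clustering step, under the reading that fixing (n, m) within each category is equivalent to grouping the points by (n, m) and mean-shifting the remaining two similarity dimensions. The flat-kernel mean shift below is a stand-in for a library implementation, and the bandwidth is an assumed parameter.

```python
import math
from collections import defaultdict

def mean_shift(points, bandwidth=0.2, iters=30):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = [list(p) for p in points]
    for _ in range(iters):
        for i, m in enumerate(modes):
            # shift each mode to the mean of the points within the bandwidth
            neigh = [p for p in points if math.dist(p, m) <= bandwidth]
            modes[i] = [sum(c) / len(neigh) for c in zip(*neigh)]
    labels, centers = [], []
    for m in modes:  # merge modes that converged to the same location
        for k, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def cluster_categories(coord_points):
    """coord_points: dict mapping (n, m) -> list of (S_nij, S_mij) points.
    Returns the coordinate point categories as lists of 4-D points."""
    categories = []
    for (n, m), pts in coord_points.items():
        labels = mean_shift(pts)
        groups = defaultdict(list)
        for p, lab in zip(pts, labels):
            groups[lab].append((n, m, *p))   # lift back to four dimensions
        categories.extend(groups.values())
    return categories
```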
Obtaining a graph structure based on the video blocks corresponding to the coordinate points in all coordinate point categories: the video blocks corresponding to the coordinate points in each coordinate point category are acquired, considering only that a video block is the n-th video block and not which action change video segment it belongs to, so each coordinate point category corresponds to two video blocks, namely the n-th and the m-th. Each video block is regarded as a node; the nodes corresponding to each coordinate point category are connected and identical nodes are then fused, with a weight of 1 between two connected nodes. Two connected nodes are selected as two original points; the adjacent points connected to the original points are searched, the adjacent coordinate point categories of the original points and their adjacent points are obtained, and the initial coordinate point categories of the two original points are obtained; the category weights between the adjacent coordinate point categories and the initial coordinate point categories are calculated, and the weight between the two original points is updated with the product of the maximum category weight and the video proportion of the initial coordinate point category. The category weight is calculated as follows: the mean of the video proportions of the two coordinate point categories is δ; the action change video segments corresponding to each coordinate point category are acquired from all the coordinate points in that category, and the intersection-over-union ratio ε of the action change video segments corresponding to the two categories is calculated. The larger δ is, the more action change video segments correspond to the two coordinate point categories and the more similar the two categories are; the larger ε is, the greater the coincidence of the action change video segments corresponding to the two categories and the more similar they are. Therefore, the product of δ and ε is taken as the category weight.
If the two original points have multiple initial coordinate point categories, the category weight between each initial coordinate point category and the adjacent coordinate point categories is calculated, the largest of these category weights is selected, the video proportion of each initial coordinate point category is obtained, and the weight between the two original points is updated with the product of the largest category weight and the largest video proportion.
Connecting edges whose weights are smaller than a preset weight threshold are disconnected, dividing the graph structure into sub-graph structures, and the synergy degree γ_n of the n-th video block is calculated based on the weights in the sub-graph structure that includes the n-th video block; specifically, γ_n is the mean of the weights of the connecting edges attached to the n-th video block in that sub-graph structure.
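The graph bookkeeping and γ_n can be sketched as follows. The edge-weight updating via category weights is omitted for brevity (weights are set directly in the test), and all names are hypothetical.

```python
from collections import defaultdict

# Hedged sketch: each coordinate point category links its n-th and m-th
# video blocks; identical blocks fuse into one node and edges start at
# weight 1. After edges below the weight threshold are cut, γ_n is the
# mean weight of the edges still attached to block n in its sub-graph.

def build_graph(categories):
    """categories: iterable of (n, m) block-index pairs, one per category."""
    weights = defaultdict(float)
    for n, m in categories:
        weights[frozenset((n, m))] = 1.0   # initial weight between nodes
    return weights

def synergy(weights, n, threshold=0.0):
    """γ_n: mean weight of edges incident to block n above the threshold."""
    incident = [w for edge, w in weights.items() if n in edge and w > threshold]
    return sum(incident) / len(incident) if incident else 0.0  # lone block -> 0
```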
The beneficial effect of introducing the video block synergy degree is as follows: the larger the synergy degree of a video block, the higher its dependency on other video blocks, and the compression degree of a video block with strong dependency cannot be too large. If a sub-graph structure includes only one video block, the synergy degree of that video block is 0.
4) Based on a_n, β_n and γ_n, the first compression coefficient of the n-th video block in each action change video segment, i.e. the video block at the same position in each action change video segment, is calculated:
wherein g_n represents the compression degree of the n-th video block, θ is a hyperparameter, and 0 ≤ θa_n ≤ 1; θa_n is the reference coefficient of the action induction factor, and the larger θa_n is, the less important the synergy degree of the video block and the more important the action induction factor. For video blocks with a large action induction factor or synergy degree, the compression degree is small relative to other video blocks. The first compression coefficient k_1n of the n-th video block is:
σ is a hyperparameter; in this embodiment, the value of σ is 0.5.
The number of image frames to be deleted when the n-th video block is compressed is the product of k_1n and the total number of image frames in the first video segment subset; the image frames to delete are selected in the same manner as in the methods described below for compressing a single action video segment. It should be noted that the proportion coefficient of each video block in an action change video segment is the proportion coefficient of that action change video segment, which is obtained as the mean of the proportion coefficients of the two single action video segments immediately before and after the action change video segment in time sequence.
d) The video segments in the single action video segment set and the second video segment subset are compressed according to a preset second compression coefficient k_2; in this embodiment, k_2 is 0.4. Specifically, the total number of image frames in the single action video segment set is obtained, and the number of frames K to be deleted from the single action video segment set is calculated by multiplying the second compression coefficient by the total number of image frames.
In one embodiment, the K frames of images to be deleted are selected as follows: the action proportion of the action category at each moment in a single action video segment is obtained, and the mean of the action proportions over all moments is taken as the proportion coefficient; the proportion coefficient of each single action video segment in the single action video segment set is obtained, and the sampling probability of the proportion coefficient corresponding to each single action video segment is calculated from all the proportion coefficients. Let the proportion coefficient of the p-th single action video segment in the set be z_p; its sampling probability is z_p divided by the sum of the proportion coefficients of all the single action video segments, where P is the total number of single action video segments in the set. The number of frame images to be randomly selected and deleted from the p-th single action video segment is the product of its sampling probability and K; each single action video segment in the set computes its number of frames to delete according to this method.
In another embodiment, the K frames of images to be deleted are selected as follows: the sampling probability of the proportion coefficient of each single action video segment in the single action video segment set is obtained, and a probability distribution is formed from the sampling probabilities of all the proportion coefficients. An integer q, with a value range of [1, P], is randomly generated by a random sampling method based on this probability distribution, and one frame image in the q-th single action video segment is randomly deleted; this process of randomly generating an integer is repeated K times, so that K random integers q are obtained and one frame is randomly deleted from the q-th single action video segment each time, finally deleting K frame images. Randomly generating an integer by sampling from a probability distribution is well known, and its description is omitted in the present invention.
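This second embodiment's frame-deletion draw can be sketched with the standard library; the function name is hypothetical, and `random.choices` supplies the weighted sampling from the distribution.

```python
import random

def frames_to_delete(proportion_coefficients, K, rng=random):
    """Draw K segment indices q (1-based) from the sampling-probability
    distribution; each draw deletes one random frame from segment q.
    Returns a tally of deletions per segment index."""
    total = sum(proportion_coefficients)
    probs = [z / total for z in proportion_coefficients]  # sampling probabilities
    picks = rng.choices(range(1, len(proportion_coefficients) + 1),
                        weights=probs, k=K)
    counts = {}
    for q in picks:                  # tally the deletions per segment
        counts[q] = counts.get(q, 0) + 1
    return counts
```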
In this way, the compression of the video segments in the single action video segment set and the action change video segment set is completed.
The compressed single action video segments and action change video segments are stored: the compressed single action video segments of the same action category are stored together, which facilitates data retrieval and traversal; the compressed video segments in the second video segment subset are stored together; and the action change video segments in the first video segment subset are stored in blocks, i.e. the n-th video block of each action change video segment is stored together.
The invention compresses and stores only video data with the same information and important information. An implementer can further compress the video data on this basis using a conventional, general-purpose compression method, for example one that compresses static object information in the video, thereby further saving storage space.
The above description is intended to provide those skilled in the art with a better understanding of the present invention, and is not intended to limit the present invention to the particular embodiments shown and described, since various modifications and changes can be made without departing from the spirit and scope of the present invention.