CN114143551A

CN114143551A - Video safe and efficient transmission system applied to video sensor network

Info

Publication number: CN114143551A
Application number: CN202111515868.1A
Authority: CN
Inventors: 李丽香; 杨子航; 彭海朋; 仝丰华; 张嘉轩; 李思睿; 王兰兰
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2022-03-04
Anticipated expiration: 2041-12-01
Also published as: CN114143551B

Abstract

The invention discloses a secure and efficient video transmission system for video sensor networks. At the encoding end, adaptive sampling-rate selection and an adaptive non-key-frame sampling algorithm are used, which avoids excessive fluctuation in recovery quality between adjacent non-key frames; an encryption algorithm based on video frame blocks reduces computational complexity and shortens encryption and decryption time; the chaotic sequence generated by a chaotic system serves as the key for both the measurement matrix and the encryption algorithm, improving the security of the transmission scheme. At the decoding end, a video frame recovery network based on block compressed sensing and deep learning is used: once the cloud server has received the small number of uncompressed video frames sent first, the reconstruction unit can immediately begin fast training, and after training it can recover the video frames received subsequently. Finally, a parallel recovery algorithm is used during recovery, so that, given sufficient computing resources, recovery speed is greatly increased and real-time video decoding delay is greatly reduced.

Description

Video safe and efficient transmission system applied to video sensor network

Technical Field

The invention relates to the technical field of image processing, in particular to a video safe and efficient transmission system applied to a video sensor network.

Background

A Video Sensor Network (VSN) is a distributed sensing network consisting of a set of video sensor nodes with computing, storage, and communication capabilities. The video sensor on each node senses image and video information from the surrounding environment; the data are transmitted to an information aggregation node by multi-hop relay, and the aggregation node analyzes the monitoring data to achieve comprehensive and effective environment monitoring. In other words, a video sensor network is a network system self-organized over wireless links by a number of nodes capable of capturing video images, and it is used to sense, collect, and process information about target objects within the network's coverage area.

In conventional video streaming, well-known codec standards include H.261, H.263, H.264/AVC, and H.265/HEVC of the International Telecommunication Union, M-JPEG of the Joint Photographic Experts Group, and the MPEG series of standards of the Moving Picture Experts Group of the International Organization for Standardization. These standards work by removing the large amount of redundant information present in video image data, such as spatial, temporal, coding, and visual redundancy, thereby reducing the correlation between data; the compression techniques used include intra-frame compression, inter-frame compression, and entropy coding. Their advantage is that, as the terminal captures the original video, it is efficiently encoded directly through operations such as transform, quantization, entropy coding, motion estimation, and motion compensation, producing a video stream for transmission.

However, their disadvantages are also evident when resources at the encoding end are limited, as with low-power, low-memory mobile wireless terminals. First, the encoding end must expend a great deal of energy on very complex encoding of the original video and therefore occupies substantial resources; the computational complexity of the encoding end is usually 5 to 10 times that of the decoding end, which is clearly unfriendly to a resource-limited encoder. In addition, because the protocols are open, data are easily stolen or tampered with during transmission, and errors propagate during decoding.

Currently, common video encryption algorithms include naive (full) encryption and selective encryption. Naive encryption encrypts the entire video stream with a standard cipher (such as DES) and makes no use of the special structure of the compressed video data stream. Selective encryption, which encrypts only part of each frame, has lower computational complexity than full encryption, but its security cannot be guaranteed.

Conventional video transmission systems have the following problems in a Video Sensor Network (VSN): (1) a VSN acquires rich image and video information in complex formats, and the coding efficiency of the encoding end in a VSN is low; (2) the media information in a VSN jointly serves a scene-monitoring task, and video transmission in a VSN has security problems; (3) the introduction of high-volume media such as images and video demands significantly greater capability from the video sensor nodes and the network (in acquisition, processing, storage, transceiving, energy supply, and so on), and the bandwidth resources of the sensor network must increase correspondingly to meet the requirement of real-time media transmission. However, the conventional compressed sensing reconstruction process is slow and struggles to match the data volume and response speed required by a VSN.

Disclosure of Invention

The invention aims at the problems and provides a video secure and efficient transmission system (SFE-VTS) applied to a video sensor network.

In order to achieve the above purpose, the invention provides the following technical scheme:

The secure and efficient video transmission system comprises a video encoding end and a video decoding end. The video encoding end is deployed on the sensor nodes of a VSN and compresses and encrypts the original video. The video decoding end is deployed in the cloud, where a cloud computing module decrypts and reconstructs the received video signal. Encryption and decryption are based on a chaotic system and a compressed sensing algorithm operating on video frame blocks, and video reconstruction is based on a deep-learning parallel video frame recovery algorithm.

Further, the process of video compression at the video encoding end comprises the steps of performing key frame and non-key frame selection, adaptive non-key frame block selection, adaptive sampling rate selection and chaos-based video frame block compression on the original video.

Further, the video compression process specifically includes:

s101, dividing a video into a plurality of video frame groups, and dividing all original video frames in each video frame group into a key frame I and a plurality of non-key frames P;

S102, sampling the key frame I based on block compressed sensing: the key frame is divided, from left to right and from top to bottom, into l blocks of the same size n × n, and each n × n block B_i is then sampled with a measurement matrix Φ to obtain a measurement value Y_i, where the measurement matrix Φ is a random sequence generated by the chaotic system;

S103, sampling the non-key frames P based on block compressed sensing: first, it is judged whether the RMSE between frame P_i and the I frame is greater than a threshold η_I; if so, the last frame of the video frame group is P_{i-1}. Then the root mean square error (RMSE) of each block B_j between frame P_{i-1} and frame P_i is calculated; if the RMSE of block B_j exceeds a threshold η_P, block B_j is discarded, and finally the numbers of the encoded image blocks are transmitted as the key Key₄.

Further, in step S101, the first frame of each video frame group is set as the key frame I and the remaining video frames are non-key frames P, giving a video frame group {I, P₁, P₂, …, P_n}.

Further, the video encoding end carries out the encryption process of the video signal, including the generation of the chaotic measurement matrix, the encryption of the measurement value and the secure transmission of the system key.

Further, the video signal encryption process specifically includes:

S201, the video encoding end and the video decoding end agree on the definition of the chaotic system, and the encoding end transmits the initial parameters of the chaotic system as Key₁;

S202, dividing the ith video frame into L image blocks B of size n × n, and performing inter-block scrambling on the sequence S of image blocks to obtain S', wherein the vector A that controls the scrambling of the image block sequence is generated by a chaotic system module;

S203, performing intra-block diffusion on the image blocks B' obtained after the sequence scrambling.

Further, the process of decrypting the video signal by the video decoding end comprises the steps of decrypting the system key and decrypting the encrypted signal.

Further, the process of decrypting the video signal specifically includes:

S301, the initial parameters of the chaotic systems used to generate the vector A, which controls the image block sequence scrambling, and the encryption vector C are sent to the decoding end as the keys Key₂ and Key₃ respectively;

S302, when the decoding end receives the keys and the encrypted video frame data, the chaotic system module of the decoding end obtains the initial parameters from Key₂ and Key₃ respectively and generates the sequences A and C;

S303, the decryption module first applies the inverse of intra-block diffusion to the encrypted data E according to the encryption parameters in the sequences A and C; it then applies inverse inter-block scrambling to the result, and finally outputs the decrypted measurement result Y'.

Further, the process of video recovery at the video decoding end comprises video frame reconstruction based on deep compressed sensing, adaptive non-key frame recovery, and video frame combination.

Further, the video recovery process specifically includes:

s401, training a neural network, wherein a training set consists of dozens or hundreds of video frames sent to a decoding end by a video coding end in the equipment initialization process;

S402, the reconstruction unit R uses the key Key₁ to obtain the measurement matrix Φ from the chaotic system module, and trains convolutional neural networks for different compression ratios r;

S403, the reconstruction units {R₁, R₂, …, R_l} of the key frame recovery module recover, in parallel and according to the measurement matrix Φ and the compression ratio r, the image blocks {Y'_i1, Y'_i2, …, Y'_il} of the same key frame I, obtaining the reconstructed image blocks {B'_i1, B'_i2, …, B'_il};

S404, splicing all recovered image blocks of the video frame in order of their numbers to obtain a complete decoded key frame I'; meanwhile, the image block selection module uses the key Key₄, i.e. the sequence of unsampled image blocks, to extract from I' the image blocks missing from the non-key frame P_i, which serve as one of the inputs of the non-key frame recovery module;

S405, the non-key frame recovery module combines the recovered image blocks from the image block selection module and the non-key frame deep recovery module to form a complete non-key frame P_i'.

Compared with the prior art, the invention has the beneficial effects that:

(1) at the encoding end, because the key frames and the non-key frames of a group of video frames have great information redundancy, in order to reduce the actual sampling rate of data, the invention provides a new sampling rate self-adaptive selection and non-key frame self-adaptive sampling algorithm. Meanwhile, the algorithm can prevent the problem of overlarge recovery quality fluctuation between adjacent non-key frames.

(2) Compared with the encryption and decryption operation on the whole observation matrix, the encryption and decryption algorithm based on the video frame block can reduce the calculation complexity and shorten the encryption and decryption time. Meanwhile, the chaos sequence generated by the chaos system is used as a key of a measurement matrix and an encryption algorithm to improve the safety of a transmission scheme.

(3) At the decoding end, the invention provides a new video frame recovery network based on Block Compressed Sensing (BCS) and deep learning. During training, the neural network algorithm can adaptively search for a proper step length, so that the convergence speed is accelerated, and the training is quickly completed. When the cloud server receives a small amount of uncompressed video frames sent earliest, the reconstruction unit can immediately carry out rapid training, and the video frames received after the training can be recovered by the reconstruction unit after the training is finished. Meanwhile, in the recovery process, a parallel recovery algorithm is provided, so that the recovery speed can be greatly increased and the real-time video decoding delay can be greatly reduced under the condition that the computing resources are sufficient.

In summary, based on deep compressed sensing, the secure and efficient video transmission system for video sensor networks provided by the invention offers a video sampling algorithm, a fast encryption and decryption algorithm for the sampled data, and a deep-learning-based fast video recovery algorithm. In addition, because the invention reconstructs the signal with a deep learning algorithm, both the quality and the speed of video recovery are better than those of video transmission schemes based on conventional compressed sensing.

Drawings

In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.

Fig. 1 is a flowchart of a video secure and efficient transmission system applied to a video sensor network according to an embodiment of the present invention.

Detailed Description

For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the SFE-VTS is mainly divided into two parts, namely a video encoding end and a video decoding end. Wherein, the video coding part is to be disposed on the sensor node of the VSN, and has the functions of compressing and encrypting the original video frame. The decoding part can be deployed at the cloud end, and the received video signal is decrypted and reconstructed by utilizing sufficient computing power of the cloud end.

The system provided by the invention operates in four steps. The first is video compression, which comprises key frame and non-key frame selection, adaptive non-key frame block selection, adaptive sampling rate selection, and chaos-based video frame block compression of the original video. The second is encryption of the video signal, which comprises generation of the chaotic measurement matrix, encryption of the measured values, and secure transmission of the system keys. The third is decryption of the received signal, which comprises decryption of the system keys and decryption of the encrypted signal. The fourth is video recovery, which comprises video frame reconstruction based on deep compressed sensing, adaptive non-key frame recovery, and video frame combination, and finally yields the recovered video. Steps 1 and 2 are carried out at the encoding end, and steps 3 and 4 at the decoding end.

1. Video compression process

Because video data contain much similar or repeated data, as well as elements that do not affect human perception, this redundant information can be removed by constructing key frames and non-key frames. During recovery, non-key frames are recovered with the help of the key frames, so the recovery quality of the whole video depends directly on the recovery quality of the key frames. Conventional video coding mainly reduces the spatial redundancy of a key frame through intra-frame coding and the temporal redundancy between non-key frames and the key frame through inter-frame coding, thereby compressing the volume of transmitted data. However, that coding approach has high computational complexity, so for resource-limited encoding-end devices we propose a new video compression method that exploits the low encoding complexity of compressed sensing theory.

In the first step, we divide the video into a number of groups of video frames (GOPs), and divide all the original video frames in each GOP into one key frame I and several non-key frames P. An I frame, also called an intra-coded frame, is encoded independently of other video frames and can therefore be decoded independently. A P frame is called a forward reference frame, and its encoding and decoding require a reference I frame. To meet the real-time requirement of surveillance video and keep the computational complexity of inter-frame coding low, the first frame of each GOP is set as the I frame and the remaining video frames are P frames, giving a GOP {I, P₁, P₂, …, P_n}. In addition, an algorithm for adaptively dividing GOPs is provided. The total number of video frames in each GOP varies with the compression rate and the recovery quality of the non-key frames, because if a GOP contains too many non-key frames, the compression rate of frame P_n will be low and its recovery quality poor, and vice versa.
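The adaptive GOP division can be sketched as follows. This is a minimal illustration, assuming the rule stated for P-frame sampling (a GOP ends at P_{i-1} once the RMSE between P_i and the key frame exceeds η_I). The function names and the `max_len` cap are hypothetical and are not taken from the patent.

```python
import numpy as np

def rmse(x: np.ndarray, y: np.ndarray) -> float:
    """Root mean square error between two equally sized frames."""
    diff = x.astype(np.float64) - y.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def split_into_gops(frames, eta_i: float, max_len: int = 16):
    """Group frames into GOPs; the first frame of each GOP is the key frame I.

    A GOP is closed when the next frame differs from the key frame by more
    than eta_i (RMSE), or when max_len frames are reached (max_len is an
    assumption added to bound the GOP size).
    """
    gops, current = [], []
    for frame in frames:
        if current and (rmse(frame, current[0]) > eta_i or len(current) >= max_len):
            gops.append(current)
            current = []
        current.append(frame)
    if current:
        gops.append(current)
    return gops

# Toy example: three dark frames followed by two bright frames.
frames = [np.full((4, 4), v, dtype=np.uint8) for v in (10, 11, 12, 200, 201)]
gops = split_into_gops(frames, eta_i=5.0)
print([len(g) for g in gops])  # [3, 2]
```

A larger η_I produces longer GOPs and a lower overall sampling rate, at the cost of poorer recovery for the later P frames.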

In the second step, we perform block-based compressed sensing sampling of the I frame. We divide the key frame, from left to right and from top to bottom, into blocks of the same size n × n, and sample each n × n block B_i with a measurement matrix Φ to obtain a measurement value Y_i. Each measurement Y_i can be encrypted as soon as it is obtained, without waiting for the whole key frame to be sampled, which improves transmission efficiency. In this algorithm the block size is set to 33 × 33. If the block is too small, the quality of images restored with block compressed sensing is relatively low; if the block is too large, the dimension of the neural network's input vector during video frame reconstruction grows, which raises the computational complexity of the reconstruction algorithm and lengthens reconstruction time. The measurement matrix Φ is a random sequence generated by the chaotic system and has size M × N, where N = 1089 (33 × 33) and M = R × N, with R the compression ratio.
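The block sampling step can be sketched with NumPy as below. This is only an illustration: the Gaussian matrix is a stand-in for the chaotic measurement matrix, and the zero-padding of frames whose sides are not multiples of 33, the rounding of M, and the function names are all assumptions.

```python
import numpy as np

BLOCK = 33                 # block side length n used in the text
N = BLOCK * BLOCK          # 1089 pixels per vectorized block

def split_into_blocks(frame: np.ndarray, n: int = BLOCK) -> np.ndarray:
    """Return blocks in left-to-right, top-to-bottom order, shape (l, n*n).

    The frame is zero-padded to a multiple of n (padding is an assumption).
    """
    h, w = frame.shape
    ph, pw = (-h) % n, (-w) % n
    padded = np.pad(frame.astype(np.float64), ((0, ph), (0, pw)))
    rows, cols = padded.shape[0] // n, padded.shape[1] // n
    blocks = padded.reshape(rows, n, cols, n).swapaxes(1, 2)
    return blocks.reshape(rows * cols, n * n)

def sample_blocks(blocks: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Apply Y_i = phi @ B_i to every block; returns shape (l, M)."""
    return blocks @ phi.T

r = 0.1                                    # compression ratio R
M = int(round(r * N))                      # 109 measurements per block
rng = np.random.default_rng(0)             # stand-in for the chaotic system
phi = rng.standard_normal((M, N)) / np.sqrt(M)

frame = rng.integers(0, 256, size=(66, 99)).astype(np.uint8)
Y = sample_blocks(split_into_blocks(frame), phi)
print(Y.shape)  # (6, 109)
```

Because every block is multiplied by the same Φ, each row of `Y` is available as soon as its block has been read, which is what allows encryption to start before the whole frame is sampled.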

Finally, the video frame compression process samples the P frames. The preceding paragraph described how I frames are encoded with block compressed sensing; to keep the system's coding algorithm simple and efficient, we apply the same block sampling operation to P frames. Because a large amount of identical or similar image information exists between a non-key frame and the key frame, and between adjacent non-key frames, the volume of transmitted video data can be reduced further by identifying and discarding more of this redundant information, provided the algorithm complexity stays low. We therefore propose a sampling algorithm for P frames, as shown in Algorithm 1.

Simply put, redundant information is removed by discarding those video frame blocks of the current frame P_i that are very similar or identical to the corresponding blocks of the key frame I or the previous frame P_{i-1}. When sampling frame P_i, we first judge whether the Root Mean Square Error (RMSE) between frame P_i and the I frame is greater than a threshold η_I. If so, the last frame of the GOP is P_{i-1}. We then calculate the RMSE of each block B_j between frame P_{i-1} and frame P_i; if the RMSE of block B_j exceeds a threshold η_P, block B_j is discarded, and finally the numbers of the encoded image blocks are transmitted as the key Key₄. The RMSE calculation is shown in equation (1), where y and x are two images of the same size, and m and n are the length and width of the image, respectively.
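The per-block test can be sketched as follows, assuming the standard definition RMSE = sqrt((1/(mn)) Σ (x_ij − y_ij)²) for equation (1), which is not reproduced in this text. The text discards a block when its RMSE exceeds η_P, while the stated aim is to drop blocks that closely match the reference, so the comparison direction is exposed as a flag. That flag and the function names are assumptions of this sketch.

```python
import numpy as np

def block_rmse(prev: np.ndarray, cur: np.ndarray, n: int) -> np.ndarray:
    """RMSE of every n x n block between two frames, in raster order.

    Frame sides are assumed to be multiples of n.
    """
    h, w = cur.shape
    diff = cur.astype(np.float64) - prev.astype(np.float64)
    sq = (diff ** 2).reshape(h // n, n, w // n, n)
    return np.sqrt(sq.mean(axis=(1, 3))).ravel()

def select_blocks(prev, cur, n, eta_p, discard_if_above=True):
    """Split block indices of the current P frame into (encoded, discarded).

    discard_if_above=True follows the wording of the text (discard when the
    block RMSE exceeds eta_p); False discards near-identical blocks instead.
    The encoded indices play the role of Key4.
    """
    err = block_rmse(prev, cur, n)
    drop = err > eta_p if discard_if_above else err <= eta_p
    return np.flatnonzero(~drop), np.flatnonzero(drop)

prev = np.zeros((4, 4), dtype=np.uint8)
cur = prev.copy()
cur[:2, 2:] = 10            # only block 1 (top right) changes
encoded, discarded = select_blocks(prev, cur, n=2, eta_p=1.0, discard_if_above=False)
print(encoded, discarded)   # [1] [0 2 3]
```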

2. Encryption and decryption process

The system encodes and decodes video frames with a compressed sensing algorithm, which itself provides a degree of security. The measurement y is obtained from y = Φx; y can be regarded as the encrypted data, and the measurement matrix Φ as the encryption key. If an attacker intercepts y from the network, the useful content of the original video cannot be extracted from y by visual inspection. However, relying only on the encryption effect of compressed sensing does not make the system secure enough: compression is a linear operation and lacks any scrambling mechanism, so the system is vulnerable to chosen-plaintext and known-plaintext attacks. Additional encryption mechanisms are therefore needed to resist these attacks.
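The weakness of a purely linear measurement can be illustrated with a toy known-plaintext attack. This is a sketch under simplifying assumptions (small dimensions, noiseless real-valued measurements, and an attacker who holds N linearly independent plaintext blocks); none of the names come from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 16, 4                                # toy sizes, not the 1089-pixel blocks
phi = rng.standard_normal((M, N))           # secret measurement matrix

# The attacker knows N linearly independent plaintext blocks and their measurements.
X = rng.standard_normal((N, N))             # columns are known plaintexts
Y = phi @ X                                 # corresponding intercepted measurements

# Because y = phi x is linear, phi follows from solving phi @ X = Y.
phi_recovered = np.linalg.solve(X.T, Y.T).T

print(np.allclose(phi, phi_recovered))      # True
```

Once Φ is known, every later measurement made with the same matrix is exposed, which is why scrambling and diffusion are layered on top of the sampling step.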

Since the mathematician Poincaré proposed the "Poincaré conjecture" in the 20th century, many one-dimensional and two-dimensional chaotic systems have appeared in succession, typical examples being the Logistic map, the Chebyshev map, and the Hénon map. Chaos is a deterministic yet random-like process exhibited by nonlinear dynamical systems; it is extremely sensitive to the initial value, and even a slight change in the initial value leads to a very different output sequence. It is also bounded, so the chaotic sequence it generates can be used as a compressed sensing measurement matrix. Therefore, once the encoding end and the decoding end have agreed on the chaotic system, the encoding end can transmit the initial parameters of the chaotic system as Key₁; compared with transmitting the whole measurement matrix Φ directly, this greatly reduces the size of the key. For example, when sampling a 1080p video, one I frame is 1920 × 1080 pixels; if the compression ratio is 10%, the measurement matrix is 192 × 1920, whereas with the Logistic map as the chaotic system only the initial parameters {μ, x₁} need to be transmitted, so the two data volumes differ by a factor of nearly 100,000. The Logistic map is given in equation (2), where μ ∈ [0, 4] and x ∈ [0, 1].

$$x_{n+1} = \mu\, x_n (1 - x_n) \tag{2}$$
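A minimal sketch of how a measurement matrix could be derived from the Logistic map is given below. The burn-in length, the zero-centring 1 − 2x, the scaling by 1/√M, and the key values are assumptions chosen for illustration; the patent does not specify them in this text.

```python
import numpy as np

def logistic_sequence(mu: float, x1: float, length: int, burn_in: int = 1000) -> np.ndarray:
    """Iterate x_{n+1} = mu * x_n * (1 - x_n); burn_in discards the transient."""
    x = x1
    for _ in range(burn_in):
        x = mu * x * (1.0 - x)
    out = np.empty(length)
    for i in range(length):
        x = mu * x * (1.0 - x)
        out[i] = x
    return out

def chaotic_measurement_matrix(mu: float, x1: float, m: int, n: int) -> np.ndarray:
    """Reshape a logistic sequence into an m x n matrix with zero-centred entries."""
    seq = logistic_sequence(mu, x1, m * n)
    return (1.0 - 2.0 * seq).reshape(m, n) / np.sqrt(m)

key1 = (3.99, 0.3141)                       # hypothetical {mu, x1}
phi_enc = chaotic_measurement_matrix(*key1, 109, 1089)
phi_dec = chaotic_measurement_matrix(*key1, 109, 1089)
phi_bad = chaotic_measurement_matrix(3.99, 0.3141 + 1e-10, 109, 1089)

print(np.array_equal(phi_enc, phi_dec))     # same key reproduces the same matrix
print(np.allclose(phi_enc, phi_bad))        # a tiny key change gives a different matrix
```

Both ends only need the two numbers in `key1` to rebuild the full 109 × 1089 matrix, which is the saving the paragraph above describes.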

Finally, we perform inter-block scrambling and intra-block diffusion on all pixel blocks in the video frame. Inter-block scrambling permutes the order of the image blocks to obtain an irregular arrangement, which makes cracking more difficult. The input to the encryption module comes from the chaotic compressed sensing module, which divides the ith video frame into L image blocks B of size n × n; the encryption module then performs inter-block scrambling on the sequence S of image blocks to obtain S'. The vector A that controls the scrambling is generated by the chaotic system module and has length L. Specifically, if an image block has index j in the sequence and A_j is the kth element when A is sorted in ascending order, the jth image block is placed at the kth position, as shown in equation (3). Scrambling the positions of all image blocks in this way yields a new image block sequence.

B'_k＝B_j (3)
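The rank-based permutation of equation (3) and its inverse can be sketched with a double `argsort`. The control vector below is a hypothetical example; in the system it would come from the chaotic system module.

```python
import numpy as np

def scramble(blocks: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Place block j at position k, where k is the ascending rank of A_j."""
    rank = np.argsort(np.argsort(a))        # rank[j] = k
    out = np.empty_like(blocks)
    out[rank] = blocks                      # B'_k = B_j
    return out

def unscramble(scrambled: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Invert scramble() given the same control vector A."""
    rank = np.argsort(np.argsort(a))
    return scrambled[rank]                  # B_j = B'_k

# Four one-element "blocks" and a hypothetical control vector A of length L = 4.
blocks = np.array([[10], [20], [30], [40]])
a = np.array([0.7, 0.1, 0.9, 0.4])
s = scramble(blocks, a)
print(s.ravel())                            # [20 40 10 30]
print(unscramble(s, a).ravel())             # [10 20 30 40]
```

Since the decoder regenerates A from the key, no permutation table has to be transmitted.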

To further strengthen the security of video frame data in transit, and to resist chosen-plaintext and known-plaintext attacks so that an attacker cannot crack the encrypted pixel blocks, the image blocks B' obtained after sequence scrambling undergo diffusion. In image encryption, diffusion hides the information of each plaintext pixel in as many ciphertext pixels as possible without changing pixel positions. B' undergoes forward diffusion and then backward diffusion to obtain E; the forward and backward diffusion algorithms are given in equations (4) and (5), where C is an encryption vector generated by the chaotic system module.
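Equations (4) and (5) are not reproduced in this text, so the following sketch substitutes a common additive modular diffusion to illustrate the forward/backward idea and its inverse. The formula, the 8-bit quantization, and the key stream values are assumptions and should not be read as the patent's own equations.

```python
import numpy as np

MOD = 256  # assumes values quantized to 8 bits, in 0..255 (an assumption of this sketch)

def diffuse(b, c):
    """Forward pass then backward pass over one block; returns ciphertext E."""
    b = np.asarray(b, dtype=np.int64)
    c = np.asarray(c, dtype=np.int64)
    f = np.empty_like(b)
    carry = 0
    for i in range(b.size):                  # forward: depends on earlier values
        carry = (b[i] + carry + c[i]) % MOD
        f[i] = carry
    e = np.empty_like(b)
    carry = 0
    for i in range(b.size - 1, -1, -1):      # backward: depends on later values
        carry = (f[i] + carry + c[i]) % MOD
        e[i] = carry
    return e

def undiffuse(e, c):
    """Inverse of diffuse(): undo the backward pass, then the forward pass."""
    e = np.asarray(e, dtype=np.int64)
    c = np.asarray(c, dtype=np.int64)
    n = e.size
    f = np.empty_like(e)
    for i in range(n):
        nxt = e[i + 1] if i + 1 < n else 0
        f[i] = (e[i] - nxt - c[i]) % MOD
    b = np.empty_like(e)
    for i in range(n):
        prev = f[i - 1] if i > 0 else 0
        b[i] = (f[i] - prev - c[i]) % MOD
    return b

b = np.array([1, 2, 3, 4])
c = np.array([5, 6, 7, 8])                   # hypothetical key stream from the chaotic system
e = diffuse(b, c)
b2 = b.copy()
b2[0] += 1
print(np.array_equal(undiffuse(e, c), b))    # True
print(np.all(diffuse(b2, c) != e))           # True: one changed value alters every output
```

Running both passes is what lets a change at any position reach every ciphertext value; a single forward pass would leave earlier positions untouched.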

While the video frame sampling results Y_i and Y_n are being encrypted, the initial parameters of the chaotic systems that generate the sequences A and C are sent to the decoding end as the keys Key₂ and Key₃ respectively. Because a wireless channel has low security and high noise, the keys are transmitted over a secure channel so that an attacker cannot easily obtain them, while the video sampling results are transmitted over an ordinary wireless channel. A secure channel can be constructed in many ways, for example by encrypting with asymmetric cryptography.

Since we propose a symmetric encryption and decryption scheme, the decryption algorithm is the inverse of the encryption algorithm. When the decoding end receives the keys and the encrypted video frame data, its chaotic system module obtains the initial parameters from Key₂ and Key₃ respectively and generates the sequences A and C. The decryption module first applies the inverse of intra-block diffusion to the encrypted data E according to the encryption parameters in the sequences A and C, as shown in equations (6) and (7). It then applies inverse inter-block scrambling, and finally outputs the decrypted measurement result Y'.

3. Video reconstruction process

In conventional video coding, encoding takes far longer than decoding, whereas in a video transmission system based on compressed sensing the opposite holds. In SFE-VTS, therefore, the first problem to solve is how to reconstruct video frames quickly and with high quality, since this determines whether the whole video transmission system can operate efficiently. Most real-time surveillance video today has more than 1 million pixels, and cameras for road traffic monitoring reach 8 million pixels. Conventional compressed sensing recovers large images slowly and is unsuitable for service scenarios that require millisecond-level latency. To speed up the recovery of video frames by compressed sensing, a new deep-learning-based parallel video frame recovery algorithm (DPVR) is proposed next.

During video reconstruction, recovery proceeds GOP by GOP. To recover a complete GOP, we first recover its I frame through the block-chaos key frame reconstruction step. In this step, the image blocks of each video frame are restored by DPVR and assembled into a complete I frame in order of their block numbers. DPVR consists of a number of identical reconstruction units R, each of which uses a neural network to recover one decrypted image block Y'_i. Running the units in parallel allows a complete video frame to be recovered within tens of milliseconds, which greatly shortens reconstruction time and overcomes the slow recovery of large images in conventional compressed sensing. Unlike conventional compressed sensing algorithms, DPVR can therefore be applied to real-time video transmission.

DPVR first trains the neural network quickly. To ensure training efficiency and reliable video reconstruction, the training set consists of tens to hundreds of video frames that the encoding end sends to the decoding end during device initialization. The number of frames in the training set varies with the video resolution: the lower the resolution, the more frames the training set contains. In addition, where sampling conditions allow, the sampling interval between adjacent frames in the training set should be as large as possible, so that the decoding end obtains more diverse training data and the reconstruction unit R recovers with higher quality during real-time operation. The reconstruction unit R then uses Key₁ to obtain the measurement matrix Φ from the chaotic system module and trains convolutional neural networks for different compression ratios r.

After the fast training is completed, the reconstruction units {R₁, R₂, …, R_l} of the key frame recovery module recover, in parallel and according to the measurement matrix Φ and the compression ratio r, the image blocks {Y'_i1, Y'_i2, …, Y'_il} of the same key frame I, obtaining the reconstructed image blocks {B'_i1, B'_i2, …, B'_il}. All recovered image blocks of the video frame are then spliced in order of their numbers to obtain a complete decoded key frame I'. Meanwhile, the image block selection module uses the key Key₄, i.e. the sequence of unsampled image blocks, to extract from I' the image blocks missing from the non-key frame P_i, which serve as one of the inputs of the non-key frame recovery module.
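The parallel recovery and splicing of one key frame can be sketched as follows. The reconstruction unit here is a pseudo-inverse stand-in rather than the trained neural network, and the thread pool, function names, and toy dimensions are assumptions made only to keep the example runnable.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def reconstruct_block(y: np.ndarray, phi_pinv: np.ndarray, n: int) -> np.ndarray:
    """Stand-in for one reconstruction unit R: minimum-norm least squares.

    The patent uses a trained neural network here; the pseudo-inverse only
    keeps the sketch runnable.
    """
    return (phi_pinv @ y).reshape(n, n)

def recover_key_frame(measurements, phi, n, rows, cols, workers=4):
    """Recover all blocks of one key frame in parallel and stitch them by index."""
    phi_pinv = np.linalg.pinv(phi)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so block numbers stay aligned.
        blocks = list(pool.map(lambda y: reconstruct_block(y, phi_pinv, n), measurements))
    grid = np.array(blocks).reshape(rows, cols, n, n)
    return grid.swapaxes(1, 2).reshape(rows * n, cols * n)

# Toy check with a square (invertible) phi so that recovery is exact.
rng = np.random.default_rng(2)
n, rows, cols = 2, 2, 3
phi = rng.standard_normal((n * n, n * n))
frame = rng.standard_normal((rows * n, cols * n))
blocks = frame.reshape(rows, n, cols, n).swapaxes(1, 2).reshape(rows * cols, n * n)
measurements = [phi @ b for b in blocks]
recovered = recover_key_frame(measurements, phi, n, rows, cols)
print(np.allclose(recovered, frame))        # True
```

Because blocks are independent, the same structure maps onto process pools or batched GPU inference when more computing resources are available.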

The non-key frame deep recovery module likewise uses the reconstruction units to restore Y_n' to B_n' according to the compression ratio of each image block, but this yields only the sampled part of the non-key frame. The non-key frame recovery module then combines the recovered blocks from the image block selection module and the non-key frame deep recovery module into a complete non-key frame P_i'. The smaller the image block, the faster the reconstruction unit recovers it, but also the worse the recovery quality of the block-based compressed sensing algorithm, so we set the block size to 33 × 33.
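The combination of recovered blocks with blocks taken from the decoded key frame can be sketched as below. The function name and the representation of Key₄ as a list of encoded block numbers in raster order are assumptions of this illustration.

```python
import numpy as np

def assemble_non_key_frame(key_frame, recovered, encoded_idx, n):
    """Build P_i' from recovered blocks plus blocks copied from the decoded key frame I'.

    recovered[k] is the n x n block reconstructed for block number encoded_idx[k];
    every other block is taken from key_frame at the same position.
    """
    out = key_frame.copy()
    cols = key_frame.shape[1] // n
    for block, idx in zip(recovered, encoded_idx):
        r, c = divmod(int(idx), cols)
        out[r * n:(r + 1) * n, c * n:(c + 1) * n] = block
    return out

key_frame = np.zeros((4, 4))
recovered_blocks = [np.full((2, 2), 7.0)]
p_frame = assemble_non_key_frame(key_frame, recovered_blocks, encoded_idx=[1], n=2)
print(p_frame)
```

In this toy case only block 1 (top right) was sampled, so the other three blocks of the output are copied unchanged from the key frame.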

The frame combination module combines the key frame I' output by the key frame recovery module with the non-key frames {P₁', P₂', …, P_n'} output by the non-key frame recovery module, finally yielding a reconstructed GOP. These are all the steps of reconstruction at the decoding end; the following describes the image reconstruction algorithm used by the reconstruction unit and the new neural network we propose.

The invention constructs a new convolutional neural network based on the fast iterative shrinkage-thresholding algorithm with backtracking (BFISTA), which shrinks the threshold quickly during iteration and thereby solves the linear inverse problem in image processing. BFISTA builds on the classical gradient algorithm ISTA and incorporates Nesterov's accelerated proximal gradient method (NGA), which improves the convergence rate from O(1/k) to O(1/k²). In addition, BFISTA uses backtracking to repeatedly compute and shrink the iteration step size, which makes it suitable for fast reconstruction of large videos. Provided that the current step size t satisfies a certain condition, the basic steps of each BFISTA iteration are given by formulas (8), (9) and (10).

x_k = Γ_{λt}(y_k - 2tA^T(Ay_k - b))    (8)
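To make formula (8) concrete, the sketch below runs the shrinkage step inside a plain FISTA loop. Several parts are assumptions rather than content of the description: the soft-threshold operator is the usual definition of Γ, the momentum update is the textbook FISTA rule, a fixed step size t replaces the backtracking search, and all function names are hypothetical.

```python
def soft_threshold(v, alpha):
    """Gamma_alpha: shrink every entry of v toward zero by alpha."""
    return [max(abs(x) - alpha, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def fista(A, b, lam, t, iterations=200):
    """Minimise ||Ax - b||^2 + lam * ||x||_1 with a fixed step size t."""
    x_prev = [0.0] * len(A[0])
    y = x_prev[:]
    theta = 1.0
    for _ in range(iterations):
        residual = [p - q for p, q in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        # Formula (8): x_k = Gamma_{lambda t}(y_k - 2 t A^T (A y_k - b))
        x = soft_threshold([yi - 2.0 * t * g for yi, g in zip(y, grad)], lam * t)
        # Textbook FISTA momentum step (assumed; not reproduced in the text above).
        theta_next = (1.0 + (1.0 + 4.0 * theta * theta) ** 0.5) / 2.0
        coef = (theta - 1.0) / theta_next
        y = [xi + coef * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, theta = x, theta_next
    return x_prev


# Identity A: the minimiser is b soft-thresholded at lam / 2.
estimate = fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0, t=0.5)
```

With a fixed step, t should not exceed the reciprocal of the Lipschitz constant of the gradient, 1/(2·λ_max(AᵀA)); backtracking removes the need to know that constant in advance.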

Therefore, a novel convolutional neural network structure is designed according to the BFISTA algorithm. The convolutional network of each iteration has a depth of 8, the convolution kernel sizes are 3 × 32, X_n is the vector result obtained at the n-th iteration for a given decrypted image block, and X' is the recovered image block. Finally, the image data recovered by the deep learning model is passed through the adaptive video reconstruction algorithm to obtain the reconstructed non-key frames, and the key frames and non-key frames are combined to obtain the recovered original video.

The invention provides a safe, fast and efficient video transmission system (SFE-VTS) based on deep compressed sensing for real-time surveillance video networks. Compared with traditional video transmission schemes, the system effectively addresses excessive resource usage at the encoding end and low security during transmission. In addition, the invention uses a deep learning algorithm to reconstruct the signal, and achieves better recovery quality and speed than video transmission schemes based on traditional compressed sensing.

The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A video safe and efficient transmission system applied to a video sensor network, comprising a video encoding end and a video decoding end, characterized in that the video encoding end is deployed on a sensor node of a VSN and is used for compressing and encrypting an original video; the video decoding end is deployed in the cloud and decrypts and reconstructs the received video signal using a cloud computing module; the encryption and decryption process is based on a compressed sensing algorithm using a chaotic system and video frame blocks; and the video reconstruction process is based on a deep-learning parallel video frame recovery algorithm.

2. The system of claim 1, wherein the process of video compression at the video encoding end comprises the steps of key frame and non-key frame selection, adaptive non-key frame block selection, adaptive sampling rate selection, and chaos-based video frame block compression for the original video.

3. The system for safe and efficient video transmission applied to the video sensor network according to claim 2, wherein the video compression process is specifically as follows:

S103, sampling the non-key frames P based on block compressed sensing: first judging whether the RMSE between frame P_i and frame I is greater than a threshold η_I, and if so, the last frame of the group of video frames is P_{i-1}; calculating the RMSE of block B_j between frame P_{i-1} and frame P_i, and if the RMSE of block B_j exceeds a threshold η_P, discarding block B_j; the number of the last encoded image block is transmitted as the key Key₄.

4. The system according to claim 3, wherein in step S101, the first frame of each video frame group is set as a key frame I, and the rest of the video frames are non-key frames P, so as to obtain a video frame group {I, P₁, P₂, …, P_n}.

5. The video safe and efficient transmission system applied to the video sensor network as claimed in claim 1, wherein the video encoding end performs the video signal encryption process including the steps of generating a chaotic measurement matrix, encrypting a measurement value and safely transmitting a system key.

6. The system for secure and efficient video transmission applied to the video sensor network according to claim 5, wherein the video signal encryption process is specifically as follows:

7. The system for secure and efficient transmission of video over a video sensor network as recited in claim 1, wherein the process of decrypting the video signal at the video decoding end comprises the steps of decrypting the system key and decrypting the encrypted signal.

8. The system for secure and efficient video transmission applied to the video sensor network according to claim 7, wherein the process of decrypting the video signal is specifically as follows:

S301, the initial parameters of the chaotic system that generates the vector A, and the encrypted vector C sequence that controls the scrambling of the image block sequence, are used as the keys Key₂ and Key₃ respectively and are transmitted to the decoding end;

9. The system as claimed in claim 1, wherein the video recovery process performed by the video decoding end comprises the steps of video frame reconstruction based on deep compressed sensing, adaptive recovery of non-key frames, and video frame merging.

10. The system for safe and efficient video transmission applied to the video sensor network according to claim 1, wherein the video recovery process specifically comprises:

S403, the reconstruction units {R₁, R₂, …, R_l} of the key frame recovery module perform parallel recovery on the image blocks {Y′_i1, Y′_i2, …, Y′_il} of the same key frame I according to the measurement matrix Φ and the compression ratio r, to obtain reconstructed image blocks {B′_i1, B′_i2, …, B′_il};

S404, the recovered image blocks of the video frame are stitched together in order of their numbers to obtain a complete decoded key frame I', and meanwhile, the self-selection image block module uses the key Key₄, i.e. the sequence of unsampled image blocks, to extract from I' the image blocks that are vacant in the non-key frame P_i, which are used as one of the inputs of the non-key frame recovery module;