CN110290386A - A low bit-rate human motion video coding system and method based on a generative adversarial network - Google Patents
- Publication number
- CN110290386A (application CN201910479249.8A)
- Authority
- CN
- China
- Prior art keywords
- character
- memory
- skeleton
- coding
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/124—Quantisation
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/177—adaptive coding characterised by the coding unit, the unit being a group of pictures [GOP]
- H04N19/50—using predictive coding
- H04N19/65—using error resilience
- H04N19/85—using pre-processing or post-processing specially adapted for video compression
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Abstract
The present invention relates to a low bit-rate human motion video coding system and method based on a generative adversarial network. It makes full use of the structural information of human motion video content: the video content is decomposed into two parts, a memory feature carrying global appearance attribute information and a skeleton feature expressing body motion information, and an attention mechanism is used together with a generative adversarial network to achieve more efficient low bit-rate video compression.
Description
Technical field
The present invention relates to a low bit-rate human motion video coding system and method based on a generative adversarial network, and belongs to the field of video image coding and compression.
Background art
Traditional hybrid video coding frameworks (MPEG-2, H.264, H.265, etc.) decompose video compression into four basic steps: prediction, transform, quantization, and entropy coding. In these frameworks, the redundancy between consecutive video frames is removed mainly by motion compensation performed on fixed-size blocks. Through years of development, the performance of conventional codecs has improved continuously. With the development of deep learning, models that realize video compression coding with deep networks have been proposed. Unlike traditional coding, deep-learning-based video coding can learn better transforms, and an end-to-end deep model can train itself, automatically adjusting every module toward a unified optimization objective.
The above coding schemes take pixel-level fidelity as the optimization objective, so the structural information of the video sequence content is not fully exploited during encoding. For example, in surveillance video applications in the public-safety field, the identification of people is extremely important, and a large number of videos containing human motion must be encoded and later analyzed; such videos are in fact structured as two parts, global appearance attribute information and body motion information. The present invention therefore considers using the structural information of human motion video to further improve coding efficiency.
Summary of the invention
The technical problem solved by the present invention: overcoming the deficiencies of the prior art by providing a low bit-rate human motion video coding system and method based on a generative adversarial network. The system makes full use of the structural information of human motion video content, decomposing the video content into a memory feature carrying global appearance attribute information and a skeleton feature expressing body motion information, and uses an attention mechanism together with a generative adversarial network to achieve more efficient low bit-rate video compression.
The technical solution of the present invention: a low bit-rate human motion video coding system based on a generative adversarial network, which integrates structural-information extraction for human motion video content, fusion of that structural information, and feature-information coding/decoding. It comprises: a memory feature extraction module, a memory feature coding/decoding module, a skeleton feature extraction module, a skeleton feature coding/decoding module, a recall attention module, and a generative adversarial network module, in which:
The memory feature extraction module is based on a convolutional recurrent neural network: the video frames of a group of pictures are fed into the network in temporal order, and the output obtained after the last video frame has been input is the memory feature.
The memory feature coding/decoding module includes two sub-modules, memory feature encoding and memory feature decoding. The memory feature is input to the memory feature encoding module, which first quantizes it to obtain the quantized memory feature and then entropy-codes the quantized memory feature, producing the bitstream transmitted for the memory feature part. That bitstream is input to the memory feature decoding module, which first entropy-decodes it to recover the quantized memory feature and then inverse-quantizes the result to obtain the memory feature reconstruction, which is input to the recall attention module.
The skeleton feature extraction module inputs the video frame to be encoded into a human pose estimation network to estimate node positions, obtaining a skeleton feature containing the positions of the key nodes of the human body; the skeleton feature consists of the key nodes and joint positions of the body, where the key nodes include the head, hands, feet and torso. To improve coding efficiency, the skeleton feature is input to the skeleton feature coding/decoding module for further compression.
The skeleton feature coding/decoding module encodes and decodes the skeleton feature of the video frame being encoded. The skeleton feature is input to the skeleton feature encoding part, which first applies predictive coding to obtain the residual information that is actually transmitted, then entropy-codes the residual to obtain the bitstream transmitted for the skeleton feature part. The transmitted bitstream is input to the skeleton feature decoding part, which first entropy-decodes it to recover the skeleton feature residual reconstruction, then applies prediction decoding to obtain the final skeleton feature reconstruction, which is input to the recall attention module.
The recall attention module uses an attention mechanism to fuse the memory feature reconstruction produced by the memory feature decoding module with the skeleton feature reconstruction produced by the skeleton feature decoding module, yielding the fused feature information.
The generative adversarial network module includes two parts, a generator and a discriminator. The generator takes the fused feature information produced by the recall attention module as input to the generation network and produces the generated video frame. The discriminator judges whether the generated video frame is consistent with a real natural video frame and outputs a score; as part of the generation network, this score is used in its training, i.e. as part of the loss function that optimizes the training of the generator.
In the memory feature extraction module, the memory feature contains the global appearance attribute information of the consecutive frames of a group of pictures. A group of pictures is a set of consecutive video frames of a certain length. The global appearance attribute information is the video appearance attribute information extracted by feeding the frames of the group of pictures into the convolutional recurrent neural network; it covers the background appearance in the video and the face and clothing appearance attributes of the people in the scene.
The structure of the convolutional recurrent neural network in the memory feature extraction module is as follows: it contains one convolutional recurrent layer, which establishes temporal relationships in the manner of a recurrent neural network while capturing local spatial features in the manner of a convolutional neural network. The convolutional recurrent layer has 128 channels, a kernel size of 5, and a stride of 2.
In the memory feature coding/decoding module, the memory feature encoding module includes two parts: quantization of the memory feature and entropy coding of the quantized feature. The quantization applies an existing quantization method to each value of the memory feature individually, producing the quantized memory feature. The entropy coding applies an existing entropy coding method to the quantized memory feature, producing the bitstream transmitted for the memory feature part. The memory feature decoding module includes the corresponding entropy decoding and inverse quantization. The entropy decoding applies the method matching the memory feature entropy coding to the transmitted bitstream, recovering the quantized memory feature; the inverse quantization applies the operation inverse to the memory feature quantization to the quantized memory feature, producing the memory feature reconstruction, which is input to the recall attention module for the reconstruction of video frames.
The skeleton feature coding/decoding module includes two parts, skeleton feature compression coding and decoding. The encoding part consists of predictive coding and entropy coding. The predictive coding exploits the temporal redundancy of the skeleton feature: for the coordinate of each node, the coordinate of the same node in the previous frame of the encoded video serves as the predicted value, and a residual is computed between the predicted value and the true coordinate of the corresponding node in the current frame; the resulting residual is then entropy-coded. The skeleton feature entropy coding is an entropy coder for the skeleton feature: using an existing entropy coding method, the residual produced by the predictive coding is fed into the entropy coder to obtain the skeleton feature bitstream. The decoding part uses the bitstream produced by the skeleton feature encoding to realize lossless reconstruction of the skeleton feature through entropy decoding and prediction decoding: the entropy decoding applies an existing entropy decoding method to the skeleton feature bitstream to recover the residual produced by the predictive coding, and the prediction decoding adds that residual to the node coordinate of the previous moment to obtain the decoded skeleton feature reconstruction.
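The predictive coding and prediction decoding of node coordinates described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: joint coordinates are assumed to be stored as an array of shape (frames, nodes, 2), and the entropy-coding stage is omitted.

```python
import numpy as np

def predictive_encode(joints):
    """Temporal predictive coding of joint coordinates.

    joints: array of shape (T, N, 2) -- x/y coordinates of N key nodes
    over T frames. The first frame has no predictor and is kept as-is;
    every later frame is represented by its residual against the previous
    frame's coordinates.
    """
    residuals = np.empty_like(joints)
    residuals[0] = joints[0]                  # no predictor for the first frame
    residuals[1:] = joints[1:] - joints[:-1]  # residual vs. previous frame
    return residuals                          # residuals would go to the entropy coder

def predictive_decode(residuals):
    """Inverse of predictive_encode: a cumulative sum restores coordinates."""
    return np.cumsum(residuals, axis=0)

# The round trip is lossless on integer pixel coordinates:
joints = np.array([[[10, 20], [30, 40]],
                   [[12, 21], [29, 42]],
                   [[15, 23], [27, 45]]])
recon = predictive_decode(predictive_encode(joints))
assert np.array_equal(recon, joints)
```

Because the residuals of slowly moving joints cluster near zero, they compress much better under entropy coding than the raw coordinates would.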
In the skeleton feature extraction module, the human pose estimation network is a PAF network whose structure is divided into two branches. One branch passes through 3 convolutional layers with kernel size 3 followed by 2 convolutional layers with kernel size 1x1 to produce joint confidence maps, which express the position of each detected node and the confidence that the node belongs to a human body. The other branch passes through 3 convolutional layers with kernel size 3 followed by 2 convolutional layers with kernel size 1x1 to produce PAFs (part affinity fields), a set of 2D vector fields in which each 2D vector encodes the position and direction of a limb; they are learned and predicted jointly with the joint confidence maps.
The present invention also provides a low bit-rate human motion video coding method based on a generative adversarial network, comprising the following steps:
(1) the consecutive frames of a group of pictures are sequentially input to the memory feature extraction module, which uses a convolutional recurrent neural network to produce the memory feature;
(2) the memory feature is input to the memory feature encoding module, which quantizes it and then entropy-codes the quantized memory feature, producing the memory feature bitstream;
(3) the video frame to be encoded is input to the skeleton feature extraction module, which obtains the skeleton feature with a human pose estimation network;
(4) the skeleton feature is input to the skeleton feature encoding module, where it is predictively coded and the residual is entropy-coded, producing the skeleton feature bitstream;
(5) the memory feature bitstream is input to the memory feature decoding module, which entropy-decodes it and then inverse-quantizes the result to obtain the memory feature reconstruction; at the same time the skeleton feature bitstream is input to the skeleton feature decoding module, which entropy-decodes it and then applies prediction decoding to obtain the skeleton feature reconstruction;
(6) the memory feature reconstruction and the skeleton feature reconstruction are input to the recall attention module, which fuses the two parts of information into the fused feature using an attention mechanism;
(7) the fused feature serves as the conditional input of the generator of the generative adversarial network to obtain the video frame reconstruction; during training, a discriminator scores whether the generator's output conforms to natural video, and this score is used for training the generator.
The advantages of the present invention over the prior art are:
(1) unlike conventional video coding based on blocks and pixel-level distortion, the present invention for the first time decomposes video information into a memory feature and a skeleton feature, making full use of video structural information and thereby further improving coding efficiency;
(2) the present invention reconstructs video frames with a generative adversarial network; unlike traditional coding, adversarial generation can restore information lost from the video during decoding, improving subjective quality;
(3) for the processing of the decomposed features, the recall attention module fuses the features obtained after the video decomposition: using an attention mechanism, the memory feature reconstruction and the skeleton feature reconstruction are effectively merged, which guarantees that the video frame is restored and reconstructed according to this information.
Description of the drawings
Fig. 1 is the overall framework of the system of the present invention;
Fig. 2 is a structural block diagram of the memory feature coding/decoding module in the present invention;
Fig. 3 is a structural block diagram of the skeleton feature coding/decoding module in the present invention;
Fig. 4 is a structural block diagram of the recall attention module in the present invention;
Fig. 5 is a comparison of the subjective quality of the present invention and a traditional coding method.
Specific embodiment
The technical solution of the present invention is described clearly and completely below with reference to a memory feature coding/decoding method, a skeleton feature coding/decoding method, a recall attention module, and a generative adversarial network implementation of the invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the coding framework of the present invention includes the following modules: memory feature extraction, memory feature coding/decoding, skeleton feature extraction, skeleton feature coding/decoding, the recall attention module, and the generative adversarial network. In which:
The role of the memory feature extraction network is to extract the memory feature containing the global appearance attribute information of the consecutive frames of a group of pictures; its main structure is a convolutional recurrent neural network. It contains one convolutional recurrent layer, which establishes temporal relationships like a recurrent neural network while capturing local spatial features like a convolutional neural network. The convolutional recurrent layer has 128 channels, a kernel size of 5, and a stride of 2.
The memory feature codec mainly includes two parts, memory feature encoding and memory feature decoding. The encoding part mainly includes quantization of the memory feature and entropy coding of the quantized feature, and the decoding part includes the corresponding entropy decoding and inverse quantization. The entropy coder/decoder is generally an arithmetic coder/decoder, and the quantization generally uses scalar quantization.
In an embodiment of the present invention, the memory feature codec is shown in Fig. 2: the memory feature is first quantized with scalar quantization, and entropy coding then further removes its redundancy to obtain the final bitstream. To further remove the redundancy of the quantized memory feature and improve coding efficiency, the embodiment uses a hyperprior network to model the probability of each value of the quantized memory feature, and entropy-codes the quantized memory feature according to the predicted probability distribution.
To obtain the probability of a specific memory feature value, the memory feature is fed into the hyperprior network to extract a variable z, which is transmitted as side information by arithmetic coding. At the encoding and decoding ends, following existing deep-learning-based image coding methods, the probability distribution of the memory feature is modeled as a Gaussian, and the variable z is used to predict the mean and variance of that distribution.
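The effect of this Gaussian model on code length can be shown with a small sketch. This illustrates the principle only (it is not the patent's network): given a mean and scale predicted from z, the estimated code length of a quantized value is the negative log-probability of its quantization bin under that Gaussian.

```python
import math

def gaussian_bits(m_hat, mean, scale):
    """Estimated code length (in bits) of a quantized value m_hat under a
    Gaussian model N(mean, scale^2), obtained by integrating the density
    over the quantization bin [m_hat - 0.5, m_hat + 0.5]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))
    p = max(cdf(m_hat + 0.5) - cdf(m_hat - 0.5), 1e-12)
    return -math.log2(p)

# A value near the predicted mean is cheap to code; an outlier is expensive.
cheap = gaussian_bits(0.0, mean=0.0, scale=1.0)
dear = gaussian_bits(5.0, mean=0.0, scale=1.0)
assert cheap < dear
```

The better the hyperprior predicts the mean and variance, the tighter the bins' probabilities match the actual feature values and the shorter the arithmetic-coded bitstream becomes.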
In addition, the quantization of the memory feature is integrated into the memory feature encoding network, and standard scalar quantization would make that network non-differentiable during training; therefore, during training, uniform noise is used in place of the standard scalar quantization. The specific formula is as follows:
M̂ = M + ε
where M̂ is the memory feature after quantization, M is the memory feature before quantization, and ε is noise uniformly distributed between −1/2 and 1/2.
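A minimal sketch of this train-time surrogate, assuming the memory feature is held in a numpy array (the real module applies this inside the encoding network):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(m, training):
    """Train-time surrogate for scalar quantization: additive uniform noise
    on (-1/2, 1/2) keeps the operation differentiable; at test time the
    feature is actually rounded to the nearest integer."""
    if training:
        return m + rng.uniform(-0.5, 0.5, size=m.shape)
    return np.round(m)

m = np.array([0.2, 1.7, -3.4])
m_train = quantize(m, training=True)
m_test = quantize(m, training=False)
assert np.all(np.abs(m_train - m) <= 0.5)  # noise stays within half a bin
assert np.array_equal(m_test, np.array([0.0, 2.0, -3.0]))
```

The noise has the same width as a quantization bin, so the training-time statistics of the surrogate match those of true rounding while gradients can still flow through.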
The role of the skeleton feature extraction is to extract the skeleton information that expresses the body motion in the video frame currently being encoded, mainly in the form of the positions of the key nodes of the human body. Using the existing PAFs human pose estimation network, a skeleton feature containing the positions of the human key nodes is output from this module.
The skeleton feature codec mainly includes two parts, skeleton feature compression coding and decoding. The encoding part exploits the temporal redundancy of the skeleton feature: predictive coding is applied first, and the residual between the predicted value and the true value is fed into an entropy coder to obtain the bitstream. The decoding part reconstructs the skeleton feature mainly through entropy decoding and prediction decoding. The entropy coder/decoder is usually an arithmetic codec.
The recall attention module mainly uses an attention mechanism to fuse the memory feature information and the skeleton feature information: on the basis of the memory feature containing the global appearance attribute information, the skeleton feature containing the body motion information of the particular frame is combined to obtain the fused feature information.
The specific implementation of the recall attention module in an embodiment of the present invention is shown in Fig. 4. As can be seen from the figure, the attention mechanism used in the embodiment can in fact be expressed as a function of three terms: a query (Q), keys (K), and values (V). The corresponding formula is as follows:
R(Q, K, V) = [W·Vᵀ + Q, V]
where Q is the query matrix, K is the key matrix, V is the value matrix, R is the fused feature information, and [·, ·] denotes concatenation. The matrix W is obtained from the query matrix Q and the key matrix K; the specific calculation formula is as follows:
W = Q·Kᵀ
The memory feature reconstruction M̂ is passed through two different convolutions with kernel size 1, stride 1, and the same number of channels as the input; the resulting features serve as the key matrix K and the value matrix V, respectively. The query matrix Q is the feature obtained from one convolution with kernel size 1, stride 1, and the same number of channels as the input.
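A toy sketch of this fusion, with flattened spatial positions as matrix rows. Two assumptions are made that the translated text leaves open: the query Q is taken from a second input (assumed here to be the skeleton reconstruction, since the module fuses both features), and W·Vᵀ is read as W·V so that the matrix shapes are consistent; no softmax normalization is applied since none is stated in the patent.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution over flattened spatial positions is just a matrix
    multiply: x has shape (n_positions, c_in), w has shape (c_in, c_out)."""
    return x @ w

def attention_fuse(mem_rec, skel_rec, w_q, w_k, w_v):
    """Fuse memory and skeleton reconstructions: K and V come from 1x1
    convolutions of the memory reconstruction, Q from a 1x1 convolution of
    the (assumed) skeleton reconstruction; W = Q K^T, output [W V + Q, V]."""
    q = conv1x1(skel_rec, w_q)
    k = conv1x1(mem_rec, w_k)
    v = conv1x1(mem_rec, w_v)
    w = q @ k.T                      # (n, n) affinity between positions
    return np.concatenate([w @ v + q, v], axis=-1)

rng = np.random.default_rng(0)
n, c = 4, 3                          # 4 spatial positions, 3 channels
mem_rec = rng.standard_normal((n, c))
skel_rec = rng.standard_normal((n, c))
w_q, w_k, w_v = (rng.standard_normal((c, c)) for _ in range(3))
fused = attention_fuse(mem_rec, skel_rec, w_q, w_k, w_v)
assert fused.shape == (n, 2 * c)     # concatenation doubles the channel count
```

The concatenation preserves the raw values V alongside the attention-reweighted combination, so the generator receives both the appearance content and its motion-conditioned reweighting.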
In an embodiment of the present invention, the actual object compressed by the skeleton feature codec is the position coordinates of 18 nodes of the human body; the basic structure is shown in Fig. 3.
For the coordinate information of each body node, the embodiment uses the coordinate of the node at the previous moment as the predicted value, and during actual encoding the residual between the current node coordinate and the predicted value is the information actually coded. This residual information is further compressed with the adaptive arithmetic coding commonly used in conventional coding schemes, and the resulting bitstream is the final bitstream of the skeleton feature part. The decoding process is similar: the residual information is first obtained by arithmetic decoding, and prediction decoding then yields the skeleton feature reconstruction.
The generative adversarial network in an embodiment of the present invention includes two parts, a generator and a discriminator. The generator G takes the fused feature expression output by the recall attention module as conditional input and produces the video frame reconstruction; the generator network used by the present invention is the pix2pixHD network. The discriminator of the embodiment includes two parts, a spatial discriminator D_I and a temporal discriminator D_x, both with a VGG network structure. The input of the spatial discriminator D_I is the skeleton feature S of the current frame together with the generated frame information and the true frame information X; it judges whether the generated video frame is close to a true natural image. The input of the temporal discriminator D_x is the skeleton features S of the current frame and the previous frame together with the corresponding generated and true frame information; its role is mainly to guarantee the continuity between consecutive generated video frames. Together the two discriminators constitute the adversarial loss l_adv of the network. Its calculation formula is as follows:
where s_t is the skeleton feature information at time t, x_t is the video frame at time t, G is the generator, and E is the expectation.
The loss function of the generative adversarial network G is designed as follows:
L = l_adv + λ_comp·l_comp + λ_fm·l_fm + λ_VGG·l_VGG
where l_adv is the adversarial loss corresponding to the two discriminators and l_comp is the bitstream size of the memory feature; l_fm and l_VGG are the feature matching loss and the VGG network perceptual loss added with reference to existing generative adversarial network works; and λ_comp, λ_fm and λ_VGG are the weights of the corresponding losses. In an embodiment of the present invention the weights are set to λ_comp = 1, λ_fm = 10, λ_VGG = 10.
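The weighted objective above can be written directly as a small helper. This is a sketch of the combination step only; the individual loss terms would come from the discriminators, the bitstream-size estimate, and the feature-matching/VGG networks.

```python
def generator_loss(l_adv, l_comp, l_fm, l_vgg,
                   lam_comp=1.0, lam_fm=10.0, lam_vgg=10.0):
    """Weighted generator objective:
    L = l_adv + lam_comp*l_comp + lam_fm*l_fm + lam_vgg*l_vgg,
    with the embodiment's weights (1, 10, 10) as defaults."""
    return l_adv + lam_comp * l_comp + lam_fm * l_fm + lam_vgg * l_vgg

# With the embodiment's weights, unit losses give 1 + 1 + 10 + 10 = 22.
assert generator_loss(1.0, 1.0, 1.0, 1.0) == 22.0
```

Including the bitstream term l_comp in the generator objective is what ties reconstruction quality to the compression rate, making the whole system trainable end-to-end toward a rate-distortion trade-off.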
The present invention structurally decomposes a human motion video into two parts, memory information and skeleton information, making full use of the structural information of the video content, and thereby achieves quality superior to conventional video coding methods (H.264, H.265).
The present invention was tested for coding performance on the KTH dataset and the APE dataset. For the KTH dataset, 8 of its video sequences were randomly selected as the test set; for the APE dataset, 7 video sequences were randomly extracted for testing. The specific performance results are shown in Fig. 5 and Table 1.
Fig. 5 compares the subjective quality of the present invention and a traditional coding method. As can be seen from the figure, the present invention restores more detailed information than the traditional method and does not suffer from severe blurring or blocking artifacts.
Table 1 compares the average coding performance of the present invention and conventional coding schemes on the test sequences. As can be seen from the table, the present invention achieves coding performance comparable to H.264 on the KTH dataset, while on the APE dataset the PSNR of the present invention is 3.61 dB higher than that of H.265. In the above comparison, on both datasets the bitstream size used by the method of the present invention is only about 50% of that of the conventional coding schemes.
Table 1. Average coding performance comparison of the present invention and conventional coding schemes on the test sequences
In short, the present invention proposes a low bit-rate human motion video compression model based on a generative adversarial network whose performance is significantly improved over current coding-standard methods. For the coding of videos containing human motion, such as surveillance video in the public-safety field, the present invention has great application prospects.
The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to anyone skilled in the art within the technical scope disclosed by the present invention shall be included within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A low bit-rate human motion video coding system based on a generative adversarial network, characterized by comprising: a memory feature extraction module, a memory feature coding/decoding module, a skeleton feature extraction module, a skeleton feature coding/decoding module, a recall attention module and a generative adversarial network module, wherein:
the memory feature extraction module, based on a convolutional recurrent neural network, inputs the video frames of a group of pictures into the convolutional recurrent neural network in temporal order, and the output obtained after the last video frame is input is the memory feature;
the memory feature coding/decoding module comprises a memory feature coding module and a memory feature decoding module; the memory feature is input into the memory feature coding module, which first quantizes the memory feature to obtain the quantized memory feature output, and then applies memory feature entropy coding to the quantized output to obtain the bitstream of the memory feature part for transmission; the bitstream of the memory feature part is input into the memory feature decoding module, which first entropy-decodes the bitstream to obtain the quantized memory feature reconstruction, and then inversely quantizes the quantized output to obtain the memory feature reconstruction, which is input into the recall attention module;
the skeleton feature extraction module inputs the video frame to be encoded into a human pose estimation network for node position estimation, obtaining a skeleton feature containing the position information of key human nodes; the skeleton feature consists of the key human nodes and joint positions, the key human nodes including the head, hands, feet and body;
the skeleton feature coding/decoding module encodes and decodes the skeleton feature of the video frame to be encoded; the skeleton feature is input into the skeleton feature coding part, which first performs predictive coding on the input skeleton feature to obtain the residual information actually transmitted, and then applies skeleton feature residual entropy coding to the residual information to obtain the bitstream of the skeleton feature part for transmission; the transmitted bitstream is input into the skeleton feature decoding part, which first performs skeleton feature entropy decoding on the bitstream to obtain the skeleton feature residual reconstruction, and then performs prediction decoding on the skeleton feature residual reconstruction to obtain the final skeleton feature reconstruction, which is input into the recall attention module;
the recall attention module uses an attention mechanism to fuse the memory feature reconstruction obtained by the memory feature decoding module with the skeleton feature reconstruction obtained by the skeleton feature decoding module, obtaining the fused feature information;
the generative adversarial network module comprises a generator and a discriminator; the generator takes the fused feature information obtained by the recall attention module as input to the generation network to obtain a generated video frame; the discriminator judges whether the generated video frame is consistent with a real natural video frame and gives a score, which is used as part of the training of the generation network, i.e., part of the loss function, to optimize the training of the generator.
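The memory feature of claim 1 is simply the hidden state left after feeding a group of pictures through a recurrent network in temporal order. The patent does not disclose the exact architecture; the sketch below uses a hypothetical plain (non-convolutional) recurrent update with made-up weight matrices `W_x` and `W_h` only to illustrate the "last-frame hidden state as memory feature" idea:

```python
import numpy as np

def extract_memory_feature(frames, W_x, W_h):
    """Toy recurrent update standing in for the convolutional recurrent
    network of the memory feature extraction module: frames are fed in
    temporal order, and the hidden state remaining after the last frame
    is taken as the memory feature."""
    h = np.zeros(W_h.shape[0])
    for x in frames:  # frames in temporal order
        h = np.tanh(W_x @ x + W_h @ h)
    return h

# Stand-in data: 8 frames flattened to 16-dim vectors, 4-dim memory.
rng = np.random.default_rng(0)
frames = [rng.normal(size=16) for _ in range(8)]
W_x = rng.normal(size=(4, 16)) * 0.1
W_h = rng.normal(size=(4, 4)) * 0.1
memory = extract_memory_feature(frames, W_x, W_h)
```

In the real system each `x` would be a full video frame and the update a convolutional recurrent cell, but the information flow is the same.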
2. The low bit-rate human motion video coding system based on a generative adversarial network according to claim 1, characterized in that: in the memory feature extraction module, the memory feature contains the global appearance attribute information of the successive frames of the group of pictures; the group of pictures is a set consisting of a certain number of successive video frames; the global appearance attribute information of the successive frames of the group of pictures is the video appearance attribute information extracted by inputting the video frames of the group of pictures into the convolutional recurrent neural network, and the video appearance attribute information includes the background appearance in the video and the appearance attribute information of the face and clothing of the human body in the scene.
3. The low bit-rate human motion video coding system based on a generative adversarial network according to claim 1, characterized in that: in the memory feature coding/decoding module, the memory feature coding module comprises two parts: quantization of the memory feature and entropy coding of the quantized feature; the memory feature quantization applies an existing quantization method to each feature value in the memory feature respectively, obtaining the quantized memory feature after each feature value is quantized; the memory feature entropy coding is an entropy coding method for the quantized memory feature, which, according to an existing entropy coding method, entropy-codes the quantized memory feature to obtain the bitstream of the memory feature part for transmission; the memory feature decoding module comprises the two corresponding parts of entropy decoding and inverse quantization; the memory feature entropy decoding, according to an entropy decoding method corresponding to the memory feature entropy coding, entropy-decodes the bitstream transmitted for the memory feature part, obtaining the quantized memory feature reconstruction; the memory feature inverse quantization, according to the inverse of the memory feature quantization method, inversely quantizes the quantized memory feature to obtain the memory feature reconstruction, which is input into the recall attention module for the reconstruction of the video frame.
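Claim 3 leaves the quantizer as "an existing quantization method". A minimal sketch, assuming uniform scalar quantization with a hypothetical step size of 0.1 (the entropy coding of the integer indices, e.g. arithmetic coding, is not shown):

```python
import numpy as np

def quantize(feature, step=0.1):
    """Uniform scalar quantization of each memory feature value; the
    resulting integer indices are what would be entropy coded into the
    transmitted bitstream."""
    return np.round(feature / step).astype(np.int32)

def dequantize(indices, step=0.1):
    """Inverse quantization at the decoder: maps indices back to
    reconstruction levels, giving the memory feature reconstruction."""
    return indices.astype(np.float64) * step

feature = np.array([0.23, -1.07, 0.049, 2.5])
recon = dequantize(quantize(feature))
```

The round-trip error of such a quantizer is bounded by half the step size, which is the lossy part of the memory feature path.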
4. The low bit-rate human motion video coding system based on a generative adversarial network according to claim 1, characterized in that: the skeleton feature coding/decoding module comprises two parts: skeleton feature compression coding and decoding; the skeleton feature coding part comprises two parts: predictive coding and entropy coding; the predictive coding exploits the temporal redundancy of the skeleton feature, taking for the coordinate of each node the node coordinate of the frame preceding the encoded video frame as the predicted value, and computing the residual between the node predicted value and the true node coordinate of the frame; the resulting residual then undergoes skeleton feature entropy coding; the skeleton feature entropy coding is an entropy coder for the skeleton feature, which, according to an existing entropy coding method, entropy-codes the residual obtained by skeleton feature predictive coding to obtain the bitstream information of the skeleton feature; the skeleton feature decoding part uses the bitstream obtained after skeleton feature coding and achieves lossless reconstruction of the skeleton feature through entropy decoding and prediction decoding; the skeleton feature entropy decoding, according to an existing entropy decoding method, entropy-decodes the skeleton feature bitstream to obtain the residual produced by skeleton feature predictive coding; the skeleton feature prediction decoding then adds the node coordinates of the previous moment to the residual obtained by skeleton feature entropy decoding, obtaining the decoded skeleton feature reconstruction.
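The predictive coding loop of claim 4 can be sketched directly: each frame's node coordinates are predicted by the previous frame's, only the residual is transmitted, and decoding accumulates residuals. The bootstrap for the first frame (sent as-is below) is an assumption, since the claim does not specify it:

```python
import numpy as np

def encode_residuals(coords_seq):
    """Predictive coding of per-frame node coordinates: the predictor
    for each frame is the previous frame's coordinates, and only the
    residual goes on to entropy coding. First frame sent as-is
    (assumed bootstrap)."""
    residuals = [coords_seq[0].copy()]
    for prev, cur in zip(coords_seq, coords_seq[1:]):
        residuals.append(cur - prev)
    return residuals

def decode_residuals(residuals):
    """Prediction decoding: add each residual to the previously decoded
    coordinates, recovering the skeleton feature losslessly."""
    coords = [residuals[0].copy()]
    for r in residuals[1:]:
        coords.append(coords[-1] + r)
    return coords

# 3 frames, each with 4 integer (x, y) node coordinates.
rng = np.random.default_rng(1)
seq = [rng.integers(0, 100, size=(4, 2)) for _ in range(3)]
decoded = decode_residuals(encode_residuals(seq))
```

Because the coordinates are integers and the residual/accumulate pair is exact, the reconstruction is lossless, matching the claim.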
5. A low bit-rate human motion video coding method based on a generative adversarial network, characterized by comprising the following steps:
(1) the successive frames of a group of pictures are sequentially input into the memory feature extraction module, which obtains the memory feature output using a convolutional recurrent neural network;
(2) the memory feature is input into the memory feature coding module, which quantizes the memory feature and then applies memory feature entropy coding, obtaining the bitstream output of the memory feature part;
(3) the video frame to be encoded is input into the skeleton feature extraction module, which obtains the skeleton feature using a human pose estimation network;
(4) the skeleton feature is input into the skeleton feature coding module; after predictive coding, the skeleton feature undergoes skeleton feature entropy coding, obtaining the bitstream output of the skeleton feature part;
(5) the bitstream of the memory feature part is input into the memory feature decoding module, which entropy-decodes the memory feature bitstream and then obtains the memory feature reconstruction by memory feature inverse quantization; meanwhile, the skeleton feature bitstream is input into the skeleton feature decoding module, which entropy-decodes the skeleton feature bitstream and then obtains the skeleton feature reconstruction by prediction decoding;
(6) the memory feature reconstruction and the skeleton feature reconstruction are input into the recall attention module, which uses an attention mechanism to obtain the fused feature combining the two parts of information;
(7) the fused feature is used as the conditional input of the generative adversarial network generator to obtain the reconstructed video frame; during training, the discriminator gives a score on whether the video frame reconstruction generated by the generator conforms to natural video, and this score is used for training the generator.
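Step (6) fixes only that an attention mechanism fuses the two reconstructions; the exact form is not disclosed. A hypothetical sketch, assuming both reconstructions are same-length vectors and the skeleton reconstruction acts as a query whose softmax weights gate the memory channels:

```python
import numpy as np

def recall_attention_fuse(memory_recon, skeleton_recon):
    """Hypothetical attention fusion: elementwise relevance scores
    between the two reconstructions are softmax-normalized and used to
    gate the memory feature, then concatenated with the skeleton
    feature as the fused conditional input to the generator."""
    scores = memory_recon * skeleton_recon
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return np.concatenate([weights * memory_recon, skeleton_recon])

memory_recon = np.array([0.2, -0.5, 0.9, 0.1])
skeleton_recon = np.array([0.4, 0.0, 0.7, -0.3])
fused = recall_attention_fuse(memory_recon, skeleton_recon)
```

Any learned attention (e.g. query/key projections) would slot into the same position in the pipeline.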
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910479249.8A CN110290386B (en) | 2019-06-04 | 2019-06-04 | Low-bit-rate human motion video coding system and method based on generation countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110290386A true CN110290386A (en) | 2019-09-27 |
CN110290386B CN110290386B (en) | 2022-09-06 |
Family
ID=68003085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910479249.8A Active CN110290386B (en) | 2019-06-04 | 2019-06-04 | Low-bit-rate human motion video coding system and method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110290386B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929242A (en) * | 2019-11-20 | 2020-03-27 | 上海交通大学 | Method and system for carrying out attitude-independent continuous user authentication based on wireless signals |
CN110942463A (en) * | 2019-10-30 | 2020-03-31 | 杭州电子科技大学 | Video target segmentation method based on generation countermeasure network |
CN111967340A (en) * | 2020-07-27 | 2020-11-20 | 中国地质大学(武汉) | Abnormal event detection method and system based on visual perception |
CN112950729A (en) * | 2019-12-10 | 2021-06-11 | 山东浪潮人工智能研究院有限公司 | Image compression method based on self-encoder and entropy coding |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120163675A1 (en) * | 2010-12-22 | 2012-06-28 | Electronics And Telecommunications Research Institute | Motion capture apparatus and method |
US20130107003A1 (en) * | 2011-10-31 | 2013-05-02 | Electronics And Telecommunications Research Institute | Apparatus and method for reconstructing outward appearance of dynamic object and automatically skinning dynamic object |
CN108174225A (en) * | 2018-01-11 | 2018-06-15 | 上海交通大学 | Filter achieving method and system in coding and decoding video loop based on confrontation generation network |
CN108596149A (en) * | 2018-05-10 | 2018-09-28 | 上海交通大学 | The motion sequence generation method for generating network is fought based on condition |
CN109086869A (en) * | 2018-07-16 | 2018-12-25 | 北京理工大学 | A kind of human action prediction technique based on attention mechanism |
Non-Patent Citations (2)
Title |
---|
TIANYU HE et al.: "End-to-End Facial Image Compression with Integrated Semantic Distortion Metric", 2018 IEEE Visual Communications and Image Processing (VCIP) |
TIAN Man et al.: "Research on multi-model fusion action recognition", Electronic Measurement Technology |
Also Published As
Publication number | Publication date |
---|---|
CN110290386B (en) | 2022-09-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||