CN108882020A - Video information processing method, apparatus and system - Google Patents
Video information processing method, apparatus and system Download PDF Info
- Publication number
- CN108882020A (application CN201710338736.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- coding
- frame
- video
- present frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000010365 information processing Effects 0.000 title claims abstract description 11
- 238000003672 processing method Methods 0.000 title claims abstract description 11
- 238000000034 method Methods 0.000 claims abstract description 71
- 238000004458 analytical method Methods 0.000 claims abstract description 64
- 230000000007 visual effect Effects 0.000 claims abstract description 60
- 239000000284 extract Substances 0.000 claims abstract description 10
- 230000006870 function Effects 0.000 claims description 30
- 238000009826 distribution Methods 0.000 claims description 24
- 230000008569 process Effects 0.000 claims description 23
- 238000013135 deep learning Methods 0.000 claims description 16
- 238000012512 characterization method Methods 0.000 claims description 11
- 238000000605 extraction Methods 0.000 claims description 10
- 230000003044 adaptive effect Effects 0.000 claims description 8
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 238000003860 storage Methods 0.000 abstract description 16
- 230000005540 biological transmission Effects 0.000 abstract description 10
- 238000010586 diagram Methods 0.000 description 18
- 230000006872 improvement Effects 0.000 description 8
- 238000012545 processing Methods 0.000 description 7
- 238000004590 computer program Methods 0.000 description 5
- 238000005457 optimization Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/54—Motion estimation other than block-based using feature points or meshes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6047—Power optimization with respect to the encoder, decoder, storage or transmission
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments of the present application disclose a video information processing method, apparatus and system. The method includes: extracting the feature of each video frame; determining the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature; encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and sending the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task. With the embodiments of the present application, the video itself need not be sent to the cloud server; instead, the features of the video are encoded and then sent to the cloud server for the visual analysis task. Compared with the prior art, this reduces the data transmission pressure and also reduces the storage pressure on the cloud server.
Description
Technical field
The present application relates to the field of video technology, and in particular to a video information processing method, apparatus and system.
Background art
In an era of rapidly developing networks, video information, as an accurate, efficient and intuitive multimedia form, is ever more widely applied.
Visual analysis is one of the important application fields of video information. For example, visual analysis of surveillance video can realize functions such as automatic alarms, object detection and object tracking; as another example, visual analysis can be used to retrieve a required image from massive amounts of video; and so on.
In the prior art, visual analysis tasks are usually executed by a cloud server, while the video to be analyzed is often distributed across multiple terminals (for example, surveillance terminals). Each terminal usually first sends the video it captures to a local server, and each local server then sends the video to the cloud server for the visual analysis task.
In practical applications, however, since the amount of video data sent to the cloud server is large, the data transmission pressure is high, and considerable storage pressure is placed on the cloud server.
Summary of the invention
The embodiments of the present application provide a video information processing method, apparatus and system, to solve the following technical problem in the prior art: because the amount of video data sent to the cloud server for visual analysis tasks is large, the data transmission pressure is high, and considerable storage pressure is placed on the cloud server.
To solve the above technical problem, the embodiments of the present application are realized as follows.
A video information processing method provided by an embodiment of the present application includes:
extracting the feature of each video frame;
determining the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature;
encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and
sending the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task.
Optionally, extracting the feature of each video frame specifically includes:
receiving a video sequence captured by one or more terminals; and
extracting the feature of at least part of the region in each video frame included in the video sequence.
Optionally, determining the type of the feature specifically includes:
taking each of the video frames in turn as the current frame and executing:
determining, according to the feature of a reference frame that belongs to the same video sequence as the current frame, the reference feature for the feature of the current frame, the frames in the video sequence being ordered by time; and
determining the type of the feature of the current frame according to the feature of the current frame and its reference feature.
Optionally, the reference frame of the current frame is determined by sequential reference or adaptive reference over the time-ordered frames of the video sequence to which the current frame belongs, the adaptive reference being performed according to inter-frame distance.
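As a rough illustration of the two reference modes, the sketch below (Python; the function names, the lookback window, and the use of feature-space Euclidean distance as the "inter-frame distance" are all assumptions, since the patent does not specify the metric) contrasts sequential reference with adaptive reference:

```python
def sequential_reference(frame_idx):
    """Sequential reference: simply the immediately preceding frame;
    the first frame of a sequence has no reference."""
    return frame_idx - 1 if frame_idx > 0 else None

def adaptive_reference(features, frame_idx, max_lookback=4):
    """Adaptive reference: among recent frames, pick the one whose feature
    lies closest to the current frame's feature (an assumed reading of
    'inter-frame distance')."""
    if frame_idx == 0:
        return None

    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    current = features[frame_idx]
    candidates = range(max(0, frame_idx - max_lookback), frame_idx)
    return min(candidates, key=lambda i: dist(features[i], current))
```

Under adaptive reference, a frame may thus reference a non-adjacent earlier frame when that frame's feature is a better temporal match.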
Optionally, determining the type of the feature of the current frame according to the feature of the current frame and its reference feature specifically includes:
calculating a difference-degree characterization value between the feature of the current frame and its reference feature; and
determining the type of the feature of the current frame according to the calculated difference-degree characterization value.
Optionally, the coding mode includes at least one of the following:
independently coding the feature of the current frame; coding the residual between the feature of the current frame and its reference feature; or using the coding result of the reference feature of the current frame as the coding result of the current frame.
Optionally, coding the residual between the feature of the current frame and its reference feature specifically includes:
determining, among predetermined residual coding modes and according to a rate-distortion optimization model, the residual coding mode matched to the residual, where the coding loss degrees of the respective residual coding modes differ; and
coding the residual using the determined residual coding mode;
where the rate-distortion optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the coding loss degree.
Optionally, determining, among the predetermined residual coding modes, the residual coding mode matched to the residual specifically includes:
dividing the residual into multiple subvectors; and
determining, among the predetermined residual coding modes, the residual coding mode matched to each subvector respectively;
and coding the residual using the determined residual coding modes specifically includes:
coding each subvector with its respectively matched residual coding mode, thereby coding the residual.
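A minimal sketch of this subvector scheme (Python; the split rule, the quantization-step "modes", and the error tolerance are illustrative assumptions and not the patent's actual residual coding modes):

```python
def split(vec, n_sub):
    """Split a vector into n_sub nearly equal contiguous subvectors."""
    k, r = divmod(len(vec), n_sub)
    parts, start = [], 0
    for i in range(n_sub):
        end = start + k + (1 if i < r else 0)
        parts.append(vec[start:end])
        start = end
    return parts

def quantize(sub, step):
    """Uniform quantization; step 0 stands for a 'skip' mode (all zeros)."""
    return [0.0] * len(sub) if step == 0 else [round(x / step) * step for x in sub]

def code_subvector(sub, steps=(0, 1.0, 0.25), tol=0.05):
    """Pick the coarsest mode (earliest in `steps`) whose squared
    reconstruction error stays within `tol`; fall back to the finest."""
    for step in steps:
        rec = quantize(sub, step)
        if sum((x - y) ** 2 for x, y in zip(sub, rec)) <= tol:
            return step, rec
    return steps[-1], quantize(sub, steps[-1])

def code_residual(residual, n_sub=2):
    """Per-subvector mode selection, as the optional claim describes."""
    return [code_subvector(sub) for sub in split(residual, n_sub)]
```

Subvectors that are nearly zero thus get the cheap skip mode, while only the informative parts of the residual spend bits on a finer mode.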
Optionally, the loss function is determined as follows:
determining, according to a specified probability distribution, the probability distribution of the distance between a feature to be coded and a feature to be matched, as a first probability distribution, where the feature to be matched is obtained from visual analysis task samples;
determining, according to a prior probability, the probability distribution of the distance between the decoded version of the feature to be coded and the feature to be matched, as a second probability distribution;
calculating separately, according to the first probability distribution and the second probability distribution, the result accuracy of the visual analysis task when executed on the feature before coding and when executed on the feature after coding and decoding; and
determining the loss function according to the separately calculated result accuracies.
Optionally, coding the feature specifically includes:
entropy coding the feature;
where the rate-distortion optimization model is obtained according to the loss function and the coding bit rate of the entropy coding.
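The way a loss function and an entropy-coding bit rate combine into a rate-distortion decision can be illustrated with a toy Lagrangian cost (Python; the lambda weight and the candidate numbers are invented purely for illustration):

```python
def rd_cost(loss, rate, lam=0.1):
    """Lagrangian rate-distortion cost: task-accuracy loss plus
    lambda times the (entropy-coded) bit rate."""
    return loss + lam * rate

def best_mode(candidates, lam=0.1):
    """candidates: list of (mode_name, loss, rate) triples;
    return the name of the minimum-cost mode."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]
```

A larger lambda favors cheaper, lossier modes; a smaller lambda favors accuracy, which mirrors how the model trades result accuracy against bit rate.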
Optionally, the method further includes:
coding auxiliary information and sending it to the server, so that the server can decode it to obtain the auxiliary information and decode the coded feature according to the auxiliary information;
where the auxiliary information includes at least one of the following: information indicating the type of the feature; information indicating the reference feature.
Optionally, after the coded feature is sent to the server, the method further includes:
upon receiving a video frame acquisition request from the server, sending each video frame corresponding to the request to the server.
Optionally, the feature is a deep learning feature extracted by a deep learning network.
A video information processing apparatus provided by an embodiment of the present application includes:
an extraction module, which extracts the feature of each video frame;
a determination module, which determines the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature;
a coding module, which encodes the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and
a sending module, which sends the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task.
A video information processing system provided by an embodiment of the present application includes one or more terminals, a local server, and a visual analysis server.
The one or more terminals send the captured video sequence to the local server.
The local server extracts the feature of each video frame included in the video sequence, determines the type of the feature, encodes the feature using a predetermined coding mode matched to the type to obtain a coded feature, and sends the coded feature to the visual analysis server; the type reflects the degree of temporal correlation between the feature and a reference feature.
The visual analysis server decodes the coded feature and uses it for a visual analysis task.
At least one of the above technical solutions employed by the embodiments of the present application can achieve the following beneficial effect: the video itself need not be sent to the cloud server; instead, the features of the video are encoded and then sent to the cloud server for the visual analysis task. Compared with the prior art, this reduces the data transmission pressure and also reduces the storage pressure on the cloud server, and can therefore partly or fully solve the problems in the prior art.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overview of the principle of the video feature coding and decoding solution provided by an embodiment of the present application in a practical application scenario;
Fig. 2 is a schematic flowchart of a video information processing method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a specific implementation, in a practical application scenario, of determining the type of the feature of the current frame according to an embodiment of the present application;
Fig. 4 is a schematic diagram of two reference modes usable for determining the reference feature according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of residual coding in a practical application scenario according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of the feature coding and decoding solution based on deep learning features in a practical application scenario according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a video information processing apparatus corresponding to Fig. 2 according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a video information processing system corresponding to Fig. 2 according to an embodiment of the present application.
Specific embodiment
The embodiments of the present application provide a video information processing method and apparatus.
To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the protection scope of the present application.
In many cases, machines can replace people in analyzing large amounts of data and performing other visual tasks, and this process often does not need the entire video or image, only the features extracted from the video information. Specifically, features can be extracted from the captured video, coded, and transmitted to a cloud server; the cloud server decodes the coded features and uses them for visual analysis tasks. The feature coding and transmission scheme provided by the embodiments of the present application is an effective solution for transmitting video big data: it can guarantee the accuracy of visual analysis tasks while greatly reducing the cost of storage and transmission.
Fig. 1 gives an overview of the principle of the video feature coding and decoding solution provided by the embodiments of the present application in a practical application scenario.
In Fig. 1, a large number of terminals (such as surveillance cameras and handheld mobile devices) transmit the video sequences they capture to local servers, which store them locally; the local servers then perform feature extraction and feature coding on the video sequences and send the coded feature bitstream to the cloud server. The cloud server decodes the received bitstream and uses it for visual analysis tasks. In some analysis applications, the retrieved results need to be examined further; in that case, the required video content (the small fraction of interest) is transferred to the cloud server on demand. Since the features are extracted from the original video, their high quality guarantees the accuracy of the visual analysis task; and compared with the bitstream of the video sequence, the transmission cost of the coded feature bitstream is very small, which relieves the pressure of data transmission and cloud server storage.
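The local-server side of the Fig. 1 pipeline can be sketched as a per-frame loop (Python; the callables and the one-previous-frame reference are simplifying stand-ins for the extraction, typing, and coding stages described above):

```python
def local_server_pipeline(frames, extract, classify, encode):
    """Per frame: extract the feature, determine its type against the
    previous frame's feature (a simplified one-frame reference), encode
    with the matching mode, and collect the payloads for transmission."""
    payloads, prev = [], None
    for frame in frames:
        feat = extract(frame)
        ftype = classify(feat, prev)
        payloads.append(encode(feat, prev, ftype))
        prev = feat
    return payloads
```

Only `payloads` would be streamed to the cloud server; the frames themselves stay in local storage until a video frame acquisition request arrives.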
Further, considering that deep learning features have achieved good performance in many analysis tasks and help improve visual tasks such as image retrieval and object detection, the embodiments of the present application also propose a coding and decoding solution based on deep learning features: specifically, the features are coded by exploiting their temporal redundancy, reducing the bit rate while preserving the performance of the features.
The above solution is described in detail below.
Fig. 2 is a schematic flowchart of a video information processing method provided by an embodiment of the present application. The executing body of this process can be a terminal and/or a server, for example, the surveillance terminal or local server mentioned in the background. The terminal and/or server can specifically include, but is not limited to, at least one of the following devices: a mobile phone, a tablet computer, a smart wearable device, an in-vehicle device, a personal computer, a large or medium-sized computer, a computer cluster, and the like.
The process in Fig. 2 may include the following steps:
S201: Extract the feature of each video frame.
In the embodiments of the present application, the video frames can be temporally ordered; for example, they can come from one or more video sequences. The feature can be an image feature such as color, pattern, texture or grayscale, or it can be a deep learning feature extracted by a deep learning network; the latter is preferable, because deep learning features are more conducive to improving the accuracy of the subsequent visual analysis task.
The present application does not limit the data type of the deep learning feature (it can, for example, be real-valued or binary), nor does it limit the specific structure of the deep learning network from which the deep learning feature is extracted.
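Because the network structure is deliberately left open, any per-frame embedding can play the role of the feature here. The toy extractor below (Python) uses a normalized intensity histogram purely as a placeholder; a real system would substitute an embedding from a deep learning network:

```python
def frame_feature(frame, bins=4, max_val=255):
    """Toy per-frame feature: a normalized intensity histogram over a flat
    list of pixel values. A stand-in for a deep learning feature only."""
    counts = [0] * bins
    for px in frame:
        idx = min(px * bins // (max_val + 1), bins - 1)
        counts[idx] += 1
    n = len(frame)
    return [c / n for c in counts]
```

Whatever extractor is used, the rest of the pipeline (type determination, matched coding mode, transmission) operates on the resulting vectors in the same way.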
S202: Determine the type of the feature, where the type reflects the temporal correlation between the feature and a reference feature.
In the embodiment of the present application, the types are predetermined, and different types reflect different degrees of temporal correlation between the feature and the reference feature. The reference feature of the feature of any given frame can specifically be obtained from the features of one or more other frames; for ease of description, those one or more frames are called the reference frames of that frame.
The temporal correlation can specifically be a degree of temporal redundancy or of similarity. Typically, for the features of the video frames in a video sequence, the higher the temporal correlation between a feature and its reference feature, the more similar the two features are.
In the embodiment of the present application, the type of a feature can be determined according to the degree of difference between the feature and its reference feature. The degree of difference can be characterized, for example, by a residual, a distance between features, or another characterization value.
S203: Encode the feature using the predetermined coding mode matching its type, obtaining the encoded feature.
In the embodiment of the present application, if the temporal correlation between the feature and the reference feature is high and the reference feature has already been encoded, the feature can be encoded based on the coding result of the reference feature, or based on the difference between the feature and the reference feature; this reduces the coding cost of the feature and the amount of data in its coding result. If the temporal correlation between the feature and the reference feature is low, the feature can be coded independently.
S204: Send the encoded feature to a server, so that the server can decode it and use it for a visual analysis task.
In the embodiment of the present application, the server in step S204 can be, for example, the cloud server mentioned in the background. Besides the encoded feature, related auxiliary information can also be sent to the server to facilitate decoding. The auxiliary information may include information needed for decoding, for example the type of the feature, information about the reference feature, the number of features, etc.
In the embodiment of the present application, the visual analysis task includes, but is not limited to, an image retrieval task and/or an object detection and tracking task in video, etc.
By the method for Fig. 2, video itself can not be sent to cloud server, but the feature of video is compiled
It is sent to cloud server after code for visual analysis task, can reduce data transmission pressure compared with the prior art, it can also
To reduce the storage pressure of cloud server, therefore, the problems of the prior art can be partly or entirely solved.
Method based on Fig. 2, the embodiment of the present application also provides some specific embodiments of this method, and extension
Scheme is illustrated below.
In the embodiment of the present application, for the scenario in the background, extracting the feature of each video frame in step S201 can specifically include: receiving video sequences acquired by one or more terminals; and extracting the feature of at least a partial region of each video frame included in the video sequences. The partial region can be a region of interest extracted in advance; region-of-interest extraction can be done automatically, by manual annotation, or by a combination of manual annotation and automatic extraction. The form of the region of interest includes at least one of the following: the full-frame image, or a partial image region of arbitrary size.
In the embodiment of the present application, the feature of the reference frame can be used directly as the reference feature. Then, determining the type of the feature in step S202 can specifically include:
taking each of the video frames in turn as the current frame, and performing the following:
determining the reference feature of the current frame's feature according to the features of the reference frames that belong to the same video sequence as the current frame, the frames in the video sequence being ordered by time; and
determining the type of the current frame's feature according to that feature and its reference feature.
Further, the reference frame of the current frame can be determined in a variety of ways. For example, it can be determined by sequential reference and/or adaptive reference among the frames of the video sequence to which the current frame belongs, where adaptive reference is performed according to inter-feature distances. Sequential reference means taking the previous frame or frames of the current frame as its reference frame. Adaptive reference means that, in a set of consecutive frames including the current frame, the frame whose feature has the minimal sum of distances to the features of the other frames is determined according to the inter-feature distances; that frame can then serve as the reference frame for every frame in the set, and its feature is the reference feature.
In the embodiment of the present application, determining the type of the current frame's feature according to the feature and its reference feature can specifically include: calculating a characterization value of the degree of difference between the current frame's feature and its reference feature, and determining the type of the feature according to the calculated value.
Further, the coding mode may include at least one of the following: coding the current frame's feature independently (for example, when the difference characterization value is large); coding the residual between the current frame's feature and its reference feature (for example, when the difference characterization value is moderate); or taking the coding result of the reference feature as the coding result of the current frame (for example, when the difference characterization value is small).
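Under those three modes, producing the payload for one binary feature given its decided type can be sketched as follows (a simplification: features are bit lists, the residual is a bitwise XOR, and entropy coding of the payload is omitted):

```python
def encode_feature(feature, reference, ftype):
    """Produce the payload to be entropy-coded for one binary feature (sketch).

    feature, reference: lists of 0/1 bits; ftype: "I", "P", or "S".
    """
    if ftype == "I":
        return list(feature)                                 # independent coding: the feature itself
    if ftype == "P":
        return [f ^ r for f, r in zip(feature, reference)]   # residual coding: XOR with the reference
    if ftype == "S":
        return []                                            # skip: reuse the reference's coding result
    raise ValueError("unknown feature type: " + ftype)

print(encode_feature([1, 0, 1, 1], [1, 1, 1, 1], "P"))  # -> [0, 1, 0, 0]
```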
For ease of understanding, the embodiment of the present application provides a schematic diagram of a specific way of determining the type of the current frame's feature in a practical application scenario, as shown in Fig. 3.
In the scenario of Fig. 3, the n-th frame image of the video sequence is denoted I_n, n = 1, 2, ..., N, where N is the total number of frames. Regions of interest are extracted by a specified region-of-interest extraction algorithm. Each frame may contain one or more regions of interest; in general, a region of interest is the image region of a moving object in the video sequence, such as a person or a vehicle.
Deep-feature extraction is performed on each region of interest; the extracted deep-learning feature is denoted F_{n,m}, n = 1, 2, ..., N; m = 1, 2, ..., M_n, where M_n is the number of regions of interest in the n-th frame.
Taking as an example the case where each frame has only one deep feature and the coding method targets binary features, the deep-learning feature of the n-th frame is denoted F_n.
As mentioned above, the feature type can be decided according to the temporal correlation between the feature and the reference feature, and each feature type uses a different coding mode. The number of feature types must take into account the number of bits needed to encode the type as well as the actual conditions, and is at least 1. In the scenario of Fig. 3, the adopted type-decision scheme divides features into three types, named for example I-feature, P-feature, and S-feature. The type decision is based on the similarity between the current feature and the reference feature (specifically characterized by the inter-feature distance), using two predetermined thresholds T_IP and T_IS, with T_IP > T_IS.
Define the feature of the current frame as F_n (the current feature), the reference feature of F_n as the feature of its reference frame, and the previous frame of F_n as F_{n-1}. Assuming the reference frame is determined by sequential reference, the feature F_{n-1} of the previous frame serves as the reference feature of F_n.
When the current feature F_n differs greatly from the reference feature, that is, when the distance between them exceeds the threshold T_IP, the type of the current feature is decided to be I-feature, and the current feature can be coded independently; an I-feature provides a high-quality reference for coding subsequent features and also provides random-access capability. When the current feature differs little from the reference feature, that is, when the distance between them is below the threshold T_IS, the type is decided to be S-feature; the current feature then needs no coding, and its coding result is represented directly by the coding result of the reference feature. In all other cases, a certain degree of scene change is considered to have occurred, the type is decided to be P-feature, and the residual between the current feature and the reference feature needs to be encoded.
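The two-threshold decision just described can be sketched as follows (binary features as bit lists; the threshold values are hypothetical and would in practice be tuned):

```python
def decide_type(current, reference, t_ip, t_is):
    """Classify the current feature as I/P/S by Hamming distance to its reference (t_ip > t_is)."""
    d = sum(c != r for c, r in zip(current, reference))
    if d > t_ip:
        return "I-feature"   # large change: code independently, random-access point
    if d < t_is:
        return "S-feature"   # nearly identical: reuse the reference's coding result
    return "P-feature"       # moderate change: encode the residual

ref = [0] * 8
print(decide_type([1] * 8, ref, t_ip=5, t_is=2))           # -> I-feature
print(decide_type([1, 0, 0, 0, 0, 0, 0, 0], ref, 5, 2))    # -> S-feature
print(decide_type([1, 1, 1, 0, 0, 0, 0, 0], ref, 5, 2))    # -> P-feature
```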
The inter-feature distance can include, but is not limited to, at least one of the following: the Hamming distance, the Euclidean distance, and transforms of such distances. For example, the distance between the current feature and the reference feature can be represented by the Hamming distance between F_n and its reference feature and/or a transform thereof. In Fig. 3, the distance is first compared against T_IP as the decision condition to determine whether the type of F_n is I-feature; if not, it is further compared against T_IS to determine whether the type of F_n is P-feature or S-feature. It should be noted that Fig. 3 only illustrates one example of the decision order and decision conditions, which is not limiting; in practical applications, the decision order and conditions of Fig. 3 can be modified as needed, and besides the inter-feature distance, other characterization values of the degree of difference between features can be used in the above decision process.
Further, the embodiment of the present application also provides a schematic illustration of two reference modes that can be adopted for determining the reference frame (that is, determining the reference feature), as shown in Fig. 4. It is assumed that the features of the video frames are arranged in the order of the corresponding frames in the video sequence.
The features between two adjacent I-features, together with the first of those two I-features, are defined as one group of features (GOF); the features between two I-features/P-features, together with the first of those two I-features/P-features, are defined as one feature subset (sub-GOF). A sub-GOF contains exactly one I/P-feature and is denoted {F_i1, F_i2, ..., F_iJ}. The two reference modes have corresponding reference structures:
The first reference structure is the sequential prediction structure (SPS). In SPS, the first feature in {F_i1, F_i2, ..., F_iJ} is encoded as an I/P-feature, and the subsequent features are S-features represented directly by that first feature. For the S-features in a sub-GOF, however, the first feature may well not be the best choice of reference feature.
The second structure is the adaptive prediction structure (APS), which adaptively selects a better reference feature within {F_i1, F_i2, ..., F_iJ}. In APS, for any feature F_ik, the sum of its distances (taking the Hamming distance as an example, though not limited to it) to the other features in the subset is defined as the aggregate distance D_group(k), and the feature with the minimal aggregate distance is chosen as the reference feature. To avoid decoding delay, this feature is moved to the first position in the sub-GOF and encoded as an I/P-feature, and the other features in the sub-GOF are represented by it.
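The APS selection of the reference feature within a sub-GOF can be sketched as follows (features as bit lists, with the Hamming distance as the example metric):

```python
def select_aps_reference(sub_gof):
    """Return the index of the feature with minimal aggregate Hamming distance
    to the other features in the sub-GOF; that feature becomes the reference."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    aggregate = [sum(hamming(f, g) for g in sub_gof) for f in sub_gof]
    return aggregate.index(min(aggregate))

features = [[0, 0, 0], [0, 0, 1], [1, 1, 1]]
print(select_aps_reference(features))  # -> 1 (the middle feature is closest to the rest)
```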
Further, for P-features, the embodiment of the present application also provides a residual coding scheme. Specifically, encoding the residual between the current frame's feature and its reference feature may include: determining, among predetermined residual coding modes and according to a rate-misalignment optimization model, the residual coding mode matching the residual, where each residual coding mode corresponds to a different degree of coding loss; and encoding the residual with the determined mode. The rate-misalignment optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the degree of coding loss.
A residual may be matched as a whole to a single residual coding mode, or different parts of it may be matched to different residual coding modes. Taking the latter case as an example, determining the matching residual coding mode among the predetermined residual coding modes can specifically include: dividing the residual into multiple subvectors, and determining, among the predetermined residual coding modes, the matching mode for each subvector; encoding the residual with the determined modes then specifically means encoding each subvector with its matched residual coding mode. For ease of understanding, this is illustrated in conjunction with Fig. 5.
Fig. 5 is a schematic flow diagram of residual coding in a practical application scenario provided by an embodiment of the present application.
In Fig. 5, the residual is represented as a vector. The residual vector is divided into S equal-length subvectors, and a matching residual coding mode is then designed for each subvector. Taking the case of three residual coding modes as an example: the first mode encodes the raw residual directly (lossless mode); the second introduces a certain degree of loss into the subvector (lossy mode); and the third encodes an all-zero subvector (skip mode). For each subvector, the three modes yield different bit rates and losses, and the optimal mode can be selected according to the rate-misalignment optimization model.
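The per-subvector mode selection can be sketched as follows, minimizing a cost of the form J = L(D) + λ·R in the spirit of the rate-misalignment optimization model. The loss and rate assigned to each mode here are stand-in values (loss counted simply as discarded residual bits, rate as transmitted bits):

```python
def choose_mode(subvec, lam):
    """Pick the residual coding mode minimizing J = L(D) + lam * R (sketch).

    Stand-in model: loss L(D) = residual bits discarded, rate R = bits sent.
    """
    ones = sum(subvec)          # set residual bits that would be lost if dropped
    n = len(subvec)
    candidates = {
        "lossless": (0, n),             # (loss, rate): send every bit
        "lossy": (ones // 2, n // 2),   # hypothetical: half the rate, ~half the set bits lost
        "skip": (ones, 0),              # send nothing (all-zero subvector)
    }
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

print(choose_mode([0, 0, 0, 0, 0, 0, 0, 0], lam=1.0))   # -> skip (nothing to lose)
print(choose_mode([1, 1, 1, 1, 1, 1, 1, 1], lam=0.1))   # -> lossless (loss dominates)
```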
Further, the rate-misalignment optimization model proposed by the present application is explained. The loss function can be determined as follows: according to a specified probability distribution, determine the probability distribution of the distance between the feature to be encoded and a feature to be matched, as a first probability distribution, where the feature to be matched is obtained from visual-analysis-task samples; according to a prior probability, determine the probability distribution of the distance between the decoded version of the feature to be encoded and the feature to be matched, as a second probability distribution; according to the first probability distribution, the second probability distribution, and a visual analysis task with result labels, separately calculate the result accuracy of the visual analysis task when executed on the feature before coding and the result accuracy when executed on the feature after encoding and decoding; and determine the loss function from the accuracies thus calculated.
The visual-analysis-task samples can be provided by the cloud server, or obtained by the local server through other channels. Preferably, the samples share some scene characteristics with the visual analysis tasks that the cloud server will subsequently execute; such samples are more informative and help increase the reliability of the rate-misalignment optimization model.
For step S203, either lossless or lossy coding can be used when encoding the feature. Entropy coding is lossless; in that case, encoding the feature can specifically include performing entropy coding on the feature (the exact content encoded depends on the decided feature type), and the rate-misalignment optimization model is obtained from the loss function and the bit rate of the entropy coding. The entropy coding can specifically use the context-adaptive binary arithmetic coding (CABAC) algorithm from High Efficiency Video Coding (HEVC), or other similar algorithms.
More specifically, the rate-misalignment optimization model can be obtained by calculating the loss according to the loss function of visual-analysis accuracy and then jointly considering the loss and the bit rate, where the loss function of visual-analysis accuracy is obtained from the distortion between the encoded feature and the original feature.
For example, the rate-misalignment optimization model can specifically be defined as minimizing, over the residual coding modes Φ, the cost
J = L(D) + λ·R
where R is the entropy-coding bit rate; λ is a weight controlling the trade-off between bit rate and accuracy loss (when λ is larger, the coding tends to save bit rate at the cost of a larger loss); Φ is a designed residual coding mode; and J is the rate-misalignment cost, so that in the rate-misalignment optimization model the residual coding mode with the minimum rate-misalignment cost is selected as the optimal mode. L(D) is the loss function of visual-analysis accuracy, and D is the distortion between the encoded feature and the original feature; D can take one or more of the following forms: sum of absolute differences (SAD), mean squared error (MSE), root mean squared error (RMSE), etc. The loss function L(D) of visual-analysis accuracy is calculated from D. Taking the case where the visual analysis task is a retrieval task as an example, one possible calculation of the loss function L(D) proceeds as follows:
model the probability distribution of the Hamming distance between the original feature and the feature to be matched with a generalized binomial distribution;
derive, according to a prior probability, the probability distribution of the Hamming distance between the feature to be matched and the decoded feature under coding loss D;
compute the retrieval accuracy of the original feature, and the retrieval accuracy after coding loss, from the above distributions and the labels of the retrieval task;
make L(D) proportional to the loss in accuracy.
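These steps can be sketched as follows. The simple binomial bit-flip model, the match threshold, and the way coding loss shifts the per-bit mismatch rate are all illustrative simplifications of the generalized-binomial modeling above:

```python
from math import comb

def match_prob(n, p, thresh):
    """P(Hamming distance <= thresh) when each of n bits mismatches independently with prob p."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(thresh + 1))

def accuracy_loss(n, p, thresh, flipped_bits):
    """L(D): drop in match probability when coding loss flips `flipped_bits` extra bits
    (crude prior: each flipped bit raises the per-bit mismatch rate by 1/n)."""
    p_decoded = min(1.0, p + flipped_bits / n)
    return match_prob(n, p, thresh) - match_prob(n, p_decoded, thresh)

print(accuracy_loss(n=64, p=0.1, thresh=12, flipped_bits=0))      # -> 0.0 (lossless: no drop)
print(accuracy_loss(n=64, p=0.1, thresh=12, flipped_bits=8) > 0)  # -> True
```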
Compared with the rate-distortion optimization (RDO) model in video coding, the L(D) in the rate-misalignment optimization model better measures the feature's loss of accuracy in the visual analysis task, and thus yields better analysis accuracy at the same bit rate.
For the case where one frame has multiple regions of interest (multiple features), the multi-feature coding problem can be converted into single-feature coding problems according to the reference relationships.
In the embodiment of the present application, the process in Fig. 2 can further include: encoding auxiliary information and sending it to the server, so that the server can decode the auxiliary information and use it to decode the encoded feature. The auxiliary information includes at least one of the following: information indicating the type of the feature, and information indicating the reference feature. It can also include more content, for example the width and height of the region of interest, its coordinates in the image, the attributes of the object in the region of interest, information about events in the region of interest, etc.
In the embodiment of the present application, the decoding end (for example, the server in step S204) first decodes the auxiliary information, obtaining the feature types, feature quantity, reference-feature information, and so on needed to decode the deep-learning features; it then decodes the encoded features accordingly, reconstructing features that meet different application demands. After decoding, the feature sequence can be reconstructed and then used for tasks such as visual analysis.
In the embodiment of the present application, as mentioned above, in some cases the cloud server may be unable to execute the visual analysis task accurately from the features alone, in which case it can further request the corresponding video frames. Then, for step S204, after the encoded feature is sent to the server, the following can also be performed: upon receiving a video-frame acquisition request from the server, sending the video frames corresponding to the request to the server.
In accordance with the description above, the embodiment of the present application also provides a schematic flow diagram of the above deep-feature-based feature encoding and decoding solution in a practical application scenario, as shown in Fig. 6.
The process in Fig. 6 may mainly include the following steps:
Step 1: Perform region-of-interest extraction on each frame of the video sequence.
Step 2: Extract the deep feature of each region of interest.
Step 3: Choose the reference feature of the current feature.
Step 4: Decide the feature type of the current feature according to the temporal correlation between the current feature and the reference feature.
Step 5: Encode according to the feature type. For features whose residual needs to be coded, residual coding first divides the residual vector into several equal-length subvectors; each subvector has its own mode, and the optimal mode is finally chosen according to the rate-misalignment optimization model.
Step 6: Encode the auxiliary information, which includes the reference-frame information necessary for feature decoding, the number of features, the feature types, etc.
Step 7: In the decoding process, first decode the auxiliary information, obtaining the feature types, feature quantity, reference-feature information, and so on needed to decode the deep features.
Step 8: Decode the features according to the auxiliary information.
Step 9: If decoding yields the residual of a feature, first reconstruct the current feature from the reference feature, and then reconstruct the feature sequence.
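Steps 7–9 on the decoding side can be sketched as follows (a simplification assuming sequential reference and binary features as bit lists; auxiliary-information parsing is reduced to the (type, payload) pairs it would yield):

```python
def decode_sequence(coded):
    """Reconstruct a feature sequence from (type, payload) pairs (sketch).

    "I": payload is the feature itself; "P": payload is an XOR residual applied
    to the previous reconstruction; "S": the previous reconstruction is reused.
    """
    recon, out = None, []
    for ftype, payload in coded:
        if ftype == "I":
            recon = list(payload)
        elif ftype == "P":
            recon = [r ^ p for r, p in zip(recon, payload)]
        elif ftype == "S":
            recon = list(recon)
        out.append(recon)
    return out

print(decode_sequence([("I", [1, 0, 1]), ("P", [0, 1, 0]), ("S", [])]))
# -> [[1, 0, 1], [1, 1, 1], [1, 1, 1]]
```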
The above is a video information processing method provided by the embodiments of the present application. Based on the same inventive idea, the embodiments of the present application also provide corresponding apparatuses and systems, as shown in Fig. 7 and Fig. 8.
Fig. 7 is a schematic structural diagram of a video information processing apparatus corresponding to Fig. 2, provided by an embodiment of the present application. The apparatus can be located in the executing subject of the process in Fig. 2, and includes:
an extraction module 701, which extracts the feature of each video frame;
a determination module 702, which determines the type of the feature, the type reflecting the temporal correlation between the feature and a reference feature;
a coding module 703, which encodes the feature using the predetermined coding mode matching the type, obtaining the encoded feature; and
a sending module 704, which sends the encoded feature to a server, so that the server can decode it and use it for a visual analysis task.
Fig. 8 is a schematic structural diagram of a video information processing system corresponding to Fig. 2, provided by an embodiment of the present application. The system includes one or more terminals 801, a local server 802, and a visual analysis server 803.
The one or more terminals 801 send the acquired video sequences to the local server 802.
The local server 802 extracts the features of the video frames included in the video sequences, determines the type of each feature, encodes the feature using the predetermined coding mode matching the type to obtain the encoded feature, and sends the encoded feature to the visual analysis server 803, the type reflecting the temporal correlation between the feature and a reference feature.
The visual analysis server 803 decodes the encoded features and uses them for visual analysis tasks.
The apparatuses, systems, and methods provided by the embodiments of the present application correspond to one another; the apparatuses and systems therefore have advantageous effects similar to those of the corresponding methods. Since the advantageous effects of the methods have been described in detail above, those of the corresponding apparatuses and systems are not repeated here.
In the 1990s, an improvement of a technology could be clearly distinguished as an improvement in hardware (for example, an improvement of circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement of a method flow). With the development of technology, however, improvements of many of today's method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate a dedicated integrated-circuit chip. Moreover, instead of manually making integrated-circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a particular programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most widely used. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can be readily obtained merely by slightly programming the method flow in logic using the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely with computer-readable program code, it is entirely possible to program the method steps in logic so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. Indeed, devices for realizing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, apparatuses, modules, or units illustrated in the above embodiments can specifically be implemented by a computer chip or an entity, or by a product having certain functions. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For ease of description, the above apparatus is described in terms of units divided by function. Of course, when implementing the present application, the functions of the units can be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms such as non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple; for relevant details, refer to the description of the method embodiment.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes to the present application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.
Claims (15)
1. A video information processing method, characterized in that it comprises:
extracting a feature of each video frame;
determining a type of the feature, the type reflecting a degree of temporal correlation between the feature and a reference feature;
encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature;
sending the coded feature to a server, so that the server decodes the coded feature and uses the decoded feature for a visual analysis task.
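The flow of claim 1 can be illustrated with a minimal sketch. The L2-distance rule, the threshold, and the mode names ("independent", "residual", "skip") are illustrative assumptions, not part of the claim:

```python
import numpy as np

def feature_type(feature, ref_feature, threshold=0.1):
    """Classify a frame feature by its temporal correlation with the
    reference feature (hypothetical L2-distance rule)."""
    if ref_feature is None:
        return "independent"          # no reference: code the feature alone
    dist = np.linalg.norm(feature - ref_feature)
    if dist < 1e-6:
        return "skip"                 # reuse the reference's coding result
    return "residual" if dist < threshold else "independent"

def encode_frame_feature(feature, ref_feature):
    """Pick the coding mode matched to the feature type and build the payload."""
    ftype = feature_type(feature, ref_feature)
    if ftype == "independent":
        return ftype, feature.copy()          # code the feature itself
    if ftype == "residual":
        return ftype, feature - ref_feature   # code only the residual
    return ftype, None                        # skip: nothing new to send

ref = np.array([1.0, 2.0, 3.0])
mode, payload = encode_frame_feature(np.array([1.0, 2.05, 3.0]), ref)
```

A feature close to its reference is thus sent as a small residual, while an uncorrelated feature is coded independently.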
2. The method according to claim 1, characterized in that extracting the feature of each video frame specifically comprises:
receiving a video sequence acquired by one or more terminals;
extracting a feature of at least part of a region in each video frame included in the video sequence.
3. The method according to claim 1, characterized in that determining the type of the feature specifically comprises:
taking each of the video frames as a current frame respectively and executing:
determining the reference feature of the feature of the current frame according to the feature of a reference frame that belongs to the same video sequence as the current frame, the frames in the video sequence being ordered by time;
determining the type of the feature of the current frame according to the feature of the current frame and the reference feature of the feature of the current frame.
4. The method according to claim 3, characterized in that the reference frame of the current frame is determined by sequential reference or adaptive reference to frames in the video sequence to which the current frame belongs, the adaptive reference being performed according to distances between features.
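The two reference-selection strategies of claim 4 can be sketched as follows. Using L2 distance for the adaptive case is an assumption; the claim only requires that the selection be performed according to distances between features:

```python
import numpy as np

def sequential_reference(frame_index):
    """Sequential reference: simply the immediately preceding frame."""
    return frame_index - 1 if frame_index > 0 else None

def adaptive_reference(current_feature, candidate_features):
    """Adaptive reference: the candidate whose feature is closest to the
    current feature (hypothetical L2 metric)."""
    dists = [np.linalg.norm(current_feature - c) for c in candidate_features]
    return int(np.argmin(dists))

cands = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
best = adaptive_reference(np.array([0.9, 1.1]), cands)  # nearest candidate
```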
5. The method according to claim 3, characterized in that determining the type of the feature of the current frame according to the feature of the current frame and the reference feature of the feature of the current frame specifically comprises:
calculating a difference-degree characterization value between the feature of the current frame and the reference feature of the feature of the current frame;
determining the type of the feature of the current frame according to the calculated difference-degree characterization value.
6. The method according to claim 3, characterized in that the coding mode comprises at least one of the following:
independently encoding the feature of the current frame; encoding a residual between the feature of the current frame and its reference feature;
taking the coding result of the reference feature of the current frame as the coding result of the current frame.
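The three modes of claim 6 parallel intra, predictive, and skip coding in conventional video codecs. A hypothetical decoder-side reconstruction (mode names and payload layout are assumptions for illustration):

```python
import numpy as np

def decode_feature(mode, payload, ref_feature):
    """Reconstruct a frame feature from its coding mode."""
    if mode == "independent":
        return payload                  # feature was coded on its own
    if mode == "residual":
        return ref_feature + payload    # add the residual to the reference
    if mode == "skip":
        return ref_feature              # reuse the reference's coding result
    raise ValueError(f"unknown mode: {mode}")

ref = np.array([1.0, 2.0])
rec = decode_feature("residual", np.array([0.1, -0.2]), ref)
```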
7. The method according to claim 6, characterized in that encoding the residual between the feature of the current frame and its reference feature specifically comprises:
determining, according to a rate-distortion optimization model, the residual coding mode matched to the residual among predetermined residual coding modes, wherein the coding loss degrees corresponding to the respective residual coding modes differ;
encoding the residual using the determined residual coding mode;
wherein the rate-distortion optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the coding loss degree.
8. The method according to claim 7, characterized in that determining the residual coding mode matched to the residual among the predetermined residual coding modes specifically comprises:
dividing the residual into a plurality of subvectors;
determining, among the predetermined residual coding modes, the residual coding mode matched to each subvector respectively;
and encoding the residual using the determined residual coding mode specifically comprises:
encoding each corresponding subvector using the residual coding mode matched to that subvector, thereby encoding the residual.
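The subvector-wise mode selection of claim 8 can be sketched with two toy residual coding modes: "zero" (drop the subvector, costing distortion but no bits) and "raw" (send it losslessly). The cost rule, the modes, and the weight `lam` are illustrative assumptions:

```python
import numpy as np

def encode_residual_subvectors(residual, sub_len, lam=0.5):
    """Split the residual into subvectors and pick, per subvector, the
    cheaper of two toy modes by cost = distortion + lam * bits."""
    modes, coded = [], []
    for i in range(0, len(residual), sub_len):
        sub = residual[i:i + sub_len]
        cost_zero = float(np.sum(sub ** 2))   # full distortion, zero bits
        cost_raw = lam * len(sub)             # zero distortion, len(sub) "bits"
        if cost_zero <= cost_raw:
            modes.append("zero"); coded.append(np.zeros_like(sub))
        else:
            modes.append("raw"); coded.append(sub.copy())
    return modes, np.concatenate(coded)

res = np.array([0.01, 0.02, 2.0, -1.5])
modes, coded = encode_residual_subvectors(res, sub_len=2)
```

Small-magnitude subvectors are dropped while large ones are kept, which is the point of choosing a mode per subvector rather than for the whole residual.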
9. The method according to claim 7, characterized in that the loss function is determined as follows:
determining, according to a specified probability distribution, the probability distribution of the distance between a feature to be encoded and a feature to be matched, as a first probability distribution, wherein the feature to be matched is obtained according to a visual analysis task sample;
determining, according to a prior probability, the probability distribution of the distance between the decoded feature corresponding to the feature to be encoded and the feature to be matched, as a second probability distribution;
calculating, according to the first probability distribution, the second probability distribution, and a visual analysis task with result labels, the result accuracy of the visual analysis task when executed based on the feature before coding and the result accuracy of the visual analysis task when executed based on the encoded-then-decoded feature, respectively;
determining the loss function according to the respectively calculated result accuracies.
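The idea of claim 9 is that the loss measures how much visual-analysis accuracy is sacrificed by running the task on decoded rather than original features. A simplified empirical sketch (the nearest-neighbor matching task and the accuracy-difference loss are stand-ins for the claimed probabilistic derivation):

```python
import numpy as np

def task_accuracy(query_feats, match_feats, labels):
    """Accuracy of a toy labeled matching task: each query is assigned
    the index of its nearest feature to be matched."""
    correct = 0
    for q, true_label in zip(query_feats, labels):
        d = [np.linalg.norm(q - m) for m in match_feats]
        correct += int(np.argmin(d) == true_label)
    return correct / len(query_feats)

def coding_loss(orig_feats, decoded_feats, match_feats, labels):
    """Loss = accuracy on pre-coding features minus accuracy on
    encoded-then-decoded features (illustrative definition)."""
    return (task_accuracy(orig_feats, match_feats, labels)
            - task_accuracy(decoded_feats, match_feats, labels))

match = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
orig = [np.array([1.0, 1.0]), np.array([9.0, 9.0])]
decoded = [np.array([1.2, 0.8]), np.array([8.5, 9.5])]
loss = coding_loss(orig, decoded, match, labels=[0, 1])
```

Here the decoding distortion is small enough that the task outcome is unchanged, so the loss is zero; heavier distortion would flip matches and raise the loss.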
10. The method according to claim 7, characterized in that encoding the feature specifically comprises:
performing entropy coding on the feature;
the rate-distortion optimization model being obtained according to the loss function and the encoding bitrate corresponding to the entropy coding.
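Claim 10 combines the accuracy-based loss function with the entropy-coding bitrate into a single rate-distortion optimization model. A classic way to do this is a Lagrangian cost; the linear form and the weight `lam` are assumptions, since the claim specifies only the two inputs to the model:

```python
def rd_cost(loss, bitrate, lam=0.01):
    """Lagrangian rate-distortion cost: J = loss + lam * rate."""
    return loss + lam * bitrate

def best_mode(candidates, lam=0.01):
    """Pick the coding mode with the lowest RD cost among
    (name, loss, bitrate) candidates."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]

mode = best_mode([("independent", 0.00, 800),   # accurate but expensive
                  ("residual",    0.02, 50),    # small loss, few bits
                  ("skip",        0.90, 0)])    # free but very lossy
```

With these illustrative numbers the residual mode wins: its small accuracy loss is outweighed by the bitrate saved relative to independent coding.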
11. The method according to claim 3, characterized in that the method further comprises:
encoding auxiliary information and sending it to the server, so that the server decodes and obtains the auxiliary information and decodes the coded feature according to the auxiliary information;
wherein the auxiliary information comprises at least one of the following: information indicating the type of the feature; information indicating the reference feature.
12. The method according to claim 3, characterized in that after sending the coded feature to the server, the method further comprises:
upon receiving a video frame acquisition request from the server, sending each video frame corresponding to the request to the server.
13. The method according to any one of claims 1 to 12, characterized in that the feature is a deep learning feature extracted by a deep learning network.
14. A video information processing apparatus, characterized in that it comprises:
an extraction module, configured to extract a feature of each video frame;
a determination module, configured to determine a type of the feature, the type reflecting a degree of temporal correlation between the feature and a reference feature;
a coding module, configured to encode the feature using a predetermined coding mode matched to the type, to obtain a coded feature;
a sending module, configured to send the coded feature to a server, so that the server decodes the coded feature and uses the decoded feature for a visual analysis task.
15. A video information processing system for executing the method according to any one of claims 1 to 13, characterized in that the system comprises: one or more terminals, a local server, and a visual analysis server;
the one or more terminals send an acquired video sequence to the local server;
the local server extracts the feature of each video frame included in the video sequence, determines the type of the feature, encodes the feature using a predetermined coding mode matched to the type to obtain a coded feature, and sends the coded feature to the visual analysis server, the type reflecting the degree of temporal correlation between the feature and the reference feature;
the visual analysis server decodes the coded feature and uses the decoded feature for a visual analysis task.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710338736.3A CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
US15/690,595 US10390040B2 (en) | 2017-05-15 | 2017-08-30 | Method, apparatus, and system for deep feature coding and decoding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710338736.3A CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108882020A true CN108882020A (en) | 2018-11-23 |
CN108882020B CN108882020B (en) | 2021-01-01 |
Family
ID=64097477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710338736.3A Active CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
Country Status (2)
Country | Link |
---|---|
US (1) | US10390040B2 (en) |
CN (1) | CN108882020B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344278A (en) * | 2018-09-25 | 2019-02-15 | 北京邮电大学 | A kind of visual search method, device and equipment |
CN110222649A (en) * | 2019-06-10 | 2019-09-10 | 北京达佳互联信息技术有限公司 | Video classification methods, device, electronic equipment and storage medium |
CN111163318A (en) * | 2020-01-09 | 2020-05-15 | 北京大学 | Human-machine vision coding method and device based on feedback optimization |
CN111491177A (en) * | 2019-01-28 | 2020-08-04 | 上海博泰悦臻电子设备制造有限公司 | Video information extraction method, device and system |
CN113592003A (en) * | 2021-08-04 | 2021-11-02 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN115134526A (en) * | 2022-06-28 | 2022-09-30 | 润博全景文旅科技有限公司 | Image coding method, device and equipment based on cloud control |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200021815A1 (en) | 2018-07-10 | 2020-01-16 | Fastvdo Llc | Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (vqa) |
EP3824606A1 (en) * | 2018-07-20 | 2021-05-26 | Nokia Technologies Oy | Learning in communication systems by updating of parameters in a receiving algorithm |
US11915144B2 (en) | 2018-10-02 | 2024-02-27 | Nokia Technologies Oy | Apparatus, a method and a computer program for running a neural network |
CN109522450B (en) | 2018-11-29 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video classification method and server |
CN110072119B (en) * | 2019-04-11 | 2020-04-10 | 西安交通大学 | Content-aware video self-adaptive transmission method based on deep learning network |
US11700518B2 (en) * | 2019-05-31 | 2023-07-11 | Huawei Technologies Co., Ltd. | Methods and systems for relaying feature-driven communications |
KR102476057B1 (en) | 2019-09-04 | 2022-12-09 | 주식회사 윌러스표준기술연구소 | Method and apparatus for accelerating video encoding and decoding using IMU sensor data for cloud virtual reality |
US11410275B2 (en) * | 2019-09-23 | 2022-08-09 | Tencent America LLC | Video coding for machine (VCM) based system and method for video super resolution (SR) |
KR20220147641A (en) * | 2020-02-28 | 2022-11-03 | 엘지전자 주식회사 | Image encoding/decoding method, apparatus and method for transmitting bitstream for image feature information signaling |
US20230082561A1 (en) * | 2020-03-02 | 2023-03-16 | Lg Electronics Inc. | Image encoding/decoding method and device for performing feature quantization/de-quantization, and recording medium for storing bitstream |
WO2021205067A1 (en) * | 2020-04-07 | 2021-10-14 | Nokia Technologies Oy | Feature-domain residual for video coding for machines |
CN112728727A (en) * | 2021-01-06 | 2021-04-30 | 广东省科学院智能制造研究所 | Intelligent adjusting system for indoor environment comfort level based on edge calculation |
WO2023115506A1 (en) * | 2021-12-24 | 2023-06-29 | Huawei Technologies Co., Ltd. | Systems and methods for enabling automated transfer learning |
CN115761571A (en) * | 2022-10-26 | 2023-03-07 | 北京百度网讯科技有限公司 | Video-based target retrieval method, device, equipment and storage medium |
CN116112694B (en) * | 2022-12-09 | 2023-12-15 | 无锡天宸嘉航科技有限公司 | Video data coding method and system applied to model training |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102215391A (en) * | 2010-04-09 | 2011-10-12 | 华为技术有限公司 | Video data encoding and decoding method and device as well as transform processing method and device |
US20110310975A1 (en) * | 2010-06-16 | 2011-12-22 | Canon Kabushiki Kaisha | Method, Device and Computer-Readable Storage Medium for Encoding and Decoding a Video Signal and Recording Medium Storing a Compressed Bitstream |
US20120300834A1 (en) * | 2009-05-21 | 2012-11-29 | Metoevi Isabelle | Method and System for Efficient Video Transcoding Using Coding Modes, Motion Vectors and Residual Information |
CN104767997A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN104767998A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN106326395A (en) * | 2016-08-18 | 2017-01-11 | 北京大学 | Local visual feature selection method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8902971B2 (en) * | 2004-07-30 | 2014-12-02 | Euclid Discoveries, Llc | Video compression repository and model reuse |
US7684587B2 (en) * | 2005-04-04 | 2010-03-23 | Spirent Communications Of Rockville, Inc. | Reduced-reference visual communication quality assessment using data hiding |
US9906921B2 (en) * | 2015-02-10 | 2018-02-27 | Qualcomm Incorporated | Updating points of interest for positioning |
EP3532993A4 (en) * | 2016-10-25 | 2020-09-30 | Deep North, Inc. | Point to set similarity comparison and deep feature learning for visual recognition |
US10748062B2 (en) * | 2016-12-15 | 2020-08-18 | WaveOne Inc. | Deep learning based adaptive arithmetic coding and codelength regularization |
2017
- 2017-05-15 CN CN201710338736.3A patent/CN108882020B/en active Active
- 2017-08-30 US US15/690,595 patent/US10390040B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120300834A1 (en) * | 2009-05-21 | 2012-11-29 | Metoevi Isabelle | Method and System for Efficient Video Transcoding Using Coding Modes, Motion Vectors and Residual Information |
CN102215391A (en) * | 2010-04-09 | 2011-10-12 | 华为技术有限公司 | Video data encoding and decoding method and device as well as transform processing method and device |
US20110310975A1 (en) * | 2010-06-16 | 2011-12-22 | Canon Kabushiki Kaisha | Method, Device and Computer-Readable Storage Medium for Encoding and Decoding a Video Signal and Recording Medium Storing a Compressed Bitstream |
CN104767997A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN104767998A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN106326395A (en) * | 2016-08-18 | 2017-01-11 | 北京大学 | Local visual feature selection method and device |
Non-Patent Citations (1)
Title |
---|
VICTOR SANCHEZ et al.: "Piecewise Mapping in HEVC Lossless Intra-Prediction Coding", IEEE TRANSACTIONS ON IMAGE PROCESSING
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344278A (en) * | 2018-09-25 | 2019-02-15 | 北京邮电大学 | A kind of visual search method, device and equipment |
CN111491177A (en) * | 2019-01-28 | 2020-08-04 | 上海博泰悦臻电子设备制造有限公司 | Video information extraction method, device and system |
CN110222649A (en) * | 2019-06-10 | 2019-09-10 | 北京达佳互联信息技术有限公司 | Video classification methods, device, electronic equipment and storage medium |
CN111163318A (en) * | 2020-01-09 | 2020-05-15 | 北京大学 | Human-machine vision coding method and device based on feedback optimization |
WO2021139114A1 (en) * | 2020-01-09 | 2021-07-15 | 北京大学 | Man-machine visual coding method and apparatus based on feedback optimization |
CN113592003A (en) * | 2021-08-04 | 2021-11-02 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN113592003B (en) * | 2021-08-04 | 2023-12-26 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN115134526A (en) * | 2022-06-28 | 2022-09-30 | 润博全景文旅科技有限公司 | Image coding method, device and equipment based on cloud control |
Also Published As
Publication number | Publication date |
---|---|
CN108882020B (en) | 2021-01-01 |
US20180332301A1 (en) | 2018-11-15 |
US10390040B2 (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108882020A (en) | A kind of video information processing method, apparatus and system | |
TWI590650B (en) | Object tracking in encoded video streams | |
KR20200039757A (en) | Point cloud compression | |
US10708627B2 (en) | Volumetric video compression with motion history | |
Duan et al. | Compact descriptors for visual search | |
US9420299B2 (en) | Method for processing an image | |
CN110324706A (en) | A kind of generation method, device and the computer storage medium of video cover | |
CN111435551B (en) | Point cloud filtering method and device and storage medium | |
CN104782124A (en) | Leveraging encoder hardware to pre-process video content | |
CN107566798A (en) | A kind of system of data processing, method and device | |
GB2514441A (en) | Motion estimation using hierarchical phase plane correlation and block matching | |
CN116233445B (en) | Video encoding and decoding processing method and device, computer equipment and storage medium | |
CN103020138A (en) | Method and device for video retrieval | |
US10445613B2 (en) | Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest | |
US20230047400A1 (en) | Method for predicting point cloud attribute, encoder, decoder, and storage medium | |
Khan et al. | Sparse to dense depth completion using a generative adversarial network with intelligent sampling strategies | |
CN107018287A (en) | The method and apparatus for carrying out noise reduction to image using video epitome | |
Wang et al. | An efficient deep learning accelerator architecture for compressed video analysis | |
KR20240006667A (en) | Point cloud attribute information encoding method, decoding method, device and related devices | |
CN113691818B (en) | Video target detection method, system, storage medium and computer vision terminal | |
WO2023024842A1 (en) | Point cloud encoding/decoding method, apparatus and device, and storage medium | |
WO2022257145A1 (en) | Point cloud attribute prediction method and apparatus, and codec | |
US20230040484A1 (en) | Fast patch generation for video based point cloud coding | |
Li et al. | Background Knowledge | |
Choi et al. | Lossless location coding for image feature points |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||