CN108882020A - Video information processing method, apparatus and system - Google Patents
Video information processing method, apparatus and system Download PDF Info
- Publication number
- CN108882020A (application CN201710338736.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- coding
- frame
- video
- present frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000010365 information processing Effects 0.000 title claims abstract description 11
- 238000003672 processing method Methods 0.000 title claims abstract description 11
- 238000000034 method Methods 0.000 claims abstract description 71
- 238000004458 analytical method Methods 0.000 claims abstract description 64
- 230000000007 visual effect Effects 0.000 claims abstract description 60
- 239000000284 extract Substances 0.000 claims abstract description 10
- 230000006870 function Effects 0.000 claims description 30
- 238000009826 distribution Methods 0.000 claims description 24
- 230000008569 process Effects 0.000 claims description 23
- 238000013135 deep learning Methods 0.000 claims description 16
- 238000012512 characterization method Methods 0.000 claims description 11
- 238000000605 extraction Methods 0.000 claims description 10
- 230000003044 adaptive effect Effects 0.000 claims description 8
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 238000003860 storage Methods 0.000 abstract description 16
- 230000005540 biological transmission Effects 0.000 abstract description 10
- 238000010586 diagram Methods 0.000 description 18
- 230000006872 improvement Effects 0.000 description 8
- 238000012545 processing Methods 0.000 description 7
- 238000004590 computer program Methods 0.000 description 5
- 238000005457 optimization Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/54—Motion estimation other than block-based using feature points or meshes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440245—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6047—Power optimization with respect to the encoder, decoder, storage or transmission
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments of the present application disclose a video information processing method, apparatus and system. The method includes: extracting the feature of each video frame; determining the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature; encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and sending the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task. With the embodiments of the present application, the video itself need not be sent to the cloud server; instead, the features of the video are encoded and then sent to the cloud server for the visual analysis task. Compared with the prior art, this reduces the data transmission pressure and also reduces the storage pressure on the cloud server.
Description
Technical field
The present application relates to the field of video technology, and in particular to a video information processing method, apparatus and system.
Background art
In an era of rapidly developing networks, video information, as an accurate, efficient and intuitive multimedia form, is ever more widely applied.
Visual analysis is one of the important application fields of video information. For example, visual analysis of surveillance video can realize functions such as automatic alarms, object detection and object tracking; as another example, visual analysis can be used to retrieve a required image from massive amounts of video; and so on.
In the prior art, visual analysis tasks are usually executed by a cloud server, while the video to be analyzed is often distributed across multiple terminals (for example, surveillance terminals). Each terminal usually first sends the video it captures to a local server, and each local server then sends the video to the cloud server for the visual analysis task.
In practical applications, however, since the amount of video data sent to the cloud server is large, the data transmission pressure is high, and considerable storage pressure is placed on the cloud server.
Summary of the invention
The embodiments of the present application provide a video information processing method, apparatus and system, to solve the following technical problem in the prior art: because the amount of video data sent to the cloud server for visual analysis tasks is large, the data transmission pressure is high, and considerable storage pressure is placed on the cloud server.
To solve the above technical problem, the embodiments of the present application are realized as follows.
A video information processing method provided by an embodiment of the present application includes:
extracting the feature of each video frame;
determining the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature;
encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and
sending the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task.
Optionally, extracting the feature of each video frame specifically includes:
receiving a video sequence captured by one or more terminals; and
extracting the feature of at least part of the region in each video frame included in the video sequence.
Optionally, determining the type of the feature specifically includes:
taking each of the video frames in turn as the current frame and executing:
determining, according to the feature of a reference frame that belongs to the same video sequence as the current frame, the reference feature for the feature of the current frame, the frames in the video sequence being ordered by time; and
determining the type of the feature of the current frame according to the feature of the current frame and its reference feature.
Optionally, the reference frame of the current frame is determined by sequential reference or adaptive reference over the time-ordered frames of the video sequence to which the current frame belongs, the adaptive reference being performed according to inter-frame distance.
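As a rough illustration of the two reference modes, the sketch below (Python; the function names, the lookback window, and the use of feature-space Euclidean distance as the "inter-frame distance" are all assumptions, since the patent does not specify the metric) contrasts sequential reference with adaptive reference:

```python
def sequential_reference(frame_idx):
    """Sequential reference: simply the immediately preceding frame;
    the first frame of a sequence has no reference."""
    return frame_idx - 1 if frame_idx > 0 else None

def adaptive_reference(features, frame_idx, max_lookback=4):
    """Adaptive reference: among recent frames, pick the one whose feature
    lies closest to the current frame's feature (an assumed reading of
    'inter-frame distance')."""
    if frame_idx == 0:
        return None

    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    current = features[frame_idx]
    candidates = range(max(0, frame_idx - max_lookback), frame_idx)
    return min(candidates, key=lambda i: dist(features[i], current))
```

Under adaptive reference, a frame may thus reference a non-adjacent earlier frame when that frame's feature is a better temporal match.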
Optionally, determining the type of the feature of the current frame according to the feature of the current frame and its reference feature specifically includes:
calculating a difference-degree characterization value between the feature of the current frame and its reference feature; and
determining the type of the feature of the current frame according to the calculated difference-degree characterization value.
Optionally, the coding mode includes at least one of the following:
independently coding the feature of the current frame; coding the residual between the feature of the current frame and its reference feature; or using the coding result of the reference feature of the current frame as the coding result of the current frame.
Optionally, coding the residual between the feature of the current frame and its reference feature specifically includes:
determining, among predetermined residual coding modes and according to a rate-distortion optimization model, the residual coding mode matched to the residual, where the coding loss degrees of the respective residual coding modes differ; and
coding the residual using the determined residual coding mode;
where the rate-distortion optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the coding loss degree.
Optionally, determining, among the predetermined residual coding modes, the residual coding mode matched to the residual specifically includes:
dividing the residual into multiple subvectors; and
determining, among the predetermined residual coding modes, the residual coding mode matched to each subvector respectively;
and coding the residual using the determined residual coding modes specifically includes:
coding each subvector with its respectively matched residual coding mode, thereby coding the residual.
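A minimal sketch of this subvector scheme (Python; the split rule, the quantization-step "modes", and the error tolerance are illustrative assumptions and not the patent's actual residual coding modes):

```python
def split(vec, n_sub):
    """Split a vector into n_sub nearly equal contiguous subvectors."""
    k, r = divmod(len(vec), n_sub)
    parts, start = [], 0
    for i in range(n_sub):
        end = start + k + (1 if i < r else 0)
        parts.append(vec[start:end])
        start = end
    return parts

def quantize(sub, step):
    """Uniform quantization; step 0 stands for a 'skip' mode (all zeros)."""
    return [0.0] * len(sub) if step == 0 else [round(x / step) * step for x in sub]

def code_subvector(sub, steps=(0, 1.0, 0.25), tol=0.05):
    """Pick the coarsest mode (earliest in `steps`) whose squared
    reconstruction error stays within `tol`; fall back to the finest."""
    for step in steps:
        rec = quantize(sub, step)
        if sum((x - y) ** 2 for x, y in zip(sub, rec)) <= tol:
            return step, rec
    return steps[-1], quantize(sub, steps[-1])

def code_residual(residual, n_sub=2):
    """Per-subvector mode selection, as the optional claim describes."""
    return [code_subvector(sub) for sub in split(residual, n_sub)]
```

Subvectors that are nearly zero thus get the cheap skip mode, while only the informative parts of the residual spend bits on a finer mode.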
Optionally, the loss function is determined as follows:
determining, according to a specified probability distribution, the probability distribution of the distance between a feature to be coded and a feature to be matched, as a first probability distribution, where the feature to be matched is obtained from visual analysis task samples;
determining, according to a prior probability, the probability distribution of the distance between the decoded version of the feature to be coded and the feature to be matched, as a second probability distribution;
calculating separately, according to the first probability distribution and the second probability distribution, the result accuracy of the visual analysis task when executed on the feature before coding and when executed on the feature after coding and decoding; and
determining the loss function according to the separately calculated result accuracies.
Optionally, coding the feature specifically includes:
entropy coding the feature;
where the rate-distortion optimization model is obtained according to the loss function and the coding bit rate of the entropy coding.
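The way a loss function and an entropy-coding bit rate combine into a rate-distortion decision can be illustrated with a toy Lagrangian cost (Python; the lambda weight and the candidate numbers are invented purely for illustration):

```python
def rd_cost(loss, rate, lam=0.1):
    """Lagrangian rate-distortion cost: task-accuracy loss plus
    lambda times the (entropy-coded) bit rate."""
    return loss + lam * rate

def best_mode(candidates, lam=0.1):
    """candidates: list of (mode_name, loss, rate) triples;
    return the name of the minimum-cost mode."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]
```

A larger lambda favors cheaper, lossier modes; a smaller lambda favors accuracy, which mirrors how the model trades result accuracy against bit rate.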
Optionally, the method further includes:
coding auxiliary information and sending it to the server, so that the server can decode it to obtain the auxiliary information and decode the coded feature according to the auxiliary information;
where the auxiliary information includes at least one of the following: information indicating the type of the feature; information indicating the reference feature.
Optionally, after the coded feature is sent to the server, the method further includes:
upon receiving a video frame acquisition request from the server, sending each video frame corresponding to the request to the server.
Optionally, the feature is a deep learning feature extracted by a deep learning network.
A video information processing apparatus provided by an embodiment of the present application includes:
an extraction module, which extracts the feature of each video frame;
a determination module, which determines the type of the feature, the type reflecting the degree of temporal correlation between the feature and a reference feature;
a coding module, which encodes the feature using a predetermined coding mode matched to the type, to obtain a coded feature; and
a sending module, which sends the coded feature to a server, so that the server can decode the coded feature and use it for a visual analysis task.
A video information processing system provided by an embodiment of the present application includes one or more terminals, a local server, and a visual analysis server.
The one or more terminals send the captured video sequence to the local server.
The local server extracts the feature of each video frame included in the video sequence, determines the type of the feature, encodes the feature using a predetermined coding mode matched to the type to obtain a coded feature, and sends the coded feature to the visual analysis server; the type reflects the degree of temporal correlation between the feature and a reference feature.
The visual analysis server decodes the coded feature and uses it for a visual analysis task.
At least one of the above technical solutions employed by the embodiments of the present application can achieve the following beneficial effect: the video itself need not be sent to the cloud server; instead, the features of the video are encoded and then sent to the cloud server for the visual analysis task. Compared with the prior art, this reduces the data transmission pressure and also reduces the storage pressure on the cloud server, and can therefore partly or fully solve the problems in the prior art.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overview of the principle of the video feature coding and decoding solution provided by an embodiment of the present application in a practical application scenario;
Fig. 2 is a schematic flowchart of a video information processing method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a specific implementation, in a practical application scenario, of determining the type of the feature of the current frame according to an embodiment of the present application;
Fig. 4 is a schematic diagram of two reference modes usable for determining the reference feature according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of residual coding in a practical application scenario according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of the feature coding and decoding solution based on deep learning features in a practical application scenario according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a video information processing apparatus corresponding to Fig. 2 according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a video information processing system corresponding to Fig. 2 according to an embodiment of the present application.
Specific embodiment
The embodiments of the present application provide a video information processing method and apparatus.
To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the protection scope of the present application.
In many cases, machines can replace people in analyzing large amounts of data and performing other visual tasks, and this process often does not need the entire video or image, only the features extracted from the video information. Specifically, features can be extracted from the captured video, coded, and transmitted to a cloud server; the cloud server decodes the coded features and uses them for visual analysis tasks. The feature coding and transmission scheme provided by the embodiments of the present application is an effective solution for transmitting video big data: it can guarantee the accuracy of visual analysis tasks while greatly reducing the cost of storage and transmission.
Fig. 1 gives an overview of the principle of the video feature coding and decoding solution provided by the embodiments of the present application in a practical application scenario.
In Fig. 1, a large number of terminals (such as surveillance cameras and handheld mobile devices) transmit the video sequences they capture to local servers, which store them locally; the local servers then perform feature extraction and feature coding on the video sequences and send the coded feature bitstream to the cloud server. The cloud server decodes the received bitstream and uses it for visual analysis tasks. In some analysis applications, the retrieved results need to be examined further; in that case, the required video content (the small fraction of interest) is transferred to the cloud server on demand. Since the features are extracted from the original video, their high quality guarantees the accuracy of the visual analysis task; and compared with the bitstream of the video sequence, the transmission cost of the coded feature bitstream is very small, which relieves the pressure of data transmission and cloud server storage.
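The local-server side of the Fig. 1 pipeline can be sketched as a per-frame loop (Python; the callables and the one-previous-frame reference are simplifying stand-ins for the extraction, typing, and coding stages described above):

```python
def local_server_pipeline(frames, extract, classify, encode):
    """Per frame: extract the feature, determine its type against the
    previous frame's feature (a simplified one-frame reference), encode
    with the matching mode, and collect the payloads for transmission."""
    payloads, prev = [], None
    for frame in frames:
        feat = extract(frame)
        ftype = classify(feat, prev)
        payloads.append(encode(feat, prev, ftype))
        prev = feat
    return payloads
```

Only `payloads` would be streamed to the cloud server; the frames themselves stay in local storage until a video frame acquisition request arrives.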
Further, considering that deep learning features have achieved good performance in many analysis tasks and help improve visual tasks such as image retrieval and object detection, the embodiments of the present application also propose a coding and decoding solution based on deep learning features: specifically, the features are coded by exploiting their temporal redundancy, reducing the bit rate while preserving the performance of the features.
The above solution is described in detail below.
Fig. 2 is a schematic flowchart of a video information processing method provided by an embodiment of the present application. The executing body of this process can be a terminal and/or a server, for example, the surveillance terminal or local server mentioned in the background. The terminal and/or server can specifically include, but is not limited to, at least one of the following devices: a mobile phone, a tablet computer, a smart wearable device, an in-vehicle device, a personal computer, a large or medium-sized computer, a computer cluster, and the like.
The process in Fig. 2 may include the following steps:
S201: Extract the feature of each video frame.
In the embodiments of the present application, the video frames can be temporally ordered; for example, they can come from one or more video sequences. The feature can be an image feature such as color, pattern, texture or grayscale, or it can be a deep learning feature extracted by a deep learning network; the latter is preferable, because deep learning features are more conducive to improving the accuracy of the subsequent visual analysis task.
The present application does not limit the data type of the deep learning feature (it can, for example, be real-valued or binary), nor does it limit the specific structure of the deep learning network from which the deep learning feature is extracted.
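Because the network structure is deliberately left open, any per-frame embedding can play the role of the feature here. The toy extractor below (Python) uses a normalized intensity histogram purely as a placeholder; a real system would substitute an embedding from a deep learning network:

```python
def frame_feature(frame, bins=4, max_val=255):
    """Toy per-frame feature: a normalized intensity histogram over a flat
    list of pixel values. A stand-in for a deep learning feature only."""
    counts = [0] * bins
    for px in frame:
        idx = min(px * bins // (max_val + 1), bins - 1)
        counts[idx] += 1
    n = len(frame)
    return [c / n for c in counts]
```

Whatever extractor is used, the rest of the pipeline (type determination, matched coding mode, transmission) operates on the resulting vectors in the same way.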
S202: Determine the type of the feature, where the type reflects the temporal correlation between the feature and a reference feature.
In the embodiment of the present application, the types are predetermined, and different types reflect different degrees of temporal correlation between the feature and the reference feature. The reference feature of the feature of any given frame can specifically be obtained from the features of one or more other frames; for ease of description, those one or more frames are called the reference frames of that frame.
The temporal correlation can specifically be a degree of temporal redundancy or of similarity. Typically, for the features of the video frames in a video sequence, the higher the temporal correlation between a feature and its reference feature, the more similar the two features are.
In the embodiment of the present application, the type of a feature can be determined according to the degree of difference between the feature and its reference feature. The degree of difference can be characterized, for example, by a residual, a distance between features, or another characterization value.
S203: Encode the feature using the predetermined coding mode matching its type, obtaining the encoded feature.
In the embodiment of the present application, if the temporal correlation between the feature and the reference feature is high and the reference feature has already been encoded, the feature can be encoded based on the coding result of the reference feature, or based on the difference between the feature and the reference feature; this reduces the coding cost of the feature and the amount of data in its coding result. If the temporal correlation between the feature and the reference feature is low, the feature can be coded independently.
S204: Send the encoded feature to a server, so that the server can decode it and use it for a visual analysis task.
In the embodiment of the present application, the server in step S204 can be, for example, the cloud server mentioned in the background. Besides the encoded feature, related auxiliary information can also be sent to the server to facilitate decoding. The auxiliary information may include information needed for decoding, for example the type of the feature, information about the reference feature, the number of features, etc.
In the embodiment of the present application, the visual analysis task includes, but is not limited to, an image retrieval task and/or an object detection and tracking task in video, etc.
By the method for Fig. 2, video itself can not be sent to cloud server, but the feature of video is compiled
It is sent to cloud server after code for visual analysis task, can reduce data transmission pressure compared with the prior art, it can also
To reduce the storage pressure of cloud server, therefore, the problems of the prior art can be partly or entirely solved.
Method based on Fig. 2, the embodiment of the present application also provides some specific embodiments of this method, and extension
Scheme is illustrated below.
In the embodiment of the present application, for the scenario in the background, extracting the feature of each video frame in step S201 can specifically include: receiving video sequences acquired by one or more terminals; and extracting the feature of at least a partial region of each video frame included in the video sequences. The partial region can be a region of interest extracted in advance; region-of-interest extraction can be done automatically, by manual annotation, or by a combination of manual annotation and automatic extraction. The form of the region of interest includes at least one of the following: the full-frame image, or a partial image region of arbitrary size.
In the embodiment of the present application, the feature of the reference frame can be used directly as the reference feature. Then, determining the type of the feature in step S202 can specifically include:
taking each of the video frames in turn as the current frame, and performing the following:
determining the reference feature of the current frame's feature according to the features of the reference frames that belong to the same video sequence as the current frame, the frames in the video sequence being ordered by time; and
determining the type of the current frame's feature according to that feature and its reference feature.
Further, the reference frame of the current frame can be determined in a variety of ways. For example, it can be determined by sequential reference and/or adaptive reference among the frames of the video sequence to which the current frame belongs, where adaptive reference is performed according to inter-feature distances. Sequential reference means taking the previous frame or frames of the current frame as its reference frame. Adaptive reference means that, in a set of consecutive frames including the current frame, the frame whose feature has the minimal sum of distances to the features of the other frames is determined according to the inter-feature distances; that frame can then serve as the reference frame for every frame in the set, and its feature is the reference feature.
In the embodiment of the present application, determining the type of the current frame's feature according to the feature and its reference feature can specifically include: calculating a characterization value of the degree of difference between the current frame's feature and its reference feature, and determining the type of the feature according to the calculated value.
Further, the coding mode may include at least one of the following: coding the current frame's feature independently (for example, when the difference characterization value is large); coding the residual between the current frame's feature and its reference feature (for example, when the difference characterization value is moderate); or taking the coding result of the reference feature as the coding result of the current frame (for example, when the difference characterization value is small).
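Under those three modes, producing the payload for one binary feature given its decided type can be sketched as follows (a simplification: features are bit lists, the residual is a bitwise XOR, and entropy coding of the payload is omitted):

```python
def encode_feature(feature, reference, ftype):
    """Produce the payload to be entropy-coded for one binary feature (sketch).

    feature, reference: lists of 0/1 bits; ftype: "I", "P", or "S".
    """
    if ftype == "I":
        return list(feature)                                 # independent coding: the feature itself
    if ftype == "P":
        return [f ^ r for f, r in zip(feature, reference)]   # residual coding: XOR with the reference
    if ftype == "S":
        return []                                            # skip: reuse the reference's coding result
    raise ValueError("unknown feature type: " + ftype)

print(encode_feature([1, 0, 1, 1], [1, 1, 1, 1], "P"))  # -> [0, 1, 0, 0]
```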
For ease of understanding, the embodiment of the present application provides a schematic diagram of a specific way of determining the type of the current frame's feature in a practical application scenario, as shown in Fig. 3.
In the scenario of Fig. 3, the n-th frame image of the video sequence is denoted I_n, n = 1, 2, ..., N, where N is the total number of frames. Regions of interest are extracted by a specified region-of-interest extraction algorithm. Each frame may contain one or more regions of interest; in general, a region of interest is the image region of a moving object in the video sequence, such as a person or a vehicle.
Deep-feature extraction is performed on each region of interest; the extracted deep-learning feature is denoted F_{n,m}, n = 1, 2, ..., N; m = 1, 2, ..., M_n, where M_n is the number of regions of interest in the n-th frame.
Taking as an example the case where each frame has only one deep feature and the coding method targets binary features, the deep-learning feature of the n-th frame is denoted F_n.
As mentioned above, the feature type can be decided according to the temporal correlation between the feature and the reference feature, and each feature type uses a different coding mode. The number of feature types must take into account the number of bits needed to encode the type as well as the actual conditions, and is at least 1. In the scenario of Fig. 3, the adopted type-decision scheme divides features into three types, named for example I-feature, P-feature, and S-feature. The type decision is based on the similarity between the current feature and the reference feature (specifically characterized by the inter-feature distance), using two predetermined thresholds T_IP and T_IS, with T_IP > T_IS.
Define the feature of the current frame as F_n (the current feature), the reference feature of F_n as the feature of its reference frame, and the previous frame of F_n as F_{n-1}. Assuming the reference frame is determined by sequential reference, the feature F_{n-1} of the previous frame serves as the reference feature of F_n.
When the current feature F_n differs greatly from the reference feature, that is, when the distance between them exceeds the threshold T_IP, the type of the current feature is decided to be I-feature, and the current feature can be coded independently; an I-feature provides a high-quality reference for coding subsequent features and also provides random-access capability. When the current feature differs little from the reference feature, that is, when the distance between them is below the threshold T_IS, the type is decided to be S-feature; the current feature then needs no coding, and its coding result is represented directly by the coding result of the reference feature. In all other cases, a certain degree of scene change is considered to have occurred, the type is decided to be P-feature, and the residual between the current feature and the reference feature needs to be encoded.
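The two-threshold decision just described can be sketched as follows (binary features as bit lists; the threshold values are hypothetical and would in practice be tuned):

```python
def decide_type(current, reference, t_ip, t_is):
    """Classify the current feature as I/P/S by Hamming distance to its reference (t_ip > t_is)."""
    d = sum(c != r for c, r in zip(current, reference))
    if d > t_ip:
        return "I-feature"   # large change: code independently, random-access point
    if d < t_is:
        return "S-feature"   # nearly identical: reuse the reference's coding result
    return "P-feature"       # moderate change: encode the residual

ref = [0] * 8
print(decide_type([1] * 8, ref, t_ip=5, t_is=2))           # -> I-feature
print(decide_type([1, 0, 0, 0, 0, 0, 0, 0], ref, 5, 2))    # -> S-feature
print(decide_type([1, 1, 1, 0, 0, 0, 0, 0], ref, 5, 2))    # -> P-feature
```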
The inter-feature distance can include, but is not limited to, at least one of the following: the Hamming distance, the Euclidean distance, and transforms of such distances. For example, the distance between the current feature and the reference feature can be represented by the Hamming distance between F_n and its reference feature and/or a transform thereof. In Fig. 3, the distance is first compared against T_IP as the decision condition to determine whether the type of F_n is I-feature; if not, it is further compared against T_IS to determine whether the type of F_n is P-feature or S-feature. It should be noted that Fig. 3 only illustrates one example of the decision order and decision conditions, which is not limiting; in practical applications, the decision order and conditions of Fig. 3 can be modified as needed, and besides the inter-feature distance, other characterization values of the degree of difference between features can be used in the above decision process.
Further, the embodiment of the present application also provides a schematic illustration of two reference modes that can be adopted for determining the reference frame (that is, determining the reference feature), as shown in Fig. 4. It is assumed that the features of the video frames are arranged in the order of the corresponding frames in the video sequence.
The features between two adjacent I-features, together with the first of those two I-features, are defined as one group of features (GOF); the features between two I-features/P-features, together with the first of those two I-features/P-features, are defined as one feature subset (sub-GOF). A sub-GOF contains exactly one I/P-feature and is denoted {F_i1, F_i2, ..., F_iJ}. The two reference modes have corresponding reference structures:
The first reference structure is the sequential prediction structure (SPS). In SPS, the first feature in {F_i1, F_i2, ..., F_iJ} is encoded as an I/P-feature, and the subsequent features are S-features represented directly by that first feature. For the S-features in a sub-GOF, however, the first feature may well not be the best choice of reference feature.
The second structure is the adaptive prediction structure (APS), which adaptively selects a better reference feature within {F_i1, F_i2, ..., F_iJ}. In APS, for any feature F_ik, the sum of its distances (taking the Hamming distance as an example, though not limited to it) to the other features in the subset is defined as the aggregate distance D_group(k), and the feature with the minimal aggregate distance is chosen as the reference feature. To avoid decoding delay, this feature is moved to the first position in the sub-GOF and encoded as an I/P-feature, and the other features in the sub-GOF are represented by it.
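The APS selection of the reference feature within a sub-GOF can be sketched as follows (features as bit lists, with the Hamming distance as the example metric):

```python
def select_aps_reference(sub_gof):
    """Return the index of the feature with minimal aggregate Hamming distance
    to the other features in the sub-GOF; that feature becomes the reference."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    aggregate = [sum(hamming(f, g) for g in sub_gof) for f in sub_gof]
    return aggregate.index(min(aggregate))

features = [[0, 0, 0], [0, 0, 1], [1, 1, 1]]
print(select_aps_reference(features))  # -> 1 (the middle feature is closest to the rest)
```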
Further, for P-features, the embodiment of the present application also provides a residual coding scheme. Specifically, encoding the residual between the current frame's feature and its reference feature may include: determining, among predetermined residual coding modes and according to a rate-misalignment optimization model, the residual coding mode matching the residual, where each residual coding mode corresponds to a different degree of coding loss; and encoding the residual with the determined mode. The rate-misalignment optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the degree of coding loss.
A residual may be matched as a whole to a single residual coding mode, or different parts of it may be matched to different residual coding modes. Taking the latter case as an example, determining the matching residual coding mode among the predetermined residual coding modes can specifically include: dividing the residual into multiple subvectors, and determining, among the predetermined residual coding modes, the matching mode for each subvector; encoding the residual with the determined modes then specifically means encoding each subvector with its matched residual coding mode. For ease of understanding, this is illustrated in conjunction with Fig. 5.
Fig. 5 is a schematic flow diagram of residual coding in a practical application scenario provided by an embodiment of the present application.
In Fig. 5, the residual is represented as a vector. The residual vector is divided into S equal-length subvectors, and a matching residual coding mode is then designed for each subvector. Taking the case of three residual coding modes as an example: the first mode encodes the raw residual directly (lossless mode); the second introduces a certain degree of loss into the subvector (lossy mode); and the third encodes an all-zero subvector (skip mode). For each subvector, the three modes yield different bit rates and losses, and the optimal mode can be selected according to the rate-misalignment optimization model.
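The per-subvector mode selection can be sketched as follows, minimizing a cost of the form J = L(D) + λ·R in the spirit of the rate-misalignment optimization model. The loss and rate assigned to each mode here are stand-in values (loss counted simply as discarded residual bits, rate as transmitted bits):

```python
def choose_mode(subvec, lam):
    """Pick the residual coding mode minimizing J = L(D) + lam * R (sketch).

    Stand-in model: loss L(D) = residual bits discarded, rate R = bits sent.
    """
    ones = sum(subvec)          # set residual bits that would be lost if dropped
    n = len(subvec)
    candidates = {
        "lossless": (0, n),             # (loss, rate): send every bit
        "lossy": (ones // 2, n // 2),   # hypothetical: half the rate, ~half the set bits lost
        "skip": (ones, 0),              # send nothing (all-zero subvector)
    }
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

print(choose_mode([0, 0, 0, 0, 0, 0, 0, 0], lam=1.0))   # -> skip (nothing to lose)
print(choose_mode([1, 1, 1, 1, 1, 1, 1, 1], lam=0.1))   # -> lossless (loss dominates)
```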
Further, the rate-misalignment optimization model proposed by the present application is explained. The loss function can be determined as follows: according to a specified probability distribution, determine the probability distribution of the distance between the feature to be encoded and a feature to be matched, as a first probability distribution, where the feature to be matched is obtained from visual-analysis-task samples; according to a prior probability, determine the probability distribution of the distance between the decoded version of the feature to be encoded and the feature to be matched, as a second probability distribution; according to the first probability distribution, the second probability distribution, and a visual analysis task with result labels, separately calculate the result accuracy of the visual analysis task when executed on the feature before coding and the result accuracy when executed on the feature after encoding and decoding; and determine the loss function from the accuracies thus calculated.
The visual-analysis-task samples can be provided by the cloud server, or obtained by the local server through other channels. Preferably, the samples share some scene characteristics with the visual analysis tasks that the cloud server will subsequently execute; such samples are more informative and help increase the reliability of the rate-misalignment optimization model.
For step S203, either lossless or lossy coding can be used when encoding the feature. Entropy coding is lossless; in that case, encoding the feature can specifically include performing entropy coding on the feature (the exact content encoded depends on the decided feature type), and the rate-misalignment optimization model is obtained from the loss function and the bit rate of the entropy coding. The entropy coding can specifically use the context-adaptive binary arithmetic coding (CABAC) algorithm from High Efficiency Video Coding (HEVC), or other similar algorithms.
More specifically, the rate-misalignment optimization model can be obtained by calculating the loss according to the loss function of visual-analysis accuracy and then jointly considering the loss and the bit rate, where the loss function of visual-analysis accuracy is obtained from the distortion between the encoded feature and the original feature.
For example, the rate-misalignment optimization model can specifically be defined as minimizing, over the residual coding modes Φ, the cost
J = L(D) + λ·R
where R is the entropy-coding bit rate; λ is a weight controlling the trade-off between bit rate and accuracy loss (when λ is larger, the coding tends to save bit rate at the cost of a larger loss); Φ is a designed residual coding mode; and J is the rate-misalignment cost, so that in the rate-misalignment optimization model the residual coding mode with the minimum rate-misalignment cost is selected as the optimal mode. L(D) is the loss function of visual-analysis accuracy, and D is the distortion between the encoded feature and the original feature; D can take one or more of the following forms: sum of absolute differences (SAD), mean squared error (MSE), root mean squared error (RMSE), etc. The loss function L(D) of visual-analysis accuracy is calculated from D. Taking the case where the visual analysis task is a retrieval task as an example, one possible calculation of the loss function L(D) proceeds as follows:
model the probability distribution of the Hamming distance between the original feature and the feature to be matched with a generalized binomial distribution;
derive, according to a prior probability, the probability distribution of the Hamming distance between the feature to be matched and the decoded feature under coding loss D;
compute the retrieval accuracy of the original feature, and the retrieval accuracy after coding loss, from the above distributions and the labels of the retrieval task;
make L(D) proportional to the loss in accuracy.
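These steps can be sketched as follows. The simple binomial bit-flip model, the match threshold, and the way coding loss shifts the per-bit mismatch rate are all illustrative simplifications of the generalized-binomial modeling above:

```python
from math import comb

def match_prob(n, p, thresh):
    """P(Hamming distance <= thresh) when each of n bits mismatches independently with prob p."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(thresh + 1))

def accuracy_loss(n, p, thresh, flipped_bits):
    """L(D): drop in match probability when coding loss flips `flipped_bits` extra bits
    (crude prior: each flipped bit raises the per-bit mismatch rate by 1/n)."""
    p_decoded = min(1.0, p + flipped_bits / n)
    return match_prob(n, p, thresh) - match_prob(n, p_decoded, thresh)

print(accuracy_loss(n=64, p=0.1, thresh=12, flipped_bits=0))      # -> 0.0 (lossless: no drop)
print(accuracy_loss(n=64, p=0.1, thresh=12, flipped_bits=8) > 0)  # -> True
```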
Compared with the rate-distortion optimization (RDO) model in video coding, the L(D) in the rate-misalignment optimization model better measures the feature's loss of accuracy in the visual analysis task, and thus yields better analysis accuracy at the same bit rate.
For the case where one frame has multiple regions of interest (multiple features), the multi-feature coding problem can be converted into single-feature coding problems according to the reference relationships.
In the embodiment of the present application, the process in Fig. 2 can further include: encoding auxiliary information and sending it to the server, so that the server can decode the auxiliary information and use it to decode the encoded feature. The auxiliary information includes at least one of the following: information indicating the type of the feature, and information indicating the reference feature. It can also include more content, for example the width and height of the region of interest, its coordinates in the image, the attributes of the object in the region of interest, information about events in the region of interest, etc.
In the embodiment of the present application, the decoding end (for example, the server in step S204) first decodes the auxiliary information, obtaining the feature types, feature quantity, reference-feature information, and so on needed to decode the deep-learning features; it then decodes the encoded features accordingly, reconstructing features that meet different application demands. After decoding, the feature sequence can be reconstructed and then used for tasks such as visual analysis.
In the embodiment of the present application, as mentioned above, in some cases the cloud server may be unable to execute the visual analysis task accurately from the features alone, in which case it can further request the corresponding video frames. Then, for step S204, after the encoded feature is sent to the server, the following can also be performed: upon receiving a video-frame acquisition request from the server, sending the video frames corresponding to the request to the server.
In accordance with the description above, the embodiment of the present application also provides a schematic flow diagram of the above deep-feature-based feature encoding and decoding solution in a practical application scenario, as shown in Fig. 6.
The process in Fig. 6 may mainly include the following steps:
Step 1: Perform region-of-interest extraction on each frame of the video sequence.
Step 2: Extract the deep feature of each region of interest.
Step 3: Choose the reference feature of the current feature.
Step 4: Decide the feature type of the current feature according to the temporal correlation between the current feature and the reference feature.
Step 5: Encode according to the feature type. For features whose residual needs to be coded, residual coding first divides the residual vector into several equal-length subvectors; each subvector has its own mode, and the optimal mode is finally chosen according to the rate-misalignment optimization model.
Step 6: Encode the auxiliary information, which includes the reference-frame information necessary for feature decoding, the number of features, the feature types, etc.
Step 7: In the decoding process, first decode the auxiliary information, obtaining the feature types, feature quantity, reference-feature information, and so on needed to decode the deep features.
Step 8: Decode the features according to the auxiliary information.
Step 9: If decoding yields the residual of a feature, first reconstruct the current feature from the reference feature, and then reconstruct the feature sequence.
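Steps 7–9 on the decoding side can be sketched as follows (a simplification assuming sequential reference and binary features as bit lists; auxiliary-information parsing is reduced to the (type, payload) pairs it would yield):

```python
def decode_sequence(coded):
    """Reconstruct a feature sequence from (type, payload) pairs (sketch).

    "I": payload is the feature itself; "P": payload is an XOR residual applied
    to the previous reconstruction; "S": the previous reconstruction is reused.
    """
    recon, out = None, []
    for ftype, payload in coded:
        if ftype == "I":
            recon = list(payload)
        elif ftype == "P":
            recon = [r ^ p for r, p in zip(recon, payload)]
        elif ftype == "S":
            recon = list(recon)
        out.append(recon)
    return out

print(decode_sequence([("I", [1, 0, 1]), ("P", [0, 1, 0]), ("S", [])]))
# -> [[1, 0, 1], [1, 1, 1], [1, 1, 1]]
```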
The above is a video information processing method provided by the embodiments of the present application. Based on the same inventive idea, the embodiments of the present application also provide corresponding apparatuses and systems, as shown in Fig. 7 and Fig. 8.
Fig. 7 is a schematic structural diagram of a video information processing apparatus corresponding to Fig. 2, provided by an embodiment of the present application. The apparatus can be located in the executing subject of the process in Fig. 2, and includes:
an extraction module 701, which extracts the feature of each video frame;
a determination module 702, which determines the type of the feature, the type reflecting the temporal correlation between the feature and a reference feature;
a coding module 703, which encodes the feature using the predetermined coding mode matching the type, obtaining the encoded feature; and
a sending module 704, which sends the encoded feature to a server, so that the server can decode it and use it for a visual analysis task.
Fig. 8 is a schematic structural diagram of a video information processing system corresponding to Fig. 2, provided by an embodiment of the present application. The system includes one or more terminals 801, a local server 802, and a visual analysis server 803.
The one or more terminals 801 send the acquired video sequences to the local server 802.
The local server 802 extracts the features of the video frames included in the video sequences, determines the type of each feature, encodes the feature using the predetermined coding mode matching the type to obtain the encoded feature, and sends the encoded feature to the visual analysis server 803, the type reflecting the temporal correlation between the feature and a reference feature.
The visual analysis server 803 decodes the encoded features and uses them for visual analysis tasks.
The apparatuses, systems, and methods provided by the embodiments of the present application correspond to one another; the apparatuses and systems therefore have advantageous effects similar to those of the corresponding methods. Since the advantageous effects of the methods have been described in detail above, those of the corresponding apparatuses and systems are not repeated here.
In the 1990s, an improvement of a technology could be clearly distinguished as an improvement in hardware (for example, an improvement of circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement of a method flow). With the development of technology, however, improvements of many of today's method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate a dedicated integrated-circuit chip. Moreover, instead of manually making integrated-circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a particular programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most widely used. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can be readily obtained merely by slightly programming the method flow in logic using the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely with computer-readable program code, it is entirely possible to program the method steps in logic so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. Indeed, devices for realizing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, apparatuses, modules, or units illustrated in the above embodiments can specifically be implemented by a computer chip or an entity, or by a product having certain functions. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For ease of description, the above apparatus is described in terms of units divided by function. Of course, when implementing the present application, the functions of the units can be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms such as non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple; for relevant details, refer to the description of the method embodiment.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes to the present application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.
Claims (15)
1. A video information processing method, characterized in that it comprises:
extracting a feature of each video frame;
determining a type of the feature, the type reflecting a degree of temporal correlation between the feature and a reference feature;
encoding the feature using a predetermined coding mode matched to the type, to obtain a coded feature;
sending the coded feature to a server, so that the server decodes the coded feature and uses the decoded feature for a visual analysis task.
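The flow of claim 1 can be illustrated with a minimal sketch. The L2-distance rule, the threshold, and the mode names ("independent", "residual", "skip") are illustrative assumptions, not part of the claim:

```python
import numpy as np

def feature_type(feature, ref_feature, threshold=0.1):
    """Classify a frame feature by its temporal correlation with the
    reference feature (hypothetical L2-distance rule)."""
    if ref_feature is None:
        return "independent"          # no reference: code the feature alone
    dist = np.linalg.norm(feature - ref_feature)
    if dist < 1e-6:
        return "skip"                 # reuse the reference's coding result
    return "residual" if dist < threshold else "independent"

def encode_frame_feature(feature, ref_feature):
    """Pick the coding mode matched to the feature type and build the payload."""
    ftype = feature_type(feature, ref_feature)
    if ftype == "independent":
        return ftype, feature.copy()          # code the feature itself
    if ftype == "residual":
        return ftype, feature - ref_feature   # code only the residual
    return ftype, None                        # skip: nothing new to send

ref = np.array([1.0, 2.0, 3.0])
mode, payload = encode_frame_feature(np.array([1.0, 2.05, 3.0]), ref)
```

A feature close to its reference is thus sent as a small residual, while an uncorrelated feature is coded independently.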
2. The method according to claim 1, characterized in that extracting the feature of each video frame specifically comprises:
receiving a video sequence acquired by one or more terminals;
extracting a feature of at least part of a region in each video frame included in the video sequence.
3. The method according to claim 1, characterized in that determining the type of the feature specifically comprises:
taking each of the video frames as a current frame respectively and executing:
determining the reference feature of the feature of the current frame according to the feature of a reference frame that belongs to the same video sequence as the current frame, the frames in the video sequence being ordered by time;
determining the type of the feature of the current frame according to the feature of the current frame and the reference feature of the feature of the current frame.
4. The method according to claim 3, characterized in that the reference frame of the current frame is determined by sequential reference or adaptive reference to frames in the video sequence to which the current frame belongs, the adaptive reference being performed according to distances between features.
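The two reference-selection strategies of claim 4 can be sketched as follows. Using L2 distance for the adaptive case is an assumption; the claim only requires that the selection be performed according to distances between features:

```python
import numpy as np

def sequential_reference(frame_index):
    """Sequential reference: simply the immediately preceding frame."""
    return frame_index - 1 if frame_index > 0 else None

def adaptive_reference(current_feature, candidate_features):
    """Adaptive reference: the candidate whose feature is closest to the
    current feature (hypothetical L2 metric)."""
    dists = [np.linalg.norm(current_feature - c) for c in candidate_features]
    return int(np.argmin(dists))

cands = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
best = adaptive_reference(np.array([0.9, 1.1]), cands)  # nearest candidate
```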
5. The method according to claim 3, characterized in that determining the type of the feature of the current frame according to the feature of the current frame and the reference feature of the feature of the current frame specifically comprises:
calculating a difference-degree characterization value between the feature of the current frame and the reference feature of the feature of the current frame;
determining the type of the feature of the current frame according to the calculated difference-degree characterization value.
6. The method according to claim 3, characterized in that the coding mode comprises at least one of the following:
independently encoding the feature of the current frame; encoding a residual between the feature of the current frame and its reference feature;
taking the coding result of the reference feature of the current frame as the coding result of the current frame.
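The three modes of claim 6 parallel intra, predictive, and skip coding in conventional video codecs. A hypothetical decoder-side reconstruction (mode names and payload layout are assumptions for illustration):

```python
import numpy as np

def decode_feature(mode, payload, ref_feature):
    """Reconstruct a frame feature from its coding mode."""
    if mode == "independent":
        return payload                  # feature was coded on its own
    if mode == "residual":
        return ref_feature + payload    # add the residual to the reference
    if mode == "skip":
        return ref_feature              # reuse the reference's coding result
    raise ValueError(f"unknown mode: {mode}")

ref = np.array([1.0, 2.0])
rec = decode_feature("residual", np.array([0.1, -0.2]), ref)
```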
7. The method according to claim 6, characterized in that encoding the residual between the feature of the current frame and its reference feature specifically comprises:
determining, according to a rate-distortion optimization model, the residual coding mode matched to the residual among predetermined residual coding modes, wherein the coding loss degrees corresponding to the respective residual coding modes differ;
encoding the residual using the determined residual coding mode;
wherein the rate-distortion optimization model is determined according to a loss function of the result accuracy of the visual analysis task, and the loss function is determined according to the coding loss degree.
8. The method according to claim 7, characterized in that determining the residual coding mode matched to the residual among the predetermined residual coding modes specifically comprises:
dividing the residual into a plurality of subvectors;
determining, among the predetermined residual coding modes, the residual coding mode matched to each subvector respectively;
and encoding the residual using the determined residual coding mode specifically comprises:
encoding each corresponding subvector using the residual coding mode matched to that subvector, thereby encoding the residual.
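The subvector-wise mode selection of claim 8 can be sketched with two toy residual coding modes: "zero" (drop the subvector, costing distortion but no bits) and "raw" (send it losslessly). The cost rule, the modes, and the weight `lam` are illustrative assumptions:

```python
import numpy as np

def encode_residual_subvectors(residual, sub_len, lam=0.5):
    """Split the residual into subvectors and pick, per subvector, the
    cheaper of two toy modes by cost = distortion + lam * bits."""
    modes, coded = [], []
    for i in range(0, len(residual), sub_len):
        sub = residual[i:i + sub_len]
        cost_zero = float(np.sum(sub ** 2))   # full distortion, zero bits
        cost_raw = lam * len(sub)             # zero distortion, len(sub) "bits"
        if cost_zero <= cost_raw:
            modes.append("zero"); coded.append(np.zeros_like(sub))
        else:
            modes.append("raw"); coded.append(sub.copy())
    return modes, np.concatenate(coded)

res = np.array([0.01, 0.02, 2.0, -1.5])
modes, coded = encode_residual_subvectors(res, sub_len=2)
```

Small-magnitude subvectors are dropped while large ones are kept, which is the point of choosing a mode per subvector rather than for the whole residual.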
9. The method according to claim 7, characterized in that the loss function is determined as follows:
determining, according to a specified probability distribution, the probability distribution of the distance between a feature to be encoded and a feature to be matched, as a first probability distribution, wherein the feature to be matched is obtained according to a visual analysis task sample;
determining, according to a prior probability, the probability distribution of the distance between the decoded feature corresponding to the feature to be encoded and the feature to be matched, as a second probability distribution;
calculating, according to the first probability distribution, the second probability distribution, and a visual analysis task with result labels, the result accuracy of the visual analysis task when executed based on the feature before coding and the result accuracy of the visual analysis task when executed based on the encoded-then-decoded feature, respectively;
determining the loss function according to the respectively calculated result accuracies.
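The idea of claim 9 is that the loss measures how much visual-analysis accuracy is sacrificed by running the task on decoded rather than original features. A simplified empirical sketch (the nearest-neighbor matching task and the accuracy-difference loss are stand-ins for the claimed probabilistic derivation):

```python
import numpy as np

def task_accuracy(query_feats, match_feats, labels):
    """Accuracy of a toy labeled matching task: each query is assigned
    the index of its nearest feature to be matched."""
    correct = 0
    for q, true_label in zip(query_feats, labels):
        d = [np.linalg.norm(q - m) for m in match_feats]
        correct += int(np.argmin(d) == true_label)
    return correct / len(query_feats)

def coding_loss(orig_feats, decoded_feats, match_feats, labels):
    """Loss = accuracy on pre-coding features minus accuracy on
    encoded-then-decoded features (illustrative definition)."""
    return (task_accuracy(orig_feats, match_feats, labels)
            - task_accuracy(decoded_feats, match_feats, labels))

match = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
orig = [np.array([1.0, 1.0]), np.array([9.0, 9.0])]
decoded = [np.array([1.2, 0.8]), np.array([8.5, 9.5])]
loss = coding_loss(orig, decoded, match, labels=[0, 1])
```

Here the decoding distortion is small enough that the task outcome is unchanged, so the loss is zero; heavier distortion would flip matches and raise the loss.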
10. The method according to claim 7, characterized in that encoding the feature specifically comprises:
performing entropy coding on the feature;
the rate-distortion optimization model being obtained according to the loss function and the encoding bitrate corresponding to the entropy coding.
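Claim 10 combines the accuracy-based loss function with the entropy-coding bitrate into a single rate-distortion optimization model. A classic way to do this is a Lagrangian cost; the linear form and the weight `lam` are assumptions, since the claim specifies only the two inputs to the model:

```python
def rd_cost(loss, bitrate, lam=0.01):
    """Lagrangian rate-distortion cost: J = loss + lam * rate."""
    return loss + lam * bitrate

def best_mode(candidates, lam=0.01):
    """Pick the coding mode with the lowest RD cost among
    (name, loss, bitrate) candidates."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]

mode = best_mode([("independent", 0.00, 800),   # accurate but expensive
                  ("residual",    0.02, 50),    # small loss, few bits
                  ("skip",        0.90, 0)])    # free but very lossy
```

With these illustrative numbers the residual mode wins: its small accuracy loss is outweighed by the bitrate saved relative to independent coding.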
11. The method according to claim 3, characterized in that the method further comprises:
encoding auxiliary information and sending it to the server, so that the server decodes and obtains the auxiliary information and decodes the coded feature according to the auxiliary information;
wherein the auxiliary information comprises at least one of the following: information indicating the type of the feature; information indicating the reference feature.
12. The method according to claim 3, characterized in that after sending the coded feature to the server, the method further comprises:
upon receiving a video frame acquisition request from the server, sending each video frame corresponding to the request to the server.
13. The method according to any one of claims 1 to 12, characterized in that the feature is a deep learning feature extracted by a deep learning network.
14. A video information processing apparatus, characterized in that it comprises:
an extraction module, configured to extract a feature of each video frame;
a determination module, configured to determine a type of the feature, the type reflecting a degree of temporal correlation between the feature and a reference feature;
a coding module, configured to encode the feature using a predetermined coding mode matched to the type, to obtain a coded feature;
a sending module, configured to send the coded feature to a server, so that the server decodes the coded feature and uses the decoded feature for a visual analysis task.
15. A video information processing system for executing the method according to any one of claims 1 to 13, characterized in that the system comprises: one or more terminals, a local server, and a visual analysis server;
the one or more terminals send an acquired video sequence to the local server;
the local server extracts the feature of each video frame included in the video sequence, determines the type of the feature, encodes the feature using a predetermined coding mode matched to the type to obtain a coded feature, and sends the coded feature to the visual analysis server, the type reflecting the degree of temporal correlation between the feature and the reference feature;
the visual analysis server decodes the coded feature and uses the decoded feature for a visual analysis task.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710338736.3A CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
US15/690,595 US10390040B2 (en) | 2017-05-15 | 2017-08-30 | Method, apparatus, and system for deep feature coding and decoding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710338736.3A CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108882020A true CN108882020A (en) | 2018-11-23 |
CN108882020B CN108882020B (en) | 2021-01-01 |
Family
ID=64097477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710338736.3A Active CN108882020B (en) | 2017-05-15 | 2017-05-15 | Video information processing method, device and system |
Country Status (2)
Country | Link |
---|---|
US (1) | US10390040B2 (en) |
CN (1) | CN108882020B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344278A (en) * | 2018-09-25 | 2019-02-15 | 北京邮电大学 | A kind of visual search method, device and equipment |
CN110222649A (en) * | 2019-06-10 | 2019-09-10 | 北京达佳互联信息技术有限公司 | Video classification methods, device, electronic equipment and storage medium |
CN111163318A (en) * | 2020-01-09 | 2020-05-15 | 北京大学 | Human-machine vision coding method and device based on feedback optimization |
CN111491177A (en) * | 2019-01-28 | 2020-08-04 | 上海博泰悦臻电子设备制造有限公司 | Video information extraction method, device and system |
CN113592003A (en) * | 2021-08-04 | 2021-11-02 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN115134526A (en) * | 2022-06-28 | 2022-09-30 | 润博全景文旅科技有限公司 | Image coding method, device and equipment based on cloud control |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200021815A1 (en) | 2018-07-10 | 2020-01-16 | Fastvdo Llc | Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (vqa) |
EP3824606A1 (en) * | 2018-07-20 | 2021-05-26 | Nokia Technologies Oy | Learning in communication systems by updating of parameters in a receiving algorithm |
US11915144B2 (en) | 2018-10-02 | 2024-02-27 | Nokia Technologies Oy | Apparatus, a method and a computer program for running a neural network |
CN109522450B (en) | 2018-11-29 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video classification method and server |
CN110072119B (en) * | 2019-04-11 | 2020-04-10 | 西安交通大学 | Content-aware video self-adaptive transmission method based on deep learning network |
US11700518B2 (en) * | 2019-05-31 | 2023-07-11 | Huawei Technologies Co., Ltd. | Methods and systems for relaying feature-driven communications |
KR102476057B1 (en) | 2019-09-04 | 2022-12-09 | 주식회사 윌러스표준기술연구소 | Method and apparatus for accelerating video encoding and decoding using IMU sensor data for cloud virtual reality |
US11410275B2 (en) * | 2019-09-23 | 2022-08-09 | Tencent America LLC | Video coding for machine (VCM) based system and method for video super resolution (SR) |
KR20220147641A (en) * | 2020-02-28 | 2022-11-03 | 엘지전자 주식회사 | Image encoding/decoding method, apparatus and method for transmitting bitstream for image feature information signaling |
US20230082561A1 (en) * | 2020-03-02 | 2023-03-16 | Lg Electronics Inc. | Image encoding/decoding method and device for performing feature quantization/de-quantization, and recording medium for storing bitstream |
WO2021205067A1 (en) * | 2020-04-07 | 2021-10-14 | Nokia Technologies Oy | Feature-domain residual for video coding for machines |
CN112728727A (en) * | 2021-01-06 | 2021-04-30 | 广东省科学院智能制造研究所 | Intelligent adjusting system for indoor environment comfort level based on edge calculation |
WO2023115506A1 (en) * | 2021-12-24 | 2023-06-29 | Huawei Technologies Co., Ltd. | Systems and methods for enabling automated transfer learning |
CN115761571A (en) * | 2022-10-26 | 2023-03-07 | 北京百度网讯科技有限公司 | Video-based target retrieval method, device, equipment and storage medium |
CN116112694B (en) * | 2022-12-09 | 2023-12-15 | 无锡天宸嘉航科技有限公司 | Video data coding method and system applied to model training |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102215391A (en) * | 2010-04-09 | 2011-10-12 | 华为技术有限公司 | Video data encoding and decoding method and device as well as transform processing method and device |
US20110310975A1 (en) * | 2010-06-16 | 2011-12-22 | Canon Kabushiki Kaisha | Method, Device and Computer-Readable Storage Medium for Encoding and Decoding a Video Signal and Recording Medium Storing a Compressed Bitstream |
US20120300834A1 (en) * | 2009-05-21 | 2012-11-29 | Metoevi Isabelle | Method and System for Efficient Video Transcoding Using Coding Modes, Motion Vectors and Residual Information |
CN104767997A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN104767998A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN106326395A (en) * | 2016-08-18 | 2017-01-11 | 北京大学 | Local visual feature selection method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8902971B2 (en) * | 2004-07-30 | 2014-12-02 | Euclid Discoveries, Llc | Video compression repository and model reuse |
US7684587B2 (en) * | 2005-04-04 | 2010-03-23 | Spirent Communications Of Rockville, Inc. | Reduced-reference visual communication quality assessment using data hiding |
US9906921B2 (en) * | 2015-02-10 | 2018-02-27 | Qualcomm Incorporated | Updating points of interest for positioning |
EP3532993A4 (en) * | 2016-10-25 | 2020-09-30 | Deep North, Inc. | Point to set similarity comparison and deep feature learning for visual recognition |
US10748062B2 (en) * | 2016-12-15 | 2020-08-18 | WaveOne Inc. | Deep learning based adaptive arithmetic coding and codelength regularization |
2017
- 2017-05-15 CN CN201710338736.3A patent/CN108882020B/en active Active
- 2017-08-30 US US15/690,595 patent/US10390040B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120300834A1 (en) * | 2009-05-21 | 2012-11-29 | Metoevi Isabelle | Method and System for Efficient Video Transcoding Using Coding Modes, Motion Vectors and Residual Information |
CN102215391A (en) * | 2010-04-09 | 2011-10-12 | 华为技术有限公司 | Video data encoding and decoding method and device as well as transform processing method and device |
US20110310975A1 (en) * | 2010-06-16 | 2011-12-22 | Canon Kabushiki Kaisha | Method, Device and Computer-Readable Storage Medium for Encoding and Decoding a Video Signal and Recording Medium Storing a Compressed Bitstream |
CN104767997A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN104767998A (en) * | 2015-03-25 | 2015-07-08 | 北京大学 | Video-oriented visual feature encoding method and device |
CN106326395A (en) * | 2016-08-18 | 2017-01-11 | 北京大学 | Local visual feature selection method and device |
Non-Patent Citations (1)
Title |
---|
VICTOR SANCHEZ et al.: "Piecewise Mapping in HEVC Lossless Intra-Prediction Coding", IEEE TRANSACTIONS ON IMAGE PROCESSING
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344278A (en) * | 2018-09-25 | 2019-02-15 | 北京邮电大学 | A kind of visual search method, device and equipment |
CN111491177A (en) * | 2019-01-28 | 2020-08-04 | 上海博泰悦臻电子设备制造有限公司 | Video information extraction method, device and system |
CN110222649A (en) * | 2019-06-10 | 2019-09-10 | 北京达佳互联信息技术有限公司 | Video classification methods, device, electronic equipment and storage medium |
CN111163318A (en) * | 2020-01-09 | 2020-05-15 | 北京大学 | Human-machine vision coding method and device based on feedback optimization |
WO2021139114A1 (en) * | 2020-01-09 | 2021-07-15 | 北京大学 | Man-machine visual coding method and apparatus based on feedback optimization |
CN113592003A (en) * | 2021-08-04 | 2021-11-02 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN113592003B (en) * | 2021-08-04 | 2023-12-26 | 智道网联科技(北京)有限公司 | Picture transmission method, device, equipment and storage medium |
CN115134526A (en) * | 2022-06-28 | 2022-09-30 | 润博全景文旅科技有限公司 | Image coding method, device and equipment based on cloud control |
Also Published As
Publication number | Publication date |
---|---|
CN108882020B (en) | 2021-01-01 |
US20180332301A1 (en) | 2018-11-15 |
US10390040B2 (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108882020A (en) | A kind of video information processing method, apparatus and system | |
TWI590650B (en) | Object tracking in encoded video streams | |
KR20200039757A (en) | Point cloud compression | |
US10708627B2 (en) | Volumetric video compression with motion history | |
Duan et al. | Compact descriptors for visual search | |
US9420299B2 (en) | Method for processing an image | |
CN110324706A (en) | A kind of generation method, device and the computer storage medium of video cover | |
CN111435551B (en) | Point cloud filtering method and device and storage medium | |
CN104782124A (en) | Leveraging encoder hardware to pre-process video content | |
CN107566798A (en) | A kind of system of data processing, method and device | |
GB2514441A (en) | Motion estimation using hierarchical phase plane correlation and block matching | |
CN116233445B (en) | Video encoding and decoding processing method and device, computer equipment and storage medium | |
CN103020138A (en) | Method and device for video retrieval | |
US10445613B2 (en) | Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest | |
US20230047400A1 (en) | Method for predicting point cloud attribute, encoder, decoder, and storage medium | |
Khan et al. | Sparse to dense depth completion using a generative adversarial network with intelligent sampling strategies | |
CN107018287A (en) | The method and apparatus for carrying out noise reduction to image using video epitome | |
Wang et al. | An efficient deep learning accelerator architecture for compressed video analysis | |
KR20240006667A (en) | Point cloud attribute information encoding method, decoding method, device and related devices | |
CN113691818B (en) | Video target detection method, system, storage medium and computer vision terminal | |
WO2023024842A1 (en) | Point cloud encoding/decoding method, apparatus and device, and storage medium | |
WO2022257145A1 (en) | Point cloud attribute prediction method and apparatus, and codec | |
US20230040484A1 (en) | Fast patch generation for video based point cloud coding | |
Li et al. | Background Knowledge | |
Choi et al. | Lossless location coding for image feature points |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||