CN104767998B - Video-oriented visual feature coding method and device - Google Patents

Video-oriented visual feature coding method and device


Publication number
CN104767998B
Authority
CN
China
Prior art keywords
local feature
frame
local
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510134617.7A
Other languages
Chinese (zh)
Other versions
CN104767998A (en)
Inventor
段凌宇
黄章帅
陈杰
黄铁军
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201510134617.7A
Publication of CN104767998A
Application granted
Publication of CN104767998B


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video-oriented visual feature coding method and device. The method includes: obtaining local features of the current frame in a video stream; determining the reference local feature range of the local features of the current frame within a reference frame of the current frame, the reference frame of the current frame being one or more frames adjacent to the current frame; determining, according to the reference local feature range of the reference frame, the reference local features in the reference frame for the local features of the current frame; and obtaining the local feature bitstream of the video stream to be sent according to the local features and reference local features of each frame in the video stream. The method can rapidly compress the transmitted feature data when a client transmits data, reducing the amount of transmitted data and improving transmission efficiency.

Description

Video-oriented visual feature coding method and device
Technical Field
The invention relates to computer technology, and in particular to a video-oriented visual feature coding method and device.
Background
Currently, with the popularization of intelligent terminals, more and more applications capture video streams in real time through a terminal camera and analyze and mine them in real time. How to mine the video/image information a user needs from a large volume of image and video data has therefore become a research hotspot.
In the prior art, there are two methods for analyzing real-time video streams on an intelligent terminal.
The first is to send the encoded video stream directly from the mobile terminal to a server, which decodes the stream and performs visual analysis on it. The drawback of this solution is that, to keep the video quality usable for visual analysis, the compression rate of the video coding must stay low; the code stream is therefore large, which ultimately consumes substantial bandwidth.
The second is for the mobile terminal to extract at least one local visual feature from each frame of the video stream in sequence and then send the local visual features of each frame to the server for visual analysis. In this scheme, feature reduction and quantization are applied during local visual feature extraction to obtain a lower bit rate, but this affects visual analysis to a certain extent and cannot support many visual analysis tasks. Moreover, the scheme ignores the temporal correlation of local visual features across frames, so the feature data stream carries redundancy; as a result, the amount of data the client transmits is very large, transmission is delayed, and the requirement for real-time visual analysis cannot be met.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a video-oriented visual feature coding method and device that can rapidly compress the transmitted feature data when a client transmits data and reduce the amount of data transmitted.
In a first aspect, the present invention provides a video-oriented visual feature coding method, including:
acquiring local features of a current frame in a video stream;
determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
and acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream.
Optionally, determining a reference local feature range of the local feature of the current frame in the reference frame of the current frame includes:
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is all local features of the whole reference frame;
or,
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is a subset of the local features in the reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
Optionally, determining a reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame, including:
acquiring the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame;
comparing the matching similarities corresponding to each local feature of the current frame with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, determining that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, selecting from among them the best-matching local feature in the reference frame whose time point is closest to the current frame as the reference local feature of that local feature.
Optionally, the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
Optionally, acquiring a local feature bitstream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream, including:
coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
obtaining the residual between each local feature that has a reference local feature and its reference local feature;
coding the residual using a second preset coding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode.
Optionally, the method further comprises:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
In a second aspect, the present invention further provides a video-oriented visual feature decoding method, including:
receiving a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises: a header region and a non-header region;
acquiring local features of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
correspondingly, obtaining the local feature of each frame in the video stream according to the local feature bit stream includes:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
In a third aspect, the present invention further provides a video-oriented visual feature coding apparatus, including:
the local feature acquisition unit is used for acquiring the local features of the current frame in the video stream;
the determining unit is used for determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
a reference local feature determining unit, configured to determine, according to a reference local feature range of the reference frame, a reference local feature of the current frame in the reference frame;
and the local feature bit stream acquisition unit is used for acquiring a local feature bit stream to be sent of the video stream according to the local feature of each frame in the video stream and the reference local feature.
In a fourth aspect, the present invention further provides a server, including:
a receiving unit, configured to receive a local feature bitstream of a video stream sent by a client, where the local feature bitstream includes: a header region and a non-header region;
a local feature recovery unit, configured to obtain a local feature of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
accordingly, the local feature recovery unit is specifically configured to:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
In a fifth aspect, an embodiment of the present invention further provides a video processing system, including:
the video-oriented visual feature coding device according to any one of the above descriptions and the server according to any one of the above descriptions, wherein the video-oriented visual feature coding device transmits a local feature bitstream of an acquired video stream to the server, and the server restores local features of frames in the video stream according to the received local feature bitstream.
According to the above technical scheme, the video-oriented visual feature coding method and device obtain the local features of the current frame in a video stream, determine the reference local features of the current frame within its reference frame, and then obtain the local feature bitstream of the video stream to be transmitted. They can thereby rapidly compress the transmitted feature data when a client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
Drawings
FIG. 1 is a flowchart illustrating a video-oriented visual feature encoding method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a video-oriented visual feature encoding method according to another embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video-oriented visual feature decoding method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a video-oriented visual feature coding apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the invention with reference to the drawings. The following examples are only intended to illustrate the technical solutions of the present invention more clearly and do not limit its protection scope. The terms "first" and "second" used in the embodiments of the present invention serve only to make the description clearer; they carry no special meaning and do not limit any content.
Fig. 1 is a flowchart illustrating a video-oriented visual feature coding method according to an embodiment of the present invention, and as shown in fig. 1, the video-oriented visual feature coding method according to the embodiment is as follows.
101. Local features of a current frame in a video stream are obtained.
For example, the local feature may be a Scale-Invariant Feature Transform descriptor (SIFT), a Speeded-Up Robust Features descriptor (SURF), a Binary Robust Independent Elementary Features descriptor (BRIEF), and so on; these are only illustrative and do not limit the embodiment.
It should be appreciated that SIFT and SURF are local features described by floating-point numbers, while BRIEF is a local feature described in binary. The extraction of SIFT, SURF, or BRIEF follows existing methods and is not detailed in this embodiment.
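As a concrete illustration (not part of the patent text), step 101 can be realized with an off-the-shelf extractor. The sketch below assumes OpenCV's SIFT implementation (`cv2.SIFT_create`, available in OpenCV 4.4+); the helper name is ours.

```python
import cv2

def extract_local_features(frame_bgr):
    """Extract SIFT keypoints and descriptors from one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    # descriptors: N x 128 float32 array; each keypoint carries the
    # coordinate and scale attributes used later for attribute coding
    return keypoints, descriptors
```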
102. and determining the reference local feature range of the local feature of the current frame in the reference frame of the current frame.
In this embodiment, the reference frame of the current frame is one or more frames neighboring the current frame; the neighboring frames may be one or more frames before or after the time point of the current frame.
In addition, the current frame is the frame (image) in which the local features to be encoded are located, and the reference frame is a frame (image) whose local features have already been encoded; the reference frame may be one or several frames before or after the current frame in time.
For example, any one of one or more frames adjacent to the current frame is selected as the reference frame, and the reference local feature range of the local feature of the current frame in the reference frame is all local features of the entire reference frame;
or,
any one of one or more frames adjacent to the current frame is selected as the reference frame, and the reference local feature range of the local feature of the current frame in the reference frame is a subset of the local features in the reference frame selected according to a preset rule;
the preset rules are such as:
wherein p is the mth local feature of the current frame, and the coordinate corresponding to the current frame image isq is the nth local feature in the reference frame corresponding to the reference frame
When the above formula condition is satisfied, the reference frame local feature q is a local feature referred to by the current frame local feature p.
It is understood that the reference local feature range may be a subset of the local features in each reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
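A minimal sketch of this spatial filtering, assuming NumPy and hypothetical names (`p_xy` for the current feature's coordinates, `ref_xy` for the reference frame's coordinate array, `T` for the preset metric distance):

```python
import numpy as np

def reference_range(p_xy, ref_xy, T):
    """Indices of reference-frame features whose Euclidean distance to the
    current-frame feature p is at most the preset metric distance T."""
    d = np.linalg.norm(np.asarray(ref_xy) - np.asarray(p_xy), axis=1)
    return np.nonzero(d <= T)[0]
```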
103. And determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame.
In a specific application, the local feature within the reference local feature range of the reference frame that best matches the local feature to be encoded can be searched for and used as the reference local feature.
For example, first, the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame is obtained;
second, the matching similarities corresponding to each local feature of the current frame are compared with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, it is determined that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, the best-matching local feature in the reference frame whose time point is closest to the current frame is selected from among them as the reference local feature of that local feature.
It is understood that, in this step, the matching similarity may be defined by the distance between the local feature to be encoded and a candidate local feature, formalized as follows:
let $d_i^m$ denote the $m$-th local feature of the current frame $i$;
let $d_j^n$ denote the $n$-th local feature of the reference frame $j$;
the distance $Dis(d_i^m, d_j^n)$ between the two can be the Euclidean distance, the Manhattan distance, the Hamming distance, etc.;
the best match is defined by the ratio test $Dis_{min} / Dis_{second} < \theta$,
where $Dis_{min}$ is the distance between $d_i^m$ and its nearest candidate feature, and $Dis_{second}$ is the distance between $d_i^m$ and its second-nearest candidate feature;
a reference local feature satisfying the above best-match definition is therefore the candidate closest to the local feature $d_i^m$ to be encoded;
when $\theta = 1$, the candidate local feature nearest to $d_i^m$ is the reference local feature.
In a specific implementation process, a local feature of the current frame may have no reference local feature or several that satisfy the matching definition; when there are multiple candidates, the local feature with the smallest $Dis_{min}$ in the reference frame whose time point is nearest to the current frame is selected as the unique reference local feature.
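A minimal sketch of this ratio-test matching, assuming Euclidean distance and an illustrative threshold theta = 0.8 (the patent fixes neither choice):

```python
import numpy as np

def find_reference_feature(desc_p, candidate_descs, theta=0.8):
    """Return the index of the best-matching candidate descriptor under the
    ratio test Dis_min / Dis_second < theta, or None if no candidate passes."""
    dists = np.linalg.norm(np.asarray(candidate_descs) - desc_p, axis=1)
    if dists.size == 0:
        return None
    if dists.size == 1:
        # a lone candidate passes only in the degenerate theta >= 1 case
        return 0 if theta >= 1.0 else None
    i_min, i_second = np.argsort(dists)[:2]
    return int(i_min) if dists[i_min] / dists[i_second] < theta else None
```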
104. And acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream.
For example, the local feature bitstream may include: a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
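The patent does not fix a concrete bitstream syntax; the sketch below is one purely illustrative layout for the header region, with assumed field widths and an assumed convention of -1 for "no reference":

```python
import struct

def pack_header(ref_frame_idx, ref_range_T, ref_indices, qp):
    """Pack one frame's header region: reference frame index, the
    reference-range distance T, the feature count, the quantization
    parameter, and one reference index per feature (-1 = no reference)."""
    n = len(ref_indices)
    header = struct.pack('<HfHB', ref_frame_idx, ref_range_T, n, qp)
    header += struct.pack(f'<{n}h', *ref_indices)
    return header
```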
In practical applications, step 104 may include the following sub-steps not shown in the figures:
1041. coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
1042. obtaining the residual between each local feature that has a reference local feature and its reference local feature;
1043. coding the residual using a second preset coding mode to obtain a second bitstream;
1044. the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
1045. the header region of the local feature bitstream consists of binary codes, and the non-header region includes the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode, the latter obtained, for example, by entropy coding the values produced by transforming and quantizing the residuals.
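A minimal sketch of sub-steps 1042-1043 above, assuming simple scalar quantization; `qstep` is an illustrative stand-in for the quantization parameter carried in the header region, and a real encoder would subsequently entropy-code the integer symbols:

```python
import numpy as np

def encode_descriptor(desc, ref_desc, qstep=4.0):
    """Quantize either the descriptor itself (no reference local feature,
    first coding mode) or its residual against the reference local feature
    (second coding mode); returns a flag plus integer symbols that would
    then be entropy-coded."""
    if ref_desc is None:
        symbols = np.round(desc / qstep).astype(np.int32)
        return False, symbols
    residual = desc - ref_desc                              # sub-step 1042
    symbols = np.round(residual / qstep).astype(np.int32)   # sub-step 1043
    return True, symbols
```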
The method can quickly compress the transmitted feature data when the client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
In a specific example, step 101 in the method shown in fig. 1 may also be the following step 101':
step 101', local features of a current frame in the video stream and attributes of each local feature in the current frame are obtained.
It will be appreciated that the attributes of each local feature of each frame in the video stream may include: the coordinate, scale and other information related to the local feature.
Accordingly, step 104 may be the following step 104':
and step 104', acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream and the attribute of the local feature in each frame.
The bitstream includes a header region and a non-header region; the header region consists of a sequence of 0s and 1s and includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features that have no reference local feature and the attributes of the local features in each frame, and the coded residuals between the local features that have a reference local feature and their reference local features.
The method can greatly compress the feature data of the video stream through prediction, meeting real-time processing requirements while preserving the performance of visual analysis tasks.
Fig. 2 is a schematic flow chart of a video-oriented visual feature coding method according to an embodiment of the present invention, and as shown in fig. 2, the video-oriented visual feature coding method according to the embodiment is as follows.
201. The method comprises the steps of obtaining local features of a current frame in a video stream, and preprocessing the local features of the current frame.
The local feature of the current frame in this embodiment may be a local feature descriptor, i.e., a local feature description vector. The local features in this embodiment may be one or more.
It should be noted that this embodiment differs from the encoding method shown in fig. 1 described above in that it also preprocesses the local features extracted from the current frame.
For example, the local features of the current frame may be subjected to dimension reduction; and/or carrying out quantization processing on the local characteristics of the current frame.
Specifically, a predetermined dimensionality reduction matrix can be adopted to reduce the dimensionality of the local features in the subset consisting of the local features of the current frame, so as to obtain the local features after dimensionality reduction; the dimension reduction matrix is obtained after a preset first image data set is trained in a dimension reduction mode.
It should be noted that dimension reduction is an optional operation. The dimensionality reduction method can be principal component analysis, linear discriminant analysis, etc.; for principal component analysis, reference may be made to Jolliffe, I.T. (1986), Principal Component Analysis, Springer.
The quantization process may use scalar quantization or vector quantization to compact the local features. The quantization process in this embodiment is an optional operation.
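A minimal sketch of this optional preprocessing, assuming a dimensionality-reduction matrix trained offline (e.g., by PCA on the first image data set) and scalar quantization; the matrix shape and `qstep` value are assumptions for illustration:

```python
import numpy as np

def preprocess(descs, dim_reduction_matrix, qstep=8.0):
    """Project descriptors (N x d_in) with a pretrained (d_in x d_out)
    matrix, then scalar-quantize the reduced descriptors."""
    reduced = np.asarray(descs) @ dim_reduction_matrix  # dimension reduction
    return np.round(reduced / qstep) * qstep            # scalar quantization
```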
202. And determining the reference local feature range of the local feature of the current frame in the reference frame of the current frame.
In this embodiment, the reference frame of the current frame is a neighboring frame or frames of the current frame.
203. And determining the reference local feature of the preprocessed current frame in the reference frame according to the reference local feature range of the reference frame.
204. And acquiring a local feature bit stream to be sent of the video stream according to the preprocessed local features and the reference local features of each frame in the video stream.
In this embodiment, the above-described encoding method aims at encoding the local features into a bitstream.
The local feature bitstream in this embodiment may include: a header region and a non-header region;
the aforementioned header region consists of a sequence of 0s and 1s and may contain the same information as the header region exemplified in step 104; the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
It should be noted that the local features in the non-header region may be local features encoded with a first preset encoding mode;
the residuals in the non-header region may be residuals encoded with a second preset encoding mode, for example obtained by entropy encoding the values produced by transforming and quantizing the residuals.
In a specific application, the method shown in fig. 2 may further include the following steps not shown in the figures:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
Therefore, the coding method can rapidly compress the transmitted feature data when the client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
In an optional implementation, if the attributes of each local feature of each frame are also obtained in step 201, and the attribute is a coordinate attribute, then in step 204 the coordinate attribute of each local feature needs to be encoded for object-positioning applications. The specific coding methods are as follows:
Coding mode one: take the difference between the coordinates of the local feature to be encoded and the coordinates of the reference local feature, quantize the residual, and entropy-code the resulting value; this corresponds to the residual coded with the second preset coding mode.
Coding mode two: since the coordinate set of the local features of the current frame can be obtained from the coordinate set of the local features of the reference frame through an affine transformation, let $(x_i^m, y_i^m)$ be the coordinates to be encoded of the $m$-th local feature in the current $i$-th frame, $(x_j^n, y_j^n)$ the coordinates of the referenced $n$-th local feature in the $j$-th reference frame, and $A$ the affine matrix; the affine transformation is:
$(x_i^m, y_i^m, 1)^T = A \, (x_j^n, y_j^n, 1)^T$
The affine matrix $A$ may be calculated using the least squares method;
therefore, in the coordinate encoding process, only the affine matrix and the residual between the original coordinates $(x_i^m, y_i^m)$ and the transformed coordinates $A(x_j^n, y_j^n, 1)^T$ need to be encoded. The elements of the affine matrix and the residual may be quantized to further reduce the bit rate.
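A minimal sketch of the least-squares estimation of the affine matrix from matched coordinate pairs, using NumPy; the helper name and array conventions are ours:

```python
import numpy as np

def fit_affine(ref_xy, cur_xy):
    """Least-squares fit of the affine map taking matched reference-frame
    coordinates (N x 2) to current-frame coordinates (N x 2); returns the
    2x3 affine matrix and the per-feature coordinate residuals to encode."""
    ones = np.ones((len(ref_xy), 1))
    X = np.hstack([np.asarray(ref_xy), ones])        # homogeneous coords, N x 3
    A, *_ = np.linalg.lstsq(X, np.asarray(cur_xy), rcond=None)  # 3 x 2
    residual = np.asarray(cur_xy) - X @ A            # what coding mode two encodes
    return A.T, residual
```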
At this time, the local feature coded bitstream includes a header region and a non-header region;
the header region consists of a sequence of 0s and 1s and includes: information recording whether reference local features are used, information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, information marking the local feature quantization parameter, information marking whether local feature coordinate attributes are used, and information marking the coordinate attribute quantization parameter;
the non-header region includes: the coded local features that have no reference and the attributes of the local features in each frame, the coded residuals between the local features that have a reference and their reference local features, and the coded residuals between the coordinates of local features that have a reference and the coordinates of their reference local features.
Or,
the local feature coded bitstream includes a header region and a non-header region; the header region consists of a sequence of 0s and 1s and includes: information recording whether reference local features are used, information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, information marking the local feature quantization parameter, information marking whether local feature coordinate attributes are used, and information marking the coordinate transformation matrix and its quantization parameter;
the non-header region includes: the coded local features that have no reference and the attributes of the local features in each frame, the coded residuals between the local features that have a reference and their reference local features, and the coded residuals between the coordinates of local features that have a reference and the coordinates of their reference local features. It should be noted that the attributes of different local features may also be coded differently according to the local features and the visual application.
In a specific implementation, for the process of obtaining the local features of the current frame in the video stream in steps 101 and 201, a local feature selection rule is used to obtain a subset of the local features of the current frame. The local feature selection rule may be illustrated as follows:
in this embodiment, for a frame image, taking extracting local features SIFT as an example, if more than one SIFT is extracted, a subset including N SIFTs is selected from all SIFTs, where N is greater than 0. In this embodiment, N is 300, and it should be noted that the N may be adaptively selected according to different values of N.
It should be noted that when the number of SIFTs extracted from the image is less than N, all SIFTs of the image are selected as elements in the subset.
M01, extracting all local features from a plurality of matching image pairs and non-matching image pairs respectively;
wherein a matched image pair refers to two images containing the same object or the same scene, and a non-matched image pair refers to two images containing different objects or different scenes. These matched and non-matched image pairs do not include the frame images to be processed in steps 101 and 201 described above.
M02, obtaining, through statistics, the probability distributions of the different attributes of local features among correctly matched local features and mismatched local features;
for SIFT local features, the different attributes may include, for example: scale, direction, peak of gaussian difference, distance to the center of the image, etc.
M03, based on the probability distributions, calculating the probability that a local feature of the frame image to be processed in steps 101 and 201 is correctly matched when each of its attributes falls within a certain value range, and selecting one or more local features from all local features according to this probability as the local features of the frame image.
The different attributes of a SIFT feature are assumed to be statistically independent, so the probability that a SIFT feature is correctly matched is the product of the probabilities computed from the individual attributes; this product serves as the basis for selecting the elements of the SIFT subset, as illustrated in the sketch below.
In practical applications, other local feature selection methods can be adopted, and are not limited to the above-mentioned exemplary steps M01 through M03.
It should be noted that steps M01 and M02 may be carried out in advance, i.e., performed offline and the results stored in the device.
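A minimal sketch of the resulting selection rule from steps M01-M03, assuming the per-attribute correct-match probabilities have already been looked up for each feature (the array layout is an assumption):

```python
import numpy as np

def select_subset(attribute_probs, n=300):
    """attribute_probs: (num_features, num_attributes) array whose entry
    [k, a] is the offline-estimated probability that feature k is correctly
    matched given its value of attribute a. Attributes are assumed
    independent, so the score is their product; keep the n highest-scoring
    features (all of them when fewer than n were extracted)."""
    scores = np.prod(np.asarray(attribute_probs), axis=1)
    keep = min(n, scores.size)
    return np.argsort(scores)[::-1][:keep]
```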
Fig. 3 is a flowchart illustrating a video-oriented visual feature decoding method according to an embodiment of the present invention, and as shown in fig. 3, the video-oriented visual feature decoding method according to the embodiment is as follows.
301. Receiving a local characteristic bit stream of a video stream sent by a client;
302. and acquiring the local features of each frame in the video stream according to the local feature bit stream.
For example, after determining, from the header region of the local feature bitstream, that reference local features are used, the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features are acquired;
the local features are then decoded from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, giving the local features of each frame in the video stream. In a specific application, if the coded bitstream includes the attributes of the local features, step 302 may specifically include the following sub-steps:
a01, for a bitstream to be decoded, first obtaining dimension information indicating whether predictive coding is used, and then obtaining reference local feature range information.
A02, determining the number of local features to be decoded in the current frame and the reference local features in the reference frame, for example by obtaining the index information of the reference local features and the information of the local feature quantization parameter from the header of the local feature bitstream.
A03, predictively decoding the local features to be decoded according to the reference local features, including decoding the local features and the attributes related to them.
First, entropy decoding is performed on the corresponding bitstream of the non-header part to obtain a residual, and the residual is then added to the reference local feature to obtain the decoded local feature;
the coordinate information to be decoded may be decoded in the manner corresponding to coding mode one: first entropy-decode the corresponding bitstream of the non-header part to obtain a residual, then add the residual to the coordinates of the reference local feature to obtain the decoded coordinate data;
or the coordinate information to be decoded may be decoded in the manner corresponding to coding mode two: first decode the corresponding bitstream of the non-header part to obtain the transformation matrix A and a residual, then compute the transformed coordinates from the transformation matrix and the coordinates of the reference local feature, and finally add the transformed coordinates and the residual to obtain the decoded coordinates.
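A minimal sketch of the descriptor part of this decoding, mirroring the encoder sketch given under step 104; `qstep` stands in for the quantization parameter read from the header region:

```python
import numpy as np

def decode_descriptor(has_ref, symbols, ref_desc, qstep=4.0):
    """Dequantize the entropy-decoded symbols; when predictive coding was
    used, add the reference local feature back to the residual."""
    values = symbols.astype(np.float32) * qstep
    return ref_desc + values if has_ref else values
```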
Fig. 4 is a schematic structural diagram of a video-oriented visual feature coding apparatus according to an embodiment of the present invention, and as shown in fig. 4, the video-oriented visual feature coding apparatus according to the embodiment includes: a local feature acquisition unit 41, a determination unit 42, a reference local feature determination unit 43, and a local feature bitstream acquisition unit 44;
the local feature obtaining unit 41 is configured to obtain a local feature of a current frame in a video stream;
the determining unit 42 is configured to determine a reference local feature range of the local feature of the current frame in a reference frame of the current frame, where the reference frame of the current frame is one or more frames adjacent to the current frame;
the reference local feature determining unit 43 is configured to determine a reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
the local feature bitstream obtaining unit 44 is configured to obtain a local feature bitstream to be sent of the video stream according to the local feature of each frame in the video stream and the reference local feature.
For example, the local feature bitstream includes: a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
In a specific application, the local feature bitstream obtaining unit 44 may be specifically configured to encode the local features in each frame that have no reference local feature using a first preset encoding mode to obtain a first bitstream;
obtain the residual between each local feature that has a reference local feature and its reference local feature;
encode the residual using a second preset encoding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset encoding mode and the residuals coded with the second preset encoding mode.
In a specific implementation manner, the aforementioned encoding apparatus may further include a preprocessing unit, not shown in the figure, located after the local feature obtaining unit 41 and before the reference local feature determining unit 43, where the preprocessing unit is specifically configured to perform preprocessing on the local feature of the current frame; for example, the local features of the current frame are subjected to dimension reduction processing; and/or carrying out quantization processing on the local characteristics of the current frame.
Accordingly, the reference local feature determining unit 43 may be specifically configured to determine, according to the reference local feature range of the reference frame, the reference local features of the preprocessed current frame in the reference frame. For example, the reference local feature range is all local features of the entire reference frame, or the reference local feature range is a subset of the local features in each reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
In another specific implementation manner, the aforementioned encoding apparatus may further include a sending unit, not shown in the figure, located after the local feature bitstream obtaining unit 44, and configured to send the local feature bitstream to be sent of the video stream to the server, so that the server obtains the local features of each frame in the video stream based on the local feature bitstream.
The video-oriented visual feature coding apparatus of this embodiment may be located in any client, such as a mobile terminal, another smart terminal, or a fixed terminal, and may perform the method of any of the embodiments described in fig. 1 and fig. 2 above, which is not repeated here.
The video-oriented visual feature coding device of the embodiment can realize rapid compression of transmitted feature data when a client transmits data, reduce the amount of transmitted data, and improve the transmission efficiency of video streams of the client.
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present invention, and as shown in fig. 5, the server according to the embodiment includes: a receiving unit 51, a local feature recovery unit 52;
the receiving unit 51 is configured to receive a local feature bit stream of a video stream sent by a client;
the local feature recovery unit 52 is configured to obtain a local feature of each frame in the video stream according to the local feature bit stream.
For example, after determining, from the header region of the local feature bitstream, that reference local features are used, the local feature recovery unit 52 obtains the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
it then decodes the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, obtaining the local features of each frame in the video stream. The server of this embodiment may interact with the client that performs encoding so as to recover the local features of each frame in the video stream on the server side; the server may perform the whole process of the decoding method of fig. 3 above, which is not repeated here.
Through the interaction between the server and the video-oriented visual feature coding device of this embodiment, the prior-art problem that a client cannot rapidly compress the transmitted feature data and reduce the amount of transmitted data can be solved.
In a fifth aspect, an embodiment of the present invention further provides a video processing system, including: the video-oriented visual feature encoding device according to any of the above embodiments and the server according to any of the above embodiments, wherein the video-oriented visual feature encoding device transmits the acquired local feature bitstream of the video stream to the server, and the server restores the local features of each frame in the video stream according to the received local feature bitstream.
In a specific implementation process, the client may obtain a local feature bitstream of each frame in the video stream according to the method in any of the embodiments described above, and send the obtained local feature bitstream to the server, and the server may obtain the local feature of the video stream according to the local feature bitstream decoding.
The system solves the prior-art problem that a client cannot rapidly compress the transmitted feature data when transmitting data and reduce the amount of data transmitted.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention and shall be construed as falling within the scope of the claims and description of the invention.

Claims (9)

1. A video-oriented visual feature coding method, comprising:
acquiring local features of a current frame in a video stream;
determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream;
wherein the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
2. The method of claim 1, wherein determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame comprises:
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is all local features of the whole reference frame;
or,
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is a subset of the local features in the reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
3. The method according to claim 1 or 2, wherein determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame comprises:
acquiring the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame;
comparing the matching similarities corresponding to each local feature of the current frame with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, determining that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, selecting from among them the best-matching local feature in the reference frame whose time point is closest to the current frame as the reference local feature of that local feature.
4. The method according to claim 1 or 2, wherein obtaining a local feature bitstream to be transmitted of the video stream according to the local feature and the reference local feature of each frame in the video stream comprises:
coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
obtaining the residual between each local feature that has a reference local feature and its reference local feature;
coding the residual using a second preset coding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
6. A video-oriented visual feature decoding method, comprising:
receiving a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises: a header region and a non-header region;
acquiring local features of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
correspondingly, obtaining the local feature of each frame in the video stream according to the local feature bit stream includes:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
7. A video-oriented visual feature coding apparatus, comprising:
a local feature acquisition unit, configured to acquire the local features of the current frame in the video stream;
a determining unit, configured to determine the reference local feature range of the local features of the current frame in the reference frame of the current frame, wherein the reference frame of the current frame is one or more frames adjacent to the current frame;
a reference local feature determining unit, configured to determine, according to the reference local feature range of the reference frame, the reference local features of the current frame's local features in the reference frame;
a local feature bitstream obtaining unit, configured to obtain the local feature bitstream to be transmitted of the video stream according to the local features and the reference local features of each frame in the video stream;
wherein the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information indicating the number of local features, reference index information of the local features, and information indicating the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
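The claimed units can be wired together as in the following structural sketch, reusing the helper functions from the earlier sketches; the class, its method names, and its bookkeeping are assumptions, and local feature extraction and reference-range determination are omitted for brevity.

```python
class VisualFeatureEncoder:
    """Structural sketch of the claimed encoder units.

    `frame_features` is assumed to arrive already extracted and
    restricted to the reference local feature range.
    """

    def __init__(self, threshold, qp=0.5):
        self.threshold = threshold
        self.qp = qp
        self.reference_frames = []  # most recent frames' descriptors

    def encode(self, frame_features):
        # Reference local feature determining unit.
        references = [
            select_reference(f, self.reference_frames, self.threshold)
            for f in frame_features
        ]
        # Local feature bitstream obtaining unit (the two-part split).
        parts = encode_frame(frame_features, references, self.qp)
        # Keep up to two adjacent frames as references for the next frame.
        self.reference_frames.insert(0, frame_features)
        del self.reference_frames[2:]
        return parts
```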
8. A server, comprising:
a receiving unit, configured to receive a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises a header region and a non-header region;
a local feature recovery unit, configured to acquire the local features of each frame in the video stream according to the local feature bitstream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information indicating the number of local features, reference index information of the local features, and information indicating the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
accordingly, the local feature recovery unit is specifically configured to:
after determining from the header region of the local feature bitstream that reference local features are used, acquire the number of local features of the current frame, the reference index information of the local features, and the quantization parameter information of the local features;
and decode the local features from the non-header region according to the reference index information and the quantization parameter information, to obtain the local features of each frame in the video stream.
9. A video processing system, comprising:
the video-oriented visual feature coding device according to claim 7 and the server according to claim 8, wherein the video-oriented visual feature coding device sends the obtained local feature bitstream of the video stream to the server, and the server restores the local features of each frame in the video stream from the received local feature bitstream.
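An illustrative end-to-end use of the sketches above, with transport reduced to a local function boundary; the threshold value and the random descriptors are arbitrary test data, not from the patent.

```python
import numpy as np

# Client side: encode one frame of 128-dimensional descriptors.
encoder = VisualFeatureEncoder(threshold=-0.8, qp=0.5)
frame = [np.random.rand(128).astype(np.float32) for _ in range(10)]
first_part, second_part = encoder.encode(frame)

# In a real system the two parts would be serialized together with the
# header fields of the decoding claim and sent to the server, which
# would then call a decoder such as `decode_frame` above to restore
# the descriptors for visual analysis.
```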
CN201510134617.7A 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video Active CN104767998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510134617.7A CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510134617.7A CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Publications (2)

Publication Number Publication Date
CN104767998A CN104767998A (en) 2015-07-08
CN104767998B true CN104767998B (en) 2017-12-08

Family

ID=53649565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510134617.7A Active CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Country Status (1)

Country Link
CN (1) CN104767998B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108882020B (en) * 2017-05-15 2021-01-01 北京大学 Video information processing method, device and system
CN107846576B (en) 2017-09-30 2019-12-10 北京大学 Method and system for encoding and decoding visual characteristic data
CN113453017B (en) * 2021-06-24 2022-08-23 咪咕文化科技有限公司 Video processing method, device, equipment and computer program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040258147A1 (en) * 2003-06-23 2004-12-23 Tsu-Chang Lee Memory and array processor structure for multiple-dimensional signal processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226589A (en) * 2012-10-15 2013-07-31 北京大学 Method for obtaining compact global feature descriptors of image and image retrieval method
CN103561264A (en) * 2013-11-07 2014-02-05 北京大学 Media decoding method based on cloud computing and decoder
CN104093030A (en) * 2014-07-09 2014-10-08 天津大学 Distributed video coding side information generating method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Local feature spatio-temporal coding method for human action recognition; Wang Bin et al.; Journal of Sichuan University (Engineering Science Edition); 2014-03-31; Vol. 46, No. 2; pp. 72-78 *

Also Published As

Publication number Publication date
CN104767998A (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN110225341B (en) Task-driven code stream structured image coding method
US20220329845A1 (en) Image encoding method and apparatus, and image decoding method and apparatus
CN103226589B (en) The compact global characteristics obtaining image describes method and the image search method of son
Duan et al. Compact descriptors for visual search
Ma et al. Joint feature and texture coding: Toward smart video representation via front-end intelligence
Zhang et al. A joint compression scheme of video feature descriptors and visual content
CN111131825A (en) Video processing method and related device
Baroffio et al. Coding binary local features extracted from video sequences
CN104767998B (en) A kind of visual signature coding method and device towards video
CN111093077A (en) Video coding method and device, electronic equipment and storage medium
CN104767997B (en) A kind of visual signature coding method and device towards video
US10445613B2 (en) Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest
Baroffio et al. Coding local and global binary visual features extracted from video sequences
CN103020138A (en) Method and device for video retrieval
US10536726B2 (en) Pixel patch collection for prediction in video coding system
Chen et al. Quality-of-content (QoC)-driven rate allocation for video analysis in mobile surveillance networks
Baroffio et al. Hybrid coding of visual content and local image features
Chen et al. Interframe coding of global image signatures for mobile augmented reality
Van Opdenbosch et al. A joint compression scheme for local binary feature descriptors and their corresponding bag-of-words representation
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
Bondi et al. Multi-view coding of local features in visual sensor networks
Wood Task Oriented Video Coding: A Survey
Monteiro et al. Coding mode decision algorithm for binary descriptor coding
CN107018421B (en) A kind of image sending, receiving method and device, system
Tian et al. Just noticeable difference modeling for face recognition system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant