CN104767998B - Video-oriented visual feature coding method and device - Google Patents

Video-oriented visual feature coding method and device


Publication number
CN104767998B
Authority
CN
China
Prior art keywords
local feature
frame
local
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510134617.7A
Other languages
Chinese (zh)
Other versions
CN104767998A (en)
Inventor
段凌宇
黄章帅
陈杰
黄铁军
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201510134617.7A
Publication of CN104767998A
Application granted
Publication of CN104767998B


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video-oriented visual feature coding method and device. The method includes: obtaining local features of the current frame in a video stream; determining the reference local feature range of the local features of the current frame within a reference frame of the current frame, the reference frame of the current frame being one or more frames adjacent to the current frame; determining, according to the reference local feature range of the reference frame, the reference local features in the reference frame for the local features of the current frame; and obtaining the local feature bitstream of the video stream to be sent according to the local features and reference local features of each frame in the video stream. The method can rapidly compress the transmitted feature data when a client transmits data, reducing the amount of transmitted data and improving transmission efficiency.

Description

Video-oriented visual feature coding method and device
Technical Field
The invention relates to computer technology, and in particular to a video-oriented visual feature coding method and device.
Background
Currently, with the popularization of intelligent terminals, more and more applications capture video streams in real time through a terminal camera and analyze and mine them in real time. How to mine the video/image information a user needs from a large volume of image and video data has therefore become a research hotspot.
In the prior art, there are two methods for analyzing real-time video streams on an intelligent terminal.
The first is to send the encoded video stream directly from the mobile terminal to a server, which decodes the stream and performs visual analysis on it. The drawback of this solution is that, to keep the video quality usable for visual analysis, the compression rate of the video coding must stay low; the code stream is therefore large, which ultimately consumes substantial bandwidth.
The second is for the mobile terminal to extract at least one local visual feature from each frame of the video stream in sequence and then send the local visual features of each frame to the server for visual analysis. In this scheme, feature reduction and quantization are applied during local visual feature extraction to obtain a lower bit rate, but this affects visual analysis to a certain extent and cannot support many visual analysis tasks. Moreover, the scheme ignores the temporal correlation of local visual features across frames, so the feature data stream carries redundancy; as a result, the amount of data the client transmits is very large, transmission is delayed, and the requirement for real-time visual analysis cannot be met.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a video-oriented visual feature coding method and device that can rapidly compress the transmitted feature data when a client transmits data and reduce the amount of data transmitted.
In a first aspect, the present invention provides a video-oriented visual feature coding method, including:
acquiring local features of a current frame in a video stream;
determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
and acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream.
Optionally, determining a reference local feature range of the local feature of the current frame in the reference frame of the current frame includes:
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is all local features of the whole reference frame;
or,
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is a subset of the local features in the reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
Optionally, determining a reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame, including:
acquiring the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame;
comparing the matching similarities corresponding to each local feature of the current frame with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, determining that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, selecting from among them the best-matching local feature in the reference frame whose time point is closest to the current frame as the reference local feature of that local feature.
Optionally, the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
Optionally, acquiring a local feature bitstream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream, including:
coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
obtaining the residual between each local feature that has a reference local feature and its reference local feature;
coding the residual using a second preset coding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode.
Optionally, the method further comprises:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
In a second aspect, the present invention further provides a video-oriented visual feature decoding method, including:
receiving a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises: a header region and a non-header region;
acquiring local features of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
correspondingly, obtaining the local feature of each frame in the video stream according to the local feature bit stream includes:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
In a third aspect, the present invention further provides a video-oriented visual feature coding apparatus, including:
the local feature acquisition unit is used for acquiring the local features of the current frame in the video stream;
the determining unit is used for determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
a reference local feature determining unit, configured to determine, according to a reference local feature range of the reference frame, a reference local feature of the current frame in the reference frame;
and the local feature bit stream acquisition unit is used for acquiring a local feature bit stream to be sent of the video stream according to the local feature of each frame in the video stream and the reference local feature.
In a fourth aspect, the present invention further provides a server, including:
a receiving unit, configured to receive a local feature bitstream of a video stream sent by a client, where the local feature bitstream includes: a header region and a non-header region;
a local feature recovery unit, configured to obtain a local feature of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
accordingly, the local feature recovery unit is specifically configured to:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
In a fifth aspect, an embodiment of the present invention further provides a video processing system, including:
the video-oriented visual feature coding device according to any one of the above descriptions and the server according to any one of the above descriptions, wherein the video-oriented visual feature coding device transmits a local feature bitstream of an acquired video stream to the server, and the server restores local features of frames in the video stream according to the received local feature bitstream.
According to the above technical scheme, the video-oriented visual feature coding method and device obtain the local features of the current frame in a video stream, determine the reference local features of the current frame within its reference frame, and then obtain the local feature bitstream of the video stream to be transmitted. They can thereby rapidly compress the transmitted feature data when a client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
Drawings
FIG. 1 is a flowchart illustrating a video-oriented visual feature encoding method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a video-oriented visual feature encoding method according to another embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video-oriented visual feature decoding method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a video-oriented visual feature coding apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the invention with reference to the drawings. The following examples are only intended to illustrate the technical solutions of the present invention more clearly and do not limit its protection scope. The terms "first" and "second" used in the embodiments of the present invention serve only to make the description clearer; they carry no special meaning and do not limit any content.
Fig. 1 is a flowchart illustrating a video-oriented visual feature coding method according to an embodiment of the present invention, and as shown in fig. 1, the video-oriented visual feature coding method according to the embodiment is as follows.
101. Local features of a current frame in a video stream are obtained.
For example, the local feature may be a Scale-Invariant Feature Transform descriptor (SIFT), a Speeded-Up Robust Features descriptor (SURF), a Binary Robust Independent Elementary Features descriptor (BRIEF), and so on; these are only illustrative and do not limit the embodiment.
It should be appreciated that SIFT and SURF are local features described by floating-point numbers, while BRIEF is a local feature described in binary. The extraction of SIFT, SURF, or BRIEF follows existing methods and is not detailed in this embodiment.
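As a concrete illustration (not part of the patent text), step 101 can be realized with an off-the-shelf extractor. The sketch below assumes OpenCV's SIFT implementation (`cv2.SIFT_create`, available in OpenCV 4.4+); the helper name is ours.

```python
import cv2

def extract_local_features(frame_bgr):
    """Extract SIFT keypoints and descriptors from one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    # descriptors: N x 128 float32 array; each keypoint carries the
    # coordinate and scale attributes used later for attribute coding
    return keypoints, descriptors
```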
102. and determining the reference local feature range of the local feature of the current frame in the reference frame of the current frame.
In this embodiment, the reference frame of the current frame is one or more frames neighboring the current frame; the neighboring frames may be one or more frames before or after the time point of the current frame.
In addition, the current frame is the frame (image) in which the local features to be encoded are located, and the reference frame is a frame (image) whose local features have already been encoded; the reference frame may be one or several frames before or after the current frame in time.
For example, any one of one or more frames adjacent to the current frame is selected as the reference frame, and the reference local feature range of the local feature of the current frame in the reference frame is all local features of the entire reference frame;
or,
any one of one or more frames adjacent to the current frame is selected as the reference frame, and the reference local feature range of the local feature of the current frame in the reference frame is a subset of the local features in the reference frame selected according to a preset rule;
the preset rules are such as:
wherein p is the mth local feature of the current frame, and the coordinate corresponding to the current frame image isq is the nth local feature in the reference frame corresponding to the reference frame
When the above formula condition is satisfied, the reference frame local feature q is a local feature referred to by the current frame local feature p.
It is understood that the reference local feature range may be a subset of the local features in each reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
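A minimal sketch of this spatial filtering, assuming NumPy and hypothetical names (`p_xy` for the current feature's coordinates, `ref_xy` for the reference frame's coordinate array, `T` for the preset metric distance):

```python
import numpy as np

def reference_range(p_xy, ref_xy, T):
    """Indices of reference-frame features whose Euclidean distance to the
    current-frame feature p is at most the preset metric distance T."""
    d = np.linalg.norm(np.asarray(ref_xy) - np.asarray(p_xy), axis=1)
    return np.nonzero(d <= T)[0]
```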
103. And determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame.
In a specific application, the local feature within the reference local feature range of the reference frame that best matches the local feature to be encoded can be searched for and used as the reference local feature.
For example, first, the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame is obtained;
second, the matching similarities corresponding to each local feature of the current frame are compared with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, it is determined that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, the best-matching local feature in the reference frame whose time point is closest to the current frame is selected from among them as the reference local feature of that local feature.
It is understood that, in this step, the matching similarity may be defined by the distance between the local feature to be encoded and a candidate local feature, formalized as follows:
let $d_i^m$ denote the $m$-th local feature of the current frame $i$;
let $d_j^n$ denote the $n$-th local feature of the reference frame $j$;
the distance $Dis(d_i^m, d_j^n)$ between the two can be the Euclidean distance, the Manhattan distance, the Hamming distance, etc.;
the best match is defined by the ratio test $Dis_{min} / Dis_{second} < \theta$,
where $Dis_{min}$ is the distance between $d_i^m$ and its nearest candidate feature, and $Dis_{second}$ is the distance between $d_i^m$ and its second-nearest candidate feature;
a reference local feature satisfying the above best-match definition is therefore the candidate closest to the local feature $d_i^m$ to be encoded;
when $\theta = 1$, the candidate local feature nearest to $d_i^m$ is the reference local feature.
In a specific implementation process, a local feature of the current frame may have no reference local feature or several that satisfy the matching definition; when there are multiple candidates, the local feature with the smallest $Dis_{min}$ in the reference frame whose time point is nearest to the current frame is selected as the unique reference local feature.
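A minimal sketch of this ratio-test matching, assuming Euclidean distance and an illustrative threshold theta = 0.8 (the patent fixes neither choice):

```python
import numpy as np

def find_reference_feature(desc_p, candidate_descs, theta=0.8):
    """Return the index of the best-matching candidate descriptor under the
    ratio test Dis_min / Dis_second < theta, or None if no candidate passes."""
    dists = np.linalg.norm(np.asarray(candidate_descs) - desc_p, axis=1)
    if dists.size == 0:
        return None
    if dists.size == 1:
        # a lone candidate passes only in the degenerate theta >= 1 case
        return 0 if theta >= 1.0 else None
    i_min, i_second = np.argsort(dists)[:2]
    return int(i_min) if dists[i_min] / dists[i_second] < theta else None
```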
104. And acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream.
For example, the local feature bitstream may include: a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
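The patent does not fix a concrete bitstream syntax; the sketch below is one purely illustrative layout for the header region, with assumed field widths and an assumed convention of -1 for "no reference":

```python
import struct

def pack_header(ref_frame_idx, ref_range_T, ref_indices, qp):
    """Pack one frame's header region: reference frame index, the
    reference-range distance T, the feature count, the quantization
    parameter, and one reference index per feature (-1 = no reference)."""
    n = len(ref_indices)
    header = struct.pack('<HfHB', ref_frame_idx, ref_range_T, n, qp)
    header += struct.pack(f'<{n}h', *ref_indices)
    return header
```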
In practical applications, step 104 may include the following sub-steps not shown in the figures:
1041. coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
1042. obtaining the residual between each local feature that has a reference local feature and its reference local feature;
1043. coding the residual using a second preset coding mode to obtain a second bitstream;
1044. the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
1045. the header region of the local feature bitstream consists of binary codes, and the non-header region includes the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode, the latter obtained, for example, by entropy coding the values produced by transforming and quantizing the residuals.
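A minimal sketch of sub-steps 1042-1043 above, assuming simple scalar quantization; `qstep` is an illustrative stand-in for the quantization parameter carried in the header region, and a real encoder would subsequently entropy-code the integer symbols:

```python
import numpy as np

def encode_descriptor(desc, ref_desc, qstep=4.0):
    """Quantize either the descriptor itself (no reference local feature,
    first coding mode) or its residual against the reference local feature
    (second coding mode); returns a flag plus integer symbols that would
    then be entropy-coded."""
    if ref_desc is None:
        symbols = np.round(desc / qstep).astype(np.int32)
        return False, symbols
    residual = desc - ref_desc                              # sub-step 1042
    symbols = np.round(residual / qstep).astype(np.int32)   # sub-step 1043
    return True, symbols
```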
The method can quickly compress the transmitted feature data when the client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
In a specific example, step 101 in the method shown in fig. 1 may also be the following step 101':
step 101', local features of a current frame in the video stream and attributes of each local feature in the current frame are obtained.
It will be appreciated that the attributes of each local feature of each frame in the video stream may include: the coordinate, scale and other information related to the local feature.
Accordingly, step 104 may be the following step 104':
and step 104', acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream and the attribute of the local feature in each frame.
The bitstream includes a header region and a non-header region; the header region consists of a sequence of 0s and 1s and includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features that have no reference local feature and the attributes of the local features in each frame, and the coded residuals between the local features that have a reference local feature and their reference local features.
The method can greatly compress the feature data of the video stream through prediction, meeting real-time processing requirements while preserving the performance of visual analysis tasks.
Fig. 2 is a schematic flow chart of a video-oriented visual feature coding method according to an embodiment of the present invention, and as shown in fig. 2, the video-oriented visual feature coding method according to the embodiment is as follows.
201. The method comprises the steps of obtaining local features of a current frame in a video stream, and preprocessing the local features of the current frame.
The local feature of the current frame in this embodiment may be a local feature descriptor, i.e., a local feature description vector. The local features in this embodiment may be one or more.
It should be noted that this embodiment differs from the encoding method shown in fig. 1 described above in that it also preprocesses the local features extracted from the current frame.
For example, the local features of the current frame may be subjected to dimension reduction; and/or carrying out quantization processing on the local characteristics of the current frame.
Specifically, a predetermined dimensionality reduction matrix can be adopted to reduce the dimensionality of the local features in the subset consisting of the local features of the current frame, so as to obtain the local features after dimensionality reduction; the dimension reduction matrix is obtained after a preset first image data set is trained in a dimension reduction mode.
It should be noted that dimension reduction is an optional operation. The dimensionality reduction method can be principal component analysis, linear discriminant analysis, etc.; for principal component analysis, reference may be made to Jolliffe, I.T. (1986), Principal Component Analysis, Springer.
The quantization process may use scalar quantization or vector quantization to compact the local features. The quantization process in this embodiment is an optional operation.
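A minimal sketch of this optional preprocessing, assuming a dimensionality-reduction matrix trained offline (e.g., by PCA on the first image data set) and scalar quantization; the matrix shape and `qstep` value are assumptions for illustration:

```python
import numpy as np

def preprocess(descs, dim_reduction_matrix, qstep=8.0):
    """Project descriptors (N x d_in) with a pretrained (d_in x d_out)
    matrix, then scalar-quantize the reduced descriptors."""
    reduced = np.asarray(descs) @ dim_reduction_matrix  # dimension reduction
    return np.round(reduced / qstep) * qstep            # scalar quantization
```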
202. And determining the reference local feature range of the local feature of the current frame in the reference frame of the current frame.
In this embodiment, the reference frame of the current frame is a neighboring frame or frames of the current frame.
203. And determining the reference local feature of the preprocessed current frame in the reference frame according to the reference local feature range of the reference frame.
204. And acquiring a local feature bit stream to be sent of the video stream according to the preprocessed local features and the reference local features of each frame in the video stream.
In this embodiment, the above-described encoding method aims at encoding the local features into a bitstream.
The local feature bitstream in this embodiment may include: a header region and a non-header region;
the aforementioned header region consists of a sequence of 0s and 1s and may contain the same information as the header region exemplified in step 104; the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
It should be noted that the local features in the non-header region may be local features encoded with a first preset encoding mode;
the residuals in the non-header region may be residuals encoded with a second preset encoding mode, for example obtained by entropy encoding the values produced by transforming and quantizing the residuals.
In a specific application, the method shown in fig. 2 may further include the following steps not shown in the figures:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
Therefore, the coding method can rapidly compress the transmitted feature data when the client transmits data, reduce the amount of data transmitted, and improve the transmission efficiency of the client's video stream.
In an optional implementation, if the attributes of each local feature of each frame are also obtained in step 201, and the attribute is a coordinate attribute, then in step 204 the coordinate attribute of each local feature needs to be encoded for object-positioning applications. The specific coding methods are as follows:
Coding mode one: take the difference between the coordinates of the local feature to be encoded and the coordinates of the reference local feature, quantize the residual, and entropy-code the resulting value; this corresponds to the residual coded with the second preset coding mode.
Coding mode two: since the coordinate set of the local features of the current frame can be obtained from the coordinate set of the local features of the reference frame through an affine transformation, let $(x_i^m, y_i^m)$ be the coordinates to be encoded of the $m$-th local feature in the current $i$-th frame, $(x_j^n, y_j^n)$ the coordinates of the referenced $n$-th local feature in the $j$-th reference frame, and $A$ the affine matrix; the affine transformation is:
$(x_i^m, y_i^m, 1)^T = A \, (x_j^n, y_j^n, 1)^T$
The affine matrix $A$ may be calculated using the least squares method;
therefore, in the coordinate encoding process, only the affine matrix and the residual between the original coordinates $(x_i^m, y_i^m)$ and the transformed coordinates $A(x_j^n, y_j^n, 1)^T$ need to be encoded. The elements of the affine matrix and the residual may be quantized to further reduce the bit rate.
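A minimal sketch of the least-squares estimation of the affine matrix from matched coordinate pairs, using NumPy; the helper name and array conventions are ours:

```python
import numpy as np

def fit_affine(ref_xy, cur_xy):
    """Least-squares fit of the affine map taking matched reference-frame
    coordinates (N x 2) to current-frame coordinates (N x 2); returns the
    2x3 affine matrix and the per-feature coordinate residuals to encode."""
    ones = np.ones((len(ref_xy), 1))
    X = np.hstack([np.asarray(ref_xy), ones])        # homogeneous coords, N x 3
    A, *_ = np.linalg.lstsq(X, np.asarray(cur_xy), rcond=None)  # 3 x 2
    residual = np.asarray(cur_xy) - X @ A            # what coding mode two encodes
    return A.T, residual
```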
At this time, the local feature coded bitstream includes a header region and a non-header region;
the header region consists of a sequence of 0s and 1s and includes: information recording whether reference local features are used, information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, information marking the local feature quantization parameter, information marking whether local feature coordinate attributes are used, and information marking the coordinate attribute quantization parameter;
the non-header region includes: the coded local features that have no reference and the attributes of the local features in each frame, the coded residuals between the local features that have a reference and their reference local features, and the coded residuals between the coordinates of local features that have a reference and the coordinates of their reference local features.
Or,
the local feature coded bitstream includes a header region and a non-header region; the header region consists of a sequence of 0s and 1s and includes: information recording whether reference local features are used, information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, information marking the local feature quantization parameter, information marking whether local feature coordinate attributes are used, and information marking the coordinate transformation matrix and its quantization parameter;
the non-header region includes: the coded local features that have no reference and the attributes of the local features in each frame, the coded residuals between the local features that have a reference and their reference local features, and the coded residuals between the coordinates of local features that have a reference and the coordinates of their reference local features. It should be noted that the attributes of different local features may also be coded differently according to the local features and the visual application.
In a specific implementation, for the process of obtaining the local features of the current frame in the video stream in steps 101 and 201, a local feature selection rule is used to obtain a subset of the local features of the current frame. The local feature selection rule may be illustrated as follows:
in this embodiment, for a frame image, taking extracting local features SIFT as an example, if more than one SIFT is extracted, a subset including N SIFTs is selected from all SIFTs, where N is greater than 0. In this embodiment, N is 300, and it should be noted that the N may be adaptively selected according to different values of N.
It should be noted that when the number of SIFTs extracted from the image is less than N, all SIFTs of the image are selected as elements in the subset.
M01, extracting all local features from a plurality of matching image pairs and non-matching image pairs respectively;
wherein a matched image pair refers to two images containing the same object or the same scene, and a non-matched image pair refers to two images containing different objects or different scenes. These matched and non-matched image pairs do not include the frame images to be processed in steps 101 and 201 described above.
M02, obtaining, through statistics, the probability distributions of the different attributes of local features among correctly matched local features and mismatched local features;
for SIFT local features, the different attributes may include, for example: scale, direction, peak of gaussian difference, distance to the center of the image, etc.
M03, based on the probability distributions, calculating the probability that a local feature of the frame image to be processed in steps 101 and 201 is correctly matched when each of its attributes falls within a certain value range, and selecting one or more local features from all local features according to this probability as the local features of the frame image.
The different attributes of a SIFT feature are assumed to be statistically independent, so the probability that a SIFT feature is correctly matched is the product of the probabilities computed from the individual attributes; this product serves as the basis for selecting the elements of the SIFT subset, as illustrated in the sketch below.
In practical applications, other local feature selection methods can be adopted, and are not limited to the above-mentioned exemplary steps M01 through M03.
It should be noted that steps M01 and M02 may be carried out in advance, i.e., performed offline and the results stored in the device.
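A minimal sketch of the resulting selection rule from steps M01-M03, assuming the per-attribute correct-match probabilities have already been looked up for each feature (the array layout is an assumption):

```python
import numpy as np

def select_subset(attribute_probs, n=300):
    """attribute_probs: (num_features, num_attributes) array whose entry
    [k, a] is the offline-estimated probability that feature k is correctly
    matched given its value of attribute a. Attributes are assumed
    independent, so the score is their product; keep the n highest-scoring
    features (all of them when fewer than n were extracted)."""
    scores = np.prod(np.asarray(attribute_probs), axis=1)
    keep = min(n, scores.size)
    return np.argsort(scores)[::-1][:keep]
```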
Fig. 3 is a flowchart illustrating a video-oriented visual feature decoding method according to an embodiment of the present invention, and as shown in fig. 3, the video-oriented visual feature decoding method according to the embodiment is as follows.
301. Receiving a local characteristic bit stream of a video stream sent by a client;
302. and acquiring the local features of each frame in the video stream according to the local feature bit stream.
For example, after determining, from the header region of the local feature bitstream, that reference local features are used, the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features are acquired;
the local features are then decoded from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, giving the local features of each frame in the video stream. In a specific application, if the coded bitstream includes the attributes of the local features, step 302 may specifically include the following sub-steps:
a01, for a bitstream to be decoded, first obtaining dimension information indicating whether predictive coding is used, and then obtaining reference local feature range information.
A02, determining the number of local features to be decoded in the current frame and the reference local features in the reference frame, for example by obtaining the index information of the reference local features and the information of the local feature quantization parameter from the header of the local feature bitstream.
A03, predictively decoding the local features to be decoded according to the reference local features, including decoding the local features and the attributes related to them.
First, entropy decoding is performed on the corresponding bitstream of the non-header part to obtain a residual, and the residual is then added to the reference local feature to obtain the decoded local feature;
the coordinate information to be decoded may be decoded in the manner corresponding to coding mode one: first entropy-decode the corresponding bitstream of the non-header part to obtain a residual, then add the residual to the coordinates of the reference local feature to obtain the decoded coordinate data;
or the coordinate information to be decoded may be decoded in the manner corresponding to coding mode two: first decode the corresponding bitstream of the non-header part to obtain the transformation matrix A and a residual, then compute the transformed coordinates from the transformation matrix and the coordinates of the reference local feature, and finally add the transformed coordinates and the residual to obtain the decoded coordinates.
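A minimal sketch of the descriptor part of this decoding, mirroring the encoder sketch given under step 104; `qstep` stands in for the quantization parameter read from the header region:

```python
import numpy as np

def decode_descriptor(has_ref, symbols, ref_desc, qstep=4.0):
    """Dequantize the entropy-decoded symbols; when predictive coding was
    used, add the reference local feature back to the residual."""
    values = symbols.astype(np.float32) * qstep
    return ref_desc + values if has_ref else values
```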
Fig. 4 is a schematic structural diagram of a video-oriented visual feature coding apparatus according to an embodiment of the present invention, and as shown in fig. 4, the video-oriented visual feature coding apparatus according to the embodiment includes: a local feature acquisition unit 41, a determination unit 42, a reference local feature determination unit 43, and a local feature bitstream acquisition unit 44;
the local feature obtaining unit 41 is configured to obtain a local feature of a current frame in a video stream;
the determining unit 42 is configured to determine a reference local feature range of the local feature of the current frame in a reference frame of the current frame, where the reference frame of the current frame is one or more frames adjacent to the current frame;
the reference local feature determining unit 43 is configured to determine a reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
the local feature bitstream obtaining unit 44 is configured to obtain a local feature bitstream to be sent of the video stream according to the local feature of each frame in the video stream and the reference local feature.
For example, the local feature bitstream includes: a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
In a specific application, the local feature bitstream obtaining unit 44 may be specifically configured to encode the local features in each frame that have no reference local feature using a first preset encoding mode to obtain a first bitstream;
obtain the residual between each local feature that has a reference local feature and its reference local feature;
encode the residual using a second preset encoding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset encoding mode and the residuals coded with the second preset encoding mode.
In a specific implementation manner, the aforementioned encoding apparatus may further include a preprocessing unit, not shown in the figure, located after the local feature obtaining unit 41 and before the reference local feature determining unit 43, where the preprocessing unit is specifically configured to perform preprocessing on the local feature of the current frame; for example, the local features of the current frame are subjected to dimension reduction processing; and/or carrying out quantization processing on the local characteristics of the current frame.
Accordingly, the reference local feature determining unit 43 may be specifically configured to determine, according to the reference local feature range of the reference frame, the reference local features of the preprocessed current frame in the reference frame. For example, the reference local feature range is all local features of the entire reference frame, or the reference local feature range is a subset of the local features in each reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
In another specific implementation manner, the aforementioned encoding apparatus may further include a sending unit, not shown in the figure, located after the local feature bitstream obtaining unit 44, and configured to send the local feature bitstream to be sent of the video stream to the server, so that the server obtains the local features of each frame in the video stream based on the local feature bitstream.
The video-oriented visual feature coding apparatus of this embodiment may be located in any client, such as a mobile terminal, another smart terminal, or a fixed terminal, and may perform the method of any of the embodiments described in fig. 1 and fig. 2 above, which is not repeated here.
The video-oriented visual feature coding device of the embodiment can realize rapid compression of transmitted feature data when a client transmits data, reduce the amount of transmitted data, and improve the transmission efficiency of video streams of the client.
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present invention, and as shown in fig. 5, the server according to the embodiment includes: a receiving unit 51, a local feature recovery unit 52;
the receiving unit 51 is configured to receive a local feature bit stream of a video stream sent by a client;
the local feature recovery unit 52 is configured to obtain a local feature of each frame in the video stream according to the local feature bit stream.
For example, after determining, from the header region of the local feature bitstream, that reference local features are used, the local feature recovery unit 52 obtains the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
it then decodes the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, obtaining the local features of each frame in the video stream. The server of this embodiment may interact with the client that performs encoding so as to recover the local features of each frame in the video stream on the server side; the server may perform the whole process of the decoding method of fig. 3 above, which is not repeated here.
Through the interaction between the server and the video-oriented visual feature coding device of this embodiment, the prior-art problem that a client cannot rapidly compress the transmitted feature data and reduce the amount of transmitted data can be solved.
In a fifth aspect, an embodiment of the present invention further provides a video processing system, including: the video-oriented visual feature encoding device according to any of the above embodiments and the server according to any of the above embodiments, wherein the video-oriented visual feature encoding device transmits the acquired local feature bitstream of the video stream to the server, and the server restores the local features of each frame in the video stream according to the received local feature bitstream.
In a specific implementation process, the client may obtain a local feature bitstream of each frame in the video stream according to the method in any of the embodiments described above, and send the obtained local feature bitstream to the server, and the server may obtain the local feature of the video stream according to the local feature bitstream decoding.
The system solves the prior-art problem that a client cannot rapidly compress the transmitted feature data when transmitting data and reduce the amount of data transmitted.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention and shall be construed as falling within the scope of the claims and description of the invention.

Claims (9)

1. A video-oriented visual feature coding method, comprising:
acquiring local features of a current frame in a video stream;
determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame, wherein the reference frame of the current frame is one or more adjacent frames of the current frame;
determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame;
acquiring a local feature bit stream to be sent of the video stream according to the local feature and the reference local feature of each frame in the video stream;
wherein the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
2. The method of claim 1, wherein determining a reference local feature range of the local feature of the current frame in a reference frame of the current frame comprises:
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is all local features of the whole reference frame;
or,
selecting any frame of one or more frames adjacent to the current frame as a reference frame;
the reference local feature range is a subset of the local features in the reference frame, where the metric distance between each local feature in the subset and the local feature of the current frame is less than or equal to a preset metric distance.
3. The method according to claim 1 or 2, wherein determining the reference local feature of the current frame in the reference frame according to the reference local feature range of the reference frame comprises:
acquiring the matching similarity between each local feature of the current frame and the local features within the reference local feature range in the reference frame;
comparing the matching similarities corresponding to each local feature of the current frame with a preset threshold range;
if none of the matching similarities corresponding to a certain local feature of the current frame meets the preset threshold range, determining that this local feature of the current frame has no reference local feature;
if two or more of the matching similarities corresponding to a certain local feature of the current frame meet the preset threshold range, selecting from among them the best-matching local feature in the reference frame whose time point is closest to the current frame as the reference local feature of that local feature.
4. The method according to claim 1 or 2, wherein obtaining a local feature bitstream to be transmitted of the video stream according to the local feature and the reference local feature of each frame in the video stream comprises:
coding the local features in each frame that have no reference local feature using a first preset coding mode to obtain a first bitstream;
obtaining the residual between each local feature that has a reference local feature and its reference local feature;
coding the residual using a second preset coding mode to obtain a second bitstream;
the first bitstream and the second bitstream constitute the local feature bitstream of the video stream to be transmitted;
the header region of the local feature bitstream consists of binary codes, and the non-header region includes: the local features coded with the first preset coding mode and the residuals coded with the second preset coding mode.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
and sending the local feature bit stream to be sent of the video stream to a server, so that the server acquires the local feature of each frame in the video stream based on the local feature bit stream.
6. A video-oriented visual feature decoding method, comprising:
receiving a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises: a header region and a non-header region;
acquiring local features of each frame in the video stream according to the local feature bit stream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information marking the number of local features, information marking the reference index information of the local features, and information marking the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
correspondingly, obtaining the local feature of each frame in the video stream according to the local feature bit stream includes:
after determining, from the header region of the local feature bitstream, that reference local features are used, acquiring the number of local features of the current frame, the index information of the reference local features, and the information indicating the quantization parameter of the local features;
and decoding the local features from the non-header region according to the index information of the reference local features and the information indicating the quantization parameter of the local features, to obtain the local features of each frame in the video stream.
7. A video-oriented visual feature coding apparatus, comprising:
a local feature acquisition unit, configured to acquire the local features of the current frame in the video stream;
a determining unit, configured to determine the reference local feature range of the local features of the current frame in the reference frame of the current frame, wherein the reference frame of the current frame is one or more frames adjacent to the current frame;
a reference local feature determining unit, configured to determine, according to the reference local feature range of the reference frame, the reference local features of the current frame's local features in the reference frame;
a local feature bitstream obtaining unit, configured to obtain the local feature bitstream to be transmitted of the video stream according to the local features and the reference local features of each frame in the video stream;
wherein the local feature bitstream comprises:
a header region and a non-header region;
the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information indicating the number of local features, reference index information of the local features, and information indicating the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features.
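The claimed units can be wired together as in the following structural sketch, reusing the helper functions from the earlier sketches; the class, its method names, and its bookkeeping are assumptions, and local feature extraction and reference-range determination are omitted for brevity.

```python
class VisualFeatureEncoder:
    """Structural sketch of the claimed encoder units.

    `frame_features` is assumed to arrive already extracted and
    restricted to the reference local feature range.
    """

    def __init__(self, threshold, qp=0.5):
        self.threshold = threshold
        self.qp = qp
        self.reference_frames = []  # most recent frames' descriptors

    def encode(self, frame_features):
        # Reference local feature determining unit.
        references = [
            select_reference(f, self.reference_frames, self.threshold)
            for f in frame_features
        ]
        # Local feature bitstream obtaining unit (the two-part split).
        parts = encode_frame(frame_features, references, self.qp)
        # Keep up to two adjacent frames as references for the next frame.
        self.reference_frames.insert(0, frame_features)
        del self.reference_frames[2:]
        return parts
```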
8. A server, comprising:
a receiving unit, configured to receive a local feature bitstream of a video stream sent by a client, wherein the local feature bitstream comprises a header region and a non-header region;
a local feature recovery unit, configured to acquire the local features of each frame in the video stream according to the local feature bitstream;
wherein the header region includes: information recording the reference frame and the reference local feature range corresponding to the reference frame, information indicating the number of local features, reference index information of the local features, and information indicating the quantization parameter;
the non-header region includes: the coded local features in each frame that have no reference local feature, and the coded residuals between the local features that have a reference local feature and their reference local features;
accordingly, the local feature recovery unit is specifically configured to:
after determining from the header region of the local feature bitstream that reference local features are used, acquire the number of local features of the current frame, the reference index information of the local features, and the quantization parameter information of the local features;
and decode the local features from the non-header region according to the reference index information and the quantization parameter information, to obtain the local features of each frame in the video stream.
9. A video processing system, comprising:
the video-oriented visual feature coding device according to claim 7 and the server according to claim 8, wherein the video-oriented visual feature coding device sends the obtained local feature bitstream of the video stream to the server, and the server restores the local features of each frame in the video stream from the received local feature bitstream.
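An illustrative end-to-end use of the sketches above, with transport reduced to a local function boundary; the threshold value and the random descriptors are arbitrary test data, not from the patent.

```python
import numpy as np

# Client side: encode one frame of 128-dimensional descriptors.
encoder = VisualFeatureEncoder(threshold=-0.8, qp=0.5)
frame = [np.random.rand(128).astype(np.float32) for _ in range(10)]
first_part, second_part = encoder.encode(frame)

# In a real system the two parts would be serialized together with the
# header fields of the decoding claim and sent to the server, which
# would then call a decoder such as `decode_frame` above to restore
# the descriptors for visual analysis.
```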
CN201510134617.7A 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video Active CN104767998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510134617.7A CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510134617.7A CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Publications (2)

Publication Number Publication Date
CN104767998A CN104767998A (en) 2015-07-08
CN104767998B true CN104767998B (en) 2017-12-08

Family

ID=53649565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510134617.7A Active CN104767998B (en) 2015-03-25 2015-03-25 A kind of visual signature coding method and device towards video

Country Status (1)

Country Link
CN (1) CN104767998B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108882020B (en) * 2017-05-15 2021-01-01 北京大学 Video information processing method, device and system
CN107846576B (en) 2017-09-30 2019-12-10 北京大学 Method and system for encoding and decoding visual characteristic data
CN113453017B (en) * 2021-06-24 2022-08-23 咪咕文化科技有限公司 Video processing method, device, equipment and computer program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040258147A1 (en) * 2003-06-23 2004-12-23 Tsu-Chang Lee Memory and array processor structure for multiple-dimensional signal processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226589A (en) * 2012-10-15 2013-07-31 北京大学 Method for obtaining compact global feature descriptors of image and image retrieval method
CN103561264A (en) * 2013-11-07 2014-02-05 北京大学 Media decoding method based on cloud computing and decoder
CN104093030A (en) * 2014-07-09 2014-10-08 天津大学 Distributed video coding side information generating method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Local feature spatio-temporal coding method for human action recognition; Wang Bin et al.; Journal of Sichuan University (Engineering Science Edition); 2014-03-31; Vol. 46, No. 2; pp. 72-78 *

Also Published As

Publication number Publication date
CN104767998A (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN110225341B (en) Task-driven code stream structured image coding method
US20220329845A1 (en) Image encoding method and apparatus, and image decoding method and apparatus
CN103226589B (en) The compact global characteristics obtaining image describes method and the image search method of son
Duan et al. Compact descriptors for visual search
Ma et al. Joint feature and texture coding: Toward smart video representation via front-end intelligence
Zhang et al. A joint compression scheme of video feature descriptors and visual content
CN111131825A (en) Video processing method and related device
Baroffio et al. Coding binary local features extracted from video sequences
CN104767998B (en) A kind of visual signature coding method and device towards video
CN111093077A (en) Video coding method and device, electronic equipment and storage medium
CN104767997B (en) A kind of visual signature coding method and device towards video
US10445613B2 (en) Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest
Baroffio et al. Coding local and global binary visual features extracted from video sequences
CN103020138A (en) Method and device for video retrieval
US10536726B2 (en) Pixel patch collection for prediction in video coding system
Chen et al. Quality-of-content (QoC)-driven rate allocation for video analysis in mobile surveillance networks
Baroffio et al. Hybrid coding of visual content and local image features
Chen et al. Interframe coding of global image signatures for mobile augmented reality
Van Opdenbosch et al. A joint compression scheme for local binary feature descriptors and their corresponding bag-of-words representation
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
Bondi et al. Multi-view coding of local features in visual sensor networks
Wood Task Oriented Video Coding: A Survey
Monteiro et al. Coding mode decision algorithm for binary descriptor coding
CN107018421B (en) A kind of image sending, receiving method and device, system
Tian et al. Just noticeable difference modeling for face recognition system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant