CN113438483A - Crowdsourcing video coding method and device - Google Patents


Info

Publication number
CN113438483A
CN113438483A
Authority
CN
China
Prior art keywords
knowledge
image
image set
candidate
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010209585.3A
Other languages
Chinese (zh)
Other versions
CN113438483B (en)
Inventor
虞露
于化龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010209585.3A priority Critical patent/CN113438483B/en
Publication of CN113438483A publication Critical patent/CN113438483A/en
Application granted granted Critical
Publication of CN113438483B publication Critical patent/CN113438483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
    • H04N19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a scalable video layer
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a crowdsourcing video coding method and device. The method obtains a set of knowledge images from a candidate image set by a crowdsourcing approach, updates the knowledge image set over multiple iterations, and compares the cost consumed by different knowledge image sets against the coding benefit they provide, so as to obtain the knowledge image set that yields the highest coding efficiency for the video to be coded. The advantage of the invention is that an optimal knowledge image set can be obtained, so that video coding based on a knowledge base achieves higher coding efficiency than the prior art.

Description

Crowdsourcing video coding method and device
Technical Field
The present invention relates to the field of image or video compression technologies, and in particular, to a method for selecting a reference image.
Background
1. Legacy video coding scheme
In existing video sequence processing, to make the encoded video sequence support random access, the video sequence is divided into a plurality of segments with the random access function (random access segments for short). As shown in fig. 1, a video sequence includes at least one random access segment; each random access segment corresponds to a display period and includes one random access image and a plurality of non-random access images, and each image has its own display time describing when the image is displayed or played. The pictures in a random access segment may be intra-coded, or coded using inter prediction with reference to other pictures in the same random access segment, where the referenced pictures may be pictures to be displayed, synthetic pictures that cannot be displayed, and so on. However, in the prior art, a picture that follows a random access picture in display order (excluding leading pictures) can only refer to other pictures in the random access segment to which it belongs, and cannot refer to pictures in preceding or following random access segments, as shown in fig. 1. Specifically, the dependency between the current image and the candidate reference image is described in the following ways:
In existing video coding schemes (such as H.264/AVC or H.265/HEVC), the dependency between a current picture and a candidate reference picture is described by a reference picture configuration set in the video compression layer, which records the numbering difference between the reference picture and the current picture. Only the numbering difference is recorded because, in existing schemes, the candidate reference picture and the current picture belong to the same independently decodable random access segment and must use the same numbering rule (for example, numbering in time order), so the candidate reference picture can be located exactly from the current picture number and the numbering difference. If the reference picture and the current picture used different numbering rules, the same numbering difference would point to different candidate reference pictures, and since existing schemes provide no way to describe different numbering rules in the bitstream, the codec could not use the correct candidate reference picture.
In Scalable Video Coding (SVC) and Multi-view Video Coding (MVC), inter-layer/inter-view prediction extends the range of candidate reference pictures of the current picture beyond ordinary inter prediction (which uses only candidate reference pictures within the same layer/view), as shown in fig. 2. The extended candidate reference pictures have the same number (e.g., the same timestamp) as the current picture but do not belong to the same layer of the independently decodable segment. SVC/MVC uses layer identifiers in the video compression layer to describe the dependency between bitstreams of different layers/views, combined with identical picture numbers to describe the dependency between pictures across layers/views.
In the background-frame technique of AVS2, as shown in fig. 3, the dependency between an encoded image and a scene image is described by a reference-image-type identifier in the video compression layer. Specifically, AVS2 uses an identifier to mark particular scene image types (the G image and the GB image) and manages G/GB images in a dedicated reference buffer (the scene image buffer). An identifier likewise marks whether the current image refers to a G/GB image, and a dedicated reference image queue construction method is used (by default, the G/GB image is placed in the last reference position of the reference image queue). This finally allows a current image numbered according to the rule to refer to a candidate reference image that is not numbered according to the rule (a GB image), or to a candidate reference image numbered under the same rule as the current image but whose numbering difference exceeds the constrained range (a G image). However, the technique limits the scene image buffer to holding only one candidate reference picture at any moment, and that candidate reference picture still belongs to the same independently decodable segment as the current picture.
2. Video coding scheme based on knowledge base
The above prior-art mechanisms limit the number of reference pictures available to a picture to be encoded, and therefore cannot effectively improve coding and decoding efficiency.
To mine and exploit cross-referencing information among pictures in multiple random access segments during coding, an encoder (or decoder) can select, from a database, a picture whose texture content is similar to the current picture being encoded (or decoded) and use it as a reference picture. Such a reference picture is called a knowledge picture, the database storing the set of such pictures is called a knowledge base, and coding/decoding at least one picture of a video with reference to at least one knowledge picture is called library-based video coding. Encoding a video sequence with knowledge-base-based video coding produces a knowledge bitstream, containing the coded knowledge pictures, and a main bitstream, containing the pictures of the video sequence coded with reference to the knowledge pictures. The two bitstreams are respectively similar to the base layer and enhancement layer bitstreams produced by Scalable Video Coding (SVC): the main bitstream depends on the knowledge bitstream, and the knowledge bitstream can be inserted into the main bitstream to form a spliced bitstream. However, the dependency in the dual-bitstream organization of knowledge-base-based video coding differs from the layered bitstream organization of scalable video coding: in SVC the layers depend on each other at aligned time instants, whereas in the dual bitstreams of knowledge-base-based video coding the video layer depends on the knowledge layer at non-aligned time instants.
In the encoding and decoding technology using the knowledge image, the knowledge image is obtained and used for providing additional candidate reference images for encoding and decoding of the image, and fig. 4 shows the dependency relationship between the sequence image and the knowledge image in the encoding and decoding technology using the knowledge image. The knowledge images enable the sequence images to utilize large-span related information, and coding and decoding efficiency is improved.
The amount of effective reference information a knowledge image can carry determines how much coding efficiency the knowledge image can bring to the video to be coded; an optimal knowledge image set must contain the key information in the video to be coded, such as long-lived background and repeatedly appearing objects. Existing knowledge-base-based video coding schemes obtain the knowledge image set in one of two ways. The first accumulates the content change in the video to be coded; when the accumulated change reaches a threshold, the current image to be coded is selected as a knowledge image, providing a reference for subsequent images to be coded. Its advantage is that the video can be processed in real time; its disadvantage is that redundant information arises among knowledge images, because the same background and foreground reappear throughout the video and accumulated content change cannot detect recurring content. Once the whole video to be coded is available, large-span recurring information can be analyzed globally. The second method therefore uses scene-change detection to divide the video into scene segments, clusters segments with similar content into the same category, and, for all segments in a scene category, selects the most representative image as a knowledge image, finally yielding a knowledge image set in which each knowledge image represents one category of scene.
However, this method can only acquire knowledge images containing key background; it cannot provide complete background information (for example, when part of the background is occluded or outside the shot), nor key foreground information, so coding efficiency cannot be improved further. Meanwhile, scene clustering incurs excessive computational complexity on long videos, greatly increasing encoding time. The invention therefore provides a crowdsourcing video coding method and device that obtain an optimal knowledge image set to further improve video coding efficiency.
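For illustration only, the first prior approach above (content-change accumulation) can be sketched as follows; the `diff` callable and `threshold` are hypothetical stand-ins for whatever content-change metric and trigger value an encoder actually uses:

```python
def select_knowledge_images(frames, diff, threshold):
    """Content-change accumulation: whenever the accumulated change
    since the last knowledge image reaches the threshold, the current
    frame is selected as a new knowledge image."""
    knowledge = []
    accumulated = 0.0
    prev = None
    for frame in frames:
        if prev is not None:
            accumulated += diff(prev, frame)
        if prev is None or accumulated >= threshold:
            knowledge.append(frame)  # this frame becomes a knowledge image
            accumulated = 0.0
        prev = frame
    return knowledge
```

On a toy sequence whose "content" is a single number, `select_knowledge_images([0, 1, 2, 10, 11], lambda a, b: abs(a - b), 5)` selects frames 0 and 10; the recurring-content weakness described above is visible here, since a frame identical to an earlier knowledge image would still be re-selected once enough change has accumulated.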
Disclosure of Invention
The invention discloses a crowdsourcing video coding method and device. The crowdsourcing method updates the knowledge image set by alternately adding and removing images, and compares the cost consumed by different knowledge image sets against the coding contribution they provide, to obtain the knowledge image set that gives the images to be coded the highest coding efficiency.
A first object of the present invention is to provide a method of video coding for crowdsourcing, comprising:
the knowledge image adding method: detecting the candidate images in a candidate image set one by one; if using a candidate image as a knowledge image increases the coding benefit that a representative image set of the video segment to be coded obtains from the knowledge image set, adding the candidate image to the knowledge image set to update the knowledge image set;
the knowledge image removing method: detecting the knowledge images in the knowledge image set one by one; if the knowledge image set with a knowledge image removed increases the coding benefit obtained by the representative image set, removing the knowledge image from the knowledge image set to update the knowledge image set, and putting it into the candidate image set to update the candidate image set;
alternately applying the two methods to update the knowledge image set until neither method can increase the coding benefit that the representative image set obtains from the knowledge image set;
and coding the image to be coded in the video segment to be coded by using the knowledge image set as a reference.
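As a non-normative sketch of the steps above, assuming a scoring function `benefit(knowledge, representatives)` is available (total contribution minus total cost, as defined later in this disclosure; the scalar benefit model in the usage example is hypothetical), the alternation could look like:

```python
def crowdsource_knowledge_set(candidates, knowledge, representatives, benefit):
    """Alternate the adding pass and the removing pass until neither
    pass can further increase the coding benefit that the representative
    image set obtains from the knowledge image set."""
    improved = True
    while improved:
        improved = False
        # Adding pass: move in every candidate that raises the benefit.
        for img in list(candidates):
            if benefit(knowledge | {img}, representatives) > benefit(knowledge, representatives):
                candidates.discard(img)
                knowledge.add(img)
                improved = True
        # Removing pass: move out every knowledge image whose removal raises the benefit.
        for img in list(knowledge):
            if benefit(knowledge - {img}, representatives) > benefit(knowledge, representatives):
                knowledge.discard(img)
                candidates.add(img)
                improved = True
    return knowledge
```

The loop terminates because every accepted move strictly increases the benefit and the number of possible sets is finite. With a toy benefit of one unit of contribution per knowledge image that matches a representative image, minus a cost of 0.5 per knowledge image, `crowdsource_knowledge_set({1, 2, 3}, set(), {1, 2}, bene)` converges to `{1, 2}`.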
Preferably, the knowledge image adding method includes the steps of:
a selecting step: selecting a candidate image from the candidate image set, and obtaining the coding benefit that the representative image set obtains from the knowledge image set when the candidate image is used as a knowledge image; then performing one of the following operations:
a) addition step
If the coding benefit is greater than a first threshold, adding the candidate image to the knowledge image set to update the knowledge image set, and removing the candidate image from the candidate image set to update the candidate image set; setting the first count to zero and repeating the selecting step;
b) judging step
If the coding benefit is not greater than the first threshold, leaving the candidate image in the candidate image set, incrementing the first count by one, and performing one of the following:
if the value of the first count is less than the number of candidate images in the candidate image set, repeating the selecting step;
if the value of the first count is not less than the number of candidate images in the candidate image set, the knowledge-image adding method is ended.
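A non-normative sketch of the selecting/adding/judging steps above; `benefit` and `first_threshold` are stand-ins, the fixed selection order is for illustration, and the test here is applied to the benefit increment (the disclosure's comparison of the "coding benefit" against the first threshold could equally be read as a test on the absolute benefit):

```python
def knowledge_add_pass(candidates, knowledge, representatives, benefit, first_threshold):
    """One knowledge-image adding pass: the first count tracks consecutive
    candidates that failed the benefit test, and the pass ends once every
    remaining candidate has failed in a row."""
    pool = sorted(candidates)  # fixed selection order, for illustration
    first_count = 0
    while pool and first_count < len(pool):
        img = pool[first_count]                           # selecting step
        gain = (benefit(knowledge | {img}, representatives)
                - benefit(knowledge, representatives))
        if gain > first_threshold:                        # adding step
            knowledge.add(img)
            candidates.discard(img)
            pool.remove(img)
            first_count = 0
        else:                                             # judging step
            first_count += 1
    return knowledge
```

Note that after a successful addition the first count resets to zero, so previously rejected candidates are re-examined against the updated knowledge image set, exactly as the repetition of the selecting step requires.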
Preferably, said alternating comprises the steps of:
performing the knowledge image adding method, and performing the knowledge image removing method when the adding method ends;
if the knowledge image removing method can increase the coding benefit obtained by the representative image set, repeating the previous step; otherwise, ending the alternation.
Preferably, after the alternation is finished, the method further comprises:
and selecting an optimal set from the knowledge image set and the candidate image set to update the knowledge image set, where the optimal set is whichever of the two gives the representative image set the higher coding benefit.
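This final selection reduces to a single comparison; `benefit` is again a hypothetical scoring function:

```python
def pick_optimal_set(knowledge, candidates, representatives, benefit):
    """After the alternation ends, keep whichever of the two sets
    gives the representative image set the higher coding benefit."""
    if benefit(candidates, representatives) > benefit(knowledge, representatives):
        return candidates
    return knowledge
```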
Preferably, the candidate images in the candidate image set are selected or synthesized from the video segment to be encoded, or from other videos whose content is similar to that of the video segment to be encoded.
Preferably, the coding benefit of the representative image set is the difference between the sum of the coding contributions the knowledge image set provides to the representative image set and the sum of the coding costs of the knowledge image set.
Preferably, the coding benefit obtained by the representative image set is calculated as follows: for each knowledge image in the knowledge image set, calculate its coding cost and the set of coding contributions it provides to the coding of all images to be coded in the representative image set; the total coding cost is the sum of the coding costs of all knowledge images in the knowledge image set; the total coding contribution is the sum of the contributions all knowledge images provide to the coding of all images to be coded in the representative image set.
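Under these definitions the calculation is a double sum minus a single sum; `contribution` and `cost` are hypothetical callables standing in for actual rate-distortion measurements:

```python
def coding_benefit(knowledge, representatives, contribution, cost):
    """Benefit = (sum over knowledge images of the contribution they
    provide to coding each representative image) minus (sum of the
    coding costs of the knowledge images themselves)."""
    total_contribution = sum(contribution(k, r)
                             for k in knowledge for r in representatives)
    total_cost = sum(cost(k) for k in knowledge)
    return total_contribution - total_cost
```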
A second object of the present invention is to provide a crowdsourced video encoding device, comprising:
a processor;
a memory for storing a set of candidate images, a set of representative images and a video segment to be encoded; and
one or more programs are used to perform the following method:
the knowledge image adding method: the processor detects the candidate images in the candidate image set one by one; if using a candidate image as a knowledge image increases the coding benefit that the representative image set of the video segment to be coded obtains from the knowledge image set, the processor adds the candidate image to the knowledge image set to update the knowledge image set;
the knowledge image removing method: the processor detects the knowledge images in the knowledge image set one by one; if the knowledge image set with a knowledge image removed increases the coding benefit obtained by the representative image set, the processor removes the knowledge image from the knowledge image set to update the knowledge image set, and puts it into the candidate image set to update the candidate image set;
the processor alternately applies the two methods to update the knowledge image set until neither method can increase the coding benefit that the representative image set obtains from the knowledge image set;
and the processor encodes the images to be coded in the video segment to be coded using the knowledge image set as reference.
Preferably, the method for adding the knowledge image by the processor comprises the following steps:
a selecting step: the processor selects a candidate image from the candidate image set and obtains the coding benefit that the representative image set obtains from the knowledge image set when the candidate image is used as a knowledge image; the processor then performs one of the following operations:
a) addition step
If the processor determines that the coding benefit is greater than a first threshold, the processor adds the candidate image to the knowledge image set to update the knowledge image set, and removes the candidate image from the candidate image set to update the candidate image set; the processor sets the first count to zero and repeats the selecting step;
b) judging step
If the processor determines that the coding benefit is not greater than the first threshold, the processor leaves the candidate image in the candidate image set, increments the first count by one, and performs one of the following:
if the processor determines that the value of the first count is less than the number of candidate images in the candidate image set, the processor repeats the selecting step;
if the processor determines that the value of the first count is not less than the number of candidate images in the candidate image set, the processor ends the knowledge image adding method.
Preferably, said alternating comprises the steps of:
the processor performs the knowledge image adding method, and performs the knowledge image removing method when the adding method ends;
if performing the knowledge image removing method can increase the coding benefit obtained by the representative image set, the processor repeats the previous step; otherwise, the processor ends the alternation.
The invention has the advantages that the optimal knowledge image set can be obtained, so that the video coding based on the knowledge base obtains higher coding efficiency than the prior art.
Drawings
To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of image dependency of a video sequence segmented into random access segments using a first prior art technique;
FIG. 2 is a schematic diagram of image dependency of a video sequence segmented into random access segments using a second prior art technique;
FIG. 3 is a schematic diagram of image dependency of a video sequence segmented into random access segments using a third prior art technique;
FIG. 4 is a schematic diagram of image dependency of a video sequence segmented into random access segments using a fourth prior art technique;
FIG. 5 is a flow chart of one implementation of the video encoding method disclosed by the invention;
FIG. 6 is a flow chart of another implementation of the video encoding method disclosed by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
Definitions
Before describing the embodiments, the necessary terms are defined:
video to be encoded: also known as a video sequence to be encoded, includes a set of video images for presentation. The video to be coded is a received video sequence to be coded, and the video sequence is an image sequence of a set of complete videos or image sequences of a plurality of sets of complete videos or image sequences obtained by splicing a plurality of sets of videos.
Video segment to be encoded: a segment of the video to be coded containing at least two images to be coded. The video segment to be encoded is a subset of the video to be encoded and is encoded using the knowledge images in the knowledge image set as reference images.
Representative image set: a set of images to be coded obtained from the video segment to be coded; the set contains at least two images to be coded, and these images represent the key content of the video segment to be coded. An image to be encoded in the representative image set is also called a representative image.
Candidate image set: contains all candidate images that may be selected as knowledge images; the set contains at least two candidate images. The candidate images may come from the video segment to be encoded or from other videos with similar content, and may be video images originally intended for display, synthesized images, or stitched images obtained by stitching.
Knowledge image set: a set of knowledge images. A knowledge image is an external image that can provide additional reference information for the image to be coded, where "external" means the knowledge image comes from outside the set of displayable images of the random access segment to which the current image to be coded belongs and of the random access segment nearest to it. A knowledge picture is a reference picture used to provide reference for a picture to be encoded or decoded. Knowledge images are selected from the candidate image set.
RL (reference to library) image: an image that is encoded or decoded with reference only to knowledge images. In one implementation, the RL picture immediately follows the sequence header data, and random access is implemented by first decoding the external knowledge picture and then decoding the RL picture; in another implementation, the RL picture follows the sequence header data and the knowledge picture data, and random access is implemented by decoding the knowledge picture first and then the RL picture; in another implementation, the RL picture follows the sequence header data together with data such as supplemental enhancement information, extension information, or user information, and random access is again implemented by decoding the knowledge picture first and then the RL picture.
Crowdsourcing: a method of distributing a target task to multiple participants in order to accomplish it. Crowdsourcing maximizes the benefit of the target task, which accounts for both the contribution toward completing the task and the cost the participants expend in completing it.
Description of the embodiments:
According to fig. 4, the knowledge image provides reference information for encoding the video segment to be encoded; how much reference information the knowledge image can provide is therefore the key determinant of the coding efficiency of the video segment to be encoded. When the knowledge image provides abundant effective reference information, inter prediction of the image to be coded using the knowledge image as reference yields smaller residuals, and hence lower bit rate and less distortion. When the knowledge image provides very little effective reference information, the image to be coded cannot obtain effective reference from it and can only obtain reference information from ordinary short-term or long-term reference images via inter prediction, or via intra prediction alone, so the achievable bit rate and distortion of the image to be coded are limited. The embodiment of the present invention therefore provides a method of selecting knowledge images for video encoding using crowdsourcing, so that the knowledge images contain as much effective reference information as possible and the coding efficiency of the video segment to be encoded is improved.
The core idea of the crowdsourcing method is to keep adding knowledge images to, or removing them from, the knowledge image set so that the coding benefit the representative image set of the video segment to be coded obtains from the knowledge image set keeps increasing; when no add or remove operation can increase the coding benefit, the resulting optimal knowledge image set is used for encoding the video segment to be coded. The crowdsourcing video encoding method is described below with reference to fig. 5.
501: apply the knowledge image adding method: detect the candidate images in the candidate image set one by one, and if using a candidate image as a knowledge image increases the coding benefit the representative image set obtains from the knowledge image set, add the candidate image to the knowledge image set to update it.
The candidate images in the candidate image set may be images to be encoded in the video segment to be encoded, or synthesized images. The candidate image set can be obtained in various ways: for example, by selecting several images from the video segment to be encoded; by selecting several images from one or more other videos whose content is similar to the video segment to be encoded; or by selecting several images from image sets containing backgrounds or foregrounds similar to the content of the video to be coded (in one implementation, when the video segment to be coded is a surveillance video segment, several images are selected from a vehicle, road, or pedestrian image set as the candidate image set); or, from the images of the video segment to be encoded or of similar videos, by obtaining a background image through background synthesis, a foreground image through image segmentation, or a stitched image by stitching different images together, and using these as the candidate image set. The candidate image set may also be acquired by any of the methods described below for acquiring the representative image set.
The representative image set contains the key content of the video segment to be encoded, where the key content is the background and foreground that appear frequently in the video segment, or images to be encoded of high importance (e.g., random access point images, or base layer images in a layered coding structure). By using the coding efficiency achieved when the representative image set is coded with the knowledge image set as a reference to estimate the coding efficiency achieved when the video segment to be encoded is coded with the knowledge image set as a reference, a knowledge image set that improves the coding efficiency of the video segment can be selected. The most straightforward approach is to place all images to be encoded in the video segment into the representative image set. To represent the key content of the video segment adequately while reducing the consumption of computing resources, in one implementation the representative image set contains only part of the images to be encoded in the video segment. For example, a plurality of images to be encoded are sampled from the video segment at a fixed time interval (e.g., 0.5, 1, or 2 seconds) or a fixed image count (e.g., 1, 2, or 10 images); or the first (or second, or third) images to be encoded after each scene switch are taken as the representative image set, according to the scene switch positions in the video segment; or, according to the motion of the content in the video segment, a plurality of images to be encoded with stable content and no motion blur are taken; or, according to the distribution of backgrounds and foregrounds in the video segment, a plurality of images to be encoded containing different backgrounds or foregrounds are taken; or a plurality of intra-frame coded images are taken according to the intra-frame image period (Intra Period) of the video segment; or a plurality of random access point images are taken according to the random access points in the video segment; or a plurality of images to be encoded in the video segment are down-sampled to obtain a reduced-resolution representative image set; or a combination of the above methods is used to acquire a plurality of images to be encoded as the representative image set.
There are various methods for detecting scene switches in the video segment to be encoded: for example, computing the mean pixel-value difference over groups of pixels between two adjacent images, and declaring a scene switch between the two images when the mean difference exceeds a threshold; or extracting image features (e.g., Scale-Invariant Feature Transform (SIFT) descriptors) from two images in the video segment, and declaring a scene switch between them when the feature matching degree falls below a threshold.
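The pixel-difference variant of this test can be sketched in a few lines; the flat-list frame representation and the threshold value of 30 are illustrative assumptions, not values taken from this disclosure.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel-value difference between two equally sized frames."""
    assert len(frame_a) == len(frame_b)
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def is_scene_cut(frame_a, frame_b, threshold=30.0):
    """Declare a scene switch when the mean pixel difference exceeds the threshold."""
    return mean_abs_diff(frame_a, frame_b) > threshold
```

A small change between adjacent frames stays below the threshold, while a large brightness or content jump exceeds it and is flagged as a cut.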
The coding benefit obtained by the representative image set should reflect the coding efficiency achieved when the representative image set is coded with the knowledge image set as a reference, and thereby the coding efficiency achieved when the video segment to be encoded is coded with the knowledge image set as a reference. Coding efficiency is generally expressed as a reduction in coding cost, where the coding cost comprises the distortion and the code rate of the code stream obtained after the video segment is coded, and the reduction is the difference between the coding cost when the video segment is coded with the knowledge image set as a reference and the coding cost when it is coded without knowledge images. When the reduction is greater than zero, the knowledge image set provides effective reference information for the video segment and improves coding efficiency; otherwise, the knowledge images cannot provide effective reference information and reduce coding efficiency. When the code stream of the knowledge images is stored or transmitted to the decoding end together with the code stream of the video to be encoded, the coding cost also includes the code rate of the code stream obtained after the knowledge images are coded (distortion is not considered because the knowledge images are not used for display; in another implementation, when the knowledge images can be displayed, the coding cost also includes the distortion of the knowledge image code stream). When the knowledge images can be stored at the encoding and decoding ends by prefetching or advance synchronization, the coding cost does not include the code rate of the knowledge image code stream. There are many ways to calculate the coding benefit.
In one implementation, the coding benefit equals the reduction in coding cost; however, obtaining the actual distortion and code rate by real encoding consumes a large amount of computing resources, so in another implementation the coding benefit is an estimate of the reduction in coding cost, calculated by subtracting the total coding cost of the knowledge images from the total coding contribution they provide to the representative image set. One method of calculating the total coding contribution selects, for each representative image, the knowledge image with the highest coding contribution (i.e., multiple knowledge images cannot simultaneously provide coding contributions to one representative image) and then sums the coding contribution each representative image obtains from the knowledge image it references. In another implementation, a representative image can use multiple knowledge images as reference images at the same time; in that case, the total coding contribution is calculated by first obtaining, for each representative image, the set of coding contributions from all knowledge images it references, and then summing the contributions of each set.
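The single-reference variant of this estimate (each representative image counts only the best coding contribution any one knowledge image offers it, and the total cost of the knowledge images is subtracted) can be sketched as follows; the dictionary layout and names are illustrative assumptions.

```python
def coding_benefit(knowledge_set, costs, contrib):
    """Estimate the coding benefit of a knowledge image set.

    costs[x]      -> coding cost of knowledge image x
    contrib[x][r] -> coding contribution of knowledge image x
                     to representative image r
    Each representative image counts only its best (maximum)
    contribution, matching the single-reference variant above.
    """
    if not knowledge_set:
        return 0.0
    reps = next(iter(contrib.values())).keys()
    total_contrib = sum(
        max(contrib[x][r] for x in knowledge_set) for r in reps
    )
    total_cost = sum(costs[x] for x in knowledge_set)
    return total_contrib - total_cost
```

With two knowledge images whose contributions overlap, the benefit of the pair is smaller than the sum of their individual benefits, which is exactly what drives the add/remove decisions later.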
To calculate the coding benefit from the coding contributions and coding costs, the coding cost of each knowledge image in the knowledge image set and its coding contribution list must be calculated, where the coding contribution list comprises the coding contribution that the knowledge image provides to the coding of each representative image in the representative image set.
There are various methods for calculating the coding cost: for example, calculating the self-information entropy of the knowledge image from its pixel values; or extracting image features from the knowledge image and calculating the cost from the significance and number of those features; or intra-frame coding the knowledge image and calculating the cost from the code rate of the resulting code stream (since a complete coding operation is performed, this cost yields the most accurate coding benefit); or estimating the code rate of the knowledge image (e.g., from the sum of the variances of the pixel values of its image blocks, or from the gradient values of the pixels of its image blocks) and calculating the cost from the estimated code rate; or coding the knowledge image with simplified intra-frame prediction (e.g., using a reduced set of intra-frame prediction modes, or only the DC and Planar modes) and calculating the cost from the code rate of the resulting code stream; or obtaining a predicted value of the knowledge image with simplified intra-frame prediction and calculating the cost from the residual between the original pixel values and the predicted values.
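As one illustration of the rate-estimation variant above, the following sketch approximates the coding cost as a weight lambda times the sum of per-block pixel-value variances; the block size, weight, and flat-list image layout are assumptions made for illustration, not parameters from this disclosure.

```python
def block_variance_cost(image, block=4, lam=1.0):
    """Rough coding-cost estimate: lambda times the sum of per-block
    pixel-value variances, standing in for the bit rate of intra coding.
    `image` is a flat list of pixel values split into fixed-size blocks."""
    cost = 0.0
    for i in range(0, len(image), block):
        blk = image[i:i + block]
        mean = sum(blk) / len(blk)
        cost += sum((v - mean) ** 2 for v in blk) / len(blk)
    return lam * cost
```

A flat image costs nothing under this proxy, while textured content costs more, mirroring how intra coding spends bits.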
There are various methods for calculating the coding contribution degree, for example, calculating the mutual information entropy of the knowledge image and the representative image according to the pixel values of the two as the coding contribution degree; for another example, the image features of the knowledge image and the representative image are extracted, and the encoding contribution degree is calculated based on the matching degree between the image features. In one implementation, the contribution calculation method performs inter-frame coding on the representative image, calculates a difference between a coding cost when the representative image uses the knowledge image as the reference image and a coding cost when the representative image does not use the knowledge image as the reference image, and uses the difference as a coding contribution, that is, a coding cost reduction amount that the knowledge image can bring to coding of the representative image. The inter-frame coding method for obtaining the coding cost may use a complete inter-frame coding method in an existing coding scheme (such as HEVC, AVS, etc.) or simplified inter-frame coding (for example, reducing a motion search range, or using a fixed coding block size such as 8x8 or 16x16, etc.), where the coding cost includes a weighted sum of a coding rate and distortion of coding; motion search may also be used to obtain a reference block of a fixed block-sized coding block in the knowledge image, at which point the coding cost is the pixel residual between the coding block and the reference block in the knowledge image.
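The mutual-information variant of the contribution measure can be sketched with a joint histogram of quantized pixel values; the bin count and flat-list image representation are illustrative assumptions.

```python
from math import log2
from collections import Counter


def mutual_information(img_a, img_b, bins=8, levels=256):
    """Mutual information (bits per pixel) between two equally sized
    images, estimated from a joint histogram of quantized pixel values;
    used here as a proxy for the coding contribution of one image
    (a knowledge image) to another (a representative image)."""
    q = levels // bins
    pairs = [(a // q, b // q) for a, b in zip(img_a, img_b)]
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in joint.items()
    )
```

Identical images share all their information, while an unrelated constant image shares none, so the measure orders candidates the way a reference-usefulness score should.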
502: a knowledge image removing method is used, i.e., knowledge images are detected one by one in the knowledge image set; if removing a knowledge image allows the knowledge image set to increase the coding benefit obtained by the representative image set, the knowledge image is removed from the knowledge image set to update the knowledge image set, and the knowledge image is put into the candidate image set to update the candidate image set.
Removing a knowledge image can increase the coding benefit obtained by the representative image set because the existing knowledge images in the set may be redundant: when a knowledge image is removed, its coding cost is eliminated, and the coding contribution it provided can, to some extent, be replaced by other existing knowledge images, so the coding benefit increases.
The culled knowledge image is placed into the candidate image set because it may still be added back to the knowledge image set in a later operation: after multiple updates, the knowledge image set may again need a previously culled knowledge image to provide a coding contribution.
503: the knowledge image set is updated by alternately using the two methods until neither method can increase the coding benefit that the representative image set obtains from the knowledge image set.
Alternating between the knowledge image adding and removing methods prevents the selection of the knowledge image set from falling into a local optimum. Therefore, when the adding method can no longer increase the coding benefit obtained by the representative image set, switch to the removing method; when the removing method can no longer increase the coding benefit, switch back to the adding method; when neither method can increase the coding benefit, the optimal knowledge image set is considered to have been obtained.
504: and coding the image to be coded in the video segment to be coded by using the knowledge image set as a reference.
During encoding, the knowledge image may be the only reference image used by the image to be encoded; in this case, the image to be encoded does not depend on other already-encoded video images in the video segment, depends only on at least one knowledge image in the knowledge image set, and can support the random access function when the knowledge image set already exists (the image to be encoded is then an RL image). In one implementation, the knowledge image is a candidate reference image of the image to be encoded: the reference images available during encoding include not only the knowledge image but also short-term and/or long-term reference images already in the decoded picture buffer, and the image to be encoded can obtain reference information from any candidate reference image through an encoding decision (e.g., rate-distortion optimization or mode selection).
There are many methods for selecting a knowledge image from a knowledge image set as a reference image of an image to be encoded, for example, calculating content similarity (such as inter-frame pixel difference, feature matching value, motion search prediction value, etc.) between the image to be encoded and the knowledge image and selecting one or more knowledge images most similar to the image to be encoded; for example, according to the scene classification of the segment to which the image to be coded belongs, one or more knowledge images similar to the scene are selected; for another example, according to the coding contribution degree of the knowledge image to the image to be coded, selecting one or more knowledge images with the largest coding contribution degree; for another example, when the knowledge image is a video image extracted from a video to be encoded, one or more knowledge images with the smallest difference, that is, the closest time, are selected according to the difference between the corresponding time position of the knowledge image in the video to be encoded and the time position of the image to be encoded.
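The "closest time" selection rule mentioned above can be sketched as follows, assuming each knowledge image carries the time position of its source image in the video; the (id, time) pairing is an illustrative assumption.

```python
def select_references(frame_time, knowledge_times, num=1):
    """Pick the `num` knowledge images whose source time positions are
    closest to the time position of the image to be encoded.
    `knowledge_times` is a list of (knowledge_id, time) pairs."""
    ranked = sorted(knowledge_times, key=lambda kt: abs(kt[1] - frame_time))
    return [k for k, _ in ranked[:num]]
```

The same skeleton works for the other criteria (content similarity, scene class, contribution) by swapping the sort key.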
Referring to fig. 6, which shows another flowchart of the video encoding method provided by the present invention: the method shown in fig. 6 is obtained by modifying the method of fig. 5 and, unlike fig. 5, explicitly specifies the alternating use of the knowledge image adding and removing methods.
601: (adding a knowledge image) traverse the candidate images in the candidate image set, find the candidate image that yields the greatest coding benefit for the representative image set, add it to the knowledge image set to update the knowledge image set, and remove it from the candidate image set to update the candidate image set.
602: detecting candidate images in the candidate image set one by one (adding a knowledge image), and performing one of the following operations:
1) if the candidate image is taken as a knowledge image such that the coding gain obtained by the representative image set is greater than the first threshold, then the candidate image is added to the knowledge image set to update the knowledge image set, the candidate image is culled from the candidate image set to update the candidate image set, the first count is set to zero, and step 602 is repeated.
2) If the candidate image is taken as the knowledge image such that the coding gain obtained by the representative image set is not greater than the first threshold, the candidate image is left in the candidate image set, the first count is increased by one, and one of the following operations is performed:
A) if the value of the first count is less than the number of candidate images in the candidate image set, repeating step 602;
B) if the value of the first count is not less than the number of candidate images in the candidate image set, step 602 is ended and step 603 is performed.
The first threshold may be a fixed value or an iteratively updated value, and should generally be greater than zero so that the coding benefit increases. In one implementation, the first threshold is the coding benefit currently obtained by the representative image set from the knowledge image set multiplied by a given growth ratio, which may be greater than 1; for example, the growth ratio may be 1 + a/K, where a is a growth factor greater than zero (e.g., 0.3, 0.4, 1, or 10) and K is the number of candidate images in the candidate image set. In another implementation, step 602 does not decide the next operation from the relationship between the coding benefit and the first threshold, but from the variation of the coding benefit across iterations: for example, when the variation of the coding benefit over multiple iterations is less than a third threshold (e.g., a value greater than zero), the coding benefit is considered to have converged, and the iteration loop of the crowdsourcing method is terminated early to avoid excessive computational complexity.
The first count records the number of candidate images in the candidate image set that cannot increase the coding benefit obtained by the representative image set. A first count smaller than the number of candidate images indicates that candidate images remain to be detected that might still increase the coding benefit; a first count not smaller than the number of candidate images indicates that all candidate images have been detected and none can increase the coding benefit, at which point the knowledge image adding method should end and the knowledge image removing method should be used instead.
603: detecting the knowledge images in the knowledge image set one by one (removing the knowledge images), and executing one of the following steps:
1) if the knowledge-image is removed from the knowledge-image set such that the coding gain obtained by the representative-image set is greater than the second threshold, then the knowledge-image is removed from the knowledge-image set to update the knowledge-image set, the knowledge-image is added to the candidate-image set to update the candidate-image set, and step 602 is performed again.
2) If removing the knowledge image from the knowledge image set does not make the coding gain obtained by the representative image set greater than the second threshold, performing one of the following steps:
A) if the knowledge image is not detected, executing step 603 again;
B) if all knowledge images have been detected, step 603 is ended and step 604 is performed.
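Steps 602 and 603 amount to a greedy loop that alternates an adding phase and a removing phase until neither raises the benefit. A minimal sketch, in which the benefit callback, the additive thresholds (standing in for the ratio-based first and second thresholds), and the set representation are all illustrative assumptions:

```python
def crowdsource(candidates, gain, thr_add=0.0, thr_remove=0.0):
    """Greedy skeleton of the add/remove alternation.

    `gain(knowledge_set)` is any coding-benefit function, e.g. total
    contributions minus total costs. Elements must be sortable so the
    scan order is deterministic."""
    knowledge, pool = set(), set(candidates)
    improved = True
    while improved:
        improved = False
        # adding phase (step 602): move a candidate in if it raises the gain
        for c in sorted(pool):
            if gain(knowledge | {c}) > gain(knowledge) + thr_add:
                knowledge.add(c)
                pool.discard(c)
                improved = True
        # removing phase (step 603): drop a knowledge image if that raises the gain
        for k in sorted(knowledge):
            if gain(knowledge - {k}) > gain(knowledge) + thr_remove:
                knowledge.discard(k)
                pool.add(k)
                improved = True
    return knowledge
```

With a saturating gain (extra images add cost but no new contribution), the loop first over-adds and then prunes the redundant image, illustrating why the removing phase guards against local optima of pure greedy addition.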
604: and selecting an optimal set from the knowledge image set and the candidate image set to update the knowledge image set, wherein the optimal set is one of the knowledge image set and the candidate image set, which enables the representative image set to obtain higher coding benefit.
An optimal set is selected from the knowledge image set and the candidate image set because the candidate image set may yield a higher coding benefit for the representative image set: the coding benefit weighs both the coding contribution and the coding cost, so when the coding cost is too large, the coding benefit is small, and the candidate image set may achieve a higher coding benefit with a smaller coding cost.
605: and using at least one knowledge image in the knowledge image set as a reference image of at least one image to be coded in a video segment to be coded, and coding the image to be coded.
Another embodiment of the video encoding method provided by the present invention is described below, obtained by modifying the method shown in fig. 6. It differs from the method of fig. 6 in that the random access point images in the video segment to be encoded are used as both the representative image set and the candidate image set. The original random access point images in the video segment have no available reference image and can only be intra-frame coded, whereas a knowledge image can provide them with an available reference image and allow them to be inter-frame coded, significantly improving coding efficiency; non-random-access-point images, by contrast, can be inter-frame coded using short-term and/or long-term reference images and need not obtain reference information only from knowledge images. The coding benefit that knowledge images bring to the video to be encoded therefore comes mainly from the random access point images, and when the acquired knowledge image set provides the optimal coding benefit for the random access point images, it can, to a large extent, provide a high coding benefit for the video to be encoded.
The specific difference is that step 701 is added before step 601:
701: and acquiring a random access point image from the video clip to be coded, and simultaneously using the random access point image as a representative image set and a candidate image set.
An optimized implementation may be used to calculate the degree of coding contribution when the representative image set and the candidate image set use the same images. In this implementation, when calculating the coding contribution list for a knowledge image, if the selected representative image and the knowledge image contain the same image content (e.g., both from the same random access point image), the calculation of the coding contribution of the knowledge image to the representative image may be skipped and set to the maximum value, because the representative image uses the same knowledge image as itself as the reference image, substantially no coding cost is incurred (because all image blocks may skip coding). This optimized approach may reduce computational complexity.
Yet another embodiment of the video encoding method provided by the present invention is described below, obtained by modifying the method shown in fig. 6. It differs from the method of fig. 6 in that the candidate images in the candidate image set are obtained only from the video segment to be encoded, and when the knowledge image set is selected from the candidate image set using the crowdsourcing method, a knowledge image is restricted to serving as a reference image only for subsequent images to be encoded, i.e., images to be encoded that are chronologically after the time point of the knowledge image they reference, where the time point of a knowledge image is the time point of its corresponding image in the video segment to be encoded. The benefit of this restriction is that, when encoding or decoding sequentially, the knowledge image used by an image to be encoded comes from an image already encoded or decoded before it, so the knowledge image need not be encoded or decoded again. Without this restriction, the knowledge image used by an image to be encoded might come from a not-yet-encoded or not-yet-decoded image after it, which would have to be encoded or decoded in advance and would occupy additional encoding or decoding resources. Under this restriction, the coding contribution of a knowledge image to a representative image whose time point is after that of the knowledge image is the same as in the method shown in fig. 6, while its coding contribution to a representative image whose time point is before that of the knowledge image is zero.
The specific difference is that the calculation of the coding cost and the coding contribution is moved out of the crowdsourcing method and performed before it, as step 801.
801: and calculating the encoding cost of each candidate image in the candidate image set and an encoding contribution degree list of the candidate image set to the representative image set, wherein the encoding contribution degree list comprises the encoding contribution degree provided by the candidate image to the encoding of each representative image in the representative image set. The candidate image set and the representative image set are selected from the video segment to be encoded. If the time point of the image corresponding to the candidate image in the video segment to be coded is greater than the time point of the image corresponding to the representative image in the video segment to be coded, the coding contribution degree of the candidate image to the representative image is zero.
In this embodiment, the calculation of the encoding cost and the encoding contribution degree is performed independently from and before the crowdsourcing method, which has the advantage of avoiding the repeated calculation of the encoding cost and the encoding contribution degree of the knowledge image in the knowledge image adding method and the culling method of the crowdsourcing method, thereby reducing the computational complexity. The reason why the encoding cost and the encoding contribution degree of the candidate image are calculated in step 801 is that the knowledge image set is not selected yet, and the knowledge image is selected from the candidate images, so the encoding cost and the encoding contribution degree of the candidate image are the encoding cost and the encoding contribution degree of the knowledge image.
Another specific embodiment of the video encoding method disclosed in the present invention is described as follows:
901: Obtain the set S = {s_1, s_2, …, s_N} of all N random access segments from the video segment to be encoded as the representative image set, where each random access segment contains one random access point image and all images before the next random access point image.
902: Obtain the set P = {p_1, p_2, …, p_K} of all K random access point images from the video segment to be encoded as the candidate image set.
903: For each candidate image p_k (1 ≤ k ≤ K) in the candidate image set, perform intra-frame coding rate estimation to obtain an estimated code rate R(p_k); the coding cost of p_k is then λ(p_k)·R(p_k). Use each candidate image p_k in turn as a reference image for each random access segment s_n (1 ≤ n ≤ N) (i.e., all images to be coded in the random access segment use the same candidate image as a reference image): since s_n uses a given reference structure (e.g., a hierarchical reference structure or a low-delay reference structure), construct the reference image set of each image to be coded from its existing short-term and/or long-term reference images together with the candidate image; divide each image in s_n into image blocks of a fixed block size; for each image block, obtain a reference block from a reference image by motion search (the reference block may come from the candidate image, a short-term reference image, or a long-term reference image); and obtain the coding cost of each image in s_n from the difference between the original pixel values of the image block and the reference block, giving the coding cost C(s_n | p_k) of s_n when the candidate image can be used. In addition, obtain in the same manner, without using the candidate image as a reference image, the coding cost C(s_n) of s_n when the candidate image is not used. Finally, the coding contribution of candidate image p_k to random access segment s_n is the difference ΔC(s_n | p_k) between C(s_n) and C(s_n | p_k); when this difference is less than zero, the candidate image cannot provide effective reference information for the random access segment, and the coding contribution is set to zero. Therefore, when C(s_n) > C(s_n | p_k), the coding contribution is ΔC(s_n | p_k) = C(s_n) − C(s_n | p_k); when C(s_n) ≤ C(s_n | p_k), the coding contribution is ΔC(s_n | p_k) = 0.
904: Initialize the knowledge image set D as the empty set ∅. Traverse the candidate images p_k in the candidate image set P and compute the coding benefit F(p_k) obtained by the set S of random access segments when the candidate image is used as a reference image, where the coding benefit is the sum of the coding contributions of the candidate image to the representative image set minus its coding cost, i.e.,

F(p_k) = Σ_{n=1..N} ΔC(s_n | p_k) − λ(p_k)·R(p_k).

If F(p_k) is the maximum over all p_k (1 ≤ k ≤ K), put p_k into the knowledge image set D to update it, i.e., D = {p_k}.
905: Select the candidate images p_i (1 ≤ i ≤ K) in the candidate image set P one by one to obtain the set D ∪ {p_i}, and compute the coding benefit obtained when D ∪ {p_i} is used as the reference image set:

F(D ∪ {p_i}) = Σ_{n=1..N} ΔC(s_n | D ∪ {p_i}) − Σ_{x ∈ D ∪ {p_i}} λ(x)·R(x),

where ΔC(s_n | D ∪ {p_i}) is the maximum coding contribution that any image in the set D ∪ {p_i} can provide to s_n, i.e., ΔC(s_n | D ∪ {p_i}) = max_{x ∈ D ∪ {p_i}} ΔC(s_n | x). In addition, compute the coding benefit obtained when the knowledge image set D is used as the reference image set:

F(D) = Σ_{n=1..N} ΔC(s_n | D) − Σ_{x ∈ D} λ(x)·R(x).

If F(D ∪ {p_i}) > thr_1·F(D), where the threshold ratio thr_1 = 1 + const/K² and const is a given fixed value greater than zero (e.g., 0.2, 0.3, 1.0, 1.1, etc.), put the candidate image p_i into the knowledge image set to update it, i.e., D = D ∪ {p_i}, and repeat step 905 until F(D ∪ {p_i}) ≤ thr_1·F(D) for all p_i (1 ≤ i ≤ K), i.e., no set D ∪ {p_i} can make the coding benefit obtained by the representative image set exceed the threshold thr_1·F(D).
906: selecting knowledge image sets one by one
Figure BDA00024223505300001710
Knowledge image x ofj(j is more than or equal to 1 and less than or equal to M), wherein M is a knowledge image set
Figure BDA00024223505300001711
Number of intermediate knowledge images, xjRemoving the set from the knowledge image set
Figure BDA00024223505300001712
Computing
Figure BDA00024223505300001725
Use sets
Figure BDA00024223505300001713
Coding gain obtained when used as a reference image
Figure BDA00024223505300001714
In addition, calculate
Figure BDA00024223505300001715
Using knowledge image sets
Figure BDA00024223505300001716
Coding gain obtained when used as a reference image
Figure BDA00024223505300001717
If it is not
Figure BDA00024223505300001718
Wherein the threshold ratio thr2Can be mixed with thr1The same or different, use the set
Figure BDA00024223505300001719
Updating knowledge image sets, i.e.
Figure BDA00024223505300001720
And step 905 is performed again. If not, then,
Figure BDA00024223505300001721
proceed to step 907.
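Step 906 can be sketched in the same assumed Python model (contrib/cost layout and the gain function are illustrative assumptions, not the patent's formulas):

```python
def gain(S, contrib, cost):
    # F(S): best contribution any image in S provides to each representative
    # image, minus the total coding cost of S.
    if not S:
        return 0.0
    n_rep = len(contrib[0])
    return (sum(max(contrib[k][n] for k in S) for n in range(n_rep))
            - sum(cost[k] for k in S))

def remove_step(L, contrib, cost, thr2=1.0):
    # Try removing each knowledge image; if the reduced set beats thr2 times
    # the current gain, keep the reduced set and signal a return to step 905.
    for x in sorted(L):
        reduced = L - {x}
        if reduced and gain(reduced, contrib, cost) > thr2 * gain(L, contrib, cost):
            return reduced, True    # updated; perform step 905 again
    return L, False                 # no removal helps; proceed to step 907
```

Here an expensive, largely redundant knowledge image (high cost, dominated contributions) is removed on the first pass.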
907: if it is not
Figure BDA00024223505300001722
Then will be
Figure BDA00024223505300001723
As a final knowledge image set; if not, then,
Figure BDA00024223505300001724
is the final knowledge-image set.
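Step 907, read through claim 4, reduces to a single comparison. The sketch below uses the same assumed gain function and inputs as above (hypothetical names):

```python
def gain(S, contrib, cost):
    # F(S): best contribution any image in S provides to each representative
    # image, minus the total coding cost of S.
    if not S:
        return 0.0
    n_rep = len(contrib[0])
    return (sum(max(contrib[k][n] for k in S) for n in range(n_rep))
            - sum(cost[k] for k in S))

def final_knowledge_set(L, C, contrib, cost):
    # Keep whichever of the knowledge image set L and the candidate image set C
    # gives the representative images the higher coding gain.
    return L if gain(L, contrib, cost) >= gain(C, contrib, cost) else C
```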
908: and using at least one knowledge image in the knowledge image set as a reference image of at least one image to be coded in the video to be coded, and coding the image to be coded.
The complexity of obtaining the knowledge image set by the crowdsourcing method is O((1/const)·K²·N·logN), comprising O((1/const)·K²·logN) iterations with O(N) coding gain calculations in each iteration. The clustering method used in the existing knowledge-base-based coding method has an operation complexity of O(N³·(N+1)·I/2) (where I is the number of iterations per clustering), comprising O(N²·(N+1)·I/2) iterations with O(N) coding gain calculations in each iteration. Therefore, the crowdsourcing-based knowledge image set acquisition method can obtain a better knowledge image set with lower complexity.
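To make the iteration-count comparison concrete, the following script plugs illustrative values into the two expressions above (K, N, I and const are example numbers, not from the patent):

```python
import math

# Example sizes: K candidates, N representative images, I clustering iterations,
# const as in the threshold thr_1 (all illustrative).
K, N, I, const = 50, 200, 10, 1.0

# O((1/const)·K²·logN) iterations for the crowdsourcing search ...
crowdsourcing_iters = (1 / const) * K**2 * math.log2(N)
# ... versus O(N²·(N+1)·I/2) iterations for the clustering method
clustering_iters = 0.5 * N**2 * (N + 1) * I
```

At these sizes the crowdsourcing search needs on the order of tens of thousands of iterations, while the clustering method needs tens of millions.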

Claims (10)

1. A method of crowd-sourced video encoding, comprising:
a knowledge image adding method: detecting candidate images in a candidate image set one by one, and if using a candidate image as a knowledge image increases the coding benefit that a representative image set of a video segment to be encoded obtains from a knowledge image set, adding the candidate image into the knowledge image set to update the knowledge image set;
a knowledge image removing method: detecting knowledge images in the knowledge image set one by one, and if the knowledge image set with a knowledge image removed increases the coding benefit obtained by the representative image set, removing the knowledge image from the knowledge image set to update the knowledge image set, and putting the knowledge image into the candidate image set to update the candidate image set;
alternately updating the knowledge image set by the two methods until neither method can increase the coding benefit that the representative image set obtains from the knowledge image set; and
encoding an image to be encoded in the video segment to be encoded using the knowledge image set as a reference.
2. The method according to claim 1, wherein the knowledge image adding method comprises:
a selecting step: selecting a candidate image from the candidate image set, and obtaining the coding benefit that the representative image set obtains from the knowledge image set when the candidate image is used as a knowledge image; and performing one of the following operations:
a) an adding step:
if the coding benefit is greater than a first threshold, adding the candidate image into the knowledge image set to update the knowledge image set, and removing the candidate image from the candidate image set to update the candidate image set; setting a first count to zero, and repeating the selecting step;
b) a judging step:
if the coding benefit is not greater than the first threshold, leaving the candidate image in the candidate image set, incrementing the first count by one, and performing one of the following:
if the value of the first count is less than the number of candidate images in the candidate image set, repeating the selecting step;
if the value of the first count is not less than the number of candidate images in the candidate image set, ending the knowledge image adding method.
3. The method according to claim 1 or 2, wherein the alternating comprises:
performing the knowledge image adding method, and performing the knowledge image removing method when the knowledge image adding method ends;
if the knowledge image removing method can increase the coding benefit obtained by the representative image set, repeating the previous step; otherwise, ending the alternating.
4. The method according to claim 3, further comprising, after the alternating ends:
selecting an optimal set from the knowledge image set and the candidate image set to update the knowledge image set, wherein the optimal set is whichever of the knowledge image set and the candidate image set enables the representative image set to obtain the higher coding benefit.
5. The method of claim 1, further comprising:
the candidate images in the candidate image set are selected from, or synthesized from, the video segment to be encoded or other videos whose content is similar to that of the video segment to be encoded.
6. The method according to claim 1, wherein the coding benefit of the representative image set is the difference between the sum of the coding contributions provided by the knowledge image set to the representative image set and the sum of the coding costs of the knowledge image set.
7. The method according to claim 6, wherein calculating the coding benefit obtained by the representative image set comprises: calculating, for each knowledge image in the knowledge image set, its coding cost and the set of coding contributions it provides to the encoding of all images to be encoded in the representative image set; the total coding cost is the sum of the coding costs of all knowledge images in the knowledge image set; and the total coding contribution is the sum of the coding contributions provided by all knowledge images in the knowledge image set to the encoding of all images to be encoded in the representative image set.
8. A crowd-sourced video encoding device, comprising:
a processor;
a memory for storing a set of candidate images, a set of representative images and a video segment to be encoded; and
one or more programs for performing the following method:
a knowledge image adding method: the processor detects candidate images in the candidate image set one by one, and if using a candidate image as a knowledge image increases the coding benefit that the representative image set of the video segment to be encoded obtains from a knowledge image set, the processor adds the candidate image into the knowledge image set to update the knowledge image set;
a knowledge image removing method: the processor detects knowledge images in the knowledge image set one by one, and if the knowledge image set with a knowledge image removed increases the coding benefit obtained by the representative image set, the processor removes the knowledge image from the knowledge image set to update the knowledge image set and puts the knowledge image into the candidate image set to update the candidate image set;
the processor alternately updates the knowledge image set by the two methods until neither method can increase the coding benefit that the representative image set obtains from the knowledge image set; and
the processor encodes an image to be encoded in the video segment to be encoded using the knowledge image set as a reference.
9. The apparatus according to claim 8, wherein the knowledge image adding method performed by the processor comprises:
a selecting step: the processor selects a candidate image from the candidate image set and obtains the coding benefit that the representative image set obtains from the knowledge image set when the candidate image is used as a knowledge image; the processor then performs one of the following operations:
a) an adding step:
if the processor determines that the coding benefit is greater than a first threshold, the processor adds the candidate image into the knowledge image set to update the knowledge image set, and removes the candidate image from the candidate image set to update the candidate image set; the processor sets a first count to zero and repeats the selecting step;
b) a judging step:
if the processor determines that the coding benefit is not greater than the first threshold, the processor leaves the candidate image in the candidate image set, increments the first count by one, and performs one of the following:
if the processor determines that the value of the first count is less than the number of candidate images in the candidate image set, the processor repeats the selecting step;
if the processor determines that the value of the first count is not less than the number of candidate images in the candidate image set, the processor ends the knowledge image adding method.
10. The apparatus according to claim 8 or 9, wherein the alternating comprises:
the processor performs the knowledge image adding method, and performs the knowledge image removing method when the knowledge image adding method ends;
if performing the knowledge image removing method increases the coding benefit obtained by the representative image set, the processor repeats the previous step; otherwise, the processor ends the alternating.
CN202010209585.3A 2020-03-23 2020-03-23 Crowdsourcing video coding method and device Active CN113438483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010209585.3A CN113438483B (en) 2020-03-23 2020-03-23 Crowdsourcing video coding method and device


Publications (2)

Publication Number Publication Date
CN113438483A true CN113438483A (en) 2021-09-24
CN113438483B CN113438483B (en) 2022-05-10

Family

ID=77752681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010209585.3A Active CN113438483B (en) 2020-03-23 2020-03-23 Crowdsourcing video coding method and device

Country Status (1)

Country Link
CN (1) CN113438483B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367613A (en) * 2000-12-01 2002-09-04 艾迪斯株式会社 Image compression device and its method
JP2013138399A (en) * 2011-11-28 2013-07-11 Sharp Corp Display device and television receiver
CN104243994A (en) * 2014-09-26 2014-12-24 厦门亿联网络技术股份有限公司 Method for real-time motion sensing of image enhancement
JP2016066871A (en) * 2014-09-24 2016-04-28 富士ゼロックス株式会社 Image processing system and image processing program
CN108243339A (en) * 2016-12-27 2018-07-03 浙江大学 Image coding/decoding method and device
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatically the method for multi-source heterogeneous data knowledge is obtained
CN110876083A (en) * 2018-08-29 2020-03-10 浙江大学 Method and device for specifying reference image and method and device for processing reference image request


Also Published As

Publication number Publication date
CN113438483B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
WO2016065873A1 (en) Image prediction method and related device
US10158850B2 (en) Depth map encoding and decoding
US6501794B1 (en) System and related methods for analyzing compressed media content
CN107046645B (en) Image coding and decoding method and device
EP1389016A2 (en) Motion estimation and block matching pattern using minimum measure of combined motion and error signal data
EP2382786A1 (en) Multiple-candidate motion estimation with advanced spatial filtering of differential motion vectors
US20150172687A1 (en) Multiple-candidate motion estimation with advanced spatial filtering of differential motion vectors
EP1419650A2 (en) method and apparatus for motion estimation between video frames
CA2740467C (en) Scalable video encoding method and scalable video encoding apparatus
US20170094306A1 (en) Method of acquiring neighboring disparity vectors for multi-texture and multi-depth video
Gorur et al. Skip decision and reference frame selection for low-complexity H. 264/AVC surveillance video coding
CN100401779C (en) Method for hierarchical motion estimation
Fu et al. Efficient depth intra frame coding in 3D-HEVC by corner points
CN108781292B (en) Method for encoding digital images and associated decoding method, device, user terminal and computer program
CN108833928B (en) Traffic monitoring video coding method
EP2842325A1 (en) Macroblock partitioning and motion estimation using object analysis for video compression
CN113438483B (en) Crowdsourcing video coding method and device
Bachu et al. Adaptive order search and tangent-weighted trade-off for motion estimation in H. 264
CN113365077B (en) Inter-frame prediction method, encoder, decoder, computer-readable storage medium
US10075691B2 (en) Multiview video coding method using non-referenced view video group
Rajakaruna et al. Application-aware video coding architecture using camera and object motion-models
CN110691247B (en) Decoding and encoding method and device
JP2008072608A (en) Apparatus and method for encoding image
Liu et al. Multi-Layer Features Fusion Model-Guided Low-Complexity 3D-HEVC Intra Coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant