CN116347082A - Method and device for encoding multimedia resources, electronic equipment and storage medium

Info

Publication number: CN116347082A
Application number: CN202310377079.9A
Authority: CN
Inventors: 简云瑞; 黄跃; 闻兴
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2023-04-10
Filing date: 2023-04-10
Publication date: 2023-06-27

Abstract

The embodiment of the disclosure provides a method and a device for encoding multimedia resources, electronic equipment and a storage medium. The method comprises the following steps: acquiring a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with a motion vector difference, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector; arranging and combining each candidate motion vector, each offset direction, each list index and each offset step in an offset step list indicated by each list index to obtain a plurality of coding parameter combinations; coding the current coding unit by using each coding parameter combination to obtain respective rate distortion cost data of each coding parameter combination; and determining the coding parameter combination of which the rate distortion cost data meets the preset condition as a target coding parameter combination of the current coding unit so as to obtain a target coding result. The method can save computer resources and improve the coding effect.

Description

Method and device for encoding multimedia resources, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular, to a method and a device for encoding multimedia resources, electronic equipment and a computer readable storage medium.

Background

In video coding, MMVD (Merge with Motion Vector Difference, i.e. the merging mode with motion vector difference) is an inter prediction technique in the VVC (Versatile Video Coding) standard. MMVD uses the first two pieces of candidate motion information in the Merge list to construct MMVD candidates, offsets the candidate motion vector in the horizontal or vertical direction, and selects the optimal candidate motion information index, offset direction and offset step through rate distortion optimization.

In the related art, the MMVD mode specifies 8 selectable offset steps: {1/4, 1/2, 1, 2, 4, 8, 16, 32}. The larger the offset step selected for a coding block, the more bits are required to code the corresponding syntax element, and the more computer resources are occupied; meanwhile, because the set of selectable offset steps is limited, the coding effect is poor.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The embodiment of the disclosure provides a multimedia resource encoding method, a multimedia resource encoding device, electronic equipment and a computer readable storage medium.

The embodiment of the disclosure provides a method for encoding multimedia resources, which comprises the following steps: acquiring a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with motion vector difference, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information comprises offset direction adjustment information and offset step length adjustment information, the offset direction adjustment information comprises a plurality of offset directions, the offset step length adjustment information comprises a plurality of list indexes and offset step length lists indicated by the list indexes, and each offset step length list comprises a plurality of offset step lengths; arranging and combining each candidate motion vector, each offset direction, each list index and each offset step in an offset step list indicated by each list index to obtain a plurality of coding parameter combinations; coding the current coding unit by using each coding parameter combination to obtain respective rate distortion cost data of each coding parameter combination; and determining the coding parameter combination of which the rate distortion cost data meets the preset condition as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In some exemplary embodiments of the present disclosure, the method further comprises: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists and offset step indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination of the current coding unit into a code stream, and transmitting the code stream to a decoding end for decoding.

In some exemplary embodiments of the present disclosure, the list of offset steps includes a first list of offset steps and a second list of offset steps; each offset step in the first offset step list is smaller than each offset step in the second offset step list; step intervals between adjacent offset step sizes in the first offset step size list are smaller than step intervals between adjacent offset step sizes in the second offset step size list.

In some exemplary embodiments of the present disclosure, the number of coding parameter combinations is a product of the number of candidate motion vectors, the number of offset directions, the number of list indices, and the number of offset steps included per offset step list.

In some exemplary embodiments of the present disclosure, the candidate motion information includes a plurality of motion vector precisions; the step of arranging and combining each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index to obtain a plurality of coding parameter combinations includes: arranging and combining each candidate motion vector, each offset direction, each list index, each offset step in the offset step list indicated by each list index, and each motion vector precision to obtain the plurality of coding parameter combinations.

In some exemplary embodiments of the present disclosure, the method further comprises: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists, offset step indexes and motion vector precision indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination of the current coding unit into a code stream to be transmitted to a decoding end for decoding.

In some exemplary embodiments of the present disclosure, the number of coding parameter combinations is the product of the number of candidate motion vectors, the number of offset directions, the number of list indexes, the number of offset steps included in each offset step list, and the number of motion vector precisions.

The embodiment of the disclosure provides a device for encoding multimedia resources, comprising: an acquisition module configured to acquire a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with motion vector difference, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information comprises offset direction adjustment information and offset step length adjustment information, the offset direction adjustment information comprises a plurality of offset directions, the offset step length adjustment information comprises a plurality of list indexes and offset step length lists indicated by the list indexes, and each offset step length list comprises a plurality of offset step lengths; a determining module configured to arrange and combine each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index, to obtain a plurality of coding parameter combinations; and an obtaining module configured to code the current coding unit by using each coding parameter combination, to obtain respective rate distortion cost data of each coding parameter combination; wherein the determining module is further configured to determine the coding parameter combination of which the rate distortion cost data meets the preset condition as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In some exemplary embodiments of the present disclosure, the determining module is configured to perform: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists and offset step indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination of the current coding unit into a code stream, and transmitting the code stream to a decoding end for decoding.

In some exemplary embodiments of the present disclosure, the candidate motion information includes a plurality of motion vector precisions; the obtaining module is configured to perform: arranging and combining each candidate motion vector, each offset direction, each list index, each offset step in the offset step list indicated by each list index, and each motion vector precision to obtain the plurality of coding parameter combinations.

In some exemplary embodiments of the present disclosure, the determining module is configured to perform: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists, offset step indexes and motion vector precision indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination of the current coding unit into a code stream to be transmitted to a decoding end for decoding.

An embodiment of the present disclosure provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute executable instructions to implement a method of encoding a multimedia asset as in any of the above.

The disclosed embodiments provide a computer readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of encoding a multimedia asset as in any one of the above.

The disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method of encoding a multimedia asset of any of the above.

According to the method for encoding the multimedia resource provided by the embodiment of the disclosure, a plurality of list indexes are set and each offset step list includes a plurality of offset steps, so that when the list indexes are different, the same offset step index can represent different offset steps; the same computer resources can therefore represent more offset step information, which saves computer resources. Meanwhile, the number of selectable offset steps is increased, so that when the current coding unit is coded using each coding parameter combination, more coding parameter combinations are available, the obtained target coding parameter combination and coding result are more accurate, and the coding effect is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

Fig. 1 shows a schematic diagram of an exemplary system architecture to which the encoding method of multimedia resources of the embodiments of the present disclosure may be applied.

Fig. 2 is a flowchart illustrating a method of encoding a multimedia asset according to an exemplary embodiment.

Fig. 3 is a schematic diagram of a forward reference frame and a backward reference frame, according to an example embodiment.

Fig. 4 is a flowchart illustrating another encoding method of a multimedia asset according to an exemplary embodiment.

Fig. 5 is a block diagram illustrating an encoding apparatus of a multimedia asset according to an exemplary embodiment.

Fig. 6 is a schematic diagram illustrating a structure of an electronic device suitable for use in implementing an exemplary embodiment of the present disclosure, according to an exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.

The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

The drawings are merely schematic illustrations of the present disclosure, in which like reference numerals denote like or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in at least one hardware module or integrated circuit or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and not necessarily all of the elements or steps are included or performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

In the present specification, the terms "a," "an," "the," "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc., in addition to the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and do not limit the number of their objects.

As shown in fig. 1, the system architecture may include a server 101, a network 102, a terminal device 103, a terminal device 104, and a terminal device 105. Network 102 is the medium used to provide communication links between terminal device 103, terminal device 104, or terminal device 105 and server 101. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.

The server 101 may be a server providing various services, such as a background management server providing support for devices operated by a user with the terminal device 103, the terminal device 104, or the terminal device 105. The background management server may analyze and process the received data such as the request, and feed back the processing result to the terminal device 103, the terminal device 104, or the terminal device 105; the server 101 may be, for example, an encoder or a decoder.

The terminal device 103, the terminal device 104, and the terminal device 105 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a wearable smart device, a virtual reality device, an augmented reality device, and the like.

In the embodiment of the present disclosure, the server 101 may obtain a multimedia resource to be encoded, where the multimedia resource to be encoded includes a plurality of video frames, and divide each video frame into a plurality of coding units; the coding unit that is currently being encoded is referred to as the current coding unit.

In the embodiment of the disclosure, when a current coding unit is coded, a plurality of coding modes are used for coding the current coding unit, so that rate distortion cost values corresponding to the coding modes are obtained; and selecting a target coding mode of the current coding unit and a corresponding coding result according to the rate distortion cost value corresponding to each coding mode.

The encoding method provided by the embodiment of the disclosure is a process of selecting encoding parameters when a current encoding unit uses an MMVD mode for encoding.

The Merge mode is an MV (motion vector) prediction technique in the coding standard, which predicts the MV of the current coding block (which may also be referred to as the current coding unit) from the MVs of blocks adjacent in the time domain or the space domain. The Merge mode establishes a motion information candidate list for the current coding unit; the list comprises 5 pieces of candidate motion information, each of which includes an MV, a prediction direction, a reference frame index, a motion vector precision and the like. The 5 pieces of candidate motion information are traversed, and the one with the minimum rate distortion cost is selected as the motion information of the current coding unit. No motion information (e.g., reference frame index, motion vector precision index, prediction direction, etc.) needs to be transmitted in the code stream; only the index of the optimal candidate motion information in the list is transmitted.

The MMVD mode is similar to the ordinary Merge mode, and MMVD constructs its MV candidate list in the same way as the Merge mode. However, the candidate list constructed in the MMVD mode contains only 2 candidates, so the MMVD mode only needs to traverse 2 pieces of candidate motion information and select the optimal one. The MMVD mode uses the motion information of a list candidate to perform motion compensation on the current coding block, but the MV needs to be adjusted to a certain extent. Therefore, when transmitting motion information in the MMVD mode, in addition to the list index of the optimal candidate motion information as in the ordinary Merge mode, information for MV adjustment, including an offset step and an offset direction, also needs to be transmitted.
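The MV adjustment described above can be sketched as follows. This is a minimal illustration for the unidirectional case only, not the encoder's actual implementation: the function name `apply_mmvd_offset`, the direction ordering (positive x, negative x, positive y, negative y) and the use of luma-sample units are assumptions made for the example.

```python
from fractions import Fraction

# Assumed ordering of the four axis directions: +x, -x, +y, -y.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def apply_mmvd_offset(base_mv, direction_idx, step):
    """Shift a base (candidate) MV by `step` along one axis direction."""
    dx, dy = DIRECTIONS[direction_idx]
    return (base_mv[0] + dx * step, base_mv[1] + dy * step)


# Hypothetical candidate MV, shifted by 1/2 in the negative x direction.
base_mv = (Fraction(3), Fraction(-2))
print(apply_mmvd_offset(base_mv, 1, Fraction(1, 2)))
```

Only the candidate index, the direction index and the step index need to be signalled; the decoder can rebuild the adjusted MV from them.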

Specifically, the server 101 may: obtaining candidate motion information corresponding to an MMVD mode, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information comprises offset direction adjustment information and offset step length adjustment information, the offset direction adjustment information comprises a plurality of offset directions, the offset step length adjustment information comprises a plurality of list indexes and offset step length lists indicated by the list indexes, and each offset step length list comprises a plurality of offset step lengths; arranging and combining each candidate motion vector, each offset direction, each list index and each offset step in an offset step list indicated by each list index to obtain a plurality of coding parameter combinations; coding the current coding unit by using each coding parameter combination to obtain respective rate distortion cost data of each coding parameter combination; and determining the coding parameter combination of which the rate distortion cost data meets the preset condition as a target coding parameter combination of the current coding unit, so as to obtain a target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In the MMVD mode, after the target coding parameter combination of the current coding unit is determined, the rate distortion cost value corresponding to the target coding parameter combination in the MMVD mode is compared with the rate distortion cost values in the other coding modes, and the coding mode with the smallest rate distortion cost value is selected as the target coding mode of the current coding unit.
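The mode comparison described above amounts to taking the minimum over per-mode costs. The sketch below assumes the cost of each mode has already been computed; the mode names and cost values are hypothetical and serve only to illustrate the selection.

```python
def select_mode(costs):
    """Return (mode, cost) for the mode with the smallest rate distortion cost."""
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# Hypothetical rate distortion cost values for one coding unit.
costs = {"MMVD": 41.5, "Merge": 44.0, "Intra": 57.25}
print(select_mode(costs))
```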

After the current coding unit is coded, the next coding unit can be coded by adopting the same method until the current video frame is coded.

It should be understood that the numbers of the terminal device 103, the terminal device 104, the terminal device 105, the network 102 and the server 101 in fig. 1 are only illustrative, and the server 101 may be a server of one entity, may be a server cluster formed by a plurality of servers, may be a cloud server, and may have any number of terminal devices, networks and servers according to actual needs.

Hereinafter, each step of the encoding method of multimedia resources in the exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings and embodiments. The method provided by the embodiments of the present disclosure may be performed by any electronic device, for example, the server and/or the terminal device in fig. 1 described above, but the present disclosure is not limited thereto.

As shown in fig. 2, the method provided by the embodiments of the present disclosure may include the following steps.

In step S210, the current coding unit corresponding to the multimedia resource to be coded and the candidate motion information corresponding to the MMVD mode are obtained, where the candidate motion information includes a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information includes offset direction adjustment information and offset step adjustment information, the offset direction adjustment information includes a plurality of offset directions, the offset step adjustment information includes a plurality of list indexes and offset step lists indicated by the list indexes, and each offset step list includes a plurality of offset steps.

In the embodiment of the disclosure, the MMVD mode includes a unidirectional MMVD mode and a bidirectional MMVD mode. When the prediction direction in the candidate motion information is unidirectional, that is, the current coding unit is unidirectionally predicted and refers only to a forward reference frame (L0 reference) or a backward reference frame (L1 reference), the mode is unidirectional MMVD. When the prediction direction in the candidate motion information is bidirectional, that is, the current coding unit is bidirectionally predicted and needs to refer to the forward reference frame and the backward reference frame simultaneously, the mode is bidirectional MMVD. The forward reference frame (L0 reference) and the backward reference frame (L1 reference) are shown in fig. 3.

In the following illustration, the determination of the coding parameter combinations in the unidirectional MMVD mode is described.

In an embodiment of the disclosure, in the unidirectional MMVD mode, the candidate motion information may include a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information may include offset direction adjustment information and offset step adjustment information, the offset direction adjustment information may include a plurality of offset directions, the offset step adjustment information may include a plurality of list indexes and offset step lists indicated by the respective list indexes, and each offset step list may include a plurality of offset steps. For example, there may be 2 candidate motion vectors, 4 offset directions (e.g., x-axis positive direction, x-axis negative direction, y-axis positive direction, and y-axis negative direction), 2 list indexes (e.g., List0 and List1), and offset step lists indicated by the respective list indexes (e.g., the 0th offset step list indicated by List0 and the 1st offset step list indicated by List1), where each offset step list may include 8 offset steps.

In an exemplary embodiment, the offset step lists include a first offset step list List0 and a second offset step list List1; each offset step in List0 is smaller than each offset step in List1; and the step interval between adjacent offset steps in List0 is smaller than the step interval between adjacent offset steps in List1.

For example, the offset steps included in the first offset step List0 and the second offset step List1 are as follows:

List0 = {1/4, 2/4, 3/4, 1, 5/4, 6/4, 8/4, 10/4},

List1 = {4, 6, 8, 12, 16, 24, 32, 64}.

As can be seen from the values in List0 and List1, the offset steps in List0 are small, so List0 is suitable for coding units with small motion amplitude; the offset steps in List1 and the intervals between adjacent offset steps are large, so List1 is suitable for coding units with large motion amplitude. Moreover, the last (i.e. largest) offset step in List0 is smaller than the first (i.e. smallest) offset step in List1, and the step interval between adjacent offset steps in List1 is larger than that in List0.

In the embodiment of the disclosure, by setting that each offset step in the first offset step list is smaller than each offset step in the second offset step list, and step intervals between adjacent offset steps in the first offset step list are smaller than step intervals between adjacent offset steps in the second offset step list, a proper offset step list can be selected for coding units with different motion amplitudes, thereby improving coding efficiency.
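The two list constraints can be checked mechanically against the example values of List0 and List1. The sketch below is illustrative only; it adopts a strict reading in which every gap between adjacent steps in the first list must be smaller than every gap in the second list, which is an assumption about how the interval condition is to be interpreted. The helper names `gaps` and `satisfies_constraints` are hypothetical.

```python
from fractions import Fraction as F

LIST0 = [F(1, 4), F(2, 4), F(3, 4), F(1), F(5, 4), F(6, 4), F(8, 4), F(10, 4)]
LIST1 = [F(4), F(6), F(8), F(12), F(16), F(24), F(32), F(64)]


def gaps(steps):
    """Differences between adjacent offset steps."""
    return [b - a for a, b in zip(steps, steps[1:])]


def satisfies_constraints(small, large):
    """Check both conditions: smaller steps and smaller adjacent gaps."""
    every_step_smaller = max(small) < min(large)
    every_gap_smaller = max(gaps(small)) < min(gaps(large))
    return every_step_smaller and every_gap_smaller


print(satisfies_constraints(LIST0, LIST1))
```

For the example values, the largest gap in List0 is 1/2 and the smallest gap in List1 is 2, so both conditions hold under this reading.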

In the embodiments of the present disclosure, different syntax elements may be used to represent the candidate motion vectors, the offset directions, the list indices, and the offset steps in the respective offset step lists.

For example, the candidate motion vector is represented using a first syntax element (mmvd_cand_flag), as specifically shown in table 1:

Table 1. Meaning of the first syntax element mmvd_cand_flag

Value	Meaning
0	Select the 0th candidate motion vector
1	Select the 1st candidate motion vector

For example, the offset direction is represented using a second syntax element (mmvd_direction_idx), as specifically shown in table 2:

Table 2. Meaning of the second syntax element mmvd_direction_idx

Value	Meaning
00	Select the positive x-axis direction
01	Select the negative x-axis direction
10	Select the positive y-axis direction
11	Select the negative y-axis direction

For example, the list index is represented using a third syntax element (mmvd_list_idx), as specifically shown in table 3:

Table 3. Meaning of the third syntax element mmvd_list_idx

Value	Meaning
0	Select List0
1	Select List1

For example, the offset step is represented by a fourth syntax element (mmvd_distance_flag), and the offset step is encoded by a truncated unary code, as shown in table 4:

Table 4. Meaning of the fourth syntax element mmvd_distance_flag

Value	Meaning
0	Select the 0th step
10	Select the 1st step
110	Select the 2nd step
1110	Select the 3rd step
11110	Select the 4th step
111110	Select the 5th step
1111110	Select the 6th step
1111111	Select the 7th step

It can be seen that the first syntax element mmvd_cand_flag requires only 1 bit, the second syntax element mmvd_direction_idx requires 2 bits, the third syntax element mmvd_list_idx requires only 1 bit, and the fourth syntax element mmvd_distance_flag requires 1 to 7 bits, depending on the step selected.
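The truncated unary code of Table 4 and the resulting bit counts can be reproduced with a short sketch. The function names are hypothetical, and the count covers only the raw binary symbols listed in Tables 1 to 4, ignoring any further entropy coding that an encoder may apply.

```python
def truncated_unary(index, max_index=7):
    """Truncated unary code: `index` ones, then a zero unless index is the maximum."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    ones = "1" * index
    return ones if index == max_index else ones + "0"


def mmvd_symbol_count(distance_idx):
    """Symbols for candidate (1) + direction (2) + list (1) + step (1 to 7)."""
    return 1 + 2 + 1 + len(truncated_unary(distance_idx))


for idx in range(8):
    print(idx, truncated_unary(idx), mmvd_symbol_count(idx))
```

Under these assumptions a full MMVD parameter set costs between 5 and 11 binary symbols, with the small step indexes being the cheapest.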

Referring to Tables 3 and 4 together, when the third syntax element is 0, that is, List0 is selected, a fourth syntax element of 0 means that the 0th step in List0, namely 1/4, is selected; when the third syntax element is 1, that is, List1 is selected, a fourth syntax element of 0 means that the 0th step in List1, namely 4, is selected. It follows that, when the list indexes are different, the same offset step index can represent different offset steps, so that more offset step information can be represented using the same computer resources, and the number of selectable offset steps is increased, thereby improving coding performance.

In the embodiment of the disclosure, by setting a plurality of list indexes and a plurality of offset steps included in an offset step list indicated by each list index, when a current coding unit is coded, the list index is selected first, and then a proper offset step is selected from the corresponding offset step list according to the list index, so that the number of selectable offset steps is increased, and the coding performance can be improved; meanwhile, only the list index and the offset step index are required to be transmitted in the subsequent data transmission, and transmission resources can be saved when a larger offset step is represented.
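How the list index and the offset step index together identify a step at the decoding end can be sketched as a write and parse round trip. This is an illustration under stated assumptions: the order in which the four syntax elements are written (candidate, direction, list, step), the string representation of the code stream, and the function names are all hypothetical.

```python
from fractions import Fraction as F

OFFSET_STEP_LISTS = (
    (F(1, 4), F(2, 4), F(3, 4), F(1), F(5, 4), F(6, 4), F(8, 4), F(10, 4)),  # List0
    (F(4), F(6), F(8), F(12), F(16), F(24), F(32), F(64)),  # List1
)
MAX_DISTANCE_IDX = 7


def write_mmvd(cand, direction, list_idx, distance):
    """Serialize the four indexes; the step index uses truncated unary."""
    tu = "1" * distance + ("" if distance == MAX_DISTANCE_IDX else "0")
    return f"{cand:01b}{direction:02b}{list_idx:01b}{tu}"


def parse_mmvd(bits):
    """Recover (cand, direction, list_idx, distance) from the written symbols."""
    cand = int(bits[0])
    direction = int(bits[1:3], 2)
    list_idx = int(bits[3])
    distance, pos = 0, 4
    while distance < MAX_DISTANCE_IDX and bits[pos] == "1":
        distance += 1
        pos += 1
    return cand, direction, list_idx, distance


def decoded_step(list_idx, distance):
    """Look up the offset step selected by the two indexes."""
    return OFFSET_STEP_LISTS[list_idx][distance]


bits = write_mmvd(1, 2, 1, 3)
print(bits, parse_mmvd(bits), decoded_step(1, 3))
```

Because step index 0 costs the same number of symbols in either list, a step of 4 from List1 is as cheap to signal as a step of 1/4 from List0.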

In step S220, each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index are arranged and combined to obtain a plurality of coding parameter combinations.

In the embodiment of the disclosure, the candidate motion vectors, the offset directions, the list indexes, and the offset steps in the offset step list indicated by each list index may be traversed and arranged and combined, so that each coding parameter combination consists of one candidate motion vector, one offset direction, one list index, and one offset step from the offset step list indicated by that list index.

In an exemplary embodiment, the number of coding parameter combinations is the product of the number of candidate motion vectors, the number of offset directions, the number of list indices, and the number of offset steps included in each offset step list.

For example, in the case where there are 2 candidate motion vectors, 4 offset directions, 2 list indices, and each offset step list includes 8 offset steps, there are 2×4×2×8 = 128 coding parameter combinations in total.

In the embodiment of the disclosure, the product of the number of candidate motion vectors, the number of offset directions, the number of list indexes, and the number of offset steps included in each offset step list is the number of coding parameter combinations, so that the determined coding parameter combinations are more comprehensive, and the target coding parameter combinations determined according to the respective coding parameter combinations can be ensured to be more accurate.
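The enumeration described above can be sketched as a Cartesian product over the four index ranges. This is a minimal illustration that works on indexes only; the constant and function names are hypothetical, and the counts are the ones used in the example above.

```python
from itertools import product

NUM_CANDIDATES = 2   # candidate motion vectors
NUM_DIRECTIONS = 4   # offset directions
NUM_LISTS = 2        # list indexes (offset step lists)
STEPS_PER_LIST = 8   # offset steps in each offset step list


def enumerate_combinations():
    """Return every (candidate, direction, list, step) index tuple."""
    return list(product(range(NUM_CANDIDATES),
                        range(NUM_DIRECTIONS),
                        range(NUM_LISTS),
                        range(STEPS_PER_LIST)))


combinations = enumerate_combinations()
print(len(combinations))  # 128
```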

In step S230, the current coding unit is coded using each coding parameter combination, and respective rate-distortion cost data of each coding parameter combination is obtained.

In the embodiment of the disclosure, the rate-distortion cost data may be a rate-distortion cost value; and encoding the current encoding unit by using the candidate motion vector, the offset direction and the offset step length in each encoding parameter combination to obtain the respective rate distortion cost value of each encoding parameter combination.

Specifically, with the position pointed by the candidate motion vector in the reference frame as a starting point, different new motion vectors are respectively formed with different offset directions and different offset step sizes. As shown in fig. 3, the origin 301 of the dashed line is the starting point of the candidate motion vector in the reference frame (the forward reference frame or the backward reference frame), and the candidate motion vector is shifted in the horizontal direction or the vertical direction, so as to obtain a new motion vector (the solid line circle in the figure).

For example, assuming that the candidate motion vector MV is {4,6} and the offset step is 2, the offsets MVoffset in the 4 directions are respectively: x-axis positive direction {2,0}, x-axis negative direction {-2,0}, y-axis positive direction {0,2}, and y-axis negative direction {0,-2}. The adjusted new motion vector is MVfinal = MV + MVoffset; taking the x-axis positive direction as an example, the new motion vector is MVfinal = {4,6} + {2,0} = {6,6}.
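The arithmetic of this example can be reproduced with a short Python sketch. The direction labels and the function name are hypothetical and chosen only for illustration; Fraction is used so that fractional offset steps such as 1/4 stay exact.

```python
from fractions import Fraction

# Unit vectors of the four offset directions (labels are illustrative only)
DIRECTIONS = {
    "x+": (1, 0),
    "x-": (-1, 0),
    "y+": (0, 1),
    "y-": (0, -1),
}


def apply_offset(mv, direction, step):
    """Return MVfinal = MV + MVoffset for one offset direction and one offset step."""
    dx, dy = DIRECTIONS[direction]
    return (mv[0] + dx * step, mv[1] + dy * step)


mv = (4, 6)
for name in DIRECTIONS:
    print(name, apply_offset(mv, name, 2))

# A fractional offset step works the same way
print(apply_offset(mv, "x+", Fraction(1, 4)))
```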

Thus, 128 new motion vectors can be obtained from 2 candidate motion vectors, 4 offset directions, 2 list indices, 8 offset steps included in each offset step list; and respectively encoding the current coding unit by using the 128 new motion vectors to obtain respective rate distortion cost values of each coding parameter combination.

In step S240, the coding parameter combination whose rate-distortion cost data satisfies the preset condition is determined as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In the embodiment of the disclosure, the coding parameter combination with the minimum rate distortion cost value can be used as the target coding parameter combination of the current coding unit.
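Selecting the combination with the minimum rate distortion cost value can be sketched as follows. This is only an illustration: the rate distortion cost is passed in as a callable, and toy_rd_cost is a made-up placeholder, since the actual cost comes from encoding the current coding unit with each combination. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Combo = Tuple[int, int, int, int]  # (candidate, direction, list, step) indexes


def all_combinations() -> Iterable[Combo]:
    """2 candidates x 4 directions x 2 lists x 8 steps = 128 combinations."""
    return product(range(2), range(4), range(2), range(8))


def select_target_combination(rd_cost: Callable[[Combo], float]) -> Tuple[Combo, float]:
    """Evaluate every combination once and return the one with the minimum cost."""
    costs = {combo: rd_cost(combo) for combo in all_combinations()}
    best = min(costs, key=costs.get)
    return best, costs[best]


def toy_rd_cost(combo: Combo) -> float:
    """Placeholder cost for illustration only; not a real rate distortion measure."""
    cand, direction, list_idx, step = combo
    return abs(step - 3) + 0.1 * (cand + direction + list_idx)


best, cost = select_target_combination(toy_rd_cost)
print(best, cost)
```

With the placeholder cost, the search returns the index tuple (0, 0, 0, 3); with a real encoder the callable would wrap the actual encoding of the coding unit.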

In an exemplary embodiment, the method may further include: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists and offset step indexes corresponding to target coding parameter combinations of a current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination of the current coding unit into the code stream, and transmitting the code stream to a decoding end for decoding.

In the embodiment of the disclosure, after the target coding parameter combination is determined, the following are determined according to tables 1 to 4: the candidate motion vector index corresponding to the candidate motion vector in the target coding parameter combination (for example, the candidate motion vector index corresponding to the 0th candidate motion vector is 0), the offset direction index corresponding to the offset direction (for example, the offset direction index corresponding to the x-axis positive direction is 00), and the list index and the offset step index corresponding to the offset step (for example, for the offset step 1/4, the list index is 0 and the offset step index is 0). The candidate motion vector index, the offset direction index, the list index and the offset step index are then written into the code stream for transmission to a decoding end.

In the embodiment of the disclosure, after determining the target coding parameter combination, determining the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination, only the indexes are required to be written into the code stream, and only the indexes are required to be transmitted during data transmission, so that transmission resources can be saved, and the transmission efficiency can be improved.
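Writing the four indexes into the code stream and recovering them at the decoding end can be sketched as below. The sketch assumes the bin lengths stated for the four syntax elements (1 bit, 2 bits, 1 bit, and the 1 to 7 bin pattern of table 4). The order of the fields, the use of a plain bit string, and the function names are assumptions made for illustration; entropy coding of the bins is not modelled.

```python
def write_mmvd_indices(cand_idx: int, dir_idx: int, list_idx: int, step_idx: int) -> str:
    """Serialize the four indexes as a bit string (field order is an assumption)."""
    if cand_idx not in (0, 1) or list_idx not in (0, 1):
        raise ValueError("candidate and list indexes are 1-bit values")
    if not 0 <= dir_idx <= 3 or not 0 <= step_idx <= 7:
        raise ValueError("direction index is 0..3, offset step index is 0..7")
    bits = format(cand_idx, "01b") + format(dir_idx, "02b") + format(list_idx, "01b")
    # Offset step index: ones followed by a zero, no zero for the last index
    bits += "1" * step_idx + ("0" if step_idx < 7 else "")
    return bits


def parse_mmvd_indices(bits: str):
    """Recover (cand_idx, dir_idx, list_idx, step_idx) and the number of bits consumed."""
    cand_idx = int(bits[0], 2)
    dir_idx = int(bits[1:3], 2)
    list_idx = int(bits[3], 2)
    step_idx, pos = 0, 4
    while step_idx < 7 and bits[pos] == "1":
        step_idx += 1
        pos += 1
    if step_idx < 7:
        pos += 1  # consume the terminating zero
    return (cand_idx, dir_idx, list_idx, step_idx), pos


stream = write_mmvd_indices(0, 0, 0, 0)
print(stream)
print(parse_mmvd_indices(stream))
```

For the example in the text (0th candidate motion vector, x-axis positive direction, list index 0, offset step index 0), only five bits are written under these assumptions.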

As shown in fig. 4, the method provided by the embodiment of the present disclosure may include the following steps.

In step S410, a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with motion vector difference are obtained. The candidate motion information includes a plurality of candidate motion vectors, adjustment information of each candidate motion vector, and a plurality of motion vector precisions; the adjustment information includes offset direction adjustment information and offset step adjustment information; the offset direction adjustment information includes a plurality of offset directions; the offset step adjustment information includes a plurality of list indexes and offset step lists indicated by the list indexes; and each offset step list includes a plurality of offset steps.

In the embodiment of the present disclosure, in addition to obtaining a plurality of candidate motion vectors, a plurality of offset directions, a plurality of list indexes, and offset step lists indicated by the list indexes, a plurality of motion vector accuracies of a current coding unit may be obtained.

In MMVD mode, a coding unit may use one of 2 motion vector precisions: half-pixel precision and quarter-pixel precision.

In the related art, in the MMVD mode, the motion vector precision of each coding unit is inherited from the candidate motion vector. If the motion vector precision of the candidate motion vector is quarter-pixel precision, the motion vector precision of the current coding unit is also quarter-pixel precision; otherwise, the motion vector precision of the current coding unit is set to half-pixel precision.

In the embodiment of the disclosure, the optimal motion vector precision is determined for a coding unit of unidirectional MMVD by rate distortion optimization. Specifically, while the current unidirectional MMVD traverses all combinations of candidate motion vector, offset direction, offset step list and offset step, the rate distortion cost value of each combination is calculated with the motion vector at quarter-pixel precision and at half-pixel precision respectively, and the optimal motion vector precision of the current coding block is selected.

In the disclosed embodiments, different syntax elements may be used to represent candidate motion vectors, offset directions, list indices, offset steps in the respective offset step lists, and motion vector precision.

For example, the candidate motion vector, the offset direction, the list index, and the offset step in each offset step list are represented by using the first to fourth syntax elements, as shown in tables 1 to 4; the motion vector accuracy is represented using a fifth syntax element (mmvd_mv_resolution_idx), as shown in table 5 in detail:

TABLE 5 meaning of the fifth syntax element mmvd_mv_resolution_idx

Value	Meaning
0	Select half-pixel precision
1	Select quarter-pixel precision

As can be seen from table 5, the fifth syntax element mmvd_mv_resolution_idx requires only 1bit to represent its syntax meaning.
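The difference between inheriting the motion vector precision and selecting it by rate distortion optimization can be sketched as follows. The index mapping follows table 5; the function names and the cost values are hypothetical and serve only to illustrate that the two rules can reach different decisions.

```python
# Table 5: value of the fifth syntax element for each motion vector precision
PRECISION_INDEX = {"half": 0, "quarter": 1}


def inherited_precision(candidate_precision: str) -> str:
    """Related-art rule: keep quarter precision if the candidate has it, otherwise use half."""
    return "quarter" if candidate_precision == "quarter" else "half"


def rdo_precision(cost_by_precision: dict):
    """Pick the precision with the lower rate distortion cost and return it with its index."""
    best = min(cost_by_precision, key=cost_by_precision.get)
    return best, PRECISION_INDEX[best]


# Hypothetical cost values for one coding parameter combination
costs = {"half": 10.5, "quarter": 12.0}

print(inherited_precision("quarter"))  # quarter
print(rdo_precision(costs))            # ('half', 0)
```

In this made-up case the inherited rule would keep quarter-pixel precision, whereas the cost comparison selects half-pixel precision and signals it with the value 0.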

In step S420, each candidate motion vector, each offset direction, each list index, each offset step in the offset step list indicated by each list index, and each motion vector precision are arranged and combined to obtain a plurality of coding parameter combinations.

In the embodiment of the disclosure, the candidate motion vectors, the offset directions, the list indexes, the offset steps in the offset step list indicated by each list index, and the motion vector precisions may be traversed and arranged and combined, so that each coding parameter combination consists of one candidate motion vector, one offset direction, one list index, one offset step from the offset step list indicated by that list index, and one motion vector precision.

In an exemplary embodiment, the number of coding parameter combinations is the product of the number of candidate motion vectors, the number of offset directions, the number of list indices, the number of offset steps each offset step list comprises, and the number of motion vector accuracies.

For example, in the case where there are 2 candidate motion vectors, 4 offset directions, 2 list indexes, 8 offset steps in each offset step list, and 2 motion vector precisions, there are 2×4×2×8×2 = 256 coding parameter combinations in total.

In the embodiment of the disclosure, the product of the number of candidate motion vectors, the number of offset directions, the number of list indexes, the number of offset steps included in each offset step list, and the number of motion vector precision is the number of coding parameter combinations, so that the determined coding parameter combinations are more comprehensive, and the target coding parameter combinations determined according to the respective coding parameter combinations can be ensured to be more accurate.

In step S430, the current coding unit is coded using each coding parameter combination, and respective rate-distortion cost data of each coding parameter combination is obtained.

In the embodiment of the disclosure, a current coding unit is coded by using candidate motion vectors, offset directions, offset step sizes and motion vector precision in each coding parameter combination, so as to obtain respective rate distortion cost values of each coding parameter combination.

Specifically, 256 coding parameter combinations can be obtained according to 2 candidate motion vectors, 4 offset directions, 2 list indexes, 8 offset steps included in each offset step list, and 2 motion vector accuracies; and respectively encoding the current coding unit by using the 256 coding parameter combinations to obtain respective rate distortion cost values of the coding parameter combinations.

In step S440, the coding parameter combination whose rate-distortion cost data satisfies the preset condition is determined as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In an exemplary embodiment, the method may further include: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists, offset step indexes and motion vector precision indexes corresponding to target coding parameter combinations of a current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination of the current coding unit into the code stream, and transmitting the code stream to a decoding end for decoding.

In the embodiment of the disclosure, after the target coding parameter combination is determined, the following are determined according to tables 1 to 5: the candidate motion vector index corresponding to the candidate motion vector in the target coding parameter combination (for example, the candidate motion vector index corresponding to the 0th candidate motion vector is 0), the offset direction index corresponding to the offset direction (for example, the offset direction index corresponding to the x-axis positive direction is 00), the list index and the offset step index corresponding to the offset step (for example, for the offset step 1/4, the list index is 0 and the offset step index is 0), and the motion vector precision index corresponding to the motion vector precision (for example, the motion vector precision index corresponding to half-pixel precision is 0). The candidate motion vector index, the offset direction index, the list index, the offset step index and the motion vector precision index are then written into the code stream for transmission to a decoding end.

In the embodiment of the disclosure, after determining the target coding parameter combination, determining the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination, only the indexes are written into the code stream, and only the indexes are required to be transmitted during data transmission, so that transmission resources can be saved, and the transmission efficiency can be improved.

In the method for encoding the multimedia resource provided by the embodiment of the disclosure, while all combinations of candidate motion vector, offset direction, offset step list and offset step are traversed, the rate distortion cost value of each combination is also calculated with the motion vector at quarter-pixel precision and at half-pixel precision respectively. Compared with directly inheriting the motion vector precision as in the related art, this makes the obtained target coding parameter combination and coding result more accurate, thereby improving the coding performance.

It should also be understood that the above is only intended to assist those skilled in the art in better understanding the embodiments of the present disclosure, and is not intended to limit the scope of the embodiments of the present disclosure. It will be apparent to those skilled in the art from the foregoing examples that various equivalent modifications or variations can be made; for example, some steps of the methods described above may be unnecessary, some steps may be newly added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations are also within the scope of the embodiments of the present disclosure.

It should also be understood that the foregoing description of the embodiments of the present disclosure focuses on highlighting differences between the various embodiments and that the same or similar elements not mentioned may be referred to each other and are not repeated here for brevity.

It should also be understood that the sequence numbers of the above processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.

It is also to be understood that in the various embodiments of the disclosure, terms and/or descriptions of the various embodiments are consistent and may be referenced to one another in the absence of a particular explanation or logic conflict, and that the features of the various embodiments may be combined to form new embodiments in accordance with their inherent logic relationships.

Examples of the encoding method of the multimedia resource provided by the present disclosure are described above in detail. It will be appreciated that the computer device, in order to carry out the functions described above, comprises corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.

Fig. 5 is a block diagram illustrating an encoding apparatus of a multimedia asset according to an exemplary embodiment. Referring to fig. 5, the apparatus 500 may include an acquisition module 510, a determining module 520, and an obtaining module 530.

The acquisition module 510 is configured to acquire a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with motion vector difference, where the candidate motion information includes a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information includes offset direction adjustment information and offset step adjustment information, the offset direction adjustment information includes a plurality of offset directions, the offset step adjustment information includes a plurality of list indexes and offset step lists indicated by the list indexes, and each offset step list includes a plurality of offset steps. The determining module 520 is configured to arrange and combine each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index to obtain a plurality of coding parameter combinations. The obtaining module 530 is configured to encode the current coding unit using each coding parameter combination to obtain respective rate-distortion cost data of each coding parameter combination. The determining module 520 is further configured to determine the coding parameter combination whose rate-distortion cost data satisfies the preset condition as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

In some exemplary embodiments of the present disclosure, the determining module 520 is further configured to perform: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists and offset step indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination of the current coding unit into a code stream, and transmitting the code stream to a decoding end for decoding.

In some exemplary embodiments of the present disclosure, the candidate motion information includes a plurality of motion vector accuracies; the obtaining module 530 is configured to perform: and arranging and combining each candidate motion vector, each offset direction, each list index, each offset step in an offset step list indicated by each list index and each motion vector precision to obtain the plurality of coding parameter combinations.

In some exemplary embodiments of the present disclosure, the determining module 520 is further configured to perform: determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists, offset step indexes and motion vector precision indexes corresponding to target coding parameter combinations of the current coding unit; and writing the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination of the current coding unit into a code stream to be transmitted to a decoding end for decoding.

It should be noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor terminals and/or microcontroller terminals.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

An electronic device 600 according to such an embodiment of the present disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.

As shown in fig. 6, the electronic device 600 is in the form of a general purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), and a display unit 640.

Wherein the storage unit stores program code that is executable by the processing unit 610 such that the processing unit 610 performs steps according to various exemplary embodiments of the present disclosure described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 610 may perform various steps as shown in fig. 2.

As another example, the electronic device may implement the various steps shown in fig. 2.

The storage unit 620 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 621 and/or cache memory 622, and may further include Read Only Memory (ROM) 623.

The storage unit 620 may also include a program/utility 624 having a set (at least one) of program modules 625, such program modules 625 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 630 may be a local bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or using any of a variety of bus architectures.

The electronic device 600 may also communicate with one or more external devices 670 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any devices (e.g., routers, modems, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. As shown, network adapter 660 communicates with other modules of electronic device 600 over bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment, a computer readable storage medium is also provided, e.g., a memory, comprising instructions executable by a processor of an apparatus to perform the above method. Optionally, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

In an exemplary embodiment, a computer program product is also provided, comprising a computer program/instruction which, when executed by a processor, implements the method of encoding a multimedia asset in the above embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for encoding a multimedia asset, comprising:

acquiring a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with motion vector difference, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information comprises offset direction adjustment information and offset step length adjustment information, the offset direction adjustment information comprises a plurality of offset directions, the offset step length adjustment information comprises a plurality of list indexes and offset step length lists indicated by the list indexes, and each offset step length list comprises a plurality of offset step lengths;

arranging and combining each candidate motion vector, each offset direction, each list index and each offset step in an offset step list indicated by each list index to obtain a plurality of coding parameter combinations;

coding the current coding unit by using each coding parameter combination to obtain respective rate distortion cost data of each coding parameter combination;

and determining the coding parameter combination of which the rate distortion cost data meets the preset condition as the target coding parameter combination of the current coding unit, so as to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.

2. The method according to claim 1, wherein the method further comprises:

determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists and offset step indexes corresponding to target coding parameter combinations of the current coding unit;

and writing the candidate motion vector index, the offset direction index, the list index of the offset step list and the offset step index corresponding to the target coding parameter combination of the current coding unit into a code stream, and transmitting the code stream to a decoding end for decoding.

3. The method of claim 1, wherein the list of offset steps comprises a first list of offset steps and a second list of offset steps;

each offset step in the first offset step list is smaller than each offset step in the second offset step list;

step intervals between adjacent offset step sizes in the first offset step size list are smaller than step intervals between adjacent offset step sizes in the second offset step size list.

4. A method according to claim 1 or 3, characterized in that the number of coding parameter combinations is the product of the number of candidate motion vectors, the number of offset directions, the number of list indices, and the number of offset steps comprised by each offset step list.

5. The method of claim 1, wherein the candidate motion information comprises a plurality of motion vector accuracies;

the step of arranging and combining each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index to obtain a plurality of coding parameter combinations includes:

and arranging and combining each candidate motion vector, each offset direction, each list index, each offset step in an offset step list indicated by each list index and each motion vector precision to obtain the plurality of coding parameter combinations.

6. The method of claim 5, wherein the method further comprises:

determining candidate motion vector indexes, offset direction indexes, list indexes of offset step lists, offset step indexes and motion vector precision indexes corresponding to target coding parameter combinations of the current coding unit;

and writing the candidate motion vector index, the offset direction index, the list index of the offset step list, the offset step index and the motion vector precision index corresponding to the target coding parameter combination of the current coding unit into a code stream to be transmitted to a decoding end for decoding.

7. The method of claim 5, wherein the number of coding parameter combinations is the product of the number of candidate motion vectors, the number of offset directions, the number of list indexes, the number of offset steps included in each offset step list, and the number of motion vector precisions.
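
As a worked instance of the count in claim 7, the formula below uses symbol names chosen here for illustration (they do not appear in the claims) and hypothetical sizes: 2 candidate motion vectors, 4 offset directions, 2 list indexes, 4 offset steps per list and 2 motion vector precisions.

```latex
\[
N_{\text{comb}}
  = N_{\text{mv}} \cdot N_{\text{dir}} \cdot N_{\text{list}} \cdot N_{\text{step}} \cdot N_{\text{prec}}
  = 2 \cdot 4 \cdot 2 \cdot 4 \cdot 2
  = 128
\]
```

Under these assumed sizes, adding the precision dimension doubles the 64 combinations obtained without it.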

8. An apparatus for encoding a multimedia resource, comprising:

an acquisition module configured to acquire a current coding unit corresponding to a multimedia resource to be coded and candidate motion information corresponding to a merging mode with a motion vector difference, wherein the candidate motion information comprises a plurality of candidate motion vectors and adjustment information of each candidate motion vector, the adjustment information comprises offset direction adjustment information and offset step adjustment information, the offset direction adjustment information comprises a plurality of offset directions, the offset step adjustment information comprises a plurality of list indexes and the offset step list indicated by each list index, and each offset step list comprises a plurality of offset steps;

a determining module configured to arrange and combine each candidate motion vector, each offset direction, each list index, and each offset step in the offset step list indicated by each list index, to obtain a plurality of coding parameter combinations;

an obtaining module configured to encode the current coding unit using each coding parameter combination, to obtain rate distortion cost data of each coding parameter combination;

the determining module is configured to determine, as the target coding parameter combination of the current coding unit, the coding parameter combination whose rate distortion cost data meets the preset condition, and to obtain the target coding result of the multimedia resource to be coded according to the coding result of the current coding unit corresponding to the target coding parameter combination.
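
The module chain of claim 8 can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: the function names, the input values, the reading of the preset condition as the minimum cost, and above all `rd_cost`, which is a toy stand-in for actually encoding the coding unit. It uses the common Lagrangian form, distortion plus lambda times rate, with a made-up rate model.

```python
from itertools import product


def combine(candidates, directions, step_lists):
    """Combination step: every (mv, direction, list index, step) tuple."""
    return [
        (mv, direction, list_idx, step)
        for mv, direction, list_idx in product(candidates, directions, step_lists)
        for step in step_lists[list_idx]
    ]


def rd_cost(combo, true_mv, lam=0.1):
    """Toy stand-in for encoding the coding unit with one combination.

    Distortion is the squared distance between the refined motion vector
    and a known 'true' motion; rate is a made-up bit count (assumption).
    """
    (mv_x, mv_y), (dir_x, dir_y), list_idx, step = combo
    refined = (mv_x + dir_x * step, mv_y + dir_y * step)
    distortion = (refined[0] - true_mv[0]) ** 2 + (refined[1] - true_mv[1]) ** 2
    rate = 6 + list_idx  # hypothetical: the second list costs one more bit
    return distortion + lam * rate


def select_target(combos, true_mv):
    """Selection step, assuming the preset condition is 'minimum RD cost'."""
    return min(combos, key=lambda combo: rd_cost(combo, true_mv))


# Hypothetical inputs standing in for what the acquisition module returns.
candidates = [(3, -1), (0, 2)]
directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
step_lists = {0: [0.25, 0.5, 1, 2], 1: [4, 8, 16, 32]}

combos = combine(candidates, directions, step_lists)
best = select_target(combos, true_mv=(8, 2))
```

Here the selected combination is the second candidate shifted by 8 in the positive horizontal direction, a displacement that only the coarse second list can reach in a single step.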

9. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the executable instructions to implement the method of encoding a multimedia resource as claimed in any one of claims 1 to 7.

10. A computer readable storage medium, wherein instructions in the computer readable storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the method of encoding a multimedia resource as claimed in any one of claims 1 to 7.