CN113160342B - Encoding method and device based on feedback, storage medium and electronic equipment

Encoding method and device based on feedback, storage medium and electronic equipment

Info

Publication number
CN113160342B
CN113160342B (application CN202110529836.0A)
Authority
CN
China
Prior art keywords
information
image
processed
feedback
receiving terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110529836.0A
Other languages
Chinese (zh)
Other versions
CN113160342A (en)
Inventor
韩庆瑞
阮良
陈功
李雪莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd filed Critical Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202110529836.0A
Publication of CN113160342A
Application granted
Publication of CN113160342B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 9/001 — Image coding; model-based coding, e.g. wire frame
    • G06T 7/11 — Image analysis; segmentation; region-based segmentation
    • G06T 7/136 — Image analysis; segmentation; edge detection involving thresholding
    • H04N 19/625 — Coding/decoding of digital video signals using transform coding, using the discrete cosine transform [DCT]
    • H04N 21/234309 — Server-side processing of video elementary streams; reformatting for end-user requests or device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/440218 — Client-side processing of video elementary streams; reformatting for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 7/141 — Television systems for two-way working between two video terminals, e.g. videophone
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Abstract

Embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a feedback-based encoding method and apparatus, a storage medium, and an electronic device. The encoding method comprises the following steps: establishing a communication connection with a receiving terminal, and receiving reference characteristic information fed back by the receiving terminal; constructing a reference feature weight model according to the reference characteristic information; and constructing an encoding model based on the reference feature weight model, so as to encode video data using the encoding model. When the transmitting end encodes video, the method can effectively take into account the reference characteristic information fed back by the receiving terminal, and use that information to decide on encoding parameters better suited to the receiving terminal, thereby maximizing the subjective quality at the receiving terminal.

Description

Encoding method and device based on feedback, storage medium and electronic equipment
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a feedback-based encoding method and apparatus, a storage medium, and an electronic device.
Background
This section is intended to provide a background or context for the embodiments of the disclosure recited in the claims; the description herein is not admitted to be prior art merely by inclusion in this section.
In the field of multimedia communications, transmitted video must be encoded, especially in real-time communication (RTC) and live-streaming scenarios. For example, to exploit certain characteristics of the human visual system (HVS), perceptual coding techniques were introduced into the traditional coding framework to obtain higher codec performance, giving rise to perceptual video coding (PVC). Among these, coding based on the just noticeable distortion (JND) model is a current research hotspot.
In some techniques, when at least two users are in communication, the sender of the video data decides its encoding based only on information available at the sender's side and sends the result to the other party, thereby reducing bandwidth by exploiting the human visual system. However, such coding schemes can deviate from the receiver's actual viewing conditions, so the video played at the receiving end deviates to some extent as well, which degrades the receiving user's experience of the video.
Disclosure of Invention
In this context, embodiments of the present disclosure desire to provide a feedback-based encoding method and apparatus, a storage medium, and an electronic device.
According to one aspect of the present disclosure, there is provided a feedback-based encoding method, including:
establishing communication connection with a receiving terminal, and receiving reference characteristic information fed back by the receiving terminal;
constructing a reference feature weight model according to the reference feature information;
and constructing a coding model based on the reference characteristic weight model so as to code video data by using the coding model.
In an exemplary embodiment of the present disclosure, the receiving the reference characteristic information fed back by the receiving terminal includes:
receiving the reference characteristic information collected by the receiving terminal over a preset period duration, and calculating a corresponding average value, so that reference characteristic information based on the average value is used during the next preset period duration.
In an exemplary embodiment of the present disclosure, the reference characteristic information includes: ambient brightness information and/or screen brightness information of the receiving terminal.
In an exemplary embodiment of the present disclosure, the reference characteristic information further includes: motion information; the motion information comprises any one or a combination of several of the speed information, acceleration information and angular velocity information corresponding to the receiving terminal;
The constructing a reference feature weight model according to the reference characteristic information comprises the following steps:
and constructing a reference characteristic weight model by combining the environment brightness information, the screen brightness information and the motion information fed back by the receiving terminal.
In an exemplary embodiment of the present disclosure, the constructing a reference feature weight model by combining the environment brightness information, the screen brightness information and the motion information fed back by the receiving terminal includes:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)

wherein a1, a2 and a3 are weighting coefficients, c is an ambient brightness value, d is a screen brightness value, and m is a motion information value.
In an exemplary embodiment of the disclosure, the constructing an encoding model based on the reference feature weight model includes:
and constructing an encoding model based on the pixel domain just noticeable distortion model by combining the reference feature weight model.
In an exemplary embodiment of the disclosure, the constructing, in combination with the reference feature weight model, an encoding model based on a pixel domain just noticeable distortion model includes:
acquiring an image to be processed, and calculating a corresponding background brightness self-adaptive threshold value and a texture masking threshold value;
constructing a just noticeable distortion model based on the reference feature weight model, the background brightness adaptive threshold and the texture masking threshold, so as to perform DCT encoding on the image to be processed using the just noticeable distortion model.
In an exemplary embodiment of the present disclosure, calculating a background brightness adaptive threshold corresponding to the image to be processed includes:
dividing the image to be processed into regions according to a preset first pane size, and calculating the average brightness value within each region, so as to determine the corresponding background brightness adaptive threshold from the average brightness value of the region.
In an exemplary embodiment of the present disclosure, determining a corresponding background luminance adaptive threshold from an average luminance value of the region includes:
JND_lum = f(Ī) (the threshold formula appears as an image in the original document), wherein Ī denotes the average brightness of the region.
In an exemplary embodiment of the present disclosure, calculating a texture masking threshold corresponding to the image to be processed includes:
dividing the image to be processed into regions according to a second preset pane size, and dividing each resulting region again according to a third pane size;
and calculating the small-region texture intensity of each third-pane-size region based on the texture intensity of each pixel point within it, and determining the texture intensity of the second-preset-pane-size region from the plurality of small-region texture intensities.
In an exemplary embodiment of the present disclosure, calculating the small-region texture intensity of a third-pane-size region based on the texture intensity of each pixel point within it, and determining the texture intensity of the corresponding second-preset-pane-size region from the plurality of small-region texture intensities, includes:
JND_tex = 0.12 * G(x, y)

G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|

wherein g_k(i, j) is the texture intensity value of a pixel point (the expression for grad_k in terms of g_k appears as an image in the original document), grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
In an exemplary embodiment of the present disclosure, the DCT-encoding the image to be processed using the just noticeable distortion model includes:
acquiring a JND value corresponding to each pixel point of the image to be processed based on the just noticeable distortion model; and
performing DCT coding on the image to be processed to determine original DCT coefficients corresponding to each pixel point;
and calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate.
In an exemplary embodiment of the present disclosure, the calculating, according to the original DCT coefficient and the JND value, a current coding rate corresponding to the image to be processed includes:
code rate = E(dct(x, y) − JND(x, y))

where dct(x, y) is the original DCT coefficient.
According to one aspect of the present disclosure, there is provided a feedback-based encoding apparatus, comprising:
the reference characteristic information receiving module is used for establishing communication connection with a receiving terminal and receiving reference characteristic information fed back by the receiving terminal;
The reference feature weight model construction module is used for constructing a reference feature weight model according to the reference feature information;
and the coding module is used for constructing a coding model based on the reference characteristic weight model so as to code the video data by using the coding model.
In an exemplary embodiment of the present disclosure, the reference feature information receiving module is further configured to receive reference feature information in a preset period duration fed back by the receiving terminal, and calculate a corresponding average value, so as to use the reference feature information based on the average value in a next preset period duration.
In an exemplary embodiment of the present disclosure, the reference characteristic information includes: ambient brightness information and/or screen brightness information of the receiving terminal.
In an exemplary embodiment of the present disclosure, the reference characteristic information further includes: motion information; the motion information comprises any one or a combination of several of the speed information, acceleration information and angular velocity information corresponding to the receiving terminal;
the reference feature weight model construction module is also used for constructing a reference feature weight model by combining the environment brightness information, the screen brightness information and the motion information fed back by the receiving terminal.
In an exemplary embodiment of the present disclosure, the reference feature weight model building module includes:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)

wherein a1, a2 and a3 are weighting coefficients, c is an ambient brightness value, d is a screen brightness value, and m is a motion information value.
In an exemplary embodiment of the present disclosure, the apparatus further comprises:
and the coding model construction module is used for constructing a coding model based on the pixel domain just noticeable distortion model by combining the reference characteristic weight model.
In an exemplary embodiment of the disclosure, the coding model building module is further configured to obtain an image to be processed, and calculate a corresponding background luminance adaptive threshold and texture masking threshold; and constructing an just-noticeable distortion model based on the reference feature weight model, the background brightness self-adaptive threshold and the texture masking threshold to perform DCT coding on an image to be processed by using the just-noticeable distortion model.
In an exemplary embodiment of the present disclosure, the coding model building module includes:
the background brightness self-adaptive threshold calculating module is used for dividing the image to be processed into areas according to the preset first pane size, and calculating average brightness values in the areas so as to determine the corresponding background brightness self-adaptive threshold according to the average brightness values of the areas.
In an exemplary embodiment of the present disclosure, the background luminance adaptive threshold calculation module includes:
JND_lum = f(Ī) (the threshold formula appears as an image in the original document)

wherein Ī denotes the average brightness of the region.
In an exemplary embodiment of the disclosure, the coding model building module includes:
the texture masking threshold calculation module is used for dividing the image to be processed into areas according to a second preset pane size, and dividing the divided areas again according to a third pane size; and
calculating the small-region texture intensity of each third-pane-size region based on the texture intensity of each pixel point within it, and determining the texture intensity of the second-preset-pane-size region from the plurality of small-region texture intensities.
In one exemplary embodiment of the present disclosure, the texture masking threshold calculation module includes:
JND_tex = 0.12 * G(x, y)

G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|

wherein g_k(i, j) is the texture intensity value of a pixel point, grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
In an exemplary embodiment of the present disclosure, the encoding module includes:
the coding execution module is used for acquiring JND values corresponding to each pixel point of the image to be processed based on the just-noticeable distortion model; performing DCT coding on the image to be processed to determine original DCT coefficients corresponding to each pixel point; and calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate.
In an exemplary embodiment of the present disclosure, the code execution module includes:
the code rate calculation module is configured to calculate a current coding code rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, and includes:
code rate = E(dct(x, y) − JND(x, y))

where dct(x, y) is the original DCT coefficient.
According to one aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the feedback-based encoding method described above.
According to one aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the above described feedback-based encoding methods via execution of the executable instructions.
According to the feedback-based coding method and the feedback-based coding device provided by the embodiments, after the terminal equipment establishes the video communication connection, the terminal equipment is used as a sending terminal and can receive the reference characteristic information fed back by another receiving terminal; and constructing a reference feature weight model at the transmitting end according to the reference feature information, and constructing a coding model based on the reference feature weight model, so that the reference feature information fed back by the receiving terminal can be effectively considered when the transmitting end codes the video, and coding parameters more suitable for the receiving terminal can be decided by utilizing the reference feature information, thereby ensuring the subjective quality of the receiving terminal to the greatest extent.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments in this disclosure will become readily apparent from the following detailed description when read in light of the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
fig. 1 schematically illustrates a schematic diagram of a feedback-based encoding method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates an exemplary system architecture diagram of a solution according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a method of constructing an encoding model according to an embodiment of the present disclosure;
fig. 4 schematically shows a flow diagram of a method of encoding an image to be processed according to an embodiment of the disclosure;
fig. 5 schematically illustrates a block diagram of a feedback-based encoding apparatus according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure; and
fig. 7 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the present disclosure, there are provided a feedback-based encoding method, a feedback-based encoding apparatus, a storage medium, and an electronic device.
Any number of elements in the figures are for illustration and not limitation, and any naming is used for distinction only, and not for any limiting sense.
The principles and spirit of the present disclosure are described in detail below with reference to several representative embodiments thereof.
Summary of The Invention
The inventors have found that, in some techniques, video coding mainly performs compression coding targeting spatial, temporal, and statistical redundancy. For application scenarios such as instant messaging, live video streaming, or video conferencing, encoding of the video data is mostly considered from the perspective of the transmitting end — how to reduce bandwidth by exploiting the human visual system. However, in these scenarios, the user who watches the video is actually at the receiving end of the video data. If encoding and its calculations are based only on information at the transmitting side, then in most cases the viewing experience of the receiving-side user, as the viewer of the video data, may deviate to some extent.
In view of the above, when video coding is performed, features such as the viewing environment and behavior of the video receiving end need to be comprehensively considered on the transmitting side, so that more suitable coding parameters can be decided at the video transmitting end using relevant information from the video receiving end, thereby improving the viewing experience of the user at the video receiving end.
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Exemplary method
A feedback-based encoding method according to an exemplary embodiment of the present disclosure is described below in conjunction with fig. 1.
Referring to fig. 2, a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present disclosure may be applied is shown. As shown in fig. 2, the system architecture may include a receiving terminal device of video data (e.g., one or more of a smartphone 2031, a tablet 2032, and a computer 2033 shown in fig. 2), a network 202, and a server 201, and a transmitting terminal of video data (e.g., one or more of a smartphone 2041, a tablet 2042, and a computer 2043 shown in fig. 2). The network 202 is the medium used to provide communication links between the terminal devices and the servers. The network 202 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 2 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 201 may be a server cluster formed by a plurality of servers.
In an exemplary embodiment of the present disclosure, a user may participate in a multimedia conference, live broadcast, or instant messaging session on a terminal device acting as the receiving end of video data; the transmitting end of the video data sends the encoded video data to the receiving end through the network and the server 201. The roles of transmitting end and receiving end are relative. For example, when two users conduct a video call through two different terminal devices, the transmitting end may also act as a receiving end and receive the video data sent by the other side. Of course, in some application scenarios, video data may also be sent only from one side's terminal device to the other's. For example, in a video conference or instant video communication between two terminal devices, a terminal without a video component can act only as the receiving end of the video data; or, when watching a live broadcast, the viewing user's terminal acts as the receiving end of the live video data while the broadcaster's terminal acts as the transmitting end.
Of course, in some exemplary embodiments of the present disclosure, the receiving-side terminal device and the transmitting-side terminal device may also transmit video data to each other directly through the network 202, not through the server 201.
Referring to fig. 1, the feedback-based encoding method may include the steps of:
s1, establishing communication connection with a receiving terminal, and receiving reference characteristic information fed back by the receiving terminal;
s2, constructing a reference feature weight model according to the reference feature information;
s3, constructing a coding model based on the reference characteristic weight model so as to code the video data by using the coding model.
In the feedback-based coding method and device in the embodiment, after the terminal equipment establishes the video communication connection, the terminal equipment is used as a sending terminal and can receive the reference characteristic information fed back by another receiving terminal; and constructing a reference feature weight model at the transmitting end according to the reference feature information, and constructing a coding model based on the reference feature weight model, so that the reference feature information fed back by the receiving terminal can be effectively considered when the transmitting end codes the video, and coding parameters more suitable for the receiving terminal can be decided by utilizing the reference feature information, thereby ensuring the subjective quality of the receiving terminal to the greatest extent.
Specifically, in one exemplary feedback-based encoding method of the present disclosure:
in step S1, a communication connection is established with a receiving terminal, and reference feature information fed back by the receiving terminal is received.
In an exemplary embodiment of the present disclosure, taking a video call under instant messaging as an example: on the transmitting terminal's side, after the transmitting terminal and the receiving terminal establish a communication connection, in the initial state the transmitting terminal may encode video and audio data with initial or default parameters, and transmit the encoded video and audio data to the receiving terminal.
After the video call is established, the receiving terminal first receives and decodes the video data encoded with the default or initial parameters, and plays the decoded video and audio data. Meanwhile, it can collect its own reference characteristic information and send it back to the transmitting terminal, for example via the RTCP protocol.
In an exemplary embodiment of the present disclosure, for a receiving terminal, a data acquisition period with a certain duration may be preconfigured to periodically acquire reference feature data, and the reference feature data is sent to a sending terminal according to a time node of the preset period.
Specifically, the step S1 may include: and receiving the reference characteristic information in the preset period duration fed back by the receiving terminal, and calculating a corresponding average value, so that the reference characteristic information based on the average value is used in the next preset period duration.
Specifically, the transmitting terminal may directly receive the reference characteristic information for the preset period duration fed back by the receiving terminal, calculate the corresponding average value on the transmitting side, and apply the averaged data to the next period.
Alternatively, in other embodiments, after the receiving terminal collects the data of the current period, it may calculate the corresponding average value itself and feed it back to the transmitting terminal. The transmitting terminal can then use the data directly, with the receiving terminal sharing some of the computation load.
For example, the acquisition period of the receiving terminal's reference characteristic information may be 500 milliseconds. Correspondingly, the transmitting terminal can be configured with a synchronized usage period of 500 milliseconds. After the communication connection is established, the transmitting and receiving terminals may first perform time synchronization; when the transmitting terminal receives the reference characteristic information of the current period fed back by the receiving terminal, it applies it during the next 500-millisecond synchronization period. For example, at 24 frames per second, for one second of video stream, the transmitting terminal receives the reference characteristic information during the first 500 milliseconds and then applies it to the 12 frames corresponding to the last 500 milliseconds.
Alternatively, a relatively longer usage period for the reference characteristic information may be configured at the transmitting terminal, taking into account factors such as delay caused by network transmission. For example, if the parameter acquisition period of the receiving terminal is 500 milliseconds, the parameter usage period configured at the transmitting terminal may be 1 second.
Alternatively, in some exemplary embodiments of the present disclosure, a fixed parameter usage period may not be configured for the transmitting terminal, and the data may be used only when the reference characteristic information fed back by the receiving terminal is received; and uses the updated data when new reference characteristic information is received.
Alternatively, in other exemplary embodiments of the present disclosure, the transmitting terminal may also create a data queue for the reference characteristic information, adding received data to the queue in order of reception time. Once the amount of data in the queue reaches a preset threshold, the transmitting terminal can estimate the reference characteristic data at the current moment from the historical data over a span of time in the queue. For example, if the reference characteristic data is a screen brightness value or an ambient brightness value of the receiving terminal, historical data spanning 2, 5, or 10 seconds may be averaged to produce the estimate; alternatively, the brightness value at the current moment may be estimated from the period of change of the historical data over that span.
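As an illustration only, the following Python sketch shows one way such a queue-based estimator could look; the class and its names are hypothetical, since the text only requires some reception-time-ordered queue from which a current estimate is derived:

```python
from collections import deque
import time

class FeedbackQueue:
    """Reception-time-ordered queue of reference-feature samples (hypothetical helper)."""

    def __init__(self, window_seconds=5.0, min_samples=3):
        self.window = window_seconds      # e.g. 2, 5 or 10 seconds of history
        self.min_samples = min_samples    # preset threshold on queued data
        self.samples = deque()            # (timestamp, value) in reception order

    def push(self, value, ts=None):
        self.samples.append((time.time() if ts is None else ts, value))

    def estimate(self, now=None):
        """Average of the samples inside the trailing window, or None."""
        now = time.time() if now is None else now
        recent = [v for t, v in self.samples if now - t <= self.window]
        if len(recent) < self.min_samples:
            return None                   # not enough history queued yet
        return sum(recent) / len(recent)
```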
In step S2, a reference feature weight model is constructed according to the reference feature information.
In the exemplary embodiment of the disclosure, for the receiving terminal, the current environment and the application scene can be further classified, so that the corresponding data type to be acquired can be determined according to the current state, environment, network condition and application scene of the terminal device. The above-mentioned reference feature information may include any one or a combination of any plurality of environmental brightness information, screen brightness information, and motion information.
For example, if the application scenario is currently a video conference, the environment and the network state of the receiving terminal are relatively stable; at this time, the reference characteristic information may be luminance characteristic information of the receiving terminal; such as ambient brightness information and screen brightness information of the receiving terminal.
Or if the current network state of the receiving terminal is unstable, the environment brightness information or the screen brightness information can be collected as the reference characteristic information.
Or, if the current application scenario is a video call under instant messaging, with the receiving terminal outdoors, frequent brightness changes, and the user continuously in motion, the reference characteristic information may then be the environment brightness information, screen brightness information, and motion information of the receiving terminal.
The motion information comprises any one or combination of multiple items of speed information, acceleration information and angular speed information corresponding to the receiving terminal.
For example, the reference characteristic information includes: when the environment brightness information and the screen brightness information are included, the reference feature weight model may include:
JND_rec = a1 * exp(c) + a2 * exp(d)

wherein a1 and a2 are weighting coefficients, c is an ambient brightness value, and d is a screen brightness value.
Alternatively, the reference characteristic information includes: when the environment brightness information, the screen brightness information and the motion information are included, the reference feature weight model may include:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)

wherein a1, a2 and a3 are weighting coefficients, c is an ambient brightness value, d is a screen brightness value, and m is a motion information value.
The ambient brightness value, screen brightness value, and motion information value in the above formula may be averages of the data collected over a period. For example, the coefficients may be a1 = 0.4, a2 = 0.2, a3 = 0.4. Specifically, the receiving terminal can derive the motion information from the terminal device's gyroscope, and obtain a concrete ambient brightness value either from a light sensor or by capturing and analyzing images with the camera. In addition, the screen brightness of the terminal device can be obtained by calling a system process. Of course, other conventional means of obtaining the screen brightness may also be used; the specific manner is not restated or limited by this disclosure.
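As a concrete illustration, here is a minimal sketch of this receiver-feedback weight term, using the example coefficients above (a1 = 0.4, a2 = 0.2, a3 = 0.4). How the raw sensor readings are scaled before entering exp() and log() is not specified in this text and is an assumption:

```python
import math

def jnd_rec(c, d, m, a1=0.4, a2=0.2, a3=0.4):
    """JND_rec = a1*exp(c) + a2*exp(d) + a3*log(m).

    c: period-averaged ambient brightness value fed back by the receiver
    d: period-averaged screen brightness value
    m: period-averaged motion information value
    Scaling c, d, m into a numerically sensible range is an assumption,
    not something this text specifies.
    """
    m = max(m, 1e-6)  # guard the logarithm when the terminal is motionless
    return a1 * math.exp(c) + a2 * math.exp(d) + a3 * math.log(m)

def jnd_rec_no_motion(c, d, a1=0.4, a2=0.2):
    """Two-term variant used when only brightness information is fed back."""
    return a1 * math.exp(c) + a2 * math.exp(d)
```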
In some exemplary embodiments of the present disclosure, specific values of each type of data may also be determined based on parameters of the current period in combination with historical data. For example:
the above formula for ambient brightness may include:
c=Ci+(1/2)*Ci-1+(1/4)*Ci-2+(1/8)*Ci-3+…
where Ci is the ambient light level value fed back in the ith period.
The above formula of motion information may include:
m=Mi+(1/2)*Mi-1+(1/4)*Mi-2+(1/8)*Mi-3+…
where Mi is motion information fed back in the ith period.
The above formula for screen brightness may include:
d=Di+(1/2)*Di-1+(1/4)*Di-2+(1/8)*Di-3+…
where Di is the screen brightness information fed back in the ith period.
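Because each weight is half of the previous one, these sums admit a simple recursive update: the new estimate is the latest feedback plus half of the previous estimate. A minimal sketch of that recurrence (the class name is illustrative):

```python
class DyadicHistory:
    """Maintains v = X_i + (1/2)X_{i-1} + (1/4)X_{i-2} + ... recursively."""

    def __init__(self):
        self.value = 0.0

    def update(self, latest):
        # v_new = X_i + v_old / 2 reproduces the dyadic weighting above
        self.value = latest + self.value / 2.0
        return self.value

# One estimator per fed-back quantity: ambient (c), motion (m), screen (d)
ambient = DyadicHistory()
for Ci in (120.0, 118.0, 131.0):   # hypothetical per-period feedback values
    c = ambient.update(Ci)
```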
In step S3, an encoding model is constructed based on the reference feature weight model to encode video data using the encoding model.
In an exemplary embodiment of the present disclosure, the above-described coding model may be a coding model based on an just noticeable distortion model.
Current just noticeable distortion (JND) models can be roughly divided into pixel-domain JND models and transform-domain JND models; pixel-domain JND models are widely used because they are simple to compute, and most of them are built by characterizing the luminance adaptation effect and the texture masking effect. In the prior art, a JND model may compute the salient points of an image through a visual saliency model, then compute the distance between a given pixel and a salient point and the eccentricity of that pixel relative to the salient point, and construct a modulation function from the relation between eccentricity and viewing distance to modulate the JND model, yielding a fovea-based JND model. However, on the one hand, visual saliency detection does not consider the hierarchical selectivity of the human eye when observing an image; on the other hand, for high-definition images, modulating the JND threshold with a factor computed from the relation between retinal eccentricity and viewing distance may, because of the longer distances between salient points and pixels, tolerate more noise than is actually acceptable, so the human eye's visual redundancy threshold for the image cannot be computed accurately.
In order to overcome the defect, the coding model adopts a just noticeable distortion model based on a pixel domain, and a reference characteristic weight model constructed according to the reference characteristic information is added into the model, so that the defect is effectively overcome.
In an exemplary embodiment of the present disclosure, specifically, referring to fig. 3, the step S3 described above may include:
step S31, obtaining an image to be processed, and calculating a corresponding background brightness self-adaptive threshold value and a texture masking threshold value;
and step S32, constructing a just noticeable distortion model based on the reference feature weight model, the background brightness adaptive threshold and the texture masking threshold, so as to perform DCT encoding on the image to be processed using the just noticeable distortion model.
Specifically, for each frame of image to be processed acquired by the transmitting terminal, a corresponding reference feature weight model, a background brightness self-adaptive threshold value and a texture masking threshold value are calculated for the image.
In an exemplary embodiment of the present disclosure, specifically, in the above step S31, calculating the background brightness adaptive threshold corresponding to the image to be processed includes: dividing the image to be processed into regions according to the preset first pane size, and calculating the average brightness value within each region, so as to determine the corresponding background brightness adaptive threshold from the average brightness value of the region.
Specifically, the formula may include:

JND_lum = f(Ī) (the threshold formula appears as an image in the original document)

wherein Ī denotes the average brightness of the region.
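The exact threshold curve cannot be recovered from this text, so the sketch below substitutes the classic Chou–Li background-luminance adaptation threshold, a widely used pixel-domain formula that plays the same role; it is a stand-in under that assumption, not this document's own formula:

```python
import math

def background_luminance_threshold(avg_luma):
    """Chou-Li style luminance-adaptation threshold (stand-in formula).

    avg_luma: average brightness of one first-pane-size region, in 0..255.
    Dark regions tolerate large distortion; tolerance grows slowly again
    above mid-gray.
    """
    if avg_luma <= 127:
        return 17.0 * (1.0 - math.sqrt(avg_luma / 127.0)) + 3.0
    return 3.0 / 128.0 * (avg_luma - 127.0) + 3.0
```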
In an exemplary embodiment of the present disclosure, specifically, in the above step S31, calculating the texture masking threshold corresponding to the image to be processed includes: dividing the image to be processed into regions according to a second preset pane size, and dividing each resulting region again according to a third pane size; then calculating the small-region texture intensity of each third-pane-size region based on the texture intensity of each pixel point within it, and determining the texture intensity of the second-preset-pane-size region from the plurality of small-region texture intensities.
Specifically, the formulas may include:

JND_tex = 0.12 * G(x, y)

G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|

wherein g_k(i, j) is the texture intensity value of a pixel point (the expression for grad_k in terms of g_k appears as an image in the original document), grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
Wherein the first pane size and the third pane size may be the same.
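The grad_k masks themselves are not reproduced in this text, so the sketch below uses plain directional differences as a stand-in for the per-pixel texture intensities, and takes the mean over each third-pane-size sub-block as its small-region intensity (also an assumption); only the max over k and sub-blocks and the 0.12 * G(x, y) step come from the formulas above:

```python
import numpy as np

def directional_gradients(region):
    """Per-pixel |grad| in four directions (stand-in for the g_k masks)."""
    g = np.pad(region.astype(np.float64), 1, mode="edge")
    return np.abs(np.stack([
        g[2:, 1:-1] - g[:-2, 1:-1],   # vertical
        g[1:-1, 2:] - g[1:-1, :-2],   # horizontal
        g[2:, 2:] - g[:-2, :-2],      # diagonal \
        g[2:, :-2] - g[:-2, 2:],      # diagonal /
    ]))                               # shape (4, H, W)

def jnd_tex_for_region(region, third_pane=4):
    """JND_tex = 0.12 * G for one second-pane-size region.

    region: 2D luma array whose sides are multiples of third_pane.
    """
    grads = directional_gradients(region)
    h, w = region.shape
    small = []                        # grad_k per third-pane-size sub-block
    for i in range(0, h, third_pane):
        for j in range(0, w, third_pane):
            sub = grads[:, i:i + third_pane, j:j + third_pane]
            small.append(sub.mean(axis=(1, 2)))
    G = max(s.max() for s in small)   # max over k = 1..4 and sub-blocks
    return 0.12 * G
```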
In an exemplary embodiment of the present disclosure, referring to fig. 4, in the above-described step S32, performing DCT encoding on the image to be processed using the just noticeable distortion model may specifically include:
step S321, acquiring a JND value corresponding to each pixel point of the image to be processed based on the just noticeable distortion model; and
Step S322, DCT encoding is carried out on the image to be processed to determine the original DCT coefficient corresponding to each pixel point;
step S323, calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate.
Specifically, at the transmitting terminal, after applying the pixel-domain JND model to the image to be processed, the JND value corresponding to each pixel point in the image can be obtained.
Meanwhile, the image to be processed may first be segmented according to a preset pane size to obtain a number of small blocks, for example of size 4×4, 8×8, or 16×16, and each small block is then encoded: the RGB information of each pixel point is converted into the luminance–chrominance (YUV) color space and resampled. Based on the sampling result, a DCT (Discrete Cosine Transform) is applied to each small block, yielding the corresponding original DCT coefficients — the coefficients to be quantized — for each pixel position.
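A minimal numpy sketch of this per-block transform step on a single luma plane (the RGB-to-YUV conversion and resampling are elided, and dimensions are assumed to be multiples of the block size):

```python
import numpy as np

def dct_basis(n=8):
    """Orthonormal type-II DCT basis matrix."""
    x = np.arange(n)
    C = np.cos(np.pi * (2 * x[None, :] + 1) * x[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C * np.sqrt(2.0 / n)

def block_dct(luma, n=8):
    """Split a luma plane into n x n blocks and 2-D DCT-transform each one.

    Returns an array of the same shape holding the original DCT
    coefficients dct(x, y) at every position.
    """
    C = dct_basis(n)
    out = np.empty(luma.shape, dtype=np.float64)
    h, w = luma.shape
    for i in range(0, h, n):
        for j in range(0, w, n):
            block = luma[i:i + n, j:j + n].astype(np.float64)
            out[i:i + n, j:j + n] = C @ block @ C.T   # 2-D DCT of one block
    return out
```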
Based on the original DCT coefficients and the JND values, the code rate may be recalculated. Specifically, the formula may include:
code rate = E(dct(x, y) − JND(x, y))

where dct(x, y) is the original DCT coefficient.
In an exemplary embodiment of the present disclosure, in normal video encoding, code rate = E(dct(x, y)), where E(dct(x, y)) denotes entropy-encoding the transformed DCT coefficients to obtain a binary code stream, dct(x, y) denotes each DCT coefficient, and x, y are the two-dimensional coordinates of the coefficient. In the present method, a JND value is obtained for each DCT coefficient through the processes above; the code rate can therefore be configured as code rate = E(dct(x, y) − JND(x, y)) to reduce the encoded stream.
Based on this formula, encoding parameters that comprehensively account for the receiving terminal's feedback information can be obtained, and the transmitting terminal can encode the image to be processed at this encoding rate. In this way, more suitable encoding parameters are decided at the transmitting terminal using the information fed back by the receiving terminal, and the encoding rate is reduced without lowering the subjective quality perceived by the human eye.
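A minimal sketch of this JND-based suppression ahead of entropy coding. This text does not say whether dct(x, y) − JND(x, y) is a signed subtraction or a magnitude clamp toward zero; the magnitude form below is an assumption, and entropy_encode is a hypothetical placeholder for the encoder's own entropy coder:

```python
import numpy as np

def suppress_with_jnd(dct_coeffs, jnd_map):
    """Shrink each coefficient's magnitude by its JND(x, y) before entropy coding.

    Coefficients whose magnitude falls below their JND threshold become
    zero and therefore cost almost nothing to encode, reducing the code
    rate without a visible loss.
    """
    mags = np.maximum(np.abs(dct_coeffs) - jnd_map, 0.0)
    return np.sign(dct_coeffs) * mags

# Usage (entropy_encode is hypothetical):
#   coeffs = block_dct(luma)
#   bitstream = entropy_encode(suppress_with_jnd(coeffs, jnd_map))
```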
In summary, the video encoding method provided in the present disclosure may be applied on a server. For example, in a video conference with multiple parties, each participant terminal acts both as a video receiving end and as a video transmitting end. Each participant terminal can send its original video stream to the server and upload its reference characteristic information; the server may then calculate a separate encoding rate for each receiving terminal, encode the video data accordingly, and transmit it to the corresponding receiving terminals.
Alternatively, the video encoding method described above may be applied to a terminal device, for example, a video session scene of instant messaging, and the video encoding method may be performed on two user terminal devices.
By receiving, in real time at the video data transmitting end, reference characteristic information such as the screen brightness, environment brightness, and speed information fed back by the video data receiving end, the corresponding reference feature weight model can be built in real time. Thus, when encoding the images of the video stream, encoding parameters better suited to the receiving terminal's current application and environment can be decided from this information, and video encoding performed with those parameters. This further improves the subjective compression rate without reducing video quality or affecting the subjective video quality perceived by the user at the video receiving terminal.
Exemplary apparatus
Having introduced the feedback-based encoding method of the exemplary embodiments of the present disclosure, next, the feedback-based encoding apparatus of the exemplary embodiments of the present disclosure will be described with reference to fig. 5.
Referring to fig. 5, a feedback-based encoding apparatus 50 of an exemplary embodiment of the present disclosure may include: a reference feature information receiving module 501, a reference feature weight model constructing module 502, and an encoding module 503, wherein:
The reference feature information receiving module 501 may be configured to establish a communication connection with a receiving terminal, and receive reference feature information fed back by the receiving terminal.
The reference feature weight model construction module 502 may be configured to construct a reference feature weight model from the reference feature information.
The encoding module 503 may be configured to construct an encoding model based on the reference feature weight model to encode video data using the encoding model.
According to an exemplary embodiment of the present disclosure, the reference feature information receiving module 501 may be further configured to receive reference feature information in a preset period duration fed back by the receiving terminal, and calculate a corresponding average value, so as to use the reference feature information based on the average value in a next preset period duration.
According to an exemplary embodiment of the present disclosure, the reference characteristic information includes: ambient brightness information and/or screen brightness information of the receiving terminal.
According to an exemplary embodiment of the present disclosure, the reference characteristic information further includes: motion information; the motion information comprises any one or combination of a plurality of items of speed information, acceleration information and angular speed information corresponding to the receiving terminal;
The reference feature weight model construction module 502 may be further configured to construct a reference feature weight model by combining the environmental brightness information, the screen brightness information, and the motion information fed back by the receiving terminal.
According to an exemplary embodiment of the present disclosure, the reference feature weight model construction module 502 may include:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)

wherein a1, a2 and a3 are weighting coefficients, c is an ambient brightness value, d is a screen brightness value, and m is a motion information value.
According to an exemplary embodiment of the present disclosure, the feedback-based encoding apparatus 50 may further include: and a coding model building module.
The coding model construction module can be used for constructing a coding model based on the pixel domain just noticeable distortion model by combining the reference feature weight model.
According to an exemplary embodiment of the disclosure, the encoding model construction module is further configured to obtain an image to be processed, and calculate a corresponding background luminance adaptive threshold and texture masking threshold; and constructing an just-noticeable distortion model based on the reference feature weight model, the background brightness self-adaptive threshold and the texture masking threshold to perform DCT coding on an image to be processed by using the just-noticeable distortion model.
According to an exemplary embodiment of the present disclosure, the encoding model construction module may further include: and a background brightness self-adaptive threshold calculating module.
The background brightness self-adaptive threshold calculating module is used for dividing the image to be processed into areas according to the preset first pane size, and calculating average brightness values in each area so as to determine a corresponding background brightness self-adaptive threshold according to the average brightness values of the areas.
According to an exemplary embodiment of the present disclosure, the background luminance adaptive threshold calculation module includes:
JND_lum = f(Ī) (the threshold formula appears as an image in the original document)

wherein Ī denotes the average brightness of the region.
According to an exemplary embodiment of the present disclosure, the encoding model construction module may further include: a texture masking threshold calculation module.
The texture masking threshold calculation module may be configured to divide the image to be processed into regions according to a second preset pane size, and divide each divided region again according to a third pane size; and calculating the small area texture intensity of the area based on the texture intensity of each pixel point in the area with the third pane size, and determining the texture intensity of the area with the second preset pane size according to the small area texture intensities.
According to an exemplary embodiment of the present disclosure, the texture masking threshold calculation module includes:
JND_tex = 0.12 * G(x, y)

G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|

wherein g_k(i, j) is the texture intensity value of a pixel point, grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
According to an exemplary embodiment of the present disclosure, the encoding module 503 may include: and the code executing module.
The coding execution module may be configured to obtain JND values corresponding to each pixel point of the image to be processed based on the just-noticeable distortion model; performing DCT coding on the image to be processed to determine original DCT coefficients corresponding to each pixel point; and calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate.
According to an exemplary embodiment of the present disclosure, the encoding execution module may include: and a code rate calculation module.
The code rate calculating module may be configured to calculate a current coding code rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, including:
code rate = E(dct(x, y) − JND(x, y))

where dct(x, y) is the original DCT coefficient.
The functional modules of the feedback-based encoding apparatus 50 of the present disclosure correspond to the content of the feedback-based encoding method described above. Accordingly, each functional module in the apparatus can implement the corresponding method content in the same manner; since each functional module is consistent with the corresponding method embodiment, the apparatus embodiments are not described again here.
Exemplary storage Medium
Having described the feedback-based encoding method and apparatus of the exemplary embodiments of the present disclosure, next, a storage medium of the exemplary embodiments of the present disclosure will be described with reference to fig. 6.
Referring to fig. 6, a program product 60 for implementing the above-described method according to an embodiment of the present disclosure is described. It may employ a portable compact disc read-only memory (CD-ROM) including program code, and may be run on a device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Exemplary electronic device
Having described the storage medium of the exemplary embodiments of the present disclosure, next, an electronic device of the exemplary embodiments of the present disclosure will be described with reference to fig. 7.
The electronic device 800 shown in fig. 7 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 7, the electronic device 800 is embodied in the form of a general purpose computing device. Components of electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
Wherein the storage unit stores program code that is executable by the processing unit 810 such that the processing unit 810 performs steps according to various exemplary embodiments of the present disclosure described in the above section of the present specification. For example, the processing unit 810 may perform the steps as shown in fig. 1.
The storage unit 820 may include volatile storage units such as a Random Access Memory (RAM) 8201 and/or a cache memory 8202, and may further include a Read Only Memory (ROM) 8203.
Storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) via an input/output (I/O) interface 850. The electronic device 800 further comprises the display unit 840, connected to the input/output (I/O) interface 850 for display. Also, the electronic device 800 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several modules or sub-modules of the feedback-based encoding apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; that division is made for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (20)

1. A feedback-based encoding method, comprising:
establishing a communication connection with a receiving terminal, and receiving reference characteristic information fed back by the receiving terminal, the reference characteristic information comprising any one of, or any combination of, ambient brightness information, screen brightness information of the receiving terminal, and motion information;
constructing a reference feature weight model according to the reference feature information;
constructing an encoding model based on the reference feature weight model, comprising: acquiring an image to be processed, and calculating a corresponding background brightness self-adaptive threshold and a texture masking threshold; and constructing a just-noticeable distortion model based on the reference feature weight model, the background brightness self-adaptive threshold, and the texture masking threshold, so as to perform DCT coding on the image to be processed by using the just-noticeable distortion model; wherein calculating the background brightness self-adaptive threshold corresponding to the image to be processed comprises: dividing the image to be processed into regions according to a first preset pane size, and calculating the average brightness value in each region, so as to determine the corresponding background brightness self-adaptive threshold according to the average brightness value of the region.
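By way of illustration, the following Python sketch assembles claim 1's three quantities into a per-region JND map, reusing background_luminance_threshold() and texture_masking_threshold() from the sketches above. The claim only states that the model is constructed "based on" the three inputs, so scaling the larger spatial threshold by the scalar JND_rec is an assumed combination rule.

```python
import numpy as np

def build_jnd_model(img: np.ndarray, jnd_rec: float, pane: int = 16) -> np.ndarray:
    """Per-region JND map combining the three quantities of claim 1 (sketch).

    `jnd_rec` is the scalar output of the reference feature weight model;
    the multiplicative combination below is an assumption.
    """
    jnd_lum = background_luminance_threshold(img, pane=pane)
    jnd_tex = texture_masking_threshold(img, pane2=pane, pane3=max(pane // 4, 1))
    # take the stronger masking effect per region, modulated by the feedback weight
    return jnd_rec * np.maximum(jnd_lum, jnd_tex)
```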
2. The feedback-based encoding method according to claim 1, wherein the receiving the reference characteristic information fed back by the receiving terminal comprises:
receiving the reference characteristic information fed back by the receiving terminal within a preset period duration, and calculating a corresponding average value, so that the reference characteristic information based on the average value is used during the next preset period duration.
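By way of illustration, a minimal Python sketch of this period-averaging behaviour follows; the class name and the explicit roll() call at each period boundary are assumed mechanisms, not claimed structure.

```python
class FeedbackAverager:
    """Period-averaged reference feature feedback (sketch of claim 2)."""

    def __init__(self):
        self._samples = []   # feedback received during the current period
        self.current = None  # average in force, used for the present period

    def push(self, sample):
        """Record one feedback sample, e.g. an ambient brightness reading."""
        self._samples.append(sample)

    def roll(self):
        """Close the period: its average becomes the next period's reference value."""
        if self._samples:
            self.current = sum(self._samples) / len(self._samples)
        self._samples = []
```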
3. The feedback-based encoding method of claim 1, wherein the reference characteristic information further comprises motion information, the motion information comprising any one of, or any combination of, speed information, acceleration information, and angular velocity information corresponding to the receiving terminal;
wherein the constructing a reference feature weight model according to the reference characteristic information comprises:
constructing the reference feature weight model by combining the ambient brightness information, the screen brightness information, and the motion information fed back by the receiving terminal.
4. The feedback-based encoding method of claim 3, wherein said constructing a reference feature weight model by combining the ambient brightness information, the screen brightness information, and the motion information fed back by the receiving terminal comprises:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)
wherein a1, a2, and a3 are weighting coefficients, c is the ambient brightness value, m is the motion information value, and d is the screen brightness value.
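By way of illustration, the formula evaluates directly; in the Python sketch below the weights a1..a3 are hypothetical values (the claim does not fix them), the natural logarithm is assumed, and c, d, m are assumed to be normalized readings with m > 0 so that log(m) is defined.

```python
import math

def reference_feature_weight(c: float, d: float, m: float,
                             a=(0.4, 0.4, 0.2)) -> float:
    """JND_rec = a1*exp(c) + a2*exp(d) + a3*log(m), per claim 4.

    a1..a3 are hypothetical weights; c (ambient brightness), d (screen
    brightness), and m (motion value, m > 0) are assumed normalized.
    """
    a1, a2, a3 = a
    return a1 * math.exp(c) + a2 * math.exp(d) + a3 * math.log(m)
```

For instance, with c = 0.5, d = 0.7, and m = 2.0, the default weights give 0.4*exp(0.5) + 0.4*exp(0.7) + 0.2*log(2.0) ≈ 1.60.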
5. The feedback-based encoding method of claim 1, wherein determining a corresponding background luminance adaptive threshold from the average luminance value of the region comprises:
[Formula rendered as an image in the source: the background brightness self-adaptive threshold as a function of the region's average luminance.]
wherein the variable in the formula denotes the average luminance of the region.
6. The feedback-based encoding method of claim 1, wherein calculating a texture masking threshold corresponding to the image to be processed comprises:
dividing the image to be processed into areas according to a second preset pane size, and dividing each divided area again according to a third pane size;
and calculating the small area texture intensity of the area based on the texture intensity of each pixel point in the area with the third pane size, and determining the texture intensity of the area with the second preset pane size according to the small area texture intensities.
7. The feedback-based encoding method of claim 6, wherein the calculating the small-region texture intensity of each region of the third pane size based on the texture intensity of each pixel point in that region, and determining the texture intensity of the corresponding region of the second preset pane size according to a plurality of the small-region texture intensities, comprises:
JND_tex = 0.12 * G(x, y)
G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|
wherein g_k(i, j) is the texture intensity value of a pixel point, grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
8. The feedback-based encoding method of claim 1, wherein the performing DCT coding on the image to be processed using the just-noticeable distortion model comprises:
acquiring JND values corresponding to each pixel point of the image to be processed based on the just-noticeable distortion model;
performing DCT coding on the image to be processed to determine original DCT coefficients corresponding to each pixel point; and
calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficients and the JND values, so as to perform entropy coding on the image to be processed based on the current coding rate.
9. The feedback-based encoding method of claim 8, wherein the calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to entropy encode the image to be processed based on the current coding rate, comprises:
code rate = E(DCT(x, y) - JND(x, y))
where DCT(x, y) is the original DCT coefficient.
10. A feedback-based encoding apparatus, comprising:
the reference characteristic information receiving module, configured to establish a communication connection with a receiving terminal and to receive reference characteristic information fed back by the receiving terminal, the reference characteristic information comprising any one of, or any combination of, ambient brightness information, screen brightness information of the receiving terminal, and motion information;
The reference feature weight model construction module is used for constructing a reference feature weight model according to the reference feature information;
the coding module is used for constructing a coding model based on the reference characteristic weight model so as to code video data by using the coding model;
the encoding model construction module, configured to acquire an image to be processed and to calculate a corresponding background brightness self-adaptive threshold and a texture masking threshold, and to construct a just-noticeable distortion model based on the reference feature weight model, the background brightness self-adaptive threshold, and the texture masking threshold, so as to perform DCT coding on the image to be processed by using the just-noticeable distortion model;
the coding model construction module comprises: the background brightness self-adaptive threshold calculating module is used for dividing the image to be processed into areas according to the preset first pane size, and calculating average brightness values in the areas so as to determine the corresponding background brightness self-adaptive threshold according to the average brightness values of the areas.
11. The feedback-based encoding apparatus of claim 10, wherein the reference characteristic information receiving module is further configured to receive the reference characteristic information fed back by the receiving terminal within a preset period duration, and to calculate a corresponding average value, so that the reference characteristic information based on the average value is used during the next preset period duration.
12. The feedback-based encoding apparatus of claim 10, wherein the reference characteristic information further comprises motion information, the motion information comprising any one of, or any combination of, speed information, acceleration information, and angular velocity information corresponding to the receiving terminal;
wherein the reference feature weight model construction module is further configured to construct the reference feature weight model by combining the ambient brightness information, the screen brightness information, and the motion information fed back by the receiving terminal.
13. The feedback-based encoding apparatus of claim 12, wherein the reference feature weight model building module comprises:
JND_rec = a1 * exp(c) + a2 * exp(d) + a3 * log(m)
wherein a1, a2, and a3 are weighting coefficients, c is the ambient brightness value, m is the motion information value, and d is the screen brightness value.
14. The feedback-based encoding apparatus of claim 10, wherein the background luminance adaptive threshold calculation module comprises:
[Formula rendered as an image in the source: the background brightness self-adaptive threshold as a function of the region's average luminance.]
wherein the variable in the formula denotes the average luminance of the region.
15. The feedback-based encoding apparatus of claim 10, wherein the encoding model construction module comprises:
the texture masking threshold calculation module, configured to divide the image to be processed into regions according to a second preset pane size and to subdivide each divided region according to a third pane size; and
to calculate the small-region texture intensity of each region of the third pane size based on the texture intensity of each pixel point in that region, and to determine the texture intensity of the region of the second preset pane size according to the plurality of small-region texture intensities.
16. The feedback-based encoding apparatus of claim 15, wherein the texture masking threshold calculation module comprises:
JND_tex = 0.12 * G(x, y)
G(x, y) = max_{k=1,2,3,4} |grad_k(x, y)|
wherein g_k(i, j) is the texture intensity value of a pixel point, grad_k(x, y) is the small-region texture intensity, and G(x, y) is the region texture intensity.
17. The feedback-based encoding apparatus of claim 10, wherein the encoding module comprises:
the coding execution module is used for acquiring JND values corresponding to each pixel point of the image to be processed based on the just-noticeable distortion model; performing DCT coding on the image to be processed to determine original DCT coefficients corresponding to each pixel point; and calculating a current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate.
18. The feedback-based encoding apparatus of claim 17, wherein the encoding execution module comprises:
the code rate calculation module, configured to calculate the current coding rate corresponding to the image to be processed according to the original DCT coefficient and the JND value, so as to perform entropy coding on the image to be processed based on the current coding rate, wherein:
code rate = E(DCT(x, y) - JND(x, y))
where DCT(x, y) is the original DCT coefficient.
19. A storage medium having stored thereon a computer program, which when executed by a processor implements the feedback-based encoding method of any of claims 1-9.
20. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the feedback-based encoding method of any of claims 1-9 via execution of the executable instructions.
CN202110529836.0A 2021-05-14 2021-05-14 Encoding method and device based on feedback, storage medium and electronic equipment Active CN113160342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110529836.0A CN113160342B (en) 2021-05-14 2021-05-14 Encoding method and device based on feedback, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN113160342A CN113160342A (en) 2021-07-23
CN113160342B true CN113160342B (en) 2023-08-25

Family

ID=76875994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110529836.0A Active CN113160342B (en) 2021-05-14 2021-05-14 Encoding method and device based on feedback, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113160342B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713729A (en) * 2004-06-24 2005-12-28 华为技术有限公司 Video frequency compression
CN101472131A (en) * 2007-12-28 2009-07-01 希姆通信息技术(上海)有限公司 Visual telephone with movement perceptive function and method for enhancing image quality
CN101710995A (en) * 2009-12-10 2010-05-19 武汉大学 Video coding system based on vision characteristic
CN102377730A (en) * 2010-08-11 2012-03-14 中国电信股份有限公司 Audio/video signal processing method and mobile terminal
CN102420988A (en) * 2011-12-02 2012-04-18 上海大学 Multi-view video coding system utilizing visual characteristics
CN102546917A (en) * 2010-12-31 2012-07-04 联想移动通信科技有限公司 Mobile terminal with camera and video processing method therefor
CN102595093A (en) * 2011-01-05 2012-07-18 腾讯科技(深圳)有限公司 Video communication method for dynamically changing video code and system thereof
CN103297773A (en) * 2013-05-07 2013-09-11 福州大学 Image coding method based on JND model
CN103490812A (en) * 2013-09-16 2014-01-01 北京航空航天大学 Mobile phone near field communication system and method based on visible light
CN105072345A (en) * 2015-08-25 2015-11-18 深圳市巨米电子有限公司 Video encoding method and device
CN106030503A (en) * 2014-02-25 2016-10-12 苹果公司 Adaptive video processing
WO2018036279A1 (en) * 2016-08-26 2018-03-01 深圳大学 Method for performing near-field communication using rgb ambient light sensor of mobile terminal
CN109412753A (en) * 2018-10-25 2019-03-01 网易(杭州)网络有限公司 Data transmission method and device, electronic equipment and storage medium
CN112492395A (en) * 2020-11-30 2021-03-12 维沃移动通信有限公司 Data processing method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101360243A (en) * 2008-09-24 2009-02-04 腾讯科技(深圳)有限公司 Video communication system and method based on feedback reference frame
TWI383684B (en) * 2008-11-18 2013-01-21 Univ Nat Taiwan System and method for dynamic video encoding in multimedia streaming
US8559511B2 (en) * 2010-03-30 2013-10-15 Hong Kong Applied Science and Technology Research Institute Company Limited Method and apparatus for video coding by ABT-based just noticeable difference model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Control-Theoretic Analysis and Design of Adaptive Coding Strategies in Wireless Optical Communication; Wang Yifei et al.; Optical Communication Technology; 2005-05-15 (No. 05); pp. 11-14 *


Similar Documents

Publication Publication Date Title
US20140348224A1 (en) Adaptive video processing of an interactive environment
CN112383777B (en) Video encoding method, video encoding device, electronic equipment and storage medium
US11363298B2 (en) Video processing apparatus and processing method of video stream
US8842159B2 (en) Encoding processing for conferencing systems
CN102158690A (en) Remote multichannel real-time video monitoring system
US9306987B2 (en) Content message for video conferencing
CN113301342B (en) Video coding method, network live broadcasting method, device and terminal equipment
WO2013127126A1 (en) Video image sending method, device and system
WO2021057697A1 (en) Video encoding and decoding methods and apparatuses, storage medium, and electronic device
WO2021057477A1 (en) Video encoding and decoding method and related device
US20140254688A1 (en) Perceptual Quality Of Content In Video Collaboration
CN104782119A (en) Bandwidth reduction system and method
CN115529300A (en) System and method for automatically adjusting key frame quantization parameter and frame rate
WO2021057686A1 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium and electronic device
CN113160342B (en) Encoding method and device based on feedback, storage medium and electronic equipment
CA3182110A1 (en) Reinforcement learning based rate control
CN110753243A (en) Image processing method, image processing server and image processing system
CN106254873B (en) Video coding method and video coding device
WO2021057676A1 (en) Video coding method and apparatus, video decoding method and apparatus, electronic device and readable storage medium
CN110798700B (en) Video processing method, video processing device, storage medium and electronic equipment
CN113573004A (en) Video conference processing method and device, computer equipment and storage medium
CN113141352A (en) Multimedia data transmission method and device, computer equipment and storage medium
WO2023051705A1 (en) Video communication method and apparatus, electronic device, and computer readable medium
US8107525B1 (en) Variable bit rate video CODEC using adaptive tracking for video conferencing
CN110582022A (en) Video encoding and decoding method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210928

Address after: 310000 Room 408, building 3, No. 399, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Netease Zhiqi Technology Co.,Ltd.

Address before: 310052 Room 301, Building No. 599, Changhe Street Network Business Road, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: HANGZHOU LANGHE TECHNOLOGY Ltd.

GR01 Patent grant