WO2021156525A1

WO2021156525A1 - Method and system for estimating perceived quality in an audiovisual signal

Info

Publication number: WO2021156525A1
Application number: PCT/ES2020/070809
Authority: WO
Inventors: David JIMÉNEZ BERMEJO; Álvaro LLORENTE GÓMEZ; David MARTÍN GUTIÉRREZ; Andrés ARTALOYTIA VILARIÑO; José Manuel MENÉNDEZ GARCÍA; Federico ÁLVAREZ GARCÍA
Original assignee: Universidad Politécnica de Madrid
Priority date: 2020-02-07
Filing date: 2020-12-21
Publication date: 2021-08-12
Also published as: ES2767475A1

Abstract

The present invention enables the quality of the contents that are transported to be monitored, and it further provides relevant information that enables the audiovisual services deployed in communication networks to be optimised. The method for estimating perceived quality in an audiovisual signal comprises the following steps: capturing characteristic parameters of the data transport service regardless of the network, both for the video signal and the audio signal contained in the audiovisual signal; calculating a set of no-reference metrics, from the characteristic parameters of the transport service, which provide an ordered set of parameters with the quality distortion levels of the audiovisual content; and, obtaining a perceived quality value by applying a mathematical algorithm, signal processing and machine learning techniques on the characteristic parameters of the service and the set of no-reference metrics.

Description

METHOD AND SYSTEM FOR THE ESTIMATION OF PERCEIVED QUALITY IN AUDIOVISUAL SIGNAL

DESCRIPTION

Technical sector

The invention falls within the audiovisual technologies sector. It estimates perceived quality by applying an objective model of the subjective quality of audiovisual content (signals) in media services for different digital content platforms distributed over different communication networks. The invention can be implemented as a software product, virtualizable and instantiable in any communication network, which presents the results of the final quality assessment of the analyzed content through a user interface, on a limited numerical scale.

The invention allows the monitoring of the quality of the contents that are transported, and also provides relevant information that allows the optimization of the audiovisual services deployed in communication networks.

State of the art

The consumption of audiovisual content over different networks and digital platforms is the activity that generates the most network data traffic, and it underpins various business models in which providing the best quality of experience (QoE) is key. Networked multimedia services, such as IP Television (IPTV, Internet Protocol Television), content associated with social networks, streaming and on-demand services, virtual reality services and video games, are driving the growth of network traffic, which is expected to continue growing exponentially.

In this context, QoE assessment plays a key role for networked multimedia applications and services, especially through the development of objective QoE metrics that correlate with perceived subjective measures. To this end, such metrics rely on different schemes that try to model how users perceive and experience quality losses in the content, such as degradations. This analysis usually focuses on the two main elements of multimedia services and applications: video and audio.

An accurate estimate of QoE allows providers of multimedia services and applications to improve the provision of services, and optimize their use of the network. In fact, the QoE evaluation of audiovisual content is becoming an extremely important issue.

Most existing QoE models are based on the quality assessment (QA) of audio (AQA, Audio Quality Assessment), video (VQA, Video Quality Assessment) or both, analyzing the different processes that can degrade signal quality, such as acquisition, preprocessing, encoding, transmission, presentation and storage. Two kinds of methods can be distinguished: subjective methods, in which users rate the quality they perceive in the content offered under certain conditions and following standardized schemes; and objective methods, which apply mathematical algorithms to the information they can extract from networked media services and applications in order to approximate the measure of perceived quality.

Different objective quality models are normally used for audio and video. The most common models estimate the degree of quality, or of quality degradation, due to coding, taking into account parameters such as bit rate, resolution, number of frames per second, sampling rate and number of channels. These quality models typically deliver a list of scores on a MOS (Mean Opinion Score) scale, where each score represents the quality of one time segment of the content. Some of these models are standardized in International Telecommunication Union (ITU) Recommendation ITU-T P.1201. Depending on the type of parameters used, objective systems are divided into categories, from low to high complexity: parametric, bitstream and hybrid.

Objective models are compared with subjective measurements of the same content to which the objective measures are applied. These subjective tests follow standardized procedures set out in international regulations and recommendations, such as ITU-R BT.500 (Methodology for the subjective assessment of the quality of television pictures), ITU-T P.910 (Subjective video quality assessment methods for multimedia applications) and ITU-T P.913 (Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment).

Among the patents related to the quality of experience of audiovisual signals, it is worth highlighting patent application CN108900862-A, titled “A network video stream QoE-QoS parameter mapping method based on statistics analysis”, and patent application US2019124375-A1, titled “Quality Estimation Of Adaptive Multimedia Streaming”.

Brief description of the invention

The invention belongs to the technical field of quality analysis of audiovisual content in media services for different digital content platforms with distribution through different communication networks. The invention can be implemented as a software product, virtualizable and instantiable in any communication network, which offers the results of the quality assessment of the analyzed content, through a user interface, on a limited numerical scale.

In the context of the present invention, audiovisual content services can come from different content distribution, broadcast and/or transport platforms. Furthermore, the present invention can work with constant, variable and adaptive bit rates for all types of audiovisual signal.

Another distinctive feature of the present invention is its ability to offer measurements in real time and in configurable time blocks, and to work with any type of audiovisual signal, regardless of its nature, origin and/or configuration. Furthermore, the analysis and signal processing means do not need references to the original signal.

The invention can be carried out by means of several interconnected modules, which are described below:

1) Capture module: developed in a high-level programming language, it includes the decapsulator for the transport stream (TS) and for other audiovisual containers such as AVI or Matroska, as well as for streaming in any of its modalities (for example, UDP (User Datagram Protocol), RTP (Real-time Transport Protocol), RTSP (Real Time Streaming Protocol), HTTP (Hypertext Transfer Protocol), unicast and multicast, and adaptive streaming based on MPEG-DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), etc.) for multi-resolution content. It also decodes any type of audiovisual signal (for example, signals using MPEG-2, MPEG-4 Part 10 (AVC or H.264) or MPEG-H Part 2 (HEVC or H.265) encoding for video, and MP3, AC3, AC3+ or AC4 for audio). It allows the extraction of parameters that characterize the audiovisual content service, such as the resolution of the video signal, the coding standard used for audio and for video (separately), the bit rates used for video and audio, the scan type and the number of frames per second of the video, and the number of audio channels and the sampling rate used for the audio.
2) Module for calculating no-reference quality metrics: developed in a high-level programming language, it calculates the perceptible levels of the distortions associated with degradation of the audiovisual signal during the acquisition, preprocessing, encoding, transmission, presentation and storage stages, and of those introduced by the network used for distribution, broadcasting and/or transport of the audiovisual content. Examples include spatial complexity, temporal complexity, absence of audio signal, level of blurring in the video signal, speech intelligibility, level of blockiness in the image, presence of frozen frames, loss of audio channels, excessive presence of black frames in the video, level of distortion in high-frequency areas of the image, equalization imbalances between the audio channels, and mismatched colorimetric levels in the video. These measurements are made on the captured audiovisual information and are offered, together with their relevant statistics, as a set of significant measurements that forms the input feature vector for the next module. The range of potential values of each metric is known, which allows thresholds to be set so that results receive different treatment and priority when they are passed to the end user, either through the graphical interface (for example, as alarms) or through information compilation files generated by the invention, whether created in real time or managed as structured data in the system's storage.

3) Quality prediction module: based on mathematical algorithms, signal processing and machine learning techniques, it uses a programming model built on software pipes (pipelines) to establish communication between the different sub-modules employed, according to the user's settings for the system. The machine learning techniques are supported by training processes that use a wide set of labeled audiovisual content, which makes it possible to characterize the presence of different artifacts and the effect they have on the quality perceived or experienced at the time of consumption. The sub-modules integrated in the quality prediction module are: Lasso and Ridge regressors, support vector regressors (SVR), Random Forest and various deep learning models.
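As a minimal sketch of the parameter extraction attributed to the capture module in item 1, the Python below maps stream metadata onto the listed service parameters. It assumes, purely for illustration, that demultiplexing has already produced a dictionary shaped like the JSON stream listing emitted by `ffprobe -print_format json -show_streams`; the document does not say which tool the capture module uses, and the names `ServiceParameters` and `extract_service_parameters` are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Dict, Optional


@dataclass
class ServiceParameters:
    """Hypothetical container for the service parameters listed in item 1."""
    width: Optional[int] = None
    height: Optional[int] = None
    frame_rate: Optional[float] = None
    scan_type: Optional[str] = None        # ffprobe "field_order", e.g. "progressive"
    video_codec: Optional[str] = None
    video_bitrate: Optional[int] = None    # bit/s
    audio_codec: Optional[str] = None
    audio_channels: Optional[int] = None
    sample_rate: Optional[int] = None      # Hz
    audio_bitrate: Optional[int] = None    # bit/s


def _to_int(value: Any) -> Optional[int]:
    # ffprobe reports bit_rate and sample_rate as strings; absent or "N/A" -> None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_service_parameters(probe: Dict[str, Any]) -> ServiceParameters:
    """Fill ServiceParameters from the first video and first audio stream (assumed input shape)."""
    params = ServiceParameters()
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video" and params.video_codec is None:
            params.video_codec = stream.get("codec_name")
            params.width = stream.get("width")
            params.height = stream.get("height")
            params.scan_type = stream.get("field_order")
            params.video_bitrate = _to_int(stream.get("bit_rate"))
            rate = stream.get("avg_frame_rate") or stream.get("r_frame_rate")
            try:
                # rates arrive as fractions such as "25/1" or "30000/1001"
                params.frame_rate = float(Fraction(rate)) or None
            except (TypeError, ValueError, ZeroDivisionError):
                params.frame_rate = None   # e.g. "0/0" for unknown rates
        elif kind == "audio" and params.audio_codec is None:
            params.audio_codec = stream.get("codec_name")
            params.audio_channels = stream.get("channels")
            params.sample_rate = _to_int(stream.get("sample_rate"))
            params.audio_bitrate = _to_int(stream.get("bit_rate"))
    return params


probe = {"streams": [
    {"codec_type": "video", "codec_name": "h264", "width": 1920, "height": 1080,
     "field_order": "progressive", "avg_frame_rate": "25/1", "bit_rate": "6000000"},
    {"codec_type": "audio", "codec_name": "ac3", "channels": 6,
     "sample_rate": "48000", "bit_rate": "384000"},
]}
print(extract_service_parameters(probe))
```

Keeping only the first stream of each type is a simplification; a multi-program transport stream would need one parameter set per program.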

The features obtained from both the capture module (a vector with the parametric properties of the signal) and the no-reference quality metrics module (a vector with the values of the measurements for each analyzed time segment and their statistics up to second order) are used as input parameters for the quality prediction module which, after the different internal processing stages, delivers the quality result in its output layer. The prediction architectures used are designed for efficient computation, so that the system can work in real time, and are optimized for precision and freedom from errors.
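The feature vector described above can be sketched as follows, assuming decoded frames are available as 2-D lists of luma values (0 to 255). Only three of the listed metrics are approximated, using common textbook proxies rather than the document's own definitions: black frames by mean luma, frozen frames by the mean absolute difference between consecutive frames, and blur by the variance of a Laplacian. The thresholds and function names are hypothetical. Each per-frame series is summarized by its mean and variance (statistics up to second order) and appended to the parametric vector.

```python
from statistics import fmean, pvariance
from typing import List, Sequence

Frame = List[List[float]]  # one luma plane, values 0..255, at least 3x3 pixels

BLACK_LUMA_THRESHOLD = 20.0  # hypothetical: mean luma below this counts as black
FROZEN_MAD_THRESHOLD = 0.5   # hypothetical: mean |difference| below this counts as frozen


def mean_luma(frame: Frame) -> float:
    return fmean(v for row in frame for v in row)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    return fmean(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def laplacian_variance(frame: Frame) -> float:
    """Variance of a 4-neighbour Laplacian on interior pixels; low values suggest blur."""
    h, w = len(frame), len(frame[0])
    lap = [
        frame[i - 1][j] + frame[i + 1][j] + frame[i][j - 1] + frame[i][j + 1]
        - 4 * frame[i][j]
        for i in range(1, h - 1)
        for j in range(1, w - 1)
    ]
    return pvariance(lap)


def segment_features(frames: Sequence[Frame]) -> List[float]:
    """Per-frame metrics for one time segment, each summarised by mean and variance."""
    lumas = [mean_luma(f) for f in frames]
    sharpness = [laplacian_variance(f) for f in frames]
    black = [1.0 if y < BLACK_LUMA_THRESHOLD else 0.0 for y in lumas]
    # frozen flags compare each frame with its predecessor; one-frame segments get [0.0]
    frozen = [
        1.0 if mean_abs_diff(prev, cur) < FROZEN_MAD_THRESHOLD else 0.0
        for prev, cur in zip(frames, frames[1:])
    ] or [0.0]
    features: List[float] = []
    for series in (lumas, sharpness, black, frozen):
        features += [fmean(series), pvariance(series)]
    return features


def build_input_vector(service_vector: Sequence[float],
                       frames: Sequence[Frame]) -> List[float]:
    """Concatenate the parametric vector with the no-reference metric statistics."""
    return list(service_vector) + segment_features(frames)


gray_frame = [[128.0] * 4 for _ in range(4)]
black_frame = [[0.0] * 4 for _ in range(4)]
vector = build_input_vector([1920.0, 1080.0, 25.0], [gray_frame, gray_frame, black_frame])
print(len(vector))  # 11: 3 service parameters + 4 metrics x (mean, variance)
```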

When the present invention is implemented as a system embedded in compact software, instantiable as a virtual machine or container, the quality prediction for an audiovisual content can be calculated without software or hardware restrictions imposed by the deployment environment. Furthermore, the present invention can be deployed at any point, end to end, in the distribution, broadcast and/or transport chain.

One aspect of the present invention is a method for estimating perceived quality in an audiovisual signal. The audiovisual signal is transmitted over at least one audiovisual content communication network selected from a distribution network, a broadcast network and a transport network. The method comprises the following steps:

- capturing characteristic parameters of the data transport service, independently of the network, for both the video signal and the audio signal contained in the audiovisual signal;

- calculating a set of no-reference metrics, from the characteristic parameters of the data transport service, which provide an ordered set of parameters with the quality distortion levels of the audiovisual content; and

- obtaining a perceived quality value by applying mathematical algorithms, signal processing and machine learning techniques to the characteristic parameters of the service and the set of no-reference metrics.

In an embodiment of the method for estimating perceived quality in an audiovisual signal, the characteristic parameters of the service are selected from the image resolution, the number of channels and the sampling frequency of the audio, the bit rates used in audio and video coding, and combinations thereof.

In another embodiment of the method for estimating perceived quality in an audiovisual signal, calculating a set of no-reference metrics additionally comprises calculating an energy distribution as a function of frequency and detecting structures, in both the audio signal and the video signal, capable of producing perceptual discomfort as a result of encoding.
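To illustrate an energy distribution as a function of frequency, the sketch below computes, for one audio frame, the share of spectral energy falling in a few frequency bands, using a direct DFT from the Python standard library. The band edges and function names are hypothetical; a real implementation would use an FFT, and video would need a 2-D transform over image blocks instead. The document does not specify which transform it uses.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(samples: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2 via a direct DFT (O(N^2), fine for short frames)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_energy_fractions(samples: Sequence[float], sample_rate: float,
                          edges_hz: Sequence[float]) -> List[float]:
    """Fraction of spectral energy in each band [edges_hz[i], edges_hz[i + 1])."""
    n = len(samples)
    spectrum = power_spectrum(samples)
    total = sum(spectrum) or 1.0  # avoid dividing by zero on digital silence
    fractions = []
    for lo, hi in zip(edges_hz, edges_hz[1:]):
        band = sum(p for k, p in enumerate(spectrum) if lo <= k * sample_rate / n < hi)
        fractions.append(band / total)
    return fractions


# 1 kHz tone sampled at 8 kHz: almost all energy should land in the 500-2000 Hz band.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
print(band_energy_fractions(tone, rate, [0, 500, 2000, 4001]))
```

A codec that discards high frequencies would shift energy out of the upper bands relative to what the content type would lead one to expect, which is the kind of deviation such a distribution can expose.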

In another embodiment of the method for estimating perceived quality in an audiovisual signal, the mathematical algorithms, signal processing and machine learning techniques are Lasso and Ridge regressors, support vector regressors (SVR), Random Forest, and various Deep Learning models.
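Of the regressors named above, Ridge has the simplest closed form, w = (X^T X + alpha I)^(-1) X^T y, so it is sketched here in plain Python as a stand-in for the prediction stage (with the intercept left unpenalized). The class name, the default alpha and the toy data are assumptions. The document does not disclose its training data, hyperparameters or how the outputs of the different regressors are combined, and a production system would more likely rely on an established machine learning library.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


class RidgeQualityModel:
    """Ridge regression from an input feature vector to a quality score (sketch)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha            # hypothetical default regularisation strength
        self.weights: List[float] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "RidgeQualityModel":
        rows = [list(x) + [1.0] for x in X]  # trailing 1.0 column = intercept
        d = len(rows[0])
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
        for i in range(d - 1):               # penalise feature weights, not the intercept
            xtx[i][i] += self.alpha
        xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
        self.weights = _solve(xtx, xty)
        return self

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.weights, list(x) + [1.0]))


# Toy data following y = 2*x1 - x2 + 3 (illustrative, not real quality scores).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [3, 5, 2, 4, 4]
print(RidgeQualityModel(alpha=0.0).fit(X, y).predict([3, 1]))  # close to 8.0
```

With alpha = 0 the model reduces to ordinary least squares; raising alpha shrinks the feature weights, which helps when many correlated no-reference metrics are fed in together.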

Brief description of the figures

Figure 1 represents the functional blocks of the system. The drawing is made up of the following elements:

1) Capture module [1]

2) Module for calculating quality metrics without reference [2]

3) Quality prediction module [3]

4) Exchange of information between modules.

5) Structured information storage system [4]

6) Exchange of information with the structured information storage system.

7) User interface [5]

8) Information exchange with the interface.

Figure 2 shows a flow chart for the three main modules:

1) Capture module. Responsible for the acquisition of the audiovisual signal, its most significant parameters and the parameters of the service [1]

2) No-reference quality metrics module. Provides quality measures of the captured audiovisual content [2]

3) Quality prediction module. Provides the quality result obtained [3]

Figure 3 shows a flow chart of data exchange at the operational level in the following stages. The data exchanges are carried out through a data file with predefined coding.

1) Capture module [1]. Formed by two sub-modules that acquire the audiovisual stream and separate the service metadata [11] from the media content [12]. From these, the service information [13] and the audiovisual content information [14] are extracted. The former is sent to the quality prediction module [3], and the latter to the no-reference quality metrics module [2].

2) No-reference quality metrics module [2]. Provides its results to the quality prediction module [3].

3) Quality prediction module [3]. The quality result is transferred, by means of a data file with predefined coding, to the graphical user interface [5] and to the structured information storage system of the invention [4].
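The document only states that the modules exchange a data file with predefined coding. As a hedged illustration, the sketch below assumes JSON and a hypothetical per-segment record layout that groups the service information [13], the audiovisual content information [14] and the no-reference metric results [21]; the field names, function names and version check are not taken from the source.

```python
import json
from typing import Any, Dict

SCHEMA_VERSION = 1  # hypothetical; the document does not define the coding


def encode_segment_record(start_s: float, end_s: float, service: Dict[str, Any],
                          media: Dict[str, Any], metrics: Dict[str, float]) -> str:
    """Serialise one analysed time segment for exchange between modules."""
    record = {
        "version": SCHEMA_VERSION,
        "segment": {"start_s": start_s, "end_s": end_s},
        "service": service,   # service information [13]
        "media": media,       # audiovisual content information [14]
        "metrics": metrics,   # no-reference metric results [21]
    }
    return json.dumps(record, sort_keys=True)


def decode_segment_record(text: str) -> Dict[str, Any]:
    """Parse a record and reject unknown versions or missing sections."""
    record = json.loads(text)
    if record.get("version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported record version: {record.get('version')!r}")
    missing = [k for k in ("segment", "service", "media", "metrics") if k not in record]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return record


line = encode_segment_record(0.0, 10.0, {"height": 1080, "video_bitrate": 6000000},
                             {"mean_luma": 92.4}, {"black_ratio": 0.0})
print(decode_segment_record(line)["service"]["height"])  # 1080
```

Versioning the record lets the prediction module refuse input it was not trained for instead of silently misreading it.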

Figure 4 shows a flow chart according to the present invention at the level of presentation of results. The following are presented:

1) Graphic and textual information of the source (audiovisual content) under analysis [51]

2) Access to the system's structured storage system [52]

3) Graphic information on the quality of audiovisual content in real time from the quality prediction module [53]

4) Graphic information on the quality history of audiovisual content over a time interval [54]

5) Graphic and textual information of the most significant artifacts suffered by audiovisual content in the analysis period, categorized as alarms and warnings [55]

Detailed description of an embodiment

Specifically, this embodiment relates to a method based on the hybrid analysis of parametric properties, which can be extracted from the parameterization used in the audiovisual stream, and of intrinsic media properties. The method comprises capturing the multimedia signal [1], extracting objective parameters of the audiovisual signal and calculating objective metrics [2], and, taking these elements as input, providing by means of a mathematical algorithm (which makes use of signal processing and machine learning) a measure of audiovisual quality [3] for a networked audiovisual stream. It applies to different platforms for the distribution, broadcast and/or transport of audiovisual content, and can work with systems of constant or variable bandwidth and with constant, variable and adaptive bit rates.

The method provides a value equivalent to perceived quality, expressed on a limited numerical scale, after a sequential process that applies a model with three stages aimed at: 1) capturing the information from the audio and video stream [1]; 2) calculating a set of no-reference quality metrics on the audiovisual content [2]; and 3) providing the quality measure by using these data in a model based on mathematical algorithms (which makes use of signal processing and machine learning) that predicts perceived quality according to various operating modes, predefined a priori and characterized by the parameters considered to make the prediction [3].
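The operating modes can be pictured as predefined feature subsets: a cheaper mode consumes fewer parameters and runs faster, while a richer one trades speed for precision. In the sketch below the mode names, the feature names and the `select_features` helper are entirely hypothetical; the document does not enumerate its modes.

```python
from typing import Dict, List, Mapping

# Hypothetical operating modes: each lists, in order, the features the predictor consumes.
# Fewer features -> fewer metrics to compute -> faster, usually less precise.
OPERATING_MODES: Dict[str, List[str]] = {
    "fast": ["video_bitrate", "height", "frame_rate", "audio_bitrate"],
    "balanced": ["video_bitrate", "height", "frame_rate", "audio_bitrate",
                 "blockiness_mean", "black_ratio", "frozen_ratio"],
    "precise": ["video_bitrate", "height", "frame_rate", "audio_bitrate",
                "blockiness_mean", "blockiness_var", "blur_mean", "blur_var",
                "black_ratio", "frozen_ratio", "audio_silence_ratio"],
}


def select_features(mode: str, available: Mapping[str, float]) -> List[float]:
    """Return the ordered input vector for a mode; fail loudly on missing features."""
    try:
        names = OPERATING_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown operating mode: {mode!r}") from None
    missing = [n for n in names if n not in available]
    if missing:
        raise KeyError(f"features not computed for mode {mode!r}: {missing}")
    return [float(available[n]) for n in names]


computed = {"video_bitrate": 6e6, "height": 1080, "frame_rate": 25.0, "audio_bitrate": 384e3}
print(select_features("fast", computed))  # [6000000.0, 1080.0, 25.0, 384000.0]
```

Each mode would need its own trained predictor, since a model fitted on one feature order cannot consume another mode's vector.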

The data stream under analysis, coming from an audiovisual content distribution, broadcast and/or transport network, is captured by a module that collects the parameters characterizing the media [14] and the quality of the information transport service [13]. The parameters captured include, among others, the image resolution, the number of frames per second, the number of audio channels and the audio sampling frequency, and the bit rates used in audio and video encoding. In addition to these parameters, the module captures the audiovisual signal for further processing. This processing is carried out in the second [2] and third [3] modules, and is aimed at obtaining parameters of the audiovisual signal, such as the video luminance levels and their distribution, the hue and saturation values, the component frequencies and formants of the audio, and the entropy levels (of both audio and video), and at calculating no-reference quality metrics [21] on it. These metrics focus on calculating the energy distribution as a function of frequency and on detecting structures, in both the audio and the image, capable of producing perceptual discomfort as a result of encoding. The number of parameters considered varies with the previously selected operating mode, which sets the precision (complexity) of the algorithm and, consequently, its execution speed. These operating modes allow the system to run in real time regardless of the hardware and software resources available in its deployment environment. The parameters obtained from the service [13], together with the parameters of the audiovisual signal [14] and the measurements of the no-reference quality metrics [21], are inserted into a data file with predefined coding for exchange between the modules of the system.
All this information is fed into the quality prediction module [3] which, taking the input parameters compiled in the data file with predefined coding, returns a quality value on a limited numerical scale after the calculations performed in the prediction module, according to the selected operating mode. To do this, it passes the input data through a set of mathematical models such as Lasso and Ridge regressors, support vector regressors (SVR), Random Forest, and various Deep Learning models. In every operating mode, the result depends on the parameters obtained by means of mathematical algorithms, signal processing and the applied machine learning techniques. The system returns a numerical value on a limited scale, resulting from the processes executed by the different modules of the invention. This numerical value constitutes an assessment of the quality of the audiovisual content under analysis per predetermined time unit. Additionally, the invention offers textual and graphical information, temporally synchronized with the captured audiovisual signal, which allows quality reports to be generated afterwards and previously configured alarms to be raised in the system [55]. The invention includes a structured storage system, which provides scalability, performance and high availability for the entire set of analyzed audiovisual segments and the volume of data obtained over time [4].
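A minimal sketch of this output stage: each raw regressor output is clamped to a bounded reporting scale, one score per time unit, and tagged as an alarm or warning when it falls below configurable thresholds. The 1 to 5 MOS-like range, the 10-second time unit, the threshold values and all names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

SCALE_MIN, SCALE_MAX = 1.0, 5.0  # assumed MOS-like reporting scale
ALARM_BELOW = 2.0                # hypothetical, user-configurable thresholds
WARNING_BELOW = 3.0


@dataclass
class SegmentResult:
    start_s: float
    score: float
    level: Optional[str]  # "alarm", "warning" or None


def to_bounded_scale(raw: float) -> float:
    """Clamp a raw regressor output into [SCALE_MIN, SCALE_MAX]."""
    return min(SCALE_MAX, max(SCALE_MIN, raw))


def report(raw_scores: Sequence[float], unit_s: float = 10.0) -> List[SegmentResult]:
    """One bounded score per time unit, tagged with an alarm or warning level."""
    results = []
    for i, raw in enumerate(raw_scores):
        score = to_bounded_scale(raw)
        if score < ALARM_BELOW:
            level = "alarm"
        elif score < WARNING_BELOW:
            level = "warning"
        else:
            level = None
        results.append(SegmentResult(start_s=i * unit_s, score=score, level=level))
    return results


for r in report([4.7, 2.5, 0.3, 5.9]):
    print(r.start_s, r.score, r.level)
```

Regressors such as Ridge or SVR are not bounded by construction, which is why the clamp sits between the model output and the user-facing scale.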

The invention includes a graphical interface for visualization and interaction with the user [5] that allows the visualization of the real-time evolution of the quality measures provided by the system [53], the visualization of their temporal evolution over a time interval set by the user [54], and access to a set of additional information, configurable by the user, with exact temporal correspondence to various effects potentially present in the analyzed audiovisual content stream.

QoE measurement is a challenge not yet completely solved in the field of audiovisual content distribution, broadcasting and/or transport, where the original content (for use as a reference) is rarely or never available, and where the processing applied to the content, required to adapt it to the capabilities of the distribution, broadcast and/or transport network, has a very significant influence. Subjective evaluations are unfeasible because they are expensive and slow, and current objective models lack the flexibility and scalability needed to meet the growing demands of media services and applications and to ensure their best provision to end users.

In response to this challenge, the invention proposes a virtualizable software service that can be dynamically instantiated at different points, end to end, of the multimedia content distribution, broadcast and/or transport network. The invention makes it possible to provide a quality measure similar to the one a user could provide, using exclusively information from the network and the content itself, without relying on versions of that content prior to the processing stages that precede it in its distribution, broadcast and/or transport chain. The quality measure offered by the system, provided as a numerical value within a limited scale, comes from an algorithm developed with mathematical algorithms, signal processing and machine learning techniques, and can self-adapt using the information it collects during operation, relying on reinforcement learning techniques. The present invention overcomes the problems noted above through the features listed in the claims.

In particular, by means of compact software instantiable as a virtual machine or container, the invention allows the quality prediction to be calculated for audiovisual content with different properties (resolution in the case of video, encoding standard for both audio and video, bits per audio sample, bit rate for both audio and video, etc.). The result is offered as a number on an easily understood scale through a visual interface, and it can also be stored and later retrieved in a structured way.

Compared with other state-of-the-art proposals, this proposal has: 1) the capacity to operate in a virtualized way, allowing it to be dynamically instantiated at different points and for different durations through different virtualization technologies; 2) the ability to work on different content in parallel, or on the same content at different points in the audiovisual content distribution, broadcast and/or transport network; 3) the ability to capture and process the data resulting from its execution to improve its own measurement models, applying reinforcement learning techniques; 4) the ability to offer results on adjustable and even user-defined scales, taking into account different sensitivities and discrimination capabilities with respect to possible defects present in the content; 5) the ability to establish different storage structures for the data, work with them individually and in aggregate, and retrieve them according to different criteria set by the user.
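Point 4 above (adjustable and user-defined scales) could be served by a thin mapping layer on top of the internal score. The sketch assumes an internal 1 to 5 scale and shows a linear remapping onto an arbitrary user range, plus a lookup onto the five-grade ACR labels of ITU-T P.910 (cited earlier in this document). The function names are hypothetical, and the document does not say how its scales are defined.

```python
from typing import Tuple

ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def rescale(score: float, src: Tuple[float, float] = (1.0, 5.0),
            dst: Tuple[float, float] = (0.0, 100.0)) -> float:
    """Linearly map a score from the internal scale src onto a user scale dst."""
    (a, b), (c, d) = src, dst
    if a == b:
        raise ValueError("source scale must have non-zero width")
    clamped = min(max(score, min(a, b)), max(a, b))
    return c + (clamped - a) / (b - a) * (d - c)


def to_acr_label(score: float) -> str:
    """Nearest grade on the five-grade ACR scale (labels as in ITU-T P.910)."""
    return ACR_LABELS[int(min(5, max(1, round(score))))]


print(rescale(3.0))                    # 50.0
print(rescale(4.0, dst=(10.0, 1.0)))   # inverted user scale where 1 is best
print(to_acr_label(4.4))               # Good
```

A linear map preserves the ordering of scores; a user wanting finer discrimination in one region of the scale would need a non-linear mapping instead.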

The present invention does not claim the structured information storage system, nor the data presentation and visualization technologies linked to the graphical interface.

Industrial application

The application possibilities are manifold, since the invention offers numerous advantages. The system allows the quality of audiovisual content to be evaluated, offering a result that can be used to:

• Provide a value on a quality scale that allows audiovisual content services to be rated and discriminated, regardless of their nature and their distribution, broadcast or transport platform.

• Provide visual and reliable suggestions for network audiovisual service providers that allow them to guarantee the quality of the user experience.

• Provide alarms and indicators in situations of critical quality in audiovisual content distribution, broadcast and / or transport services that allow decision-making on their reconfiguration.

All of the above opens the possibility of using the system in:

• Quality verification of audiovisual contribution and distribution signals.

• Performance analysis of coding and statistical multiplexing models, applied to terrestrial multiplexes or satellite spectral segments.

• Analysis of the quality of services and network media programs and verification of compliance with the quality policies of audiovisual service providers.

• Determination of the impact of technical and service configurations on perceived audiovisual quality.

• Quantification of the impact on the audiovisual content of different distribution, broadcast and / or transport platforms for providers and content aggregators.

In general, the invention allows the realization of any application for monitoring and analyzing the quality of audiovisual content and audiovisual content services.

Claims

1. Computer-implemented method for estimating perceived quality in an audiovisual signal, wherein the audiovisual signal is transmitted over at least one audiovisual content communication network selected from a distribution network, a broadcast network and a transport network, the method comprising the following steps:

- capturing characteristic parameters of the data transport service, independently of the network, for both the video signal and the audio signal contained in the audiovisual signal (1);

- calculating a set of no-reference metrics, from the characteristic parameters of the data transport service, which provide an ordered set of parameters with the quality distortion levels of the audiovisual content (2); and

- obtaining a perceived quality value by applying mathematical algorithms, signal processing and machine learning techniques to the characteristic parameters of the service and the set of no-reference metrics (3).

2. Computer-implemented method for estimating perceived quality in an audiovisual signal according to claim 1, characterized in that the characteristic parameters of the service (13) are selected from the image resolution, the number of channels and the sampling frequency of the audio, the bit rates used in audio and video encoding, and combinations thereof.

3. Computer-implemented method for estimating perceived quality in an audiovisual signal according to claim 1, characterized in that calculating a set of no-reference metrics (21) additionally comprises calculating an energy distribution as a function of frequency and detecting structures, in both the audio signal and the video signal, capable of producing perceptual discomfort as a result of encoding.

4. Computer-implemented method for estimating perceived quality in an audiovisual signal according to claim 1, characterized in that the mathematical algorithms, signal processing and machine learning techniques are Lasso and Ridge regressors, support vector regressors (SVR), Random Forest, and various Deep Learning models (3).

5. System for estimating perceived quality in an audiovisual signal that is transmitted over at least one audiovisual content communication network selected from a distribution network, a broadcast network and a transport network, the system being deployed in a virtual machine and being characterized in that it comprises:

• a capture module configured to capture characteristic parameters of the data transport service, independently of the network, for both the video signal and the audio signal contained in the audiovisual signal (1);

• a module for calculating no-reference quality metrics configured to provide, based on the characteristic parameters of the transport service, an ordered set of parameters with the quality distortion levels of the audiovisual content (2); and

• a quality prediction module configured to obtain a perceived quality value by applying mathematical algorithms, signal processing and machine learning techniques to the characteristic parameters of the service and the set of no-reference metrics (3).

6. Computer program comprising instructions which, when the program is executed on a computer, cause the computer to carry out the steps of the computer-implemented method for estimating perceived quality in an audiovisual signal of any of claims 1 to 4.

7. Computer-readable medium containing the computer program of claim 6, which, when read and executed by a computer, implements the method claimed in any of claims 1 to 4.