CN114079777A - Video processing method and device - Google Patents

Video processing method and device

Info

Publication number
CN114079777A
Authority
CN
China
Prior art keywords
video
parameters
video stream
values
objective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010841624.1A
Other languages
Chinese (zh)
Inventor
马利
折小强
滕艺丹
刁文波
汪学斌
尹东明
苏敏
韩文勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010841624.1A
Publication of CN114079777A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 - Diagnosis, testing or measuring for television systems or their details

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the application provides a video processing method and apparatus, and relates to the field of data processing. The method comprises the following steps: a mobile edge computing device receives a plurality of first video streams from a plurality of first terminal devices; the mobile edge computing device obtains a second video stream, a background image, and description parameters according to the plurality of first video streams, the second video stream being a video stream synthesized from the plurality of first video streams; the mobile edge computing device determines values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters, the plurality of target indexes being related to human factors engineering indexes and objective video quality indexes; and the mobile edge computing device performs weighted computation on the values of the plurality of target indexes to obtain a processing result reflecting the quality of the second video stream. The embodiment of the application integrates subjective and objective evaluation methods, does not require real-time human participation, is easy to implement, and can obtain a more accurate processing result reflecting the video quality.

Description

Video processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video processing method and apparatus.
Background
With the development of image processing technology, live video broadcasting has become increasingly widespread. For example, camera devices for capturing video streams may be installed at a competition venue, and a video synthesis device synthesizes the video streams captured by the camera devices into a panoramic video, which is then distributed over a network medium or the like, so that the competition can be broadcast live to users.
In a possible design, in order to obtain a synthesized video of good quality, the quality of the synthesized video may be detected, so that synthesis optimization and the like can be performed with reference to that quality. For example, a synthesized video may be obtained, and related information of the synthesized video may be analyzed and statistically summarized; this related information can be used to reflect the video quality. Illustratively, the related information may include the mean squared error (MSE), the signal-to-noise ratio (SNR), the peak signal-to-noise ratio (PSNR), the root mean squared error (RMSE), and the like.
However, the video quality reflected in this approach is often inaccurate and does not provide a valid reference for optimizing video processing.
Disclosure of Invention
The embodiment of the application provides a video processing method and apparatus, in which a mobile edge computing (MEC) device combines subjective and objective parameters to obtain more accurate information reflecting video quality.
In a first aspect, an embodiment of the present application provides a video processing method, including: a mobile edge computing device receives a plurality of first video streams from a plurality of first terminal devices; the mobile edge computing device obtains a second video stream, a background image, and description parameters according to the plurality of first video streams, where the second video stream is a video stream synthesized from the plurality of first video streams, the background image is obtained by performing foreground-background processing on the second video stream, and the description parameters are used to describe scene information corresponding to the second video stream; the mobile edge computing device determines values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters, where the plurality of target indexes are related to human factors engineering indexes and objective video quality indexes; and the mobile edge computing device performs weighted computation on the values of the plurality of target indexes to obtain a processing result of the second video stream, where the processing result is used to reflect the quality of the second video stream. In the embodiment of the application, the target indexes are related to both the human factors engineering indexes and the objective video quality indexes, so that subjective and objective parameters can be combined to obtain the processing result reflecting the video quality. Compared with an objective video quality evaluation method, the processing result is more accurate; compared with a subjective video quality evaluation method, the method does not require real-time human participation and can be carried out automatically. Therefore, the video processing method provided by the embodiment of the application is easy to implement and can obtain a more accurate processing result reflecting the video quality.
In one possible implementation, the mobile edge computing device determining values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters includes: the mobile edge computing device collects values of a plurality of first parameters from the plurality of first video streams, the second video stream, the background image, and the description parameters, where the first parameters include general objective video parameters collected from the second video stream; parameters reflecting video processing quality collected from the second video stream, the plurality of first video streams, and the background image; parameters reflecting video processing algorithm quality; interaction parameters; and user-side parameters; and the mobile edge computing device determines, according to the values of the plurality of first parameters, values of a plurality of target indexes for scoring the second video stream. In this way, parameters covering both the objective quality of the video and human factors can be obtained, so that an accurate processing result can be obtained subsequently.
In one possible implementation, the mobile edge computing device determining, according to the values of the plurality of first parameters, values of a plurality of target indexes for scoring the second video stream includes: the mobile edge computing device determines values of a plurality of first video indexes according to the values of the plurality of first parameters, where the first video indexes include objective video quality indexes and correlation indexes determined according to correlated parameters among the plurality of first parameters; and the mobile edge computing device determines the values of the plurality of target indexes according to the values of the plurality of first video indexes. In the embodiment of the application, when the values of the plurality of target indexes are determined, some correlated first parameters can first be grouped into correlation indexes, and the values of the plurality of target indexes are then determined according to the correlation indexes and the objective video quality indexes. On one hand, this saves computation; on the other hand, the correlation indexes can comprehensively reflect the quality of the video, so more accurate values of the target indexes can be obtained when they are determined using the correlation indexes.
In one possible implementation, the mobile edge computing device determining the values of the plurality of target indexes according to the values of the plurality of first video indexes includes: the mobile edge computing device determines the value of the human factors engineering index according to the values of the plurality of first video indexes and a first model corresponding to each first video index, where the first model uses the value of a first video index to output a human-factors-related value corresponding to that first video index; and the mobile edge computing device determines the values of the plurality of target indexes according to the value of the human factors engineering index and the values of the objective video quality indexes. In this way, the influence of subjective factors on video quality can be clearly reflected on the basis of human factors engineering indexes, which helps obtain an accurate processing result.
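Purely to illustrate this two-stage structure (the patent does not specify concrete model forms), the following Python sketch assumes each first model is a simple logistic mapping from a first video index value to a human-factors-related value in [0, 1]; all names, coefficients, and the averaging step are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical "first models": each maps the value of one first video index to a
# human-factors-related value in [0, 1]. The logistic form and the (slope, midpoint)
# coefficients are made up purely for illustration.
FIRST_MODELS: Dict[str, Tuple[float, float]] = {
    "image_sharpness": (2.0, 3.0),
    "seam_invisibility": (1.5, 2.5),
    "operation_and_response_speed": (1.0, 2.0),
}

def first_model(index_value: float, slope: float, midpoint: float) -> float:
    """Output the human-factors-related value corresponding to one first video index."""
    return 1.0 / (1.0 + math.exp(-slope * (index_value - midpoint)))

def human_factors_value(first_video_indexes: Dict[str, float]) -> float:
    """Determine the value of the human factors engineering index from the values of
    the first video indexes and the first model of each index (simple average here)."""
    scores = [first_model(v, *FIRST_MODELS[name])
              for name, v in first_video_indexes.items() if name in FIRST_MODELS]
    return sum(scores) / len(scores) if scores else 0.0

def target_indexes(first_video_indexes: Dict[str, float],
                   objective_indexes: Dict[str, float]) -> Dict[str, float]:
    """Target index values derived from the human factors engineering value and the
    objective video quality index values (the exact grouping is application-specific)."""
    return {"human_factors_engineering": human_factors_value(first_video_indexes),
            **objective_indexes}
```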
In one possible implementation, the correlation index includes one or more of the following: image sharpness related to resolution and/or screen size and/or viewing distance, image fineness related to peak signal-to-noise ratio and/or information entropy, detail recognizability related to brightness contrast threshold and/or texture detection threshold and/or inter-frame brightness difference, seam invisibility related to structural similarity and/or edge difference spectrum information, brightness imbalance related to brightness contrast threshold and/or saturation, operation and response speed related to first-screen latency and/or viewpoint transition speed and/or video processing speed, ghosting related to foreground region area and/or foreground region motion speed and/or edge difference spectrum information, or distortion related to structural similarity and/or streak similarity.
In one possible implementation, the human factors engineering index includes one or more of the following: video information amount, operation and response experience, visual information fidelity, or video fluency; wherein the video information amount is related to image sharpness, image fineness, and/or detail recognizability; the operation and response experience is related to operation freedom richness, operation and response speed, and/or operation accuracy; visual information fidelity is related to seam invisibility, brightness imbalance, ghosting, and/or distortion; and video fluency is related to pause duration, pause frequency, image stability, and/or frame rate.
In one possible implementation, the general objective video parameters include one or more of the following parameters calculated through no-reference objective evaluation of the second video stream: resolution, frame rate, information entropy, saturation, peak signal-to-noise ratio, brightness contrast threshold, or texture detection threshold.
In one possible implementation, the parameters reflecting video processing quality include: image stability obtained through no-reference subjective evaluation, and one or more of the following parameters calculated through reference-based objective evaluation of the second video stream against the plurality of first video streams and the background image: peak signal-to-noise ratio, structural similarity, edge difference spectrum information, information entropy, inter-frame brightness difference, foreground region motion speed, or texture similarity.
In one possible implementation, the parameters reflecting the quality of the video processing algorithm include one or more of the following parameters calculated using a reference-free objective evaluation: algorithm temporal complexity or algorithm spatial complexity.
In one possible implementation, the interaction parameters include one or more of the following parameters: a sense of realism obtained through no-reference subjective evaluation, operation accuracy obtained through no-reference subjective evaluation, first-screen latency obtained through no-reference objective evaluation, field of view obtained through no-reference objective evaluation, pause duration obtained through no-reference objective evaluation, pause frame rate obtained through no-reference objective evaluation, or operation freedom richness obtained through no-reference objective evaluation.
In one possible implementation, the user-side parameters include one or more of the following parameters: a screen size obtained through no-reference objective evaluation or a viewing distance obtained through no-reference objective evaluation.
In one possible implementation, the objective video quality indicator includes one or more of the following: frame rate, field of view, realism, resolution, peak signal-to-noise ratio, or entropy.
In one possible implementation, the target indicator includes one or more of: viewing experience, interactive experience, or objective parameters; the viewing experience is related to video information amount, visual information fidelity, video fluency and/or sense of reality, the interaction experience is related to field angle and/or operation and response experience, and the objective parameters are related to resolution, frame rate, peak signal-to-noise ratio and/or information entropy.
In one possible implementation, the description parameters include one or more of the following: scene number, scene type, video stream brightness range, video stream shooting angle, long shot, close shot, match scene, or static scene.
In one possible implementation manner, the method further includes: and the mobile edge computing equipment sends the processing result and the second video stream to the second terminal equipment. In this way, the second terminal device can play the second video stream and/or adjust the second video stream with reference to the processing result.
In one possible implementation manner, the method further includes: the mobile edge computing device adjusts the second video stream according to the processing result to obtain an adjusted video stream; and the mobile edge computing device sends the adjusted video stream to the second terminal device. In this way, the MEC device can adjust the video stream according to the processing result to obtain a video stream of better quality, send that video stream to the second terminal device, and have it displayed on the second terminal device.
In a second aspect, an embodiment of the present application provides a video processing apparatus. The video processing apparatus may be an MEC device, or may be a chip or a chip system in the MEC device. The video processing apparatus may include a processing unit and a communication unit. When the video processing apparatus is an MEC device, the processing unit may be a processor. The video processing apparatus may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored by the storage unit to enable the MEC device to implement the video processing method described in the first aspect or any one of the possible implementation manners of the first aspect. When the video processing apparatus is a chip or a chip system within an MEC device, the processing unit may be a processor. The processing unit executes the instructions stored by the storage unit to cause the MEC device to implement the video processing method described in the first aspect or any one of the possible implementation manners of the first aspect. The storage unit may be a storage unit (e.g., a register, a cache, etc.) within the chip, or may be a storage unit (e.g., a read-only memory, a random access memory, etc.) within the MEC device but outside the chip.
Illustratively, the communication unit is configured to receive a plurality of first video streams from a plurality of first terminal devices; the processing unit is configured to obtain a second video stream, a background image, and description parameters according to the plurality of first video streams, where the second video stream is a video stream synthesized from the plurality of first video streams, the background image is obtained by performing foreground-background processing on the second video stream, and the description parameters are used to describe scene information corresponding to the second video stream; the processing unit is further configured to determine values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters, where the plurality of target indexes are related to human factors engineering indexes and objective video quality indexes; and the processing unit is further configured to perform weighted computation on the values of the plurality of target indexes to obtain a processing result of the second video stream, where the processing result is used to reflect the quality of the second video stream.
In a possible implementation, the processing unit is specifically configured to collect values of a plurality of first parameters from the plurality of first video streams, the second video stream, the background image, and the description parameters, where the first parameters include general objective video parameters collected from the second video stream; parameters reflecting video processing quality collected from the second video stream, the plurality of first video streams, and the background image; parameters reflecting video processing algorithm quality; interaction parameters; and user-side parameters; and to determine, according to the values of the plurality of first parameters, values of a plurality of target indexes for scoring the second video stream.
In a possible implementation, the processing unit is specifically configured to determine values of a plurality of first video indicators according to values of a plurality of first parameters; the first video index comprises an objective video quality index and a correlation index determined according to a correlated parameter in the plurality of first parameters; and determining values of a plurality of target metrics based on the values of the plurality of first video metrics.
In a possible implementation manner, the processing unit is specifically configured to determine a value of a human factor engineering index according to values of a plurality of first video indexes and a first model corresponding to each first video index; the first model is used for outputting a human factor engineering related value corresponding to the first video index by using the value of the first video index; and determining values of the plurality of target indexes according to the values of the human factor engineering indexes and the values of the objective video quality indexes.
In one possible implementation, the correlation index includes one or more of the following: image sharpness related to resolution and/or screen size and/or viewing distance, image fineness related to peak signal-to-noise ratio and/or information entropy, detail recognizability related to brightness contrast threshold and/or texture detection threshold and/or inter-frame brightness difference, seam invisibility related to structural similarity and/or edge difference spectrum information, brightness imbalance related to brightness contrast threshold and/or saturation, operation and response speed related to first-screen latency and/or viewpoint transition speed and/or video processing speed, ghosting related to foreground region area and/or foreground region motion speed and/or edge difference spectrum information, or distortion related to structural similarity and/or streak similarity.
In one possible implementation, the human factors engineering index includes one or more of the following: video information amount, operation and response experience, visual information fidelity, or video fluency; wherein the video information amount is related to image sharpness, image fineness, and/or detail recognizability; the operation and response experience is related to operation freedom richness, operation and response speed, and/or operation accuracy; visual information fidelity is related to seam invisibility, brightness imbalance, ghosting, and/or distortion; and video fluency is related to pause duration, pause frequency, image stability, and/or frame rate.
In one possible implementation, the general objective video parameters include one or more of the following parameters calculated through no-reference objective evaluation of the second video stream: resolution, frame rate, information entropy, saturation, peak signal-to-noise ratio, brightness contrast threshold, or texture detection threshold.
In one possible implementation, the parameters reflecting video processing quality include: image stability obtained through no-reference subjective evaluation, and one or more of the following parameters calculated through reference-based objective evaluation of the second video stream against the plurality of first video streams and the background image: peak signal-to-noise ratio, structural similarity, edge difference spectrum information, information entropy, inter-frame brightness difference, foreground region motion speed, or texture similarity.
In one possible implementation, the parameters reflecting the quality of the video processing algorithm include one or more of the following parameters calculated using a reference-free objective evaluation: algorithm temporal complexity or algorithm spatial complexity.
In one possible implementation, the interaction parameters include one or more of the following parameters: a sense of realism obtained through no-reference subjective evaluation, operation accuracy obtained through no-reference subjective evaluation, first-screen latency obtained through no-reference objective evaluation, field of view obtained through no-reference objective evaluation, pause duration obtained through no-reference objective evaluation, pause frame rate obtained through no-reference objective evaluation, or operation freedom richness obtained through no-reference objective evaluation.
In one possible implementation, the user-side parameters include one or more of the following parameters: a screen size obtained through no-reference objective evaluation or a viewing distance obtained through no-reference objective evaluation.
In one possible implementation, the objective video quality indicator includes one or more of the following: frame rate, field of view, realism, resolution, peak signal-to-noise ratio, or entropy.
In one possible implementation, the target indicator includes one or more of: viewing experience, interactive experience, or objective parameters; the viewing experience is related to video information amount, visual information fidelity, video fluency and/or sense of reality, the interaction experience is related to field angle and/or operation and response experience, and the objective parameters are related to resolution, frame rate, peak signal-to-noise ratio and/or information entropy.
In one possible implementation, the description parameters include one or more of the following: scene number, scene type, video stream brightness range, video stream shooting angle, long shot, close shot, match scene, or static scene.
In a possible implementation manner, the communication unit is further configured to send the processing result and the second video stream to the second terminal device.
In a possible implementation manner, the processing unit is further configured to adjust the second video stream according to the processing result to obtain an adjusted video stream; and the communication unit is also used for sending the adjusted video stream to the second terminal equipment.
In a third aspect, an embodiment of the present application provides an electronic device, including: means for performing the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor configured to invoke a program in a memory to perform the first aspect or any of the possible implementation manners of the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor and interface circuitry for communicating with other devices; the processor is configured to execute the code instructions to implement the first aspect or any of its possible implementation manners.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing instructions that, when executed, implement the first aspect or any of the possible implementation manners of the first aspect.
It should be understood that the second aspect to the sixth aspect of the embodiments of the present application correspond to the technical solutions of the first aspect of the embodiments of the present application, and beneficial effects obtained by various aspects and corresponding possible implementations are similar and will not be described again.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic diagram of an MEC network architecture provided in an embodiment of the present application;
fig. 3 is a schematic diagram of an MEC network architecture provided in an embodiment of the present application;
fig. 4 is a schematic diagram of an MEC network architecture provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 6 is a conceptual diagram for constructing a video processing model according to an embodiment of the present application;
fig. 7 is a schematic diagram illustrating an architecture of a video processing model according to an embodiment of the present application;
fig. 8 is a schematic view of a video processing scene according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating processing logic of a video processing model according to an embodiment of the present application;
fig. 10 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic hardware configuration diagram of a video processing apparatus according to an embodiment of the present disclosure.
Detailed Description
In the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same or similar items having substantially the same function and action. For example, the first terminal device and the second terminal device are only used for distinguishing different terminal devices, and the sequence order of the terminal devices is not limited. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
In a virtual reality live (VR live) scene, in order to achieve a live-broadcast effect with high definition and a large field of view, a panoramic video stream is often obtained by splicing single-channel video streams acquired by a plurality of camera devices. In order to achieve a higher value-added effect, an image or video advertisement may also need to be merged into the VR video. The quality of the processed video therefore needs to be detected, and video parameters and the like need to be adjusted according to the quality detection result.
The method and the device can be applied to application scenes such as live broadcast and the like which need video synthesis. Fig. 1 shows, by way of example, one possible application scenario of the embodiment of the present application, where a plurality of image capturing apparatuses 11, a video processing apparatus 12 for executing the video processing method of the embodiment of the present application, and an optional display apparatus 13 may be included in the application scenario.
In one embodiment, a plurality of image capturing devices 11 (also referred to as first terminal devices) may be installed in a live broadcast site, each image capturing device 11 may capture a video stream of a respective area, each image capturing device 11 may send the respective captured video stream to the video processing device 12, the video processing device 12 may synthesize the video streams from the plurality of image capturing devices 11, detect the quality of the synthesized video streams, obtain a processing result for reflecting the video quality, and further may adjust the synthesized video and the like according to the processing result to obtain a video meeting the current quality requirement. In a possible implementation, the adjusted video stream or the unadjusted composite video stream may also be sent to the display device 13 (which may also be referred to as a second terminal device) to display the video content in the display device 13.
The detection of the quality of the synthesized video plays an important role in subsequent video processing and the like. In possible implementations, the method for detecting the quality of the synthesized video may include a subjective video quality assessment method and an objective video quality assessment method.
For example, a subjective video quality assessment method may use people as the judges of video quality: a certain number of experimenters are first selected and asked to observe the video to be assessed, and then to give intuitive video assessment results. The evaluation scores of all experimenters are recorded, after which the overall average of the evaluation scores is calculated. The obtained value is the subjective score of video quality, and this average is also called the mean opinion score (MOS). According to the test method used, subjective video quality evaluation methods can be divided into three categories: quality testing, in which an experimenter directly gives an intuitive quality grade for the video to be evaluated; impairment testing, in which an experimenter observes the video and evaluates its degree of impairment; and comparison testing, in which an experimenter observes the video to be evaluated together with a reference video and gives the relative quality of the video to be evaluated compared with the reference video.
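For context, the mean opinion score described above is simply the arithmetic mean of the experimenters' ratings; a minimal Python sketch with made-up scores is shown below.

```python
from typing import List

def mean_opinion_score(ratings: List[float]) -> float:
    """Average the quality scores given by all experimenters (the MOS)."""
    return sum(ratings) / len(ratings)

# Example: five experimenters rate the video to be assessed on a 1-5 scale.
print(mean_opinion_score([4.0, 3.5, 4.5, 3.0, 4.0]))  # 3.8
```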
Illustratively, an objective video quality evaluation method seeks a suitable computational model that automatically outputs an evaluation result of the video quality, so that the final evaluation result matches the result perceived by human eyes as closely as possible. For example, when an objective video quality evaluation method is implemented, specific information of the video is obtained first, and the relevant information of the video is then analyzed and summarized using the computational model, yielding data that reflect the video quality level. Widely used data include the mean squared error (MSE), the signal-to-noise ratio (SNR), the peak signal-to-noise ratio (PSNR), the root mean squared error (RMSE), and the like. In some possible implementations, human visual characteristics may be integrated into objective video quality evaluation so as to obtain objective evaluation methods that match subjective evaluation results as closely as possible; such objective evaluation methods may include a full-reference method, a partial-reference method (which may also be referred to as a half-reference method), and a no-reference method. In the full-reference method, an undistorted original image needs to be provided, and the quality evaluation result of the synthesized image is obtained by comparing the undistorted original image with the synthesized image. In the partial-reference method, the quality evaluation result of the synthesized image is obtained by comparing reference features extracted from the original image with the synthesized image. In the no-reference method, the quality of the image is estimated from the characteristics of the synthesized image itself, without a reference image.
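As an illustration of the kind of full-reference objective data listed above, the following sketch computes the MSE and PSNR between a reference frame and a synthesized frame using NumPy (8-bit frames assumed); it only restates these standard formulas and is not part of the claimed method.

```python
import numpy as np

def mse(reference: np.ndarray, synthesized: np.ndarray) -> float:
    """Mean squared error between two frames of identical shape."""
    diff = reference.astype(np.float64) - synthesized.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, synthesized: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher values indicate less distortion."""
    error = mse(reference, synthesized)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / error))
```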
However, among the above methods, the subjective video quality assessment method uses a large number of people as observers to obtain quality scores for the video, so the assessment result directly reflects the subjective perception of human eyes and therefore has strong reference value (since the video quality ultimately has to satisfy the subjective experience of the user). However, because the video evaluation work of the subjective method is completed manually, the following problems exist: a large number of experimenters are needed, and the experimenters need to be trained in order to obtain accurate evaluation results, which consumes a large amount of manpower and material resources and takes a long time; individual differences between experimenters are large, so different experimenters respond differently to the same video, and the subjective perception of an experimenter is itself subject to bias, making the video quality evaluation result unreliable; because people are involved, the experiment process cannot be monitored in real time; and because the final video quality score is obtained by averaging the experimenters' results, the cause of a problem cannot be accurately identified when the score is low.
The objective video quality evaluation method does not require human participation; it is simple, fast, and inexpensive to implement, and is therefore widely used in video quality evaluation. However, the video quality result obtained by an objective method does not take people's subjective perception into account, so the result may be inconsistent with the subjective evaluation result. The large and complex amount of computation of the full-reference method makes it difficult to realize in practical applications. In the partial-reference method, the reference features extracted from the original video content are used as the information features of the original video, but these reference features change continuously as the video content changes, so it is difficult to map differences in feature values to evaluation levels. The evaluation result of the no-reference method fits subjective human perception relatively poorly, and the method is difficult to implement.
Based on this, the embodiment of the present application provides a video processing method in which a mobile edge computing (MEC) device combines subjective and objective parameters to obtain more accurate information reflecting video quality.
Illustratively, in conjunction with fig. 1, if the video processing device is a mobile edge computing device, then the mobile edge computing device 12 receives a plurality of first video streams from a plurality of first terminal devices 11; the mobile edge computing device 12 obtains a second video stream, a background image, and description parameters according to the plurality of first video streams, where the second video stream is a video stream synthesized from the plurality of first video streams, the background image is obtained by performing foreground-background processing on the second video stream, and the description parameters are used to describe scene information corresponding to the second video stream; the mobile edge computing device 12 determines values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters, where the plurality of target indexes are related to human factors engineering indexes and objective video quality indexes; and the mobile edge computing device 12 performs weighted computation on the values of the plurality of target indexes to obtain a processing result of the second video stream, where the processing result is used to reflect the quality of the second video stream. In the embodiment of the application, the target indexes are related to both the human factors engineering indexes and the objective video quality indexes, so that subjective and objective parameters can be combined to obtain the processing result reflecting the video quality. Compared with an objective video quality evaluation method, the processing result is more accurate; compared with a subjective video quality evaluation method, the method does not require real-time human participation and can be carried out automatically. Therefore, the video processing method provided by the embodiment of the application is easy to implement and can obtain a more accurate processing result reflecting the video quality.
The method of the embodiment of the present application may be applied to a mobile edge computing (MEC) system, and the MEC system may be deployed in a long term evolution (LTE) system, a fifth generation (5G) mobile communication system, or a future mobile communication system.
MEC runs at the edge of the network, is logically independent of other parts of the network, and is suitable for applications with higher security requirements, and MEC devices generally have higher computing power and are therefore suitable for analyzing and processing large amounts of data (e.g., video data in embodiments of the present application). Meanwhile, as the MEC is generally closer to the user or the information source geographically, the time delay of the network for responding to the user request is greatly reduced, and the possibility of network congestion occurring in the transmission network and the core network is also reduced. And the MEC at the edge of the network can acquire network data such as base station Identification (ID), available bandwidth and the like and information related to the user location in real time, so as to perform link-aware adaptation, and provide a deployment possibility for location-based applications, which can greatly improve the service quality experience of the user.
The MEC device may be deployed at multiple locations of the network, for example, at an LTE macro base station (eNode B) side, a 3G Radio Network Controller (RNC) side, a multi-radio access technology (multi-RAN) cellular aggregation point side, or a core network edge, and the like, and the specific deployment of the MEC in the embodiments of the present application is not limited.
Exemplarily, fig. 2 and 3 show schematic views of a scenario in which the MEC is deployed at a Radio Access Network (RAN) side or at a position close to the RAN.
As shown in fig. 2, the MEC device 23 may be deployed behind a convergence node of a plurality of base stations 22 on the RAN side, and a plurality of first terminal devices 21 may access the MEC device 23 through the plurality of base stations 22.
As shown in fig. 3, the MEC equipment 33 may also be deployed behind a single base station 32, and multiple MEC equipment 33 may be aggregated into a single aggregation node, which is suitable for deployment of MECs in hot spot areas such as schools, shopping malls, stadiums, etc.
The advantage of deploying the MEC device on the RAN side is that base-station-side radio-related information can be obtained more conveniently by monitoring and analyzing the signaling of the S1 interface, and a low-latency localized service can be provided. Through MEC devices, not only can the network load on the core network be effectively reduced, but a highly real-time, low-latency VR experience can also be provided through localized deployment.
Exemplarily, fig. 4 shows a schematic view of a scenario in which the MEC is deployed on the core network side.
As shown in fig. 4, the MEC device 43 may be deployed at the edge of the core network, for example, behind (or integrated with) a packet data network gateway (PGW). A data service initiated by the first terminal device 41 reaches the internet through the base station 42, the aggregation node, the serving gateway (SGW), and the PGW + MEC device.
It can be understood that, with the development of a communication system, the deployment of the MEC device may also be changed along with the development of the communication system, and the embodiment of the present application does not limit the specific deployment of the MEC device, and the MEC device may be used to execute the video processing method according to the embodiment of the present application no matter in which scene the MEC device is deployed.
Some words of the embodiments of the present application are described below. The description is for the purpose of better illustrating the examples of the present application and should not be taken as an absolute limitation on the terminology used in the examples of the present application.
The mobile edge computing device described in the embodiment of the present application may be a device for implementing the MEC function, and may include, for example, an independent device such as a server or a network element, or may include a plurality of devices that implement the MEC function together.
The first terminal device described in the embodiments of the present application may have an image pickup function and a communication function. Illustratively, the first terminal device may be a camera device having a capability of communicating with the MEC device. Or, for example, the first terminal device may include a separate image capturing device and a separate electronic device communicating with the image capturing device, and the image capturing device may capture the video stream and then transmit the video stream to the MEC device via the electronic device. For example, the first terminal device may include one or more of a smart camera, a computer, a cell phone, a tablet, a wearable device, and the like. The number of the first terminal devices may be multiple, and the specific form and number of the first terminal devices are not limited in this embodiment of the application.
The second terminal device described in the embodiments of the present application may have a display function and a communication function. Based on the communication function, the second terminal device may receive data such as a video stream from the MEC device, and based on the display function, the second terminal device may display the content of the video stream. Illustratively, the second terminal device may include one or more of a computer, a cell phone, a tablet, a wearable device, a television, and the like. The number of the second terminal devices may be one or more, and the specific form and number of the second terminal devices are not limited in the embodiments of the present application. In a specific implementation, the second terminal device may be the same as or different from the first terminal device.
The first video stream described in the embodiment of the present application may be an original video stream captured by a first terminal device. Taking as an example a first terminal device that is a 360-degree multi-channel camera parallel to the ground, 360-degree surround shooting generates multiple (e.g., 4-18) video streams, each of which may be referred to as a first video stream.
The second video stream described in the embodiments of the present application may be a video stream obtained by synthesizing a plurality of first video streams. Illustratively, video streams shot by multiple cameras are sent to VR content splicing generation equipment (e.g., MEC equipment) and combined into one VR360 panoramic video stream, and the VR360 panoramic video stream may be referred to as a second video stream.
The background image described in the embodiment of the present application may include the static background, and may be obtained by performing foreground-background processing on the second video stream. For example, the second video stream may contain a static background and dynamic objects (or dynamic persons, etc.), and the moving objects and the static background may be identified based on a general foreground-background model. Because changes in the position of a moving object may cause ghosting and shaking during video synthesis, ghosting, shaking, and the like in the second video stream can be quantitatively represented by extracting parameters of the background image, so that ghosting and shaking can be included as influencing factors when detecting the video quality, which helps obtain a more accurate video quality detection result.
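As one example of a general foreground-background model of the kind referred to here (the embodiment does not mandate a particular algorithm), OpenCV's MOG2 background subtractor can separate moving objects from the static background; the video path below is a placeholder.

```python
import cv2

def extract_background(video_path: str, history: int = 200):
    """Run a general background-subtraction model over the frames of a video and
    return the learned static background image (or None if the video is empty)."""
    subtractor = cv2.createBackgroundSubtractorMOG2(history=history, detectShadows=False)
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        subtractor.apply(frame)  # updates the background model; returns the foreground mask
    capture.release()
    return subtractor.getBackgroundImage()

background = extract_background("second_video_stream.mp4")  # placeholder path
```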
The description parameters described in the embodiments of the present application are used to describe scene information corresponding to the second video stream. The scene information may include, for example, one or more of the following: scene number, scene type, video stream brightness range, video stream shooting angle, long shot, close shot, match scene, static scene, and the like. The scene information may be obtained by the MEC device based on a general scene judgment model, or may be determined by the user and then sent to the MEC device by the user's device.
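A simple container for such description parameters might look as follows; the field names mirror the examples above and are purely illustrative.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DescriptionParameters:
    """Scene information describing the second video stream (illustrative fields)."""
    scene_number: int
    scene_type: str                    # e.g. "match" or "static"
    brightness_range: Tuple[int, int]  # video stream brightness range
    shooting_angle: float              # video stream shooting angle, in degrees
    is_long_shot: bool
    is_close_shot: bool

params = DescriptionParameters(
    scene_number=2, scene_type="match",
    brightness_range=(30, 220), shooting_angle=45.0,
    is_long_shot=True, is_close_shot=False,
)
```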
The target index described in the embodiments of the present application is related to a human factor engineering index and an objective video quality index. The human factor engineering index is related to subjective reaction of a person on video quality and the like, the objective video quality index is related to objective quality of the video, and specific contents of the target index, the human factor engineering index and the objective video quality index are not limited in the embodiment of the application. The specific possible implementation manners of the target index, the human factor engineering index and the objective video quality index will be described in detail in the following implementation, and will not be described herein.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following embodiments may be implemented independently or in combination, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 5 is a schematic flowchart of a video processing method according to an embodiment of the present application, including the following steps:
s501: a mobile edge computing device (MEC device) receives a plurality of first video streams from a plurality of first end devices.
In this embodiment of the application, a plurality of first terminal devices may be arranged at different locations, and the plurality of first terminal devices may acquire first video streams at those different locations and send the plurality of first video streams to the MEC device; correspondingly, the MEC device receives the plurality of first video streams from the plurality of first terminal devices.
For example, referring to the corresponding scene diagram of fig. 2, the plurality of first terminal devices may send the plurality of first video streams to the MEC device based on the plurality of base stations. In a possible implementation manner, when multiple first terminal devices are served by the same base station, the multiple first terminal devices may also send multiple first video streams to the MEC device based on one base station.
In possible implementation, for example, a live scene of a ball game is taken as an example, a plurality of first terminal devices may be arranged at a plurality of positions in a ball game field, and the plurality of first terminal devices may record a plurality of first video streams at different positions and angles of the ball game and send the plurality of first video streams to the MEC device.
S502: the mobile edge computing device obtains a second video stream, a background image, and description parameters from the plurality of first video streams.
In this embodiment of the application, the MEC device may perform processing such as splicing and merging on the plurality of first video streams in a normal video composition manner, so as to obtain the second video stream. The MEC device may perform foreground and background recognition on the second video stream by using a normal foreground and background recognition model to obtain a background image. The MEC device may identify a plurality of first video streams or second video streams to obtain description parameters describing scene information corresponding to the second video streams, or the MEC device may receive the description parameters from other devices.
S503: the mobile edge computing device determines values of a plurality of target metrics from the plurality of first video streams, the second video stream, the background image, and the description parameters; the plurality of target indicators are related to a human factor engineering indicator and an objective video quality indicator.
In an embodiment of the application, the MEC device may extract parameters related to video quality from the plurality of first video streams, the second video stream, the background image, and the description parameters, and further determine the values of target indexes related to the human factors engineering indexes and the objective video quality indexes.

In a possible implementation manner, the MEC device may use the parameter acquisition approaches of general no-reference and reference-based methods to collect the parameters related to video quality from the plurality of first video streams, the second video stream, the background image, and the description parameters. The parameters related to video quality may include, for example, parameters related to subjective evaluation and parameters related to objective evaluation, which is not specifically limited in this embodiment of the present application.

In a possible implementation manner, after extracting the parameters related to video quality from the plurality of first video streams, the second video stream, the background image, and the description parameters, the MEC device may perform weighted calculation or averaging on the values of the parameters to obtain the values of the plurality of target indexes.

In a possible implementation manner, after extracting the parameters related to video quality from the plurality of first video streams, the second video stream, the background image, and the description parameters, the MEC device may further classify and compute on the parameters to obtain human factors engineering indexes and objective video quality indexes, and perform a weighting operation or other operations based on the values of the human factors engineering indexes and the objective video quality indexes to obtain the values of the plurality of target indexes.
S504: the mobile edge computing equipment performs weighted computation on the values of the target indexes to obtain a processing result of the second video stream; the processing result is used to reflect the quality of the second video stream.
In the embodiment of the present application, the weight of each target index may be set based on experience, or may be obtained based on machine learning, and the weight of each target index is not limited in the embodiment of the present application.
In a possible implementation manner, when performing weighted calculation on values of a plurality of target indexes, the weighted calculation may be based on a linear function, or may be based on other weighted calculation functions, which is not limited in the embodiment of the present application.
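A minimal sketch of this weighted computation is shown below, assuming the weights are an empirically chosen dictionary (they could equally be produced by machine learning, as noted above) and the target index values are already normalized; the names and numbers are hypothetical.

```python
from typing import Dict

# Hypothetical, empirically chosen weights for the target indexes; in practice they
# could also be learned, and other (non-linear) combining functions are possible.
TARGET_INDEX_WEIGHTS: Dict[str, float] = {
    "viewing_experience": 0.4,
    "interactive_experience": 0.3,
    "objective_parameters": 0.3,
}

def processing_result(target_index_values: Dict[str, float],
                      weights: Dict[str, float] = TARGET_INDEX_WEIGHTS) -> float:
    """Linear weighted combination of target index values; the result reflects
    the quality of the second video stream."""
    return sum(weights[name] * target_index_values.get(name, 0.0) for name in weights)

score = processing_result({"viewing_experience": 0.8,
                           "interactive_experience": 0.7,
                           "objective_parameters": 0.9})  # 0.8*0.4 + 0.7*0.3 + 0.9*0.3 = 0.8
```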
In a possible implementation manner, the processing result of the second video stream may be a specific numerical value or an identifier used to indicate a video quality level, and the embodiment of the present application does not limit a specific form of the processing result.
In a possible implementation manner, after the processing result of the second video stream is obtained, the processing result may be sent to a device for managing and controlling video quality, and a user may refer to the processing result to adjust the effect of video stream composition in the device for managing and controlling video quality by adjusting parameters of video stream composition and the like. For example, if the processing result indicates that the video quality of the current second video stream is poor, the method for synthesizing the second video stream may be adjusted to obtain a second video stream with better quality. If the processing result indicates that the video quality of the current second video stream is good, the method of synthesizing the second video stream can be maintained. It can be understood that the process of adjusting the video stream composition manner by referring to the processing result may be continuously executed, for example, the MEC device may obtain a new processing result according to a certain period and send the new processing result to the device for managing and controlling video quality, and periodically execute the above step of adjusting the method of composing the second video stream.
In a possible implementation manner, after the processing result of the second video stream is obtained, the MEC device may adjust the manner of synthesizing the second video stream based on the processing result, and then evaluate the adjusted second video stream again in the same way the processing result was obtained, so as to obtain a second video stream with better quality.
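A possible sketch of this check-and-adjust loop (covering both the periodic evaluation described above and the adjustment performed by the MEC device itself) is shown below. The synthesize, evaluate, and adjust callables, the quality threshold, and the evaluation period are hypothetical placeholders for the steps described above, not interfaces defined by this embodiment.

```python
import time

def quality_control_loop(synthesize, evaluate, adjust, first_streams, params,
                         threshold=0.8, period_s=10.0, rounds=3):
    """Periodically re-synthesize the second video stream, evaluate it (S502-S504),
    and adjust the composition parameters whenever quality falls below threshold."""
    for _ in range(rounds):
        second_stream = synthesize(first_streams, params)   # compose second video stream
        score = evaluate(first_streams, second_stream)      # processing result of S504
        if score < threshold:                               # poor quality: retune composition
            params = adjust(params, score)
        time.sleep(period_s)
    return params
```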
In summary, when the MEC device performs video processing, subjective and objective evaluation methods are integrated. Compared with a purely objective video quality evaluation method, the processing result can be more accurate; compared with a purely subjective video quality evaluation method, real-time human participation is not needed and the method can be implemented automatically.
On the basis of the corresponding embodiment in fig. 5, in a possible implementation manner, the determining, by the mobile edge computing device in S503, values of a plurality of target indexes according to the plurality of first video streams, the second video stream, the background image, and the description parameters includes:
the mobile edge computing device collects values of a plurality of first parameters from the plurality of first video streams, the second video stream, the background image, and the description parameters; the first parameters comprise general objective video parameters acquired from the second video stream, parameters reflecting video processing quality acquired from the second video stream, a plurality of first video streams and background images, parameters reflecting video processing algorithm quality, interaction parameters and user side parameters; the mobile edge computing device determines values of a plurality of target metrics for the second video stream score based on the values of the plurality of first parameters.
In the embodiment of the application, the general objective video parameters may be acquired by using a non-reference mode in a general non-reference video evaluation method, and the general objective video parameters may be used to reflect an objective quality condition of the second video stream.
The parameters reflecting the video processing quality, collected from the second video stream, the plurality of first video streams, and the background image, may be collected by using the reference mode of a common reference video evaluation method, and may be used to reflect the deformation from the first video streams to the second video stream, and the like.
The parameter reflecting the quality of the video processing algorithm may be obtained from an algorithm used when synthesizing the plurality of first video streams, and the parameter reflecting the quality of the video processing algorithm may be used to reflect the accuracy or quality of the algorithm, or the like.
The interaction parameters may be derived by the MEC device based on the network conditions and the acquisition of the second video stream. The interaction parameters may be related to the network conditions and to the objective request conditions of the second video stream. For example, the interaction parameters may be used to reflect the user's subjective perception of the video, as well as the delay, pause, and the like in the video transmission.
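As an illustrative sketch of how interaction parameters such as first screen delay, pause duration, and pause frequency might be derived from the transmission of the second video stream, the fragment below works on frame arrival timestamps; the nominal frame rate and the stall threshold are assumptions, not values fixed by this embodiment.

```python
def stall_statistics(frame_times_s, nominal_interval_s=1.0 / 30, stall_factor=2.0):
    """Derive simple interaction-related parameters from frame arrival times (seconds).

    A gap much larger than the nominal frame interval is counted as a pause/stall.
    The 30 fps nominal interval and the 2x stall threshold are illustrative."""
    stalls = []
    for prev, cur in zip(frame_times_s, frame_times_s[1:]):
        gap = cur - prev
        if gap > stall_factor * nominal_interval_s:
            stalls.append(gap - nominal_interval_s)
    total = frame_times_s[-1] - frame_times_s[0] if len(frame_times_s) > 1 else 0.0
    return {
        "first_screen_delay_s": frame_times_s[0] if frame_times_s else None,
        "pause_duration_s": sum(stalls),
        "pause_count": len(stalls),
        "pause_frequency_per_min": 60.0 * len(stalls) / total if total else 0.0,
    }

# Example: frames arrive at 30 fps with one roughly 0.5 s pause in the middle.
times = [0.2 + i / 30 for i in range(30)] + [1.7 + i / 30 for i in range(30)]
print(stall_statistics(times))
```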
The user-side parameters may be sent by the first terminal device to the MEC device, or may be input by the user through another device. The user-side parameters reflect conditions on the user side, such as the size of the screen and the viewing distance when the user views the screen.
In a possible implementation manner, when the mobile edge computing device determines the values of the multiple target indexes for scoring the second video stream according to the values of the multiple first parameters, weight values may be respectively assigned to the values of the first parameters, and the calculation may be performed using a linear function or another function.
In a possible implementation, the mobile edge computing device determining values of a plurality of target metrics for the second video stream scoring from values of a plurality of first parameters includes: the mobile edge computing device determining values of a plurality of first video metrics from the values of the plurality of first parameters; the first video index comprises an objective video quality index and a correlation index determined according to a correlated parameter in the plurality of first parameters; the mobile edge computing device determines values of a plurality of target metrics from the values of the plurality of first video metrics.
In the embodiment of the present application, when the values of the multiple target indexes are determined, some of the associated parameters among the first parameters may first be regrouped into associated indexes, and the values of the multiple target indexes are then determined from the associated indexes and the objective video quality indexes. On one hand, this reduces the amount of calculation; on the other hand, the associated indexes can comprehensively reflect the quality condition of the video, so that more accurate values of the target indexes can be obtained when they are determined from the associated indexes.
In a possible implementation, the mobile edge computing device determining values of a plurality of target metrics according to values of a plurality of first video metrics includes: the mobile edge computing equipment determines the value of the human factor engineering index according to the values of the plurality of first video indexes and the first model corresponding to each first video index; the first model is used for outputting a human factor engineering related value corresponding to the first video index by using the value of the first video index; the mobile edge computing device determines values of a plurality of target indicators based on the values of the ergonomic indicator and the values of the objective video quality indicator.
In this embodiment of the present application, a first video index may correspond to a first model, where the first model is used to output a human factor engineering related value (for example, user satisfaction, etc.) corresponding to the first video index when a value of the first video index is input, and the first model may be obtained through a user experiment or may be obtained through machine learning, which is not specifically limited in this embodiment of the present application.
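Illustratively, the first model for a single independent index may take the shape of an S-type (logistic) curve, as sketched below. The midpoint and steepness parameters would in practice come from user experiments or machine learning; the numbers used here are assumptions for illustration only.

```python
import math

def s_curve_satisfaction(index_value: float, midpoint: float, steepness: float) -> float:
    """Logistic (S-type) first model: maps a first video index value to a
    human-factors-related value (e.g. user satisfaction) in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-steepness * (index_value - midpoint)))

# Example: an image-sharpness index of 0.7 with an assumed midpoint of 0.5.
print(s_curve_satisfaction(0.7, midpoint=0.5, steepness=10.0))  # roughly 0.88
```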
In a possible implementation, the correlation index includes one or more of the following: image sharpness related to resolution and/or screen size and/or viewing distance, image fineness related to peak signal-to-noise ratio and/or information entropy, detail recognizability related to brightness contrast threshold and/or texture detection threshold and/or inter-frame brightness difference, seam invisibility related to structure similarity and/or edge difference spectral information, brightness imbalance related to brightness contrast threshold and/or saturation, operation and response speed related to head screen latency and/or viewpoint transition speed and/or video processing speed, ghosting related to foreground region area and/or foreground region motion speed and/or edge difference spectral information, or distortion related to structure similarity and/or streak similarity.
In possible implementations, the ergonomic indicator includes one or more of: video information content, operation and response experience, visual information fidelity or video fluency; wherein the video information content is related to image definition, image fineness and/or detail recognizability; the operation and response experience is related to the operation freedom richness, the operation and response speed and/or the operation accuracy; visual information fidelity is related to seam invisibility, brightness imbalance, ghosting and/or distortion; video fluency is related to the cadence duration, cadence frequency, frame stability, and/or frame rate.
In a possible implementation, the generic objective video parameters include one or more of the following parameters calculated by reference-free objective evaluation of the second video stream: resolution, frame rate, entropy, saturation, peak signal-to-noise ratio, brightness contrast threshold, or texture detection threshold.
In a possible implementation, the parameters reflecting the video processing quality include: the image stability obtained by non-reference subjective evaluation, and one or more of the following parameters obtained by performing reference objective evaluation calculation on the second video stream, the plurality of first video streams, and the background image: peak signal-to-noise ratio, structural similarity, edge difference spectrum information, information entropy, inter-frame brightness difference, foreground area movement speed, or texture similarity.
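For illustration, two of the parameters named above can be computed as sketched below: information entropy as a non-reference parameter of the second video stream, and peak signal-to-noise ratio as a reference parameter of a processed frame against a reference frame. The synthetic frames in the example are placeholders for decoded video frames and are not taken from the embodiment.

```python
import numpy as np

def frame_entropy(gray_frame: np.ndarray) -> float:
    """Non-reference parameter: information entropy of an 8-bit grayscale frame."""
    hist, _ = np.histogram(gray_frame, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def frame_psnr(reference: np.ndarray, processed: np.ndarray) -> float:
    """Reference parameter: PSNR of a processed frame against a reference frame."""
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Synthetic 8-bit frames standing in for a first-video-stream frame and the
# corresponding second-video-stream frame.
ref = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
proc = np.clip(ref.astype(np.int16) + np.random.randint(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(frame_entropy(proc), frame_psnr(ref, proc))
```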
In a possible implementation, the parameters reflecting the quality of the video processing algorithm include one or more of the following parameters calculated by using a non-reference objective evaluation: algorithm temporal complexity or algorithm spatial complexity.
In a possible implementation, the interaction parameters include one or more of the following parameters: a sense of reality obtained by non-reference subjective evaluation, an operation accuracy obtained by non-reference subjective evaluation, a first screen delay obtained by non-reference objective evaluation, a field angle obtained by non-reference objective evaluation, a pause duration obtained by non-reference objective evaluation, a pause frame rate obtained by non-reference objective evaluation, or an operation freedom richness obtained by non-reference objective evaluation.
In a possible implementation manner, the user-side parameter includes one or more of the following parameters: a screen size obtained by the reference-free objective evaluation or a viewing distance obtained by the reference-free objective evaluation.
In possible implementations, the objective video quality indicator includes one or more of the following: frame rate, field of view, realism, resolution, peak signal-to-noise ratio, or entropy.
In possible implementations, the target indicators include one or more of the following: viewing experience, interactive experience, or objective parameters; the viewing experience is related to video information amount, visual information fidelity, video fluency and/or sense of reality, the interaction experience is related to field angle and/or operation and response experience, and the objective parameters are related to resolution, frame rate, peak signal-to-noise ratio and/or information entropy.
In possible implementations, the description parameters include one or more of the following: scene number, scene type (e.g., stage scene or audience scene), video stream luminance range, video stream capture angle, long shot, short shot, match scene, or static scene.
Illustratively, Table 1 shows the correspondence among the above parameters and indexes and the acquisition source of each.
TABLE 1
[Table 1 is reproduced as an image in the original publication; it lists the correspondence between the 27 first parameters, the 14 first video indexes, the human factor engineering indexes, and the 3 target indexes, together with the acquisition source of each parameter.]
On the basis of the embodiment corresponding to fig. 5 and any possible embodiment described above, in a possible implementation manner, after S504, the method further includes: and the mobile edge computing equipment sends the processing result and the second video stream to the second terminal equipment. In this way, the second terminal device can play the second video stream and/or adjust the second video stream with reference to the processing result.
On the basis of the embodiment corresponding to fig. 5 and any possible embodiment described above, in a possible implementation manner, after S504, the method further includes: the mobile edge computing device adjusts the second video stream according to the processing result to obtain an adjusted video stream; and the mobile edge computing device sends the adjusted video stream to the second terminal device. In this way, the MEC device can adjust the video stream according to the processing result to obtain better quality, send the better-quality video stream to the second terminal device, and have it displayed on the second terminal device.
On the basis of the embodiment corresponding to fig. 5, in a possible implementation manner, S502 to S504 and any possible implementation manner may be performed by the MEC apparatus using a video processing model. The video processing model, which may be a neural network model or other possible machine learning model, may be derived based on machine learning and provided in the MEC device.
By way of example, FIG. 6 illustrates a conceptual diagram of one possible way of training a video processing model. As shown in fig. 6, subjective parameters (reflecting human evaluation of the video) and objective parameters (reflecting the objective quality of the video) may be collected and counted separately, and new individual indicators (for example, subjective indicators) may be generated from the subjective and objective parameters. Objective calculation models may then be established for some of the objective parameters and for the new individual indicators; these models output human-factors-engineering-related values based on the values of those objective parameters and new individual indicators. Weighted assignment and the like are further performed on those objective parameters and on the human-factors values corresponding to the new individual indicators to obtain the processing result of the video. This process may be iterated to train the video processing model (also referred to as a quality evaluation model).
Based on the concept of fig. 6, fig. 7 shows an architectural schematic of a possible video processing model. As shown in fig. 7, the video processing model may include: a parameter acquisition module, an index redefinition and classification module, a user experiment module, a sub-link index model construction module, and a model integration module.
When the video processing model is trained, the video stream, the background image and the description parameter obtained according to the video stream can be input into the parameter acquisition module.
The parameter acquisition module collects the first parameters and inputs them into the index redefinition and classification module. The specific first parameters will be described in the following sections and are not repeated here.
The index redefinition and classification module redefines or classifies some of the indexes in the first parameters to obtain new indexes (associated indexes). It then inputs some of the original indexes (also called old indexes, or objective video quality indexes), the new indexes, and/or some of the original parameters (also called old parameters) among the first parameters into the user experiment module for user experiments, through which the weight values (also called statistical data) of these indexes or parameters in video quality detection are obtained.
The user experiment module outputs the statistical data to the sub-link index model construction module, and the index redefinition and classification module may also input the first video indexes (for example, the new and old indexes, or the new indexes and old parameters) into the sub-link index model construction module.
The sub-link index model construction module may train a model of the processing result corresponding to each index. For example, in one round of training, the module may keep the values of the other indexes unchanged and change the value of only one index, so as to build, for that index, a model mapping the input index value to an output processing result. The module may also feed indexes back to the index redefinition and classification module for the index model construction of the next link.
The steps of the index redefinition and classification module, the user experiment module, and the sub-link index model construction module are executed repeatedly until a processing result that accords with the actual video quality result is obtained.
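A sketch of this one-index-at-a-time model construction is given below: the other indexes are held fixed while one index is swept, user scores are collected, and a curve mapping the input index value to an output score is fitted. The logistic curve shape and the simulated user scores are assumptions for illustration, standing in for the statistics produced by the user experiment module.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, steepness, midpoint):
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

def fit_single_index_model(index_values, user_scores):
    """Fit an input-index-value -> output-score model for one index while the
    other indexes are held unchanged."""
    (k, x0), _ = curve_fit(logistic, np.asarray(index_values), np.asarray(user_scores),
                           p0=[1.0, 0.5], maxfev=5000)
    return lambda x: logistic(x, k, x0)

# Sweep one index over [0, 1]; the "user scores" here are simulated stand-ins.
xs = np.linspace(0.0, 1.0, 9)
scores = logistic(xs, 8.0, 0.5) + np.random.normal(0.0, 0.02, xs.size)
model = fit_single_index_model(xs, scores)
print(model(0.6))
```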
The sub-link index model construction module can output the processing results corresponding to each index to the model integration module, so as to obtain the final quality evaluation model.
In a possible implementation manner, the user experiment module may be omitted; instead, a default weight of each index or parameter in video quality detection may be adopted, or the weight of each index or parameter in video quality detection may be specified manually, so as to obtain the weight value of each index or parameter in video quality detection.
If the video processing model in fig. 7 is set in the MEC device, the video processing method in the embodiment of the present application may be implemented based on the MEC device, and fig. 8 shows a scene diagram of implementing the video processing method in the embodiment of the present application based on the MEC device.
As shown in fig. 8, a scene in which the video processing method according to the embodiment of the present application is implemented based on an MEC device may include a first terminal device (such as an image capture device, a VR camera, and the like), a second terminal device (such as a VR video playing device, and the like), a base station, and an MEC device.
The first terminal device collects video streams and transmits the video streams to the base station.
The base station uplinks the video streams to the coding and decoding module of the MEC device.
The coding and decoding module transmits the decoded video streams to the multi-channel video processing module for splicing, fusion, and other processing, and also transmits the decoded video streams to the parameter acquisition module of the quality scoring model.
The multi-channel video processing module inputs the spliced and fused video stream into the parameter acquisition module in the quality scoring model, and also inputs the spliced and fused video stream into the scene judging and modeling module of the video processing model.
The scene judging and modeling module outputs the background image and the description parameters to the parameter acquisition module.
The parameter acquisition module inputs the acquired parameters into the index redefinition and classification module.
The index redefinition and classification module inputs the updated indexes into the model integration module.
The model integration module inputs the processed video stream and the quality detection processing result into the coding and decoding module.
The coding and decoding module downlinks the encoded video stream and/or the quality detection processing result to the base station.
The base station transmits the video stream and/or the quality detection processing result to the second terminal device.
Based on the scene shown in fig. 8 for implementing the video processing method of the embodiment of the present application on the MEC device, fig. 9 shows, as an example, a specific implementation schematic diagram of the embodiment of the present application. In fig. 9, line 1 represents the steps of constructing the model, line 2 represents the flow direction of the data stream when the model is used, and line 3 represents the input and output of a parameter or an index.
As shown in fig. 9, in the construction of a video processing model (or referred to as a quality evaluation model), a video stream collected by a camera may be subjected to splicing, fusion, and other processes to obtain a processed video stream.
The processed video stream enters the scene judging and modeling module. In a possible implementation, if the static objects in the picture do not change, the frames are judged to belong to the same scene. If the scene has not been established, a new foreground and background model is established, and the foreground and the background in the picture are distinguished by this foreground and background model; for example, if a person walks along a road, the road and the blue sky are the background and the person is the foreground. The background image and the description parameters can be obtained through the foreground and background model.
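As a minimal sketch of such a foreground and background model (a simple running-average background, not the specific modeling method of the embodiment), static pixels converge to the background image while moving pixels are flagged as foreground; the learning rate and the threshold below are assumptions.

```python
import numpy as np

def update_background(background, frame, learning_rate=0.05, threshold=25.0):
    """Running-average background model: returns the updated background image
    and a boolean foreground mask for the current frame."""
    background = (1.0 - learning_rate) * background + learning_rate * frame
    foreground_mask = np.abs(frame - background) > threshold
    return background, foreground_mask

# Synthetic grayscale frames standing in for the processed (second) video stream:
# a mostly static scene with one moving bright patch (the "person on the road").
bg = np.full((480, 640), 100.0)
for i in range(50):
    frame = np.random.normal(100.0, 2.0, bg.shape)
    frame[200:240, 300 + i:360 + i] += 80.0
    bg, fg_mask = update_background(bg, frame)
print(fg_mask.sum())  # number of pixels flagged as foreground in the last frame
```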
The video stream acquired by the cameras, the processed video stream, the background image, and the description parameters are input into the acquisition module. The acquisition module may employ both a reference acquisition method and a non-reference acquisition method. The non-reference acquisition method collects only the processed video stream to obtain the non-reference acquisition parameters. The reference acquisition method collects the video stream acquired by the cameras, the processed video stream, the background image, and the description parameters to obtain the reference acquisition parameters. The reference and non-reference methods for acquiring video quality parameters are relatively general methods and are not repeated here. Different from the general reference acquisition method, the reference acquisition in the embodiment of the present application adds the background image and the description parameters to the acquired data, so that parameters related to ghosting and jitter can be effectively acquired based on the background image and the description parameters, and more accurate video quality detection can subsequently be performed by taking the ghosting and jitter into account.
For example, the reference acquisition parameters and the non-reference acquisition parameters acquired by the acquisition module are collectively referred to as the first parameters (for example, the 27 first parameters in Table 1). The first parameters may include: general objective video parameters acquired from the processed video stream (the second video stream); parameters reflecting the video processing quality, acquired from the second video stream, the plurality of first video streams (the video streams acquired by the cameras), and the background image; interaction parameters acquired from the description parameters; and user-side parameters acquired from the description parameters.
The reference acquisition parameters and the non-reference acquisition parameters may be input into the index S-type modeling module. Some important first parameters are retained and become independent indexes, while for some related first parameters the indexes are redefined to obtain associated indexes. Among several related parameters, some exert a negative influence, so negative parameters need to be redefined as positive parameters; for example, a negative parameter satisfies f'(x) < 0 for its redefinition function, and the multidimensional function of an associated redefined index is concave in its front section and convex in its rear section, so the normalized parameters for constructing the S-type model can be obtained. In the embodiment of the present application, all the obtained independent indexes and associated indexes are collectively referred to as single indexes (also called first video indexes, for example, the 14 first video indexes in Table 1); a user scoring experiment is performed for each single index and an S-type model is established. The S-type model of an independent index is an S-shaped curve (because an independent index has a one-dimensional relation with satisfaction), while the S-type model of an associated redefined index is multidimensional (because an associated redefined index relates several indexes to one satisfaction value), for example a 3-dimensional or even 4-dimensional model.
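For illustration, a negative parameter (one whose increase degrades quality, such as a pause duration) can be redefined as a positive normalized parameter before S-type modeling, as in the sketch below. The min-max style mapping and the best/worst bounds are assumptions for illustration; the embodiment only requires that the redefined parameter increase as quality improves.

```python
def redefine_negative_parameter(value: float, best: float = 0.0, worst: float = 5.0) -> float:
    """Map a negative parameter (larger is worse) onto [0, 1] so that 1.0
    corresponds to the best value and 0.0 to the worst value."""
    value = min(max(value, best), worst)        # clamp into [best, worst]
    return (worst - value) / (worst - best)

# Example: a pause duration of 1.2 s, assuming 0 s is best and 5 s is worst.
print(redefine_negative_parameter(1.2))  # 0.76
```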
Because the single indexes cannot intuitively reflect the user's impression, human factor engineering index statistics may be carried out. The single indexes are classified into human factor engineering indexes: N user subjective evaluation experiments (where N may be a natural number greater than 1, such as 4) are performed for the human factor engineering indexes, and data statistics on the user experiment results complete the weight assignment of the single indexes.
The human factor engineering indexes are mainly used to reflect the viewing experience of the user, which is a very important part but cannot completely determine the final video quality. Therefore, final decisive indexes (also called target indexes, for example, the 3 target indexes in Table 1) can be compiled; for example, the single indexes and the human factor engineering indexes are classified into three target indexes: viewing experience, interactive experience, and video parameters.
Weights are then assigned to the three decisive indexes (viewing experience, interactive experience, and video parameters) according to the actual scene requirements; default values of 0.6, 0.4, and 0.1 may also be adopted. Finally, a complete evaluation model (also called a video processing model) is obtained. In this way, a video processing model related to both subjective and objective parameters is obtained in the embodiment of the present application, and video quality evaluation can then be performed with the video processing model along the path of line 2 in fig. 9.
Illustratively, fig. 10 shows a schematic flow chart of video processing using a video processing model according to an embodiment of the present application.
As shown in fig. 10, the embodiment of the present application fuses subjective and objective evaluation methods. First, the collected first video streams are synthesized to obtain the second video stream (also called the processed video stream), and the background image and the description parameters are obtained from the second video stream. Then the reference parameter extraction module and the non-reference parameter extraction module are used to extract the first parameters from the first video streams, the second video stream, the background image, and the description parameters. Some associated parameters among the first parameters are defined as associated redefined indexes (also called associated indexes), and together with the independent indexes among the first parameters they form the single indexes. According to the S-type model of each single index, some of the single indexes are classified into human factor engineering indexes while the other single indexes are retained; these retained indexes and the human factor engineering indexes are assigned weights and classified into the target indexes. Finally, weight assignment is performed on each target index, and the processing result reflecting the video quality is obtained by calculation.
Therefore, when the video processing model of the MEC device of the embodiment of the present application performs video processing, subjective and objective evaluation methods are fused. The processing result can be more accurate than that of an objective video quality evaluation method used alone, and compared with a subjective video quality evaluation method used alone, the method can be implemented automatically without real-time human participation. The video processing method of the embodiment of the present application is therefore easy to implement and can obtain an accurate processing result reflecting the video quality.
The video processing method according to the embodiment of the present application has been described above, and the apparatus for performing the video processing method according to the embodiment of the present application is described below. Those skilled in the art will understand that the method and apparatus can be combined and referred to each other, and the video processing apparatus provided in the embodiments of the present application can perform the steps in the video processing method.
As shown in fig. 11, fig. 11 shows a schematic structural diagram of a video processing apparatus provided in this embodiment of the present application, where the video processing apparatus may be an MEC device in this embodiment of the present application, and may also be a chip or a chip system in the MEC device. The video processing apparatus includes: a communication unit 1103 configured to receive a plurality of first video streams from a plurality of first terminal devices; a processing unit 1101 configured to obtain a second video stream, a background image, and description parameters from a plurality of first video streams; the second video stream is a video stream synthesized according to the plurality of first video streams, the background image is obtained by performing foreground and background processing on the second video stream, and the description parameters are used for describing scene information corresponding to the second video stream; a processing unit 1101, further configured to determine values of a plurality of target indicators from the plurality of first video streams, the second video stream, the background image and the description parameter; the plurality of target indicators are related to human factors engineering indicators and objective video quality indicators; the processing unit 1101 is further configured to perform weighted calculation on the values of the multiple target indexes to obtain a processing result of the second video stream; the processing result is used to reflect the quality of the second video stream.
Illustratively, taking the video processing apparatus as an MEC device or a chip or chip system applied in the MEC device as an example, the processing unit 1101 is configured to support the video processing apparatus to execute S502-S504 and the like in the above embodiments.
In one possible implementation, the video processing apparatus may further include: a memory unit 1102. The storage unit 1102 may include one or more memories, which may be devices in one or more devices or circuits for storing programs or data.
The storage unit 1102 may be separate and connected to the processing unit 1101 through a communication bus. The storage unit 1102 may also be integrated with the processing unit 1101.
Taking the case where the video processing apparatus is a chip or a chip system in the MEC device of the embodiment of the present application as an example, the storage unit 1102 may store computer-executable instructions of the method of the MEC device, so that the processing unit 1101 executes the method of the MEC device in the embodiments described above. The storage unit 1102 may be a register, a cache memory, a Random Access Memory (RAM), or the like, and the storage unit 1102 may be integrated with the processing unit 1101. The storage unit 1102 may also be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, and the storage unit 1102 may be separate from the processing unit 1101.
In a possible implementation, the processing unit is specifically configured to collect values of a plurality of first parameters from a plurality of first video streams, a second video stream, a background image, and description parameters; the first parameters comprise general objective video parameters acquired from the second video stream, parameters reflecting video processing quality acquired from the second video stream, a plurality of first video streams and background images, parameters reflecting video processing algorithm quality, interaction parameters and user side parameters; and determining values of a plurality of target metrics for the second video stream score based on the values of the plurality of first parameters.
In a possible implementation, the processing unit is specifically configured to determine values of a plurality of first video indicators according to values of a plurality of first parameters; the first video index comprises an objective video quality index and a correlation index determined according to a correlated parameter in the plurality of first parameters; and determining values of a plurality of target metrics based on the values of the plurality of first video metrics.
In a possible implementation manner, the processing unit is specifically configured to determine a value of a human factor engineering index according to values of a plurality of first video indexes and a first model corresponding to each first video index; the first model is used for outputting a human factor engineering related value corresponding to the first video index by using the value of the first video index; and determining values of the plurality of target indexes according to the values of the human factor engineering indexes and the values of the objective video quality indexes.
In one possible implementation, the correlation index includes one or more of the following: image sharpness related to resolution and/or screen size and/or viewing distance, image fineness related to peak signal-to-noise ratio and/or information entropy, detail recognizability related to brightness contrast threshold and/or texture detection threshold and/or inter-frame brightness difference, seam invisibility related to structure similarity and/or edge difference spectral information, brightness imbalance related to brightness contrast threshold and/or saturation, operation and response speed related to head screen latency and/or viewpoint transition speed and/or video processing speed, ghosting related to foreground region area and/or foreground region motion speed and/or edge difference spectral information, or distortion related to structure similarity and/or streak similarity.
In one possible implementation, the ergonomic indicator includes one or more of the following: video information content, operation and response experience, visual information fidelity or video fluency; wherein the video information content is related to image definition, image fineness and/or detail recognizability; the operation and response experience is related to the operation freedom richness, the operation and response speed and/or the operation accuracy; visual information fidelity is related to seam invisibility, brightness imbalance, ghosting and/or distortion; video fluency is related to the cadence duration, cadence frequency, frame stability, and/or frame rate.
In one possible implementation, the generic objective video parameters include one or more of the following parameters calculated for reference-free objective evaluation of the second video stream: resolution, frame rate, entropy, saturation, peak signal-to-noise ratio, brightness contrast threshold, or texture detection threshold.
In one possible implementation, the parameters reflecting the video processing quality include: the image stability obtained by non-reference subjective evaluation, and one or more of the following parameters obtained by performing reference objective evaluation calculation on the second video stream, the plurality of first video streams, and the background image: peak signal-to-noise ratio, structural similarity, edge difference spectrum information, information entropy, inter-frame brightness difference, foreground area movement speed, or texture similarity.
In one possible implementation, the parameters reflecting the quality of the video processing algorithm include one or more of the following parameters calculated using a reference-free objective evaluation: algorithm temporal complexity or algorithm spatial complexity.
In one possible implementation, the interaction parameters include one or more of the following parameters: a sense of reality obtained by non-reference subjective evaluation, an operation accuracy obtained by non-reference subjective evaluation, a first screen delay obtained by non-reference objective evaluation, a field angle obtained by non-reference objective evaluation, a pause duration obtained by non-reference objective evaluation, a pause frame rate obtained by non-reference objective evaluation, or an operation freedom richness obtained by non-reference objective evaluation.
In one possible implementation, the client-side parameters include one or more of the following parameters: a screen size obtained by the reference-free objective evaluation or a viewing distance obtained by the reference-free objective evaluation.
In one possible implementation, the objective video quality indicator includes one or more of the following: frame rate, field of view, realism, resolution, peak signal-to-noise ratio, or entropy.
In one possible implementation, the target indicator includes one or more of: viewing experience, interactive experience, or objective parameters; the viewing experience is related to video information amount, visual information fidelity, video fluency and/or sense of reality, the interaction experience is related to field angle and/or operation and response experience, and the objective parameters are related to resolution, frame rate, peak signal-to-noise ratio and/or information entropy.
In one possible implementation, the description parameters include one or more of the following: scene number, scene type, video stream brightness range, video stream capture angle, long shot, short shot, match scene, or static scene.
In a possible implementation manner, the communication unit is further configured to send the processing result and the second video stream to the second terminal device.
In a possible implementation manner, the processing unit is further configured to adjust the second video stream according to the processing result to obtain an adjusted video stream; and the communication unit is also used for sending the adjusted video stream to the second terminal equipment.
The apparatus of this embodiment may be correspondingly used to perform the steps performed in the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 12 is a schematic hardware configuration diagram of a video processing apparatus according to an embodiment of the present disclosure. Referring to fig. 12, the video processing apparatus includes: a memory 1201 and a processor 1202. The apparatus may further comprise an interface circuit 1203, where the memory 1201, the processor 1202, and the interface circuit 1203 may communicate; illustratively, they may communicate via a communication bus. The memory 1201 is used for storing computer-executable instructions, the execution of which is controlled by the processor 1202, thereby implementing the video processing method provided by the foregoing embodiments of the present application.
In a possible implementation manner, the computer execution instructions in the embodiment of the present application may also be referred to as application program codes, which is not specifically limited in the embodiment of the present application.
Optionally, the interface circuit 1203 may also include a transmitter and/or a receiver.
Optionally, the processor 1202 may include one or more CPUs, and may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor, or in a combination of the hardware and software modules in the processor.
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer storage media and communication media, and may include any medium that can communicate a computer program from one place to another. A storage medium may be any target medium that can be accessed by a computer.
In one possible implementation, the computer-readable medium may include RAM, ROM, a compact disk read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the embodiments of the present invention and are not to be construed as limiting the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the embodiments of the present invention shall be included in the scope of the present invention.

Claims (20)

1. A video processing method, comprising:
the mobile edge computing device receiving a plurality of first video streams from a plurality of first terminal devices;
the mobile edge computing device acquires a second video stream, a background image and description parameters according to the plurality of first video streams; the second video stream is a video stream synthesized according to the plurality of first video streams, the background image is obtained by performing foreground and background processing on the second video stream, and the description parameter is used for describing scene information corresponding to the second video stream;
the mobile edge computing device determining values for a plurality of target metrics from the plurality of first video streams, the second video stream, the background image, and the description parameter; the plurality of target indicators are related to a human factor engineering indicator and an objective video quality indicator;
the mobile edge computing equipment performs weighted computation on the values of the target indexes to obtain a processing result of the second video stream; the processing result is used for reflecting the quality of the second video stream.
2. The method of claim 1, wherein the mobile edge computing device determines values of a plurality of target metrics from the plurality of first video streams, the second video stream, the background image, and the description parameters, comprising:
the mobile edge computing device capturing values of a plurality of first parameters from the plurality of first video streams, the second video stream, the background image, and the description parameters; the first parameters comprise general objective video parameters acquired from the second video stream, parameters reflecting video processing quality acquired from the second video stream, the plurality of first video streams and the background image, parameters reflecting video processing algorithm quality, interaction parameters and user side parameters;
the mobile edge computing device determines values of a plurality of target metrics for the second video stream score from the values of the plurality of first parameters.
3. The method of claim 2, wherein the mobile edge computing device determining values of a plurality of target metrics for the second video stream score based on the values of the plurality of first parameters comprises:
the mobile edge computing device determining values of a plurality of first video metrics from the values of the plurality of first parameters; the first video index comprises the objective video quality index and a correlation index determined according to a correlated parameter in the plurality of first parameters;
the mobile edge computing device determines values of the plurality of target metrics from the values of the plurality of first video metrics.
4. The method of claim 3, wherein the mobile edge computing device determines the values of the plurality of target metrics from the values of the plurality of first video metrics, comprising:
the mobile edge computing equipment determines the value of the human factor engineering index according to the values of the plurality of first video indexes and the first model corresponding to each first video index; the first model is used for outputting a human factor engineering related value corresponding to the first video index by using the value of the first video index;
the mobile edge computing device determines values of the plurality of target indicators based on the values of the ergonomic indicator and the values of the objective video quality indicator.
5. The method according to any of claims 3-4, wherein the correlation index comprises one or more of: image sharpness related to resolution and/or screen size and/or viewing distance, image fineness related to peak signal-to-noise ratio and/or information entropy, detail recognizability related to brightness contrast threshold and/or texture detection threshold and/or inter-frame brightness difference, seam invisibility related to structure similarity and/or edge difference spectral information, brightness imbalance related to brightness contrast threshold and/or saturation, operation and response speed related to head screen latency and/or viewpoint transition speed and/or video processing speed, ghosting related to foreground region area and/or foreground region motion speed and/or edge difference spectral information, or distortion related to structure similarity and/or streak similarity.
6. The method of claim 5, wherein the ergonomic indicator comprises one or more of: video information content, operation and response experience, visual information fidelity or video fluency;
wherein the amount of video information is related to the image sharpness, the image fineness and/or the detail recognizability; the operation and response experience is related to the operation freedom richness, the operation and response speed and/or the operation accuracy; the visual information fidelity is related to the patchwork invisibility, the luminance imbalance, the ghosting, and/or the distortion; the video fluency is related to a stuck duration, a stuck frequency, a picture stability, and/or the frame rate.
7. The method according to any of claims 2-4, wherein the generic objective video parameters include one or more of the following parameters calculated for reference-free objective evaluation of the second video stream: resolution, frame rate, entropy, saturation, peak signal-to-noise ratio, brightness contrast threshold, or texture detection threshold.
8. The method according to any of claims 2-7, wherein the parameters reflecting the video processing quality comprise: the image stability obtained by non-reference subjective evaluation, and one or more of the following parameters obtained by performing reference objective evaluation calculation on the second video stream, the plurality of first video streams and the background image, such as peak signal-to-noise ratio, structural similarity, edge difference spectrum information, information entropy, inter-frame brightness difference, foreground area motion speed or texture similarity.
9. The method according to any one of claims 2-8, wherein the parameters reflecting the quality of the video processing algorithm comprise one or more of the following parameters calculated using a reference-free objective evaluation: algorithm temporal complexity or algorithm spatial complexity.
10. The method according to any of claims 2-9, wherein the interaction parameters comprise one or more of the following parameters: a sense of reality obtained by non-reference subjective evaluation, an operation accuracy obtained by non-reference subjective evaluation, a first screen delay obtained by non-reference objective evaluation, a field angle obtained by non-reference objective evaluation, a pause duration obtained by non-reference objective evaluation, a pause frame rate obtained by non-reference objective evaluation, or an operation freedom richness obtained by non-reference objective evaluation.
11. The method according to any of claims 2-10, wherein the user-side parameters comprise one or more of the following parameters: a screen size obtained by the reference-free objective evaluation or a viewing distance obtained by the reference-free objective evaluation.
12. The method according to any one of claims 1-11, wherein the objective video quality indicator comprises one or more of: frame rate, field of view, realism, resolution, peak signal-to-noise ratio, or entropy.
13. The method of any one of claims 1-12, wherein the target metrics include one or more of: viewing experience, interactive experience, or objective parameters; wherein the viewing experience is related to video information content, visual information fidelity, video fluency and/or realism, the interaction experience is related to field angle and/or operation and response experience, and the objective parameters are related to resolution, frame rate, peak signal-to-noise ratio and/or information entropy.
14. The method of any one of claims 1 to 13, wherein the description parameters include one or more of: scene number, scene type, video stream brightness range, video stream capture angle, long shot, short shot, match scene, or static scene.
15. The method of any one of claims 1-14, further comprising:
and the mobile edge computing device sends the processing result and the second video stream to a second terminal device.
16. The method of any one of claims 1-14, further comprising:
the mobile edge computing device adjusts the second video stream according to the processing result to obtain an adjusted video stream;
and the mobile edge computing equipment sends the adjusted video stream to second terminal equipment.
17. An electronic device, comprising: means for performing the steps of any of claims 1-16.
18. An electronic device, comprising: a processor for calling a program in memory to perform the method of any one of claims 1 to 16.
19. An electronic device, comprising: a processor and interface circuitry for communicating with other devices, the processor being configured to perform the method of any of claims 1-16.
20. A computer-readable storage medium having instructions stored thereon that, when executed, cause a computer to perform the method of any of claims 1-16.
CN202010841624.1A 2020-08-20 2020-08-20 Video processing method and device Pending CN114079777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010841624.1A CN114079777A (en) 2020-08-20 2020-08-20 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010841624.1A CN114079777A (en) 2020-08-20 2020-08-20 Video processing method and device

Publications (1)

Publication Number Publication Date
CN114079777A true CN114079777A (en) 2022-02-22

Family

ID=80281920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010841624.1A Pending CN114079777A (en) 2020-08-20 2020-08-20 Video processing method and device

Country Status (1)

Country Link
CN (1) CN114079777A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100071820A (en) * 2008-12-19 2010-06-29 주식회사 케이티 Method and apparatus for measuring quality of video
US20120120251A1 (en) * 2009-08-21 2012-05-17 Huawei Technologies Co., Ltd. Method and Apparatus for Obtaining Video Quality Parameter, and Electronic Device
CN103096125A (en) * 2013-02-22 2013-05-08 吉林大学 Stereoscopic video visual comfort evaluation method based on region segmentation
WO2017084256A1 (en) * 2015-11-18 2017-05-26 华为技术有限公司 Video quality evaluation method and apparatus
CN105915892A (en) * 2016-05-06 2016-08-31 乐视控股(北京)有限公司 Panoramic video quality determination method and system
US20170374375A1 (en) * 2016-06-23 2017-12-28 Qualcomm Incorporated Measuring spherical image quality metrics based on user field of view
CN107483920A (en) * 2017-08-11 2017-12-15 北京理工大学 A kind of panoramic video appraisal procedure and system based on multi-layer quality factor
CN111093069A (en) * 2018-10-23 2020-05-01 大唐移动通信设备有限公司 Quality evaluation method and device for panoramic video stream
CN109451303A (en) * 2018-12-24 2019-03-08 合肥工业大学 A kind of modeling method for user experience quality QoE in VR video
CN110139169A (en) * 2019-06-21 2019-08-16 上海摩象网络科技有限公司 Method for evaluating quality and its device, the video capture system of video flowing

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115996322A (en) * 2023-03-21 2023-04-21 深圳市安科讯实业有限公司 Image data management method for digital video shooting

Similar Documents

Publication Publication Date Title
Ying et al. Patch-VQ: 'Patching up' the video quality problem
CN105654471B (en) Augmented reality AR system and method applied to internet video live streaming
Duan et al. IVQAD 2017: An immersive video quality assessment database
US10762653B2 (en) Generation apparatus of virtual viewpoint image, generation method, and storage medium
CN110830756B (en) Monitoring method and device
JP2018107793A (en) Generation device, generation method and program of virtual view point image
CN110392274B (en) Information processing method, equipment, client, system and storage medium
CN104581140B (en) A kind of video quality evaluation method of video conferencing
CN108933935A (en) Detection method, device, storage medium and the computer equipment of video communication system
DE102020124815A1 (en) SYSTEM AND DEVICE FOR USER CONTROLLED VIRTUAL CAMERA FOR VOLUMETRIC VIDEO
Ghadiyaram et al. Crowdsourced study of subjective image quality
WO2023056896A1 (en) Definition determination method and apparatus, and device
Leszczuk et al. Assessing quality of experience for high definition video streaming under diverse packet loss patterns
Katsenou et al. BVI-SynTex: A synthetic video texture dataset for video compression and quality assessment
CN114598919A (en) Video processing method, video processing device, computer equipment and storage medium
Da et al. Perceptual quality assessment of nighttime video
CN114079777A (en) Video processing method and device
Nur Yilmaz A no reference depth perception assessment metric for 3D video
WO2022041182A1 (en) Method and device for making music recommendation
WO2023217138A1 (en) Parameter configuration method and apparatus, device, storage medium and product
CN115604497A (en) Over-sharpening identification device for live broadcast object
Huang et al. Perceptual evaluation of pre-processing for video transcoding
CN111179317A (en) Interactive teaching system and method
Fearghail et al. Use of saliency estimation in cinematic vr post-production to assist viewer guidance
CN113762156B (en) Video data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination