CN116030040A - Data processing method, device, equipment and medium - Google Patents

Data processing method, device, equipment and medium

Info

Publication number
CN116030040A
Authority
CN
China
Prior art keywords
rendering
target
feature extraction
resolution
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310157631.3A
Other languages
Chinese (zh)
Other versions
CN116030040B (en)
Inventor
Xu Dong (徐东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310157631.3A
Publication of CN116030040A
Application granted
Publication of CN116030040B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a data processing method, device, equipment and medium. The method includes: acquiring an original rendering resource and a target rendering evaluation model; inputting the original rendering resource to the i-th target structural feature extraction component for feature extraction to obtain the structural features of the i-th target structural feature extraction component; inputting the original rendering resource to a target rendering feature extraction network and, when the rendering features of the i-th target super-resolution residual component are acquired, determining the rendering features of the j-th target super-resolution residual component based on the rendering features of the i-th target super-resolution residual component and the structural features of the i-th target structural feature extraction component; and taking the rendering features of the N-th target super-resolution residual component as the target rendering features and performing rendering quality evaluation on them through a target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource. By adopting the method and the device, both the evaluation efficiency and the evaluation accuracy of rendering resources can be improved.

Description

Data processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
With the rapid development of computer technology, the performance of image processing technology has improved greatly. In the prior art there are many schemes for evaluating image quality; for example, some schemes assign a score based on the sharpness of an image. Such schemes are generally suited to the overall evaluation of large images. However, rendering resources consisting of textures or small synthesized-texture images (such as texture resources in a game) are difficult to evaluate with the existing schemes, so rendering resources can only be evaluated manually, and the evaluation efficiency and evaluation accuracy of rendering resources are therefore low.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a medium, which can improve the evaluation efficiency and the evaluation accuracy of rendering resources.
In one aspect, an embodiment of the present application provides a data processing method, including:
when an original rendering resource in a multimedia file is acquired, acquiring a target rendering evaluation model for evaluating the rendering quality of the original rendering resource; the target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components, where M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components, where N is a positive integer greater than 1 and N = M + 1;
inputting the original rendering resource to the i-th target structural feature extraction component among the M target structural feature extraction components, the i-th target structural feature extraction component performing structural feature extraction on the original rendering resource to obtain the structural features of the i-th target structural feature extraction component;
inputting the original rendering resource to the target rendering feature extraction network for rendering feature extraction and, when the rendering features of the i-th target super-resolution residual component among the N target super-resolution residual components are acquired, determining the rendering features of the j-th target super-resolution residual component based on the rendering features of the i-th target super-resolution residual component and the structural features of the i-th target structural feature extraction component; the i-th target super-resolution residual component is the super-resolution residual component immediately preceding the j-th target super-resolution residual component; i is a positive integer less than N, and j = i + 1;
when the rendering features of the j-th target super-resolution residual component are detected to be the rendering features of the N-th target super-resolution residual component among the N target super-resolution residual components, taking the rendering features of the N-th target super-resolution residual component as the target rendering features, and performing rendering quality evaluation on the target rendering features through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource.
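For illustration only, the following is a minimal PyTorch sketch of how the dual-stream forward pass described above could be wired up. The channel widths, component count and class names are assumptions, not taken from the patent; the component classes referenced here are sketched after the corresponding passages further below.

```python
import torch
import torch.nn as nn

class RenderingEvaluationModel(nn.Module):
    """Hypothetical dual-stream model: N residual components, M = N - 1 structural components."""
    def __init__(self, n_components: int = 4, channels: int = 64):
        super().__init__()
        self.initial_conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.residual_components = nn.ModuleList(
            SuperResolutionResidualComponent(channels) for _ in range(n_components))
        self.structural_components = nn.ModuleList(
            StructuralFeatureExtractionComponent(3, channels) for _ in range(n_components - 1))
        self.quality_head = QualityEvaluationNetwork(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Texture branch: the first residual component consumes the initial convolution features.
        rendering_feat = self.residual_components[0](self.initial_conv(x))
        # Component j = i + 1 fuses the i-th rendering features with the i-th structural features.
        for struct_comp, res_comp in zip(self.structural_components,
                                         self.residual_components[1:]):
            rendering_feat = res_comp(rendering_feat, struct_comp(x))
        # The N-th rendering features are the target rendering features.
        return self.quality_head(rendering_feat)
```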
In one aspect, an embodiment of the present application provides a data processing method, including:
acquiring a sample rendering resource for training an initial rendering evaluation model and the rendering annotation quality of the sample rendering resource; the sample rendering resource is obtained from a sample multimedia file; the initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components, where M is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual components, where N is a positive integer greater than 1 and N = M + 1;
inputting the sample rendering resource to the i-th initial structural feature extraction component among the M initial structural feature extraction components, the i-th initial structural feature extraction component performing structural feature extraction on the sample rendering resource to obtain the structural features of the i-th initial structural feature extraction component;
inputting the sample rendering resource to the initial rendering feature extraction network for rendering feature extraction and, when the rendering features of the i-th initial super-resolution residual component among the N initial super-resolution residual components are acquired, determining the rendering features of the j-th initial super-resolution residual component based on the rendering features of the i-th initial super-resolution residual component and the structural features of the i-th initial structural feature extraction component; the i-th initial super-resolution residual component is the super-resolution residual component immediately preceding the j-th initial super-resolution residual component; i is a positive integer less than N, and j = i + 1;
when the rendering features of the j-th initial super-resolution residual component are detected to be the rendering features of the N-th initial super-resolution residual component among the N initial super-resolution residual components, taking the rendering features of the N-th initial super-resolution residual component as the sample rendering features, and performing rendering quality evaluation on the sample rendering features through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resource;
and iteratively training the initial rendering evaluation model based on the sample evaluation quality and the rendering annotation quality to obtain a target rendering evaluation model for evaluating the rendering quality of original rendering resources in a multimedia file.
An aspect of an embodiment of the present application provides a data processing apparatus, including:
the model acquisition module is used for acquiring, when an original rendering resource in a multimedia file is acquired, a target rendering evaluation model for evaluating the rendering quality of the original rendering resource; the target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components, where M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components, where N is a positive integer greater than 1 and N = M + 1;
the structural feature extraction module is used for inputting the original rendering resource to the i-th target structural feature extraction component among the M target structural feature extraction components, the i-th target structural feature extraction component performing structural feature extraction on the original rendering resource to obtain the structural features of the i-th target structural feature extraction component;
the rendering feature extraction module is used for inputting the original rendering resource to the target rendering feature extraction network for rendering feature extraction and, when the rendering features of the i-th target super-resolution residual component among the N target super-resolution residual components are acquired, determining the rendering features of the j-th target super-resolution residual component based on the rendering features of the i-th target super-resolution residual component and the structural features of the i-th target structural feature extraction component; the i-th target super-resolution residual component is the super-resolution residual component immediately preceding the j-th target super-resolution residual component; i is a positive integer less than N, and j = i + 1;
the quality evaluation module is used for taking, when the rendering features of the j-th target super-resolution residual component are detected to be the rendering features of the N-th target super-resolution residual component among the N target super-resolution residual components, the rendering features of the N-th target super-resolution residual component as the target rendering features, and performing rendering quality evaluation on the target rendering features through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource.
Wherein the i-th target structural feature extraction component comprises a structural feature extraction layer and a first activation layer;
the structural feature extraction module comprises:
the semantic extraction unit is used for inputting the original rendering resource to the structural feature extraction layer in the i-th target structural feature extraction component, and performing feature extraction on the original rendering resource through the structural feature extraction layer to obtain high-level semantic features of the original rendering resource;
the feature mapping unit is used for inputting the high-level semantic features to the first activation layer in the i-th target structural feature extraction component, and performing feature mapping on the high-level semantic features through the first activation layer to obtain the structural features of the i-th target structural feature extraction component.
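A possible reading of this component, continuing the sketch above: one feature extraction layer followed by one activation layer. The strided 3x3 convolution and the LeakyReLU are assumptions; the patent does not fix the layer types.

```python
import torch
import torch.nn as nn

class StructuralFeatureExtractionComponent(nn.Module):
    """Hypothetical structural-branch component: extraction layer + first activation layer."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Structural feature extraction layer: yields high-level semantic features.
        self.extract = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
        # First activation layer: maps the semantic features to structural features.
        self.activate = nn.LeakyReLU(0.1)

    def forward(self, resource: torch.Tensor) -> torch.Tensor:
        return self.activate(self.extract(resource))
```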
The j-th target super-resolution residual component comprises a rendering feature extraction layer, a spatial attention layer and a feature normalization layer;
the rendering feature extraction module comprises:
the feature splicing unit is used for concatenating the rendering features of the i-th target super-resolution residual component with the structural features of the i-th target structural feature extraction component to obtain a mixed splicing feature;
the rendering feature extraction unit is used for inputting the mixed splicing feature to the rendering feature extraction layer in the j-th target super-resolution residual component, and performing feature extraction on the mixed splicing feature through the rendering feature extraction layer to obtain a first resolution feature;
the attention acquisition unit is used for inputting the first resolution feature to the spatial attention layer, acquiring a spatial attention matrix associated with the first resolution feature through the spatial attention layer, and determining a second resolution feature based on the spatial attention matrix and the first resolution feature;
the feature normalization unit is used for inputting the second resolution feature to the feature normalization layer, and performing feature normalization on the second resolution feature through the feature normalization layer to obtain the rendering features of the j-th target super-resolution residual component.
Wherein the feature splicing unit includes:
the feature pooling subunit is used for performing max pooling on the rendering features of the i-th target super-resolution residual component to obtain pooled rendering features corresponding to the i-th target super-resolution residual component; the resolution of the pooled rendering features is the same as the resolution of the structural features of the i-th target structural feature extraction component;
and the feature splicing subunit is used for concatenating the pooled rendering features with the structural features of the i-th target structural feature extraction component to obtain the mixed splicing feature.
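A small sketch of this pooling-and-splicing step: the rendering features are max-pooled down to the spatial resolution of the structural features, then the two tensors are concatenated along the channel axis. Using adaptive max pooling to guarantee matching resolutions is an assumption.

```python
import torch
import torch.nn.functional as F

def mix_features(rendering_feat: torch.Tensor, structural_feat: torch.Tensor) -> torch.Tensor:
    # Max-pool the rendering features so their resolution equals that of the structural features.
    pooled = F.adaptive_max_pool2d(rendering_feat, structural_feat.shape[-2:])
    # Channel-wise concatenation yields the mixed splicing feature.
    return torch.cat([pooled, structural_feat], dim=1)
```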
The rendering feature extraction layer comprises a first convolution layer, a second activation layer and a second convolution layer;
the rendering feature extraction unit includes:
the first feature extraction subunit is used for inputting the mixed splicing feature into the first convolution layer, and performing feature extraction on the mixed splicing feature through the first convolution layer to obtain a first mixed feature;
the first nonlinear processing subunit is used for inputting the first mixed feature into the second activation layer, and performing nonlinear processing on the first mixed feature through the second activation layer to obtain a second mixed feature;
and the second feature extraction subunit is used for inputting the second mixed feature into the second convolution layer, and performing feature extraction on the second mixed feature through the second convolution layer to obtain the first resolution feature.
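Continuing the sketch, the rendering feature extraction layer can be read as a convolution-activation-convolution stack; the 3x3 kernels and the ReLU are assumptions.

```python
import torch
import torch.nn as nn

class RenderingFeatureExtractionLayer(nn.Module):
    """Hypothetical conv -> activation -> conv stack."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)   # first convolution layer
        self.act = nn.ReLU(inplace=True)                                  # second activation layer
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)  # second convolution layer

    def forward(self, mixed: torch.Tensor) -> torch.Tensor:
        first_mixed = self.conv1(mixed)       # first mixed feature
        second_mixed = self.act(first_mixed)  # second mixed feature
        return self.conv2(second_mixed)       # first resolution feature
```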
The spatial attention layer comprises a first grouped convolution layer, a third activation layer, a second grouped convolution layer and a fourth activation layer; the first grouped convolution layer and the second grouped convolution layer are symmetric networks with the same network structure;
the attention acquisition unit includes:
the grouped convolution subunit is used for inputting the first resolution feature into the first grouped convolution layer, dividing the first resolution feature into G groups of sub-resolution features through the first grouped convolution layer, and convolving each of the G groups of sub-resolution features separately to obtain G groups of first subspace attention features; G is a positive integer greater than 1;
the second nonlinear processing subunit is used for inputting the G groups of first subspace attention features into the third activation layer, and performing nonlinear processing on each of the G groups of first subspace attention features separately through the third activation layer to obtain G groups of second subspace attention features;
the feature reconstruction subunit is used for inputting the G groups of second subspace attention features into the second grouped convolution layer, and performing feature reconstruction on the G groups of second subspace attention features through the second grouped convolution layer to obtain a target spatial attention feature;
the feature mapping subunit is used for inputting the target spatial attention feature to the fourth activation layer, and performing feature mapping on the target spatial attention feature through the fourth activation layer to obtain the spatial attention matrix associated with the first resolution feature; each attention weight in the spatial attention matrix is non-negative;
and the feature acquisition subunit is used for multiplying the spatial attention matrix by the first resolution feature to obtain the second resolution feature.
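A sketch of the spatial attention layer under these constraints: two symmetric grouped convolutions around an activation, followed by a mapping that keeps every attention weight non-negative. Sigmoid as the fourth activation and G = 4 are assumptions consistent with, but not dictated by, the description.

```python
import torch
import torch.nn as nn

class SpatialAttentionLayer(nn.Module):
    """Hypothetical grouped-convolution spatial attention."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        # First grouped convolution layer: splits the input into G groups and convolves each separately.
        self.group_conv1 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.act = nn.ReLU(inplace=True)  # third activation layer
        # Second grouped convolution layer: symmetric to the first, reconstructs the attention feature.
        self.group_conv2 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.gate = nn.Sigmoid()          # fourth activation layer: non-negative weights

    def forward(self, first_res_feat: torch.Tensor) -> torch.Tensor:
        target_attn = self.group_conv2(self.act(self.group_conv1(first_res_feat)))
        attention_matrix = self.gate(target_attn)  # every weight in [0, 1]
        return attention_matrix * first_res_feat   # second resolution feature
```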
Wherein the feature normalization unit includes:
the resolution filtering subunit is used for inputting the second resolution feature into the feature normalization layer, and performing resolution filtering on the second resolution feature through the feature normalization layer to obtain a low-resolution feature;
and the residual connection subunit is used for performing residual connection and normalization on the low-resolution feature and the second resolution feature to obtain the rendering features of the j-th target super-resolution residual component.
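Closing out the component sketch: below, "resolution filtering" is read as a smoothing convolution whose output is residually added back to its input and then normalized; that reading, and the GroupNorm choice, are assumptions. The assembled residual component then chains the pieces sketched above, handling the first component (which has no structural input) separately, which matches the initial convolution layer described next.

```python
import torch
import torch.nn as nn

class FeatureNormalizationLayer(nn.Module):
    """Hypothetical resolution filtering + residual connection + normalization."""
    def __init__(self, channels: int):
        super().__init__()
        self.low_pass = nn.Conv2d(channels, channels, 3, padding=1)  # resolution filtering -> low-resolution feature
        self.norm = nn.GroupNorm(1, channels)                        # normalization after the residual connection

    def forward(self, second_res_feat: torch.Tensor) -> torch.Tensor:
        low_res_feat = self.low_pass(second_res_feat)
        return self.norm(low_res_feat + second_res_feat)             # residual connection + normalization

class SuperResolutionResidualComponent(nn.Module):
    """Hypothetical assembly of the sub-layers sketched above."""
    def __init__(self, channels: int):
        super().__init__()
        self.extract_mixed = RenderingFeatureExtractionLayer(2 * channels, channels)
        self.extract_first = RenderingFeatureExtractionLayer(channels, channels)
        self.attention = SpatialAttentionLayer(channels)
        self.normalize = FeatureNormalizationLayer(channels)

    def forward(self, rendering_feat, structural_feat=None):
        if structural_feat is None:
            # First component: consumes the initial convolution features directly.
            first_res = self.extract_first(rendering_feat)
        else:
            first_res = self.extract_mixed(mix_features(rendering_feat, structural_feat))
        return self.normalize(self.attention(first_res))
```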
Wherein the target rendering evaluation model includes an initial convolution layer associated with the target rendering feature extraction network; the initial convolution layer is used for extracting convolution features from the original rendering resource; when the i-th target super-resolution residual component is the first of the N target super-resolution residual components, the first target super-resolution residual component performs rendering feature extraction on the convolution features output by the initial convolution layer, so as to obtain the rendering features of the first target super-resolution residual component.
Wherein the target quality evaluation network comprises a pooling layer and a quality evaluation layer;
the quality evaluation module comprises:
the feature pooling unit is used for inputting the target rendering features into the pooling layer, and pooling the target rendering features through the pooling layer to obtain a compressed feature;
the quality evaluation unit is used for inputting the compressed feature to the quality evaluation layer, and performing rendering quality evaluation on the compressed feature through the quality evaluation layer to obtain the rendering evaluation quality of the original rendering resource.
The quality evaluation layer comprises a fully connected layer and a regression layer;
the quality evaluation unit includes:
the feature merging subunit is used for merging the compressed feature through the fully connected layer to obtain a quality feature;
and the quality output subunit is used for inputting the quality feature into the regression layer, outputting the evaluation quality corresponding to the quality feature through the regression layer, and taking the evaluation quality corresponding to the quality feature as the rendering evaluation quality of the original rendering resource.
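A sketch of this head, continuing the sketches above: global average pooling as the pooling layer, one fully connected layer, and a linear regression layer producing a scalar score. The pooling type and hidden width are assumptions.

```python
import torch
import torch.nn as nn

class QualityEvaluationNetwork(nn.Module):
    """Hypothetical quality head: pooling layer + fully connected layer + regression layer."""
    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)    # pooling layer -> compressed feature
        self.fc = nn.Linear(channels, hidden)  # fully connected layer -> quality feature
        self.regress = nn.Linear(hidden, 1)    # regression layer -> evaluation quality

    def forward(self, target_rendering_feat: torch.Tensor) -> torch.Tensor:
        compressed = self.pool(target_rendering_feat).flatten(1)
        quality_feat = torch.relu(self.fc(compressed))
        return self.regress(quality_feat).squeeze(-1)  # rendering evaluation quality
```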
Wherein the apparatus further comprises:
the resource replacement module is used for acquiring a rendering quality threshold associated with the multimedia file; if the rendering evaluation quality of the original rendering resource is below the rendering quality threshold, determining that the original rendering resource is a low-quality rendering resource; and, in the multimedia file, replacing the low-quality rendering resource with a target rendering resource to obtain an updated multimedia file; the rendering evaluation quality of the target rendering resource is higher than the rendering evaluation quality of the original rendering resource.
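The replacement rule reduces to a threshold comparison; a minimal sketch follows, in which the threshold value and the find_better lookup are hypothetical.

```python
import torch

def replace_low_quality_resources(resources, model, quality_threshold=0.5, find_better=None):
    """Score each rendering resource; swap in a higher-quality one when below the threshold."""
    updated = []
    with torch.no_grad():
        for resource in resources:
            score = model(resource.unsqueeze(0)).item()  # rendering evaluation quality
            if score < quality_threshold and find_better is not None:
                resource = find_better(resource)         # assumed lookup for a better resource
            updated.append(resource)
    return updated
```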
An aspect of an embodiment of the present application provides a data processing apparatus, including:
the sample acquisition module is used for acquiring a sample rendering resource for training the initial rendering evaluation model and the rendering annotation quality of the sample rendering resource; the sample rendering resource is obtained from a sample multimedia file; the initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components, where M is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual components, where N is a positive integer greater than 1 and N = M + 1;
the first feature extraction module is used for inputting the sample rendering resource to the i-th initial structural feature extraction component among the M initial structural feature extraction components, the i-th initial structural feature extraction component performing structural feature extraction on the sample rendering resource to obtain the structural features of the i-th initial structural feature extraction component;
the second feature extraction module is used for inputting the sample rendering resource to the initial rendering feature extraction network for rendering feature extraction and, when the rendering features of the i-th initial super-resolution residual component among the N initial super-resolution residual components are acquired, determining the rendering features of the j-th initial super-resolution residual component based on the rendering features of the i-th initial super-resolution residual component and the structural features of the i-th initial structural feature extraction component; the i-th initial super-resolution residual component is the super-resolution residual component immediately preceding the j-th initial super-resolution residual component; i is a positive integer less than N, and j = i + 1;
the sample evaluation module is used for taking, when the rendering features of the j-th initial super-resolution residual component are detected to be the rendering features of the N-th initial super-resolution residual component among the N initial super-resolution residual components, the rendering features of the N-th initial super-resolution residual component as the sample rendering features, and performing rendering quality evaluation on the sample rendering features through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resource;
the model training module is used for iteratively training the initial rendering evaluation model based on the sample evaluation quality and the rendering annotation quality to obtain a target rendering evaluation model for evaluating the rendering quality of original rendering resources in a multimedia file.
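One plausible training step, matching the description above: compare the sample evaluation quality against the rendering annotation quality and update the whole model end to end. MSE loss and Adam are assumptions; the patent only specifies iterative training on the two quality values.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    """Iteratively train the initial rendering evaluation model (hypothetical setup)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for sample_resource, annotation_quality in loader:
            sample_quality = model(sample_resource)  # sample evaluation quality
            loss = criterion(sample_quality, annotation_quality)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # the trained target rendering evaluation model
```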
In one aspect, a computer device is provided, including: a processor and a memory;
the processor is connected to the memory, wherein the memory is configured to store a computer program, and when the computer program is executed by the processor, the computer device is caused to execute the method provided in the embodiment of the application.
In one aspect, the present application provides a computer readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method provided in the embodiments of the present application.
In one aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the embodiments of the present application.
In the embodiment of the application, when an original rendering resource in a multimedia file is acquired, a target rendering evaluation model for evaluating the rendering quality of the original rendering resource can be acquired. The target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components (M being a positive integer), and the target rendering feature extraction network comprises N target super-resolution residual components (N being a positive integer greater than 1, with N = M + 1). The original rendering resource can be input to the i-th target structural feature extraction component among the M target structural feature extraction components, which performs structural feature extraction to obtain the structural features of the i-th component. The original rendering resource can also be input to the target rendering feature extraction network for rendering feature extraction; when the rendering features of the i-th target super-resolution residual component among the N target super-resolution residual components are acquired, the rendering features of the j-th target super-resolution residual component (the component immediately following the i-th, with j = i + 1 and i a positive integer less than N) are determined based on the rendering features of the i-th target super-resolution residual component and the structural features of the i-th target structural feature extraction component. When the rendering features of the j-th target super-resolution residual component are detected to be those of the N-th target super-resolution residual component, they can be taken as the target rendering features, on which rendering quality evaluation can be performed through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource. The embodiment of the application thus designs an end-to-end dual-stream network (i.e., the target rendering evaluation model) that explores the structural features and the rendering features of a rendering resource jointly: after the structural branch (i.e., the target structural feature extraction network) extracts structural features of the rendering resource, the texture branch (i.e., the target rendering feature extraction network) combines them to extract the rendering features more finely, so that the finally extracted rendering features (i.e., the target rendering features) represent the rendering quality more fully. A rendering evaluation quality of higher accuracy can therefore be obtained from the target rendering features, improving the evaluation accuracy of rendering resources.
In addition, compared with manual evaluation of rendering resources, the embodiment of the application can directly input acquired rendering resources into the trained target rendering evaluation model for automatic rendering quality evaluation, which replaces manual work and greatly improves the evaluation efficiency of rendering resources.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic view of a scenario of data processing provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a network structure of a rendering evaluation model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of a super-resolution residual component according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a network structure of a spatial attention layer according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a network structure of another spatial attention layer according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a network structure of a feature normalization layer according to an embodiment of the present application;
FIG. 9 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and other directions.
The scheme provided by the embodiments of the present application relates to computer vision technology in the field of artificial intelligence. Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize and measure targets and perform other machine vision tasks, and further performs graphic processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include data processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition. In the embodiments of the present application, computer vision techniques may be used to extract the structural features and rendering features of a rendering resource.
The scheme provided by the embodiments of the present application also relates to machine learning technology in the field of artificial intelligence. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all fields of artificial intelligence. Machine learning typically includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations. In the embodiments of the present application, the target rendering evaluation model is an AI model based on machine learning technology and can be used to evaluate the rendering quality of an input rendering resource.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in FIG. 1, the system architecture may comprise a service server 100 and a terminal device cluster, where the cluster may comprise one or more terminal devices; the number of terminal devices is not limited here. As shown in FIG. 1, the terminal devices in the cluster may specifically include: terminal device 200a, terminal device 200b, terminal device 200c, ..., terminal device 200n. Communication connections may exist within the cluster, for example between terminal device 200a and terminal device 200b, and between terminal device 200a and terminal device 200c. Meanwhile, any terminal device in the cluster may have a communication connection with the service server 100, so that each terminal device can exchange data with the service server 100 through that connection; for example, a communication connection exists between terminal device 200a and the service server 100. The manner of connection is not limited: it may be a direct or indirect wired connection, a direct or indirect wireless connection, or another manner, which is not limited herein.
Each terminal device in the cluster may include smart phones, tablet computers, notebook computers, desktop computers, wearable devices (e.g., smart watches, smart bracelets), smart home devices, headsets, smart vehicle-mounted terminals and other intelligent terminals (e.g., ones with rendering functions). It should be understood that each terminal device in the terminal device cluster shown in FIG. 1 may be provided with an application client, and when the application client runs on a terminal device, it may exchange data with the service server 100 shown in FIG. 1. The application client may include an entertainment client (e.g., a game client, a live-streaming client), a multimedia client (e.g., a video client), a tool client (e.g., an image editing client, a video production client), a social client, an instant messaging client (e.g., a conference client), a vehicle client, a smart home client, etc., and has the function of displaying data such as text, images, audio and video. The application client may be an independent client, or an embedded sub-client integrated in another client (e.g., a social or multimedia client), which is not limited herein. Taking a game client as an example, the service server 100 may be a collection of servers such as the background server and data processing server corresponding to the game client, so each terminal device may transmit data with the service server 100 through the game client; for example, each terminal device may participate in the same game with other terminal devices through the service server 100, such as an MMORPG (Massively Multiplayer Online Role-Playing Game) or an FPS (First-Person Shooter) game, and may output the corresponding game screen.
The service server 100 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
It will be appreciated that in an actual business scenario (e.g., a game scenario), texture rendering of objects or scenes that does not reach the expected standard results in poor visual effects, so an assessment of texture rendering effects is necessary. Based on this, the embodiments of the present application provide an evaluation technique for texture rendering effects that can evaluate texture resources in any service scenario. For ease of subsequent understanding and description, the texture resources whose rendering is to be evaluated are collectively referred to as original rendering resources. An original rendering resource may specifically be a texture resource such as a real texture (e.g., a texture collected from a real object or scene) or a synthetic texture (e.g., a texture constructed for a virtual object or scene). In practice a texture resource may exist in the form of an image, and optionally a complete texture may be composed of one or more texture sub-images; the type and content of the original rendering resource are not limited here. The original rendering resources in the embodiments of the present application may be extracted from a multimedia file. The multimedia file may be a video file carrying both image data and audio data, for example a game video, an animation, a short video, a TV episode, a movie, or a music video (MV); or it may be an image file composed of complete image data, such as a screenshot of a game video or animation, a poster, a photograph, or a web page picture, where the image file mainly comprises texture images. The embodiments of the present application do not limit the type, content, source or format of the multimedia file. In addition, the embodiments of the present application refer to the neural network model used to evaluate the rendering quality of an original rendering resource extracted from a multimedia file as the target rendering evaluation model.
It is to be appreciated that the methods provided by the embodiments of the present application may be performed by a computer device, including but not limited to a terminal device (e.g., any terminal device in the cluster shown in FIG. 1) or a service server (e.g., the service server 100 shown in FIG. 1), and the computer device may load the target rendering evaluation model. For ease of understanding, the computer device is illustrated here as the service server 100. Specifically, when acquiring an original rendering resource in a multimedia file, the service server 100 may acquire a target rendering evaluation model for evaluating the rendering quality of the original rendering resource. The target rendering evaluation model may include a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components, where M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components, where N is a positive integer greater than 1 and N = M + 1. For ease of understanding, the i-th target structural feature extraction component and the j-th target super-resolution residual component are taken as examples. Further, the service server 100 may input the original rendering resource to the i-th target structural feature extraction component among the M target structural feature extraction components, which performs structural feature extraction on the original rendering resource to obtain the structural features of the i-th target structural feature extraction component; the structural features may be high-level semantic information of the original rendering resource, such as the degree of high-frequency fusion and combinations of low-frequency information features. Further, the service server 100 may input the original rendering resource to the target rendering feature extraction network for rendering feature extraction, and when the rendering features of the i-th target super-resolution residual component among the N target super-resolution residual components are acquired, may determine the rendering features of the j-th target super-resolution residual component based on the rendering features of the i-th target super-resolution residual component and the structural features of the i-th target structural feature extraction component. The rendering features may include low-level texture information of the original rendering resource, mixed structural features, and the like; the i-th target super-resolution residual component is the super-resolution residual component immediately preceding the j-th target super-resolution residual component; i is a positive integer less than N, and j = i + 1.
Further, when the rendering features of the j-th target super-resolution residual component are detected to be the rendering features of the N-th target super-resolution residual component among the N target super-resolution residual components, the rendering features of the N-th target super-resolution residual component can be taken as the target rendering features, on which rendering quality evaluation can be performed through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource. The rendering evaluation quality may be expressed as a rendering quality score, a rendering quality level, a rendering evaluation result, or the like, which is not limited in the embodiments of the present application.
It can be understood that the embodiments of the present application mine the rendering features (i.e., texture features) in stages, combining them with the structural features, so that the rendering quality is represented more fully and a more accurate rendering evaluation quality can be obtained. In addition, the embodiments of the present application can automatically evaluate the rendering quality of input rendering resources through the trained end-to-end dual-stream network (i.e., the target rendering evaluation model), replacing manual work and thereby greatly improving both the evaluation efficiency and the evaluation accuracy of rendering resources.
Optionally, the method provided in the embodiments of the present application may be executed by a terminal device; for example, the terminal device may itself obtain an original rendering resource, load the corresponding target rendering evaluation model, evaluate the rendering quality of the original rendering resource through a process similar to that executed by the service server to obtain the rendering evaluation quality, and then perform corresponding service processing according to that quality (for example, optimizing rendering resources with poor rendering evaluation quality). Alternatively, the method may be executed jointly by the service server and the terminal device; for example, the terminal device may first extract an original rendering resource from a multimedia file and send it to the connected service server for rendering quality assessment, or it may send the local multimedia file directly to the service server, which extracts the original rendering resource and performs the assessment; finally, the service server may return the resulting rendering evaluation quality to the terminal device. The number of service servers in the system architecture shown in FIG. 1 may be one or more; one terminal device may be connected to one service server, and each service server may acquire the multimedia files or original rendering resources uploaded by the terminal devices connected to it and evaluate their rendering quality.
It should be noted that the above rendering quality evaluation method may be applied to various service scenarios where rendering quality evaluation is required, such as games, videos and instant messaging; specific service scenarios will not be listed one by one here.
For example, in a game scenario, where game players attach great importance to high-frequency details while roughness in some regions is tolerated, a computer device (e.g., the service server 100 described above) may obtain an already-rendered rendering resource (e.g., game texture image B1) in a multimedia file (e.g., game video A1) of a target game, and may then evaluate the rendering quality of game texture image B1 through the associated target rendering evaluation model to obtain its rendering evaluation quality C1. It will be appreciated that when the rendering evaluation quality C1 is low, the corresponding game texture image B1 may suffer from problems such as low quality or poor lighting, and the computer device may optimize game texture image B1 in the target game (e.g., re-render it or replace it with a new texture image). In addition, when synthesizing textures, considering issues such as the texture combination conditions of the rendering engine, the method provided by the embodiments of the present application can also solve the evaluation problem for synthetic textures, thereby providing high-quality game images for game players and improving game quality and the players' game experience.
The target game may be any virtual game. Classified by running mode, games may be divided into mobile phone games, client games, web games, cloud games, and the like. (1) A mobile phone game, often simply called a mobile game, is a game running on a smartphone; it can be played by downloading and installing its installation package on the smartphone. (2) A client game is a game running on a personal computer; it can be played by downloading and installing its installation package on the computer. (3) A web game, also called a page game, is an online game played in a browser after a game web page is opened. (4) Cloud gaming, which may also be referred to as gaming on demand, is an online gaming technology based on cloud computing. In the cloud gaming mode, the game does not run on the terminal device held by the game player (such as a smartphone, a personal computer or a wearable device) but on a cloud server; the cloud server renders the game scenes into audio-video streams and transmits them over the network to the terminal device, which simply plays them. Based on the above description, cloud gaming has the following advantages: the demands on the computing and processing capabilities of the player's terminal device are modest (for example, the device does not need strong graphics and data processing capabilities; it only needs basic streaming media playback capabilities plus the ability to capture player input instructions and send them to the cloud server), and cloud game development does not need to consider hardware compatibility, so cloud gaming is gradually developing at scale. Further, classified by gameplay, target games may be divided into role-playing games (RPG), action/shooting games (ACT), and the like.
It should be understood that when the target game is a cloud game, the computer device (e.g., the service server 100) may be the cloud server associated with the cloud game. That is, in a cloud gaming scenario, the cloud server may render (including texture rendering) the virtual scenes and objects in the target game, extract the rendered texture resources (i.e., rendering resources, such as game texture image B1) from the resulting audio-video stream, and input them into the target rendering evaluation model for evaluation, so that unreasonable textures used in the target game can be detected in time and optimized according to service requirements; for example, they may be updated and re-rendered, and finally a new audio-video stream may be sent to the game player's terminal device for output.
For another example, in a video scenario, the computer device (e.g., the service server 100) may obtain a multimedia file (such as video file A2) produced by a video creator through the application client (e.g., a video production client) on the terminal device held by that creator (e.g., terminal device 200b). It may then extract the already-rendered rendering resource (e.g., composite texture image B2) from video file A2 and input it into the associated target rendering evaluation model for rendering quality evaluation to obtain the rendering evaluation quality C2 of composite texture image B2, after which composite texture image B2 can be assessed or optimized according to C2. It can be appreciated that the system architecture shown in FIG. 1 may be applied to video production scenarios involving texture rendering; for example, the service server 100 shown in FIG. 1 may be used to generate, manage and render virtual scenes, and the terminal devices in the cluster shown in FIG. 1 (for example, terminal device 200a and terminal device 200b) may control virtual objects and virtual cameras through the installed video production client and obtain the rendering evaluation quality (such as C2) produced when the service server 100 evaluates the relevant rendering resources.
For another example, in an instant messaging scenario based on virtual objects, in order to provide a 3D (three-dimensional) virtual space with link perception and sharing characteristics, or an interactive, immersive and collaborative world, a computer device (e.g., terminal device 200a) may obtain a multimedia file (e.g., audio-video stream A3) generated when a user (e.g., user X) communicates through an installed application client (e.g., an instant messaging client), obtain a rendering resource (e.g., texture image B3 of the virtual object associated with user X) from audio-video stream A3, and input it into the associated target rendering evaluation model for real-time rendering quality evaluation. When the rendering evaluation quality C3 of texture image B3 is obtained, texture image B3 can be dynamically adjusted and optimized according to C3, thereby enriching the display of instant messaging and improving the display quality of virtual objects during communication. The virtual object associated with user X may be one rendered by the computer device from collected real data of user X (for example, user X's face shape, hairstyle or clothes), or one selected in advance by user X from an object resource library, which is not limited in the embodiments of the present application.
For ease of understanding, please refer to fig. 2, fig. 2 is a schematic diagram of a scenario of data processing according to an embodiment of the present application. As shown in fig. 2, the computer device in the embodiment of the present application may be a computer device capable of loading the target rendering evaluation model, where the computer device may be any one of the terminal devices in the terminal device cluster shown in fig. 1, for example, the terminal device 200a, and the computer device may also be the service server 100 shown in fig. 1, which is not limited herein.
Taking the evaluation of the rendering effect of the rendering resource 202 as an example, as shown in fig. 2, the rendering resource 202 may be derived from the multimedia file 201, where the type, content and source of the multimedia file 201 are not limited, and the type and content of the rendering resource 202 are not limited, for example, the rendering resource 202 may be a game texture image that is already rendered. In order to enable a faster and more accurate assessment of the rendering quality of the rendering resource 202, the rendering resource 202 may be regarded as the original rendering resource to be assessed, and the computer device may obtain, when obtaining the rendering resource 202, a target rendering assessment model (e.g. rendering assessment model 203) for assessing the rendering quality of the rendering resource 202, where the rendering assessment model is a pre-trained texture-structure joint learning network, and mainly includes a target structural feature extraction network (e.g. structural feature extraction network 203 a), a target rendering feature extraction network (e.g. rendering feature extraction network 203 b), and a target quality assessment network (e.g. quality assessment network 203 c).
It will be appreciated that each network in the rendering evaluation model 203 has a different task, and together they enable the evaluation of rendering quality. The structural feature extraction network 203a is mainly used for extracting structural features of rendering resources; it may be composed of M target structural feature extraction components, where M is a positive integer, and the number of target structural feature extraction components included in the structural feature extraction network 203a is not limited. For example, as shown in fig. 2, the M target structural feature extraction components may specifically include structural feature extraction component 1, structural feature extraction component 2, …, structural feature extraction component M; the network structure of the M target structural feature extraction components is likewise not limited, for example, the network structure of each component may be the same, or may be different or not completely identical. The rendering feature extraction network 203b is mainly used for extracting rendering features (i.e., texture features) of rendering resources; it may be composed of N target super-resolution residual components, where N is a positive integer greater than 1 and N = M+1, that is, the number of target super-resolution residual components is one more than the number of target structural feature extraction components, and the number of target super-resolution residual components included in the rendering feature extraction network 203b is not limited. For example, as shown in fig. 2, the N target super-resolution residual components may specifically include super-resolution residual component 1, super-resolution residual component 2, super-resolution residual component 3, …, and super-resolution residual component N; the network structure of the N target super-resolution residual components is likewise not limited, for example, the network structure of each component may be the same, or may be different or not completely identical. Further, the quality evaluation network 203c is mainly used to predict rendering quality from the extracted features.
It can be appreciated that the relevant features of the rendering resource 202 may be jointly extracted by the structural feature extraction network 203a and the rendering feature extraction network 203b. Specifically, the structural features extracted by each structural feature extraction component in the structural feature extraction network 203a may be combined with the rendering features extracted by the corresponding super-resolution residual component in the rendering feature extraction network 203b, so that the finally extracted rendering features can fully represent the texture quality, which lays the groundwork for feature differentiation.
As shown in fig. 2, after the rendering resource 202 is input into the structural feature extraction network 203a, each target structural feature extraction component in the structural feature extraction network 203a may perform structural feature extraction on the rendering resource 202, so as to obtain a corresponding structural feature. For example, the structural feature extraction component 1 may perform structural feature extraction on the rendering resource 202 to obtain structural feature A1 of structural feature extraction component 1; similarly, structural feature extraction component 2 performs structural feature extraction on the rendering resource 202 to obtain structural feature A2 of structural feature extraction component 2; …; structural feature extraction component M performs structural feature extraction on the rendering resource 202 to obtain structural feature AM of structural feature extraction component M. The structural feature of each structural feature extraction component refers to the structural feature of the rendering resource 202 extracted and output by that component.
Further, the computer device may input the rendering resource 202 to the rendering feature extraction network 203b, which may then incorporate the structural features extracted by the structural feature extraction network 203a when extracting rendering features through the super-resolution residual components contained therein. As shown in fig. 2, when the rendering feature B1 of the super-resolution residual component immediately preceding super-resolution residual component 2 (i.e., super-resolution residual component 1) is acquired, the rendering feature B1 of super-resolution residual component 1 and the structural feature A1 of structural feature extraction component 1 can be jointly input to super-resolution residual component 2, which in turn may determine its rendering feature B2 based on the rendering feature B1 and the structural feature A1. Similarly, the rendering feature B2 of super-resolution residual component 2 and the structural feature A2 of structural feature extraction component 2 may be jointly input to super-resolution residual component 3, which may determine its rendering feature B3 based on the rendering feature B2 and the structural feature A2; …; finally, the rendering feature BN of super-resolution residual component N may be determined based on the rendering feature B(N-1) of super-resolution residual component (N-1) and the structural feature AM of structural feature extraction component M (where M = N-1). The rendering feature of each super-resolution residual component refers to the rendering feature of the rendering resource 202 extracted and output by that component.
Further, it should be noted that, for the first super-resolution residual component in the rendering feature extraction network 203b (i.e., super-resolution residual component 1), there is no other super-resolution residual component in front of it, so the process of rendering feature extraction by super-resolution residual component 1 differs slightly from that of the super-resolution residual components behind it. For example, a convolution layer associated with the rendering feature extraction network 203b may be added to the rendering evaluation model 203 to initially extract the convolution feature C of the rendering resource 202; the convolution feature C may then be input to super-resolution residual component 1, which performs rendering feature extraction on the convolution feature C to obtain the rendering feature B1 of super-resolution residual component 1.
It can be understood that, because the structural features such as high-frequency fusion degree, low-frequency information features and the like are considered when the rendering features are extracted, the finally extracted rendering features can more fully and accurately express the rendering quality, and the accuracy of rendering quality assessment can be improved.
Further, it may be checked whether the currently acquired rendering feature BN is the rendering feature of the last super-resolution residual component in the rendering feature extraction network 203b (i.e., super-resolution residual component N); if so, the rendering feature BN may be input as the target rendering feature to the quality evaluation network 203c, and the quality evaluation network 203c performs rendering quality evaluation on the rendering feature BN, so that the rendering evaluation quality of the rendering resource 202 (e.g., rendering evaluation quality 204) can be derived.
It is appreciated that, optionally, the evaluation of the rendering resource 202 may be implemented through the rendering evaluation quality 204; for example, when the rendering evaluation quality 204 reaches a specified rendering quality threshold, the rendering quality of the rendering resource 202 may be considered up to standard, and when it does not reach the threshold, the rendering quality of the rendering resource 202 may be considered not up to standard, at which point the rendering resource may be optimized by deletion, replacement, re-rendering, and the like. Alternatively, when the rendering resource 202 is a texture synthesized by a texture synthesis model, the evaluation of the texture synthesis model may be implemented through the rendering evaluation quality 204; for example, when the pass rate of a batch of rendering resources including the rendering resource 202 (i.e., the proportion whose rendering evaluation quality reaches the threshold) is counted to be greater than a specified quality statistics threshold, the texture synthesis quality of the texture synthesis model may be considered good; conversely, when the pass rate of a batch of rendering resources including the rendering resource 202 is counted to be smaller than the quality statistics threshold, the texture synthesis quality of the texture synthesis model may be considered poor, at which point the texture synthesis model may be optimized by adjusting model parameters and the like. The embodiment of the present application does not limit the optimization manner based on the rendering evaluation quality.
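To make this gating logic concrete, the following is a minimal sketch of how a caller might apply the per-resource rendering quality threshold and the batch pass-rate statistic; the threshold values and function names are illustrative assumptions, not values disclosed by the embodiment.

```python
# Illustrative sketch only: threshold values and helper names are assumptions,
# not part of the disclosed implementation.

RENDER_QUALITY_THRESHOLD = 0.75   # assumed per-resource rendering quality threshold
BATCH_PASS_RATE_THRESHOLD = 0.90  # assumed quality statistics threshold

def resource_up_to_standard(render_quality: float) -> bool:
    """A single rendering resource passes if its evaluated quality
    reaches the specified rendering quality threshold."""
    return render_quality >= RENDER_QUALITY_THRESHOLD

def texture_model_acceptable(batch_qualities: list[float]) -> bool:
    """A texture synthesis model is judged by the pass rate of a batch
    of rendering resources it produced."""
    passed = sum(resource_up_to_standard(q) for q in batch_qualities)
    pass_rate = passed / len(batch_qualities)
    return pass_rate > BATCH_PASS_RATE_THRESHOLD

# Usage: qualities predicted by the rendering evaluation model for one batch.
qualities = [0.81, 0.92, 0.64, 0.88]
print(resource_up_to_standard(qualities[2]))  # False -> delete/replace/re-render
print(texture_model_acceptable(qualities))    # False -> adjust model parameters
```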
It can be appreciated that the embodiments of the present application may utilize a large number of sample rendering resources to train the neural network to obtain the rendering evaluation model 203, and a specific training process may refer to an embodiment corresponding to fig. 9.
That is, the computer device may obtain the target rendering evaluation model by training an initial rendering evaluation model; for the specific implementation of performing rendering quality evaluation on the original rendering resources in an acquired multimedia file through the target rendering evaluation model, reference may be made to the following descriptions in the embodiments corresponding to fig. 3-9.
Referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. It may be appreciated that, in this embodiment, the method provided by the present application may be performed by a computer device, where the computer device may be a terminal device or a service server running a target rendering evaluation model. For ease of understanding, the embodiment of the present application takes the computer device as a terminal device, to illustrate a specific process of evaluating the rendering quality of the original rendering resource in the terminal device. As shown in fig. 3, the method may at least include the following steps S101-S104:
step S101, when an original rendering resource in a multimedia file is obtained, a target rendering evaluation model for evaluating the rendering quality of the original rendering resource is obtained;
It can be appreciated that the terminal device can obtain the rendered original rendering resources in the multimedia file. When the multimedia file is stored locally, the terminal device can extract the original rendering resources to be evaluated from the multimedia file by itself; optionally, when the multimedia file is stored on other computer devices, the other computer devices may first extract the original rendering resources in the multimedia file and then send them to the terminal device for processing, or may directly send the complete multimedia file to the terminal device for processing.
Further, when the original rendering resource is acquired, the terminal device may acquire a target rendering evaluation model for evaluating rendering quality of the original rendering resource. The target rendering evaluation model may be deployed locally on the terminal device, or may be deployed on a service platform accessible to the terminal device (e.g., deployed on the cloud), which is not limited herein. The target rendering evaluation model may include a target structural feature extraction network, a target rendering feature extraction network, and a target quality evaluation network; the target structural feature extraction network and the target rendering feature extraction network can be used as extractors in the target rendering evaluation model, for example, the target structural feature extraction network can be used as a structural branch in the extractors and is mainly used for extracting structural features, the target structural feature extraction network can comprise M target structural feature extraction components, and each target structural feature extraction component can be used for extracting structural features of original rendering resources; m is a positive integer; similarly, a target rendering feature extraction network may be used as a texture branch in the extractor, primarily for mining texture features (i.e., rendering features) in stages, the target rendering feature extraction network may include N target super-resolution residual components; n is a positive integer greater than 1, and n=m+1. The number of target structural feature extraction components and the number of target super-resolution residual components are not limited. It can be understood that after the original rendering resource is obtained, the terminal device may input the original rendering resource to the target structural feature extraction network and the target rendering feature extraction network to extract corresponding features, and the specific implementation process may refer to the subsequent steps.
Step S102, inputting the original rendering resources to an ith target structural feature extraction component in M target structural feature extraction components, and carrying out structural feature extraction on the original rendering resources by the ith target structural feature extraction component to obtain structural features of the ith target structural feature extraction component;
It will be appreciated that the target structural feature extraction network may be used to extract perceptual structural information, so the terminal device may input the original rendering resource into the target structural feature extraction network, and each of the M target structural feature extraction components extracts structural features of the original rendering resource respectively. For ease of understanding, the i-th target structural feature extraction component of the M target structural feature extraction components is described below as an example, where i is a positive integer less than or equal to M.
In one embodiment, the ith target structural feature extraction component may comprise a structural feature extraction layer and a first activation layer; the original rendering resources can be input to a structural feature extraction layer in the ith target structural feature extraction component, and feature extraction is carried out on the original rendering resources through the structural feature extraction layer, so that advanced semantic features of the original rendering resources are obtained; and the high-level semantic features can be input to a first activation layer in the ith target structural feature extraction component, and feature mapping is carried out on the high-level semantic features through the first activation layer, so that the structural features of the ith target structural feature extraction component are obtained.
The structural feature extraction layer can be constructed by adopting a convolutional neural network (Convolutional Neural Network, CNN), and the high-level semantic features extracted by the structural feature extraction layer mainly comprise high-level semantic information such as high-frequency fusion degree, low-frequency information features and the like. It will be appreciated that the structural feature extraction layer may be constructed according to business requirements, and the specific network structure of the structural feature extraction layer is not limited herein.
The first activation layer may use a suitable activation function to map the high-level semantic features to another space, obtaining structural features that meet the requirements of subsequent data processing. For example, the first activation layer may use the Sigmoid function as the activation function to map the high-level semantic features into the (0, 1) interval, i.e., to guarantee that the value range of the output structural features is (0, 1); alternatively, the ReLU function (Rectified Linear Unit, also called linear rectification function) or the tanh function (hyperbolic tangent function), among others, may be selected as the activation function of the first activation layer, which is not limited here.
It can be appreciated that the network structures of the different target structural feature extraction components may be identical, or may be different or not completely identical, which is not limited in the embodiment of the present application. The process of extracting structural features by the other target structural feature extraction components in the target structural feature extraction network is similar and is not expanded here one by one.
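As a concrete illustration of one such component, the following PyTorch sketch pairs a convolutional structural feature extraction layer with a Sigmoid first activation layer; since the embodiment leaves the exact network structure open, the channel counts, kernel size and stride here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class StructuralFeatureExtractionComponent(nn.Module):
    """Sketch of one target structural feature extraction component:
    a convolutional structural feature extraction layer followed by a
    first activation layer. Channel counts, kernel size and stride are
    illustrative assumptions."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Structural feature extraction layer (CNN); stride 2 halves the
        # resolution, matching the stage-by-stage halving described later.
        self.extract = nn.Conv2d(in_channels, out_channels,
                                 kernel_size=3, stride=2, padding=1)
        # First activation layer: Sigmoid maps features into (0, 1).
        self.activate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        high_level = self.extract(x)      # high-level semantic features
        return self.activate(high_level)  # structural features in (0, 1)

# Usage: extract structural features from a 3-channel rendering resource.
component = StructuralFeatureExtractionComponent(3, 64)
structural_feature = component(torch.randn(1, 3, 256, 256))
print(structural_feature.shape)  # torch.Size([1, 64, 128, 128])
```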
For ease of understanding, please refer to fig. 4, fig. 4 is a schematic diagram of a network structure of a rendering evaluation model according to an embodiment of the present application. The data processing scenario described above in fig. 2 may be implemented by the rendering evaluation model shown in fig. 4. The rendering evaluation model may be constructed using the BVSNet network designed in the embodiment of the present application (i.e., an end-to-end dual-stream network for mining texture features and structural features), and its prediction pipeline mainly includes an extractor and a regressor. The extractor explores texture features and structural features through two branches: it may mainly be divided into a structural branch and a texture branch, where the structural branch is mainly used to extract high-level semantic information (such as high-frequency fusion degree, low-frequency information features, etc.), and the texture branch is mainly used to mine low-level texture information and mix in structural features (in combination with the structural branch) for stage-wise processing, so that the finally extracted features can fully represent the quality of the texture, i.e., prepare for feature differentiation. The regressor may then perform a multi-class quality assessment of the final texture image via a multi-classification network, e.g., deriving the rendering quality via a nonlinear mapping. It will be appreciated that, given a rendering resource (such as a rendered game texture image $I_{SR}$), the task of this BVSNet-based rendering evaluation model is to predict the perceived rendering quality:

$$\hat{q} = \mathrm{BVSNet}(I_{SR}) \tag{1}$$

where $\mathrm{BVSNet}(\cdot)$ represents the BVSNet network designed according to an embodiment of the present application (its specific network structure may be referred to in fig. 4), and $\hat{q}$ refers to a rendering quality score that may be used to characterize the rendering evaluation quality of the rendering resource.
As shown in fig. 4, for convenience of explanation, it is assumed that the target structural feature extraction network serving as the structural branch contains 5 target structural feature extraction components (i.e., M = 5 as described above), that the target rendering feature extraction network serving as the texture branch contains 6 target super-resolution residual components (i.e., N = 6), and that the rendering resource 401 (e.g., game texture image $I_{SR}$) is the original rendering resource to be evaluated. In the target structural feature extraction network, assuming that the 5 target structural feature extraction components are structural feature extraction components 1 through 5 respectively, the rendering resource 401 may be input to these 5 components to extract structural features respectively. It will be appreciated that the target structural feature extraction network is a pre-trained network; denoting the structural features as $F_S$, then:

$$F_S = \big\{ F_S^i(I_{SR}) \big\}_{i=1}^{5} \tag{2}$$

where $F_S^i$ is the structural feature of the i-th structural feature extraction component (which may also be referred to as a structural feature extractor). Taking structural feature extraction component 1 as an example (i.e., i = 1), the rendering resource 401 (e.g., the game texture image $I_{SR}$) is input to the structural feature extraction layer of structural feature extraction component 1 (for example, structural feature extraction layer D1), which extracts the high-level semantic feature E1 of the rendering resource 401; the high-level semantic feature E1 can then be feature-mapped through the first activation layer of structural feature extraction component 1 (for example, activation layer D2), so as to obtain the structural feature of structural feature extraction component 1 (structural feature A1, e.g., $F_S^1$). By a similar process, the structural features of structural feature extraction components 2 through 5 can be obtained (structural features A2-A5, e.g., $F_S^2$-$F_S^5$). Optionally, in some embodiments, the channel numbers of the extracted structural features A1-A5 are C = 64, 128, 256, 512 and 512, respectively, and the resolution of the structural features is halved stage by stage.
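The channel and resolution pattern just described can be reproduced by cascading five such stages. Whether the components cascade or each consume the raw resource directly is not fully pinned down by the text, so the chaining in the following sketch is an illustrative assumption:

```python
import torch
import torch.nn as nn

def make_structural_branch() -> nn.ModuleList:
    """Sketch of a five-stage structural branch: each stage is a stride-2
    convolution plus Sigmoid, so channel counts grow as 64, 128, 256, 512,
    512 while resolution halves stage by stage (illustrative assumption)."""
    channels = [3, 64, 128, 256, 512, 512]
    return nn.ModuleList(
        nn.Sequential(
            nn.Conv2d(channels[i], channels[i + 1], 3, stride=2, padding=1),
            nn.Sigmoid(),
        )
        for i in range(5)
    )

branch = make_structural_branch()
x = torch.randn(1, 3, 256, 256)  # rendering resource I_SR (assumed size)
structural_features = []
for stage in branch:
    x = stage(x)                 # F_S^i feeds the next stage in this reading
    structural_features.append(x)
for f in structural_features:
    print(tuple(f.shape))  # (1,64,128,128), (1,128,64,64), ..., (1,512,8,8)
```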
It can be understood that when the network structures of the M target structural feature extraction components are different, each target structural feature extraction component can extract structural features under different dimensions of the original rendering resources, i.e. can extract richer advanced semantic information, thereby being beneficial to improving accuracy of rendering quality assessment.
Step S103, inputting original rendering resources into a target rendering feature extraction network for extracting rendering features, and determining rendering features of a j-th target super-resolution residual assembly based on rendering features of the i-th target super-resolution residual assembly and structural features of the i-th target structural feature extraction assembly when rendering features of the i-th target super-resolution residual assembly in N target super-resolution residual assemblies are acquired;
it will be appreciated that the target rendering feature extraction network may be used to explore pixel-level texture information (e.g., by a shallow extractor). It should be noted that, in the embodiment of the present application, a spatial attention layer with a spatial attention (Spatial Attention, SA) mechanism and a feature normalization layer with a feature normalization (F-Norm) function are configured in each target super-resolution residual component of the target rendering feature extraction network, so as to be specially used for evaluating rendering features, and the partial network layer can perform fine feature extraction on the input texture resources so as to predict the pipeline for feature reading.
Based on this, the terminal device may input the original rendering resource to the target rendering feature extraction network, and extract the rendering features of the original rendering resource layer by layer using the N target super-resolution residual components. For ease of understanding, the description below takes the i-th target super-resolution residual component and the j-th target super-resolution residual component among the N target super-resolution residual components as an example, where the i-th target super-resolution residual component is the super-resolution residual component immediately preceding the j-th target super-resolution residual component; i is a positive integer less than N, and j = i+1.
Referring to fig. 5, fig. 5 is a schematic diagram of a network structure of a super-resolution residual component according to an embodiment of the present application. As shown in fig. 5, in one embodiment, the j-th target super-resolution residual component may include a rendering feature extraction layer, a spatial attention layer, and a feature normalization layer. In order to mine the rendering features more fully, the previously extracted rendering features and structural features may be considered together; specifically, the rendering feature of the i-th target super-resolution residual component and the structural feature of the i-th target structural feature extraction component may be feature-spliced (i.e., a channel connection operation is performed) to obtain a hybrid splicing feature. The hybrid splicing feature may be input to the rendering feature extraction layer in the j-th target super-resolution residual component, and feature extraction is performed on it by the rendering feature extraction layer, so that a first resolution feature can be obtained; further, the first resolution feature may be input to the spatial attention layer, a spatial attention matrix associated with the first resolution feature may be acquired by the spatial attention layer, and a second resolution feature may be determined based on the spatial attention matrix and the first resolution feature; further, the second resolution feature is input to the feature normalization layer, and feature normalization processing is performed on it by the feature normalization layer, yielding the rendering feature of the j-th target super-resolution residual component. It is understood that the rendering features extracted by the target super-resolution residual component may include low-level texture information, hybrid structural features, and the like.
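The following PyTorch sketch chains these steps (feature splicing with max pooling, rendering feature extraction layer, spatial attention layer, feature normalization layer) in the order just described; all channel sizes are assumptions, and the attention and normalization layers are reduced stand-ins whose fuller sketches appear below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    # Minimal stand-in for the SA layer; a fuller sketch follows later.
    def __init__(self, channels: int):
        super().__init__()
        groups = max(1, channels // 4)
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.Sigmoid())

    def forward(self, x):
        return self.net(x) * x  # weight the input by the attention matrix

class FeatureNorm(nn.Module):
    # Minimal stand-in for the F-Norm layer; a fuller sketch follows later.
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels)

    def forward(self, x):
        return x + self.depthwise(x)  # residual connection

class SuperResolutionResidualComponent(nn.Module):
    """Sketch of the j-th target super-resolution residual component:
    feature splicing -> rendering feature extraction layer -> spatial
    attention layer -> feature normalization layer."""

    def __init__(self, render_ch: int, struct_ch: int, out_ch: int):
        super().__init__()
        # Rendering feature extraction layer: conv -> ReLU -> conv (3x3 kernels).
        self.extract = nn.Sequential(
            nn.Conv2d(render_ch + struct_ch, out_ch, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.spatial_attention = SpatialAttention(out_ch)
        self.feature_norm = FeatureNorm(out_ch)

    def forward(self, render_feat, struct_feat):
        # Max pooling matches the rendering feature's resolution to the
        # structural feature before channel concatenation.
        pooled = F.max_pool2d(render_feat, kernel_size=2)
        mixed = torch.cat([pooled, struct_feat], dim=1)  # hybrid splicing feature
        first_res = self.extract(mixed)                  # first resolution feature
        second_res = self.spatial_attention(first_res)   # second resolution feature
        return self.feature_norm(second_res)             # rendering feature F_T^j

# Usage: B1 (1,64,128,128) and A1 (1,64,64,64) produce B2 at A1's resolution.
component = SuperResolutionResidualComponent(64, 64, 64)
b2 = component(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 64, 64))
print(b2.shape)  # torch.Size([1, 64, 64, 64])
```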
For ease of understanding, please refer again to fig. 4, and assume that the 6 target super-resolution residual components included in the target rendering feature extraction network serving as the texture branch are: super-resolution residual component 1, super-resolution residual component 2, super-resolution residual component 3, super-resolution residual component 4, super-resolution residual component 5, and super-resolution residual component 6; the rendering resource 401 (such as game texture image $I_{SR}$) is input to these 6 target super-resolution residual components to extract rendering features layer by layer. It will be appreciated that the target rendering feature extraction network is a pre-trained network; denoting the rendering features as $F_T$, then for the j-th stage of the texture branch (i.e., the j-th target super-resolution residual component), there is:

$$F_T^j = f_T^j\Big( \big[\, \mathrm{MaxPool}(F_T^i),\; F_S^i \,\big] \Big) \tag{3}$$

where $f_T^j(\cdot)$ is the super-resolution residual component (also called a super-resolution residual block) designed by the embodiment of the present application, and $[\cdot]$ indicates a channel connection operation. It will be appreciated that, when feature-splicing the rendering features and the structural features, in order to preserve the same resolution as the structural feature $F_S$, a max pooling operation may be performed on the rendering feature $F_T$ at each stage. Taking the rendering feature of the i-th target super-resolution residual component ($F_T^i$) and the structural feature of the i-th target structural feature extraction component ($F_S^i$) as an example, the specific process of feature splicing may be: performing max pooling processing on the rendering feature of the i-th target super-resolution residual component to obtain a pooled rendering feature corresponding to the i-th target super-resolution residual component, where the resolution of the pooled rendering feature is the same as the resolution of the structural feature of the i-th target structural feature extraction component; and then feature-splicing the pooled rendering feature with the structural feature of the i-th target structural feature extraction component to obtain the hybrid splicing feature, which is subsequently mined through the rendering feature extraction layer, the spatial attention layer and the feature normalization layer.
For ease of understanding, taking super-resolution residual component 1 (i.e., i = 1) and super-resolution residual component 2 (i.e., j = 2) in fig. 4 as an example: after super-resolution residual component 2 acquires the rendering feature B1 of super-resolution residual component 1 (e.g., $F_T^1$) and the structural feature A1 of structural feature extraction component 1 (e.g., $F_S^1$), the rendering feature B1 and the structural feature A1 may be feature-spliced through a channel connection operation to obtain a hybrid splicing feature (such as hybrid splicing feature E2); the hybrid splicing feature E2 is then processed sequentially through the rendering feature extraction layer, the spatial attention layer and the feature normalization layer in super-resolution residual component 2, so as to obtain the rendering feature of super-resolution residual component 2 (rendering feature B2, e.g., $F_T^2$).
It should be noted that the first-stage rendering feature $F_T^1$ is obtained slightly differently from the subsequent stages: in the embodiment of the present application, a convolution layer and a target super-resolution residual component may be used to extract it from the original rendering resource (such as game texture image $I_{SR}$). Specifically, the target rendering evaluation model may include an initial convolution layer (e.g., convolution layer 402 shown in fig. 4) associated with the target rendering feature extraction network, and the initial convolution layer is used to extract the convolution features of the original rendering resource. It will be appreciated that when the i-th target super-resolution residual component is the first target super-resolution residual component of the N target super-resolution residual components (i.e., i = 1), the first target super-resolution residual component may, upon obtaining the convolution features output by the initial convolution layer, perform rendering feature extraction on those convolution features, thereby obtaining the rendering feature of the first target super-resolution residual component (such as $F_T^1$). That is:

$$F_T^1 = f_T^1\big( \mathrm{Conv}(I_{SR}) \big) \tag{4}$$

where $\mathrm{Conv}(\cdot)$ is the initial convolution layer described previously.
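A minimal sketch of the stage recursion in equations (3) and (4), with the super-resolution residual components reduced to their rendering feature extraction layers and all channel plans assumed for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stage(in_ch: int, out_ch: int) -> nn.Module:
    # Reduced stand-in for one super-resolution residual component:
    # rendering feature extraction layer only (conv -> ReLU -> conv).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1))

# Illustrative channel plans for the two branches (assumptions).
struct_ch = [64, 128, 256, 512, 512]      # F_S^1 .. F_S^5
tex_ch    = [64, 64, 128, 256, 512, 512]  # F_T^1 .. F_T^6

initial_conv = nn.Conv2d(3, tex_ch[0], 3, padding=1)    # Conv(.) in eq. (4)
stages = nn.ModuleList(
    [stage(tex_ch[0], tex_ch[0])] +                      # f_T^1
    [stage(tex_ch[j - 1] + struct_ch[j - 1], tex_ch[j])  # f_T^j, j >= 2
     for j in range(1, 6)])

i_sr = torch.randn(1, 3, 256, 256)        # rendering resource I_SR
# Structural features F_S^1..F_S^5 with stage-wise halved resolutions.
f_s = [torch.randn(1, struct_ch[k], 256 >> (k + 1), 256 >> (k + 1))
       for k in range(5)]

f_t = stages[0](initial_conv(i_sr))       # eq. (4): F_T^1
for j in range(1, 6):                     # eq. (3): F_T^j for j = 2..6
    pooled = F.max_pool2d(f_t, 2)         # match F_S^{j-1} resolution
    f_t = stages[j](torch.cat([pooled, f_s[j - 1]], dim=1))
print(tuple(f_t.shape))                   # (1, 512, 8, 8): target rendering feature
```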
It will be appreciated that the rendering feature extraction layer may be constructed using a convolutional neural network (CNN) and may be used to further integrate and mine the hybrid splicing features; the specific network structure of the rendering feature extraction layer is not limited here. For example, as shown in fig. 5, in one embodiment, the rendering feature extraction layer may include a first convolution layer, a second activation layer, and a second convolution layer, and the specific process of feature extraction on the input hybrid splicing feature may be: inputting the hybrid splicing feature into the first convolution layer, and performing feature extraction on it through the first convolution layer to obtain a first hybrid feature; further, inputting the first hybrid feature into the second activation layer, and performing nonlinear processing on it through the second activation layer to obtain a second hybrid feature; and then inputting the second hybrid feature into the second convolution layer, and performing feature extraction on it through the second convolution layer to obtain the first resolution feature. Optionally, the first convolution layer and the second convolution layer may adopt the same network structure or different network structures, which is not limited here; for example, in some embodiments, both the first convolution layer and the second convolution layer may use a 3×3 convolution kernel. In addition, to enhance the learning ability of the network, the second activation layer may use a suitable activation function to perform nonlinear processing on the first hybrid feature output by the first convolution layer; for example, the ReLU function may be used as the activation function to introduce nonlinearity, and other activation functions may also be used, which is not limited here.
The spatial attention layer (also referred to as the SA layer) introduces a spatial attention mechanism by mimicking the human eye's attention to texture details (i.e., the human eye is more concerned with important information), making visually sensitive areas more distinguishable. The specific network structure of the spatial attention layer is not limited here. Referring to fig. 6, fig. 6 is a schematic diagram of a network structure of a spatial attention layer according to an embodiment of the present application. As shown in fig. 6, in one embodiment, the spatial attention layer may comprise a first grouped convolution layer, a third activation layer, a second grouped convolution layer, and a fourth activation layer, where the first grouped convolution layer and the second grouped convolution layer are symmetric networks with the same network structure. Specifically, the first resolution feature extracted by the rendering feature extraction layer may be input to the first grouped convolution layer, which divides the first resolution feature into G groups of sub-resolution features and performs convolution processing on each of the G groups, so as to obtain G groups of first subspace attention features, where G is a positive integer greater than 1. Further, the G groups of first subspace attention features may be input to the third activation layer, which performs nonlinear processing on each of them respectively, yielding G groups of second subspace attention features. Subsequently, the G groups of second subspace attention features may be input into the second grouped convolution layer, which performs feature reconstruction on them, so that the target spatial attention feature can be obtained. The target spatial attention feature may then be input to the fourth activation layer, which performs feature mapping on it to obtain the spatial attention matrix associated with the first resolution feature; at this point, each attention weight in the spatial attention matrix is a non-negative number. Finally, the spatial attention matrix may be multiplied by the first resolution feature to obtain the second resolution feature. It will be appreciated that, by weighting the first resolution feature with the spatial attention matrix, the important information in the first resolution feature (such as high-frequency details of textures, etc.) can be made more prominent and easily distinguishable.
The first grouped convolution layer and the second grouped convolution layer may perform convolution processing in a grouped convolution (Group Convolution) manner. Grouped convolution divides the input features into a specified number of groups; each convolution kernel is correspondingly divided into groups, and convolution is performed within the corresponding group, so that a large number of features can be generated with a small number of parameters and little computation, and more information can be extracted.
Further, referring to fig. 7, fig. 7 is a schematic diagram of a network structure of another spatial attention layer according to an embodiment of the present application. As shown in fig. 7, assume that a spatial attention layer includes a grouped convolution layer 70a (i.e., the aforementioned first grouped convolution layer), an activation layer 70b (i.e., the aforementioned third activation layer), a grouped convolution layer 70c (i.e., the aforementioned second grouped convolution layer), and an activation layer 70d (i.e., the aforementioned fourth activation layer). Feature 701 may be the aforementioned first resolution feature; assuming its size is W×H×C, after feature 701 is input into the grouped convolution layer 70a, it may be divided into G groups of sub-resolution features, and in the case of uniform division, the size of each group of sub-resolution features is W×H×C/G. Similarly, the single convolution kernel of size K×K×C used by the grouped convolution layer 70a may also be divided into G groups, obtaining G groups of sub-convolution kernels, where each group of sub-convolution kernels has size K×K×C/G; each group of sub-convolution kernels is then used to convolve the sub-resolution features of the corresponding group (e.g., the first group of sub-convolution kernels convolves the first group of sub-resolution features), so as to obtain the corresponding first subspace attention features (e.g., feature 702). Further, feature 702 may be nonlinearly processed by the activation layer 70b, so that the corresponding second subspace attention feature (e.g., feature 703) can be derived. Subsequently, feature 703 may be feature-reconstructed by the grouped convolution layer 70c (i.e., the shape of the feature is recovered using the symmetric grouped convolution) to obtain the target spatial attention feature, which then has the same size as the first resolution feature. Further, the target spatial attention feature is feature-mapped by the activation layer 70d so that the attention weights are non-negative, thereby obtaining the desired spatial attention matrix (e.g., matrix P); finally, the obtained spatial attention matrix (e.g., matrix P) is multiplied by feature 701, thereby obtaining the second resolution feature with spatial attention (e.g., feature 704).
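A fuller sketch of this grouped-convolution attention layer, annotated with the element numbers of fig. 7; the kernel size and the reliance on PyTorch's built-in grouped convolution (rather than an explicit split) are implementation assumptions:

```python
import torch
import torch.nn as nn

class SpatialAttentionLayer(nn.Module):
    """Sketch of the SA layer of fig. 6 / fig. 7. Kernel size and uniform
    grouping are illustrative assumptions; the group count follows the
    text's G = C_sa / 4."""

    def __init__(self, channels: int):
        super().__init__()
        groups = max(1, channels // 4)  # G = C_sa / 4
        # Grouped convolution layer 70a: splits feature 701 (W x H x C) into
        # G groups of sub-resolution features and convolves each group with
        # its own group of sub-convolution kernels.
        self.group_conv1 = nn.Conv2d(channels, channels, 3, padding=1,
                                     groups=groups)
        self.relu = nn.ReLU()        # activation layer 70b (third activation layer)
        # Grouped convolution layer 70c: symmetric to 70a, reconstructs the
        # feature back to the same size as the first resolution feature.
        self.group_conv2 = nn.Conv2d(channels, channels, 3, padding=1,
                                     groups=groups)
        self.sigmoid = nn.Sigmoid()  # activation layer 70d (fourth activation layer)

    def forward(self, feature_701: torch.Tensor) -> torch.Tensor:
        feature_702 = self.group_conv1(feature_701)  # first subspace attention features
        feature_703 = self.relu(feature_702)         # second subspace attention features
        target = self.group_conv2(feature_703)       # target spatial attention feature
        matrix_p = self.sigmoid(target)              # spatial attention matrix in (0, 1)
        return matrix_p * feature_701                # feature 704: second resolution feature

layer = SpatialAttentionLayer(64)
print(layer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```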
In addition, the third activation layer and the fourth activation layer may each employ a suitable activation function to achieve the corresponding function; for example, the third activation layer may employ the ReLU function as its activation function to introduce nonlinearity when processing the first subspace attention features, and the fourth activation layer may employ the Sigmoid function as its activation function so that the spatial attention matrix is non-negative, i.e., each attention weight in the spatial attention matrix lies within the (0, 1) interval. Moreover, when the number of channels of the first resolution feature is C_sa, the number of convolution groups corresponding to the first grouped convolution layer may be C_sa/4 (i.e., G = C_sa/4 as described above), and the number of groups of the filters (convolution kernels) may likewise be C_sa/4.
Furthermore, the embodiment of the present application exploits feature normalization to investigate the inherent spatial correlation of low-resolution features; in the target super-resolution residual component, feature normalization can be used in place of batch normalization. For ease of understanding, please refer to fig. 8, which is a schematic diagram of a network structure of a feature normalization layer according to an embodiment of the present application. As shown in fig. 8, in one embodiment, the feature normalization layer may be formed by a depthwise convolution layer and a residual connection. Specifically, the terminal device may input the second resolution feature to the feature normalization layer, and perform resolution filtering processing on it through the feature normalization layer to obtain a low-resolution feature; further, residual connection and normalization processing may be performed on the low-resolution feature and the second resolution feature, yielding the rendering feature of the j-th target super-resolution residual component. The feature normalization layer is well suited to SR features, because it can avoid texture confusion and save memory cost. The depthwise convolution layer may perform adaptive depthwise convolution with a depthwise convolution kernel for processing low-frequency information; this convolution can efficiently filter low-resolution textures and express low-frequency features, avoiding the information redundancy caused by blind convolution, and, compared with full cross-channel convolution, it produces less information after convolution and thus saves memory. A residual is then formed between the information after convolution (i.e., the aforementioned low-resolution feature) and the information before convolution (i.e., the second resolution feature), and a normalized connection (i.e., matrix stitching) is made on the residual. The residual is adopted because the convolved information mainly carries intermediate-frequency information, and the residual can appropriately supplement some intermediate- and low-frequency information features, so that richer rendering features can be obtained and the accuracy of rendering quality evaluation can be improved.
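A minimal sketch of such a feature normalization layer, reading the normalization step as a simple residual addition; the wording "matrix stitching" could also admit a concatenation variant, so this is one assumed interpretation:

```python
import torch
import torch.nn as nn

class FeatureNormLayer(nn.Module):
    """Sketch of the feature normalization (F-Norm) layer: a depthwise
    convolution filters low-resolution texture information, and a residual
    connection with the pre-convolution feature supplements mid/low-frequency
    content. Kernel size is an illustrative assumption."""

    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolution: one filter per channel (groups == channels),
        # avoiding the redundancy of full cross-channel convolution.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)

    def forward(self, second_resolution: torch.Tensor) -> torch.Tensor:
        low_res = self.depthwise(second_resolution)  # resolution-filtered feature
        return second_resolution + low_res           # residual connection

layer = FeatureNormLayer(64)
rendering_feature = layer(torch.randn(1, 64, 32, 32))
print(rendering_feature.shape)  # torch.Size([1, 64, 32, 32])
```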
Step S104, when it is checked that the rendering feature of the j-th target super-resolution residual component is the rendering feature of the N-th target super-resolution residual component among the N target super-resolution residual components, taking the rendering feature of the N-th target super-resolution residual component as the target rendering feature, and performing rendering quality evaluation on the target rendering feature through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource.
It can be understood that the terminal device may check in real time whether the currently acquired rendering feature is the rendering feature output by the last target super-resolution residual component (i.e., the N-th target super-resolution residual component) of the N target super-resolution residual components; if so, the rendering feature of the N-th target super-resolution residual component may be taken as the target rendering feature, so that the target quality evaluation network can predict the rendering evaluation quality of the original rendering resource from the target rendering feature. For example, in connection with the embodiment corresponding to fig. 4, the texture branch has 6 stages, and the rendering evaluation quality is:

$$\hat{q} = \mathrm{Reg}\big( F_T^6(I_{SR}) \big) \tag{5}$$

where $\mathrm{Reg}(\cdot)$ is the regressor, which can serve as the target quality evaluation network, and $\hat{q}$ refers to a rendering quality score that may be used to indicate the rendering evaluation quality of the rendering resource. Compared with the above formula (1), $\mathrm{BVSNet}(\cdot)$ represents the evaluation of the original rendering resource by the whole target rendering evaluation model, whereas $\mathrm{Reg}(\cdot)$ represents the evaluation performed by the final regressor; $\mathrm{BVSNet}(\cdot)$ comprises $\mathrm{Reg}(\cdot)$ (i.e., the target rendering evaluation model includes the target quality evaluation network), and the evaluation results of the two are consistent. It can be appreciated that $F_T^6(I_{SR})$ represents the result of the cascaded operations whose input is the game texture image $I_{SR}$, i.e., the processed texture feature of the game texture image $I_{SR}$ that is fed to the regressor.
For ease of understanding, please refer to fig. 4 again, as shown in fig. 4, in an embodiment, the target quality evaluation network may be a multi-classification network, and may specifically include a pooling layer (e.g. pooling layer 403) and a quality evaluation layer (e.g. quality evaluation layer 404), and then the terminal device may input the target rendering feature to the pooling layer, and perform feature pooling on the target rendering feature by the pooling layer to obtain the compressed feature; and the compressed features can be input into a quality evaluation layer, and the quality evaluation layer is used for performing rendering quality evaluation on the compressed features to obtain the rendering evaluation quality of the original rendering resources. That is, the target quality assessment network may use the pooling layer to embed features and utilize the quality assessment layer to regress rendering quality so that the target quality assessment network may generate rendering assessment quality from features having any resolution. Wherein the pooling layer may compress the target rendering features in an adaptive maximum pooling and an adaptive average pooling manner to obtain compressed features having a target size (e.g., 4 x 4). The quality evaluation layer can be constructed by adopting a convolutional neural network, and the specific network structure of the quality evaluation layer is not limited in the embodiment of the application.
For example, in one implementation, the quality evaluation layer may include a fully connected layer and a regression layer: the terminal device may perform feature combination on the compressed features through the fully connected layer to obtain quality features; the quality features may then be input into the regression layer, which outputs the evaluation quality corresponding to the quality features, and this evaluation quality may be taken as the rendering evaluation quality of the original rendering resource. For example, as shown in fig. 4, the fully connected layer and the regression layer may each be composed of a convolution layer and an activation layer; for instance, the fully connected layer may include a third convolution layer, a fifth activation layer, a fourth convolution layer, and a sixth activation layer, where the third convolution layer and the fourth convolution layer may use the same network structure or different network structures, which is not limited here; for example, the third convolution layer may use a 4×4 convolution kernel and the fourth convolution layer a 1×1 convolution kernel. In addition, the fifth activation layer and the sixth activation layer may also employ suitable activation functions to activate the received features; for example, both may employ the ReLU function as the activation function to introduce nonlinearity.
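A sketch of this quality evaluation network under the stated sizes (4×4 compressed feature, 4×4 and 1×1 convolution kernels, ReLU activations); the channel counts are assumptions, and adaptive max pooling is used here although the text mentions both adaptive max and average pooling:

```python
import torch
import torch.nn as nn

class QualityEvaluationNetwork(nn.Module):
    """Sketch of the target quality evaluation network: adaptive pooling
    compresses the target rendering feature to a fixed 4x4 size, so features
    of any resolution can be regressed; a 4x4 convolution and a 1x1
    convolution (each followed by ReLU) play the roles of the fully
    connected and regression layers. Channel counts are assumptions."""

    def __init__(self, channels: int = 512):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d((4, 4))      # compressed feature, 4x4
        self.fc = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=4),  # third convolution layer
            nn.ReLU(),                                # fifth activation layer
            nn.Conv2d(256, 1, kernel_size=1),         # fourth convolution layer
            nn.ReLU(),                                # sixth activation layer
        )

    def forward(self, target_rendering_feature: torch.Tensor) -> torch.Tensor:
        compressed = self.pool(target_rendering_feature)
        return self.fc(compressed).flatten(1)         # rendering quality score

net = QualityEvaluationNetwork()
score = net(torch.randn(1, 512, 8, 8))
print(score.shape)  # torch.Size([1, 1])
```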
It is understood that the rendering evaluation quality may be indicated in the form of a rendering quality score, a rendering quality level, a rendering evaluation result, etc., which is not limited in the embodiment of the present application. That is, the rendering quality score of the original rendering resource may be directly regarded as its rendering evaluation quality; alternatively, a rendering quality level (e.g., divided from high to low into level A, level B, level C, etc.) or a rendering evaluation result (e.g., divided into a high-quality rendering result and a low-quality rendering result) may be determined based on the rendering quality score, and the rendering quality level or rendering evaluation result may then be regarded as the rendering evaluation quality.
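As a trivial illustration of mapping a rendering quality score onto a quality level or a binary evaluation result, with cut-off values that are purely assumed (the embodiment does not specify them):

```python
def quality_level(score: float) -> str:
    """Map a rendering quality score in [0, 1] to a rendering quality level.
    The cut-offs are illustrative assumptions."""
    if score >= 0.8:
        return "A"
    if score >= 0.5:
        return "B"
    return "C"

def evaluation_result(score: float, threshold: float = 0.6) -> str:
    """Binary rendering evaluation result derived from the same score."""
    return "high-quality rendering" if score >= threshold else "low-quality rendering"

print(quality_level(0.73), evaluation_result(0.73))  # B high-quality rendering
```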
It can be appreciated that, with the method provided by the embodiment of the present application, a large number of texture resources extracted from files such as games and videos can be evaluated. For example, the method can be applied to performance optimization of the texture resources of a target game: first, a debug tool (i.e., a game developer debugging tool, such as the resource debugging tool of UE) can be used to extract the rendered texture resources, and the extracted texture resources can then be input into the target rendering evaluation model for rendering quality evaluation, saving a great deal of manpower in evaluating textures or texture synthesis models.
Optionally, in one embodiment, the terminal device may optimize the original rendering resource based on the rendering evaluation quality. For example, the terminal device may obtain a rendering quality threshold associated with the multimedia file; if the rendering evaluation quality of the original rendering resource is smaller than the rendering quality threshold, the original rendering resource is determined to be a low-quality rendering resource. In this way, in the multimedia file, the low-quality rendering resource can be replaced by a target rendering resource, so as to obtain a multimedia update file with higher rendering quality; it is appreciated that the rendering evaluation quality of the target rendering resource is higher than that of the original rendering resource.
As can be seen from the foregoing, the embodiment of the present application designs an end-to-end dual-flow network (i.e., a target rendering evaluation model) to explore the structural features and the rendering features of the rendering resources together, where after the structural branches (i.e., the target structural feature extraction network) extract the structural features of the rendering resources, the texture branches (i.e., the target rendering feature extraction network) may combine the structural features to extract the rendering features of the rendering resources more carefully, so that the finally extracted rendering features (i.e., the target rendering features) may more fully represent the level of the rendering quality, and therefore, a rendering evaluation quality with higher accuracy may be obtained based on the target rendering features, so as to improve the evaluation accuracy of the rendering resources. In addition, compared with a mode of manually evaluating the rendering resources, the embodiment of the application can directly input the acquired rendering resources into the trained target rendering evaluation model to automatically perform rendering quality evaluation, replaces manual work, and greatly improves the evaluation efficiency of the rendering resources.
Referring to fig. 9, fig. 9 is a flowchart of another data processing method according to an embodiment of the present application. It may be understood that the method provided in the embodiment of the present application may be performed by a computer device, where the computer device may be a terminal device or a service server, and it may be understood that the computer device may be the same as or different from the computer device using the target rendering evaluation model in the embodiment corresponding to fig. 3, which is not limited herein. For easy understanding, the embodiment of the application takes the computer device as a terminal device as an example, and describes a specific process of model training for the initial rendering evaluation model in the terminal device. As shown in fig. 9, the method may at least include the following steps S201 to S205:
step S201, obtaining sample rendering resources for training an initial rendering evaluation model and rendering label quality of the sample rendering resources;
it can be understood that the terminal device may acquire the sample rendering resources for training the initial rendering evaluation model from a large number of sample multimedia files, and the acquisition manner may be referred to the above description of step S101 in the embodiment corresponding to fig. 3 regarding acquiring the original rendering resources and the target rendering evaluation model in the multimedia files. In addition, in order to calculate the loss function later, the terminal device may further obtain a rendering label quality of the sample rendering resource, where the rendering label quality may be manually labeled on the sample rendering resource, and the rendering label quality is indicated in a form of a rendering quality score, a rendering quality level, a rendering evaluation result, and the embodiment of the present application does not limit the foregoing. The initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components; m is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual error components; n is a positive integer greater than 1, and n=m+1.
Step S202, inputting a sample rendering resource into an ith initial structural feature extraction component in M initial structural feature extraction components, and carrying out structural feature extraction on the sample rendering resource by the ith initial structural feature extraction component to obtain structural features of the ith initial structural feature extraction component;
it can be understood that, after the terminal device obtains the sample rendering resources, the sample rendering resources may be input to the initial structural feature extraction network, and the structural features of the sample rendering resources are extracted by each of the M initial structural feature extraction components. For example, the sample rendering resources may be input to the ith initial structural feature extraction component of the M initial structural feature extraction components, and structural feature extraction may be performed on the sample rendering resources by the ith initial structural feature extraction component to obtain the structural features of the ith initial structural feature extraction component. For details, reference may be made to the description of structural feature extraction performed on the original rendering resources by the ith target structural feature extraction component in step S102 of the embodiment corresponding to fig. 3, which is not repeated herein.
Step S203, inputting the sample rendering resources into the initial rendering feature extraction network for extracting rendering features, and, when the rendering features of the ith initial super-resolution residual component of the N initial super-resolution residual components are acquired, determining the rendering features of the jth initial super-resolution residual component based on the rendering features of the ith initial super-resolution residual component and the structural features of the ith initial structural feature extraction component;
It can be understood that the terminal device may input the sample rendering resources to the initial rendering feature extraction network for extracting rendering features, and the rendering features of the sample rendering resources are extracted layer by layer by the N initial super-resolution residual components. For example, when the rendering features of the ith initial super-resolution residual component of the N initial super-resolution residual components are obtained, the rendering features of the jth initial super-resolution residual component may be determined based on the rendering features of the ith initial super-resolution residual component and the structural features of the ith initial structural feature extraction component, where the ith initial super-resolution residual component is the preceding super-resolution residual component of the jth initial super-resolution residual component; i is a positive integer less than N, and j = i + 1. For the specific implementation, reference may be made to the description of determining the rendering features of the jth target super-resolution residual component in step S103 of the embodiment corresponding to fig. 3, which is not repeated herein.
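The interleaving of the two branches can be sketched as follows; the component call signatures are assumptions for illustration, and the fusion inside each residual component is detailed in the embodiment corresponding to fig. 3.

```python
def dual_stream_forward(x, init_conv, struct_components, sr_components):
    """Minimal sketch of the interleaved forward pass: the ith structural
    feature steers the jth (j = i + 1) super-resolution residual component,
    and N = M + 1 so the indices line up one-for-one."""
    # M structural features, one per structural feature extraction component.
    struct_feats = [comp(x) for comp in struct_components]
    # The 1st residual component sees only the initial convolution features
    # (see the note on the initial convolution layer below).
    feat = sr_components[0](init_conv(x))
    for i, fuse_comp in enumerate(sr_components[1:]):
        # jth component: fuses the previous rendering feature with the
        # ith structural feature.
        feat = fuse_comp(feat, struct_feats[i])
    return feat  # rendering feature of the Nth component, i.e. the target rendering feature
```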
Step S204, when it is detected that the rendering features of the jth initial super-resolution residual component are the rendering features of the Nth initial super-resolution residual component of the N initial super-resolution residual components, taking the rendering features of the Nth initial super-resolution residual component as sample rendering features, and performing rendering quality evaluation on the sample rendering features through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resources;
It can be understood that the terminal device may check in real time whether the currently acquired rendering feature is the rendering feature output by the final initial super-resolution residual component (i.e., the Nth initial super-resolution residual component) of the N initial super-resolution residual components. If so, the rendering feature of the Nth initial super-resolution residual component may be taken as the sample rendering feature, and rendering quality evaluation may further be performed on the sample rendering feature through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resources. For the specific implementation, reference may be made to the description of performing rendering quality evaluation on the target rendering feature through the target quality evaluation network in step S104 of the embodiment corresponding to fig. 3.
Step S205, performing iterative training on the initial rendering evaluation model based on the sample evaluation quality and the rendering label quality to obtain a target rendering evaluation model for evaluating the rendering quality of the original rendering resources in the multimedia file.
It can be understood that the terminal device can generate a loss function based on the sample evaluation quality and the rendering label quality, can further correct the model parameters in the initial rendering evaluation model based on the loss function, and can finally obtain, through repeated iterative training, the target rendering evaluation model for evaluating the rendering quality of the original rendering resources in the multimedia file.
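As an illustration of step S205, the following sketch regresses the sample evaluation quality onto the rendering label quality; the mean squared error loss, the Adam optimizer, and the hyperparameters are assumptions, since the embodiment does not fix a particular loss function or optimizer.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-4):
    """Minimal training-loop sketch: generate a loss from the sample
    evaluation quality and the rendering label quality, then correct the
    model parameters by backpropagation over repeated iterations."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # assumed loss; the embodiment leaves this open
    for _ in range(epochs):
        for images, label_quality in loader:
            predicted_quality = model(images).squeeze(-1)  # sample evaluation quality
            loss = criterion(predicted_quality, label_quality)
            optimizer.zero_grad()
            loss.backward()   # correct model parameters based on the loss
            optimizer.step()
    return model  # trained target rendering evaluation model
```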
It will be appreciated that the initial structural feature extraction network, the initial rendering feature extraction network, and the initial quality evaluation network may be trained jointly, or, alternatively, each network may be trained independently, which is not limited in the embodiments of the present application.
It can be understood that the terminal device for training the initial rendering evaluation model in the embodiment of the present application may be the same as or different from the terminal device using the target rendering evaluation model in the embodiment corresponding to fig. 3, which is not limited herein.
As can be seen from the foregoing, the target rendering evaluation model obtained through the above training process is exactly the end-to-end dual-stream network described in the embodiment corresponding to fig. 3; the description of the same beneficial effects is therefore not repeated here.
Fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 10, the data processing apparatus 1 may be a computer program (including program code) running on a computer device, for example, the data processing apparatus 1 is an application software; the device can be used for executing corresponding steps in the data processing method provided by the embodiment of the application. As shown in fig. 10, the data processing apparatus 1 may include: the system comprises a model acquisition module 11, a structural feature extraction module 12, a rendering feature extraction module 13, a quality evaluation module 14 and a resource replacement module 15;
the model acquisition module 11 is configured to acquire, when an original rendering resource in the multimedia file is acquired, a target rendering evaluation model for evaluating the rendering quality of the original rendering resource; the target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components; M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
The structural feature extraction module 12 is configured to input an original rendering resource to an ith target structural feature extraction component of the M target structural feature extraction components, and perform structural feature extraction on the original rendering resource by the ith target structural feature extraction component to obtain structural features of the ith target structural feature extraction component;
wherein the ith target structural feature extraction component comprises a structural feature extraction layer and a first activation layer;
the structural feature extraction module 12 may include: a semantic extraction unit 121 and a feature mapping unit 122;
the semantic extraction unit 121 is configured to input an original rendering resource to a structural feature extraction layer in the ith target structural feature extraction component, and perform feature extraction on the original rendering resource through the structural feature extraction layer to obtain an advanced semantic feature of the original rendering resource;
the feature mapping unit 122 is configured to input the advanced semantic features to a first activation layer in the ith target structural feature extraction component, and perform feature mapping on the advanced semantic features through the first activation layer to obtain structural features of the ith target structural feature extraction component.
The specific functional implementation manner of the semantic extraction unit 121 and the feature mapping unit 122 may refer to step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
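For illustration, one such structural feature extraction component might be sketched as follows; the use of a strided 3x3 convolution as the structural feature extraction layer and ReLU as the first activation layer are assumptions, not details fixed by this embodiment.

```python
import torch.nn as nn

class StructuralFeatureExtractionComponent(nn.Module):
    """Sketch of one target structural feature extraction component: a
    structural feature extraction layer followed by a first activation layer."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Structural feature extraction layer: yields advanced semantic features.
        self.extract = nn.Conv2d(in_channels, out_channels,
                                 kernel_size=3, stride=2, padding=1)
        # First activation layer: feature mapping of the semantic features.
        self.activation = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.activation(self.extract(x))
```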
The rendering feature extraction module 13 is configured to input the original rendering resources to the target rendering feature extraction network for extracting rendering features, and determine, when the rendering feature of the ith target super-resolution residual component of the N target super-resolution residual components is acquired, the rendering feature of the jth target super-resolution residual component based on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component; the ith target super-resolution residual component is the preceding super-resolution residual component of the jth target super-resolution residual component; i is a positive integer less than N, and j = i + 1;
wherein the jth target super-resolution residual component comprises a rendering feature extraction layer, a spatial attention layer and a feature normalization layer;
the rendering feature extraction module 13 may include: a feature stitching unit 131, a rendering feature extraction unit 132, an attention acquisition unit 133, a feature normalization unit 134;
the feature stitching unit 131 is configured to perform feature stitching on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component to obtain a hybrid stitching feature;
the feature stitching unit 131 may include: a feature pooling subunit 1311 and a feature stitching subunit 1312;
The feature pooling subunit 1311 is configured to perform max pooling processing on the rendering feature of the ith target super-resolution residual component to obtain a pooled rendering feature corresponding to the ith target super-resolution residual component; the resolution of the pooled rendering feature is the same as the resolution of the structural feature of the ith target structural feature extraction component;
and a feature stitching subunit 1312, configured to perform feature stitching on the pooled rendering feature and the structural feature of the ith target structural feature extraction component, to obtain a hybrid stitching feature.
The specific functional implementation manner of the feature pooling subunit 1311 and the feature stitching subunit 1312 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein.
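A minimal sketch of the two subunits, assuming the stitching is a channel-wise concatenation and that adaptive max pooling is used to match the two resolutions:

```python
import torch
import torch.nn.functional as F

def hybrid_stitch(rendering_feat, structural_feat):
    """Feature pooling subunit + feature stitching subunit: max-pool the
    ith rendering feature down to the resolution of the ith structural
    feature, then concatenate along the channel axis (the stitching axis
    is an assumption)."""
    pooled = F.adaptive_max_pool2d(rendering_feat, structural_feat.shape[-2:])
    return torch.cat([pooled, structural_feat], dim=1)  # hybrid stitching feature
```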
The rendering feature extraction unit 132 is configured to input the hybrid stitching feature to the rendering feature extraction layer in the jth target super-resolution residual component, and perform feature extraction on the hybrid stitching feature through the rendering feature extraction layer to obtain a first resolution feature;
the rendering feature extraction layer comprises a first convolution layer, a second activation layer and a second convolution layer;
the rendering feature extraction unit 132 may include: a first feature extraction subunit 1321, a first nonlinear processing subunit 1322, a second feature extraction subunit 1323;
The first feature extraction subunit 1321 is configured to input the hybrid stitching feature to the first convolution layer, and perform feature extraction on the hybrid stitching feature through the first convolution layer to obtain a first mixed feature;
a first nonlinear processing subunit 1322, configured to input the first mixed feature to the second activation layer, and perform nonlinear processing on the first mixed feature through the second activation layer to obtain a second mixed feature;
the second feature extraction subunit 1323 is configured to input the second mixed feature to a second convolution layer, and perform feature extraction on the second mixed feature through the second convolution layer, so as to obtain the first resolution feature.
The specific functional implementation manner of the first feature extraction subunit 1321, the first nonlinear processing subunit 1322, and the second feature extraction subunit 1323 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein.
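The three-layer rendering feature extraction layer described above might look as follows; the 3x3 kernel sizes and the ReLU nonlinearity are assumptions:

```python
import torch.nn as nn

class RenderingFeatureExtractionLayer(nn.Module):
    """Sketch of the rendering feature extraction layer inside the jth
    super-resolution residual component: first convolution layer, second
    activation layer, second convolution layer."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)   # -> first mixed feature
        self.act = nn.ReLU(inplace=True)                                  # -> second mixed feature
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)  # -> first resolution feature

    def forward(self, hybrid_stitching_feature):
        return self.conv2(self.act(self.conv1(hybrid_stitching_feature)))
```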
An attention acquisition unit 133 for inputting the first resolution feature to the spatial attention layer, acquiring a spatial attention matrix associated with the first resolution feature through the spatial attention layer, and determining a second resolution feature based on the spatial attention matrix and the first resolution feature;
The spatial attention layer comprises a first grouped convolution layer, a third activation layer, a second grouped convolution layer and a fourth activation layer; the first grouped convolution layer and the second grouped convolution layer are symmetrical networks with the same network structure;
the attention acquisition unit 133 includes: a grouped convolution subunit 1331, a second nonlinear processing subunit 1332, a feature reconstruction subunit 1333, a feature mapping subunit 1334, and a feature acquisition subunit 1335;
a grouped convolution subunit 1331, configured to input the first resolution feature to the first grouped convolution layer, divide the first resolution feature into G groups of sub-resolution features through the first grouped convolution layer, and respectively perform convolution processing on each group of sub-resolution features in the G groups of sub-resolution features to obtain G groups of first subspace attention features; G is a positive integer greater than 1;
a second nonlinear processing subunit 1332, configured to input the G groups of first subspace attention features to a third activation layer, and perform nonlinear processing on each of the G groups of first subspace attention features through the third activation layer to obtain G groups of second subspace attention features;
a feature reconstruction subunit 1333, configured to input the G groups of second subspace attention features into the second grouped convolution layer, and perform feature reconstruction on the G groups of second subspace attention features through the second grouped convolution layer to obtain a target spatial attention feature;
A feature mapping subunit 1334, configured to input the target spatial attention feature to the fourth activation layer, and perform feature mapping on the target spatial attention feature through the fourth activation layer to obtain a spatial attention matrix associated with the first resolution feature; each attention weight in the spatial attention matrix is a non-negative number;
feature acquisition subunit 1335 is configured to multiply the spatial attention matrix with the first resolution feature to obtain a second resolution feature.
The specific functional implementation manners of the grouped convolution subunit 1331, the second nonlinear processing subunit 1332, the feature reconstruction subunit 1333, the feature mapping subunit 1334, and the feature acquisition subunit 1335 may refer to step S103 in the embodiment corresponding to fig. 3, and will not be described herein.
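Putting the five subunits together, the spatial attention layer can be sketched as below; ReLU for the third activation layer and a sigmoid for the fourth (which guarantees non-negative attention weights) are assumptions, as are the kernel sizes. Note that the channel count must be divisible by G.

```python
import torch.nn as nn

class SpatialAttentionLayer(nn.Module):
    """Sketch of the spatial attention layer: two symmetric grouped
    convolution layers around a nonlinearity, an activation mapping the
    result to non-negative attention weights, and an elementwise product
    with the first resolution feature."""
    def __init__(self, channels, groups):
        super().__init__()
        # First grouped convolution layer: splits channels into G groups.
        self.group_conv1 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.act = nn.ReLU(inplace=True)   # third activation layer
        # Second grouped convolution layer: same symmetric structure.
        self.group_conv2 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.gate = nn.Sigmoid()           # fourth activation layer: weights in [0, 1]

    def forward(self, first_res_feat):
        attn = self.gate(self.group_conv2(self.act(self.group_conv1(first_res_feat))))
        return attn * first_res_feat       # second resolution feature
```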
The feature normalization unit 134 is configured to input the second resolution feature to the feature normalization layer, and perform feature normalization processing on the second resolution feature through the feature normalization layer to obtain the rendering feature of the jth target super-resolution residual component.
Wherein the feature normalization unit 134 may include: a resolution filtering subunit 1341, a residual connecting subunit 1342;
a resolution filtering subunit 1341, configured to input the second resolution feature to a feature normalization layer, and perform resolution filtering processing on the second resolution feature through the feature normalization layer to obtain a low resolution feature;
And a residual connection subunit 1342, configured to perform residual connection and normalization processing on the low resolution feature and the second resolution feature, to obtain the rendering feature of the jth target super-resolution residual component.
The specific functional implementation manner of the resolution filtering subunit 1341 and the residual connecting subunit 1342 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein.
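A sketch of the feature normalization layer, under the assumption that the resolution filtering is a convolutional low-pass step and that the normalization is a GroupNorm; both choices are assumptions, not details given by this embodiment.

```python
import torch.nn as nn

class FeatureNormalizationLayer(nn.Module):
    """Sketch of the feature normalization layer: resolution filtering,
    then a residual connection back to the second resolution feature
    followed by normalization."""
    def __init__(self, channels):
        super().__init__()
        self.low_pass = nn.Conv2d(channels, channels, 3, padding=1)  # -> low resolution feature
        self.norm = nn.GroupNorm(num_groups=1, num_channels=channels)

    def forward(self, second_res_feat):
        low = self.low_pass(second_res_feat)
        # Residual connection and normalization processing.
        return self.norm(low + second_res_feat)  # rendering feature of the jth component
```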
Wherein the target rendering evaluation model includes an initial convolution layer associated with the target rendering feature extraction network; the initial convolution layer is used for extracting convolution features of the original rendering resources. When the ith target super-resolution residual component is the first target super-resolution residual component of the N target super-resolution residual components, the first target super-resolution residual component is used for performing rendering feature extraction on the convolution features output by the initial convolution layer once those convolution features are acquired, so as to obtain the rendering feature of the first target super-resolution residual component.
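By way of illustration, the initial convolution layer might simply be the following; the 3-channel input, 64 output channels and 3x3 kernel are assumptions:

```python
import torch.nn as nn

# Sketch of the initial convolution layer that feeds the first target
# super-resolution residual component.
initial_conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
```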
The specific functional implementation manners of the feature stitching unit 131, the rendering feature extracting unit 132, the attention obtaining unit 133, and the feature normalizing unit 134 may refer to step S103 in the embodiment corresponding to fig. 3, and are not described herein.
The quality evaluation module 14 is configured to, when it is detected that the rendering feature of the jth target super-resolution residual component is the rendering feature of the Nth target super-resolution residual component of the N target super-resolution residual components, take the rendering feature of the Nth target super-resolution residual component as the target rendering feature, and perform rendering quality evaluation on the target rendering feature through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resources.
Wherein the target quality assessment network comprises a pooling layer and a quality assessment layer;
the quality assessment module 14 may include: a feature pooling unit 141, a quality evaluation unit 142;
the feature pooling unit 141 is configured to input the target rendering feature to a pooling layer, and perform feature pooling on the target rendering feature through the pooling layer to obtain a compressed feature;
the quality evaluation unit 142 is configured to input the compressed feature to the quality evaluation layer, and perform rendering quality evaluation on the compressed feature through the quality evaluation layer to obtain rendering evaluation quality of the original rendering resource.
The quality evaluation layer comprises a full-connection layer and a regression layer;
the quality evaluation unit 142 may include: a feature merge subunit 1421, a quality output subunit 1422;
A feature merging subunit 1421, configured to perform feature merging on the compressed feature through the full connection layer to obtain a quality feature;
the quality output subunit 1422 is configured to input the quality feature to the regression layer, output, through the regression layer, an evaluation quality corresponding to the quality feature, and use the evaluation quality corresponding to the quality feature as a rendering evaluation quality of the original rendering resource.
The specific functional implementation manner of the feature combining subunit 1421 and the quality outputting subunit 1422 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein.
The specific functional implementation manner of the feature pooling unit 141 and the quality evaluation unit 142 may refer to step S104 in the embodiment corresponding to fig. 3, and will not be described herein.
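Assembling the pooling layer, fully connected layer and regression layer, the target quality evaluation network might be sketched as follows; global average pooling and the hidden width are assumptions:

```python
import torch.nn as nn

class TargetQualityEvaluationNetwork(nn.Module):
    """Sketch of the target quality evaluation network: the pooling layer
    compresses the target rendering feature, the fully connected layer
    merges it into a quality feature, and the regression layer outputs
    the rendering evaluation quality."""
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)    # pooling layer -> compressed feature
        self.fc = nn.Linear(channels, hidden)  # fully connected layer -> quality feature
        self.regress = nn.Linear(hidden, 1)    # regression layer -> evaluation quality

    def forward(self, target_rendering_feat):
        compressed = self.pool(target_rendering_feat).flatten(1)
        quality_feat = self.fc(compressed)
        return self.regress(quality_feat)      # rendering evaluation quality
```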
A resource replacement module 15, configured to obtain a rendering quality threshold associated with the multimedia file; if the rendering evaluation quality of the original rendering resource is smaller than the rendering quality threshold, determine that the original rendering resource is a low-quality rendering resource; and, in the multimedia file, replace the low-quality rendering resource with a target rendering resource to obtain a multimedia update file, where the rendering evaluation quality of the target rendering resource is higher than the rendering evaluation quality of the original rendering resource.
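The replacement logic of module 15 can be sketched as below; `multimedia_file.replace`, the candidate pool, and all other names are hypothetical illustrations, not an API defined by this embodiment:

```python
def update_multimedia_file(multimedia_file, original_resource, model,
                           quality_threshold, candidate_resources):
    """Sketch of the resource replacement module: if the evaluated quality
    of the original rendering resource falls below the threshold, substitute
    the best-scoring candidate resource."""
    original_quality = float(model(original_resource))
    if original_quality >= quality_threshold:
        return multimedia_file  # quality acceptable, keep the original resource
    # Pick a target rendering resource whose evaluated quality is higher.
    target = max(candidate_resources, key=lambda r: float(model(r)))
    if float(model(target)) > original_quality:
        multimedia_file.replace(original_resource, target)  # hypothetical API
    return multimedia_file
```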
The specific functional implementation manners of the model obtaining module 11, the structural feature extracting module 12, the rendering feature extracting module 13, the quality evaluating module 14, and the resource replacing module 15 may refer to step S101-step S104 in the embodiment corresponding to fig. 3, and are not described herein. In addition, the description of the beneficial effects of the same method is omitted.
Fig. 11 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 11, the data processing device 2 may be a computer program (including program code) running on a computer apparatus, for example, the data processing device 2 is an application software; the device can be used for executing corresponding steps in the data processing method provided by the embodiment of the application. As shown in fig. 11, the data processing apparatus 2 may include: a sample acquisition module 21, a first feature extraction module 22, a second feature extraction module 23, a sample evaluation module 24, and a model training module 25;
a sample acquisition module 21, configured to acquire sample rendering resources for training an initial rendering evaluation model and the rendering label quality of the sample rendering resources; the sample rendering resources are obtained from a sample multimedia file; the initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components; M is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
A first feature extraction module 22, configured to input a sample rendering resource to an ith initial structural feature extraction component of the M initial structural feature extraction components, and perform structural feature extraction on the sample rendering resource by the ith initial structural feature extraction component to obtain a structural feature of the ith initial structural feature extraction component;
a second feature extraction module 23, configured to input the sample rendering resources to the initial rendering feature extraction network for extracting rendering features, and determine, when the rendering feature of the ith initial super-resolution residual component of the N initial super-resolution residual components is acquired, the rendering feature of the jth initial super-resolution residual component based on the rendering feature of the ith initial super-resolution residual component and the structural feature of the ith initial structural feature extraction component; the ith initial super-resolution residual component is the preceding super-resolution residual component of the jth initial super-resolution residual component; i is a positive integer less than N, and j = i + 1;
the sample evaluation module 24 is configured to, when it is detected that the rendering feature of the jth initial super-resolution residual component is the rendering feature of the Nth initial super-resolution residual component of the N initial super-resolution residual components, take the rendering feature of the Nth initial super-resolution residual component as the sample rendering feature, and perform rendering quality evaluation on the sample rendering feature through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resources;
The model training module 25 is configured to iteratively train the initial rendering evaluation model based on the sample evaluation quality and the rendering label quality, so as to obtain a target rendering evaluation model for evaluating the rendering quality of the original rendering resources in the multimedia file.
The specific functional implementation manners of the sample acquiring module 21, the first feature extracting module 22, the second feature extracting module 23, the sample evaluating module 24, and the model training module 25 may be referred to the step S201-step S205 in the embodiment corresponding to fig. 9, and the detailed description thereof will be omitted. In addition, the description of the beneficial effects of the same method is omitted.
Fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 12, the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard), and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may also optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 12, the memory 1005, which is one type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in FIG. 12, the network interface 1004 may provide network communication functions; while user interface 1003 is primarily used as an interface for providing input to a user; the processor 1001 may be configured to invoke the device control application stored in the memory 1005 to execute the description of the data processing method in any of the embodiments corresponding to fig. 3 and 9, which is not described herein. In addition, the description of the beneficial effects of the same method is omitted.
Furthermore, it should be noted here that the embodiments of the present application further provide a computer-readable storage medium, in which the aforementioned computer program executed by the data processing apparatus 1 or the data processing apparatus 2 is stored. The computer program includes program instructions that, when executed by a processor, can execute the description of the data processing method in any of the foregoing embodiments corresponding to fig. 3 and fig. 9, and therefore a detailed description will not be given here. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, please refer to the description of the method embodiments of the present application.
The computer-readable storage medium may be the data processing apparatus provided in any one of the foregoing embodiments or an internal storage unit of the computer device, for example, a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Furthermore, it should be noted here that the embodiments of the present application also provide a computer program product or computer program, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them to cause the computer device to perform the method provided by the embodiment corresponding to any of the preceding fig. 3 and fig. 9. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the computer program product or computer program embodiments of the present application, please refer to the description of the method embodiments of the present application.
The terms "first", "second" and the like in the description, claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or units is not limited to the listed steps or units but may, in the alternative, include other steps or units not listed or inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made in accordance with the claims of the present application shall still fall within the scope of the present application.

Claims (16)

1. A method of data processing, comprising:
when an original rendering resource in a multimedia file is acquired, acquiring a target rendering evaluation model for evaluating the rendering quality of the original rendering resource; the target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components; M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
inputting the original rendering resources to an ith target structural feature extraction component in the M target structural feature extraction components, and carrying out structural feature extraction on the original rendering resources by the ith target structural feature extraction component to obtain structural features of the ith target structural feature extraction component;
inputting the original rendering resource to the target rendering feature extraction network for extracting rendering features, and, when the rendering feature of the ith target super-resolution residual component of the N target super-resolution residual components is acquired, determining the rendering feature of the jth target super-resolution residual component based on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component; the ith target super-resolution residual component is the preceding super-resolution residual component of the jth target super-resolution residual component; i is a positive integer less than N, and j = i + 1;
when it is detected that the rendering feature of the jth target super-resolution residual component is the rendering feature of the Nth target super-resolution residual component of the N target super-resolution residual components, taking the rendering feature of the Nth target super-resolution residual component as a target rendering feature, and performing rendering quality evaluation on the target rendering feature through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resource.
2. The method of claim 1, wherein the ith target structural feature extraction component comprises a structural feature extraction layer and a first activation layer;
Inputting the original rendering resource to an ith target structural feature extraction component in the M target structural feature extraction components, and performing structural feature extraction on the original rendering resource by the ith target structural feature extraction component to obtain structural features of the ith target structural feature extraction component, wherein the method comprises the following steps:
inputting the original rendering resources to the structural feature extraction layer in the ith target structural feature extraction component, and carrying out feature extraction on the original rendering resources through the structural feature extraction layer to obtain advanced semantic features of the original rendering resources;
inputting the advanced semantic features to the first activation layer in the ith target structural feature extraction component, and performing feature mapping on the advanced semantic features through the first activation layer to obtain structural features of the ith target structural feature extraction component.
3. The method of claim 1, wherein the j-th target super-resolution residual component comprises a rendering feature extraction layer, a spatial attention layer, and a feature normalization layer;
the determining the rendering feature of the jth target super-resolution residual component based on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component includes:
performing feature stitching on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component to obtain a hybrid stitching feature;
inputting the hybrid stitching feature to the rendering feature extraction layer in the jth target super-resolution residual component, and performing feature extraction on the hybrid stitching feature through the rendering feature extraction layer to obtain a first resolution feature;
inputting the first resolution feature to the spatial attention layer, acquiring a spatial attention matrix associated with the first resolution feature through the spatial attention layer, and determining a second resolution feature based on the spatial attention matrix and the first resolution feature;
and inputting the second resolution feature to the feature normalization layer, and performing feature normalization processing on the second resolution feature through the feature normalization layer to obtain the rendering feature of the jth target super-resolution residual component.
4. The method according to claim 3, wherein the performing feature stitching on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component to obtain a hybrid stitching feature includes:
performing max pooling processing on the rendering feature of the ith target super-resolution residual component to obtain a pooled rendering feature corresponding to the ith target super-resolution residual component; the resolution of the pooled rendering feature is the same as the resolution of the structural feature of the ith target structural feature extraction component;
and performing feature stitching on the pooled rendering feature and the structural feature of the ith target structural feature extraction component to obtain the hybrid stitching feature.
5. The method of claim 3, wherein the rendering feature extraction layer comprises a first convolution layer, a second activation layer, and a second convolution layer;
the inputting the hybrid stitching feature to the rendering feature extraction layer in the jth target super-resolution residual component, and performing feature extraction on the hybrid stitching feature through the rendering feature extraction layer to obtain a first resolution feature includes:
inputting the hybrid stitching feature to the first convolution layer, and performing feature extraction on the hybrid stitching feature through the first convolution layer to obtain a first mixed feature;
inputting the first mixed characteristic into the second activation layer, and performing nonlinear processing on the first mixed characteristic through the second activation layer to obtain a second mixed characteristic;
And inputting the second mixed feature into the second convolution layer, and extracting the feature of the second mixed feature through the second convolution layer to obtain a first resolution feature.
6. The method of claim 3, wherein the spatial attention layer comprises a first grouped convolution layer, a third activation layer, a second grouped convolution layer, and a fourth activation layer; the first grouped convolution layer and the second grouped convolution layer are symmetrical networks with the same network structure;
the inputting the first resolution feature into the spatial attention layer, obtaining a spatial attention matrix associated with the first resolution feature through the spatial attention layer, determining a second resolution feature based on the spatial attention matrix and the first resolution feature, comprising:
inputting the first resolution feature into the first grouped convolution layer, dividing the first resolution feature into G groups of sub-resolution features through the first grouped convolution layer, and respectively performing convolution processing on each group of sub-resolution features in the G groups of sub-resolution features to obtain G groups of first subspace attention features; G is a positive integer greater than 1;
inputting the G groups of first subspace attention features to the third activation layer, and respectively performing nonlinear processing on each group of first subspace attention features in the G groups of first subspace attention features through the third activation layer to obtain G groups of second subspace attention features;
inputting the G groups of second subspace attention features to the second grouped convolution layer, and performing feature reconstruction on the G groups of second subspace attention features through the second grouped convolution layer to obtain a target spatial attention feature;
inputting the target spatial attention feature to the fourth activation layer, and performing feature mapping on the target spatial attention feature through the fourth activation layer to obtain a spatial attention matrix associated with the first resolution feature; each attention weight in the spatial attention matrix is a non-negative number;
multiplying the spatial attention matrix by the first resolution feature to obtain a second resolution feature.
7. The method according to claim 3, wherein the inputting the second resolution feature to the feature normalization layer and performing feature normalization processing on the second resolution feature through the feature normalization layer to obtain the rendering feature of the jth target super-resolution residual component includes:
Inputting the second resolution characteristic to the characteristic normalization layer, and carrying out resolution filtering treatment on the second resolution characteristic through the characteristic normalization layer to obtain a low resolution characteristic;
and performing residual connection and normalization processing on the low resolution feature and the second resolution feature to obtain the rendering feature of the jth target super-resolution residual component.
8. The method of claim 1, wherein the target rendering evaluation model comprises an initial convolution layer associated with the target rendering feature extraction network; the initial convolution layer is used for extracting convolution features of the original rendering resource; when the ith target super-resolution residual component is the first target super-resolution residual component of the N target super-resolution residual components, the first target super-resolution residual component is used for performing rendering feature extraction on the convolution features output by the initial convolution layer when those convolution features are acquired, so as to obtain the rendering feature of the first target super-resolution residual component.
9. The method of claim 1, wherein the target quality assessment network comprises a pooling layer and a quality assessment layer;
Performing rendering quality evaluation on the target rendering feature through the target quality evaluation network to obtain rendering evaluation quality of the original rendering resource, including:
inputting the target rendering features to the pooling layer, and carrying out feature pooling on the target rendering features through the pooling layer to obtain compression features;
inputting the compressed features to the quality evaluation layer, and performing rendering quality evaluation on the compressed features through the quality evaluation layer to obtain rendering evaluation quality of the original rendering resources.
10. The method of claim 9, wherein the quality assessment layer comprises a fully connected layer and a regressive layer;
performing rendering quality evaluation on the compressed feature through the quality evaluation layer to obtain rendering evaluation quality of the original rendering resource, including:
carrying out feature combination on the compression features through the full connection layer to obtain quality features;
inputting the quality features into the regression layer, outputting the evaluation quality corresponding to the quality features through the regression layer, and taking the evaluation quality corresponding to the quality features as the rendering evaluation quality of the original rendering resources.
11. The method as recited in claim 1, further comprising:
acquiring a rendering quality threshold associated with the multimedia file;
if the rendering evaluation quality of the original rendering resource is smaller than the rendering quality threshold, determining that the original rendering resource is a low-quality rendering resource;
in the multimedia file, replacing the low-quality rendering resource with a target rendering resource to obtain a multimedia update file; the rendering evaluation quality of the target rendering resource is higher than the rendering evaluation quality of the original rendering resource.
12. A method of data processing, comprising:
acquiring a sample rendering resource for training an initial rendering evaluation model and a rendering label quality of the sample rendering resource; the sample rendering resource is obtained from a sample multimedia file; the initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components; M is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
Inputting the sample rendering resources to an ith initial structural feature extraction component in the M initial structural feature extraction components, and carrying out structural feature extraction on the sample rendering resources by the ith initial structural feature extraction component to obtain structural features of the ith initial structural feature extraction component;
inputting the sample rendering resource to the initial rendering feature extraction network for extracting rendering features, and, when the rendering feature of the ith initial super-resolution residual component of the N initial super-resolution residual components is acquired, determining the rendering feature of the jth initial super-resolution residual component based on the rendering feature of the ith initial super-resolution residual component and the structural feature of the ith initial structural feature extraction component; the ith initial super-resolution residual component is the preceding super-resolution residual component of the jth initial super-resolution residual component; i is a positive integer less than N, and j = i + 1;
when it is detected that the rendering feature of the jth initial super-resolution residual component is the rendering feature of the Nth initial super-resolution residual component of the N initial super-resolution residual components, taking the rendering feature of the Nth initial super-resolution residual component as a sample rendering feature, and performing rendering quality evaluation on the sample rendering feature through the initial quality evaluation network to obtain a sample evaluation quality of the sample rendering resource;
and performing iterative training on the initial rendering evaluation model based on the sample evaluation quality and the rendering label quality to obtain a target rendering evaluation model for evaluating the rendering quality of original rendering resources in a multimedia file.
13. A data processing apparatus, comprising:
the model acquisition module is used for acquiring, when the original rendering resources in the multimedia file are acquired, a target rendering evaluation model for evaluating the rendering quality of the original rendering resources; the target rendering evaluation model comprises a target structural feature extraction network, a target rendering feature extraction network and a target quality evaluation network; the target structural feature extraction network comprises M target structural feature extraction components; M is a positive integer; the target rendering feature extraction network comprises N target super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
the structural feature extraction module is used for inputting the original rendering resources to an ith target structural feature extraction component in the M target structural feature extraction components, and the ith target structural feature extraction component is used for carrying out structural feature extraction on the original rendering resources to obtain structural features of the ith target structural feature extraction component;
The rendering feature extraction module is used for inputting the original rendering resources to the target rendering feature extraction network for extracting rendering features, and determining, when the rendering feature of the ith target super-resolution residual component of the N target super-resolution residual components is acquired, the rendering feature of the jth target super-resolution residual component based on the rendering feature of the ith target super-resolution residual component and the structural feature of the ith target structural feature extraction component; the ith target super-resolution residual component is the preceding super-resolution residual component of the jth target super-resolution residual component; i is a positive integer less than N, and j = i + 1;
the quality evaluation module is used for, when it is detected that the rendering feature of the jth target super-resolution residual component is the rendering feature of the Nth target super-resolution residual component of the N target super-resolution residual components, taking the rendering feature of the Nth target super-resolution residual component as a target rendering feature, and performing rendering quality evaluation on the target rendering feature through the target quality evaluation network to obtain the rendering evaluation quality of the original rendering resources.
14. A data processing apparatus, comprising:
the sample acquisition module is used for acquiring sample rendering resources for training the initial rendering evaluation model and the rendering label quality of the sample rendering resources; the sample rendering resources are obtained from a sample multimedia file; the initial rendering evaluation model comprises an initial structural feature extraction network, an initial rendering feature extraction network and an initial quality evaluation network; the initial structural feature extraction network comprises M initial structural feature extraction components; M is a positive integer; the initial rendering feature extraction network comprises N initial super-resolution residual components; N is a positive integer greater than 1, and N = M + 1;
the first feature extraction module is used for inputting the sample rendering resources to an ith initial structural feature extraction component in the M initial structural feature extraction components, and the ith initial structural feature extraction component is used for carrying out structural feature extraction on the sample rendering resources to obtain structural features of the ith initial structural feature extraction component;
a second feature extraction module, configured to input the sample rendering resources to the initial rendering feature extraction network for extracting rendering features, and determine, when the rendering feature of the ith initial super-resolution residual component of the N initial super-resolution residual components is acquired, the rendering feature of the jth initial super-resolution residual component based on the rendering feature of the ith initial super-resolution residual component and the structural feature of the ith initial structural feature extraction component; the ith initial super-resolution residual component is the preceding super-resolution residual component of the jth initial super-resolution residual component; i is a positive integer less than N, and j = i + 1;
The sample evaluation module is used for, when it is detected that the rendering feature of the jth initial super-resolution residual component is the rendering feature of the Nth initial super-resolution residual component of the N initial super-resolution residual components, taking the rendering feature of the Nth initial super-resolution residual component as a sample rendering feature, and performing rendering quality evaluation on the sample rendering feature through the initial quality evaluation network to obtain the sample evaluation quality of the sample rendering resources;
the model training module is used for performing iterative training on the initial rendering evaluation model based on the sample evaluation quality and the rendering label quality to obtain a target rendering evaluation model for evaluating the rendering quality of the original rendering resources in the multimedia file.
15. A computer device, comprising: a processor and a memory;
the processor is connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-12.
16. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-12.
CN202310157631.3A 2023-02-23 2023-02-23 Data processing method, device, equipment and medium Active CN116030040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310157631.3A CN116030040B (en) 2023-02-23 2023-02-23 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116030040A true CN116030040A (en) 2023-04-28
CN116030040B CN116030040B (en) 2023-06-16

Family

ID=86074106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310157631.3A Active CN116030040B (en) 2023-02-23 2023-02-23 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116030040B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293711A1 (en) * 2017-04-06 2018-10-11 Disney Enterprises, Inc. Kernel-predicting convolutional neural networks for denoising
CN111028329A (en) * 2019-05-22 2020-04-17 珠海随变科技有限公司 Rendering graph providing method, device and equipment and storage medium
CN110782448A (en) * 2019-10-25 2020-02-11 广东三维家信息科技有限公司 Rendered image evaluation method and device
CN114693611A (en) * 2022-03-16 2022-07-01 杭州群核信息技术有限公司 Rendering quality evaluation method, device, computer equipment and medium
CN114984576A (en) * 2022-04-13 2022-09-02 珠海金山数字网络科技有限公司 Object rendering method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuqing Liu et al., "Textural-Perceptual Joint Learning for No-Reference Super-Resolution Image Quality Assessment," arXiv:2205.13847v2 [eess.IV], pp. 1-5 *

Also Published As

Publication number Publication date
CN116030040B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
JP2020017295A (en) Video data processing method and device, and readable storage medium
CN112906721B (en) Image processing method, device, equipment and computer readable storage medium
CN113515998B (en) Video data processing method, device and readable storage medium
CN113766299A (en) Video data playing method, device, equipment and medium
CN113393544B (en) Image processing method, device, equipment and medium
CN116977457A (en) Data processing method, device and computer readable storage medium
CN112954399A (en) Image processing method and device and computer equipment
CN113095206A (en) Virtual anchor generation method and device and terminal equipment
CN113515997A (en) Video data processing method and device and readable storage medium
CN117876535A (en) Image processing method, device, equipment, medium and program product
CN116935170A (en) Processing method and device of video processing model, computer equipment and storage medium
CN113657272B (en) Micro video classification method and system based on missing data completion
CN114007064A (en) Special effect synchronous evaluation method, device, equipment, storage medium and program product
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
CN116030040B (en) Data processing method, device, equipment and medium
CN113569824B (en) Model processing method, related device, storage medium and computer program product
CN113822117B (en) Data processing method, device and computer readable storage medium
CN114419661A (en) Human hand motion capture method, device, medium and computer equipment for live broadcast room
Orhei, "Urban landmark detection using computer vision"
CN113592765A (en) Image processing method, device, equipment and storage medium
CN114332678A (en) Data processing method, device, storage medium and equipment
CN116777914B (en) Data processing method, device, equipment and computer readable storage medium
CN113825013B (en) Image display method and device, storage medium and electronic equipment
CN113573043B (en) Video noise point identification method, storage medium and equipment
WO2024152697A1 (en) Data processing method and apparatus, computer device, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant