CN117933333A - Method for determining neural network model loss value and related application method and equipment

Info

Publication number
CN117933333A
Authority
CN
China
Prior art keywords
neural network
loss
network model
target
weight
Prior art date
Legal status
Pending
Application number
CN202211262653.8A
Other languages
Chinese (zh)
Inventor
周川
周通
吕卓逸
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202211262653.8A
Publication of CN117933333A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop


Abstract

The application discloses a method for determining a neural network model loss value and related application methods and equipment, belonging to the field of artificial intelligence, wherein the method for determining the neural network model loss value comprises the following steps: obtaining an output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.

Description

Method for determining neural network model loss value and related application method and equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a method for determining a neural network model loss value and a related application method and device.
Background
In the field of video coding, neural network models are commonly used for encoding and decoding. At present, in the iterative training process of a neural network model, loss calculation is performed on the output result of the neural network model using an average absolute error function or a mean square error function to obtain a loss value, and the network parameters of the neural network model are adjusted based on the loss value for the next round of iterative training, until the loss converges. However, because this loss calculation considers only the gap between the output result of the neural network model and the labels, the performance of the trained neural network model is poor.
Disclosure of Invention
The embodiment of the application provides a method for determining a neural network model loss value, and a related application method and device, which can improve the performance of a neural network model.
In a first aspect, a method for determining a neural network model loss value is provided, including:
Obtaining an output result in the iterative training process of the neural network model;
Carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
In a second aspect, a video codec processing method is provided, including:
obtaining an unfiltered reconstructed image in the video encoding and decoding process;
Inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image;
The loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on the determination method of the neural network model loss value in the first aspect.
In a third aspect, a video image processing method is provided, including:
inputting the first video image obtained after decoding into a video super-resolution model to obtain a second video image;
the resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the determination method of the neural network model loss value of the first aspect.
In a fourth aspect, a device for determining a neural network model loss value is provided, including:
the first acquisition module is used for acquiring an output result in the iterative training process of the neural network model;
the calculation module is used for carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
In a fifth aspect, there is provided a video codec processing apparatus including:
the second acquisition module is used for acquiring unfiltered reconstructed images in the video encoding and decoding process;
The first input module is used for inputting the unfiltered reconstructed image into a loop filter to obtain a filtered reconstructed image;
The loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on the determination method of the neural network model loss value in the first aspect.
In a sixth aspect, there is provided a video image processing apparatus comprising:
the second input module is used for inputting the first video image obtained after decoding into the video super-resolution model to obtain a second video image;
the resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the determination method of the neural network model loss value of the first aspect.
In a seventh aspect, there is provided a terminal comprising a processor and a memory storing a program or instructions executable on the processor, the program or instructions implementing the steps of the method according to the first aspect, or implementing the steps of the method according to the second aspect, or implementing the steps of the method according to the third aspect when executed by the processor.
An eighth aspect provides a terminal, including a processor and a communication interface, where the processor is configured to obtain an output result in an iterative training process of a neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples for training the neural network model;
Or the processor is used for acquiring unfiltered reconstructed images in the video encoding and decoding process; inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image; the loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on a determination method of the neural network model loss value;
Or the processor is used for inputting the first video image obtained after decoding into the video super-resolution model to obtain a second video image; the resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the video super-resolution model is determined based on a determination method of the neural network model loss value.
In a ninth aspect, there is provided a readable storage medium having stored thereon a program or instructions which when executed by a processor, performs the steps of the method according to the first aspect, or performs the steps of the method according to the second aspect, or performs the steps of the method according to the third aspect.
In a tenth aspect, there is provided a chip comprising a processor and a communication interface, the communication interface and the processor being coupled, the processor being adapted to run a program or instructions, to carry out the steps of the method according to the first aspect, or to carry out the steps of the method according to the second aspect, or to carry out the steps of the method according to the third aspect.
In an eleventh aspect, a computer program/program product is provided, stored in a storage medium, the computer program/program product being executed by at least one processor to implement the steps of the method as described in the first aspect, or to implement the steps of the method as described in the second aspect, or to implement the steps of the method as described in the third aspect.
The embodiment of the application obtains the output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model. In this way, the influence of the target parameters on the training convergence of the neural network model is analyzed in the model training stage, so that the performance of the trained neural network model is improved.
Drawings
FIG. 1 is a flowchart of a method for determining a neural network model loss value according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a weight function in a method for determining a neural network model loss value according to an embodiment of the present application;
Fig. 3 is a flowchart of a video encoding and decoding processing method according to an embodiment of the present application;
fig. 4 is a flowchart of a video image processing method according to an embodiment of the present application;
FIG. 5 is a block diagram of a determining device for a neural network model loss value according to an embodiment of the present application;
Fig. 6 is a block diagram of a video codec processing device according to an embodiment of the present application;
Fig. 7 is a block diagram of a video image processing apparatus provided in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a communication device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. Moreover, objects distinguished by "first" and "second" are generally of one kind, and the number of objects is not limited; for example, the first object may be one or more than one. In addition, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
For ease of understanding, some of the following descriptions are directed to embodiments of the present application:
1. Application of neural network model in video codec.
Inspired by the great success of neural network technology in computer vision and image processing tasks, many neural-network-based methods have been introduced into the field of video encoding and decoding. Some techniques replace individual modules of a conventional video coding standard, while others perform end-to-end video encoding and decoding.
In the loop filtering and super-resolution directions, in addition to the reconstructed pixels, some techniques use video-codec-related information as additional network inputs to guide the network. Such information includes: quantization parameters, frame type, predicted pixels, boundary strength, etc. The loss functions are typically L1 and L2, where L1 is the average absolute error and L2 is the mean square error.
2. Quantization parameters.
Quantization can effectively reduce the value range of the transform coefficients, thereby achieving a better compression effect. At the same time, quantization inevitably introduces distortion, and it is the root cause of distortion in video coding. The quantization parameter reflects how finely spatial detail is compressed: the smaller the quantization parameter, the finer the quantization, the higher the image quality, and the lower the compression rate; a larger quantization parameter causes loss of detail, lowering the image quality more while increasing the compression rate.
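As general codec background (a well-known property of H.264/HEVC quantizers, not something stated by this filing itself): the quantization step size approximately doubles for every increase of 6 in the quantization parameter,

$$Q_{step} \propto 2^{QP/6},$$

which is why even small QP changes noticeably affect detail retention and compression rate.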
3. A loss function.
The loss function measures the prediction quality of the neural network model, thereby guiding the next round of training in the correct direction and allowing gradient back-propagation to work. The quality of the loss function design directly determines how well the neural network model finally converges. The calculation formulas of two loss functions commonly used in the field of video encoding and decoding are as follows.
Average absolute error:

$$L1 = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|$$

Mean square error:

$$L2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2$$

where $y_i$ is the original pixel value of the $i$-th video image sample point, $f(x_i)$ is the pixel value of that sample point after the video image passes through the neural network model, and $n$ is the number of video image sample points.
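As a concrete illustration of the two functions above, here is a minimal NumPy sketch (the function and variable names are chosen for illustration and do not appear in the filing):

```python
import numpy as np

def l1_loss(y: np.ndarray, fx: np.ndarray) -> float:
    # Average absolute error: (1/n) * sum_i |y_i - f(x_i)|
    return float(np.mean(np.abs(y - fx)))

def l2_loss(y: np.ndarray, fx: np.ndarray) -> float:
    # Mean square error: (1/n) * sum_i (y_i - f(x_i))^2
    return float(np.mean((y - fx) ** 2))
```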
In the field of video encoding and decoding, in order to train a unified model, neural-network-based schemes usually take the quantization parameter as one of the inputs of the neural network model, so that a single model can handle different video qualities. During training, however, the loss function is mostly the mean square error or similar, and parameters that affect the performance of the neural network model, such as the quantization parameter, are not considered. The present application therefore provides a method for determining the loss value of a neural network model.
The method for determining the neural network model loss value provided by the embodiment of the application is described in detail below through some embodiments and application scenes thereof with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a method for determining a neural network model loss value, where, as shown in fig. 1, the method for determining a neural network model loss value includes:
step 101, obtaining an output result in the iterative training process of a neural network model;
102, carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
In the embodiment of the application, a training sample and a test sample may be set first, and the training sample is labeled. After labeling is completed, the neural network model may be iteratively trained in batches: in each round of iterative training, loss calculation is performed using the target loss function, and the network parameters of the neural network model are adjusted based on the calculated loss value until the loss converges, for example until the change in the loss value satisfies a preset condition, so as to obtain the trained neural network model.
Alternatively, the above-mentioned preset loss function may be understood as a conventional loss function; for example, in some embodiments, the preset loss function is an average absolute error function or a mean square error function. Optionally, for video encoding, the preset loss function may measure the loss between the pixel value of a video image sample point after it passes through the neural network model and the original pixel value of that sample point.
Alternatively, in some embodiments, the neural network model may be used for video encoding and decoding, and the target parameter is a parameter affecting compression efficiency. For example, the target parameter may include at least one of: quantization parameters, boundary strength, block partition information, distortion size, resolution, and frame type. Of course, in other embodiments, the neural network model may be used for other tasks, for example image recognition, which is not described in detail herein. By introducing the target parameters during loss calculation, the influence of the target parameters on the training convergence of the neural network model in different application scenarios can be further analyzed, and the performance of the neural network model can be further improved, i.e., the compression efficiency of the neural network model is improved.
It should be appreciated that the frame types described above may include I frames, B frames, and P frames.
Optionally, in some embodiments, the target parameter may further be an input parameter of the neural network model, such as the quantization parameter and the frame type are input parameters of the neural network model.
The embodiment of the application obtains the output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model. In this way, the influence of the target parameters on the training convergence of the neural network model is analyzed in the model training stage, so that the performance of the trained neural network model is improved.
Optionally, in some embodiments, the target loss function is:

Loss = w * A(x, y);

or Loss = α * w * A(x, y);

wherein Loss represents the loss value, w represents a weight value determined based on the target parameter, A(x, y) represents the preset loss function, α is an attenuation factor, and α is determined based on the number of iterative training rounds.
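A minimal sketch of the two target-loss forms above, assuming `preset_loss` is one of the L1/L2 functions from earlier and `w` and `alpha` have already been determined (the names here are illustrative, not from the filing):

```python
def target_loss(preset_loss, y, fx, w, alpha=None):
    # Loss = w * A(x, y); optionally scaled by the attenuation factor:
    # Loss = alpha * w * A(x, y)
    base = preset_loss(y, fx)
    return w * base if alpha is None else alpha * w * base
```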
In the embodiment of the present application, the weight function for w may be set according to actual needs. For example, in some embodiments, the weight function for calculating w may be a decreasing function; of course, an increasing function, a concave function, or a convex function that takes the target parameter as its only (or one) independent variable may also be designed according to the problem to be solved, which is not limited herein.
The value of the attenuation factor may decrease gradually with increasing number of iterations. For example, in some embodiments, the value of the decay factor is 1 when the number of iterative training is less than or equal to a first threshold, 0.5 when the number of iterative training is greater than the first threshold and less than or equal to a second threshold, and 0.2 when the number of iterative training is greater than the second threshold. Of course, in other embodiments, other values may be further adopted, which are not listed here.
With the attenuation factor set, the loss values of the training samples can be balanced as the number of training rounds increases, improving the generalization of the network model.
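Using the example values above, one possible sketch of the attenuation schedule is as follows; the threshold values are left as parameters, since the embodiment does not fix them:

```python
def attenuation_factor(num_iterations, first_threshold, second_threshold):
    # alpha shrinks stepwise as training progresses (example values above)
    if num_iterations <= first_threshold:
        return 1.0
    if num_iterations <= second_threshold:
        return 0.5
    return 0.2
```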
Optionally, in some embodiments, where the target parameter is a quantization parameter, calculating the weight function of w satisfies:
where B and C are constants and qp is the quantization parameter.
In the embodiment of the present application, the value range of the quantization parameter may be 0 to 51. The above B may be 30 and C may be 6, and the weight function set based on the quantization parameter may be a decreasing function, as shown in fig. 2.
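The concrete weight formula is given in the original filing (and plotted in fig. 2) but is not reproduced in this text. As an assumed illustration only, any decreasing function of qp parameterized by constants B and C matches the description, for example:

```python
def qp_weight(qp, B=30, C=6):
    # Hypothetical decreasing weight in qp; NOT the patent's actual
    # formula, which is omitted from this text. With B=30 and C=6,
    # this weight halves for every C steps of qp above B.
    return 2.0 ** ((B - qp) / C)
```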
Optionally, in some embodiments, where the neural network model is used for video encoding and decoding, the w includes a first weight value determined based on luminance and a second weight value determined based on chrominance, wherein the first weight value and the second weight value are determined based on the same or different weight functions.
In other words, in the embodiment of the application, two different weight functions can be designed for the neural network models trained on luminance and on chrominance, so as to reduce the respective impact on luminance-model and chrominance-model performance and further improve model performance. Alternatively, a single weight function can be designed uniformly to reduce the difficulty of model training.
Optionally, in some embodiments, in a case where the neural network model is used for video encoding and decoding, weight functions corresponding to different frame types are different, and the weight functions are used to calculate the w.
In the embodiment of the application, if different neural network models are trained for different frame types, different weight functions can be designed in a targeted manner; in some embodiments, a single weight function may instead be designed uniformly.
Optionally, in some embodiments, where the neural network model is used for video encoding and decoding, and the target parameters include sequence-level quantization parameters and frame-level quantization parameters, the method further comprises:
Determining a reference weight according to the sequence-level quantization parameter and the first weight function;
Determining the w based on the reference weight, the sequence-level quantization parameter, the frame-level quantization parameter, and the second weight function.
In the embodiment of the application, on the basis of the reference weight obtained based on the sequence-level quantization parameter, the final weight value is obtained by using the frame-level quantization parameter, so that different weight values can be set for each frame more finely, and the performance of the model is further improved.
Optionally, the first weight function satisfies:
and/or, the second weight function satisfies:
where B and C are constants, w0 represents the reference weight, qp0 represents the sequence-level quantization parameter, and qp1 represents the frame-level quantization parameter.
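The first and second weight functions themselves appear as formulas in the original filing and are omitted from this text; a hedged sketch of the two-step flow, with f1 and f2 standing in for those functions, is:

```python
def frame_level_weight(qp_seq, qp_frame, f1, f2):
    # Step 1: reference weight w0 from the sequence-level QP
    w0 = f1(qp_seq)
    # Step 2: final per-frame weight from w0 and both QP levels
    return f2(w0, qp_seq, qp_frame)
```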
Optionally, in some embodiments, the calculating the loss of the output result using a target loss function, obtaining a loss value includes:
and under the condition that the number of iterative training is smaller than or equal to a preset threshold value, carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value.
In the embodiment of the application, the magnitude of the preset threshold may be set according to actual needs and is not further limited herein.
Further, in some embodiments, the method further comprises: and under the condition that the number of iterative training is larger than a preset threshold value, carrying out loss calculation on the output result by utilizing the preset loss function to obtain a loss value.
It should be understood that, in the embodiment of the present application, when the number of iterative training rounds is greater than the preset threshold, the target parameter is no longer considered, which further improves the balance of the model's performance across different training samples.
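Putting the pieces together, a sketch of the loss selection per training round under the threshold rule above (all names are illustrative):

```python
def compute_loss(num_iterations, threshold, preset_loss, y, fx, w, alpha):
    # At or below the threshold: weighted target loss; above it: the preset
    # loss only, so the target parameter no longer influences late training.
    if num_iterations <= threshold:
        return alpha * w * preset_loss(y, fx)
    return preset_loss(y, fx)
```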
Referring to fig. 3, the embodiment of the present application further provides a video codec processing method, as shown in fig. 3, where the video codec processing method includes:
Step 301, obtaining an unfiltered reconstructed image in the video encoding and decoding process;
step 302, inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image;
The loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on the determination method of the neural network model loss value.
In the embodiment of the application, the loss value is determined by applying the method for determining the neural network model loss value in the training process of the loop filter model, so that the efficiency of compressing the video image by the loop filter model can be improved.
Referring to fig. 4, an embodiment of the present application further provides a video image processing method, as shown in fig. 4, including:
Step 401, inputting the first video image obtained after decoding into a video super-resolution model to obtain a second video image;
The resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the determination method of the loss value of the neural network model.
In the embodiment of the application, applying the above method for determining the neural network model loss value during the training of the video super-resolution model improves the compression effect of the image and further improves the clarity of the super-resolution-upscaled video image.
For the method for determining the neural network model loss value provided by the embodiment of the application, the execution body may be a device for determining a neural network model loss value. In the embodiment of the application, the device for determining a neural network model loss value executing the method is taken as an example to describe the device provided by the embodiment of the application.
Referring to fig. 5, the embodiment of the present application further provides a device for determining a neural network model loss value, as shown in fig. 5, where the device 500 for determining a neural network model loss value includes:
The first obtaining module 501 is configured to obtain an output result in the iterative training process of the neural network model;
The calculating module 502 is configured to perform loss calculation on the output result by using a target loss function, so as to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
Optionally, the neural network model is used for video encoding and decoding, and the target parameter is a parameter affecting compression efficiency.
Optionally, the target parameter includes at least one of:
quantization parameters, boundary strength, block partition information, distortion size, resolution, and frame type.
Optionally, the target loss function is:

Loss = w * A(x, y);

or Loss = α * w * A(x, y);

wherein Loss represents the loss value, w represents a weight value determined based on the target parameter, A(x, y) represents the preset loss function, α is an attenuation factor, and α is determined based on the number of iterative training rounds.
Optionally, in the case that the target parameter is a quantization parameter, calculating the weight function of w satisfies:
where B and C are constants and qp is the quantization parameter.
Optionally, in a case where the neural network model is used for video encoding and decoding, the w includes a first weight value determined based on luminance and a second weight value determined based on chrominance, wherein the first weight value and the second weight value are determined based on the same or different weight functions.
Optionally, in the case that the neural network model is used for video encoding and decoding, the weight functions corresponding to different frame types are different, and the weight functions are used for calculating the w.
Optionally, the determining device of the neural network model loss value further includes:
The determining module is used for determining a reference weight according to the sequence-level quantization parameter and the first weight function, and for determining the w based on the reference weight, the sequence-level quantization parameter, the frame-level quantization parameter, and the second weight function.
Optionally, the first weight function satisfies:
and/or, the second weight function satisfies:
where B and C are constants, w0 represents the reference weight, qp0 represents the sequence-level quantization parameter, and qp1 represents the frame-level quantization parameter.
Optionally, the calculation module is specifically configured to perform loss calculation on the output result by using a target loss function to obtain a loss value when the number of iterative training times is less than or equal to a preset threshold.
Optionally, the calculation module is further configured to perform loss calculation on the output result by using the preset loss function to obtain a loss value when the number of iterative training times is greater than a preset threshold.
Optionally, the preset loss function is an average absolute error function or a mean square error function.
The device for determining the neural network model loss value in the embodiment of the application may be an electronic device, such as an electronic device with an operating system, or may be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the terminal may include, but is not limited to, the terminal types listed above; other devices may be a server, network attached storage (NAS), etc., which is not specifically limited in the embodiments of the present application.
The device for determining the neural network model loss value provided by the embodiment of the application can realize each process realized by the method embodiment of fig. 1 and achieve the same technical effects, and is not repeated here for avoiding repetition.
For the video encoding and decoding processing method provided by the embodiment of the application, the execution body may be a video codec processing device. In the embodiment of the present application, the video codec processing device executing the video codec processing method is taken as an example to describe the video codec processing device provided in the embodiment of the present application.
Referring to fig. 6, an embodiment of the present application further provides a video codec processing apparatus, as shown in fig. 6, the video codec processing apparatus 600 includes:
A second obtaining module 601, configured to obtain an unfiltered reconstructed image in a video encoding and decoding process;
A first input module 602, configured to input the unfiltered reconstructed image into a loop filter, and obtain a filtered reconstructed image;
The loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on the determination method of the neural network model loss value.
The video encoding and decoding processing device provided by the embodiment of the application can realize each process realized by the method embodiment of fig. 3 and achieve the same technical effect, and in order to avoid repetition, the description is omitted here.
Referring to fig. 7, an embodiment of the present application further provides a video image processing apparatus, as shown in fig. 7, the video image processing apparatus 700 includes:
the second input module 701 is configured to input the first video image obtained after decoding to a video super-resolution model to obtain a second video image;
The resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the determination method of the loss value of the neural network model.
The video image processing device provided by the embodiment of the present application can implement each process implemented by the method embodiment of fig. 4, and achieve the same technical effects, and in order to avoid repetition, a detailed description is omitted here.
Optionally, as shown in fig. 8, the embodiment of the present application further provides a communication device 800, including a processor 801 and a memory 802, where the memory 802 stores a program or instructions executable on the processor 801. When executed by the processor 801, the program or instructions implement the steps of the above method embodiments and achieve the same technical effects, which are not repeated here.
The embodiment of the application also provides a terminal which comprises a processor and a communication interface, wherein the processor is used for acquiring an output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples for training the neural network model;
Or the processor is used for acquiring unfiltered reconstructed images in the video encoding and decoding process; inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image; the loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on a determination method of the neural network model loss value;
Or the processor is used for inputting the first video image obtained after decoding into the video super-resolution model to obtain a second video image; the resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the video super-resolution model is determined based on a determination method of the neural network model loss value.
The terminal embodiment corresponds to the terminal-side method embodiment, and each implementation process and implementation manner of the method embodiment is applicable to the terminal embodiment with the same technical effects. Specifically, fig. 9 is a schematic diagram of the hardware structure of a terminal for implementing an embodiment of the present application.
The terminal 900 includes, but is not limited to, at least some of the following components: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, and a processor 910.
Those skilled in the art will appreciate that the terminal 900 may further include a power source (e.g., a battery) for powering the various components, and the power source may be logically coupled to the processor 910 through a power management system so as to perform functions such as managing charging, discharging, and power consumption. The terminal structure shown in fig. 9 does not constitute a limitation of the terminal; the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which will not be described in detail here.
It should be appreciated that, in embodiments of the present application, the input unit 904 may include a graphics processing unit (GPU) 9041 and a microphone 9042, where the GPU 9041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The display unit 906 may include a display panel 9061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 907 includes at least one of a touch panel 9071 (also referred to as a touch screen) and other input devices 9072. The touch panel 9071 may include two parts: a touch detection device and a touch controller. Other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here.
In the embodiment of the present application, after receiving downlink data from a network side device, the radio frequency unit 901 may transmit the downlink data to the processor 910 for processing; in addition, the radio frequency unit 901 may send uplink data to the network side device. Typically, the radio frequency unit 901 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
The memory 909 may be used to store software programs or instructions as well as various data. The memory 909 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, and application programs or instructions (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, the memory 909 may include a volatile memory or a nonvolatile memory, or the memory 909 may include both volatile and nonvolatile memories. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 909 in embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
The processor 910 may include one or more processing units. Optionally, the processor 910 integrates an application processor and a modem processor, where the application processor primarily handles operations involving the operating system, user interface, application programs, etc., and the modem processor (such as a baseband processor) primarily handles wireless communication signals. It will be appreciated that the modem processor may alternatively not be integrated into the processor 910.
The processor 910 is configured to obtain an output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples for training the neural network model;
Or the processor 910 is configured to obtain an unfiltered reconstructed image during video encoding and decoding; inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image; the loop filter comprises a loop filter model, wherein the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on a determination method of the neural network model loss value;
Or the processor 910 is configured to input the first video image obtained after decoding to a video super-resolution model to obtain a second video image; the resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the video super-resolution model is determined based on a determination method of the neural network model loss value.
The embodiment of the application obtains the output result in the iterative training process of the neural network model; carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value; the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model. In this way, the influence of the target parameters on the training convergence of the neural network model is analyzed in the model training stage, so that the performance of the trained neural network model is improved.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above embodiment of the method for determining a neural network model loss value, and can achieve the same technical effect, so that repetition is avoided, and no further description is given here.
The processor is the processor in the terminal described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, the chip comprises a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or instructions, the various processes of the embodiment of the method for determining the neural network model loss value can be realized, the same technical effects can be achieved, and the repetition is avoided, so that the description is omitted.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, or the like.
The embodiment of the present application further provides a computer program/program product, where the computer program/program product is stored in a storage medium, and the computer program/program product is executed by at least one processor to implement each process of the above embodiment of the method for determining a neural network model loss value, and the same technical effects can be achieved, so that repetition is avoided, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (30)

1. A method for determining a neural network model loss value, comprising:
Obtaining an output result in the iterative training process of the neural network model;
Carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
2. The method of claim 1, wherein the neural network model is used for video encoding and decoding, and the target parameter is a parameter affecting compression efficiency.
3. The method of claim 2, wherein the target parameters include at least one of:
quantization parameters, boundary strength, block partition information, distortion size, resolution, and frame type.
4. A method according to any one of claims 1 to 3, wherein the target loss function is:
Loss = w * A(x, y);

or Loss = α * w * A(x, y);

wherein Loss represents the loss value, w represents a weight value determined based on the target parameter, A(x, y) represents the preset loss function, α is an attenuation factor, and α is determined based on the number of iterative training rounds.
5. The method according to claim 4, wherein, in case the target parameter is a quantization parameter, calculating the weight function of w satisfies:
where B and C are constants and qp is the quantization parameter.
6. The method of claim 4, wherein in the case where the neural network model is used for video coding, the w comprises a first weight value determined based on luminance and a second weight value determined based on chrominance, wherein the first weight value and the second weight value are determined based on the same or different weight functions.
7. The method of claim 4, wherein in the case where the neural network model is used for video codec, weight functions corresponding to different frame types are different, the weight functions being used to calculate the w.
8. The method of claim 4, wherein in the case where the neural network model is used for video coding and the target parameters include quantization parameters at a sequence level and quantization parameters at a frame level, the method further comprises:
Determining a reference weight according to the quantization parameter of the sequence level and the first weight function;
The w is determined based on the reference weight, the quantization parameter at the sequence level, the quantization parameter at the frame level, and the second weight function.
9. The method of claim 8, wherein the first weight function satisfies:
and/or, the second weight function satisfies:
where B and C are constants, w0 represents the reference weight, qp0 represents the sequence-level quantization parameter, and qp1 represents the frame-level quantization parameter.
10. The method according to any one of claims 1 to 9, wherein the calculating the loss of the output result using a target loss function, to obtain a loss value, comprises:
and under the condition that the number of iterative training is smaller than or equal to a preset threshold value, carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value.
11. The method according to claim 10, wherein the method further comprises:
And under the condition that the number of iterative training is larger than a preset threshold value, carrying out loss calculation on the output result by utilizing the preset loss function to obtain a loss value.
12. The method according to any one of claims 1 to 11, wherein the predetermined loss function is an average absolute error function or a mean square error function.
13. A video encoding and decoding processing method, comprising:
obtaining an unfiltered reconstructed image in the video encoding and decoding process;
Inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image;
the loop filter comprises a loop filter model, the loop filter model is a neural network model obtained through iterative training, and a loss value in the iterative training process of the loop filter model is determined based on the determination method of the neural network model loss value according to any one of claims 1 to 12.
14. A video image processing method, comprising:
inputting the first video image obtained after decoding into a video super-resolution model to obtain a second video image;
The resolution of the first video image is smaller than that of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the determination method of the neural network model loss value of any one of claims 1 to 12.
15. A neural network model loss value determining apparatus, comprising:
the first acquisition module is used for acquiring an output result in the iterative training process of the neural network model;
the calculation module is used for carrying out loss calculation on the output result by utilizing a target loss function to obtain a loss value;
the target loss function is determined based on a preset loss function and target parameters, the target parameters are parameters affecting the performance of the neural network model, and the target parameters are associated with training samples used for training the neural network model.
16. The apparatus of claim 15, wherein the neural network model is used for video encoding and decoding, and the target parameter is a parameter affecting compression efficiency.
17. The apparatus of claim 16, wherein the target parameter comprises at least one of:
quantization parameters, boundary strength, block partition information, distortion size, resolution, and frame type.
18. The apparatus according to any one of claims 15 to 17, wherein the target loss function is:
Loss = w * A(x, y);
or Loss = α * w * A(x, y);
wherein Loss represents the loss value, w represents a weight value determined based on the target parameter, A(x, y) represents the preset loss function, α is an attenuation factor, and α is determined based on the number of iterative training rounds.
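The two variants in claim 18 can be read as the following sketch, assuming PyTorch, a mean squared error preset loss, and an exponential decay schedule for the attenuation factor α; the schedule is an illustration, since the claim only states that α depends on the number of iterative training rounds.

    import torch.nn.functional as F

    def target_loss(output, label, w, step=None, decay_rate=0.99999):
        a = F.mse_loss(output, label)  # A(x, y): the preset loss
        if step is None:
            return w * a               # Loss = w * A(x, y)
        alpha = decay_rate ** step     # assumed decay schedule for alpha
        return alpha * w * a           # Loss = alpha * w * A(x, y)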
19. The apparatus of claim 18, wherein, in the case where the target parameter is a quantization parameter, the weight function used to calculate w satisfies:
where B and C are constants, and qp is the quantization parameter.
20. The apparatus of claim 18, wherein, in the case where the neural network model is used for video encoding and decoding, w comprises a first weight value determined based on luminance and a second weight value determined based on chrominance, and the first weight value and the second weight value are determined based on the same or different weight functions.
21. The apparatus of claim 18, wherein, in the case where the neural network model is used for video encoding and decoding, different frame types correspond to different weight functions, the weight functions being used to calculate w.
22. The apparatus of claim 18, wherein the neural network model loss value determining apparatus further comprises:
the determining module is used for determining a reference weight according to a sequence-level quantization parameter and a first weight function, and for determining w based on the reference weight, the sequence-level quantization parameter, a frame-level quantization parameter, and a second weight function.
23. The apparatus of claim 22, wherein the first weight function satisfies:
and/or, the second weight function satisfies:
where B and C are constants, w₀ represents the reference weight, qp₀ represents the sequence-level quantization parameter, and qp₁ represents the frame-level quantization parameter.
24. The apparatus according to any one of claims 15 to 23, wherein the calculation module is specifically configured to perform loss calculation on the output result using the target loss function to obtain the loss value in the case where the number of iterative training rounds is less than or equal to a preset threshold.
25. The apparatus of claim 24, wherein the calculation module is further configured to perform loss calculation on the output result using the preset loss function to obtain the loss value in the case where the number of iterative training rounds is greater than the preset threshold.
26. The apparatus according to any one of claims 15 to 25, wherein the preset loss function is a mean absolute error function or a mean squared error function.
27. A video codec processing apparatus, comprising:
the second acquisition module is used for acquiring unfiltered reconstructed images in the video encoding and decoding process;
the first input module is used for inputting the unfiltered reconstructed image to a loop filter to obtain a filtered reconstructed image;
wherein the loop filter comprises a loop filter model, the loop filter model is a neural network model obtained through iterative training, and the loss value in the iterative training process of the loop filter model is determined based on the method for determining a neural network model loss value according to any one of claims 1 to 12.
28. A video image processing apparatus, comprising:
the second input module is used for inputting the first video image obtained after decoding into the video super-resolution model to obtain a second video image;
wherein the resolution of the first video image is lower than the resolution of the second video image, the video super-resolution model is a neural network model obtained through iterative training, and the loss value of the video super-resolution model in the iterative training process is determined based on the method for determining a neural network model loss value according to any one of claims 1 to 12.
29. A terminal, comprising a processor and a memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method for determining a neural network model loss value according to any one of claims 1 to 12, or the steps of the video encoding and decoding processing method according to claim 13, or the steps of the video image processing method according to claim 14.
30. A readable storage medium, wherein a program or instructions are stored on the readable storage medium, and the program or instructions, when executed by a processor, implement the steps of the method for determining a neural network model loss value according to any one of claims 1 to 12, or the steps of the video encoding and decoding processing method according to claim 13, or the steps of the video image processing method according to claim 14.
CN202211262653.8A 2022-10-14 2022-10-14 Method for determining neural network model loss value and related application method and equipment Pending CN117933333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211262653.8A CN117933333A (en) 2022-10-14 2022-10-14 Method for determining neural network model loss value and related application method and equipment

Publications (1)

Publication Number Publication Date
CN117933333A 2024-04-26

Family

ID=90768897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211262653.8A Pending CN117933333A (en) 2022-10-14 2022-10-14 Method for determining neural network model loss value and related application method and equipment

Country Status (1)

Country Link
CN (1) CN117933333A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination