CN112348045B - Training method and training device of neural network and electronic equipment - Google Patents

Training method and training device of neural network and electronic equipment

Info

Publication number
CN112348045B
Authority
CN
China
Prior art keywords
neural network
value
loss function
training
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910735387.8A
Other languages
Chinese (zh)
Other versions
CN112348045A (en)
Inventor
章政文
王国利
张骞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910735387.8A
Publication of CN112348045A
Application granted
Publication of CN112348045B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A neural network training method, a neural network training device and electronic equipment are disclosed. The training method of the neural network comprises the following steps: acquiring, through a neural network, a feature map of an input labeled dataset; inputting the feature map of the labeled dataset into a Softmax activation function of the neural network to obtain a probability output value at the position corresponding to the label; calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value; and updating parameters of the neural network based on the new loss function value. In this way, the robustness of the neural network to label noise may be improved.

Description

Training method and training device of neural network and electronic equipment
Technical Field
The present application relates to the field of deep learning, and more particularly, to a training method of a neural network, a training apparatus of the neural network, and an electronic device.
Background
Currently, in the field of deep learning, it is common to train neural networks on labeled data for classification, regression, or other purposes; this way of training a model to learn rules from labeled examples is generally referred to as supervised learning.
In supervised learning, the quality of the labels attached to the training data is critical to the learning effect: if the labels used during learning are erroneous, a valid model cannot be trained. Meanwhile, because the neural networks used in deep learning often have complex structures, a large amount of labeled training data is also required to obtain a good learning effect.
However, label noise is inevitably introduced when annotating massive amounts of data, and it seriously degrades model performance during training. This is because data annotation is done manually in many scenarios, and producing massive, high-quality labels is time-consuming, laborious and economically expensive. Thus, deep learning in practice must face the effects of label noise; that is, every labeled dataset should be assumed to contain some noise.
Therefore, there is a need for a training method for neural networks that can effectively cope with label noise.
Disclosure of Invention
The present application has been made to solve the above technical problem. Embodiments of the application provide a neural network training method, a neural network training device and electronic equipment, which enable the neural network to be trained in a self-learning manner and improve the robustness of the neural network to label noise.
According to an aspect of the present application, there is provided a training method of a neural network, including: acquiring, through a neural network, a feature map of an input labeled dataset; inputting the feature map of the labeled dataset into a Softmax activation function of the neural network to obtain a probability output value at the position corresponding to the label; calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value; and updating parameters of the neural network based on the new loss function value.
According to another aspect of the present application, there is provided a training apparatus for a neural network, including: a feature map acquisition unit for acquiring, through a neural network, a feature map of an input labeled dataset; a probability acquisition unit for inputting the feature map of the labeled dataset acquired by the feature map acquisition unit into a Softmax activation function of the neural network to obtain a probability output value at the position corresponding to the label; a loss function acquisition unit for calculating a weighting value with a monotonically increasing convex function of the probability output value obtained by the probability acquisition unit and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value; and a network updating unit for updating parameters of the neural network based on the new loss function value obtained by the loss function acquisition unit.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the neural network training method as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the method of training a neural network as described above.
When a neural network is trained on a dataset containing label noise, the training method of the neural network, the training device of the neural network and the electronic equipment provided by the application use the value of a monotonically increasing convex function of the post-softmax probability output at the label position as the weight applied to the loss function value, so that the neural network is trained in a self-learning manner. Because the neural network preferentially learns the true patterns during training, the weight assigned to noisy samples decreases while the weight assigned to clean (positive) samples increases, so that the network focuses on learning from the clean samples and thus becomes robust to label noise.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification; together with the embodiments of the application they serve to explain the application and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 illustrates a flowchart of a training method of a neural network according to an embodiment of the present application.
Fig. 2 illustrates a schematic diagram of a framework structure of a neural network according to an embodiment of the present application.
Fig. 3 illustrates a block diagram of a first example of a training apparatus of a neural network according to an embodiment of the present application.
Fig. 4 illustrates a block diagram of a second example of a training apparatus of a neural network according to an embodiment of the present application.
Fig. 5 illustrates a block diagram of a first example of a network update unit of a training apparatus of a neural network according to an embodiment of the present application.
Fig. 6 illustrates a block diagram of a second example of a network update unit of a training apparatus of a neural network according to an embodiment of the present application.
Fig. 7 illustrates a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Summary of the application
As noted above, existing methods for dealing with label noise mainly include predicting a noise transition matrix, using a robust loss function, and curriculum learning. In general, when the dataset is small and the model is simple, the noise-transition-matrix method or a robust loss function is used; when processing large noisy datasets, curriculum learning may be used.
However, predicting the noise transition matrix requires an additional network structure, and robust loss functions are difficult to use when training large networks for multi-class tasks, so both methods struggle on huge noisy datasets. Curriculum learning, in turn, depends on the quality of the curriculum provided during training: hard-threshold curricula based on the loss degrade model performance, and additional curriculum-generation structures introduce a large amount of extra computation and hyperparameters, making training cumbersome.
The basic concept of the present application is to input a labeled dataset into a neural network to obtain a feature map of the labeled dataset, input the feature map into the Softmax activation function of the neural network to obtain the probability output value at the position corresponding to the label, and weight the original loss function value of the neural network by the value of a monotonically increasing convex function of the probability output value, so as to update the neural network with the new, weighted loss function value.
Specifically, the training method of the neural network, the training device of the neural network and the electronic equipment provided by the application first acquire, through the neural network, a feature map of the input labeled dataset; then input the feature map of the labeled dataset into the Softmax activation function of the neural network to obtain the probability output value at the position corresponding to the label; then calculate a weighting value with a monotonically increasing convex function of the probability output value and weight the original loss function value of the neural network by this weighting value to obtain a new loss function value; and finally update the parameters of the neural network based on the new loss function value.
That is, when training a neural network on a dataset containing label noise, the training method of the neural network, the training device of the neural network and the electronic equipment provided by the application use the value of a monotonically increasing convex function of the post-softmax probability output at the label position as the weight assigned to the original loss function value of the neural network, so that the neural network is trained in a self-learning manner.
Because the neural network preferentially learns the true patterns, the weight assigned to noisy samples decreases during training while the weight assigned to clean (positive) samples increases, so that the neural network pays more attention to learning from the clean samples when training in this self-learning manner, thereby achieving robustness to label noise.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary method
Fig. 1 illustrates a flowchart of a training method of a neural network according to an embodiment of the present application.
As shown in fig. 1, the training method of the neural network according to the embodiment of the present application includes the following steps.
Step S110, acquiring a feature map of the input labeled dataset through a neural network. Here, the labeled dataset is labeled training data for training the neural network, and may be, for example, a plurality of labeled images. Accordingly, the feature map of the labeled dataset obtained through the neural network may be a plurality of feature maps corresponding to the plurality of images.
Fig. 2 illustrates a schematic diagram of a framework structure of a neural network according to an embodiment of the present application. As shown in fig. 2, a labeled dataset (e.g., a plurality of images IN) is input into a neural network N serving as the base model, and a feature map F is obtained through the neural network N.
Step S120, inputting the feature map of the labeled dataset into the Softmax activation function of the neural network to obtain the probability output value at the position corresponding to the label. Here, as described above, the labels in the labeled dataset contain label noise; that is, a given label may be true or false. However, because the neural network learns the true patterns in the data first, the probability output value at the position corresponding to a false label is low.
With continued reference to fig. 2, after the feature map F is input into the Softmax activation function, the probability output value p_t at the position corresponding to the label B is obtained. Here, the probability output value p_t is the Softmax output at the labeled class t and may be calculated as:
p_t = exp(u_t·f_c(x) + b_t) / Σ_j exp(u_j·f_c(x) + b_j)
where x is the input of the neural network, i.e. the labeled dataset, f_c(x) is the feature map obtained by the neural network, u is the weight of the nodes of the neural network, and b is the bias of the nodes of the neural network.
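As a concrete illustration, the probability output at the label position is simply the softmax over the classifier logits evaluated at the labeled class. The sketch below is a minimal PyTorch-style example; the tensor names (features, weight, bias, labels) are illustrative assumptions rather than identifiers from the patent.

```python
import torch
import torch.nn.functional as F

def label_probability(features: torch.Tensor,
                      weight: torch.Tensor,
                      bias: torch.Tensor,
                      labels: torch.Tensor) -> torch.Tensor:
    """Return p_t, the softmax probability at each sample's label position.

    features: (N, D) per-sample feature vectors f_c(x)
    weight:   (C, D) classifier node weights u
    bias:     (C,)   classifier node biases b
    labels:   (N,)   integer class labels (possibly noisy)
    """
    logits = features @ weight.t() + bias                   # (N, C) pre-softmax scores
    probs = F.softmax(logits, dim=1)                        # softmax over the C classes
    return probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # probability at the label index
```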
Step S130, calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value. With continued reference to fig. 2, a monotonically increasing convex function of the probability output value p_t is calculated:
v = f(p_t)
Here, the monotonically increasing convex function f may take any form that satisfies these requirements; for example, it may be a power function:
v = (p_t)^β
And, by constraining v ∈ [0,1], a weighting value is obtained that can be used to weight the original loss function value of the neural network. For the power function above, this is achieved by taking β ∈ [0,1].
In this way, the curriculum weight v is obtained and, as shown in fig. 2, can be applied to the loss function value L. Here, in the field of deep learning, curriculum learning refers to learned weights that are used either to weight the loss function values directly or to weight the data, such as sample data.
The new loss function value is then calculated as:
L_r = v * L
where L is the original loss function value and L_r is the new loss function value.
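A minimal sketch of this weighting, assuming a cross-entropy loss as the original per-sample loss L, is given below. Detaching the weight v so that gradients flow only through the original loss term is an assumption of this sketch; the patent itself only specifies v = f(p_t) and L_r = v * L.

```python
import torch
import torch.nn.functional as F

def self_paced_loss(logits: torch.Tensor,
                    labels: torch.Tensor,
                    beta: float = 0.5) -> torch.Tensor:
    """New loss value L_r = v * L with v = p_t ** beta and beta in [0, 1]."""
    probs = F.softmax(logits, dim=1)
    p_t = probs.gather(1, labels.unsqueeze(1)).squeeze(1)           # probability at the label position
    v = p_t.detach() ** beta                                        # curriculum weight v in [0, 1]
    per_sample = F.cross_entropy(logits, labels, reduction="none")  # original loss L, per sample
    return (v * per_sample).mean()                                  # weighted loss L_r
```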
Step S140, updating parameters of the neural network based on the new loss function value. Referring to fig. 2, specifically, the parameters of the neural network N, i.e., the weights u and the biases b of the neural network nodes, may be updated using a gradient descent algorithm based on the new loss function value L_r.
In this way, during the updating of the neural network parameters, the loss function value is weighted by the weight obtained in the curriculum-learning manner. Because the neural network preferentially learns the true patterns, the weight assigned to noisy-label samples decreases while the weight assigned to clean (positive) samples increases, so the network concentrates on learning from the clean samples when training in this self-learning manner, thereby achieving robustness to label noise.
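Putting steps S110 to S140 together, one parameter update by gradient descent could look like the following sketch. The names model, optimizer and loss_fn are illustrative assumptions; loss_fn could be the self-paced loss sketched above.

```python
import torch

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               images: torch.Tensor,
               labels: torch.Tensor,
               loss_fn) -> float:
    """One update: forward pass, weighted loss L_r, then a gradient-descent step."""
    logits = model(images)            # steps S110/S120: feature map -> softmax logits
    loss = loss_fn(logits, labels)    # step S130: new loss value L_r = v * L
    optimizer.zero_grad()
    loss.backward()                   # step S140: gradient descent on the
    optimizer.step()                  #            network weights and biases
    return loss.item()
```

For example, the optimizer could be plain stochastic gradient descent, torch.optim.SGD(model.parameters(), lr=0.01), matching the gradient descent algorithm mentioned above.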
In one example, in the training method of a neural network according to an embodiment of the present application, before acquiring the feature map of the input labeled dataset through the neural network, the method includes: initializing parameters of the neural network.
That is, by initializing the parameters of a neural network such as a deep neural network, the network can then be trained in the manner described in the embodiments of the present application, thereby obtaining a neural network model that is robust to label noise. Here, any initialization method that facilitates training of the neural network may be selected, such as random initialization.
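As an illustration only, random initialization could be written as in the short sketch below; the specific choice of a Kaiming-normal fill for convolutional and linear weights is an assumption, since the patent only requires some initialization that facilitates training.

```python
import torch.nn as nn

def init_params(model: nn.Module) -> None:
    """Randomly initialize weights and biases before training."""
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight)   # random weight initialization
            if m.bias is not None:
                nn.init.zeros_(m.bias)          # biases start at zero
```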
In one example, in the training method of a neural network according to an embodiment of the present application, calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value includes: weighting the original loss function value of the neural network by taking the value of the monotonically increasing convex function of the probability output value as a sample confidence value, to obtain the new loss function value of the neural network.
That is, the probability output value reflects whether a labeled sample is true or false; therefore, the value of the monotonically increasing convex function of the probability output value can be used as a sample confidence value, so that training of the neural network focuses more on the clean (positive) samples and ignores the noisy-label samples. The trained neural network can therefore exhibit robustness to label noise.
In one example, in the training method of a neural network according to an embodiment of the present application, updating parameters of the neural network based on the new loss function value includes: iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the parameters of the neural network.
That is, through the curriculum weights v generated by the neural network itself, iterative training continuously improves the prediction accuracy of the neural network. Moreover, as the iterations proceed, the weight assigned to noisy samples approaches zero while the weight assigned to clean (positive) samples approaches one. Therefore, when training in this self-learning manner, the neural network pays more attention to learning from clean samples and ignores the noisy-label samples, thereby achieving robustness to label noise.
In actual training of the neural network, a stopping condition for the iteration may be set, for example, whether the updated neural network has converged, or whether the number of iterations has reached a predetermined value. After the iteration stops, a validation set may be used to select the optimal neural network.
That is, in the training method of the neural network according to the embodiment of the present application, iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the parameters of the neural network includes: determining whether the updated neural network has converged; and stopping the iterative updating of the neural network in response to the updated neural network converging. Specifically, the accuracy of the updated neural network on a validation set may be obtained and used to judge convergence: if the accuracy on the validation set no longer rises after an update, the neural network may be considered to have converged, and the iterative updating is stopped.
Alternatively, in the training method of the neural network according to the embodiment of the present application, iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the neural network includes: setting a maximum number of iterative updates for the neural network; and stopping the iterative updating of the neural network in response to the number of iterative updates reaching that maximum. Here, the maximum number of iterative updates may be 5 to 20; for example, if it is set to 10 and the neural network has been iteratively updated 10 times, the iterative updating is stopped.
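A minimal sketch of the outer loop with both stopping conditions (validation accuracy no longer rising, or a maximum number of iterations such as 10) is shown below. The train_one_epoch and validate callables are assumed placeholders for the per-batch updates and the validation-accuracy computation described above.

```python
from typing import Callable

def train_until_stop(train_one_epoch: Callable[[], None],
                     validate: Callable[[], float],
                     max_iters: int = 10) -> float:
    """Iterate training; stop when validation accuracy stops rising or max_iters is reached."""
    best_acc = 0.0
    for _ in range(max_iters):      # stopping condition 2: maximum number of iterative updates
        train_one_epoch()           # one round of self-paced updates
        acc = validate()            # accuracy on the validation set
        if acc <= best_acc:         # stopping condition 1: accuracy no longer rises
            break
        best_acc = acc
    return best_acc
```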
In this way, by setting a stopping condition for the iteration in the training process of the neural network, an excessive number of iterations can be avoided, shortening the training time and reducing the training cost.
Exemplary apparatus
Fig. 3 illustrates a block diagram of a first example of a training apparatus of a neural network according to an embodiment of the present application.
As shown in fig. 3, the training apparatus 200 of the neural network according to the embodiment of the present application includes: a feature map acquisition unit 210 for acquiring, through a neural network, a feature map of an input labeled dataset; a probability acquisition unit 220 for inputting the feature map of the labeled dataset acquired by the feature map acquisition unit 210 into the Softmax activation function of the neural network to obtain the probability output value at the position corresponding to the label; a loss function acquisition unit 230 for calculating a weighting value with a monotonically increasing convex function of the probability output value obtained by the probability acquisition unit 220 and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value; and a network updating unit 240 for updating parameters of the neural network based on the new loss function value obtained by the loss function acquisition unit 230.
Fig. 4 illustrates a block diagram of a second example of a training apparatus of a neural network, according to an embodiment of the present application.
As shown in fig. 4, on the basis of the embodiment shown in fig. 3, the training apparatus 200' of the neural network further includes, in addition to the feature map acquisition unit 210, the probability acquisition unit 220, the loss function acquisition unit 230 and the network updating unit 240, a network initializing unit 250 for initializing parameters of the neural network. The feature map acquisition unit 210 is then configured to acquire, through the neural network initialized by the network initializing unit 250, a feature map of the input labeled dataset.
In one example, in the training apparatus of a neural network according to an embodiment of the present application, the loss function acquisition unit 230 is configured to: weight the original loss function value of the neural network by taking the value of the monotonically increasing convex function of the probability output value as a sample confidence value, to obtain the new loss function value of the neural network.
In one example, in the training apparatus of a neural network according to an embodiment of the present application, the network updating unit 240 is configured to: iteratively repeat the operations in which the feature map acquisition unit 210 acquires a feature map, the probability acquisition unit 220 acquires a probability output value, and the loss function acquisition unit 230 calculates a weighting value and obtains a new loss function value, and update the neural network with the new loss function value.
Fig. 5 illustrates a block diagram of a first example of a network update unit of a training apparatus of a neural network according to an embodiment of the present application.
As shown in fig. 5, on the basis of the embodiment shown in fig. 3, the network updating unit 240 includes: a convergence determination subunit 241, configured to determine whether the updated neural network converges; and a first stopping subunit 242, configured to stop the iterative updating of the neural network in response to the convergence determination subunit 241 determining that the updated neural network converges.
Fig. 6 illustrates a block diagram of a second example of a network update unit of a training apparatus of a neural network according to an embodiment of the present application.
As shown in fig. 6, on the basis of the embodiment shown in fig. 3, the network updating unit 240 includes: a number setting subunit 243 configured to set a maximum number of iterative updating of the neural network; and a second stopping subunit 244, configured to stop the iterative update of the neural network in response to the number of iterative updates of the neural network reaching the maximum number of iterative updates set by the number setting subunit 243.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described training apparatus 200 for a neural network have been described in detail in the above description of the training method for a neural network with reference to fig. 1 and 2, and thus, repetitive descriptions thereof will be omitted.
As described above, the training apparatus 200 of the neural network according to the embodiment of the present application may be implemented in various terminal devices, for example, a server or the like for training the neural network. In one example, the training apparatus 200 of the neural network according to an embodiment of the present application may be integrated into the terminal device as one software module and/or hardware module. For example, the training apparatus 200 of the neural network may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the training device 200 of the neural network may also be one of a plurality of hardware modules of the terminal device.
Alternatively, in another example, the training apparatus 200 of the neural network and the terminal device may be separate devices, and the training apparatus 200 of the neural network may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information in a contracted data format.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 7.
Fig. 7 illustrates a block diagram of an electronic device according to an embodiment of the application.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory, and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the neural network training method of the various embodiments of the present application described above and/or other desired functions. Various contents such as a sample confidence value, a loss function value, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input device 13 may include, for example, a keyboard, a mouse, etc.
The output device 14 can output various information to the outside, including the trained neural network model and the like. The output device 14 may include, for example, a display, speakers, a printer, as well as a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only those components of the electronic device 10 that are relevant to the present application are shown in fig. 7; components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a neural network training method according to various embodiments of the application described in the "exemplary methods" section of this specification.
The computer program product may write program code for performing operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in a neural network training method according to various embodiments of the present application described in the "exemplary methods" section of the present specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be construed as necessarily possessed by the various embodiments of the application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment and systems may be connected, arranged or configured in any manner. Words such as "including", "comprising", "having" and the like are open-ended words that mean "including but not limited to" and may be used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or", unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to".
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (9)

1. A method of training a neural network, comprising:
acquiring, through a neural network, a feature map of an input labeled dataset, wherein the labels in the labeled dataset contain label noise;
inputting the feature map of the labeled dataset into a Softmax activation function of the neural network to obtain a probability output value at the position corresponding to the label;
calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value; and
updating parameters of the neural network based on the new loss function value;
wherein calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value comprises:
weighting the original loss function value of the neural network by taking the value of the monotonically increasing convex function of the probability output value as a sample confidence value, to obtain the new loss function value of the neural network.
2. The method of training a neural network of claim 1, wherein, before acquiring the feature map of the input labeled dataset through the neural network, the method comprises:
initializing parameters of the neural network.
3. The method of training the neural network of claim 1, wherein updating parameters of the neural network based on the new loss function value comprises:
iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the parameters of the neural network.
4. The training method of a neural network of claim 3, wherein iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the parameters of the neural network comprises:
determining whether the updated neural network converges; and
stopping the iterative updating of the neural network in response to the updated neural network converging.
5. The training method of a neural network of claim 3, wherein iteratively repeating the steps of acquiring a feature map, acquiring a probability output value, calculating a weighting value and calculating a new loss function value, and updating the parameters of the neural network comprises:
setting a maximum number of iterative updates of the neural network; and
stopping the iterative updating of the neural network in response to the number of iterative updates of the neural network reaching the maximum number of iterative updates.
6. A training device for a neural network, comprising:
a feature map acquisition unit for acquiring, through a neural network, a feature map of an input labeled dataset, wherein the labels in the labeled dataset contain label noise;
a probability acquisition unit for inputting the feature map of the labeled dataset acquired by the feature map acquisition unit into a Softmax activation function of the neural network to obtain a probability output value at the position corresponding to the label;
a loss function acquisition unit configured to calculate a weighting value with a monotonically increasing convex function of the probability output value obtained by the probability acquisition unit and weight the original loss function value of the neural network with the weighting value to obtain a new loss function value; and
a network updating unit configured to update parameters of the neural network based on the new loss function value obtained by the loss function acquisition unit;
wherein calculating a weighting value with a monotonically increasing convex function of the probability output value and weighting the original loss function value of the neural network with the weighting value to obtain a new loss function value comprises:
weighting the original loss function value of the neural network by taking the value of the monotonically increasing convex function of the probability output value as a sample confidence value, to obtain the new loss function value of the neural network.
7. The training apparatus of a neural network of claim 6, wherein the network updating unit is to:
iteratively repeat the operations in which the feature map acquisition unit acquires a feature map, the probability acquisition unit acquires a probability output value, and the loss function acquisition unit calculates a weighting value and obtains a new loss function value, and update the neural network with the new loss function value.
8. The training apparatus of a neural network of claim 7, wherein the network updating unit comprises:
a convergence determination subunit, configured to determine whether the updated neural network converges; and
a first stopping subunit for stopping the iterative updating of the neural network in response to the convergence determination subunit determining that the updated neural network converges.
9. An electronic device, comprising:
A processor; and
A memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the neural network training method of any of claims 1-5.
CN201910735387.8A 2019-08-09 2019-08-09 Training method and training device of neural network and electronic equipment Active CN112348045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910735387.8A CN112348045B (en) 2019-08-09 2019-08-09 Training method and training device of neural network and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910735387.8A CN112348045B (en) 2019-08-09 2019-08-09 Training method and training device of neural network and electronic equipment

Publications (2)

Publication Number Publication Date
CN112348045A CN112348045A (en) 2021-02-09
CN112348045B true CN112348045B (en) 2024-08-09

Family

ID=74366984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910735387.8A Active CN112348045B (en) 2019-08-09 2019-08-09 Training method and training device of neural network and electronic equipment

Country Status (1)

Country Link
CN (1) CN112348045B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065608B (en) * 2021-10-21 2023-09-22 深圳市卓立智能制造有限公司 Output power stable control method and system for reciprocating electromagnetic pump and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754015A (en) * 2019-01-02 2019-05-14 京东方科技集团股份有限公司 Neural network and correlation technique, medium and equipment for the identification of paintings multi-tag

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102563752B1 (en) * 2017-09-29 2023-08-04 삼성전자주식회사 Training method for neural network, recognition method using neural network, and devices thereof
CN108229298A (en) * 2017-09-30 2018-06-29 北京市商汤科技开发有限公司 The training of neural network and face identification method and device, equipment, storage medium
CN108197670B (en) * 2018-01-31 2021-06-15 国信优易数据股份有限公司 Pseudo label generation model training method and device and pseudo label generation method and device
CN108416384B (en) * 2018-03-05 2021-11-05 苏州大学 Image label labeling method, system, equipment and readable storage medium
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754015A (en) * 2019-01-02 2019-05-14 京东方科技集团股份有限公司 Neural network and correlation technique, medium and equipment for the identification of paintings multi-tag

Also Published As

Publication number Publication date
CN112348045A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN111797893B (en) Neural network training method, image classification system and related equipment
CN110046706B (en) Model generation method and device and server
CN112085041B (en) Training method and training device of neural network and electronic equipment
CN112016559A (en) Example segmentation model training method and device and image processing method and device
CN112329476B (en) Text error correction method and device, equipment and storage medium
CN111105029A (en) Neural network generation method and device and electronic equipment
JP6172317B2 (en) Method and apparatus for mixed model selection
CN111753863A (en) Image classification method and device, electronic equipment and storage medium
CN110163252A (en) Data classification method and device, electronic equipment, storage medium
CN116307624A (en) Resource scheduling method and system of ERP system
CN115080749B (en) Weak supervision text classification method, system and device based on self-supervision training
US20240220730A1 (en) Text data processing method, neural-network training method, and related device
CN112084301A (en) Training method and device of text correction model and text correction method and device
CN114330588A (en) Picture classification method, picture classification model training method and related device
CN110929532B (en) Data processing method, device, equipment and storage medium
CN109919214B (en) Training method and training device for neural network model
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
JP2018045361A (en) Information processing device, information processing method, and program
CN113449840A (en) Neural network training method and device and image classification method and device
CN117725960B (en) Knowledge distillation-based language model training method, text classification method and equipment
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN112348045B (en) Training method and training device of neural network and electronic equipment
CN111523351A (en) Neural network training method and device and electronic equipment
WO2021174814A1 (en) Answer verification method and apparatus for crowdsourcing task, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant