CN115546528A - Neural network training method, sample processing method and related equipment - Google Patents

Neural network training method, sample processing method and related equipment

Info

Publication number
CN115546528A
CN115546528A
Authority
CN
China
Prior art keywords
neural network
target
training
network layer
feature extraction
Prior art date
Legal status
Pending
Application number
CN202110742175.XA
Other languages
Chinese (zh)
Inventor
武晓宇
胡崝
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202110742175.XA
Publication of CN115546528A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

Embodiments of the present application disclose a neural network training method, a sample processing method, and related devices, which can be used in the field of incremental learning within the field of artificial intelligence. The method includes: determining a target neural network layer from a target feature extraction network according to at least one first accuracy, where the target feature extraction network is obtained by training on a historical training data set containing N classes of first training samples, and each first accuracy is the accuracy of a classification operation performed on the first training samples based on feature information generated by a corresponding first neural network layer; and keeping the parameters of at least one second neural network layer unchanged while training the first neural network with a target training data set, where the at least one second neural network layer includes the neural network layers located before the target neural network layer in the target feature extraction network, and the target training data set includes the N classes of second training samples and training samples of newly added classes. The total time required to train the first neural network is thereby shortened.

Description

Neural network training method, sample processing method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a training method for a neural network, a sample processing method, and related devices.
Background
Artificial intelligence (AI) refers to theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
With the continuous development of artificial intelligence technology, neural networks are being applied in more and more industries. Typically, a trained neural network is deployed to an execution device for operation, and samples of newly added classes may be collected while the neural network is running. To maintain the accuracy of the neural network, it needs to be incrementally trained on a combination of historical samples and the samples of the newly added classes.
However, when incremental training is performed on a neural network, all of its parameters are adjusted, which generally consumes a large amount of time; a scheme for shortening the training time of the neural network is therefore urgently needed.
Disclosure of Invention
Embodiments of the present application provide a neural network training method, a sample processing method, and related devices, which keep the parameters of the neural network layers before a target neural network layer in a target feature extraction network unchanged during training of a first neural network, thereby shortening the total time required to train the first neural network. Moreover, the N classes of training samples can be well distinguished by the feature information generated by the target neural network layer, which helps improve the accuracy of the second neural network.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides a training method for a neural network, which may be used in the field of incremental learning in the field of artificial intelligence. The method comprises the steps that training equipment obtains a target feature extraction network, wherein the target feature extraction network is obtained after training is carried out by adopting a historical training data set, and the target feature extraction network is a neural network used for carrying out feature extraction operation on an input sample; the historical training data set is provided with N types of first training samples, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers. The training equipment determines a target neural network layer from a target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer; the first accuracy corresponding to the first neural network layer is the accuracy of a prediction classification result obtained by performing classification operation on the first training sample based on the feature information generated by the first neural network layer. The training equipment keeps the parameters of at least one second neural network layer in the target feature extraction network unchanged, and trains the first neural network by using the target training data set until preset conditions are met to obtain a second neural network, wherein the first neural network comprises the target feature extraction network. 
The at least one second neural network layer comprises a neural network layer positioned in the target feature extraction network before the target neural network layer, namely the at least one second neural network layer comprises the target neural network layer and at least one neural network layer positioned in the target feature extraction network before the target neural network layer; the target training data set comprises N types of second training samples and at least one type of newly added training sample, wherein the N types of second training samples are derived from N types of first training samples; the preset condition may be a convergence condition of reaching the first loss function, or may be that the number of times of training the first neural network reaches a preset number threshold.
In the implementation mode, in the process of training the first neural network, the parameters of at least one second neural network layer in the first neural network are kept unchanged, so that the total time length for training the first neural network is favorably shortened, and the consumption of computer resources in the training process of the first neural network is favorably reduced; in addition, at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer, the target neural network layer is selected according to a first accuracy rate, the first accuracy rate is the accuracy rate obtained by performing classification operation on the first training samples based on the feature information generated by the first neural network layer, namely the N types of training samples can be better distinguished through the feature information generated by the target neural network layer, so that the comprehension capability of the learned N types of training samples in the target feature extraction network is kept as much as possible, and the accuracy rate of the trained first neural network (namely the second neural network) is improved.
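The freezing step in the implementation above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Layer` class, layer names, and the chosen target index are invented for the example; in practice the flag would correspond to disabling gradient updates for those layers' parameters.

```python
# Hypothetical sketch: keep the parameters of the target layer and all
# layers before it unchanged, so only the remaining layers are trained.

class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True  # whether parameters may be updated during training

def freeze_up_to(layers, target_index):
    """Freeze the target layer and every layer before it in the network."""
    for i, layer in enumerate(layers):
        layer.trainable = i > target_index
    return layers

network = [Layer(f"layer{i}") for i in range(6)]
freeze_up_to(network, target_index=3)

trainable = [layer.name for layer in network if layer.trainable]
print(trainable)  # only the layers after the target layer remain trainable
```

Only the unfrozen layers participate in parameter updates during training of the first neural network, which is what shortens the total training time.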
In one possible implementation manner of the first aspect, the determining, by the training device, a target neural network layer from the target feature extraction network according to at least one first accuracy in one-to-one correspondence with the at least one first neural network layer may include: the training device acquires at least two first accuracies in one-to-one correspondence with at least two first neural network layers, and obtains a second accuracy from the at least two first accuracies, where the second accuracy has the highest value among the at least two first accuracies. The training device acquires, from the at least two first neural network layers, the third neural network layer corresponding to the second accuracy, and determines the target neural network layer according to the third neural network layer.
In this implementation, since the feature information generated by the last first neural network layer in the target feature extraction network is not necessarily the best at distinguishing the N classes, the first accuracies corresponding to the at least two first neural network layers are compared, the third neural network layer corresponding to the highest accuracy is selected, and the target neural network layer is then determined according to the third neural network layer.
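The selection step can be sketched as below. The accuracy values are made up for illustration; in practice each value would come from running a classifier probe on the feature information that the corresponding layer produces for the N classes of first training samples.

```python
# Illustrative sketch: each candidate layer has a "first accuracy" measured
# by classifying the N old classes from that layer's feature information;
# the layer with the highest accuracy becomes the third neural network layer.

probe_accuracy = {  # layer index -> classification accuracy (invented values)
    0: 0.61, 1: 0.74, 2: 0.83, 3: 0.88, 4: 0.79, 5: 0.72,
}

def select_third_layer(accuracies):
    """Return the layer whose features best distinguish the N classes."""
    return max(accuracies, key=accuracies.get)

third_layer = select_third_layer(probe_accuracy)
print(third_layer)  # layer 3, not the last layer, wins here
```

Note that in this toy example the last layer (index 5) is not selected, matching the observation above that the final layer's features are not necessarily the most discriminative for the old classes.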
In one possible implementation of the first aspect, the at least two first neural network layers may comprise all first neural network layers in the target feature extraction network. In this implementation manner, a plurality of first accuracy rates that are in one-to-one correspondence with all first neural network layers in the target feature extraction network are obtained, and then a third neural network layer corresponding to the second accuracy rate with the highest value is obtained from all first neural network layers, that is, the N categories can be most distinguished by feature information generated by the selected third neural network layer, so as to further improve the accuracy rate of the second neural network.
In one possible implementation manner of the first aspect, the determining, by the training device, the target neural network layer according to the third neural network layer may include: the training device may directly determine the third neural network layer as the target neural network layer; or, the training device may determine whether the third neural network layer is within a preset range, and if the third neural network layer is within the preset range, determine an h-th neural network layer located before the third neural network layer in the target feature extraction network as the target neural network layer. The preset range can be specifically H first neural network layers in the target feature extraction network, and values of H and H are positive integers.
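The preset-range rule can be sketched as follows, under the assumption (consistent with the description above) that the preset range means the last H first neural network layers of the feature extraction network. The values of H and h are illustrative.

```python
# Sketch of the preset-range rule: if the third neural network layer falls
# within the last H layers, step back h layers and use that earlier layer
# as the target neural network layer; otherwise use the third layer directly.

def choose_target_layer(third_layer, num_layers, H=2, h=1):
    """H and h are positive integers fixed in advance (assumed values here)."""
    in_preset_range = third_layer >= num_layers - H  # among the last H layers
    if in_preset_range:
        return max(third_layer - h, 0)  # the h-th layer before the third layer
    return third_layer

print(choose_target_layer(third_layer=5, num_layers=6))  # steps back to 4
print(choose_target_layer(third_layer=2, num_layers=6))  # unchanged: 2
```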
In one possible implementation manner of the first aspect, the historical training data set includes a target training data subset, the target training data subset includes the first training samples of a target class among the N classes, and the target class is any one of the N classes. Before the training device trains the first neural network with the target training data set until the preset condition is met, the method may further include: the training device acquires a feature information set corresponding to the target training data subset, where the feature information set includes target feature information of the first training samples of the target class, and the target feature information is obtained through the at least one second neural network layer in the target feature extraction network. The training device determines the class center of the first training samples of the target class according to the feature information set, and acquires at least one second training sample of the target class from the target training data subset according to that class center; a third distance between the target feature information of each second training sample of the target class and the class center of the target class is smaller than a target threshold, where the third distance may specifically be a cosine distance, a Euclidean distance, a Mahalanobis distance, a Manhattan distance, a Hamming distance, or the like.
In this implementation manner, the N classes can be well distinguished by the target feature information of the first training sample obtained through the target neural network layer (i.e., through the at least one second neural network layer in the target feature extraction network), that is, the target feature information of the first training sample of the target class can well represent the target class, so that a plurality of second training samples representing the target class can be selected by using the target feature information of the first training sample of the target class, that is, the quality of the training sample of the first neural network is improved, and the accuracy of the second neural network is improved.
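The class-center computation and exemplar selection for one target class can be sketched as below, assuming Euclidean distance as the third distance. The feature vectors, sample names, and threshold are invented for the example.

```python
import math

# Sketch of selecting second training samples for one target class: compute
# the class center as the mean of the class's target feature vectors, then
# keep the samples whose feature lies within a threshold of that center.

def class_center(features):
    """Mean of the target feature vectors of one class."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_exemplars(samples, features, threshold):
    """Keep samples whose target feature is within `threshold` of the center."""
    center = class_center(features)
    return [s for s, f in zip(samples, features)
            if euclidean(f, center) < threshold]

feats = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0]]  # last is an outlier
kept = select_exemplars(["a", "b", "c", "d"], feats, threshold=1.5)
print(kept)  # the outlier is excluded from the second training samples
```

Samples far from the class center (like the outlier above) are dropped, so the retained second training samples are the ones that best represent the target class.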
In a possible implementation manner of the first aspect, if the target neural network layer and the third neural network layer are different neural network layers in the target feature extraction network, the target feature information is obtained through the third neural network layer in the target feature extraction network, that is, the target feature information is obtained through at least one first neural network layer located before the third neural network layer in the target feature extraction network.
In a possible implementation manner of the first aspect, the more discrete the data distribution of the plurality of pieces of target feature information corresponding to the plurality of first training samples of the target class, the larger the value of the target threshold. In this implementation, because the target threshold grows with the dispersion of the target feature information of the first training samples of the target class, the number of second training samples obtained for the target class is prevented from becoming too small, which helps keep the numbers of second training samples of the different classes among the N classes as balanced as possible and ensures that the second neural network can learn the characteristics of each of the N classes of first training samples.
In a possible implementation manner of the first aspect, the first data distribution information of the target feature information of the plurality of first training samples may be specifically expressed as a variance of the target feature information of the plurality of first training samples. Alternatively, the first data distribution information may also be a difference between first feature information in the target feature information of the plurality of first training samples and a class center of the target class, where the first feature information is one of the plurality of target feature information corresponding to the plurality of first training samples of the target class one to one, which is farthest from the class center of the target class.
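One possible reading of the dispersion-scaled threshold can be sketched as below. The formula (mean distance to the class center plus one standard deviation) is an assumption chosen for illustration; the patent only requires that a more discrete distribution yield a larger threshold.

```python
import math

# Sketch of a dispersion-scaled target threshold (assumed formula): the more
# spread out a class's target features are, the larger its threshold, so a
# scattered class still yields enough second training samples.

def dispersion_threshold(features, center, scale=1.0):
    """Threshold from the distances of the class's features to its center."""
    dists = [math.sqrt(sum((x - c) ** 2 for x, c in zip(f, center)))
             for f in features]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return scale * (mean + math.sqrt(var))

tight = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]   # compact class
loose = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]   # scattered class
t_tight = dispersion_threshold(tight, [0.0, 0.0])
t_loose = dispersion_threshold(loose, [0.0, 0.0])
print(t_tight < t_loose)  # the more discrete class gets the larger threshold
```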
In a possible implementation manner of the first aspect, values of the target thresholds corresponding to different classes in the N classes are the same.
In one possible implementation of the first aspect, the function of the first neural network is any one of: image classification, feature extraction on images, or regression operation from sequence data in text form. In the implementation mode, various specific implementation functions of the first neural network are provided, the application scene of the scheme is expanded, and the implementation flexibility of the scheme is improved.
In a possible implementation manner of the first aspect, a third neural network is further obtained, where the third neural network is obtained by training with the historical training data set and has the same function as the first neural network. The training, by the training device, of the first neural network with the target training data set may include: the training device acquires any training sample (hereinafter referred to as a "fourth training sample" for convenience of description) from the target training data set, inputs the fourth training sample into the first neural network, and obtains a prediction result output by the first neural network corresponding to the fourth training sample; the training device may further input the second training sample into the third neural network and the first neural network, respectively, to obtain a first prediction result corresponding to the second training sample output by the third neural network and a second prediction result corresponding to the second training sample output by the first neural network. The training device trains the first neural network according to a first loss function and a second loss function, where the first loss function indicates the similarity between the prediction result corresponding to the fourth training sample and the correct result corresponding to the fourth training sample, and the second loss function indicates the similarity between the first prediction result and the second prediction result corresponding to the second training sample.
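The two-loss objective in the implementation above can be sketched as follows. Squared error stands in for whatever similarity measures the patent intends, and the prediction values and weights are invented toy numbers: the first loss compares the new network's prediction with the label, while the second (distillation-style) loss compares the old and new networks' predictions on a second training sample.

```python
# Hypothetical sketch of the combined training objective: a task loss on the
# new sample plus a distillation-style loss that keeps the first neural
# network's predictions on old-class samples close to the third network's.

def squared_error(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def total_loss(pred_new, label, pred_new_old_sample, pred_old_old_sample,
               alpha=1.0, beta=1.0):
    first_loss = squared_error(pred_new, label)            # vs. correct result
    second_loss = squared_error(pred_new_old_sample,       # vs. third network
                                pred_old_old_sample)
    return alpha * first_loss + beta * second_loss

loss = total_loss(pred_new=[0.8, 0.2], label=[1.0, 0.0],
                  pred_new_old_sample=[0.6, 0.4],
                  pred_old_old_sample=[0.7, 0.3])
print(round(loss, 4))
```

The weights `alpha` and `beta` (assumed here) would control the trade-off between fitting the newly added classes and preserving behaviour on the N old classes.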
In a second aspect, an embodiment of the present application provides a training method for a neural network, which can be used in the field of incremental learning in the field of artificial intelligence. The method can comprise the following steps: the method comprises the steps that training equipment obtains a target feature extraction network, the target feature extraction network is obtained after training is conducted through a historical training data set, N types of first training samples are arranged in the historical training data set, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers; and determining a third neural network layer from the target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with the at least one first neural network layer, wherein the first accuracy rate is the accuracy rate obtained by performing classification operation on the first training sample based on the feature information generated by the first neural network layer. The training equipment acquires a feature information set corresponding to a target training data subset, wherein the target training data subset belongs to a historical training data set, the target training data subset comprises first training samples of target classes in N classes, the target classes are any one of the N classes, the feature information set comprises target feature information of the first training samples of the target classes, and the target feature information is generated for a third neural network layer in a target feature extraction network. 
The training equipment determines the class center of a first training sample of the target class according to the characteristic information set; and acquiring a second training sample of at least one target class from the target training data subset according to the class center of the first training sample of the target class, wherein the distance between the target characteristic information of the second training sample of each target class and the class center of the target class is smaller than a target threshold value. The training equipment trains the first neural network by using a target training data set until a preset condition is met to obtain a second neural network, the target training data set comprises N types of selected second training samples and at least one type of newly added training samples, and the first neural network comprises a target feature extraction network.
In one possible implementation manner of the second aspect, the method may further include: the training equipment determines a target neural network layer from the target feature extraction network according to the third neural network layer; the position of the target neural network layer in the target feature extraction network is more front than that of the third neural network layer in the target feature extraction network, or the target neural network layer and the third neural network layer are the same neural network layer. The training device trains the first neural network by using the target training data set until a preset condition is met, which may include: the training equipment keeps the parameters of at least one second neural network layer in the target feature extraction network unchanged, the first neural network is trained by using the target training data set until the preset conditions are met, and a second neural network is obtained, wherein the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer.
The training device provided in the second aspect of the embodiment of the present application may further perform steps performed by the training device in each possible implementation manner of the first aspect, and for specific implementation steps of the first aspect and each possible implementation manner of the first aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the first aspect, and details are not repeated here.
In a third aspect, an embodiment of the present application provides a sample processing method, which can be used in the field of incremental learning in the field of artificial intelligence. The method can comprise the following steps: the method comprises the steps that executing equipment obtains a second neural network, the second neural network is obtained by utilizing a target training data set to train a first neural network, a target feature extraction network in the first neural network carries out training operation based on a historical training data set, N types of first training samples are arranged in the historical training data set, the target training data set comprises N types of second training samples and at least one type of newly added training sample, the N types of second training samples are derived from the N types of first training samples, and N is a positive integer. The execution equipment processes the input sample through the second neural network to obtain a processing result output by the second neural network; the target feature extraction network comprises a plurality of first neural network layers, and in the training process of the first neural network, the parameters of at least one second neural network layer in the first neural network are unchanged; the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network before the target neural network layer, the target neural network layer is determined according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer in the target feature extraction network, and the first accuracy rate is the accuracy rate obtained by performing classification operation on the input samples based on feature information generated by the first neural network layer.
In a possible implementation manner of the third aspect, the historical training data set includes a target training data subset, the target training data subset includes a first training sample of target classes in N classes, and the target class is any one of the N classes; the distance between the target feature information of the second training sample of each target class and the class center of the first training sample of each target class is smaller than a target threshold value, the class center of the first training sample of each target class is obtained based on the target feature information of the first training sample of each target class, and the target feature information is obtained through at least one second neural network layer in the target feature extraction network.
For specific implementation steps of the third aspect and various possible implementation manners of the third aspect in the embodiment of the present application, and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in various possible implementation manners in the first aspect, and details are not described here any more.
In a fourth aspect, an embodiment of the present application provides a training apparatus for a neural network, which may be used in the field of incremental learning in the field of artificial intelligence. The device comprises: the acquisition module is used for acquiring a target feature extraction network, the target feature extraction network is obtained by training through a historical training data set, N types of first training samples are arranged in the historical training data set, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers; the determining module is used for determining a target neural network layer from a target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer, wherein the first accuracy rate is obtained by performing classification operation on a first training sample based on feature information generated by the first neural network layer; the training module is used for keeping parameters of at least one second neural network layer in the target feature extraction network unchanged, training the first neural network by using a target training data set until a preset condition is met, and obtaining a second neural network, wherein the first neural network comprises a target feature extraction network; the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer, the target training data set comprises N types of second training samples and at least one type of newly added training samples, and the N types of second training samples are derived from the N types of first training samples.
The training apparatus for a neural network provided in the fourth aspect of the embodiment of the present application may further perform steps performed by the training device in each possible implementation manner of the first aspect, and for specific implementation steps of the fourth aspect and each possible implementation manner of the fourth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the first aspect, and details are not repeated here.
In a fifth aspect, an embodiment of the present application provides a training apparatus for a neural network, which may be used in the field of incremental learning in the field of artificial intelligence. The device comprises: the acquisition module is used for acquiring a target feature extraction network, the target feature extraction network is obtained after training is carried out by adopting a historical training data set, N types of first training samples are arranged in the historical training data set, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers; the determining module is used for determining a third neural network layer from the target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer, wherein the first accuracy rate is obtained by performing classification operation on the first training sample based on feature information generated by the first neural network layer; the acquisition module is further used for acquiring target feature information of the first training sample of a target category, wherein the target category is any one of N categories, and the target feature information is generated for a third neural network layer in the target feature extraction network; the determining module is further used for determining the category center of the first training sample of the target category according to the target feature information of the first training sample of the target category; the acquisition module is further used for acquiring at least one second training sample of the target class from the target training data subset according to the class center of the first training sample of the target class, wherein the distance between the target characteristic information of the second training sample of each target class and the class center of the target class is smaller 
than a target threshold value; the training module is used for training the first neural network by using a target training data set until a preset condition is met to obtain a second neural network, the target training data set comprises N types of selected second training samples and at least one type of newly added training samples, and the first neural network comprises a target feature extraction network.
The training apparatus for a neural network provided in the fifth aspect of the embodiment of the present application may further perform steps performed by the training device in each possible implementation manner of the second aspect, and for specific implementation steps of the fifth aspect and each possible implementation manner of the fifth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner in the second aspect, and details are not repeated here.
In a sixth aspect, an embodiment of the present application provides a sample processing apparatus, which may be used in the field of incremental learning in the field of artificial intelligence. The apparatus comprises: an acquisition module, configured to acquire a second neural network, where the second neural network is obtained by training a first neural network with a target training data set, a target feature extraction network in the first neural network has been trained based on a historical training data set, the historical training data set has N classes of first training samples, the target training data set comprises N classes of second training samples and at least one class of newly added training samples, the N classes of second training samples are derived from the N classes of first training samples, and N is a positive integer; and a processing module, configured to process an input sample through the second neural network to obtain a processing result output by the second neural network. The target feature extraction network comprises a plurality of first neural network layers, and during the training of the first neural network, the parameters of at least one second neural network layer in the first neural network remain unchanged. The at least one second neural network layer comprises a neural network layer that is located in the target feature extraction network before a target neural network layer, the target neural network layer is determined according to at least one first accuracy in one-to-one correspondence with at least one first neural network layer in the target feature extraction network, and the first accuracy is obtained by performing a classification operation on the input samples based on feature information generated by the first neural network layer.
For the specific implementation steps of the sixth aspect and of each of its possible implementations in the embodiments of the present application, and for the beneficial effects brought by each possible implementation, reference may be made to the descriptions of the possible implementations of the third aspect; details are not repeated here.
In a seventh aspect, an embodiment of the present application provides a training device, which may include a processor coupled to a memory, the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the training method for a neural network according to the first aspect or the second aspect is implemented.
In an eighth aspect, an embodiment of the present application provides an execution device, which may include a processor coupled to a memory, the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the sample processing method according to the third aspect is implemented.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the program runs on a computer, it causes the computer to execute the training method for a neural network according to the first aspect or the second aspect, or causes the computer to execute the sample processing method according to the third aspect.
In a tenth aspect, the present embodiments provide a circuit system, where the circuit system includes a processing circuit, and the processing circuit is configured to execute the method for training a neural network according to the first aspect, or the processing circuit is configured to execute the method for training a neural network according to the second aspect, or the processing circuit is configured to execute the method for processing a sample according to the third aspect.
In an eleventh aspect, the present application provides a computer program, which when run on a computer, causes the computer to execute the method for training a neural network according to the first aspect, or causes the computer to execute the method for training a neural network according to the second aspect, or causes the computer to execute the method for processing a sample according to the third aspect.
In a twelfth aspect, embodiments of the present application provide a chip system, which includes a processor, configured to implement the functions recited in the above aspects, for example, to transmit or process data and/or information recited in the above methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the server or the communication device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework provided in an embodiment of the present application;
FIG. 2 is a system architecture diagram of a sample processing system provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a training method for a neural network provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of obtaining training samples of a newly added category in the training method for a neural network provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a training sample of a newly added category in the training method for a neural network provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a third neural network layer and a target neural network layer in the training method for a neural network provided in an embodiment of the present application;
FIG. 7 is a schematic flowchart of determining a third neural network layer in the training method for a neural network provided in an embodiment of the present application;
FIG. 8 is a schematic comparison diagram of a first neural network and a third neural network in the training method for a neural network provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of at least one second neural network layer in the training method for a neural network provided in an embodiment of the present application;
FIG. 10 is a schematic flowchart of a sample processing method provided in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a training apparatus for a neural network provided in an embodiment of the present application;
FIG. 12 is another schematic structural diagram of a training apparatus for a neural network provided in an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a sample processing apparatus provided in an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an execution device provided in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a training device provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a chip provided in an embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenes, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The general workflow of an artificial intelligence system is described first. Referring to FIG. 1, which shows a schematic structural diagram of an artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition onward, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation from "data" to "information" to "knowledge" to "wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing information) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, implements communication with the outside world, and is supported by the base platform. Communication with the outside is carried out through sensors. Computing power is provided by intelligent chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), an embedded neural network processor (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The base platform includes related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference refers to the process of simulating the intelligent reasoning of humans in a computer or intelligent system: the machine uses formalized information to think about and solve problems according to an inference control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and generally provides functions such as classification, ranking, and prediction.
(4) General purpose capabilities
After the data processing described above, some general capabilities may further be formed based on the results of the data processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include: intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, automated driving, smart cities, and so on.
The embodiments of the present application may be applied to various application scenarios in which there is a need to perform incremental training on a model. The model may be embodied as a neural network, or as a model in a non-neural-network form; it should be understood that the embodiments of the present application are described by taking a neural network as an example. The samples processed by the model may specifically be images, speech, text, and so on. For example, in the field of automated driving, a first neural network for classifying images is configured in an automated vehicle. During inference with the first neural network, the vehicle continuously acquires images of its surroundings, and objects of a new category may appear in those surroundings. To enable the first neural network to classify the new category, the first neural network needs to be incrementally trained with a target training data set, where the target training data set includes N classes of historical training images and training images of the new category.
As another example, in the field of intelligent terminals, a client of a search system may be configured on an intelligent terminal, and a server of the search system is configured with a first neural network. A user may input description information in text form through the client; the server obtains, through the first neural network, the description information of a plurality of products matching the input, and presents it to the user through the client. To enable the first neural network to also process the description information of a newly added category of products, the first neural network needs to be incrementally trained with a target training data set, where the target training data set includes the description information of N categories of historical products and the description information of the newly added category of products.
As another example, in the field of smart cities, a monitoring system deployed in a city may collect a target face image. Feature extraction is performed on the target face image through a first neural network to obtain feature information of the target face image, and the feature information generated by the first neural network is matched against the feature information of a plurality of faces in a database, so as to obtain from the database a clear face image corresponding to the target face image. Because face images of new categories may appear during use of the first neural network, to enable the first neural network to accurately extract the feature information of face images of all categories, the first neural network needs to be incrementally trained with a target training data set, where the target training data set includes N categories of historical face images, face images of the newly added categories, and the like.
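The matching step in the smart-city example above can be sketched in a few lines. The function name, the use of cosine similarity, and the nearest-neighbor rule are illustrative assumptions for the sketch; the application does not specify the matching method.

```python
import numpy as np

def match_face(query_feat, db_feats):
    """Return the index of the database face whose stored feature
    information is most similar, by cosine similarity, to the query
    feature vector produced by the first neural network."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                  # cosine similarity to each database entry
    return int(np.argmax(sims))    # best-matching database index
```

The returned index identifies the clear face image to retrieve from the database.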
As can be seen from the above examples, there is a need for incremental learning of the first neural network in various application fields of artificial intelligence; that is, the training method for a neural network provided in the embodiments of the present application can be applied in various application fields of artificial intelligence. It should be noted that the above examples are only intended to facilitate understanding of the application scenarios of the present solution, and are not an exhaustive list of the application scenarios of the embodiments of the present application.
Before describing the training method of the neural network provided in the embodiment of the present application in detail, a sample processing system provided in the embodiment of the present application is described with reference to fig. 2. Referring to fig. 2, fig. 2 is a system architecture diagram of a sample processing system according to an embodiment of the present disclosure, in fig. 2, the sample processing system 200 includes an execution device 210, a training device 220, a database 230, and a data storage system 240, and the execution device 210 includes a calculation module 211.
The historical training data set is stored in the database 230, the training device 220 generates the target model/rule 201, and performs iterative training on the target model/rule 201 by using the historical training data set in the database 230 to obtain a mature target model/rule 201. Further, the historical training data set comprises N types of first training samples; the target model/rule 201 may include a target feature extraction network having a plurality of first neural network layers, each first neural network layer for generating feature information of a first training sample.
The trained target model/rule 201 obtained by the training device 220 may be applied to different systems or devices, such as a mobile phone, a tablet, a notebook, a Virtual Reality (VR) device, a monitoring system, a data processing system of a radar, and so on. The execution device 210 may call data, code, etc. in the data storage system 240, or store data, instructions, etc. in the data storage system 240. The data storage system 240 may be located in the execution device 210 or the data storage system 240 may be external to the execution device 210. The calculation module 211 may process the input sample through the trained target model/rule 201 to obtain a processing result corresponding to the input sample.
During inference with the target model/rule 201, the training device 220 may obtain training samples of at least one newly added category, after which the training device 220 needs to perform incremental training on the target model/rule 201 with a target training data set, where the target training data set includes N classes of second training samples and at least one class of training samples of the newly added category. Specifically, the training device 220 determines a target neural network layer from the target feature extraction network according to at least one first accuracy in one-to-one correspondence with at least one first neural network layer, where the first accuracy is obtained by performing a classification operation on the first training samples based on the feature information generated by the first neural network layer. The training device 220 then keeps the parameters of at least one second neural network layer in the target model/rule 201 unchanged and trains the target model/rule 201 with the target training data set until a preset condition is met, obtaining a target model/rule 201 on which the incremental training operation has been performed; the at least one second neural network layer includes the neural network layers that are located in the target feature extraction network before the target neural network layer.
In the embodiment of the present application, keeping the parameters of the at least one second neural network layer in the target model/rule 201 unchanged during training helps shorten the total training time of the target model/rule 201. In addition, since the first accuracy is obtained by performing a classification operation on the first training samples based on the feature information generated by the first neural network layer, the feature information generated by the target neural network layer can already distinguish the N classes of training samples well; freezing the layers before it therefore preserves, as far as possible, what the target feature extraction network has learned about the N classes of training samples, which improves the accuracy of the trained neural network.
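The layer-selection logic described above can be sketched compactly. The rule used here, choosing the earliest layer whose per-layer probe accuracy comes within a tolerance of the best accuracy and freezing all layers before it, is an illustrative assumption; the application only states that the target neural network layer is determined from the first accuracies.

```python
def choose_frozen_layers(layer_accuracies, tolerance=0.01):
    """Given layer_accuracies[i], the first accuracy obtained by
    classifying the historical training samples from the feature
    information of layer i, pick a target layer and list the layers
    before it whose parameters stay unchanged during incremental
    training."""
    best = max(layer_accuracies)
    # earliest layer that already separates the N old classes almost
    # as well as the best-performing layer does
    target = next(i for i, acc in enumerate(layer_accuracies)
                  if acc >= best - tolerance)
    frozen = list(range(target))  # layers before the target layer
    return target, frozen
```

A looser tolerance freezes fewer layers; a tighter one pushes the target layer deeper and freezes more of the network, trading plasticity for training speed.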
In some embodiments of the present application, referring to FIG. 2, a "user" may interact directly with the execution device 210; that is, the execution device 210 may directly display the processing result output by the target model/rule 201 to the "user". It should be noted that FIG. 2 is only an architecture diagram of a sample processing system provided in an embodiment of the present application, and the positional relationships among the devices, modules, and the like shown in the figure do not constitute any limitation. For example, in other embodiments of the present application, the execution device 210 and a client device may be separate devices; the execution device 210 is configured with an input/output (I/O) interface and exchanges data with the client device through the I/O interface.
In view of the above description, a specific implementation flow of the training phase and the inference phase of the training method for a neural network provided in the embodiment of the present application is described below.
1. Training phase
In the embodiment of the present application, a training phase describes a process of how the training device 220 performs training by using a target training data set in the database 230, specifically, please refer to fig. 3, where fig. 3 is a flowchart of a training method of a neural network provided in the embodiment of the present application, and the training method of the neural network provided in the embodiment of the present application may include:
301. the training equipment acquires a target feature extraction network, wherein the target feature extraction network is obtained after training is carried out by adopting a historical training data set, and the historical training data set is provided with N types of first training samples.
In the embodiment of the present application, the training device acquires a target feature extraction network. The target feature extraction network may be included in a third neural network, which may further include a second feature processing network. The third neural network including the target feature extraction network is obtained through training with a historical training data set; the historical training data set has N classes of first training samples, where N is a positive integer. The target feature extraction network comprises a plurality of third neural network layers; in other words, each third neural network layer in the target feature extraction network is used to generate feature information of an input sample.
Further, the third neural network may be a Deep Neural Network (DNN), and the function of the third neural network may be any one of the following: image classification, feature extraction of images, regression operations from sequence data in text form, speech recognition, text translation or other tasks, etc. Correspondingly, the first training sample may be embodied as an image, a voice, a text, or the like.
Specifically, in one implementation, the training device obtains the third neural network sent by another electronic device; that is, the training device may directly obtain a neural network that has already been trained with the historical training data set.
In another implementation, the training device performs iterative training on the third neural network with the historical training data set according to a third loss function until a preset condition is met, obtaining the trained third neural network. The preset condition may be that a convergence condition of the third loss function is satisfied, or that the number of iterations of training the third neural network reaches a preset number. The historical training data set includes the N classes of first training samples and the correct result corresponding to each first training sample; the correct result corresponding to each first training sample needs to be determined in combination with the specific function of the third neural network.
As an example, if the function of the third neural network is image classification, the historical training data set may be represented as D = {X, Y}, where X = {x} represents the N classes of first training samples, and Y = {y} includes the correct label corresponding to each of the N classes of first training samples. Further, a correct label in the historical training data set may be represented as a vector after one-hot encoding. For example, if the classes to which the N classes of first training samples belong are {0, 1, 2, 3, …, N-1}, and the correct class of any training sample (hereinafter referred to as a "target training sample" for convenience of description) among the N classes of first training samples is 3, the correct label corresponding to the target training sample may be denoted as [0, 0, 0, 1, 0, …, 0]. It should be understood that this example is only intended to facilitate understanding of the present solution and is not intended to limit the present solution.
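The one-hot encoding in this example can be written directly (a minimal sketch; the function name is illustrative):

```python
def one_hot(label, num_classes):
    """One-hot encode a correct class label: with classes
    {0, 1, ..., N-1}, class 3 maps to a vector whose component at
    index 3 is 1 and all other components are 0."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec
```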
For the training device, one training pass of the third neural network according to the third loss function proceeds as follows. If the function of the third neural network is not feature extraction on images, the training device may input a first training sample into the third neural network to obtain a prediction processing result corresponding to the first training sample output by the third neural network. The training device then generates a function value of the third loss function according to the prediction processing result and the correct result corresponding to the first training sample, and reverse-updates the parameters of the third neural network according to the function value of the third loss function, thereby completing one training pass of the third neural network. The third loss function may be configured to indicate the similarity between the prediction processing result corresponding to the first training sample and the corresponding correct result; the third loss function may specifically be a cross-entropy loss function, an L1 loss function, or another type of loss function. It should be noted that the meaning and the type of the third loss function are both determined in combination with the function of the third neural network, and are not limited here.
To further understand the present solution, consider an example in which the function of the third neural network is to classify images. Define the third neural network as F and its parameters as θ. The first training sample is x, and y′ = F(θ, x) represents the prediction result for the first training sample output by the third neural network. An example of the third loss function is as follows:
L1 = -∑_{x∈X} y log y′;  (1)
where L1 represents the third loss function, y represents the correct result corresponding to the first training sample, and -∑_{x∈X} y log y′ measures the similarity between the prediction result for the first training sample and the corresponding correct result. The training goal of the training device is to reduce the value of L1. It should be understood that the example in equation (1) is only intended to facilitate understanding of the present solution and is not intended to limit the present solution.
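For a single sample, equation (1) reduces to the cross-entropy between the one-hot correct label y and the predicted distribution y′, which can be computed as follows (a sketch; the function name is illustrative):

```python
import math

def third_loss(y_true, y_pred):
    """Per-sample term of the third loss function in equation (1):
    -sum_k y_k * log(y'_k).  With a one-hot y_true this is just the
    negative log-probability the model assigns to the correct class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)
```

The loss shrinks toward 0 as the predicted probability of the correct class approaches 1, which matches the training goal of reducing L1.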
If the function of the third neural network is to extract features of the image, that is, the whole third neural network is a target feature extraction network, in one training process of the third neural network, the training device may input the first training sample to the third neural network to obtain feature information of the first training sample generated by the third neural network; the training device performs a classification operation through the classifier according to the feature information of the first training sample to generate a prediction class of the first training sample. The training device may generate a function value of a third loss function according to the prediction class of the first training sample and the correct result of the first training sample (i.e., the correct class to which the first training sample corresponds), the function value of the third loss function indicating a similarity between the prediction class of the first training sample and the correct result of the first training sample. The training device may update the parameters of the third neural network inversely according to the function value of the third loss function to complete one training of the third neural network.
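The training pass just described, feature extraction followed by a classifier and a reverse update of the parameters from the loss value, can be sketched with a single linear layer standing in for each network. The linear stand-ins and plain gradient descent are illustrative assumptions; the real networks are deep and the optimizer is not specified by this application.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(x, y_onehot, W_feat, W_cls, lr=0.1):
    """One training pass: the stand-in feature extraction network W_feat
    generates feature information h, the stand-in classifier W_cls
    produces a prediction class distribution, and both parameter sets
    are reverse-updated by gradient descent on the loss value."""
    h = W_feat @ x                              # feature information
    y_pred = softmax(W_cls @ h)                 # prediction class distribution
    loss = -np.sum(y_onehot * np.log(y_pred + 1e-12))
    d_logits = y_pred - y_onehot                # gradient w.r.t. logits
    d_Wcls = np.outer(d_logits, h)
    d_h = W_cls.T @ d_logits
    d_Wfeat = np.outer(d_h, x)
    return loss, W_feat - lr * d_Wfeat, W_cls - lr * d_Wcls
```

Repeating the step on the same sample should drive the loss value down, mirroring the training goal stated for equation (1).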
It should be noted that, in the embodiment of the present application, the description of the training process of the third neural network is only to prove the implementability of the present solution, and the specific training process of the neural network may be determined by combining factors such as the function of the neural network, and the embodiment of the present application is not limited in this embodiment.
302. The training device obtains training samples of at least one newly added category.
In the embodiment of the present application, the training device may also obtain training samples of at least one newly added category. The specific forms of the training samples of the newly added category have already been exemplified above for the three cases in which the function of the third neural network is image classification, feature extraction on images, or regression on sequence data in text form; details are not repeated here.
For example, if the classes of the N classes of first training samples include Mandarin and the dialects of various regions of China, and the correct result corresponding to a first training sample is a speech recognition result, the training samples of the newly added category may be speech in a further dialect. As another example, if the classes of the N classes of first training samples include English, Japanese, French, and Spanish, and the correct result (i.e., the translation result) corresponding to a first training sample is Chinese, the classes of the training samples of the newly added category may include Korean, Russian, and the like.
Specifically, the training device may obtain a plurality of newly added samples. For any one of the newly added samples (hereinafter referred to as a "target newly added sample" for convenience of description), the training device judges whether the target newly added sample belongs to a newly added category. If the judgment result is negative, the training device proceeds to judge the next newly added sample; if the judgment result is affirmative, the training device determines the target newly added sample to be a training sample of a newly added category and adds it to a first training data subset, where the first training data subset belongs to the target training data set and further includes the correct result corresponding to each training sample of the newly added category.
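The per-sample judgment loop above can be sketched as follows. The function and variable names are illustrative assumptions, and since the application does not specify how the judgment itself is implemented, a simple label lookup against the N known classes stands in for it here.

```python
def build_first_subset(new_samples, known_classes):
    """Judge each newly acquired (sample, label) pair: pairs whose
    label is already among the N known classes are skipped, and the
    rest are added to the first training data subset together with
    their correct results (here, their labels)."""
    first_subset = []
    for sample, label in new_samples:
        if label in known_classes:
            continue  # not a newly added category: judge the next sample
        first_subset.append((sample, label))
    return first_subset
```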
Regarding the manner in which the training device obtains the plurality of new samples: in one implementation, while one or more execution devices process input samples through the third neural network, each execution device may collect new samples and send them to the training device, and correspondingly the training device receives the plurality of new samples. In another implementation, the training device may receive the plurality of new samples from electronic devices other than the execution devices; for example, if the third neural network is deployed on one or more clients (i.e., the execution devices) of a search system, each client may send new samples to a server of the search system (i.e., an electronic device other than the execution devices), and the server then sends the plurality of new samples to the training device.
Further, the training device may receive the plurality of new samples in real time, or may perform the obtaining operation at intervals of a first duration; for example, the first duration may take a value of 5 minutes, 8 minutes, 10 minutes, 30 minutes, or another value, and may be set flexibly in combination with the actual application scenario.
Furthermore, the training device may judge in real time whether the target new sample is a training sample of a newly added category; alternatively, at intervals of a second duration, the training device may perform the judgment operation on the one or more new samples acquired within that second duration. The second duration may be the same as or different from the first duration; for example, it may take a value of 5 minutes, 8 minutes, 10 minutes, 30 minutes, or another value, which may be set in combination with the specific application scenario and is not limited here.
Regarding how the training device judges whether the target new sample (i.e., any new sample) is a training sample of a newly added category: in one implementation, if the function of the third neural network is to perform a classification operation, the training device may input the target new sample into the third neural network so as to generate, through the third neural network, prediction classification information corresponding to the target new sample. The prediction classification information may include N probability values in one-to-one correspondence with the N classes, where a first probability value (i.e., any one of the N probability values) indicates the probability that the class of the target new sample is the first class, the first class being the class corresponding to that first probability value.
The training device determines the maximum of the N probability values corresponding to the target new sample as a second probability value and judges whether the second probability value is greater than or equal to a first threshold. If the second probability value is greater than or equal to the first threshold, the training device determines that the target new sample is not a training sample of a newly added category; if the second probability value is less than the first threshold, it determines that the target new sample is a training sample of the newly added category. The first threshold may be greater than 0.5 and less than 1; for example, it may take a value of 0.6, 0.7, 0.8, 0.85, 0.9, or another value, which is not limited here.
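As a minimal sketch of the probability-threshold judgment just described (the function name and the stand-alone probability list are assumptions; in practice the N probability values would come from a forward pass of the third neural network):

```python
def is_new_category(probs, first_threshold=0.8):
    """Treat the sample as a training sample of a newly added category
    when the largest of the N probability values (the second probability
    value) falls below the first threshold."""
    second_probability = max(probs)
    return second_probability < first_threshold

# A confident prediction (max 0.9 >= 0.8) stays in the known categories,
# while an uncertain one (max 0.6 < 0.8) is flagged as a new category.
assert is_new_category([0.01, 0.01, 0.08, 0.9]) is False
assert is_new_category([0.3, 0.02, 0.08, 0.6]) is True
```

The assumed default of 0.8 corresponds to the example value used in fig. 4 below; any first threshold between 0.5 and 1 may be substituted.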
For a more intuitive understanding of the present solution, please refer to fig. 4, which is a schematic diagram of obtaining training samples of a newly added category in the training method of the neural network according to the embodiment of the present application. Fig. 4 includes two sub-diagrams (a) and (b), both of which take image classification as the function of the third neural network, 4 as the value of N, and 0.8 as the value of the first threshold. Referring to sub-diagram (a) of fig. 4: A1, the training device inputs the target new sample into the third neural network to generate the corresponding prediction classification information, which includes the four probability values 0.01, 0.01, 0.08, and 0.9. A2, the training device determines from the prediction classification information that the second probability value (i.e., the maximum of the 4 probability values) is 0.9. A3, the training device judges whether 0.9 is greater than or equal to 0.8. A4, since 0.9 is greater than 0.8, the training device determines that the target new sample is not a training sample of a newly added category.
Referring to sub-diagram (b) of fig. 4: B1, the training device inputs the target new sample into the third neural network to generate the corresponding prediction classification information, which includes the four probability values 0.3, 0.02, 0.08, and 0.6. B2, the training device determines from the prediction classification information that the second probability value (i.e., the maximum of the 4 probability values) is 0.6. B3, the training device judges whether 0.6 is greater than or equal to 0.8. B4, since 0.6 is less than 0.8, the training device determines that the target new sample is a training sample of the newly added category. It should be understood that the example in fig. 4 is only for convenience of understanding and is not intended to limit the present solution.
In another implementation, the training device may cluster each of the N classes of first training samples to obtain N first class centers in one-to-one correspondence with the N classes. The training device calculates a first distance between the target new sample (i.e., any new sample) and each of the N first class centers, thereby generating N first distances; it determines the minimum of the N first distances as a second distance and judges whether the second distance is less than or equal to a second threshold. If the second distance is less than or equal to the second threshold, the training device determines that the target new sample is not a training sample of a newly added category; if the second distance is greater than the second threshold, it determines the target new sample as a training sample of the newly added category.
The first distance between the target new sample and one first class center may specifically be a cosine distance, a Euclidean distance, a Mahalanobis distance, a Hamming distance, or another type of distance, which is not limited here.
For a more intuitive understanding of the present solution, please refer to fig. 5, which is a schematic diagram of obtaining a training sample of a newly added category in the training method of a neural network according to an embodiment of the present application. In fig. 5, the first training samples are embodied as images and the second threshold takes a value of 0.2: the "square" represents the class center corresponding to the first training samples of the class cat, the "triangle" that of the class dog, the "pentagon" that of the class sofa, the "circle" that of the class tiger, and the "heptagon" represents the target new sample. As shown in fig. 5, the first distances from the "heptagon" to the "square", "triangle", "pentagon", and "circle" are 0.3, 0.5, 0.8, and 0.35, respectively. Since 0.3, 0.5, 0.8, and 0.35 are all greater than 0.2, the target new sample belongs to a newly added category. It should be understood that the example in fig. 5 is only for convenience of understanding and is not intended to limit the present solution.
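The distance-based judgment above can be sketched as follows (a minimal illustration; the names and the choice of Euclidean distance are assumptions, and any of the distance types mentioned could be substituted):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_new_category_by_distance(sample, class_centers, second_threshold=0.2):
    """The N first distances run from the target new sample to each first
    class center; their minimum is the second distance, and a value above
    the second threshold marks a training sample of a newly added category."""
    first_distances = [euclidean(sample, c) for c in class_centers]
    second_distance = min(first_distances)
    return second_distance > second_threshold

centers = [[0.0, 0.0], [1.0, 0.0]]   # stand-ins for first class centers
assert is_new_category_by_distance([0.05, 0.0], centers) is False  # near a center
assert is_new_category_by_distance([0.5, 0.5], centers) is True    # far from all
```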
In another implementation, the training device may cluster each of the N classes of first training samples to obtain N first class centers in one-to-one correspondence with the N classes. The training device calculates a first similarity between the target new sample (i.e., any new sample) and each of the N first class centers, thereby generating N first similarities; it determines the maximum of the N first similarities as a second similarity and judges whether the second similarity is greater than or equal to a third threshold. If the second similarity is greater than or equal to the third threshold, the training device determines that the target new sample is not a training sample of a newly added category; if the second similarity is less than the third threshold, it determines the target new sample as a training sample of the newly added category. The first similarity may specifically be a cosine similarity or another type of similarity, which is not exhausted here.
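The similarity-based variant differs only in the direction of the comparison. A sketch under the same assumptions, using cosine similarity (the names and default threshold are illustrative):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def is_new_category_by_similarity(sample, class_centers, third_threshold=0.9):
    """The maximum of the N first similarities is the second similarity;
    a value below the third threshold marks a newly added category."""
    second_similarity = max(cosine_similarity(sample, c) for c in class_centers)
    return second_similarity < third_threshold

centers = [[1.0, 0.0], [0.0, 1.0]]
assert is_new_category_by_similarity([0.9, 0.05], centers) is False  # close to a center
assert is_new_category_by_similarity([1.0, 1.0], centers) is True    # equidistant, sim ~0.71
```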
It should be noted that the training device may also judge in other manners whether the target new sample (i.e., any new sample) is a training sample of a newly added category. As an example, the training device may generate, from the feature information of the N classes of first training samples, N second class centers in one-to-one correspondence with the N classes, calculate the distance/similarity between the feature information of the target new sample and each of the N second class centers, and judge accordingly whether the target new sample is a training sample of a newly added category; the possibilities are not exhausted here.
It should be noted that, in the embodiment of the present application, the number of times of executing steps 301 and 302 is not limited, and step 302 may be executed multiple times after step 301 is executed once.
303. The training equipment determines a target neural network layer from the target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with the at least one first neural network layer.
In some embodiments of the present application, the training device may obtain at least one first accuracy in one-to-one correspondence with the at least one first neural network layer, and then determine one target neural network layer from the M first neural network layers included in the target feature extraction network. The first accuracy corresponding to a first neural network layer is the accuracy of the prediction classification result obtained by performing a classification operation on the first training samples based on the feature information generated by that first neural network layer; the higher the first accuracy, the higher the probability that the corresponding first neural network layer is selected.
Specifically, regarding the process in which the training device obtains the at least one first accuracy in one-to-one correspondence with the at least one first neural network layer: the training device may be preconfigured with at least one first classifier in one-to-one correspondence with the at least one first neural network layer, where each first classifier is a classifier on which a training operation has already been performed, i.e., a mature classifier.
For any classifier among the at least one first classifier (hereinafter referred to as the second classifier for ease of description), the training device inputs a first training sample into the target feature extraction network to obtain the feature information generated by the first neural network layer corresponding to the second classifier. The input of the second classifier is that feature information, and its output is the predicted class of the first training sample; the training device can therefore obtain the accuracy of the second classifier using a plurality of first training samples, and thus the accuracy of each of the at least one first classifier.
Regarding the process of determining the target neural network layer from the plurality of first neural network layers included in the target feature extraction network: in one implementation, the training device acquires at least two first accuracies in one-to-one correspondence with at least two first neural network layers and obtains, from the at least two first accuracies, a second accuracy whose value is the highest; it then acquires, from the at least two first neural network layers, the third neural network layer corresponding to the second accuracy and determines the target neural network layer according to the third neural network layer.
In the embodiment of the present application, because the feature information generated by the last first neural network layer in the target feature extraction network is not necessarily the best at distinguishing the N classes, the first accuracies corresponding to the at least two first neural network layers are compared, the third neural network layer corresponding to the highest accuracy is selected, and the target neural network layer is then determined according to the third neural network layer.
Further, regarding the meaning of the "at least two first neural network layers": in one case, the at least two first neural network layers may include all first neural network layers in the target feature extraction network, i.e., all M first neural network layers. In this case, a plurality of first accuracies in one-to-one correspondence with all first neural network layers is obtained, and the third neural network layer corresponding to the highest-valued second accuracy is acquired from all first neural network layers; that is, the feature information generated by the selected third neural network layer distinguishes the N classes best, which further improves the accuracy of the second neural network.
In another case, the training device may divide the M first neural network layers into m subsets in their front-to-back order in the target feature extraction network, where each of the m subsets includes at least one first neural network layer; the training device may randomly select one first neural network layer from each of the m subsets and determine all the selected first neural network layers as the "at least two first neural network layers". For example, if M is 12, the training device may take the first 3 first neural network layers as the first subset, the 4th to 6th as the second subset, the 7th to 9th as the third subset, and the 10th to 12th as the fourth subset; it may then randomly acquire one first neural network layer from each of the four subsets and determine the four acquired layers as the at least two first neural network layers.
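The subset-based sampling in this case can be sketched as follows (names are assumptions; a seeded generator is used only to make the example deterministic):

```python
import random

def sample_candidate_layers(M, m, rng=None):
    """Split the M first-neural-network-layer indices (0-based, front to
    back) into m contiguous subsets and draw one index from each subset."""
    rng = rng or random.Random(0)
    size = M // m   # assumes M divisible by m, as in the M=12, m=4 example
    subsets = [list(range(i * size, (i + 1) * size)) for i in range(m)]
    return [rng.choice(subset) for subset in subsets]

candidates = sample_candidate_layers(12, 4)
assert len(candidates) == 4
# one candidate from each subset: [0..2], [3..5], [6..8], [9..11]
assert all(i * 3 <= c < (i + 1) * 3 for i, c in enumerate(candidates))
```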
In another case, the at least two first neural network layers are randomly selected by the training device from the M first neural network layers; the training device may also select the at least two first neural network layers from the M first neural network layers included in the target feature extraction network in other manners, which are not exhausted here.
Regarding the process of obtaining the second accuracy from the at least two first accuracies: if exactly one second accuracy exists among the at least two first accuracies, the training device may directly obtain that second accuracy. If at least two second accuracies with the same value exist among the at least two first accuracies, the training device may randomly select one of them; alternatively, the training device may select one second accuracy from the at least two second accuracies based on a first preset rule. The first preset rule may be to take the second accuracy corresponding to the first neural network layer located further toward the front of the target feature extraction network, or that corresponding to the layer located further toward the back, or the like, which is not limited here.
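The selection of the third neural network layer, including the tie-break, can be sketched as follows (assumed names; the first preset rule here is taken to prefer the layer closer to the front):

```python
def select_third_layer(layer_indices, first_accuracies):
    """Return the index of the layer whose first accuracy is highest;
    when several layers tie for the highest value (several second
    accuracies), prefer the front-most layer."""
    best = max(first_accuracies)
    tied = [i for i, a in zip(layer_indices, first_accuracies) if a == best]
    return min(tied)

# layers 5 and 8 tie at 0.9; the front-most of the two (5) is chosen
assert select_third_layer([2, 5, 8, 11], [0.7, 0.9, 0.9, 0.8]) == 5
```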
Regarding the process in which the training device determines the target neural network layer from the third neural network layer: in one case, the training device may directly determine the third neural network layer as the target neural network layer.
In another case, the training device may judge whether the third neural network layer falls within a preset range, and if so, determine the neural network layer located h layers before the third neural network layer in the target feature extraction network as the target neural network layer. The preset range may specifically be the last H first neural network layers in the target feature extraction network; H and h are positive integers, and each may take a value of 1, 2, 3, or another value, which is not limited here.
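The preset-range check can be sketched as follows (assumed names and 0-based indexing; the preset range is taken as the last H layers, matching the fig. 6 example below):

```python
def determine_target_layer(third_layer_idx, M, H=2, h=2):
    """If the third layer falls within the last H of the M layers, step
    back h layers; otherwise the third layer itself is the target layer."""
    if third_layer_idx >= M - H:   # within the preset range
        return third_layer_idx - h
    return third_layer_idx

# second-to-last of 12 layers (index 10) is in range, so step back 2 layers
assert determine_target_layer(10, 12) == 8
# a layer in the middle of the network is used directly
assert determine_target_layer(5, 12) == 5
```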
For a more intuitive understanding of the present solution, please refer to fig. 6 and fig. 7. Referring first to fig. 6, fig. 6 is a schematic diagram of the third neural network layer and the target neural network layer in the training method of a neural network according to the embodiment of the present application. In fig. 6, both H and h take the value 2. As shown in fig. 6, the third neural network layer is the second-to-last neural network layer in the target feature extraction network and therefore falls within the preset range, so the neural network layer located 2 layers before the third neural network layer in the target feature extraction network is determined as the target neural network layer.
Referring again to fig. 7, fig. 7 is a schematic flowchart of determining the third neural network layer in the training method of the neural network according to the embodiment of the present application. As shown in fig. 7, the training device inputs a first training sample into the third neural network so that the target feature extraction network in the third neural network performs feature extraction on it. Fig. 7 takes as an example the case in which the at least two first neural network layers comprise 4 first neural network layers, namely neural network layers 1 through 4; 4 classifiers in one-to-one correspondence with these layers, namely classifiers 1 through 4, are preconfigured on the training device. The input of each classifier is the feature information of the first training sample generated by its corresponding neural network layer, and the output of each classifier is the predicted class of the first training sample; the accuracies of classifiers 1 through 4 can therefore be obtained based on a plurality of first training samples and the outputs of the four classifiers.
After sorting the accuracies of classifiers 1 through 4, the training device learns that classifier i has the highest accuracy among the 4 classifiers, and it then determines the neural network layer i corresponding to classifier i as the third neural network layer.
In another implementation, the training device randomly determines a fourth neural network layer from the M first neural network layers included in the target feature extraction network and obtains the first accuracy corresponding to that fourth neural network layer. The training device judges whether this first accuracy is greater than or equal to a fourth threshold; if so, it determines the target neural network layer according to the fourth neural network layer. The specific implementation is similar to that of determining the target neural network layer according to the third neural network layer, with the "third neural network layer" in the above steps replaced by the "fourth neural network layer"; refer to the description above, which is not repeated here. If the judgment result is negative, the training device continues by determining the next fourth neural network layer from the M first neural network layers. As an example, the fourth threshold may take a value of 70%, 80%, 85%, 87%, 89%, 92%, 94%, or another value, which is not exhausted here.
304. The training device acquires a target feature information set corresponding to the target training data subset, where the target feature information set includes the target feature information of the first training samples of a target class.
In some embodiments of the present application, because the historical training data set includes too many training samples, the training device may, before performing incremental training, further select N classes of second training samples from the N classes of first training samples included in the historical training data set; that is, a representative portion of each class of first training samples is selected as the second training samples used in the incremental training process.
The selection process is described for any one of the N classes of first training samples (for ease of description, referred to as the "target class" in this embodiment). The training device may obtain the target feature information set corresponding to the target training data subset so as to perform the aforementioned selection according to that set. The target training data subset includes a plurality of first training samples of the target class, and the target feature information set includes the target feature information of each of those first training samples. Further, the target feature information may take the form of a three-dimensional tensor, a two-dimensional matrix, a one-dimensional vector, or another form of data, which is not limited here.
Specifically, in one implementation, the target feature information is obtained through at least one second neural network layer in the target feature extraction network, where the at least one second neural network layer includes the target neural network layer and all the neural network layers located before the target neural network layer in the target feature extraction network.
That is, the training device inputs the first training sample of the target class into the target feature extraction network to perform the feature extraction operation through the target feature extraction network, and the training device determines the feature information generated by the target neural network layer as the target feature information of the first training sample of the target class. The training device performs the above operation on each of the plurality of first training samples of the target class to obtain target feature information of each of the plurality of first training samples of the target class, that is, to obtain a target feature information set corresponding to the target training data subset. For the understanding of the meaning of the target neural network layer, reference may be made to the above description, which is not repeated herein.
In another implementation, if the target neural network layer and the third neural network layer are different neural network layers in the target feature extraction network, the target feature information may instead be obtained through the third neural network layer, that is, through the third neural network layer and the neural network layers located before it in the target feature extraction network. Specifically, the training device inputs a first training sample of the target class into the target feature extraction network to perform the feature extraction operation, and determines the feature information generated by the third neural network layer as the target feature information of that first training sample.
In another implementation, the target feature information is feature information generated by a last first neural network layer in the target feature extraction network. That is, the training device inputs the first training sample of the target class into the target feature extraction network, and the training device determines the feature information generated by the whole target feature extraction network as the target feature information of the first training sample of the target class. The training device performs the above operation on each of a plurality of first training samples of the target class to obtain a target feature information set.
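Whichever layer supplies the target feature information, the extraction amounts to running the sample through the network only up to that layer. A minimal sketch (the layer-list representation and names are assumptions):

```python
def extract_target_features(layers, sample, stop_layer_idx):
    """Apply the feature-extraction layers in order and return the output
    of the layer at stop_layer_idx as the target feature information."""
    x = sample
    for layer in layers[: stop_layer_idx + 1]:
        x = layer(x)
    return x

# toy layer functions standing in for the real network
layers = [lambda v: [e * 2 for e in v],
          lambda v: [e + 1 for e in v],
          lambda v: [e * 10 for e in v]]
# stopping at layer index 1 skips the final layer
assert extract_target_features(layers, [1.0, 2.0], 1) == [3.0, 5.0]
```

Stopping at the last index reproduces the third implementation, in which the output of the whole feature extraction network is used.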
305. The training device determines a class center of a first training sample of the target class.
In the embodiment of the present application, in the process of selecting second training samples from the plurality of first training samples of the target class, the class center of the first training samples of the target class needs to be determined; this class center indicates the feature information of the first training samples of the target class as a whole.
Specifically, step 304 is optional. If step 304 is not executed, then in one implementation the training device may directly perform a clustering operation on the plurality of first training samples of the target class to obtain their class center. In another implementation, since a first training sample may be embodied as a matrix, the training device may directly calculate a first mean value of the plurality of first training samples of the target class and determine that first mean value as the class center of the first training samples of the target class.
If step 304 is executed, after the training device obtains the target feature information set corresponding to the target training data subset, in an implementation manner, the training device may calculate a second average value of the target feature information of all the first training samples of the target category according to the target feature information of each of the plurality of first training samples of the target category, and determine the second average value as a category center of the first training samples of the target category.
In another implementation manner, the training device may further directly perform a clustering operation according to the target feature information of each of the plurality of first training samples of the target class, to obtain a class center of the first training sample of the target class.
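The mean-based class center can be sketched as follows (a minimal illustration with flat feature vectors; names are assumptions):

```python
def class_center(feature_set):
    """Element-wise mean (the second mean value) of the target feature
    information of all first training samples of the target class."""
    n = len(feature_set)
    dim = len(feature_set[0])
    return [sum(f[i] for f in feature_set) / n for i in range(dim)]

# the center of two feature vectors is their midpoint
assert class_center([[0.0, 2.0], [2.0, 4.0]]) == [1.0, 3.0]
```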
306. The training device obtains at least one second training sample of the target class from the target training data subset according to the class center of the first training samples of the target class.
In some embodiments of the present application, after obtaining the class center of the first training samples of the target class, the training device may select at least one second training sample of the target class from the plurality of first training samples of the target class included in the target training data subset, where the distance between each selected second training sample and the class center of the target class is less than or equal to a target threshold.
Regarding the process of selecting at least one second training sample of the target class from the plurality of first training samples of the target class: specifically, in one case, when the class center of the target class is obtained based on the target feature information set (i.e., the target feature information of each of the plurality of first training samples of the target class), the training device calculates a third distance between the target feature information of a first training sample and the class center of the target class and treats that third distance as the distance between the sample and the class center. The third distance may be a cosine distance, a Euclidean distance, a Mahalanobis distance, a Manhattan distance, a Hamming distance, or another type of distance, which is not limited here.
More specifically, taking the class center of the target class as the origin and the target threshold as the radius, the training device obtains, from the target feature information of the first training samples of the target class, at least one piece of target feature information whose third distance to the class center is less than or equal to the target threshold, and determines the first training samples corresponding to the obtained target feature information as second training samples.
In another case, when the class center of the target class is obtained based on the plurality of first training samples of the target class themselves, the training device may directly calculate a fourth distance between a first training sample of the target class and the class center and treat that fourth distance as the distance between the sample and the class center; the fourth distance may likewise be a cosine distance, a Euclidean distance, a Mahalanobis distance, a Manhattan distance, a Hamming distance, or another type of distance, which is not limited here.
More specifically, the training device obtains at least one first training sample from each first training sample of the target class, wherein a fourth distance between the training sample and the class center of the target class is smaller than or equal to the target threshold value, and the obtained at least one first training sample is determined as a second training sample, with the class center of the target class as an origin and the target threshold value as a radius.
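The feature-space selection in the first case above can be sketched as follows. This is only an illustrative reading of the scheme, not the patent's implementation: the class center is taken as the mean of the target feature vectors, the third distance is taken as the Euclidean distance, and all names are hypothetical.

```python
import numpy as np

def select_second_samples(features, radius):
    """Select exemplars whose target feature vectors lie within `radius`
    (the target threshold) of the class center (mean feature vector)."""
    features = np.asarray(features, dtype=float)
    center = features.mean(axis=0)                      # class center of the target class
    dists = np.linalg.norm(features - center, axis=1)   # third distance (Euclidean here)
    keep = np.flatnonzero(dists <= radius)              # indices of the selected second samples
    return keep, center

# toy example: four 2-D feature vectors, one outlier
feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
idx, c = select_second_samples(feats, radius=2.0)       # outlier at index 3 is excluded
```

The sample-space case works the same way, with raw training samples in place of feature vectors.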
The following describes the meaning of the target threshold. In one implementation, the target thresholds corresponding to the different classes in the N classes have the same value. In another implementation, the more discrete the data distribution of the first training samples of the target class is, the larger the value of the target threshold is; or, the more discrete the data distribution of the target feature information of the plurality of first training samples of the target class is, the larger the value of the target threshold is. That is, the target thresholds corresponding to different classes in the N classes may be different.
In the embodiment of the application, the more discrete the data distribution of the target feature information of the first training samples of the target class is, the larger the value of the target threshold is; or, the more discrete the data distribution of the first training samples of the target class is, the larger the value of the target threshold is. This avoids selecting too few second training samples for a highly discrete target class, so that the numbers of second training samples of the different classes in the N classes remain as balanced as possible, and the second neural network can learn the features of each of the N classes of first training samples.
Specifically, in one case, the class center of the target class is obtained based on the target feature information set (that is, the target feature information of each of the plurality of first training samples of the target class). A second preset rule may be preconfigured on the training device; the training device obtains first data distribution information of the target feature information of the plurality of first training samples of the target class, and obtains, according to the second preset rule, the target threshold corresponding to the first data distribution information.
The first data distribution information of the target feature information of the plurality of first training samples may specifically be expressed as the variance of the target feature information of the plurality of first training samples; alternatively, the first data distribution information may also be the difference between first feature information in the target feature information of the plurality of first training samples and the class center of the target class, where the first feature information is, among the plurality of pieces of target feature information corresponding one-to-one to the plurality of first training samples of the target class, the one farthest from the class center of the target class.
The second preset rule may be specifically configured to indicate a one-to-one correspondence relationship between the plurality of data distribution information and the plurality of thresholds; alternatively, the second preset rule may be a generation formula for obtaining the target threshold based on the first data distribution information, and the like.
To further understand the present solution, an example of a formula for determining the value of the target threshold is disclosed as follows:

r = p · ‖u_c − ū_c‖;  (2)

where r represents the target threshold, p has a value ranging from 0 to 1, u_c represents the one piece of target feature information, among the plurality of pieces of target feature information corresponding one-to-one to the plurality of first training samples of the target class, that is farthest from the class center of the target class, and ū_c represents the class center of the target class. In formula (2), the class center of the target class is obtained based on the target feature information of the plurality of first training samples of the target class, and the first data distribution information is taken as the difference between the first feature information and the class center of the target class as an example; the value of p can be obtained from the value of ‖u_c − ū_c‖ according to the second preset rule, that is, the second preset rule includes a correspondence relationship between a plurality of data distribution information and a plurality of p values. It should be noted that the example in formula (2) is only one example for facilitating understanding of the present solution; for example, the second preset rule may also directly include a one-to-one correspondence relationship between a plurality of data distribution information and a plurality of thresholds, or the first data distribution information may also be the variance of the target feature information of the plurality of first training samples, and the like. The example here is not used to limit the present solution.
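A minimal sketch of an adaptive threshold of the form r = p · ‖u_c − ū_c‖, under the simplifying assumption that p is a fixed fraction (the scheme obtains p from the second preset rule; the fixed value here is only illustrative):

```python
import numpy as np

def target_threshold(features, p=0.5):
    """Adaptive target threshold: p times the distance from the class center
    to the farthest feature vector u_c, so more discrete classes get a
    larger radius (sketch of formula (2))."""
    features = np.asarray(features, dtype=float)
    center = features.mean(axis=0)                       # class center
    dists = np.linalg.norm(features - center, axis=1)
    return p * dists.max()                               # p * ||u_c - center||

feats = [[0.0, 0.0], [0.0, 4.0]]
r = target_threshold(feats, p=0.5)   # center = (0, 2), farthest distance = 2.0
```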
In another case, the class center of the target class is obtained based on the plurality of first training samples of the target class. A third preset rule may be preconfigured on the training device; the training device obtains second data distribution information of the plurality of first training samples of the target class, and obtains, according to the third preset rule, the target threshold corresponding to the second data distribution information.
The second data distribution information of the plurality of first training samples of the target class can be expressed as the variance of the plurality of first training samples; alternatively, the second data distribution information may also be a difference between a third training sample in the plurality of first training samples and the class center of the target class, where the third training sample is a training sample farthest from the class center of the target class in the plurality of first training samples of the target class.
The meaning of the "third preset rule" is similar to the meaning of the "second preset rule", and the difference is that the first data distribution information in the "second preset rule" is replaced by the second data distribution information in the present scheme, which is not described herein again.
307. The training equipment trains the first neural network by using a target training data set, wherein the target training data set comprises N types of second training samples and at least one type of newly added training sample, and the N types of second training samples are derived from the N types of first training samples.
In the embodiment of the application, after the training device acquires at least one new class of training samples, the training device may first acquire a first neural network, and perform iterative training on the first neural network according to a first loss function by using a target training data set (that is, perform incremental training on the first neural network by using the target training data set) until a preset condition is met, so as to obtain a second neural network, where the second neural network is the first neural network that has performed training operation.
The first neural network and the third neural network have the same function, that is, the first neural network may also be a deep neural network, and the function of the first neural network may be any one of the following: image classification, feature extraction of images, regression operations from sequence data in text form, speech recognition, text translation or other tasks, etc. In the embodiment of the application, various specific implementation functions of the first neural network are provided, the application scene of the scheme is expanded, and the implementation flexibility of the scheme is improved.
Further, the first neural network may include only the target feature extraction network; optionally, the first neural network may include both the target feature extraction network and a first feature processing network. That is, both the first neural network and the third neural network use the target feature extraction network to perform the feature extraction operation. Since the first neural network needs to process both the N classes of samples and the newly added class of samples, in some cases the first feature processing network and the second feature processing network may be the same neural network; as an example, this may hold when the function of the first neural network and the third neural network is to perform a regression operation based on text information.
In other cases, the neural network architectures of the first feature processing network and the second feature processing network are different. As an example, if the function of the first neural network and the third neural network is image classification, the numbers of output nodes of the first feature processing network and the second feature processing network are different.
For a more intuitive understanding of the present solution, please refer to fig. 8, in which fig. 8 is a schematic diagram illustrating a comparison between a first neural network and a third neural network in a training method of a neural network according to an embodiment of the present application. In fig. 8, it is exemplified that the functions of the first neural network and the third neural network are image classification, and the training samples of at least one new class include a class of training samples. As shown in fig. 8, the third neural network includes a target feature extraction network and a second feature processing network, the first neural network includes a target feature extraction network and a first feature processing network, and a comparison between the first feature processing network and the second feature processing network indicates that a branch corresponding to a newly added category is added to the first feature processing network, and it should be understood that the example in fig. 8 is only for convenience of understanding the present solution and is not used to limit the present solution.
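The comparison in fig. 8 can be sketched as extending a linear classification head with one extra output row for the newly added class, while the rows for the N old classes are copied unchanged. All names and weight shapes below are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def extend_classifier(W_old, b_old, feat_dim, n_new=1, rng=None):
    """Build the first feature processing network's head from the second one:
    copy the N old class rows and append n_new randomly initialized rows
    (the branch corresponding to the newly added class)."""
    rng = rng or np.random.default_rng(0)
    W_new = np.vstack([W_old, 0.01 * rng.standard_normal((n_new, feat_dim))])
    b_new = np.concatenate([b_old, np.zeros(n_new)])
    return W_new, b_new

# second feature processing network: N = 3 classes over 5-dim features
W2, b2 = np.zeros((3, 5)), np.zeros(3)
W1, b1 = extend_classifier(W2, b2, feat_dim=5)   # first network now scores 4 classes
```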
The target training data set includes the N classes of second training samples, at least one class of newly added training samples, and the correct result corresponding to each training sample in the target training data set; the N classes of second training samples are derived from the N classes of first training samples. The preset condition may be that a convergence condition of the first loss function is reached, or that the number of times the first neural network has been trained reaches a preset count threshold.
The following describes the concept of the N classes of second training samples included in the target training data set. Steps 304 to 306 are optional steps, and if the training device does not perform steps 304 to 306, in one case, the N classes of second training samples and the N classes of first training samples may be the same concept, that is, the target training data set includes all the first training samples in the historical training data set and at least one class of training samples of the newly added category.
In another case, for a first training sample of any one of the N classes (i.e., a "target class"), the training device may randomly select a preset number of training samples from the first training samples of the target class as a second training sample, and perform the foregoing operation on each of the N classes by the training device, thereby obtaining N classes of second training samples, and the training device may further update the historical training data set to the N classes of second training samples and at least one class of training samples of the newly added class.
If the training device performs steps 304 to 306, the concept of N types of second training samples may refer to the description in step 306, which is not repeated herein. The training device also updates the historical training data set to N classes of second training samples and at least one class of newly added training samples. To further understand the present solution, a target training data set is shown in a formula manner, where R represents the N types of second training samples obtained in steps 304 to 306, K represents at least one type of training sample of the newly added category, and the target training data set is { R, K }, and the training device further updates the historical training data set in step 301, that is, D = { R, K }.
The following describes the process in which the training device trains the first neural network by using the target training data set. In one case, the training device keeps the parameters of at least one second neural network layer in the first neural network unchanged, and trains the first neural network according to a first loss function using the target training data set.
Specifically, the training device obtains any one training sample (for convenience of description, hereinafter referred to as a "fourth training sample") from the target training data set and a correct result corresponding to the fourth training sample, where the fourth training sample may specifically be the second training sample or may also be a training sample of a newly added category.
And the training equipment inputs the fourth training sample into the first neural network to obtain a prediction result which is output by the first neural network and corresponds to the fourth training sample, and a function value of the first loss function is generated according to the prediction result which corresponds to the fourth training sample and a correct result which corresponds to the fourth training sample. The first loss function is used for indicating the similarity between the predicted result corresponding to the fourth training sample and the correct result corresponding to the fourth training sample; the first loss function may be specifically a cross-entropy loss function, an L1 loss function, an L2 loss function, a 0-1 loss function, or other types of loss functions, and the like, which is not limited herein.
The training equipment keeps the parameters of at least one second neural network layer in the first neural network unchanged, and reversely updates the parameters of other neural network layers in the first neural network according to the function value of the first loss function so as to finish one training of the first neural network. The training device repeatedly executes the steps on the first neural network by using different training samples in the target training data set so as to realize iterative training on the first neural network.
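The freeze-and-update step above can be sketched with a toy two-layer linear model, where the first weight matrix stands in for the at least one second neural network layer whose parameters stay fixed. The shapes and the squared-error loss are illustrative assumptions only:

```python
import numpy as np

def train_step(W_frozen, W_head, x, y, lr=0.1):
    """One training step: W_frozen (the second neural network layer) receives
    no update; only W_head gets a gradient step on 0.5 * (pred - y)^2."""
    h = W_frozen @ x                 # feature extraction through the frozen layer
    pred = W_head @ h
    err = pred - y                   # derivative of the squared-error loss
    grad_head = np.outer(err, h)     # gradient w.r.t. W_head only
    return W_head - lr * grad_head   # W_frozen is left untouched

W_frozen = np.eye(2)                 # frozen second neural network layer
W_head = np.zeros((1, 2))            # trainable remainder of the network
x, y = np.array([1.0, 0.0]), np.array([1.0])
W_head = train_step(W_frozen, W_head, x, y)
```

Iterative training repeats this step over the different training samples of the target training data set, exactly as described above, with `W_frozen` never entering the update.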
For a more intuitive understanding of the present disclosure, please refer to fig. 9, and fig. 9 is a schematic diagram of at least one second neural network layer in the neural network training method according to the embodiment of the present disclosure. In fig. 9, taking the function of the first neural network as an example of image classification, referring to fig. 9, if 4 first neural network layers located before the target neural network layer in the target feature extraction network of the first neural network are determined as 4 second neural network layers, in the process of updating the parameters of the first neural network, the parameters of the 4 second neural network layers are kept unchanged, and only the parameters of other neural network layers in the first neural network are updated, it should be understood that the example in fig. 9 is only for convenience of understanding the present solution, and is not used for limiting the present solution.
In the embodiment of the application, the target feature information of the first training samples, obtained through the target neural network layer (that is, through the at least one second neural network layer in the target feature extraction network), can distinguish the N classes well; that is, the target feature information of the first training samples of the target class can represent the target class well. Therefore, a plurality of second training samples representative of the target class can be selected by using that target feature information, which improves the quality of the training samples of the first neural network and thereby the accuracy of the second neural network.
In another case, after obtaining the function value of the first loss function, the training device may also reversely update the parameters of the entire first neural network according to the function value of the first loss function, so as to complete one training of the first neural network.
Optionally, the training device may further input the second training sample into the third neural network and the first neural network, respectively, to obtain a first prediction result corresponding to the second training sample output by the third neural network and a second prediction result corresponding to the second training sample output by the first neural network, and generate a function value of the second loss function according to the first prediction result corresponding to the second training sample and the second prediction result corresponding to the second training sample output by the first neural network. The second loss function is used for indicating the similarity between the first prediction result corresponding to the second training sample and the second prediction result corresponding to the second training sample; the second loss function may be specifically a cross-entropy loss function, an L1 loss function, an L2 loss function, a 0-1 loss function, or other types of loss functions, and the like, which is not limited herein.
And then the training equipment can keep the parameters of at least one second neural network layer in the first neural network unchanged, and reversely update the parameters of other neural network layers in the first neural network according to the function value of the first loss function and the function value of the second loss function so as to finish one training of the first neural network. Or the training equipment directly and reversely updates the parameters of the whole first neural network according to the function value of the first loss function and the function value of the second loss function so as to finish one-time training of the first neural network.
To further understand the present solution, one example of the formula for the first and second loss functions is disclosed below:

L2 = εL_c(K) + (1−ε)L_d(R);  (3)

L_d(R) = Σ_{r∈R} ℓ(y_{r,F}, y_{r,F̂});  (4)

where L2 represents the sum of the first loss function and the second loss function, ε is a parameter with a value ranging from 0 to 1, L_c(K) represents the first loss function, L_d(R) represents the second loss function, F represents the third neural network, F̂ represents the first neural network, y_{r,F} represents the first prediction result corresponding to the second training sample r, y_{r,F̂} represents the second prediction result corresponding to the second training sample r, and ℓ denotes a per-sample similarity measure between the two prediction results (for example, a cross-entropy or L2 term). It should be understood that the examples of formulas (3) and (4) are only for convenience of understanding the scheme and are not intended to limit the scheme.
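A minimal numeric sketch of formula (3), using squared error for both terms purely as an assumption (the scheme leaves the concrete loss types open):

```python
import numpy as np

def combined_loss(pred_new, y_new, pred_old, pred_teacher, eps=0.5):
    """L2 = eps * L_c(K) + (1 - eps) * L_d(R): a classification-style loss on
    the newly added class samples K, plus a distillation-style term that keeps
    the first neural network's outputs on the old exemplars R close to the
    third neural network's outputs."""
    L_c = np.mean((np.asarray(pred_new) - np.asarray(y_new)) ** 2)        # first loss term
    L_d = np.mean((np.asarray(pred_old) - np.asarray(pred_teacher)) ** 2)  # second loss term
    return eps * L_c + (1 - eps) * L_d

loss = combined_loss(pred_new=[1.0], y_new=[0.0],
                     pred_old=[0.5], pred_teacher=[0.5], eps=0.5)
```

With ε = 0.5 the two terms are weighted equally; moving ε toward 1 emphasizes learning the newly added class over preserving behavior on the old classes.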
It should be noted that, in the embodiment of the present application, the execution times of step 302 and step 307 are not limited, and step 307 may be executed once after step 302 is executed multiple times.
In the embodiment of the application, in the process of training the first neural network, the parameters of the at least one second neural network layer in the first neural network are kept unchanged, which helps shorten the total time for training the first neural network and reduce the consumption of computer resources during training. In addition, the at least one second neural network layer includes a neural network layer that is located in the target feature extraction network before the target neural network layer, and the target neural network layer is selected according to a first accuracy rate, the first accuracy rate being the accuracy obtained by performing a classification operation on the first training samples based on the feature information generated by the first neural network layer. That is, the N classes of training samples can be well distinguished through the feature information generated by the target neural network layer, so that the understanding of the N classes of training samples already learned by the target feature extraction network is retained as much as possible, thereby improving the accuracy of the trained first neural network (that is, the second neural network).
2. Inference phase
In the embodiment of the present application, the inference phase describes the process of how the execution device 210 processes a sample by using the target model/rule 201 to generate a prediction result. Specifically, please refer to fig. 10, where fig. 10 is a flowchart of a sample processing method provided in the embodiment of the present application; the sample processing method provided in the embodiment of the present application may include:
1001. the execution equipment acquires a second neural network, the second neural network is obtained by training a first neural network by using a target training data set, the first neural network comprises a target characteristic extraction network, the target characteristic extraction network is obtained by training by using a historical training data set, N types of first training samples are arranged in the historical training data set, the target training data set comprises N types of second training samples and at least one type of newly added training sample, and the N types of second training samples are derived from the N types of first training samples.
1002. The execution equipment processes the input sample through the second neural network to obtain a processing result output by the second neural network, the target feature extraction network comprises a plurality of first neural network layers, in the training process of the first neural network, parameters of at least one second neural network layer in the first neural network are unchanged, the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned in front of the target neural network layer, the target neural network layer belongs to the target feature extraction network, the target neural network layer is determined according to at least one first accuracy rate in one-to-one correspondence with the at least one first neural network layer, and the first accuracy rate is the accuracy rate obtained by performing classification operation on the first training sample based on feature information generated by the first neural network layer.
In the embodiment of the present application, after the training device performs iterative training on the first neural network to obtain the second neural network, the second neural network may be sent to the execution device, where the function of the second neural network is consistent with that of the first neural network, and the form of the input sample is consistent with that of the first training sample and that of the second training sample, which can be referred to the description in the corresponding embodiment of fig. 3. For the training process of the first neural network, the description in the corresponding embodiment shown in fig. 3 may also be referred to, and is not repeated herein.
In the embodiment of the application, not only the training process of the first neural network is provided, but also the application process of the second neural network (namely the trained first neural network) is provided, and the application scene of the scheme is expanded.
In order to more intuitively understand the beneficial effects brought by the embodiment of the application, the beneficial effects of the embodiment of the application are described in combination with experimental data. In the experiment, the public data set Mnist is used for training: the first 12000 samples in the public data set Mnist are used as the historical training data set, and samples 12000 to 14000 in the public data set Mnist are used as the at least one newly added class of samples. The reference group adopts a scheme in which, in the process of training the first neural network, only the parameters of the last neural network layer in the target feature extraction network and the parameters of the first feature processing network are adjusted; the accuracy of the second neural network obtained through the scheme of the reference group is 98.5%, whereas the accuracy of the second neural network obtained through the scheme provided by the embodiment of the application is 99%, that is, the accuracy of the second neural network is improved.
On the basis of the embodiments corresponding to fig. 1 to 10, in order to better implement the above-mentioned scheme of the embodiments of the present application, the following also provides related equipment for implementing the above-mentioned scheme. Referring to fig. 11, fig. 11 is a schematic structural diagram of a training apparatus for a neural network according to an embodiment of the present disclosure, where the training apparatus 1100 for a neural network includes: an obtaining module 1101, configured to obtain a target feature extraction network, where the target feature extraction network is obtained by training using a historical training data set, the historical training data set includes N types of first training samples, N is a positive integer, and the target feature extraction network includes a plurality of first neural network layers; a determining module 1102, configured to determine a target neural network layer from a target feature extraction network according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer, where the first accuracy rate is an accuracy rate obtained by performing a classification operation on a first training sample based on feature information generated by the first neural network layer; a training module 1103, configured to keep parameters of at least one second neural network layer in the target feature extraction network unchanged, train a first neural network by using a target training data set until a preset condition is met, and obtain a second neural network, where the first neural network includes a target feature extraction network; the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer, the target training data set comprises N types of second training samples and at least one type of newly added training 
samples, and the N types of second training samples are derived from the N types of first training samples.
In one possible design, the determining module 1102 is specifically configured to: acquiring at least two first accuracy rates which are in one-to-one correspondence with the at least two first neural network layers, and acquiring a second accuracy rate from the at least two first accuracy rates, wherein the value of the second accuracy rate in the at least two first accuracy rates is the highest; and acquiring a third neural network layer corresponding to the second accuracy rate one by one from the at least two first neural network layers, and determining a target neural network layer according to the third neural network layer.
In one possible design, the determining module 1102 is further configured to determine a class center of the first training sample of the target class according to target feature information of the first training sample of the target class, where the target class is any one of N classes, and the target feature information is obtained through at least one second neural network layer in the target feature extraction network; the obtaining module 1101 is further configured to obtain, from the historical training data set, a second training sample of at least one target class according to the class center of the first training sample of the target class, where a distance between the target feature information of the second training sample of each target class and the class center of the target class is smaller than a target threshold.
In one possible design, the more discrete the data distribution of the plurality of target feature information corresponding to the plurality of first training samples of the target class is, the larger the value of the target threshold is.
In one possible design, the function of the first neural network is any one of the following: image classification, feature extraction on images or regression operation according to sequence data in text form.
It should be noted that, the information interaction, the execution process, and the like between the modules/units in the training apparatus 1100 of the neural network are based on the same concept as the method embodiments corresponding to fig. 3 to 9 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 12, fig. 12 is another schematic structural diagram of a training apparatus for a neural network provided in an embodiment of the present application, where the training apparatus 1200 for a neural network includes: an obtaining module 1201, configured to obtain a target feature extraction network, where the target feature extraction network is obtained by training using a historical training data set, the historical training data set includes N types of first training samples, N is a positive integer, and the target feature extraction network includes a plurality of first neural network layers; a determining module 1202, configured to determine a third neural network layer from the target feature extraction network according to at least one first accuracy that corresponds to at least one first neural network layer one to one, where the first accuracy is obtained by performing a classification operation on the first training sample based on feature information generated by the first neural network layer; the obtaining module 1201 is further configured to obtain target feature information of the first training sample of a target category, where the target category is any one of the N categories, and the target feature information is generated by a third neural network layer in the target feature extraction network; the determining module 1202 is further configured to determine a category center of the first training sample of the target category according to the target feature information of the first training sample of the target category; the obtaining module 1201 is further configured to obtain, from the historical training data set, at least one second training sample of the target category according to the category center of the first training sample of the target category, where a distance between the target feature information of the second training sample of each target category and the category center of the target category is smaller than 
a target threshold; the training module 1203 is configured to train the first neural network by using a target training data set until a preset condition is met, so as to obtain a second neural network, where the target training data set includes N types of selected second training samples and at least one type of newly added training samples, and the first neural network includes a target feature extraction network.
In one possible design, the determining module 1202 is further configured to determine a target neural network layer from the target feature extraction network according to the third neural network layer, where the target neural network layer is located before the third neural network layer in the target feature extraction network; the training module 1203 is specifically configured to keep parameters of at least one second neural network layer in the target feature extraction network unchanged, and train the first neural network by using the target training data set until the preset condition is met, so as to obtain the second neural network, where the at least one second neural network layer includes a neural network layer located before the target neural network layer in the target feature extraction network.
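The partial-freezing strategy in this design can be sketched as a minimal example; the layer representation and the `freeze_prefix` helper are hypothetical, and only the idea — parameters of layers located before the target neural network layer stay fixed during incremental training — comes from the text:

```python
def freeze_prefix(layers, target_layer_index):
    """Mark every layer located before the target neural network layer
    as non-trainable, so its parameters stay unchanged while the rest
    of the first neural network is trained on the target data set."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= target_layer_index
    return layers

# Hypothetical five-layer target feature extraction network.
layers = [{"name": f"layer{i}", "trainable": True} for i in range(5)]
layers = freeze_prefix(layers, target_layer_index=3)
frozen = [layer["name"] for layer in layers if not layer["trainable"]]
```

Because only the suffix of the network is updated, fewer gradients must be computed, which is consistent with the stated goal of shortening the total training time of the first neural network.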
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the training apparatus 1200 of the neural network are based on the same concept as the method embodiments corresponding to fig. 3 to fig. 9 in the present application, and specific contents may refer to the description in the foregoing method embodiments of the present application, and are not repeated herein.
Referring to fig. 13, fig. 13 is a schematic view of another structure of a sample processing apparatus 1300 according to an embodiment of the present disclosure, which includes: an obtaining module 1301, configured to obtain a second neural network, where the second neural network is obtained by training a first neural network using a target training data set, a target feature extraction network in the first neural network performs a training operation based on a historical training data set, the historical training data set includes N types of first training samples, the target training data set includes N types of second training samples and at least one type of newly added training samples, the N types of second training samples are derived from the N types of first training samples, and N is a positive integer; a processing module 1302, configured to process the input sample through the second neural network to obtain a processing result output by the second neural network; the target feature extraction network comprises a plurality of first neural network layers, and parameters of at least one second neural network layer in the first neural network are unchanged in the training process of the first neural network; the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer, the target neural network layer is determined according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer in the target feature extraction network, and the first accuracy rate is obtained by performing classification operation on the input samples based on feature information generated by the first neural network layer.
In one possible design, the distance between the target feature information of the second training sample of each target class and the class center of the first training sample of the target class is smaller than a target threshold, the target class is any one of N classes, the class center of the first training sample of the target class is obtained based on the target feature information of the first training sample of the target class, and the target feature information is obtained through at least one second neural network layer in the target feature extraction network.
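A minimal sketch of the exemplar-selection rule described above: compute a class center from the target feature information, then keep only the samples whose feature distance to the center is below the target threshold. The mean-based center and the Euclidean distance are assumptions — the patent does not fix either choice:

```python
import numpy as np

def select_exemplars(features, threshold):
    """Compute a class center from the per-sample target feature
    information, then keep only the samples whose distance to the
    center is below the target threshold."""
    center = features.mean(axis=0)                     # assumed: mean center
    dists = np.linalg.norm(features - center, axis=1)  # assumed: Euclidean
    return center, features[dists < threshold]

# Toy 2-D feature vectors for one target class; the outlier is dropped.
feats = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0]])
center, kept = select_exemplars(feats, threshold=3.0)
```

The two samples near the center survive as second training samples, while the outlying feature vector is excluded.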
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the sample processing apparatus 1300 are based on the same concept as the method embodiments corresponding to fig. 10 in the present application, and specific contents may refer to the descriptions in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 14, fig. 14 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1400 may be embodied as a virtual reality (VR) device, a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a monitoring data processing device, or a radar data processing device, which is not limited herein. The sample processing apparatus 1300 described in the embodiment corresponding to fig. 13 may be deployed on the execution device 1400 to implement the functions of the execution device in the embodiment corresponding to fig. 10. Specifically, the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403, and a memory 1404 (the number of processors 1403 in the execution device 1400 may be one or more; one processor is taken as an example in fig. 14), where the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of the present application, the receiver 1401, the transmitter 1402, the processor 1403, and the memory 1404 may be connected by a bus or in another manner.
The memory 1404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include a non-volatile random access memory (NVRAM). The memory 1404 stores processor-executable instructions and operating instructions, executable modules, or data structures, or a subset thereof, or an expanded set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1403 controls the operation of the execution device. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as bus systems.
The method disclosed in the embodiments of the present application may be applied to the processor 1403, or implemented by the processor 1403. The processor 1403 may be an integrated circuit chip with a signal processing capability. In an implementation process, the steps of the foregoing method can be completed by a hardware integrated logic circuit in the processor 1403 or by instructions in the form of software. The processor 1403 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1403 may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a RAM, a flash memory, a ROM, a PROM or an EEPROM, or a register. The storage medium is located in the memory 1404, and the processor 1403 reads information in the memory 1404 and completes the steps of the foregoing method in combination with its hardware.
The receiver 1401 may be configured to receive input numeric or character information and to generate signal inputs related to relevant settings and function control of the execution device. The transmitter 1402 may be configured to output numeric or character information via a first interface; the transmitter 1402 may also be configured to send instructions to a disk group via the first interface to modify data in the disk group; the transmitter 1402 may also include a display device such as a display screen.
In this embodiment, the application processor 14031 in the processor 1403 is configured to execute the method for processing the sample executed by the execution device in the embodiment corresponding to fig. 10. It should be noted that, the specific manner in which the application processor 14031 executes each step is based on the same concept as that of each method embodiment corresponding to fig. 10 in the present application, and the technical effect brought by the method embodiment is the same as that of each method embodiment corresponding to fig. 10 in the present application, and specific contents may refer to descriptions in the foregoing method embodiments in the present application, and are not described again here.
An embodiment of the present application further provides a training device. Referring to fig. 15, fig. 15 is a schematic structural diagram of the training device provided in an embodiment of the present application. The neural network training apparatus 1100 described in the embodiment corresponding to fig. 11 may be deployed on the training device 1500 to implement the functions of the training device in the embodiments corresponding to fig. 3 to 9; alternatively, the neural network training apparatus 1200 described in the embodiment corresponding to fig. 12 may be deployed on the training device 1500. Specifically, the training device 1500 is implemented as one or more servers, which may vary greatly due to differences in configuration or performance, and may include one or more central processing units (CPUs) 1522 (for example, one or more processors), a memory 1532, and one or more storage media 1530 (for example, one or more mass storage devices) for storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may be transitory or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), and each module may include a series of instruction operations on the training device. Still further, the central processing unit 1522 may be configured to communicate with the storage medium 1530, and execute, on the training device 1500, the series of instruction operations in the storage medium 1530.
The training device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In one embodiment of the present application, the central processor 1522 is configured to execute the steps executed by the training apparatus 1100 for neural network described in the corresponding embodiment of fig. 11. The specific manner in which the central processing unit 1522 executes the foregoing steps is based on the same concept as that of the method embodiments corresponding to fig. 3 to 9 in the present application, and the technical effects brought by the method embodiments are the same as those of the method embodiments corresponding to fig. 3 to 9 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
In another case, the central processor 1522 is configured to execute the steps executed by the training apparatus 1200 for neural network described in the corresponding embodiment of fig. 12. The specific manner in which the central processing unit 1522 executes the foregoing steps is based on the same concept as that of the method embodiments corresponding to fig. 3 to 9 in the present application, and the technical effects brought by the method embodiments are the same as those of the method embodiments corresponding to fig. 3 to 9 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Embodiments of the present application also provide a computer program product, which, when executed on a computer, causes the computer to execute the steps performed by the execution device in the method described in the embodiment shown in fig. 10, or the steps performed by the training device in the methods described in the embodiments shown in fig. 3 to 9.
Embodiments of the present application also provide a computer-readable storage medium, which stores a program for signal processing. When the program runs on a computer, it causes the computer to execute the steps performed by the execution device in the method described in the foregoing embodiment shown in fig. 10, or causes the computer to execute the steps performed by the training device in the methods described in the foregoing embodiments shown in fig. 3 to 9.
The sample processing apparatus, the neural network training apparatus, the execution device, and the training device provided in the embodiments of the present application may specifically be chips. The chip includes: a processing unit and a communication unit, where the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip executes the sample processing method described in the embodiment shown in fig. 10, or so that the chip executes the neural network training method described in the embodiments shown in fig. 3 to 9. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 16, fig. 16 is a schematic structural diagram of a chip provided in an embodiment of the present application. The chip may be embodied as a neural network processing unit (NPU) 160. The NPU 160 is mounted on a host CPU (Host CPU) as a coprocessor, and the host CPU allocates tasks. The core part of the NPU is the arithmetic circuit 1603, and the controller 1604 controls the arithmetic circuit 1603 to extract matrix data from the memory and perform multiplication.
In some implementations, the arithmetic circuit 1603 includes a plurality of processing units (PEs) therein. In some implementations, the arithmetic circuitry 1603 is a two-dimensional systolic array. The arithmetic circuit 1603 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 1603 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1602 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 1601 and performs matrix operation with the matrix B, and a partial result or a final result of the obtained matrix is stored in an accumulator (accumulator) 1608.
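The accumulation described above can be illustrated with NumPy; the rank-1 partial-sum loop mimics how partial results build up in the accumulator 1608 (the concrete matrices are toy values, not part of the patent):

```python
import numpy as np

# Toy input matrix A (from input memory 1601) and weight matrix B
# (from weight memory 1602); values are illustrative only.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Partial results accumulate in the accumulator 1608: each step adds
# one rank-1 contribution of a column of A with a row of B.
acc = np.zeros((2, 2), dtype=int)
for k in range(A.shape[1]):
    acc += np.outer(A[:, k], B[k, :])
# acc now equals the matrix product A @ B.
```

Each loop iteration corresponds to one wave of data streaming through the PEs, with the accumulator holding the running partial result until the final product is complete.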
The unified memory 1606 is used to store input data as well as output data. Weight data is transferred into the weight memory 1602 through a direct memory access controller (DMAC) 1605. Input data is also transferred into the unified memory 1606 through the DMAC.
The bus interface unit (BIU) 1610 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1609. Specifically, the bus interface unit 1610 is used by the instruction fetch buffer 1609 to obtain instructions from an external memory, and is further used by the storage unit access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606, or to transfer weight data to the weight memory 1602, or to transfer input data to the input memory 1601.
The vector calculation unit 1607 includes a plurality of operation processing units, and, if necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 1607 can store the processed output vector to the unified memory 1606. For example, the vector calculation unit 1607 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 1603, for example, linear interpolation of the feature planes extracted by the convolutional layer, or accumulation of a vector of values to generate an activation value. In some implementations, the vector calculation unit 1607 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1603, for example, for use in a subsequent layer in the neural network.
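As a toy illustration of the non-linear post-processing the vector calculation unit 1607 may apply to the arithmetic circuit's output (ReLU is chosen here only as an example of a non-linear function; the patent does not name one):

```python
import numpy as np

def vector_unit(matmul_output):
    """Apply a non-linear function (ReLU, as an illustrative choice) to
    the arithmetic circuit's output so it can serve as the activation
    input of a subsequent layer."""
    return np.maximum(matmul_output, 0.0)

activated = vector_unit(np.array([-1.5, 0.0, 2.5]))
```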
An instruction fetch buffer (IFB) 1609 connected to the controller 1604 is used to store instructions used by the controller 1604.
The unified memory 1606, the input memory 1601, the weight memory 1602, and the instruction fetch buffer 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of the layers in the first neural network shown in fig. 3 to 9 may be performed by the arithmetic circuit 1603 or the vector calculation unit 1607.
Any of the above processors may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control program execution of the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the description of the foregoing embodiments, a person skilled in the art can clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including a dedicated integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be various, for example, an analog circuit, a digital circuit, or a dedicated circuit. However, for the present application, implementation by a software program is generally preferable. Based on such an understanding, the technical solutions of the present application, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium that the computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims (22)

1. A method of training a neural network, the method comprising:
acquiring a target feature extraction network, wherein the target feature extraction network is obtained by training by adopting a historical training data set, the historical training data set is provided with N types of first training samples, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers;
determining a target neural network layer from the target feature extraction network according to at least one first accuracy rate corresponding to at least one first neural network layer, wherein the first accuracy rate is obtained by performing classification operation on the first training sample based on feature information generated by the first neural network layer;
keeping parameters of at least one second neural network layer in the target feature extraction network unchanged, and training a first neural network by using a target training data set until a preset condition is met to obtain a second neural network, wherein the first neural network comprises the target feature extraction network;
wherein the at least one second neural network layer comprises a neural network layer located before the target neural network layer in the target feature extraction network, the target training data set comprises N types of second training samples and at least one type of newly added training sample, and the N types of second training samples are derived from the N types of first training samples.
2. The method of claim 1, wherein determining a target neural network layer from the target feature extraction network based on at least one first accuracy rate corresponding to at least one first neural network layer comprises:
acquiring at least two first accuracy rates in one-to-one correspondence with the at least two first neural network layers, and acquiring a second accuracy rate from the at least two first accuracy rates, wherein the second accuracy rate has the highest value in the at least two first accuracy rates;
and acquiring, from the at least two first neural network layers, a third neural network layer in one-to-one correspondence with the second accuracy, and determining the target neural network layer according to the third neural network layer.
3. The method of claim 1 or 2, wherein the training of the first neural network with the target training data set is performed until a preset condition is met, and the method further comprises:
determining a class center of a first training sample of a target class according to target feature information of the first training sample of the target class, wherein the target class is any one of the N classes, and the target feature information is obtained through the at least one second neural network layer in the target feature extraction network;
and acquiring at least one second training sample of the target category from the historical training data set according to the category center of the first training sample of the target category, wherein the distance between the target feature information of the second training sample of each target category and the category center of the target category is smaller than a target threshold value.
4. The method according to claim 3, wherein the value of the target threshold is larger as a data distribution of a plurality of target feature information corresponding to a plurality of first training samples of the target class is more discrete.
5. The method of claim 1 or 2, wherein a function of the first neural network is any one of the following: performing image classification, performing feature extraction on an image, or performing a regression operation according to sequence data in a text form.
6. A method of training a neural network, the method comprising:
acquiring a target feature extraction network, wherein the target feature extraction network is obtained by training a historical training data set, the historical training data set is provided with N types of first training samples, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers;
determining a third neural network layer from the target feature extraction network according to at least one first accuracy rate corresponding to at least one first neural network layer, wherein the first accuracy rate is an accuracy rate obtained by performing classification operation on the first training sample based on feature information generated by the first neural network layer;
acquiring target feature information of the first training sample of a target category, wherein the target category is any one of the N categories, and the target feature information is generated by the third neural network layer in the target feature extraction network;
determining the class center of the first training sample of the target class according to the target feature information of the first training sample of the target class;
acquiring at least one second training sample of the target class from the historical training data set according to the class center of the first training sample of the target class, wherein the distance between the target feature information of the second training sample of each target class and the class center of the target class is smaller than a target threshold value;
and training a first neural network by using a target training data set until a preset condition is met to obtain a second neural network, wherein the target training data set comprises the N types of selected second training samples and at least one type of newly added training samples, and the first neural network comprises the target feature extraction network.
7. The method of claim 6, further comprising:
determining a target neural network layer from the target feature extraction network according to the third neural network layer, wherein the target neural network layer is located before the third neural network layer in the target feature extraction network;
the training of the first neural network by using the target training data set until a preset condition is met comprises the following steps:
keeping parameters of at least one second neural network layer in the target feature extraction network unchanged, training the first neural network by using a target training data set until a preset condition is met, and obtaining the second neural network, wherein the at least one second neural network layer comprises the neural network layer which is positioned in the target feature extraction network and is positioned before the target neural network layer.
8. A method of processing a sample, the method comprising:
obtaining a second neural network, wherein the second neural network is obtained by training a first neural network by using a target training data set, a target feature extraction network in the first neural network performs training operation based on a historical training data set, the historical training data set is provided with N types of first training samples, the target training data set comprises N types of second training samples and at least one type of newly added training sample, the N types of second training samples are derived from the N types of first training samples, and N is a positive integer;
processing the input sample through the second neural network to obtain a processing result output by the second neural network;
wherein the target feature extraction network comprises a plurality of first neural network layers, and parameters of at least one second neural network layer in the first neural network are unchanged in the training process of the first neural network;
the at least one second neural network layer comprises a neural network layer which is positioned in the target feature extraction network before a target neural network layer, the target neural network layer is determined according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer in the target feature extraction network, and the first accuracy rate is the accuracy rate obtained by performing classification operation on input samples based on feature information generated by the first neural network layer.
9. The method of claim 8,
and the distance between the target feature information of the second training sample of each target class and the class center of the first training sample of the target class is smaller than a target threshold value, the target class is any one of the N classes, the class center of the first training sample of the target class is obtained based on the target feature information of the first training sample of the target class, and the target feature information is obtained through the at least one second neural network layer in the target feature extraction network.
10. An apparatus for training a neural network, the apparatus comprising:
the acquisition module is used for acquiring a target feature extraction network, the target feature extraction network is obtained by training a historical training data set, N types of first training samples exist in the historical training data set, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers;
a determining module, configured to determine a target neural network layer from the target feature extraction network according to at least one first accuracy corresponding to at least one first neural network layer, where the first accuracy is an accuracy obtained by performing a classification operation on the first training sample based on feature information generated by the first neural network layer;
the training module is used for keeping parameters of at least one second neural network layer in the target feature extraction network unchanged, training a first neural network by using a target training data set until a preset condition is met, and obtaining a second neural network, wherein the first neural network comprises the target feature extraction network;
wherein the at least one second neural network layer comprises a neural network layer in the target feature extraction network before the target neural network layer, the target training data set comprises N types of second training samples and at least one type of newly added training samples, and the N types of second training samples are derived from the N types of first training samples.
11. The apparatus of claim 10, wherein the determining module is specifically configured to:
acquiring at least two first accuracy rates in one-to-one correspondence with the at least two first neural network layers, and acquiring a second accuracy rate from the at least two first accuracy rates, wherein the second accuracy rate is the highest of the at least two first accuracy rates;
and acquiring, from the at least two first neural network layers, a third neural network layer corresponding to the second accuracy rate, and determining the target neural network layer according to the third neural network layer.
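Claim 11 presupposes a per-layer "first accuracy rate". One hypothetical way to obtain it is to score each layer's feature output with a simple classifier; the nearest-class-center probe below is an illustrative stand-in, not the specific classification operation the patent mandates:

```python
import math

def probe_accuracy(features, labels):
    """Score one layer's feature vectors with a nearest-class-center
    classifier, a simple stand-in for the per-layer classification
    operation that yields a layer's 'first accuracy rate'."""
    grouped = {}
    for f, y in zip(features, labels):
        grouped.setdefault(y, []).append(f)
    # element-wise mean feature vector per class
    centers = {y: [sum(c[d] for c in fs) / len(fs) for d in range(len(fs[0]))]
               for y, fs in grouped.items()}
    correct = 0
    for f, y in zip(features, labels):
        pred = min(centers, key=lambda c: math.dist(f, centers[c]))
        correct += pred == y
    return correct / len(labels)
```

Running this probe on every layer's features yields the per-layer accuracy list from which the highest ("second accuracy rate") is taken.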
12. The apparatus of claim 10 or 11,
the determining module is further configured to determine a class center of a first training sample of a target class according to target feature information of the first training sample of the target class, where the target class is any one of the N classes, and the target feature information is obtained through the at least one second neural network layer in the target feature extraction network;
the obtaining module is further configured to obtain at least one second training sample of the target category from the historical training data set according to the category center of the first training sample of the target category, where a distance between the target feature information of the second training sample of each target category and the category center of the target category is smaller than a target threshold.
13. The apparatus according to claim 12, wherein the more discrete the data distribution of the plurality of pieces of target feature information corresponding to the plurality of first training samples of the target class, the larger the value of the target threshold.
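Claim 13 ties the target threshold to how dispersed a class's features are. A sketch under the assumption that "more discrete" is measured by the mean distance to the class center (the patent does not fix a specific dispersion measure):

```python
import math

def class_center(features):
    # element-wise mean of one class's feature vectors
    n, dim = len(features), len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def select_exemplars(features, scale=1.0):
    # the threshold scales with the spread of the class, so a more
    # discrete distribution yields a larger target threshold
    center = class_center(features)
    dists = [math.dist(f, center) for f in features]
    threshold = scale * (sum(dists) / len(dists))
    return [i for i, d in enumerate(dists) if d < threshold]
```

Samples within the threshold are kept as the second training samples of that class; outliers are discarded, and a tighter cluster automatically produces a tighter threshold.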
14. The apparatus of claim 10 or 11, wherein the function of the first neural network is any one of the following: image classification, feature extraction from images, or a regression operation on sequence data in text form.
15. An apparatus for training a neural network, the apparatus comprising:
an obtaining module, configured to obtain a target feature extraction network, where the target feature extraction network is obtained by training with a historical training data set, the historical training data set has N types of first training samples, N is a positive integer, and the target feature extraction network comprises a plurality of first neural network layers;
a determining module, configured to determine a third neural network layer from the target feature extraction network according to at least one first accuracy rate corresponding to at least one first neural network layer, where the first accuracy rate is an accuracy rate obtained by performing a classification operation on the first training sample based on feature information generated by the first neural network layer;
the obtaining module is further configured to obtain target feature information of the first training sample of a target category, where the target category is any one of the N categories, and the target feature information is generated by the third neural network layer in the target feature extraction network;
the determining module is further configured to determine a category center of the first training sample of the target category according to the target feature information of the first training sample of the target category;
the obtaining module is further configured to obtain at least one second training sample of the target category from the historical training data set according to a category center of a first training sample of the target category, where a distance between target feature information of the second training sample of each target category and the category center of the target category is smaller than a target threshold;
and a training module, configured to train a first neural network by using a target training data set until a preset condition is met, to obtain a second neural network, where the target training data set comprises the selected N types of second training samples and at least one type of newly added training samples, and the first neural network comprises the target feature extraction network.
16. The apparatus of claim 15,
the determining module is further configured to determine a target neural network layer from the target feature extraction network according to the third neural network layer, where the target neural network layer is located earlier in the target feature extraction network than the third neural network layer;
the training module is specifically configured to keep parameters of at least one second neural network layer in the target feature extraction network unchanged, train the first neural network by using a target training data set until a preset condition is met, and obtain the second neural network, where the at least one second neural network layer includes a neural network layer located before the target neural network layer in the target feature extraction network.
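Claim 16's training step keeps the parameters of every layer before the target layer unchanged. In a real framework this would amount to, e.g., setting `requires_grad = False` on those parameters; the dictionary-based sketch below only illustrates the bookkeeping (the `trainable` flag is a hypothetical stand-in):

```python
def freeze_before(layers, target_idx):
    """Mark every layer before target_idx as frozen; layers from the
    target layer onward remain trainable during incremental training."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= target_idx
    return layers
```

Because the frozen prefix is exactly the part whose features already classify the old classes well, only the suffix needs gradient updates on the target training data set.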
17. An apparatus for processing a sample, the apparatus comprising:
an obtaining module, configured to obtain a second neural network, where the second neural network is obtained by training a first neural network using a target training data set, a target feature extraction network in the first neural network is trained based on a historical training data set, the historical training data set comprises N types of first training samples, the target training data set comprises N types of second training samples and at least one type of newly added training samples, the N types of second training samples are derived from the N types of first training samples, and N is a positive integer;
a processing module, configured to process an input sample through the second neural network to obtain a processing result output by the second neural network;
wherein the target feature extraction network comprises a plurality of first neural network layers, and parameters of at least one second neural network layer in the first neural network are unchanged in the training process of the first neural network;
the at least one second neural network layer comprises a neural network layer located before a target neural network layer in the target feature extraction network, the target neural network layer is determined according to at least one first accuracy rate in one-to-one correspondence with at least one first neural network layer in the target feature extraction network, and the first accuracy rate is an accuracy rate obtained by performing a classification operation on input samples based on feature information generated by the first neural network layer.
18. The apparatus according to claim 17, wherein a distance between the target feature information of the second training sample of each target class and a class center of the first training sample of the target class is smaller than a target threshold, the target class is any one of the N classes, the class center of the first training sample of the target class is obtained based on the target feature information of the first training sample of the target class, and the target feature information is obtained through the at least one second neural network layer in the target feature extraction network.
19. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 5, or cause the computer to perform the method of claim 6 or 7, or cause the computer to perform the method of claim 8 or 9.
20. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 5, or causes the computer to perform the method of claim 6 or 7, or causes the computer to perform the method of claim 8 or 9.
21. A training device comprising a processor and a memory, the processor coupled with the memory,
the memory is used for storing programs;
the processor is configured to execute the program in the memory, to cause the training device to perform the method of any one of claims 1 to 5, or to cause the training device to perform the method of claim 6 or 7.
22. An execution device comprising a processor and a memory, the processor coupled with the memory,
the memory is used for storing programs;
the processor, configured to execute the program in the memory, to cause the execution device to perform the method according to claim 8 or 9.
CN202110742175.XA 2021-06-30 2021-06-30 Neural network training method, sample processing method and related equipment Pending CN115546528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110742175.XA CN115546528A (en) 2021-06-30 2021-06-30 Neural network training method, sample processing method and related equipment

Publications (1)

Publication Number Publication Date
CN115546528A true CN115546528A (en) 2022-12-30

Family

ID=84723176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110742175.XA Pending CN115546528A (en) 2021-06-30 2021-06-30 Neural network training method, sample processing method and related equipment

Country Status (1)

Country Link
CN (1) CN115546528A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination