WO2021253941A1 - Neural network model training, image classification, and text translation methods, apparatuses, and devices - Google Patents

Neural network model training, image classification, and text translation methods, apparatuses, and devices

Info

Publication number
WO2021253941A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
codeword
network model
weight matrix
training
Prior art date
Application number
PCT/CN2021/086589
Other languages
English (en)
French (fr)
Other versions
WO2021253941A9 (zh)
Inventor
胡丁晟
徐斌
姚棋中
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP21825658.4A (published as EP4152211A4)
Publication of WO2021253941A1
Publication of WO2021253941A9
Priority to US18/068,450 (published as US20230120631A1)

Classifications

    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 40/44: Data-driven translation of natural language; statistical methods, e.g. probability models
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/09: Supervised learning

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a neural network model training method, an image classification method, and a text translation method, device, and equipment.
  • AI: artificial intelligence
  • NN: neural network
  • a neural network generally includes multiple weight coefficient matrices.
  • when the neural network performs a preset task, taking the classification task as an example, the data vector of the object to be classified is input into the neural network; the network computes an output vector from the data vector and its own weight coefficient matrices, and then classifies the object to be classified based on that output vector.
  • in the initial state, the weight coefficient matrices of the neural network are unknown and must be learned through training.
  • current neural networks usually contain many layers (more than 15), and the weight coefficient matrix in each layer is also large. As a result, repeatedly reading weight coefficient matrix data during training creates a memory bottleneck, and it can even be difficult to train neural networks in resource-constrained scenarios.
  • the embodiments of this application provide a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices, which can reduce the amount of weight matrix data read during neural network model training and the amount of intermediate-parameter computation during weight matrix updates. This makes it possible to use the neural network model to perform preset tasks (for example, image classification or text translation) while overcoming the memory bottleneck problem and achieving the desired effect.
  • in a first aspect, this application provides a neural network model training method.
  • the method includes: first obtaining, from memory, the codeword corresponding to a first weight matrix of the neural network model; then determining, according to the codeword, the weight matrix of the neural network model as the first weight matrix; and training the first weight matrix with the training data.
  • when the preset stop condition is not met, the codeword is updated to obtain an updated codeword, and the updated codeword is stored in memory. The updated codeword obtained from memory is then used to determine the weight matrix of the neural network model as a second weight matrix, and the training data is used to train the second weight matrix. Training of the neural network model stops when the preset stop condition is met.
  • when training the neural network model, the embodiment of this application no longer reads the weight matrix directly from memory; instead, it reads the codewords corresponding to the weight matrix and forms the weight matrix from them for training. Since the memory space occupied by the codewords is much smaller than that occupied by the weight matrix, the amount of data read from memory is greatly reduced, overcoming the memory bottleneck problem.
  • likewise, this application no longer calculates the update amount of the weight matrix during model training; instead, it calculates the update amount of the codewords and re-determines the new weight matrix for subsequent training, thereby reducing the amount of intermediate-parameter computation during updates and enabling neural network model training to proceed smoothly in resource-constrained scenarios.
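The codeword-based training loop described above can be sketched as follows. This is an illustrative sketch only; the function names, the aggregation rule (a plain sum of sub-matrix gradients), and the plain gradient-descent update are assumptions for illustration, not details from the application.

```python
import numpy as np

def train_with_codewords(codewords, index, train_step, stop_condition, lr=0.01):
    """Train a model whose weight matrix is stored as codewords + index.

    codewords:      (n, d) array of codeword vectors.
    index:          (k,) array mapping each of the k sub-matrices to a codeword.
    train_step:     callable(weights) -> weight gradient of the same shape.
    stop_condition: callable() -> bool, the preset stop condition.
    """
    while True:
        # Decode: the weight matrix is formed from codewords, so only the
        # small codebook is ever read from memory, not the full matrix.
        weights = codewords[index]            # (k, d) decoded sub-matrices
        weight_grad = train_step(weights)     # forward + backward pass
        if stop_condition():
            break
        # Update the codewords, not the full weight matrix: each codeword's
        # gradient aggregates the gradients of the sub-matrices that use it.
        for j in range(len(codewords)):
            mask = (index == j)
            if mask.any():
                codewords[j] -= lr * weight_grad[mask].sum(axis=0)
    return codewords
```

On the next iteration, `codewords[index]` decodes the second weight matrix from the updated codebook, so the full matrix never needs to be persisted between steps.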
  • when the first weight matrix is the initial weight matrix, the method further includes:
  • dividing the initial weight matrix to determine the codewords corresponding to it, so that the codewords can later be stored in memory, reducing memory usage.
  • dividing the initial weight matrix to determine its corresponding codewords includes: dividing the initial weight matrix into k sub-matrices of the same dimension, where k is a positive integer greater than 1; clustering the k sub-matrices of the same dimension to obtain n codewords corresponding to them, where n is a positive integer with n ≤ k; and determining the n codewords as the codewords corresponding to the initial weight matrix.
  • in this way, model training can obtain the codewords corresponding to the initial weight matrix from memory instead of directly reading the initial weight matrix. Because the data storage space occupied by the codewords is much smaller than that occupied by the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck problem is effectively overcome.
  • clustering the k sub-matrices of the same dimension to obtain the n corresponding codewords includes: reducing each of the k sub-matrices to a one-dimensional vector, obtaining k one-dimensional vectors; dividing the k one-dimensional vectors into n vector groups, each containing at least one vector; and, for the i-th vector group (i from 1 to n), averaging the element values at corresponding positions of all its one-dimensional vectors to obtain the codeword corresponding to that group.
  • in this way, k sub-matrices can be characterized by n codewords, each codeword simultaneously representing multiple sub-matrices, and these n codewords can then be used to quickly decode the weight matrix of the neural network.
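A minimal sketch of the division-and-clustering step: the initial weight matrix is split into k same-sized sub-matrices whose flattened forms are clustered into n codewords. The application does not specify a clustering algorithm; the simple k-means-style loop and all names below are illustrative assumptions.

```python
import numpy as np

def build_codebook(weight_matrix, sub_rows, sub_cols, n, iters=10, seed=0):
    """Split a weight matrix into k same-sized sub-matrices and cluster
    their flattened forms into n codewords (simple k-means sketch)."""
    rows, cols = weight_matrix.shape
    blocks = [weight_matrix[r:r + sub_rows, c:c + sub_cols].ravel()
              for r in range(0, rows, sub_rows)
              for c in range(0, cols, sub_cols)]
    vectors = np.stack(blocks)                  # (k, sub_rows * sub_cols)
    rng = np.random.default_rng(seed)
    # Initialize the n codewords from n distinct sub-matrix vectors.
    codewords = vectors[rng.choice(len(vectors), n, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        dists = np.linalg.norm(vectors[:, None] - codewords[None], axis=2)
        index = dists.argmin(axis=1)
        # Recompute each codeword as the element-wise mean of its group.
        for j in range(n):
            if (index == j).any():
                codewords[j] = vectors[index == j].mean(axis=0)
    return codewords, index
```

The returned `index` is the correspondence between codewords and sub-matrix positions; `codewords[index]` reproduces an approximation of all k sub-matrices from only n stored vectors.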
  • the method further includes:
  • when the preset stop condition is not met, releasing the weight matrix of the neural network model from memory. In this way, further memory space is saved, which helps overcome the memory bottleneck problem.
  • updating the codeword to obtain the updated codeword includes: when the preset stop condition is not met, determining the weight gradient of the first weight matrix of the neural network model; determining the codeword gradient according to the weight gradient; and determining the updated codeword according to the codeword gradient.
  • in this way, the codeword gradient can be determined from the weight gradient, yielding a more accurate updated codeword for subsequent model training.
  • determining the codeword gradient according to the weight gradient, and the updated codeword according to the codeword gradient, includes: performing a weighted summation of the weight gradients of the sub-matrices whose index entries correspond to the j-th codeword, obtaining the codeword gradient of the j-th codeword, where j is an integer from 1 to n; optimizing that codeword gradient to obtain the update amount of the j-th codeword; and applying the update amount to obtain the updated j-th codeword.
  • in this way, each codeword can be accurately determined and used to decode a new weight matrix for subsequent model training.
  • the method further includes: obtaining an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model, so that the codewords and index can be used to decode the weight matrix of the neural network model more accurately.
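Decoding the full weight matrix from the codebook and the index might look like the following sketch; the layout (row-major sub-matrix order) and all names are assumptions for illustration.

```python
import numpy as np

def decode_weight_matrix(codewords, index, rows, cols, sub_rows, sub_cols):
    """Rebuild the full weight matrix from the codebook and the index,
    which maps each sub-matrix position to a codeword."""
    W = np.empty((rows, cols))
    # Sub-matrix positions in row-major order, matching the index layout.
    positions = [(r, c) for r in range(0, rows, sub_rows)
                        for c in range(0, cols, sub_cols)]
    for (r, c), j in zip(positions, index):
        W[r:r + sub_rows, c:c + sub_cols] = codewords[j].reshape(sub_rows, sub_cols)
    return W
```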
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset difference;
  • the rate of change of the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset change threshold;
  • the number of updates of the model parameters in the neural network model reaches a preset number of updates;
  • the output value of the loss function adopted by the neural network model reaches a preset threshold; the loss function is used to measure the difference between the output result of the neural network model on the training data and the result label value corresponding to the training data.
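The stop conditions listed above could be checked as in this sketch; all threshold values are placeholders, not values from the application.

```python
def should_stop(loss_history, updates, *, loss_threshold=1e-3,
                change_threshold=1e-5, max_updates=10000):
    """Illustrative check of the preset stop conditions: loss below a
    threshold, loss change rate below a threshold, or update count reached.
    All thresholds here are placeholder assumptions."""
    if updates >= max_updates:
        return True                      # parameter update count reached
    if loss_history and loss_history[-1] <= loss_threshold:
        return True                      # loss (label/output difference) small
    if len(loss_history) >= 2:
        change = abs(loss_history[-1] - loss_history[-2])
        if change <= change_threshold:
            return True                  # loss has stopped changing
    return False
```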
  • in a second aspect, this application also provides an image classification method. The method includes: obtaining an image to be classified; and inputting the image to be classified into a trained neural network model to obtain the image classification result output by the neural network model.
  • the training process of the neural network model includes: first obtaining, from memory, the codeword corresponding to the first weight matrix of the neural network model; then determining, according to the codeword, the weight matrix of the model as the first weight matrix and training it with the training data, where the training data includes positive sample images and negative sample images.
  • when the preset stop condition is not met, the codeword is updated to obtain an updated codeword, which is stored in memory; the updated codeword obtained from memory is then used to determine the weight matrix of the model as the second weight matrix, which is trained with the training data. Training of the neural network model stops when the preset stop condition is met.
  • the embodiment of the present application uses a pre-trained neural network model to classify the image to be classified. Since the neural network model can reach the global optimum, the classification result it outputs is more accurate, which improves the accuracy of the classification results.
  • in a third aspect, this application also provides a text translation method. The method includes: obtaining a text to be translated; and inputting the text to be translated into a trained neural network model to obtain the text translation result output by the neural network model.
  • the training process of the neural network model includes: first obtaining, from memory, the codeword corresponding to the first weight matrix of the neural network model; then determining, according to the codeword, the weight matrix of the model as the first weight matrix and training it with the training data, where the training data is sample text.
  • when the preset stop condition is not met, the codeword is updated to obtain an updated codeword, which is stored in memory; the updated codeword obtained from memory is then used to determine the weight matrix of the model as the second weight matrix, which is trained with the training data. Training of the neural network model stops when the preset stop condition is met.
  • the embodiment of this application uses a pre-trained neural network model to translate the text to be translated. Since the neural network model can reach the global optimum, the translation result it outputs is more accurate, which improves the accuracy of the translation results.
  • the present application also provides a neural network model training device, which includes: a first acquisition unit configured to acquire codewords from a memory, where the codewords correspond to the first weight matrix of the neural network model;
  • the first training unit is used to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix;
  • the update unit is used to update the codeword to obtain the updated codeword when the preset stop condition is not met;
  • the storage unit is used to store the updated codeword in the memory;
  • the second training unit is used to determine the weight matrix of the neural network model as the second weight matrix by using the updated codeword obtained from memory, and to train the second weight matrix with the training data;
  • the stop unit is used to stop the training of the neural network model when the preset stop condition is met.
  • when the first weight matrix is the initial weight matrix, the device further includes:
  • the dividing unit is used to divide the initial weight matrix to determine the codeword corresponding to the initial weight matrix.
  • the division unit includes:
  • the first division subunit is used to divide the initial weight matrix into k sub-matrices of the same dimension; the k is a positive integer greater than 1;
  • the clustering subunit is used to perform clustering processing on the k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension, where n is a positive integer and n ≤ k;
  • the first determining subunit is configured to determine the n codewords as the codewords corresponding to the initial weight matrix.
  • the clustering subunit includes:
  • the dimensionality reduction subunit is used to reduce the dimensions of k sub-matrices of the same dimension into one-dimensional vectors to obtain k one-dimensional vectors;
  • the second division subunit is used to divide the k one-dimensional vectors into n vector groups, where each vector group contains at least one one-dimensional vector;
  • the calculation subunit is used to average the element values at corresponding positions in all the one-dimensional vectors belonging to the i-th vector group among the k one-dimensional vectors, to obtain the codeword corresponding to all the one-dimensional vectors in the i-th vector group, where i is an integer from 1 to n.
  • the device further includes: a releasing unit, configured to release the weight matrix of the neural network model in the memory when the preset stopping condition is not met.
  • the update unit includes:
  • the second determining subunit is used to determine the weight gradient of the first weight matrix of the neural network model when the preset stopping condition is not met;
  • the third determining subunit is used to determine the codeword gradient according to the weight gradient and the index, and to determine the updated codeword according to the codeword gradient.
  • the third determining subunit includes:
  • the first obtaining subunit is used to perform a weighted summation of the weight gradients of the sub-matrices whose index entries correspond to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j is an integer from 1 to n;
  • the second obtaining subunit is used to optimize the codeword gradient corresponding to the jth codeword to obtain the update amount of the jth codeword;
  • the third obtaining subunit is used to update the j-th codeword by using the update amount of the j-th codeword to obtain the updated j-th codeword.
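The three sub-steps above (aggregate the sub-matrix gradients, optimize the codeword gradient, apply the update amount) can be sketched as follows. The momentum optimizer stands in for the unspecified "optimization" step and is an assumption, as are all names.

```python
import numpy as np

def update_codewords(codewords, index, weight_grads, velocity,
                     lr=0.01, momentum=0.9):
    """For each codeword j: sum the gradients of the sub-matrices whose
    index entry is j, run a momentum step (placeholder optimizer), and
    apply the resulting update amount to the codeword."""
    for j in range(len(codewords)):
        mask = (index == j)
        if not mask.any():
            continue
        cw_grad = weight_grads[mask].sum(axis=0)        # aggregate gradient
        velocity[j] = momentum * velocity[j] + cw_grad  # optimize gradient
        codewords[j] -= lr * velocity[j]                # apply update amount
    return codewords, velocity
```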
  • the device further includes:
  • the second acquiring unit is used to acquire an index, where the index is the correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset difference;
  • the rate of change of the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset change threshold;
  • the number of updates of the model parameters in the neural network model reaches a preset number of updates;
  • the output value of the loss function adopted by the neural network model reaches a preset threshold, where the loss function is used to measure the gap between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • the present application also provides an image classification device.
  • the device includes: an image acquisition unit for acquiring an image to be classified; an image classification unit for inputting the image to be classified into a trained neural network model to obtain the image classification result output by the neural network model; and a neural network model training unit for training the neural network model;
  • the neural network model training unit includes:
  • the first acquiring unit is configured to acquire codewords from the memory, where the codewords correspond to the first weight matrix of the neural network model;
  • the first training unit is used to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix; wherein the training data includes positive sample images and negative sample images;
  • the update unit is used to update the codeword to obtain the updated codeword when the preset stop condition is not satisfied after the neural network model outputs the probability that the training data is a positive sample image;
  • the storage unit is used to store the updated codeword in the memory
  • the second training unit is used to determine the weight matrix of the neural network model as the second weight matrix by using the updated codeword obtained in the memory, and to train the second weight matrix by using the training data;
  • the stop unit is used to stop the training of the neural network model when the preset stop condition is met.
  • the present application also provides a text translation device.
  • the device includes: a text acquisition unit for acquiring a text to be translated; a text translation unit for inputting the text to be translated into a trained neural network model to obtain the text translation result output by the neural network model; and a neural network model training unit for training the neural network model;
  • the neural network model training unit includes:
  • the first acquiring unit is configured to acquire codewords from the memory, where the codewords correspond to the first weight matrix of the neural network model;
  • the first training unit is used to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix; wherein the training data is sample text;
  • the update unit is used to update the codeword to obtain the updated codeword when the preset stopping condition is not met after the neural network model outputs the translation result of the sample text;
  • the storage unit is used to store the updated codeword in the memory
  • the second training unit is used to determine the weight matrix of the neural network model as the second weight matrix by using the updated codeword obtained in the memory, and to train the second weight matrix by using the training data;
  • the stop unit is used to stop the training of the neural network model when the preset stop condition is met.
  • this application also provides a neural network model training device, the neural network model training device including: a memory and a processor;
  • the memory is used to store instructions; the processor is used to execute the instructions in the memory and execute the neural network model training method in the first aspect and any one of its possible implementations.
  • the present application also provides an image classification device, which includes: a memory and a processor;
  • the memory is used to store instructions; the processor is used to execute the instructions in the memory to execute the image classification method in the second aspect described above.
  • this application also provides a text translation device, which includes: a memory and a processor;
  • the memory is used to store instructions; the processor is used to execute the instructions in the memory and execute the text translation method in the third aspect.
  • this application also provides a computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the neural network model training method in the first aspect or any of its possible implementations, the image classification method in the second aspect, or the text translation method in the third aspect.
  • in the embodiments of this application, the codeword corresponding to the first weight matrix of the neural network model is first obtained from memory; the weight matrix of the neural network model is then determined as the first weight matrix according to the codeword and trained with the training data.
  • when the preset stop condition is not met, the codeword is updated to obtain an updated codeword, which is stored in memory; the updated codeword obtained from memory is then used to determine the weight matrix of the neural network model as the second weight matrix, which is trained with the training data. Training of the neural network model stops when the preset stop condition is met.
  • in this way, the weight matrix is no longer read directly from memory; instead, the codewords corresponding to the weight matrix are read in to form the weight matrix for training. Because the memory space occupied by the codewords is much smaller than that occupied by the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck problem is overcome.
  • moreover, this application no longer calculates the update amount of the weight matrix during model training; it calculates the update amount of the codewords and re-determines the new weight matrix for subsequent training, reducing the amount of intermediate-parameter computation during updates and enabling neural network model training to proceed smoothly in resource-constrained scenarios.
  • FIG. 1 is a schematic diagram of the artificial intelligence main framework provided by an embodiment of this application;
  • FIG. 2 is an example diagram of a system architecture to which an embodiment of this application is applied;
  • FIG. 3 is a flowchart of a neural network model training method provided by an embodiment of this application;
  • FIG. 4 is a schematic diagram of determining the first weight matrix of a neural network model according to codewords, provided by an embodiment of this application;
  • FIG. 5 is a schematic diagram of codeword updating provided by an embodiment of this application;
  • FIG. 6 is a flowchart of an image classification method provided by an embodiment of this application;
  • FIG. 7 is a flowchart of a text translation method provided by an embodiment of this application;
  • FIG. 8 is a structural block diagram of a neural network model training device provided by an embodiment of this application;
  • FIG. 9 is a structural block diagram of an image classification device provided by an embodiment of this application;
  • FIG. 10 is a structural block diagram of a text translation device provided by an embodiment of this application;
  • FIG. 11 is a schematic structural diagram of a neural network model training device provided by an embodiment of this application;
  • FIG. 12 is a schematic structural diagram of an image classification device provided by an embodiment of this application;
  • FIG. 13 is a schematic structural diagram of a text translation device provided by an embodiment of this application.
  • the embodiments of this application provide a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices, which can reduce the amount of weight matrix data read during neural network model training and the amount of intermediate-parameter computation during weight matrix updates.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the "intelligent information chain" reflects a series of processes from data acquisition to processing: for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through a condensing process of "data, information, knowledge, wisdom".
  • the "IT value chain" is the industrial ecological process of artificial intelligence, from the underlying infrastructure and information (providing and processing technology realizations) to the system, reflecting the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through a basic platform.
  • the basic platform includes distributed computing frameworks, networks, and related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The main application fields include intelligent terminals, intelligent transportation, smart medical care, autonomous driving, safe city, and so on.
  • the embodiment of the present application relates to the training process of the neural network model, in order to facilitate understanding, the following first introduces related terms and concepts of the neural network model that may be involved in the embodiment of the present application.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes inputs x_s and an intercept of 1 as inputs.
  • the output of the arithmetic unit can be: h(x) = f(sum over s of W_s * x_s + b),
  • where W_s is the weight of x_s,
  • b is the bias of the neural unit, and
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network and convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
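The neural unit described above can be written directly, using sigmoid as the activation function as the passage suggests (the function name is illustrative):

```python
import math

def neuron_output(xs, ws, b):
    """Single neural unit: weighted sum of inputs plus bias, passed
    through a sigmoid activation f(z) = 1 / (1 + e^(-z))."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))
```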
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • Taking the loss function, an important equation, as an example: the higher the output value (loss) of the loss function, the greater the difference between the output and the target, so training the deep neural network becomes a process of reducing this loss as much as possible.
  • the neural network can use an error back propagation (BP) algorithm to correct the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation motion dominated by error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • This application can be applied to the field of artificial intelligence, and a system structure of the neural network model training application in the embodiments of the application will be introduced below.
  • FIG. 2 shows an example diagram of a system architecture applied by an embodiment of the present application.
  • a memory 201 is connected to the processor 202
  • the processor 202 is connected to the AI hardware accelerator 203.
  • the above-mentioned "connection" may be a direct connection or an indirect connection.
  • the memory 201 is one of the important components in a computer; it is the bridge through which the external storage and the processor 202 communicate, and all programs in the computer run in memory.
  • the processor 202 may be a central processing unit (CPU), which is used to allocate acceleration tasks and the like to the AI hardware accelerator 203 mounted on it.
  • the AI hardware accelerator 203 may be an independent chip, or may be integrated into a system on chip (system on chip, SoC) as a functional module. It mainly includes a matrix calculation unit (cube unit), a vector calculation unit (vector unit) and a buffer (buffer).
  • the matrix calculation unit is used to complete matrix-matrix multiplication calculations, such as the gradient calculations in the neural network and the matrix multiplications corresponding to the convolutional and fully connected layers. Specifically, when performing convolutional-layer or fully-connected-layer operations, the matrix calculation unit reads the corresponding data from the data buffer unit and the parameter buffer unit, where the parameter data read from the parameter buffer unit is transported to the parameter buffer unit through the memory read-write controller. During transport, the parameter data must first be decompressed by the decompression engine; the matrix multiplication can then be performed on the matrix calculation unit, and the partial or final matrix result is stored in the accumulator.
  • the vector processing unit can further process the output of the matrix calculation unit when required, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. It is mainly used for layers of the neural network other than the convolutional and fully connected layers, such as the activation function (rectified linear unit, ReLU) layer and the pooling layer.
  • the cache is used to save the data loaded from the memory into the AI hardware accelerator and the intermediate data generated during the calculation process.
  • the neural network model training process applied to the AI hardware accelerator 203 in this application is as follows:
  • the AI hardware accelerator 203 first obtains the codeword and index corresponding to the initial weight matrix of the neural network model from the memory 201 through the processor 202, then determines the weight matrix of the neural network model according to the obtained codeword and index, and uses the training data to train the weight matrix.
  • When the preset stopping condition is not met, the codeword is updated, and the updated codeword together with the previously obtained index is used to re-determine the weight matrix for another round of model training. This repeats: the codeword is updated and the weight matrix is re-determined with the updated codeword for retraining, until the stopping condition is met.
  • an embodiment of the present application provides a neural network model training method, which can be applied to the AI hardware accelerator 203. As shown in FIG. 3, the method includes:
  • In order to overcome the memory bottleneck problem in neural network model training, the weight matrix is no longer repeatedly loaded; instead, codewords are obtained from memory for model training, where the codewords correspond to the first weight matrix of the neural network model.
  • the initial weight matrix needs to be divided to determine the codewords corresponding to it. That is, the initial weight matrix of the neural network model must be preprocessed in advance and split into corresponding codewords and indexes, where a codeword refers to the representation of an appearance state in the dictionary. In this application, each codeword refers to a sub-matrix of the weight matrix, and the memory space occupied by the codewords is much smaller than the memory space occupied by the weight matrix.
  • the index characterizes the correspondence between the codewords and the weight matrix of the neural network model, and the weight matrices contained in each layer of the neural network model correspond to their own codewords and indexes respectively.
  • the codeword corresponding to the initial weight matrix of the neural network model can then be obtained from the memory to perform the subsequent steps S302-S306 and complete the model training, without directly reading the initial weight matrix for training. Because the data storage space occupied by the codeword is much smaller than that occupied by the weight matrix, this greatly reduces the amount of data read from memory and effectively overcomes the memory bottleneck problem.
  • the specific implementation process of preprocessing the initial weight matrix of the neural network model and splitting it into corresponding codewords and indexes may include the following steps A1-A3:
  • Step A1: Divide the initial weight matrix into k sub-matrices of the same dimension, and determine the index numbers corresponding to the k sub-matrices of the same dimension, where k is a positive integer greater than 1.
  • preprocessing the initial weight matrix of the neural network model means preprocessing the initial weight matrix contained in each layer of the model separately, so that the initial weights contained in each layer correspond to their own codewords and indexes. It should be noted that in the following, this embodiment takes the initial weight matrix contained in a certain layer of the neural network model as the example to introduce how to preprocess the initial weight matrix to obtain its corresponding codeword and index and how to process them subsequently; the processing of the initial weight matrices contained in other layers is similar and will not be repeated one by one.
  • this application first divides the initial weight matrix into k sub-matrices of the same dimension, and determines the index number corresponding to each sub-matrix.
  • the two are in one-to-one correspondence (that is, one sub-matrix corresponds to one index number). For example, the index numbers corresponding to the k sub-matrices of the same dimension can be defined as i0, i1, ..., ik-1, respectively, for use in step A3, where k is a positive integer greater than 1.
  • Step A2: Cluster the k sub-matrices of the same dimension to obtain n codewords corresponding to them, and determine the index values corresponding to the n codewords, where n is a positive integer and n ≤ k.
  • the k sub-matrices of the same dimension can be further clustered to obtain n category centers (that is, n codewords), where n is a positive integer and n ≤ k.
  • the sub-matrix corresponding to the center of each category (i.e., the codeword) can be used to characterize each sub-matrix in the category to which it belongs.
  • the index values corresponding to the n codewords can be further determined, with the two in one-to-one correspondence (that is, one codeword corresponds to one index value). For example, the index values corresponding to the n codewords can be defined as 1, 2, ..., n, respectively, for use in step A3.
  • an optional implementation of "clustering the k sub-matrices of the same dimension to obtain the n codewords corresponding to them" in step A2 may include the following steps A21-A23:
  • Step A21: Reduce each of the k sub-matrices of the same dimension to a one-dimensional vector, obtaining k one-dimensional vectors.
  • for example, if the k sub-matrices of the same dimension contain a 2×3 matrix [[a1, a2, a3], [a4, a5, a6]], it can be reduced to a one-dimensional vector containing 6 elements: [a1, a2, a3, a4, a5, a6].
  • Step A22: Divide the k one-dimensional vectors into n vector groups, where each vector group contains at least one one-dimensional vector.
  • the k one-dimensional vectors can be further grouped; for example, vectors with close element values can be placed in the same vector group, so that each vector group contains at least one one-dimensional vector.
  • Step A23: Average the element values at corresponding positions across all one-dimensional vectors belonging to the i-th vector group to obtain the codeword corresponding to all one-dimensional vectors in the i-th vector group, where i takes each integer from 1 to n.
  • the center vector of each vector group can be further determined to determine the codeword corresponding to the vector group.
  • for example, if the i-th vector group contains 3 one-dimensional vectors, namely [a1, a2, a3, a4, a5, a6], [b1, b2, b3, b4, b5, b6], and [c1, c2, c3, c4, c5, c6], the element values at corresponding positions in these three vectors can be averaged to obtain a one-dimensional average vector: [(a1+b1+c1)/3, (a2+b2+c2)/3, (a3+b3+c3)/3, (a4+b4+c4)/3, (a5+b5+c5)/3, (a6+b6+c6)/3].
  • the one-dimensional vector is the codeword corresponding to the i-th vector group, and the length of the codeword is 6, which is the number of elements contained in the one-dimensional vector.
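  • Steps A1 and A21-A23 can be sketched in Python with numpy as follows. This is only an illustration: a few iterations of a Lloyd-style k-means loop stand in for the grouping step, since the embodiment does not mandate a particular clustering algorithm, and the matrix values are made up:

```python
import numpy as np

def split_and_cluster(weights, sub_shape, n):
    """Step A1: split `weights` into k equal sub-matrices; step A21: flatten
    each into a one-dimensional vector; steps A22-A23: group the vectors and
    average each group into a codeword (the group's center vector)."""
    r, c = sub_shape
    rows, cols = weights.shape
    subs = np.stack([weights[i:i + r, j:j + c].ravel()
                     for i in range(0, rows, r) for j in range(0, cols, c)])
    # simple grouping: a few iterations of Lloyd-style k-means
    centers = subs[np.linspace(0, len(subs) - 1, n).astype(int)].copy()
    for _ in range(10):
        dist = ((subs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)              # index value for each sub-matrix
        for g in range(n):
            if (assign == g).any():
                centers[g] = subs[assign == g].mean(axis=0)   # step A23: average
    return centers, assign                        # n codewords and the index

W = np.arange(24, dtype=float).reshape(4, 6)      # illustrative initial weights
codewords, index = split_and_cluster(W, sub_shape=(2, 3), n=2)
```

  • Here k = 4 sub-matrices of dimension 2×3 are compressed into n = 2 codewords of length 6, and `index` records which codeword represents each sub-matrix.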
  • Step A3: Determine the n codewords as the codewords corresponding to the initial weight matrix, and form the index corresponding to the initial weight matrix from the index values corresponding to the n codewords and the index numbers corresponding to those index values.
  • after the initial weight matrix is divided into k sub-matrices of the same dimension through step A1 and the index numbers corresponding to these k sub-matrices are determined, the n codewords corresponding to the k sub-matrices are obtained through step A2.
  • the n codewords can then be determined as the codewords corresponding to the initial weight matrix for storage, or the n codewords can be formed into a dictionary for storage.
  • the index values corresponding to these n codewords (such as 1, 2, ..., n) and the index numbers corresponding to the index values (such as i0, i1, ..., ik-1) can be formed into the index corresponding to the initial weight matrix.
  • the sub-matrix and the index number are in one-to-one correspondence (that is, one sub-matrix corresponds to one index number), and the codeword and the index value are in one-to-one correspondence (that is, one codeword corresponds to one index value).
  • since a codeword is the center vector of a vector group (such as the average vector), and each vector in the vector group corresponds to a sub-matrix, one codeword can correspond to multiple sub-matrices, and accordingly one index value may correspond to multiple index numbers.
  • the codeword can be stored in the memory.
  • the storage space occupied by the codeword and the index is much smaller than that occupied by the weight matrix, which can greatly reduce the amount of parameter data held in memory.
  • for example, the corresponding codeword and index can total 14.45MB, where the codeword is 1.16MB and the index is 13.29MB, for a compression ratio of close to 40 times.
  • codeword and index corresponding to the initial weight matrix can also be stored in an external memory (such as a hard disk), and then input into the memory from the external memory.
  • the embodiment of this application does not restrict the specific storage location.
  • S302: Determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix.
  • the new weight matrix of the neural network model can then be determined according to the obtained codeword (here defined as the first weight matrix).
  • An optional implementation is to obtain the index that characterizes the correspondence between the codeword and the weight matrix of the neural network model; the acquired codeword, index, and their correspondence with the weight matrix can then be used to determine the first weight matrix of the neural network model.
  • the first weight matrix of the neural network model can be decoded by using the one-to-one correspondence between codeword and index value, the one-to-many relationship between index value and index number, and the one-to-one correspondence between index number and sub-matrix, and the training data can then be used to train it.
  • the specific calculation formula is as follows: W = D_mat × I_oh, where:
  • D_mat represents the dictionary matrix composed of the codewords; the dimension of this matrix is c × n, where c represents the length of a codeword and n represents the number of codewords;
  • I_oh represents the one-hot matrix formed by the index; the dimension of this matrix is n × k, where, based on the characteristics of a one-hot matrix, each column has the value 1 only at the row corresponding to the index value of that index number, and 0 elsewhere; W is the decoded first weight matrix.
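  • The decoding step can be illustrated with a small numpy sketch; the shapes (c = 6, n = 2, k = 4), the dictionary values, and the index assignment are made up for illustration:

```python
import numpy as np

c, n, k = 6, 2, 4                       # codeword length, codeword count, sub-matrix count
D_mat = np.arange(c * n, dtype=float).reshape(c, n)   # dictionary, one codeword per column
index = np.array([0, 0, 1, 1])          # index value (codeword id) for each index number
I_oh = np.zeros((n, k))
I_oh[index, np.arange(k)] = 1.0         # one-hot matrix: a single 1 per column
W = D_mat @ I_oh                        # decoded first weight matrix, shape c x k
```

  • Each column of W is a copy of the codeword selected by the corresponding index number; reshaping the columns back to the sub-matrix dimension reproduces the weight matrix.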
  • the left figure shows a dictionary composed of n codewords, which are: the first codeword, the second codeword,..., the nth codeword
  • the middle figure is the index, composed of the index values corresponding to the n codewords (i.e., 1, 2, ..., n) and the index numbers corresponding to the index values (i.e., i0, i1, ..., ik-1).
  • the index value corresponding to the first codeword in the dictionary is "1";
  • this index value corresponds to two index numbers, namely the index number in the first row, first column and the index number in the second row, second column of the index map;
  • these two index numbers respectively correspond to two sub-matrices in the weight matrix of the neural network model, namely the sub-matrix represented by the gray square in the first row, first column and the sub-matrix represented by the light gray square in the second row, second column of the weight matrix diagram on the right;
  • this correspondence can therefore be used to decode the sub-matrix in the first row, first column and the sub-matrix in the second row, second column of the weight matrix according to the first codeword in the dictionary.
  • the index value corresponding to the second codeword is "2";
  • this index value corresponds to one index number, namely the index number in the third row, first column of the index map;
  • this index number corresponds to one sub-matrix in the weight matrix of the neural network model, namely the sub-matrix represented by the dark gray square in the third row, first column of the weight matrix diagram on the right; this correspondence can then be used to decode the sub-matrix in the third row, first column of the weight matrix according to the second codeword in the dictionary, and so on: using each codeword in the dictionary and the correspondence among the codewords, the index, and the sub-matrices of the weight matrix, the entire weight matrix of the neural network model can be decoded.
  • training data can be used to train the weight matrix.
  • since the codewords are generated by clustering and averaging the sub-matrices of the initial matrix through the above steps A21-A23, the first weight matrix of the neural network model determined from the codewords and the index occupies the same data space as the initial weight matrix, but the weight elements contained in the two matrices are not completely identical; their weight values are nevertheless very close, so the determined weight matrix of the neural network model can be used in place of the initial weight matrix for model training.
  • the preset stopping condition refers to the conditions that need to be met to stop training. It can be that the difference between the result label value of the training data and the model's output on the training data is lower than a preset difference; it can be that the rate of change of that difference is lower than a preset change threshold; it can be that the number of updates of the model parameters reaches a preset count (such as 100 times); or it can be that the output value (loss) of the loss function, which characterizes the difference between the model output and the target value, reaches a preset threshold (such as 0.1).
  • if the preset stopping condition is not met, the codeword needs to be updated according to the current training result to obtain the updated codeword, which is used to retrain the model through the subsequent step S304.
  • the specific implementation process of this step S303 may include the following steps B1-B2:
  • Step B1: When the preset stopping condition is not met, determine the weight gradient of the first weight matrix of the neural network model.
  • when the preset stopping condition is not satisfied, for example when the loss value does not reach the preset threshold, backward calculation can be performed with the loss value to determine the weight gradient of the first weight matrix of the neural network model (here defined as g_w), which is used to execute the subsequent step B2.
  • Step B2: Determine the codeword gradient according to the weight gradient and the index, and determine the updated codeword according to the codeword gradient.
  • the codeword gradient can then be determined according to the correspondence among the codeword, the index, and the weight matrix. Specifically, using the one-to-one correspondence between codeword and index value, the one-to-many relationship between index value and index number, and the one-to-one correspondence between index number and sub-matrix, the weight gradients in the sub-matrices corresponding to the index numbers belonging to the same codeword can be processed to obtain the codeword gradient corresponding to that codeword.
  • the specific calculation formula is as follows: g_D = g_w × I_oh^T, where:
  • g_D represents the codeword gradient;
  • I_oh^T represents the transpose of the one-hot matrix I_oh formed by the index;
  • g_w represents the weight gradient of the first weight matrix of the neural network model.
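  • Continuing the same illustrative shapes as before, the gradient aggregation can be sketched as follows; multiplying the weight gradient by the transposed one-hot matrix sums the gradients of all sub-matrices that share a codeword:

```python
import numpy as np

c, n, k = 6, 2, 4
index = np.array([0, 0, 1, 1])          # which codeword each sub-matrix uses
I_oh = np.zeros((n, k))
I_oh[index, np.arange(k)] = 1.0
g_w = np.ones((c, k))                   # weight gradient of the first weight matrix
g_D = g_w @ I_oh.T                      # codeword gradient, shape c x n
```

  • With two sub-matrices per codeword and unit gradients, every entry of g_D is 2, i.e., the sum over the sub-matrices sharing that codeword.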
  • step B2 may include the following steps B21-B23:
  • Step B21: Perform a weighted summation of the weight gradients of the sub-matrices corresponding to the index numbers of the j-th codeword to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n.
  • specifically, it is necessary to process the weight gradients in the sub-matrices corresponding to the index numbers of the same codeword, and determine the update amount corresponding to that codeword according to the processing result; j can be any integer from 1 to n.
  • Step B22: Optimize the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword.
  • the codeword gradient can then be optimized to obtain the update amount of the j-th codeword.
  • the widely used Adam optimizer can be used to optimize the codeword gradient corresponding to the jth codeword to obtain the update amount of the jth codeword.
  • four intermediate parameters are generated during the optimization process, namely the first-order momentum m_t, the second-order momentum v_t, the first-order momentum correction m̂_t, and the second-order momentum correction v̂_t. It should be noted that the data amount of each of these four intermediate parameters (i.e., m_t, v_t, m̂_t, v̂_t) is consistent with the number of codeword gradients of the j-th codeword.
  • Step B23: Use the update amount of the j-th codeword to update the j-th codeword to obtain the updated j-th codeword.
  • the update amount of the jth codeword can be further used to update the jth codeword.
  • for example, the update amount can be subtracted from the j-th codeword, and the result is used as the updated j-th codeword for the subsequent step S305.
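  • Steps B22-B23 can be sketched for a single codeword with the standard Adam update rule; the hyperparameter values are the usual Adam defaults and the gradient values are made up, neither being mandated by the embodiment:

```python
import numpy as np

def adam_update_codeword(codeword, g, m, v, t, lr=1e-3,
                         beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a codeword. m and v are the first- and second-order
    momenta; their bias-corrected values are the other two intermediate
    parameters, each the same size as the codeword gradient g."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)                   # first-order momentum correction
    v_hat = v / (1 - beta2 ** t)                   # second-order momentum correction
    update = lr * m_hat / (np.sqrt(v_hat) + eps)   # step B22: the update amount
    return codeword - update, m, v                 # step B23: apply the update

cw = np.zeros(6)                                   # illustrative codeword
g = np.ones(6)                                     # codeword gradient from step B21
cw, m, v = adam_update_codeword(cw, g, m=np.zeros(6), v=np.zeros(6), t=1)
```
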
  • the right figure is the weight gradient of the weight matrix
  • the middle figure is the index, composed of the index values corresponding to the n codewords (i.e., 1, 2, ..., n) and the index numbers corresponding to the index values (i.e., i0, i1, ..., ik-1).
  • the index value corresponding to the first codeword in the dictionary is "1";
  • this index value corresponds to two index numbers, namely the index number in the first row, first column and the index number in the second row, second column of the index map; these two index numbers correspond to the weight gradient in the first row, first column and the weight gradient in the second row, second column of the weight-gradient diagram, and these two weight gradients can be weighted and summed to obtain the codeword gradient corresponding to the first codeword.
  • the index value corresponding to the second codeword is "2";
  • this index value corresponds to two index numbers, namely the index number in the third row, first column and the index number in the fourth row, third column of the index map;
  • these two index numbers respectively correspond to the weight gradients of two sub-matrices in the weight gradient of the weight matrix, namely the weight gradient in the third row, first column and the weight gradient in the fourth row, third column; these two weight gradients can then be weighted and summed to obtain the codeword gradient corresponding to the second codeword, and so on, until the codeword gradient corresponding to each codeword is determined.
  • the Adam optimizer can be used to optimize the gradient of each codeword to obtain the update amount of each codeword, and then use the update amount of each codeword to update each codeword to obtain each updated codeword.
  • the updated codeword can be further stored in the memory to execute the subsequent step S305.
  • S305: Determine the weight matrix of the neural network model as the second weight matrix by using the updated codeword obtained from the memory, and use the training data to train the second weight matrix.
  • the updated codeword obtained from the memory can then be used to re-determine the new weight matrix of the neural network model by performing step S302 (here defined as the second weight matrix, replacing the first weight matrix introduced in step S302), and the training data is used to perform the next round of model training on the second weight matrix of the neural network model.
  • for the specific implementation process, please refer to the introduction of step S302, which will not be repeated here.
  • the weight matrix of the new neural network model is re-determined through the above steps S303-S305 for the next round of model training.
  • S306: Stop the training of the neural network model when the preset stopping condition is met.
  • if the preset stopping condition is still not met, the codeword needs to be updated again according to the result of this round of model training to obtain the updated codeword, for retraining the model again through the above step S305.
  • the codeword is updated and the subsequent steps (i.e., steps S303 and S305) are repeated until the preset stopping condition is met, at which point the training of the neural network model is stopped.
  • with the above neural network model training method, when training the neural network model, the codeword corresponding to the first weight matrix of the neural network model is first obtained from the memory; the weight matrix of the neural network model is then determined as the first weight matrix according to the codeword, and the training data is used to train the first weight matrix.
  • when the preset stopping condition is not met, the codeword is updated to obtain the updated codeword, and the updated codeword is stored in the memory.
  • then, the weight matrix of the neural network model is determined as the second weight matrix using the updated codeword obtained from the memory, the second weight matrix is trained using the training data, and when the preset stopping condition is met, the training of the neural network model is stopped.
  • the weight matrix is no longer directly read from the memory, but the codeword and index corresponding to the weight matrix are read in to form the weight matrix for training.
  • the memory space occupied by the codeword and index is much smaller than that occupied by the weight matrix, so the amount of data read from the memory can be greatly reduced and the memory bottleneck problem can be overcome.
  • in addition, this application no longer calculates the update amount of the weight matrix during model training, but calculates the update amount of the codeword to re-determine the new weight matrix for subsequent training, thereby reducing the amount of calculation of intermediate parameters in the update process and enabling neural network model training to proceed smoothly in resource-constrained scenarios.
  • the data volume of the weight matrix read is 528MB
  • the weight gradient generated is 528MB
  • the data volume of each of the four intermediate parameters (i.e., m_t, v_t, m̂_t, v̂_t) generated during optimization is likewise 528MB, so the total memory space required is about 3.17GB.
  • VGG16 is trained using the model training method provided in this application
  • the codeword and index corresponding to the weight matrix are read in.
  • the two are 14.45MB in total, of which the codeword is 1.16MB and the index is 13.29MB.
  • the generated codeword gradient is 1.16MB
  • the data volume of each of the four intermediate parameters (i.e., m_t, v_t, m̂_t, v̂_t) generated when optimizing it is likewise 1.16MB, so the total memory space required is 20.25MB; compared with 3.17GB, the amount of data involved in calculation is greatly reduced.
  • the data volume of the read-in weight matrix is 471MB
  • the weight gradient generated is 471MB
  • the data volume of each of the four intermediate parameters (i.e., m_t, v_t, m̂_t, v̂_t) is likewise 471MB, so the total memory space required is 2.76GB.
  • the codeword and index corresponding to the weight matrix are read in. The two are 11.46MB in total, of which the codeword is 0.12MB and the index is 11.34MB.
  • the generated codeword gradient is 0.12MB, and the data volume of each of the four intermediate parameters (i.e., m_t, v_t, m̂_t, v̂_t) generated when optimizing it is likewise 0.12MB, so the total memory space required is 12.06MB.
  • the calculated data volume is also greatly reduced.
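  • The memory totals in both comparisons follow from simple arithmetic: the parameters read in, plus one gradient, plus four Adam intermediate parameters assumed to be the same size as the gradient. An illustrative check (sizes in MB, taken from the figures above):

```python
def training_memory_mb(param_mb, grad_mb=None):
    """Parameters read in + their gradient + four Adam intermediate
    parameters, each assumed to be the same size as the gradient."""
    grad_mb = param_mb if grad_mb is None else grad_mb
    return param_mb + grad_mb + 4 * grad_mb

conventional = training_memory_mb(528.0)       # plain weight-matrix training
compressed = training_memory_mb(14.45, 1.16)   # codeword + index training
```

  • 528 × 6 = 3168MB, i.e., about 3.17GB, while the compressed case totals 20.25MB, matching the figures reported above.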
  • the embodiments of the present application also provide an image classification method. Based on the neural network model training method provided in the above embodiments, the neural network model obtained according to the neural network model training method can be applied to image classification.
  • the figure is a flowchart of an image classification method provided by an embodiment of this application, and the method may include:
  • S602 Input the image to be classified into the trained neural network model to obtain the image classification result output by the neural network model.
  • the image to be classified is first acquired, and the image to be classified is input into a pre-trained neural network model to obtain an image classification result corresponding to the image to be classified.
  • the neural network model can output not only the classification result corresponding to the image to be classified, but also the probability value corresponding to each classification result, so that the user can directly understand the classification of the image to be classified.
  • for example, the neural network model used may be a model that can classify medical imaging images; the specific classification result of a medical imaging image can be obtained by inputting the medical imaging image (or its corresponding feature map) into the neural network model. For instance, it can be recognized whether the input medical imaging image carries a certain feature or has a certain classification result, or does not carry that feature or have that classification result.
  • the training process of the neural network model includes:
  • the training data in this embodiment may include positive sample images and negative sample images.
  • a positive sample image refers to an image to be trained that carries a certain characteristic or has a certain classification result
  • the result label value of the positive sample image can be 1.
  • a negative sample image refers to an image to be trained that does not carry a certain feature or does not have a certain classification result; and the result label value of the negative sample image can be 0.
  • the output result of the current neural network model to be trained on the training data may be obtained by inputting the training data into the current model to be trained; the output is the probability that the training data is a positive sample image.
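  • The embodiment does not name a specific loss for scoring the positive-sample probability against the 1/0 result label values; a binary cross-entropy loss is one common choice and is sketched here for illustration, with made-up probability values:

```python
import math

def binary_loss(p, label):
    """Binary cross-entropy between the model's positive-sample probability p
    and the 1/0 result label value; smaller means the prediction is closer."""
    eps = 1e-12                     # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

loss_pos = binary_loss(0.9, 1)      # confident and correct on a positive sample
loss_neg = binary_loss(0.9, 0)      # confident but wrong on a negative sample
```
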
  • when the first weight matrix is the initial weight matrix, the method further includes:
  • the initial weight matrix is divided to determine codewords corresponding to the initial weight matrix.
  • the dividing the initial weight matrix to determine the codeword corresponding to the initial weight matrix includes:
  • the n codewords are determined as the codewords corresponding to the initial weight matrix.
  • performing clustering processing on the k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension includes:
  • the method further includes:
  • the weight matrix of the neural network model is released in the memory.
  • updating the codeword to obtain the updated codeword includes:
  • the determining the codeword gradient according to the weight gradient, and determining the updated codeword according to the codeword gradient includes:
  • the jth codeword is updated by using the update amount of the jth codeword to obtain the updated jth codeword.
  • the method further includes:
  • the index is a correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset difference
  • the change rate of the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset change threshold
  • the update times of the model parameters in the neural network model reach the preset update times
  • the output value of the loss function used by the neural network model reaches a preset threshold; the loss function is used to measure the difference between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • a pre-trained neural network model is used to classify the image to be classified. Since the neural network model can reach the global optimum, the classification result output by the neural network model is more accurate, improving the accuracy of image classification.
  • the embodiments of the present application also provide a text translation method. Based on the neural network model training method provided in the above embodiments, the neural network model obtained according to the neural network model training method can be applied to text translation.
  • FIG. 7 is a flowchart of a text translation method provided by an embodiment of this application, and the method may include:
  • S701: Obtain the text to be translated.
  • S702: Input the text to be translated into the trained neural network model, and obtain the text translation result output by the neural network model.
  • the text to be translated is first obtained, and the text to be translated is input into a pre-trained neural network model to obtain a text translation result corresponding to the text to be translated.
  • for example, English text can be translated into Chinese text, with the Chinese translation result output by a pre-trained neural network model; or Chinese text can be translated into German text, with the German translation result output by a pre-trained neural network model. This application does not limit the languages involved in translation.
  • the neural network model used is a model that can translate the text to be translated, and the specific translation result of the text can be obtained by inputting the text to be translated (or its corresponding feature vector) into the neural network model.
  • for example, input English text can be translated into a Chinese text translation result or a German text translation result.
  • the training process of the neural network model includes:
  • when the first weight matrix is the initial weight matrix, the method further includes:
  • the initial weight matrix is divided to determine codewords corresponding to the initial weight matrix.
  • the dividing the initial weight matrix to determine the codeword corresponding to the initial weight matrix includes:
  • the n codewords are determined as the codewords corresponding to the initial weight matrix.
  • performing clustering processing on the k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension includes:
  • the method further includes:
  • the weight matrix of the neural network model is released in the memory.
  • updating the codeword to obtain the updated codeword includes:
  • the determining the codeword gradient according to the weight gradient, and determining the updated codeword according to the codeword gradient includes:
  • the jth codeword is updated by using the update amount of the jth codeword to obtain the updated jth codeword.
  • the method further includes:
  • index is a correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than a preset difference value
  • the change rate of the difference between the result label value corresponding to the training data and the output result of the neural network model to the training data is lower than a preset change threshold
  • the update times of the model parameters in the neural network model reach the preset update times
  • the output value of the loss function used by the neural network model reaches a preset threshold; the loss function is used to measure the difference between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • a pre-trained neural network model is used to translate the text to be translated. Since the neural network model can reach the global optimum, the translation result output by the neural network model is more accurate, and the accuracy of the translation result is improved.
  • an embodiment of the present application provides a neural network model training device 800.
  • the device 800 may include: a first acquiring unit 801, a first training unit 802, an updating unit 803, a storage unit 804, a second training unit 805, and a stopping unit 806.
  • the first obtaining unit 801 is configured to support the apparatus 800 to execute S301 in the embodiment shown in FIG. 3.
  • the first training unit 802 is used to support the device 800 to perform S302 in the embodiment shown in FIG. 3.
  • the update unit 803 is used to support the device 800 to execute S303 in the embodiment shown in FIG. 3.
  • the storage unit 804 is used to support the device 800 to execute S304 in the embodiment shown in FIG. 3.
  • the second training unit 805 is used to support the device 800 to execute S305 in the embodiment shown in FIG. 3.
  • the stopping unit 806 is used to support the device 800 to execute S306 in the embodiment shown in FIG. 3. Specifically:
  • the first obtaining unit 801 is configured to obtain codewords from the memory, where the codewords correspond to the first weight matrix of the neural network model;
  • the first training unit 802 is configured to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix;
  • the update unit 803 is configured to update the codeword when the preset stop condition is not met, and obtain the updated codeword
  • the storage unit 804 is configured to store the updated codeword in the memory
  • the second training unit 805 is configured to determine the weight matrix of the neural network model as the second weight matrix by using the updated codewords obtained in the memory, and use the training data to train the second weight matrix;
  • the stopping unit 806 is used to stop the training of the neural network model when the preset stopping condition is met.
  • when the first weight matrix is the initial weight matrix, the device further includes:
  • the dividing unit is used to divide the initial weight matrix to determine the codeword corresponding to the initial weight matrix.
  • the dividing unit includes:
  • the first division subunit is used to divide the initial weight matrix into k sub-matrices of the same dimension; where k is a positive integer greater than 1;
  • the clustering subunit is used to cluster k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension, where n is a positive integer greater than 0, and n ≤ k;
  • the first determining subunit is used to determine n codewords as codewords corresponding to the initial weight matrix.
  • the clustering subunit includes:
  • the dimensionality reduction subunit is used to reduce the dimensions of k sub-matrices of the same dimension into one-dimensional vectors to obtain k one-dimensional vectors;
  • the second division subunit is used to divide the k one-dimensional vectors into n vector groups, wherein each vector group contains at least one one-dimensional vector;
  • the calculation subunit is used to average the element values at corresponding positions across all one-dimensional vectors belonging to the i-th vector group (among the k one-dimensional vectors) to obtain the codeword corresponding to the i-th vector group; where i takes each integer from 1 to n.
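The dimensionality-reduction, grouping, and averaging sub-steps above can be sketched as follows. The single nearest-centroid assignment pass is an illustrative choice, since the application does not fix a particular grouping algorithm; the function name and seed selection are likewise hypothetical:

```python
import numpy as np

def build_codebook(submatrices, n):
    """Flatten k equal-shaped sub-matrices to 1-D vectors, partition the
    vectors into n groups, and average element-wise within each group to
    obtain n codewords. The nearest-centroid assignment below is only one
    possible grouping rule."""
    k = len(submatrices)
    vecs = np.stack([m.ravel() for m in submatrices])      # k 1-D vectors
    centroids = vecs[np.linspace(0, k - 1, n, dtype=int)]  # seed centroids
    dists = np.linalg.norm(vecs[:, None, :] - centroids[None, :, :], axis=2)
    index = dists.argmin(axis=1)        # group id for each sub-matrix
    codebook = np.stack([vecs[index == i].mean(axis=0)
                         if np.any(index == i) else centroids[i]
                         for i in range(n)])
    return codebook, index

subs = [np.zeros((2, 2)), np.full((2, 2), 0.2),
        np.full((2, 2), 10.0), np.full((2, 2), 10.2)]
codebook, index = build_codebook(subs, n=2)
```

Here the two near-zero blocks collapse into one codeword and the two near-ten blocks into another, with the index recording which codeword replaces each sub-matrix.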
  • the device further includes:
  • the releasing unit is used to release the weight matrix of the neural network model in the memory when the preset stopping condition is not met.
  • the updating unit 803 includes:
  • the second determining subunit is used to determine the weight gradient of the first weight matrix of the neural network model when the preset stopping condition is not met;
  • the third determining subunit is used to determine the codeword gradient according to the first weight gradient, and determine the updated codeword according to the codeword gradient.
  • the third determining subunit includes:
  • the first obtaining subunit is used to perform a weighted summation of the weight gradients of the sub-matrices corresponding to the index number of the j-th codeword in the weight gradient, to obtain the codeword gradient corresponding to the j-th codeword; where j takes each integer from 1 to n;
  • the second obtaining subunit is used to optimize the codeword gradient corresponding to the jth codeword to obtain the update amount of the jth codeword;
  • the third obtaining subunit is used to update the j-th codeword by using the update amount of the j-th codeword to obtain the updated j-th codeword.
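A minimal sketch of the three sub-steps above: summing the gradients of the member sub-matrices per codeword, deriving an update amount, and applying it. Uniform weighting and a plain SGD step stand in for the unspecified weighted summation and gradient optimization; the function name and learning rate are assumptions:

```python
import numpy as np

def update_codebook(codebook, index, weight_grad_blocks, lr=0.01):
    """For each codeword j: sum the weight gradients of all sub-matrices
    whose index entry is j (uniform weights stand in for the unspecified
    weighted summation), then apply a plain SGD step as the 'optimized'
    update amount."""
    new_codebook = codebook.copy()
    for j in range(len(codebook)):
        members = np.where(index == j)[0]
        if members.size == 0:
            continue  # codeword unused by this weight matrix
        cw_grad = sum(weight_grad_blocks[m].ravel() for m in members)
        new_codebook[j] = codebook[j] - lr * cw_grad
    return new_codebook

codebook = np.zeros((2, 4))
index = np.array([0, 0, 1])
grad_blocks = [np.ones((2, 2)), np.ones((2, 2)), np.full((2, 2), 3.0)]
new_codebook = update_codebook(codebook, index, grad_blocks, lr=0.1)
```

Only n codeword updates are computed and stored, rather than one update per weight.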
  • the device further includes:
  • the second acquiring unit is used to acquire an index, where the index is the correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than the preset difference
  • the change rate of the difference between the result label value corresponding to the training data and the output result of the neural network model to the training data is lower than the preset change threshold
  • the update times of the model parameters in the neural network model reach the preset update times
  • the output value of the loss function adopted by the neural network model reaches a preset threshold; wherein, the loss function is used to measure the difference between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • the neural network model training device, when training the neural network model, first obtains the codeword corresponding to the first weight matrix of the neural network model from the memory, then determines the weight matrix of the neural network model as the first weight matrix according to the codeword, and trains the first weight matrix using the training data. When the preset stopping condition is not met, the codeword is updated to obtain the updated codeword, and the updated codeword is stored in the memory. The weight matrix of the neural network model is then determined as the second weight matrix using the updated codeword obtained from the memory, and the second weight matrix is trained using the training data; when the preset stopping condition is met, the training of the neural network model is stopped.
  • in this way, the weight matrix is no longer read directly from the memory; instead, the codeword corresponding to the weight matrix is read in to form the weight matrix for training. The memory space occupied by the codeword is much smaller than that occupied by the weight matrix, so the amount of data read from the memory can be greatly reduced and the memory bottleneck problem can be overcome.
  • in addition, this application no longer calculates the update amount of the weight matrix during model training, but calculates the update amount of the codeword to re-determine the new weight matrix for subsequent training, thereby reducing the amount of computation on intermediate parameters in the update process and enabling neural network model training to proceed smoothly in resource-constrained scenarios.
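The memory-saving argument can be made concrete with hypothetical numbers (none of which appear in the application):

```python
# Hypothetical layer: a 1024x1024 float32 weight matrix split into
# k = 16384 sub-matrices of 8x8, compressed to n = 256 codewords with a
# one-byte index entry per sub-matrix. All numbers are illustrative.
full_bytes = 1024 * 1024 * 4           # full weight matrix
codebook_bytes = 256 * 8 * 8 * 4       # n codewords of 8x8 floats
index_bytes = 16384 * 1                # one byte per sub-matrix
compressed_bytes = codebook_bytes + index_bytes
ratio = full_bytes / compressed_bytes  # data read from memory shrinks ~51x
```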
  • an embodiment of the present application also provides an image classification device 900.
  • the device 900 may include: an image acquisition unit 901, an image classification unit 902, and a neural network model training unit 903.
  • the image acquisition unit 901 is used to support the device 900 to execute S601 in the embodiment shown in FIG. 6.
  • the image classification unit 902 is configured to support the device 900 to execute S602 in the embodiment shown in FIG. 6.
  • the neural network model training unit 903 is used to support the device 900 to execute S301-S306 in the embodiment shown in FIG. 3. Specifically:
  • the image acquisition unit 901 is configured to acquire an image to be classified
  • the image classification unit 902 is configured to input the image to be classified into the trained neural network model to obtain the image classification result output by the neural network model;
  • the neural network model training unit 903 is used to train the neural network model
  • the neural network model training unit 903 includes:
  • the first acquiring unit is used to acquire a codeword from the memory, where the codeword corresponds to the first weight matrix of the neural network model;
  • the first training unit is used to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix; wherein the training data includes positive sample images and negative sample images;
  • the update unit is used to update the codeword to obtain the updated codeword when the preset stopping condition is not satisfied after the neural network model outputs the probability value of the training data as the positive sample image;
  • the storage unit is used to store the updated codeword in the memory
  • the second training unit is used to determine the weight matrix of the neural network model as the second weight matrix by using the updated codeword obtained in the memory, and to train the second weight matrix by using the training data;
  • the stop unit is used to stop the training of the neural network model when the preset stop condition is met.
  • when the first weight matrix is the initial weight matrix, the device further includes:
  • the dividing unit is used to divide the initial weight matrix to determine the codeword corresponding to the initial weight matrix.
  • the dividing unit includes:
  • the first division subunit is used to divide the initial weight matrix into k sub-matrices of the same dimension; where k is a positive integer greater than 1;
  • the clustering subunit is used to cluster k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension, where n is a positive integer greater than 0, and n ≤ k;
  • the first determining subunit is used to determine n codewords as codewords corresponding to the initial weight matrix.
  • the clustering subunit includes:
  • the dimensionality reduction subunit is used to reduce the dimensions of k sub-matrices of the same dimension into one-dimensional vectors to obtain k one-dimensional vectors;
  • the second division subunit is used to divide the k one-dimensional vectors into n vector groups, wherein each vector group contains at least one one-dimensional vector;
  • the calculation subunit is used to average the element values at corresponding positions across all one-dimensional vectors belonging to the i-th vector group (among the k one-dimensional vectors) to obtain the codeword corresponding to the i-th vector group; where i takes each integer from 1 to n.
  • the device further includes:
  • the releasing unit is used to release the weight matrix of the neural network model in the memory when the preset stopping condition is not met.
  • the update unit includes:
  • the second determining subunit is used to determine the weight gradient of the first weight matrix of the neural network model when the preset stopping condition is not met;
  • the third determining subunit is used to determine the codeword gradient according to the first weight gradient, and determine the updated codeword according to the codeword gradient.
  • the third determining subunit includes:
  • the first obtaining subunit is used to perform a weighted summation of the weight gradients of the sub-matrices corresponding to the index number of the j-th codeword in the weight gradient, to obtain the codeword gradient corresponding to the j-th codeword; where j takes each integer from 1 to n;
  • the second obtaining subunit is used to optimize the codeword gradient corresponding to the jth codeword to obtain the update amount of the jth codeword;
  • the third obtaining subunit is used to update the j-th codeword by using the update amount of the j-th codeword to obtain the updated j-th codeword.
  • the device further includes:
  • the second acquiring unit is used to acquire an index, where the index is the correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than the preset difference
  • the change rate of the difference between the result label value corresponding to the training data and the output result of the neural network model to the training data is lower than the preset change threshold
  • the update times of the model parameters in the neural network model reach the preset update times
  • the output value of the loss function adopted by the neural network model reaches a preset threshold; wherein the loss function is used to measure the gap between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • an embodiment of the present application also provides a text translation device 1000.
  • the device 1000 may include: a text acquisition unit 1001, a text translation unit 1002, and a neural network model training unit 1003.
  • the text obtaining unit 1001 is used to support the apparatus 1000 to execute S701 in the embodiment shown in FIG. 7.
  • the text translation unit 1002 is used to support the device 1000 to execute S702 in the embodiment shown in FIG. 7.
  • the neural network model training unit 1003 is used to support the device 1000 to execute S301-S306 in the embodiment shown in FIG. 3. Specifically:
  • the text obtaining unit 1001 is used to obtain the text to be translated
  • the text translation unit 1002 is used to input the text to be translated into the trained neural network model to obtain the text translation result output by the neural network model;
  • the neural network model training unit 1003 is used to train the neural network model
  • the neural network model training unit 1003 includes:
  • the first acquiring unit is configured to acquire codewords from the memory, where the codewords correspond to the first weight matrix of the neural network model;
  • the first training unit is used to determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and use the training data to train the first weight matrix; wherein the training data is sample text;
  • the update unit is used to update the codeword to obtain the updated codeword when the preset stop condition is not met after the neural network model outputs the translation result of the sample text;
  • the storage unit is used to store the updated codeword in the memory
  • the second training unit is used to determine the weight matrix of the neural network model as the second weight matrix by using the updated codewords obtained in the memory, and use training data to train the second weight matrix of the neural network model;
  • the stop unit is used to stop the training of the neural network model when the preset stop condition is met.
  • when the first weight matrix is the initial weight matrix, the device further includes:
  • the dividing unit is used to divide the initial weight matrix to determine the codeword corresponding to the initial weight matrix.
  • the dividing unit includes:
  • the first division subunit is used to divide the initial weight matrix into k sub-matrices of the same dimension; where k is a positive integer greater than 1;
  • the clustering subunit is used to cluster k sub-matrices of the same dimension to obtain n codewords corresponding to the k sub-matrices of the same dimension, where n is a positive integer greater than 0, and n ≤ k;
  • the first determining subunit is used to determine n codewords as codewords corresponding to the initial weight matrix.
  • the clustering subunit includes:
  • the dimensionality reduction subunit is used to reduce the dimensions of k sub-matrices of the same dimension into one-dimensional vectors to obtain k one-dimensional vectors;
  • the second division subunit is used to divide the k one-dimensional vectors into n vector groups, wherein each vector group contains at least one one-dimensional vector;
  • the calculation subunit is used to average the element values at corresponding positions across all one-dimensional vectors belonging to the i-th vector group (among the k one-dimensional vectors) to obtain the codeword corresponding to the i-th vector group; where i takes each integer from 1 to n.
  • the device further includes:
  • the releasing unit is used to release the weight matrix of the neural network model in the memory when the preset stopping condition is not met.
  • the update unit includes:
  • the second determining subunit is used to determine the weight gradient of the first weight matrix of the neural network model when the preset stopping condition is not met;
  • the third determining subunit is used to determine the codeword gradient according to the first weight gradient, and determine the updated codeword according to the codeword gradient.
  • the third determining subunit includes:
  • the first obtaining subunit is used to perform a weighted summation of the weight gradients of the sub-matrices corresponding to the index number of the j-th codeword in the weight gradient, to obtain the codeword gradient corresponding to the j-th codeword; where j takes each integer from 1 to n;
  • the second obtaining subunit is used to optimize the codeword gradient corresponding to the jth codeword to obtain the update amount of the jth codeword;
  • the third obtaining subunit is used to update the j-th codeword by using the update amount of the j-th codeword to obtain the updated j-th codeword.
  • the device further includes:
  • the second acquiring unit is used to acquire an index, where the index is the correspondence between the codeword and the weight matrix of the neural network model.
  • the preset stop condition includes one or more of the following conditions:
  • the difference between the result label value corresponding to the training data and the output result of the neural network model on the training data is lower than the preset difference
  • the change rate of the difference between the result label value corresponding to the training data and the output result of the neural network model to the training data is lower than the preset change threshold
  • the update times of the model parameters in the neural network model reach the preset update times
  • the output value of the loss function adopted by the neural network model reaches a preset threshold; wherein the loss function is used to measure the gap between the output result of the neural network model on the training data and the result label value corresponding to the training data.
  • an embodiment of the present application provides a neural network model training device 1100, which includes a memory 1101, a processor 1102, and a communication interface 1103.
  • the memory 1101 is used to store instructions
  • the processor 1102 is configured to execute instructions in the memory 1101, and execute the above neural network model training method applied in the embodiment shown in FIG. 3;
  • the communication interface 1103 is used for communication.
  • the memory 1101, the processor 1102, and the communication interface 1103 are connected to each other through a bus 1104; the bus 1104 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
  • the processor 1102 is configured to, when training the neural network model, first obtain the codeword corresponding to the first weight matrix of the neural network model from the memory, then determine the weight matrix of the neural network model as the first weight matrix according to the codeword, and train the first weight matrix using the training data. When the preset stopping condition is not met, the codeword is updated to obtain the updated codeword, and the updated codeword is stored in the memory; then the weight matrix of the neural network model is determined as the second weight matrix using the updated codeword obtained from the memory, and the second weight matrix is trained using the training data. When the preset stopping condition is met, the training of the neural network model is stopped.
  • for specific implementation of the processor 1102, please refer to the detailed description of S301, S302, S303, S304, S305, and S306 in the embodiment shown in FIG. 3, which will not be repeated here.
  • the above-mentioned memory 1101 may be a random-access memory (RAM), a flash memory (flash), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a mobile hard disk, a CD-ROM, or any other form of storage medium known to those skilled in the art.
  • the aforementioned processor 1102 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application.
  • the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the aforementioned communication interface 1103 may be, for example, an interface card or the like, and may be an Ethernet (ethernet) interface or an asynchronous transfer mode (ATM) interface.
  • an embodiment of the present application provides an image classification device 1200.
  • the device includes a memory 1201, a processor 1202, and a communication interface 1203.
  • the memory 1201 is used to store instructions
  • the processor 1202 is configured to execute instructions in the memory 1201, and execute the above-mentioned image classification method applied in the embodiment shown in FIG. 6;
  • the communication interface 1203 is used for communication.
  • the memory 1201, the processor 1202, and the communication interface 1203 are connected to each other through a bus 1204; the bus 1204 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 12, but this does not mean that there is only one bus or only one type of bus.
  • the processor 1202 is configured to first obtain an image to be classified when classifying an image, and input the image to be classified into a pre-trained neural network model to obtain the corresponding image to be classified. The result of image classification.
  • for specific implementation of the processor 1202, please refer to the detailed description of S601, S602, and S603 in the embodiment shown in FIG. 6, which will not be repeated here.
  • the above-mentioned memory 1201 may be a random-access memory (RAM), a flash memory (flash), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a mobile hard disk, a CD-ROM, or any other form of storage medium known to those skilled in the art.
  • the above-mentioned processor 1202 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application.
  • the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the above-mentioned communication interface 1203 may be, for example, an interface card or the like, and may be an ethernet interface or an asynchronous transfer mode (ATM) interface.
  • an embodiment of the present application provides a text translation device 1300.
  • the device includes a memory 1301, a processor 1302, and a communication interface 1303.
  • the memory 1301 is used to store instructions
  • the processor 1302 is configured to execute instructions in the memory 1301, and execute the above-mentioned text translation method applied in the embodiment shown in FIG. 7;
  • the communication interface 1303 is used for communication.
  • the memory 1301, the processor 1302, and the communication interface 1303 are connected to each other through a bus 1304; the bus 1304 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 13, but it does not mean that there is only one bus or one type of bus.
  • the processor 1302 is configured to first obtain the text to be translated when translating text, and input the text to be translated into a pre-trained neural network model to obtain the text translation result corresponding to the text to be translated.
  • the aforementioned memory 1301 may be a random-access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known to those skilled in the art.
  • the above-mentioned processor 1302 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application.
  • the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the aforementioned communication interface 1303 may be, for example, an interface card, and may be an Ethernet interface or an asynchronous transfer mode (ATM) interface.
  • the embodiments of the present application also provide a computer-readable storage medium, including instructions that, when run on a computer, cause the computer to execute any implementation of the neural network model training method described in the above embodiments, or the image classification method described in the above embodiment, or the text translation method described in the above embodiment.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

This application relates to the field of artificial intelligence and discloses a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices. The neural network model training method includes: first obtaining, from memory, codewords corresponding to a first weight matrix of the neural network model; then determining, from the codewords, that the weight matrix of the neural network model is the first weight matrix, and training the first weight matrix with training data; when a preset stop condition is not met, updating the codewords to obtain updated codewords and storing the updated codewords in memory; then determining, from the updated codewords obtained from memory, that the weight matrix of the neural network model is a second weight matrix, and training the second weight matrix with the training data; and stopping the training of the neural network model when the preset stop condition is met. Because the codewords occupy far less memory space than the weight matrix, the amount of data read from memory can be reduced.

Description

Neural Network Model Training, Image Classification and Text Translation Methods, Apparatus, and Devices
This application claims priority to Chinese Patent Application No. 202010558711.6, filed with the Chinese Patent Office on June 18, 2020 and entitled "Neural network model training, image classification, text translation methods and apparatus, and devices", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices.
Background
With the rapid development of artificial intelligence (AI), neural networks (NN), as a key technology leading the current direction of AI development, have made breakthrough progress and achieved high accuracy in many fields such as image processing, text classification, machine translation, and natural language processing.
At present, when a neural network model is used to perform a preset task (for example, image classification or text translation), the model usually needs to be trained in advance to improve the accuracy of the task results it outputs. A neural network generally includes multiple weight coefficient matrices. Taking a classification task as an example, a data vector of the object to be classified may be input into the neural network, so that the network computes an output vector from that data vector and the vectors of its weight coefficient matrices, and the object is then classified based on the output vector. Normally, the weight coefficient matrices of a neural network in its initial state are unknown. To obtain more accurate weight coefficient matrices, so that the network can produce more accurate results, the network in its initial state needs to be trained; during training, the weight coefficient matrix contained in each layer is repeatedly updated and corrected according to the difference between the network's output and the ideal output, until the network, using the corrected weight coefficient matrices, produces a near-ideal output vector for any input data vector.
However, when training a neural network model, to guarantee the accuracy of the model's output, the weight coefficients contained in every layer must be repeatedly updated and corrected using the training data. Current neural networks usually contain many layers (more than 15), and the weight coefficient matrix of each layer is large, so repeatedly reading the weight-coefficient-matrix data during training creates a memory bottleneck; in resource-constrained scenarios, neural network training may even become infeasible.
Summary
Embodiments of this application provide a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices, which can reduce the amount of weight-matrix data handled during training and the amount of intermediate-parameter computation during weight-matrix updates, so that when the neural network model is used to perform a preset task (for example, image classification or text translation), the memory bottleneck is resolved and the expected results are achieved.
In a first aspect, this application provides a neural network model training method, including: first obtaining, from memory, codewords corresponding to a first weight matrix of the neural network model; determining, from the codewords, that the weight matrix of the model is the first weight matrix, and training the first weight matrix with training data; when a preset stop condition is not met, updating the codewords to obtain updated codewords and storing them in memory; then determining, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and training the second weight matrix with the training data; and stopping the training of the neural network model when the preset stop condition is met.
Compared with conventional techniques, when training the neural network model, the embodiments of this application no longer read the weight matrix directly from memory; instead, the codewords corresponding to the weight matrix are read and used to construct the weight matrix for training. Because the codewords occupy far less memory than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is overcome. In addition, during training, the update amount is no longer computed for the weight matrix but for the codewords, which are used to re-determine a new weight matrix for subsequent training; this reduces the intermediate-parameter computation during updates and makes neural network model training feasible in resource-constrained scenarios.
In a possible implementation, when the first weight matrix is an initial weight matrix, the method further includes:
dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix, so that the codewords can subsequently be stored in memory, reducing the memory footprint.
In a possible implementation, the dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix includes:
dividing the initial weight matrix into k sub-matrices of the same dimensions, where k is a positive integer greater than 1;
clustering the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, where n is a positive integer greater than 0 and n ≤ k; and
determining the n codewords as the codewords corresponding to the initial weight matrix.
In this way, when training the neural network model, the codewords corresponding to the initial weight matrix can be obtained from memory for model training, without directly reading in the initial weight matrix; because the codewords occupy far less data storage space than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is effectively overcome.
In a possible implementation, the clustering the k sub-matrices of the same dimensions to obtain the n codewords corresponding to them includes:
reducing each of the k sub-matrices to a one-dimensional vector, obtaining k one-dimensional vectors;
dividing the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector; and
averaging the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In this way, through clustering and averaging, n codewords are obtained, each of which can simultaneously represent multiple sub-matrices, and these n codewords can later be used to quickly decode the weight matrix of the neural network.
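The division-clustering-averaging procedure described above can be sketched as follows. The text does not fix a particular grouping rule (it only requires that each group's codeword be the element-wise average of its member vectors), so this sketch assumes a simple k-means-style nearest-centre assignment; all function and variable names are illustrative and not from the application.

```python
import numpy as np

def build_codewords(weight, sub_shape, n_codewords, n_iter=10, seed=0):
    """Divide `weight` into k equal sub-matrices, flatten each into a
    1-D vector, group the vectors into n_codewords clusters, and return
    the per-group average vectors (the codewords) together with each
    sub-matrix's codeword index."""
    rows, cols = sub_shape
    R, C = weight.shape[0] // rows, weight.shape[1] // cols
    # k = R*C sub-matrices, each flattened into a vector of length rows*cols
    vecs = (weight.reshape(R, rows, C, cols)
                  .transpose(0, 2, 1, 3)
                  .reshape(R * C, rows * cols))
    rng = np.random.default_rng(seed)
    codewords = vecs[rng.choice(len(vecs), n_codewords, replace=False)]
    for _ in range(n_iter):                       # plain Lloyd iterations
        d = ((vecs[:, None, :] - codewords[None, :, :]) ** 2).sum(-1)
        index = d.argmin(1)                       # group of each sub-matrix
        for j in range(n_codewords):
            members = vecs[index == j]
            if len(members):                      # codeword = element-wise average
                codewords[j] = members.mean(0)
    return codewords, index

w = np.arange(24, dtype=float).reshape(4, 6)      # toy weight matrix
cw, idx = build_codewords(w, sub_shape=(2, 3), n_codewords=2)
# k = 4 sub-matrices of shape 2x3 -> idx has 4 entries; 2 codewords of length 6
```

Only the two length-6 codewords and the four small index entries would then need to be kept in memory, rather than the full 4×6 weight matrix.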
In a possible implementation, the method further includes:
when the preset stop condition is not met, releasing the weight matrix of the neural network model from memory. This further saves memory space and helps overcome the memory bottleneck.
In a possible implementation, the updating the codewords to obtain updated codewords when the preset stop condition is not met includes:
when the preset stop condition is not met, determining the weight gradient of the first weight matrix of the neural network model; and
determining a codeword gradient according to the first weight gradient, and determining the updated codewords according to the codeword gradient.
In this way, the codeword gradient can be determined from the weight gradient to obtain more accurate updated codewords for subsequent model training.
In a possible implementation, the determining a codeword gradient according to the weight gradient, and determining the updated codewords according to the codeword gradient, includes:
computing a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n;
optimizing the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; and
updating the j-th codeword with its update amount to obtain the updated j-th codeword.
In this way, each codeword can be accurately determined and used to decode a new weight matrix for subsequent model training.
In a possible implementation, the method further includes:
obtaining an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model, so that the weight matrix of the model can be decoded more accurately using the codewords and the index.
In a possible implementation, the preset stop condition includes one or more of the following conditions:
the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset difference;
the rate of change of the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset change threshold;
the number of updates of the model parameters in the neural network model reaches a preset number of updates; or
the output value of the loss function used by the neural network model reaches a preset threshold, where the loss function measures the gap between the model's output for the training data and the result label value corresponding to the training data.
In a second aspect, this application further provides an image classification method, including: obtaining an image to be classified; and inputting the image to be classified into a trained neural network model to obtain the image classification result output by the model. The training process of the neural network model includes: first obtaining, from memory, codewords corresponding to a first weight matrix of the model; determining, from the codewords, that the weight matrix of the model is the first weight matrix, and training the first weight matrix with training data, where the training data includes positive sample images and negative sample images; after the model outputs the probability value that the training data is a positive sample image, when a preset stop condition is not met, updating the codewords to obtain updated codewords and storing them in memory; then determining, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and training the second weight matrix with the training data; and stopping the training of the model when the preset stop condition is met.
Compared with conventional techniques, this embodiment of the application classifies the image to be classified with a pre-trained neural network model; because the model can reach a global optimum, the classification results it outputs are more accurate, which improves classification accuracy.
In a third aspect, this application further provides a text translation method, including: obtaining a text to be translated; and inputting the text to be translated into a trained neural network model to obtain the text translation result output by the model. The training process of the neural network model includes: first obtaining, from memory, codewords corresponding to a first weight matrix of the model; determining, from the codewords, that the weight matrix of the model is the first weight matrix, and training the first weight matrix with training data, where the training data is sample text; after the model outputs the translation result of the sample text, when a preset stop condition is not met, updating the codewords to obtain updated codewords and storing them in memory; then determining, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and training the second weight matrix with the training data; and stopping the training of the model when the preset stop condition is met.
Compared with conventional techniques, this embodiment of the application translates the text to be translated with a pre-trained neural network model; because the model can reach a global optimum, the translation results it outputs are more accurate, which improves translation accuracy.
In a fourth aspect, this application further provides a neural network model training apparatus, including: a first obtaining unit, configured to obtain codewords from memory, where the codewords correspond to a first weight matrix of the neural network model; a first training unit, configured to determine, from the codewords, that the weight matrix of the model is the first weight matrix, and train the first weight matrix with training data; an updating unit, configured to update the codewords when a preset stop condition is not met, to obtain updated codewords; a storage unit, configured to store the updated codewords in memory; a second training unit, configured to determine, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and train the second weight matrix with the training data; and a stopping unit, configured to stop the training of the model when the preset stop condition is met.
In a possible implementation, when the first weight matrix is an initial weight matrix, the apparatus further includes:
a dividing unit, configured to divide the initial weight matrix to determine the codewords corresponding to the initial weight matrix.
In a possible implementation, the dividing unit includes:
a first dividing subunit, configured to divide the initial weight matrix into k sub-matrices of the same dimensions, where k is a positive integer greater than 1;
a clustering subunit, configured to cluster the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, where n is a positive integer greater than 0 and n ≤ k; and
a first determining subunit, configured to determine the n codewords as the codewords corresponding to the initial weight matrix.
In a possible implementation, the clustering subunit includes:
a dimension-reduction subunit, configured to reduce each of the k sub-matrices of the same dimensions to a one-dimensional vector, obtaining k one-dimensional vectors;
a second dividing subunit, configured to divide the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector; and
a computing subunit, configured to average the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In a possible implementation, the apparatus further includes: a releasing unit, configured to release the weight matrix of the neural network model from memory when the preset stop condition is not met.
In a possible implementation, the updating unit includes:
a second determining subunit, configured to determine the weight gradient of the first weight matrix of the model when the preset stop condition is not met; and
a third determining subunit, configured to determine a codeword gradient according to the first weight gradient and the index, and determine the updated codewords according to the codeword gradient.
In a possible implementation, the third determining subunit includes:
a first obtaining subunit, configured to compute a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the index numbers belonging to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n;
a second obtaining subunit, configured to optimize the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; and
a third obtaining subunit, configured to update the j-th codeword with its update amount to obtain the updated j-th codeword.
In a possible implementation, the apparatus further includes:
a second obtaining unit, configured to obtain an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model.
In a possible implementation, the preset stop condition includes one or more of the following conditions:
the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset difference;
the rate of change of the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset change threshold;
the number of updates of the model parameters in the neural network model reaches a preset number of updates; or
the output value of the loss function used by the neural network model reaches a preset threshold, where the loss function measures the gap between the model's output for the training data and the result label value corresponding to the training data.
In a fifth aspect, this application further provides an image classification apparatus, including: an image obtaining unit, configured to obtain an image to be classified; an image classification unit, configured to input the image to be classified into a trained neural network model to obtain the image classification result output by the model; and a neural network model training unit, configured to train the neural network model.
The neural network model training unit includes:
a first obtaining unit, configured to obtain codewords from memory, where the codewords correspond to a first weight matrix of the neural network model;
a first training unit, configured to determine, from the codewords, that the weight matrix of the model is the first weight matrix, and train the first weight matrix with training data, where the training data includes positive sample images and negative sample images;
an updating unit, configured to, after the model outputs the probability value that the training data is a positive sample image, update the codewords when a preset stop condition is not met, to obtain updated codewords;
a storage unit, configured to store the updated codewords in memory;
a second training unit, configured to determine, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and train the second weight matrix with the training data; and
a stopping unit, configured to stop the training of the model when the preset stop condition is met.
In a sixth aspect, this application further provides a text translation apparatus, including: a text obtaining unit, configured to obtain a text to be translated; a text translation unit, configured to input the text to be translated into a trained neural network model to obtain the text translation result output by the model; and a neural network model training unit, configured to train the neural network model.
The neural network model training unit includes:
a first obtaining unit, configured to obtain codewords from memory, where the codewords correspond to a first weight matrix of the neural network model;
a first training unit, configured to determine, from the codewords, that the weight matrix of the model is the first weight matrix, and train the first weight matrix with training data, where the training data is sample text;
an updating unit, configured to, after the model outputs the translation result of the sample text, update the codewords when a preset stop condition is not met, to obtain updated codewords;
a storage unit, configured to store the updated codewords in memory;
a second training unit, configured to determine, from the updated codewords obtained from memory, that the weight matrix of the model is a second weight matrix, and train the second weight matrix with the training data; and
a stopping unit, configured to stop the training of the model when the preset stop condition is met.
In a seventh aspect, this application further provides a neural network model training device, including a memory and a processor;
the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory to perform the neural network model training method of the first aspect or any possible implementation thereof.
In an eighth aspect, this application further provides an image classification device, including a memory and a processor;
the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory to perform the image classification method of the second aspect.
In a ninth aspect, this application further provides a text translation device, including a memory and a processor;
the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory to perform the text translation method of the third aspect.
In a tenth aspect, this application further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the neural network model training method of the first aspect or any possible implementation thereof, or the image classification method of the second aspect, or the text translation method of the third aspect.
It can be seen from the above technical solutions that the embodiments of this application have the following advantages:
When training the neural network model, an embodiment of this application first obtains, from memory, the codewords corresponding to the first weight matrix of the model; then determines, from the codewords, that the weight matrix of the model is the first weight matrix, and trains the first weight matrix with training data; when a preset stop condition is not met, the codewords are updated and the updated codewords are stored in memory; the weight matrix of the model is then determined from the updated codewords obtained from memory to be a second weight matrix and trained with the training data; and the training stops when the preset stop condition is met. It can be seen that, when training the neural network model, the embodiments of this application no longer read the weight matrix directly from memory; instead, the codewords corresponding to the weight matrix are read and used to construct the weight matrix for training. Because the codewords occupy far less memory than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is overcome. In addition, during model training, the update amount is no longer computed for the weight matrix but for the codewords, which are used to re-determine a new weight matrix for subsequent training; this reduces the intermediate-parameter computation during updates and makes neural network model training feasible in resource-constrained scenarios.
Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework according to an embodiment of this application;
FIG. 2 is an example diagram of a system architecture to which an embodiment of this application is applied;
FIG. 3 is a flowchart of a neural network model training method according to an embodiment of this application;
FIG. 4 is a schematic diagram of determining the first weight matrix of a neural network model from codewords according to an embodiment of this application;
FIG. 5 is a schematic diagram of codeword updating according to an embodiment of this application;
FIG. 6 is a flowchart of an image classification method according to an embodiment of this application;
FIG. 7 is a flowchart of a text translation method according to an embodiment of this application;
FIG. 8 is a structural block diagram of a neural network model training apparatus according to an embodiment of this application;
FIG. 9 is a structural block diagram of an image classification apparatus according to an embodiment of this application;
FIG. 10 is a structural block diagram of a text translation apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a neural network model training device according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an image classification device according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a text translation device according to an embodiment of this application.
Detailed Description
Embodiments of this application provide a neural network model training method, an image classification method, a text translation method, and corresponding apparatuses and devices, which can reduce the amount of weight-matrix data handled during training and the amount of intermediate-parameter computation during weight-matrix updates, so as to resolve the memory bottleneck and achieve the expected training results.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The overall workflow of an artificial intelligence system is described first. Referring to FIG. 1, which shows a schematic structural diagram of an AI main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the series of processes from data acquisition to processing: for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output, in which data undergoes a refinement process of "data - information - knowledge - wisdom". The IT value chain, from the underlying AI infrastructure and information (provision and processing technology implementation) to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing-power support for the AI system, enables communication with the outside world, and provides support through the base platform. Communication with the outside world is performed through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to the intelligent chips in the distributed computing system of the base platform for computation.
(2) Data
The data at the layer above the infrastructure indicates the data sources in the AI field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data of conventional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and solve problems according to reasoning control strategies; its typical functions are searching and matching.
Decision-making refers to the process of making decisions on intelligent information after reasoning, usually providing functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities can further be formed based on the processing results, for example, an algorithm or a general system, such as translation, text analysis, computer-vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of AI systems in various fields. They encapsulate the overall AI solution, productize intelligent information decision-making, and realize practical applications. Their application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
Since the embodiments of this application involve the training process of neural network models, for ease of understanding, related terms and concepts of neural network models that may be involved are introduced first.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s · x_s + b )    (1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network and convert the input signal of the unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
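A minimal sketch of such a neural unit, assuming the sigmoid activation mentioned above; the function name and input values are illustrative.

```python
import math

def neural_unit(xs, ws, b):
    """Weighted sum of the inputs plus the bias b, passed through a
    sigmoid activation f, as in formula (1)."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))   # sigmoid activation

y = neural_unit([1.0, 2.0], [0.5, -0.25], 0.0)   # f(0.5*1 - 0.25*2 + 0) = f(0) = 0.5
```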
(2) Loss function
In the process of training a neural network, because it is desired that the output of the network be as close as possible to the value actually wanted, the predicted value of the current network can be compared with the desired target value, and the weight vectors of each layer of the network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. It is therefore necessary to predefine "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so the training of the deep neural network becomes the process of reducing this loss as much as possible.
(3) Back propagation algorithm
A neural network may use the error back propagation (BP) algorithm to correct the parameter values in the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial model are updated by back-propagating the error-loss information, so that the error loss converges. The back propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, for example, the weight matrix.
This application can be applied to the field of artificial intelligence. A system structure to which the neural network model training of the embodiments of this application is applied is introduced below.
Referring to FIG. 2, which shows an example diagram of a system architecture to which an embodiment of this application is applied, the scenario includes a memory 201, a processor 202, and an AI hardware accelerator 203. The memory 201 is connected to the processor 202, and the processor 202 is connected to the AI hardware accelerator 203. The foregoing "connection" may be a direct connection or an indirect connection.
The memory 201 is one of the important components of a computer; it is the bridge through which the external storage communicates with the processor 202, and all programs in the computer run in memory.
The processor 202 may be a central processing unit (CPU), configured to assign acceleration tasks and the like to the AI hardware accelerator 203 mounted on it.
The AI hardware accelerator 203 may be an independent chip or may be integrated as a functional module into a system on chip (SoC). It mainly includes a matrix computation unit (cube unit), a vector computation unit (vector unit), and a buffer.
The matrix computation unit is configured to complete matrix-by-matrix computation, for example, the gradient computation in the neural network and the matrix-by-matrix multiplications corresponding to the convolutional layers and fully connected layers. Specifically, when performing a convolutional-layer or fully-connected-layer operation, the matrix computation unit reads the data corresponding to the data matrix from the data buffer unit and the parameter buffer unit; the parameter data read from the parameter buffer unit is moved into the parameter buffer unit by the memory read/write controller, and during the move the parameter data must first be decompressed by a decompression engine before the matrix multiplication can be performed on the matrix computation unit, yielding partial or final matrix results that are saved in an accumulator.
The vector processing unit can, when the actual situation requires it, further process the output of the matrix computation unit, with operations such as vector multiplication, vector addition, exponential and logarithmic operations, and size comparison. It is mainly used for the network computation of layers other than the convolutional and fully connected layers in the neural network, such as the activation function (rectified linear unit, ReLU) layer and the pooling layer.
The buffer is used to save the data loaded from memory into the AI hardware accelerator, the intermediate data generated during computation, and the like.
The neural network model training process of this application as applied to the AI hardware accelerator 203 is as follows:
In this embodiment of this application, the AI hardware accelerator 203 first obtains, through the processor 202, the codewords and index corresponding to the initial weight matrix of the neural network model from the memory 201, then determines the weight matrix of the model from the obtained codewords and index, and trains the weight matrix with training data; when the preset stop condition is not met, the codewords are updated, and the weight matrix is re-determined from the updated codewords and the previously obtained index for retraining the model; and so on: as long as the preset stop condition is not met, the codeword update is repeated and the weight matrix is re-determined from the updated codewords for retraining, until the preset stop condition is met. In this way, by reducing the amount of weight-matrix data in model training and the amount of intermediate-parameter computation during weight-matrix updates, the memory-bottleneck problem is resolved and the expected model training results are achieved.
It should be noted that the foregoing application scenario is shown only to facilitate understanding of this application, and the implementations of this application are not limited in this respect; rather, the implementations of this application can be applied in any applicable scenario.
Based on the foregoing application scenario, an embodiment of this application provides a neural network model training method, which can be applied to the AI hardware accelerator 203. As shown in FIG. 3, the method includes:
S301: Obtain codewords from memory, where the codewords correspond to a first weight matrix of the neural network model.
In this embodiment, to overcome the memory-bottleneck problem in neural network model training, the weight matrix is no longer repeatedly loaded; instead, codewords are obtained from memory for model training, where the codewords correspond to the first weight matrix of the neural network model. Moreover, when the first weight matrix is an initial weight matrix, the initial weight matrix needs to be divided to determine the codewords corresponding to it. That is, the initial weight matrix of the model needs to be preprocessed in advance and split into corresponding codewords and a corresponding index. A codeword refers to the representation of each occurring state in a dictionary; in this application, each codeword refers to a sub-matrix of the weight matrix, and the memory space occupied by the codewords is far smaller than that occupied by the weight matrix (for details of the codewords, see the related introduction in subsequent step A2). The index represents the correspondence between the codewords and the weight matrix of the model, and the weights contained in each layer of the model correspond to their own codewords and indexes. In this way, when training the model, the codewords corresponding to the initial weight matrix can be obtained from memory to perform subsequent steps S302-S306 and complete the training, without directly reading in the initial weight matrix for training; because the codewords occupy far less data storage space than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is effectively overcome.
In a possible implementation of this embodiment, the specific process of preprocessing the initial weight matrix of the model in advance and splitting it into corresponding codewords and an index may include the following steps A1-A3:
Step A1: Divide the initial weight matrix into k sub-matrices of the same dimensions, and determine the index numbers corresponding to the k sub-matrices, where k is a positive integer greater than 1.
In this implementation, preprocessing the initial weight matrix of the neural network model means preprocessing the initial weight matrix contained in each layer of the model separately, so that the initial weights of each layer correspond to their own codewords and indexes. It should be noted that, in the following, this embodiment takes the initial weight matrix contained in a certain layer of the model as an example to describe how to preprocess the initial weight matrix to obtain its corresponding codewords and index and process them subsequently; the initial weight matrices contained in other layers are processed similarly and will not be described one by one.
Specifically, this application first divides the initial weights into k sub-matrices of the same dimensions and determines the index number corresponding to each sub-matrix, a one-to-one correspondence (that is, one sub-matrix corresponds to one index number); for example, the index numbers corresponding to the k sub-matrices may be defined as i_0, i_1, ..., i_{k-1}, for use in step A3, where k is a positive integer greater than 1.
Step A2: Cluster the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, and determine the index values corresponding to the n codewords, where n is a positive integer greater than 0 and n ≤ k.
In this implementation, after the initial weight matrix is divided into k sub-matrices of the same dimensions in step A1, these k sub-matrices can further be clustered to obtain n class centres (that is, n codewords), where n is a positive integer greater than 0 and n ≤ k; in this way, the sub-matrix corresponding to each class centre (that is, the codeword) can be used to represent the sub-matrices in its class. Furthermore, the index value corresponding to each of the n codewords can be determined, also a one-to-one correspondence (that is, one codeword corresponds to one index value); for example, the index values corresponding to the n codewords may be defined as 1, 2, ..., n, for use in step A3.
Specifically, in an optional implementation, the process of "clustering the k sub-matrices of the same dimensions to obtain the n codewords corresponding to the k sub-matrices" in step A2 may include the following steps A21-A23:
Step A21: Reduce each of the k sub-matrices of the same dimensions to a one-dimensional vector, obtaining k one-dimensional vectors.
In this implementation, to determine the n codewords corresponding to the k sub-matrices of the same dimensions, each of the k sub-matrices first needs to be flattened into a one-dimensional vector, obtaining k one-dimensional vectors.
For example, suppose the k sub-matrices of the same dimensions include a 2×3 matrix:

[ a_1  a_2  a_3 ]
[ a_4  a_5  a_6 ]

It can be flattened into a one-dimensional vector of six elements, [a_1, a_2, a_3, a_4, a_5, a_6].
Step A22: Divide the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector.
In this implementation, after the k sub-matrices have been flattened into the corresponding k one-dimensional vectors in step A21, the k vectors can further be grouped; for example, vectors with relatively close element values may be placed into one vector group, so that each vector group contains at least one one-dimensional vector.
Step A23: Average the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In this implementation, after the k one-dimensional vectors have been divided into n vector groups in step A22, the centre vector of each group can further be determined and used to determine the codeword corresponding to that group. Specifically, taking the i-th vector group as an example (i may be any integer from 1 to n), suppose the i-th group contains three one-dimensional vectors: [a_1, a_2, a_3, a_4, a_5, a_6], [b_1, b_2, b_3, b_4, b_5, b_6], and [c_1, c_2, c_3, c_4, c_5, c_6]; the element values at corresponding positions of these three vectors can then be averaged to obtain a one-dimensional average vector:

[ (a_1+b_1+c_1)/3, (a_2+b_2+c_2)/3, ..., (a_6+b_6+c_6)/3 ]

This one-dimensional vector is the codeword corresponding to the i-th vector group, and the length of the codeword is 6, that is, the number of elements the vector contains.
It should be noted that, when determining the codeword corresponding to all one-dimensional vectors of the i-th group, other data-processing methods may also be used on the element values at corresponding positions of those vectors, for example, weighted averaging; the specific processing method may be selected according to the actual situation and is not limited in this embodiment of the application.
Step A3: Determine the n codewords as the codewords corresponding to the initial weight matrix, and form the index corresponding to the initial weight matrix from the index values corresponding to the n codewords and the index numbers corresponding to those index values.
In this implementation, after the initial weight matrix has been divided into k sub-matrices of the same dimensions with their corresponding index numbers, and the n codewords corresponding to the k sub-matrices together with their index values have been determined in step A2, the n codewords can be stored as the codewords corresponding to the initial weights, or the n codewords can form a dictionary for storage. Meanwhile, the index values corresponding to the n codewords (for example 1, 2, ..., n) and the index numbers corresponding to those index values (for example i_0, i_1, ..., i_{k-1}) form the index corresponding to the initial weights.
It should be noted that, because a sub-matrix corresponds one-to-one to an index number, a codeword corresponds one-to-one to an index value, and a codeword is the centre vector (for example, the average vector) of a vector group in which each vector corresponds to a sub-matrix, one codeword can represent multiple sub-matrices, and therefore one index value may correspond to multiple index numbers.
Further, after the codewords and index corresponding to the initial weight matrix are determined, the codewords can be stored in memory; compared with directly saving the initial weight matrix in memory, the codewords occupy much less storage space. Alternatively, both the codewords and the index can be stored in memory; compared with directly saving the initial weight matrix, the two together also occupy much less storage space, which greatly reduces the storage occupied by parameters in memory; the compression ratio can usually reach about 40×. For example, for a 528 MB weight matrix, after the foregoing preprocessing, its corresponding codewords and index total 14.45 MB, of which the codewords are 1.16 MB and the index is 13.29 MB, a compression ratio close to 40×.
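The quoted figures can be checked with a quick back-of-envelope calculation:

```python
# Back-of-envelope check of the compression figures quoted above.
weights_mb = 528.0                           # original weight matrix
codewords_mb, index_mb = 1.16, 13.29         # compressed representation
compressed_mb = codewords_mb + index_mb      # 14.45 MB in total
ratio = weights_mb / compressed_mb           # roughly 36.5x, i.e. close to 40x
```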
It should be noted that, after the codewords and index corresponding to the initial weight matrix are determined, they may also be stored in an external storage (for example, a hard disk) and then input into memory from the external storage; the specific storage location is not limited in this embodiment of the application.
S302: Determine, from the codewords, that the weight matrix of the neural network model is the first weight matrix, and train the first weight matrix with the training data.
In this embodiment, after the codewords corresponding to the initial weight matrix of the model are obtained from memory in step S301, a new weight matrix of the model (defined here as the first weight matrix) can further be determined from the obtained codewords. In an optional implementation, the index representing the correspondence between the codewords and the weight matrix of the model also needs to be obtained, so that the first weight matrix of the model can be determined using the correspondence among the obtained codewords, the index, and the weight matrix.
Specifically, the one-to-one correspondence between codewords and index values, the one-to-many correspondence between index values and index numbers, and the one-to-one correspondence between index numbers and sub-matrices can be used to decode the first weight matrix of the model, which is then trained with the training data. The specific calculation formula is:

W = D_mat · I_oh    (2)

where D_mat denotes the first weight matrix composed of codewords, with dimensions c×n, c being the length of a codeword and n the number of codewords; and I_oh denotes the one-hot matrix formed by the index, with dimensions n×k, where, based on the property of a one-hot matrix, each row has the value 1 only at the positions corresponding to its index numbers and 0 elsewhere.
For example, as shown in FIG. 4, the left diagram shows a dictionary formed by n codewords: the 1st codeword, the 2nd codeword, ..., the n-th codeword; the middle diagram shows the index formed by the index values corresponding to the n codewords (that is, 1, 2, ..., n) and the index numbers corresponding to those index values (that is, i_0, i_1, ..., i_{k-1}). As indicated by the thick black arrow in the figure, the index value corresponding to the 1st codeword in the dictionary is "1", and this index value corresponds to two index numbers: the index number in the first row, first column and the index number in the second row, second column of the index diagram. These two index numbers in turn correspond to two sub-matrices of the model's weight matrix: the sub-matrix shown as a light-grey block in the first row, first column and the sub-matrix shown in light grey in the second row, second column of the weight-matrix diagram on the right. Using this correspondence, the sub-matrices in the first row, first column and the second row, second column of the weight matrix can be decoded from the 1st codeword of the dictionary.
Similarly, as indicated by the thin black arrow in FIG. 4, the index value corresponding to the 2nd codeword is "2", and this index value corresponds to one index number, the one in the third row, first column of the index diagram; that index number corresponds to one sub-matrix of the model's weight matrix, shown as a dark-grey block in the third row, first column of the weight-matrix diagram on the right. The sub-matrix in the third row, first column of the weight matrix can thus be decoded from the 2nd codeword of the dictionary, and so on: using each codeword in the dictionary and the correspondence among the codewords, the index, and the sub-matrices of the weight matrix, the entire weight matrix of the model can be decoded and then trained with the training data.
It should be noted, however, that because the codewords are generated by clustering and averaging the sub-matrices of the initial matrix through steps A21-A23 above, the first weight matrix of the model determined from the codewords and the index, compared with the initial weight matrix, occupies the same data space, but the weight elements the two contain are not completely identical, so their weight values are not fully consistent either; the two are nevertheless very close, that is, the determined weight matrix of the model can be used in place of the initial weight matrix for model training.
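Formula (2) can be sketched as follows. The codewords are stored row-wise here and the one-hot matrix I_oh is built from the index, so that each column of the product is the flattened sub-matrix selected by the corresponding index entry; the layout and the names are illustrative assumptions, not from the application.

```python
import numpy as np

def decode_weight(codewords, index):
    """Formula (2): W = D_mat · I_oh. `codewords` has shape (n, c)
    (n codewords of length c); `index` maps each of the k sub-matrices
    to its codeword, giving a one-hot matrix I_oh of shape (n, k)."""
    n = len(codewords)
    i_oh = np.eye(n)[:, index]     # column j is one-hot for codeword index[j]
    return codewords.T @ i_oh      # (c, k): column j = flattened sub-matrix j

cw = np.array([[1.0, 2.0, 3.0],    # n = 2 codewords of length c = 3
               [4.0, 5.0, 6.0]])
idx = np.array([0, 1, 0])          # k = 3 sub-matrices share the 2 codewords
w_cols = decode_weight(cw, idx)    # columns: cw[0], cw[1], cw[0]
```

Because several index entries can point at the same codeword, the decoded matrix repeats codewords exactly as the one-codeword-to-many-sub-matrices correspondence above describes.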
S303: When the preset stop condition is not met, update the codewords to obtain updated codewords.
In this embodiment, after the first weight matrix of the model has been trained with the training data in step S302, it is further necessary to judge whether the preset stop condition is met. The preset stop condition refers to a preset condition that must be met to stop training. It may be that the difference between the result label value of the training data and the model's output for the training data is below a preset difference; it may be that the rate of change of that difference is below a preset change threshold; it may be that the number of model-parameter updates reaches a preset number of updates (for example, 100); or it may be that the output value (loss) of the loss function representing the difference between the model's output and the target value reaches a preset threshold (for example, 0.1). When the preset stop condition is not met, the codewords need to be updated according to the current training result to obtain updated codewords, so as to retrain the model through subsequent step S304.
In a possible implementation of this embodiment, the specific implementation of step S303 may include the following steps B1-B2:
Step B1: When the preset stop condition is not met, determine the weight gradient of the first weight matrix of the neural network model.
In this implementation, when, after the first weight matrix has been trained with the training data in step S302, it is judged that the preset stop condition is not met, for example the loss value has not reached the preset threshold, the loss value can be used in a backward computation to determine the weight gradient of the first weight matrix of the model (defined here as g_w), for use in subsequent step B2.
Step B2: Determine a codeword gradient according to the first weight gradient and the index, and determine the updated codewords according to the codeword gradient.
In this implementation, after the weight gradient of the first weight matrix of the model has been determined in step B1, the codeword gradient can further be determined according to the correspondence among the codewords, the index, and the weight matrix. Specifically, the one-to-one correspondence between codewords and index values, the one-to-many correspondence between index values and index numbers, and the one-to-one correspondence between index numbers and sub-matrices can be used to process the weight gradients in the sub-matrices corresponding to the index numbers belonging to the same codeword, so as to obtain the codeword gradient corresponding to that codeword. The specific calculation formula is:

g_D = g_w · I_oh^T    (3)

where g_D denotes the codeword gradient, I_oh^T denotes the transpose of the one-hot matrix I_oh formed by the index, and g_w denotes the weight gradient of the first weight matrix of the neural network model.
Specifically, in an optional implementation, the specific implementation of step B2 may include the following steps B21-B23:
Step B21: Compute a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the index numbers belonging to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n.
In this implementation, to obtain the updated codewords, the weight gradients in the sub-matrices corresponding to the index numbers belonging to the same codeword need to be processed, and the updated value corresponding to that codeword is determined from the processing result. Specifically, taking the j-th codeword as an example (j may be any integer from 1 to n), the codeword may correspond to multiple index numbers, each of which corresponds to a sub-matrix; the weight gradients of the sub-matrices corresponding to the index numbers can then be summed with weights, and the result is taken as the codeword gradient corresponding to the j-th codeword.
It should be noted that, when determining the codeword gradient corresponding to the j-th codeword, other data-processing methods may also be used on the weight gradients of the sub-matrices corresponding to the index numbers belonging to the j-th codeword, for example, direct accumulation and summation; the specific processing method may be selected according to the actual situation and is not limited in this embodiment of the application.
Step B22: Optimize the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword.
In this implementation, after the codeword gradient corresponding to the j-th codeword is obtained in step B21, the codeword gradient can further be optimized to obtain the update amount of the j-th codeword. For example, the widely used Adam optimizer can be applied to the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; during the optimization, four intermediate parameters are generated: the first-order momentum m_t, the second-order momentum v_t, the corrected first-order momentum m̂_t, and the corrected second-order momentum v̂_t.
It should be noted that the data volume of each of the four intermediate parameters generated during optimization (that is, m_t, v_t, m̂_t, and v̂_t) is consistent with the number of codeword gradients of the j-th codeword.
Step B23: Update the j-th codeword with the update amount of the j-th codeword to obtain the updated j-th codeword.
In this implementation, after the update amount of the j-th codeword is obtained in step B22, the codeword can further be updated with the update amount; for example, the result of subtracting the update amount from the j-th codeword, or the result of adding the update amount to the j-th codeword, can be taken as the updated j-th codeword, for use in subsequent step S305.
For example, as shown in FIG. 5, the right diagram shows the weight gradient of the weight matrix, and the middle diagram shows the index formed by the index values corresponding to the n codewords (that is, 1, 2, ..., n) and the index numbers corresponding to those index values (that is, i_0, i_1, ..., i_{k-1}). As indicated by the thick black arrow in the figure, the index value corresponding to the 1st codeword in the dictionary is "1", and this index value corresponds to two index numbers: the index number in the first row, first column and the index number in the second row, second column of the index diagram. These two index numbers in turn correspond to the weight gradients of two sub-matrices within the weight gradient of the weight matrix: the weight gradient in the first row, first column and the weight gradient in the second row, second column of the weight-gradient diagram on the right. These two weight gradients can then be summed with weights to obtain the codeword gradient corresponding to the 1st codeword.
Similarly, as indicated by the thin black arrow in FIG. 5, the index value corresponding to the 2nd codeword is "2", and this index value also corresponds to two index numbers: the index number in the third row, first column and the index number in the fourth row, third column of the index diagram. These two index numbers correspond to the weight gradients of two sub-matrices: the weight gradient in the third row, first column and the weight gradient in the fourth row, third column of the weight-gradient diagram on the right. These two weight gradients can be summed with weights to obtain the codeword gradient corresponding to the 2nd codeword, and so on: the codeword gradient corresponding to each codeword can be determined. The Adam optimizer can then be applied to each codeword gradient to obtain each codeword's update amount, and each codeword is updated with its update amount to obtain the updated codewords.
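The gradient accumulation of formula (3) and the update of steps B21-B23 can be sketched as follows. Purely to keep the sketch short, an unweighted sum stands in for the weighted sum and a plain gradient-descent step stands in for the Adam optimizer described above; all names are illustrative.

```python
import numpy as np

def codeword_update(g_w, index, codewords, lr=0.1):
    """Formula (3): g_D = g_w · I_oh^T. The gradients of sub-matrices
    that share a codeword are accumulated into that codeword's gradient,
    and each codeword is then updated by subtracting its update amount."""
    n = len(codewords)
    i_oh = np.eye(n)[:, index]     # (n, k) one-hot index matrix
    g_d = g_w @ i_oh.T             # (c, n): column j accumulates codeword j's sub-matrix gradients
    return codewords - lr * g_d.T  # updated codewords, shape (n, c)

cw = np.zeros((2, 3))              # n = 2 codewords of length c = 3
g_w = np.ones((3, 4))              # gradients of k = 4 flattened sub-matrices
idx = np.array([0, 0, 1, 0])       # three sub-matrices share codeword 0
new_cw = codeword_update(g_w, idx, cw)
```

Note that the optimizer state here would have the size of the codeword gradient (c×n), not of the full weight gradient (c×k), which is the source of the memory savings discussed below.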
S304: Store the updated codewords in memory.
In this embodiment, after the updated codewords are obtained in step S303, they can further be stored in memory for use in subsequent step S305.
S305: Determine, from the updated codewords obtained from memory, that the weight matrix of the neural network model is the second weight matrix, and train the second weight matrix with the training data.
In this embodiment, after the updated codewords are stored in memory in step S304, the updated codewords obtained from memory can be used, by performing the above step S302 again, to re-determine a new weight matrix of the model (defined here as the second weight matrix, replacing the first weight matrix introduced in step S302), and the second weight matrix of the model is trained with the training data in the next round of model training. For the specific implementation process, refer to the description of step S302, which is not repeated here.
It should be noted that when, after the second weight matrix has been trained with the training data through step S302, it is judged that the preset stop condition is still not met, the weight matrix of the current model can be released from memory to save memory space, and a new weight matrix of the model is then re-determined through the above steps S303-S305 for the next round of model training.
S306: When the preset stop condition is met, stop the training of the neural network model.
In this embodiment, after the next round of model training is performed on the second weight matrix of the model (or a subsequent new weight matrix) with the training data in step S305, it is further necessary to judge whether the preset stop condition is met, for example, whether the loss value has reached the preset threshold. When the preset stop condition is still not met, the codewords need to be updated again according to the result of the round of model training, and the updated codewords are used to retrain the model through the above step S305. By analogy, the codeword update and the subsequent steps (that is, steps S303 and S305) are repeatedly performed whenever the preset stop condition is not met, until the training of the neural network model is stopped when the preset stop condition is met.
In summary, in the neural network model training method provided in this embodiment, when the model is trained, the codewords corresponding to the first weight matrix of the model are first obtained from memory; the weight matrix of the model is then determined from the codewords to be the first weight matrix and trained with the training data; when the preset stop condition is not met, the codewords are updated to obtain updated codewords, which are stored in memory; the weight matrix of the model is then determined from the updated codewords obtained from memory to be the second weight matrix and trained with the training data; and the training of the model stops when the preset stop condition is met. It can be seen that, when training the neural network model, this embodiment of the application no longer reads the weight matrix directly from memory; instead, the codewords and index corresponding to the weight matrix are read and used to construct the weight matrix for training. Because the codewords occupy far less memory than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is overcome. Moreover, because during model training the update amount is no longer computed for the weight matrix but for the codewords, which are used to re-determine a new weight matrix for subsequent training, the intermediate-parameter computation during updates is reduced, making neural network model training feasible in resource-constrained scenarios.
For example, when the classification network VGG16 is trained with existing methods, the weight-matrix data read in is 528 MB and the weight gradient produced is 528 MB, and each of the four intermediate parameters generated during optimization (that is, m_t, v_t, m̂_t, and v̂_t) is also 528 MB, so a total of 3.17 GB of memory space is required. When VGG16 is trained with the model training method provided in this application, the codewords and index corresponding to the weight matrix are read in, totalling 14.45 MB, of which the codewords are 1.16 MB and the index is 13.29 MB; the codeword gradient produced is 1.16 MB, and each of the four intermediate parameters generated during optimization (that is, m_t, v_t, m̂_t, and v̂_t) is also 1.16 MB, so a total of 20.25 MB of memory space is required; compared with 3.17 GB, the amount of computed data drops sharply.
When the translation network model transformer is trained with existing methods, the weight-matrix data read in is 471 MB and the weight gradient produced is 471 MB, and each of the four intermediate parameters (that is, m_t, v_t, m̂_t, and v̂_t) is also 471 MB, so a total of 2.76 GB of memory space is required. When the transformer is trained with the model training method provided in this application, the codewords and index corresponding to the weight matrix are read in, totalling 11.46 MB, of which the codewords are 0.12 MB and the index is 11.34 MB; the codeword gradient produced is 0.12 MB, and each of the four intermediate parameters (that is, m_t, v_t, m̂_t, and v̂_t) is also 0.12 MB, so a total of 12.06 MB of memory space is required; compared with 2.76 GB, the amount of computed data also drops sharply.
In addition, an embodiment of this application further provides an image classification method. Based on the neural network model training method provided in the foregoing embodiment, the neural network model obtained by that training method can be applied to image classification. Referring to FIG. 6, a flowchart of an image classification method according to an embodiment of this application, the method may include:
S601: Obtain an image to be classified.
S602: Input the image to be classified into a trained neural network model to obtain the image classification result output by the model.
In this embodiment, the image to be classified is first obtained and input into a pre-trained neural network model to obtain the image classification result corresponding to the image. In a specific implementation, the neural network model can output not only the classification results corresponding to the image but also the probability value corresponding to each classification result, so that the user can directly understand the classification of the image.
For example, taking a medical image as the image to be classified, the neural network model used is a model that can classify medical images; by inputting the medical image (or its corresponding feature map) into the model, the specific classification result of the medical image can be obtained. For example, it can be recognized whether the input medical image is one that carries a certain feature or has a certain classification result, or one that does not.
The training process of the neural network model includes:
obtaining codewords from memory, where the codewords correspond to a first weight matrix of the neural network model;
determining, from the codewords, that the weight matrix of the neural network model is the first weight matrix, and training the first weight matrix with training data;
when a preset stop condition is not met, updating the codewords to obtain updated codewords;
storing the updated codewords in the memory;
determining, from the updated codewords obtained from the memory, that the weight matrix of the neural network model is a second weight matrix, and training the second weight matrix with the training data; and
stopping the training of the neural network model when the preset stop condition is met.
It should be noted that the training data in this embodiment may include positive sample images and negative sample images. A positive sample image is an image to be trained that carries a certain feature or has a certain classification result, and its result label value may be 1; a negative sample image is an image to be trained that does not carry the feature or does not have the classification result, and its result label value may be 0. The output of the current neural network model to be trained for the training data may be the probability value, output by the model when the training data is input into it, that the training data is a positive sample image.
In an implementation of this embodiment, when the first weight matrix is an initial weight matrix, the method further includes:
dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix includes:
dividing the initial weight matrix into k sub-matrices of the same dimensions, where k is a positive integer greater than 1;
clustering the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, where n is a positive integer greater than 0 and n ≤ k; and
determining the n codewords as the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the clustering the k sub-matrices of the same dimensions to obtain the n codewords corresponding to them includes:
reducing each of the k sub-matrices to a one-dimensional vector, obtaining k one-dimensional vectors;
dividing the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector; and
averaging the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In an implementation of this embodiment, the method further includes:
when the preset stop condition is not met, releasing the weight matrix of the neural network model from the memory.
In an implementation of this embodiment, the updating the codewords to obtain updated codewords when the preset stop condition is not met includes:
when the preset stop condition is not met, determining the weight gradient of the first weight matrix of the neural network model; and
determining a codeword gradient according to the first weight gradient, and determining the updated codewords according to the codeword gradient.
In an implementation of this embodiment, the determining a codeword gradient according to the weight gradient, and determining the updated codewords according to the codeword gradient, includes:
computing a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n;
optimizing the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; and
updating the j-th codeword with its update amount to obtain the updated j-th codeword.
In an implementation of this embodiment, the method further includes:
obtaining an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model.
In an implementation of this embodiment, the preset stop condition includes one or more of the following conditions:
the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset difference;
the rate of change of the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset change threshold;
the number of updates of the model parameters in the neural network model reaches a preset number of updates; or
the output value of the loss function used by the neural network model reaches a preset threshold, where the loss function measures the gap between the model's output for the training data and the result label value corresponding to the training data.
It should also be noted that, for the description of the specific training process of the neural network model in this embodiment, refer to the procedure of the method described in FIG. 3, which is not repeated here.
This embodiment of the application classifies the image to be classified with a pre-trained neural network model; because the model can reach a global optimum, the classification results it outputs are more accurate, which improves classification accuracy.
In addition, an embodiment of this application further provides a text translation method. Based on the neural network model training method provided in the foregoing embodiment, the neural network model obtained by that training method can be applied to text translation. Referring to FIG. 7, a flowchart of a text translation method according to an embodiment of this application, the method may include:
S701: Obtain a text to be translated.
S702: Input the text to be translated into a trained neural network model to obtain the text translation result output by the model.
In this embodiment, the text to be translated is first obtained and input into a pre-trained neural network model to obtain the text translation result corresponding to the text. For example, an English text may be translated into a Chinese text, with the pre-trained model outputting the Chinese translation result, or a Chinese text may be translated into a German text, with the pre-trained model outputting the German translation result; this application does not limit the language of translation.
For example, taking an English text as the text to be translated, the neural network model used is a model that can translate the text to be translated; by inputting the text (or its corresponding feature vector) into the model, the specific translation result of the English text can be obtained, for example, the Chinese translation result or the German translation result of the input English text.
The training process of the neural network model includes:
obtaining codewords from memory, where the codewords correspond to a first weight matrix of the neural network model;
determining, from the codewords, that the weight matrix of the neural network model is the first weight matrix, and training the first weight matrix with training data;
when a preset stop condition is not met, updating the codewords to obtain updated codewords;
storing the updated codewords in the memory;
determining, from the updated codewords obtained from the memory, that the weight matrix of the neural network model is a second weight matrix, and training the second weight matrix with the training data; and
stopping the training of the neural network model when the preset stop condition is met.
In an implementation of this embodiment, when the first weight matrix is an initial weight matrix, the method further includes:
dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the dividing the initial weight matrix to determine the codewords corresponding to the initial weight matrix includes:
dividing the initial weight matrix into k sub-matrices of the same dimensions, where k is a positive integer greater than 1;
clustering the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, where n is a positive integer greater than 0 and n ≤ k; and
determining the n codewords as the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the clustering the k sub-matrices of the same dimensions to obtain the n codewords corresponding to them includes:
reducing each of the k sub-matrices to a one-dimensional vector, obtaining k one-dimensional vectors;
dividing the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector; and
averaging the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In an implementation of this embodiment, the method further includes:
when the preset stop condition is not met, releasing the weight matrix of the neural network model from the memory.
In an implementation of this embodiment, the updating the codewords to obtain updated codewords when the preset stop condition is not met includes:
when the preset stop condition is not met, determining the weight gradient of the first weight matrix of the neural network model; and
determining a codeword gradient according to the first weight gradient, and determining the updated codewords according to the codeword gradient.
In an implementation of this embodiment, the determining a codeword gradient according to the weight gradient, and determining the updated codewords according to the codeword gradient, includes:
computing a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n;
optimizing the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; and
updating the j-th codeword with its update amount to obtain the updated j-th codeword.
In an implementation of this embodiment, the method further includes:
obtaining an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model.
In an implementation of this embodiment, the preset stop condition includes one or more of the following conditions:
the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset difference;
the rate of change of the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset change threshold;
the number of updates of the model parameters in the neural network model reaches a preset number of updates; or
the output value of the loss function used by the neural network model reaches a preset threshold, where the loss function measures the gap between the model's output for the training data and the result label value corresponding to the training data.
It should also be noted that, for the description of the specific training process of the neural network model in this embodiment, refer to the procedure of the method described in FIG. 3, which is not repeated here.
This embodiment of the application translates the text to be translated with a pre-trained neural network model; because the model can reach a global optimum, the translation results it outputs are more accurate, which improves translation accuracy.
To facilitate better implementation of the foregoing solutions of the embodiments of this application, related apparatuses for implementing them are further provided below. Referring to FIG. 8, an embodiment of this application provides a neural network model training apparatus 800. The apparatus 800 may include: a first obtaining unit 801, a first training unit 802, an updating unit 803, a storage unit 804, a second training unit 805, and a stopping unit 806. The first obtaining unit 801 is configured to support the apparatus 800 in performing S301 of the embodiment shown in FIG. 3; the first training unit 802, S302; the updating unit 803, S303; the storage unit 804, S304; the second training unit 805, S305; and the stopping unit 806, S306. Specifically:
the first obtaining unit 801 is configured to obtain codewords from memory, where the codewords correspond to a first weight matrix of the neural network model;
the first training unit 802 is configured to determine, from the codewords, that the weight matrix of the neural network model is the first weight matrix, and train the first weight matrix with training data;
the updating unit 803 is configured to update the codewords when a preset stop condition is not met, to obtain updated codewords;
the storage unit 804 is configured to store the updated codewords in memory;
the second training unit 805 is configured to determine, from the updated codewords obtained from memory, that the weight matrix of the neural network model is a second weight matrix, and train the second weight matrix with the training data; and
the stopping unit 806 is configured to stop the training of the neural network model when the preset stop condition is met.
In an implementation of this embodiment, when the first weight matrix is an initial weight matrix, the apparatus further includes:
a dividing unit, configured to divide the initial weight matrix to determine the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the dividing unit includes:
a first dividing subunit, configured to divide the initial weight matrix into k sub-matrices of the same dimensions, where k is a positive integer greater than 1;
a clustering subunit, configured to cluster the k sub-matrices of the same dimensions to obtain n codewords corresponding to the k sub-matrices, where n is a positive integer greater than 0 and n ≤ k; and
a first determining subunit, configured to determine the n codewords as the codewords corresponding to the initial weight matrix.
In an implementation of this embodiment, the clustering subunit includes:
a dimension-reduction subunit, configured to reduce each of the k sub-matrices of the same dimensions to a one-dimensional vector, obtaining k one-dimensional vectors;
a second dividing subunit, configured to divide the k one-dimensional vectors into n vector groups, each vector group containing at least one one-dimensional vector; and
a computing subunit, configured to average the element values at corresponding positions of all one-dimensional vectors belonging to the i-th vector group, to obtain the one codeword corresponding to all one-dimensional vectors of the i-th group, where i takes each integer from 1 to n.
In an implementation of this embodiment, the apparatus further includes:
a releasing unit, configured to release the weight matrix of the neural network model from memory when the preset stop condition is not met.
In an implementation of this embodiment, the updating unit 803 includes:
a second determining subunit, configured to determine the weight gradient of the first weight matrix of the neural network model when the preset stop condition is not met; and
a third determining subunit, configured to determine a codeword gradient according to the first weight gradient, and determine the updated codewords according to the codeword gradient.
In an implementation of this embodiment, the third determining subunit includes:
a first obtaining subunit, configured to compute a weighted sum of the weight gradients, among the weight gradient, of the sub-matrices corresponding to the index numbers belonging to the j-th codeword, to obtain the codeword gradient corresponding to the j-th codeword, where j takes each integer from 1 to n;
a second obtaining subunit, configured to optimize the codeword gradient corresponding to the j-th codeword to obtain the update amount of the j-th codeword; and
a third obtaining subunit, configured to update the j-th codeword with its update amount to obtain the updated j-th codeword.
In an implementation of this embodiment, the apparatus further includes:
a second obtaining unit, configured to obtain an index, where the index is the correspondence between the codewords and the weight matrix of the neural network model.
In an implementation of this embodiment, the preset stop condition includes one or more of the following conditions:
the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset difference;
the rate of change of the difference between the result label value corresponding to the training data and the model's output for the training data is below a preset change threshold;
the number of updates of the model parameters in the neural network model reaches a preset number of updates; or
the output value of the loss function used by the neural network model reaches a preset threshold, where the loss function measures the gap between the model's output for the training data and the result label value corresponding to the training data.
In summary, when the neural network model training apparatus provided in this embodiment trains the model, it first obtains, from memory, the codewords corresponding to the first weight matrix of the model; then determines, from the codewords, that the weight matrix of the model is the first weight matrix, and trains the first weight matrix with training data; when the preset stop condition is not met, the codewords are updated to obtain updated codewords, which are stored in memory; the weight matrix of the model is then determined from the updated codewords obtained from memory to be the second weight matrix and trained with the training data; and the training of the model stops when the preset stop condition is met. It can be seen that, when training the neural network model, this embodiment of the application no longer reads the weight matrix directly from memory; instead, the codewords corresponding to the weight matrix are read and used to construct the weight matrix for training. Because the codewords occupy far less memory than the weight matrix, the amount of data read from memory is greatly reduced and the memory bottleneck is overcome. Moreover, because during model training the update amount is no longer computed for the weight matrix but for the codewords, which are used to re-determine a new weight matrix for subsequent training, the intermediate-parameter computation during updates is reduced, making neural network model training feasible in resource-constrained scenarios.
请参见图9所示,本申请实施例还提供了一种图像分类装置900。该装置900可以包括:图像获取单元901、图像分类单元902和神经网络模型训练单元903。其中,图像获取单元901用于支持装置900执行图6所示实施例中的S601。图像分类单元902 用于支持装置900执行图6所示实施例中的S602。神经网络模型训练单元903用于支持装置900执行图3所示实施例中的S301-S306。具体的,
图像获取单元901,用于获取待分类图像;
图像分类单元902,用于将待分类图像输入训练好的神经网络模型,得到神经网络模型输出的图像分类结果;
神经网络模型训练单元903,用于训练神经网络模型;
其中,神经网络模型训练单元903包括:
第一获取单元,用于从内存中获取码字,其中,码字对应于神经网络模型的第一权重矩阵;
第一训练单元,用于根据码字确定神经网络模型的权重矩阵为第一权重矩阵,并利用训练数据对第一权重矩阵进行训练;其中,训练数据包括正样本图像和负样本图像;
更新单元,用于当神经网络模型输出训练数据为正样本图像的概率值后,在预设停止条件未被满足时,更新码字,得到更新后的码字;
存储单元,用于将更新后的码字存储在内存中;
第二训练单元,用于利用在内存中获取的更新后的码字确定神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对第二权重矩阵进行训练;
停止单元,用于在预设停止条件被满足时,停止神经网络模型的训练。
在本实施例的一种实现方式中,第一权重矩阵为初始权重矩阵时,该装置还包括:
划分单元,用于将初始权重矩阵进行划分,以确定初始权重矩阵对应的码字。
在本实施例的一种实现方式中,划分单元包括:
第一划分子单元,用于将初始权重矩阵划分为k个相同维度的子矩阵;其中,k为大于1的正整数;
聚类子单元,用于将k个相同维度的子矩阵进行聚类处理,得到k个相同维度的子矩阵对应的n个码字,其中,n为大于0的正整数,n≤k;
第一确定子单元,用于将n个码字确定为初始权重矩阵对应的码字。
在本实施例的一种实现方式中,聚类子单元包括:
降维子单元,用于将k个相同维度的子矩阵分别降维成一维向量,得到k个一维向量;
第二划分子单元,用于将所述k个一维向量划分为n个向量组,其中,每个向量组中包含至少一个一维向量;
计算子单元,用于将所述k个一维向量中属于第i个向量组的所有一维向量中对应位置的元素值进行求平均计算,得到所述第i个向量组中所有一维向量对应的一个码字;其中,i分别取1到n的整数。
在本实施例的一种实现方式中,该装置还包括:
释放单元,用于当预设停止条件未被满足时,在内存中释放神经网络模型的权重矩阵。
在本实施例的一种实现方式中,更新单元包括:
第二确定子单元,用于在预设停止条件未被满足时,确定神经网络模型的第一权重矩阵的权重梯度;
第三确定子单元,用于根据权重梯度,确定码字梯度,并根据码字梯度,确定更新后的码字。
在本实施例的一种实现方式中,第三确定子单元包括:
第一获得子单元,用于将权重梯度中属于第j个码字的索引编号对应的子矩阵的权重梯度进行加权求和,得到第j个码字对应的码字梯度;其中,j分别取1到n的整数;
第二获得子单元,用于对第j个码字对应的码字梯度进行优化处理,得到第j个码字的更新量;
第三获得子单元,用于利用第j个码字的更新量对第j个码字进行更新,得到更新后的第j个码字。
在本实施例的一种实现方式中,该装置还包括:
第二获取单元,用于获取索引,其中,索引为码字和神经网络模型的权重矩阵之间的对应关系。
在本实施例的一种实现方式中,预设停止条件包括以下一项或多项条件:
训练数据对应的结果标签值与神经网络模型对训练数据的输出结果之差低于预设差值;
训练数据对应的结果标签值与神经网络模型对训练数据的输出结果之差的变化率低于预设变化阈值;
神经网络模型中的模型参数的更新次数达到预设更新次数;
神经网络模型所采用的损失函数的输出值达到预设阈值;其中,损失函数用于衡量神经网络模型对训练数据的输出结果与训练数据对应的结果标签值之间的差距。
请参见图10所示,本申请实施例还提供了一种文本翻译装置1000。该装置1000可以包括:文本获取单元1001、文本翻译单元1002和神经网络模型训练单元1003。其中,文本获取单元1001用于支持装置1000执行图7所示实施例中的S701。文本翻译单元1002用于支持装置1000执行图7所示实施例中的S702。神经网络模型训练单元1003用于支持装置1000执行图3所示实施例中的S301-S306。具体的,
文本获取单元1001,用于获取待翻译文本;
文本翻译单元1002,用于将待翻译文本输入训练好的神经网络模型,得到神经网络模型输出的文本翻译结果;
神经网络模型训练单元1003,用于训练神经网络模型;
其中,神经网络模型训练单元1003包括:
第一获取单元,用于从内存中获取码字,其中,码字对应于神经网络模型的第一权重矩阵;
第一训练单元,用于根据码字确定神经网络模型的权重矩阵为第一权重矩阵,并利用训练数据对第一权重矩阵进行训练;其中,训练数据为样本文本;
更新单元,用于当神经网络模型输出样本文本的翻译结果后,在预设停止条件未被满足时,更新码字,得到更新后的码字;
存储单元,用于将更新后的码字存储在内存中;
第二训练单元,用于利用在内存中获取的更新后的码字确定神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对神经网络模型的第二权重矩阵进行训练;
停止单元,用于在预设停止条件被满足时,停止神经网络模型的训练。
在本实施例的一种实现方式中,第一权重矩阵为初始权重矩阵时,该装置还包括:
划分单元,用于将初始权重矩阵进行划分,以确定初始权重矩阵对应的码字。
在本实施例的一种实现方式中,划分单元包括:
第一划分子单元,用于将初始权重矩阵划分为k个相同维度的子矩阵;其中,k为大于1的正整数;
聚类子单元,用于将k个相同维度的子矩阵进行聚类处理,得到k个相同维度的子矩阵对应的n个码字,其中,n为大于0的正整数,n≤k;
第一确定子单元,用于将n个码字确定为初始权重矩阵对应的码字。
在本实施例的一种实现方式中,聚类子单元包括:
降维子单元,用于将k个相同维度的子矩阵分别降维成一维向量,得到k个一维向量;
第二划分子单元,用于将所述k个一维向量划分为n个向量组,其中,每个向量组中包含至少一个一维向量;
计算子单元,用于将所述k个一维向量中属于第i个向量组的所有一维向量中对应位置的元素值进行求平均计算,得到所述第i个向量组中所有一维向量对应的一个码字;其中,i分别取1到n的整数。
在本实施例的一种实现方式中,该装置还包括:
释放单元,用于当预设停止条件未被满足时,在内存中释放神经网络模型的权重矩阵。
在本实施例的一种实现方式中,更新单元包括:
第二确定子单元,用于在预设停止条件未被满足时,确定神经网络模型的第一权重矩阵的权重梯度;
第三确定子单元,用于根据权重梯度,确定码字梯度,并根据码字梯度,确定更新后的码字。
在本实施例的一种实现方式中,第三确定子单元包括:
第一获得子单元,用于将权重梯度中属于第j个码字的索引编号对应的子矩阵的权重梯度进行加权求和,得到第j个码字对应的码字梯度;其中,j分别取1到n的整数;
第二获得子单元,用于对第j个码字对应的码字梯度进行优化处理,得到第j个码字的更新量;
第三获得子单元,用于利用第j个码字的更新量对第j个码字进行更新,得到更新后的第j个码字。
在本实施例的一种实现方式中,该装置还包括:
第二获取单元,用于获取索引,其中,索引为码字和神经网络模型的权重矩阵之间的对应关系。
在本实施例的一种实现方式中,预设停止条件包括以下一项或多项条件:
训练数据对应的结果标签值与神经网络模型对训练数据的输出结果之差低于预设差值;
训练数据对应的结果标签值与神经网络模型对训练数据的输出结果之差的变化率低于预设变化阈值;
神经网络模型中的模型参数的更新次数达到预设更新次数;
神经网络模型所采用的损失函数的输出值达到预设阈值;其中,损失函数用于衡量神经网络模型对训练数据的输出结果与训练数据对应的结果标签值之间的差距。
参见图11,本申请实施例提供了一种神经网络模型训练设备1100,该设备包括存储器1101、处理器1102和通信接口1103,
存储器1101,用于存储指令;
处理器1102,用于执行存储器1101中的指令,执行上述应用于图3所示实施例中的神经网络模型训练方法;
通信接口1103,用于进行通信。
存储器1101、处理器1102和通信接口1103通过总线1104相互连接;总线1104可以是外设部件互连标准(peripheral component interconnect,简称PCI)总线或扩展工业标准结构(extended industry standard architecture,简称EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在具体实施例中,处理器1102用于在对神经网络模型进行训练时,首先从内存中获取对应于神经网络模型的第一权重矩阵的码字,然后根据码字确定神经网络模型的权重矩阵为第一权重矩阵,并利用训练数据对该第一权重矩阵进行训练,在预设停止条件未被满足时,对码字进行更新,得到更新后的码字,并将更新后的码字存储在内存中,接着,利用在内存中获取的更新后的码字确定神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对第二权重矩阵进行训练,进而在预设停止条件被满足时,停止神经网络模型的训练。该处理器1102的详细处理过程请参考上述图3所示实施例中S301、S302、S303、S304、S305和S306的详细描述,这里不再赘述。
上述存储器1101可以是随机存取存储器(random-access memory,RAM)、闪存(flash)、只读存储器(read only memory,ROM)、可擦写可编程只读存储器(erasable programmable read only memory,EPROM)、电可擦除可编程只读存储器(electrically erasable programmable read only memory,EEPROM)、寄存器(register)、硬盘、移动硬盘、CD-ROM或者本领域技术人员知晓的任何其他形式的存储介质。
上述处理器1102例如可以是中央处理器(central processing unit,CPU)、通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请实施例公开内容所描述的各种示例性的逻辑方框、模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器的组合、DSP和微处理器的组合等。
上述通信接口1103例如可以是接口卡等,可以为以太(ethernet)接口或异步传输模式(asynchronous transfer mode,ATM)接口。
参见图12,本申请实施例提供了一种图像分类设备1200,该设备包括存储器1201、处理器1202和通信接口1203,
存储器1201,用于存储指令;
处理器1202,用于执行存储器1201中的指令,执行上述应用于图6所示实施例中的图像分类方法;
通信接口1203,用于进行通信。
存储器1201、处理器1202和通信接口1203通过总线1204相互连接;总线1204可以是外设部件互连标准(peripheral component interconnect,简称PCI)总线或扩展工业标准结构(extended industry standard architecture,简称EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图12中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在具体实施例中,处理器1202用于在对图像进行分类时,首先获取待分类的图像,并将该待分类的图像输入预先训练好的神经网络模型中,以获得该待分类的图像对应的图像分类结果。该处理器1202的详细处理过程请参考上述图6所示实施例中S601、S602和S603的详细描述,这里不再赘述。
上述存储器1201可以是随机存取存储器(random-access memory,RAM)、闪存(flash)、只读存储器(read only memory,ROM)、可擦写可编程只读存储器(erasable programmable read only memory,EPROM)、电可擦除可编程只读存储器(electrically erasable programmable read only memory,EEPROM)、寄存器(register)、硬盘、移动硬盘、CD-ROM或者本领域技术人员知晓的任何其他形式的存储介质。
上述处理器1202例如可以是中央处理器(central processing unit,CPU)、通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请实施例公开内容所描述的各种示例性的逻辑方框、模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器的组合、DSP和微处理器的组合等。
上述通信接口1203例如可以是接口卡等,可以为以太(ethernet)接口或异步传输模式(asynchronous transfer mode,ATM)接口。
参见图13,本申请实施例提供了一种文本翻译设备1300,该设备包括存储器1301、处理器1302和通信接口1303,
存储器1301,用于存储指令;
处理器1302,用于执行存储器1301中的指令,执行上述应用于图7所示实施例中的文本翻译方法;
通信接口1303,用于进行通信。
存储器1301、处理器1302和通信接口1303通过总线1304相互连接;总线1304可以是外设部件互连标准(peripheral component interconnect,简称PCI)总线或扩展工业标准结构(extended industry standard architecture,简称EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图13中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在具体实施例中,处理器1302用于在对文本进行翻译时,首先获取待翻译的文本,并将该待翻译的文本输入预先训练好的神经网络模型中,以获得该待翻译文本对应的文本翻译结果。该处理器1302的详细处理过程请参考上述图7所示实施例中S701、S702和S703的详细描述,这里不再赘述。
上述存储器1301可以是随机存取存储器(random-access memory,RAM)、闪存(flash)、只读存储器(read only memory,ROM)、可擦写可编程只读存储器(erasable programmable read only memory,EPROM)、电可擦除可编程只读存储器(electrically erasable programmable read only memory,EEPROM)、寄存器(register)、硬盘、移动硬盘、CD-ROM或者本领域技术人员知晓的任何其他形式的存储介质。
上述处理器1302例如可以是中央处理器(central processing unit,CPU)、通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请实施例公开内容所描述的各种示例性的逻辑方框、模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器的组合、DSP和微处理器的组合等。
上述通信接口1303例如可以是接口卡等,可以为以太(ethernet)接口或异步传输模式(asynchronous transfer mode,ATM)接口。
本申请实施例还提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述实施例所述的神经网络模型训练方法的任一实施方式,或者执行如上述实施例所述的图像分类方法,或者执行上述实施例所述的文本翻译方法。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其 它单元。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (26)

  1. 一种神经网络模型训练方法,其特征在于,所述方法包括:
    从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;
    在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    将所述更新后的码字存储在所述内存中;
    利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述第二权重矩阵进行训练;
    在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  2. 根据权利要求1所述的方法,其特征在于,所述第一权重矩阵为初始权重矩阵时,所述方法还包括:
    将所述初始权重矩阵进行划分,以确定所述初始权重矩阵对应的码字。
  3. 根据权利要求2所述的方法,其特征在于,所述将所述初始权重矩阵进行划分,以确定所述初始权重矩阵对应的码字,包括:
    将所述初始权重矩阵划分为k个相同维度的子矩阵;所述k为大于1的正整数;
    将所述k个相同维度的子矩阵进行聚类处理,得到所述k个相同维度的子矩阵对应的n个码字,所述n为大于0的正整数,n≤k;
    将所述n个码字确定为所述初始权重矩阵对应的码字。
  4. 根据权利要求3所述的方法,其特征在于,所述将所述k个相同维度的子矩阵进行聚类处理,得到所述k个相同维度的子矩阵对应的n个码字,包括:
    将所述k个相同维度的子矩阵分别降维成一维向量,得到k个一维向量;
    将所述k个一维向量划分为n个向量组,其中,每个向量组中包含至少一个一维向量;
    将所述k个一维向量中属于第i个向量组的所有一维向量中对应位置的元素值进行求平均计算,得到所述第i个向量组中所有一维向量对应的一个码字;其中,i分别取1到n的整数。
  5. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当所述预设停止条件未被满足时,在所述内存中释放所述神经网络模型的权重矩阵。
  6. 根据权利要求1所述的方法,其特征在于,所述在预设停止条件未被满足时,更新所述码字,得到更新后的码字,包括:
    在预设停止条件未被满足时,确定所述神经网络模型的第一权重矩阵的权重梯度;
    根据所述权重梯度,确定码字梯度,并根据所述码字梯度,确定更新后的码字。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述权重梯度,确定码字梯度,并根据所述码字梯度,确定更新后的码字,包括:
    将所述权重梯度中属于第j个码字对应的子矩阵的权重梯度进行加权求和,得到所述第j个码字对应的码字梯度;其中,j分别取1到n的整数;
    对所述第j个码字对应的码字梯度进行优化处理,得到第j个码字的更新量;
    利用所述第j个码字的更新量对所述第j个码字进行更新,得到更新后的第j个码字。
  8. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取索引,所述索引为所述码字和所述神经网络模型的权重矩阵之间的对应关系。
  9. 根据权利要求1至8任一所述的方法,其特征在于,所述预设停止条件包括以下一项或多项条件:
    所述训练数据对应的结果标签值与所述神经网络模型对所述训练数据的输出结果之差低于预设差值;
    所述训练数据对应的结果标签值与所述神经网络模型对所述训练数据的输出结果之差的变化率低于预设变化阈值;
    所述神经网络模型中的模型参数的更新次数达到预设更新次数;
    所述神经网络模型所采用的损失函数的输出值达到预设阈值;所述损失函数用于衡量所述神经网络模型对所述训练数据的输出结果与所述训练数据对应的结果标签值之间的差距。
  10. 一种图像分类方法,其特征在于,所述方法包括:
    获取待分类图像;
    将所述待分类图像输入训练好的神经网络模型,得到所述神经网络模型输出的图像分类结果;
    所述神经网络模型的训练过程包括:
    从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;所述训练数据包括正样本图像和负样本图像;
    当所述神经网络模型输出所述训练数据为正样本图像的概率值后,在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    将所述更新后的码字存储在所述内存中;
    利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述第二权重矩阵进行训练;
    在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  11. 一种文本翻译方法,其特征在于,所述方法包括:
    获取待翻译文本;
    将所述待翻译文本输入训练好的神经网络模型,得到所述神经网络模型输出的文本翻译结果;
    所述神经网络模型的训练过程包括:
    从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;所述训练数据为样本文本;
    当所述神经网络模型输出所述样本文本的翻译结果后,在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    将所述更新后的码字存储在所述内存中;
    利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述神经网络模型的第二权重矩阵进行训练;
    在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  12. 一种神经网络模型训练装置,其特征在于,所述装置包括:
    第一获取单元,用于从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    第一训练单元,用于根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;
    更新单元,用于在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    存储单元,用于将所述更新后的码字存储在所述内存中;
    第二训练单元,用于利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述第二权重矩阵进行训练;
    停止单元,用于在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  13. 根据权利要求12所述的装置,其特征在于,所述第一权重矩阵为初始权重矩阵时,所述装置还包括:
    划分单元,用于将所述初始权重矩阵进行划分,以确定所述初始权重矩阵对应的码字。
  14. 根据权利要求13所述的装置,其特征在于,所述划分单元包括:
    第一划分子单元,用于将所述初始权重矩阵划分为k个相同维度的子矩阵;所述k为大于1的正整数;
    聚类子单元,用于将所述k个相同维度的子矩阵进行聚类处理,得到所述k个相同维度的子矩阵对应的n个码字,所述n为大于0的正整数,n≤k;
    第一确定子单元,用于将所述n个码字确定为所述初始权重矩阵对应的码字。
  15. 根据权利要求14所述的装置,其特征在于,所述聚类子单元包括:
    降维子单元,用于将所述k个相同维度的子矩阵分别降维成一维向量,得到k个一维向量;
    第二划分子单元,用于将所述k个一维向量划分为n个向量组,其中,每个向量组中包含至少一个一维向量;
    计算子单元,用于将所述k个一维向量中属于第i个向量组的所有一维向量中对应位置的元素值进行求平均计算,得到所述第i个向量组中所有一维向量对应的一个码字;其中,i分别取1到n的整数。
  16. 根据权利要求12所述的装置,其特征在于,所述装置还包括:
    释放单元,用于当所述预设停止条件未被满足时,在所述内存中释放所述神经网络模型的权重矩阵。
  17. 根据权利要求12所述的装置,其特征在于,所述更新单元包括:
    第二确定子单元,用于在预设停止条件未被满足时,确定所述神经网络模型的第一权重矩阵的权重梯度;
    第三确定子单元,用于根据所述权重梯度,确定码字梯度,并根据所述码字梯度,确定更新后的码字。
  18. 根据权利要求17所述的装置,其特征在于,所述第三确定子单元包括:
    第一获得子单元,用于将所述权重梯度中属于第j个码字对应的子矩阵的权重梯度进行加权求和,得到所述第j个码字对应的码字梯度;其中,j分别取1到n的整数;
    第二获得子单元,用于对所述第j个码字对应的码字梯度进行优化处理,得到第j个码字的更新量;
    第三获得子单元,用于利用所述第j个码字的更新量对所述第j个码字进行更新,得到更新后的第j个码字。
  19. 根据权利要求12所述的装置,其特征在于,所述装置还包括:
    第二获取单元,用于获取索引,所述索引为所述码字和所述神经网络模型的权重矩阵之间的对应关系。
  20. 根据权利要求12至19任一所述的装置,其特征在于,所述预设停止条件包括以下一项或多项条件:
    所述训练数据对应的结果标签值与所述神经网络模型对所述训练数据的输出结果之差低于预设差值;
    所述训练数据对应的结果标签值与所述神经网络模型对所述训练数据的输出结果之差的变化率低于预设变化阈值;
    所述神经网络模型中的模型参数的更新次数达到预设更新次数;
    所述神经网络模型所采用的损失函数的输出值达到预设阈值;所述损失函数用于衡量所述神经网络模型对所述训练数据的输出结果与所述训练数据对应的结果标签值之间的差距。
  21. 一种图像分类装置,其特征在于,所述装置包括:
    图像获取单元,用于获取待分类图像;
    图像分类单元,用于将所述待分类图像输入训练好的神经网络模型,得到所述神经网络模型输出的图像分类结果;
    神经网络模型训练单元,用于训练所述神经网络模型;
    所述神经网络模型训练单元包括:
    第一获取单元,用于从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    第一训练单元,用于根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;所述训练数据包括正样本图像和负样本图像;
    更新单元,用于当所述神经网络模型输出所述训练数据为正样本图像的概率值后,在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    存储单元,用于将所述更新后的码字存储在所述内存中;
    第二训练单元,用于利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述第二权重矩阵进行训练;
    停止单元,用于在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  22. 一种文本翻译装置,其特征在于,所述装置包括:
    文本获取单元,用于获取待翻译文本;
    文本翻译单元,用于将所述待翻译文本输入训练好的神经网络模型,得到所述神经网络模型输出的文本翻译结果;
    神经网络模型训练单元,用于训练所述神经网络模型;
    所述神经网络模型训练单元包括:
    第一获取单元,用于从内存中获取码字,所述码字对应于神经网络模型的第一权重矩阵;
    第一训练单元,用于根据所述码字确定所述神经网络模型的权重矩阵为所述第一权重矩阵,并利用训练数据对所述第一权重矩阵进行训练;所述训练数据为样本文本;
    更新单元,用于当所述神经网络模型输出所述样本文本的翻译结果后,在预设停止条件未被满足时,更新所述码字,得到更新后的码字;
    存储单元,用于将所述更新后的码字存储在所述内存中;
    第二训练单元,用于利用在所述内存中获取的所述更新后的码字确定所述神经网络模型的权重矩阵为第二权重矩阵,并利用训练数据对所述神经网络模型的第二权重矩阵进行训练;
    停止单元,用于在所述预设停止条件被满足时,停止所述神经网络模型的训练。
  23. 一种神经网络模型训练设备,其特征在于,所述设备包括存储器、处理器;
    所述存储器,用于存储指令;
    所述处理器,用于执行所述存储器中的所述指令,执行权利要求1-9任意一项所述的神经网络模型训练方法。
  24. 一种图像分类设备,其特征在于,所述设备包括存储器、处理器;
    所述存储器,用于存储指令;
    所述处理器,用于执行所述存储器中的所述指令,执行权利要求10所述的图像分类方法。
  25. 一种文本翻译设备,其特征在于,所述设备包括存储器、处理器;
    所述存储器,用于存储指令;
    所述处理器,用于执行所述存储器中的所述指令,执行权利要求11所述的文本翻译方法。
  26. 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得所述计算机执行以上权利要求1-9任意一项所述的神经网络模型训练方法,或者实现如权利要求10所述的图像分类方法,或者实现如权利要求11所述的文本翻译方法。
PCT/CN2021/086589 2020-06-18 2021-04-12 神经网络模型训练、图像分类、文本翻译方法及装置、设备 WO2021253941A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21825658.4A EP4152211A4 (en) 2020-06-18 2021-04-12 NEURAL NETWORK MODEL TRAINING METHOD, IMAGE CLASSIFICATION METHOD, TEXT TRANSLATION METHOD AND APPARATUS, AND DEVICE
US18/068,450 US20230120631A1 (en) 2020-06-18 2022-12-19 Neural network model training method, apparatus, and device, image classification method, apparatus, and device, and text translation method, apparatus, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010558711.6 2020-06-18
CN202010558711.6A CN113822410A (zh) 2020-06-18 2020-06-18 神经网络模型训练、图像分类、文本翻译方法及装置、设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/068,450 Continuation US20230120631A1 (en) 2020-06-18 2022-12-19 Neural network model training method, apparatus, and device, image classification method, apparatus, and device, and text translation method, apparatus, and device

Publications (2)

Publication Number Publication Date
WO2021253941A1 true WO2021253941A1 (zh) 2021-12-23
WO2021253941A9 WO2021253941A9 (zh) 2022-09-15

Family

ID=78911660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086589 WO2021253941A1 (zh) 2020-06-18 2021-04-12 神经网络模型训练、图像分类、文本翻译方法及装置、设备

Country Status (4)

Country Link
US (1) US20230120631A1 (zh)
EP (1) EP4152211A4 (zh)
CN (1) CN113822410A (zh)
WO (1) WO2021253941A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540774A (zh) * 2022-07-28 2024-02-09 华为技术有限公司 数据处理方法及装置
US20240127000A1 (en) * 2022-09-30 2024-04-18 Huawei Technologies Co., Ltd. Method and system for training large-scale language models
CN116681973B (zh) * 2023-08-03 2023-11-03 浪潮电子信息产业股份有限公司 一种图像处理方法、装置、系统、设备及计算机存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107832835A (zh) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 一种卷积神经网络的轻量化方法及装置
CN107886164A (zh) * 2017-12-20 2018-04-06 东软集团股份有限公司 一种卷积神经网络训练、测试方法及训练、测试装置
CN107977704A (zh) * 2017-11-10 2018-05-01 中国科学院计算技术研究所 权重数据存储方法和基于该方法的神经网络处理器
CN108629772A (zh) * 2018-05-08 2018-10-09 上海商汤智能科技有限公司 图像处理方法及装置、计算机设备和计算机存储介质
CN109063666A (zh) * 2018-08-14 2018-12-21 电子科技大学 基于深度可分离卷积的轻量化人脸识别方法及系统

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11531859B2 (en) * 2017-08-08 2022-12-20 Samsung Electronics Co., Ltd. System and method for hashed compressed weighting matrix in neural networks
KR102434728B1 (ko) * 2017-10-20 2022-08-19 상하이 캠브리콘 인포메이션 테크놀로지 컴퍼니 리미티드 처리방법 및 장치

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN107977704A (zh) * 2017-11-10 2018-05-01 中国科学院计算技术研究所 权重数据存储方法和基于该方法的神经网络处理器
CN107832835A (zh) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 一种卷积神经网络的轻量化方法及装置
CN107886164A (zh) * 2017-12-20 2018-04-06 东软集团股份有限公司 一种卷积神经网络训练、测试方法及训练、测试装置
CN108629772A (zh) * 2018-05-08 2018-10-09 上海商汤智能科技有限公司 图像处理方法及装置、计算机设备和计算机存储介质
CN109063666A (zh) * 2018-08-14 2018-12-21 电子科技大学 基于深度可分离卷积的轻量化人脸识别方法及系统

Non-Patent Citations (1)

Title
See also references of EP4152211A4

Also Published As

Publication number Publication date
CN113822410A (zh) 2021-12-21
US20230120631A1 (en) 2023-04-20
WO2021253941A9 (zh) 2022-09-15
EP4152211A1 (en) 2023-03-22
EP4152211A4 (en) 2023-11-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21825658

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021825658

Country of ref document: EP

Effective date: 20221215

NENP Non-entry into the national phase

Ref country code: DE