WO2022052656A1 - Training method, system and device for an object recognition model - Google Patents

Training method, system and device for an object recognition model

Info

Publication number
WO2022052656A1
WO2022052656A1 · PCT/CN2021/109199
Authority
WO
WIPO (PCT)
Prior art keywords
sample
parameter matrix
deep learning
object recognition
training
Prior art date
Application number
PCT/CN2021/109199
Other languages
English (en)
French (fr)
Inventor
赵旭东
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US18/012,936 (published as US20230267710A1)
Publication of WO2022052656A1

Classifications

    • G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Fusion techniques of extracted features
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06V10/40 — Extraction of image or video features
    • G06V10/48 — Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G06V10/776 — Validation; performance evaluation
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G06V10/955 — Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • G06V40/168 — Feature extraction; face representation (human faces)
    • G06V40/172 — Classification, e.g. identification (human faces)

Definitions

  • the present invention relates to the field of model training, in particular to a training method, system and device for an object recognition model.
  • the commonly used training method is: input the face image into a deep learning model, which outputs a feature vector representing the feature information of the input image; multiply this feature vector by a parameter matrix whose size is linearly related to the total number of identities (it represents the respective feature information of the multiple identities); then calculate the loss function; and finally back-propagate the gradient to update the parameter matrix and all parameters in the deep learning model.
  • GPU: Graphics Processing Unit, a graphics processor.
  • when the total number of identities is large, the method of model parallelism is usually adopted: the entire parameter matrix is split across multiple GPUs, and after each GPU finishes its computation the results are reduced.
  • even when the model parallel method is adopted, the problem that the parameter matrix is too large to fit in GPU memory is not effectively solved; moreover, the large amount of computation on the GPUs makes the model training process slow.
  • the object of the present invention is to provide a training method, system and device for an object recognition model.
  • the parameter matrix used for calculation in the model training process is a partial parameter matrix extracted from the original parameter matrix, and the data volume of the extracted partial parameter matrix is small. This reduces the amount of calculation and speeds up the model training process; moreover, the original parameter matrix is stored in the memory, which has a large storage space, effectively solving the problem that the parameter matrix data is too large to be stored.
  • the present invention provides a training method for an object recognition model, including:
  • a parameter matrix composed of a plurality of feature vectors used to represent the feature information of the object is stored in the memory in advance;
  • the sample image is input into the deep learning model for object recognition, and the sample feature vector used to represent the feature information of the sample image is obtained;
  • a loss function is calculated according to the similarity, gradient back-propagation is performed based on the loss function, the new parameter matrix and the parameters of the deep learning model are updated, and the total parameter matrix in the memory is updated based on the updated new parameter matrix to complete the current round of training of the deep learning model.
  • the process of storing a parameter matrix composed of a plurality of feature vectors used to represent the feature information of the object in the memory in advance includes:
  • the training method of the object recognition model further comprises:
  • the sample picture is input into the deep learning model for object recognition to obtain the sample feature vector used to represent the feature information of the sample picture; the process of extracting the feature vector corresponding to the sample picture from the parameter matrix, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix includes:
  • a certain number of random sample IDs are randomly obtained from the remaining sample IDs, and the feature vectors corresponding to the target sample ID and the random sample ID are extracted from the parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix.
  • the current training process of the deep learning model specifically includes:
  • the sample picture corresponding to the target GPU is input into the deep learning model for object recognition, and the target sample feature vector used to represent the feature information of the sample picture is obtained; wherein, the target GPU is any of the GPUs;
  • the target sample feature vector is multiplied by the new parameter matrix using the target GPU to obtain the target similarity between the target sample feature vector and each feature vector in the new parameter matrix, and according to the target similarity Calculate the target loss function, and perform back-propagation of the gradient based on the target loss function to obtain the new parameter matrix and the gradient of the parameter values to be updated of the deep learning model;
  • the training method of the object recognition model further comprises:
  • the deep learning model is specifically a convolutional neural network model.
  • the present invention also provides a training system for an object recognition model, including:
  • the matrix storage module is used to store the parameter matrix composed of a plurality of feature vectors used to represent the feature information of the object in the memory in advance;
  • a vector acquisition module used for inputting the sample image into the deep learning model for object recognition during the model training process, to obtain a sample feature vector used to represent the feature information of the sample image;
  • the matrix reconstruction module is used to extract the eigenvectors corresponding to the sample pictures from the parameter matrix and randomly extract a certain number of eigenvectors from the remaining parameter matrix, and reconstruct all the extracted eigenvectors into new parameters matrix;
  • a similarity obtaining module configured to multiply the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix;
  • a parameter update module configured to calculate a loss function according to the similarity, perform back-propagation of the gradient based on the loss function, update the new parameter matrix and the parameters of the deep learning model, and update the total parameter matrix in the memory based on the updated new parameter matrix to complete the current round of training of the deep learning model.
  • the matrix storage module is specifically used for:
  • the training system for the object recognition model further includes:
  • the ID configuration module is used to save multiple sample images in the data set in advance, and configure the sample IDs for the multiple sample images one by one;
  • the vector acquisition module is specifically used for:
  • the matrix reconstruction module is specifically used for:
  • a certain number of random sample IDs are randomly obtained from the remaining sample IDs, and the feature vectors corresponding to the target sample ID and the random sample ID are extracted from the parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix.
  • the present invention also provides a training device for an object recognition model, including:
  • the processor is configured to implement the steps of any one of the above object recognition model training methods when executing the computer program.
  • the invention provides a training method for an object recognition model.
  • the parameter matrix is stored in the memory in advance; during model training, the sample picture is input into the deep learning model to obtain the sample feature vector; the feature vector corresponding to the sample picture is extracted from the parameter matrix and a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix; the sample feature vector is multiplied by the new parameter matrix to obtain its similarity with each feature vector in the new parameter matrix; the loss function is calculated according to the similarity, gradient back-propagation is performed based on the loss function, the new parameter matrix and the parameters of the deep learning model are updated, and the total parameter matrix in the memory is updated based on the updated new parameter matrix to complete the current round of training of the deep learning model.
  • the parameter matrix used for calculation during training is therefore a partial parameter matrix extracted from the original parameter matrix, and its data volume is small, which reduces the amount of calculation and speeds up the model training process; moreover, the original parameter matrix is stored in a memory with a large storage space, which effectively solves the problem that the data volume of the parameter matrix is too large to be stored.
  • the present invention also provides a training system and device for an object recognition model, which have the same beneficial effects as the above training method.
  • FIG. 1 is a flowchart of a method for training an object recognition model according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a training method for a face recognition model provided by an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a training system for an object recognition model provided by an embodiment of the present invention.
  • the core of the present invention is to provide a training method, system and device for an object recognition model.
  • the parameter matrix used for calculation in the model training process is a partial parameter matrix extracted from the original parameter matrix, and the data amount of the extracted partial parameter matrix is This reduces the amount of calculation and speeds up the model training process; moreover, the original parameter matrix is stored in the memory with a large storage space, which effectively solves the problem that the parameter matrix data is too large to be stored.
  • FIG. 1 is a flowchart of a training method for an object recognition model provided by an embodiment of the present invention.
  • the training method of the object recognition model includes:
  • Step S1 Pre-store a parameter matrix composed of a plurality of feature vectors used to represent the feature information of the object in the memory.
  • the present application stores the whole parameter matrix, composed of a plurality of feature vectors used to represent the feature information of objects, in the memory in advance, thereby effectively solving the problem that the data volume of the whole parameter matrix is too large to be stored.
  • a feature vector in the parameter matrix stored in the memory corresponds to the feature information of one picture, and the whole parameter matrix corresponds to many pictures (it can reach roughly 100 million pictures); the sample pictures used to subsequently train the deep learning model for object recognition need to be selected from these pictures.
  • Step S2 During the model training process, the sample image is input into the deep learning model for object recognition, and a sample feature vector used to represent the feature information of the sample image is obtained.
  • the sample pictures required for this round of training are first obtained from the multiple pictures corresponding to the parameter matrix saved in the memory; the sample pictures are then input into the deep learning model, which outputs the sample feature vectors representing the feature information of the sample pictures for use in subsequent calculations.
  • Step S3 Extract the eigenvectors corresponding to the sample pictures from the parameter matrix, randomly extract a certain number of eigenvectors from the remaining parameter matrix, and reconstruct all the extracted eigenvectors into a new parameter matrix.
  • the present application reconstructs a new parameter matrix with a relatively small amount of data, thereby reducing the amount of calculation and speeding up the model training process.
  • in the process of reconstructing the new parameter matrix: on the one hand, the feature vectors corresponding to the sample pictures (called the first feature vectors) are extracted from the entire parameter matrix stored in the memory; on the other hand, a certain number of feature vectors (called the second feature vectors) are randomly extracted from the remaining parameter matrix (the matrix composed of the feature vectors left in the entire stored parameter matrix after removing those corresponding to the sample pictures); all the extracted feature vectors (the first feature vectors plus the second feature vectors) are then reconstructed into a new parameter matrix for use in subsequent calculations.
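The reconstruction step above can be sketched as a row-gather over the in-memory matrix. This is an illustrative NumPy sketch, not the patented implementation: the names (`W`, `batch_ids`, `num_negatives`) and all sizes are assumptions.

```python
import numpy as np

# The full class parameter matrix W lives in host memory; each step gathers
# only the rows actually needed (toy sizes for illustration).
emb_size, cls_size = 8, 1000          # feature dimension, total number of classes
rng = np.random.default_rng(0)
W = rng.standard_normal((cls_size, emb_size)).astype(np.float32)  # full matrix, in RAM

batch_ids = np.array([3, 42, 7])      # classes of the current sample pictures
num_negatives = 5                     # "a certain number" of extra random rows

# Remaining parameter matrix = all rows except those of the sample pictures.
remaining = np.setdiff1d(np.arange(cls_size), batch_ids)
random_ids = rng.choice(remaining, size=num_negatives, replace=False)

# First feature vectors + second feature vectors -> new parameter matrix.
sub_ids = np.concatenate([batch_ids, random_ids])
W_new = W[sub_ids]
```

Only `W_new` (here 8 rows instead of 1000) needs to move to the accelerator for the subsequent multiplication.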
  • Step S4 Multiply the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix.
  • the present application multiplies the sample feature vector by the new parameter matrix to calculate the similarity between the sample feature vector and each feature vector in the new parameter matrix.
  • Step S5 Calculate the loss function according to the similarity, perform back-propagation of the gradient based on the loss function, update the parameters of the new parameter matrix and the deep learning model, and update the total parameter matrix in the memory based on the updated new parameter matrix to complete this round of training of the deep learning model.
  • the present application calculates a loss function according to the similarity between the sample feature vector and each feature vector in the new parameter matrix; gradient back-propagation is performed based on the loss function, the new parameter matrix is updated, and the total parameter matrix in the memory is updated based on the updated new parameter matrix; the back-propagation of the gradient then continues to update the parameters of the deep learning model, and this round of training of the deep learning model ends.
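A minimal sketch of one such training step (steps S2 to S5), assuming a plain softmax cross-entropy loss and SGD; the patent does not fix a particular loss or optimizer, and all names and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
emb_size, cls_size, lr = 8, 100, 0.1
W = rng.standard_normal((cls_size, emb_size)).astype(np.float32)  # full matrix in memory

sub_ids = np.array([5, 17, 42, 60, 81])        # rows used this round
W_new = W[sub_ids].copy()                      # new (partial) parameter matrix
feats = rng.standard_normal((2, emb_size)).astype(np.float32)  # sample feature vectors
labels = np.array([0, 2])                      # positions of the true classes in sub_ids

logits = feats @ W_new.T                       # similarity with each extracted row
logits -= logits.max(axis=1, keepdims=True)    # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(2), labels]).mean()  # softmax cross-entropy

grad_logits = probs.copy()                     # backprop through the softmax
grad_logits[np.arange(2), labels] -= 1.0
grad_logits /= 2.0                             # mean over the batch of 2
grad_W_new = grad_logits.T @ feats             # gradient w.r.t. the sub-matrix

W_new -= lr * grad_W_new                       # update the sub-matrix ...
W[sub_ids] = W_new                             # ... and scatter it back into memory
```

Only the extracted rows are touched: the rest of `W` is byte-for-byte unchanged after the step, which is what lets the full matrix stay in host memory.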
  • the deep learning model of the present application can be specifically applied to face recognition.
  • the invention provides a training method for an object recognition model.
  • the parameter matrix is stored in the memory in advance; during model training, the sample picture is input into the deep learning model to obtain the sample feature vector; the feature vector corresponding to the sample picture is extracted from the parameter matrix and a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix; the sample feature vector is multiplied by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix; the loss function is calculated according to the similarity, gradient back-propagation is performed based on the loss function, the parameters of the new parameter matrix and the deep learning model are updated, and the total parameter matrix in the memory is updated based on the updated new parameter matrix to complete the current round of training of the deep learning model.
  • the parameter matrix used for calculation in the model training process of the present application is a partial parameter matrix extracted from the original parameter matrix, and the data volume of the extracted partial parameter matrix is small, thereby reducing the amount of calculation and speeding up the model training process; Moreover, the original parameter matrix is stored in a memory with a large storage space, which effectively solves the problem that the data volume of the parameter matrix is too large to be stored.
  • the process of storing a parameter matrix composed of a plurality of feature vectors used to represent feature information of an object in the memory in advance includes:
  • the size of the entire parameter matrix initially stored in the memory is emb_size × cls_size, where emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of feature vectors contained in the entire parameter matrix (i.e., the total number of sample pictures).
  • the initial value of the parameter matrix is randomly generated, and a feature vector is used to represent the feature information of a sample image, and an entire parameter matrix corresponds to cls_size images.
  • the data size of the new parameter matrix reconstructed in this application is m × emb_size × 4 B (4 bytes per single-precision parameter), where m is the total number of feature vectors contained in the new parameter matrix, and m is much smaller than cls_size.
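As a back-of-the-envelope illustration of this size claim, with assumed values for emb_size, cls_size and m (the text only gives orders of magnitude):

```python
# Memory footprint of the full matrix vs. the reconstructed sub-matrix.
emb_size = 512                 # assumed feature vector size
cls_size = 100_000_000         # ~100 million identities (order of magnitude from the text)
m = 100_000                    # assumed rows in the sub-matrix, m << cls_size

full_bytes = emb_size * cls_size * 4   # 4 B per single-precision parameter
sub_bytes = m * emb_size * 4

print(f"full matrix: {full_bytes / 1e9:.1f} GB")   # ~204.8 GB -> host memory only
print(f"sub-matrix:  {sub_bytes / 1e6:.1f} MB")    # ~204.8 MB -> fits on one GPU
```

Under these assumptions the sub-matrix is 1000× smaller than the full matrix, which is why both the storage problem and the per-step computation shrink.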
  • the training method of the object recognition model further includes:
  • the sample image is input into the deep learning model for object recognition to obtain the sample feature vector used to represent the feature information of the sample image; the process of extracting the feature vector corresponding to the sample image from the parameter matrix, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix includes:
  • a certain number of random sample IDs are randomly obtained from the remaining sample IDs, and the eigenvectors corresponding to the target sample IDs and random sample IDs are extracted from the parameter matrix, and all the extracted eigenvectors are reconstructed into a new parameter matrix.
  • the present application can save, in a data set in advance, the multiple sample pictures corresponding to the entire parameter matrix stored in the memory, and configure a sample ID (Identity Document, identification number) for the multiple sample pictures one by one, which is equivalent to configuring a label for each sample picture and facilitates the subsequent acquisition of the required sample pictures.
  • this application can randomly obtain a batch of sample IDs (called target sample IDs) from all the sample IDs, and obtain the corresponding sample pictures (called target sample pictures) from the data set based on the target sample IDs; these are the sample pictures required for this round of training of the deep learning model; the target sample pictures are then input into the deep learning model to obtain the sample feature vectors used to represent the feature information of the target sample pictures.
  • in the process of obtaining a new parameter matrix for subsequent calculations: on the one hand, a batch of sample IDs (the target sample IDs) is randomly obtained from all the sample IDs; on the other hand, a certain number of sample IDs (called random sample IDs) are randomly obtained from the remaining sample IDs (all sample IDs minus the target sample IDs); the feature vectors corresponding to the target sample IDs and the random sample IDs are then extracted from the entire parameter matrix stored in the memory, and all the extracted feature vectors are reconstructed into a new parameter matrix.
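The ID bookkeeping described above might be sketched as follows; the names (`target_ids`, `random_ids`) and sizes are illustrative assumptions, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)
all_ids = np.arange(1000)                       # one sample ID per picture in the data set
batch_size, num_random = 4, 6

# Target sample IDs pick this round's sample pictures; random sample IDs
# come from the remaining IDs and supply the extra rows.
target_ids = rng.choice(all_ids, size=batch_size, replace=False)
remaining_ids = np.setdiff1d(all_ids, target_ids)
random_ids = rng.choice(remaining_ids, size=num_random, replace=False)

# Rows of the new parameter matrix, target rows first; because of that
# ordering, the training label of picture i is simply position i.
sub_ids = np.concatenate([target_ids, random_ids])
labels = np.arange(batch_size)
```

Keeping the target IDs at the front of `sub_ids` makes the label mapping for the loss trivial, since each target picture's true class sits at a known row of the new matrix.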
  • the current training process of the deep learning model specifically includes:
  • the sample picture corresponding to the target GPU is input into the deep learning model for object recognition, and the target sample feature vector used to represent the feature information of the sample picture is obtained; wherein, the target GPU is any GPU;
  • the present application uses multiple GPUs to participate in the training of the deep learning model. The training process is as follows: different sample pictures are pre-allocated to different GPUs (for example, with two GPUs participating in model training, sample pictures 1 and 2 are allocated to GPU 1, and sample pictures 3 and 4 to GPU 2); the sample pictures corresponding to any GPU (referred to as the target GPU) are input into the deep learning model to obtain the target sample feature vectors used to represent their feature information; the feature vectors corresponding to all sample pictures allocated to all GPUs are extracted from the entire parameter matrix stored in the memory (for example, sample pictures 1, 2, 3, 4 correspond to feature vectors 1, 2, 3, 4), a certain number of feature vectors (such as feature vectors 5, 6, 7, 8) are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors (feature vectors 1 through 8) are reconstructed into a new parameter matrix;
  • the gradients of the parameter values to be updated corresponding to each GPU are averaged, the new parameter matrix and the parameters of the deep learning model are updated according to this average gradient, and the total parameter matrix in the memory is updated based on the updated new parameter matrix; the current round of training of the deep learning model then ends.
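The gradient-averaging step might look like the following sketch, where random arrays stand in for the real per-GPU back-propagation results (an actual multi-GPU implementation would use an all-reduce across devices); all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
m, emb_size, lr, num_gpus = 8, 4, 0.1, 2
W_new = rng.standard_normal((m, emb_size)).astype(np.float32)  # shared new parameter matrix

# Each GPU computes a gradient for the same W_new on its own sample pictures;
# random arrays stand in for those gradients here.
per_gpu_grads = [rng.standard_normal((m, emb_size)).astype(np.float32)
                 for _ in range(num_gpus)]

avg_grad = np.mean(per_gpu_grads, axis=0)   # reduce: average the per-GPU gradients
W_new -= lr * avg_grad                      # one shared update from the averaged gradient
```

Averaging before updating keeps every GPU's copy of the new parameter matrix identical, so the write-back into the total in-memory matrix is done once.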
  • the training method of the object recognition model further includes:
  • this application can also judge, after completing the previous round of training, whether the deep learning model meets the model accuracy requirements for object recognition. If it does, the deep learning model needs no further training and can be put into use directly, and it is determined that training is over. If it does not, the deep learning model must continue training and cannot be put into use directly; new sample pictures are then input into the deep learning model to begin a new round of training, and training does not end until the deep learning model meets the model accuracy requirements for object recognition.
  • the deep learning model is specifically a convolutional neural network model.
  • the deep learning model of the present application may be, but is not limited to, a convolutional neural network model (e.g., ResNet or SqueezeNet), which is not specifically limited in this application.
  • FIG. 3 is a schematic structural diagram of a training system for an object recognition model according to an embodiment of the present invention.
  • the training system for this object recognition model includes:
  • the matrix storage module 1 is used to pre-store the parameter matrix composed of a plurality of feature vectors used to represent the feature information of the object in the memory;
  • the vector acquisition module 2 is used to input the sample image into the deep learning model for object recognition during the model training process, and obtain the sample feature vector used to represent the feature information of the sample image;
  • the matrix reconstruction module 3 is used to extract the eigenvectors corresponding to the sample pictures from the parameter matrix and randomly extract a certain number of eigenvectors from the remaining parameter matrix, and reconstruct all the extracted eigenvectors into a new parameter matrix;
  • the similarity acquisition module 4 is used to multiply the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix;
  • the parameter update module 5 is used to calculate the loss function according to the similarity, perform back-propagation of the gradient based on the loss function, update the parameters of the new parameter matrix and the deep learning model, and update the total parameter matrix in the memory based on the updated new parameter matrix to complete the current round of training of the deep learning model.
  • the matrix storage module 1 is specifically used for:
  • emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
  • the training system of the object recognition model further includes:
  • the ID configuration module is used to save multiple sample images in the data set in advance, and configure the sample IDs for the multiple sample images one by one;
  • the vector acquisition module 2 is specifically used for:
  • the matrix reconstruction module 3 is specifically used for:
  • a certain number of random sample IDs are randomly obtained from the remaining sample IDs, the feature vectors corresponding to the target sample IDs and the random sample IDs are extracted from the parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix.
  • the application also provides a training device for an object recognition model, including:
  • the processor is configured to implement the steps of any one of the above object recognition model training methods when executing the computer program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

A training method, system, and apparatus for an object recognition model. A parameter matrix is saved in memory in advance. During model training, a sample picture is input into a deep learning model to obtain a sample feature vector; the feature vector corresponding to the sample picture is extracted from the parameter matrix, a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix; the sample feature vector is multiplied by the new parameter matrix, a loss function is then calculated, gradient back-propagation is performed, the parameters of the new parameter matrix and of the deep learning model are updated, and the total parameter matrix in memory is updated based on the updated new parameter matrix. It can be seen that the parameter matrix used in computation in the present application contains a relatively small amount of data, which reduces the computational load and speeds up model training; moreover, the original parameter matrix is stored in memory, which has a larger storage space, effectively solving the problem that the parameter matrix is too large to be stored.

Description

Training method, system, and apparatus for an object recognition model
This application claims priority to Chinese Patent Application No. 202010956031.X, entitled "Training Method, System and Apparatus for an Object Recognition Model", filed with the China Patent Office on September 11, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of model training, and in particular to a training method, system, and apparatus for an object recognition model.
Background Art
With the rapid development of deep learning models in the field of computer vision, face recognition technology has made remarkable progress; model accuracy has essentially reached the level of human recognition, and the technology has been widely applied in scenarios such as access control and attendance.
In the existing training process of a face recognition model, the commonly used training method is as follows: a face picture is input into a deep learning model, which outputs a feature vector representing the feature information of the input picture; this feature vector is then multiplied by a parameter matrix whose size is linear in the total number of identities (and which represents the feature information of each identity), the loss function is calculated, and finally gradient back-propagation is performed to update the parameter matrix and all parameters in the deep learning model.
However, the size of the parameter matrix grows linearly with the total number of identities. If each identity is represented by a 128-dimensional vector, then when the total number of identities reaches one billion, the parameter matrix requires nearly 0.5 TB of memory (10^9 × 128 × 4 B ≈ 0.5 TB), and the GPU (Graphics Processing Unit) used for model training computation can no longer hold all the parameter matrix data.
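The 0.5 TB figure can be reproduced with a few lines of Python; this back-of-the-envelope check is added here purely for illustration and is not part of the application itself:

```python
# One billion identities, each represented by a 128-dimensional float32 vector.
num_identities = 10**9
emb_dim = 128
bytes_per_float32 = 4

matrix_bytes = num_identities * emb_dim * bytes_per_float32  # 512 GB
matrix_tib = matrix_bytes / 1024**4                          # bytes -> TiB

# matrix_bytes == 512_000_000_000, i.e. roughly 0.47 TiB -- consistent
# with the "~0.5 TB" figure quoted above.
```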
At present, when training face recognition models at the hundred-million scale, a model-parallel approach is usually adopted: the entire parameter matrix is split across multiple GPUs, and after the computation on each GPU is completed, the results are reduced. However, even model parallelism cannot effectively solve the problem that the parameter matrix is too large for the GPUs to store; moreover, the amount of computation on the GPUs is large, which makes the model training process slow.
Therefore, how to provide a solution to the above technical problems is a problem that those skilled in the art currently need to solve.
Summary of the Invention
The object of the present invention is to provide a training method, system, and apparatus for an object recognition model in which the parameter matrix used in computation during model training is a partial parameter matrix extracted from the original parameter matrix; the extracted partial parameter matrix contains a relatively small amount of data, which reduces the computational load and speeds up model training; moreover, the original parameter matrix is stored in memory, which has a larger storage space, thereby effectively solving the problem that the parameter matrix is too large to be stored.
To solve the above technical problems, the present invention provides a training method for an object recognition model, including:
saving in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects;
during model training, inputting a sample picture into a deep learning model for object recognition to obtain a sample feature vector representing the feature information of the sample picture;
extracting, from the parameter matrix, the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix;
multiplying the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix;
calculating a loss function according to the similarities, performing gradient back-propagation based on the loss function, updating the parameters of the new parameter matrix and of the deep learning model, and updating the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
Preferably, the process of saving in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects includes:
randomly initializing a parameter matrix of size emb_size × cls_size that represents the feature information of multiple sample pictures, and saving the parameter matrix in memory, where emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
Preferably, the training method for the object recognition model further includes:
saving multiple sample pictures in a data set in advance, and configuring a sample ID for each of the multiple sample pictures;
correspondingly, the process of inputting the sample picture into the deep learning model for object recognition to obtain the sample feature vector representing the feature information of the sample picture, extracting from the parameter matrix the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix includes:
randomly obtaining a batch of target sample IDs from all sample IDs, and obtaining the corresponding target sample pictures from the data set based on the target sample IDs;
inputting the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing the feature information of the target sample pictures;
randomly obtaining a certain number of random sample IDs from the remaining sample IDs, extracting from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstructing all the extracted feature vectors into a new parameter matrix.
Preferably, the current round of training of the deep learning model specifically includes:
assigning different sample pictures to different GPUs in advance;
inputting the sample pictures corresponding to a target GPU into the deep learning model for object recognition to obtain target sample feature vectors representing the feature information of the sample pictures, where the target GPU is any one of the GPUs;
extracting, from the parameter matrix, the feature vectors corresponding to all sample pictures assigned to all GPUs, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix;
multiplying, on the target GPU, the target sample feature vectors by the new parameter matrix to obtain the target similarities between the target sample feature vectors and each feature vector in the new parameter matrix, calculating a target loss function according to the target similarities, and performing gradient back-propagation based on the target loss function to obtain the gradients of the parameter values to be updated of the new parameter matrix and the deep learning model;
averaging the gradients of the parameter values to be updated corresponding to the respective GPUs, updating the parameters of the new parameter matrix and the deep learning model according to the average of the gradients of the parameter values to be updated, and updating the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
Preferably, the training method for the object recognition model further includes:
after completing the previous round of training of the deep learning model, determining whether the deep learning model meets the model accuracy requirement for object recognition;
if so, determining that the training of the deep learning model is finished;
if not, inputting new sample pictures into the deep learning model for object recognition again to enter a new round of training.
Preferably, the deep learning model is specifically a convolutional neural network model.
To solve the above technical problems, the present invention further provides a training system for an object recognition model, including:
a matrix storage module, configured to save in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects;
a vector acquisition module, configured to input, during model training, a sample picture into a deep learning model for object recognition to obtain a sample feature vector representing the feature information of the sample picture;
a matrix reconstruction module, configured to extract from the parameter matrix the feature vector corresponding to the sample picture, randomly extract a certain number of feature vectors from the remaining parameter matrix, and reconstruct all the extracted feature vectors into a new parameter matrix;
a similarity acquisition module, configured to multiply the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix;
a parameter update module, configured to calculate a loss function according to the similarities, perform gradient back-propagation based on the loss function, update the parameters of the new parameter matrix and of the deep learning model, and update the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
Preferably, the matrix storage module is specifically configured to:
randomly initialize a parameter matrix of size emb_size × cls_size that represents the feature information of multiple sample pictures, and save the parameter matrix in memory, where emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
Preferably, the training system for the object recognition model further includes:
an ID configuration module, configured to save multiple sample pictures in a data set in advance and configure a sample ID for each of the multiple sample pictures;
correspondingly, the vector acquisition module is specifically configured to:
randomly obtain a batch of target sample IDs from all sample IDs, obtain the corresponding target sample pictures from the data set based on the target sample IDs, and input the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing the feature information of the target sample pictures;
the matrix reconstruction module is specifically configured to:
randomly obtain a certain number of random sample IDs from the remaining sample IDs, extract from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstruct all the extracted feature vectors into a new parameter matrix.
To solve the above technical problems, the present invention further provides a training apparatus for an object recognition model, including:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of any one of the above training methods for an object recognition model when executing the computer program.
The present invention provides a training method for an object recognition model. A parameter matrix is saved in memory in advance; during model training, a sample picture is input into a deep learning model to obtain a sample feature vector; the feature vector corresponding to the sample picture is extracted from the parameter matrix, a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix; the sample feature vector is multiplied by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix; a loss function is calculated according to the similarities, gradient back-propagation is performed based on the loss function, the parameters of the new parameter matrix and of the deep learning model are updated, and the total parameter matrix in memory is updated based on the updated new parameter matrix, to complete the current round of training of the deep learning model. It can be seen that the parameter matrix used in computation during model training in the present application is a partial parameter matrix extracted from the original parameter matrix; the extracted partial parameter matrix contains a relatively small amount of data, which reduces the computational load and speeds up model training; moreover, the original parameter matrix is stored in memory, which has a larger storage space, effectively solving the problem that the parameter matrix is too large to be stored.
The present invention further provides a training system and apparatus for an object recognition model, which have the same beneficial effects as the above training method.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for the prior art and the embodiments are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a training method for an object recognition model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method for a face recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a training system for an object recognition model according to an embodiment of the present invention.
Detailed Description of the Embodiments
The core of the present invention is to provide a training method, system, and apparatus for an object recognition model in which the parameter matrix used in computation during model training is a partial parameter matrix extracted from the original parameter matrix; the extracted partial parameter matrix contains a relatively small amount of data, which reduces the computational load and speeds up model training; moreover, the original parameter matrix is stored in memory, which has a larger storage space, effectively solving the problem that the parameter matrix is too large to be stored.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Please refer to FIG. 1, which is a flowchart of a training method for an object recognition model according to an embodiment of the present invention.
The training method for the object recognition model includes:
Step S1: saving in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects.
Specifically, considering that the storage space of memory is far larger than that of a GPU device, the present application saves in advance, in memory, the entire parameter matrix composed of a plurality of feature vectors used to represent feature information of objects, thereby effectively solving the problem that the entire parameter matrix is too large to be stored.
It can be understood that one feature vector in the parameter matrix stored in memory represents the feature information of one picture, and the entire parameter matrix corresponds to a large number of pictures, basically reaching the hundred-million scale; the sample pictures subsequently used to train the deep learning model for object recognition are selected from these pictures.
Step S2: during model training, inputting a sample picture into the deep learning model for object recognition to obtain a sample feature vector representing the feature information of the sample picture.
Specifically, during the training of the deep learning model, the sample pictures required for the current round of training are first obtained from the multiple pictures corresponding to the parameter matrix stored in memory; the sample pictures are then input into the deep learning model, which outputs sample feature vectors representing the feature information of the sample pictures for subsequent computation.
Step S3: extracting, from the parameter matrix, the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix.
Specifically, considering that in the prior art the parameter matrix involved in computation is the entire parameter matrix stored in memory, which entails an excessive computational load, the present application reconstructs a new parameter matrix with a relatively small amount of data, thereby reducing the computational load and speeding up model training.
More specifically, the process of reconstructing the new parameter matrix is as follows: on the one hand, the feature vector corresponding to the sample picture (referred to as the first feature vector) is extracted from the entire parameter matrix stored in memory; on the other hand, a certain number of feature vectors (referred to as second feature vectors) are randomly extracted from the remaining parameter matrix (the parameter matrix composed of the feature vectors left in the entire parameter matrix stored in memory after the feature vector corresponding to the sample picture is removed); all the extracted feature vectors (the first feature vectors plus the second feature vectors) are then reconstructed into a new parameter matrix for subsequent computation.
Step S4: multiplying the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix.
Specifically, after obtaining the sample feature vector output by the deep learning model and the reconstructed new parameter matrix, the present application multiplies the sample feature vector by the new parameter matrix to compute the similarity between the sample feature vector and each feature vector in the new parameter matrix.
Step S5: calculating a loss function according to the similarities, performing gradient back-propagation based on the loss function, updating the parameters of the new parameter matrix and of the deep learning model, and updating the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
Specifically, the present application calculates the loss function according to the similarities between the sample feature vector and each feature vector in the new parameter matrix, performs gradient back-propagation based on the loss function to update the new parameter matrix, updates the total parameter matrix in memory based on the updated new parameter matrix, and then continues the gradient back-propagation to update the parameters of the deep learning model, whereupon the current round of training of the deep learning model ends.
It should be noted that, as shown in FIG. 2, the deep learning model of the present application may be specifically applied to face recognition.
The present invention provides a training method for an object recognition model. A parameter matrix is saved in memory in advance; during model training, a sample picture is input into a deep learning model to obtain a sample feature vector; the feature vector corresponding to the sample picture is extracted from the parameter matrix, a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and all the extracted feature vectors are reconstructed into a new parameter matrix; the sample feature vector is multiplied by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix; a loss function is calculated according to the similarities, gradient back-propagation is performed based on the loss function, the parameters of the new parameter matrix and of the deep learning model are updated, and the total parameter matrix in memory is updated based on the updated new parameter matrix, to complete the current round of training of the deep learning model. It can be seen that the parameter matrix used in computation during model training in the present application is a partial parameter matrix extracted from the original parameter matrix; the extracted partial parameter matrix contains a relatively small amount of data, which reduces the computational load and speeds up model training; moreover, the original parameter matrix is stored in memory, which has a larger storage space, effectively solving the problem that the parameter matrix is too large to be stored.
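Steps S1 through S5 can be sketched in NumPy. This is a minimal illustrative sketch, not the application's implementation: the deep learning model is replaced by a random stand-in for its output feature vectors, the loss is a plain softmax cross-entropy, the sizes are toy-scale, and names such as `full_matrix` and `training_round` are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

emb_size, cls_size = 8, 100          # toy sizes; the real cls_size may be huge
full_matrix = rng.standard_normal((emb_size, cls_size))  # S1: kept in host memory

def training_round(sample_ids, num_random, lr=0.1):
    # S2: stand-in for the feature vectors the deep learning model would output
    features = rng.standard_normal((len(sample_ids), emb_size))

    # S3: columns for the sample pictures, plus random columns from the remainder
    remaining = np.setdiff1d(np.arange(cls_size), sample_ids)
    random_ids = rng.choice(remaining, size=num_random, replace=False)
    cols = np.concatenate([sample_ids, random_ids])
    new_matrix = full_matrix[:, cols]          # the small reconstructed matrix

    # S4: similarities between each feature vector and each column
    sims = features @ new_matrix               # shape (batch, len(cols))

    # S5: softmax cross-entropy (sample i's true class is column i), then update
    logits = sims - sims.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad = probs.copy()
    grad[np.arange(len(sample_ids)), np.arange(len(sample_ids))] -= 1.0
    new_matrix = new_matrix - lr * (features.T @ grad)

    # write the updated sub-matrix back into the total matrix in memory
    full_matrix[:, cols] = new_matrix
    return cols, sims.shape

cols, sims_shape = training_round(sample_ids=np.array([3, 7]), num_random=5)
```

Only the small `new_matrix` participates in the multiplication and the gradient step; the total matrix stays in host memory and is patched in place at the end of the round.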
On the basis of the above embodiments:
As an optional embodiment, the process of saving in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects includes:
randomly initializing a parameter matrix of size emb_size × cls_size that represents the feature information of multiple sample pictures, and saving the parameter matrix in memory, where emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
Specifically, the size of the entire parameter matrix initially stored in memory is emb_size × cls_size, where emb_size is the size of one feature vector and cls_size is the total number of feature vectors contained in the entire parameter matrix. The initial values of the parameter matrix are randomly generated; one feature vector represents the feature information of one sample picture, so the entire parameter matrix corresponds to cls_size pictures.
On this basis, the data size of the new parameter matrix reconstructed in the present application is m × emb_size × 4 B, where m is the total number of feature vectors contained in the new parameter matrix, and m is far smaller than cls_size.
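The size relationship can be made concrete with illustrative numbers (the values of `cls_size` and `m` below are invented for this sketch; real deployments would be far larger):

```python
import numpy as np

emb_size = 128        # size of one feature vector
cls_size = 10**5      # total number of sample pictures (toy scale)
m = 4096              # feature vectors in the reconstructed new matrix

full_matrix = np.zeros((emb_size, cls_size), dtype=np.float32)
full_bytes = full_matrix.nbytes      # emb_size * cls_size * 4 B
sub_bytes = m * emb_size * 4         # the m x emb_size x 4 B figure above

# Here the new matrix is 2 MiB versus ~49 MiB for the full matrix; the gap
# widens linearly as cls_size grows while m stays fixed.
```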
As an optional embodiment, the training method for the object recognition model further includes:
saving multiple sample pictures in a data set in advance, and configuring a sample ID for each of the multiple sample pictures;
correspondingly, the process of inputting the sample picture into the deep learning model for object recognition to obtain the sample feature vector representing the feature information of the sample picture, extracting from the parameter matrix the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix includes:
randomly obtaining a batch of target sample IDs from all sample IDs, and obtaining the corresponding target sample pictures from the data set based on the target sample IDs;
inputting the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing the feature information of the target sample pictures;
randomly obtaining a certain number of random sample IDs from the remaining sample IDs, extracting from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstructing all the extracted feature vectors into a new parameter matrix.
Further, the present application may save in advance, in a data set, the multiple sample pictures corresponding to the entire parameter matrix stored in memory, and configure a sample ID (Identity Document) for each of the multiple sample pictures, which is equivalent to assigning a label to each sample picture and makes it convenient to subsequently obtain the required sample pictures.
On this basis, the process of obtaining the sample feature vectors for subsequent computation is as follows: a batch of sample IDs (referred to as target sample IDs) is randomly obtained from all sample IDs, and the corresponding sample pictures (referred to as target sample pictures), i.e., the sample pictures required for the current round of training of the deep learning model, are obtained from the data set based on the target sample IDs; the target sample pictures are then input into the deep learning model to obtain sample feature vectors representing the feature information of the target sample pictures.
The process of obtaining the new parameter matrix for subsequent computation is as follows: on the one hand, a batch of sample IDs (the target sample IDs) is randomly obtained from all sample IDs; on the other hand, a certain number of sample IDs (referred to as random sample IDs) are randomly obtained from the remaining sample IDs (the sample IDs left after the target sample IDs are removed from all sample IDs); the feature vectors corresponding to the target sample IDs and the random sample IDs are then extracted from the entire parameter matrix stored in memory, and all the extracted feature vectors are reconstructed into a new parameter matrix.
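The two random draws and the gather step can be sketched as follows. This is an illustrative sketch: it assumes column i of `param_matrix` corresponds to sample ID i (consistent with the emb_size × cls_size layout described above), and all names and sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

emb_size, cls_size = 4, 50
param_matrix = rng.standard_normal((emb_size, cls_size))  # column i <-> sample ID i

all_ids = np.arange(cls_size)
target_ids = rng.choice(all_ids, size=8, replace=False)   # one batch of target IDs

remaining_ids = np.setdiff1d(all_ids, target_ids)         # IDs left after removing targets
random_ids = rng.choice(remaining_ids, size=16, replace=False)

new_ids = np.concatenate([target_ids, random_ids])
new_matrix = param_matrix[:, new_ids]                     # reconstructed new parameter matrix
```

Drawing the random IDs with `replace=False` from the set difference guarantees the new matrix contains no duplicate columns.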
As an optional embodiment, the current round of training of the deep learning model specifically includes:
assigning different sample pictures to different GPUs in advance;
inputting the sample pictures corresponding to a target GPU into the deep learning model for object recognition to obtain target sample feature vectors representing the feature information of the sample pictures, where the target GPU is any one of the GPUs;
extracting, from the parameter matrix, the feature vectors corresponding to all sample pictures assigned to all GPUs, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix;
multiplying, on the target GPU, the target sample feature vectors by the new parameter matrix to obtain the target similarities between the target sample feature vectors and each feature vector in the new parameter matrix, calculating a target loss function according to the target similarities, and performing gradient back-propagation based on the target loss function to obtain the gradients of the parameter values to be updated of the new parameter matrix and the deep learning model;
averaging the gradients of the parameter values to be updated corresponding to the respective GPUs, updating the parameters of the new parameter matrix and the deep learning model according to the average of the gradients of the parameter values to be updated, and updating the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
Specifically, the present application uses multiple GPUs to jointly train the deep learning model. The training process of the deep learning model is as follows: different sample pictures are assigned to different GPUs in advance (for example, if two GPUs participate in model training, sample pictures 1 and 2 are assigned to GPU 1, and sample pictures 3 and 4 are assigned to GPU 2); the sample pictures corresponding to any GPU (referred to as the target GPU) are input into the deep learning model to obtain target sample feature vectors representing the feature information of the sample pictures; the feature vectors corresponding to all sample pictures assigned to all GPUs (for example, if all sample pictures are sample pictures 1, 2, 3, and 4, they correspond to feature vectors 1, 2, 3, and 4) are extracted from the entire parameter matrix stored in memory, a certain number of feature vectors (for example, feature vectors 5, 6, 7, and 8) are randomly extracted from the remaining parameter matrix, all the extracted feature vectors (for example, feature vectors 1, 2, 3, 4, 5, 6, 7, and 8) are reconstructed into a new parameter matrix, and the new parameter matrix is transmitted to the target GPU; the target GPU multiplies the target sample feature vectors by the new parameter matrix to obtain the target similarities between the target sample feature vectors and each feature vector in the new parameter matrix; a target loss function is calculated according to the target similarities, and gradient back-propagation is performed based on the target loss function to obtain the gradients of the parameter values to be updated of the new parameter matrix and the deep learning model; the gradients of the parameter values to be updated corresponding to the respective GPUs are averaged, the parameters of the new parameter matrix and the deep learning model are updated according to the average of the gradients, and the total parameter matrix in memory is updated based on the updated new parameter matrix, whereupon the current round of training of the deep learning model ends.
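The gradient-averaging step can be illustrated with a toy NumPy simulation. This is a sketch only: the two "GPUs" are simulated in a list, the gradients are random placeholders, and all names and sizes are invented; in a real system the average would come from an all-reduce across devices:

```python
import numpy as np

rng = np.random.default_rng(2)
num_gpus, emb_size, m = 2, 4, 6
lr = 0.1

new_matrix = rng.standard_normal((emb_size, m))   # shared reconstructed matrix

# Each simulated GPU back-propagates its own loss, yielding a gradient
# of the same shape as the shared matrix.
per_gpu_grads = [rng.standard_normal((emb_size, m)) for _ in range(num_gpus)]

# Average the per-GPU gradients (the reduce step), then apply one update.
mean_grad = np.mean(per_gpu_grads, axis=0)
updated_matrix = new_matrix - lr * mean_grad
```

Averaging before updating means every device ends the round with identical parameters, which is what keeps the replicas consistent.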
As an optional embodiment, the training method for the object recognition model further includes:
after completing the previous round of training of the deep learning model, determining whether the deep learning model meets the model accuracy requirement for object recognition;
if so, determining that the training of the deep learning model is finished;
if not, inputting new sample pictures into the deep learning model for object recognition again to enter a new round of training.
Further, after completing the previous round of training of the deep learning model, the present application may also determine whether the deep learning model meets the model accuracy requirement for object recognition. If the deep learning model already meets the model accuracy requirement for object recognition, it needs no further training and can be put into use directly, so the training is determined to be finished; if it does not yet meet the requirement, it needs further training and cannot be put into use directly, so new sample pictures are input into the deep learning model again to enter a new round of training; the training of the deep learning model ends only when it meets the model accuracy requirement for object recognition.
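The stopping criterion amounts to a simple loop. In this sketch, `train_round` and `evaluate` are hypothetical stand-in callbacks (invented names), and the accuracy progression is fabricated purely to exercise the loop:

```python
def train_until_accurate(train_round, evaluate, target_accuracy, max_rounds=100):
    """Repeat training rounds until the model meets the accuracy requirement."""
    for round_idx in range(1, max_rounds + 1):
        train_round()                       # one round of training (steps S2-S5)
        if evaluate() >= target_accuracy:   # accuracy requirement met: stop
            return round_idx
    raise RuntimeError("accuracy requirement not met within max_rounds")

# Hypothetical stand-ins: each round improves accuracy by 0.2.
state = {"acc": 0.0}
rounds = train_until_accurate(
    train_round=lambda: state.__setitem__("acc", state["acc"] + 0.2),
    evaluate=lambda: state["acc"],
    target_accuracy=0.6,
)
```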
As an optional embodiment, the deep learning model is specifically a convolutional neural network model.
Specifically, the deep learning model of the present application may be, but is not limited to, a convolutional neural network model (e.g., ResNet, SqueezeNet, and other models), which is not specifically limited in the present application.
Please refer to FIG. 3, which is a schematic structural diagram of a training system for an object recognition model according to an embodiment of the present invention.
The training system for the object recognition model includes:
a matrix storage module 1, configured to save in advance, in memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects;
a vector acquisition module 2, configured to input, during model training, a sample picture into a deep learning model for object recognition to obtain a sample feature vector representing the feature information of the sample picture;
a matrix reconstruction module 3, configured to extract from the parameter matrix the feature vector corresponding to the sample picture, randomly extract a certain number of feature vectors from the remaining parameter matrix, and reconstruct all the extracted feature vectors into a new parameter matrix;
a similarity acquisition module 4, configured to multiply the sample feature vector by the new parameter matrix to obtain the similarity between the sample feature vector and each feature vector in the new parameter matrix;
a parameter update module 5, configured to calculate a loss function according to the similarities, perform gradient back-propagation based on the loss function, update the parameters of the new parameter matrix and of the deep learning model, and update the total parameter matrix in memory based on the updated new parameter matrix, to complete the current round of training of the deep learning model.
As an optional embodiment, the matrix storage module 1 is specifically configured to:
randomly initialize a parameter matrix of size emb_size × cls_size that represents the feature information of multiple sample pictures, and save the parameter matrix in memory, where emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
As an optional embodiment, the training system for the object recognition model further includes:
an ID configuration module, configured to save multiple sample pictures in a data set in advance and configure a sample ID for each of the multiple sample pictures;
correspondingly, the vector acquisition module 2 is specifically configured to:
randomly obtain a batch of target sample IDs from all sample IDs, obtain the corresponding target sample pictures from the data set based on the target sample IDs, and input the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing the feature information of the target sample pictures;
the matrix reconstruction module 3 is specifically configured to:
randomly obtain a certain number of random sample IDs from the remaining sample IDs, extract from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstruct all the extracted feature vectors into a new parameter matrix.
For an introduction to the training system provided in the present application, please refer to the above embodiments of the training method, which will not be repeated here.
The present application further provides a training apparatus for an object recognition model, including:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of any one of the above training methods for an object recognition model when executing the computer program.
For an introduction to the training apparatus provided in the present application, please refer to the above embodiments of the training method, which will not be repeated here.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A training method for an object recognition model, comprising:
    saving in advance, in a memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects;
    during model training, inputting a sample picture into a deep learning model for object recognition to obtain a sample feature vector representing feature information of the sample picture;
    extracting, from the parameter matrix, the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix;
    multiplying the sample feature vector by the new parameter matrix to obtain a similarity between the sample feature vector and each feature vector in the new parameter matrix;
    calculating a loss function according to the similarities, performing gradient back-propagation based on the loss function, updating parameters of the new parameter matrix and of the deep learning model, and updating the total parameter matrix in the memory based on the updated new parameter matrix, so as to complete a current round of training of the deep learning model.
  2. The training method for an object recognition model according to claim 1, wherein the process of saving in advance, in the memory, the parameter matrix composed of a plurality of feature vectors used to represent feature information of objects comprises:
    randomly initializing a parameter matrix of size emb_size × cls_size that represents feature information of a plurality of sample pictures, and saving the parameter matrix in the memory, wherein emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
  3. The training method for an object recognition model according to claim 2, further comprising:
    saving a plurality of sample pictures in a data set in advance, and configuring a sample ID for each of the plurality of sample pictures;
    correspondingly, the process of inputting the sample picture into the deep learning model for object recognition to obtain the sample feature vector representing the feature information of the sample picture, extracting from the parameter matrix the feature vector corresponding to the sample picture, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into the new parameter matrix comprises:
    randomly obtaining a batch of target sample IDs from all sample IDs, and obtaining corresponding target sample pictures from the data set based on the target sample IDs;
    inputting the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing feature information of the target sample pictures;
    randomly obtaining a certain number of random sample IDs from the remaining sample IDs, extracting from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstructing all the extracted feature vectors into the new parameter matrix.
  4. The training method for an object recognition model according to claim 1, wherein the current round of training of the deep learning model specifically comprises:
    assigning different sample pictures to different GPUs in advance;
    inputting the sample pictures corresponding to a target GPU into the deep learning model for object recognition to obtain target sample feature vectors representing feature information of the sample pictures, wherein the target GPU is any one of the GPUs;
    extracting, from the parameter matrix, the feature vectors corresponding to all sample pictures assigned to all GPUs, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all the extracted feature vectors into a new parameter matrix;
    multiplying, by the target GPU, the target sample feature vectors by the new parameter matrix to obtain target similarities between the target sample feature vectors and each feature vector in the new parameter matrix, calculating a target loss function according to the target similarities, and performing gradient back-propagation based on the target loss function to obtain gradients of parameter values to be updated of the new parameter matrix and the deep learning model;
    averaging the gradients of the parameter values to be updated corresponding to the respective GPUs, updating the parameters of the new parameter matrix and the deep learning model according to the average of the gradients of the parameter values to be updated, and updating the total parameter matrix in the memory based on the updated new parameter matrix, so as to complete the current round of training of the deep learning model.
  5. The training method for an object recognition model according to claim 1, further comprising:
    after completing a previous round of training of the deep learning model, determining whether the deep learning model meets a model accuracy requirement for object recognition;
    if so, determining that the training of the deep learning model is finished;
    if not, inputting new sample pictures into the deep learning model for object recognition again to enter a new round of training.
  6. The training method for an object recognition model according to claim 1, wherein the deep learning model is specifically a convolutional neural network model.
  7. A training system for an object recognition model, comprising:
    a matrix storage module, configured to save in advance, in a memory, a parameter matrix composed of a plurality of feature vectors used to represent feature information of objects;
    a vector acquisition module, configured to input, during model training, a sample picture into a deep learning model for object recognition to obtain a sample feature vector representing feature information of the sample picture;
    a matrix reconstruction module, configured to extract, from the parameter matrix, the feature vector corresponding to the sample picture, randomly extract a certain number of feature vectors from the remaining parameter matrix, and reconstruct all the extracted feature vectors into a new parameter matrix;
    a similarity acquisition module, configured to multiply the sample feature vector by the new parameter matrix to obtain a similarity between the sample feature vector and each feature vector in the new parameter matrix;
    a parameter update module, configured to calculate a loss function according to the similarities, perform gradient back-propagation based on the loss function, update parameters of the new parameter matrix and of the deep learning model, and update the total parameter matrix in the memory based on the updated new parameter matrix, so as to complete a current round of training of the deep learning model.
  8. The training system for an object recognition model according to claim 7, wherein the matrix storage module is specifically configured to:
    randomly initialize a parameter matrix of size emb_size × cls_size that represents feature information of a plurality of sample pictures, and save the parameter matrix in the memory, wherein emb_size is the size of the feature vector used to represent the feature information of one sample picture, and cls_size is the total number of sample pictures.
  9. The training system for an object recognition model according to claim 8, further comprising:
    an ID configuration module, configured to save a plurality of sample pictures in a data set in advance and configure a sample ID for each of the plurality of sample pictures;
    correspondingly, the vector acquisition module is specifically configured to:
    randomly obtain a batch of target sample IDs from all sample IDs, obtain corresponding target sample pictures from the data set based on the target sample IDs, and input the target sample pictures into the deep learning model for object recognition to obtain sample feature vectors representing feature information of the target sample pictures;
    the matrix reconstruction module is specifically configured to:
    randomly obtain a certain number of random sample IDs from the remaining sample IDs, extract from the parameter matrix the feature vectors corresponding to the target sample IDs and the random sample IDs, and reconstruct all the extracted feature vectors into the new parameter matrix.
  10. A training apparatus for an object recognition model, comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement the steps of the training method for an object recognition model according to any one of claims 1 to 6 when executing the computer program.
PCT/CN2021/109199 2020-09-11 2021-07-29 Method, system and apparatus for training object recognition model WO2022052656A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/012,936 US20230267710A1 (en) 2020-09-11 2021-07-29 Method, system and apparatus for training object recognition model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010956031.X 2020-09-11
CN202010956031.XA CN112115997B (zh) 2020-09-11 2022-12-02 Method, system and apparatus for training object recognition model

Publications (1)

Publication Number Publication Date
WO2022052656A1 true WO2022052656A1 (zh) 2022-03-17

Family

ID=73802442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109199 WO2022052656A1 (zh) 2020-09-11 2021-07-29 Method, system and apparatus for training object recognition model

Country Status (3)

Country Link
US (1) US20230267710A1 (zh)
CN (1) CN112115997B (zh)
WO (1) WO2022052656A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115997B (zh) * 2020-09-11 2022-12-02 苏州浪潮智能科技有限公司 一种物体识别模型的训练方法、系统及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080298691A1 (en) * 2007-05-30 2008-12-04 Microsoft Corporation Flexible mqdf classifier model compression
CN111368997A (zh) * 2020-03-04 2020-07-03 支付宝(杭州)信息技术有限公司 Training method and apparatus for neural network model
CN111401521A (zh) * 2020-03-11 2020-07-10 北京迈格威科技有限公司 Neural network model training method and apparatus, and image recognition method and apparatus
CN111611880A (zh) * 2020-04-30 2020-09-01 杭州电子科技大学 Efficient person re-identification method based on unsupervised contrastive learning with neural networks
CN112115997A (zh) * 2020-09-11 2020-12-22 苏州浪潮智能科技有限公司 Method, system and apparatus for training object recognition model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840530A (zh) * 2017-11-24 2019-06-04 华为技术有限公司 Method and apparatus for training a multi-label classification model
CN111368992B (zh) * 2018-12-26 2023-08-22 阿里巴巴集团控股有限公司 Data processing method and apparatus, and electronic device
CN111145728B (zh) * 2019-12-05 2022-10-28 厦门快商通科技股份有限公司 Speech recognition model training method and system, mobile terminal, and storage medium
CN111241570B (zh) * 2020-04-24 2020-07-17 支付宝(杭州)信息技术有限公司 Method and apparatus for two-party joint training of a service prediction model with data privacy protection


Also Published As

Publication number Publication date
CN112115997A (zh) 2020-12-22
CN112115997B (zh) 2022-12-02
US20230267710A1 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
CN107292352B (zh) Image classification method and apparatus based on convolutional neural network
WO2019228122A1 (zh) Model training method, storage medium, and computer device
EP3270330B1 (en) Method for neural network and apparatus performing same method
JP7291183B2 (ja) Method, apparatus, device, medium, and program product for training a model
EP3602419B1 (en) Neural network optimizer search
WO2020108336A1 (zh) Image processing method, apparatus, and device, and storage medium
WO2020260862A1 (en) Facial behaviour analysis
WO2022228425A1 (zh) Model training method and apparatus
WO2018120723A1 (zh) Video compressed sensing reconstruction method and system, electronic apparatus, and storage medium
CN112861659B (zh) Image model training method and apparatus, electronic device, and storage medium
CN109784415A (zh) Image recognition method and apparatus, and method and apparatus for training convolutional neural network
WO2022052656A1 (zh) Method, system and apparatus for training object recognition model
CN114547267A (zh) Method and apparatus for generating an intelligent question-answering model, computing device, and storage medium
CN109241930B (zh) Method and apparatus for processing eyebrow images
CN114782742A (zh) Output regularization method based on teacher model classification layer weights
US20210248187A1 (en) Tag recommending method and apparatus, computer device, and readable medium
CN115129460A (zh) Method and apparatus for obtaining operator hardware time, computer device, and storage medium
CN114816719B (zh) Training method and apparatus for multi-task model
CN116152645A (zh) Indoor scene visual recognition method and system fusing multiple representation balancing strategies
CN110502975A (zh) Batch processing system for pedestrian re-identification
CN115439916A (zh) Face recognition method, apparatus, device, and medium
CN113223128B (zh) Method and apparatus for generating images
CN112861892B (zh) Method and apparatus for determining attributes of targets in pictures
CN114187465A (zh) Classification model training method and apparatus, electronic device, and storage medium
CN113128292A (zh) Image recognition method, storage medium, and terminal device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21865712

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21865712

Country of ref document: EP

Kind code of ref document: A1