WO2020114119A1 - Cross-domain network training method and cross-domain image recognition method - Google Patents

Cross-domain network training method and cross-domain image recognition method

Info

Publication number
WO2020114119A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain
training
cross
learning rate
loss
Prior art date
2018-12-07
Application number
PCT/CN2019/112492
Other languages
French (fr)
Chinese (zh)
Inventor
刘若鹏
栾琳
赵盟盟
Original Assignee
深圳光启空间技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2019-10-22
Publication date
Application filed by 深圳光启空间技术有限公司
Publication of WO2020114119A1 publication Critical patent/WO2020114119A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a cross-domain network training method and a cross-domain image recognition method. The cross-domain network training method comprises the following steps: S1: inputting sample data of a first domain and a second domain into a deep neural network and training on the sample data of the first domain and the second domain, such that the deep neural network has classification ability on the first domain and on the second domain respectively; S2: eliminating the inter-domain difference in statistical distribution, such that the first domain and the second domain have similar statistical distribution characteristics; S3: training the first domain and the second domain to strengthen internal aggregation; and S4: saving a training result that meets a preset condition. According to the present invention, an image can be correctly recognized even when the data have different statistical distribution characteristics.

Description

A cross-domain network training and image recognition method

Technical field

The invention relates to the field of artificial intelligence and, in particular, to a cross-domain network training and image recognition method.

Background art

Face recognition is a biometric technology that identifies a person based on facial feature information. It covers a series of related techniques that use cameras to capture images or video streams containing human faces, automatically detect and track the faces in the images, and then recognize the detected faces; it is also commonly called portrait recognition or facial recognition.

Most existing face recognition algorithms can solve the single-domain face recognition problem, in which the image to be recognized and the training sample images share the same statistical distribution characteristics, for example training on video and then using the trained network to recognize faces in other videos. The Facenet algorithm proposed by Florian Schroff et al. is a single-domain face recognition algorithm that currently performs well, and its authors also give a corresponding single-domain training method.

Technical problem

In practical application scenarios, however, the face photos extracted from video captured by surveillance cameras exhibit highly complex variations in lighting, angle, resolution, and expression, so the image to be recognized and the training sample images have greatly differing statistical distribution characteristics; this is the cross-domain recognition problem. Current artificial-intelligence networks have difficulty recognizing cross-domain images accurately.

Technical solution

The purpose of the present invention is to provide a new cross-domain image recognition method that overcomes the inability of existing image recognition methods to recognize cross-domain scenes accurately, thereby addressing the poor accuracy of current image recognition technology in cross-domain recognition scenarios.
One aspect of the present invention provides a cross-domain network training method comprising the following steps:

S1: input sample data of a first domain and a second domain into a deep neural network and train on the sample data of the first domain and the second domain, so that the deep neural network has classification ability on the first domain and on the second domain respectively;

S2: eliminate the inter-domain difference in statistical distribution, so that the first domain and the second domain have similar statistical distribution characteristics;

S3: train the first domain and the second domain to strengthen internal aggregation;

S4: save the training result that meets a preset condition.
Preferably, training on the sample data of the first domain and the second domain in step S1 comprises: training the loss function Triplet-Loss on the sample data of the first domain and the second domain.

Preferably, step S2 comprises: when the loss function Triplet-Loss is stable and has converged, computing the maximum mean discrepancy loss MMD-Loss from the highest-dimensional features of the first domain and the second domain, adding the result to a composite loss function, and jointly performing back propagation and gradient derivation.

Preferably, step S3 comprises: removing the MMD-Loss from the composite loss function and adding the mixed Triplet-Loss of the first domain and the second domain, to perform the training that strengthens internal aggregation.

Preferably, step S4 comprises: saving the training result when the composite loss function of the training that strengthens internal aggregation has converged and is smaller than a set value.

Preferably, in step S1, training the loss function Triplet-Loss further comprises: setting a first learning rate with an initial value of 0.001 to 0.01, and every 10 rounds of the Triplet-Loss training setting the first learning rate to 0.7 to 0.9 times its value.

Preferably, in step S2, a second learning rate is set whose initial value is smaller than the initial value of the first learning rate, and every 10 rounds of training the second learning rate is set to 0.7 to 0.9 times its value.

Preferably, the initial value of the second learning rate is 0.0001 to 0.001.

Preferably, in step S3, a third learning rate is set whose initial value is smaller than the initial value of the first learning rate, and every 5 rounds of the training that strengthens internal aggregation the third learning rate is set to 0.7 to 0.9 times its value.

Preferably, the initial value of the third learning rate is 0.0001 to 0.001.

Preferably, step S1 further comprises: after the Triplet-Loss training ends, extracting the features of the first domain and the second domain, performing data dimensionality reduction, and drawing the feature position distribution in a two-dimensional space.

Preferably, step S2 further comprises: after the training of the composite loss function ends, extracting the features of the first domain and the second domain, performing data dimensionality reduction, and drawing the feature position distribution in a two-dimensional space.

Preferably, step S3 further comprises: after the training that strengthens internal aggregation ends, extracting the features of the first domain and the second domain, performing data dimensionality reduction, and drawing the feature position distribution in a two-dimensional space.

Another aspect of the present invention provides an image recognition method in which a deep neural network is trained as described above and the trained deep neural network is used to recognize images.

The present invention also provides a storage medium storing a computer program, wherein the computer program is configured to execute, when run, the training method described above.
Beneficial effects

By implementing the cross-domain network training method of the present invention, the network is trained with cross-domain data as input, so that images can be recognized correctly even when the data have different statistical distribution characteristics. A deep neural network obtained with this training method can recognize and match images from different environmental domains, and is particularly suitable for identifying identity information from video images in the security field.
Brief description of the drawings

The accompanying drawings, which form a part of this application, are provided to give a further understanding of the present invention. The schematic embodiments of the present invention and their descriptions are used to explain the invention and do not constitute an undue limitation on it. In the drawings:

FIG. 1 is a flowchart of a preferred embodiment of the cross-domain network training method of the present invention;

FIG. 2 is a flowchart of an image recognition method based on a deep neural network with cross-domain recognition capability;

FIG. 3 is a flowchart of another embodiment of the cross-domain network training method;

FIG. 4 is a flowchart of a further preferred embodiment of the cross-domain network training method of the present invention;

FIG. 5 is a schematic diagram of a preferred embodiment of the learning rate adjustment scheme;

FIG. 6 shows the effect after the Triplet-Loss training ends;

FIG. 7 shows the effect after the MMD+Triplet-Loss training ends;

FIG. 8 shows the effect after the training of the composite loss function is completed.
本发明的实施方式Embodiments of the invention
需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本发明。It should be noted that the embodiments in the present application and the features in the embodiments can be combined with each other if there is no conflict. The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments.
需要指出的是,除非另有指明,本申请使用的所有技术和科学术语具有与本申请所属技术领域的普通技术人员通常理解的相同含义。It should be noted that, unless otherwise specified, all technical and scientific terms used in this application have the same meaning as commonly understood by those of ordinary skill in the technical field to which this application belongs.
在本发明中,在未作相反说明的情况下,使用的方位词如“上、下、顶、底”通常是针对附图所示的方向而言的,或者是针对部件本身在竖直、垂直或重力方向上而言的;同样地,为便于理解和描述,“内、外”是指相对于各部件本身的轮廓的内、外,但上述方位词并不用于限制本发明。In the present invention, the directional words such as "up, down, top, and bottom" are generally used in the direction shown in the drawings, or in the vertical direction of the component itself, unless otherwise stated. In terms of vertical or gravity directions; similarly, for ease of understanding and description, "inside and outside" refers to inside and outside relative to the contour of each component itself, but the above directional words are not used to limit the present invention.
FIG. 1 is a flowchart of a preferred embodiment of the cross-domain network training method provided by the present invention. It should be noted that the training method of FIG. 1 uses Facenet as an example; by training other deep neural networks according to the training method of the present invention, those skilled in the art can obtain a cross-domain recognition effect similar to that of the present invention.

Step S1: input sample data from two different domains, a first domain and a second domain, into Facenet, and train the first domain and the second domain on the Facenet network architecture, so that the network has classification ability on each of the two domains;

Step S2: eliminate the inter-domain difference in statistical distribution from the result trained in step S1, so that the first domain and the second domain have similar statistical distribution characteristics;

Step S3: strengthen the internal aggregation of the first domain and the second domain, which after the training of step S2 have similar statistical distribution characteristics;

Step S4: after step S3 is executed, if the deep neural network meets the preset condition, stop training, complete the cross-domain network training, and save the trained deep neural network.
The present invention also provides an image recognition method based on a deep neural network with cross-domain recognition capability; FIG. 2 shows a preferred embodiment of this method.

The image recognition method of the FIG. 2 embodiment comprises two stages. First, the deep neural network is trained with sample data from the two different domains, so that the trained network eliminates the difference in the statistical distribution characteristics of the sample data of the different domains; in this embodiment, the method corresponding to FIG. 1 is used for this training.

After the above training steps S1 to S4 are completed, step S5 is executed: image data from the first domain and image data from the second domain are input into the trained deep neural network for recognition and matching, yielding the matching relationship between the image data of the two domains. This completes the image recognition.
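As a rough illustration of step S5, the sketch below pairs embeddings from the two domains by nearest-neighbour search. It is a minimal PyTorch sketch under stated assumptions, not the patent's implementation; the trained embedding network `net` is assumed to map image batches to feature vectors.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def match_domains(net, first_domain_imgs, second_domain_imgs):
    # Embed both domains with the trained network, L2-normalise the
    # embeddings, and pair each first-domain image with the nearest
    # second-domain image in feature space.
    e1 = F.normalize(net(first_domain_imgs), dim=1)
    e2 = F.normalize(net(second_domain_imgs), dim=1)
    distances = torch.cdist(e1, e2)   # pairwise Euclidean distances
    return distances.argmin(dim=1)    # best second-domain index per image
```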
Preferably, the present application provides another embodiment of the cross-domain network training method, shown in FIG. 3. This embodiment is again described by training the Facenet network.

First, step S10 (a further refinement of S1): on the basis of the Facenet network structure, input the sample data of the first domain and the second domain separately and train the loss function Triplet-Loss of each domain, with the learning rate set between 0.001 and 0.01; the purpose is to give each of the two domains classification ability.
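A minimal sketch of this per-domain Triplet-Loss stage follows, using PyTorch for illustration. The patent trains Facenet; the toy embedding network, the 0.2 margin (the value used in the Facenet paper), and the pre-mined triplet batches are all assumptions.

```python
import torch
from torch import nn, optim

# Toy stand-in for the Facenet embedding network, only to keep the sketch runnable.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 160 * 160, 128))
triplet = nn.TripletMarginLoss(margin=0.2)        # margin value is an assumption
optimizer = optim.SGD(net.parameters(), lr=0.01)  # within the 0.001-0.01 range

def s10_step(domain_batches):
    # domain_batches: one (anchor, positive, negative) image batch per domain;
    # each domain's Triplet-Loss is computed separately and the losses summed.
    loss = sum(triplet(net(a), net(p), net(n)) for a, p, n in domain_batches)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```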
Then, step S20 (a further refinement of S2): when the loss function of step S10 is stable and has converged, compute the maximum mean discrepancy loss MMD-Loss from the highest-dimensional features of the two domains and add it to a composite loss function, i.e. (MMD-Loss) + (Triplet-Loss) (abbreviated below as MMD+Triplet for convenience of description), which is back-propagated and differentiated as a whole, with the learning rate set between 0.0001 and 0.001; this eliminates the inter-domain difference in statistical distribution between the two domains.
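The sketch below shows one common way to compute MMD-Loss, a Gaussian-kernel estimate of the squared maximum mean discrepancy between the two domains' highest-dimensional feature batches, added to the Triplet-Loss as the composite loss. The kernel choice and bandwidth are assumptions; the patent does not specify them.

```python
import torch

def mmd_loss(feat_a, feat_b, sigma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy with a
    # Gaussian kernel: MMD^2 = E[k(a,a')] + E[k(b,b')] - 2 E[k(a,b)].
    def kernel(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return (kernel(feat_a, feat_a).mean()
            + kernel(feat_b, feat_b).mean()
            - 2 * kernel(feat_a, feat_b).mean())

# Composite loss of step S20 (triplet_a / triplet_b computed as in S10):
# loss = triplet_a + triplet_b + mmd_loss(features_a, features_b)
```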
After the inter-domain statistical distribution difference has been eliminated, and before the next stage of training, it must be checked whether the loss function of step S20 has converged. When it has, step S30 is executed; otherwise step S20 continues until convergence. Since the loss function expresses the difference between the model's predicted values and the true values, its convergence indicates that the recognition of the deep neural network has stabilized at this stage.

In step S30 (a further refinement of S3): once the loss function has converged, the MMD-Loss is removed from the composite loss function and the mixed Triplet-Loss of the first domain and the second domain is added, with the learning rate set between 0.0001 and 0.001; the purpose is to strengthen the intra-class clustering effect.

In step S40 (a further refinement of S4): check whether the composite loss function of step S30 has converged and is smaller than a set value, which in this embodiment is 0.01. If both conditions are met, training is complete; otherwise step S30 continues until the composite loss function converges and its value falls below the set value of 0.01. The trained deep neural network is then saved for subsequent use.
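One simple way to realise the S40 check is sketched below: the composite loss is treated as converged when it has barely moved over a recent window and its latest value is below the 0.01 set value. The window length and stability tolerance are assumptions not given in the patent.

```python
def s40_done(loss_history, set_value=0.01, window=10, tol=1e-4):
    # True when the composite loss is both stable (small spread over the
    # last `window` rounds) and below the set value of 0.01.
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return (max(recent) - min(recent) < tol) and recent[-1] < set_value
```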
FIG. 4 shows a further preferred embodiment of the cross-domain network training method of the present invention, built on the embodiment corresponding to FIG. 3.

In step S10: on the basis of the Facenet network structure, input the sample data of the first domain and the second domain separately, train the loss function Triplet-Loss of each domain, and set the learning rate between 0.001 and 0.01, so that each of the two domains has classification ability.

Before training the per-domain Triplet-Loss in step S10, the method further includes S11: forward-propagation processing of the two input domains. In this embodiment, forward propagation consists of computing a weighted sum of the input data of the two domains, adding a bias value, and finally passing the result through a non-linear function, i.e. the activation function, to obtain the output.
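The forward propagation described in S11, a weighted sum plus a bias passed through an activation function, corresponds to an ordinary dense layer; a one-layer sketch follows, with ReLU chosen as the activation purely for illustration.

```python
import torch

def forward_layer(x, weight, bias):
    # Weighted sum of the inputs, plus a bias value, passed through a
    # non-linear activation function (ReLU chosen here as an assumption).
    return torch.relu(x @ weight + bias)
```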
In this embodiment, when step S20 is executed, if the loss function of step S10 is stable but the convergence condition is not met, the per-domain Triplet-Loss training is re-run until the loss function meets the condition.

In this embodiment, the training of the composite loss function in step S30 is carried out as follows: first, paired images are selected. A paired image is a data pair in which the same target has both a first-domain image and a second-domain image, for example a person who has both a surveillance-camera video screenshot and an ID card photo. These pairs are used as training data to train the composite loss function of S30.
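One plausible way to turn such image pairs into the mixed Triplet-Loss data of S30 is sketched below: the video shot serves as anchor, the same person's ID photo as positive, and another person's ID photo as negative. This mining rule is an assumption; the patent only states that paired data are used.

```python
def mixed_triplets(pairs):
    # pairs: list of (video_img, id_img) tensors, one pair per person.
    triplets = []
    for i, (video_img, id_img) in enumerate(pairs):
        other_id = pairs[(i + 1) % len(pairs)][1]  # ID photo of a different person
        triplets.append((video_img, id_img, other_id))
    return triplets
```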
Based on the above cross-domain network training method of FIG. 4, the present invention provides another preferred embodiment, which uses the deep neural network trained in the FIG. 4 embodiment to perform image pairing.

Further, when training the cross-domain network in the present application, in order to improve the training effect so that the model converges as quickly as possible while retaining good recognition ability, the learning rate is updated dynamically as training progresses. FIG. 5 shows a preferred embodiment of this learning rate adjustment scheme.

In this embodiment the learning rate is adjusted three times, corresponding to the three trainings in the cross-domain network training method provided by the present invention.

First, when training Triplet-Loss, a first learning rate is set. To bring the loss function down to a low level as quickly as possible, a relatively large learning rate is chosen, for example 0.001 to 0.01 (0.01 in the corresponding step above), and every 10 rounds the learning rate is multiplied by a factor of 0.7 to 0.9 (0.8 in this embodiment). This learning rate adjustment corresponds to step S10 of the preceding embodiments.

Then the second learning rate adjustment is performed. When training MMD+Triplet, a second learning rate is set; to drive the loss function to a lower level, it is smaller than the first learning rate, for example 0.001 in the corresponding step above, and every 10 rounds it is multiplied by a factor of 0.7 to 0.9 (0.8 in this embodiment). This adjustment corresponds to step S20 of the preceding embodiments.

Finally the third learning rate adjustment is performed. When training the composite loss function, a third learning rate is set; again, to drive the loss function to a lower level, it is smaller than the first learning rate, for example 0.001 in the corresponding step above, and every 5 rounds it is multiplied by a factor of 0.7 to 0.9 (0.8 in this embodiment), which lets the loss function converge faster. This adjustment corresponds to step S30 of the preceding embodiments.
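This three-phase schedule maps naturally onto PyTorch's StepLR; the sketch below is an illustration under the embodiment's numbers (initial rates 0.01 / 0.001 / 0.001, factor 0.8, decay every 10 / 10 / 5 rounds), reusing the placeholder `net` from the earlier sketch.

```python
from torch import optim

def make_phase(params, lr0, step_size):
    # One training phase: fresh optimizer, learning rate multiplied by 0.8
    # every `step_size` rounds (call sched.step() once per round).
    opt = optim.SGD(params, lr=lr0)
    sched = optim.lr_scheduler.StepLR(opt, step_size=step_size, gamma=0.8)
    return opt, sched

phase1 = make_phase(net.parameters(), 0.01, 10)   # per-domain Triplet-Loss
phase2 = make_phase(net.parameters(), 0.001, 10)  # MMD + Triplet composite loss
phase3 = make_phase(net.parameters(), 0.001, 5)   # mixed cross-domain Triplet-Loss
```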
The invention further provides a visualization scheme for the training process, adding a visual output to each training stage; it is explained using the embodiment corresponding to FIG. 4. In step S10, after the Triplet-Loss training ends, the features of the different domains are extracted and data dimensionality reduction is performed; for example, the embodiments of the present invention use T-SNE (t-distributed stochastic neighbor embedding). After the T-SNE dimensionality reduction, the feature position distribution is drawn in a two-dimensional space to visualize the training effect, producing the output shown in FIG. 6.
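This visualization step can be reproduced with scikit-learn's t-SNE and matplotlib; a minimal sketch follows, in which the colour map, marker size, and output path are arbitrary choices rather than values from the patent.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_feature_distribution(features, person_ids, path="tsne.png"):
    # features: (N, D) array of extracted embeddings from both domains;
    # person_ids: one integer identity per row, used to colour the dots.
    points = TSNE(n_components=2).fit_transform(np.asarray(features))
    plt.scatter(points[:, 0], points[:, 1], c=person_ids, cmap="tab10", s=8)
    plt.savefig(path)
    plt.close()
```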
Similarly, during the MMD+Triplet-Loss training of step S20 and in the composite-loss-function training stage of step S40, the training results are output visually, producing the training effects shown in FIGS. 7 and 8.

In FIGS. 6 to 8, each dot represents the feature position of one face image, and dots of the same color (or shape) represent different images of the same person; a short distance between dots indicates similar features. The ideal cross-domain recognition result is that all points of the same person cluster tightly while points of different people stay far apart. It can be seen that as training proceeds, points of the same color gradually gather and points of different colors gradually move apart, finally achieving the cross-domain recognition effect.
To explain more clearly how the above embodiments of the present invention are used in an actual image recognition process, and the corresponding effect, a concrete application is described below.

In one practical deployment, the first-domain data are video image data and the second domain consists of the image data on ID cards; the network used is the Facenet network.

First, the video image data and the ID card image data are input into the network. On the one hand the two data streams enter the network through forward propagation; on the other hand they will be used again later, after the MMD-Loss training in the subsequent stage.

After the two data streams have been fed in through forward propagation, the network performs the first training: with a learning rate of 0.01, the per-domain Triplet-Loss training is run over many rounds, and every 10 rounds the learning rate is multiplied by 0.8. After a certain number of rounds the loss function converges, indicating that, on this loss function, the network has now "learned" the data features of these two domains.

Then the maximum mean discrepancy loss MMD-Loss is computed from the highest-dimensional features of the two domains and added to the composite loss function, and back propagation and gradient derivation are carried out jointly. Here the learning rate is set to 0.001; this training likewise runs over multiple rounds, with the learning rate multiplied by 0.8 every 10 rounds, so that after a certain number of rounds a stable convergence is obtained. Trained in this way, the inter-domain difference in statistical distribution between the two domains is eliminated; that is, the network's "learning" of the images of the two domains yields a "basis" for recognition, making it possible later to recognize the same person across the video images and the ID card images.

At this point, additional paired data are needed to train the network at this stage, so that it gains the ability to recognize and match across domains. Specifically, paired data are selected, i.e. images of the same person in the video and on the ID card are prepared in advance and input into the network for training. In this round of training the MMD-Loss is removed from the composite loss function and the mixed Triplet-Loss of the first domain and the second domain is added, with the learning rate set to 0.001 and multiplied by 0.8 every 5 rounds. After this round of training, once the loss function has converged and stabilized, the network model has completed all of the training. The network can now recognize a person in the video and extract a "feature", and this "feature" can be recognized and matched to the corresponding person on an ID card.

When this model is used for person recognition, it suffices to input the relevant data from a video image into the network model; the corresponding person can then be matched from the relevant ID card image data, thereby recognizing the person and achieving the cross-domain recognition effect. Conversely, the ID card image data can be input and the person then identified in the video data.
According to yet another embodiment of the present invention, a storage medium is also provided, the storage medium storing a computer program, wherein the computer program is configured to execute, when run, the steps of any one of the above method embodiments.

As can be seen from the description of the above embodiments, the solution of the present invention solves the problem of the image to be recognized and the training sample images having different statistical distribution characteristics; for example, an image obtained from video can be analyzed and matched to the corresponding identity on an ID card photo, remedying a shortcoming of current techniques, which cannot achieve this effect.
Obviously, the embodiments described above are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of this application. As used here, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the terms "first" and "second" in the description and claims of this application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described here can be implemented in an order other than that illustrated or described here.

Industrial applicability

The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (15)

  1. 一种跨域网络训练方法,其特征在于,包括以下步骤:        S1:向深度神经网络输入第一域和第二域的样本数据,对所述第一域和第二域的样本数据进行训练,使得在所述深度神经网络在第一域与第二域上各自具有分类能力;        S2:消除域间统计分布差异,使得第一域和第二域具有相近的统计分布特性;        S3:对第一域和第二域进行加强内聚集的训练;        S4:对符合预设条件的训练结果进行保存。A cross-domain network training method, which is characterized by the following steps: S1: input the sample data of the first domain and the second domain to the deep neural network, and train the sample data of the first domain and the second domain, so that the deep neural network is on the first domain and the second domain Each has the ability to classify; S2: Eliminate the difference in statistical distribution between domains, so that the first domain and the second domain have similar statistical distribution characteristics; S3: Strengthen the internal aggregation training for the first domain and the second domain; S4: Save the training results that meet the preset conditions.
  2. 如权利要求1所述的跨域网络训练方法,其特征在于,所述步骤S1中对所述第一域和第二域的样本数据进行训练包括:        对所述第一域和第二域的样本数据进行损失函数Triplet-Loss的训练。The cross-domain network training method according to claim 1, wherein the training of the sample data of the first domain and the second domain in the step S1 comprises: the training of the first domain and the second domain The sample data is trained for the loss function Triplet-Loss.
  3. 如权利要求2所述的跨域网络训练方法,其特征在于,所述步骤S2包括:        当所述损失函数Triplet-Loss稳定且满足收敛时,用所述第一域和第二域的最高维度特征计算最大均值差异损失MMD-Loss,并将结果加入合成损失函数,共同进行反向传播和梯度求导。The cross-domain network training method of claim 2, wherein the step S2 comprises: When the loss function Triplet-Loss is stable and satisfies convergence, the maximum mean difference loss MMD-Loss is calculated using the highest dimensional features of the first domain and the second domain, and the result is added to the synthetic loss function to jointly perform back propagation And gradient derivation.
  4. 如权利要求3所述的跨域网络训练方法,其特征在于,所述步骤S3包括:        将所述合成损失函数中的MMD-Loss去除,并加入所述第一域和所述第二域的混合Triplet-Loss,进行加强内聚集的训练。The cross-domain network training method of claim 3, wherein the step S3 comprises: The MMD-Loss in the synthesis loss function is removed, and the mixed Triplet-Loss of the first domain and the second domain is added to perform the training of enhanced internal aggregation.
  5. 如权利要求4所述的跨域网络训练方法,其特征在于,所述步骤S4包括:        当加强内聚集的训练的合成损失函数收敛且小于设定值时,对训练结果进行保存。The cross-domain network training method of claim 4, wherein the step S4 comprises: When the synthetic loss function of the training with enhanced intra-aggregation converges and is less than the set value, the training results are saved.
  6. 如权利要求2所述的跨域网络训练方法,其特征在于,所述步骤S1中,所述进行损失函数Triplet-Loss的训练还包括:        设置第一学习率,所述第一学习率初始值为0.001至0.01,每10轮所述进行损失函数Triplet-Loss的训练,将第一学习率设置为0.7至0.9倍。The cross-domain network training method according to claim 2, wherein in step S1, the training of the loss function Triplet-Loss further comprises: Set a first learning rate. The initial value of the first learning rate is 0.001 to 0.01. The training of the loss function Triplet-Loss is performed every 10 rounds, and the first learning rate is set to 0.7 to 0.9 times.
  7. 如权利要求6所述的跨域网络训练方法,其特征在于,所述步骤S2中,设置第二学习率,所述第二学习率的初始值小于所述第一学习率的初始值,每10轮所述进行加强内聚集的训练,将第二学习率设置为0.7至0.9倍。The cross-domain network training method according to claim 6, wherein in step S2, a second learning rate is set, the initial value of the second learning rate is less than the initial value of the first learning rate, each In the 10 rounds of training to strengthen the inner aggregation, the second learning rate is set to 0.7 to 0.9 times.
  8. 如权利要求7所述的跨域网络训练方法,其特征在于,所述第二学习率初始值为0.0001至0.001。The cross-domain network training method according to claim 7, wherein the initial value of the second learning rate is 0.0001 to 0.001.
  9. 如权利要求7所述的跨域网络训练方法,其特征在于,所述步骤S3中,设置第三学习率,所述第三学习率的初始值小于所述第一学习率的初始值,每5轮所述进行加强内聚集的训练,将第三学习率设置为0.7至0.9倍。The cross-domain network training method according to claim 7, wherein in step S3, a third learning rate is set, the initial value of the third learning rate is less than the initial value of the first learning rate, each In the five rounds of training to strengthen the inner aggregation, the third learning rate is set to 0.7 to 0.9 times.
  10. The cross-domain network training method according to claim 9, wherein the initial value of the third learning rate is 0.0001 to 0.001.
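The step-decay schedules of claims 6-10 map onto a standard scheduler; the sketch below reuses optimizer from the sketch under claim 2, and the concrete values inside the claimed ranges (initial rates, decay factor 0.8) are illustrative choices only.

    from torch.optim.lr_scheduler import StepLR

    # Phase 1 (claim 6): initial rate in [0.001, 0.01], x0.7-0.9 every 10 rounds.
    sched = StepLR(optimizer, step_size=10, gamma=0.8)
    # Phase 2 (claims 7-8): restart with a rate in [0.0001, 0.001], same cadence.
    # Phase 3 (claims 9-10): same range, but decayed every 5 rounds:
    #   sched = StepLR(optimizer, step_size=5, gamma=0.8)

    for epoch in range(30):
        # ... one epoch of the current phase's training runs here ...
        sched.step()    # multiplies the rate by gamma once per step_size epochs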
  11. The cross-domain network training method according to claim 2, wherein step S1 further comprises: after the Triplet-Loss training is completed, extracting the features of the first domain and the second domain, performing dimensionality reduction on the data, and plotting the feature position distribution in a two-dimensional space.
  12. The cross-domain network training method according to claim 3, wherein step S2 further comprises: after training with the composite loss function is completed, extracting the features of the first domain and the second domain, performing dimensionality reduction on the data, and plotting the feature position distribution in a two-dimensional space.
  13. The cross-domain network training method according to claim 4, wherein step S3 further comprises: after the aggregation-strengthening training is completed, extracting the features of the first domain and the second domain, performing dimensionality reduction on the data, and plotting the feature position distribution in a two-dimensional space.
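Claims 11-13 end each phase with the same visual check; a sketch follows. t-SNE is an assumed choice of reduction, since the claims only require projecting the features to two dimensions. After step S2 the two domain clouds should largely overlap, and after step S3 same-identity points should tighten into clusters.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_domain_features(feat_d1, feat_d2):
        # feat_d1, feat_d2: (N, D) and (M, D) arrays of extracted features.
        xy = TSNE(n_components=2).fit_transform(np.vstack([feat_d1, feat_d2]))
        n = len(feat_d1)
        plt.scatter(xy[:n, 0], xy[:n, 1], s=5, label="domain 1")
        plt.scatter(xy[n:, 0], xy[n:, 1], s=5, label="domain 2")
        plt.legend()
        plt.show()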
  14. An image recognition method, characterized in that a deep neural network is trained according to any one of claims 1-13, and the trained deep neural network is used to recognize an image.
  15. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the training method according to any one of claims 1-13.
PCT/CN2019/112492 2018-12-07 2019-10-22 Cross-domain network training method and cross-domain image recognition method WO2020114119A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811500433.8A CN111291780A (en) 2018-12-07 2018-12-07 Cross-domain network training and image recognition method
CN201811500433.8 2018-12-07

Publications (1)

Publication Number Publication Date
WO2020114119A1 (en)

Family

ID=70974915

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112492 WO2020114119A1 (en) 2018-12-07 2019-10-22 Cross-domain network training method and cross-domain image recognition method

Country Status (2)

Country Link
CN (1) CN111291780A (en)
WO (1) WO2020114119A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015143580A1 (en) * 2014-03-28 2015-10-01 Huawei Technologies Co., Ltd Method and system for verifying facial data
CN106096551B (en) * 2016-06-14 2019-05-21 湖南拓视觉信息技术有限公司 Method and apparatus for face position recognition
KR20180027194A (en) * 2016-09-06 2018-03-14 한화테크윈 주식회사 Apparatus For Detecting Face
CN106778519A (en) * 2016-11-25 2017-05-31 深圳市唯特视科技有限公司 Face verification method by matching a user identity document with a selfie
KR102403494B1 (en) * 2017-04-27 2022-05-27 에스케이텔레콤 주식회사 Method for learning Cross-domain Relations based on Generative Adversarial Network
CN107704812A (en) * 2017-09-18 2018-02-16 维沃移动通信有限公司 Face recognition method and mobile terminal
CN108334904A (en) * 2018-02-07 2018-07-27 深圳市唯特视科技有限公司 Multi-domain image conversion technique based on a unified generative adversarial network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170116467A1 (en) * 2015-03-18 2017-04-27 Adobe Systems Incorporated Facial Expression Capture for Character Animation
CN107735795A (en) * 2015-07-02 2018-02-23 北京市商汤科技开发有限公司 Method and system for social relationship identification
WO2018187953A1 (en) * 2017-04-12 2018-10-18 邹霞 Facial recognition method based on neural network
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 Real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN108573211A (en) * 2018-03-05 2018-09-25 重庆邮电大学 Face feature extraction method based on local features and deep learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610121A (en) * 2021-07-22 2021-11-05 哈尔滨工程大学 Cross-domain task deep learning identification method
CN113610121B (en) * 2021-07-22 2023-09-29 哈尔滨工程大学 Cross-domain task deep learning identification method
CN113780526A (en) * 2021-08-30 2021-12-10 北京的卢深视科技有限公司 Network training method, electronic device and storage medium
CN113780526B (en) * 2021-08-30 2022-08-05 合肥的卢深视科技有限公司 Face recognition network training method, electronic equipment and storage medium
CN116147724A (en) * 2023-02-20 2023-05-23 青岛鼎信通讯科技有限公司 Metering method suitable for ultrasonic water meter
CN116147724B (en) * 2023-02-20 2024-01-19 青岛鼎信通讯科技有限公司 Metering method suitable for ultrasonic water meter

Also Published As

Publication number Publication date
CN111291780A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN107506717B (en) Face recognition method based on depth transformation learning in unconstrained scene
Qin et al. Learning meta model for zero-and few-shot face anti-spoofing
CN109598268B RGB-D salient object detection method based on a single-stream deep network
CN110084135B (en) Face recognition method, device, computer equipment and storage medium
WO2020244434A1 (en) Method and apparatus for recognizing facial expression, and electronic device and storage medium
WO2020114119A1 (en) Cross-domain network training method and cross-domain image recognition method
CN104809426B Convolutional neural network training method, target recognition method and device
WO2019152983A2 (en) System and apparatus for face anti-spoofing via auxiliary supervision
CN109858466A Face key point detection method and device based on convolutional neural networks
JP2018160237A (en) Facial verification method and apparatus
Oyedotun et al. Facial expression recognition via joint deep learning of rgb-depth map latent representations
CN105359162A (en) Image masks for face-related selection and processing in images
Zhang et al. Content-adaptive sketch portrait generation by decompositional representation learning
CN105874474A (en) Systems and methods for facial representation
CN111652082B (en) Face living body detection method and device
WO2012162202A2 (en) Dual-phase red eye correction
EP3839768A1 (en) Mediating apparatus and method, and computer-readable recording medium thereof
CN106529494A (en) Human face recognition method based on multi-camera model
WO2021184754A1 (en) Video comparison method and apparatus, computer device and storage medium
CN112541422A Expression recognition method and device robust to illumination and head pose, and storage medium
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN113591763B (en) Classification recognition method and device for face shapes, storage medium and computer equipment
CN103544478A (en) All-dimensional face detection method and system
JP4757787B2 (en) Emotion estimation device
KR102160128B1 (en) Method and apparatus for creating smart albums based on artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19893239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.09.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19893239

Country of ref document: EP

Kind code of ref document: A1