CN109034186B - Handwriting data identification method based on DA-RBM classifier model


Publication number
CN109034186B
CN109034186B
Authority
CN
China
Prior art keywords
rbm
data
model
domain
layer
Legal status
Active
Application number
CN201810595182.XA
Other languages
Chinese (zh)
Other versions
CN109034186A
Inventor
赵子恒
赵煜辉
刘赣
单鹏
Current Assignee
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Application filed by Northeastern University Qinhuangdao Branch
Priority to CN201810595182.XA
Publication of CN109034186A
Application granted
Publication of CN109034186B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention relates to a method for establishing a DA-RBM classifier model, which comprises the following steps: obtaining source domain data X_s with corresponding labels Y_s, and target domain data X_T with labels Y_T; initializing the RBM model parameters and inputting the data X_s and X_T into the RBM network to obtain first-order features; taking the first-order features as the input of the next layer of the network and carrying out RBM training; inputting the hidden-layer outputs H_s and H_T of the RBM into a softmax regression layer for classification; constraining the distributions of the source domain data and target domain data at the RBM hidden-layer output using MMD; constraining the prediction results at the top classification layer of the RBM model using MMD; and constructing a total cost function J(θ) of the model, and optimizing the parameters of the classifier model by optimizing the total cost function. The model established by the invention can effectively identify cross-domain data.

Description

Handwriting data identification method based on DA-RBM classifier model
Technical Field
The invention relates to the field of deep-learning-based recognition, and in particular to a method for establishing a DA-RBM classifier model.
Background
Character recognition generally falls into two categories. The first is the recognition of textual information, mainly printed matter such as newspapers and periodicals in the scripts of different countries or peoples, as well as handwritten characters. The second is the recognition of numeric information, which has important applications in the field of digital information, for example enterprise report data, bank report data and postal code data. Supervising and processing such numeric data manually requires great manpower and material resources; in particular, the volume of data handled by the rapidly developing financial industry keeps growing, and purely manual processing is inefficient and error-prone. Processing this information automatically would clearly be advantageous, not only reducing the error rate but also saving a great deal of time. To recognize handwritten digits, these data need to be classified, and selecting an appropriate recognition algorithm has a significant impact on the recognition rate of handwritten digits. Techniques that can handle the recognition of handwritten digits would therefore greatly benefit the financial field as well as other fields of data interaction. In addition, in today's society all kinds of work call for intelligent automation, and handwritten digit recognition technology can simplify workflows such as data entry and data checking and improve working efficiency. With the improvement of computer technology, and the progress of machine learning in particular, recognizing handwritten digits with machine learning algorithms has gradually attracted attention, which bodes well for the automatic recognition of handwritten digits. In the field of machine learning, deep learning algorithms are receiving more and more attention, so applying deep learning algorithms to handwritten digit recognition is also a hot topic, and good results have been achieved: for example, Google laboratories reach recognition rates above 99% when recognizing handwritten digits with convolutional neural networks. This provides an experimental basis for commercial use; applying deep learning algorithms to handwritten digit recognition has expanded the direction of digit recognition, and this work focuses attention on processing handwritten digit recognition with deep learning methods.
Traditional machine learning algorithms perform well on handwritten digit recognition, but a traditional deep learning algorithm such as the RBM (Restricted Boltzmann Machine) requires that the processed data come from the same distribution, that is, the training data and the test data come from the same data set. In the real world, however, handwritten digit fonts to be recognized come from different data sets, i.e., their distributions differ, so a conventional deep learning algorithm is unsuitable for classifying mixed-domain data. Moreover, training a reliable model with a traditional machine learning algorithm needs a large number of training samples, and in practice it is sometimes difficult to acquire enough labelled data for training. Training a model on a labelled data set and then applying it to a target task that is related to but distinct from that data set is therefore a very important application in real life.
Disclosure of Invention
In order to overcome the mismatch that arises when the training data and the test data come from different data sets, the invention provides a method for establishing a DA-RBM classifier model, which comprises the following steps:
S110, acquiring source domain data X_s with corresponding labels Y_s, and target domain data X_T with labels Y_T;
S120, initializing the RBM model parameters, and inputting the data X_s and X_T into the RBM network to obtain first-order features;
S130, taking the first-order features as the input of the next layer of the network and carrying out RBM training;
S140, inputting the hidden-layer outputs H_s and H_T of the RBM into a softmax regression layer for classification;
S150, constraining the distributions of the source domain data and target domain data at the RBM hidden-layer output using MMD;
S160, constraining the prediction results at the top classification layer of the RBM model using MMD;
S170, constructing a total cost function J(θ) of the model, and optimizing the parameters of the classifier model by optimizing the total cost function.
Further, in step S130, Gibbs sampling and the contrastive divergence algorithm are used for RBM training.
Further, the step S130 further includes: for the source domain data and the target domain data, setting the number of hidden-layer units to m, the learning rate to η, and the maximum training period to T; the RBM network parameters are set as connection weights W, visible-layer bias b, and hidden-layer bias c. The RBM network is initialized, and the activation probabilities P(h_s = 1 | v_s) of all hidden-layer nodes for the source domain data are then calculated as:
P(h_sj = 1 | v_s) = σ(c_j + Σ_i v_si w_ij)
According to this conditional probability of the hidden-layer nodes, Gibbs sampling gives the hidden-layer output in the following form:
h_s ~ P(h_s | v_s)
The activation probabilities P(h_T = 1 | v_T) of all hidden-layer nodes for the target domain data are calculated as:
P(h_Tj = 1 | v_T) = σ(c_j + Σ_i v_Ti w_ij)
and, according to the conditional probability of the hidden-layer nodes, Gibbs sampling gives the target-domain hidden-layer output as:
h_T ~ P(h_T | v_T)
Further, the step S140 includes: the classification in the softmax regression layer is solved as follows.
For the source domain data:
q_s = P(y_s = j | h_s; θ) = exp(θ_j^T h_s) / Σ_{l=1..c} exp(θ_l^T h_s)
For the target domain:
q_T = P(y_T = j | h_T; θ) = exp(θ_j^T h_T) / Σ_{l=1..c} exp(θ_l^T h_T)
Further, the last layer in the DA-RBM model is a classification layer whose output is the probability of belonging to each class, and the DA-RBM classifier measures the MMD-based distribution loss between the source domain and the target domain at the output of the feature extraction layer and at the output of the classifier. On the feature distribution, the difference between the feature distributions of the two domains is measured in the objective function by the feature MMD:
L_F = || (1/n_s) Σ_{i=1..n_s} h_i - (1/n_T) Σ_{j=1..n_T} h_j ||² = Tr(H M H^T)
wherein H represents the output of the hidden layer of the RBM and M a coefficient matrix.
An MMD loss is added at the level of the classifier and defined as a conditional MMD, whose calculation formula is:
L_C = Σ_{c=1..C} Tr(q_c M q_c^T)
wherein C is the number of label classes, and q_c corresponds to the vector formed by all the outputs of class c.
Further, in step S170, the total cost J(θ) is expressed as follows:
J(θ) = L(θ) + λ·L_F + μ·L_C
where L(θ) is the loss function of the classifier and λ and μ are trade-off coefficients.
Further, the source domain data is an MNIST data set, and the target domain data is a USPS data set.
Through the technical scheme of the embodiment, the invention establishes a model capable of processing cross-domain handwritten data, and the recognition effect is good.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a schematic view of an RBM model according to some embodiments of the present invention;
FIG. 2 is a diagram of a domain adaptive learning model in some embodiments of the invention;
FIG. 3 is a diagram of a DA-RBM classifier model in some embodiments of the invention;
FIG. 4 is a diagram illustrating MNIST classification results in some embodiments of the invention;
FIG. 5 is a diagram illustrating USPS classification results in some embodiments of the present invention;
FIG. 6 is a diagram illustrating cross-domain classification results of the RBM in some embodiments of the invention;
FIG. 7 is a diagram illustrating the result of DA-RBM classifying cross-domain data in some embodiments of the present invention;
FIG. 8 is a flow chart of a method for building a DA-RBM classifier model according to some embodiments of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the field of deep learning research, the MNIST database is used as experimental data for many algorithms, and good experimental results have been obtained. Traditional deep learning handles single-domain data processing well, but it has problems when processing cross-domain data. Using a Restricted Boltzmann Machine (RBM) for pattern recognition and regression has been shown to be a practical and efficient technique. However, work on the RBM has focused on unsupervised, semi-supervised and supervised learning within a single domain (i.e., the source data); the learning capability of a restricted Boltzmann machine on mixed domains has not been studied. To improve the learning capability of the RBM, the invention combines the RBM with Domain Adaptation (DA). The invention proposes a unified framework, called the Domain-Adaptation Restricted Boltzmann Machine (DA-RBM), which constructs a robust classifier by learning from source domain data and target domain data. The method takes the MNIST handwritten digit data as source domain data and the USPS (U.S. Postal Service) handwritten digit data as target domain data, learns a mixed feature library through the DA-RBM framework, and then identifies data of other target domains. At the classification level, the invention uses Softmax Regression (SR) for classification, constrained with the Maximum Mean Discrepancy (MMD) algorithm in order to form a classifier capable of classifying mixed-domain data. Specifically, MMD is used to constrain the feature distributions extracted in the two domains (source domain and target domain) so that, under the constraint, the feature distributions in the two domains are as similar as possible; at the same time, the SR classification results in the two domains are also constrained by MMD so that the distributions of the two classification results are as similar as possible. The main contribution of the invention is a method suitable for solving mixed-domain data classification, formed by combining the DA algorithm, the RBM algorithm and the SR classification algorithm.
The RBM is a method for learning a probability distribution over input data; it can be trained in a supervised or an unsupervised manner, and neural networks built with RBM learning are widely applied to practical problems such as classification, modeling and regression.
The RBM belongs to the undirected probabilistic graphical models, also called Markov random field models. Such a model consists of a set of random variables whose relationships can be represented by an undirected graph and which satisfy the Markov property: in a random process, the conditional probability distribution of the future state depends only on the current state and is unrelated to the other states. The Markov property of a Markov random field means that, given its neighbouring nodes, every node is conditionally independent of the rest of the graph, i.e., the current node is related only to the nodes directly connected to it and unrelated to the unconnected nodes. Inference and solution of the undirected probabilistic graphical model involve energy functions, probability densities, parameter estimation and related methods.
The energy function describes a measure of the system state: the more ordered the system state, or the more concentrated the probability distribution, the smaller the energy of the system; the more disordered the state, or the more uniform the probability distribution, the larger the energy of the system. The undirected probabilistic graphical model is one kind of random network and borrows the energy function from dynamics and statistical mechanics, where a smaller energy function means a more stable system state. For the undirected probabilistic graphical model, defining an energy model provides a global quantity for the whole network, an objective function for learning, and an optimization rule for the model: the value of the energy function is driven towards its minimum, so that when the whole model is most stable its parameters are the optimized parameters, and the probability distribution function represented by the model becomes solvable. Designing an energy function for an undirected probabilistic graphical model therefore helps provide a solution from a mathematical point of view. The RBM is a special form of Markov random field model, so an energy function model is likewise defined when training it.
The RBM generally has a two-layer structure of a visible layer v and a hidden layer h, where the connection weights between the visible layer and the hidden layer are denoted w, the visible-layer bias is denoted b, and the hidden-layer bias is denoted c. A schematic diagram of the RBM model is shown in FIG. 1.
In the restricted Boltzmann model, since there are no connections between the units within the visible layer and none between the units within the hidden layer, the energy function E(v, h | θ) of the RBM is defined as:
E(v, h | θ) = -Σ_i b_i v_i - Σ_j c_j h_j - Σ_{i,j} v_i w_ij h_j
where θ = {w_ij, b_i, c_j} are the RBM model parameters. In the process of training the model to optimize its parameters, the joint probability distribution of (v, h) can be obtained from the energy function; the joint probability distribution function of (v, h) is:
P(v, h | θ) = e^{-E(v, h | θ)} / Z(θ)
wherein Z(θ) = Σ_{v,h} e^{-E(v, h | θ)} is a normalization factor (the partition function).
Knowing the joint probability distribution, the marginal probability distribution of the visible or hidden layer and their conditional probability distributions can be obtained by mathematical means such as summation or integration. The marginal distribution of the visible layer v, the marginal distribution of the hidden layer h, and the conditional distributions of v given h and of h given v are as follows:
P(v | θ) = (1/Z(θ)) Σ_h e^{-E(v, h | θ)},  P(h | θ) = (1/Z(θ)) Σ_v e^{-E(v, h | θ)}
P(h | v) = Π_j P(h_j | v),  P(v | h) = Π_i P(v_i | h)
Gibbs sampling is a Markov chain Monte Carlo sampling algorithm used mainly to construct random samples from a multivariate probability distribution, for example the joint distribution of two or more variables. For the RBM model, the normalization factor makes the joint probability distribution of the visible and hidden layers difficult to calculate directly, so an approximate solution is needed; Gibbs sampling provides exactly this capability, yielding approximate solutions when integrals, expectations, or joint probability distributions cannot be computed. The basic principle of Gibbs sampling is as follows. Suppose the joint probability of a D-dimensional random vector X = (X_1, ..., X_D) is difficult to compute, that is, it is hard to solve by conventional methods of computing joint probabilities. Assume, based on prior knowledge, that the conditional probability of each component given the other components is known, written P(X_k | X_-k), where X_-k = (X_1, ..., X_{k-1}, X_{k+1}, ..., X_D). Then every component of the sample vector can be sampled iteratively from these conditional probabilities, and when the number of iterations is large enough, the probability distribution of the sampled random vector converges to the joint probability distribution P(X). Gibbs sampling thus makes it possible to sample without knowing the joint distribution of the samples X, which solves exactly the problem that the joint probability between the visible layer and the hidden layer of the RBM model is unknown.
Because the RBM model has a symmetric structure and the visible-layer and hidden-layer units are conditionally independent given each other, random samples following the distribution defined by the RBM model can be obtained by Gibbs sampling. Given the RBM model, the k-step Gibbs sampling procedure is as follows: initialize the visible-layer state v^0 with a training sample (or with any random visible-layer state), and then alternately sample:
h^0 ~ p(h | v^0),  v^1 ~ p(v | h^0)
h^1 ~ p(h | v^1),  v^2 ~ p(v | h^1)
...,  v^{k+1} ~ p(v | h^k)
With sufficiently many sampling steps, samples following the distribution required by the RBM learning model can be obtained, which solves the problem that the joint distribution between the visible layer and the hidden layer of the RBM cannot be computed in the maximum likelihood function.
Although the Gibbs sampling algorithm solves the problem that the joint probability distribution cannot be computed, in practical computation, especially when the data are high-dimensional and the training set is large, sampling the data with Gibbs sampling becomes extremely expensive. To improve the training efficiency of the RBM model, Hinton proposed in 2002 a method for overcoming its low training efficiency, namely the contrastive divergence algorithm. Unlike plain Gibbs sampling, in the contrastive divergence algorithm, when v^0 is initialized with the sample data, only one or a few Gibbs steps are needed to obtain a good approximation.
In the contrastive divergence algorithm, the visible layer is initialized with the sample data and the hidden-layer state is then computed. In the RBM model, given the visible-layer state, the hidden-layer states are independent of each other, so the activation probability of the j-th hidden-layer node is:
P(h_j = 1 | v) = σ(c_j + Σ_i v_i w_ij)
Similarly, when all hidden-layer states are determined, the activation probabilities of the visible-layer states are conditionally independent, so the activation probability of the i-th visible-layer node given the hidden layer is:
P(v_i = 1 | h) = σ(b_i + Σ_j w_ij h_j)
With the above computation, the recomputed visible-layer states can be regarded as a reconstruction of the visible-layer data, so the update rules for the RBM model parameters in this setting are:
Δw_ij = η(⟨v_i h_j⟩_data - ⟨v_i h_j⟩_recon)
Δb_i = η(⟨v_i⟩_data - ⟨v_i⟩_recon)
Δc_j = η(⟨h_j⟩_data - ⟨h_j⟩_recon)
compared with the traditional Gibbs sampling, the method uses the reconstruction distribution of the sample data to replace the approximate model distribution in the original Gibbs sampling in the contrast divergence algorithm, thereby not only optimizing the RBM model efficiency, but also fully utilizing the characteristic information of the sample data and often obtaining good effect in the actual training.
For a given set of samples, the RBM model seeks appropriate parameters to fit the training samples, i.e., once the parameters are determined, the probability distribution represented by the RBM model should match the data samples as closely as possible. The parameters can be determined by maximum likelihood estimation, whose core is to learn a model under which the probability of observing the training samples is maximized. For the undirected probabilistic graphical model, the joint probability distribution obtained by training should match the distribution of the sample data as closely as possible. Assuming the total number of training samples is T and the samples are independent and identically distributed, the key of RBM training is to maximize the following likelihood function:
L(θ) = Π_{t=1..T} P(v^(t) | θ)
Since solving the maximum likelihood means maximizing this function, the solution differentiates with respect to the parameters and then raises the objective by gradient ascent until the final stopping condition is reached. Taking the logarithm of the maximum likelihood function first and then differentiating gives:
∂ ln L(θ)/∂θ = Σ_{t=1..T} ( ⟨∂(-E(v^(t), h | θ))/∂θ⟩_{P(h | v^(t))} - ⟨∂(-E(v, h | θ))/∂θ⟩_{P(v, h | θ)} )
Since the normalization factor still appears in this expression (through the model expectation), it cannot be solved directly, and the Gibbs sampling and contrastive divergence algorithms introduced above are required to carry out the solution.
As with training other neural networks, many parameters of the RBM network must be set, such as the number of hidden nodes, the learning rate, and the parameter initialization. Setting these parameters is crucial for training a stable RBM model. Simply increasing the number of hidden nodes is not necessarily good; the number is usually set based on the characteristics of the data. The learning rate is generally set so that the weight update is about 10^-3 times the weight, and the initial settings of the biases and other parameters are generally random values drawn from a normal distribution.
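As a small illustration of these conventions, the sketch below initializes an RBM with normally distributed weights and a learning rate on the order just named; the concrete sizes are illustrative assumptions, not values fixed by the invention:

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 256, 200                     # e.g. 16x16 images, 200 hidden nodes
W = rng.normal(loc=0.0, scale=0.01, size=(n_visible, n_hidden))  # normal-distribution init
b = np.zeros(n_visible)                            # visible-layer bias
c = np.zeros(n_hidden)                             # hidden-layer bias
lr = 1e-3  # chosen so weight updates stay around 10^-3 of the weights, per the heuristic above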
Domain-adaptive learning, as the algorithmic core of transfer learning, naturally follows the idea of transfer learning. In human terms, we often use the experience of predecessors to solve related problems at hand, and this behaviour is itself a form of transfer; from the machine learning point of view, model knowledge learned in one field is applied to learning models in other related but different fields. The concept of domain-adaptive learning can therefore be defined as follows: given source domain data D_s with a source domain learning task T_s, and target domain data D_T with a target domain learning task T_T, use D_s and T_s to learn a prediction function f(·) on the target domain, under the constraints that the feature spaces of the source domain and the target domain are the same and their class spaces are the same, while the data distributions of the source domain and the target domain differ.
The purpose of domain adaptation is to use training data in the source domain to solve a learning problem in the target domain, where the data distributions of the source and target domains may differ. Since labelled data are often difficult to obtain in the target domain, learning knowledge from the source domain data to solve problems in the target domain becomes extremely important. Domain-adaptive learning is generally regarded as a special kind of transfer learning, where transfer refers to sharing knowledge between different but related domains. The main concern of domain adaptation is how to reduce the difference between the distributions of the source domain and the target domain. It is very important to train the feature representation on the mixed domain: a good feature representation should reduce the difference between the source and target data distributions as much as possible, so the final criterion of domain adaptation is to make the data distributions of the source domain and the target domain as similar as possible in the feature representation.
The term "domain" in domain adaptation carries two layers of meaning: one is the feature space of the input data X, and the other is the probability distribution P(X) of the input data, where X = {x_1, ..., x_n} is a set of learning samples. In domain-adaptive learning it is usually assumed that there are two different domains, which may consist of different feature spaces or have different data probability distributions. The source domain and target domain considered in this work have the same feature space and differ in the probability distribution of the data. In the domain adaptation model, the labelled data D_S are assumed to be the source domain data and the unlabelled data D_T the target domain data, where the source domain data are
D_S = {(x_s1, y_s1), ..., (x_sns, y_sns)}
and similarly the target domain data are
D_T = {x_T1, ..., x_TnT}
P(X_S) and Q(X_T) denote the distributions of the data from the source domain and the target domain, where P and Q are not identical. The task of domain adaptation here is to predict the labels y_Ti of the target domain data. For most domain-adaptive learning, the following assumptions are made:
P(X_S) ≠ P(X_T)
P(Y_S | X_S) = P(Y_T | X_T)
For domain adaptation, the source domain training knowledge is likewise applied to the target domain; the domain-adaptive learning model is shown in FIG. 2.
By migrating the knowledge learned in the source domain to the target domain, the machine learning model can be made more general. For the invention, the RBM learning model used here can thus be made more robust and applicable to data of different domains.
Domain-adaptive learning does play a great role in solving cross-domain learning problems, so how to implement it becomes very important. Domain adaptation covers different settings, such as learning on multi-domain problems and on single-domain problems, and supervised or unsupervised domain-adaptive learning; the various kinds of domain adaptation necessarily call for different solutions. Meanwhile, the domain adaptation problem can be attacked at the model level or at levels such as the feature space or the relations between domains. For domain-adaptive learning, selecting an appropriate learning model is therefore crucial to solving the cross-domain learning problem. In domain-adaptive learning, different data sets mean different knowledge that may be transferred, and choosing a domain-adaptation method suited to the characteristics of the data improves the generalization capability of the trained model. Learning methods for domain adaptation focus on the similarity of the source and target data distributions: if the shared knowledge base constructed by the learned model is compatible with the knowledge of both the source domain and the target domain, it greatly helps solve the cross-domain data learning problem.
If the source domain data and the target domain data in domain-adaptive learning share many features, a solution can be sought by transferring those features. When the source and target data can be judged to be highly similar in their features, the source domain data can be rewritten, the data most similar to the target domain data screened out, and training then performed on them; this targeted learning has great advantages for cross-domain problems with many common features, and corresponds to instance-based transfer. If the source domain data and the target domain data share some common features, the source and target data can instead be transformed into the same feature space and then learned with traditional machine learning algorithms. The primary approach used in the invention is based on transferring components, i.e., transforming the source and target domain data into the same space and then constraining the learning model by minimizing the distance between them. The constraint commonly used here is the Maximum Mean Discrepancy (MMD); the following therefore describes some knowledge about the MMD algorithm.
The maximum mean discrepancy is used for the two-sample problem, i.e., judging whether two data distributions are the same, and in domain-adaptive learning it can be used to compare the source domain data distribution with the target domain data distribution. It measures the distribution difference of the data mainly through the difference of the overall means. For domain-adaptive learning, one way to find a common data distribution space is therefore to constrain it by the maximum mean discrepancy, and the constraint of MMD can solve the problem of the common data distribution space required for domain adaptation. The MMD algorithm can be understood on an arbitrary function space, or through the reproducing kernel Hilbert space. The basic assumption of the maximum mean discrepancy is a class of generating functions f on the sample space: two data distributions are considered identical if the means of the sample data on the images of sufficiently many functions f are equal.
When the MMD is used to constrain the feature distributions in domain-adaptive learning, the constraint is mainly applied through the reproducing kernel Hilbert space MMD; the following therefore focuses on the definition of the reproducing kernel Hilbert space MMD and some basic concepts. Assume two differently distributed data sets: one satisfies the distribution P and is defined as the source domain data X^(s) = [x^(s)_1, ..., x^(s)_ns]; the other satisfies the distribution Q and is defined as the target domain data X^(t) = [x^(t)_1, ..., x^(t)_nt]. Let H denote the reproducing kernel Hilbert space, obtained through a mapping function φ(·) whose role is to map the original data into the feature space. When the raw features of the data are mapped into the feature space, the maximum mean discrepancy can be represented by the following formula:
MMD(X^(s), X^(t)) = || (1/n_s) Σ_{i=1..n_s} φ(x^(s)_i) - (1/n_t) Σ_{j=1..n_t} φ(x^(t)_j) ||_H
the present invention uses the difference in population mean between the source domain and the target domain in domain adaptation learning to represent the difference between the source domain data distribution and the target domain data distribution. Therefore, using the MMD algorithm in domain-adapted learning can constrain the distribution of data between two domains at the feature space level. The constraint behavior of the MMD algorithm on the feature space can provide a chance for the traditional machine learning to intervene in the domain adaptation learning.
Logistic regression is an efficient classification algorithm that can be used not only to predict data labels but also to calculate the probability of each label occurring. The invention requires recognition of handwritten digits, and the selection of a suitable classification algorithm has a great influence on the recognition efficiency. In the core model of the invention, a classification layer must be added at the top of the deep neural network, and a softmax regression classifier is chosen as the classifier model. Softmax regression is a kind of logistic regression; however, logistic regression can only perform two-class classification, while the handwritten digit recognition of the invention is a multi-class problem, so classifying with the traditional logistic regression model is extremely unsuitable. A variant of logistic regression that solves the multi-class problem must therefore be selected.
In softmax regression, assume there is a c-class classification problem with class labels y_i ∈ {1, ..., c}. The multi-class problem is handled in softmax regression by extending the two-class problem of logistic regression to multiple classes: any one class is singled out as one class and the remaining classes are treated as the other class, so that a two-class classifier is constructed for each class and c classifiers are obtained in total. These c classifiers can be defined as follows:
h_θ(x_i) = [ P(y_i = 1 | x_i; θ), P(y_i = 2 | x_i; θ), ..., P(y_i = c | x_i; θ) ]^T
         = (1 / Σ_{l=1..c} e^{θ_l^T x_i}) · [ e^{θ_1^T x_i}, e^{θ_2^T x_i}, ..., e^{θ_c^T x_i} ]^T
that is, P(y_i = j | x_i; θ) = e^{θ_j^T x_i} / Σ_{l=1..c} e^{θ_l^T x_i}.
by the method for classifying the classes, a softmax regression can be easily used for carrying out multi-classification problems, so in the core model of the invention, the softmax model is used for solving the classification problems.
The DA-RBM classifier model forms a common feature library by transferring knowledge from the source domain data to the target domain, making up for the lack of knowledge when the model learns the target domain data, and applies MMD constraints to the feature space and to the classification results respectively. The DA-RBM classification model extends the learning capability of the RBM so that it can conveniently solve cross-domain problems; in particular, when labelled target-domain samples are scarce, transferring knowledge from previous related but different data sets reduces the cost of labelling in the target domain. The DA-RBM classifier model extracts data features with the RBM, exploiting advantages such as its strong representational power and high speed; a domain adaptation method then constrains the feature space so that the source and target feature space distributions are as similar as possible. At the same time, a classifier algorithm is applied at the top layer, and the probability distributions of the classification results are also constrained by the domain adaptation method so that the source and target classification result distributions are as similar as possible. The parameters of the whole model are then updated under these constraints, so that the DA-RBM classifier model acquires the capability to process cross-domain data.
Traditional learning in the field of handwritten digit recognition splits a training set and a test set on the same data set, i.e., the source domain, trains a model on the training set, then applies the trained model to the test data set, usually tuning the machine learning parameters, and in this way recognizes handwritten digits well. Training and testing on the same domain is a precondition of traditional machine learning, and likewise the RBM requires the training data set and the test data set to come from the same domain; if they come from different domains, an unsatisfactory modeling result may occur. The conventional routine for processing handwritten digit recognition with an RBM is to first extract features into the hidden layer and then add a classification layer, such as softmax regression, on top of it. However, the RBM only has the learning capability to solve a single domain, which does not fit the processing needed to solve cross-domain problems. There are therefore theoretical limitations to solving cross-domain handwritten digit recognition with the RBM: in general, the feature library extracted from source domain data is inapplicable when applied to a target domain. How to reuse the large amount of model knowledge previously learned from the source domain data, and how to design a way of constructing a common feature library, are therefore key to overcoming the RBM's unsatisfactory handling of cross-domain handwritten digit recognition.
Aiming at the requirement of the traditional RBM algorithm that training data and test data come from the same distribution when solving handwritten digit recognition, the invention proposes a DA-RBM algorithm model for processing cross-domain handwritten digit recognition; the algorithm aims to mine useful information in the source domain to recognize data of the target domain. Although the data distributions of the target domain and the source domain differ, the knowledge of the source domain can still be migrated into the learning of the target domain. The key to recognizing handwritten digits with an RBM is to construct a suitable network structure, so a suitable learning model must be established in the DA-RBM model of the invention. The main idea is to constrain the data distributions of the source domain and the target domain at the feature extraction layer so that they are as similar as possible, to simultaneously constrain the classification results at the classification layer, and to construct a common feature model library through these two constraints, thereby realizing domain-adaptive learning.
Since the method mainly performs handwritten digit recognition, a DA-RBM is constructed for recognizing handwritten fonts. The RBM is used for feature extraction, and a multi-class softmax regression classifier is added at the top of the features for classification. On the features, the MMD algorithm constrains the data probability distributions of the source domain and the target domain, and it also constrains the results of the classification layer. An overall cost function is constructed from these two constraints together with the cost function of the neural network, and the parameters of the whole network are then optimized backwards by optimizing the overall cost function, so that the model trained by the whole network can classify cross-domain data. A schematic diagram of the DA-RBM classifier model is shown in FIG. 3.
The model above gives the processing scheme of the invention from the viewpoint of the data flow; the process is now treated mathematically. Suppose the source domain data D_s with label values Y_s and the target domain data D_T with a small number of labels Y_T are given. For the source domain data and the target domain data, set the number of hidden-layer units to m, the learning rate to η, and the maximum training period to T; the RBM network parameters are set as connection weights W, visible-layer bias b, and hidden-layer bias c. The network is randomly initialized, and then the activation probabilities of all hidden-layer nodes for the source domain data, P(h_s = 1 | v_s), are calculated as:
P(h_sj = 1 | v_s) = σ(c_j + Σ_i v_si w_ij), where i, j denote the node indices, w_ij the weights, and v the visible layer;
Using Gibbs sampling according to the conditional probability of the hidden-layer nodes, the output of the hidden-layer nodes is solved in the following form:
h_s ~ P(h_s | v_s)
The activation probabilities of all hidden-layer nodes of the target domain data, P(h_T = 1 | v_T), are calculated as:
P(h_Tj = 1 | v_T) = σ(c_j + Σ_i v_Ti w_ij)
Using Gibbs sampling according to the conditional probability of the hidden nodes, the output of the target-domain hidden-layer nodes is solved as:
h_T ~ P(h_T | v_T)
The formulas above solve the model at the feature level. Because the domain adaptation model constructed by the invention must be further solved on the classification results, the classification results are solved by the solution method of softmax regression, calculated as follows.
For the source domain data:
P(y_s = j | h_s; θ) = e^{θ_j^T h_s} / Σ_{l=1..c} e^{θ_l^T h_s}
The resulting formula for the classification of the target domain is:
P(y_T = j | h_T; θ) = e^{θ_j^T h_T} / Σ_{l=1..c} e^{θ_l^T h_T}
In the DA-RBM model, the solving formulas of the classification layer are abbreviated as q_s and q_T.
thus, the way to solve the calculation data of the whole model is basically given, and then the invention needs to construct the cost function which is in the constraint condition and then gives the whole classifier model.
In the DA-RBM model, the RBM is used to extract features and a classification layer is added as the last layer, whose output is the probability of belonging to each class; the MMD-based distribution loss between the source domain and the target domain is measured at the output of the feature extraction layer and at the output of the classifier. On the feature distribution, the difference between the feature distributions of the two domains is measured in the objective function by the feature MMD:
L_F = || (1/n_s) Σ_{i=1..n_s} h_i - (1/n_T) Σ_{j=1..n_T} h_j ||² = Tr(H M H^T)
where H represents the output of the hidden layer of the RBM, M a coefficient matrix, n_s the total number of source domain samples, n_T the total number of target domain samples, i, j the sample indices, Tr(·) the trace, and h_i, h_j the node outputs.
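One standard way to realize the Tr(H M H^T) form (used, for example, in transfer component analysis) builds M from the sample counts. The sketch below assumes that construction (an assumption; the patent does not spell M out) and stacks samples as rows, so the quantity computed is Tr(H^T M H), equal to the squared mean difference:

import numpy as np

def mmd_matrix(n_s, n_T):
    # M has entries 1/n_s^2 on source-source pairs, 1/n_T^2 on target-target
    # pairs, and -1/(n_s * n_T) on cross pairs.
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_T, -1.0 / n_T)])
    return np.outer(e, e)

def feature_mmd(H_s, H_T):
    # Equals || mean(H_s) - mean(H_T) ||^2 for row-stacked hidden outputs.
    H = np.vstack([H_s, H_T])
    M = mmd_matrix(H_s.shape[0], H_T.shape[0])
    return np.trace(H.T @ M @ H)

rng = np.random.default_rng(0)
H_s, H_T = rng.random((50, 200)), rng.random((40, 200))
print(feature_mmd(H_s, H_T))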
At the level of the classifier, an MMD loss is added to ensure that the conditional distributions of the two domains are as similar as possible; the conditional MMD is defined as:
L_C = Σ_{c=1..C} Tr(q_c M q_c^T)
where C is the number of label categories, q_c corresponds to the vector formed by all the outputs of class c, and M represents a coefficient matrix.
Finally, the classification loss of the network is added to form the objective function of the whole network:
J(θ) = L(θ) + λ·L_F + μ·L_C
where L(θ) is the loss function of the classifier and λ and μ represent trade-off coefficients.
The basic mathematical logic of the DA-RBM classification model has now been explained; next it is described from the perspective of the algorithm flow. The DA-RBM algorithm is summarized as follows:
Input: source data X_s with labels Y_s, and target data X_T with a small number of labels Y_T
Output: RBM parameters w, b, c and predicted labels y_T for the target domain
Begin:
Initialize the feature spaces H_s, H_T through the RBM;
1. Initialize the RBM model parameters and input the original data X_s, X_T into the RBM network to obtain first-order feature representations;
2. Take the first-order features as the input of the next layer of the network and carry out RBM training on them;
3. Input the hidden-layer outputs H_s, H_T of the RBM into the softmax regression layer for classification;
4. Constrain the distributions of the source domain data and target domain data at the RBM hidden-layer output using the Maximum Mean Discrepancy (MMD);
5. Constrain the prediction results at the top classification layer of the model using MMD;
6. Construct the total cost function J(θ) of the model, then optimize the parameters of the classifier model by optimizing this cost function;
End.
The DA-RBM is used to solve the cross-domain handwritten font recognition problem, so the data sets used in the experiments are also cross-domain: the MNIST handwritten digit data set and the USPS handwritten digit data set are selected, with the MNIST data set as the source domain data and the USPS handwritten digit data set as the target domain data set. The DA-RBM model migrates the knowledge learned from MNIST to USPS in order to identify the USPS data set. The MNIST handwritten digit data set is commonly used in traditional machine learning and appears in many experiments, so it is selected as the source domain data for the cross-domain handwritten digit recognition experiments, while the less common USPS handwritten digit data set is selected as the target domain data. These data sets are applied to single-domain RBM classification, to domain-adaptive convolutional neural network classification, and to the DA-RBM classification model.
The MNIST handwriting data set used by the method serves as the source domain data; it contains 60000 training samples and 10000 test samples. These digit images have been normalized by centering the digit in the image and making the image size uniform. The MNIST images used in this work are 16 × 16, i.e., their feature dimension is 256. The USPS data set is a U.S. postal handwritten digit set and serves as the target domain data set in the DA-RBM model; it contains 7291 training samples and 2007 test samples. The USPS images are likewise 16 × 16, i.e., their feature dimension is 256. A portion of the data from the source domain and the target domain is extracted for the experiments.
For testing the DA-RBM, a single domain is considered first, i.e., single-domain MNIST data and USPS data are applied to RBM classification. When classifying with the RBM, both the training data and the test data are selected from data sets of the same domain. First, for the test of RBM classification on the MNIST data set, 200 hidden nodes and a learning rate of 0.01 are chosen; the classification results are shown in FIG. 4.
From the above experimental results it can be seen that, when the training data and the test data both come from the MNIST data set, the recognition rate is 96%.
Next, the RBM is used to recognize the USPS handwritten digit data set, i.e., both the test data and the training data come from the USPS data set; the constructed RBM classifier model has 200 hidden-layer units and a learning rate of 0.01. The result of applying the selected USPS data to RBM classification is shown in FIG. 5.
From the above experimental results it can be seen that the recognition rate of the RBM classifier on the USPS data set is 96%. Next, the RBM classifier model is used to process the recognition of cross-domain data, i.e., MNIST serves as source domain data and USPS as target domain data; the number of hidden-layer units in the classifier model is set to 200 and the learning rate to 0.01. The result of applying these data to cross-domain handwritten digit recognition is shown in FIG. 6.
From the above experimental results it can be seen that classifying cross-domain data with the RBM yields a recognition rate of only 24%; the RBM is thus very weak at cross-domain recognition of handwritten digits.
Next, the DA-RBM is used to process the handwritten digit recognition problem, with MNIST as source domain data and USPS data as target domain data; the number of hidden nodes of the network model is set to 200 and the learning rate to 0.01. The result of classifying and recognizing these data with the DA-RBM is shown in FIG. 7.
From the above experimental data it can be seen that the recognition rate of the DA-RBM on handwritten digits in the cross-domain case is 92%. Compared with the RBM algorithm without domain-adaptive learning, the recognition capability of the DA-RBM algorithm on handwritten fonts is clearly improved.
In the following experiments the performance of the DA-RBM is compared against a baseline: a domain-adaptive convolutional neural network is used to perform cross-domain handwritten digit classification and recognition. Because convolutional neural networks have advantages in processing images, and handwriting font data are usually stored in the form of pictures, experiments with convolutional neural networks are representative. The comparative data of the various experiments are shown in Table 1.
Table 1. Experimental comparison results
(Table 1 is reproduced as an image in the original document; it compares the recognition rates of the CNN, DA-CNN, RBM and DA-RBM on cross-domain handwritten digit recognition.)
In the comparison experiment, the CNN algorithm first processes handwritten digit recognition on the mixed domain without domain adaptation capability, and then the DA-CNN algorithm processes the mixed-domain handwritten digit recognition. From the comparison results, the recognition rate of the RBM on cross-domain recognition improves by 68 percentage points (from 24% to 92%) after applying domain-adaptive learning. The learning capability of the DA-RBM is thus clearly improved according to the experimental results.
The invention adopts the MNIST handwritten digit data set and the USPS handwritten digit data set; experiments then recognize the MNIST data set and the USPS data set respectively with the RBM, process the recognition of the cross-domain handwritten digit data sets with the DA-RBM model, and process the same cross-domain recognition with the DA-CNN. According to the experimental results, the DA-RBM model improves the learning capability of the RBM at cross-domain handwritten digit recognition, and compared with the experimental results of the DA-CNN, the DA-RBM is on par with the DA-CNN in improving the recognition rate. The DA-RBM model can therefore be considered effective at processing cross-domain handwritten digit recognition.
The method of the invention can be summarized as follows: as shown in fig. 8, the present invention provides a method for building a DA-RBM classifier model, which includes the following steps:
S110, acquiring source domain data X_s, the labels Y_s corresponding to the source domain data X_s, and target domain data X_T with labels Y_T;
S120, initializing the RBM model parameters, and inputting the data X_s, X_T into the RBM network to obtain first-order features;
S130, taking the first-order features as the input of the next-order network, and carrying out RBM training;
S140, inputting the RBM hidden layer outputs H_s, H_T into a softmax regression layer for classification;
S150, constraining the distributions of the source domain data and the target domain data with MMD on the RBM hidden layer output;
S160, constraining the prediction results with MMD in the top classification layer of the RBM model;
S170, constructing the total cost function J(θ) of the model, and optimizing the classifier model parameters by optimizing the total cost function.
In step S130, Gibbs sampling and the contrastive divergence algorithm are used to perform RBM training.
Said step S130 further comprises: for the source domain data and the target domain data, setting the number of hidden layer units to m, the learning rate to ε, and the maximum training period to T. The RBM network parameters are set as follows: the connection weights are W, the visible layer bias is b, and the hidden layer bias is c. The RBM network is initialized, and then the activation probabilities of all hidden layer nodes of the source domain data are calculated; the activation probability P(h_s = 1 | v_s) is calculated as:
P(h_sj = 1 | v_s) = σ(c_j + ∑_i v_si · w_ij)
According to the conditional probability of the hidden layer nodes, Gibbs sampling is used to obtain the hidden layer output in the following form:
h_s ~ P(h_s | v_s)
The activation probabilities P(h_T = 1 | v_T) of all hidden layer nodes of the target domain data are then calculated as:
P(h_Tj = 1 | v_T) = σ(c_j + ∑_i v_Ti · w_ij)
According to the conditional probability of the hidden layer nodes, Gibbs sampling is used to obtain the hidden layer output of the target domain as: h_T ~ P(h_T | v_T).
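The CD-1 update of step S130 can be sketched in NumPy as follows. The batch shapes, the symbol eps for the learning rate, and updating the biases with activation probabilities rather than binary samples are illustrative assumptions, not the patent's exact procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(v0, W, b, c, eps=0.01):
        """One CD-1 update on a batch v0 of shape (n_samples, n_visible)."""
        p_h0 = sigmoid(c + v0 @ W)          # P(h=1|v0): positive-phase activations
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # Gibbs sample h ~ P(h|v)
        p_v1 = sigmoid(b + h0 @ W.T)        # P(v=1|h0): reconstruct the visible layer
        v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
        p_h1 = sigmoid(c + v1 @ W)          # hidden activations at the reconstruction
        # contrastive-divergence gradient estimate: data phase minus model phase
        W += eps * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
        b += eps * (v0 - v1).mean(axis=0)
        c += eps * (p_h0 - p_h1).mean(axis=0)
        return W, b, c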
The step S140 includes: the classification in the softmax regression layer is solved by the following formulas:
the following formula is used for the source domain data:
P(y_s = j | H_s) = exp(θ_j^T H_s) / ∑_{c=1}^{C} exp(θ_c^T H_s), j = 1, …, C
the following formula is used for the target domain:
P(y_T = j | H_T) = exp(θ_j^T H_T) / ∑_{c=1}^{C} exp(θ_c^T H_T), j = 1, …, C
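A minimal sketch of this softmax regression layer, assuming a weight matrix theta of shape (m, C) and a bias vector (both hypothetical names for parameters the text does not spell out):

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def classify(H, theta, bias):
        """Class probabilities for hidden features H of shape (n, m)."""
        return softmax(H @ theta + bias)       # result has shape (n, C)

    # usage: q_s = classify(H_s, theta, bias); q_T = classify(H_T, theta, bias)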
The last layer in the DA-RBM model is a classification layer whose output is the probability of belonging to each class. The DA-RBM classifier measures the MMD-based distribution loss between the source domain and the target domain both at the output of the feature extraction layer and at the classifier. On the feature side, the feature distribution difference between the two domains is measured by a feature MMD term in the objective function:
L_MMD(H_s, H_T) = ‖ (1/n_s) ∑_{i=1}^{n_s} H_s^(i) − (1/n_T) ∑_{i=1}^{n_T} H_T^(i) ‖²
wherein H_s and H_T represent the hidden layer outputs of the RBM for the source and target domains, and n_s, n_T are the corresponding numbers of samples;
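The patent does not state which kernel the MMD uses; with a linear kernel the feature-level MMD above reduces to the squared distance between the domain means of the hidden outputs, which can be sketched as:

    import numpy as np

    def feature_mmd(H_s, H_T):
        """Squared linear-kernel MMD between the two domains' hidden outputs."""
        return float(np.sum((H_s.mean(axis=0) - H_T.mean(axis=0)) ** 2))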
An MMD loss is also added at the level of the classifier and defined as a conditional MMD, calculated as follows:
L_cMMD = ∑_{c=1}^{C} ‖ q̄_s^c − q̄_T^c ‖²
wherein C is the number of label classes, and q̄^c is the mean of the vectors formed by all classifier outputs belonging to class c.
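A sketch of this conditional MMD over classifier outputs follows. It pseudo-labels target samples with the classifier's own argmax; grouping by the acquired target labels Y_T would fit the text equally well, so treat that choice as an assumption of the sketch.

    import numpy as np

    def conditional_mmd(q_s, y_s, q_T, C):
        """Class-wise MMD over classifier outputs (target pseudo-labeled)."""
        y_T_hat = q_T.argmax(axis=1)           # assumed pseudo-labels for the target
        loss = 0.0
        for cls in range(C):
            qs_c = q_s[y_s == cls]
            qt_c = q_T[y_T_hat == cls]
            if len(qs_c) and len(qt_c):        # skip classes missing in either domain
                loss += np.sum((qs_c.mean(axis=0) - qt_c.mean(axis=0)) ** 2)
        return float(loss)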
In step S170, the total cost J(θ) is expressed as follows:
J(θ) = L(θ) + λ₁ · L_MMD + λ₂ · L_cMMD
where L (θ) is the loss function of the classifier.
The source domain data is an MNIST data set, and the target domain data is a USPS data set.
The DA-RBM modeling method can effectively identify cross-domain handwritten data.
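Putting the sketches together, a compact end-to-end illustration of steps S110 through S170 follows. It pretrains the shared RBM on both domains with CD-1, extracts hidden features, and trains only the softmax layer by cross-entropy gradient descent while monitoring J(θ); a faithful implementation would also backpropagate the MMD terms into the weights, so this is a deliberately simplified sketch that reuses sigmoid, cd1_step, classify, and total_cost from above.

    import numpy as np

    def train_da_rbm(X_s, y_s, X_T, C, m=200, eps=0.01, epochs=50,
                     lam1=1.0, lam2=1.0):
        """Simplified DA-RBM-style pipeline; y_s must be integer class labels."""
        rng = np.random.default_rng(0)
        n_vis = X_s.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, m))   # connection weights
        b, c = np.zeros(n_vis), np.zeros(m)          # visible / hidden biases
        theta, bias = np.zeros((m, C)), np.zeros(C)  # softmax parameters
        for _ in range(epochs):                      # S120-S130: RBM pretraining
            W, b, c = cd1_step(X_s, W, b, c, eps)
            W, b, c = cd1_step(X_T, W, b, c, eps)
        H_s = sigmoid(c + X_s @ W)                   # hidden features, both domains
        H_T = sigmoid(c + X_T @ W)
        onehot = np.eye(C)[y_s]
        for _ in range(epochs):                      # S140: softmax layer training
            q_s = classify(H_s, theta, bias)
            theta -= eps * H_s.T @ (q_s - onehot) / len(y_s)
            bias -= eps * (q_s - onehot).mean(axis=0)
        q_s = classify(H_s, theta, bias)             # S150-S170: monitor J(theta)
        q_T = classify(H_T, theta, bias)
        print("J(theta) =", total_cost(q_s, y_s, H_s, H_T, q_T, C, lam1, lam2))
        return W, b, c, theta, bias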
In the present invention, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A handwritten data recognition method based on a DA-RBM classifier model is characterized by comprising the following steps:
S110, acquiring source domain data X_s, the labels Y_s corresponding to the source domain data X_s, and target domain data X_T with labels Y_T; the source domain data X_s is the MNIST data set, and the target domain data X_T is the USPS data set;
S120, initializing the RBM model parameters, and inputting the data X_s, X_T into the RBM network to obtain first-order features;
S130, taking the first-order features as the input of the next-order network to perform RBM training;
S140, inputting the RBM hidden layer outputs H_s, H_T into a softmax regression layer for classification;
S150, constraining the distributions of the source domain data and the target domain data with MMD on the RBM hidden layer output;
S160, constraining the prediction results with MMD in the top classification layer of the RBM model;
S170, constructing the total cost function J(θ) of the model, and optimizing the classifier model parameters by optimizing the total cost function;
S180, importing the handwriting data into the parameter-optimized DA-RBM classifier model for classification and identification of handwriting;
In step S130, Gibbs sampling and the contrastive divergence algorithm are adopted to perform RBM training; the step S130 further comprises: for the source domain data and the target domain data, setting the number of hidden layer units to m, the learning rate to ε, and the maximum training period to T; the RBM network parameters are set as follows: the connection weights are W, the visible layer bias is b, and the hidden layer bias is c; the RBM network is initialized, and then the activation probabilities of all hidden layer nodes of the source domain data are calculated, the activation probability P(h_s = 1 | v_s) being calculated as:
P(h_sj = 1 | v_s) = σ(c_j + ∑_i v_si · w_ij)
According to the conditional probability of the hidden layer nodes, Gibbs sampling is used to obtain the hidden layer output in the following form:
h_s ~ P(h_s | v_s)
The activation probabilities P(h_T = 1 | v_T) of all hidden layer nodes of the target domain data are then calculated as:
P(h_Tj = 1 | v_T) = σ(c_j + ∑_i v_Ti · w_ij)
According to the conditional probability of the hidden layer nodes, Gibbs sampling is used to obtain the hidden layer output of the target domain as: h_T ~ P(h_T | v_T);
The step S140 includes: the classification in the softmax regression layer is solved by the following formulas:
the following formula is used for the source domain data:
P(y_s = j | H_s) = exp(θ_j^T H_s) / ∑_{c=1}^{C} exp(θ_c^T H_s), j = 1, …, C
the following formula is used for the target domain:
P(y_T = j | H_T) = exp(θ_j^T H_T) / ∑_{c=1}^{C} exp(θ_c^T H_T), j = 1, …, C
The last layer in the DA-RBM model is a classification layer whose output is the probability of belonging to each class. The DA-RBM classifier measures the MMD-based distribution loss between the source domain and the target domain both at the output of the feature extraction layer and at the classifier. On the feature side, the feature distribution difference between the two domains is measured by a feature MMD term in the objective function:
L_MMD(H_s, H_T) = ‖ (1/n_s) ∑_{i=1}^{n_s} H_s^(i) − (1/n_T) ∑_{i=1}^{n_T} H_T^(i) ‖²
wherein H_s and H_T represent the hidden layer outputs of the RBM for the source and target domains, and n_s, n_T are the corresponding numbers of samples;
An MMD loss is also added at the level of the classifier and defined as a conditional MMD, calculated as follows:
L_cMMD = ∑_{c=1}^{C} ‖ q̄_s^c − q̄_T^c ‖²
wherein C is the number of label classes, and q̄^c is the mean of the vectors formed by all classifier outputs belonging to class c;
The formula of the total cost function J(θ) is as follows:
J(θ) = L(θ) + λ₁ · L_MMD + λ₂ · L_cMMD
where L (θ) is the loss function of the classifier.
CN201810595182.XA 2018-06-11 2018-06-11 Handwriting data identification method based on DA-RBM classifier model Active CN109034186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810595182.XA CN109034186B (en) 2018-06-11 2018-06-11 Handwriting data identification method based on DA-RBM classifier model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810595182.XA CN109034186B (en) 2018-06-11 2018-06-11 Handwriting data identification method based on DA-RBM classifier model

Publications (2)

Publication Number Publication Date
CN109034186A CN109034186A (en) 2018-12-18
CN109034186B true CN109034186B (en) 2022-05-24

Family

ID=64612623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810595182.XA Active CN109034186B (en) 2018-06-11 2018-06-11 Handwriting data identification method based on DA-RBM classifier model

Country Status (1)

Country Link
CN (1) CN109034186B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382602A (en) * 2018-12-28 2020-07-07 深圳光启空间技术有限公司 Cross-domain face recognition algorithm, storage medium and processor
CN109829804A (en) * 2019-01-10 2019-05-31 西安交通大学 A kind of tax risk recognition methods towards marker samples missing administrative region
US11675976B2 (en) 2019-07-07 2023-06-13 International Business Machines Corporation Exploitation of domain restrictions for data classification
CN110866536B (en) * 2019-09-25 2022-06-07 西安交通大学 Cross-regional enterprise tax evasion identification method based on PU learning
CN110659744B (en) * 2019-09-26 2021-06-04 支付宝(杭州)信息技术有限公司 Training event prediction model, and method and device for evaluating operation event
CN110781970B (en) * 2019-10-30 2024-04-26 腾讯科技(深圳)有限公司 Classifier generation method, device, equipment and storage medium
CN111738298B (en) * 2020-05-27 2023-09-12 哈尔滨工业大学 MNIST handwriting digital data classification method based on deep-wide variable multi-core learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866417A (en) * 2010-06-18 2010-10-20 西安电子科技大学 Method for identifying handwritten Uigur characters
CN106203520A (en) * 2016-07-17 2016-12-07 西安电子科技大学 SAR image sorting technique based on degree of depth Method Using Relevance Vector Machine
CN107330480A (en) * 2017-07-03 2017-11-07 贵州大学 Hand-written character Computer Identification
CN108022589A (en) * 2017-10-31 2018-05-11 努比亚技术有限公司 Aiming field classifier training method, specimen discerning method, terminal and storage medium


Also Published As

Publication number Publication date
CN109034186A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109034186B (en) Handwriting data identification method based on DA-RBM classifier model
CN111552807B (en) Short text multi-label classification method
Chen Deep learning with nonparametric clustering
Mnih et al. Conditional restricted boltzmann machines for structured output prediction
CN110532542B (en) Invoice false invoice identification method and system based on positive case and unmarked learning
Cottrell et al. Theoretical and applied aspects of the self-organizing maps
CN111444342B (en) Short text classification method based on multiple weak supervision integration
CN112784918B (en) Node identification method, system and device based on unsupervised graph representation learning
Samadiani et al. A neural network-based approach for recognizing multi-font printed English characters
CN113887580B (en) Contrast type open set image recognition method and device considering multi-granularity correlation
CN110008365B (en) Image processing method, device and equipment and readable storage medium
CN112836509A (en) Expert system knowledge base construction method and system
US20230101817A1 (en) Systems and methods for machine learning-based data extraction
CN110880007A (en) Automatic selection method and system for machine learning algorithm
Alalyan et al. Model-based hierarchical clustering for categorical data
Shen et al. Transfer learning of phase transitions in percolation and directed percolation
CN113408418A (en) Calligraphy font and character content synchronous identification method and system
Sugiyama et al. Least-squares probabilistic classifier: A computationally efficient alternative to kernel logistic regression
Turtinen et al. Contextual analysis of textured scene images.
Zhang et al. Unsupervised learning of Dirichlet process mixture models with missing data
Chen et al. Group norm for learning structured SVMs with unstructured latent variables
Goel et al. Font-ProtoNet: Prototypical Network-Based Font Identification of Document Images in Low Data Regime
Vijay et al. Hand Written Text Recognition and Global Translation
Robinson Interpretable visualizations with differentiating embedding networks
Borah et al. Enhancing Assamese Word Recognition for CBIR: A Comparative Study of Ensemble Methods and Feature Extraction Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant