CN104866867B

CN104866867B - A multi-country banknote serial number character recognition method based on a banknote sorting machine

Info

Publication number: CN104866867B
Application number: CN201510253055.8A
Authority: CN
Inventors: 于慧敏; 施成燕; 李天豪
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2015-05-15
Filing date: 2015-05-15
Publication date: 2017-12-05
Anticipated expiration: 2035-05-15
Also published as: CN104866867A

Abstract

The invention discloses a multi-country banknote serial number character recognition method based on a banknote sorting machine. The serial number image is segmented to obtain the images of multiple characters, and the size of each character image is normalized. On this basis, each normalized character image x is processed as follows: x is binarized to obtain the binarized matrix x' of the character image, which is then converted into a binarized vector.

Description

A multi-country banknote serial number character recognition method based on a banknote sorting machine

Technical field

The invention belongs to the field of automatic recognition technology, and in particular relates to a multi-country banknote serial number character recognition method based on a banknote sorting machine. The template training part involves a logistic regression model.

Background art

Character template generation is an important step in banknote serial number character recognition. Serial number recognition first produces character templates by pre-training, and then obtains the recognition result by matching the input character image against the character templates. The template generation step therefore has a strong influence on the subsequent recognition result.

Conventional character templates are usually produced by manual debugging. The debugging method first obtains a statistical matrix for each character, where m is the number of samples and each element of the statistical matrix ranges from a minimum of 0 to a maximum of m. Where the statistical value is high, i.e. in the dense part of the character, the template W takes a larger positive value; where the value is smaller, W takes a smaller positive value; and where the value is 0, i.e. in the sparse part of the character, W takes a negative value.
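
For illustration, the manual approach described above can be sketched as follows. This is only a sketch under stated assumptions: the statistical matrix is taken to be the per-pixel count of stroke pixels over the m binarized samples, and the function names, the `dense_ratio` cut-off and the weight values `high`, `low` and `negative` are hypothetical, since the text does not give the values chosen during debugging.

```python
def statistical_matrix(samples):
    """Count, per pixel, how many binarized samples have a stroke there.

    Assumption: each sample is a 0/1 matrix of the same size, so every
    count lies between 0 and m = len(samples).
    """
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[i][j] for s in samples) for j in range(cols)]
            for i in range(rows)]


def hand_tuned_template(stat, m, dense_ratio=0.5, high=2, low=1, negative=-1):
    """Map counts to weights: dense -> high, present -> low, absent -> negative.

    All numeric values here are hypothetical placeholders.
    """
    def weight(count):
        if count == 0:
            return negative
        return high if count >= dense_ratio * m else low

    return [[weight(c) for c in row] for row in stat]


samples = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
]
stat = statistical_matrix(samples)
template = hand_tuned_template(stat, m=len(samples))
```

Every cut-off and weight in such a scheme has to be chosen and re-checked by hand for each character.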

This generation method is time-consuming and laborious, and its accuracy is low, so the accuracy of the subsequent matching and recognition is also greatly reduced.

In view of these defects of the prior art, research is needed to provide a scheme that overcomes them and improves the speed and accuracy of template generation.

Summary of the invention

To solve the above problems, the object of the present invention is to provide a multi-country banknote serial number character recognition method based on a banknote sorting machine. The method uses a logistic regression model and solves the problems of existing banknote serial number character template generation, namely that it is time-consuming and laborious and has low accuracy.

To achieve the above object, the technical solution of the invention is a multi-country banknote serial number character recognition method based on a banknote sorting machine, as follows. The serial number image I is first segmented to obtain the images of multiple characters, and the size of each character image is normalized to m × n, i.e. the normalized character image is $x \in \mathbb{R}^{m \times n}$, where $\mathbb{R}^{m \times n}$ denotes the real matrices with m rows and n columns. On this basis, each normalized character image x is processed according to the following steps:

Step 1: The character image x is first binarized to obtain its binarized matrix $x' \in \{0,1\}^{m \times n}$. The binarized matrix x' is then converted into the binarized vector $\hat{x} \in \{0,1\}^{mn}$. The threshold used in the binarization is calculated by the bimodal (two-peak) method.
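
A minimal sketch of Step 1 follows. The text names the bimodal method but not its details, so the peak-picking heuristic, the 8-bit gray range, the convention that pixels darker than the threshold are strokes (value 1), and the row-by-row flattening order are all assumptions. The function names are hypothetical.

```python
def bimodal_threshold(image, levels=256):
    """Pick a threshold in the valley between the two main histogram peaks."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[int(v)] += 1
    # First peak: the most frequent gray level.
    p1 = max(range(levels), key=lambda g: hist[g])
    # Second peak: frequent and far from the first (distance-weighted heuristic).
    p2 = max(range(levels), key=lambda g: hist[g] * (g - p1) ** 2)
    lo, hi = sorted((p1, p2))
    # Threshold: the least frequent gray level between the two peaks.
    return min(range(lo, hi + 1), key=lambda g: hist[g])


def binarize(image, threshold):
    """Assumed convention: pixels darker than the threshold are strokes (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in image]


def flatten(matrix):
    """Convert the m x n binarized matrix into a length m*n vector, row by row."""
    return [v for row in matrix for v in row]


# Toy 3 x 3 "character": a dark vertical bar on a bright background.
image = [
    [200, 20, 200],
    [200, 20, 200],
    [200, 20, 200],
]
t = bimodal_threshold(image)
x_bin = binarize(image, t)
x_hat = flatten(x_bin)
```

On real scans the histogram would normally be smoothed before the peaks are located; that refinement is omitted here to keep the sketch short.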

Step 2: The binarized vector $\hat{x}$ is matched against each sub-template in the template set W. The matching method is to multiply the binarized vector $\hat{x}$ element by element with each sub-template, and to sum all elements of the resulting product to obtain the element sum r. If the element sum r reaches its maximum when $\hat{x}$ is multiplied with the matching sub-template $W_k$ of character k, i.e. $k = \arg\max_{c} \sum_{i} (\hat{x} \odot W_c)_i$, then k is the recognition result.
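
The matching rule of Step 2 can be sketched as below. Templates are assumed here to be stored as flat weight lists keyed by character; that storage layout, the toy weights and the function names are hypothetical.

```python
def match_score(x_hat, template):
    """Element-wise product of the binarized vector and a template, then summed."""
    return sum(w * b for w, b in zip(template, x_hat))


def recognize(x_hat, templates):
    """Return the character whose template gives the largest element sum r."""
    return max(templates, key=lambda k: match_score(x_hat, templates[k]))


# Toy 3 x 3 templates, flattened row by row (values are made up).
templates = {
    "1": [-1, 2, -1, -1, 2, -1, -1, 2, -1],
    "7": [2, 2, 2, -1, -1, 2, -1, -1, 2],
}
x_hat = [0, 1, 0, 0, 1, 0, 0, 1, 0]
result = recognize(x_hat, templates)
```

Because the input vector is binary, the score reduces to the sum of the template weights at the stroke pixels, which keeps the per-character cost to one pass over the vector.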

Further, each sub-template in the template set W of Step 2 is obtained by the following method:

(1) N character images are input and the binarized vector of each image is obtained. The binarized vector is obtained as follows: each input character image is first binarized to obtain its binarized matrix $x'_j \in \{0,1\}^{m \times n}$, and this binarized matrix $x'_j$ is then converted into the binarized vector $\hat{x}_j$, where $j = 1, \dots, N$. The threshold used in the binarization is calculated by the bimodal method. The binarized vectors of the N character images are taken as the elements of the training set X, i.e. $X = \{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_N\}$.

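(2) From the training set X, the pre-matching template $W_c$ of any character c is obtained as $W_c = \arg\max_{\theta} l(\theta)$,

where

$$l(\theta) = \sum_{j=1}^{N} \left[ Y_j \log h_\theta(X_j) + (1 - Y_j) \log\left(1 - h_\theta(X_j)\right) \right].$$

$Y_j$ is the true label value, which is 0 or 1: the label value of character c is 1 and the label value of every other character is 0. $X_j$ is the j-th element of the training set X, i.e. $X_j = \hat{x}_j$. The function $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$.

The optimal parameter $W_c$ can be solved by iterating with the gradient descent method. In each iteration, the parameter θ is updated according to the following formula until convergence:

$$\theta := \theta + \alpha \frac{\partial l(\theta)}{\partial \theta} \hat{x}_j$$

where α is the learning rate, and the gradient is $\dfrac{\partial l(\theta)}{\partial \theta} = Y_j - h_\theta(X_j)$.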
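
As a rough sketch of the training step, the per-sample update can be written as below. Several points are assumptions rather than statements of the patented procedure: the logistic (sigmoid) function with no bias term, the per-sample error factor `y_j - h` (the standard logistic regression gradient factor), zero initialization, and a fixed number of passes in place of a convergence test. The function names and the toy data are hypothetical. Because the update adds the gradient term, each step moves uphill on l(θ).

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_template(X, Y, alpha=0.1, epochs=200):
    """Learn one template (theta) for one character.

    X: list of binarized vectors; Y: 1 for the target character, 0 otherwise.
    Assumption: a fixed number of epochs stands in for "until convergence".
    """
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_j, y_j in zip(X, Y):
            h = sigmoid(sum(t * v for t, v in zip(theta, x_j)))
            err = y_j - h  # assumed per-sample gradient factor
            theta = [t + alpha * err * v for t, v in zip(theta, x_j)]
    return theta


# Toy training set: one sample of the target character and one of another.
X = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # shaped like "1" -> label 1
    [1, 1, 1, 0, 0, 1, 0, 0, 1],  # shaped like "7" -> label 0
]
Y = [1, 0]
theta = train_template(X, Y)
```

Training one such template per character, each time relabelling the set so that only that character has label 1, yields the template set used for matching.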

(3) A fixed-point operation is applied to the pre-matching template $W_c$ of character c to obtain the fixed-point template $W_c^F$ of character c, as follows:

$$W_c^F = (W_c - \min(W_c)) \mathbin{./} (\max(W_c) - \min(W_c)) * (2^p - 1)$$

where ./ is the element-wise division operation, and p is the number of integer bits into which each element of the template $W_c$ is converted during the fixed-point operation.
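
The fixed-point formula can be illustrated with the short sketch below. The formula itself only rescales the weights to the range 0 to 2^p - 1; rounding the result to the nearest integer and rejecting a constant template are assumptions added here, and the function name is hypothetical.

```python
def to_fixed_point(w, p):
    """Rescale template weights to integers in [0, 2**p - 1].

    Implements (w - min(w)) ./ (max(w) - min(w)) * (2**p - 1), followed by
    rounding to the nearest integer (the rounding step is an assumption).
    """
    lo, hi = min(w), max(w)
    if hi == lo:
        raise ValueError("template is constant; the scaling is undefined")
    scale = (2 ** p - 1) / (hi - lo)
    return [round((v - lo) * scale) for v in w]


w_c = [-2.0, 0.0, 6.0]
w_c_fixed = to_fixed_point(w_c, p=4)
```

With p = 4 the smallest weight maps to 0 and the largest to 15, so each element fits in four bits.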

The beneficial effects of the invention are:

(1) Using a logistic regression model, the input training character samples are trained automatically to produce the template of each character. Compared with the earlier manual debugging method, the generation speed is greatly improved.

(2) Because a logistic regression model is used, many samples can be trained at once, which substantially improves the accuracy of the templates compared with the earlier manual debugging method. The method is flexible and is suitable for the current mainstream currencies, including the renminbi, US dollar, euro, Hong Kong dollar and Japanese yen.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of the multi-country banknote serial number character recognition method based on a banknote sorting machine according to an embodiment of the invention;

Fig. 2 is a schematic diagram of the binarized matrix of the character "3" in the multi-country banknote serial number character recognition method based on a banknote sorting machine according to an embodiment of the invention.

Detailed description of the embodiments

To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

On the contrary, the invention covers any alternatives, modifications, equivalent methods and schemes made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described at length in the detailed description below. A person skilled in the art can fully understand the invention even without the description of these details.

Fig. 1 shows the flow chart of the steps of the multi-country banknote serial number character recognition method based on a banknote sorting machine according to an embodiment of the invention.

The serial number image I is first segmented to obtain the images of multiple characters, and the size of each character image is normalized to m × n. On this basis, each normalized character image x is processed according to the following steps:

Step 1: The normalized character image is $x \in \mathbb{R}^{m \times n}$, where $\mathbb{R}^{m \times n}$ denotes the real matrices with m rows and n columns. The character image x is first binarized to obtain its binarized matrix $x' \in \{0,1\}^{m \times n}$; Fig. 2 shows the binarized matrix of the character "3". The binarized matrix x' is then converted into the binarized vector $\hat{x} \in \{0,1\}^{mn}$. The threshold used in the binarization is calculated by the bimodal method.

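Step 2: Each sub-template in the template set W is obtained by the following method:

(2.1) N character images are input and the binarized vector of each image is obtained. The binarized vector is obtained as follows: each input character image is first binarized to obtain its binarized matrix $x'_j \in \{0,1\}^{m \times n}$, and this binarized matrix $x'_j$ is then converted into the binarized vector $\hat{x}_j$, where $j = 1, \dots, N$. The threshold used in the binarization is calculated by the bimodal method. The binarized vectors of the N character images are taken as the elements of the training set X, i.e. $X = \{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_N\}$.

(2.2) From the training set X, the pre-matching template $W_c$ of any character c is obtained as $W_c = \arg\max_{\theta} l(\theta)$, where

$$l(\theta) = \sum_{j=1}^{N} \left[ Y_j \log h_\theta(X_j) + (1 - Y_j) \log\left(1 - h_\theta(X_j)\right) \right].$$

$Y_j$ is the true label value, which is 0 or 1: the label value of character c is 1 and the label value of every other character is 0. $X_j$ is the j-th element of the training set X, i.e. $X_j = \hat{x}_j$. The function $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$.

The optimal parameter $W_c$ can be solved by iterating with the gradient descent method. In each iteration, the parameter θ is updated according to the following formula until convergence:

$$\theta := \theta + \alpha \frac{\partial l(\theta)}{\partial \theta} \hat{x}_j$$

where α is the learning rate, and the gradient is $\dfrac{\partial l(\theta)}{\partial \theta} = Y_j - h_\theta(X_j)$.

(2.3) A fixed-point operation is applied to the pre-matching template $W_c$ of character c to obtain the fixed-point template $W_c^F$ of character c, as follows:

$$W_c^F = (W_c - \min(W_c)) \mathbin{./} (\max(W_c) - \min(W_c)) * (2^p - 1)$$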

Step 3: The binarized vector $\hat{x}$ obtained in Step 1 is matched against each sub-template in the template set W. The matching method is to multiply the binarized vector $\hat{x}$ element by element with each sub-template, and to sum all elements of the resulting product to obtain the element sum r. If the element sum r reaches its maximum when $\hat{x}$ is multiplied with the matching sub-template $W_k$ of character k, i.e. $k = \arg\max_{c} \sum_{i} (\hat{x} \odot W_c)_i$, then k is the recognition result.

The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A multi-country banknote serial number character recognition method based on a banknote sorting machine, characterized in that the method is as follows: the serial number image I is first segmented to obtain the images of multiple characters, and the size of each character image is normalized to m × n, i.e. the normalized character image is $x \in \mathbb{R}^{m \times n}$, where $\mathbb{R}^{m \times n}$ denotes the real matrices with m rows and n columns; on this basis, each normalized character image x is processed according to the following steps:

Step 1: the character image x is first binarized to obtain its binarized matrix $x' \in \{0,1\}^{m \times n}$, and the binarized matrix x' is then converted into the binarized vector $\hat{x} \in \{0,1\}^{mn}$, wherein the threshold used in the binarization is calculated by the bimodal method;

Step 2: the binarized vector $\hat{x}$ is matched against each sub-template in the template set W; the matching method is to multiply the binarized vector $\hat{x}$ element by element with each sub-template, and to sum all elements of the resulting product to obtain the element sum r; if the element sum r reaches its maximum when $\hat{x}$ is multiplied with the matching sub-template $W_k$ of character k, i.e. $k = \arg\max_{c} \sum_{i} (\hat{x} \odot W_c)_i$, then k is the recognition result;

each sub-template in the template set W of Step 2 is obtained by the following method:

(2.1) N character images are input and the binarized vector of each image is obtained; the binarized vector is obtained as follows: each input character image is first binarized to obtain its binarized matrix $x'_j \in \{0,1\}^{m \times n}$, and this binarized matrix $x'_j$ is then converted into the binarized vector $\hat{x}_j$, where $j = 1, \dots, N$, wherein the threshold used in the binarization is calculated by the bimodal method; the binarized vectors of the N character images are taken as the elements of the training set X, i.e. $X = \{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_N\}$;

(2.2) from the training set X, the pre-matching template $W_c$ of any character c is obtained as $W_c = \arg\max_{\theta} l(\theta)$, wherein

$$l(\theta) = \sum_{j=1}^{N} \left[ Y_j \log h_\theta(X_j) + (1 - Y_j) \log\left(1 - h_\theta(X_j)\right) \right],$$

$Y_j$ is the true label value, which is 0 or 1, the label value of character c being 1 and the label value of every other character being 0; $X_j$ is the j-th element of the training set X, i.e. $X_j = \hat{x}_j$; and the function $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$;

the optimal parameter $W_c$ can be solved by iterating with the gradient descent method; in each iteration, the parameter θ is updated according to the following formula until convergence:

$$\theta := \theta + \alpha \frac{\partial l(\theta)}{\partial \theta} \hat{x}_j$$

wherein α is the learning rate, and the gradient is $\dfrac{\partial l(\theta)}{\partial \theta} = Y_j - h_\theta(X_j)$;

(2.3) a fixed-point operation is applied to the pre-matching template $W_c$ of character c to obtain the fixed-point template $W_c^F$ of character c, as follows:

$$W_c^F = (W_c - \min(W_c)) \mathbin{./} (\max(W_c) - \min(W_c)) * (2^p - 1)$$