CN106157284B

CN106157284B - Method and device for locating text regions in an image

Info

Publication number: CN106157284B
Application number: CN201510151823.9A
Authority: CN
Inventors: 刘彬; 刘扬; 张洪明
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-04-01
Filing date: 2015-04-01
Publication date: 2019-10-11
Anticipated expiration: 2035-04-01
Also published as: CN106157284A

Abstract

The invention discloses a method for locating text regions in an image, comprising: building a parametric regression model; generating, by the parametric regression model, text spatial distribution parameters corresponding to the image; and reconstructing a text/non-text binary image corresponding to the image according to the text spatial distribution parameters. An embodiment of the invention also discloses a device for locating text regions in an image. Compared with the prior art, the technical solution of the embodiments completely abandons the traditional approach of locating text regions through image contours or region features. Instead, it locates text regions by analyzing a deeper semantic feature of the image, namely its text spatial distribution parameters. This not only avoids interference from image size, font, color, language and similar factors, making localization more accurate and more robust; because the method relies on the most basic semantic features of an image, it also applies to images of various formats and is therefore general.

Description

Method and device for locating text regions in an image

Technical field

The present invention relates to the technical field of image and text processing, and more specifically to a method and device for locating text regions in an image.

Background art

In online transactions, consumers cannot inspect goods directly, so product images have become an important means for merchants to describe goods to consumers. However, to attract more attention to their goods, some merchants embed false promotional text in product images, which both amounts to malicious competition and degrades the consumer experience. E-commerce websites therefore review the text content of product images in order to monitor those images.

Existing review of the text in a product image generally proceeds as follows. First, the text regions in the product image are located to determine where text appears in the image. Then, based on the located text regions, the text content is refined to obtain clear text content. The technique currently used to locate text in a product image usually works in three stages: first, coarsely locate the salient text regions in the product image, excluding as much of the background as possible; second, analyze further using information such as edges and color, filtering and merging text regions to obtain candidate text lines; finally, use a classifier to verify the candidate text line regions and obtain the true text line regions.

However, the size, fonts, colors and languages of product images are all uncertain, and the text in a product image is easily confused with the image's complex background. This greatly interferes with locating the text regions, so that the text regions cannot be located, clear text content cannot be obtained, and product images cannot be reviewed automatically.

Summary of the invention

To overcome the problems in the prior art, the present invention provides a method and device for locating text regions in an image.

In a first aspect, the present invention provides a method for locating text regions in an image, comprising: building a parametric regression model; generating, by the parametric regression model, text spatial distribution parameters corresponding to the image; and reconstructing, according to the text spatial distribution parameters, a text/non-text binary image corresponding to the image.

In a first possible implementation of the first aspect, building the parametric regression model comprises: obtaining target text spatial distribution parameters of the parametric regression model; inputting a test image into the parametric regression model to generate test text spatial distribution parameters; computing a current error from the target text spatial distribution parameters and the test text spatial distribution parameters; calculating the difference between the current error and a reference error, wherein the reference error is the error obtained in the previous computation; judging whether the difference is less than a first preset threshold; if the difference is greater than or equal to the first preset threshold, adjusting the unknown parameters of the parametric regression model according to the current error, taking the current error as the reference error, and repeating the step of inputting the test image into the parametric regression model to generate test text spatial distribution parameters, until the difference is less than the first preset threshold; and if the difference is less than the first preset threshold, taking the current values of the unknown parameters of the parametric regression model as the model parameters.

With reference to the first aspect, in a second possible implementation, reconstructing the text/non-text binary image corresponding to the image according to the text spatial distribution parameters comprises: setting parameters in the text spatial distribution parameters that are less than a second preset threshold to 0; setting parameters in the text spatial distribution parameters that are greater than the second preset threshold to 1; converting parameter 0 and parameter 1 into binarized pixel gray values; and constructing the text/non-text binary image from the binarized pixel gray values.

With reference to the first aspect, in a third possible implementation, before setting parameters in the text spatial distribution parameters that are less than the preset threshold to 0 and setting parameters that are greater than the preset threshold to 1, the method further comprises: building a dimensionality reduction model; inputting the text spatial distribution parameters into the dimensionality reduction model; and reducing the dimensionality of the text spatial distribution parameters by means of parameter reconstruction.

With reference to the first aspect, in a fourth possible implementation, building the dimensionality reduction model comprises: obtaining the text spatial distribution parameters of a pre-annotated binarized image as calibration text spatial distribution parameters; inputting the pixel gray values of the binarized image into the dimensionality reduction model to generate reconstructed text spatial distribution parameters; computing a current error from the calibration text spatial distribution parameters and the reconstructed text spatial distribution parameters; calculating the difference between the current error and a reference error, wherein the reference error is the error obtained in the previous computation; judging whether the difference is less than a third preset threshold; if the difference is greater than or equal to the third preset threshold, adjusting the unknown parameters of the dimensionality reduction model according to the current error, taking the current error as the reference error, and repeating the step of inputting the pixel gray values of the binarized image into the dimensionality reduction model to generate reconstructed text spatial distribution parameters, until the difference is less than the third preset threshold; and if the difference is less than the third preset threshold, taking the current values of the unknown parameters of the dimensionality reduction model as the model parameters.

With reference to the first aspect, in a fifth possible implementation, obtaining the target text spatial distribution parameters of the parametric regression model comprises: reading the output data of the last layer of the dimensionality reduction model; and taking the output data of the last layer of the dimensionality reduction model as the target text spatial distribution parameters.

In a second aspect, the present invention provides a device for locating text regions in an image, comprising: a building module, configured to build a parametric regression model; a generation module, configured to generate, by the parametric regression model built by the building module, text spatial distribution parameters corresponding to the image; and a reconstruction module, configured to reconstruct, according to the text spatial distribution parameters generated by the generation module, a text/non-text binary image corresponding to the image.

In a first possible implementation of the second aspect, the building module comprises an acquiring unit, a generation unit, a computing unit, a judging unit, an adjustment unit and a determination unit. The acquiring unit is configured to obtain the target text spatial distribution parameters of the parametric regression model. The generation unit is configured to input a test image into the parametric regression model to generate test text spatial distribution parameters. The computing unit is configured to compute a current error from the target text spatial distribution parameters and the test text spatial distribution parameters, and is further configured to calculate the difference between the current error and a reference error, wherein the reference error is the error obtained in the previous computation. The judging unit is configured to judge whether the difference is less than a first preset threshold. The adjustment unit is configured to adjust the unknown parameters of the parametric regression model according to the current error when the difference is greater than or equal to the first preset threshold. The determination unit is configured to take the current error as the reference error when the difference is greater than or equal to the first preset threshold, and is further configured to take the current values of the unknown parameters of the parametric regression model as the model parameters when the difference is less than the first preset threshold.

With reference to the second aspect, in a second possible implementation, the reconstruction module comprises a binarization unit, a converting unit and a construction unit. The binarization unit is configured to set parameters in the text spatial distribution parameters that are less than a second preset threshold to 0, and to set parameters that are greater than the second preset threshold to 1. The converting unit is configured to convert parameter 0 and parameter 1 into binarized pixel gray values. The construction unit is configured to construct the text/non-text binary image from the binarized pixel gray values.

With reference to the second aspect, in a third possible implementation, the device further comprises an input unit and a dimensionality reduction unit. The building module is further configured to build a dimensionality reduction model. The input unit is configured to input the text spatial distribution parameters into the dimensionality reduction model. The dimensionality reduction unit is configured to reduce the dimensionality of the text spatial distribution parameters by means of parameter reconstruction.

With reference to the second aspect, in a fourth possible implementation, the acquiring unit is further configured to obtain the text spatial distribution parameters of a pre-annotated binarized image as calibration text spatial distribution parameters. The generation unit is further configured to input the pixel gray values of the binarized image into the dimensionality reduction model to generate reconstructed text spatial distribution parameters. The computing unit is further configured to compute a current error from the calibration text spatial distribution parameters and the reconstructed text spatial distribution parameters, and to calculate the difference between the current error and the reference error. The judging unit is further configured to judge whether the difference is less than a third preset threshold. The adjustment unit is further configured to adjust the unknown parameters of the dimensionality reduction model according to the current error when the difference is greater than or equal to the third preset threshold. The determination unit is further configured to take the current error as the reference error, and, when the difference is less than the third preset threshold, to take the current values of the unknown parameters of the dimensionality reduction model as the model parameters.

With reference to the second aspect, in a fifth possible implementation, the acquiring unit comprises a reading subunit, configured to read the output data of the last layer of the dimensionality reduction model; the determination unit is further configured to take the output data of the last layer of the dimensionality reduction model as the target text spatial distribution parameters.

As the above technical solution shows, when locating the text regions in an image, an embodiment of the present invention first builds a parametric regression model and generates, by that model, the text spatial distribution parameters corresponding to the image. It then constructs a text/non-text binary image from the text spatial distribution parameters, so that the text and non-text parts of the image are represented explicitly. In other words, the image is parameterized, and the text regions in the image are located explicitly by processing the parameters that correspond to the image. The technical solution of the embodiments thus completely abandons the traditional approach of locating text regions through image contours or region features, and instead locates text regions by analyzing a deeper semantic feature of the image, namely its text spatial distribution parameters. This not only avoids interference from image size, font, color, language and similar factors, making localization more accurate and more robust; because the method relies on the most basic semantic features of an image, it also applies to images of various formats and is therefore general.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the technical solution of the present invention.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort. The above and other objects, features and advantages of the present invention will become clearer from the drawings. The same reference numerals denote the same parts throughout the drawings. The drawings are deliberately not drawn to scale; the emphasis is on showing the gist of the present invention.

Fig. 1 is a flowchart of a method for locating text regions in an image provided by an embodiment of the present invention;

Fig. 2 is a flowchart of another method for locating text regions in an image provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of an image to be located provided by the present invention;

Fig. 4 is the text/non-text binary image corresponding to the image shown in Fig. 3;

Fig. 5 is a schematic structural diagram of a device for locating text regions in an image provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of another device for locating text regions in an image provided by an embodiment of the present invention.

Detailed description of the embodiments

Existing approaches to locating text regions in an image include methods based on region feature extraction (Maximally Stable Extremal Regions, MSER) and methods based on the stroke width transform (Stroke Width Transform, SWT). Most existing methods rely on hand-designed features and rules, which generalize poorly. They work reasonably well for simple, regular text regions, for example text of a single color with consistent columns, lines and spacing, but when detecting text regions in complex and varied images they easily produce false detections and are not robust. The technical solution of the present invention is proposed to solve this technical problem.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, Fig. 1 is a flowchart of a method for locating text regions in an image provided by an embodiment of the present invention. The method includes the following steps.

Step S101: build a parametric regression model.

An embodiment of the present invention converts an image into text spatial distribution parameters by means of parametric regression. So that the text spatial distribution parameters of an image can be obtained accurately, the technical solution of the embodiment can build the parametric regression model by learning from annotated samples. In this embodiment, the parametric regression model may be a deep convolutional neural network (DCNN), a deep neural network (DNN), a support vector machine (SVM), AdaBoost, or the like.

Specifically, this embodiment illustrates building the parametric regression model by taking DCNN learning and optimization as an example. The parametric regression model is determined first; it may take the form of formulas (1) to (4) below, where S is the target text spatial distribution parameters of the parametric regression model and x is the input text image. S and x satisfy the mapping relation F, as shown in formula (1). In this embodiment F is a nonlinear mapping function whose form is shown in formula (2). In formula (2), f_i is the mapping function of the i-th layer, whose expression is shown in formula (3). In formula (3), σ denotes the activation function. Formula (4) gives the mapping function of the last layer.

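$$F: S \leftarrow x \tag{1}$$

$$F(x) = f_k(f_{k-1}(\cdots f_1(x) \cdots)) \tag{2}$$

$$f_i(a_{i-1}) = \sigma(W_i a_{i-1} + b_i) \triangleq a_i, \quad i = 1, \ldots, k-1 \tag{3}$$

$$f_k(a_{k-1}) = W_k a_{k-1} + b_k \triangleq S_0 \tag{4}$$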
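The layered mapping in formulas (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented model: the logistic sigmoid chosen for σ, the helper names `affine` and `forward`, and the toy weights are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Assumed choice of activation function sigma; the source does not fix one.
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, a):
    """Compute W a + b for a weight matrix W (list of rows) and vectors b, a."""
    return [sum(w * x for w, x in zip(row, a)) + bias
            for row, bias in zip(W, b)]

def forward(layers, x):
    """Apply f_1 ... f_k in order; hidden layers use sigma, the last is linear."""
    a = list(x)
    for W, b in layers[:-1]:
        a = [sigmoid(z) for z in affine(W, b, a)]   # formula (3)
    W_k, b_k = layers[-1]
    return affine(W_k, b_k, a)                      # formula (4)

# Toy model (hypothetical weights): 3 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.5, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[1.0, 2.0]], [0.5]),
]
S0 = forward(layers, [1.0, 1.0, 1.0])
```

Both hidden units receive a pre-activation of 0 here, so each outputs 0.5, and the linear last layer combines them into a single output value.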

In this embodiment, W_i and b_i are initialized to arbitrary non-zero values, which may be random natural numbers. Because the initial values of W_i and b_i are arbitrary, the parametric regression model may not be the best model, and the text spatial distribution parameters computed from an input sample image may differ greatly from the target text spatial distribution parameters. Building the parametric regression model is therefore the process of optimizing W_i and b_i in the model. Since the error of the text spatial distribution parameters reflects how well the model is optimized, in embodiments of the present invention the values of W_i and b_i can be adjusted according to that error so as to optimize the model.

Specifically, a binary image in which the text regions have been annotated is used as a sample. The text spatial distribution parameters of the sample serve as the target text spatial distribution parameters of the parametric regression function, i.e. S in formula (1), and the RGB image of the sample serves as the input sample image, i.e. x in formula (1). The text spatial distribution parameters computed by the current parametric regression model are taken as the test text spatial distribution parameters. The error between the test and target text spatial distribution parameters is computed, and then the difference between the current error and the previous error. If the difference is less than the first preset threshold, W_i and b_i are considered to have converged to their optimal values and can be used as the model parameters of the parametric regression model. If the difference is greater than or equal to the first preset threshold, the current W_i and b_i have not converged to their optimal values; they can be adjusted according to the current error value to reduce the error between the test and target text spatial distribution parameters. The RGB image of the sample is then input to the parametric regression model again as the input sample image, new test text spatial distribution parameters are generated, a new error is computed, and the difference between the new error and the previous error is calculated. This repeats until the difference is less than the first preset threshold.

It should be noted that in this embodiment, when the sample image is input into the parametric regression model for the first time, no previous error exists. Therefore, when calculating the difference between the current error and the previous error, the previous error is set to 0.
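The stopping rule described above can be illustrated with a small training loop. The sketch below is only an analogy under stated assumptions: a one-variable linear model stands in for the DCNN, the error is a mean squared error, the parameters are adjusted by plain gradient descent, and the difference between errors is taken as an absolute value. The function name `train`, the learning rate and the sample data are hypothetical.

```python
def train(samples, threshold=1e-9, learning_rate=0.1, max_iters=10_000):
    """Fit s = w * x + b, stopping when the error changes by less than threshold."""
    w, b = 0.5, 0.5          # arbitrary non-zero starting values
    previous_error = 0.0     # no previous error exists on the first pass
    current_error = 0.0
    for iteration in range(1, max_iters + 1):
        # Error between the model output (test values) and the targets.
        residuals = [(w * x + b) - s for x, s in samples]
        current_error = sum(r * r for r in residuals) / len(samples)
        # Converged: the error barely changed since the previous pass.
        if abs(current_error - previous_error) < threshold:
            return w, b, current_error, iteration
        # Otherwise adjust the unknown parameters according to the error ...
        grad_w = 2 * sum(r * x for r, (x, _) in zip(residuals, samples)) / len(samples)
        grad_b = 2 * sum(residuals) / len(samples)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # ... and keep the current error as the reference for the next pass.
        previous_error = current_error
    return w, b, current_error, max_iters

# Hypothetical samples generated from s = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, err, iters = train(samples)
```

Because the test compares successive errors rather than the error itself, the loop ends when further adjustment stops paying off, which is why the threshold has to be chosen together with the model and the error measure.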

In addition, the first preset threshold can be set according to the specific functional form of the parametric regression model and empirical values. It is not a fixed value and is not described further here.

In this embodiment, the parametric regression model is trained by machine learning, which not only ensures that the parameters the model outputs in use are accurate, but also avoids hand-designed features and gives the method wider applicability.

Step S102: generate, by the parametric regression model, text spatial distribution parameters corresponding to the image.

If an image is regarded as a two-dimensional space, each pixel of the image corresponds to one position in that space, and positions in the two-dimensional space can be represented by text spatial distribution parameters. Each pixel of the image therefore maps to one text spatial distribution parameter, and the position of the pixel can be represented by that parameter. As described above, the parametric regression model can be built by repeated learning from annotated samples. Inputting an image into the parametric regression model then generates the text spatial distribution parameters of the image, and the parameters obtained are relatively accurate.

It should be noted that, to determine the position of the text regions in the image, the color value of each pixel should be obtained together with its position. Therefore, after an image is input, the model reads the R, G and B color values of each pixel and computes the text spatial distribution parameter of that pixel from them. The generated text spatial distribution parameters are a series of floating-point numbers that represent pixel position and color, for example 0.5 or 0.8, where each floating-point number corresponds to one pixel of the image.

In addition, when building the parametric regression model, the image can be normalized by the nearest-neighbor interpolation algorithm to reduce its dimensions, which reduces computation and improves the efficiency with which the model processes images. For example, an image whose original size is 1024*1024 is normalized to 256*256 by nearest-neighbor interpolation. It should be pointed out that, to ensure the accuracy of the generated text spatial distribution parameters, the images input when locating with the parametric regression model should have the same size as the sample images input when building the model. For example, if the sample images input when building the parametric regression model are 256*256, the images input when using the model are also 256*256. Of course, the above is only a preferred example of the present invention; the size after normalization differs for different parametric regression models, and the present invention does not limit it. In addition, processing images by nearest-neighbor interpolation is a technique well known to those skilled in the art and is not described further here.
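As a rough illustration of this normalization step, nearest-neighbor resizing can be written without any imaging library. The sketch below assumes the image is a list of pixel rows and uses one common sampling convention (the floor of the scaled coordinate); the function name `resize_nearest` is hypothetical, and an 8*8 grid stands in for the 1024*1024 image so the same factor-of-4 reduction stays readable.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D grid (list of rows) by nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        # Each target pixel copies the source pixel at the scaled coordinate.
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]

# Each "pixel" records its own coordinates so the sampling is easy to follow.
image = [[(r, c) for c in range(8)] for r in range(8)]
small = resize_nearest(image, 2, 2)

# A 2*2 black-and-white mask enlarged back to 4*4.
enlarged = resize_nearest([[0, 255], [255, 0]], 4, 4)
```

The same function also enlarges a grid when the target size exceeds the source size, because every target pixel simply copies one source pixel and no new values are computed.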

This arrangement not only obtains accurate text spatial distribution parameters for the image pixels, providing an accurate data basis for locating text regions; by normalizing the image, it also greatly reduces the computation of the parametric regression model.

Step S103: reconstruct, according to the text spatial distribution parameters, the text/non-text binary image corresponding to the image.

In product images, promotional or descriptive text is mostly eye-catching in order to attract consumers' attention. Even though text of different colors, positions and sizes may appear in an image, text within a small region is usually highly consistent: the pixel gray values within the region are close to one another and differ from those of other regions. The text regions in an image can therefore be located by analyzing the pixel gray values and positions in the image.

As described above, the text spatial distribution parameters represent image pixel gray values and positions, and this embodiment detects the text regions of the image by processing those parameters. To distinguish text regions from non-text regions explicitly, embodiments of the present invention render the text and non-text regions as a binary image of two colors.

For example, text regions are set to white and non-text regions to black. Specifically, since the generated text spatial distribution parameters are values of differing magnitude, they must first be binarized. The binarized text spatial distribution parameters are then converted into binarized pixel gray values, and the text/non-text binary image is constructed from those gray values. Binarizing the text spatial distribution parameters comprises setting a second preset threshold, setting parameters less than the second preset threshold to 0, and setting parameters greater than the second preset threshold to 1, so that the text spatial distribution parameters take only two values. To construct a black-and-white binary image, the binarized text spatial distribution parameters are multiplied by 255 to produce the two gray values for black and white, and the text/non-text binary image is constructed from these gray values.
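The binarization and gray-value conversion just described amount to a threshold followed by a multiplication. The following sketch assumes the parameters arrive as a grid of floats with one value per pixel; the names `binarize` and `to_gray`, the threshold of 0.5 and the sample values are illustrative. The text does not say how a value exactly equal to the threshold is treated, so mapping it to 0 here is an assumption.

```python
def binarize(params, threshold=0.5):
    """Map each parameter to 1 if it exceeds the threshold, otherwise 0."""
    # Assumption: a value exactly equal to the threshold is treated as 0.
    return [[1 if p > threshold else 0 for p in row] for row in params]

def to_gray(bits, white=255):
    """Turn 0/1 parameters into black (0) and white (255) gray values."""
    return [[bit * white for bit in row] for row in bits]

# Hypothetical text spatial distribution parameters for a 2*3 image.
params = [[0.1, 0.8, 0.5],
          [0.9, 0.2, 0.7]]
bits = binarize(params)
mask = to_gray(bits)
```

In the resulting mask, white pixels mark positions classified as text and black pixels mark the background.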

It should be noted that the constructed binary image has the same size as the image input to the parametric regression model. That input image may already have been normalized and so may not be at the original size, in which case the constructed binary image is not at the original size either. Therefore, after the text/non-text binary image is obtained, it is also necessary to judge whether the binary image is smaller than the original image; if so, the binary image is normalized to the original image size by nearest-neighbor interpolation.

As the above embodiment shows, the method for locating text regions in an image described in the embodiments of the present invention parameterizes the image and locates the text regions in the image explicitly by processing the parameters that correspond to the image. The technical solution of the embodiments thus completely abandons the traditional approach of locating text regions through image contours or region features, and instead locates text regions by analyzing a deeper semantic feature of the image, namely its text spatial distribution parameters. This not only avoids interference from image size, font, color, language and similar factors, making localization more accurate and more robust; because the method relies on the most basic semantic features of an image, it also applies to images of various formats and is therefore general.

The above embodiment describes the localization method of the embodiments of the present invention from one perspective. To make the technical solution of the present invention clearer and more complete, the following embodiment, building on the one above, describes the technical solution from another perspective. Since this embodiment supplements the above embodiment, the parts it shares with the above embodiment are described there and are not repeated here.

Referring to Fig. 2, Fig. 2 is a flowchart of another method for locating text regions in an image provided by an embodiment of the present invention. The localization method includes the following steps.

Step S201: build a parametric regression model.

In this embodiment, the parametric regression model is assumed to be a DCNN, and the sample image size used when building the DCNN is 256*256. The process of building the DCNN is described in the above embodiment and is not repeated here.

Step S202: generate, by the parametric regression model, text spatial distribution parameters corresponding to the image.

Referring to Fig. 3, Fig. 3 is an image to be located provided by an embodiment of the present invention, in which region 01, region 02 and region 03 are text regions and the remaining regions are background. Assume the size of the image is 1024*1024. Since the sample image size used when building the DCNN is 256*256, the image in Fig. 3 must be normalized to 256*256 by nearest-neighbor interpolation before it is input to the DCNN model. The normalized image is then input to the DCNN; the DCNN model reads the R, G and B values of each pixel, performs the computation, and generates one text spatial distribution parameter for each pixel.

Step S203: build a dimensionality reduction model.

When the binary image is constructed from the text spatial distribution parameters, the parameters can first be reduced in dimension to lower the amount of data to be processed. Dimensionality reduction is performed by a dimensionality reduction model, so such a model must be constructed. The dimensionality reduction model has multiple network layers and multiple nodes. The first layer receives the input data and operates on it, merging nodes once during the computation. The output of the first layer serves as the input of the second layer, which merges nodes a second time, and its output serves as the input of the third layer, and so on until the output of the last layer is obtained. The node merging in each layer accomplishes the dimensionality reduction. In this embodiment, the dimensionality reduction model may be a deep Boltzmann machine (DBM), a deep belief network (DBN), a restricted Boltzmann machine (RBM), or the like. To avoid hand-designed features, the dimensionality reduction model, like the Partial Linear Model, can also be constructed by learning from labeled samples.

This embodiment describes the construction of the dimensionality reduction model in detail, taking a DBM as an example. First, a three-layer DBM model is constructed, as given by formula (4), where v denotes the visible variable, h¹ and h² are the hidden-layer variables of the second layer and the third layer respectively, w is the weight of the edges connecting node units, and b and c are the biases of the node units. As when constructing the Partial Linear Model, the unknown parameters are set to arbitrary non-zero values in the initial state, and their optimal values are determined by training on samples.
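For orientation, a DBM with one visible layer and two hidden layers is commonly defined through an energy function of the following form. This is the standard textbook form written with the symbols defined above, and it is not necessarily the exact expression of formula (4). The superscripts on w and c, which distinguish the two weight matrices and the two hidden-layer bias vectors, and the normalizing constant Z are notation assumed here.

```latex
E(v, h^{1}, h^{2}) = -v^{\top} w^{1} h^{1} - (h^{1})^{\top} w^{2} h^{2}
                     - b^{\top} v - (c^{1})^{\top} h^{1} - (c^{2})^{\top} h^{2},
\qquad
P(v) = \frac{1}{Z} \sum_{h^{1},\, h^{2}} \exp\bigl(-E(v, h^{1}, h^{2})\bigr)
```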

A pre-labeled binary image is used as the sample, and the text spatial distribution parameters of this binary image are taken as the calibration text spatial distribution parameters for training the DBM model. The pixel gray values of the sample are input to the dimensionality reduction model to generate reconstructed text spatial distribution parameters. Because the reconstructed parameters are generated by the dimensionality reduction model, the quality of the model's unknown parameter values is directly reflected in the error between the reconstructed and the calibration text spatial distribution parameters. As when constructing the Partial Linear Model, the dimensionality reduction model can therefore be optimized on the basis of this error.

Specifically, the current error is computed from the calibration and reconstructed text spatial distribution parameters, and the difference between the current error and the previous error is calculated. If the difference is less than a third preset threshold, the unknown parameters are considered to have converged to their optimal values and can be used as the model parameters of the dimensionality reduction model. If the difference is greater than or equal to the third preset threshold, the current unknown parameters have not converged to their optimal values; they are adjusted according to the current error so as to reduce the error between the reconstructed and calibration text spatial distribution parameters. The pixel gray values of the sample are then input to the dimensionality reduction model again to generate new reconstructed text spatial distribution parameters, a new error is computed, and the difference between this new error and the previous error is calculated. This is repeated until the difference is less than the third preset threshold.

Note that in this embodiment, when the pixel gray values of the sample are input to the dimensionality reduction model for the first time, no previous error exists. The previous error is therefore set to 0 when calculating the difference between the current error and the previous error.
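The stopping rule described above can be sketched with a toy model. Everything specific here is an assumption made for illustration: the one-weight linear model, the mean squared error, the gradient-descent adjustment, the use of an absolute difference, and the names `train`, `lr` and `max_iters`. The embodiment only fixes the control flow: compare the current error with the previous one, stop when the difference falls below the threshold, and start with a previous error of 0.

```python
from typing import List, Tuple


def mean_squared_error(outputs: List[float], targets: List[float]) -> float:
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def train(inputs: List[float], targets: List[float], weight: float = 0.5,
          lr: float = 0.1, threshold: float = 1e-9,
          max_iters: int = 100_000) -> Tuple[float, float, int]:
    """Adjust `weight` until successive errors differ by less than `threshold`."""
    last_error = 0.0  # no previous error exists on the first pass, so use 0
    for iteration in range(1, max_iters + 1):
        outputs = [weight * x for x in inputs]           # toy stand-in for the model
        error = mean_squared_error(outputs, targets)     # current error
        if abs(error - last_error) < threshold:          # converged: keep parameters
            return weight, error, iteration
        # Not converged: adjust the unknown parameter according to the current error
        # (here, one gradient-descent step on the mean squared error).
        grad = sum(2 * (weight * x - t) * x for x, t in zip(inputs, targets)) / len(inputs)
        weight -= lr * grad
        last_error = error                               # becomes the previous error
    raise RuntimeError("did not converge within max_iters")


pixels = [0.0, 1.0, 1.0, 0.0]  # flattened toy sample; targets equal the inputs
weight, error, iterations = train(pixels, pixels)
```

Because the rule compares successive errors rather than the error itself, it stops when training stalls, whether or not the error is small; the threshold therefore has to be chosen with the scale of the error in mind.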

In addition, inputting the pixel gray values of the sample to the dimensionality reduction model to generate the reconstructed text spatial distribution parameters specifically includes the following. The pixel gray values of the binary image are input to the first layer of the DBM model in a preset order. The output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, layer by layer from the first layer, until the output of the last layer is obtained. An inverse operation is then performed on the output of the last layer to obtain the reconstructed text spatial distribution parameters of the binary image.

Note that the pre-labeled binary image is two-dimensional, whereas the data input when training the DBM model must be one-dimensional. The data are therefore read in a preset row or column order.
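A minimal sketch of reading a two-dimensional image into a one-dimensional sequence in row order or column order follows. The helper name `flatten` and its `order` argument are hypothetical.

```python
from typing import List, Sequence


def flatten(image: Sequence[Sequence[int]], order: str = "row") -> List[int]:
    """Read a 2-D image into a 1-D list, row by row or column by column."""
    if order == "row":
        return [pixel for row in image for pixel in row]
    if order == "column":
        return [image[r][c] for c in range(len(image[0])) for r in range(len(image))]
    raise ValueError("order must be 'row' or 'column'")


image = [[1, 2], [3, 4]]
by_row = flatten(image, "row")
by_column = flatten(image, "column")
```

Whichever order is chosen, the same order has to be used again when the one-dimensional output is laid back out as an image.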

In addition, the DBM model in this embodiment has a three-layer network; the number of nodes in the second layer may be set to 1024 and the number of nodes in the third layer may be 256. This is only a preferred example of the present invention. When designing the network, a different number of layers and of nodes per layer may be set as required, and the present invention places no limitation on this.
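The layer-by-layer computation and the inverse operation can be sketched as follows. This is a simplified, deterministic approximation under stated assumptions: sigmoid units, the same weights reused in the downward direction, small random initial weights, and toy layer sizes of 16, 8 and 4 instead of the 1024 and 256 nodes mentioned above. The names `up_pass` and `down_pass` are hypothetical, and full DBM inference is more involved than this feed-forward pass.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def up_pass(lower: List[float], w: Matrix, bias: List[float]) -> List[float]:
    """Propagate to the layer above; w[i][j] links lower unit i to upper unit j."""
    return [sigmoid(sum(lower[i] * w[i][j] for i in range(len(lower))) + bias[j])
            for j in range(len(bias))]


def down_pass(upper: List[float], w: Matrix, bias: List[float]) -> List[float]:
    """Inverse direction: propagate back down through the same weights."""
    return [sigmoid(sum(w[i][j] * upper[j] for j in range(len(upper))) + bias[i])
            for i in range(len(bias))]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_v, n_h1, n_h2 = 16, 8, 4                      # toy layer sizes (assumed)
w1, w2 = random_matrix(n_v, n_h1, rng), random_matrix(n_h1, n_h2, rng)
b, c1, c2 = [0.01] * n_v, [0.01] * n_h1, [0.01] * n_h2  # small non-zero biases

v = [1.0 if i % 4 == 0 else 0.0 for i in range(n_v)]    # flattened binary image
h1 = up_pass(v, w1, c1)
h2 = up_pass(h1, w2, c2)             # last-layer output: the reduced representation
h1_back = down_pass(h2, w2, c1)
v_back = down_pass(h1_back, w1, b)   # reconstruction with the input's length
```

Each upward pass shrinks the vector (16 to 8 to 4 here), which is the sense in which the last layer's output is a lower-dimensional representation of the input.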

Step S204: input the text spatial distribution parameters to the dimensionality reduction model.

Step S205: reduce the dimension of the text spatial distribution parameters by parameter reconstruction.

The text spatial distribution parameters that the DCNN generates for the image to be located are input to the DBM, which performs parameter reconstruction layer by layer using the computation described in the above steps. The data output by the last layer of the DBM are the dimension-reduced text spatial distribution parameters. Parameter reconstruction is a technique commonly used by those skilled in the art and is not described in detail here.

As can be seen from the above embodiment, the text spatial distribution parameters output by the DCNN are floating-point numbers. The DBM reduces the dimension of the text spatial distribution parameters by parameter reconstruction and does not change the parameter values, so the dimension-reduced text spatial distribution parameters are still floating-point numbers.

Note that the DCNN and the DBM process the same image and that, as described above, the DBM takes the output of its last hidden layer as the extracted features. To enhance the stability and robustness of text region localization, the output of the last layer of the DBM can therefore be used as the target text spatial distribution parameters of the DCNN when the models are constructed. First, because the DCNN and the DBM are trained on the same samples and are trained and used jointly, localization performance can be greatly improved. Second, because the output of the last layer of the DBM consists of extracted features, it is both representative and small in volume, so using it as the target text spatial distribution parameters for training the DCNN significantly reduces the amount of computation while preserving training accuracy.

In this embodiment, combining the Partial Linear Model with the dimensionality reduction model can greatly improve localization performance and make the results more robust.

Step S206: set the parameters in the text spatial distribution parameters that are less than a preset threshold to 0, and set the parameters that are greater than the preset threshold to 1.

Specifically, this embodiment binarizes the text spatial distribution parameters after dimensionality reduction by the DBM.

Step S207: convert parameter 0 and parameter 1 to binarized pixel gray values.

To convert Fig. 3 into a black-and-white binary image, this embodiment multiplies the binarized parameters by 255 to obtain the pixel gray values 0 and 255, where a gray value of 255 indicates that the pixel is white and a gray value of 0 indicates that the pixel is black. This is only a preferred example of the present invention. The binarized parameters may also be converted to white and some other color, as long as region 01, region 02 and region 03 can be explicitly distinguished from the background; the present invention places no limitation on this.
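Steps S206 and S207 can be sketched in a few lines. The helper names, the sample values and the threshold of 0.5 are assumptions made for illustration. The text does not say how a parameter exactly equal to the threshold is treated; this sketch maps it to 0.

```python
from typing import List


def binarize(params: List[float], threshold: float) -> List[int]:
    """Set parameters above the threshold to 1 and all others to 0."""
    # Assumption: a value equal to the threshold is treated as 0.
    return [1 if p > threshold else 0 for p in params]


def to_gray(bits: List[int]) -> List[int]:
    """Multiply each binarized parameter by 255 to get a pixel gray value."""
    return [bit * 255 for bit in bits]


params = [0.03, 0.92, 0.40, 0.77]  # hypothetical floating-point parameters
bits = binarize(params, 0.5)       # hypothetical threshold
gray = to_gray(bits)
```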

Step S208: reconstruct the text/non-text binary image corresponding to the image according to the text spatial distribution parameters.

Fig. 4 shows the black-and-white binary image corresponding to Fig. 3, constructed from the pixel gray values. Region 01, region 02 and region 03 are white and the background is black, so the three text regions in Fig. 3 are located explicitly.

Note also that Fig. 3 is normalized to 256*256 before it is input to the DCNN, and the text spatial distribution parameters corresponding to an image of this size likewise represent a 256*256 two-dimensional space. The binary image generated by the DBM is therefore also 256*256, whereas the image in Fig. 3 is 1024*1024. After the DBM generates the binary image, the binary image must therefore be normalized to 1024*1024 by the nearest-neighbor interpolation algorithm to obtain the image shown in Fig. 4.

Corresponding to the above method embodiments, an embodiment of the present invention further provides a device for locating a text region in an image. Referring to Fig. 5, Fig. 5 is a schematic structural diagram of a device for locating a text region in an image provided by an embodiment of the present invention. The device includes a construction module 11, a generation module 12 and a reconstruction module 13. The construction module 11 is configured to construct the Partial Linear Model. The generation module 12 is configured to generate the text spatial distribution parameters corresponding to the image by means of the Partial Linear Model constructed by the construction module 11. The reconstruction module 13 is configured to reconstruct the text/non-text binary image corresponding to the image according to the text spatial distribution parameters generated by the generation module 12.

The functions and roles of the units in the device are implemented as described for the corresponding steps of the above method and are not repeated here.

As can be seen from this embodiment, the method for locating a text region in an image described in the embodiments of the present invention parameterizes the image and locates the text regions explicitly by processing the parameters corresponding to the image. The technical solution of the embodiments of the present invention completely abandons the traditional approach of locating text regions by image contours or region features. Instead, it locates text regions by analyzing a deeper semantic feature of the image, namely the text spatial distribution parameters of the image. This not only avoids interference with localization from image size, font, color, language and the like, making localization more accurate and more robust, but, because the method is based on the most basic semantic features of the image, it is also applicable to images of various formats and is therefore general.

On the basis of the above embodiment, in this embodiment the construction module 11 includes an acquisition unit, a generation unit, a computation unit, a judgment unit, an adjustment unit and a determination unit. The acquisition unit is configured to obtain the target text spatial distribution parameters of the Partial Linear Model. The generation unit is configured to input a test image to the Partial Linear Model to generate test text spatial distribution parameters. The computation unit is configured to compute the current error from the target text spatial distribution parameters and the test text spatial distribution parameters, and further to calculate the difference between the current error and a reference error, where the reference error is the error obtained in the previous computation. The judgment unit is configured to judge whether the difference is less than a first preset threshold. When the difference is greater than or equal to the first preset threshold, the adjustment unit is configured to adjust the unknown parameters of the Partial Linear Model according to the current error, and the determination unit is configured to take the current error as the reference error. When the difference is less than the first preset threshold, the determination unit is further configured to take the current values of the unknown parameters of the Partial Linear Model as the model parameters.

The reconstruction module 13 includes a binarization unit, a conversion unit and a construction unit. The binarization unit is configured to set the parameters in the text spatial distribution parameters that are less than a second preset threshold to 0, and to set the parameters that are greater than the second preset threshold to 1. The conversion unit is configured to convert parameter 0 and parameter 1 to binarized pixel gray values. The construction unit is configured to construct the text/non-text binary image according to the binarized pixel gray values.

To describe the technical solution of the present invention in further detail, an embodiment of the present invention also provides another device for locating a text region in an image. Referring to Fig. 6, Fig. 6 is a schematic structural diagram of another device for locating a text region in an image provided by an embodiment of the present invention. The device includes a construction module 21, a generation module 22, an input unit 23, a dimensionality reduction unit 24 and a reconstruction module 25. The functions and roles of the construction module 21, the generation module 22 and the reconstruction module 25 are similar to those in the foregoing embodiment and are not repeated here. In this embodiment, the construction module 21 is further configured to construct the dimensionality reduction model; the input unit 23 is configured to input the text spatial distribution parameters to the dimensionality reduction model; and the dimensionality reduction unit 24 is configured to reduce the dimension of the text spatial distribution parameters by parameter reconstruction.

In this embodiment, the acquisition unit in the construction module 21 is further configured to obtain the text spatial distribution parameters of a pre-labeled binary image as the calibration text spatial distribution parameters. The generation unit is further configured to input the pixel gray values of the binary image to the dimensionality reduction model to generate reconstructed text spatial distribution parameters. The computation unit is further configured to compute the current error from the calibration text spatial distribution parameters and the reconstructed text spatial distribution parameters, and to calculate the difference between the current error and the reference error. The judgment unit is further configured to judge whether the difference is less than a third preset threshold. When the difference is greater than or equal to the third preset threshold, the adjustment unit is further configured to adjust the unknown parameters of the dimensionality reduction model according to the current error, and the determination unit is further configured to take the current error as the reference error. When the difference is less than the third preset threshold, the determination unit is further configured to take the current values of the unknown parameters of the dimensionality reduction model as the model parameters.

In combination with the above embodiment, in this embodiment the acquisition unit includes a reading subunit configured to read the output data of the last layer of the dimensionality reduction model, and the determination unit is further configured to take the output data of the last layer of the dimensionality reduction model as the target text spatial distribution parameters.

In summary, when locating the text regions in an image, the embodiments of the present invention first construct the Partial Linear Model and generate the text spatial distribution parameters corresponding to the image by means of the Partial Linear Model, and then construct the text/non-text binary image according to the text spatial distribution parameters, so that the text and non-text parts of the image are represented explicitly. That is, the image is parameterized, and the text regions in the image are located explicitly by processing the parameters corresponding to the image. The technical solution of the embodiments of the present invention completely abandons the traditional approach of locating text regions by image contours or region features. Instead, it locates text regions by analyzing a deeper semantic feature of the image, namely the text spatial distribution parameters of the image. This not only avoids interference with localization from image size, font, color, language and the like, making localization more accurate and more robust, but, because the method is based on the most basic semantic features of the image, it is also applicable to images of various formats and is therefore general.

Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the invention are indicated by the following claims.

It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. A method for locating a text region in an image, characterized by comprising:

constructing a Partial Linear Model;

generating, by the Partial Linear Model, text spatial distribution parameters corresponding to an image;

reconstructing a text/non-text binary image corresponding to the image according to the text spatial distribution parameters;

wherein reconstructing the text/non-text binary image corresponding to the image according to the text spatial distribution parameters comprises:

setting parameters in the text spatial distribution parameters that are less than a second preset threshold to 0, and setting parameters in the text spatial distribution parameters that are greater than the second preset threshold to 1;

converting parameter 0 and parameter 1 to binarized pixel gray values;

constructing the text/non-text binary image according to the binarized pixel gray values;

wherein, before setting the parameters in the text spatial distribution parameters that are less than the preset threshold to 0 and setting the parameters in the text spatial distribution parameters that are greater than the preset threshold to 1, the method further comprises:

constructing a dimensionality reduction model, wherein the dimensionality reduction model is a three-layer DBM model and the three-layer DBM model is constructed using the following formula, wherein v denotes the visible variable, h¹ and h² are the hidden-layer variables of the second layer and the third layer respectively, w is the weight of the edges connecting node units, and b and c are the biases of the node units;

inputting the text spatial distribution parameters to the dimensionality reduction model;

reducing the dimension of the text spatial distribution parameters by parameter reconstruction.

2. The method for locating a text region in an image according to claim 1, characterized in that constructing the Partial Linear Model comprises:

obtaining target text spatial distribution parameters of the Partial Linear Model;

inputting a test image to the Partial Linear Model to generate test text spatial distribution parameters;

computing a current error from the target text spatial distribution parameters and the test text spatial distribution parameters;

calculating the difference between the current error and a reference error, wherein the reference error is the error obtained in the previous computation;

judging whether the difference is less than a first preset threshold;

if the difference is greater than or equal to the first preset threshold, adjusting the unknown parameters of the Partial Linear Model according to the current error, taking the current error as the reference error, and repeating the step of inputting the test image to the Partial Linear Model to generate test text spatial distribution parameters, until the difference is less than the first preset threshold;

if the difference is less than the first preset threshold, taking the current values of the unknown parameters of the Partial Linear Model as the model parameters.

3. The method for locating a text region in an image according to claim 1, characterized in that constructing the dimensionality reduction model comprises:

obtaining the text spatial distribution parameters of a pre-labeled binary image as calibration text spatial distribution parameters;

inputting the pixel gray values of the binary image to the dimensionality reduction model to generate reconstructed text spatial distribution parameters;

computing a current error from the calibration text spatial distribution parameters and the reconstructed text spatial distribution parameters;

judging whether the difference is less than a third preset threshold;

if the difference is greater than or equal to the third preset threshold, adjusting the unknown parameters of the dimensionality reduction model according to the current error, taking the current error as the reference error, and repeating the step of inputting the pixel gray values of the binary image to the dimensionality reduction model to generate reconstructed text spatial distribution parameters, until the difference is less than the third preset threshold;

if the difference is less than the third preset threshold, taking the current values of the unknown parameters of the dimensionality reduction model as the model parameters.

4. The method for locating a text region in an image according to any one of claims 2 to 3, characterized in that obtaining the target text spatial distribution parameters of the Partial Linear Model comprises:

reading the output data of the last layer of the dimensionality reduction model;

taking the output data of the last layer of the dimensionality reduction model as the target text spatial distribution parameters.

5. A device for locating a text region in an image, characterized by comprising:

a construction module, configured to construct a Partial Linear Model;

a generation module, configured to generate text spatial distribution parameters corresponding to an image by means of the Partial Linear Model constructed by the construction module;

a reconstruction module, configured to reconstruct a text/non-text binary image corresponding to the image according to the text spatial distribution parameters generated by the generation module;

wherein the reconstruction module includes a binarization unit, a conversion unit and a construction unit, wherein

the binarization unit is configured to set parameters in the text spatial distribution parameters that are less than a second preset threshold to 0, and to set parameters in the text spatial distribution parameters that are greater than the second preset threshold to 1;

the conversion unit is configured to convert parameter 0 and parameter 1 to binarized pixel gray values;

the construction unit is configured to construct the text/non-text binary image according to the binarized pixel gray values;

the device further includes an input unit and a dimensionality reduction unit, wherein

the construction module is further configured to construct a dimensionality reduction model, wherein the dimensionality reduction model is a three-layer DBM model and the three-layer DBM model is constructed using the following formula,

wherein v denotes the visible variable, h¹ and h² are the hidden-layer variables of the second layer and the third layer respectively, w is the weight of the edges connecting node units, and b and c are the biases of the node units;

the input unit is configured to input the text spatial distribution parameters to the dimensionality reduction model;

the dimensionality reduction unit is configured to reduce the dimension of the text spatial distribution parameters by parameter reconstruction.

6. The device according to claim 5, characterized in that the construction module includes an acquisition unit, a generation unit, a computation unit, a judgment unit, an adjustment unit and a determination unit, wherein

the acquisition unit is configured to obtain target text spatial distribution parameters of the Partial Linear Model;

the generation unit is configured to input a test image to the Partial Linear Model to generate test text spatial distribution parameters;

the computation unit is configured to compute a current error from the target text spatial distribution parameters and the test text spatial distribution parameters, and is further configured to calculate the difference between the current error and a reference error, wherein the reference error is the error obtained in the previous computation;

the judgment unit is configured to judge whether the difference is less than a first preset threshold;

when the difference is greater than or equal to the first preset threshold, the adjustment unit is configured to adjust the unknown parameters of the Partial Linear Model according to the current error, and the determination unit is configured to take the current error as the reference error;

when the difference is less than the first preset threshold, the determination unit is further configured to take the current values of the unknown parameters of the Partial Linear Model as the model parameters.

7. The device according to claim 6, characterized in that

the acquisition unit is further configured to obtain the text spatial distribution parameters of a pre-labeled binary image as calibration text spatial distribution parameters;

the generation unit is further configured to input the pixel gray values of the binary image to the dimensionality reduction model to generate reconstructed text spatial distribution parameters;

the computation unit is further configured to compute a current error from the calibration text spatial distribution parameters and the reconstructed text spatial distribution parameters, and to calculate the difference between the current error and the reference error;

the judgment unit is further configured to judge whether the difference is less than a third preset threshold;

when the difference is greater than or equal to the third preset threshold, the adjustment unit is further configured to adjust the unknown parameters of the dimensionality reduction model according to the current error, and the determination unit is further configured to take the current error as the reference error;

when the difference is less than the third preset threshold, the determination unit is further configured to take the current values of the unknown parameters of the dimensionality reduction model as the model parameters.

8. The device according to claim 6 or 7, characterized in that the acquisition unit includes:

a reading subunit, configured to read the output data of the last layer of the dimensionality reduction model;

the determination unit is further configured to take the output data of the last layer of the dimensionality reduction model as the target text spatial distribution parameters.