CN106257496A - Mass network text and non-textual image classification method - Google Patents


Info

Publication number
CN106257496A
CN106257496A (application CN201610541508.1A)
Authority
CN
China
Prior art keywords
image, image block, network, text, block
Prior art date
Legal status: Granted (an assumption by Google, not a legal conclusion)
Application number
CN201610541508.1A
Other languages
Chinese (zh)
Other versions
CN106257496B (en
Inventor
白翔
石葆光
章成全
Current Assignee: Huazhong University of Science and Technology
Original Assignee: Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201610541508.1A
Publication of CN106257496A
Application granted
Publication of CN106257496B
Legal status: Active
Anticipated expiration

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for classifying massive network images as text or non-text images. A multi-scale spatial division network is first constructed. The images in the training set are then processed to obtain multi-scale image-block label information, and the annotated training set is used to train the parameters of the constructed multi-scale spatial division network. Finally, the constructed network with its trained parameters classifies large-scale network images under test, yielding the final classification result for each image: a judgement as to whether the image is a text image, together with the approximate location of the text regions within it. The method achieves high accuracy in classifying text versus non-text images and has very high classification efficiency.

Description

Mass network text and non-textual image classification method
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a method for classifying massive network images as text or non-text images.
Background technology
With the rapid development of television and the Internet, human society has gradually entered the information age, in which economic life increasingly centres on the acquisition, allocation, production and use of information. With the arrival of the information age, ever more image and video data are propagated through channels of every kind, and these data contain a great deal of useful information. How to extract this useful information quickly and efficiently from such massive data is the key to whether mankind can profit from the information age. The Internet today supplies massive video and image data, and the text in these massive Internet video frames and network images is an extremely important information source that can support many practical applications, including image retrieval, human-computer interaction and driving-navigation systems.
Existing methods for obtaining the text information in an image mainly comprise two parts, text detection and text recognition, and these two core techniques of automatic image-text reading have long been research problems of great interest in the computing field. However, among the massive data being propagated, only a small fraction of the images actually contain text, and existing text detection and recognition methods are limited by the speed at which they extract textual information from images, making them hard to apply directly to extracting the useful text in such data. Research on algorithms that classify images as text or non-text therefore has considerable practical significance and use value.
Summary of the invention
It is an object of the invention to provide a mass network text and non-text image classification method whose classification procedure is simple and whose classification accuracy is high.
To achieve the above object, the invention provides a mass network text and non-text image classification method comprising the following steps:
(1) Multi-scale spatial division network construction. The multi-scale spatial division network comprises a multi-level feature-map generation sub-network, a multi-scale image-block feature generation sub-network, and a text/non-text image-block classification sub-network:
(1.1) Define the multi-level feature-map generation sub-network structure;
(1.1.1) Define the image feature-extraction network structure;
Specifically, the image feature-extraction network comprises five convolution stages. The first and second stages each consist of two convolutional layers followed by one max-pooling layer; the last three stages each consist of three convolutional layers followed by one max-pooling layer. For an input image I, this network yields the output feature maps of each convolution stage, denoted FM_s = {M_s,1, ..., M_s,MNum_s}, where FM_s is the output feature-map sequence of the s-th convolution stage, M_s,m is its m-th feature map, and MNum_s is the preset number of output feature maps of the s-th stage;
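The five-stage layout above matches a VGG-16-style backbone. As a minimal sketch (not part of the patent), the spatial sizes of each stage's output can be traced by assuming size-preserving 3×3 convolutions and 2×2 stride-2 max pooling, hyperparameters the patent leaves unspecified:

```python
def stage_output_sizes(w, h, n_stages=5):
    """Spatial size of each convolution stage's output feature maps.

    Assumes size-preserving 3x3 convolutions and 2x2 stride-2 max pooling
    (a VGG-16-style assumption; the patent fixes only the layer counts)."""
    sizes = []
    for _ in range(n_stages):
        w, h = w // 2, h // 2  # each stage ends with one max-pooling layer
        sizes.append((w, h))
    return sizes

print(stage_output_sizes(512, 512))
```

Under these assumptions each stage halves the resolution, so stages 3, 4 and 5 (the ones used below) output maps at 1/8, 1/16 and 1/32 of the input size.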
(1.1.2) Define the multi-level feature-map generation sub-network structure;
Specifically, a deconvolution layer is attached after each of the third, fourth and fifth stages of the image feature-extraction network of step (1.1.1), rescaling all feature maps output by these three stages to size Wm × Hm. The rescaled feature-map sequences are denoted FM'_s = {M'_s,1, ..., M'_s,MNum_s}, where Wm and Hm are the preset width and height of the rescaled feature maps, FM'_s is the sequence obtained by rescaling every feature map in the output sequence FM_s of the s-th convolution stage, M'_s,m is the rescaled version of the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of stage s. All feature maps in the rescaled sequences FM'_3, FM'_4 and FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_1, ..., M''_MNum}, where M''_c is the c-th feature map of the multi-level feature map and MNum = MNum_3 + MNum_4 + MNum_5 is the number of feature maps in the multi-level feature map;
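The rescale-and-stack step can be sketched as follows. This is an illustration only: nearest-neighbour repetition stands in for the learned deconvolution layers, and the channel counts (256/512/512, VGG-like) are assumptions, not values fixed by the patent.

```python
import numpy as np

def build_multilevel_map(fm3, fm4, fm5, wm, hm):
    """Rescale the stage-3/4/5 feature maps to Wm x Hm and stack them into
    the multi-level feature map F. Arrays are (channels, height, width);
    nearest-neighbour repetition stands in for the learned deconvolution."""
    def rescale(fm):
        c, h, w = fm.shape
        assert hm % h == 0 and wm % w == 0  # integer scale factors only in this sketch
        return fm.repeat(hm // h, axis=1).repeat(wm // w, axis=2)

    return np.concatenate([rescale(fm3), rescale(fm4), rescale(fm5)], axis=0)

# Hypothetical channel counts MNum_3 = 256, MNum_4 = MNum_5 = 512
F = build_multilevel_map(np.zeros((256, 64, 64)),
                         np.zeros((512, 32, 32)),
                         np.zeros((512, 16, 16)), wm=64, hm=64)
print(F.shape)  # MNum = 256 + 512 + 512 = 1280 stacked feature maps
```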
(1.2) Define the multi-scale image-block feature generation sub-network structure;
(1.2.1) Single-scale image-block spatial division;
Specifically, the multi-level feature map F produced by the multi-level feature-map generation sub-network of step (1.1) is divided into image blocks of size (Wm/sp) × (Hm/sp) by the following rule:
F_ij(x, y) = F(x + i·Wm/sp, y + j·Hm/sp),   0 ≤ x < Wm/sp,   0 ≤ y < Hm/sp
In this way the multi-level feature map is divided into SP = sp × sp image blocks. For each divided block F_ij, the corresponding image block I_ij of the input image I is computed as:
I_ij(x, y) = I(x + i·W/sp, y + j·H/sp),   0 ≤ x < W/sp,   0 ≤ y < H/sp
where F_ij is the block in row i and column j of the divided multi-level feature map, x and y are the pixel abscissa and ordinate within a block, Wm and Hm are the width and height of the multi-level feature map, W and H are the width and height of the input image I, and sp is the preset block-division scale;
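The division formulas above amount to a regular sp × sp grid. A minimal sketch of the resulting block index ranges (coordinates and sizes in feature-map pixels; an evenly divisible grid is assumed):

```python
def split_blocks(wm, hm, sp):
    """Top-left corner and size of each block F_ij in the sp x sp division:
    block (i, j) covers x in [i*Wm/sp, (i+1)*Wm/sp) and y likewise."""
    bw, bh = wm // sp, hm // sp  # assumes sp divides Wm and Hm evenly
    return [(i * bw, j * bh, bw, bh) for i in range(sp) for j in range(sp)]

blocks = split_blocks(64, 64, 4)
print(len(blocks), blocks[0], blocks[-1])
```

The same function applied with (W, H) instead of (Wm, Hm) gives the corresponding input-image blocks I_ij.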
(1.2.2) Multi-scale image-block spatial division;
Specifically, several different block-division scales are preset, denoted SPS = {sp_1, ..., sp_K}. For each scale sp_k, the multi-level feature map F is spatially divided into blocks by the method of step (1.2.1), yielding SP_k = sp_k × sp_k image blocks. The multi-scale spatial division thus produces the block sequence PS = {Patch_1, ..., Patch_PNum}, where Patch_n is the n-th image block and PNum = Σ_k SP_k is the total number of blocks;
(1.2.3) Multi-scale image-block feature extraction;
Specifically, each image block Patch in the sequence PS obtained from the multi-scale spatial division of step (1.2.2) is further divided into Nsp parts horizontally and vertically, so that each block Patch yields SPNum = Nsp × Nsp sub-blocks, denoted SubPS = {SubP_1, ..., SubP_SPNum}, where SubP_nsp is the nsp-th sub-block. A max-pooling layer then converts each sub-block into its corresponding feature vector, giving the sub-block feature-vector sequence SubVS = {SubV_1, ..., SubV_SPNum} for block Patch, where SubV_nsp is the feature vector of the nsp-th sub-block; the feature-vector length equals MNum, the number of feature maps in the multi-level feature map of step (1.1.2). Concatenating the feature vectors of all sub-blocks of a block yields the block's feature vector, denoted V = [SubV_1, ..., SubV_SPNum], of length MNum × SPNum. Extracting a feature vector in this way from every block produced by the multi-scale spatial division gives the feature-vector set of all blocks, denoted VS = {V_1, ..., V_PNum}, where V_n is the feature vector of the n-th block and PNum is the total number of blocks;
(1.3) Define the text/non-text image-block classification sub-network structure;
Specifically, a text/non-text image-block classification network consisting of three fully connected layers is attached after the multi-scale image-block feature generation sub-network of step (1.2). Each block feature vector V in the feature-vector set VS obtained in step (1.2) is passed through this network for a classification decision. The output Pro is the probability that the block is a text block: if Pro > tP, the block's classification result is recorded as 1; otherwise it is 0. The results for all blocks are thus obtained, denoted Preds = {Pred_1, ..., Pred_PNum}, where Pred_n ∈ {0, 1} is the result for the n-th block; Pred_n = 0 means the block is a non-text block and Pred_n = 1 means it is a text block;
(1.4) Build the multi-scale spatial division network;
Specifically, the multi-level feature-map generation sub-network, the multi-scale image-block feature generation sub-network and the text/non-text image-block classification sub-network defined in steps (1.1) to (1.3) are cascaded to form one complete multi-scale spatial division network;
(2) Multi-scale spatial division network training:
(2.1) For each image in the training set, obtain multi-scale image-block label information;
Specifically, for each image Itr in the training set χ = {Itr_1, ..., Itr_T}, the positions of the text regions in the image are obtained by manual annotation and denoted BBS = {bb_1, ..., bb_Q}, where T is the number of training images, bb_q is the bounding box of the q-th text region in the image and Q is the number of text regions in the image. Then, by the method of step (1.2.1) and for each of the preset division scales SPS = {sp_1, ..., sp_K} of step (1.2.2), the image Itr is divided into multi-scale image blocks. For each block PatchTr after spatial division, let SPatchTr be the block's area, HPatchTr the block's height, SText the area of the text region inside the block, and HText the height of that text region. If the block satisfies the conditions:
SText / SPatchTr > tS   and   HText / HPatchTr > tH
then the block is labelled as a text region, with label 1; otherwise it is labelled as a non-text region, with label 0. Here tS is the preset threshold on the fraction of the block's area occupied by text, and tH is the preset threshold on the ratio of the text height to the block height. The multi-scale image-block label information is denoted Lbls = {lbl_1, ..., lbl_PNum}, where lbl_l is the label of the l-th block and PNum is the number of blocks after multi-scale spatial division;
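The labelling rule can be sketched as follows; the threshold values tS = 0.2 and tH = 0.5 are illustrative assumptions, as the patent leaves both preset:

```python
def label_block(s_text, s_block, h_text, h_block, tS=0.2, tH=0.5):
    """Training label of a block: 1 (text) only if the text-area fraction
    exceeds tS AND the text-height fraction exceeds tH, else 0."""
    return 1 if (s_text / s_block > tS and h_text / h_block > tH) else 0

print(label_block(s_text=60, s_block=256, h_text=12, h_block=16))  # 1
print(label_block(s_text=10, s_block=256, h_text=12, h_block=16))  # 0
```

Requiring both conditions rejects blocks that clip only a thin sliver of a text line.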
(2.2) Train to obtain the parameters of the multi-scale spatial division network;
Specifically, using the annotated training set χ and the multi-scale image-block label information Lbls of each training image, the multi-scale spatial division network built in step (1) is trained by back-propagation, with the loss function computed as:
Loss = − Σ_{l=1}^{PNum} [ lbl_l · log(pro_l) + (1 − lbl_l) · log(1 − pro_l) ]
where lbl_l is the label of the l-th block, PNum is the number of blocks after multi-scale division, and pro_l, an output of the multi-scale spatial division network, is the probability that the l-th block is classified as a text block. The trained network parameters are denoted θ;
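The loss is the binary cross-entropy summed over all blocks; a minimal sketch of the formula above:

```python
import numpy as np

def block_loss(labels, probs):
    """Binary cross-entropy summed over all PNum image blocks, matching
    the loss formula above (lbl_l in {0, 1}, pro_l in (0, 1))."""
    labels = np.asarray(labels, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return -np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(round(block_loss([1, 0], [0.9, 0.1]), 4))
```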
(3) Text/non-text image classification:
Specifically, a test image Ite is first divided into multi-scale image blocks by the method of step (1.2.1), using each of the preset division scales SPS = {sp_1, ..., sp_K} of step (1.2.2). The set of all blocks obtained after spatial division is denoted SubPS = {PatchTe_1, ..., PatchTe_PNum}. The multi-scale spatial division network built in step (1), with the parameters θ obtained by training in step (2), then yields the classification results for the test image, PredTes = {PredTe_1, ..., PredTe_PNum}, where PredTe_r is the prediction for the r-th block of the test image and PNum is the number of blocks after multi-scale spatial division. The set TextPS of all blocks in SubPS whose prediction is 1 is the set of all text blocks in the input image Ite, from which the approximate location and scale of the text regions in the image are obtained. If TextPS is not empty, the classification result for the test image is "text image"; otherwise it is "non-text image".
Compared with the prior art, the technical scheme conceived above by the present invention has the following technical effects:
(1) Existing methods for classifying massive network images as text or non-text generally first extract candidate text-like regions from the image, then filter these candidates by classification or similar methods, and finally judge whether the image is a text image from the classification of the candidate regions. The present method instead first constructs an end-to-end, trainable multi-scale spatial division network: taking only the image as input, it makes block-level predictions on the image and directly yields both the image's classification result and the approximate location of text within the image, so that text and non-text images are distinguished end to end. The method is therefore simpler to realise;
(2) Images usually contain very many text-like regions. Existing methods for classifying massive network text and non-text images extract candidate text-like regions from the image and filter and classify all candidates, for example by clustering and classification, to obtain the final result; such methods are very slow and are easily affected by environmental factors such as illumination. The present method uses a convolutional neural network, which is highly robust to external conditions such as illumination, divides the space by a fixed, hand-designed rule, and classifies every divided image block, thereby avoiding the poorly robust candidate text-region extraction process. The method therefore achieves very high classification accuracy and processing efficiency, together with strong robustness;
(3) The discrimination result of the present invention for massive network text and non-text images not only indicates whether an image is a text image, but also gives the approximate location and scale of the text in the picture, greatly reducing the text search range for a subsequent text-detection stage.
Brief description of the drawings
Fig. 1 shows the structure of the multi-scale spatial division network built by the method of the invention.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be appreciated that the specific embodiments described here serve only to explain the invention and are not intended to limit it. Moreover, the technical features involved in the embodiments of the invention described below may be combined with one another as long as they do not conflict.
The mass network text and non-text image classification method of the present invention comprises the following steps:
Steps (1) to (3) of the embodiment proceed exactly as set out in the summary above: the multi-level feature-map generation sub-network, the multi-scale image-block feature generation sub-network and the text/non-text image-block classification sub-network are defined as in steps (1.1) to (1.3) and cascaded into one complete multi-scale spatial division network, as shown in Fig. 1; the network parameters θ are obtained as in step (2) by training on the multi-scale image-block labels of the annotated training set; and each test image is then classified as a text or non-text image from its block predictions as in step (3).
As will be readily appreciated by those skilled in the art, the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A mass network text and non-text image classification method, characterised in that the method comprises the following steps:
(1) multi-scale space division network construction, including: (1.1) defining the network structure of the multi-level feature map generation sub-network; (1.2) defining the network structure of the multi-scale image-block feature generation sub-network; (1.3) defining the network structure of the text and non-text image-block classification network; (1.4) building the multi-scale space division network;
(2) multi-scale space division network training: (2.1) obtaining multi-scale image-block label information for each image in the training image set; (2.2) training the network with said multi-scale image-block label information to obtain the parameters of the multi-scale space division network;
(3) text and non-text image classification: using the multi-scale space division network and its trained parameters to classify the image to be identified as a text or non-text image.
2. The mass network text and non-text image classification method according to claim 1, characterised in that said step (1.1) is specifically:
(1.1.1) defining the image feature extraction network structure: the image feature extraction network comprises five convolution stages; each of the first two stages consists of two convolutional layers followed by one max-pooling layer, and each of the last three stages consists of three convolutional layers followed by one max-pooling layer; for an input image I, this feature extraction network outputs a feature-map sequence for each convolution stage, denoted FM_s = {M_(s,1), ..., M_(s,MNum_s)}, where FM_s is the output feature-map sequence of the s-th convolution stage, M_(s,m) is the m-th feature map, and MNum_s is the preset number of output feature maps of the s-th convolution stage;
(1.1.2) defining the multi-level feature map generation sub-network structure: a deconvolution layer is connected after each of the last three convolution stages of the image feature extraction network of step (1.1.1), scaling all feature maps in the outputs FM_3, FM_4, FM_5 of these three stages to size Wm × Hm; the scaled feature-map sequences are denoted FM'_s = {M'_(s,1), ..., M'_(s,MNum_s)}, where Wm and Hm are the preset feature-map width and height after scaling, FM'_s is the sequence obtained by scaling each feature map in FM_s, M'_(s,m) is the feature map obtained by scaling the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of the s-th stage; all feature maps in FM'_3, FM'_4, FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_1, ..., M''_MNum}, where M''_c is the c-th feature map of the multi-level feature map and MNum = MNum_3 + MNum_4 + MNum_5 is the number of feature maps in the multi-level feature map.
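The stacking in (1.1.2) can be sketched with NumPy. Nearest-neighbour resizing stands in for the patent's learned deconvolution layers, and the per-stage channel counts (MNum_3 = 4, MNum_4 = 8, MNum_5 = 8) and the target size Wm = Hm = 16 are illustrative assumptions, not values from the patent.

```python
import numpy as np

def resize_nearest(fm, wm, hm):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, hm, wm);
    a stand-in for the learned deconvolution layers of step (1.1.2)."""
    c, h, w = fm.shape
    ys = np.arange(hm) * h // hm    # source row for each target row
    xs = np.arange(wm) * w // wm    # source column for each target column
    return fm[:, ys][:, :, xs]

def multi_level_feature_map(stage_outputs, wm=16, hm=16):
    """Scale the last three stage outputs FM3, FM4, FM5 to Wm x Hm and
    stack them along the channel axis into the multi-level map F."""
    scaled = [resize_nearest(fm, wm, hm) for fm in stage_outputs]
    return np.concatenate(scaled, axis=0)

# Illustrative stage outputs with MNum3=4, MNum4=8, MNum5=8 channels.
fm3 = np.random.rand(4, 32, 32)
fm4 = np.random.rand(8, 16, 16)
fm5 = np.random.rand(8, 8, 8)
F = multi_level_feature_map([fm3, fm4, fm5])
# F.shape == (20, 16, 16): MNum = 4 + 8 + 8 channels at Wm x Hm
```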
3. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.2) is specifically:
(1.2.1) single-scale image-block space division: the multi-level feature map F produced by the multi-level feature map generation sub-network of step (1.1) is divided into image blocks of size (Wm/sp) × (Hm/sp); the division is expressed as:
F_ij(x, y) = F(x + i·Wm/sp, y + j·Hm/sp),  0 ≤ x < Wm/sp,  0 ≤ y < Hm/sp
so that the multi-level feature map is divided into SP = sp × sp image blocks; for each divided block F_ij, the corresponding image block I_ij in the input image I is computed as:
I_ij(x, y) = I(x + i·W/sp, y + j·H/sp),  0 ≤ x < W/sp,  0 ≤ y < H/sp
where F_ij denotes the image block in the i-th column and j-th row after dividing the multi-level feature map into image blocks, x and y denote the horizontal and vertical pixel coordinates within a block, Wm and Hm denote the width and height of the multi-level feature map, W and H denote the width and height of the input image I, and sp is the preset image-block division scale;
(1.2.2) multi-scale image-block space division: multiple different image-block division scales are preset, denoted {sp_1, ..., sp_K}; for each division scale sp_k, the multi-level feature map F is divided into image blocks according to the method of step (1.2.1), giving SP_k = sp_k × sp_k image blocks; the multi-scale image-block space division thus yields the sequence of all image blocks PS = {Patch_1, ..., Patch_PNum}, where Patch_n is the n-th image block and PNum = Σ_(k=1..K) sp_k² is the total number of image blocks;
(1.2.3) multi-scale image-block feature extraction: each image block Patch in the sequence PS obtained by the multi-scale division of step (1.2.2) is divided into Nsp parts horizontally and vertically, so that each image block Patch is divided into SPNum = Nsp × Nsp sub-image blocks, denoted {SubP_1, ..., SubP_SPNum}, where SubP_nsp is the nsp-th sub-image block; a max-pooling layer then converts each sub-image block into its corresponding feature vector, giving for each image block Patch the feature-vector sequence {SubV_1, ..., SubV_SPNum}, where SubV_nsp is the feature vector corresponding to the nsp-th sub-image block and each feature vector has length MNum, the number of feature maps in the multi-level feature map of step (1.1.2); all sub-image-block feature vectors within an image block are concatenated to obtain the image-block feature vector V = [SubV_1, ..., SubV_SPNum], of length MNum × SPNum; extracting the feature vector of each image block obtained by the multi-scale space division in this way yields the feature-vector set of all image blocks, denoted VS = {V_1, ..., V_PNum}, where V_n is the feature vector corresponding to the n-th image block and PNum is the total number of image blocks.
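A minimal NumPy sketch of the block division of (1.2.1)-(1.2.2) and the sub-block max-pooling of (1.2.3). The channel count, map size, scale set {1, 2, 4}, and the assumption that Wm and Hm are exactly divisible by every scale are all illustrative, not the patent's values.

```python
import numpy as np

def divide_blocks(fmap, sp):
    """Divide a (C, H, W) feature map into sp x sp blocks following
    F_ij(x, y) = F(x + i*Wm/sp, y + j*Hm/sp); H, W assumed divisible by sp."""
    c, h, w = fmap.shape
    bh, bw = h // sp, w // sp
    return [fmap[:, j*bh:(j+1)*bh, i*bw:(i+1)*bw]
            for j in range(sp) for i in range(sp)]

def block_feature(patch, nsp):
    """Split a (C, H, W) block into nsp x nsp sub-blocks, max-pool each
    sub-block per channel, and concatenate into a length C*nsp*nsp vector."""
    c, h, w = patch.shape
    sh, sw = h // nsp, w // nsp
    parts = []
    for j in range(nsp):
        for i in range(nsp):
            sub = patch[:, j*sh:(j+1)*sh, i*sw:(i+1)*sw]
            parts.append(sub.max(axis=(1, 2)))   # one value per channel
    return np.concatenate(parts)

# Multi-scale division of an illustrative 20-channel, 16x16 multi-level map.
F = np.random.rand(20, 16, 16)
scales = [1, 2, 4]                       # hypothetical {sp_1, ..., sp_K}
PS = [blk for sp in scales for blk in divide_blocks(F, sp)]
# PNum = 1 + 4 + 16 = 21 blocks in total
VS = [block_feature(p, nsp=2) for p in PS]
# every feature vector has length MNum * SPNum = 20 * 4 = 80
```

The fixed-length vector per block, regardless of block size, is what lets one classification network handle all scales; this mirrors spatial-pyramid-style pooling.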
4. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.3) is specifically: after the multi-scale image-block feature generation sub-network of step (1.2), a text and non-text image-block classification network consisting of three fully-connected layers is connected; each image-block feature vector V in the multi-scale image-block feature-vector set VS of step (1.2) is classified by this network, whose output Pro represents the probability that the image block is a text image block; if Pro > tP, where tP is a preset probability threshold, the classification result of the block is recorded as 1, otherwise as 0; the classification results of all image blocks are thus obtained and denoted Pred = {Pred_1, ..., Pred_PNum}, where Pred_n ∈ {0, 1} is the classification result of the n-th image block, Pred_n = 0 indicating a non-text image block and Pred_n = 1 a text image block.
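The thresholding step that maps the probabilities Pro to the binary results Pred can be sketched as follows; the threshold value tP = 0.5 is illustrative, not the patent's.

```python
def block_predictions(pros, t_p=0.5):
    """Map per-block text probabilities Pro to Pred_n in {0, 1}:
    Pred_n = 1 iff Pro > tP (tP = 0.5 here is an illustrative threshold)."""
    return [1 if p > t_p else 0 for p in pros]

pred = block_predictions([0.9, 0.3, 0.51])
# pred == [1, 0, 1]: only blocks with probability strictly above tP count
```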
5. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.4) is specifically: the multi-level feature map generation sub-network, the multi-scale image-block feature generation sub-network, and the text and non-text image-block classification sub-network defined in steps (1.1) to (1.3) are cascaded to build one complete multi-scale space division network.
6. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (2.1) is specifically:
for each image Itr in the training image set χ = {Itr_1, ..., Itr_T}, the positions of the text regions in the image are obtained by manual annotation and denoted {bb_1, ..., bb_Q}, where T is the number of training images, bb_q is the bounding box of the q-th text region, and Q is the number of text regions in the image; then, following the method of step (1.2.1), multi-scale image-block space division is performed on Itr at each of the preset division scales {sp_1, ..., sp_K} of step (1.2.2); for each image block PatchTr obtained by the division, let SPatchTr denote the area of the block, HPatchTr the height of the block, SText the area of the text region inside the block, and HText the height of the text region inside the block; if the block satisfies the conditions:
SText / SPatchTr > tS  and  HText / HPatchTr > tH
then the image block is marked as a text region with label information 1; otherwise it is marked as a non-text region with label information 0; here tS is the preset threshold on the proportion of the image-block area occupied by the text region, and tH is the preset threshold on the ratio of the text-region height to the image-block height; the multi-scale image-block label information is denoted LBL = {lbl_1, ..., lbl_PNum}, where lbl_l is the label information of the l-th image block and PNum is the number of image blocks after multi-scale space division;
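The labelling criterion of step (2.1) in sketch form. The threshold values tS = 0.2 and tH = 0.5 are illustrative assumptions; the patent leaves them as preset parameters.

```python
def block_label(s_text, s_patch, h_text, h_patch, t_s=0.2, t_h=0.5):
    """Label a training block 1 (text) iff the text region covers more
    than t_s of the block area AND more than t_h of the block height.
    t_s and t_h here are illustrative, not the patent's values."""
    return 1 if (s_text / s_patch > t_s and h_text / h_patch > t_h) else 0

# A 64x64-px block containing a 48x40-px text region:
lbl = block_label(s_text=48 * 40, s_patch=64 * 64, h_text=40, h_patch=64)
# 1920/4096 ~ 0.47 > 0.2 and 40/64 ~ 0.63 > 0.5, so lbl == 1
```

Requiring both the area and the height ratio prevents a thin text sliver clipped at a block border from labelling the whole block as text.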
7. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (2.2) is specifically:
using the annotated training image set χ and the multi-scale image-block label information LBL = {lbl_1, ..., lbl_PNum} of each training image in the set, the multi-scale space division network built in step (1) is trained by back-propagation, the loss function being computed as:
Loss = −Σ_(l=1..PNum) [ lbl_l · log(pro_l) + (1 − lbl_l) · log(1 − pro_l) ]
where lbl_l denotes the label information of the l-th image block, PNum denotes the number of image blocks after multi-scale space division, and pro_l denotes the probability, output by the multi-scale space division network, that the l-th image block is classified as a text image block; the parameters of the trained multi-scale space division network are denoted θ;
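The loss of step (2.2) is a summed binary cross-entropy over all blocks; a minimal sketch (the eps guard against log(0) is an implementation detail added here, not part of the patent's formula):

```python
import math

def division_loss(labels, probs, eps=1e-12):
    """Loss = -sum_l [ lbl_l*log(pro_l) + (1 - lbl_l)*log(1 - pro_l) ];
    eps guards the log when pro_l is exactly 0 or 1."""
    return -sum(l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps)
                for l, p in zip(labels, probs))

loss = division_loss([1, 0, 1], [0.9, 0.2, 0.6])
# -(log 0.9 + log 0.8 + log 0.6) ~ 0.84: well-classified blocks cost little
```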
8. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (3) is specifically: for a test image Ite, multi-scale image-block space division is first performed on Ite following the method of step (1.2.1), at each of the preset image-block division scales {sp_1, ..., sp_K} of step (1.2.2); the set of all image blocks obtained by the division is denoted SubPS = {SubPatch_1, ..., SubPatch_PNum}; then, using the multi-scale space division network built in step (1) and the network parameters θ trained in step (2), the classification decision for the test image is obtained as PredTe = {PredTe_1, ..., PredTe_PNum}, where PredTe_r is the prediction for the r-th image block in the test image and PNum is the number of image blocks after the multi-scale division; the set TextPS of all image blocks in SubPS whose prediction is 1 is the set of all text image blocks in the input image Ite, from which the approximate locations and scale information of the text regions in the image are obtained; if TextPS is not empty, the classification result for the test image is text image, otherwise it is non-text image.
CN201610541508.1A 2016-07-12 2016-07-12 Mass network text and non-textual image classification method Active CN106257496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610541508.1A CN106257496B (en) 2016-07-12 2016-07-12 Mass network text and non-textual image classification method

Publications (2)

Publication Number Publication Date
CN106257496A true CN106257496A (en) 2016-12-28
CN106257496B CN106257496B (en) 2019-06-07

Family

ID=57714130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610541508.1A Active CN106257496B (en) 2016-07-12 2016-07-12 Mass network text and non-textual image classification method

Country Status (1)

Country Link
CN (1) CN106257496B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657522A (en) * 2017-10-10 2019-04-19 北京京东尚科信息技术有限公司 Detect the method and apparatus that can travel region
CN109711481A (en) * 2019-01-02 2019-05-03 京东方科技集团股份有限公司 Neural network, correlation technique, medium and equipment for the identification of paintings multi-tag
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109740482A (en) * 2018-12-26 2019-05-10 北京科技大学 A kind of image text recognition methods and device
CN109815473A (en) * 2019-01-28 2019-05-28 四川译讯信息科技有限公司 A kind of documents editing householder method
CN109858432A (en) * 2019-01-28 2019-06-07 北京市商汤科技开发有限公司 Method and device, the computer equipment of text information in a kind of detection image
CN110378330A (en) * 2018-04-12 2019-10-25 Oppo广东移动通信有限公司 Picture classification method and Related product
WO2020052085A1 (en) * 2018-09-13 2020-03-19 北京字节跳动网络技术有限公司 Video text detection method and device, and computer readable storage medium
CN114565800A (en) * 2022-04-24 2022-05-31 深圳尚米网络技术有限公司 Method for detecting illegal picture and picture detection engine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070065003A1 (en) * 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
CN105184312A (en) * 2015-08-24 2015-12-23 中国科学院自动化研究所 Character detection method and device based on deep learning
CN105608456A (en) * 2015-12-22 2016-05-25 华中科技大学 Multi-directional text detection method based on full convolution network
CN105740909A (en) * 2016-02-02 2016-07-06 华中科技大学 Text recognition method under natural scene on the basis of spatial transformation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENGQUAN ZHANG ET AL.: "Automatic discrimination of text and non-text natural images", International Conference on Document Analysis and Recognition (ICDAR) *
N. SHARMA ET AL.: "Piecewise linearity based method for text frame classification in video", Pattern Recognition *
MA RAN: "Design and Implementation of a Deep-Learning-Based Natural Scene Text Recognition System", China Masters' Theses Full-text Database, Information Science and Technology *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant