CN106257496A - Mass network text and non-textual image classification method - Google Patents


Info

Publication number
CN106257496A
CN106257496A (application CN201610541508.1A)
Authority
CN
China
Prior art keywords
image, image block, network, text, block
Prior art date
Legal status: Granted (an assumption by Google, not a legal conclusion)
Application number
CN201610541508.1A
Other languages
Chinese (zh)
Other versions
CN106257496B (en
Inventor
白翔
石葆光
章成全
Current Assignee: Huazhong University of Science and Technology
Original Assignee: Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201610541508.1A
Publication of CN106257496A
Application granted
Publication of CN106257496B
Legal status: Active
Anticipated expiration

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for classifying massive network images as text or non-text images. A multi-scale spatial division network is first constructed. The images in the training set are then processed to obtain multi-scale image-block label information, and the annotated training set is used to train the parameters of the constructed multi-scale spatial division network. Finally, the constructed network with its trained parameters classifies large-scale network images under test, yielding the final classification result for each image: a judgement as to whether the image is a text image, together with the approximate location of the text regions within it. The method achieves high accuracy in classifying text versus non-text images and has very high classification efficiency.

Description

Mass network text and non-textual image classification method
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a method for classifying massive network images as text or non-text images.
Background technology
With the rapid development of television and the Internet, human society has gradually entered the information age, in which economic life increasingly centres on the acquisition, allocation, production and use of information. With the arrival of the information age, ever more image and video data are propagated through channels of every kind, and these data contain a great deal of useful information. How to extract this useful information quickly and efficiently from such massive data is the key to whether mankind can profit from the information age. The Internet today supplies massive video and image data, and the text in these massive Internet video frames and network images is an extremely important information source that can support many practical applications, including image retrieval, human-computer interaction and driving-navigation systems.
Existing methods for obtaining the text information in an image mainly comprise two parts, text detection and text recognition, and these two core techniques of automatic image-text reading have long been research problems of great interest in the computing field. However, among the massive data being propagated, only a small fraction of the images actually contain text, and existing text detection and recognition methods are limited by the speed at which they extract textual information from images, making them hard to apply directly to extracting the useful text in such data. Research on algorithms that classify images as text or non-text therefore has considerable practical significance and use value.
Summary of the invention
It is an object of the invention to provide a mass network text and non-text image classification method whose classification procedure is simple and whose classification accuracy is high.
To achieve the above object, the invention provides a mass network text and non-text image classification method comprising the following steps:
(1) Multi-scale spatial division network construction. The multi-scale spatial division network comprises a multi-level feature-map generation sub-network, a multi-scale image-block feature generation sub-network, and a text/non-text image-block classification sub-network:
(1.1) Define the multi-level feature-map generation sub-network structure;
(1.1.1) Define the image feature-extraction network structure;
Specifically, the image feature-extraction network comprises five convolution stages. The first and second stages each consist of two convolutional layers followed by one max-pooling layer; the last three stages each consist of three convolutional layers followed by one max-pooling layer. For an input image I, this network yields the output feature maps of each convolution stage, denoted FM_s = {M_s,1, ..., M_s,MNum_s}, where FM_s is the output feature-map sequence of the s-th convolution stage, M_s,m is its m-th feature map, and MNum_s is the preset number of output feature maps of the s-th stage;
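The five-stage layout above matches a VGG-16-style backbone. As a minimal sketch (not part of the patent), the spatial sizes of each stage's output can be traced by assuming size-preserving 3×3 convolutions and 2×2 stride-2 max pooling, hyperparameters the patent leaves unspecified:

```python
def stage_output_sizes(w, h, n_stages=5):
    """Spatial size of each convolution stage's output feature maps.

    Assumes size-preserving 3x3 convolutions and 2x2 stride-2 max pooling
    (a VGG-16-style assumption; the patent fixes only the layer counts)."""
    sizes = []
    for _ in range(n_stages):
        w, h = w // 2, h // 2  # each stage ends with one max-pooling layer
        sizes.append((w, h))
    return sizes

print(stage_output_sizes(512, 512))
```

Under these assumptions each stage halves the resolution, so stages 3, 4 and 5 (the ones used below) output maps at 1/8, 1/16 and 1/32 of the input size.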
(1.1.2) Define the multi-level feature-map generation sub-network structure;
Specifically, a deconvolution layer is attached after each of the third, fourth and fifth stages of the image feature-extraction network of step (1.1.1), rescaling all feature maps output by these three stages to size Wm × Hm. The rescaled feature-map sequences are denoted FM'_s = {M'_s,1, ..., M'_s,MNum_s}, where Wm and Hm are the preset width and height of the rescaled feature maps, FM'_s is the sequence obtained by rescaling every feature map in the output sequence FM_s of the s-th convolution stage, M'_s,m is the rescaled version of the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of stage s. All feature maps in the rescaled sequences FM'_3, FM'_4 and FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_1, ..., M''_MNum}, where M''_c is the c-th feature map of the multi-level feature map and MNum = MNum_3 + MNum_4 + MNum_5 is the number of feature maps in the multi-level feature map;
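The rescale-and-stack step can be sketched as follows. This is an illustration only: nearest-neighbour repetition stands in for the learned deconvolution layers, and the channel counts (256/512/512, VGG-like) are assumptions, not values fixed by the patent.

```python
import numpy as np

def build_multilevel_map(fm3, fm4, fm5, wm, hm):
    """Rescale the stage-3/4/5 feature maps to Wm x Hm and stack them into
    the multi-level feature map F. Arrays are (channels, height, width);
    nearest-neighbour repetition stands in for the learned deconvolution."""
    def rescale(fm):
        c, h, w = fm.shape
        assert hm % h == 0 and wm % w == 0  # integer scale factors only in this sketch
        return fm.repeat(hm // h, axis=1).repeat(wm // w, axis=2)

    return np.concatenate([rescale(fm3), rescale(fm4), rescale(fm5)], axis=0)

# Hypothetical channel counts MNum_3 = 256, MNum_4 = MNum_5 = 512
F = build_multilevel_map(np.zeros((256, 64, 64)),
                         np.zeros((512, 32, 32)),
                         np.zeros((512, 16, 16)), wm=64, hm=64)
print(F.shape)  # MNum = 256 + 512 + 512 = 1280 stacked feature maps
```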
(1.2) Define the multi-scale image-block feature generation sub-network structure;
(1.2.1) Single-scale image-block spatial division;
Specifically, the multi-level feature map F produced by the multi-level feature-map generation sub-network of step (1.1) is divided into image blocks of size (Wm/sp) × (Hm/sp) by the following rule:
F_ij(x, y) = F(x + i·Wm/sp, y + j·Hm/sp),   0 ≤ x < Wm/sp,   0 ≤ y < Hm/sp
In this way the multi-level feature map is divided into SP = sp × sp image blocks. For each divided block F_ij, the corresponding image block I_ij of the input image I is computed as:
I_ij(x, y) = I(x + i·W/sp, y + j·H/sp),   0 ≤ x < W/sp,   0 ≤ y < H/sp
where F_ij is the block in row i and column j of the divided multi-level feature map, x and y are the pixel abscissa and ordinate within a block, Wm and Hm are the width and height of the multi-level feature map, W and H are the width and height of the input image I, and sp is the preset block-division scale;
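The division formulas above amount to a regular sp × sp grid. A minimal sketch of the resulting block index ranges (coordinates and sizes in feature-map pixels; an evenly divisible grid is assumed):

```python
def split_blocks(wm, hm, sp):
    """Top-left corner and size of each block F_ij in the sp x sp division:
    block (i, j) covers x in [i*Wm/sp, (i+1)*Wm/sp) and y likewise."""
    bw, bh = wm // sp, hm // sp  # assumes sp divides Wm and Hm evenly
    return [(i * bw, j * bh, bw, bh) for i in range(sp) for j in range(sp)]

blocks = split_blocks(64, 64, 4)
print(len(blocks), blocks[0], blocks[-1])
```

The same function applied with (W, H) instead of (Wm, Hm) gives the corresponding input-image blocks I_ij.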
(1.2.2) Multi-scale image-block spatial division;
Specifically, several different block-division scales are preset, denoted SPS = {sp_1, ..., sp_K}. For each scale sp_k, the multi-level feature map F is spatially divided into blocks by the method of step (1.2.1), yielding SP_k = sp_k × sp_k image blocks. The multi-scale spatial division thus produces the block sequence PS = {Patch_1, ..., Patch_PNum}, where Patch_n is the n-th image block and PNum = Σ_k SP_k is the total number of blocks;
(1.2.3) Multi-scale image-block feature extraction;
Specifically, each image block Patch in the sequence PS obtained from the multi-scale spatial division of step (1.2.2) is further divided into Nsp parts horizontally and vertically, so that each block Patch yields SPNum = Nsp × Nsp sub-blocks, denoted SubPS = {SubP_1, ..., SubP_SPNum}, where SubP_nsp is the nsp-th sub-block. A max-pooling layer then converts each sub-block into its corresponding feature vector, giving the sub-block feature-vector sequence SubVS = {SubV_1, ..., SubV_SPNum} for block Patch, where SubV_nsp is the feature vector of the nsp-th sub-block; the feature-vector length equals MNum, the number of feature maps in the multi-level feature map of step (1.1.2). Concatenating the feature vectors of all sub-blocks of a block yields the block's feature vector, denoted V = [SubV_1, ..., SubV_SPNum], of length MNum × SPNum. Extracting a feature vector in this way from every block produced by the multi-scale spatial division gives the feature-vector set of all blocks, denoted VS = {V_1, ..., V_PNum}, where V_n is the feature vector of the n-th block and PNum is the total number of blocks;
(1.3) Define the text/non-text image-block classification sub-network structure;
Specifically, a text/non-text image-block classification network consisting of three fully connected layers is attached after the multi-scale image-block feature generation sub-network of step (1.2). Each block feature vector V in the feature-vector set VS obtained in step (1.2) is passed through this network for a classification decision. The output Pro is the probability that the block is a text block: if Pro > tP, the block's classification result is recorded as 1; otherwise it is 0. The results for all blocks are thus obtained, denoted Preds = {Pred_1, ..., Pred_PNum}, where Pred_n ∈ {0, 1} is the result for the n-th block; Pred_n = 0 means the block is a non-text block and Pred_n = 1 means it is a text block;
(1.4) Build the multi-scale spatial division network;
Specifically, the multi-level feature-map generation sub-network, the multi-scale image-block feature generation sub-network and the text/non-text image-block classification sub-network defined in steps (1.1) to (1.3) are cascaded to form one complete multi-scale spatial division network;
(2) Multi-scale spatial division network training:
(2.1) For each image in the training set, obtain multi-scale image-block label information;
Specifically, for each image Itr in the training set χ = {Itr_1, ..., Itr_T}, the positions of the text regions in the image are obtained by manual annotation and denoted BBS = {bb_1, ..., bb_Q}, where T is the number of training images, bb_q is the bounding box of the q-th text region in the image and Q is the number of text regions in the image. Then, by the method of step (1.2.1) and for each of the preset division scales SPS = {sp_1, ..., sp_K} of step (1.2.2), the image Itr is divided into multi-scale image blocks. For each block PatchTr after spatial division, let SPatchTr be the block's area, HPatchTr the block's height, SText the area of the text region inside the block, and HText the height of that text region. If the block satisfies the conditions:
SText / SPatchTr > tS   and   HText / HPatchTr > tH
then the block is labelled as a text region, with label 1; otherwise it is labelled as a non-text region, with label 0. Here tS is the preset threshold on the fraction of the block's area occupied by text, and tH is the preset threshold on the ratio of the text height to the block height. The multi-scale image-block label information is denoted Lbls = {lbl_1, ..., lbl_PNum}, where lbl_l is the label of the l-th block and PNum is the number of blocks after multi-scale spatial division;
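The labelling rule can be sketched as follows; the threshold values tS = 0.2 and tH = 0.5 are illustrative assumptions, as the patent leaves both preset:

```python
def label_block(s_text, s_block, h_text, h_block, tS=0.2, tH=0.5):
    """Training label of a block: 1 (text) only if the text-area fraction
    exceeds tS AND the text-height fraction exceeds tH, else 0."""
    return 1 if (s_text / s_block > tS and h_text / h_block > tH) else 0

print(label_block(s_text=60, s_block=256, h_text=12, h_block=16))  # 1
print(label_block(s_text=10, s_block=256, h_text=12, h_block=16))  # 0
```

Requiring both conditions rejects blocks that clip only a thin sliver of a text line.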
(2.2) Train to obtain the parameters of the multi-scale spatial division network;
Specifically, using the annotated training set χ and the multi-scale image-block label information Lbls of each training image, the multi-scale spatial division network built in step (1) is trained by back-propagation, with the loss function computed as:
Loss = − Σ_{l=1}^{PNum} [ lbl_l · log(pro_l) + (1 − lbl_l) · log(1 − pro_l) ]
where lbl_l is the label of the l-th block, PNum is the number of blocks after multi-scale division, and pro_l, an output of the multi-scale spatial division network, is the probability that the l-th block is classified as a text block. The trained network parameters are denoted θ;
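The loss is the binary cross-entropy summed over all blocks; a minimal sketch of the formula above:

```python
import numpy as np

def block_loss(labels, probs):
    """Binary cross-entropy summed over all PNum image blocks, matching
    the loss formula above (lbl_l in {0, 1}, pro_l in (0, 1))."""
    labels = np.asarray(labels, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return -np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(round(block_loss([1, 0], [0.9, 0.1]), 4))
```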
(3) Text/non-text image classification:
Specifically, a test image Ite is first divided into multi-scale image blocks by the method of step (1.2.1), using each of the preset division scales SPS = {sp_1, ..., sp_K} of step (1.2.2). The set of all blocks obtained after spatial division is denoted SubPS = {PatchTe_1, ..., PatchTe_PNum}. The multi-scale spatial division network built in step (1), with the parameters θ obtained by training in step (2), then yields the classification results for the test image, PredTes = {PredTe_1, ..., PredTe_PNum}, where PredTe_r is the prediction for the r-th block of the test image and PNum is the number of blocks after multi-scale spatial division. The set TextPS of all blocks in SubPS whose prediction is 1 is the set of all text blocks in the input image Ite, from which the approximate location and scale of the text regions in the image are obtained. If TextPS is not empty, the classification result for the test image is "text image"; otherwise it is "non-text image".
Compared with the prior art, the technical scheme conceived above by the present invention has the following technical effects:
(1) Existing methods for classifying massive network images as text or non-text generally first extract candidate text-like regions from the image, then filter these candidates by classification or similar methods, and finally judge whether the image is a text image from the classification of the candidate regions. The present method instead first constructs an end-to-end, trainable multi-scale spatial division network: taking only the image as input, it makes block-level predictions on the image and directly yields both the image's classification result and the approximate location of text within the image, so that text and non-text images are distinguished end to end. The method is therefore simpler to realise;
(2) Images usually contain very many text-like regions. Existing methods for classifying massive network text and non-text images extract candidate text-like regions from the image and filter and classify all candidates, for example by clustering and classification, to obtain the final result; such methods are very slow and are easily affected by environmental factors such as illumination. The present method uses a convolutional neural network, which is highly robust to external conditions such as illumination, divides the space by a fixed, hand-designed rule, and classifies every divided image block, thereby avoiding the poorly robust candidate text-region extraction process. The method therefore achieves very high classification accuracy and processing efficiency, together with strong robustness;
(3) The discrimination result of the present invention for massive network text and non-text images not only indicates whether an image is a text image, but also gives the approximate location and scale of the text in the picture, greatly reducing the text search range for a subsequent text-detection stage.
Brief description of the drawings
Fig. 1 shows the structure of the multi-scale spatial division network built by the method of the invention.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be appreciated that the specific embodiments described here serve only to explain the invention and are not intended to limit it. Moreover, the technical features involved in the embodiments of the invention described below may be combined with one another as long as they do not conflict.
The mass network text and non-text image classification method of the present invention comprises the following steps:
Steps (1) to (3) of the embodiment proceed exactly as set out in the summary above: the multi-level feature-map generation sub-network, the multi-scale image-block feature generation sub-network and the text/non-text image-block classification sub-network are defined as in steps (1.1) to (1.3) and cascaded into one complete multi-scale spatial division network, as shown in Fig. 1; the network parameters θ are obtained as in step (2) by training on the multi-scale image-block labels of the annotated training set; and each test image is then classified as a text or non-text image from its block predictions as in step (3).
As will be readily appreciated by those skilled in the art, the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A mass network text and non-text image classification method, characterised in that the method comprises the following steps:
(1) multi-scale space division network construction, including: (1.1) defining the network structure of the multi-level feature map generation sub-network; (1.2) defining the network structure of the multi-scale image-block feature generation sub-network; (1.3) defining the network structure of the text and non-text image-block classification network; (1.4) building the multi-scale space division network;
(2) multi-scale space division network training: (2.1) obtaining multi-scale image-block label information for each image in the training image set; (2.2) training the network with said multi-scale image-block label information to obtain the parameters of the multi-scale space division network;
(3) text and non-text image classification: using the multi-scale space division network and its trained parameters to classify the image to be identified as a text or non-text image.
2. The mass network text and non-text image classification method according to claim 1, characterised in that said step (1.1) is specifically:
(1.1.1) defining the image feature extraction network structure: the image feature extraction network comprises five convolution stages; each of the first two stages consists of two convolutional layers followed by one max-pooling layer, and each of the last three stages consists of three convolutional layers followed by one max-pooling layer; for an input image I, this feature extraction network outputs a feature-map sequence for each convolution stage, denoted FM_s = {M_(s,1), ..., M_(s,MNum_s)}, where FM_s is the output feature-map sequence of the s-th convolution stage, M_(s,m) is the m-th feature map, and MNum_s is the preset number of output feature maps of the s-th convolution stage;
(1.1.2) defining the multi-level feature map generation sub-network structure: a deconvolution layer is connected after each of the last three convolution stages of the image feature extraction network of step (1.1.1), scaling all feature maps in the outputs FM_3, FM_4, FM_5 of these three stages to size Wm × Hm; the scaled feature-map sequences are denoted FM'_s = {M'_(s,1), ..., M'_(s,MNum_s)}, where Wm and Hm are the preset feature-map width and height after scaling, FM'_s is the sequence obtained by scaling each feature map in FM_s, M'_(s,m) is the feature map obtained by scaling the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of the s-th stage; all feature maps in FM'_3, FM'_4, FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_1, ..., M''_MNum}, where M''_c is the c-th feature map of the multi-level feature map and MNum = MNum_3 + MNum_4 + MNum_5 is the number of feature maps in the multi-level feature map.
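The stacking in (1.1.2) can be sketched with NumPy. Nearest-neighbour resizing stands in for the patent's learned deconvolution layers, and the per-stage channel counts (MNum_3 = 4, MNum_4 = 8, MNum_5 = 8) and the target size Wm = Hm = 16 are illustrative assumptions, not values from the patent.

```python
import numpy as np

def resize_nearest(fm, wm, hm):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, hm, wm);
    a stand-in for the learned deconvolution layers of step (1.1.2)."""
    c, h, w = fm.shape
    ys = np.arange(hm) * h // hm    # source row for each target row
    xs = np.arange(wm) * w // wm    # source column for each target column
    return fm[:, ys][:, :, xs]

def multi_level_feature_map(stage_outputs, wm=16, hm=16):
    """Scale the last three stage outputs FM3, FM4, FM5 to Wm x Hm and
    stack them along the channel axis into the multi-level map F."""
    scaled = [resize_nearest(fm, wm, hm) for fm in stage_outputs]
    return np.concatenate(scaled, axis=0)

# Illustrative stage outputs with MNum3=4, MNum4=8, MNum5=8 channels.
fm3 = np.random.rand(4, 32, 32)
fm4 = np.random.rand(8, 16, 16)
fm5 = np.random.rand(8, 8, 8)
F = multi_level_feature_map([fm3, fm4, fm5])
# F.shape == (20, 16, 16): MNum = 4 + 8 + 8 channels at Wm x Hm
```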
3. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.2) is specifically:
(1.2.1) single-scale image-block space division: the multi-level feature map F produced by the multi-level feature map generation sub-network of step (1.1) is divided into image blocks of size (Wm/sp) × (Hm/sp); the division is expressed as:
F_ij(x, y) = F(x + i·Wm/sp, y + j·Hm/sp),  0 ≤ x < Wm/sp,  0 ≤ y < Hm/sp
so that the multi-level feature map is divided into SP = sp × sp image blocks; for each divided block F_ij, the corresponding image block I_ij in the input image I is computed as:
I_ij(x, y) = I(x + i·W/sp, y + j·H/sp),  0 ≤ x < W/sp,  0 ≤ y < H/sp
where F_ij denotes the image block in the i-th column and j-th row after dividing the multi-level feature map into image blocks, x and y denote the horizontal and vertical pixel coordinates within a block, Wm and Hm denote the width and height of the multi-level feature map, W and H denote the width and height of the input image I, and sp is the preset image-block division scale;
(1.2.2) multi-scale image-block space division: multiple different image-block division scales are preset, denoted {sp_1, ..., sp_K}; for each division scale sp_k, the multi-level feature map F is divided into image blocks according to the method of step (1.2.1), giving SP_k = sp_k × sp_k image blocks; the multi-scale image-block space division thus yields the sequence of all image blocks PS = {Patch_1, ..., Patch_PNum}, where Patch_n is the n-th image block and PNum = Σ_(k=1..K) sp_k² is the total number of image blocks;
(1.2.3) multi-scale image-block feature extraction: each image block Patch in the sequence PS obtained by the multi-scale division of step (1.2.2) is divided into Nsp parts horizontally and vertically, so that each image block Patch is divided into SPNum = Nsp × Nsp sub-image blocks, denoted {SubP_1, ..., SubP_SPNum}, where SubP_nsp is the nsp-th sub-image block; a max-pooling layer then converts each sub-image block into its corresponding feature vector, giving for each image block Patch the feature-vector sequence {SubV_1, ..., SubV_SPNum}, where SubV_nsp is the feature vector corresponding to the nsp-th sub-image block and each feature vector has length MNum, the number of feature maps in the multi-level feature map of step (1.1.2); all sub-image-block feature vectors within an image block are concatenated to obtain the image-block feature vector V = [SubV_1, ..., SubV_SPNum], of length MNum × SPNum; extracting the feature vector of each image block obtained by the multi-scale space division in this way yields the feature-vector set of all image blocks, denoted VS = {V_1, ..., V_PNum}, where V_n is the feature vector corresponding to the n-th image block and PNum is the total number of image blocks.
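A minimal NumPy sketch of the block division of (1.2.1)-(1.2.2) and the sub-block max-pooling of (1.2.3). The channel count, map size, scale set {1, 2, 4}, and the assumption that Wm and Hm are exactly divisible by every scale are all illustrative, not the patent's values.

```python
import numpy as np

def divide_blocks(fmap, sp):
    """Divide a (C, H, W) feature map into sp x sp blocks following
    F_ij(x, y) = F(x + i*Wm/sp, y + j*Hm/sp); H, W assumed divisible by sp."""
    c, h, w = fmap.shape
    bh, bw = h // sp, w // sp
    return [fmap[:, j*bh:(j+1)*bh, i*bw:(i+1)*bw]
            for j in range(sp) for i in range(sp)]

def block_feature(patch, nsp):
    """Split a (C, H, W) block into nsp x nsp sub-blocks, max-pool each
    sub-block per channel, and concatenate into a length C*nsp*nsp vector."""
    c, h, w = patch.shape
    sh, sw = h // nsp, w // nsp
    parts = []
    for j in range(nsp):
        for i in range(nsp):
            sub = patch[:, j*sh:(j+1)*sh, i*sw:(i+1)*sw]
            parts.append(sub.max(axis=(1, 2)))   # one value per channel
    return np.concatenate(parts)

# Multi-scale division of an illustrative 20-channel, 16x16 multi-level map.
F = np.random.rand(20, 16, 16)
scales = [1, 2, 4]                       # hypothetical {sp_1, ..., sp_K}
PS = [blk for sp in scales for blk in divide_blocks(F, sp)]
# PNum = 1 + 4 + 16 = 21 blocks in total
VS = [block_feature(p, nsp=2) for p in PS]
# every feature vector has length MNum * SPNum = 20 * 4 = 80
```

The fixed-length vector per block, regardless of block size, is what lets one classification network handle all scales; this mirrors spatial-pyramid-style pooling.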
4. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.3) is specifically: after the multi-scale image-block feature generation sub-network of step (1.2), a text and non-text image-block classification network consisting of three fully-connected layers is connected; each image-block feature vector V in the multi-scale image-block feature-vector set VS of step (1.2) is classified by this network, whose output Pro represents the probability that the image block is a text image block; if Pro > tP, where tP is a preset probability threshold, the classification result of the block is recorded as 1, otherwise as 0; the classification results of all image blocks are thus obtained and denoted Pred = {Pred_1, ..., Pred_PNum}, where Pred_n ∈ {0, 1} is the classification result of the n-th image block, Pred_n = 0 indicating a non-text image block and Pred_n = 1 a text image block.
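The thresholding step that maps the probabilities Pro to the binary results Pred can be sketched as follows; the threshold value tP = 0.5 is illustrative, not the patent's.

```python
def block_predictions(pros, t_p=0.5):
    """Map per-block text probabilities Pro to Pred_n in {0, 1}:
    Pred_n = 1 iff Pro > tP (tP = 0.5 here is an illustrative threshold)."""
    return [1 if p > t_p else 0 for p in pros]

pred = block_predictions([0.9, 0.3, 0.51])
# pred == [1, 0, 1]: only blocks with probability strictly above tP count
```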
5. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (1.4) is specifically: the multi-level feature map generation sub-network, the multi-scale image-block feature generation sub-network, and the text and non-text image-block classification sub-network defined in steps (1.1) to (1.3) are cascaded to build one complete multi-scale space division network.
6. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (2.1) is specifically:
for each image Itr in the training image set χ = {Itr_1, ..., Itr_T}, the positions of the text regions in the image are obtained by manual annotation and denoted {bb_1, ..., bb_Q}, where T is the number of training images, bb_q is the bounding box of the q-th text region, and Q is the number of text regions in the image; then, following the method of step (1.2.1), multi-scale image-block space division is performed on Itr at each of the preset division scales {sp_1, ..., sp_K} of step (1.2.2); for each image block PatchTr obtained by the division, let SPatchTr denote the area of the block, HPatchTr the height of the block, SText the area of the text region inside the block, and HText the height of the text region inside the block; if the block satisfies the conditions:
SText / SPatchTr > tS  and  HText / HPatchTr > tH
then the image block is marked as a text region with label information 1; otherwise it is marked as a non-text region with label information 0; here tS is the preset threshold on the proportion of the image-block area occupied by the text region, and tH is the preset threshold on the ratio of the text-region height to the image-block height; the multi-scale image-block label information is denoted LBL = {lbl_1, ..., lbl_PNum}, where lbl_l is the label information of the l-th image block and PNum is the number of image blocks after multi-scale space division;
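The labelling criterion of step (2.1) in sketch form. The threshold values tS = 0.2 and tH = 0.5 are illustrative assumptions; the patent leaves them as preset parameters.

```python
def block_label(s_text, s_patch, h_text, h_patch, t_s=0.2, t_h=0.5):
    """Label a training block 1 (text) iff the text region covers more
    than t_s of the block area AND more than t_h of the block height.
    t_s and t_h here are illustrative, not the patent's values."""
    return 1 if (s_text / s_patch > t_s and h_text / h_patch > t_h) else 0

# A 64x64-px block containing a 48x40-px text region:
lbl = block_label(s_text=48 * 40, s_patch=64 * 64, h_text=40, h_patch=64)
# 1920/4096 ~ 0.47 > 0.2 and 40/64 ~ 0.63 > 0.5, so lbl == 1
```

Requiring both the area and the height ratio prevents a thin text sliver clipped at a block border from labelling the whole block as text.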
7. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (2.2) is specifically:
using the annotated training image set χ and the multi-scale image-block label information LBL = {lbl_1, ..., lbl_PNum} of each training image in the set, the multi-scale space division network built in step (1) is trained by back-propagation, the loss function being computed as:
Loss = −Σ_(l=1..PNum) [ lbl_l · log(pro_l) + (1 − lbl_l) · log(1 − pro_l) ]
where lbl_l denotes the label information of the l-th image block, PNum denotes the number of image blocks after multi-scale space division, and pro_l denotes the probability, output by the multi-scale space division network, that the l-th image block is classified as a text image block; the parameters of the trained multi-scale space division network are denoted θ;
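The loss of step (2.2) is a summed binary cross-entropy over all blocks; a minimal sketch (the eps guard against log(0) is an implementation detail added here, not part of the patent's formula):

```python
import math

def division_loss(labels, probs, eps=1e-12):
    """Loss = -sum_l [ lbl_l*log(pro_l) + (1 - lbl_l)*log(1 - pro_l) ];
    eps guards the log when pro_l is exactly 0 or 1."""
    return -sum(l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps)
                for l, p in zip(labels, probs))

loss = division_loss([1, 0, 1], [0.9, 0.2, 0.6])
# -(log 0.9 + log 0.8 + log 0.6) ~ 0.84: well-classified blocks cost little
```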
8. The mass network text and non-text image classification method according to claim 1 or 2, characterised in that said step (3) is specifically: for a test image Ite, multi-scale image-block space division is first performed on Ite following the method of step (1.2.1), at each of the preset image-block division scales {sp_1, ..., sp_K} of step (1.2.2); the set of all image blocks obtained by the division is denoted SubPS = {SubPatch_1, ..., SubPatch_PNum}; then, using the multi-scale space division network built in step (1) and the network parameters θ trained in step (2), the classification decision for the test image is obtained as PredTe = {PredTe_1, ..., PredTe_PNum}, where PredTe_r is the prediction for the r-th image block in the test image and PNum is the number of image blocks after the multi-scale division; the set TextPS of all image blocks in SubPS whose prediction is 1 is the set of all text image blocks in the input image Ite, from which the approximate locations and scale information of the text regions in the image are obtained; if TextPS is not empty, the classification result for the test image is text image, otherwise it is non-text image.
CN201610541508.1A 2016-07-12 2016-07-12 Mass network text and non-textual image classification method Active CN106257496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610541508.1A CN106257496B (en) 2016-07-12 2016-07-12 Mass network text and non-textual image classification method

Publications (2)

Publication Number Publication Date
CN106257496A true CN106257496A (en) 2016-12-28
CN106257496B CN106257496B (en) 2019-06-07

Family

ID=57714130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610541508.1A Active CN106257496B (en) 2016-07-12 2016-07-12 Mass network text and non-textual image classification method

Country Status (1)

Country Link
CN (1) CN106257496B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657522A (en) * 2017-10-10 2019-04-19 北京京东尚科信息技术有限公司 Detect the method and apparatus that can travel region
CN109711481A (en) * 2019-01-02 2019-05-03 京东方科技集团股份有限公司 Neural network, correlation technique, medium and equipment for the identification of paintings multi-tag
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109740482A (en) * 2018-12-26 2019-05-10 北京科技大学 A kind of image text recognition methods and device
CN109815473A (en) * 2019-01-28 2019-05-28 四川译讯信息科技有限公司 A kind of documents editing householder method
CN109858432A (en) * 2019-01-28 2019-06-07 北京市商汤科技开发有限公司 Method and device, the computer equipment of text information in a kind of detection image
CN110378330A (en) * 2018-04-12 2019-10-25 Oppo广东移动通信有限公司 Picture classification method and Related product
WO2020052085A1 (en) * 2018-09-13 2020-03-19 北京字节跳动网络技术有限公司 Video text detection method and device, and computer readable storage medium
CN114565800A (en) * 2022-04-24 2022-05-31 深圳尚米网络技术有限公司 Method for detecting illegal picture and picture detection engine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070065003A1 (en) * 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
CN105184312A (en) * 2015-08-24 2015-12-23 中国科学院自动化研究所 Character detection method and device based on deep learning
CN105608456A (en) * 2015-12-22 2016-05-25 华中科技大学 Multi-directional text detection method based on full convolution network
CN105740909A (en) * 2016-02-02 2016-07-06 华中科技大学 Text recognition method under natural scene on the basis of spatial transformation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENGQUAN ZHANG ET AL.: "Automatic discrimination of text and non-text natural images", International Conference on Document Analysis and Recognition (ICDAR) *
N. SHARMA ET AL.: "Piecewise linearity based method for text frame classification in video", Pattern Recognition *
MA RAN: "Design and Implementation of a Deep-Learning-Based Natural Scene Text Recognition System", China Masters' Theses Full-text Database, Information Science and Technology *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant