CN106257496A - Mass network text and non-textual image classification method - Google Patents
- Publication number: CN106257496A
- Application number: CN201610541508.1A
- Authority
- CN
- China
- Prior art keywords: image, image block, network, text, block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a mass network text and non-textual image classification method. First, a multi-scale spatial division network is built. Each image in the training image set is then processed to obtain its multi-scale image block label information, and the constructed multi-scale spatial division network is trained on the annotated training data set to obtain the network parameters. Using the constructed multi-scale spatial division network and the trained parameters, large-scale network images to be tested are classified, yielding the final classification result of each image: a judgement of whether the image is a text image, together with the approximate locations of the text regions in the image. The method of the invention classifies text and non-textual images with high accuracy and very high classification efficiency.
Description
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a mass network text and non-textual image classification method.
Background technology
With the rapid development of television and the Internet, human society has gradually entered the information age, in which economic life centres on the possession, allocation, production and use of information. With the arrival of the information age, ever more image and video data are propagated through a wide variety of channels, and these data contain a large amount of useful information; how to extract this useful information from such massive data quickly and efficiently is key to obtaining greater benefit in the information age. The current Internet provides massive amounts of video and image data, and the text in these Internet video frames and network images is an extremely important information source that can assist many practical applications, including image retrieval, human-computer interaction and driving navigation systems.
Existing methods for obtaining the text information in an image mainly comprise two parts, text detection and text recognition, and these two key techniques for automatic reading of image text have long been research problems of great interest in the computer field. However, in the massively propagated data only a small proportion of images contain text, and existing text detection and text recognition methods are limited by the speed at which they extract text information from images, making them hard to use directly for extracting the useful text information in these data. Research on text versus non-textual image classification algorithms therefore has considerable practical significance and use value.
Summary of the invention
It is an object of the invention to provide a mass network text and non-textual image classification method whose classification process for text and non-textual images is simple and whose classification accuracy is high.
To achieve the above object, the invention provides a mass network text and non-textual image classification method comprising the following steps:
(1) Multi-scale spatial division network construction. The multi-scale spatial division network comprises a multi-level feature map generation sub-network, a multi-scale image block feature generation sub-network, and a text and non-textual image block classification sub-network:
(1.1) Define the multi-level feature map generation sub-network structure;
(1.1.1) Define the image feature extraction network structure;
Specifically, the image feature extraction network comprises five convolution stages. The network structure of the first and second convolution stages is two convolutional layers followed by a maximum pooling layer, and the network structure of each of the last three convolution stages is three convolutional layers followed by a maximum pooling layer. For an input image I, this image feature extraction network yields the output feature maps of each convolution stage, denoted FM_s = {M_{s,1}, ..., M_{s,MNum_s}}, s = 1, ..., 5, where FM_s denotes the output feature map sequence of the s-th convolution stage, M_{s,m} denotes its m-th feature map, and MNum_s is the preset number of output feature maps of the s-th convolution stage;
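For illustration outside the patent text: assuming (as the "two or three convolutional layers plus one maximum pooling layer" structure suggests) a VGG-style backbone with size-preserving convolutions and stride-2 pooling, the spatial size of each stage's output feature maps can be sketched as follows. The input size 224 x 224 is an illustrative assumption, not taken from the patent.

```python
def stage_output_sizes(w, h, num_stages=5):
    """Spatial size of each convolution stage's output feature maps.

    Assumes (not stated explicitly in the patent) that the convolutions
    preserve spatial size and that each stage ends in a stride-2 maximum
    pooling layer, as in VGG-style networks.
    """
    sizes = []
    for _ in range(num_stages):
        w, h = w // 2, h // 2  # stride-2 max pooling halves each dimension
        sizes.append((w, h))
    return sizes

print(stage_output_sizes(224, 224))
# stages 1..5: [(112, 112), (56, 56), (28, 28), (14, 14), (7, 7)]
```

Under these assumptions the stage-3, stage-4 and stage-5 maps differ in resolution by factors of two, which is why the next step rescales them to a common size before stacking.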
(1.1.2) Define the multi-level feature map generation sub-network structure;
Specifically, a deconvolution layer is attached after each of the third, fourth and fifth convolution stages of the image feature extraction network described in step (1.1.1). The scale of every feature map in the outputs FM_3, FM_4, FM_5 of these three convolution stages is zoomed to size Wm × Hm, where Wm and Hm respectively denote the preset width and height of a feature map after scaling. The scaled feature map sequence is denoted FM'_s = {M'_{s,1}, ..., M'_{s,MNum_s}}, where FM'_s denotes the sequence obtained by scaling each feature map of the output sequence FM_s of the s-th convolution stage, M'_{s,m} denotes the feature map obtained by scaling the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of the s-th convolution stage. All feature maps in FM'_3, FM'_4, FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_1, ..., M''_MNum}, where M''_c denotes the c-th feature map of the multi-level feature map of the image and MNum = MNum_3 + MNum_4 + MNum_5 denotes the number of feature maps in the multi-level feature map;
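As an illustrative sketch of this stacking step (using nearest-neighbour resizing as a stand-in for the patent's deconvolution layers, and arbitrary channel counts 4, 8, 16 chosen only for the example):

```python
import numpy as np

def upsample_nn(fm, Wm, Hm):
    """Nearest-neighbour stand-in for the deconvolution layer: resize a
    (C, H, W) feature map to (C, Hm, Wm)."""
    C, H, W = fm.shape
    rows = np.arange(Hm) * H // Hm   # source row index for each output row
    cols = np.arange(Wm) * W // Wm   # source column index for each output column
    return fm[:, rows][:, :, cols]

def multi_level_feature_map(fm3, fm4, fm5, Wm, Hm):
    """Stack the rescaled stage-3/4/5 outputs along the channel axis,
    giving MNum = MNum3 + MNum4 + MNum5 feature maps."""
    scaled = [upsample_nn(f, Wm, Hm) for f in (fm3, fm4, fm5)]
    return np.concatenate(scaled, axis=0)

fm3 = np.zeros((4, 28, 28))
fm4 = np.zeros((8, 14, 14))
fm5 = np.zeros((16, 7, 7))
F = multi_level_feature_map(fm3, fm4, fm5, 56, 56)
print(F.shape)  # (28, 56, 56): MNum = 4 + 8 + 16
```

A learned deconvolution would replace `upsample_nn` in the actual network; the stacking along the channel axis is the same.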
(1.2) Define the multi-scale image block feature generation sub-network structure;
(1.2.1) Single-scale image block spatial division;
Specifically, let F be the image multi-level feature map obtained by the multi-level feature map generation sub-network described in step (1.1). The multi-level feature map is divided into image blocks of scale (Wm/sp) × (Hm/sp); the division is expressed as:
F_ij = { F(x, y) | (i-1)·Wm/sp ≤ x < i·Wm/sp, (j-1)·Hm/sp ≤ y < j·Hm/sp }
In this way the multi-level feature map is divided into SP = sp × sp image blocks. For a divided image block F_ij, the corresponding image block I_ij in the input image I is computed as:
I_ij = { I(x, y) | (i-1)·W/sp ≤ x < i·W/sp, (j-1)·H/sp ≤ y < j·H/sp }
where F_ij denotes the image block at row i and column j after the multi-level feature map is divided into image blocks, x and y respectively denote the horizontal and vertical pixel coordinates within a block, Wm and Hm respectively denote the width and height of the multi-level feature map, W and H respectively denote the width and height of the input image I, and sp is the preset image block division scale;
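The mapping from a block index back to input-image pixel bounds can be sketched as follows (integer division is an assumption for the case where W or H is not divisible by sp):

```python
def image_block_bounds(i, j, sp, W, H):
    """Pixel bounds in the input image of block (i, j), 1-indexed, under
    an sp x sp division, following the scaling relation of step (1.2.1)."""
    x0, x1 = (i - 1) * W // sp, i * W // sp
    y0, y1 = (j - 1) * H // sp, j * H // sp
    return x0, x1, y0, y1

# block in row 2, column 1 of a 4 x 4 division of a 640 x 480 image
print(image_block_bounds(2, 1, 4, 640, 480))  # (160, 320, 0, 120)
```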
(1.2.2) Multi-scale image block spatial division;
Specifically, multiple different image block division scales are preset, denoted {sp_1, ..., sp_K}. For each division scale sp_k, the multi-level feature map F is spatially divided into image blocks by the method described in step (1.2.1), yielding SP_k = sp_k × sp_k image blocks. Through this multi-scale spatial division, the sequence of all obtained image blocks is PS = {Patch_1, ..., Patch_PNum}, where Patch_n denotes the n-th image block and PNum = Σ_k sp_k² denotes the total number of image blocks;
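A minimal sketch of the multi-scale enumeration (the scale set {1, 2, 4} is illustrative; the patent leaves the scales as preset parameters):

```python
def multiscale_blocks(scales):
    """Enumerate (scale, row, column) for every block produced by the
    multi-scale spatial division; the total count is PNum = sum(sp_k ** 2)."""
    return [(sp, i, j)
            for sp in scales
            for i in range(1, sp + 1)
            for j in range(1, sp + 1)]

blocks = multiscale_blocks([1, 2, 4])
print(len(blocks))  # PNum = 1 + 4 + 16 = 21
```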
(1.2.3) Multi-scale image block feature extraction;
Specifically, for each image block Patch in the image block sequence PS obtained in step (1.2.2) by the multi-scale spatial division of the multi-level feature map F, the block is divided into Nsp parts horizontally and Nsp parts vertically, so that each image block Patch is divided into SPNum = Nsp × Nsp sub-image blocks, denoted {SubP_1, ..., SubP_SPNum}, where SubP_nsp denotes the nsp-th sub-image block. A maximum pooling layer then converts each sub-image block into its corresponding feature vector, giving the sub-image block feature vector sequence of the image block Patch, denoted {SubV_1, ..., SubV_SPNum}, where SubV_nsp denotes the feature vector corresponding to the nsp-th sub-image block; the feature vector length is MNum, the number of feature maps in the multi-level feature map obtained in step (1.1.2). The feature vectors of all sub-image blocks in the image block are spliced to obtain the feature vector of the image block, denoted V = [SubV_1, ..., SubV_SPNum], whose length is MNum × SPNum. The feature vector of every image block obtained by the multi-scale spatial division is extracted in this way, yielding the feature vector set of all image blocks, denoted VS = {V_1, ..., V_PNum}, where V_n denotes the feature vector corresponding to the n-th image block and PNum denotes the total number of image blocks;
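The sub-block pooling and splicing step can be sketched directly in NumPy (MNum = 28, an 8 x 8 patch and Nsp = 2 are illustrative values only):

```python
import numpy as np

def patch_feature_vector(patch, Nsp):
    """Max-pool each of the Nsp x Nsp sub-blocks of a (MNum, h, w) patch
    feature map and splice the results, giving a vector of length
    MNum * Nsp * Nsp, as in step (1.2.3)."""
    MNum, h, w = patch.shape
    parts = []
    for i in range(Nsp):
        for j in range(Nsp):
            sub = patch[:,
                        i * h // Nsp:(i + 1) * h // Nsp,
                        j * w // Nsp:(j + 1) * w // Nsp]
            parts.append(sub.max(axis=(1, 2)))  # one maximum per feature map
    return np.concatenate(parts)

v = patch_feature_vector(np.ones((28, 8, 8)), 2)
print(v.shape)  # (112,) = MNum * SPNum = 28 * 4
```

Because the pooling is over a fixed Nsp x Nsp grid, blocks of every division scale yield vectors of the same length, which is what lets a single classifier handle all scales.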
(1.3) Define the text and non-textual image block classification sub-network structure;
Specifically, a text and non-textual image block classification network consisting of three fully connected layers is attached after the multi-scale image block feature generation sub-network described in step (1.2). Each image block feature vector V in the multi-scale image block feature vector set VS obtained in step (1.2) is given a classification judgement by this text and non-textual image block classification network; the resulting output Pro denotes the probability that the image block is a text image block. If Pro > tP, the classification result of the image block is recorded as 1, otherwise as 0, thereby obtaining the classification results of all image blocks, denoted {Pred_1, ..., Pred_PNum}, where Pred_n denotes the classification result of the n-th image block and Pred_n ∈ {0, 1}: Pred_n = 0 indicates that the image block is a non-textual image block, and Pred_n = 1 indicates that it is a text image block;
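A minimal sketch of the three fully connected layers and the threshold judgement; the hidden-layer sizes (16 and 8), the ReLU activations, the sigmoid output and tP = 0.5 are all illustrative assumptions, since the patent specifies only "three fully connected layers" and the threshold tP:

```python
import numpy as np

def classify_block(v, weights, biases, tP=0.5):
    """Three fully connected layers ending in a sigmoid; the block is
    predicted as text (1) when the output probability Pro exceeds tP."""
    x = v
    for k, (W, b) in enumerate(zip(weights, biases)):
        x = W @ x + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)        # ReLU on the hidden layers
    pro = 1.0 / (1.0 + np.exp(-x[0]))     # sigmoid -> text probability
    return 1 if pro > tP else 0

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((16, 112)),
      rng.standard_normal((8, 16)),
      rng.standard_normal((1, 8))]
bs = [np.zeros(16), np.zeros(8), np.zeros(1)]
pred = classify_block(np.ones(112), Ws, bs)
print(pred in (0, 1))  # True
```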
(1.4) Build the multi-scale spatial division network;
Specifically, the multi-level feature map generation sub-network, the multi-scale image block feature generation sub-network and the text and non-textual image block classification sub-network defined in steps (1.1) to (1.3) are cascaded together to form a complete multi-scale spatial division network;
(2) Multi-scale spatial division network training:
(2.1) For each image in the training image set, obtain multi-scale image block label information;
Specifically, for each image Itr in the training image set χ = {Itr_1, ..., Itr_T}, the positions of the text regions in the image are obtained by manual annotation and denoted {bb_1, ..., bb_Q}, where T denotes the number of training images, bb_q denotes the bounding box of the q-th text region in the image, and Q is the number of text regions in the image. Then, following the method of step (1.2.1) and for each of the multiple preset image block division scales {sp_1, ..., sp_K} of step (1.2.2), the image Itr is spatially divided into multi-scale image blocks. For each image block PatchTr after spatial division, let SPatchTr denote the area of the image block, HPatchTr the height of the image block, SText the area of the text region within the image block, and HText the height of the text region within the image block. If the image block satisfies the condition:
SText / SPatchTr > tS  and  HText / HPatchTr > tH
the image block is marked as a text region with label information 1; otherwise it is marked as a non-textual region with label information 0, where tS is the preset threshold on the ratio of the text region's area to the whole image block's area and tH is the preset threshold on the ratio of the text region's height to the image block's height. The multi-scale image block label information is denoted {lbl_1, ..., lbl_PNum}, where lbl_l denotes the label information of the l-th image block and PNum denotes the number of image blocks after multi-scale spatial division;
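The labeling rule of step (2.1) can be sketched as follows; the threshold values tS = 0.2 and tH = 0.5 in the example are illustrative, not taken from the patent:

```python
def block_label(SPatchTr, HPatchTr, SText, HText, tS, tH):
    """Label a training block 1 (text) when the text region inside it
    exceeds both the area-ratio threshold tS and the height-ratio
    threshold tH, following the condition of step (2.1)."""
    if SText / SPatchTr > tS and HText / HPatchTr > tH:
        return 1
    return 0

print(block_label(1000, 40, 300, 25, tS=0.2, tH=0.5))  # 1: 0.3 > 0.2 and 0.625 > 0.5
print(block_label(1000, 40, 100, 25, tS=0.2, tH=0.5))  # 0: area ratio 0.1 too small
```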
(2.2) Train to obtain the parameters of the multi-scale spatial division network;
Specifically, using the annotated training image set χ and the multi-scale image block label information {lbl_1, ..., lbl_PNum} of every training image in the annotated set, the multi-scale spatial division network built in step (1) is trained by the back-propagation method, where the loss function is computed as:
Loss = -(1/PNum) · Σ_{l=1}^{PNum} [ lbl_l · log(pro_l) + (1 - lbl_l) · log(1 - pro_l) ]
where lbl_l denotes the label information of the l-th image block, PNum denotes the number of image blocks after multi-scale spatial division, and pro_l denotes the probability, output by the multi-scale spatial division network, that the l-th image block is classified as a text image block. The trained multi-scale spatial division network parameters are denoted θ;
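As a sketch of the reconstructed loss above (the original formula is omitted from this text; binary cross-entropy is the natural reading given the 0/1 labels lbl_l and text probabilities pro_l):

```python
import math

def bce_loss(labels, probs):
    """Binary cross-entropy averaged over all PNum image blocks."""
    total = 0.0
    for lbl, pro in zip(labels, probs):
        total += -(lbl * math.log(pro) + (1 - lbl) * math.log(1 - pro))
    return total / len(labels)

loss = bce_loss([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.1054: both blocks confidently correct, small loss
```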
(3) Text versus non-textual image classification:
Specifically, for a test image Ite, first following the method of step (1.2.1) and for each of the multiple preset image block division scales {sp_1, ..., sp_K} of step (1.2.2), the image Ite is spatially divided into multi-scale image blocks, and the set of all image blocks obtained after spatial division is denoted SubPS. Then, using the multi-scale spatial division network built in step (1) and the network parameter θ obtained by training in step (2), the classification judgement results of the test image are obtained as {PredTe_1, ..., PredTe_PNum}, where PredTe_r denotes the prediction result of the r-th image block in the test image and PNum denotes the number of image blocks after multi-scale spatial division. The set TextPS of image blocks in SubPS whose prediction result is 1 is the set of all text image blocks in the input image Ite, from which the approximate locations and the scale information of the text regions in the image are obtained. If TextPS is not empty, the classification result of the test image is text image; otherwise the classification result of the test image is non-textual image.
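The image-level decision rule of step (3) reduces to a simple aggregation over the block predictions:

```python
def classify_image(block_preds):
    """Image-level decision of step (3): the test image is a text image
    iff at least one block is predicted as a text block (TextPS is
    non-empty); the indices of those blocks localise the text."""
    text_blocks = [r for r, p in enumerate(block_preds) if p == 1]
    is_text_image = len(text_blocks) > 0
    return is_text_image, text_blocks

print(classify_image([0, 0, 1, 0]))  # (True, [2])
print(classify_image([0, 0, 0]))     # (False, [])
```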
Compared with the prior art, the above technical scheme conceived by the present invention has the following technical effects:
(1) Existing mass network text and non-textual image classification methods generally first need to extract candidate text-like regions from the image, then filter these candidate regions by methods such as classification, and finally predict whether the image is a text image by the classification judgement on the candidate regions. The method of the invention instead first constructs an end-to-end, trainable multi-scale spatial division network; this network takes an image as input, makes predictions at the image block level, and finally yields both the discriminant classification result of the image and the approximate locations of the text in the image, so that text and non-textual images can be discriminated end to end. The method of the invention is therefore simpler to realise.
(2) Images usually contain very many text-like regions, and existing mass network text and non-textual image classification methods extract the candidate text-like regions in an image and filter and classify all candidate regions with methods such as clustering and classification to obtain the final classification result; such methods are therefore very slow, and such algorithms are easily affected by environmental factors such as illumination. The method of the invention uses a convolutional neural network, which is strongly robust to external conditions such as illumination, performs spatial division with manually preset scales, and classifies each divided image block, thereby avoiding the poorly robust text-like region extraction process. The method of the invention therefore has very high classification accuracy, very efficient processing speed, and very strong robustness.
(3) The discrimination result of the invention for mass network text and non-textual images contains not only the information of whether an image is a text image, but also indicates the approximate location and scale information of the text in the picture, which greatly reduces the text search scope for a subsequent text detection stage.
Brief description of the drawings
Fig. 1 shows the structure of the multi-scale spatial division network built by the method of the invention.
Detailed description of the invention
In order to make the objects, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below may be combined with each other as long as they do not conflict with one another.
The mass network text and non-textual image classification method of the present invention comprises steps (1) to (3) exactly as set forth in the Summary of the invention above: the multi-scale spatial division network is constructed as in step (1) by cascading the multi-level feature map generation sub-network, the multi-scale image block feature generation sub-network and the text and non-textual image block classification sub-network, as shown in Fig. 1; the network is trained as in step (2); and test images are classified as in step (3).
As will be readily appreciated by those skilled in the art, the foregoing is only the preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent substitution and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (8)
1. A mass network text and non-textual image classification method, characterised in that the method comprises the following steps:
(1) multi-scale spatial division network construction, comprising: (1.1) defining a multi-level feature map generation sub-network structure; (1.2) defining a multi-scale image block feature generation sub-network structure; (1.3) defining a text and non-textual image block classification sub-network structure; (1.4) building the multi-scale spatial division network;
(2) multi-scale spatial division network training: (2.1) for each image in a training image set, obtaining multi-scale image block label information; (2.2) training according to said multi-scale image block label information to obtain the parameters of the multi-scale spatial division network;
(3) text versus non-textual image classification: according to the parameters of the multi-scale spatial division network, classifying the text or non-textual images to be identified using said multi-scale spatial division network.
2. The mass network text and non-textual image classification method according to claim 1, characterized in that said step (1.1) is specifically:
(1.1.1) defining an image feature extraction network structure: the image feature extraction network comprises five convolution stages, wherein each of the first two convolution stages consists of two convolutional layers and one max pooling layer, and each of the last three convolution stages consists of three convolutional layers and one max pooling layer; an input image I passed through this feature extraction network yields the output feature maps of each convolution stage, denoted FM = {FM_s | s = 1, ..., 5}, wherein FM_s = {M_{s,m} | m = 1, ..., MNum_s} denotes the sequence of feature maps output by the s-th convolution stage, M_{s,m} denotes the m-th feature map, and MNum_s is the preset number of output feature maps of the s-th convolution stage;
(1.1.2) defining the multi-level feature map generation sub-network structure: a deconvolution layer is connected after each of the last three convolution stages of the image feature extraction network described in step (1.1.1), and all feature maps output by these three convolution stages, FM_3, FM_4 and FM_5, are scaled to the size Wm × Hm, wherein Wm and Hm respectively denote the preset width and height of the scaled feature maps; the scaled feature map sequences are denoted FM'_s = {M'_{s,m} | m = 1, ..., MNum_s}, wherein FM'_s denotes the sequence obtained by scaling each feature map of the output sequence FM_s of the s-th convolution stage, M'_{s,m} denotes the feature map obtained by scaling the m-th feature map of FM_s, and MNum_s is the preset number of output feature maps of the s-th convolution stage; all feature maps in the sequences FM'_3, FM'_4 and FM'_5 are then stacked to obtain the multi-level feature map, denoted F = {M''_c | c = 1, ..., MNum}, wherein M''_c denotes the c-th feature map of the multi-level feature map of the image and MNum = MNum_3 + MNum_4 + MNum_5 denotes the number of feature maps in the multi-level feature map.
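The scale-and-stack operation of step (1.1.2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the claimed network itself: nearest-neighbour upsampling stands in for the learned deconvolution layers, and the channel counts (256/512/512, typical of a VGG-style backbone) and the size Wm = Hm = 64 are illustrative assumptions, not values fixed by the claim:

```python
import numpy as np

def upsample_nn(fm, Wm, Hm):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, Hm, Wm).
    A stand-in for the deconvolution layer of step (1.1.2)."""
    C, H, W = fm.shape
    rows = np.arange(Hm) * H // Hm   # source row index for each target row
    cols = np.arange(Wm) * W // Wm   # source column index for each target column
    return fm[:, rows[:, None], cols[None, :]]

def multi_level_feature_map(fm3, fm4, fm5, Wm=64, Hm=64):
    """Scale the stage-3/4/5 outputs to Wm x Hm and stack them channel-wise,
    yielding F with MNum = MNum3 + MNum4 + MNum5 feature maps."""
    scaled = [upsample_nn(fm, Wm, Hm) for fm in (fm3, fm4, fm5)]
    return np.concatenate(scaled, axis=0)
```

In the trained network the upsampling weights are learned; here the point is only the bookkeeping: three stages at different resolutions become one Wm × Hm tensor with MNum channels.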
3. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (1.2) is specifically:
(1.2.1) single-scale image block spatial division: the multi-level feature map F of the image obtained by the multi-level feature map generation sub-network described in step (1.1) is divided into SP = sp × sp image blocks, each of scale (Wm/sp) × (Hm/sp); for a divided image block F_ij, the corresponding image block I_ij in the input image I is computed by scaling the pixel coordinates of the block by W/Wm horizontally and H/Hm vertically, wherein F_ij denotes the image block in the i-th row and j-th column after the multi-level feature map is divided into image blocks, x and y respectively denote the horizontal and vertical pixel coordinates within an image block, Wm and Hm respectively denote the width and height of the multi-level feature map, W and H respectively denote the width and height of the input image I, and sp is the preset image block division scale;
(1.2.2) multi-scale image block spatial division: multiple different image block division scales are preset, denoted {sp_k | k = 1, ..., K}; for each division scale sp_k, the multi-level feature map F is spatially divided into SP_k = sp_k × sp_k image blocks according to the method described in step (1.2.1); the sequence of all image blocks obtained by the multi-scale image block spatial division is denoted PS = {Patch_n | n = 1, ..., PNum}, wherein Patch_n denotes the n-th image block and PNum = Σ_k SP_k denotes the total number of image blocks;
(1.2.3) multi-scale image block feature extraction: each image block Patch in the image block sequence PS obtained by the multi-scale image block spatial division of the multi-level feature map F in step (1.2.2) is divided into Nsp parts horizontally and Nsp parts vertically, so that each image block Patch is divided into SPNum = Nsp × Nsp sub-image blocks, denoted {SubP_nsp | nsp = 1, ..., SPNum}, wherein SubP_nsp denotes the nsp-th sub-image block; a max pooling layer is then used to convert each sub-image block into its corresponding feature vector, yielding the feature vector sequence of the sub-image blocks of the image block Patch, denoted {SubV_nsp | nsp = 1, ..., SPNum}, wherein SubV_nsp denotes the feature vector corresponding to the nsp-th sub-image block and the length of this feature vector is the number MNum of feature maps in the multi-level feature map obtained in step (1.1.2); all sub-image block feature vectors within an image block are concatenated to obtain the feature vector of the image block, denoted V = [SubV_1, ..., SubV_SPNum], so that the length of an image block feature vector is MNum × SPNum; extracting the feature vector of each image block obtained by the multi-scale image block spatial division in this manner yields the feature vector set of all image blocks, denoted VS = {V_n | n = 1, ..., PNum}, wherein V_n denotes the feature vector corresponding to the n-th image block and PNum denotes the total number of image blocks.
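Steps (1.2.1)–(1.2.3) can be sketched as follows. The division scales {1, 2, 4} and Nsp = 2 are illustrative presets (the claim leaves these values open), and a plain spatial maximum over each sub-block stands in for the max pooling layer:

```python
import numpy as np

def patch_feature(F, i, j, sp, Nsp=2):
    """Feature vector of block (i, j) at division scale sp: split the block
    into Nsp x Nsp sub-blocks and max-pool each one over its spatial extent.
    F has shape (MNum, Hm, Wm); the result has length MNum * Nsp * Nsp."""
    C, Hm, Wm = F.shape
    bh, bw = Hm // sp, Wm // sp
    block = F[:, i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
    sh, sw = bh // Nsp, bw // Nsp
    subvs = []
    for si in range(Nsp):
        for sj in range(Nsp):
            sub = block[:, si * sh:(si + 1) * sh, sj * sw:(sj + 1) * sw]
            subvs.append(sub.max(axis=(1, 2)))   # SubV: one value per channel
    return np.concatenate(subvs)                 # V = [SubV_1, ..., SubV_SPNum]

def multi_scale_features(F, scales=(1, 2, 4), Nsp=2):
    """Feature vectors of all blocks over all division scales (PS -> VS)."""
    return [patch_feature(F, i, j, sp, Nsp)
            for sp in scales for i in range(sp) for j in range(sp)]
```

With scales {1, 2, 4} this produces PNum = 1 + 4 + 16 = 21 block feature vectors, each of length MNum × SPNum, matching the bookkeeping in the claim.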
4. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (1.3) is specifically: after the multi-scale image block feature generation sub-network described in step (1.2), a text and non-textual image block classification network composed of three fully connected layers is connected; each image block feature vector V in the multi-scale image block feature vector set VS obtained in step (1.2) is classified by said text and non-textual image block classification network, and the resulting output Pro denotes the probability that the image block is a text image block; if Pro > tP, the classification result of the image block is recorded as 1, otherwise the classification result is 0; the classification results of all image blocks are thus obtained, denoted Pred = {Pred_n | n = 1, ..., PNum}, wherein Pred_n denotes the classification result of the n-th image block and Pred_n ∈ {0, 1}, with Pred_n = 0 indicating that the image block is a non-text image block and Pred_n = 1 indicating that the image block is a text image block.
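A minimal sketch of the three-fully-connected-layer block classifier described above. The ReLU hidden activations, the sigmoid output, and tP = 0.5 are assumptions of this sketch: the claim fixes only the layer count and the threshold test Pro > tP:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify_block(V, weights, tP=0.5):
    """Forward pass of three fully connected layers on a block feature
    vector V, producing the text probability Pro and the hard decision.
    `weights` is a list of three (W, b) pairs; the last layer outputs one unit."""
    h = V
    for k, (W, b) in enumerate(weights):
        h = h @ W + b
        if k < len(weights) - 1:
            h = np.maximum(h, 0.0)   # assumed ReLU on the two hidden layers
    pro = sigmoid(h.item())          # Pro: probability the block is text
    return pro, int(pro > tP)        # Pred: 1 for text block, 0 otherwise
```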
5. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (1.4) is specifically: the multi-level feature map generation sub-network structure defined in step (1.1), the multi-scale image block feature generation sub-network structure defined in step (1.2) and the text and non-textual image block classification sub-network structure defined in step (1.3) are cascaded together to build one complete multi-scale spatial division network.
6. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (2.1) is specifically:
for each image Itr in the training image set χ, the positions of the text regions in the image are obtained by manual annotation and denoted {bb_q | q = 1, ..., Q}, wherein T denotes the number of training images, bb_q denotes the bounding box of the q-th text region in the image, and Q is the number of text regions in the image; then, following the method described in step (1.2.1) and using each of the multiple preset image block division scales {sp_k | k = 1, ..., K} of step (1.2.2), the image Itr is subjected to multi-scale image block spatial division; for each image block PatchTr after the spatial division, the area of the image block is denoted SPatchTr, the height of the image block is denoted HPatchTr, the area of the text region within the image block is denoted SText, and the height of the text region within the image block is denoted HText; if the image block satisfies the condition SText / SPatchTr > tS and HText / HPatchTr > tH, the image block is labeled as a text region and its label information is 1; otherwise the image block is labeled as a non-text region and its label information is 0, wherein tS is the preset threshold on the ratio of the text region area within an image block to the whole image block area, and tH is the preset threshold on the ratio of the height of the text region within an image block to the image block height; the multi-scale image block label information is denoted LBL = {lbl_l | l = 1, ..., PNum}, wherein lbl_l denotes the label information of the l-th image block and PNum denotes the number of image blocks after the multi-scale spatial division.
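The labeling rule of step (2.1) reduces to two ratio tests against the preset thresholds. In this sketch tS = tH = 0.5 are illustrative defaults only, since the claim leaves the preset values open:

```python
def label_block(s_patch, h_patch, s_text, h_text, tS=0.5, tH=0.5):
    """Label an image block 1 (text) when the annotated text inside it
    covers enough of the block's area AND enough of its height, else 0."""
    if s_patch == 0 or h_patch == 0:
        return 0  # degenerate block: treat as non-text
    return int(s_text / s_patch > tS and h_text / h_patch > tH)
```

Requiring both the area and the height ratio filters out blocks that merely graze a text region, so the positive labels concentrate on blocks that genuinely contain text.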
7. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (2.2) is specifically:
the labeled training image set χ and the multi-scale image block label information LBL = {lbl_l | l = 1, ..., PNum} of each training image in the set are used to train the multi-scale spatial division network built in step (1) by back-propagation, wherein the loss function is the binary cross-entropy between the network outputs and the labels, averaged over all image blocks:
Loss = -(1/PNum) Σ_{l=1}^{PNum} [lbl_l · log(pro_l) + (1 - lbl_l) · log(1 - pro_l)],
wherein lbl_l denotes the label information of the l-th image block, PNum denotes the number of image blocks after the multi-scale spatial division, and pro_l, the output of the multi-scale spatial division network, denotes the probability that the l-th image block is classified as a text image block; the multi-scale spatial division network parameters obtained by training are denoted θ.
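The per-block training loss can be sketched as a mean binary cross-entropy over image blocks; the exact formula is reconstructed here from the symbols lbl_l, pro_l and PNum that the claim defines, which is the standard choice for a probability-valued output:

```python
import math

def block_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over all image blocks: labels are the
    lbl_l in {0, 1}, probs are the network outputs pro_l in [0, 1]."""
    assert len(labels) == len(probs)
    total = 0.0
    for lbl, pro in zip(labels, probs):
        pro = min(max(pro, eps), 1.0 - eps)   # clamp to avoid log(0)
        total += -(lbl * math.log(pro) + (1 - lbl) * math.log(1 - pro))
    return total / len(labels)
```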
8. The mass network text and non-textual image classification method according to claim 1 or 2, characterized in that said step (3) is specifically:
a test image Ite is first subjected to multi-scale image block spatial division following the method described in step (1.2.1), using each of the multiple preset image block division scales {sp_k | k = 1, ..., K} of step (1.2.2); the set of all image blocks obtained after the spatial division is denoted SubPS; then, using the multi-scale spatial division network built in step (1) and the network parameters θ obtained by training in step (2), the classification decision results of the test image are obtained, denoted PredTe = {PredTe_r | r = 1, ..., PNum}, wherein PredTe_r denotes the prediction result of the r-th image block of the test image and PNum denotes the number of image blocks after the multi-scale image block spatial division; the set TextPS of all image blocks in SubPS whose prediction result is 1 is the set of all text image blocks in the input image Ite, which yields the approximate positions and the scale information of the text regions in the image; if TextPS is not empty, the classification result of the test image is text image; otherwise the classification result of the test image is non-textual image.
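The image-level decision of step (3) is then a simple aggregation over the per-block predictions: the image is a text image if and only if at least one block is predicted as text. A minimal sketch:

```python
def classify_image(block_predictions):
    """Aggregate per-block decisions PredTe into an image-level decision.
    Returns the label and the indices of the text blocks (TextPS), which
    localize the approximate positions of the text regions."""
    text_blocks = [r for r, pred in enumerate(block_predictions) if pred == 1]
    label = "text" if text_blocks else "non-text"
    return label, text_blocks
```

Because the block predictions come from several division scales, the indices in `text_blocks` carry both position and scale information about the detected text regions.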
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610541508.1A CN106257496B (en) | 2016-07-12 | 2016-07-12 | Mass network text and non-textual image classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106257496A true CN106257496A (en) | 2016-12-28 |
CN106257496B CN106257496B (en) | 2019-06-07 |
Family
ID=57714130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610541508.1A Active CN106257496B (en) | 2016-07-12 | 2016-07-12 | Mass network text and non-textual image classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106257496B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070065003A1 (en) * | 2005-09-21 | 2007-03-22 | Lockheed Martin Corporation | Real-time recognition of mixed source text |
CN105184312A (en) * | 2015-08-24 | 2015-12-23 | 中国科学院自动化研究所 | Character detection method and device based on deep learning |
CN105608456A (en) * | 2015-12-22 | 2016-05-25 | 华中科技大学 | Multi-directional text detection method based on full convolution network |
CN105740909A (en) * | 2016-02-02 | 2016-07-06 | 华中科技大学 | Text recognition method under natural scene on the basis of spatial transformation |
Non-Patent Citations (3)
Title |
---|
CHENGQUAN ZHANG ETAL.: "Automatic discrimination of text and non-text natural images", 《INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR)》 * |
N. SHARMA ETAL.: "Piecewise linearity based method for text frame classification in video", 《PATTERN RECOGNITION》 * |
马然: "基于深度学习的自然场景文本识别系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657522A (en) * | 2017-10-10 | 2019-04-19 | 北京京东尚科信息技术有限公司 | Detect the method and apparatus that can travel region |
CN110378330A (en) * | 2018-04-12 | 2019-10-25 | Oppo广东移动通信有限公司 | Picture classification method and Related product |
CN110378330B (en) * | 2018-04-12 | 2021-07-13 | Oppo广东移动通信有限公司 | Picture classification method and related product |
WO2020052085A1 (en) * | 2018-09-13 | 2020-03-19 | 北京字节跳动网络技术有限公司 | Video text detection method and device, and computer readable storage medium |
CN109711241B (en) * | 2018-10-30 | 2021-07-20 | 百度在线网络技术(北京)有限公司 | Object detection method and device and electronic equipment |
CN109711241A (en) * | 2018-10-30 | 2019-05-03 | 百度在线网络技术(北京)有限公司 | Object detecting method, device and electronic equipment |
CN109740482A (en) * | 2018-12-26 | 2019-05-10 | 北京科技大学 | A kind of image text recognition methods and device |
CN109711481A (en) * | 2019-01-02 | 2019-05-03 | 京东方科技集团股份有限公司 | Neural network, correlation technique, medium and equipment for the identification of paintings multi-tag |
CN109711481B (en) * | 2019-01-02 | 2021-09-10 | 京东方艺云科技有限公司 | Neural networks for drawing multi-label recognition, related methods, media and devices |
CN109858432A (en) * | 2019-01-28 | 2019-06-07 | 北京市商汤科技开发有限公司 | Method and device, the computer equipment of text information in a kind of detection image |
CN109815473A (en) * | 2019-01-28 | 2019-05-28 | 四川译讯信息科技有限公司 | A kind of documents editing householder method |
CN109858432B (en) * | 2019-01-28 | 2022-01-04 | 北京市商汤科技开发有限公司 | Method and device for detecting character information in image and computer equipment |
CN114565800A (en) * | 2022-04-24 | 2022-05-31 | 深圳尚米网络技术有限公司 | Method for detecting illegal picture and picture detection engine |
Also Published As
Publication number | Publication date |
---|---|
CN106257496B (en) | 2019-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106257496A (en) | Mass network text and non-textual image classification method | |
CN112001385B (en) | Target cross-domain detection and understanding method, system, equipment and storage medium | |
Serna et al. | Classification of traffic signs: The european dataset | |
JP6351689B2 (en) | Attention based configurable convolutional neural network (ABC-CNN) system and method for visual question answering | |
Ruiz et al. | Information theory in computer vision and pattern recognition | |
Rong et al. | Recognizing text-based traffic guide panels with cascaded localization network | |
Wang et al. | FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection | |
CN106022300A (en) | Traffic sign identifying method and traffic sign identifying system based on cascading deep learning | |
CN106815604A (en) | Method for viewing points detecting based on fusion of multi-layer information | |
CN104778476A (en) | Image classification method | |
CN114187311A (en) | Image semantic segmentation method, device, equipment and storage medium | |
CN112287983B (en) | Remote sensing image target extraction system and method based on deep learning | |
CN112990282B (en) | Classification method and device for fine-granularity small sample images | |
CN112488229A (en) | Domain self-adaptive unsupervised target detection method based on feature separation and alignment | |
CN112861739A (en) | End-to-end text recognition method, model training method and device | |
CN111680684B (en) | Spine text recognition method, device and storage medium based on deep learning | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN115984537A (en) | Image processing method and device and related equipment | |
CN113903022A (en) | Text detection method and system based on feature pyramid and attention fusion | |
Cao et al. | An end-to-end neural network for multi-line license plate recognition | |
CN109034213A (en) | Hyperspectral image classification method and system based on joint entropy principle | |
CN117173422B (en) | Fine granularity image recognition method based on graph fusion multi-scale feature learning | |
CN113822134A (en) | Instance tracking method, device, equipment and storage medium based on video | |
Li | A deep learning-based text detection and recognition approach for natural scenes | |
CN112613474A (en) | Pedestrian re-identification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||