CN110533088A - Scene text language identification method based on discriminative convolutional neural networks - Google Patents
Scene text language identification method based on discriminative convolutional neural networks
- Publication number
- CN110533088A CN110533088A CN201910759386.7A CN201910759386A CN110533088A CN 110533088 A CN110533088 A CN 110533088A CN 201910759386 A CN201910759386 A CN 201910759386A CN 110533088 A CN110533088 A CN 110533088A
- Authority
- CN
- China
- Prior art keywords
- neural networks
- convolutional neural
- differentiated
- feature
- cnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a scene text language identification method based on discriminative convolutional neural networks. Traditional image classification methods analyze an image as a whole, lack an explicit capture of local details, and therefore handle this problem poorly. To solve this technical problem, a model named the "discriminative convolutional neural network" and an associated method are proposed. Through a discriminative clustering algorithm, the method learns a set of "discriminative patterns" — local features with discriminative power — from the deep convolutional features of an image. Finally, the global representation of the image is classified with two fully connected layers. The invention significantly improves on traditional image classification methods, which suffer from shortcomings when recognition is performed under varying conditions such as font, noise, and illumination.
Description
Technical field
The invention belongs to the technical field of character recognition with deep learning, and in particular relates to a scene text language identification method based on discriminative convolutional neural networks.
Background technique
Multilingual environments are ubiquitous in modern society. In airports, railway stations, hotels, and other public places, one frequently encounters text in several languages appearing together. The language itself is important information. Moreover, different written languages have very different characteristics, and processing them often requires models and methods targeted at the specific language class. Therefore, in a multilingual environment, language identification is of great significance.
Language identification is an important component of traditional optical character recognition (OCR) systems. Because of its importance in multilingual environments, the problem has been extensively studied over the past few decades. In recent years, with the continuous growth of multimedia data, especially pictures captured by mobile devices, the importance of scene text recognition has become more prominent and has triggered a surge of research in the computer vision community. Accordingly, language identification for scene text has also become indispensable.
Previous language identification algorithms were designed mainly for document images or video captions, in which both background and foreground are relatively clean and noise interference is small. Methods based on binarization, region segmentation, morphological analysis, and the like are therefore often used in such work. However, when applied to scene text, such methods are usually inadequate because they cannot cope with the complexity and variability of factors such as background, font, noise, and illumination conditions.
The task of scene text language identification is to predict the language class of the text in a given picture (English, Chinese, Greek, etc.). The problem can naturally be treated as an image classification problem. In recent years, convolutional neural networks (CNNs), with their powerful learning and generalization abilities, have provided good solutions for image classification. Nonetheless, language identification still poses unique challenges because of its own characteristics, and traditional image classification methods analyze an image as a whole, lack an explicit capture of details, and cannot handle this problem well.
Summary of the invention
To solve the above technical problem, the invention proposes a scene text language identification method based on discriminative convolutional neural networks, which significantly improves on traditional image classification methods that suffer from shortcomings when recognition is performed under varying conditions such as font, noise, and illumination.
The technical scheme adopted by the invention is a scene text language identification method based on discriminative convolutional neural networks, characterized by comprising the following steps:
Step 1: build the language identification convolutional neural network model Disc CNN;
Step 1.1: obtain a data set consisting of several pictures, each picture containing text in one language, with several written languages represented overall; divide the data set into a training set and a test set;
Step 1.2: crop the pictures in the training set to a preset size; from the cropped scene text image I, extract a convolutional feature hierarchy {h_l} (l = 1, …, L) with a convolutional neural network, where h_l denotes the feature map extracted at the l-th feature layer and L is the number of feature layers;
Step 1.3: extract dense local features from the convolutional feature hierarchy using the convolutional neural network;
Step 1.4: apply discriminative clustering to the dense local features extracted by the convolutional neural network, learning a discriminative codebook;
Step 1.5: encode each dense local feature with the discriminative codebook, then fuse the encoding results of the dense local features to obtain a fixed-dimension vector representation of the full image, denoted the full-image descriptor;
Step 1.6: build the language identification convolutional neural network model Disc CNN;
Disc CNN contains one hidden layer with ReLU activation, and the output layer contains C nodes; the output layer activations are passed through SoftMax to obtain probability values over the C language classes; the discriminative mid-level encoding used by Disc CNN is learned as a linear operation followed by a ReLU, and is fine-tuned during end-to-end training of the overall model; the complete Disc CNN is an end-to-end trainable model, optimized globally with gradient descent;
Step 1.7: parameter migration and end-to-end optimization;
Through the backpropagation algorithm of the neural network, the error gradients of the later stages are fed back to the earlier stages, and gradient descent is then used to adjust the parameters of all stages simultaneously;
Step 2: use the language identification convolutional neural network model Disc CNN to perform text language identification on pictures;
The Disc CNN network contains four convolutional layers in total: conv1, conv2, conv3, conv4; the heights of the output feature maps are 15, 7, 3, and 1, respectively, and the widths are determined by the input image width; discriminative patch discovery and the mid-level representation are carried out on the feature maps output by conv2, conv3, and conv4.
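The stated feature-map heights (15, 7, 3, 1) can be traced through the four conv + pool stages. The sketch below assumes 3x3 convolutions with stride 1 and 1-pixel padding (which preserve height, as described in step 1.2) and non-overlapping 2x2 max pooling with floor division; the input height of 30 is an assumption that reproduces the sequence in the patent, which does not state the input height explicitly.

```python
def feature_map_heights(input_height, num_layers=4):
    """Trace the feature-map height through successive conv + 2x2 max-pool stages.

    A 3x3 convolution with stride 1 and 1-pixel padding preserves height,
    so only the 2x2 max pooling (with floor division) halves it per layer.
    """
    heights = []
    h = input_height
    for _ in range(num_layers):
        h = h // 2  # 2x2 max pooling without padding floors the height
        heights.append(h)
    return heights

# An assumed input height of 30 reproduces the heights stated in the patent:
print(feature_map_heights(30))  # -> [15, 7, 3, 1]
```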
The beneficial effects of the invention are: addressing the language identification problem, the invention proposes an image representation that combines deep convolutional features, a discriminative mid-level representation, and a spatial pyramid method, models this representation as an end-to-end trainable neural network model, and further optimizes the model parameters through end-to-end training.
Detailed description of the invention
Fig. 1 is the general flowchart of the embodiment of the present invention;
Fig. 2 is a schematic diagram of detail differences in scene text recognition in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the Disc CNN network structure in the embodiment of the present invention;
Fig. 4 is a schematic diagram of test text recognition pictures in the embodiment of the present invention.
Specific embodiment
To make it easier for those of ordinary skill in the art to understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the implementation examples described herein are only intended to illustrate and explain the present invention, not to limit it.
The task of scene text language identification is to predict the language class of the text in a given picture (English, Chinese, Greek, etc.). The problem can naturally be treated as an image classification problem. In recent years, convolutional neural networks (CNNs), with their powerful learning and generalization abilities, have provided good solutions for image classification. Nonetheless, language identification still poses unique challenges because of its own characteristics, and traditional image classification methods analyze an image as a whole, lack an explicit capture of details, and cannot handle this problem well. To solve this technical problem, a model named the "discriminative convolutional neural network" and an associated method are proposed, which significantly improve on traditional image classification methods that suffer from shortcomings when recognition is performed under varying conditions such as font, noise, and illumination. Through a discriminative clustering algorithm, the method learns from the deep convolutional features of an image a set of "discriminative patterns", i.e., local features with discriminative power. Finally, the global representation of the image is classified with two fully connected layers.
Referring to Fig. 1, the scene text language identification method based on discriminative convolutional neural networks provided by the invention comprises the following steps:
Step 1: build the language identification convolutional neural network model Disc CNN;
Step 1.1: obtain a data set consisting of several pictures, each picture containing text in one language, with several written languages represented overall; divide the data set into a training set and a test set;
See Fig. 2 for a schematic diagram of detail differences in scene text recognition in this embodiment.
Step 1.2: crop the pictures in the training set to a preset size; from the cropped scene text image I, extract a convolutional feature hierarchy {h_l} (l = 1, …, L) with a convolutional neural network, where h_l denotes the feature map extracted at the l-th feature layer and L is the number of feature layers;
In this embodiment, the feature hierarchy is extracted by a pre-trained convolutional neural network (CNN);
The input picture is first scaled to a fixed height and then fed into the CNN; the CNN extracts features by applying a series of convolution and max-pooling operations to the input picture;
Let k_l and b_l denote the convolution kernel and bias, respectively; the feature extraction process is then:
h_l = pool_max(σ(h_{l-1} * k_l + b_l));
where pool_max denotes the max-pooling operation, σ is any nonlinear activation function, * denotes the convolution operation, h_0 denotes the initial input picture, and the index l of h_l indexes the layer-l encoding of the input picture; the stride of all convolution operations is set to 1, the convolution kernel size is 3x3, and the boundary padding size is set to 1x1.
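The layer update h_l = pool_max(σ(h_{l-1} * k_l + b_l)) can be sketched directly with numpy. This is a minimal illustration, not the patent's implementation: sizes and kernel values are toy assumptions, and σ is taken to be ReLU.

```python
import numpy as np

def conv3x3_same(x, k, b):
    """2-D convolution with a 3x3 kernel, stride 1, and 1-pixel zero padding,
    so the output has the same height and width as the input (as in step 1.2)."""
    H, W = x.shape
    padded = np.pad(x, 1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k) + b
    return out

def max_pool2(x):
    """Non-overlapping 2x2 max pooling (trailing odd rows/columns are dropped)."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def layer(h_prev, k, b):
    """One feature-extraction step: h_l = pool_max(relu(h_{l-1} * k_l + b_l))."""
    return max_pool2(np.maximum(0, conv3x3_same(h_prev, k, b)))

h0 = np.random.rand(30, 64)          # toy grayscale input scaled to a fixed height
k1, b1 = np.random.randn(3, 3), 0.1  # toy kernel and bias
h1 = layer(h0, k1, b1)
print(h1.shape)  # -> (15, 32): padding preserves size, pooling halves it
```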
Step 1.3: extract dense local features from the convolutional feature hierarchy using the convolutional neural network;
Convolving a convolution kernel with a given image extracts one kind of textual feature from the image; different convolution kernels extract different image features. In general, a convolutional layer is computed according to the formula:
σ(imgMat ∘ W + b);
where σ denotes the activation function, imgMat denotes the grayscale image matrix, W denotes the convolution kernel, ∘ denotes the convolution operation, and b denotes the bias.
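Each spatial position of a convolutional feature map then serves as one dense local feature, as step 1.5 later assumes. A minimal sketch with toy dimensions (the patent does not fix these values):

```python
import numpy as np

# A feature map of shape (h, w, m) yields one m-dimensional local descriptor
# per spatial position, i.e. h * w dense local features in total.
h, w, m = 7, 20, 256                 # toy sizes for illustration only
feature_map = np.random.rand(h, w, m)

descriptors = feature_map.reshape(-1, m)   # row i*w + j is the descriptor at (i, j)
print(descriptors.shape)                   # -> (140, 256)

# The descriptor at row i, column j of the map:
i, j = 2, 5
assert np.array_equal(descriptors[i * w + j], feature_map[i, j])
```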
Step 1.4: apply discriminative clustering to the dense local features extracted by the convolutional neural network, learning a discriminative codebook (discriminative codebook);
In this embodiment, the specific implementation of step 1.4 includes the following sub-steps:
Step 1.4.1: apply discriminative clustering to the dense local features extracted by the convolutional neural network;
Discriminative clustering is carried out separately within each language class and each feature level; the discovery set is the set of local features extracted at level l from pictures of class c, and the natural set is the set of local features extracted from pictures of all other classes;
Step 1.4.2: learn a discriminative codebook;
The discriminative codebook consists of a group of linear classifiers; each classifier is equivalent to a detector for a discriminative patch, and its output indicates the response of a discriminative pattern; distinguishing languages requires capturing local image regions with discriminative power, and such a region is also called a discriminative patch, i.e., a kind of discriminative pattern.
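One way to realize a single codebook entry — a linear classifier that responds to patches from the discovery set but not the natural set — is ordinary logistic regression on the two patch sets. This is a hedged sketch of the idea, not the patent's exact discriminative clustering procedure; all sizes and data are toy values.

```python
import numpy as np

def learn_discriminative_codeword(discovery, natural, steps=200, lr=0.1):
    """Learn one codebook entry: a linear detector (w, b) trained to respond
    positively on the discovery set (patches of the target language) and
    negatively on the natural set (patches of all other languages).
    Plain logistic regression via gradient descent; a simplified sketch."""
    X = np.vstack([discovery, natural])
    y = np.concatenate([np.ones(len(discovery)), np.zeros(len(natural))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid response
        grad = p - y                            # gradient of log loss w.r.t. logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
pos = rng.normal(+1.0, 0.5, size=(50, 8))   # toy "discovery set" features
neg = rng.normal(-1.0, 0.5, size=(50, 8))   # toy "natural set" features
w, b = learn_discriminative_codeword(pos, neg)
# The learned detector responds more strongly to discovery-set patches:
print((pos @ w + b).mean() > (neg @ w + b).mean())  # -> True
```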
Step 1.5: encode each dense local feature with the discriminative codebook, then fuse the encoding results of the dense local features to obtain a fixed-dimension vector representation of the full image, denoted the full-image descriptor;
In this embodiment, suppose the dense local feature map has shape w1 × h1 × n1, where w1, h1, n1 denote the three dimensions of the feature map shape; each position of the feature map corresponds to a local descriptor, so w × h local descriptors can be extracted from the feature map, each an m-dimensional vector; let h_l[i, j] denote the descriptor at row i, column j; this descriptor is encoded by the codebook containing k classes, yielding a k-dimensional vector z_l[i, j]:
z_l[i, j] = max(0, W_l x_l[i, j] + b_l);
where W_l x_l[i, j] + b_l is the response of the layer-l descriptor after encoding by the codebook (W_l, b_l), and max(0, ·) sets the negative responses to zero, so the encoding result is the non-negative response of the codebook;
The encoding results of the local descriptors are aggregated into the full-image descriptor by the horizontal spatial pyramid pooling (HSPP) operation; the local encoding values, denoted z_l[i, j], are divided into several sub-regions according to spatial position, and horizontal max pooling is performed within each region separately; horizontal max pooling selects, per dimension, the maximum encoding value in each row as the pooling result, i.e., hspp(z_l) = max_j z_l[i, j];
The spatial sub-regions are divided along the vertical direction of the feature map, i.e., the feature map is divided into several blocks of equal height whose width is consistent with the original feature map; horizontal spatial pooling by hspp(z_l) = max_j z_l[i, j] is carried out separately on each sub-map, and the resulting pooling results are concatenated to obtain the full-image descriptor.
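The encoding and pooling of step 1.5 can be sketched as follows. This follows one plausible reading of the HSPP description — each equal-height band is max-pooled down to a single k-dimensional vector, which makes the result independent of the input width; all shapes and codebook values are toy assumptions.

```python
import numpy as np

def encode(feature_map, W, b):
    """Codebook encoding of each local descriptor: z[i, j] = max(0, W x[i, j] + b),
    keeping only the non-negative responses of the codebook detectors."""
    h, w, m = feature_map.shape
    z = feature_map.reshape(-1, m) @ W.T + b       # linear codebook response
    return np.maximum(0, z).reshape(h, w, -1)

def hspp(z, num_bands=2):
    """Horizontal spatial pyramid pooling sketch: split the encoded map into
    equal-height horizontal bands, max-pool each band over its spatial extent,
    and concatenate the band results into one fixed-length vector."""
    bands = np.array_split(z, num_bands, axis=0)   # divide along the height
    return np.concatenate([band.max(axis=(0, 1)) for band in bands])

rng = np.random.default_rng(0)
fmap = rng.random((6, 20, 16))   # toy feature map: height 6, width 20, dim 16
W = rng.standard_normal((5, 16)) # toy codebook with k = 5 detectors
b = rng.standard_normal(5)
z = encode(fmap, W, b)
desc = hspp(z, num_bands=2)
print(desc.shape)  # -> (10,): fixed length regardless of the input width
```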
Step 1.6: build the language identification convolutional neural network model Disc CNN;
See Fig. 3. The Disc CNN of this embodiment contains one hidden layer with ReLU activation, and the output layer contains C nodes; the output layer activations are passed through SoftMax to obtain probability values over the C language classes; the discriminative mid-level encoding used by Disc CNN is learned as a linear operation followed by a ReLU, and is fine-tuned during end-to-end training of the overall model; the complete Disc CNN is an end-to-end trainable model, optimized globally with gradient descent;
Step 1.7: parameter migration and end-to-end optimization;
Through the backpropagation algorithm of the neural network, the error gradients of the later stages are fed back to the earlier stages, and gradient descent is then used to adjust the parameters of all stages simultaneously;
In this embodiment, the fine-tuning optimization algorithm of Disc CNN is stochastic gradient descent (SGD). The initial learning rate is set to 10^-3, the momentum to 0.9, and the batch size to 128. In fc1, the network uses dropout to avoid overfitting during training. The feature extraction network is pre-trained in a separate CNN model.
Disc CNN is implemented in the C++ and Python programming languages.
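The SGD-with-momentum update used for fine-tuning can be written out explicitly. The sketch below uses the hyper-parameters stated in the embodiment (learning rate 10^-3, momentum 0.9) but minimizes a toy quadratic stand-in for the network loss, since the full model is not reproduced here.

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=1e-3, momentum=0.9):
    """One SGD-with-momentum update, using the embodiment's hyper-parameters:
    v <- momentum * v - lr * grad;  param <- param + v."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

# Minimize f(x) = x^2 (gradient 2x) as a toy stand-in for the network loss:
x, v = np.array([5.0]), np.zeros(1)
for _ in range(2000):
    x, v = sgd_momentum_step(x, 2 * x, v)
print(float(abs(x)) < 0.1)  # -> True: the iterate approaches the minimum at 0
```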
Step 2: use the language identification convolutional neural network model Disc CNN to perform text language identification on pictures;
The Disc CNN network contains four convolutional layers in total: conv1, conv2, conv3, conv4; the heights of the output feature maps are 15, 7, 3, and 1, respectively, and the widths are determined by the input image width; discriminative patch discovery and the mid-level representation are carried out on the feature maps output by conv2, conv3, and conv4.
By applying the so-called discriminative convolutional neural network to scene text recognition, the present invention learns, through the discriminative clustering algorithm, a set of "discriminative patterns" from the deep convolutional features of an image, and finally classifies the global representation of the image with two fully connected layers.
The invention significantly improves on traditional image classification methods, which suffer from shortcomings when recognition is performed under varying conditions such as font, noise, and illumination. See Fig. 4, a schematic diagram of the test text recognition pictures of the embodiment; the figure shows samples from the collected data sets, largely pictures of natural scenes found on the web. The fonts, colors, typesetting, and writing styles of the text all vary strongly, and the backgrounds, frequently affected by illumination, camera angle, and the like, are relatively cluttered.
Table 1 below compares the accuracy of the present invention with that of conventional methods; the accuracy of the invention is clearly higher than that of the traditional methods.
Table 1
It should be understood that the parts not elaborated in this specification belong to the prior art; the above description of the preferred embodiment is relatively detailed and therefore cannot be considered a limitation on the protection scope of the invention patent. Those of ordinary skill in the art, under the inspiration of the present invention and without departing from the scope protected by the claims of the present invention, may also make substitutions or variations, which all fall within the protection scope of the present invention; the claimed scope of the invention is determined by the appended claims.
Claims (5)
1. A scene text language identification method based on discriminative convolutional neural networks, characterized by comprising the following steps:
Step 1: build the language identification convolutional neural network model Disc CNN;
Step 1.1: obtain a data set consisting of several pictures, each picture containing text in one language, with several written languages represented overall; divide the data set into a training set and a test set;
Step 1.2: crop the pictures in the training set to a preset size; from the cropped scene text image I, extract a convolutional feature hierarchy {h_l} (l = 1, …, L) with a convolutional neural network, where h_l denotes the feature map extracted at the l-th feature layer and L is the number of feature layers;
Step 1.3: extract dense local features from the convolutional feature hierarchy using the convolutional neural network;
Step 1.4: apply discriminative clustering to the dense local features extracted by the convolutional neural network, learning a discriminative codebook;
Step 1.5: encode each dense local feature with the discriminative codebook, then fuse the encoding results of the dense local features to obtain a fixed-dimension vector representation of the full image, denoted the full-image descriptor;
Step 1.6: build the language identification convolutional neural network model Disc CNN;
Disc CNN contains one hidden layer with ReLU activation, and the output layer contains C nodes; the output layer activations are passed through SoftMax to obtain probability values over the C language classes; the discriminative mid-level encoding used by Disc CNN is learned as a linear operation followed by a ReLU, and is fine-tuned during end-to-end training of the overall model; the complete Disc CNN is an end-to-end trainable model, optimized globally with gradient descent;
Step 1.7: parameter migration and end-to-end optimization;
Through the backpropagation algorithm of the neural network, the error gradients of the later stages are fed back to the earlier stages, and gradient descent is then used to adjust the parameters of all stages simultaneously;
Step 2: use the language identification convolutional neural network model Disc CNN to perform text language identification on pictures;
The Disc CNN network contains four convolutional layers in total: conv1, conv2, conv3, conv4; the heights of the output feature maps are 15, 7, 3, and 1, respectively, and the widths are determined by the input image width; discriminative patch discovery and the mid-level representation are carried out on the feature maps output by conv2, conv3, and conv4.
2. The scene text language identification method based on discriminative convolutional neural networks according to claim 1, characterized in that: in step 1.2, the feature hierarchy is extracted by a pre-trained convolutional neural network (CNN);
The input picture is first scaled to a fixed height and then fed into the CNN; the CNN extracts features by applying a series of convolution and max-pooling operations to the input picture;
Let k_l and b_l denote the convolution kernel and bias, respectively; the feature extraction process is then:
h_l = pool_max(σ(h_{l-1} * k_l + b_l));
where pool_max denotes the max-pooling operation, σ is any nonlinear activation function, * denotes the convolution operation, h_0 denotes the initial input picture, and the index l of h_l indexes the layer-l encoding of the input picture; the stride of all convolution operations is set to 1, the convolution kernel size is 3x3, and the boundary padding size is set to 1x1.
3. The scene text language identification method based on discriminative convolutional neural networks according to claim 1, characterized in that the specific implementation of step 1.4 includes the following sub-steps:
Step 1.4.1: apply discriminative clustering to the dense local features extracted by the convolutional neural network;
Discriminative clustering is carried out separately within each language class and each feature level; the discovery set is the set of local features extracted at level l from pictures of class c, and the natural set is the set of local features extracted from pictures of all other classes;
Step 1.4.2: learn a discriminative codebook;
The discriminative codebook consists of a group of linear classifiers; each classifier is equivalent to a detector for a discriminative patch, and its output indicates the response of a discriminative pattern; distinguishing languages requires capturing local image regions with discriminative power, and such a region is also called a discriminative patch, i.e., a kind of discriminative pattern.
4. The scene text language identification method based on discriminative convolutional neural networks according to claim 1, characterized in that: in step 1.5, suppose the dense local feature map has shape w1 × h1 × n1, where w1, h1, n1 denote the three dimensions of the feature map shape; each position of the feature map corresponds to a local descriptor, so w × h local descriptors can be extracted from the feature map, each an m-dimensional vector; let h_l[i, j] denote the descriptor at row i, column j; this descriptor is encoded by the codebook containing k classes, yielding a k-dimensional vector z_l[i, j] = max(0, W_l x_l[i, j] + b_l); where W_l x_l[i, j] + b_l is the response of the layer-l descriptor after encoding by the codebook (W_l, b_l), and max(0, ·) sets the negative responses to zero, so the encoding result is the non-negative response of the codebook.
5. The scene text language identification method based on discriminative convolutional neural networks according to claim 4, characterized in that: in step 1.5, the encoding results of the local descriptors are aggregated into the full-image descriptor by the horizontal spatial pyramid pooling (HSPP) operation; the local encoding values, denoted z_l[i, j], are divided into several sub-regions according to spatial position, and horizontal max pooling is performed within each region separately; horizontal max pooling selects, per dimension, the maximum encoding value in each row as the pooling result, i.e., hspp(z_l) = max_j z_l[i, j];
The spatial sub-regions are divided along the vertical direction of the feature map, i.e., the feature map is divided into several blocks of equal height whose width is consistent with the original feature map; horizontal spatial pooling by hspp(z_l) = max_j z_l[i, j] is carried out separately on each sub-map, and the resulting pooling results are concatenated to obtain the full-image descriptor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910759386.7A CN110533088A (en) | 2019-08-16 | 2019-08-16 | A kind of scene text Language Identification based on differentiated convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110533088A true CN110533088A (en) | 2019-12-03 |
Family
ID=68663497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910759386.7A Pending CN110533088A (en) | 2019-08-16 | 2019-08-16 | A kind of scene text Language Identification based on differentiated convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533088A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036296A (en) * | 2014-06-20 | 2014-09-10 | 深圳先进技术研究院 | Method and device for representing and processing image |
CN105917354A (en) * | 2014-10-09 | 2016-08-31 | 微软技术许可有限责任公司 | Spatial pyramid pooling networks for image processing |
CN105956517A (en) * | 2016-04-20 | 2016-09-21 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Motion identification method based on dense trajectory |
US20170011281A1 (en) * | 2015-07-09 | 2017-01-12 | Qualcomm Incorporated | Context-based priors for object detection in images |
CN109685152A (en) * | 2018-12-29 | 2019-04-26 | 北京化工大学 | A kind of image object detection method based on DC-SPP-YOLO |
Non-Patent Citations (1)
Title |
---|
SHI BAOGUANG: "Research on Natural Scene Text Detection and Recognition Methods Based on Deep Learning", China Doctoral Dissertations Full-text Database, Information Science and Technology Series * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105469047B (en) | Chinese detection method and system based on unsupervised learning deep learning network | |
CN108549893A (en) | A kind of end-to-end recognition methods of the scene text of arbitrary shape | |
CN105608454B (en) | Character detecting method and system based on text structure component detection neural network | |
CN107368831A (en) | English words and digit recognition method in a kind of natural scene image | |
CN107871101A (en) | A kind of method for detecting human face and device | |
CN110807422A (en) | Natural scene text detection method based on deep learning | |
Radwan et al. | Neural networks pipeline for offline machine printed Arabic OCR | |
CN108804397A (en) | A method of the Chinese character style conversion based on a small amount of target font generates | |
CN112686345B (en) | Offline English handwriting recognition method based on attention mechanism | |
CN105913053B (en) | A kind of facial expression recognizing method for singly drilling multiple features based on sparse fusion | |
CN111126404B (en) | Ancient character and font recognition method based on improved YOLO v3 | |
CN113762269B (en) | Chinese character OCR recognition method, system and medium based on neural network | |
Talukder et al. | Real-time bangla sign language detection with sentence and speech generation | |
Ahmed et al. | Bangladeshi sign language recognition using fingertip position | |
Malakar et al. | A holistic approach for handwritten Hindi word recognition | |
CN108664975A (en) | A kind of hand-written Letter Identification Method of Uighur, system and electronic equipment | |
CN108537109B (en) | OpenPose-based monocular camera sign language identification method | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
CN110517270A (en) | A kind of indoor scene semantic segmentation method based on super-pixel depth network | |
CN110348280A (en) | Water book character recognition method based on CNN artificial neural | |
Alghazo et al. | An online numeral recognition system using improved structural features–a unified method for handwritten Arabic and Persian numerals | |
CN109002771A (en) | A kind of Classifying Method in Remote Sensing Image based on recurrent neural network | |
Garg et al. | Optical character recognition using artificial intelligence | |
Rajnoha et al. | Handwriting comenia script recognition with convolutional neural network | |
Bankar et al. | Real time sign language recognition using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191203 |