CN106407971A - Text recognition method and device

Info

Publication number: CN106407971A
Application number: CN201610827063.3A
Authority: CN
Inventors: 杨松; 万韶华; 陈志军
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2016-09-14
Filing date: 2016-09-14
Publication date: 2017-02-15

Abstract

The invention provides a text recognition method and device. The method comprises: determining convolutional neural network (CNN) features of a text image to be recognized to obtain a feature vector sequence of a preset dimension; and decoding the feature vector sequence of the preset dimension with a recurrent neural network (RNN) to obtain the text in the text image to be recognized. By combining the CNN and the RNN, the technical scheme provided by embodiments of the invention recognizes the text image as a whole. This avoids the accumulated errors that arise in the prior art when text is segmented character by character and then recognized character by character, and can therefore improve the text recognition rate.

Description

Text recognition method and device

Technical field

The present disclosure relates to the technical field of text recognition, and in particular to a text recognition method and device.

Background

Text recognition in natural scenes is an important area of computer vision. It comprises two steps, text detection and text recognition: text detection locates the text regions in an image, and text recognition recognizes those text regions to obtain the corresponding text.

In the related art, the text in a text region is segmented character by character and the characters are then recognized one by one. Because each step of this process is independent, errors easily accumulate and the text recognition rate is reduced.

Summary of the invention

To overcome the problems in the related art, embodiments of the present disclosure provide a text recognition method and device that reduce the accumulated errors of text recognition and improve the text recognition rate.

According to a first aspect of the embodiments of the present disclosure, a text recognition method is provided, which may include:

determining convolutional neural network (CNN) features of a text image to be recognized to obtain a feature vector sequence of a preset dimension;

decoding the feature vector sequence of the preset dimension using a recurrent neural network (RNN) to obtain the text in the text image to be recognized.

In one embodiment, determining the CNN features of the text image to be recognized to obtain the feature vector sequence of the preset dimension may include:

dividing the text image to be recognized into N sliding windows, each window overlapping each of its adjacent windows by a width of M pixels, where N and M are positive integers;

determining the CNN features of each sliding window to obtain a feature vector sequence of the preset dimension with length N.

In one embodiment, determining the CNN features of each sliding window may include:

inputting the image of each sliding window into a CNN network;

controlling the CNN network to perform feature extraction and processing on the image of each sliding window using a preset number of convolutional layers and one fully connected layer, to obtain a feature vector of the preset dimension for each sliding window.

In one embodiment, decoding the feature vector sequence of the preset dimension using the RNN may include:

inputting the feature vector sequence of the preset dimension with length N into an RNN model in order;

controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N to obtain N corresponding recognition results.

In one embodiment, after controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N, the method may further include:

filtering out non-text invalid information from the N recognition results to obtain text information;

determining repeated characters in the text information according to the pixel width by which each window overlaps each adjacent window;

eliminating the repeated characters to obtain the valid text in the text image to be recognized.

In one embodiment, the method may further include:

converting an original text image to grayscale to obtain a grayscale image;

scaling the grayscale image to obtain a text image to be recognized whose height is a preset height and whose aspect ratio matches that of the original text image, and performing the step of determining the CNN features of the text image to be recognized on that image.

According to a second aspect of the embodiments of the present disclosure, a text recognition device is provided, which may include:

a feature extraction module configured to determine CNN features of a text image to be recognized to obtain a feature vector sequence of a preset dimension;

a decoding module configured to decode, using an RNN, the feature vector sequence of the preset dimension determined by the feature extraction module, to obtain the text in the text image to be recognized.

In one embodiment, the feature extraction module may include:

a segmentation submodule configured to divide the text image to be recognized into N sliding windows, each window overlapping each of its adjacent windows by a width of M pixels, where N and M are positive integers;

a determination submodule configured to determine the CNN features of each sliding window obtained by the segmentation submodule, to obtain a feature vector sequence of the preset dimension with length N.

In one embodiment, the determination submodule may include:

a first input submodule configured to input the image of each sliding window obtained by the segmentation submodule into a CNN network;

a processing submodule configured to control the CNN network to perform feature extraction and processing, using a preset number of convolutional layers and one fully connected layer, on the image of each sliding window input by the first input submodule, to obtain a feature vector of the preset dimension for each sliding window.

In one embodiment, the decoding module may include:

a second input submodule configured to input the feature vector sequence of the preset dimension with length N into an RNN model in order;

a decoding submodule configured to control the RNN to decode the feature vector sequence of the preset dimension with length N that is sequentially input by the second input submodule, to obtain N corresponding recognition results.

In one embodiment, the device may further include:

a filtering module configured to filter out non-text invalid information from the N recognition results obtained by the decoding submodule, to obtain text information;

a determining module configured to determine, according to the pixel width by which each window overlaps each adjacent window, the repeated characters in the text information obtained by the filtering module;

an elimination module configured to eliminate the repeated characters determined by the determining module, to obtain the valid text in the text image to be recognized.

In one embodiment, the device further includes:

a grayscale module configured to convert an original text image to grayscale to obtain a grayscale image;

a scaling module configured to scale the grayscale image obtained by the grayscale module to obtain a text image to be recognized whose height is a preset height and whose aspect ratio matches that of the original text image, the step of determining the CNN features of the text image to be recognized being performed on that image.

According to a third aspect of the embodiments of the present disclosure, a text recognition device is provided, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. Learning the convolutional neural network (CNN) features of the text image to be recognized through deep learning yields a feature vector sequence for the image, and a recurrent neural network (RNN), for example a long short-term memory (LSTM) network, can decode that sequence to obtain the text corresponding to the image. Combining the CNN and the RNN allows the text image to be recognized as a whole, which avoids the accumulated errors caused in the related art by segmenting text character by character and then recognizing the characters one by one, and can improve the text recognition rate. In addition, because convolutional neural networks have deep learning capability, the overall performance of the system can be effectively improved.

Moreover, extracting the CNN features of the text image to be recognized with sliding windows yields a sequence of text feature vectors, so that the RNN can decode and recognize each feature vector in the sequence to obtain a text recognition result for each sliding window. The RNN then filters out invalid information and repeated characters using its own memory function to obtain the valid text corresponding to the image. The whole process, from extracting the features of the text image to recognizing text from those features, can be trained end to end, which effectively eliminates the accumulated errors between separate steps.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the invention.

Fig. 1A is a flowchart of a text recognition method according to an exemplary embodiment.

Fig. 1B is a schematic diagram of a text image to be recognized according to an exemplary embodiment.

Fig. 2 is a flowchart of a text recognition method according to exemplary embodiment one.

Fig. 3A is a flowchart of determining the CNN features of a text image to be recognized according to exemplary embodiment two.

Fig. 3B is a flowchart of step 302 according to exemplary embodiment two.

Fig. 4 is a flowchart of decoding the feature vectors of the preset dimension using an RNN to obtain the corresponding text according to exemplary embodiment three.

Fig. 5 is a block diagram of a text recognition device according to an exemplary embodiment.

Fig. 6 is a block diagram of another text recognition device according to an exemplary embodiment.

Fig. 7 is a block diagram of yet another text recognition device according to an exemplary embodiment.

Fig. 8 is a block diagram of a device suitable for text recognition according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

Fig. 1A is a flowchart of a text recognition method according to an exemplary embodiment, and Fig. 1B is a schematic diagram of a text image to be recognized according to an exemplary embodiment. The text recognition method can be applied to an electronic device (for example, a smartphone, a tablet computer, or a personal computer). As shown in Fig. 1A, the method includes the following steps:

In step 101, the CNN features of the text image to be recognized are determined to obtain a feature vector sequence of a preset dimension.

In one embodiment, the text image to be recognized may be a single-line text image and may contain multiple characters.

In one embodiment, referring to Fig. 1B, the text image to be recognized contains the eight characters "全民全运和谐中国" (roughly, "National Games for all the people, harmonious China").

In one embodiment, the method of determining the CNN features of the text image to be recognized is described in the embodiment of Fig. 3A and is not detailed here.

In step 102, the feature vector sequence of the preset dimension is decoded using an RNN to obtain the text in the text image to be recognized.

In one embodiment, an LSTM network, which is a type of RNN, may be used to decode the feature vectors of the preset dimension. An LSTM network has a memory function and can also control the direction of information flow through gate functions.
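For reference, the gate functions mentioned above are usually written as follows. This is the common textbook LSTM formulation, given here as an assumption; the source does not state which LSTM variant is used. The symbols are not from the source: $x_t$ is the feature vector of the $t$-th sliding window, $h_t$ the hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The memory cell $c_t$ carries information from earlier windows forward, and the three gates decide how much of it is kept, overwritten, or exposed at each step.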

In one embodiment, the LSTM network may decode each feature vector in the feature vector sequence to obtain the recognition result corresponding to that feature vector. A recognition result may be a character or a space.

In one embodiment, the method of decoding the feature vectors of the preset dimension using the RNN to obtain the corresponding text is described in the embodiment of Fig. 4 and is not detailed here.

In this embodiment, learning the CNN features of the text image to be recognized through deep learning yields a feature vector sequence for the image, and an RNN such as an LSTM can decode that sequence to obtain the text corresponding to the image. Combining the CNN and the RNN allows the text image to be recognized as a whole, which avoids the accumulated errors caused in the related art by segmenting text character by character and then recognizing the characters one by one, and can improve the text recognition rate. In addition, because convolutional neural networks have deep learning capability, the overall performance of the system can be effectively improved.

In one embodiment, determining the CNN features of the text image to be recognized to obtain the feature vector sequence of the preset dimension may include:

dividing the text image to be recognized into N sliding windows, each window overlapping each of its adjacent windows by a width of M pixels, where N and M are positive integers;

determining the CNN features of each sliding window to obtain a feature vector sequence of the preset dimension with length N.

In one embodiment, determining the CNN features of each sliding window may include:

inputting the image of each sliding window into a CNN network;

controlling the CNN network to perform feature extraction and processing on the image of each sliding window using a preset number of convolutional layers and one fully connected layer, to obtain a feature vector of the preset dimension for each sliding window.

In one embodiment, decoding the feature vector sequence of the preset dimension using the RNN may include:

inputting the feature vector sequence of the preset dimension with length N into an RNN model in order;

controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N to obtain N corresponding recognition results.

In one embodiment, after controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N, the method may further include:

filtering out non-text invalid information from the N recognition results to obtain text information;

determining repeated characters in the text information according to the pixel width by which each window overlaps each adjacent window;

eliminating the repeated characters to obtain the valid text in the text image to be recognized.

In one embodiment, the method may further include:

scaling a grayscale image of the original text image to obtain a text image to be recognized whose height is a preset height and whose aspect ratio matches that of the original text image, and performing the step of determining the CNN features of the text image to be recognized on that image.

For details of how the text is recognized, refer to the following embodiments.

The technical solutions provided by the embodiments of the present disclosure are described below with specific embodiments.

Fig. 2 is a flowchart of a text recognition method according to exemplary embodiment one. This embodiment uses the above method provided by the embodiments of the present disclosure and takes as an example how an electronic device recognizes the text in a text image to be recognized. As shown in Fig. 2, the method includes the following steps:

In step 201, the original text image is converted to grayscale to obtain a grayscale image.

In step 202, the grayscale image is scaled to obtain the text image to be recognized.

In one embodiment, the technical solution of the present disclosure is implemented with a trained CNN model and a trained RNN model. To ensure that the trained CNN model extracts the feature vector of each character well, a grayscale image of fixed size, for example 32 pixels * 32 pixels, may be input to the CNN model.

In one embodiment, the original text image may be scaled to an image whose height is a preset height, for example 32 pixels, with the aspect ratio of the text image to be recognized kept the same as that of the original image.
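Steps 201 and 202 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation described in the source: the function names are hypothetical, the BT.601 luma weights and the nearest-neighbour resampling are choices made here because the source names neither a grayscale formula nor an interpolation method.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    # BT.601 luma weights: an assumption, the source gives no formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def scaled_size(width, height, target_height=32):
    """Return (new_width, new_height) that keeps the width-to-height ratio."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def resize_nearest(gray, target_height=32):
    """Nearest-neighbour resize of a gray image to the preset height."""
    src_h, src_w = len(gray), len(gray[0])
    new_w, new_h = scaled_size(src_w, src_h, target_height)
    return [[gray[y * src_h // new_h][x * src_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]


# A 560 x 64 original becomes 280 x 32, so the aspect ratio is unchanged.
print(scaled_size(560, 64))  # prints: (280, 32)
```

Only the height is fixed, so the width of the scaled image varies with the length of the text line; the sliding windows in the later steps absorb that variation.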

In step 203, the CNN features of the text image to be recognized are determined to obtain a feature vector sequence of a preset dimension.

In one embodiment, the text image to be recognized is divided along its length into N sliding windows, each the size of the CNN model's input, so that the feature vector sequence of the text image is extracted window by window.

In step 204, the feature vector sequence of the preset dimension is decoded using an RNN to obtain the text in the text image to be recognized.

In one embodiment, because each line of text can be regarded as a time-series signal, an RNN model, which handles sequential information well, can decode the feature vector sequence obtained from the text image to be recognized and produce the text in the image.

In this embodiment, scaling the original image to a text image of the preset height lets the CNN model extract features from input of a fixed size, which improves the accuracy of the text feature vectors and in turn the accuracy of the text decoded by the subsequent RNN.

Fig. 3A is a flowchart of determining the CNN features of a text image to be recognized according to exemplary embodiment two, and Fig. 3B is a flowchart of step 302 according to exemplary embodiment two. This embodiment uses the above method provided by the embodiments of the present disclosure and takes as an example how an electronic device determines the CNN features of the text image to be recognized. As shown in Fig. 3A, the method includes the following steps:

In step 301, the text image to be recognized is divided into N sliding windows, each window overlapping each of its adjacent windows by a width of M pixels, where N and M are positive integers.

In one embodiment, the size of each sliding window may be set equal to the input size of the CNN model; for example, each sliding window may be 32 pixels * 32 pixels.

In one embodiment, each sliding window may overlap each adjacent window by a width of M pixels. For example, if the text image to be recognized is 128 pixels long, each sliding window is 32 pixels long, and each sliding window overlaps its adjacent window by 28 pixels, that is, the sliding stride is 4 pixels, then the text image to be recognized contains 25 sliding windows.
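The window count in this example follows from the image length, the window length, and the stride. A minimal sketch, assuming only full-width windows are used and the first window starts at the left edge (the function name `window_starts` is hypothetical):

```python
def window_starts(image_width, window_width=32, stride=4):
    """Left edges of all full-width windows that fit inside the image."""
    if image_width < window_width:
        return []
    return list(range(0, image_width - window_width + 1, stride))


starts = window_starts(128, window_width=32, stride=4)
overlap = 32 - 4  # M: pixels shared by two neighbouring windows
print(len(starts), overlap)  # prints: 25 28
```

Equivalently, N = (length - window length) / stride + 1 = (128 - 32) / 4 + 1 = 25, and M = window length - stride = 28.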

In step 302, the CNN features of each sliding window are determined to obtain a feature vector sequence of the preset dimension with length N.

In one embodiment, determining the CNN features of each sliding window is described in the embodiment of Fig. 3B. As shown in Fig. 3B, it includes the following steps:

In step 311, the image of each sliding window is input into the CNN network.

In one embodiment, the sliding windows may be input into the CNN network in order.

In step 312, the CNN network is controlled to perform feature extraction and processing on the image of each sliding window using a preset number of convolutional layers and one fully connected layer, to obtain a feature vector of the preset dimension for each sliding window.

In one embodiment, the CNN network consists of a preset number of convolutional layers, for example 5, followed by one fully connected layer. After the image of a sliding window is input into the CNN network, the first convolutional layer processes the image to obtain a first convolution feature, the first convolution feature is input into the second convolutional layer to obtain a second convolution feature, and so on. After every convolutional layer has been applied, the resulting convolution features are input into the fully connected layer to obtain a feature vector of the preset dimension. In one embodiment, the preset dimension is the dimension of the text features extracted by the CNN model, for example a 128-dimensional feature vector.
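To make the layer stack concrete, the sketch below traces tensor shapes through five convolutional layers and one fully connected layer for a 32 * 32 window. Only the layer count, the input size, and the 128-dimensional output come from the text; the 3x3 kernels, the 2x2 pooling after each convolution, and the channel counts in `LAYERS` are hypothetical choices made purely for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical plan: output channels of each of the five convolutional layers.
LAYERS = [16, 32, 64, 128, 128]


def feature_shapes(side=32, feature_dim=128):
    """Return per-layer (channels, height, width) and the FC (in, out) sizes."""
    shapes = []
    channels = 1  # grayscale input
    for out_channels in LAYERS:
        side = conv_output_size(side, kernel=3, stride=1, padding=1)  # 3x3 conv
        side = conv_output_size(side, kernel=2, stride=2)             # 2x2 pool
        channels = out_channels
        shapes.append((channels, side, side))
    fc_inputs = channels * side * side
    return shapes, (fc_inputs, feature_dim)


shapes, fc = feature_shapes()
print(shapes[-1], fc)  # prints: (128, 1, 1) (128, 128)
```

Under these assumed hyperparameters each window is reduced to a single 128-channel position, which the fully connected layer maps to the 128-dimensional feature vector. A real model could reach the same output dimension with quite different layer settings.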

In this embodiment, extracting the CNN features of the text image to be recognized with sliding windows yields a sequence of text feature vectors, so that the RNN can decode and recognize each feature vector in the sequence to obtain a text recognition result for each sliding window. The RNN then filters out invalid information and repeated characters using its own memory function to obtain the valid text corresponding to the image. The whole process, from extracting the features of the text image to recognizing text from those features, can be trained end to end, which effectively eliminates the accumulated errors between separate steps.

Fig. 4 is a flowchart of decoding the feature vectors of the preset dimension using an RNN to obtain the corresponding text according to exemplary embodiment three. This embodiment uses the above method provided by the embodiments of the present disclosure and, with reference to Fig. 1B, takes as an example how an electronic device decodes the feature vector sequence of the preset dimension to obtain the corresponding text. As shown in Fig. 4, the method includes the following steps:

In step 401, the feature vector sequence of the preset dimension with length N is input into the RNN model in order.

In step 402, the RNN is controlled to decode the sequentially input feature vector sequence of the preset dimension with length N to obtain N corresponding recognition results.

In one embodiment, the RNN in the present disclosure may be an LSTM model, which takes a feature vector of the preset dimension as input and outputs the recognition result corresponding to that feature vector. In one embodiment, a recognition result may be a character, a space, background, and so on.

In one embodiment, referring to Fig. 1B, assume that the text image to be recognized shown in Fig. 1B is 280 pixels long and 32 pixels high, the sliding window is 32 pixels * 32 pixels, and the sliding stride is 1 pixel. Then 248 sliding windows can be obtained, and adjacent windows overlap almost entirely, so the recognition results of adjacent windows are very likely to be identical. For example, in Fig. 1B the recognition result of the first sliding window is "全", and the recognition results of the 2nd to 5th sliding windows may also be "全", whereas the 6th to 32nd sliding windows contain part of the character "全" and part of the character "民", so their recognition results may be invalid characters.

In step 403, non-text invalid information in the N recognition results is filtered out to obtain text information.

In one embodiment, the invalid information in the N recognition results, such as spaces and background, may be filtered out first.

In step 404, repeated characters in the text information are determined according to the pixel width by which each window overlaps each adjacent window.

In one embodiment, the LSTM has a memory function, and connectionist temporal classification (CTC) may be used to filter out repeated characters. In one embodiment, the pixel width by which each window overlaps each adjacent window makes it possible to identify the repeated characters in the text information that result from misrecognition while keeping the characters that genuinely repeat in the text image to be recognized. For example, suppose the recognition result for Fig. 1B is "全全全全…民民民民…全全全全…运运运运运…和和和和和…谐谐谐谐…中中中中…国国国国", where each ellipsis stands for spaces or other non-text invalid characters. From the pixel width by which each window overlaps each adjacent window, it can be determined that the consecutive repeated characters are repetitions caused by misrecognition.

In step 405, the repeated characters are eliminated to obtain the valid text in the text image to be recognized.

In one embodiment, in the example above, filtering out the invalid information and the repeated information yields the correct recognition result "全民全运和谐中国".
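The effect of steps 403 to 405 on this example can be sketched with the usual greedy CTC collapsing rule. Several points here are assumptions rather than the source's procedure: the Fig. 1B text is taken to be "全民全运和谐中国", `BLANK` is a hypothetical marker standing for any space or background result, and the sketch merges runs of identical results before dropping blanks (the standard CTC order), whereas the text describes filtering invalid information first and then using the overlap width to decide which repeats to remove.

```python
from itertools import groupby

BLANK = "_"  # hypothetical marker for a space/background recognition result


def collapse(results, blank=BLANK):
    """Merge runs of identical per-window results, then drop invalid ones."""
    merged = [label for label, _ in groupby(results)]
    return "".join(label for label in merged if label != blank)


# One recognition result per sliding window, following the example above.
per_window = list("全全全全_民民民民_全全全全_运运运运运_和和和和和_谐谐谐谐_中中中中_国国国国")
print(collapse(per_window))  # prints: 全民全运和谐中国
```

Merging before dropping blanks means a character that truly occurs twice in a row survives as long as at least one blank result separates the two runs. A method that filters first has to rely on other information, such as the overlap width, to keep such genuine repeats.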

In this embodiment, the RNN can decode and recognize each feature vector in the feature vector sequence produced by the CNN to obtain a text recognition result for each sliding window, and can filter out invalid information and repeated characters using its own memory function to obtain the valid text corresponding to the text image to be recognized. The whole process, from extracting the features of the text image to recognizing text from those features, can be trained end to end, which effectively eliminates the accumulated errors between separate steps.

Corresponding to the foregoing embodiments of the text recognition method, the present disclosure also provides embodiments of a text recognition device and of the electronic device to which it is applied.

Fig. 5 is a block diagram of a text recognition device according to an exemplary embodiment. As shown in Fig. 5, the text recognition device includes:

a feature extraction module 510 configured to determine CNN features of a text image to be recognized to obtain a feature vector sequence of a preset dimension;

a decoding module 520 configured to decode, using an RNN, the feature vector sequence of the preset dimension determined by the feature extraction module 510, to obtain the text in the text image to be recognized.
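The two-module structure of Fig. 5 can be pictured as a simple composition. The sketch below is hypothetical throughout: the class name, the type aliases, and the toy stand-in functions are illustrative only, and a real device would wrap a trained CNN and a trained RNN in place of the two lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]  # rows of gray levels
FeatureVector = List[float]


@dataclass
class TextRecognitionDevice:
    # Stands in for feature extraction module 510 (CNN over sliding windows).
    extract_features: Callable[[Image], List[FeatureVector]]
    # Stands in for decoding module 520 (RNN decoding of the sequence).
    decode: Callable[[List[FeatureVector]], str]

    def recognize(self, image: Image) -> str:
        return self.decode(self.extract_features(image))


# Toy stand-ins so the sketch runs: one "feature" per image column.
device = TextRecognitionDevice(
    extract_features=lambda image: [[float(sum(col))] for col in zip(*image)],
    decode=lambda features: "".join("1" if f[0] > 0 else "0" for f in features),
)
print(device.recognize([[0, 5, 0], [0, 1, 0]]))  # prints: 010
```

Keeping the two modules behind plain callables mirrors the block diagram: the decoding module only ever sees the feature vector sequence produced by the feature extraction module.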

Fig. 6 is a block diagram of another text recognition device according to an exemplary embodiment. As shown in Fig. 6, on the basis of the embodiment shown in Fig. 5, in one embodiment the feature extraction module 510 may include:

a segmentation submodule 511 configured to divide the text image to be recognized into N sliding windows, each window overlapping each of its adjacent windows by a width of M pixels, where N and M are positive integers;

a determination submodule 512 configured to determine the CNN features of each sliding window obtained by the segmentation submodule 511, to obtain a feature vector sequence of the preset dimension with length N.

In one embodiment, the determination submodule 512 may include:

a first input submodule 5121 configured to input the image of each sliding window obtained by the segmentation submodule into a CNN network;

a processing submodule 5122 configured to control the CNN network to perform feature extraction and processing, using a preset number of convolutional layers and one fully connected layer, on the image of each sliding window input by the first input submodule 5121, to obtain a feature vector of the preset dimension for each sliding window.

In one embodiment, the decoding module 520 may include:

a second input submodule 521 configured to input the feature vector sequence of the preset dimension with length N into an RNN model in order;

a decoding submodule 522 configured to control the RNN to decode the feature vector sequence of the preset dimension with length N that is sequentially input by the second input submodule 521, to obtain N corresponding recognition results.

Fig. 7 is a block diagram of yet another text recognition device according to an exemplary embodiment. As shown in Fig. 7, on the basis of the embodiment shown in Fig. 5 or Fig. 6, in one embodiment the device may further include:

a filtering module 530, configured to filter out invalid non-text information from the N recognition results obtained by the decoding submodule 522, to obtain text information;

a determining module 540, configured to determine duplicate characters in the text information obtained by the filtering module 530 according to the pixel width by which each window overlaps each adjacent window;

an elimination module 550, configured to eliminate the duplicate characters determined by the determining module 540, to obtain the valid text in the text image to be recognized.
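The filtering and duplicate elimination can be sketched as follows. The rule used here to decide that a character is a duplicate (the same character recognized in the directly following window) is a simplifying assumption standing in for the overlap-width criterion, whose exact form is not given in this passage. The marker "-" for invalid non-text results and the name `merge_results` are likewise hypothetical.

```python
BLANK = "-"  # hypothetical marker for an invalid, non-text recognition result


def merge_results(results, blank=BLANK):
    """Filter non-text results, then drop characters repeated by adjacent windows."""
    # Step 1: filtering. Keep the window index so adjacency can still be checked.
    kept = [(i, c) for i, c in enumerate(results) if c != blank]

    # Step 2: determine and eliminate duplicates. Assumption: because adjacent
    # windows overlap, the same character seen in two consecutive windows is
    # treated as one occurrence.
    text = []
    prev_index, prev_char = None, None
    for i, c in kept:
        is_duplicate = prev_char == c and i - prev_index == 1
        if not is_duplicate:
            text.append(c)
        prev_index, prev_char = i, c
    return "".join(text)


merged = merge_results(["-", "h", "h", "-", "e", "l", "l", "-", "l", "o", "-"])
print(merged)  # hello
```

In this sketch a genuinely doubled letter survives only when a non-text result separates its two occurrences, as with the two l's in the example.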

In one embodiment, the device further includes:

a grayscale module 560, configured to perform grayscale processing on the original text image, to obtain a grayscale image;

a scaling module 570, configured to scale the grayscale image obtained by the grayscale module 560, to obtain a text image to be recognized that has the preset height and whose length-to-height ratio is consistent with that of the original text image to be recognized, the step of determining the CNN features of the text image to be recognized being executed on the basis of this text image to be recognized.
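The preprocessing performed by the grayscale module 560 and the scaling module 570 can be sketched as below. The luminance weights (the common ITU-R BT.601 coefficients) and nearest-neighbour resampling are assumptions, since the disclosure does not specify either, and the function names are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_image
    ]


def scale_to_height(gray, target_height):
    """Nearest-neighbour resize to target_height, keeping the length-to-height ratio."""
    src_h, src_w = len(gray), len(gray[0])
    target_width = max(1, round(src_w * target_height / src_h))
    return [
        [gray[y * src_h // target_height][x * src_w // target_width]
         for x in range(target_width)]
        for y in range(target_height)
    ]


# Hypothetical input: a 4 x 8 white RGB image scaled to a preset height of 2.
original = [[(255, 255, 255)] * 8 for _ in range(4)]
scaled = scale_to_height(to_gray(original), target_height=2)
print(len(scaled), len(scaled[0]))  # height and width after scaling
```

Fixing the height while preserving the length-to-height ratio lets a fixed-size sliding window be applied to images of arbitrary original size.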

For the implementation of the functions and effects of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.

As the device embodiments substantially correspond to the method embodiments, refer to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. Those of ordinary skill in the art can understand and implement it without creative effort.

Fig. 8 is a block diagram of a device suitable for text recognition according to an exemplary embodiment. For example, the device 800 may be an electronic device such as a tablet computer or a smartphone.

Referring to Fig. 8, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, messages, pictures, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 supplies power to the various components of the device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the on/off state of the device 800 and the relative positioning of components, for example the display and keypad of the device 800. The sensor component 814 can also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a distance sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, which can be executed by the processor 820 of the device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Other embodiments of the disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the disclosure is not limited to the precise structure described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.

Claims

1. A text recognition method, characterized in that the method comprises:

decoding the feature vector sequence of the preset dimension using a recurrent neural network (RNN), to obtain the text in the text image to be recognized.

2. The method according to claim 1, characterized in that determining the convolutional neural network (CNN) features of the text image to be recognized to obtain the feature vector sequence of the preset dimension comprises:

dividing the text image to be recognized into N sliding windows, each window overlapping each adjacent window by a width of M pixels, where N and M are positive integers;

3. The method according to claim 2, characterized in that determining the CNN features of each sliding window comprises:

inputting the image of each sliding window into a CNN network;

controlling the CNN network to use a preset number of convolutional layers and one fully connected layer to perform feature extraction and processing on the image of each sliding window, to obtain a feature vector of the preset dimension corresponding to each sliding window.

4. The method according to claim 2, characterized in that decoding the feature vectors of the preset dimension using the recurrent neural network (RNN) comprises:

controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N, to obtain N corresponding recognition results.

5. The method according to claim 4, characterized in that, after controlling the RNN to decode the sequentially input feature vector sequence of the preset dimension with length N, the method further comprises:

determining duplicate characters in the text information according to the pixel width by which each window overlaps each adjacent window;

6. The method according to claim 1, characterized in that the method further comprises:

scaling the grayscale image to obtain a text image to be recognized that has the preset height and whose length-to-height ratio is consistent with that of the original text image to be recognized, and executing, on the basis of the text image to be recognized, the step of determining the CNN features of the text image to be recognized.

7. A text recognition device, characterized in that the device comprises:

a feature extraction module, configured to determine convolutional neural network (CNN) features of a text image to be recognized, to obtain a feature vector sequence of a preset dimension;

a decoding module, configured to decode, using a recurrent neural network (RNN), the feature vector sequence of the preset dimension determined by the feature extraction module, to obtain the text in the text image to be recognized.

8. The device according to claim 7, characterized in that the feature extraction module comprises:

a segmentation submodule, configured to divide the text image to be recognized into N sliding windows, each window overlapping each adjacent window by a width of M pixels, where N and M are positive integers;

a determination submodule, configured to determine the CNN features of each sliding window obtained by the segmentation submodule, to obtain a feature vector sequence of the preset dimension with length N.

9. The device according to claim 8, characterized in that the determination submodule comprises:

a processing submodule, configured to control the CNN network to use a preset number of convolutional layers and one fully connected layer to perform feature extraction and processing on the image of each sliding window input by the first input submodule, to obtain a feature vector of the preset dimension corresponding to each sliding window.

10. The device according to claim 8, characterized in that the decoding module comprises:

a second input submodule, configured to sequentially input the feature vector sequence of the preset dimension with length N into an RNN model;

a decoding submodule, configured to control the RNN to decode the feature vector sequence of the preset dimension with length N sequentially input by the second input submodule, to obtain N corresponding recognition results.

11. The device according to claim 10, characterized in that the device further comprises:

a filtering module, configured to filter out invalid non-text information from the N recognition results obtained by the decoding submodule, to obtain text information;

a determining module, configured to determine duplicate characters in the text information obtained by the filtering module according to the pixel width by which each window overlaps each adjacent window;

an elimination module, configured to eliminate the duplicate characters determined by the determining module, to obtain the valid text in the text image to be recognized.

12. The device according to claim 7, characterized in that the device further comprises:

a scaling module, configured to scale the grayscale image obtained by the grayscale module, to obtain a text image to be recognized that has the preset height and whose length-to-height ratio is consistent with that of the original text image to be recognized, the step of determining the CNN features of the text image to be recognized being executed on the basis of the text image to be recognized.

13. A device for controlling text recognition, characterized in that the device comprises:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: