CN108062547A - Character detecting method and device - Google Patents

Character detecting method and device

Info

Publication number
CN108062547A
Authority
CN
China
Prior art keywords
candidate region
probability
word
mask
word candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711332870.9A
Other languages
Chinese (zh)
Other versions
CN108062547B (en)
Inventor
杨松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201711332870.9A priority Critical patent/CN108062547B/en
Publication of CN108062547A publication Critical patent/CN108062547A/en
Application granted granted Critical
Publication of CN108062547B publication Critical patent/CN108062547B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a text detection method and device. The method includes: performing text candidate region extraction on an image to obtain multiple text candidate regions; calculating the text probability of each of the multiple text candidate regions, and calculating the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement; and calculating the minimum bounding box of the mask map to obtain the text detection result. Because the disclosure extracts multiple text regions by means of candidate region extraction, computes the mask map of the text region of every candidate region whose text probability meets the probability threshold requirement, and the mask map covers exactly where the text region lies, the text detection result obtained from the minimum bounding box of the mask map corresponds accurately to the position of the text whether or not the text is tilted. The text detection method provided by the embodiments of the disclosure can therefore handle text in various layouts and effectively improves the recall of text.

Description

Character detecting method and device
Technical field
The disclosure relates to the field of computers, and in particular to a text detection method and device.
Background technology
Text detection refers to finding the positions of text in an image.
In the related art, text detection is generally implemented with object detection methods: a number of parallel, axis-aligned rectangular boxes are used, and the content inside each rectangular box is classified as text or non-text. However, this kind of detection performs poorly on irregular text, for example text laid out at a slant.
Summary of the invention
To overcome the problems in the related art, the disclosure provides a text detection method and device.
According to a first aspect of the embodiments of the disclosure, a text detection method is provided, including: performing text candidate region extraction on an image to obtain multiple text candidate regions; calculating the text probability of each of the multiple text candidate regions, and calculating the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement; and calculating the minimum bounding box of the mask map to obtain a text detection result.
According to one possible implementation of the first aspect of the embodiments of the disclosure, calculating the text probability of the multiple text candidate regions and calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement includes: using a convolutional neural network to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement.
According to one possible implementation of the first aspect of the embodiments of the disclosure, calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement includes: selecting, from the multiple text candidate regions, the text candidate regions whose text probability meets the probability threshold requirement; and calculating the mask map of the text region of each selected text candidate region.
According to one possible implementation of the first aspect of the embodiments of the disclosure, calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement includes: calculating the mask map of the text region of each of the multiple text candidate regions; and selecting, from the mask maps of the text regions of the multiple text candidate regions, the mask maps of the text candidate regions whose probability meets the probability threshold requirement.
According to one possible implementation of the first aspect of the embodiments of the disclosure, calculating the mask map of the text region of each of the multiple text candidate regions includes: performing feature extraction on the image with a first convolutional neural network to obtain a feature map, where the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction; mapping each of the multiple text candidate regions onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions; mapping the feature region corresponding to each of the multiple text candidate regions to a feature vector of fixed size; and inputting the fixed-size feature vectors corresponding to the multiple text candidate regions into a second convolutional neural network to calculate the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions, where the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
According to a second aspect of the embodiments of the disclosure, a text detection device is provided, including: a region extraction module configured to perform text candidate region extraction on an image to obtain multiple text candidate regions; a first computing module configured to calculate the text probability of each of the multiple text candidate regions obtained by the region extraction module and to calculate the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement; and a second computing module configured to calculate the minimum bounding box of the mask map calculated by the first computing module to obtain a text detection result.
According to one possible implementation of the second aspect of the embodiments of the disclosure, the first computing module is configured to use a convolutional neural network to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement.
According to one possible implementation of the second aspect of the embodiments of the disclosure, the first computing module includes: a probability calculation submodule configured to calculate the text probability of the multiple text candidate regions; a probability screening submodule configured to select, from the multiple text candidate regions, the text candidate regions whose text probability meets the probability threshold requirement; and a mask map calculation submodule configured to calculate the mask map of the text region of each selected text candidate region.
According to one possible implementation of the second aspect of the embodiments of the disclosure, the first computing module includes: a probability and mask map calculation submodule configured to calculate the text probability and the mask map of the text region of each of the multiple text candidate regions; and a probability and mask map screening submodule configured to select, from the mask maps of the text regions of the multiple text candidate regions, the mask maps of the text candidate regions whose probability meets the probability threshold requirement.
According to one possible implementation of the second aspect of the embodiments of the disclosure, the probability and mask map calculation submodule includes: a first convolution calculation submodule configured to perform feature extraction on the image with a first convolutional neural network to obtain a feature map, where the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction; a feature region mapping submodule configured to map each of the multiple text candidate regions onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions; a vector mapping submodule configured to map the feature region corresponding to each of the multiple text candidate regions to a feature vector of fixed size; and a second convolution calculation submodule configured to input the fixed-size feature vectors corresponding to the multiple text candidate regions into a second convolutional neural network to calculate the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions, where the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
According to a third aspect of the embodiments of the disclosure, a text detection device is provided, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to: perform text candidate region extraction on an image to obtain multiple text candidate regions; calculate the text probability of each of the multiple text candidate regions, and calculate the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement; and calculate the minimum bounding box of the mask map to obtain a text detection result.
According to a fourth aspect of the embodiments of the disclosure, a computer-readable storage medium is provided on which computer program instructions are stored; when the program instructions are executed by a processor, the steps of the method in any possible implementation of the first aspect of the embodiments of the disclosure are implemented.
The technical solutions provided by the embodiments of the disclosure may include the following beneficial effects: the disclosure extracts multiple text regions by means of text candidate region extraction and calculates the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement; because the mask map covers exactly where the text region lies, the text detection result obtained from the minimum bounding box of the mask map corresponds accurately to the position of the text whether or not the text is tilted. The text detection method provided by the embodiments of the disclosure can therefore handle text in various layouts and effectively improves the recall of text.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a text detection method according to an exemplary embodiment.
Fig. 2 is a schematic diagram of an image according to an exemplary embodiment.
Fig. 3 is a flowchart of a text detection method according to another exemplary embodiment.
Fig. 4 is a block diagram of a text detection device according to an exemplary embodiment.
Fig. 5 is a block diagram of a text detection device according to another exemplary embodiment.
Fig. 6 is a block diagram of a text detection device according to yet another exemplary embodiment.
Fig. 7 is a block diagram of a text detection device according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a text detection method according to an exemplary embodiment. As shown in Fig. 1, the text detection method may include the following steps.
In step 110, text candidate region extraction is performed on the image to obtain multiple text candidate regions.
For example, the number n of text candidate regions may range from hundreds to thousands. Common candidate-region extraction methods include Selective Search, RPN (region proposal network), and the like. Each candidate region is described by parameters such as its horizontal and vertical coordinates and its width and height.
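As a purely illustrative sketch (not part of the original disclosure), the following Python snippet shows one common way to obtain such candidate boxes with OpenCV's Selective Search implementation; the opencv-contrib-python package, the image path, and the number of proposals kept are all assumptions of the example.

```python
import cv2

# Load the input image (the path is an assumption for this sketch).
image = cv2.imread("image.jpg")

# Selective Search from opencv-contrib proposes candidate boxes (x, y, w, h).
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()  # ndarray of shape (n, 4); each row is (x, y, w, h)

# Keep, say, the first couple of thousand proposals as text candidate regions.
candidate_regions = rects[:2000]
print(len(candidate_regions), "candidate regions")
```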
In step 120, the text probability of each of the multiple text candidate regions is calculated, and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement is calculated.
For example, in one possible implementation, a convolutional neural network can be used to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement. In this implementation, the strengths of convolutional neural networks can be exploited to achieve fast and accurate text detection.
It should be noted that the disclosure does not limit the order in which the text probabilities and the mask maps are computed.
For example, in one possible implementation, the text probability of the multiple text candidate regions can be calculated first, the text candidate regions whose text probability meets the probability threshold requirement are then selected from the multiple text candidate regions, and the mask map of the text region is calculated only for the selected text candidate regions. This implementation reduces the amount of mask map computation.
As another example, in another possible implementation, the text probabilities and the mask maps of the multiple text candidate regions are computed in no particular order: after the mask maps of the text regions of the multiple text candidate regions have been calculated, the mask maps of the text candidate regions whose probability meets the probability threshold requirement are selected from them. In this implementation the text probabilities and the mask maps can be computed at the same time, so detection is fast.
In step 130, the minimum bounding box of the mask map is calculated to obtain the text detection result.
For example, the minimum enclosing rectangle of the mask map can be calculated. In the schematic image shown in Fig. 2, the minimum enclosing rectangle 210 of the mask map corresponding to the region of the word "STOP" is the text detection result, and it accurately frames the position of the text.
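For illustration only, here is a minimal Python sketch of this step under the assumption that the mask map is available as a binary (0/255) NumPy array; OpenCV's `minAreaRect` returns a rotated minimum-area rectangle, which is what allows tilted text to be framed tightly.

```python
import cv2
import numpy as np

# Assumed input: a binary (0/255) uint8 mask marking the text region.
mask = np.zeros((100, 200), dtype=np.uint8)
mask[30:60, 20:180] = 255  # toy text region used only for illustration

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
points = np.vstack([c.reshape(-1, 2) for c in contours])

# Rotated minimum-area rectangle: handles tilted text regions as well.
rect = cv2.minAreaRect(points)           # ((cx, cy), (w, h), angle)
box = cv2.boxPoints(rect).astype(int)    # 4 corner points of the rectangle
print(rect, box)
```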
It can be seen that the disclosure extracts multiple text regions by means of text candidate region extraction and calculates the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement; because the mask map covers exactly where the text region lies, the text detection result obtained from the minimum bounding box of the mask map corresponds accurately to the position of the text whether or not the text is tilted. The text detection method provided by the embodiments of the disclosure can therefore handle text in various layouts and effectively improves the recall of text.
Fig. 3 is a flowchart of a text detection method according to another exemplary embodiment. As shown in Fig. 3, the text detection method may include the following steps.
In step 310, text candidate region extraction is performed on the image to obtain multiple text candidate regions.
In step 320, feature extraction is performed on the image with a first convolutional neural network to obtain a feature map, where the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction.
In step 330, each of the multiple text candidate regions is mapped onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions.
A feature map, i.e. a feature matrix, describes the high-level semantic information of the image, such as what is in the image and where it is.
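The patent does not name a particular backbone, so the following sketch is only one hypothetical choice of "first convolutional neural network": a torchvision ResNet-18 with its classification layers removed, used purely as a feature extractor. The input size is an assumption.

```python
import torch
from torchvision.models import resnet18

# Hypothetical backbone choice: any CNN trained for image feature extraction works.
backbone = torch.nn.Sequential(*list(resnet18(weights="IMAGENET1K_V1").children())[:-2])
backbone.eval()

image = torch.randn(1, 3, 1024, 1024)   # assumed input size (batch, channels, H, W)
with torch.no_grad():
    feature_map = backbone(image)        # shape: (1, 512, 32, 32) for a stride-32 backbone
print(feature_map.shape)
```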
For example, for a text candidate region r = (x, y, w, h), mapping it onto the feature map F_c gives the corresponding feature region r_c = (x_c, y_c, w_c, h_c) = (s_c*x, s_c*y, s_c*w, s_c*h), where s_c is the scaling factor from the input image size to the feature map size.
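A trivial sketch of that mapping (the function and variable names are illustrative only):

```python
def map_to_feature_map(region, scale):
    """Map r = (x, y, w, h) in image coordinates to
    r_c = (s_c*x, s_c*y, s_c*w, s_c*h) in feature-map coordinates."""
    x, y, w, h = region
    return (scale * x, scale * y, scale * w, scale * h)

# Example: a 1024-pixel-wide image reduced to a 64-wide feature map gives s_c = 1/16.
print(map_to_feature_map((320, 128, 96, 48), scale=1 / 16))  # (20.0, 8.0, 6.0, 3.0)
```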
In step 340, the feature region corresponding to each of the multiple text candidate regions is mapped to a feature vector of fixed size.
For example, a max-pooling operation can be applied to the feature region r_c to map it to a fixed-length feature vector f_c.
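A hedged sketch of this step using torchvision's off-the-shelf `roi_pool` operator (the patent does not prescribe a specific library); `spatial_scale` applies the factor s_c internally, and all shapes and box values below are assumptions for the example.

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 64, 64)   # assumed: one feature map with 256 channels

# Candidate boxes as (batch_index, x1, y1, x2, y2) in input-image coordinates.
boxes = torch.tensor([[0, 320.0, 128.0, 416.0, 176.0]])

# Max-pool each mapped region onto a fixed 7x7 grid, i.e. a fixed-size feature.
pooled = roi_pool(feature_map, boxes, output_size=(7, 7), spatial_scale=1 / 16)
feature_vector = pooled.flatten(start_dim=1)   # shape: (num_boxes, 256 * 7 * 7)
print(feature_vector.shape)
```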
In step 350, the fixed-size feature vectors corresponding to the multiple text candidate regions are input into a second convolutional neural network, and the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions are calculated, where the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
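The patent does not disclose a concrete architecture for the second convolutional neural network, so the PyTorch module below is only a hypothetical sketch of the idea: from each region's pooled feature it predicts a text probability with one head and a small mask map of the text region with another.

```python
import torch
import torch.nn as nn

class TextProbAndMaskHead(nn.Module):
    """Hypothetical two-head network: text probability plus text-region mask map."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Head 1: text / non-text probability for the candidate region.
        self.prob_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1), nn.Sigmoid(),
        )
        # Head 2: per-pixel mask map of the text region inside the candidate box.
        self.mask_head = nn.Sequential(
            nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2), nn.ReLU(),
            nn.Conv2d(256, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, pooled_features):              # (num_regions, C, 7, 7)
        x = self.shared(pooled_features)
        text_prob = self.prob_head(x).squeeze(-1)    # (num_regions,)
        mask_map = self.mask_head(x)                 # (num_regions, 1, 14, 14)
        return text_prob, mask_map

head = TextProbAndMaskHead()
probs, masks = head(torch.randn(8, 256, 7, 7))
print(probs.shape, masks.shape)
```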
In step 360, the mask maps of the text candidate regions whose probability meets the probability threshold requirement are selected from the mask maps of the text regions of the multiple text candidate regions.
For example, the text probabilities of the multiple text candidate regions are filtered with a preset probability threshold and non-maximum suppression is applied; the mask maps of the text candidate regions that remain are the mask maps of the selected text candidate regions.
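The patent only states that a preset probability threshold and non-maximum suppression are used; one common way to implement that filtering is sketched below, with the threshold value, the IoU threshold, and the example boxes all being assumptions.

```python
import torch
from torchvision.ops import nms

# Assumed example inputs: boxes as (x1, y1, x2, y2) and one text probability per box.
boxes = torch.tensor([[10.0, 10.0, 110.0, 60.0],
                      [12.0, 12.0, 108.0, 58.0],
                      [200.0, 40.0, 260.0, 90.0]])
probs = torch.tensor([0.92, 0.85, 0.30])

prob_threshold = 0.5                     # assumed preset probability threshold
keep = probs >= prob_threshold
boxes, probs = boxes[keep], probs[keep]

# Non-maximum suppression removes highly overlapping candidate regions.
kept_idx = nms(boxes, probs, iou_threshold=0.5)
print(kept_idx)   # indices of the candidates whose mask maps are retained
```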
In step 370, the minimum enclosing rectangle of the mask map is calculated to obtain the text detection result.
It can be seen that, because this embodiment uses convolutional neural networks to compute the text probabilities and the mask maps of the text regions at the same time and obtains the text detection result from the minimum enclosing rectangle of the mask map, the position of the text can be detected quickly and accurately with a rectangle.
Fig. 4 is a block diagram of a text detection device 400 according to an exemplary embodiment. As shown in Fig. 4, the text detection device 400 may include a region extraction module 410, a first computing module 420 and a second computing module 430.
The region extraction module 410 may be configured to perform text candidate region extraction on an image to obtain multiple text candidate regions.
The first computing module 420 may be configured to calculate the text probability of each of the multiple text candidate regions obtained by the region extraction module 410 and to calculate the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement.
The second computing module 430 may be configured to calculate the minimum bounding box of the mask map calculated by the first computing module 420 to obtain the text detection result.
It can be seen that the region extraction module 410 extracts multiple text regions by means of text candidate region extraction, the first computing module 420 calculates the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement, and the mask map covers exactly where the text region lies; therefore, whether or not the text is tilted, the second computing module 430 obtains a text detection result from the minimum bounding box of the mask map that corresponds accurately to the position of the text. The text detection method provided by the embodiments of the disclosure can therefore handle text in various layouts and effectively improves the recall of text.
In one possible implementation, the first computing module 420 may be configured to use a convolutional neural network to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement. In this implementation, the strengths of convolutional neural networks can be exploited to achieve fast and accurate text detection.
It should be noted that the disclosure does not limit the order in which the text probabilities and the mask maps are computed.
For example, as shown in Fig. 5, a block diagram of a text detection device 500 according to another exemplary embodiment, the first computing module 420 may include a probability calculation submodule 421, a probability screening submodule 422 and a mask map calculation submodule 423.
The probability calculation submodule 421 may be configured to calculate the text probability of the multiple text candidate regions.
The probability screening submodule 422 may be configured to select, from the multiple text candidate regions, the text candidate regions whose text probability meets the probability threshold requirement.
The mask map calculation submodule 423 may be configured to calculate the mask map of the text region of each selected text candidate region.
Because this implementation calculates the mask map of the text region only for the selected text candidate regions, the amount of mask map computation can be reduced.
As another example, as shown in Fig. 6, a block diagram of a text detection device 600 according to yet another exemplary embodiment, the first computing module 420 may include a probability and mask map calculation submodule 424 and a probability and mask map screening submodule 425.
The probability and mask map calculation submodule 424 may be configured to calculate the text probability and the mask map of the text region of each of the multiple text candidate regions.
The probability and mask map screening submodule 425 may be configured to select, from the mask maps of the text regions of the multiple text candidate regions, the mask maps of the text candidate regions whose probability meets the probability threshold requirement.
In this implementation the text probabilities and the mask maps can be computed at the same time, so detection is fast.
In the implementation that uses convolutional neural networks, the probability and mask map calculation submodule 424 may include a first convolution calculation submodule, a feature region mapping submodule, a vector mapping submodule and a second convolution calculation submodule.
The first convolution calculation submodule may be configured to perform feature extraction on the image with a first convolutional neural network to obtain a feature map, where the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction.
The feature region mapping submodule may be configured to map each of the multiple text candidate regions onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions.
The vector mapping submodule may be configured to map the feature region corresponding to each of the multiple text candidate regions to a feature vector of fixed size.
The second convolution calculation submodule may be configured to input the fixed-size feature vectors corresponding to the multiple text candidate regions into a second convolutional neural network and to calculate the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions, where the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
It can be seen that, because this embodiment uses convolutional neural networks to compute the text probabilities and the mask maps of the text regions at the same time, the position of the text can be detected quickly and accurately.
As for the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
The disclosure also provides a computer-readable storage medium on which computer program instructions are stored; when the program instructions are executed by a processor, the steps of the text detection method provided by the disclosure are implemented.
Fig. 7 is a block diagram of a text detection device 700 according to an exemplary embodiment. For example, the device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 7, the device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714 and a communication component 716.
The processing component 702 typically controls the overall operation of the device 700, such as operations associated with display, telephone calls, data communication, camera operation and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to complete all or part of the steps of the above text detection method. In addition, the processing component 702 may include one or more modules to facilitate interaction between the processing component 702 and the other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support the operation of the device 700. Examples of such data include instructions for any application program or method operated on the device 700, contact data, phone book data, messages, pictures, videos, and the like. The memory 704 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disc.
The power component 706 supplies power to the various components of the device 700. The power component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. When the device 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (MIC); when the device 700 is in an operation mode such as a call mode, a recording mode or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 704 or sent via the communication component 716. In some embodiments, the audio component 710 further includes a loudspeaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button and a lock button.
The sensor component 714 includes one or more sensors for providing state assessments of various aspects of the device 700. For example, the sensor component 714 can detect the open/closed state of the device 700 and the relative positioning of components, for example the display and keypad of the device 700; it can also detect a change in position of the device 700 or of a component of the device 700, the presence or absence of contact between the user and the device 700, the orientation or acceleration/deceleration of the device 700, and a change in the temperature of the device 700. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 714 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the device 700 and other equipment. The device 700 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the above text detection method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 704 including instructions, which can be executed by the processor 720 of the device 700 to complete the above text detection method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the disclosure. This application is intended to cover any variations, uses or adaptations of the disclosure that follow the general principles of the disclosure and include common knowledge or customary technical means in the art not disclosed in the disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.
It should be understood that the disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.

Claims (12)

1. A text detection method, characterized by comprising:
performing text candidate region extraction on an image to obtain multiple text candidate regions;
calculating the text probability of each of the multiple text candidate regions, and calculating the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement;
calculating the minimum bounding box of the mask map to obtain a text detection result.
2. The text detection method according to claim 1, characterized in that calculating the text probability of the multiple text candidate regions and calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement comprises:
using a convolutional neural network to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement.
3. The text detection method according to claim 1 or 2, characterized in that calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement comprises:
selecting, from the multiple text candidate regions, the text candidate regions whose text probability meets the probability threshold requirement;
calculating the mask map of the text region of each selected text candidate region.
4. The text detection method according to claim 1 or 2, characterized in that calculating the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement comprises:
calculating the mask map of the text region of each of the multiple text candidate regions;
selecting, from the mask maps of the text regions of the multiple text candidate regions, the mask maps of the text candidate regions whose probability meets the probability threshold requirement.
5. The text detection method according to claim 4, characterized in that calculating the mask map of the text region of each of the multiple text candidate regions comprises:
performing feature extraction on the image with a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction;
mapping each of the multiple text candidate regions onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions;
mapping the feature region corresponding to each of the multiple text candidate regions to a feature vector of fixed size;
inputting the fixed-size feature vectors corresponding to the multiple text candidate regions into a second convolutional neural network to calculate the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions, wherein the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
6. A text detection device, characterized by comprising:
a region extraction module configured to perform text candidate region extraction on an image to obtain multiple text candidate regions;
a first computing module configured to calculate the text probability of each of the multiple text candidate regions obtained by the region extraction module, and to calculate the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement;
a second computing module configured to calculate the minimum bounding box of the mask map calculated by the first computing module to obtain a text detection result.
7. The text detection device according to claim 6, characterized in that the first computing module is configured to use a convolutional neural network to calculate the text probability of the multiple text candidate regions and the mask map of the text region of each text candidate region whose text probability meets the probability threshold requirement.
8. The text detection device according to claim 6 or 7, characterized in that the first computing module comprises:
a probability calculation submodule configured to calculate the text probability of the multiple text candidate regions;
a probability screening submodule configured to select, from the multiple text candidate regions, the text candidate regions whose text probability meets the probability threshold requirement;
a mask map calculation submodule configured to calculate the mask map of the text region of each selected text candidate region.
9. The text detection device according to claim 6 or 7, characterized in that the first computing module comprises:
a probability and mask map calculation submodule configured to calculate the text probability and the mask map of the text region of each of the multiple text candidate regions;
a probability and mask map screening submodule configured to select, from the mask maps of the text regions of the multiple text candidate regions, the mask maps of the text candidate regions whose probability meets the probability threshold requirement.
10. The text detection device according to claim 9, characterized in that the probability and mask map calculation submodule comprises:
a first convolution calculation submodule configured to perform feature extraction on the image with a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network is a convolutional neural network that has been trained for image feature extraction;
a feature region mapping submodule configured to map each of the multiple text candidate regions onto the feature map to obtain the feature region corresponding to each of the multiple text candidate regions;
a vector mapping submodule configured to map the feature region corresponding to each of the multiple text candidate regions to a feature vector of fixed size;
a second convolution calculation submodule configured to input the fixed-size feature vectors corresponding to the multiple text candidate regions into a second convolutional neural network and to calculate the text probability and the mask map of the text region corresponding to each of the multiple text candidate regions, wherein the second convolutional neural network is a convolutional neural network that has been trained to predict text probabilities and mask maps.
11. A text detection device, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
perform text candidate region extraction on an image to obtain multiple text candidate regions;
calculate the text probability of each of the multiple text candidate regions, and calculate the mask map of the text region of each text candidate region whose text probability meets a probability threshold requirement;
calculate the minimum bounding box of the mask map to obtain a text detection result.
12. A computer-readable storage medium on which computer program instructions are stored, characterized in that the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 5.
CN201711332870.9A 2017-12-13 2017-12-13 Character detection method and device Active CN108062547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711332870.9A CN108062547B (en) 2017-12-13 2017-12-13 Character detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711332870.9A CN108062547B (en) 2017-12-13 2017-12-13 Character detection method and device

Publications (2)

Publication Number Publication Date
CN108062547A true CN108062547A (en) 2018-05-22
CN108062547B CN108062547B (en) 2021-03-09

Family

ID=62138536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711332870.9A Active CN108062547B (en) 2017-12-13 2017-12-13 Character detection method and device

Country Status (1)

Country Link
CN (1) CN108062547B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840483A (en) * 2019-01-11 2019-06-04 深圳大学 A kind of method and device of landslide fissure detection and identification
CN109858432A (en) * 2019-01-28 2019-06-07 北京市商汤科技开发有限公司 Method and device, the computer equipment of text information in a kind of detection image
CN110348522A (en) * 2019-07-12 2019-10-18 创新奇智(青岛)科技有限公司 A kind of image detection recognition methods and system, electronic equipment, image classification network optimized approach and system
CN110569708A (en) * 2019-06-28 2019-12-13 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN111680690A (en) * 2020-04-26 2020-09-18 泰康保险集团股份有限公司 Character recognition method and device
CN112560857A (en) * 2021-02-20 2021-03-26 鹏城实验室 Character area boundary detection method, equipment, storage medium and device
CN112733857A (en) * 2021-01-08 2021-04-30 北京匠数科技有限公司 Image character detection model training method and device for automatically segmenting character area

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355573A (en) * 2016-08-24 2017-01-25 北京小米移动软件有限公司 Target object positioning method and device in pictures
CN106384098A (en) * 2016-09-23 2017-02-08 北京小米移动软件有限公司 Image-based head posture detection method, device and terminal
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN106780612A (en) * 2016-12-29 2017-05-31 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN106803071A (en) * 2016-12-29 2017-06-06 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
US20170220904A1 (en) * 2015-04-02 2017-08-03 Tencent Technology (Shenzhen) Company Limited Training method and apparatus for convolutional neural network model
US20170300786A1 (en) * 2015-10-01 2017-10-19 Intelli-Vision Methods and systems for accurately recognizing vehicle license plates
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220904A1 (en) * 2015-04-02 2017-08-03 Tencent Technology (Shenzhen) Company Limited Training method and apparatus for convolutional neural network model
US20170300786A1 (en) * 2015-10-01 2017-10-19 Intelli-Vision Methods and systems for accurately recognizing vehicle license plates
CN106355573A (en) * 2016-08-24 2017-01-25 北京小米移动软件有限公司 Target object positioning method and device in pictures
CN106384098A (en) * 2016-09-23 2017-02-08 北京小米移动软件有限公司 Image-based head posture detection method, device and terminal
CN106780612A (en) * 2016-12-29 2017-05-31 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN106803071A (en) * 2016-12-29 2017-06-06 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN_H: "A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN", WeChat public account: CODERPAI *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN109840483B (en) * 2019-01-11 2020-09-11 深圳大学 Landslide crack detection and identification method and device
CN109840483A (en) * 2019-01-11 2019-06-04 深圳大学 A kind of method and device of landslide fissure detection and identification
CN109858432A (en) * 2019-01-28 2019-06-07 北京市商汤科技开发有限公司 Method and device, the computer equipment of text information in a kind of detection image
CN109858432B (en) * 2019-01-28 2022-01-04 北京市商汤科技开发有限公司 Method and device for detecting character information in image and computer equipment
CN110569708A (en) * 2019-06-28 2019-12-13 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium
CN110348522B (en) * 2019-07-12 2021-12-07 创新奇智(青岛)科技有限公司 Image detection and identification method and system, electronic equipment, and image classification network optimization method and system
CN110348522A (en) * 2019-07-12 2019-10-18 创新奇智(青岛)科技有限公司 A kind of image detection recognition methods and system, electronic equipment, image classification network optimized approach and system
CN111680690A (en) * 2020-04-26 2020-09-18 泰康保险集团股份有限公司 Character recognition method and device
CN111680690B (en) * 2020-04-26 2023-07-11 泰康保险集团股份有限公司 Character recognition method and device
CN112733857B (en) * 2021-01-08 2021-10-15 北京匠数科技有限公司 Image character detection model training method and device for automatically segmenting character area
CN112733857A (en) * 2021-01-08 2021-04-30 北京匠数科技有限公司 Image character detection model training method and device for automatically segmenting character area
CN112560857A (en) * 2021-02-20 2021-03-26 鹏城实验室 Character area boundary detection method, equipment, storage medium and device

Also Published As

Publication number Publication date
CN108062547B (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN108062547A (en) Character detecting method and device
CN105809704B (en) Identify the method and device of image definition
CN107944447B (en) Image classification method and device
CN104850852B (en) Feature vector computational methods and device
CN104504684B (en) Edge extraction method and device
CN104918107B (en) The identification processing method and device of video file
CN107239535A (en) Similar pictures search method and device
CN104484871B (en) edge extracting method and device
CN106682736A (en) Image identification method and apparatus
CN106778773A (en) The localization method and device of object in picture
CN107832741A (en) The method, apparatus and computer-readable recording medium of facial modeling
CN108010060A (en) Object detection method and device
CN106778531A (en) Face detection method and device
CN107742120A (en) The recognition methods of bank card number and device
CN106980840A (en) Shape of face matching process, device and storage medium
CN107563994A (en) The conspicuousness detection method and device of image
CN107729880A (en) Method for detecting human face and device
CN107992848A (en) Obtain the method, apparatus and computer-readable recording medium of depth image
CN108717542A (en) Identify the method, apparatus and computer readable storage medium of character area
CN107967459A (en) convolution processing method, device and storage medium
CN107832746A (en) Expression recognition method and device
CN108108671A (en) Description of product information acquisition method and device
CN107292306A (en) Object detection method and device
CN105094364B (en) Vocabulary display methods and device
CN109376674A (en) Method for detecting human face, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant