CN108062547B - Character detection method and device

Info

Publication number: CN108062547B
Application number: CN201711332870.9A
Authority: CN (China)
Inventor: Yang Song (杨松)
Assignee: Beijing Xiaomi Mobile Software Co., Ltd.
Other versions: CN108062547A (Chinese)
Filing date: 2017-12-13
Grant publication date: 2021-03-09
Legal status: Active (granted)
Classifications

    • G06V 10/243 — Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition


Abstract

The disclosure relates to a text detection method and apparatus. The text detection method includes: extracting text candidate regions from an image to obtain a plurality of text candidate regions; calculating text probabilities of the plurality of text candidate regions, and calculating mask maps of the text regions of those candidate regions whose text probability satisfies a probability threshold requirement; and calculating the minimum bounding boundary of each mask map to obtain a text detection result. The method extracts a plurality of text regions through candidate-region extraction and computes mask maps only for the candidate regions whose text probability satisfies the threshold requirement. Because a mask map covers its text region at the original position, the text detection result obtained from the minimum bounding boundary of the mask map corresponds accurately to the position of the text whether or not the text is skewed. The method can therefore handle text in a variety of layouts and effectively improves the text detection rate.

Description

Character detection method and device
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method and an apparatus for detecting text.
Background
Text detection means finding the positions of text in an image.
In the related art, text detection is generally implemented with object detection methods: a plurality of parallel rectangular boxes is used to detect whether the content falling inside each box is text. However, such methods perform poorly on irregularly laid-out text, such as skewed text.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a text detection method and apparatus.
According to a first aspect of the embodiments of the present disclosure, there is provided a text detection method, including: extracting text candidate regions from an image to obtain a plurality of text candidate regions; calculating text probabilities of the plurality of text candidate regions, and calculating mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement; and calculating the minimum bounding boundary of each mask map to obtain a text detection result.
According to a possible implementation of the first aspect of the embodiments of the present disclosure, the calculating text probabilities of the plurality of text candidate regions and calculating mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement includes: calculating, by using a convolutional neural network, the text probabilities of the plurality of text candidate regions and the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement.
According to a possible implementation of the first aspect of the embodiments of the present disclosure, the calculating mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement includes: screening out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement; and calculating mask maps of the text regions of the screened-out text candidate regions.
According to a possible implementation of the first aspect of the embodiments of the present disclosure, the calculating mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement includes: calculating mask maps of the text regions of the plurality of text candidate regions; and screening out, from those mask maps, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement.
According to a possible implementation of the first aspect of the embodiments of the present disclosure, the calculating mask maps of the text regions of the plurality of text candidate regions includes: performing feature extraction on the image by using a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network has been trained for image feature extraction; mapping the plurality of text candidate regions onto the feature map to obtain the feature regions corresponding to the text candidate regions; mapping the feature regions into feature vectors of fixed size; and inputting the fixed-size feature vectors into a second convolutional neural network to calculate the text probability and the text-region mask map corresponding to each text candidate region, wherein the second convolutional neural network has been trained to output text probabilities and mask maps.
According to a second aspect of the embodiments of the present disclosure, there is provided a text detection apparatus, including: a region extraction module configured to extract text candidate regions from an image to obtain a plurality of text candidate regions; a first calculation module configured to calculate text probabilities of the plurality of text candidate regions obtained by the region extraction module, and to calculate mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement; and a second calculation module configured to calculate the minimum bounding boundary of each mask map calculated by the first calculation module to obtain a text detection result.
According to a possible implementation of the second aspect of the embodiments of the present disclosure, the first calculation module is configured to calculate, by using a convolutional neural network, the text probabilities of the plurality of text candidate regions and the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement.
According to a possible implementation of the second aspect of the embodiments of the present disclosure, the first calculation module includes: a probability calculation sub-module configured to calculate the text probabilities of the plurality of text candidate regions; a probability screening sub-module configured to screen out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement; and a mask map calculation sub-module configured to calculate mask maps of the text regions of the screened-out text candidate regions.
According to a possible implementation of the second aspect of the embodiments of the present disclosure, the first calculation module includes: a probability and mask map calculation sub-module configured to calculate the text probabilities of the plurality of text candidate regions and the mask maps of their text regions; and a probability and mask map screening sub-module configured to screen out, from the mask maps of the text regions of the plurality of text candidate regions, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement.
According to a possible implementation of the second aspect of the embodiments of the present disclosure, the probability and mask map calculation sub-module includes: a first convolution calculation sub-module configured to perform feature extraction on the image by using a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network has been trained for image feature extraction; a feature region mapping sub-module configured to map the plurality of text candidate regions onto the feature map to obtain the feature regions corresponding to the text candidate regions; a vector mapping sub-module configured to map the feature regions into feature vectors of fixed size; and a second convolution calculation sub-module configured to input the fixed-size feature vectors into a second convolutional neural network to calculate the text probability and the text-region mask map corresponding to each text candidate region, wherein the second convolutional neural network has been trained to output text probabilities and mask maps.
According to a third aspect of the embodiments of the present disclosure, there is provided a text detection apparatus, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to: extract text candidate regions from an image to obtain a plurality of text candidate regions; calculate text probabilities of the plurality of text candidate regions, and calculate mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement; and calculate the minimum bounding boundary of each mask map to obtain a text detection result.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method as described in any of the possible implementations of the first aspect of embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. The method extracts a plurality of text regions through candidate-region extraction and computes mask maps for the candidate regions whose text probability satisfies the probability threshold requirement. Because a mask map covers its text region at the original position, the text detection result obtained from the minimum bounding boundary of the mask map corresponds accurately to the position of the text whether or not the text is skewed. The method can therefore handle text in a variety of layouts and effectively improves the text detection rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a text detection method according to an example embodiment.
FIG. 2 is a diagram illustrating an image according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a text detection method according to another exemplary embodiment.
FIG. 4 is a block diagram illustrating a text detection apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating a text detection apparatus according to another exemplary embodiment.
Fig. 6 is a block diagram illustrating a text detection apparatus according to yet another exemplary embodiment.
FIG. 7 is a block diagram illustrating a text detection apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a text detection method according to an exemplary embodiment. As shown in fig. 1, the method may include the following steps.
In step 110, text candidate regions are extracted from the image to obtain a plurality of text candidate regions.
For example, the number n of text candidate regions may be in the hundreds to thousands. Common candidate-region extraction methods include Selective Search and region proposal networks (RPN). Each candidate region is represented by parameters such as its horizontal and vertical coordinates and its width and height.
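As an illustration only (the patent does not prescribe an implementation), candidate regions could be produced with an off-the-shelf proposal method. The sketch below uses OpenCV's Selective Search and assumes opencv-contrib-python is installed; the input path is hypothetical.

```python
# Candidate-region extraction with Selective Search -- a hedged sketch,
# not the patented method itself.
import cv2

def extract_candidate_regions(image, max_regions=2000):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()   # faster, coarser proposal mode
    rects = ss.process()               # each rect is (x, y, w, h)
    return rects[:max_regions]

image = cv2.imread("input.jpg")        # hypothetical input image
candidates = extract_candidate_regions(image)
print(len(candidates), "text candidate regions")
```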
In step 120, the text probabilities of the plurality of text candidate regions are calculated, and mask maps are calculated for the text regions of the candidate regions whose text probability satisfies the probability threshold requirement.
For example, in one possible implementation, a convolutional neural network may be used to calculate the text probabilities of the text candidate regions and the mask maps of the text regions of the candidate regions whose text probability satisfies the probability threshold requirement. This implementation exploits the strengths of convolutional neural networks to achieve fast, high-precision text detection.
It should be noted that the present disclosure does not limit the calculation order of the text probability and the mask map.
For example, in one possible implementation, the text probabilities of the plurality of text candidate regions may be calculated first, the candidate regions whose text probability satisfies the probability threshold requirement may be screened out, and mask maps of the text regions may then be calculated only for the screened-out candidate regions, as shown in the sketch after the next paragraph. This ordering reduces the amount of mask-map computation.
As another example, in another possible implementation, the text probabilities and the mask maps of the text regions of all candidate regions are calculated without any prescribed order, and the mask maps of the candidate regions whose probability satisfies the probability threshold requirement are then screened out from the full set of mask maps. With this implementation, the text probabilities and mask maps can be calculated simultaneously, giving a high detection speed.
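To make the first ordering concrete, a minimal sketch follows; score_fn and mask_fn are hypothetical stand-ins for whatever computes a region's text probability and mask map, not names from the patent.

```python
# "Screen first, then compute masks" ordering -- a hedged sketch.
import numpy as np

def detect_filter_first(candidates, score_fn, mask_fn, prob_threshold=0.5):
    probs = np.array([score_fn(r) for r in candidates])
    keep = probs >= prob_threshold                 # probability threshold requirement
    screened = [r for r, k in zip(candidates, keep) if k]
    masks = [mask_fn(r) for r in screened]         # mask maps only for kept regions
    return screened, masks
```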
In step 130, the minimum bounding boundary of each mask map is calculated to obtain the text detection result.
For example, the minimum bounding rectangle of the mask map may be computed. As shown in the image schematic of fig. 2, the minimum bounding rectangle 210 of the mask map corresponding to the STOP text region is the text detection result; it frames the position of the text accurately.
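A minimal sketch of this step with OpenCV follows; it assumes mask is a binary uint8 image containing at least one non-zero (text) pixel.

```python
# Minimum bounding rectangle of a mask map -- a hedged sketch.
import cv2
import numpy as np

def min_bounding_rect(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    rect = cv2.minAreaRect(points)          # ((cx, cy), (w, h), angle); may be rotated
    return cv2.boxPoints(rect).astype(int)  # four corner points of the rectangle
```

Because cv2.minAreaRect returns a possibly rotated rectangle, skewed text such as the STOP lettering in fig. 2 is still framed tightly.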
As can be seen, the method extracts a plurality of text regions through candidate-region extraction and calculates mask maps of the text regions of the candidate regions whose text probability satisfies the probability threshold requirement. Because a mask map covers its text region at the original position, the text detection result obtained from the minimum bounding boundary of the mask map corresponds accurately to the position of the text whether or not the text is skewed. The text detection method provided by the embodiments of the present disclosure can therefore handle text in a variety of layouts and effectively improves the text detection rate.
Fig. 3 is a flowchart illustrating a text detection method according to another exemplary embodiment. As shown in fig. 3, the method may include the following steps.
In step 310, text candidate regions are extracted from the image to obtain a plurality of text candidate regions.
In step 320, feature extraction is performed on the image by using a first convolutional neural network to obtain a feature map, where the first convolutional neural network has been trained for image feature extraction.
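As an illustrative sketch of such a feature extractor (the patent does not name an architecture), a truncated ResNet-18 from torchvision can map an image to a feature map. It is randomly initialized here, whereas the patented first network has already been trained for feature extraction.

```python
# First CNN: image -> feature map. The architecture choice is an assumption.
import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # trained weights assumed in practice
# Keep all convolutional stages; drop the average pooling and classifier head.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

image = torch.randn(1, 3, 256, 256)        # (batch, channels, H, W)
with torch.no_grad():
    feature_map = feature_extractor(image) # shape (1, 512, 8, 8): stride-32 feature map
```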
In step 330, the plurality of text candidate regions are mapped onto the feature map to obtain the feature regions corresponding to the plurality of text candidate regions.
The feature map, i.e., a feature matrix, describes high-level semantic information of the image, such as what is in the image and where it is located.
For example, a text candidate region r = (x, y, w, h) is mapped onto the feature map F_c; the feature region of r in F_c is r_c = (x_c, y_c, w_c, h_c) = (s_c·x, s_c·y, s_c·w, s_c·h), where s_c is the scaling factor from the input image size to the feature map size.
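In code, this mapping is a single scaling step; a short sketch under the notation above:

```python
# Map a candidate region from image coordinates onto the feature map.
def map_region_to_feature_map(r, s_c):
    x, y, w, h = r
    # r_c = (x_c, y_c, w_c, h_c) = (s_c*x, s_c*y, s_c*w, s_c*h)
    return (s_c * x, s_c * y, s_c * w, s_c * h)
```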
In step 340, feature regions corresponding to the text candidate regions are mapped to feature vectors with fixed sizes.
For example, max pooling may be applied to the feature region r_c to map it into a fixed-length feature vector f_c.
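One common way to realize this fixed-size mapping is RoI max pooling. The sketch below uses torchvision's roi_pool; the framework choice and all sizes are illustrative assumptions, with spatial_scale playing the role of s_c.

```python
# Fixed-size feature vector per candidate region via RoI max pooling.
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 64, 64)          # (batch, channels, H_c, W_c)
boxes = [torch.tensor([[4.0, 4.0, 20.0, 12.0]])]   # one region in image coords (x1, y1, x2, y2)
pooled = roi_pool(feature_map, boxes, output_size=(7, 7), spatial_scale=0.25)
f_c = pooled.flatten(start_dim=1)                  # fixed-length vector, shape (1, 256*7*7)
```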
In step 350, the fixed-size feature vectors corresponding to the plurality of text candidate regions are input into a second convolutional neural network, which calculates the text probability and the text-region mask map corresponding to each candidate region; the second convolutional neural network has been trained to output text probabilities and mask maps.
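A minimal sketch of such a two-headed network follows; every layer size is an illustrative assumption, and the real second network would already be trained on text probabilities and mask maps.

```python
# Second CNN: fixed-size feature vector -> (text probability, mask map).
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    def __init__(self, in_dim=256 * 7 * 7, mask_size=14):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        self.prob_head = nn.Sequential(nn.Linear(1024, 1), nn.Sigmoid())
        self.mask_head = nn.Sequential(
            nn.Linear(1024, mask_size * mask_size), nn.Sigmoid())
        self.mask_size = mask_size

    def forward(self, f_c):
        h = self.trunk(f_c)
        prob = self.prob_head(h).squeeze(-1)       # text probability per region
        mask = self.mask_head(h).view(
            -1, self.mask_size, self.mask_size)    # mask map per region
        return prob, mask

model = SecondNetwork()
prob, mask = model(torch.randn(4, 256 * 7 * 7))    # 4 candidate regions
```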
In step 360, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement are screened out from the mask maps of the text regions of the plurality of text candidate regions.
For example, threshold filtering with a preset probability threshold and non-maximum suppression may be applied to the text probabilities of the plurality of text candidate regions; the mask maps of the candidate regions that survive are the screened-out mask maps.
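A self-contained sketch of this screening step over axis-aligned candidate boxes follows; the box format (x1, y1, x2, y2) and both thresholds are illustrative assumptions.

```python
# Threshold filtering followed by non-maximum suppression -- a hedged sketch.
import numpy as np

def filter_and_nms(boxes, scores, prob_threshold=0.5, iou_threshold=0.3):
    keep = scores >= prob_threshold                 # threshold filtering
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort()[::-1]                  # highest probability first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # IoU of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]     # suppress heavy overlaps
    return boxes[kept], scores[kept]
```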
In step 370, the minimum bounding rectangle of each screened-out mask map is calculated to obtain the text detection result.
As can be seen, in this embodiment the convolutional neural networks calculate the text probabilities and the text-region mask maps simultaneously, and the text detection result is obtained from the minimum bounding rectangle of each mask map, so the position of the text can be located quickly and accurately with a rectangle.
Fig. 4 is a block diagram illustrating a text detection apparatus 400 according to an exemplary embodiment, where, as shown in fig. 4, the text detection apparatus 400 may include: a region extraction module 410, a first calculation module 420 and a second calculation module 430.
The region extraction module 410 may be configured to perform text candidate region extraction on the image, resulting in a plurality of text candidate regions.
The first calculating module 420 may be configured to calculate text probabilities of the text candidate regions obtained by the region extracting module 410, and calculate a mask map of text regions of the text candidate regions whose text probabilities satisfy a probability threshold requirement.
The second calculation module 430 may be configured to calculate the minimum bounding boundary of each mask map calculated by the first calculation module 420 to obtain a text detection result.
It can be seen that in the present disclosure the region extraction module 410 extracts a plurality of text regions through candidate-region extraction, and the first calculation module 420 calculates mask maps of the text regions of the candidate regions whose text probability satisfies the probability threshold requirement. Because a mask map covers its text region at the original position, the text detection result that the second calculation module 430 obtains from the minimum bounding boundary of the mask map corresponds accurately to the position of the text whether or not the text is skewed. The apparatus can therefore handle text in a variety of layouts and effectively improves the text detection rate.
In one possible implementation, the first calculation module 420 may be configured to calculate, by using a convolutional neural network, the text probabilities of the plurality of text candidate regions and the mask maps of the text regions of the candidate regions whose text probability satisfies the probability threshold requirement. This implementation exploits the strengths of convolutional neural networks to achieve fast, high-precision text detection.
It should be noted that the present disclosure does not limit the calculation order of the text probability and the mask map.
For example, fig. 5 is a block diagram of a text detection apparatus 500 according to another exemplary embodiment. As shown in fig. 5, the first calculation module 420 may include: a probability calculation sub-module 421, a probability screening sub-module 422, and a mask map calculation sub-module 423.
The probability calculation submodule 421 may be configured to calculate a text probability of the plurality of text candidate regions.
The probability screening sub-module 422 may be configured to screen out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement.
The mask map calculation sub-module 423 may be configured to calculate mask maps of the text regions of the screened-out text candidate regions.
In this embodiment, mask maps of the text regions are calculated only for the screened-out candidate regions, which reduces the amount of mask-map computation.
As another example, fig. 6 is a block diagram of a text detection apparatus 600 according to yet another exemplary embodiment. As shown in fig. 6, the first calculation module 420 may include: a probability and mask map calculation sub-module 424 and a probability and mask map screening sub-module 425.
The probability and mask map calculation sub-module 424 may be configured to calculate the text probabilities of the text candidate regions and the mask maps of their text regions.
The probability and mask map screening sub-module 425 may be configured to screen out, from the mask maps of the text regions of the plurality of text candidate regions, the mask maps of the candidate regions whose probability satisfies the probability threshold requirement.
With this implementation, the text probabilities and mask maps can be calculated simultaneously, giving a high detection speed.
In an implementation incorporating convolutional neural networks, the probability and mask map calculation sub-module 424 may include: a first convolution calculation sub-module, a feature region mapping sub-module, a vector mapping sub-module, and a second convolution calculation sub-module.
The first convolution calculation sub-module may be configured to perform feature extraction on the image by using a first convolutional neural network to obtain a feature map, where the first convolutional neural network has been trained for image feature extraction.
The feature region mapping sub-module may be configured to map the plurality of text candidate regions onto the feature map to obtain the feature regions corresponding to the plurality of text candidate regions.
The vector mapping sub-module may be configured to map the feature regions corresponding to the plurality of text candidate regions into feature vectors of fixed size.
The second convolution calculation sub-module may be configured to input the fixed-size feature vectors corresponding to the plurality of text candidate regions into a second convolutional neural network, which calculates the text probability and the text-region mask map corresponding to each candidate region; the second convolutional neural network has been trained to output text probabilities and mask maps.
In this embodiment, the convolutional neural networks therefore calculate the text probabilities and the text-region mask maps simultaneously, so the position of the text can be detected quickly and accurately.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the text detection method provided by the present disclosure.
Fig. 7 is a block diagram illustrating a text detection apparatus 700 according to an example embodiment. For example, the apparatus 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the text detection method described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 706 provides power to the various components of the device 700. The power components 706 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, audio component 710 includes a Microphone (MIC) configured to receive external audio signals when apparatus 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 714 includes one or more sensors for providing status assessments of various aspects of the apparatus 700. For example, the sensor assembly 714 may detect an open/closed state of the device 700 and the relative positioning of components, such as the display and keypad of the device 700; it may also detect a change in position of the device 700 or of a component of the device 700, the presence or absence of user contact with the device 700, the orientation or acceleration/deceleration of the device 700, and a change in temperature of the device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described text detection method.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 704 comprising instructions, executable by the processor 720 of the device 700 to perform the above-described text detection method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A text detection method for detecting text in various layouts, the method comprising:
extracting text candidate regions from an image to obtain a plurality of text candidate regions;
calculating text probabilities of the plurality of text candidate regions, and calculating mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement;
and calculating the minimum bounding boundary of each mask map to obtain a text detection result;
wherein said calculating the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement comprises:
screening out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement;
and calculating mask maps of the text regions of the screened-out text candidate regions; or
said calculating the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement comprises:
calculating mask maps of the text regions of the plurality of text candidate regions;
and screening out, from the mask maps of the text regions of the plurality of text candidate regions, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement.
2. The text detection method according to claim 1, wherein the calculating text probabilities of the plurality of text candidate regions and calculating mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement comprises:
calculating, by using a convolutional neural network, the text probabilities of the plurality of text candidate regions and the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement.
3. The text detection method according to claim 1, wherein the calculating mask maps of the text regions of the plurality of text candidate regions comprises:
performing feature extraction on the image by using a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network has been trained for image feature extraction;
mapping the plurality of text candidate regions onto the feature map to obtain the feature regions corresponding to the plurality of text candidate regions;
mapping the feature regions corresponding to the plurality of text candidate regions into feature vectors of fixed size;
and inputting the fixed-size feature vectors corresponding to the plurality of text candidate regions into a second convolutional neural network to calculate the text probabilities and the text-region mask maps corresponding to the plurality of text candidate regions, wherein the second convolutional neural network has been trained to output text probabilities and mask maps.
4. A text detection apparatus for detecting text in various layouts, the apparatus comprising:
a region extraction module configured to extract text candidate regions from an image to obtain a plurality of text candidate regions;
a first calculation module configured to calculate text probabilities of the plurality of text candidate regions obtained by the region extraction module, and to calculate mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement;
and a second calculation module configured to calculate the minimum bounding boundary of each mask map calculated by the first calculation module to obtain a text detection result;
wherein the first calculation module comprises:
a probability calculation sub-module configured to calculate the text probabilities of the plurality of text candidate regions;
a probability screening sub-module configured to screen out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement;
and a mask map calculation sub-module configured to calculate mask maps of the text regions of the screened-out text candidate regions; or
the first calculation module comprises:
a probability and mask map calculation sub-module configured to calculate the text probabilities of the plurality of text candidate regions and the mask maps of their text regions;
and a probability and mask map screening sub-module configured to screen out, from the mask maps of the text regions of the plurality of text candidate regions, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement.
5. The text detection apparatus according to claim 4, wherein the first calculation module is configured to calculate, by using a convolutional neural network, the text probabilities of the plurality of text candidate regions and the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement.
6. The text detection apparatus according to claim 4, wherein the probability and mask map calculation sub-module comprises:
a first convolution calculation sub-module configured to perform feature extraction on the image by using a first convolutional neural network to obtain a feature map, wherein the first convolutional neural network has been trained for image feature extraction;
a feature region mapping sub-module configured to map the plurality of text candidate regions onto the feature map to obtain the feature regions corresponding to the plurality of text candidate regions;
a vector mapping sub-module configured to map the feature regions corresponding to the plurality of text candidate regions into feature vectors of fixed size;
and a second convolution calculation sub-module configured to input the fixed-size feature vectors corresponding to the plurality of text candidate regions into a second convolutional neural network to calculate the text probabilities and the text-region mask maps corresponding to the plurality of text candidate regions, wherein the second convolutional neural network has been trained to output text probabilities and mask maps.
7. A text detection apparatus for detecting text in various layouts, the apparatus comprising:
a processor;
and a memory for storing processor-executable instructions;
wherein the processor is configured to:
extract text candidate regions from an image to obtain a plurality of text candidate regions;
calculate text probabilities of the plurality of text candidate regions, and calculate mask maps of the text regions of the text candidate regions whose text probability satisfies a probability threshold requirement;
and calculate the minimum bounding boundary of each mask map to obtain a text detection result;
wherein calculating the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement comprises:
screening out, from the plurality of text candidate regions, the text candidate regions whose text probability satisfies the probability threshold requirement;
and calculating mask maps of the text regions of the screened-out text candidate regions; or
calculating the mask maps of the text regions of the text candidate regions whose text probability satisfies the probability threshold requirement comprises:
calculating mask maps of the text regions of the plurality of text candidate regions;
and screening out, from the mask maps of the text regions of the plurality of text candidate regions, the mask maps of the text candidate regions whose probability satisfies the probability threshold requirement.
8. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711332870.9A CN108062547B (en) 2017-12-13 2017-12-13 Character detection method and device


Publications (2)

Publication Number Publication Date
CN108062547A CN108062547A (en) 2018-05-22
CN108062547B (en) 2021-03-09

Family

ID=62138536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711332870.9A Active CN108062547B (en) 2017-12-13 2017-12-13 Character detection method and device

Country Status (1)

Country Link
CN (1) CN108062547B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN109840483B (en) * 2019-01-11 2020-09-11 深圳大学 Landslide crack detection and identification method and device
CN109858432B (en) * 2019-01-28 2022-01-04 北京市商汤科技开发有限公司 Method and device for detecting character information in image and computer equipment
CN110569708A (en) * 2019-06-28 2019-12-13 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium
CN110348522B (en) * 2019-07-12 2021-12-07 创新奇智(青岛)科技有限公司 Image detection and identification method and system, electronic equipment, and image classification network optimization method and system
CN111680690B (en) * 2020-04-26 2023-07-11 泰康保险集团股份有限公司 Character recognition method and device
CN112733857B (en) * 2021-01-08 2021-10-15 北京匠数科技有限公司 Image character detection model training method and device for automatically segmenting character area
CN112560857B (en) * 2021-02-20 2021-06-08 鹏城实验室 Character area boundary detection method, equipment, storage medium and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156807B (en) * 2015-04-02 2020-06-02 华中科技大学 Training method and device of convolutional neural network model
US10706330B2 (en) * 2015-10-01 2020-07-07 Intellivision Technologies Corp Methods and systems for accurately recognizing vehicle license plates
CN106355573B (en) * 2016-08-24 2019-10-25 北京小米移动软件有限公司 The localization method and device of object in picture
CN106384098B (en) * 2016-09-23 2019-11-26 北京小米移动软件有限公司 Head pose detection method, device and terminal based on image
CN106803071B (en) * 2016-12-29 2020-02-14 浙江大华技术股份有限公司 Method and device for detecting object in image
CN106780612B (en) * 2016-12-29 2019-09-17 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene




Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant