CN111753836A - Character recognition method and device, computer readable medium and electronic equipment

Character recognition method and device, computer readable medium and electronic equipment

Info

Publication number
CN111753836A
CN111753836A (application CN201910797941.5A)
Authority
CN
China
Prior art keywords
image
recognized
identified
mask
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910797941.5A
Other languages
Chinese (zh)
Inventor
李小利
汤海萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910797941.5A
Publication of CN111753836A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

Embodiments of the present disclosure provide a character recognition method, a character recognition apparatus, a computer-readable medium, and an electronic device, relating to the technical field of image processing. The character recognition method includes the following steps: recognizing a background image of an image to be recognized to obtain a mask image; recognizing characters contained in the image to be recognized according to the mask image and the image to be recognized; and outputting the characters contained in the image to be recognized as computer-readable characters. According to the technical solution of the embodiments of the present disclosure, the characters contained in the image to be recognized are recognized in combination with the features of the background image, so that the accuracy of character recognition is improved.

Description

Character recognition method and device, computer readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a character recognition method, a character recognition apparatus, a computer-readable medium, and an electronic device.
Background
Optical Character Recognition (OCR) refers to the process in which an electronic device (such as a scanner or digital camera) converts characters on paper or other printed matter into image information through scanning or another optical input method, and then translates the image into computer-readable characters using a character recognition method.
With the continuous development of deep learning, the accuracy of character recognition has improved greatly since deep learning was adopted in this field. Existing character recognition methods fall mainly into two types. The first trains a model on features extracted from the original image, so that the trained model recognizes the characters contained in the image. This method does not distinguish the foreground of the image from the background, and if lines or other information easily confused with text appear in the background, the recognition accuracy drops sharply. The second recognizes the characters contained in the foreground region by pre-processing the original image so that only the foreground region remains. With this method, the foreground and background of the original image are not accurately located, and the extracted foreground region carries considerable noise, which likewise results in low recognition accuracy.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a character recognition method, a character recognition apparatus, a computer-readable medium, and an electronic device, so as to overcome, at least to some extent, the problem of low character recognition accuracy.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided a character recognition method, including: recognizing a background image of an image to be recognized to obtain a mask feature; recognizing characters contained in the image to be recognized according to the mask feature and the image to be recognized; and outputting the characters contained in the image to be recognized as computer-readable characters.
In an exemplary embodiment of the present disclosure, the recognizing a background image of an image to be recognized to obtain a mask feature includes: determining the probability that each pixel point of the image to be recognized belongs to the background image to obtain a first feature value, where the mask feature includes the first feature value.
In an exemplary embodiment of the present disclosure, the recognizing a background image of an image to be recognized to obtain a mask feature includes: binarizing the image to be recognized according to the first feature value, and determining the binarized image as the mask feature.
In an exemplary embodiment of the present disclosure, the recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized includes: determining a first weight of the mask feature and a second weight of the image to be recognized, where the first weight is smaller than the second weight; and training a deep learning model based on the first weight and the second weight, so as to recognize the characters contained in the image to be recognized using the trained deep learning model.
In an exemplary embodiment of the present disclosure, the recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized includes: performing first convolution processing on the image to be recognized to obtain feature maps; and performing second convolution processing on the mask feature and the feature maps based on the first weight and the second weight, so as to determine the characters contained in the image to be recognized.
In an exemplary embodiment of the present disclosure, the determining the probability that a pixel point of the image to be recognized belongs to the background image includes: processing the image to be recognized with an image segmentation algorithm to determine, for each pixel point of the image to be recognized, the probability that it belongs to the background image.
In an exemplary embodiment of the present disclosure, the mask image is consistent in size with the image to be recognized.
According to a second aspect of the embodiments of the present disclosure, there is provided a character recognition apparatus including:
an image segmentation unit, configured to recognize a background image of an image to be recognized to obtain a mask feature; a feature combination unit, configured to recognize characters contained in the image to be recognized according to the mask feature and the image to be recognized; and a text output unit, configured to output the characters contained in the image to be recognized as computer-readable characters.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the character recognition method described in the first aspect of the embodiments above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character recognition method described in the first aspect of the embodiments above.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
in the technical solutions provided by some embodiments of the present disclosure, on the one hand, the mask feature is obtained by recognizing the background image of the image to be recognized, which makes it easy to determine the influence of the background image, thereby reducing its interference and giving the foreground portion containing the characters a stronger, more easily recognized feature representation. On the other hand, when the characters contained in the image to be recognized are recognized, the mask feature and the image to be recognized are combined, so that the background image serves as a reference and the recognition accuracy is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a system architecture diagram for implementing the character recognition method of an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of a character recognition method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart of a character recognition method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a character recognition method according to yet another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a character recognition apparatus according to an embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of an electronic device suitable for implementing the character recognition method of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The inventors found that methods which perform character recognition by extracting features directly from the original image treat the foreground and background of the image equally, so that the background plays as important a role in the extracted features as the foreground; in particular, when lines or other elements similar to the foreground appear in the background, the recognition accuracy drops sharply. In methods that perform character recognition on features extracted from the foreground alone, the background information is removed entirely and does not participate in feature recognition at all; however, because the foreground is extracted inaccurately, noise is introduced and the recognition accuracy remains low.
Based on this, in the exemplary embodiment of the present disclosure, a system architecture for implementing the text recognition method is first provided. Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send request instructions or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a background management server (merely an example) providing support for shopping websites browsed by users with the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data, such as a product information query request, and feed back a processing result (for example, target push information or product information, merely an example) to the terminal device.
It should be noted that the character recognition method provided by the embodiments of the present disclosure is generally executed by the server 105, and accordingly, the character recognition apparatus is generally disposed in the server 105.
Based on the system architecture 100, an exemplary embodiment of the present disclosure provides a text recognition method, which may include the following steps as shown in fig. 2:
step S210, recognizing a background image in an image to be recognized to obtain a mask feature;
step S220, recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized;
step S230, outputting the characters contained in the image to be recognized as computer-readable characters.
In the technical solution provided by the embodiment shown in fig. 2, on the one hand, recognizing the background image of the image to be recognized to obtain the mask feature makes it easy to determine the influence of the background image, thereby reducing its interference and giving the foreground portion containing the characters a stronger, more easily recognized feature representation. On the other hand, when the characters contained in the image to be recognized are recognized, the mask feature and the image to be recognized are combined, so that the background image serves as a reference and the recognition accuracy is improved.
The following describes in detail the specific implementation of the various steps in the example illustrated in fig. 2.
First, in step S210, a background image in the image to be recognized is recognized to obtain a mask feature.
The image to be recognized may include image data acquired by an optical device, such as a scanner or a digital camera. Alternatively, the image to be recognized may be obtained by processing video data; for example, video data can be converted with a MATLAB tool into multiple image frames. Of course, the image to be recognized may also include data acquired in other ways, such as an image designed with an image production tool.
The image to be recognized may contain various objects, such as people, scenery, and characters, where the characters are the objects to be recognized: the region containing the characters is the foreground region, and the region other than the characters is the background region. The characters contained in the recognized image can be converted into editable text, which is convenient for users to work with. Characters may include letters, numbers, and words, as well as other shapes having a particular meaning.
The background image in the image to be recognized can be recognized by a machine learning model to obtain the mask feature. For example, the image to be recognized is input into an R-FCN (Region-based Fully Convolutional Network, usable for object detection) model to obtain the background image of the image to be recognized. Alternatively, edge detection can be performed on the image to be recognized to detect the character outlines, so that the characters are located and separated out and the background image is obtained; the separated background image may serve as the mask feature. Alternatively, the recognized foreground portion of the image may be masked out, and the masked image is the mask feature.
Optionally, the mask feature may be obtained as follows: determine, for each pixel point of the image to be recognized, the probability that it belongs to the background image as a first feature value; these first feature values constitute the mask feature.
The image to be recognized is input into a machine learning model, which can analyze each pixel point in the image and predict the probability that it belongs to the background image. For model training, a large number of sample images can be obtained, the foreground and background portions of each sample image labeled, and the labeled sample images used for training, so that the trained model can distinguish the foreground image from the background image. When the model recognizes the background image in the image to be recognized, it analyzes each pixel point, predicts the probability that the pixel point belongs to the background image, and then outputs the prediction result.
For example, the image to be recognized may be processed by an image segmentation algorithm to determine the probability that each of its pixel points belongs to the background image. The image segmentation algorithm may be FCN (Fully Convolutional Networks), Mask R-CNN (Mask Region-based CNN, usable for object detection and mask segmentation), or another method such as Mask TextSpotter (usable for detecting and recognizing text of arbitrary shape).
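To make this step concrete, the following minimal sketch (in PyTorch) shows the form such a per-pixel prediction can take: a small fully convolutional network scores every pixel for two classes and returns the background probability. The architecture, layer sizes, and names are illustrative assumptions, not the segmentation network of the disclosure.

    import torch
    import torch.nn as nn

    class TinyBackgroundSegmenter(nn.Module):
        # Illustrative FCN-style stub: predicts P(background) for every pixel.
        # Any segmentation network (FCN, Mask R-CNN, Mask TextSpotter, ...)
        # that produces per-pixel scores can play this role.
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 2, kernel_size=1),  # two channels: background / foreground scores
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            scores = self.features(x)             # (N, 2, H, W)
            probs = torch.softmax(scores, dim=1)  # per-pixel class probabilities
            return probs[:, 0]                    # channel 0: P(background), shape (N, H, W)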
The first feature value is the probability that a pixel point in the image to be recognized belongs to the background image; once this probability has been determined for every pixel point, the mask feature is obtained. That is, the mask feature includes a plurality of first feature values, each representing the probability that one pixel point of the image to be recognized belongs to the background image.
In an exemplary embodiment, the mask feature may take the form of a feature image. Specifically, the pixel points of the image to be recognized correspond one-to-one with the first feature values, and each first feature value can be regarded as the pixel value of the corresponding pixel point, thereby converting the mask feature into a mask image. The pixel points of the mask image correspond to those of the image to be recognized, and the pixel value of each pixel point of the mask image is the probability that the pixel point at the corresponding position in the image to be recognized belongs to the background image. This probability is a number between 0 and 1 (for example 0, 0.5, or 1), so the pixel value of each pixel point of the mask image is a number between 0 and 1, whereas the pixel value of each pixel point of the image to be recognized may include values in the three RGB dimensions.
Optionally, the first feature values may be used to binarize the image to be recognized to obtain a binarized mask image, that is, each first feature value is converted to 0 or 255: if the first feature value is greater than or equal to a preset threshold, it is converted to 255; if it is smaller than the preset threshold, it is converted to 0.
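A minimal sketch of this binarization, assuming the per-pixel background-probability map from the previous step; the 0.5 threshold and the function name are illustrative choices, not values fixed by the disclosure:

    import numpy as np

    def binarize_mask(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        # Turn a per-pixel background-probability map into a 0/255 mask image:
        # values at or above the preset threshold become 255, the rest become 0.
        return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)

    # Example: a 2x2 probability map
    mask = binarize_mask(np.array([[0.9, 0.2], [0.5, 0.1]]))  # -> [[255, 0], [255, 0]]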
In summary, the mask feature may include the probability that each pixel point in the image to be recognized belongs to the background image. Further, since the background image is determined from these per-pixel probabilities, the mask feature may also include the background image itself. Moreover, the image to be recognized can be binarized according to these probabilities to obtain a binarized image, so the mask feature may also include the binarized image. For example, pixel points whose probability exceeds 0.5 are set to 255 and the rest to 0, yielding a mask feature of the same size as the image to be recognized; in this way, the size of the mask image stays consistent with that of the image to be recognized. Alternatively, the mask image may include probability data for only part of the pixel points of the image to be recognized. For example, the image to be recognized is divided into a foreground image and a background image according to the per-pixel background probabilities, and the mask feature is obtained by taking only the probabilities of the pixel points belonging to the background image as first feature values.
Next, in step S220, the characters contained in the image to be recognized are recognized by combining the mask feature and the image to be recognized.
In the present exemplary embodiment, the characters contained in the image to be recognized can be recognized by a machine learning model: the mask feature and the image features of the image to be recognized are input together into the trained model to obtain the characters contained in the image. During training, sample images are first obtained, each is processed as described in step S210 to determine its mask feature, and then the RGB pixel values of the sample image together with the feature values of the mask feature are used as the model input. That is, the sample image contributes three channels of features and the mask feature a single channel, forming a four-channel input; this increases the feature dimensions observable by the machine learning model and improves its recognition accuracy.
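The four-channel input can be sketched as below; the tensor shapes are assumptions for illustration:

    import torch

    def make_four_channel_input(rgb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Stack a (3, H, W) RGB image with an (H, W) mask feature into a
        # (4, H, W) tensor, giving the model one extra observable channel.
        return torch.cat([rgb, mask.unsqueeze(0)], dim=0)

    x = make_four_channel_input(torch.rand(3, 32, 32), torch.rand(32, 32))
    print(x.shape)  # torch.Size([4, 32, 32])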
Different weights can be set for the image features and the mask feature of the image to be recognized, enhancing the image features of the image to be recognized and weakening the mask feature.
As shown in fig. 3, step S220 may specifically include the following steps:
step S301, determining a first weight of the mask feature and a second weight of the image to be recognized, where the first weight is smaller than the second weight;
step S302, training a deep learning model based on the first weight and the second weight, so as to recognize the characters contained in the image to be recognized using the trained deep learning model.
In the present exemplary embodiment, the deep learning model is trained on sample images, whose image features and mask features are first acquired. During training, the mask feature of a sample image serves as one input, mapped to the next layer with the first weight, while the image features serve as another input, mapped to the next layer with the second weight. In this way, when extracting image features, the deep learning model learns the background features of the sample image on top of its complete image features, distinguishes the characters contained in the image better, and thus achieves higher recognition accuracy.
The first weight is smaller than the second weight. During training of the deep learning model, multiple values can be tried for the first and second weights, and their optimal values determined from the model's character recognition performance.
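One simple way to choose the two weights is a small grid search, as in the sketch below; the candidate pairs and the validate stub are hypothetical placeholders standing in for training the model with the given weights and measuring its recognition accuracy:

    # Hypothetical weight search; validate() stands in for "train with these
    # weights and return recognition accuracy on a validation set".
    def validate(k1: float, k2: float) -> float:
        return 1.0 - abs(k1 - 0.3)  # placeholder score, for illustration only

    candidates = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.4, 0.6)]  # k1 < k2 throughout
    best_k1, best_k2 = max(candidates, key=lambda w: validate(*w))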
Illustratively, the deep learning model may be trained using a convolutional neural network algorithm. As shown in fig. 4, step S220 may specifically include the following steps:
step S401, performing first convolution processing on the image to be recognized to obtain feature maps;
step S402, performing second convolution processing on the mask feature and the feature maps based on the first weight and the second weight, so as to determine the characters contained in the image to be recognized.
The mask feature and the image features may each be multiplied by its corresponding weight and then convolved to determine the characters contained in the image to be recognized. Specifically, first convolution processing is performed on the image features of the image to be recognized, producing a number of feature maps; second convolution processing is then performed on the mask feature and these feature maps. In the second convolution processing, the mask feature and the feature maps obtained by the first convolution are multiplied by their respective weights, i.e., the first weight and the second weight. The second convolution processing yields a joint feature representation of the mask feature and the feature maps, which is convolved again; after multiple convolutions, the features of the characters are finally obtained. For example, the image to be recognized may be processed by convolution in a machine learning model to obtain the characters it contains. The image to be recognized is input into the model, which may be a convolutional neural network, so that the image is convolved multiple times: at each layer, the feature maps produced by the previous convolutional layer enter the next convolutional layer together with the mask feature. The original image features of the image to be recognized are thus learned first, and the mask feature then enhances or weakens those features. Because the feature values in the mask feature encode, for each pixel point, how likely it is to belong to the foreground or the background, they enhance the features of pixels likely to be foreground and weaken those of pixels likely to be background; that is, the representational force of the foreground features is strengthened and that of the background features weakened. Through this contrast between foreground and background, the model's learning has a focus and can quickly pick out the features of the characters in the foreground image, thereby recognizing the characters.
In an exemplary embodiment, the machine learning model may include an input layer, convolutional layers, an activation layer, a pooling layer, and an output layer. The mask feature and the image features of the image to be recognized are fed into the input layer together; the image features may be a three-channel RGB input, so that together with the mask feature they form a four-channel input. When a convolutional layer convolves the input features, the mask feature is weighted with the first weight and the image features with the second weight; for example, the output of the convolutional layer may be y = k1·f1(x) + k2·f2(x), where k1 is the first weight, f1(x) is the mask feature, k2 is the second weight, and f2(x) is the feature maps of the image to be recognized. The final convolution output is obtained after several convolutional layers. The input features can also be processed by an activation function, which may be the ReLU function or another, custom function. The pooling layer compresses the features and extracts the main ones, reducing the features fed to the output layer. The output layer may be a fully connected layer, i.e., a weighted sum over all features input to that layer. The machine learning model thus obtains the characters contained in the image to be recognized; the image may contain several characters, i.e., a character string. Therefore, after the contained characters have been recognized, the order of the characters can be recognized with an RNN (Recurrent Neural Network) model to determine the correct character string. Alternatively, the character string may be determined by other algorithms, such as CTC (Connectionist Temporal Classification), which can align a sequence of multiple characters.
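The weighted two-stage convolution described above might be sketched as follows; the channel counts and the concrete values k1 = 0.3 < k2 = 0.7 are illustrative assumptions, not parameters fixed by the disclosure:

    import torch
    import torch.nn as nn

    class WeightedMaskFusion(nn.Module):
        # Sketch: first convolution over the image, then a second convolution
        # over the weighted combination y = k1 * mask + k2 * feature_maps.
        def __init__(self, k1: float = 0.3, k2: float = 0.7):
            super().__init__()
            self.first_conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
            self.second_conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
            self.k1, self.k2 = k1, k2  # first weight (mask) < second weight (image)

        def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            feats = self.first_conv(image)            # first convolution: (N, 16, H, W)
            fused = self.k1 * mask + self.k2 * feats  # (N, 1, H, W) mask broadcasts over channels
            return self.second_conv(fused)            # second convolution on the fused features

    out = WeightedMaskFusion()(torch.rand(1, 3, 32, 32), torch.rand(1, 1, 32, 32))
    print(out.shape)  # torch.Size([1, 32, 32, 32])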
In step S230, the characters included in the image to be recognized are output as computer-readable characters.
After the characters contained in the image to be recognized have been recognized, they can be output as editable text, such as a doc or txt file, or output as computer-readable characters in another form, for example through an interface that displays the characters in a text box. Computer-readable text differs from an image in that the user can perform editing operations on it, such as copying and appending, so the information contained in the image can be used more flexibly.
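As a trivial illustration of this output step (the file name is an assumption):

    def save_recognized_text(chars: str, path: str = "result.txt") -> None:
        # Persist the recognized characters as an editable text file.
        with open(path, "w", encoding="utf-8") as f:
            f.write(chars)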
Further, an exemplary embodiment of the present disclosure also provides a character recognition apparatus, which can be used to implement the above character recognition method. As shown in fig. 5, the character recognition apparatus 500 may include an image segmentation unit 510, a feature combination unit 520, and a text output unit 530. Specifically:
the image segmentation unit 510 is configured to recognize a background image in the image to be recognized to obtain a mask feature; the feature combination unit 520 is configured to recognize characters contained in the image to be recognized by combining the mask feature and the image to be recognized; and the text output unit 530 is configured to output the characters contained in the image to be recognized as computer-readable characters.
In an exemplary embodiment of the present disclosure, the image segmentation unit 510 may further be configured to determine the probability that each pixel point of the image to be recognized belongs to the background image to obtain a first feature value, where the mask feature includes the first feature value.
In an exemplary embodiment of the present disclosure, the image segmentation unit 510 may further be configured to binarize the image to be recognized according to the first feature value and determine the binarized image as the mask feature.
In an exemplary embodiment of the present disclosure, the feature combination unit 520 may include: a weight determination unit, configured to determine a first weight of the mask feature and a second weight of the image to be recognized, where the first weight is smaller than the second weight; and a model training unit, configured to train a deep learning model based on the first weight and the second weight, so as to recognize the characters contained in the image to be recognized using the trained deep learning model.
In an exemplary embodiment of the present disclosure, the feature combination unit 520 may further include: a first convolution processing unit, configured to perform first convolution processing on the image to be recognized to obtain feature maps; and a second convolution processing unit, configured to perform second convolution processing on the mask feature and the feature maps based on the first weight and the second weight, so as to determine the characters contained in the image to be recognized.
In an exemplary embodiment of the present disclosure, the image segmentation unit 510 is further configured to process the image to be recognized with an image segmentation algorithm to determine the probability that each pixel point of the image to be recognized belongs to the background image.
In an exemplary embodiment of the present disclosure, the mask image is consistent in size with the image to be recognized.
The details of each module/unit in the above-mentioned apparatus have been described in detail in the embodiments of the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.
An electronic device 600 provided by an embodiment of the present disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: the at least one processing unit 610, the at least one memory unit 620, and a bus 630 that couples the various system components including the memory unit 620 and the processing unit 610.
The storage unit stores program code executable by the processing unit 610, causing the processing unit 610 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above. For example, the processing unit 610 may perform the steps shown in fig. 2: step S210, recognizing a background image in the image to be recognized to obtain a mask feature; step S220, recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized; step S230, outputting the characters contained in the image to be recognized as computer-readable characters.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A character recognition method, comprising:
recognizing a background image in an image to be recognized to obtain a mask feature;
recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized; and
outputting the characters contained in the image to be recognized as computer-readable characters.
2. The method according to claim 1, wherein the recognizing a background image of the image to be recognized to obtain a mask feature comprises:
determining the probability that each pixel point of the image to be recognized belongs to the background image to obtain a first feature value, wherein the mask feature comprises the first feature value.
3. The method according to claim 2, wherein the recognizing a background image of the image to be recognized to obtain a mask feature comprises:
binarizing the image to be recognized according to the first feature value, and determining the binarized image as the mask feature.
4. The method according to claim 1, wherein the recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized comprises:
determining a first weight of the mask feature and a second weight of the image to be recognized, wherein the first weight is smaller than the second weight; and
training a deep learning model based on the first weight and the second weight, so as to recognize the characters contained in the image to be recognized using the trained deep learning model.
5. The method according to claim 4, wherein the recognizing characters contained in the image to be recognized by combining the mask feature and the image to be recognized comprises:
performing first convolution processing on the image to be recognized to obtain feature maps; and
performing second convolution processing on the mask feature and the feature maps based on the first weight and the second weight, so as to determine the characters contained in the image to be recognized.
6. The method according to claim 2, wherein the determining the probability that a pixel point of the image to be recognized belongs to the background image comprises:
processing the image to be recognized with an image segmentation algorithm to determine the probability that each pixel point of the image to be recognized belongs to the background image.
7. The method according to claim 3, wherein the mask image is consistent in size with the image to be recognized.
8. A character recognition apparatus, comprising:
an image segmentation unit configured to recognize a background image in an image to be recognized to obtain a mask feature;
a feature combination unit configured to recognize characters contained in the image to be recognized by combining the mask feature and the image to be recognized; and
a text output unit configured to output the characters contained in the image to be recognized as computer-readable characters.
9. A computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the character recognition method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character recognition method according to any one of claims 1 to 7.
CN201910797941.5A 2019-08-27 2019-08-27 Character recognition method and device, computer readable medium and electronic equipment Pending CN111753836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910797941.5A CN111753836A (en) 2019-08-27 2019-08-27 Character recognition method and device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910797941.5A CN111753836A (en) 2019-08-27 2019-08-27 Character recognition method and device, computer readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111753836A (zh) 2020-10-09

Family

ID=72672789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910797941.5A Pending CN111753836A (en) 2019-08-27 2019-08-27 Character recognition method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111753836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648754A (en) * 2022-03-31 2022-06-21 北京百度网讯科技有限公司 Character recognition method, device, equipment, medium and product based on image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303185A (en) * 2015-11-27 2016-02-03 中国科学院深圳先进技术研究院 Iris positioning method and device
CN105335689A (en) * 2014-08-06 2016-02-17 阿里巴巴集团控股有限公司 Character recognition method and apparatus
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
US20160328630A1 (en) * 2015-05-08 2016-11-10 Samsung Electronics Co., Ltd. Object recognition apparatus and method
CN106446954A (en) * 2016-09-29 2017-02-22 南京维睛视空信息科技有限公司 Character recognition method based on depth learning
CN106682629A (en) * 2016-12-30 2017-05-17 佳都新太科技股份有限公司 Identification number identification algorithm in complicated background
CN109146830A (en) * 2018-07-17 2019-01-04 北京旷视科技有限公司 For generating the method, apparatus, system and storage medium of training data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335689A (en) * 2014-08-06 2016-02-17 阿里巴巴集团控股有限公司 Character recognition method and apparatus
US20160328630A1 (en) * 2015-05-08 2016-11-10 Samsung Electronics Co., Ltd. Object recognition apparatus and method
CN105303185A (en) * 2015-11-27 2016-02-03 中国科学院深圳先进技术研究院 Iris positioning method and device
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN106446954A (en) * 2016-09-29 2017-02-22 南京维睛视空信息科技有限公司 Character recognition method based on depth learning
CN106682629A (en) * 2016-12-30 2017-05-17 佳都新太科技股份有限公司 Identification number identification algorithm in complicated background
CN109146830A (en) * 2018-07-17 2019-01-04 北京旷视科技有限公司 For generating the method, apparatus, system and storage medium of training data

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GHULAM JILLANI ANSARI et al.: "A novel machine learning approach for scene text extraction", Future Generation Computer Systems, vol. 87, pp. 328-340.
ZHAO WEI et al.: "Extraction and evaluation model for the basic characteristics of MIDI file music", The 26th Chinese Control and Decision Conference, pp. 2083-2087.
林哲聪: "Design and algorithm implementation of a license plate recognition system based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 7, pp. 138-1029.
王攀峰: "Research and implementation of text detection and recognition in natural scenes", China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 7, pp. 138-1436.
董峻妃 et al.: "License plate character recognition based on convolutional neural networks", Journal of Computer Applications, vol. 37, no. 7, pp. 2014-2018.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648754A (en) * 2022-03-31 2022-06-21 北京百度网讯科技有限公司 Character recognition method, device, equipment, medium and product based on image

Similar Documents

Publication Publication Date Title
US10706320B2 (en) Determining a document type of a digital document
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
US20200004815A1 (en) Text entity detection and recognition from images
CN109918513B (en) Image processing method, device, server and storage medium
KR20160132842A (en) Detecting and extracting image document components to create flow document
KR102002024B1 (en) Method for processing labeling of object and object management server
CN109934229B (en) Image processing method, device, medium and computing equipment
CN111199541A (en) Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
US20230334880A1 (en) Hot word extraction method and apparatus, electronic device, and medium
CN115546488B (en) Information segmentation method, information extraction method and training method of information segmentation model
CN114429566A (en) Image semantic understanding method, device, equipment and storage medium
CN111145202A (en) Model generation method, image processing method, device, equipment and storage medium
CN110826619A (en) File classification method and device of electronic files and electronic equipment
CN111738252A (en) Method and device for detecting text lines in image and computer system
CN113408323B (en) Extraction method, device and equipment of table information and storage medium
CN110929647B (en) Text detection method, device, equipment and storage medium
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
CN116109874A (en) Detection method, detection device, electronic equipment and storage medium
CN112801960B (en) Image processing method and device, storage medium and electronic equipment
CN110704617B (en) News text classification method, device, electronic equipment and storage medium
CN115004261A (en) Text line detection
CN113221718A (en) Formula identification method and device, storage medium and electronic equipment
CN113255665B (en) Target text extraction method and system
CN117373034A (en) Method and system for identifying background information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination