CN111368709A - Picture text recognition method, device and equipment and readable storage medium - Google Patents

Picture text recognition method, device and equipment and readable storage medium

Info

Publication number
CN111368709A
Authority
CN
China
Prior art keywords
picture
text
recognized
region
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010134748.6A
Other languages
Chinese (zh)
Inventor
章放
林翠
杨海军
金虎光
徐倩
杨强
陈敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202010134748.6A
Publication of CN111368709A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a picture text recognition method, device, and equipment, and a readable storage medium, relating to the field of financial technology. The method comprises the following steps: acquiring a picture to be recognized and inputting it into a preset picture text recognition model to obtain the region coordinates of the text regions in the picture to be recognized and the text content corresponding to each text region; determining associated text regions according to the region coordinates corresponding to the text regions and the text content corresponding to those coordinates; and obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text regions. The invention realizes automatic recognition of text in pictures and improves the efficiency of recognizing text in pictures.

Description

Picture text recognition method, device and equipment and readable storage medium
Technical Field
The invention relates to the technical field of text recognition in financial technology (Fintech), and in particular to a picture text recognition method, device, and equipment, and a readable storage medium.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Text recognition technology is no exception; moreover, because of the financial industry's requirements for security and real-time performance, higher demands are placed on text recognition technology.
In the field of automobile finance loans, the funding party usually requires the borrower to upload photographs of various certificates (such as the identity card, driving license, and vehicle registration certificate) so that their authenticity can be verified; the pictures are then reviewed by dedicated staff, further risk control is performed, and a final decision on whether to lend is made. The manual review step is time-consuming and labor-intensive, especially for vehicle registration certificates, which contain numerous fields, including but not limited to: the driving license number, vehicle license number, frame number, engine number, identification number, registration authority, license plate number, unified social credit code, name, and mortgagee. Moreover, because the layout of the vehicle registration certificate is not as regular as that of the identity card, many fields are relatively cluttered and may even be severely blurred or misaligned, which increases the workload and difficulty of the reviewers. Furthermore, because the number of certificate pictures is huge, it is sometimes impossible to review them all and only a portion can be checked, which clearly increases the lending risk.
Therefore, if the text in a picture could be recognized automatically and the picture then checked automatically against the recognized text, the cost of reviewing certificate pictures could be reduced and the efficiency of the review improved. How to automatically recognize text in pictures is thus an urgent problem to be solved.
Disclosure of Invention
The invention mainly aims to provide a picture text recognition method, device, and equipment, and a readable storage medium, so as to solve the technical problem of how to automatically recognize text in a picture.
In order to achieve the above object, the present invention provides a method for identifying a picture text, wherein the method for identifying a picture text comprises the steps of:
acquiring a picture to be recognized, inputting the picture to be recognized into a preset picture text recognition model so as to acquire region coordinates of a text region corresponding to the picture to be recognized and acquire text content corresponding to the text region;
determining an associated text region according to the region coordinates corresponding to the text region and the text content corresponding to the region coordinates;
and obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region.
Preferably, the step of inputting the picture to be recognized into a preset picture text recognition model to obtain the region coordinates of the text region corresponding to the picture to be recognized includes:
inputting the picture to be recognized into a preset picture text recognition model, and recognizing a text region in the picture to be recognized through a Feature Pyramid Network (FPN) in the picture text recognition model;
and determining the region coordinates of the text region according to the pixel values of the text region in the picture to be identified.
Preferably, the step of identifying the text region in the picture to be identified through the FPN in the picture text identification model includes:
performing feature extraction and feature fusion on the picture to be recognized through the FPN in the picture text recognition model to obtain a first feature map corresponding to the picture to be recognized;
inputting the first feature map into the convolution layer of the FPN to obtain a second feature map corresponding to the picture to be identified;
and determining a text region in the picture to be identified according to text pixel points in the second characteristic diagram.
Preferably, the step of determining the text region in the picture to be recognized according to the text pixel point in the second feature map includes:
determining text pixel points in the second feature map, and determining core pixel points in the second feature map according to the text pixel points;
classifying each pixel point in the second characteristic diagram based on the core pixel point, and determining a text region in the picture to be recognized according to a classification result obtained by classification.
Preferably, the step of obtaining the text content corresponding to the text region includes:
inputting the text region into a network structure of the picture text recognition model to obtain a third feature map corresponding to the text region;
inputting the third feature map into a sequence transformation network corresponding to the network structure to obtain a serialized fourth feature map;
and inputting the fourth feature map into the fully connected network connected to each node in the sequence transformation network to obtain the text content corresponding to the text region.
Preferably, before the step of acquiring the picture to be recognized, the method further includes:
obtaining a first sample picture for model training, and labeling the first sample picture to obtain a training sample set consisting of the labeled first sample picture;
and inputting the training sample set into the picture text recognition model to train the picture text recognition model.
Preferably, the step of labeling the first sample picture to obtain a training sample set composed of the labeled first sample picture includes:
marking the first sample picture to obtain a marked first sample picture, and calculating the number of the marked first sample picture;
if the number of the pictures is smaller than the preset number, carrying out picture simulation according to the marked first sample picture to obtain a second sample picture;
and taking the second sample picture and the labeled first sample picture as a training sample set.
Preferably, the step of obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region further includes:
comparing the text recognition result with prestored certificate information of the borrower to obtain a comparison result;
and if the text recognition result is determined to be consistent with the certificate information according to the comparison result, determining that the certificate corresponding to the picture to be recognized is a real certificate.
In addition, in order to achieve the above object, the present invention further provides an apparatus for recognizing a picture text, including:
the acquisition module is used for acquiring a picture to be identified;
the input module is used for inputting the picture to be recognized into a preset picture text recognition model so as to obtain the region coordinates of the text region corresponding to the picture to be recognized and obtain the text content corresponding to the text region;
the determining module is used for determining the associated text area according to the area coordinate corresponding to the text area and the text content corresponding to the area coordinate;
and the processing module is used for obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region.
In addition, in order to achieve the above object, the present invention further provides a picture text recognition device, which includes a memory, a processor, and a picture text recognition program stored in the memory and executable on the processor; when the picture text recognition program is executed by the processor, the steps of the picture text recognition method described above are implemented.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a picture text recognition program is stored, which when executed by a processor implements the steps of the picture text recognition method as described above.
According to the invention, the picture to be recognized is obtained and input into the picture text recognition model to obtain the region coordinates of the text regions in the picture and the text content corresponding to each text region; associated text regions are determined according to the region coordinates corresponding to the text regions and the text content corresponding to those coordinates; and a text recognition result containing semantics in the picture to be recognized is obtained according to the associated text regions. The text in the picture is thus recognized automatically, and the recognition efficiency of text in pictures is improved.
Drawings
FIG. 1 is a flowchart illustrating a method for recognizing a picture text according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a second embodiment of a method for recognizing a picture text according to the present invention;
FIG. 3 is a functional block diagram of a picture text recognition apparatus according to a preferred embodiment of the present invention;
fig. 4 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a method for identifying a picture text, and referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the method for identifying the picture text.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different from that shown or described herein.
The picture text recognition method is applied to a picture text recognition device, which may be a server or a terminal. The terminal may be a mobile terminal such as a mobile phone, tablet computer, notebook computer, camera, palmtop computer, or Personal Digital Assistant (PDA), or a fixed terminal such as a digital TV or desktop computer. In the following embodiments of the picture text recognition method, the execution subject is omitted for convenience of description. The picture text recognition method comprises the following steps:
step S10, obtaining a picture to be recognized, inputting the picture to be recognized into a preset picture text recognition model to obtain the region coordinates of the text region corresponding to the picture to be recognized, and obtaining the text content corresponding to the text region.
Specifically, the picture to be recognized may be obtained in real time, for example collected by the picture text recognition device itself or sent to it by another terminal, or it may be stored in advance. There may be one picture to be recognized or several. When the picture to be recognized is pre-stored, a timed task can be set up in advance to fetch the pre-stored pictures. In this embodiment the picture to be recognized may be a certificate picture, a license plate picture, or another kind of picture. After the picture to be recognized is obtained, it is input into a preset picture text recognition model to obtain the region coordinates of the text regions corresponding to the picture and the text content corresponding to each text region. It should be noted that each picture to be recognized corresponds to at least one text region, each text region contains at least one character, and those characters constitute the text content of the region. The shape of the text region is not limited in this embodiment. In this embodiment the picture text recognition model is an OCR (Optical Character Recognition) model; in other embodiments it may be another model capable of recognizing text in pictures. Specifically, an OCR engine may be deployed to invoke the OCR model.
It should be noted that the process of recognizing text in multiple pictures to be recognized is the same as that for a single picture; therefore, for convenience of description, the embodiments of the present invention are described with a single picture to be recognized.
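For illustration only, the following Python sketch outlines the two-stage flow of step S10 described above: detect the text regions first, then read the content of each region. The function and method names (recognize_picture_text, detect_regions, recognize_region) are hypothetical placeholders and are not part of this disclosure.

```python
# Illustrative sketch of step S10: detect text regions, then read their content.
# detect_regions() and recognize_region() stand in for the detection branch (FPN)
# and the recognition branch (backbone + sequence network) of the OCR model.
from typing import List, Tuple

Region = Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max) in pixels

def recognize_picture_text(picture, model) -> List[dict]:
    """Return one record per text region: its coordinates and its text content."""
    regions: List[Region] = model.detect_regions(picture)       # hypothetical call
    results = []
    for region in regions:
        text = model.recognize_region(picture, region)           # hypothetical call
        results.append({"coords": region, "text": text})
    return results
```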
Further, the step of inputting the picture to be recognized into a preset picture text recognition model to obtain the region coordinates of the text region corresponding to the picture to be recognized includes:
step a, inputting the picture to be recognized into a preset picture text recognition model, and recognizing a text region in the picture to be recognized through a Feature Pyramid Network (FPN) in the picture text recognition model.
And b, determining the region coordinates of the text region according to the pixel values of the text region in the picture to be identified.
Specifically, after the picture to be recognized is obtained, it is input into the preset picture text recognition model so that the text regions in the picture can be recognized through the FPN in the model. It should be noted that the FPN is a basic architecture in the picture text recognition model, mainly used to solve the multi-scale problem in object detection; through simple changes to the network connections, and without increasing the amount of computation of the original model, the performance of small-object detection is greatly improved. After the text regions in the picture to be recognized are obtained through the FPN in the picture text recognition model, the region coordinates of each text region are determined according to the pixel values of the recognized text region in the picture. It can be understood that, since each pixel in the picture to be recognized has a corresponding pixel value, the position of that pixel in the picture can be determined, and therefore the region coordinates of a text region can be determined from the pixels of the text region in the picture to be recognized.
Further, the step of identifying the text region in the picture to be identified through the FPN in the picture text identification model includes:
step a1, performing feature extraction and feature fusion on the picture to be recognized through the FPN in the picture text recognition model to obtain a first feature map corresponding to the picture to be recognized.
Further, it should be noted that the FPN is divided into a first half and a second half: the first half performs feature extraction on the picture to be recognized from large to small, and the second half performs feature fusion from small to large. It can be understood that the larger a feature map is, the more information it contains and the more complex it is. In the first half of the FPN, each layer uses a structure similar to ResNet (Residual Network); the FPN contains convolutional layers, pooling layers, and the like. Each half of the FPN has K1 layers, where K1 can be preset; this embodiment does not specifically limit the size of K1. It can be understood that after the picture to be recognized passes through the K1 layers, K1 feature maps corresponding to the picture are obtained, i.e., one feature map per layer; since the layers differ, the resulting feature maps also differ in size. Therefore, the first feature maps corresponding to the picture to be recognized are obtained by performing feature extraction and feature fusion on the picture through the FPN in the picture text recognition model, and at this point the sizes of these first feature maps are not consistent.
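As a rough illustration of the FPN structure described above, the following sketch (PyTorch) builds a bottom-up path that extracts progressively smaller feature maps and a top-down path that fuses them back together. The channel counts, the number of levels (K1 = 4), and the use of plain stride-2 convolutions instead of ResNet-style blocks are simplifying assumptions, not details taken from this disclosure.

```python
# Simplified FPN sketch: a bottom-up path extracts progressively smaller feature
# maps, and a top-down path fuses them back with lateral 1x1 convolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=3, mid=64):
        super().__init__()
        # Bottom-up: each stage halves the spatial size (stride-2 convolutions).
        self.stages = nn.ModuleList([
            nn.Conv2d(in_channels, mid, 3, stride=2, padding=1),
            nn.Conv2d(mid, mid, 3, stride=2, padding=1),
            nn.Conv2d(mid, mid, 3, stride=2, padding=1),
            nn.Conv2d(mid, mid, 3, stride=2, padding=1),
        ])
        # Lateral 1x1 convolutions used during top-down fusion.
        self.laterals = nn.ModuleList([nn.Conv2d(mid, mid, 1) for _ in range(4)])

    def forward(self, x):
        # Bottom-up pass: collect one feature map per stage (the "first feature maps").
        feats = []
        for stage in self.stages:
            x = F.relu(stage(x))
            feats.append(x)
        # Top-down pass: upsample the smaller map and add the lateral projection.
        fused = [self.laterals[-1](feats[-1])]
        for i in range(len(feats) - 2, -1, -1):
            up = F.interpolate(fused[-1], size=feats[i].shape[-2:], mode="nearest")
            fused.append(self.laterals[i](feats[i]) + up)
        return fused[::-1]   # fused feature maps, largest first

# Example: a 640x640 RGB image yields four fused maps of decreasing size.
maps = TinyFPN()(torch.randn(1, 3, 640, 640))
```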
Step a2, inputting the first feature map into the convolution layer of the FPN to obtain a second feature map corresponding to the picture to be identified.
In order to facilitate subsequent processing of the first feature maps and improve processing efficiency, after the first feature maps corresponding to the picture to be recognized are obtained, the target feature map with the largest area among them is determined; each first feature map is then up-sampled so that its size matches that of the target feature map, yielding the adjusted first feature maps. When resizing each first feature map to the size of the target feature map, the map is enlarged proportionally, that is, its length and width are increased by the same scale.
After the adjusted first feature maps are obtained, they are input into the convolutional layers of the FPN to obtain the second feature map corresponding to the picture to be recognized. The number of convolutional layers can be set to K2, where K2 may be the same as or different from K1. Among the K2 convolutional layers, the sizes of the layers may be the same or different, and the user may set the size of each convolutional layer according to specific needs.
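The following sketch illustrates, under the same assumptions as above, how the first feature maps could be up-sampled to the size of the largest map and passed through K2 convolutional layers (K2 = 2 here, an assumed value) to produce the second feature map; the untrained layers are created inline purely to show the shapes involved.

```python
# Sketch of producing the "second feature map": upsample every first feature map
# to the size of the largest one (same scale for height and width), fuse them
# channel-wise, and pass the result through a small stack of convolutional layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

def second_feature_map(first_maps, out_channels=6):
    target_size = first_maps[0].shape[-2:]               # largest map's (H, W)
    resized = [F.interpolate(m, size=target_size, mode="bilinear", align_corners=False)
               for m in first_maps]
    stacked = torch.cat(resized, dim=1)                  # channel-wise fusion
    convs = nn.Sequential(                               # untrained, shapes only
        nn.Conv2d(stacked.shape[1], 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_channels, 1),                  # per-pixel text scores
    )
    return convs(stacked)
```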
Step a3, determining a text region in the picture to be recognized according to text pixel points in the second feature map.
After the second feature map is obtained, the text pixel points in it are determined, and the text regions in the picture to be recognized are determined from those text pixel points. Specifically, each pixel in the second feature map has a corresponding label value, and whether the pixel is a text pixel can be determined from that label value. For example, the label value representing a text pixel point may be set to 1 and the label value representing a non-text pixel point to 0; then, when the label value of a pixel in the second feature map is "1", that pixel is a text pixel point, and when it is "0", the pixel is a non-text pixel point. This embodiment does not limit the representation of the label value; for example, the label value of a text pixel point may be represented as "true" and that of a non-text pixel point as "false".
Further, step a3 includes:
step a31, determining text pixel points in the second feature map, and determining core pixel points in the second feature map according to the text pixel points.
Step a32, classifying each pixel point in the second feature map based on the core pixel point, and determining a text region in the picture to be recognized according to a classification result obtained by classification.
Further, after the second feature map is obtained, the text pixel points in it are determined, and the pixels in the second feature map are classified according to those text pixel points to obtain a pixel classification result. Specifically, the core pixel points in the second feature map are determined first: a core pixel point is a pixel whose surrounding pixels are all text pixel points, that is, the pixels above, below, to the left of, and to the right of a core pixel point are all text pixel points. After the core pixel points are determined, the region is expanded outward from them. The expansion may work as follows: denote a pixel to be expanded as x and its corresponding value as f(x), where f(x) is the proportion of the pixels adjacent to x that have been expanded successfully or are text pixel points; the number of pixels treated as adjacent to the pixel to be expanded can be set according to specific needs. It should be noted that if a pixel is expanded successfully, it is also determined to be a text pixel point. When f(x) for the pixel to be expanded is greater than a preset threshold, the expansion of x succeeds; the preset threshold may be set as needed, for example to 0.6, 0.7, or 0.85. For instance, with the threshold set to 0.7, if 8 of the 10 adjacent pixels of the pixel x to be expanded are text pixels (including pixels confirmed as text pixels because they were expanded successfully), then x can be confirmed as a text pixel point. It can be understood that the pixels to be expanded are pixels that were not originally text pixel points.
After this operation is performed on all pixels in the second feature map, all text pixel points and all non-text pixel points in the second feature map can be determined, giving the classification result of each pixel, i.e., the pixels of the second feature map are divided into text pixel points and non-text pixel points. Once the text and non-text pixel points in the second feature map are determined, the text regions in the picture to be recognized are determined from the text pixel points.
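A minimal sketch of the core-pixel expansion described above is given below, assuming the second feature map has already been binarized into a text/non-text mask. The neighbourhood size (a 3x3 window) and the threshold on f(x) (0.7) are assumed values.

```python
# Sketch of the expansion step: core pixels are text pixels whose four neighbours
# are all text pixels; the text set then grows outward, absorbing a candidate
# pixel x when f(x), the fraction of its neighbours already marked as text,
# exceeds the threshold.
import numpy as np

def core_pixels(text_mask: np.ndarray) -> np.ndarray:
    """Core pixels: text pixels whose up/down/left/right neighbours are all text."""
    core = np.zeros(text_mask.shape, dtype=bool)
    m = text_mask.astype(bool)
    core[1:-1, 1:-1] = (m[1:-1, 1:-1]
                        & m[:-2, 1:-1] & m[2:, 1:-1]
                        & m[1:-1, :-2] & m[1:-1, 2:])
    return core

def expand_from_cores(text_mask: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    h, w = text_mask.shape
    text = core_pixels(text_mask)
    changed = True
    while changed:
        changed = False
        for y in range(h):
            for x in range(w):
                if text[y, x]:
                    continue
                y0, y1 = max(0, y - 1), min(h, y + 2)
                x0, x1 = max(0, x - 1), min(w, x + 2)
                window = text[y0:y1, x0:x1]
                # f(x): proportion of neighbours already classified as text
                if window.sum() / (window.size - 1) > threshold:
                    text[y, x] = True          # expansion succeeded
                    changed = True
    return text
```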
Further, the step of obtaining the text content corresponding to each text region includes:
and c, inputting the text region into a network structure of the picture text recognition model to obtain a third feature map corresponding to the text region.
Further, after the text regions in the picture to be recognized are determined, each text region is input into the network structure (backbone) of the picture text recognition model to obtain the third feature map corresponding to the text region. Specifically, each text region in the picture to be recognized may be cropped out to obtain a region picture corresponding to that text region, and the region picture is input into the backbone of the picture text recognition model; alternatively, each text region may be labeled in the picture to be recognized, and the labeled picture input into the backbone of the picture text recognition model. The backbone may be ResNet, VGG (Visual Geometry Group) 16, or the like.
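The sketch below illustrates step c under the assumption that the region coordinates are pixel bounding boxes: the text region is cropped out of the picture and passed through a small convolutional stack that merely stands in for the ResNet or VGG16 backbone.

```python
# Sketch of step c: crop a detected text region and run it through a backbone to
# obtain the "third feature map". The two-layer stack below is a stand-in for
# ResNet / VGG16, used only to show the data flow.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=(2, 1), padding=1), nn.ReLU(),   # shrink height, keep width
    nn.Conv2d(32, 64, 3, stride=(2, 1), padding=1), nn.ReLU(),
)

def third_feature_map(picture: torch.Tensor, region):
    """picture: (3, H, W) tensor; region: (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = region
    crop = picture[:, y_min:y_max, x_min:x_max].unsqueeze(0)     # region picture
    return backbone(crop)                                        # (1, 64, h', w')
```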
And d, inputting the third characteristic diagram into a sequence transformation network corresponding to the network structure to obtain a serialized fourth characteristic diagram.
After the third feature map is obtained, it is input into the sequence transformation network corresponding to the network structure, and the third feature map is serialized through the sequence transformation network to obtain the serialized fourth feature map. In this embodiment, the sequence transformation network includes, but is not limited to, LSTM (Long Short-Term Memory) networks and BiLSTM (bidirectional LSTM) networks.
And e, inputting the fourth feature map into the fully connected network connected to each node in the sequence transformation network to obtain the text content corresponding to the text region.
It should be noted that, in this embodiment, each node in the sequence transformation network is connected to a fully connected network; this embodiment does not limit the type of fully connected network used. After the fourth feature map is obtained, it is input into the fully connected network connected to each node of the sequence transformation network, and the text content corresponding to the text region is then obtained through the softmax layer connected to the fully connected network.
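Steps d and e can be illustrated as follows: the third feature map is serialized along its width, passed through a BiLSTM as the sequence transformation network, and then through a fully connected layer and softmax at every time step. The 64-channel input, the hidden size, and the 100-character alphabet are assumptions made only for this sketch.

```python
# Sketch of steps d and e: serialize the third feature map along the width axis,
# feed it to a BiLSTM (the sequence transformation network), then apply a fully
# connected layer and softmax at every time step to obtain per-step character
# probabilities.
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    def __init__(self, channels=64, hidden=128, num_chars=100):
        super().__init__()
        self.rnn = nn.LSTM(channels, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_chars)

    def forward(self, feature_map):                 # (N, C, H, W)
        seq = feature_map.mean(dim=2)               # collapse height -> (N, C, W)
        seq = seq.permute(0, 2, 1)                  # one step per column -> (N, W, C)
        out, _ = self.rnn(seq)                      # serialized "fourth feature map"
        logits = self.fc(out)                       # fully connected layer per step
        return logits.softmax(dim=-1)               # character probabilities per step

probs = RecognitionHead()(torch.randn(1, 64, 8, 120))   # -> (1, 120, 100)
```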
Step S20, determining an associated text region according to the region coordinates corresponding to the text regions and the text content corresponding to the region coordinates.
After the text content corresponding to each text region in the picture to be recognized is obtained, the associated text regions among the text regions are determined according to the region coordinates corresponding to the text regions and the text content corresponding to those coordinates. It can be understood that, in determining associated text regions, the associated region coordinates may be determined first, the text content corresponding to those coordinates taken as target text content, the associated target text content then determined from the semantics of each piece of target text content, and the text regions corresponding to the associated target content taken as the associated text regions; alternatively, the associated target text content may be determined first and the associated region coordinates afterwards.
It can be understood that associated text regions may be determined from the region coordinates of each text region, and an associated text region may be one or more of the regions above, below, to the left of, or to the right of a given region; that is, associated text regions are adjacent text regions, e.g., text region A may be determined to be associated with text regions B, C, and D. When determining associated text content, the determination is made mainly from semantics. For example, if the text content of text region A is "identification number" and the text content of text region B satisfies the format required of an identification number, e.g., 18 characters in total consisting of an address code, a birth-date code, a sequence code, and a check code, it can finally be determined that text region A is associated with text region B.
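The following sketch illustrates one possible way to pair a key region with its associated value region, using the "identification number" example above. The English field name, the 18-character regular expression, and the nearest-neighbour adjacency rule are illustrative assumptions.

```python
# Sketch of step S20: pair a key region (e.g. the "identification number" label)
# with a nearby region whose content matches the expected format.
import re

ID_NUMBER = re.compile(r"^\d{17}[\dXx]$")           # 18-character ID card number

def associate_regions(regions):
    """regions: list of {"coords": (x_min, y_min, x_max, y_max), "text": str}."""
    pairs = []
    for key in regions:
        if key["text"] != "identification number":
            continue
        kx, ky = key["coords"][2], key["coords"][1]      # key's right edge, top edge
        candidates = [r for r in regions
                      if r is not key and ID_NUMBER.match(r["text"])]
        if candidates:
            # choose the candidate region closest to the key region
            value = min(candidates,
                        key=lambda r: abs(r["coords"][0] - kx) + abs(r["coords"][1] - ky))
            pairs.append((key, value))
    return pairs
```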
And step S30, obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region.
After the associated text regions are determined, a text recognition result containing semantics in the picture to be recognized is obtained from the associated text regions. Specifically, corresponding semantics may be assigned to each text region according to its text content. For example, the semantic label "name" is assigned to the text region whose content is "Zhang Xiaoming", the label "address" is assigned to the text region containing an address such as "Village D, Town C, County B, Guangdong Province", and the label "name tag" is assigned to the text region containing the field name "name" as printed on the identity card. The semantics corresponding to each text region in the picture to be recognized are thus determined, and the text recognition result for the picture is obtained, i.e., a structured text recognition result containing semantics.
According to this embodiment, the picture to be recognized is obtained and input into the picture text recognition model to obtain the region coordinates of the text regions in the picture and the text content corresponding to each text region; associated text regions are determined according to the region coordinates corresponding to the text regions and the text content corresponding to those coordinates; and a text recognition result containing semantics in the picture to be recognized is obtained according to the associated text regions. The text in the picture is thus recognized automatically, and the recognition efficiency of text in pictures is improved.
It should be noted that, in this embodiment, the text in a picture may be recognized online or offline. For online recognition, the picture to be recognized is obtained in real time; for offline recognition, the picture to be recognized is stored in advance. For online recognition, the user can upload the picture to be recognized through an applet corresponding to the picture text recognition model, and the text recognition result obtained can be modified online. After the text recognition result is obtained, it is output so that the user corresponding to the picture to be recognized can confirm whether the result is accurate; if the user confirms that it is not, the inaccurate text recognition result can be corrected. It should be noted that this user is not the person whose information the picture contains, but the user corresponding to the picture text recognition model; for example, when the picture to be recognized is a certificate picture, the user is not the certificate owner but the person reviewing the certificate picture, which prevents the person corresponding to the picture from forging information. Recognizing picture text online keeps the load on the recognition equipment more balanced, and having the user confirm the text recognition result ensures its correctness. Furthermore, the picture to be recognized can be labeled with the text recognition result, and the labeled picture can be used to further train the picture text recognition model so as to improve its text recognition accuracy.
When picture text recognition is performed offline, there may be multiple pictures to be recognized at a time, and recognition is performed at scheduled times. The reason for offline recognition is that it is sometimes inconvenient or unnecessary to recognize picture text online. For offline recognition, a time when the network is good and the load of the corresponding equipment is low can be chosen, which improves recognition efficiency and removes concerns about network latency. Furthermore, if some pictures to be recognized contain sensitive information, recognition of those pictures can be restricted to offline processing in order to keep the sensitive information secure.
Further, a second embodiment of the method for recognizing picture texts is provided. The second embodiment of the method for identifying a picture text is different from the first embodiment of the method for identifying a picture text in that, referring to fig. 2, the method for identifying a picture text further includes:
and step S40, acquiring a first sample picture for model training, and labeling the first sample picture to obtain a training sample set consisting of the labeled first sample picture.
A first sample picture for model training is obtained; the first sample picture may be obtained from another terminal when needed or collected in real time. It can be understood that the first sample pictures are also pictures containing text; this embodiment does not limit their number, which the user may set according to specific needs. After the first sample pictures are obtained, they are labeled to obtain labeled first sample pictures, and the labeled first sample pictures form the training sample set. Specifically, prompt information may be sent to an annotator to prompt them to label the first sample pictures according to the prompt information, so as to determine each character in each first sample picture; further, both the text regions and the characters in each first sample picture may be labeled, and the text content determined from the labeled characters. In other embodiments, labeling may also be done automatically.
And step S50, inputting the training sample set into the picture text recognition model to train the picture text recognition model.
The first sample pictures in the training sample set are input into the picture text recognition model to train it, and the trained picture text recognition model is stored. It should be noted that the processing of a first sample picture input into the picture text recognition model is the same as the processing of a picture to be recognized, so the description is not repeated here.
In this embodiment, first sample pictures are obtained for model training to produce the picture text recognition model, so that the trained model can be used whenever the text in a picture needs to be recognized.
Further, the step of labeling the first sample picture to obtain a training sample set composed of the labeled first sample picture includes:
and h, labeling the first sample picture to obtain a labeled first sample picture, and calculating the number of the labeled first sample picture.
And i, if the number of the pictures is smaller than the preset number, carrying out picture simulation according to the marked first sample picture to obtain a second sample picture.
And j, taking the second sample picture and the labeled first sample picture as a training sample set.
After the first sample pictures are obtained, they are labeled to obtain labeled first sample pictures, the number of labeled first sample pictures is calculated, and it is judged whether this number is smaller than a preset number. If so, picture simulation is performed based on the labeled first sample pictures to obtain second sample pictures. Specifically, to improve the fidelity of the second sample pictures, the font in a second sample picture must be the same as that of the first sample pictures, and the similarity between its background and the background of the first sample pictures must be greater than a preset similarity; this embodiment does not limit the value of the preset similarity. The corpus in a second sample picture, i.e., its text content, can be determined from the characteristics and format of the text in the first sample pictures; for example, a fictitious identity card number in a second sample picture has the same format as a real identity card number. It should be noted that the second sample pictures obtained by simulation are also labeled pictures.
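A minimal sketch of generating a simulated (second) sample picture is shown below: corpus text is drawn onto a background in the same font as the real samples, and the drawn text is kept as the label. The use of PIL, the font path, the background image, and the drawing position are all assumptions made for illustration.

```python
# Sketch of generating a second (simulated) sample picture: render corpus text in
# the same font as the real samples onto a similar background, and keep the drawn
# text and its position as the label.
import random
from PIL import Image, ImageDraw, ImageFont

def simulate_sample(background_path: str, font_path: str, corpus: list):
    image = Image.open(background_path).convert("RGB")
    font = ImageFont.truetype(font_path, size=28)
    draw = ImageDraw.Draw(image)
    text = random.choice(corpus)                 # e.g. a fictitious 18-digit ID number
    position = (40, 60)                          # placement chosen for illustration
    draw.text(position, text, font=font, fill=(20, 20, 20))
    return image, {"text": text, "position": position}   # picture plus its label
```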
After the second sample pictures are obtained, they and the labeled first sample pictures together form the training sample set. Further, if the number of labeled first sample pictures is determined to be greater than or equal to the preset number, the first sample pictures are sufficient and no second sample pictures need to be obtained by simulation.
If second sample pictures have been obtained but not all of the sample pictures are needed, a subset of the first and second sample pictures can be selected to form the training sample set; the ratio between first and second sample pictures in the training sample set is set according to specific needs, and this embodiment does not specifically limit it.
Training the picture text recognition model requires a large number of sample pictures, but for reasons of time, labor cost, insufficient real data, data sensitivity, and the like, enough sufficiently realistic sample pictures cannot always be collected and labeled. Sample pictures are therefore simulated to enlarge the training set of the picture text recognition model, thereby improving the accuracy with which the trained model recognizes text.
Further, a third embodiment of the method for recognizing a picture text is provided.
The third embodiment of the method for recognizing picture texts differs from the first and/or second embodiment of the method for recognizing picture texts in that the method for recognizing picture texts further comprises:
and k, comparing the text recognition result with the pre-stored certificate information of the borrower to obtain a comparison result.
The picture text recognition method may be applied to loan services, such as automobile finance loans or house mortgage loans. After the text recognition result is obtained, it is compared with the borrower's certificate information stored in advance in a database to obtain a comparison result. It can be understood that the borrower's certificate information is filled in as part of the loan data and is stored in the database.
And step l, if the text recognition result is determined to be consistent with the certificate information according to the comparison result, determining that the certificate corresponding to the picture to be recognized is a real certificate.
After the comparison result is obtained, if the text recognition result is determined to be consistent with the certificate information according to the comparison result, the certificate corresponding to the picture to be recognized (here a certificate picture) is determined to be a real certificate; if the text recognition result is determined to be inconsistent with the certificate information, the certificate corresponding to the picture to be recognized is determined to be a false certificate. It should be noted that consistency can be defined in either of two ways: the text recognition result is consistent with the certificate information when the information in it is identical to the certificate information, and inconsistent otherwise; or it is consistent when the similarity between the information in the text recognition result and the certificate information is greater than a preset information similarity, and inconsistent otherwise. When the certificate corresponding to the picture to be recognized is determined to be a real certificate, the loan is granted to the borrower; when it is determined to be a false certificate, the loan is refused.
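The comparison of steps k and l can be sketched as follows, with difflib's ratio used as an assumed similarity measure and 0.9 as an assumed threshold; the field names and data layout are hypothetical.

```python
# Sketch of steps k and l: compare the recognised fields with the borrower's
# stored certificate information. A field matches if it is identical or if its
# similarity exceeds a preset threshold.
from difflib import SequenceMatcher

def certificate_is_genuine(recognized: dict, stored: dict, threshold: float = 0.9) -> bool:
    for field, stored_value in stored.items():
        value = recognized.get(field, "")
        similarity = SequenceMatcher(None, value, stored_value).ratio()
        if value != stored_value and similarity <= threshold:
            return False                 # mismatch -> treat the certificate as false
    return True                          # all fields consistent -> real certificate
```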
By comparing the text recognition result with the borrower's certificate information, this embodiment realizes automatic verification of the borrower's certificate information, ensures its authenticity, improves the security of lending, and reduces lending risk.
In addition, the present invention also provides an apparatus for recognizing a picture text, and referring to fig. 3, the apparatus for recognizing a picture text includes:
the acquisition module 10 is used for acquiring a picture to be identified;
the input module 20 is configured to input the picture to be recognized into a preset picture text recognition model, so as to obtain region coordinates of a text region corresponding to the picture to be recognized, and obtain text content corresponding to the text region;
a determining module 30, configured to determine a relevant text region according to a region coordinate corresponding to the text region and text content corresponding to the region coordinate;
and the processing module 40 is configured to obtain a text recognition result containing semantics in the picture to be recognized according to the associated text region.
Further, the input module 20 includes:
the first input unit is used for inputting the picture to be recognized into a preset picture text recognition model;
the identification unit is used for identifying a text region in the picture to be identified through a Feature Pyramid Network (FPN) in the picture text identification model;
the first determining unit is used for determining the region coordinates of the text region according to the pixel values of the text region in the picture to be identified.
Further, the identification unit includes:
the feature processing subunit is used for performing feature extraction and feature fusion on the picture to be recognized through the FPN in the picture text recognition model to obtain a first feature map corresponding to the picture to be recognized;
the input subunit is used for inputting the first feature map into the convolution layer of the FPN to obtain a second feature map corresponding to the picture to be identified;
and the determining subunit is used for determining the text region in the picture to be identified according to the text pixel points in the second feature map.
Further, the determining subunit is further configured to determine text pixel points in the second feature map, and determine core pixel points in the second feature map according to the text pixel points; classifying each pixel point in the second characteristic diagram based on the core pixel point, and determining a text region in the picture to be recognized according to a classification result obtained by classification.
Further, the input module 20 further includes:
the second input unit is used for determining text pixel points in the second feature map and determining core pixel points in the second feature map according to the text pixel points; classifying each pixel point in the second characteristic diagram based on the core pixel point, and determining a text region in the picture to be recognized according to a classification result obtained by classification.
And the processing unit is used for inputting the fourth feature map into the fully connected network connected to each node in the sequence transformation network to obtain the text content corresponding to the text region.
Further, the obtaining module 10 is further configured to obtain a first sample picture for model training;
the device for recognizing the picture text further comprises:
the marking module is used for marking the first sample picture to obtain a training sample set consisting of the marked first sample picture;
the input module 20 is further configured to input the training sample set into the photo text recognition model to train the photo text recognition model.
Further, the labeling module comprises:
the marking unit is used for marking the first sample picture to obtain a marked first sample picture;
the calculating unit is used for calculating the number of the marked first sample pictures;
the simulation unit is used for carrying out picture simulation according to the marked first sample picture to obtain a second sample picture if the number of the pictures is smaller than the preset number;
and the second determining unit is used for taking the second sample picture and the labeled first sample picture as a training sample set.
Further, the device for recognizing the picture text further comprises:
the comparison module is used for comparing the text recognition result with the prestored certificate information of the borrower to obtain a comparison result;
the determining module 30 is further configured to determine that the certificate corresponding to the picture to be recognized is a real certificate if it is determined that the text recognition result is consistent with the certificate information according to the comparison result.
The specific implementation of the device for identifying picture texts is basically the same as that of the above-mentioned method for identifying picture texts, and is not described herein again.
In addition, the invention also provides a device for identifying the picture text. As shown in fig. 4, fig. 4 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that fig. 4 is a schematic structural diagram of a hardware operating environment of the device for recognizing picture texts. The device for recognizing the picture text in the embodiment of the invention can be a terminal device such as a PC, a portable computer and the like.
As shown in fig. 4, the apparatus for recognizing picture text may include: a processor 1001, such as a CPU, a memory 1005, a user interface 1003, a network interface 1004, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the picture text recognition device shown in fig. 4 does not constitute a limitation of the device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 4, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a recognition program of a picture text. The operating system is a program for managing and controlling hardware and software resources of the identification device of the picture texts, and supports the operation of the identification program of the picture texts and other software or programs.
In the picture text recognition device shown in fig. 4, the user interface 1003 is mainly used to connect to other terminals and exchange data with them, obtaining the picture to be recognized and/or the first sample pictures through those terminals; the network interface 1004 is mainly used to connect to a background server and exchange data with it; and the processor 1001 may be configured to call the picture text recognition program stored in the memory 1005 and execute the steps of the picture text recognition method described above.
The specific implementation of the device for identifying picture texts is basically the same as that of the above-mentioned method for identifying picture texts, and is not described herein again.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a picture text recognition program is stored on the computer-readable storage medium, and when being executed by a processor, the picture text recognition program implements the steps of the above-mentioned picture text recognition method.
The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of the above embodiments of the method for identifying a picture text, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A picture text recognition method is characterized by comprising the following steps:
acquiring a picture to be recognized, inputting the picture to be recognized into a preset picture text recognition model so as to acquire region coordinates of a text region corresponding to the picture to be recognized and acquire text content corresponding to the text region;
determining an associated text region according to the region coordinates corresponding to the text region and the text content corresponding to the region coordinates;
and obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region.
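By way of illustration only (not part of the claim), the following Python sketch shows one way the association and assembly steps of claim 1 could behave: regions whose coordinates place them on roughly the same horizontal line are associated and read left to right to form one semantic line of the recognition result. The grouping rule, the tolerance value, and all identifiers are assumptions introduced for this example.

from typing import List, Tuple

Region = Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max)

def group_associated_regions(regions: List[Region],
                             texts: List[str],
                             y_tolerance: int = 10) -> List[str]:
    """Associate text regions that lie on roughly the same horizontal line
    (judged from their region coordinates) and read each group left to
    right, yielding one semantic line of the recognition result per group."""
    items = sorted(zip(regions, texts), key=lambda it: (it[0][1], it[0][0]))
    lines: List[List[Tuple[Region, str]]] = []
    for region, text in items:
        if lines and abs(region[1] - lines[-1][-1][0][1]) <= y_tolerance:
            lines[-1].append((region, text))      # same line: associate with it
        else:
            lines.append([(region, text)])        # start a new associated group
    return [" ".join(t for _, t in sorted(line, key=lambda it: it[0][0]))
            for line in lines]

# Hypothetical output of the detection/recognition stages for a certificate:
regions = [(10, 20, 60, 40), (70, 22, 150, 42), (10, 60, 120, 80)]
texts = ["Name:", "Zhang San", "ID No.:"]
print(group_associated_regions(regions, texts))
# -> ['Name: Zhang San', 'ID No.:']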
2. The method for recognizing picture texts according to claim 1, wherein the step of inputting the picture to be recognized into a preset picture text recognition model to obtain region coordinates of a text region corresponding to the picture to be recognized comprises:
inputting the picture to be recognized into a preset picture text recognition model, and recognizing a text region in the picture to be recognized through a Feature Pyramid Network (FPN) in the picture text recognition model;
and determining the region coordinates of the text region according to the pixel values of the text region in the picture to be recognized.
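Claim 2 does not specify how the region coordinates are derived from the pixel values; one plausible reading, sketched below in Python, is to take the bounding box of each connected component of text pixels in a binary text mask. The mask layout and helper names are assumptions introduced for this example.

import numpy as np
from scipy import ndimage

def region_coordinates(text_mask: np.ndarray):
    """Given a binary mask in which 1 marks text pixels, return one
    (x_min, y_min, x_max, y_max) box per connected text region."""
    labeled, num_regions = ndimage.label(text_mask)
    boxes = []
    for region_id in range(1, num_regions + 1):
        ys, xs = np.nonzero(labeled == region_id)
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return boxes

mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:3, 1:5] = 1          # first text region
mask[5:7, 2:7] = 1          # second text region
print(region_coordinates(mask))   # [(1, 1, 4, 2), (2, 5, 6, 6)]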
3. The picture text recognition method according to claim 2, wherein the step of recognizing the text region in the picture to be recognized through the FPN in the picture text recognition model comprises:
performing feature extraction and feature fusion on the picture to be recognized through the FPN in the picture text recognition model to obtain a first feature map corresponding to the picture to be recognized;
inputting the first feature map into the convolution layer of the FPN to obtain a second feature map corresponding to the picture to be recognized;
and determining a text region in the picture to be recognized according to text pixel points in the second feature map.
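The following PyTorch sketch is a toy stand-in for the detection branch of claim 3: multi-scale feature extraction, top-down feature fusion into a first feature map, a convolution layer producing the second feature map, and a per-pixel text score from which candidate text pixel points are taken. The layer sizes, the two-stage backbone, and the 0.5 threshold are arbitrary assumptions, not the network actually claimed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNDetector(nn.Module):
    """Toy stand-in for the claimed FPN stage: extract multi-scale features,
    fuse them top-down, then apply a convolution layer to obtain the second
    feature map from which text pixel points are classified."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.lateral1 = nn.Conv2d(16, 32, 1)            # lateral connection
        self.fuse_conv = nn.Conv2d(32, 32, 3, padding=1)
        self.head = nn.Conv2d(32, 1, 1)                 # text / non-text score map

    def forward(self, image):
        c1 = self.stage1(image)                         # feature extraction
        c2 = self.stage2(c1)
        p1 = self.lateral1(c1) + F.interpolate(c2, size=c1.shape[-2:],
                                               mode="nearest")   # feature fusion
        first_feature_map = self.fuse_conv(p1)
        second_feature_map = self.head(first_feature_map)
        return torch.sigmoid(second_feature_map)        # per-pixel text probability

probs = TinyFPNDetector()(torch.randn(1, 3, 64, 64))
text_pixels = probs > 0.5                               # candidate text pixel points
print(probs.shape, int(text_pixels.sum()))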
4. The method for recognizing the text of the picture according to claim 3, wherein the step of determining the text region in the picture to be recognized according to the text pixel points in the second feature map comprises:
determining text pixel points in the second feature map, and determining core pixel points in the second feature map according to the text pixel points;
classifying each pixel point in the second feature map based on the core pixel points, and determining a text region in the picture to be recognized according to the classification result.
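Claim 4 does not give the classification rule; one common choice, sketched below under the assumption of kernel-style core regions as used in PSENet-like detectors, is to seed one label per connected core region and classify the remaining text pixels by breadth-first growth from those cores.

from collections import deque

import numpy as np
from scipy import ndimage

def expand_from_cores(text_mask: np.ndarray, core_mask: np.ndarray) -> np.ndarray:
    """Assign every text pixel to a region by growing outward from the core
    pixel points: connected core regions seed the labels, and the remaining
    text pixels are classified by breadth-first expansion from those seeds."""
    labels, _ = ndimage.label(core_mask)             # one label per core region
    result = labels.copy()
    queue = deque(zip(*np.nonzero(labels)))          # start from all core pixels
    h, w = text_mask.shape
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny, nx] and result[ny, nx] == 0):
                result[ny, nx] = result[y, x]        # inherit the core's label
                queue.append((ny, nx))
    return result

text = np.array([[1, 1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1, 1]])
core = np.array([[0, 1, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0]])
print(expand_from_cores(text, core))   # two text regions, one per core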
5. The method for recognizing the picture text according to claim 1, wherein the step of obtaining the text content corresponding to the text region comprises:
inputting the text region into a network structure of the picture text recognition model to obtain a third feature map corresponding to the text region;
inputting the third feature map into a sequence transformation network corresponding to the network structure to obtain a serialized fourth feature map;
and obtaining the text content corresponding to the text region according to the fourth feature map and the fully connected network connected to each node in the sequence transformation network.
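As an illustration of claim 5's recognition branch, the PyTorch sketch below uses a convolutional backbone to produce the third feature map, a bidirectional LSTM as one possible choice of "sequence transformation network" yielding the serialized fourth feature map, and a fully connected layer mapping every time step to character scores. The architecture, the 37-class alphabet, and the omission of CTC-style decoding are assumptions, not details fixed by the claim.

import torch
import torch.nn as nn

class TinyRecognizer(nn.Module):
    """Toy stand-in for the recognition branch: a convolutional backbone
    yields the third feature map, a bidirectional LSTM serializes it into
    the fourth feature map, and a fully connected layer maps every time
    step to a character class distribution."""
    def __init__(self, num_classes: int = 37, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))             # collapse height to 1
        self.sequence_net = nn.LSTM(64, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, region_image):
        feat = self.backbone(region_image)               # third feature map: (B, C, 1, W)
        seq = feat.squeeze(2).permute(0, 2, 1)           # serialize: (B, W, C)
        seq, _ = self.sequence_net(seq)                  # fourth (serialized) feature map
        return self.classifier(seq)                      # per-step class scores

scores = TinyRecognizer()(torch.randn(1, 1, 32, 128))
print(scores.shape)        # torch.Size([1, 64, 37]) -> 64 time steps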
6. The method for recognizing the picture text according to claim 1, wherein, before the step of obtaining the picture to be recognized, the method further comprises:
obtaining a first sample picture for model training, and labeling the first sample picture to obtain a training sample set consisting of the labeled first sample picture;
and inputting the training sample set into the picture text recognition model to train the picture text recognition model.
7. The method for recognizing picture texts according to claim 6, wherein the step of labeling the first sample picture to obtain a training sample set composed of the labeled first sample picture includes:
labeling the first sample picture to obtain a labeled first sample picture, and counting the number of labeled first sample pictures;
if the number of pictures is smaller than a preset number, performing picture simulation according to the labeled first sample picture to obtain a second sample picture;
and taking the second sample picture and the labeled first sample picture as a training sample set.
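The Python sketch below illustrates claims 6 and 7 together: when the number of labeled first sample pictures falls below a preset number, additional second sample pictures are simulated from the labeled ones, and the two sets together form the training sample set. The particular perturbations (small rotation, brightness jitter, noise) and the threshold value are assumptions; the claims do not prescribe how the simulation is performed.

import numpy as np
from scipy import ndimage

def build_training_set(labeled_samples, min_count: int = 1000, seed: int = 0):
    """If the number of labeled first sample pictures is below the preset
    number, simulate extra pictures (second sample pictures) by perturbing
    the labeled ones, then return the combined training sample set."""
    rng = np.random.default_rng(seed)
    samples = list(labeled_samples)
    while len(samples) < min_count:
        image, label = samples[rng.integers(len(labeled_samples))]
        simulated = ndimage.rotate(image, angle=rng.uniform(-3, 3),
                                   reshape=False, mode="nearest")
        simulated = simulated * rng.uniform(0.9, 1.1)              # brightness jitter
        simulated = simulated + rng.normal(0, 2, simulated.shape)  # sensor noise
        samples.append((np.clip(simulated, 0, 255), label))        # keep the same label
    return samples

first_samples = [(np.full((32, 64), 128.0), "ID: 1234")]
training_set = build_training_set(first_samples, min_count=5)
print(len(training_set))        # 5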
8. The method for recognizing the picture text according to any one of claims 1 to 7, wherein the picture to be recognized is a certificate picture of a borrower, and after the step of obtaining the text recognition result containing semantics in the picture to be recognized according to the associated text region, the method further comprises:
comparing the text recognition result with prestored certificate information of the borrower to obtain a comparison result;
and if the text recognition result is determined to be consistent with the certificate information according to the comparison result, determining that the certificate corresponding to the picture to be recognized is a real certificate.
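A minimal Python sketch of the comparison step in claim 8, assuming both the text recognition result and the prestored certificate information are available as field/value pairs; the field names and the whitespace/case normalization rule are hypothetical.

def matches_stored_certificate(recognized: dict, stored: dict) -> bool:
    """Field-by-field comparison between the text recognition result from the
    borrower's certificate picture and the certificate information on file;
    whitespace and case differences are ignored."""
    normalize = lambda s: "".join(str(s).split()).lower()
    return all(normalize(recognized.get(field, "")) == normalize(value)
               for field, value in stored.items())

stored = {"name": "Zhang San", "id_number": "110101199001011234"}
recognized = {"name": "Zhang  San", "id_number": "110101199001011234"}
print(matches_stored_certificate(recognized, stored))   # True -> genuine certificate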
9. An apparatus for recognizing picture text, comprising:
the acquisition module is used for acquiring a picture to be identified;
the input module is used for inputting the picture to be recognized into a preset picture text recognition model so as to obtain the region coordinates of the text region corresponding to the picture to be recognized and obtain the text content corresponding to the text region;
the determining module is used for determining the associated text area according to the area coordinate corresponding to the text area and the text content corresponding to the area coordinate;
and the processing module is used for obtaining a text recognition result containing semantics in the picture to be recognized according to the associated text region.
10. A device for recognizing picture texts, characterized in that the device comprises a memory, a processor, and a picture text recognition program stored in the memory and executable on the processor, wherein the picture text recognition program, when executed by the processor, implements the steps of the method for recognizing picture texts according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a picture text recognition program, which when executed by a processor implements the steps of the picture text recognition method according to any one of claims 1 to 8.
CN202010134748.6A 2020-02-28 2020-02-28 Picture text recognition method, device and equipment and readable storage medium Pending CN111368709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010134748.6A CN111368709A (en) 2020-02-28 2020-02-28 Picture text recognition method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010134748.6A CN111368709A (en) 2020-02-28 2020-02-28 Picture text recognition method, device and equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111368709A true CN111368709A (en) 2020-07-03

Family

ID=71211642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010134748.6A Pending CN111368709A (en) 2020-02-28 2020-02-28 Picture text recognition method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111368709A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200318A (en) * 2020-10-10 2021-01-08 广州云从人工智能技术有限公司 Target detection method, device, machine readable medium and equipment
CN112232336A (en) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109829453B (en) Method and device for recognizing characters in card and computing equipment
CN107944450B (en) License plate recognition method and device
CN108491866B (en) Pornographic picture identification method, electronic device and readable storage medium
CN112861648B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN111612081B (en) Training method, device, equipment and storage medium for recognition model
CN113011144A (en) Form information acquisition method and device and server
CN113673500A (en) Certificate image recognition method and device, electronic equipment and storage medium
CN112016502B (en) Safety belt detection method, safety belt detection device, computer equipment and storage medium
CN111368709A (en) Picture text recognition method, device and equipment and readable storage medium
CN114550051A (en) Vehicle loss detection method and device, computer equipment and storage medium
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN112988997A (en) Response method and system of intelligent customer service, computer equipment and storage medium
CN115758451A (en) Data labeling method, device, equipment and storage medium based on artificial intelligence
CN114937270A (en) Ancient book word processing method, ancient book word processing device and computer readable storage medium
CN111027533B (en) Click-to-read coordinate transformation method, system, terminal equipment and storage medium
CN113723158A (en) Text structured recognition method and device, electronic equipment and computer readable medium
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN109087439B (en) Bill checking method, terminal device, storage medium and electronic device
CN113239910B (en) Certificate identification method, device, equipment and storage medium
CN114821062A (en) Commodity identification method and device based on image segmentation
CN113888760A (en) Violation information monitoring method, device, equipment and medium based on software application
CN110851349A (en) Page abnormal display detection method, terminal equipment and storage medium
CN111242112A (en) Image processing method, identity information processing method and device
CN113474786A (en) Electronic purchase order identification method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination