CN111783645A - Character recognition method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN111783645A
Authority
CN
China
Prior art keywords: text, text information, column, electronic, screenshot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010615181.4A
Other languages
Chinese (zh)
Inventor
冯博豪
庞敏辉
谢国斌
陈兴波
韩光耀
杨舰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010615181.4A
Publication of CN111783645A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a character recognition method and device, an electronic device, and a computer-readable storage medium, relating to the technical fields of computers, image processing, character recognition, and deep learning. The specific implementation scheme is as follows: acquire an electronic screenshot to be recognized; perform row and column alignment on the text information contained in the electronic screenshot to obtain row-and-column-aligned text information; and extract the character content from the row-and-column-aligned text information using a pre-trained character recognition neural network. The scheme first aligns the rows and columns of the text information contained in the electronic screenshot and then performs character recognition on the aligned text, making the character recognition result more accurate and easier to review.

Description

Character recognition method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, in particular to the fields of image processing, character recognition, and deep learning, and specifically to a character recognition method and device, an electronic device, and a computer-readable storage medium.
Background
Currently, for the purposes of financial risk management and auditing, credit finance companies require loan consumers to upload electronic transaction screenshots as consumption vouchers.
After obtaining the electronic transaction screenshot uploaded by the consumer, the credit finance company needs to classify and extract the contents recorded in it according to their intended use, fill the extracted contents into the relevant records, complete the accounting of the transaction information, and confirm the specific contents of the transaction before releasing the loan.
Disclosure of Invention
The application provides a method, a device, an electronic device and a storage medium for character recognition.
In a first aspect, an embodiment of the present application provides a method for character recognition, including: acquiring an electronic screenshot to be identified; carrying out row and column alignment processing on the text information content contained in the electronic screenshot to obtain row and column aligned text information; and extracting the character content in the row and column alignment text information by adopting a character recognition neural network obtained by pre-training.
In a second aspect, an embodiment of the present application provides an apparatus for character recognition, including: an image acquisition unit configured to acquire an electronic screenshot to be recognized; the line alignment processing unit is configured to perform line alignment processing on the text information content contained in the electronic screenshot; the column alignment processing unit is configured to perform column alignment processing on text information content contained in the electronic screenshot; and the character recognition unit is configured to extract the character content in the row and column alignment text information by adopting a character recognition neural network obtained by pre-training.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the text recognition method described in any implementation manner of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium having computer instructions stored thereon, comprising: the computer instructions are used for causing the computer to execute the character recognition method as described in any one of the implementation manners of the first aspect.
After the electronic screenshot to be recognized is obtained, the text information in the screenshot is aligned in rows and columns, and the content of the row-and-column-aligned text information is extracted using a pre-trained character recognition neural network. Because text extraction is performed only after the text information in the screenshot has been aligned, the extracted text information is accurate, neatly arranged, and easy to read, which improves the efficiency of text auditing.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a text recognition method according to the present application;
FIG. 3 is a flow chart of another embodiment of a text recognition method according to the present application;
FIG. 4a is a schematic diagram of an electronic screenshot in an application scenario of a text recognition method according to the present application;
FIG. 4b is a diagram illustrating the result of text recognition for an application scenario in accordance with the text recognition method of the present application;
FIG. 5 is a schematic diagram of an embodiment of a text recognition device according to the present application;
FIG. 6 is a block diagram of an electronic device suitable for use in implementing text recognition of embodiments of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the text recognition method, apparatus, electronic device, and computer-readable storage medium of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various image acquisition applications, such as an audit upload application, an image document conversion application, an image text audit application, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When they are software, they can be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (for example, to provide a text recognition service) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example one that, after a character recognition request is sent by a user, acquires the electronic screenshot to be recognized from the terminal devices 101, 102, and 103 through the network 104, performs row and column alignment on the text information contained in the screenshot, and extracts the text content from the row-and-column-aligned text information using a pre-trained character recognition neural network.
It should be noted that the character recognition method provided by the embodiments of the present disclosure is generally executed by the server 105, and accordingly, the character recognition apparatus is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules, for example, to provide distributed services, or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be further noted that the terminal devices 101, 102, and 103 may also be installed with a text recognition application, and the terminal devices 101, 102, and 103 may also complete acquiring an electronic screenshot to be recognized, perform row and column alignment processing on text information content included in the electronic screenshot to be recognized, and extract text content in the row and column aligned text information by using a character recognition neural network obtained through pre-training. In this case, the character recognition method may be executed by the terminal apparatuses 101, 102, and 103, and accordingly, the character recognition apparatus may be provided in the terminal apparatuses 101, 102, and 103. At this point, the exemplary system architecture 100 may also not include the server 105 and the network 104.
With continued reference to FIG. 2, a flow 200 of one embodiment of a text recognition method according to the present application is shown. The character recognition method comprises the following steps:
step 201, obtaining an electronic screenshot to be identified.
In this embodiment, an execution subject of the text recognition method (e.g., the server 105 shown in fig. 1) may obtain the electronic screenshot to be recognized from a local or non-local storage device (e.g., the terminal devices 101, 102, 103 shown in fig. 1).
The local storage device may be a data storage module arranged within the execution main body, in which case the electronic screenshot to be recognized can be obtained simply by reading it locally. The non-local storage device may instead be a separate data storage server dedicated to storing such data, in which case the execution main body obtains the electronic screenshot to be recognized by sending an acquisition command to the data storage server, which returns the screenshot.
In some embodiments, a large number of electronic screenshots to be recognized may be collected and stored centrally in a database, in which case the execution main body can obtain them from the database. In other embodiments, the electronic screenshots to be recognized may be stored in various user terminal devices, in which case the execution main body can obtain them from those devices.
Step 202, performing row and column alignment processing on the text information content contained in the electronic screenshot to obtain row and column aligned text information.
In this embodiment, the content of the electronic screenshot acquired in step 201 is examined, the text information in the screenshot is detected, and the text information is then subjected to row and column alignment to obtain row-and-column-aligned text information.
To determine the text information content contained in the electronic screenshot and align it in rows and columns, the text information is first pixelized so that its content can be aligned at the pixel level. The text content can then be distributed into a number of mutually parallel text lines, and the positions of these lines adjusted so that the headers of the lines align. It should be understood that after the text information in the screenshot is pixelized, the smallest rectangle that can completely contain all of its pixels is determined (this minimum rectangle is usually defined as the word detection box), and the position of the text content is adjusted according to the positions of the rectangle's center point and borders to obtain the column-aligned text information.
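As a minimal sketch of the word detection box just described, the smallest enclosing rectangle of the text pixels can be computed directly from a binarized image. The function below is illustrative rather than taken from the patent and assumes text pixels have the value 255:

    import numpy as np

    def min_text_box(binary: np.ndarray):
        # Smallest rectangle enclosing all text pixels of a binarized image.
        # Assumes text pixels are 255; returns (left, top, right, bottom),
        # i.e. the "word detection box" described above.
        ys, xs = np.nonzero(binary == 255)
        if xs.size == 0:
            return None  # no text pixels found
        return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

The center point used for alignment is then ((left + right) / 2, (top + bottom) / 2).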
It should be understood that, when aligning text by columns, the column alignment may be performed according to the content of a specific line or a specific rule; for example, the content of a specific line can be used as header or reference information, and the column alignment operation performed on the text content relative to that header.
And step 203, extracting the character content in the row and column alignment text information by adopting a character recognition neural network obtained by pre-training.
Specifically, the pre-trained character recognition neural network may be an ordinary optical character recognition network or a semantic character recognition network; it extracts the text content from the row-and-column-aligned text information obtained in step 202 to produce the final text content.
Furthermore, key-value pair relationships can be established over the text content according to the recognition result, which increases the practical value of the text recognition output.
For example, after the electronic screenshot is recognized by the optical character recognition network, the text information of the screenshot is obtained, but it does not yet form usable key-value pairs, so the keys and values in the recognized information need to be sorted and extracted. Extracting the keywords on the screenshot together with their corresponding values completes the extraction of the key information, for example "Shipping address: Hanghai East Street, Hui District, Zhengzhou City, Henan Province" and "Order number: 9909991212".
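A minimal sketch of this key-value sorting step, assuming the recognized lines follow a "key: value" pattern (both ASCII and full-width colons are handled; the example string is the one from the text above):

    import re

    # Matches "key: value" with either an ASCII or a full-width colon
    KEY_VALUE = re.compile(r"^(?P<key>[^:：]+)[:：]\s*(?P<value>.+)$")

    def extract_key_values(lines):
        # Collect key-value pairs from recognized text lines
        pairs = {}
        for line in lines:
            m = KEY_VALUE.match(line.strip())
            if m:
                pairs[m.group("key").strip()] = m.group("value").strip()
        return pairs

    print(extract_key_values(["Order number: 9909991212"]))
    # {'Order number': '9909991212'}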
The character recognition method provided by this embodiment of the application obtains the electronic screenshot to be recognized, performs row and column alignment on the text information it contains to obtain row-and-column-aligned text information, and extracts the character content from the aligned text with a pre-trained character recognition neural network. Because the positions of the information in the screenshot are first adjusted and classified, recognition accuracy is improved, the recognized content is easier for subsequent staff to check, and the efficiency of auditing the screenshot's content is increased.
In some optional implementations of this embodiment, the layout of the electronic screenshot to be recognized is analyzed to obtain at least one of the following layouts: a text layout, a table layout, and an image layout; row and column alignment is then performed on the text information contained in each layout, yielding row-and-column-aligned text information for the different layouts.
Specifically, because an electronic screenshot usually includes background information, text information, image information, and table information, detecting the content of the screenshot usually involves three parts: image detection, text region judgment, and region line detection. These determine the information content of the screenshot and yield at least one of a text layout, a table layout, and an image layout, so that character recognition can subsequently be performed per layout, improving recognition efficiency.
Illustratively, the image detection step is completed with an object detection model that locates the images in the electronic screenshot and determines the regions they occupy. The model extracts image features with a backbone network (composed of a VGG (Visual Geometry Group) network and a ResNet-101 residual network) and a multi-object detector, and finally obtains the detection result through non-maximum suppression, accurately locating the image information in the screenshot.
For the text region judgment step, the text regions in the electronic screenshot to be recognized are usually determined in three steps: boundary identification, convolutional neural network classification, and filtering correction. The operator applied for boundary identification may be, for example, the Sobel, Prewitt, or Laplacian operator. After the boundary identification operator is chosen, a convolutional neural network classifies the image into non-text regions and text regions; for example, the network classifies pixel points such that a non-text region outputs 1 and a text region outputs 0.
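A sketch of the boundary identification step with one of the operators named above (Sobel), using OpenCV; the file path and the thresholding rule at the end are illustrative assumptions:

    import cv2
    import numpy as np

    # Load the screenshot as a grayscale image (path is illustrative)
    gray = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)

    # Sobel gradients in x and y, combined into an edge-magnitude map
    grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.magnitude(grad_x, grad_y)

    # Candidate boundaries are where the gradient magnitude is large
    boundary_mask = (edges > edges.mean() + 2 * edges.std()).astype(np.uint8) * 255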
Finally, the region line detection step is carried out. An electronic transaction screenshot may contain a large number of region lines by its nature; using Hough transform line detection together with erosion and dilation of the image, the region lines in the transaction screenshot can be detected and the layout regions of the screenshot divided, so that targeted character recognition can be performed per layout region, improving recognition efficiency.
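A sketch of this region line detection, assuming a binarized screenshot `binary` with foreground pixels of 255; the morphology and Hough parameters are illustrative:

    import cv2
    import numpy as np

    kernel = np.ones((3, 3), np.uint8)
    # Erosion removes thin character strokes; dilation reconnects the long rules
    cleaned = cv2.dilate(cv2.erode(binary, kernel, iterations=1),
                         kernel, iterations=2)

    # Probabilistic Hough transform finds the straight region lines
    lines = cv2.HoughLinesP(cleaned, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=200, maxLineGap=10)
    # Each entry of `lines` is [[x1, y1, x2, y2]], one detected line segment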
In some optional implementations of this embodiment, it is determined whether the text contents of vertically adjacent lines in the row-and-column-aligned text information are continuous; in response to the contents being continuous, the two lines are spliced together.
Specifically, to solve the line wrapping problem, it is necessary to determine whether the upper and lower lines of text are continuous; if so, the text information of the two lines is spliced, which better matches human reading habits and increases the practical value of the text recognition result.
Illustratively, a semantic understanding pre-training model (an ERNIE model) is used to determine whether two adjacent lines of text are continuous. The model contains a bidirectional Transformer encoder; a deep bidirectional representation is pre-trained by jointly conditioning on context in all layers, and the pre-trained model can then be fine-tuned with an additional output layer to suit a variety of tasks. Fine-tuning the pre-trained model in this way makes it possible to judge whether the upper and lower lines of text are continuous.
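A minimal sketch of this line continuity judgment framed as sentence pair classification; the checkpoint path is a placeholder for the fine-tuned ERNIE model, and treating label 1 as "continuous" is an assumption:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CKPT = "path/to/finetuned-ernie-line-continuity"  # placeholder checkpoint

    tokenizer = AutoTokenizer.from_pretrained(CKPT)
    model = AutoModelForSequenceClassification.from_pretrained(CKPT)
    model.eval()

    def is_continuation(upper_line: str, lower_line: str) -> bool:
        # Encode the two lines as a sentence pair and classify them
        inputs = tokenizer(upper_line, lower_line,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        return int(logits.argmax(dim=-1)) == 1  # assumed label 1 = continuous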
In some optional implementations of this embodiment, the method shown above further includes: in response to detecting that the user inputs a modification mark for the unqualified text information, analyzing correct text content corresponding to the modification mark; adopting the correct character content corresponding to the modification mark to train the character recognition neural network to obtain an updated character recognition neural network; adopting the updated character recognition neural network to re-recognize the unqualified text information; and in response to the fact that the modification mark of the unqualified text information input by the user is not detected within the preset time period, updating the text content corresponding to the unqualified text information into the text content identified by the updated character recognition neural network.
Specifically, after obtaining the character recognition result, the user may input a modification mark and enter the correct content for the corresponding text when the user considers the result to contain an error or to admit a better answer. On recognizing the modification mark, the execution main body trains and updates the character recognition neural network with the entered correct content, then re-recognizes the corresponding content with the updated network, repeating until the user is satisfied (i.e., no further modification mark is recognized). This arrangement enables better interaction with the user: the correct recognition result is obtained and the character recognition neural network is updated, yielding more accurate text extraction.
In some optional implementations of this embodiment, the method shown above further includes: and classifying the electronic screenshot according to the text content, and storing the text content and the electronic screenshot according to a classification result.
Specifically, an algorithm model can classify the electronic screenshot according to its content and the business requirements, for example into daily necessities, dangerous goods, domestic or overseas consumption, and small or large consumption, based on information such as the type of purchased goods, the place of purchase, and the purchase amount, so that the information in the screenshot can be used more effectively.
Illustratively, the text classification is done with a language representation model (a BERT model) and a text classification model (a TextCNN model). The language representation model converts the text information into word vectors, and the text classification model then classifies the electronic screenshot as a legal transaction or an illegal transaction. If the system finds that the user has used loan funds to purchase prohibited goods or for other illegal consumption, i.e., the screenshot is classified as an illegal transaction, the system returns the electronic transaction screenshot to the user and reminds the user to upload it again; the relevant information is also forwarded to the lending company's business personnel.
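A sketch of the TextCNN stage, assuming the input word vectors come from the language representation model; the layer sizes and n-gram widths are illustrative:

    import torch
    import torch.nn as nn

    class TextCNN(nn.Module):
        # Classifies a sequence of word vectors, e.g. legal vs. illegal transaction
        def __init__(self, emb_dim=768, num_classes=2, widths=(2, 3, 4), channels=64):
            super().__init__()
            # One 1-D convolution per n-gram width, then max-over-time pooling
            self.convs = nn.ModuleList(nn.Conv1d(emb_dim, channels, w) for w in widths)
            self.fc = nn.Linear(channels * len(widths), num_classes)

        def forward(self, x):                 # x: (batch, seq_len, emb_dim)
            x = x.transpose(1, 2)             # Conv1d expects (batch, emb_dim, seq_len)
            feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return self.fc(torch.cat(feats, dim=1))   # (batch, num_classes)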
With reference to fig. 3, the present application further provides an implementation flow 300 of another embodiment of character recognition, which describes a process for aligning the rows and columns of text information content. The flow 300 includes:
and 301, performing binarization processing on the electronic screenshot.
Specifically, the electronic screenshot is subjected to binarization processing, and the binarization processing of the image is to set the gray value of a point on the image to be 0 or 255, that is, the whole image shows an obvious black-and-white effect. That is, a gray scale image with 256 brightness levels is selected by a proper threshold value to obtain a binary image which can still reflect the whole and local features of the image.
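A sketch of step 301 with OpenCV, where Otsu's method stands in for the "proper threshold" mentioned above and the file path is illustrative:

    import cv2

    gray = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)

    # Otsu's method picks the threshold automatically; THRESH_BINARY_INV makes
    # text pixels white (255) on a black (0) background
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)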
Step 302: horizontally project the character pixel rows of the binarized image and count the pixels contained in each row according to the projection result.
Specifically, the text pixels of the image obtained through the binarization of step 301 are projected horizontally, and the number of text pixels contained in each horizontal line is counted.
Step 303: perform the line segmentation and line alignment operations according to the pixel-count statistics.
Specifically, the content belonging to the same row is determined from the pixel statistics of step 302, and the position of that content is adjusted.
Furthermore, if the number of characters per line is limited, the pixel counts can additionally be screened and segmented according to that limit.
Performing line alignment on the character pixels of the binarized image is simple to carry out and not prone to errors, so work efficiency is improved while accuracy is ensured.
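Steps 302 and 303 can be sketched together with NumPy, continuing from the `binary` image above (text pixels are 255); the `min_pixels` threshold is an illustrative parameter:

    import numpy as np

    def segment_text_rows(binary: np.ndarray, min_pixels: int = 1):
        # Split a binarized image into text rows via horizontal projection;
        # returns a list of (top, bottom) index pairs, one per text band
        profile = (binary == 255).sum(axis=1)   # text-pixel count per image row
        in_text = profile >= min_pixels
        bands, start = [], None
        for y, flag in enumerate(in_text):
            if flag and start is None:
                start = y                        # entering a text band
            elif not flag and start is not None:
                bands.append((start, y))         # leaving a text band
                start = None
        if start is not None:
            bands.append((start, len(in_text)))
        return bands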
Step 304: obtain the detection box information of the characters in the text information of the electronic screenshot, where the detection boxes include at least one of word-level detection boxes and character-level detection boxes.
Specifically, column alignment can use the detection box information produced during text detection, such as the word-level and character-level detection boxes. A detection box can be regarded as the smallest rectangle containing the pixels of a piece of text content, and whether two pieces of text information belong to the same column is judged from the positions of the left border, right border, and center point of their detection boxes.
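A minimal sketch of this same-column test, assuming boxes are (left, top, right, bottom) tuples; the tolerance, scaled by the header box width, is an illustrative heuristic:

    def same_column(box, header_box, tol=0.5):
        # Judge column membership from left/right borders and center point
        b_left, _, b_right, _ = box
        h_left, _, h_right, _ = header_box
        margin = tol * (h_right - h_left)    # tolerance scaled by header width
        centers = abs((b_left + b_right) / 2 - (h_left + h_right) / 2) <= margin
        lefts = abs(b_left - h_left) <= margin
        rights = abs(b_right - h_right) <= margin
        return centers or lefts or rights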
Step 305: perform the column segmentation operation on the text information in the electronic screenshot with a pre-trained sequence labeling model, according to the column name information contained in the first line of text in the screenshot.
Specifically, the pre-trained sequence labeling model is a model that can classify text information according to the column information, and it consists of three parts: a language representation model, a named entity recognition model (a BiLSTM model), and a conditional random field (CRF). Word embeddings are first obtained with the language representation model. The character embeddings are then fed into a bidirectional long short-term memory network (LSTM), a linear layer is added on the output hidden states, and finally a CRF layer completes the sequence labeling. To improve the accuracy of the sequence labeling model, the coordinate information of the characters is also fed into the bidirectional LSTM. With this sequence labeling model, the column name information can be segmented, and column segmentation is thereby achieved.
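A sketch of the BiLSTM stage of this model in PyTorch; the BERT-style embedder and the CRF decoding layer described above are omitted here and would wrap this module, and the dimensions are illustrative:

    import torch
    import torch.nn as nn

    class ColumnTagger(nn.Module):
        # BiLSTM over character embeddings concatenated with box coordinates
        def __init__(self, emb_dim=768, coord_dim=4, hidden=256, num_tags=8):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim + coord_dim, hidden,
                                batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_tags)  # per-character tag scores

        def forward(self, char_emb, coords):
            # char_emb: (batch, seq, emb_dim); coords: (batch, seq, coord_dim)
            x = torch.cat([char_emb, coords], dim=-1)
            h, _ = self.lstm(x)
            return self.out(h)  # a CRF layer would decode these emission scores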
Step 306: adjust the positions of the detection boxes belonging to the same column so that they align.
Specifically, building on step 305, a position alignment model, such as Real AdaBoost, Gentle AdaBoost, or LogitBoost, judges whether a detection box lies in the same column as the detection box of the corresponding column name, based on the left border, right border, and center point positions of the box determined for the text content, thereby completing the position judgment and adjustment operations.
In this way, the text information content is aligned in rows and columns to finally obtain row-and-column-aligned text information; besides improving the accuracy of subsequent character recognition, the information can also be labeled and classified according to the column information, which makes it easier to review.
To deepen understanding, the application also provides a concrete implementation in a specific application scenario. In this scenario, after purchasing a product from the seller Zhang San, the buyer Li Si obtains the electronic screenshot shown in fig. 4a and feeds it back to the character recognition execution main body A for recognition.
After the execution main body A obtains the electronic screenshot and determines the text information, it performs the line alignment operation on the text content according to its pixels, then takes the header information of the transaction detail part (such as quantity and amount) as the reference and aligns the related content, obtaining the row-and-column-aligned text information shown in fig. 4b.
As this application scenario shows, the character recognition method of the present application uses row and column alignment so that characters in the same row lie on the same horizontal line, adjusts the alignment between different horizontal lines, and on that basis aligns the columns of the text content, obtaining row-and-column-aligned text information that is accurate and easy to review.
As shown in fig. 5, the text recognition apparatus 500 of the present embodiment may include: an image acquisition unit 501 configured to acquire an electronic screenshot to be recognized; the line alignment processing unit 502 is configured to perform line alignment processing on the text information content contained in the electronic screenshot; a column alignment processing unit 503 configured to perform column alignment processing on the text information content included in the electronic screenshot; and a word recognition unit 504 configured to extract word contents in the row and column alignment text information by using a character recognition neural network obtained by training in advance.
In some optional implementations of the present embodiment, the apparatus shown above further includes a layout analysis unit 505, configured to analyze the layout of the to-be-recognized electronic screenshot, resulting in at least one of the following layouts: text layout, table layout and image layout; and performing line and column alignment processing on the text information content contained in the layout obtained in the electronic screenshot to obtain line and column aligned text information of different layouts.
In some optional implementations of this embodiment, the processing step of line alignment in the line alignment processing unit 502 includes: carrying out binarization processing on the electronic screenshot; performing horizontal projection on character pixel rows in the image obtained after binarization processing, and counting pixel values contained in each row according to a projection result; and segmenting according to the statistical result of the pixel value.
In some optional implementations of this embodiment, the processing step of aligning rows and columns in the column alignment processing unit 503 includes: acquiring detection box information of characters in the text information in the electronic screenshot; the character detection box comprises at least one of a word-level detection box and a character detection box; according to column name information contained in the text information of the first line in the electronic screenshot, adopting a pre-trained sequence marking model to perform column segmentation operation on the text information in the electronic screenshot; and adjusting the character detection frame position of the characters in the same column to be aligned with the detection frame position of the column name characters in the column.
In some optional implementations of this embodiment, the apparatus shown above further includes: the character splicing unit is configured to judge whether the text contents of the upper line and the lower line adjacent to each other in the text information with the aligned rows and columns are continuous or not; and splicing the text contents of the upper and lower lines of continuous text contents in response to the continuous text contents of the upper and lower lines of adjacent text information aligned in the rows and columns.
In some optional implementations of this embodiment, the apparatus shown above further includes: the neural network updating unit is configured to respond to the detection that the user inputs a modification mark of the unqualified text information and analyze the correct text content corresponding to the modification mark; adopting the correct character content corresponding to the modification mark to train the character recognition neural network to obtain an updated character recognition neural network; the word recognition unit is further configured to re-identify the disqualified text message using the updated character recognition neural network; and in response to the fact that the modification mark of the unqualified text information input by the user is not detected within the preset time period, updating the text content corresponding to the unqualified text information into the text content identified by the updated character recognition neural network.
In some optional implementations of this embodiment, the apparatus shown above further includes: and the image classification unit is configured to classify the electronic screenshot according to the text content and store the text content and the electronic screenshot according to a classification result.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to the text recognition method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of text recognition provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of text recognition provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of text recognition in the embodiments of the present application (for example, the image acquisition unit 501, the row alignment processing unit 502, the column alignment processing unit 503, and the text recognition unit 504 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, that is, implements the character recognition method in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device by character recognition, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory located remotely from processor 601, which may be connected to a text recognition electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the character recognition method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the text-recognized electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the electronic screenshot to be identified is obtained; carrying out row and column alignment processing on the text information content contained in the electronic screenshot to obtain row and column aligned text information; and the character recognition neural network obtained by pre-training is adopted to extract the character content in the row and column alignment text information, so that the character extraction result is more accurate and convenient to observe.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method of text recognition, comprising:
acquiring an electronic screenshot to be identified;
carrying out row and column alignment processing on the text information content contained in the electronic screenshot to obtain row and column aligned text information;
and extracting the character content in the row and column alignment text information by adopting a character recognition neural network obtained by pre-training.
2. The method of claim 1, further comprising:
analyzing the layout of the electronic screenshot to be identified to obtain at least one of the following layouts: text layout, table layout and image layout; and
and performing line and column alignment processing on the text information content contained in the layout obtained in the electronic screenshot to obtain line and column aligned text information of different layouts.
3. The method of claims 1-2, wherein the row-alignment processing step comprises:
carrying out binarization processing on the electronic screenshot;
performing horizontal projection on character pixel rows in the image obtained after binarization processing, and counting pixel values contained in each row according to a projection result;
and performing line segmentation and line alignment operation according to the statistical result of the pixel values.
4. The method of claims 1-2, wherein the processing step of column alignment comprises:
acquiring detection box information of characters in the text information in the electronic screenshot; the character detection box comprises at least one of a word-level detection box and a character detection box;
performing column segmentation operation on the text information in the electronic screenshot by adopting a pre-trained sequence marking model according to column name information contained in the text information of the first line in the electronic screenshot;
and adjusting the character detection frame position of the characters in the same column to be aligned with the detection frame position of the column name characters in the column.
5. The method of claim 1, further comprising:
judging whether the text contents of the upper line and the lower line adjacent to each other in the text information with the aligned rows and columns are continuous or not;
and splicing the text contents of the upper and lower lines of continuous text contents in response to the continuous text contents of the upper and lower lines of adjacent text information aligned in the rows and columns.
6. The method of claim 1, further comprising:
in response to detecting that a user inputs a modification mark for unqualified text information, analyzing correct text content corresponding to the modification mark;
training the character recognition neural network by adopting the correct character content corresponding to the modification mark to obtain an updated character recognition neural network;
adopting the updated character recognition neural network to re-recognize the unqualified text information;
and in response to the fact that the modification mark of the unqualified text information input by the user is not detected within the preset time period, updating the text content corresponding to the unqualified text information into the text content identified by the updated character recognition neural network.
7. The method of claim 1, further comprising:
and classifying the electronic screenshot according to the text content, and storing the text content and the electronic screenshot according to a classification result.
8. An apparatus for word recognition, comprising:
an image acquisition unit configured to acquire an electronic screenshot to be recognized;
the line alignment processing unit is configured to perform line alignment processing on text information content contained in the electronic screenshot;
the column alignment processing unit is configured to perform column alignment processing on text information content contained in the electronic screenshot;
and the character recognition unit is configured to extract the character content in the row and column alignment text information by adopting a character recognition neural network obtained by pre-training.
9. The apparatus of claim 8, further comprising:
a layout analysis unit configured to analyze the layout of the electronic screenshot to be recognized to obtain at least one of a text layout, a table layout and an image layout;
wherein row and column alignment processing is performed on the text information content contained in each obtained layout, to obtain row- and column-aligned text information for the different layouts.
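A minimal sketch of the per-layout dispatch in claim 9, assuming layout analysis yields `(kind, crop)` pairs; `align_by_layout` and `align_fn` (standing in for the alignment steps of claims 10-11) are illustrative assumptions:

```python
def align_by_layout(layouts, align_fn):
    # layouts: iterable of (kind, crop) pairs from the layout analysis unit.
    # Row/column alignment applies only to textual layouts; image layouts
    # carry no text to align and are skipped.
    return [(kind, align_fn(crop))
            for kind, crop in layouts
            if kind in ("text", "table")]
```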
10. The apparatus according to any one of claims 8-9, wherein the row alignment processing in the row alignment processing unit comprises:
binarizing the electronic screenshot;
horizontally projecting the character pixel rows of the binarized image, and counting the pixel values contained in each row from the projection result; and
performing row segmentation according to the pixel-value statistics.
11. The apparatus according to any one of claims 8-9, wherein the column alignment processing in the column alignment processing unit comprises:
acquiring detection box information for the characters in the text information of the electronic screenshot, the detection boxes comprising at least one of word-level detection boxes and character-level detection boxes;
performing a column segmentation operation on the text information in the electronic screenshot with a pre-trained sequence labeling model, according to the column name information contained in the first line of text information in the electronic screenshot; and
adjusting the detection box positions of the characters in each column to align with the detection box position of that column's column-name characters.
12. The apparatus of claim 8, further comprising:
a text splicing unit configured to judge whether the text contents of adjacent upper and lower lines in the row- and column-aligned text information are continuous, and, in response to the text contents of the adjacent upper and lower lines being continuous, to splice the text contents of the two lines.
13. The apparatus of claim 8, further comprising:
a neural network updating unit configured to, in response to detecting a modification mark input by a user for unqualified text information, parse the correct text content corresponding to the modification mark, and to train the character recognition neural network with that correct text content to obtain an updated character recognition neural network;
wherein the character recognition unit is further configured to re-recognize the unqualified text information with the updated character recognition neural network, and, in response to detecting no modification mark input by the user for the unqualified text information within a preset time period, to update the text content corresponding to the unqualified text information to the text content recognized by the updated character recognition neural network.
14. The apparatus of claim 8, further comprising:
an image classification unit configured to classify the electronic screenshot according to the text content, and to store the text content and the electronic screenshot according to the classification result.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions, wherein the computer instructions are for causing a computer to perform the method of any one of claims 1-7.
CN202010615181.4A 2020-06-30 2020-06-30 Character recognition method and device, electronic equipment and computer readable storage medium Pending CN111783645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010615181.4A CN111783645A (en) 2020-06-30 2020-06-30 Character recognition method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010615181.4A CN111783645A (en) 2020-06-30 2020-06-30 Character recognition method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111783645A (en) 2020-10-16

Family

ID=72759846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010615181.4A Pending CN111783645A (en) 2020-06-30 2020-06-30 Character recognition method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111783645A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199805A (en) * 2014-09-11 2014-12-10 清华大学 Text splicing method and device
CN108549881A (en) * 2018-05-02 2018-09-18 杭州创匠信息科技有限公司 The recognition methods of certificate word and device
CN109376658A (en) * 2018-10-26 2019-02-22 信雅达系统工程股份有限公司 A kind of OCR method based on deep learning
CN110399878A (en) * 2019-06-14 2019-11-01 南京火眼锐视信息科技有限公司 Table format restoration methods, computer-readable medium and computer
CN110348439A (en) * 2019-07-02 2019-10-18 创新奇智(南京)科技有限公司 A kind of method, computer-readable medium and the system of automatic identification price tag
CN110751143A (en) * 2019-09-26 2020-02-04 中电万维信息技术有限责任公司 Electronic invoice information extraction method and electronic equipment
CN110969163A (en) * 2019-12-20 2020-04-07 山东华尚电气有限公司 Method for detecting text information in image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Lujing et al.: "Intelligent Image Processing and Applications" (智能图像处理及应用), China Railway Publishing House, 31 March 2019, page 155 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560411A (en) * 2020-12-21 2021-03-26 深圳供电局有限公司 Intelligent personnel information input method and system
CN112882678A (en) * 2021-03-15 2021-06-01 百度在线网络技术(北京)有限公司 Image-text processing method, display method, device, equipment and storage medium
CN112882678B (en) * 2021-03-15 2024-04-09 百度在线网络技术(北京)有限公司 Image-text processing method, image-text processing display method, image-text processing device, image-text processing equipment and storage medium
CN113343769A (en) * 2021-05-11 2021-09-03 广州市申迪计算机系统有限公司 Method and device for automatically receiving text messages
CN113343997A (en) * 2021-05-19 2021-09-03 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN113392811A (en) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN113392811B (en) * 2021-07-08 2023-08-01 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN115439850A (en) * 2022-10-08 2022-12-06 招商局通商融资租赁有限公司 Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN117197816A (en) * 2023-06-19 2023-12-08 珠海盈米基金销售有限公司 User material identification method and system
CN116486404A (en) * 2023-06-25 2023-07-25 苏州创腾软件有限公司 Needle coke microscopic image detection method and device based on convolutional neural network
CN116486404B (en) * 2023-06-25 2023-09-26 苏州创腾软件有限公司 Needle coke microscopic image detection method and device based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN111783645A (en) Character recognition method and device, electronic equipment and computer readable storage medium
US9904873B2 (en) Extracting card data with card models
US9740929B2 (en) Client side filtering of card OCR images
CN111753727A (en) Method, device, equipment and readable storage medium for extracting structured information
CN111598164A (en) Method and device for identifying attribute of target object, electronic equipment and storage medium
CN110610575B (en) Coin identification method and device and cash register
CN113239807B (en) Method and device for training bill identification model and bill identification
CN111881908A (en) Target detection model correction method, detection method, device, equipment and medium
CN113377958A (en) Document classification method and device, electronic equipment and storage medium
CN112784751A (en) Training method, device, equipment and medium of image recognition model
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN114092948B (en) Bill identification method, device, equipment and storage medium
US10922633B2 (en) Utilizing econometric and machine learning models to maximize total returns for an entity
CN114418124A (en) Method, device, equipment and storage medium for generating graph neural network model
CN113762109A (en) Training method of character positioning model and character positioning method
CN111552829A (en) Method and apparatus for analyzing image material
CN110738261A (en) Image classification and model training method and device, electronic equipment and storage medium
CN113657398B (en) Image recognition method and device
US20210312223A1 (en) Automated determination of textual overlap between classes for machine learning
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN113052679A (en) Model training method, prediction method and device based on multi-view learning and electronic equipment
CN112749978A (en) Detection method, apparatus, device, storage medium, and program product
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113887394A (en) Image processing method, device, equipment and storage medium
CN113033431A (en) Optical character recognition model training and recognition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination