CN112861648B - Character recognition method, character recognition device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112861648B
CN112861648B (application CN202110068580.8A)
Authority
CN
China
Prior art keywords
text
frame
character
detection
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110068580.8A
Other languages
Chinese (zh)
Other versions
CN112861648A
Inventor
刘翔
刘莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110068580.8A
Priority to PCT/CN2021/083826 (WO2022156066A1)
Publication of CN112861648A
Application granted
Publication of CN112861648B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The application relates to the field of image detection, and discloses a character recognition method, which comprises the following steps: acquiring a text image, and performing text detection on the text image to obtain text detection frames; screening and merging the text detection frames to obtain a target text frame; clipping the text-free regions of the target text frame to obtain a clipped text frame; extracting the text of the clipped text frame to obtain an initial text set; and extracting key words from the initial text set, verifying the key words by using a regular verification technology, and taking the key words that are successfully verified as the text recognition result of the text image. In addition, the application also relates to blockchain technology, and the key words can be stored in a blockchain. The application can improve the accuracy of character recognition.

Description

Character recognition method, character recognition device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image detection, and in particular, to a text recognition method, a text recognition device, an electronic device, and a computer readable storage medium.
Background
Text recognition refers to the process of extracting text from a text image; for example, when government agencies review documents, the text in a document image often needs to be recognized in order to extract important information. At present, text recognition is generally implemented by optical character recognition (Optical Character Recognition, OCR), which refers to the process of analyzing, recognizing, and processing a document image of text material to obtain its characters and layout information.
However, when OCR is used to recognize characters in a text image, the position and orientation of the characters in the image often cannot be accurately located, which tends to reduce the accuracy of the recognized characters.
Disclosure of Invention
The application provides a character recognition method, a character recognition device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of character recognition.
In order to achieve the above object, the present application provides a text recognition method, including:
acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
screening and combining the text detection frames to obtain target text frames;
cutting the target text frame in a text-free area to obtain a cut text frame;
extracting the characters of the cutting character frame to obtain an initial character set;
extracting key words in the initial word set, checking the key words by using a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
Optionally, the performing text detection on the text image to obtain a text detection frame includes:
extracting image features of the text image by using a convolution layer in a text target frame detection model to obtain a feature image, wherein the text target frame detection model is trained in advance;
carrying out standardization operation on the characteristic images by utilizing a batch standardization layer in the text target frame detection model to obtain standard characteristic images;
fusing the bottom features of the text image with the standard feature image by utilizing a fusion layer in the text target frame detection model to obtain a target feature image;
and outputting a detection result of the target feature image by using an activation function in the text target frame detection model, and generating a text detection frame according to the detection result.
Optionally, the performing the text-free region clipping on the target text frame to obtain a clipped text frame includes:
performing binarization processing on the target text frame to obtain a binarized text frame;
inquiring a character starting position and a character ending position in the longitudinal axis direction in the binarized character frame and the longitudinal axis direction length of the binarized character frame, and performing longitudinal cutting on the binarized character frame according to the character starting position, the character ending position and the longitudinal axis direction length in the longitudinal axis direction to obtain a longitudinal cutting character frame;
inquiring the character starting position and the character ending position in the transverse axis direction in the longitudinal cutting character frame and the transverse axis direction length of the longitudinal cutting character frame, and transversely cutting the longitudinal cutting character frame according to the character starting position and the character ending position in the transverse axis direction and the transverse axis direction length to obtain the cutting character frame.
Optionally, the performing text extraction on the clipping text frame to obtain an initial text set includes:
performing feature extraction on the cut text frames by using a convolutional neural network in a text extraction model to obtain feature text frames, wherein the text extraction model is trained in advance;
performing character position sequence recognition on the characteristic character frame by utilizing a long-short-term memory network in the character extraction model to generate an original character set;
and performing character alignment on the original character set by utilizing a time sequence classification network in the character extraction model to generate an initial character set.
Optionally, the feature extraction of the cut text box by using the convolutional neural network in the text extraction model to obtain a feature text box includes:
performing convolution feature extraction on the cut text frame by using a convolution layer in the convolution neural network to obtain an initial feature text frame;
reducing the dimension of the initial feature text frame by using a pooling layer in the convolutional neural network to obtain a dimension-reduced feature text frame;
and outputting the dimension-reducing characteristic text frame by using a full connection layer in the convolutional neural network to obtain the characteristic text frame.
Optionally, the performing text position sequence recognition on the feature text frame by using the long-short term memory network in the text extraction model, generating an original text set includes:
calculating the state value of the characteristic text frame by using the input gate of the long-short-term memory network;
calculating the activation value of the characteristic text frame by using the forget gate of the long-short-term memory network;
calculating a state update value of the feature text frame according to the state value and the activation value;
and calculating a character position sequence of the state update value by using an output gate of the long-short-term memory network to generate an original character set.
Optionally, the extracting the key words in the initial word set includes:
deleting the stop words in the initial word set to obtain a standard word set;
and calculating the weight of each standard character in the standard character set, and screening the standard characters with the weights larger than the preset weights from the standard character set as the key characters.
In order to solve the above problems, the present application also provides a text recognition device, the device comprising:
the detection module is used for acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
the merging module is used for screening and merging the text detection frames to obtain target text frames;
the cutting module is used for cutting the target text frame in a text-free area to obtain a cut text frame;
the extraction module is used for extracting the characters of the cutting character frame to obtain an initial character set;
and the recognition module is used for extracting the key words in the initial word set, checking the key words by utilizing a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
In order to solve the above-mentioned problems, the present application also provides an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to implement the character recognition method described above.
In order to solve the above-mentioned problems, the present application also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned character recognition method.
Firstly, the embodiment of the application acquires a text image and performs text detection on the text image to obtain text detection frames, which can detect the detection frames with text position coordinates in the text image; secondly, the embodiment of the application screens and merges the text detection frames and clips their text-free regions to obtain the clipped text frame, which can improve text recognition performance and thereby greatly improve the accuracy of picture text recognition; further, the embodiment of the application performs text extraction on the clipped text frame to obtain an initial text set, extracts the key words in the initial text set, verifies the key words by using a regular verification technology, and takes the key words that are successfully verified as the text recognition result of the text image. Therefore, the character recognition method, the character recognition device, the electronic equipment and the storage medium provided by the application can improve the accuracy of character recognition.
Drawings
FIG. 1 is a flow chart of a text recognition method according to an embodiment of the present application;
FIG. 2 is a detailed flowchart illustrating one of the steps of the text recognition method shown in FIG. 1 according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of a text recognition device according to an embodiment of the present application;
fig. 4 is a schematic diagram of an internal structure of an electronic device for implementing a text recognition method according to an embodiment of the present application;
the achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a character recognition method. The execution subject of the text recognition method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the character recognition method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to fig. 1, a flow chart of a text recognition method according to an embodiment of the application is shown. In the embodiment of the application, the text recognition method comprises the following steps:
s1, acquiring a text image, and performing text detection on the text image to obtain a text detection frame.
In the embodiment of the application, the text image is obtained by converting a document text, and the document text can be a PDF text, such as a government message text. Further, the embodiment of the application performs text detection on the text image by using a trained text target frame detection model, wherein the text target frame detection model is constructed by a YOLO (You Only Look Once) neural network and is used for detecting a detection frame with text position coordinates in the text image, and the text detection model comprises the following components: convolution layer, batch normalization layer, fusion layer, activation function, etc. Further, before the text image is subjected to text detection by using the trained text target frame detection model, the embodiment of the application further comprises the following steps: training a pre-constructed character target frame detection model by using a training text image set until the pre-constructed character target frame detection model tends to be stable, ending the training of the pre-constructed character target frame detection model, and obtaining a character target frame detection model after training. It should be stated that the training process of the pre-constructed text target frame detection model belongs to the current mature technology, and further description is omitted here.
In detail, the text image is subjected to text detection by using the trained text target frame detection model to obtain a text detection frame, which comprises the following steps: extracting image features of the text image by using the convolution layer to obtain a feature image; -normalizing the feature images with the batch normalization layer (Batch Normalization, BN) to obtain standard feature images; fusing the bottom layer features of the text image with the standard feature image by utilizing the fusion layer to obtain a target feature image; and outputting a detection result of the target feature image by using the activation function, and generating a text detection frame according to the detection result.
The image feature extraction can be realized by carrying out convolution operation on tensors of the input image; the batch normalization layer normalizes the extracted image features, so that convergence of the model can be accelerated.
In a preferred embodiment, the normalization operation may be expressed as:
x'ᵢ = (xᵢ − μ) / √(σ² + ε)
wherein x'ᵢ is the standard feature image after batch normalization, xᵢ is the feature image, μ is the mean of the feature image set, σ² is the variance of the feature image set, and ε is a small constant that prevents division by zero.
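The normalization operation can be sketched as follows; this is a minimal illustration of standard batch normalization, and the function name and sample values are hypothetical rather than taken from the patent:

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    """Normalize a batch of feature values: x'_i = (x_i - mu) / sqrt(sigma^2 + eps)."""
    mu = x.mean()          # mean of the feature set
    var = x.var()          # variance of the feature set
    return (x - mu) / np.sqrt(var + eps)

features = np.array([1.0, 2.0, 3.0, 4.0])
normalized = batch_normalize(features)
# The normalized features have approximately zero mean and unit variance.
```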
The fusion layer fuses the bottom layer features of the image into the extracted image features, so that the influence on the image gray level change caused by different gains can be reduced. The bottom layer features refer to basic features of the text image, such as color, length, width, etc., and preferably, the fusion is implemented by a Cross-Stage-Partial-Connections (CSP) module in the fusion layer in the embodiment of the present application.
In a preferred embodiment, the activation function may be expressed as s' = f(s), where s' represents the activated target feature image, s represents the target feature image, and f denotes the activation function.
Further, in a preferred embodiment of the present application, the detection result includes: x, y, height, width, category, etc., wherein x and y represent the center point of the target feature image, and category represents whether the target feature image is a text region, i.e., category 0 indicates that the predicted region is not a text region, and category 1 indicates that it is a text region. The embodiment of the application then selects the target feature images with category 1 as text regions, thereby generating the text detection frames.
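The selection of text regions from the decoded detection results can be sketched as follows; the tuple layout and sample values are hypothetical, illustrating only the category-based filtering described above:

```python
# Hypothetical decoded detections: (x, y, height, width, category)
detections = [
    (120.0, 40.0, 18.0, 96.0, 1),   # predicted text region
    (300.0, 40.0, 18.0, 50.0, 0),   # background
    (120.0, 80.0, 20.0, 110.0, 1),
]

def select_text_boxes(dets):
    """Keep only detections whose category is 1 (text region)."""
    return [d for d in dets if d[4] == 1]

text_boxes = select_text_boxes(detections)
```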
And S2, screening and combining the text detection frames to obtain target text frames.
The embodiment of the application screens and merges the text detection frames to obtain the target text frames, so as to screen out low-confidence and duplicate text detection frames and improve the speed of subsequent text extraction. The confidence refers to the probability that text falls within the detected text detection frame; that is, the higher the confidence, the higher the probability that the detected text detection frame contains text.
Further, before the text detection frames are screened and merged, the embodiment of the application further includes: performing non-maximum suppression (Non-Maximum Suppression, NMS) on the text detection frames to suppress elements that are not maxima, so as to improve the detection speed of the subsequent text detection frames. It should be stated that the non-maximum suppression algorithm belongs to a currently mature technology, and is not described in detail herein.
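Greedy non-maximum suppression, as referenced above, can be sketched as follows; the box format (corner coordinates) and the IoU threshold are assumptions, not values taken from the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any remaining
    box that overlaps it above the threshold; repeat on what is left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 100, 20), (5, 2, 105, 22), (200, 0, 300, 20)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # the second box heavily overlaps the first
```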
Further, in one of the alternative embodiments of the present application, the screening of the text detection frames may be implemented by a currently known interval estimation method.
Further, it should be understood that the screened text detection frames may include text detection frames containing the same text, so the embodiment of the present application merges the screened text detection frames using a preset merging rule to avoid duplicate text detection frames. The preset merging rule includes: merging those screened text detection frames that have the same adjacent spacing, the same text height ratio, and the same text content.
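The merging rule can be partially sketched as follows; this illustration covers only the spacing and height-ratio conditions (the text-content condition would require recognition results), and the tolerance values are hypothetical:

```python
def merge_adjacent_boxes(boxes, gap_tolerance=10, height_ratio_tolerance=0.2):
    """Merge horizontally adjacent boxes (x1, y1, x2, y2) whose spacing and
    text-height ratio are close enough, scanning left to right."""
    boxes = sorted(boxes, key=lambda b: b[0])
    merged = [list(boxes[0])]
    for b in boxes[1:]:
        last = merged[-1]
        gap = b[0] - last[2]                       # horizontal spacing
        h_last, h_b = last[3] - last[1], b[3] - b[1]
        ratio = min(h_last, h_b) / max(h_last, h_b)  # height similarity
        if 0 <= gap <= gap_tolerance and ratio >= 1 - height_ratio_tolerance:
            last[1] = min(last[1], b[1])           # expand the merged box
            last[2] = max(last[2], b[2])
            last[3] = max(last[3], b[3])
        else:
            merged.append(list(b))
    return [tuple(m) for m in merged]

boxes = [(0, 0, 50, 20), (55, 0, 100, 21), (200, 0, 260, 20)]
merged = merge_adjacent_boxes(boxes)   # first two merge; third is too far away
```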
S3, cutting the target text frame in a text-free area to obtain a cut text frame.
In the embodiment of the application, the target text frame is subjected to text-free region clipping so as to improve the character recognition performance of the subsequent model.
In detail, the step of performing text-free region clipping on the target text frame to obtain a clipped text frame includes: performing binarization processing on the target text frame to obtain a binarized text frame; inquiring a character starting position and a character ending position in the longitudinal axis direction in the binarized character frame and the longitudinal axis direction length of the binarized character frame, and performing longitudinal cutting on the binarized character frame according to the character starting position, the character ending position and the longitudinal axis direction length in the longitudinal axis direction to obtain a longitudinal cutting character frame; inquiring the character starting position and the character ending position in the transverse axis direction in the longitudinal cutting character frame and the transverse axis direction length of the longitudinal cutting character frame, and transversely cutting the longitudinal cutting character frame according to the character starting position and the character ending position in the transverse axis direction and the transverse axis direction length to obtain the cutting character frame.
The binarization processing of the target text frame comprises the following steps: and marking the text area in the target text frame as 1, and marking the background area as 0.
In an alternative embodiment, the querying of the text start position, the text end position, the length in the vertical axis direction, and the length in the horizontal axis direction may be implemented by a precompiled query script, and the query script may be written in the JavaScript scripting language.
In an alternative embodiment, the longitudinal clipping and the transverse clipping are implemented by currently known image cropping tools, such as the Photoshop crop tool.
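The text-free region clipping over a binarized frame can be sketched as follows; it crops to the span from the first to the last row and column containing text pixels (value 1), as described above, with hypothetical sample data:

```python
import numpy as np

def crop_text_free_regions(binary):
    """Crop rows and columns with no text pixels from a binarized frame
    (text marked 1, background marked 0)."""
    rows = np.where(binary.any(axis=1))[0]   # rows containing text
    cols = np.where(binary.any(axis=0))[0]   # columns containing text
    if rows.size == 0 or cols.size == 0:
        return binary                        # no text: nothing to crop
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

frame = np.zeros((6, 8), dtype=int)
frame[2:4, 3:6] = 1                          # text occupies rows 2-3, cols 3-5
cropped = crop_text_free_regions(frame)
```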
And S4, performing character extraction on the cutting character frame to obtain an initial character set.
In the embodiment of the application, the pre-trained text extraction model is utilized to perform text extraction on the clipping text frame to obtain an initial text set. The pre-trained text extraction model includes: a convolutional neural network (Convolutional Neural Networks, CNN), a Long Short-Term Memory network (LSTM), and a time sequence classification network (Connectionist Temporal Classification, CTC), wherein the CNN is used for extracting feature text frames from the clipping text frames, the LSTM is used for extracting the text sequences in the feature text frames, and the CTC is used for solving the problem that the characters in the text feature sequences cannot be aligned. Further, the CNN includes a convolution layer, a pooling layer, and a fully connected layer, and the LSTM includes: an input gate, a forget gate, and an output gate.
Further, before the pre-trained text extraction model is used to extract the text of the clipping text frame, the embodiment of the application further includes: training the pre-constructed text extraction model using a training clipping text frame set until the pre-constructed text extraction model tends to be stable, ending the training, and obtaining the trained text extraction model. It should be stated that the training process of the pre-constructed text extraction model belongs to a currently mature technology, and is not further described herein.
In detail, referring to fig. 2, the text extraction of the cut text frame by using the pre-trained text extraction model to obtain an initial text set includes:
s20, performing feature extraction on the cut text frame by using a convolutional neural network in the text extraction model to obtain a feature text frame;
s21, performing character position sequence recognition on the characteristic character frame by utilizing a long-short-term memory network in the character extraction model to generate an original character set;
s22, performing character alignment on the original character set by utilizing a time sequence classification network in the character extraction model to generate an initial character set.
Further, the S20 includes: performing convolution feature extraction on the cut text frame by using a convolution layer in the convolutional neural network to obtain an initial feature text frame; reducing the dimension of the initial feature text frame by using a pooling layer in the convolutional neural network to obtain a dimension-reduced feature text frame; and outputting the dimension-reduced feature text frame by using a fully connected layer in the convolutional neural network to obtain a feature text frame;
further, the S21 includes: calculating the state value of the characteristic text frame by using the input gate of the long-short-term memory network; calculating the activation value of the characteristic text frame by using the forgetting door of the long-short-term memory network; calculating a state update value of the feature text frame according to the state value and the activation value; and calculating a character position sequence of the state update value by using an output gate of the long-short-term memory network to generate an original character set.
Further, it should be stated that the training processes of the convolutional neural network, the long short-term memory network, and the time sequence classification network belong to currently mature technologies, and are not further described herein.
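The alignment performed by the time sequence classification network (CTC) can be illustrated by its standard decoding rule, which collapses consecutive repeats and removes blanks; the blank symbol and sample path below are hypothetical:

```python
def ctc_collapse(path, blank="-"):
    """Collapse a CTC output path into a label sequence: merge consecutive
    repeated symbols, then drop the blank symbol."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

# A per-timestep prediction path with blanks and repeats collapses to a word.
decoded = ctc_collapse("--hh-e-ll-lo--")
```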
And S5, extracting key words in the initial word set, checking the key words by using a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
It should be understood that the initial text set obtained in step S4 contains a large amount of text that is of no use to the user, so the embodiment of the application extracts the key words in the initial text set to better help the user process information and improve working efficiency.
In detail, the extracting the key words in the initial word set includes: deleting the stop words in the initial word set to obtain a standard word set, calculating the weight of each standard word in the standard word set, and screening the standard words with the weights larger than the preset weights from the standard word set as the key words.
In an alternative embodiment, the deletion of the stop words may be filtered according to a stop word list; that is, if a word appears in the stop word list, all occurrences of that word in the initial text set are deleted.
In an optional embodiment, the weight of a standard word may be obtained by calculating the frequency ratio of each standard word in the standard text set, and the preset weight may be 0.6, or may be set to another value according to the specific scenario.
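The stop-word deletion and frequency-ratio weighting above can be sketched as follows; the stop-word list, sample words, and threshold are hypothetical (the patent suggests a preset weight such as 0.6):

```python
from collections import Counter

STOP_WORDS = {"the", "is", "of"}   # illustrative stop-word list

def extract_keywords(words, weight_threshold=0.5):
    """Delete stop words, weight each remaining word by its frequency
    ratio in the standard word set, and keep words above the threshold."""
    standard = [w for w in words if w not in STOP_WORDS]
    counts = Counter(standard)
    total = len(standard)
    return {w for w, n in counts.items() if n / total > weight_threshold}

words = ["the", "tax", "is", "tax", "rate", "of", "tax", "rate"]
keywords = extract_keywords(words)   # "tax" has weight 3/5 = 0.6 > 0.5
```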
Further, it should be understood that some incorrect character formats may exist in the extracted standard characters, such as a Chinese character error, so that the embodiment of the present application uses a regular verification technique to verify the key characters, and uses the key characters that are successfully verified as the character recognition result of the text image.
In an alternative embodiment, the regular verification technique includes: digit check expressions (e.g., ^[0-9]*$), Chinese character check expressions (e.g., ^[\u4e00-\u9fa5]{0,}$), and special-purpose check expressions (e.g., date format: ^\d{4}-\d{1,2}-\d{1,2}$).
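The regular verification step can be sketched with standard Python regular expressions; the exact patterns below are reconstructions and should be treated as assumptions rather than the patent's verbatim expressions:

```python
import re

# Reconstructed check expressions (assumed forms):
DIGIT_RE   = re.compile(r"^[0-9]*$")                  # digit check
CHINESE_RE = re.compile(r"^[\u4e00-\u9fa5]{0,}$")     # Chinese character check
DATE_RE    = re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$")   # date format check

def verify(text):
    """Return True if the text matches any supported check expression."""
    return any(p.fullmatch(text) for p in (DIGIT_RE, CHINESE_RE, DATE_RE))

results = [verify("12345"), verify("2021-01-19"), verify("abc!")]
```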
Further, to ensure privacy and reusability of the key words, the key words may also be stored in a blockchain node.
Firstly, acquiring a text image, and performing text detection on the text image to obtain a text detection frame, wherein the text detection frame can detect a detection frame with text position coordinates in the text image; secondly, the embodiment of the application screens, merges and cuts the text detection frame in the text-free area to obtain the cut text frame, which can improve the text recognition performance, thereby greatly improving the accuracy of picture text recognition; further, the embodiment of the application performs text extraction on the cutting text frame to obtain an initial text set, extracts key text in the initial text set, performs verification on the key text by using a regular verification technology, and takes the key text which is successfully verified as a text recognition result of the text image. Therefore, the application can improve the accuracy of character recognition.
As shown in fig. 3, a functional block diagram of the character recognition device according to the present application is shown.
The character recognition apparatus 100 of the present application may be installed in an electronic device. Depending on the implemented functions, the text recognition device may include a detection module 101, a merging module 102, a clipping module 103, an extraction module 104, and a recognition module 105. A module of the present application may also be referred to as a unit, meaning a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the detection module 101 is configured to obtain a text image, and perform text detection on the text image to obtain a text detection box;
the merging module 102 is configured to screen and merge the text detection frames to obtain a target text frame;
the clipping module 103 is configured to clip the target text frame in a text-free area to obtain a clipped text frame;
the extracting module 104 is configured to perform text extraction on the clipping text frame to obtain an initial text set;
the recognition module 105 is configured to extract the key words in the initial word set, verify the key words by using a regular verification technology, and use the key words that are successfully verified as a word recognition result of the text image.
In detail, the modules in the character recognition device 100 in the embodiment of the present application use the same technical means as the character recognition method described in fig. 1 and fig. 2 and can produce the same technical effects, which are not repeated here.
Fig. 4 is a schematic structural diagram of an electronic device for implementing the text recognition method according to the present application.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a word recognition program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various data, such as code for character recognition, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the respective parts of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, performing word recognition) and calling the data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 4 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard (Keyboard), or a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not intended to limit the scope of the patent application to this configuration.
The word recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of computer programs which, when run on the processor 10, can implement:
acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
screening and combining the text detection frames to obtain target text frames;
cutting the target text frame in a text-free area to obtain a cut text frame;
extracting the characters of the cutting character frame to obtain an initial character set;
extracting key words in the initial word set, checking the key words by using a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
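The five steps above can be sketched as a single pipeline. All helper names below (detect, merge, crop, extract, pick_keywords, verify) are hypothetical placeholders standing in for the stages described in the text, not functions defined by the application:

```python
# Hypothetical end-to-end sketch of the five claimed steps.
# Each helper argument stands in for one stage of the method.

def recognize(text_image, detect, merge, crop, extract, pick_keywords, verify):
    boxes = detect(text_image)               # step 1: text detection frames
    targets = merge(boxes)                   # step 2: screen and merge into target frames
    cropped = [crop(b) for b in targets]     # step 3: cut away text-free areas
    initial = [extract(c) for c in cropped]  # step 4: initial text set
    keys = pick_keywords(initial)            # step 5a: extract key text
    return [k for k in keys if verify(k)]    # step 5b: keep only verified key text

# Toy run with trivial stand-ins for each stage:
result = recognize(
    "img", lambda i: ["b1", "b2"], lambda bs: bs,
    lambda b: b, lambda c: c.upper(),
    lambda texts: texts, lambda k: k.startswith("B"),
)
print(result)  # ['B1', 'B2']
```

The design keeps each stage independently replaceable, matching the one-module-per-step device structure described above.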
In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1 may be stored in a non-volatile computer-readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present application also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
screening and combining the text detection frames to obtain target text frames;
cutting the target text frame in a text-free area to obtain a cut text frame;
extracting the characters of the cutting character frame to obtain an initial character set;
extracting key words in the initial word set, checking the key words by using a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. The Blockchain, which is essentially a decentralized database, is a chain of data blocks generated in association by cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (8)

1. A method of text recognition, the method comprising:
acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
screening and combining the text detection frames to obtain target text frames;
cutting the target text frame in a text-free area to obtain a cut text frame;
extracting the characters of the cutting character frame to obtain an initial character set;
extracting key words in the initial word set, checking the key words by using a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image;
the text image is subjected to text detection to obtain a text detection frame, which comprises the following steps: extracting image features of the text image by using a convolution layer in a text target frame detection model to obtain a feature image, wherein the text target frame detection model is trained in advance; carrying out standardization operation on the characteristic images by utilizing a batch standardization layer in the text target frame detection model to obtain standard characteristic images; fusing the bottom features of the text image with the standard feature image by utilizing a fusion layer in the text target frame detection model to obtain a target feature image; outputting a detection result of the target feature image by using an activation function in the text target frame detection model, and generating a text detection frame according to the detection result;
the step of performing text-free region clipping on the target text frame to obtain a clipping text frame comprises the following steps: performing binarization processing on the target text frame to obtain a binarized text frame; inquiring a character starting position and a character ending position in the longitudinal axis direction in the binarized character frame and the longitudinal axis direction length of the binarized character frame, and performing longitudinal cutting on the binarized character frame according to the character starting position, the character ending position and the longitudinal axis direction length in the longitudinal axis direction to obtain a longitudinal cutting character frame; inquiring the character starting position and the character ending position in the transverse axis direction in the longitudinal cutting character frame and the transverse axis direction length of the longitudinal cutting character frame, and transversely cutting the longitudinal cutting character frame according to the character starting position and the character ending position in the transverse axis direction and the transverse axis direction length to obtain the cutting character frame.
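The text-free-area cutting in claim 1 can be illustrated with a small sketch: binarize the frame, locate the character start and end positions along each axis, then cut. This is a plain NumPy illustration under the assumption of dark text on a light background, not the application's actual code:

```python
import numpy as np

def crop_text_free(frame, threshold=128):
    """Binarize a grayscale frame, then cut away the text-free margins
    by locating the first and last rows/columns that contain text."""
    binary = (frame < threshold).astype(np.uint8)  # assumed: dark text on light background
    rows = np.where(binary.any(axis=1))[0]         # rows containing text (longitudinal pass)
    cols = np.where(binary.any(axis=0))[0]         # columns containing text (transverse pass)
    if rows.size == 0 or cols.size == 0:
        return frame                               # no text found; leave the frame untouched
    return frame[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# A 5x6 frame with "text" (dark pixels, value 0) only in the middle:
frame = np.full((5, 6), 255, dtype=np.uint8)
frame[1:3, 2:5] = 0
print(crop_text_free(frame).shape)  # (2, 3)
```

The two passes mirror the claim's longitudinal-then-transverse cutting order; doing them on the binarized frame makes the start/end queries simple boolean reductions.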
2. The text recognition method of claim 1, wherein the text extraction of the cropped text box to obtain an initial text set comprises:
performing feature extraction on the cut text frames by using a convolutional neural network in a text extraction model to obtain feature text frames, wherein the text extraction model is trained in advance;
performing character position sequence recognition on the characteristic character frame by utilizing a long-short-term memory network in the character extraction model to generate an original character set;
and performing character alignment on the original character set by utilizing a time sequence classification network in the character extraction model to generate an initial character set.
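The time-sequence classification alignment in claim 2 can be illustrated by the standard CTC-style collapse rule, which merges repeated per-timestep labels and drops blanks. This is a generic sketch of that rule, not the application's trained model:

```python
def ctc_collapse(sequence, blank="-"):
    """Collapse a per-timestep label sequence: merge consecutive duplicates,
    then remove blank symbols, yielding the aligned text."""
    out = []
    prev = None
    for label in sequence:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_collapse(list("hh-e-ll-llo")))  # "hello"
```

The blank symbol lets the network separate genuinely repeated characters (the two l's in "hello") from one character held across several timesteps.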
3. The text recognition method as recited in claim 2, wherein the feature extraction of the cropped text box using the convolutional neural network in the text extraction model to obtain a feature text box comprises:
performing convolution feature extraction on the cut text frame by using a convolution layer in the convolution neural network to obtain an initial feature text frame;
performing dimension reduction on the initial characteristic text frame by using a pooling layer in the convolutional neural network to obtain a dimension-reduced characteristic text frame;
and outputting the dimension-reducing characteristic text frame by using a full connection layer in the convolutional neural network to obtain the characteristic text frame.
4. The text recognition method as recited in claim 2, wherein said performing text position sequence recognition on said feature text frame using a long-short term memory network in said text extraction model, generating an original text set, comprises:
calculating the state value of the characteristic text frame by using the input gate of the long-short-term memory network;
calculating the activation value of the characteristic text frame by using the forgetting door of the long-short-term memory network;
calculating a state update value of the feature text frame according to the state value and the activation value;
and calculating a character position sequence of the state update value by using an output gate of the long-short-term memory network to generate an original character set.
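The gate computations in claim 4 follow the standard LSTM equations; a minimal NumPy sketch of one timestep is shown below. The weights are random illustrative values, not the trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep: input-gate state value, forget-gate activation value,
    state update value, and output gate, matching the claim's four computations."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate -> state value
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate -> activation value
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate state
    c = f * c_prev + i * g                              # state update value
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate
    h = o * np.tanh(c)                                  # emitted position feature
    return h, c

n = 4  # hidden size; toy dimension for illustration
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n, n)) for k in "ifgo"}
U = {k: rng.normal(size=(n, n)) for k in "ifgo"}
b = {k: np.zeros(n) for k in "ifgo"}
h, c = lstm_step(rng.normal(size=n), np.zeros(n), np.zeros(n), W, U, b)
print(h.shape)  # (4,)
```

Running this step over each column of the feature text frame in order is what yields the character position sequence.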
5. The text recognition method according to any one of claims 1 to 4, wherein the extracting the key text in the initial text set includes:
deleting the stop words in the initial word set to obtain a standard word set;
and calculating the weight of each standard character in the standard character set, and screening the standard characters with the weights larger than the preset weights from the standard character set as the key characters.
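Claim 5's stop-word removal and weight-based screening might be sketched with simple relative-frequency weights. The stop-word list and the weight threshold below are illustrative assumptions, not values taken from the application:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of"}  # illustrative stop-word list

def key_words(tokens, min_weight=0.3):
    """Drop stop words, weight each remaining token by relative frequency,
    and keep tokens whose weight exceeds the preset threshold."""
    standard = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(standard)
    total = sum(counts.values())
    return {t for t, c in counts.items() if c / total > min_weight}

tokens = "the policy number of the policy a claim".split()
print(sorted(key_words(tokens)))  # ['policy']
```

A production system would more likely use TF-IDF or learned weights, but the screen-by-threshold structure is the same.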
6. A character recognition apparatus for implementing the character recognition method according to any one of claims 1 to 5, characterized in that the apparatus comprises:
the detection module is used for acquiring a text image, and performing text detection on the text image to obtain a text detection frame;
the merging module is used for screening and merging the text detection frames to obtain target text frames;
the cutting module is used for cutting the target text frame in a text-free area to obtain a cut text frame;
the extraction module is used for extracting the characters of the cutting character frame to obtain an initial character set;
and the recognition module is used for extracting the key words in the initial word set, checking the key words by utilizing a regular checking technology, and taking the key words which are successfully checked as word recognition results of the text image.
7. An electronic device, the electronic device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the word recognition method of any one of claims 1 to 5.
8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the text recognition method according to any one of claims 1 to 5.
CN202110068580.8A 2021-01-19 2021-01-19 Character recognition method, character recognition device, electronic equipment and storage medium Active CN112861648B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110068580.8A CN112861648B (en) 2021-01-19 2021-01-19 Character recognition method, character recognition device, electronic equipment and storage medium
PCT/CN2021/083826 WO2022156066A1 (en) 2021-01-19 2021-03-30 Character recognition method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110068580.8A CN112861648B (en) 2021-01-19 2021-01-19 Character recognition method, character recognition device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112861648A CN112861648A (en) 2021-05-28
CN112861648B true CN112861648B (en) 2023-09-26

Family

ID=76007218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110068580.8A Active CN112861648B (en) 2021-01-19 2021-01-19 Character recognition method, character recognition device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112861648B (en)
WO (1) WO2022156066A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861648B (en) * 2021-01-19 2023-09-26 平安科技(深圳)有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN113609274B (en) * 2021-08-16 2024-02-09 平安银行股份有限公司 Intelligent question-answering method and device, electronic equipment and storage medium
CN113673519B (en) * 2021-08-24 2023-06-20 平安科技(深圳)有限公司 Character recognition method based on character detection model and related equipment thereof
CN114330232A (en) * 2021-12-29 2022-04-12 北京字节跳动网络技术有限公司 Text display method, device, equipment and storage medium
CN115439850A (en) * 2022-10-08 2022-12-06 招商局通商融资租赁有限公司 Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN115620039B (en) * 2022-10-08 2023-07-18 中电金信软件有限公司 Image labeling method, device, equipment and medium
CN115545009B (en) * 2022-12-01 2023-07-07 中科雨辰科技有限公司 Data processing system for acquiring target text
CN116128458B (en) * 2023-04-12 2024-02-20 华中科技大学同济医学院附属同济医院 Intelligent automatic auditing system for hospital expense card account reporting
CN116563875B (en) * 2023-07-05 2023-09-08 四川集鲜数智供应链科技有限公司 Intelligent image-text recognition method and system with encryption function

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156706A (en) * 2014-08-12 2014-11-19 华北电力大学句容研究中心 Chinese character recognition method based on optical character recognition technology
CN108460386A (en) * 2018-03-19 2018-08-28 深圳怡化电脑股份有限公司 Character picture cutting method, device, equipment and storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN110766017A (en) * 2019-10-22 2020-02-07 国网新疆电力有限公司信息通信公司 Mobile terminal character recognition method and system based on deep learning
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN111414916A (en) * 2020-02-29 2020-07-14 中国平安财产保险股份有限公司 Method and device for extracting and generating text content in image and readable storage medium
CN111782772A (en) * 2020-07-24 2020-10-16 平安银行股份有限公司 Text automatic generation method, device, equipment and medium based on OCR technology
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659204B2 (en) * 2014-06-13 2017-05-23 Conduent Business Services, Llc Image processing methods and systems for barcode and/or product label recognition
CN107239786B (en) * 2016-03-29 2022-01-11 阿里巴巴集团控股有限公司 Character recognition method and device
CN111325194B (en) * 2018-12-13 2023-12-29 杭州海康威视数字技术股份有限公司 Character recognition method, device and equipment and storage medium
CN112861648B (en) * 2021-01-19 2023-09-26 平安科技(深圳)有限公司 Character recognition method, character recognition device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN112861648A (en) 2021-05-28
WO2022156066A1 (en) 2022-07-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant