CN112668580A - Text recognition method, text recognition device and terminal equipment - Google Patents

Text recognition method, text recognition device and terminal equipment

Info

Publication number
CN112668580A
CN112668580A (Application No. CN202011580119.2A)
Authority
CN
China
Prior art keywords
text
text image
network model
character recognition
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011580119.2A
Other languages
Chinese (zh)
Inventor
魏政
曹瑾
孙圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Aerospace Technology Co ltd
Original Assignee
Nanjing Aerospace Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Aerospace Technology Co ltd filed Critical Nanjing Aerospace Technology Co ltd
Priority to CN202011580119.2A priority Critical patent/CN112668580A/en
Publication of CN112668580A publication Critical patent/CN112668580A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The application belongs to the technical field of artificial intelligence and provides a text recognition method, a text recognition device, a terminal device and a computer-readable storage medium. The method comprises: obtaining a text image to be recognized, wherein the text image contains noise information; eliminating the interference of the noise information by using a trained first neural network model and segmenting the text image to obtain a text region of the text image; performing character recognition on the text region by using a trained second neural network model to obtain a character recognition result; and generating a file according to the text image and the character recognition result. By this method, character recognition of text images in the political and legal field can be achieved with high recognition accuracy.

Description

Text recognition method, text recognition device and terminal equipment
Technical Field
The present application belongs to the technical field of artificial intelligence, and in particular, relates to a text recognition method, a text recognition apparatus, a terminal device, and a computer-readable storage medium.
Background
Currently, various text images exist in the political and legal field. For example, noise such as broken strokes, character adhesion, shadows and stains may exist in the text regions of these images; characters may be distorted and text lines inclined owing to the thickness, smoothness and print quality of the paper; and, due to the particularity of documents in this field, fingerprints, seals, postmarks and the like may also appear in text regions. In the related art, character recognition of such text regions is difficult and the recognition accuracy is low.
Disclosure of Invention
In view of the above, the present application provides a text recognition method, a text recognition apparatus, a terminal device and a computer-readable storage medium, which can implement character recognition of text images in the political and legal field with high recognition accuracy.
In a first aspect, the present application provides a text recognition method, including:
acquiring a text image to be identified, wherein the text image comprises noise information;
eliminating the interference of the noise information by using a trained first neural network model and segmenting the text image to obtain a text region of the text image;
performing character recognition on the text region by using the trained second neural network model to obtain a character recognition result;
and generating a file according to the text image and the character recognition result.
In a second aspect, the present application provides a text recognition apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a text image to be identified, and the text image comprises noise information;
a segmentation unit, configured to eliminate interference of the noise information by using a trained first neural network model and segment the text image to obtain a text region of the text image;
the recognition unit is used for carrying out character recognition on the text area by using the trained second neural network model to obtain a character recognition result;
and the generating unit is used for generating a file according to the text image and the character recognition result.
In a third aspect, the present application provides a terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method provided in the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method as provided in the first aspect.
In a fifth aspect, the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the steps of the method provided in the first aspect.
As can be seen from the above, in the present application, a text image to be recognized is first obtained, where the text image includes noise information; the trained first neural network model is then used to eliminate the interference of the noise information and segment the text image to obtain a text region of the text image; the trained second neural network model is used to perform character recognition on the text region to obtain a character recognition result; and finally a file is generated according to the text image and the character recognition result. In this scheme, the first neural network model eliminates the interference of noise in the text image so that the text region is segmented accurately, and the second neural network model then performs character recognition on a region already free of that interference, so the character recognition result obtained is accurate. Character recognition of text images in the political and legal field is thereby achieved with high recognition accuracy. It is understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect, which is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from them without creative effort.
Fig. 1 is a schematic flowchart of a text recognition method provided in an embodiment of the present application;
fig. 2 is a block diagram of a text recognition apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 shows a flowchart of a text recognition method provided in an embodiment of the present application, which is detailed as follows:
step 101, acquiring a text image to be recognized.
In the embodiment of the application, the text image to be recognized may be selected by a user: when the user needs to perform character recognition on a certain text image, that image can be selected as the text image to be recognized. The text image may be an image captured by the terminal device through a camera, or an image obtained from inside the terminal device (for example, from the terminal device's photo album). Illustratively, a user can photograph a printed document in the political and legal field with a terminal device to obtain a text image. The text image includes noise information, such as a fingerprint, a seal or a postmark.
And 102, eliminating interference of noise information by using the trained first neural network model and segmenting the text image to obtain a text region of the text image.
In this embodiment of the present application, the first neural network model may be a Progressive Scale Expansion Network (PSENet) model. The PSENet model can eliminate the interference of noise information in the text image and segment the text image to obtain its text regions; the number of text regions may be one, or two or more, depending on the content of the text image. Before the first neural network model is used, it needs to be trained. The training samples used for training it comprise a text-line segmentation data set (containing characters in various real scenes) and an annotated political and legal document data set (containing a large number of standard-format document text lines, densely arranged document text lines and digital text lines). Optionally, the annotation may be done manually or pre-annotated with OpenCV, which is not limited herein. It should be noted that, before the training samples are input into the first neural network model, they need to be preprocessed; specifically, the training samples may be converted into input images conforming to the input format of the first neural network model, and those input images are then input into the model for training.
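As an illustration of the preprocessing step just described, the sketch below resizes a grayscale training image to a fixed model input size and scales its pixel values to [0, 1]. The target size, the nearest-neighbour resize, and the function name are all assumptions for the example; the patent only states that samples must be converted to conform to the model's input format.

```python
import numpy as np

def preprocess(image: np.ndarray, target_h: int = 640, target_w: int = 640) -> np.ndarray:
    """Resize a grayscale image (H x W) to the model's fixed input size by
    nearest-neighbour sampling and normalize pixel values to [0, 1].
    Purely illustrative; a real pipeline would likely use OpenCV."""
    h, w = image.shape
    rows = np.arange(target_h) * h // target_h   # source row index per output row
    cols = np.arange(target_w) * w // target_w   # source column index per output column
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```

A call such as `preprocess(img, 640, 640)` would then yield a float array ready to be batched and fed to the segmentation model.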
Optionally, the step 102 may specifically include:
a1, eliminating the interference of noise information by using a first neural network model, and detecting a text box of the text image to obtain the position information of a text area;
and A2, extracting text regions from the text image according to the position information.
In the embodiment of the application, the text image can be input into the trained first neural network model, and the first neural network model can eliminate the interference of noise information and perform text box detection on the text image. The first neural network model may output location information of a text region of the text image. The position information is used to indicate a position where the text region is located in the text image, and therefore, the text region can be extracted from the text image based on the position information.
Alternatively, the text region may be a rectangle, and on this basis, the position information may include coordinates of two vertices of the text region, and the step a2 may specifically include:
determining the coordinates of all the vertexes of the text area according to the coordinates of the two vertexes;
and extracting a text region from the text image according to the coordinates of all the vertexes.
The two vertices may be the top-left vertex and the bottom-right vertex of the text region: by the shape properties of a rectangle, when the coordinates of the top-left vertex and the bottom-right vertex are known, the coordinates of the top-right vertex and the bottom-left vertex of the text region can be determined. Alternatively, the two vertices may be the top-right vertex and the bottom-left vertex of the text region, in which case, again by the shape properties of a rectangle, the coordinates of the top-left vertex and the bottom-right vertex can be determined. According to the coordinates of all the vertices (the top-left, top-right, bottom-left and bottom-right vertices) of the text region, a rectangle can be determined in the text image, and that rectangle is the text region.
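The vertex completion described above can be sketched as follows; the function name and the image-coordinate convention (origin at the top-left, y increasing downward) are illustrative assumptions.

```python
def rect_vertices(p1, p2):
    """Given two opposite corners (x1, y1) and (x2, y2) of an axis-aligned
    rectangular text region, return all four vertices in the order
    top-left, top-right, bottom-right, bottom-left. Works whichever pair
    of opposite corners is supplied (top-left/bottom-right or
    top-right/bottom-left)."""
    (x1, y1), (x2, y2) = p1, p2
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]
```

With the four vertices known, the region itself can be cropped from the image, e.g. `image[top:bottom, left:right]` for an array-backed image.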
And 103, performing character recognition on the text region by using the trained second neural network model to obtain a character recognition result.
In an embodiment of the present application, the second neural network model may be a Convolutional Recurrent Neural Network (CRNN) model. The text region contains text content, and the CRNN model is used to perform character recognition on it to obtain a character recognition result. Before the second neural network model is used, it needs to be trained. The training samples used for training it include a printed-text data set (containing text in multiple typeface styles) and a political and legal corpus data set (containing text of the kind found in political and legal documents). Before the training samples are input into the second neural network model, they need to be preprocessed; specifically, the training samples may be subjected to enhancement processing to obtain enhanced training samples, which are then input into the second neural network model for training. The enhancement processing may include, but is not limited to, rotation, warping, adding noise, blurring and the like.
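A minimal sketch of two of the enhancement operations mentioned (adding noise and blurring) is shown below; the noise level, the 3x3 box-blur kernel, and the function names are assumptions for the example, and rotation and warping are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray) -> list:
    """Produce two simple augmented variants of a grayscale training image:
    one with additive Gaussian noise (clipped to valid pixel range) and one
    smoothed with a 3x3 box blur. Illustrative only."""
    noisy = np.clip(image + rng.normal(0.0, 10.0, image.shape), 0, 255)
    h, w = image.shape
    padded = np.pad(image.astype(np.float32), 1, mode="edge")
    # Average the nine 3x3-shifted windows -> box blur.
    blurred = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return [noisy, blurred]
```

Each original sample thus expands into several training samples, improving the recognizer's robustness to the distortions described in the Background.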
Optionally, the step 103 may specifically include:
performing character recognition on the text region by using the second neural network model to obtain the index position, in a preset dictionary, of the characters corresponding to the text region;
and acquiring the characters corresponding to the text region from the preset dictionary according to the index position.
In an embodiment of the present application, the text region may be input into the second neural network model, which performs character recognition on it. Specifically, after the text region is input, the second neural network model outputs the index position, in the preset dictionary, of the characters corresponding to the text region. The preset dictionary comprises at least one character collected in advance; the characters are arranged in order, and each character has a corresponding index position indicating where it sits in the dictionary. The terminal device can then obtain the characters corresponding to the text region from the preset dictionary according to the index position. For example, if the second neural network model outputs index position a for the text region, and the character at the position indicated by a in the preset dictionary is "me", then the character corresponding to the text region is "me".
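The dictionary lookup can be sketched as follows. Note that the collapsing of repeated indices and of a "blank" index is an assumption borrowed from the usual CTC-style decoding of CRNN outputs; the patent itself only describes mapping index positions to characters in the preset dictionary.

```python
def decode_indices(indices, dictionary, blank=0):
    """Map a sequence of model output indices to characters via a preset
    dictionary. A CRNN typically emits one index per time step, including
    a 'blank' index and repeats that must be collapsed; that handling is
    an assumption here, not stated explicitly in the patent."""
    chars = []
    prev = None
    for idx in indices:
        if idx != blank and idx != prev:
            chars.append(dictionary[idx])
        prev = idx
    return "".join(chars)
```

For instance, with a dictionary whose position 1 holds "me", an output sequence `[0, 1, 1, 0]` decodes to `"me"`.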
And 104, generating a file according to the text image and the character recognition result.
In the embodiment of the application, after the character recognition result is obtained, a file can be generated according to the text image and the character recognition result, and the file comprises the text image and the character recognition result. Therefore, the file provides the user with a character recognition result while keeping the content of the text image, so that the content of the text image can be better understood according to the character recognition result.
Optionally, the step 104 may specifically include:
b1, determining the corresponding target position of the character recognition result in the file according to the position information of the text area;
and B2, generating a file according to the target position, the text image and the character recognition result.
In the embodiment of the present application, the text recognition result includes text corresponding to the text region. According to the position information of the text area, the target position of the corresponding text of the text area in the file can be determined. That is, the target position of the text corresponding to the text area in the document is the same as the position where the text is displayed in the text image. For example, the word a is displayed in the middle of the text image, and the target position of the word a in the document should also be the middle position. After the target position of the text corresponding to the text area in the file is determined, the file can be generated according to the target position, the text image and the text recognition result.
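One concrete way to realize "the target position in the file is the same as the position where the text is displayed in the text image" is a linear scaling from image coordinates to page coordinates; this choice, like the function name, is an assumption for illustration.

```python
def target_position(region_top_left, image_size, page_size):
    """Map a text region's top-left position in the image to the
    corresponding position on the output page, preserving relative
    placement. Linear scaling is an assumed concrete choice; the patent
    only requires that the target position match the position displayed
    in the text image."""
    (x, y) = region_top_left
    (iw, ih) = image_size   # image width, height
    (pw, ph) = page_size    # page width, height
    return (x * pw / iw, y * ph / ih)
```

Text centered in the image thus lands at the center of the page, as in the example above.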
Optionally, the file may be a dual-layer Portable Document Format (PDF) file, and based on this, the step B2 may specifically include:
and generating a double-layer PDF file by taking the text image as the upper layer of the double-layer PDF file and the character recognition result as the lower layer of the double-layer PDF file, wherein the character recognition result is positioned at the target position of the double-layer PDF file.
The double-layer PDF file is a PDF format file with a multilayer structure and is a file derived from a PDF file. The upper layer of the double-layer PDF file is an image layer, the lower layer of the double-layer PDF file is a text layer, and the text layer and the image layer are in one-to-one correspondence in position. In the embodiment of the application, the text image can be used as an upper layer of the double-layer PDF file, namely, an image layer, and the character recognition result can be used as a lower layer of the double-layer PDF file, namely, a text layer, so as to generate the double-layer PDF file comprising the text image and the character recognition result. Wherein, the position of the character recognition result in the text layer of the double-layer PDF file is the target position determined in the step B1.
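The one-to-one positional correspondence between the text layer and the image layer can be modelled with a simple data structure, as sketched below. This only illustrates the layering; a real implementation would use a PDF library to draw the recognized text (typically in an invisible render mode) beneath the page image, and all names here are hypothetical.

```python
def build_dual_layer_page(image_ref, recognition_results):
    """Assemble a minimal description of one dual-layer PDF page: the text
    layer holds each character recognition result at the target position
    derived from its text region, and the image layer sits above it.
    Illustrative data structures only, not a PDF writer."""
    text_layer = [{"text": text, "x": x, "y": y}
                  for text, (x, y) in recognition_results]
    return {"image_layer": image_ref, "text_layer": text_layer}
```

A page built this way keeps the scanned appearance (upper layer) while making the text beneath it selectable and searchable.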
As can be seen from the above, in the present application, a text image to be recognized is first obtained, where the text image includes noise information; the trained first neural network model is then used to eliminate the interference of the noise information and segment the text image to obtain a text region of the text image; the trained second neural network model is used to perform character recognition on the text region to obtain a character recognition result; and finally a file is generated according to the text image and the character recognition result. In this scheme, the first neural network model eliminates the interference of noise in the text image so that the text region is segmented accurately, and the second neural network model then performs character recognition on a region already free of that interference, so the character recognition result obtained is accurate. Character recognition of text images in the political and legal field is thereby achieved with high recognition accuracy.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 2 shows a block diagram of a text recognition apparatus according to an embodiment of the present application, and only shows portions related to the embodiment of the present application for convenience of description.
The text recognition apparatus 200 includes:
an obtaining unit 201, configured to obtain a text image to be identified, where the text image includes noise information;
a segmenting unit 202, configured to eliminate interference of the noise information by using the trained first neural network model and segment the text image to obtain a text region of the text image;
the recognition unit 203 is configured to perform character recognition on the text region by using the trained second neural network model to obtain a character recognition result;
a generating unit 204, configured to generate a file according to the text image and the character recognition result.
Optionally, the dividing unit 202 includes:
a text box detection subunit, configured to eliminate interference of the noise information by using the first neural network model, and perform text box detection on the text image to obtain position information of the text region;
and the region extraction subunit is used for extracting the text region from the text image according to the position information.
Optionally, the text region is rectangular, the position information includes coordinates of two vertices of the text region, and the region extraction subunit includes:
a coordinate determination subunit, configured to determine coordinates of all vertices of the text region according to the coordinates of the two vertices;
a text region extracting subunit, configured to extract the text region from the text image according to the coordinates of all the vertices.
Optionally, the generating unit 204 includes:
a position determining subunit, configured to determine, according to the position information of the text area, a target position in the file corresponding to the text recognition result;
and a file generating subunit, configured to generate the file according to the target position, the text image, and the character recognition result.
Optionally, the file is a dual-layer PDF file, and the file generation subunit is specifically configured to generate the dual-layer PDF file by using the text image as an upper layer of the dual-layer PDF file and the character recognition result as a lower layer of the dual-layer PDF file, where the character recognition result is located in the target position of the dual-layer PDF file.
Optionally, the identifying unit 203 includes:
a text recognition subunit, configured to perform text recognition on the text region by using the second neural network model, so as to obtain an index position of a text corresponding to the text region in a preset dictionary;
and the character obtaining subunit is used for obtaining the characters corresponding to the text area from the preset dictionary according to the index position.
Optionally, the first neural network model is a PSENet network model, and the second neural network model is a CRNN network model.
As can be seen from the above, in the present application, a text image to be recognized is first obtained, where the text image includes noise information; the trained first neural network model is then used to eliminate the interference of the noise information and segment the text image to obtain a text region of the text image; the trained second neural network model is used to perform character recognition on the text region to obtain a character recognition result; and finally a file is generated according to the text image and the character recognition result. In this scheme, the first neural network model eliminates the interference of noise in the text image so that the text region is segmented accurately, and the second neural network model then performs character recognition on a region already free of that interference, so the character recognition result obtained is accurate. Character recognition of text images in the political and legal field is thereby achieved with high recognition accuracy.
Fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 3, the terminal device 3 of this embodiment includes: at least one processor 30 (only one is shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, wherein the processor 30 executes the computer program 32 to perform the following steps:
acquiring a text image to be identified, wherein the text image comprises noise information;
eliminating the interference of the noise information by using a trained first neural network model and segmenting the text image to obtain a text region of the text image;
performing character recognition on the text region by using the trained second neural network model to obtain a character recognition result;
and generating a file according to the text image and the character recognition result.
In a second possible embodiment based on the first possible embodiment, the obtaining of the text region of the text image by segmenting the text image while eliminating the interference of the noise information by using the trained first neural network model includes:
eliminating the interference of the noise information by using the first neural network model, and detecting a text box of the text image to obtain the position information of the text area;
and extracting the text area from the text image according to the position information.
In a third possible embodiment based on the second possible embodiment, the text region is a rectangle, the position information includes coordinates of two vertices of the text region, and the extracting the text region from the text image according to the position information includes:
determining the coordinates of all the vertexes of the text area according to the coordinates of the two vertexes;
and extracting the text area from the text image according to the coordinates of all the vertexes.
In a fourth possible embodiment based on the second possible embodiment, the generating a file based on the text image and the character recognition result includes:
determining a target position corresponding to the character recognition result in the file according to the position information of the text area;
and generating the file according to the target position, the text image and the character recognition result.
In a fifth possible embodiment based on the fourth possible embodiment, the file is a dual-layer PDF file, and the generating the file according to the target position, the text image and the character recognition result includes:
and generating the double-layer PDF file by using the text image as an upper layer of the double-layer PDF file and the character recognition result as a lower layer of the double-layer PDF file, wherein the character recognition result is positioned at the target position of the double-layer PDF file.
In a sixth possible embodiment based on the first possible embodiment, the second possible embodiment, the third possible embodiment, the fourth possible embodiment, or the fifth possible embodiment, the method for performing character recognition on the text region by using a trained second neural network model to obtain a character recognition result includes:
performing character recognition on the text region by using the second neural network model to obtain an index position of a character corresponding to the text region in a preset dictionary;
and acquiring characters corresponding to the text area from the preset dictionary according to the index position.
In a seventh possible embodiment based on the first possible embodiment, the second possible embodiment, the third possible embodiment, the fourth possible embodiment, or the fifth possible embodiment, the first neural network model is a PSENet network model, and the second neural network model is a CRNN network model.
The terminal device 3 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 30, a memory 31. Those skilled in the art will appreciate that fig. 3 is only an example of the terminal device 3, and does not constitute a limitation to the terminal device 3, and may include more or less components than those shown, or combine some components, or different components, for example, and may further include an input/output device, a network access device, and the like.
The Processor 30 may be a Central Processing Unit (CPU); it may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 31 may be an internal storage unit of the terminal device 3, such as a hard disk or memory of the terminal device 3. In other embodiments, the memory 31 may be an external storage device of the terminal device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 3. Further, the memory 31 may include both an internal storage unit and an external storage device of the terminal device 3. The memory 31 is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program described above. The memory 31 may also be used to temporarily store data that has been output or is to be output.
As can be seen from the above, in the present application, a text image to be recognized is first acquired, where the text image includes noise information; the trained first neural network model is then used to eliminate the interference of the noise information and segment the text image to obtain a text region of the text image; the trained second neural network model is used to perform character recognition on the text region to obtain a character recognition result; and finally a file is generated according to the text image and the character recognition result. In this scheme, the first neural network model eliminates the interference of the noise in the text image so that the text region is segmented accurately, and the second neural network model then performs character recognition on a text region free of that interference, so the character recognition result obtained is accurate. Character recognition of text images in the political and legal field is thereby realized with high recognition accuracy.
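The final file-generation step summarized above is elaborated in claim 5 below as a dual-layer PDF: the text image forms the visible upper layer while the recognized text sits underneath at the target position of its source region, so the page looks like the scan but remains searchable. The sketch below only models that layer layout as plain data; producing an actual PDF would require a PDF library and is not shown here.

```python
# Conceptual model of a "dual-layer" page: a visible image layer on top
# and an invisible text layer underneath, each text entry placed at the
# target position derived from its text region in the image. Field names
# ("upper_layer", "lower_layer", etc.) are illustrative, not from the patent.

def build_dual_layer_page(image_id: str, regions):
    """regions: list of ((x, y), text) pairs, where (x, y) is the target
    position of a recognized text region in page coordinates."""
    return {
        "upper_layer": {"type": "image", "ref": image_id},
        "lower_layer": [
            {"type": "text", "at": pos, "content": text}
            for pos, text in regions
        ],
    }

page = build_dual_layer_page("scan_001", [((12, 40), "exhibit A")])
print(page["lower_layer"][0]["content"])  # -> exhibit A
```

Because the text layer keeps each result anchored to its region's coordinates, selecting or searching text in the rendered file lines up with the corresponding area of the scanned image.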
It should be noted that the information interaction between the above devices/units, their execution processes, and other details are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, reference may be made to the method embodiments, which are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is illustrated; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps in the above method embodiments.
Embodiments of the present application provide a computer program product, which, when running on a terminal device, causes the terminal device to execute the steps in the above-mentioned method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the above modules or units is only one logical function division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A text recognition method, comprising:
acquiring a text image to be recognized, wherein the text image comprises noise information;
eliminating the interference of the noise information by using the trained first neural network model and segmenting the text image to obtain a text region of the text image;
performing character recognition on the text region by using the trained second neural network model to obtain a character recognition result;
and generating a file according to the text image and the character recognition result.
2. The method of claim 1, wherein the using the trained first neural network model to eliminate the interference of the noise information and segment the text image to obtain the text region of the text image comprises:
eliminating the interference of the noise information by using the first neural network model, and detecting a text box of the text image to obtain the position information of the text area;
and extracting the text region from the text image according to the position information.
3. The text recognition method according to claim 2, wherein the text region is rectangular, the position information comprises coordinates of two vertices of the text region, and the extracting the text region from the text image according to the position information comprises:
determining the coordinates of all the vertexes of the text area according to the coordinates of the two vertexes;
and extracting the text region from the text image according to the coordinates of all the vertexes.
4. The method of claim 2, wherein the generating a file from the text image and the word recognition result comprises:
determining a corresponding target position of the character recognition result in the file according to the position information of the text area;
and generating the file according to the target position, the text image and the character recognition result.
5. The method of claim 4, wherein the document is a dual-layer PDF document, and wherein generating the document according to the target location, the text image, and the text recognition result comprises:
and taking the text image as the upper layer of the double-layer PDF file, taking the character recognition result as the lower layer of the double-layer PDF file, and generating the double-layer PDF file, wherein the character recognition result is positioned at the target position of the double-layer PDF file.
6. The method according to any one of claims 1 to 5, wherein the performing character recognition on the text region by using the trained second neural network model to obtain a character recognition result comprises:
performing character recognition on the text region by using the second neural network model to obtain an index position of characters corresponding to the text region in a preset dictionary;
and obtaining the characters corresponding to the text region from the preset dictionary according to the index position.
7. The text recognition method of any one of claims 1 to 5, wherein the first neural network model is a PSENet network model and the second neural network model is a CRNN network model.
8. A text recognition apparatus, comprising:
an acquisition unit, used for acquiring a text image to be recognized, wherein the text image comprises noise information;
a segmentation unit, used for eliminating the interference of the noise information by using a trained first neural network model and segmenting the text image to obtain a text region of the text image;
a recognition unit, used for performing character recognition on the text region by using a trained second neural network model to obtain a character recognition result; and
a generation unit, used for generating a file according to the text image and the character recognition result.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202011580119.2A 2020-12-28 2020-12-28 Text recognition method, text recognition device and terminal equipment Pending CN112668580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011580119.2A CN112668580A (en) 2020-12-28 2020-12-28 Text recognition method, text recognition device and terminal equipment


Publications (1)

Publication Number Publication Date
CN112668580A true CN112668580A (en) 2021-04-16

Family

ID=75410700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011580119.2A Pending CN112668580A (en) 2020-12-28 2020-12-28 Text recognition method, text recognition device and terminal equipment

Country Status (1)

Country Link
CN (1) CN112668580A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269189A (en) * 2021-07-20 2021-08-17 北京世纪好未来教育科技有限公司 Construction method of text recognition model, text recognition method, device and equipment
CN113569834A (en) * 2021-08-05 2021-10-29 五八同城信息技术有限公司 Business license identification method and device, electronic equipment and storage medium
WO2023116137A1 (en) * 2021-12-21 2023-06-29 北京有竹居网络技术有限公司 Text recognition method and related device thereof

Similar Documents

Publication Publication Date Title
CN112668580A (en) Text recognition method, text recognition device and terminal equipment
CN110659647B (en) Seal image identification method and device, intelligent invoice identification equipment and storage medium
CN110866495A (en) Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN108717744B (en) Method and device for identifying seal serial number on financial document and terminal equipment
CN111080660A (en) Image segmentation method and device, terminal equipment and storage medium
CN111401099B (en) Text recognition method, device and storage medium
CN111507330A (en) Exercise recognition method and device, electronic equipment and storage medium
CN112541443B (en) Invoice information extraction method, invoice information extraction device, computer equipment and storage medium
CN109784330B (en) Signboard content identification method, device and equipment
CN114038004A (en) Certificate information extraction method, device, equipment and storage medium
CN111967286A (en) Method and device for identifying information bearing medium, computer equipment and medium
CN110807454A (en) Character positioning method, device and equipment based on image segmentation and storage medium
CN114881698A (en) Advertisement compliance auditing method and device, electronic equipment and storage medium
CN112308046A (en) Method, device, server and readable storage medium for positioning text region of image
CN114495146A (en) Image text detection method and device, computer equipment and storage medium
CN111062262B (en) Invoice recognition method and invoice recognition device
CN111753812A (en) Text recognition method and equipment
CN110909816A (en) Picture identification method and device
CN113128496B (en) Method, device and equipment for extracting structured data from image
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN114299509A (en) Method, device, equipment and medium for acquiring information
Kumar et al. Line-based robust script identification for Indian languages
CN114495132A (en) Character recognition method, device, equipment and storage medium
CN111476090A (en) Watermark identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination