CN116524516A - Text structured information determining method, device, equipment and storage medium

Info

Publication number: CN116524516A
Application number: CN202310278136.8A
Authority: CN
Inventors: 于海鹏; 李煜林; 钦夏孟; 姚锟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2023-03-20
Filing date: 2023-03-20
Publication date: 2023-08-01

Abstract

The disclosure provides a text structured information determination method, apparatus, device and storage medium, relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing and computer vision, and can be applied to scenarios such as OCR. The specific implementation is as follows: determining visual features of a field image and an initial text recognition result of the field image; correcting the initial text recognition result according to the visual features and the initial text recognition result to obtain a corrected text recognition result; and determining text structured information of the field image according to the field category corresponding to the field image and the corrected text recognition result. This technical solution can improve the accuracy of determining text structured information.

Description

Text structured information determining method, device, equipment and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing and computer vision, and can be applied to scenes such as OCR.

Background

Documents are an important way to store information and contain many pieces of structured information; acquiring this structured information helps to build large databases for data storage and management. However, many documents currently exist only in image form, and recognition errors inevitably arise when such document images are recognized. How to use semantic information to correct the recognized content and obtain a more accurate parsing result is therefore one of the current challenges.

Disclosure of Invention

The disclosure provides a text structured information determining method, device, equipment and storage medium.

According to an aspect of the present disclosure, there is provided a text structured information determination method, the method including:

determining visual characteristics of a field image and an initial text recognition result of the field image;

correcting the initial text recognition result according to the visual characteristics and the initial text recognition result to obtain a corrected text recognition result;

and determining text structural information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

According to another aspect of the present disclosure, there is provided a text structured information determination apparatus including:

the initial text result determining module is used for determining visual characteristics of the field image and an initial text recognition result of the field image;

the corrected text result determining module is used for correcting the initial text recognition result according to the visual characteristics and the initial text recognition result to obtain a corrected text recognition result;

and the structured information determining module is used for determining the text structured information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text structured information determination method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the text structured information determination method according to any of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method of determining textual structured information according to any embodiment of the present disclosure.

According to the technology of the present disclosure, the accuracy of determining text structured information can be improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow chart of a method of determining textual structured information provided in accordance with an embodiment of the present disclosure;

FIG. 2 is a flow chart of another text structured information determination method provided in accordance with an embodiment of the present disclosure;

FIG. 3 is a flow chart of yet another text structured information determination method provided in accordance with an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a text structured information determination apparatus according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of an electronic device for implementing a text structured information determination method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the terms "initial," "corrected," and the like in the description and claims of the present disclosure and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In addition, in the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of text images, field images and the like all comply with the relevant laws and regulations and do not violate public order and good morals.

Fig. 1 is a flowchart of a text structured information determination method provided according to an embodiment of the present disclosure. The embodiment is suitable for the situation of how to more accurately determine the structural information of the document in the document scene in the image form. The method may be performed by a text structured information determination apparatus, which may be implemented in software and/or hardware, and may be integrated in an electronic device, such as a server, carrying text structured information determination functions. As shown in fig. 1, the text structured information determination method of the present embodiment may include:

S101, determining visual characteristics of the field image and an initial text recognition result of the field image.

In this embodiment, the field image refers to an image containing a field. The field may be text content at the character level, the text line level or the paragraph level; for example, a single character, a text line, or a paragraph.

The visual features are used for representing features of the field image at the image level, can be shallow features such as stroke structures and the like, can also be deep features such as deep network output features in a convolutional neural network, and can be expressed in a vector or matrix form.

The initial text recognition result is a result of initially recognizing text in a field image.

Alternatively, a preset image feature extraction mode may be adopted to perform feature extraction on the field image, so as to obtain visual features of the field image. The preset image feature extraction mode may be an image feature extraction algorithm well known to those skilled in the art, for example, an image feature extraction model based on a machine learning algorithm or a deep learning algorithm.

Alternatively, a text recognition model may be used to recognize the field image, so as to obtain an initial text recognition result of the field image. The text recognition model may be any model for text recognition known in the art.

Further, the characteristics output by the hidden layer in the text recognition model can be used as the visual characteristics of the field image.

S102, correcting the initial text recognition result according to the visual characteristics and the initial text recognition result to obtain a corrected text recognition result.

In this embodiment, the corrected text recognition result refers to a text result obtained by correcting the character which is not accurately recognized in the initial text recognition result.

Specifically, the initial text recognition result can be vectorized to obtain a processed initial text recognition result, then the visual features and the processed initial text recognition result are spliced to obtain spliced features, and further the spliced features are processed based on the correction model to obtain a corrected text recognition result. The correction model may be a pre-trained neural network model.
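The vectorize-and-splice step described above can be illustrated with a minimal sketch. It is not the correction model itself: the one-hot vectorization, the three-character vocabulary, the feature values and the assumption that there is exactly one visual vector per recognized character are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def vectorize_text(text: str, vocab: str) -> List[Vector]:
    """One-hot encode each character over a hypothetical vocabulary."""
    return [[1.0 if ch == v else 0.0 for v in vocab] for ch in text]


def splice(visual: List[Vector], textual: List[Vector]) -> List[Vector]:
    """Concatenate the visual and textual vectors character by character."""
    if len(visual) != len(textual):
        raise ValueError("expected one visual vector per recognized character")
    return [v + t for v, t in zip(visual, textual)]


vocab = "abc"                               # hypothetical vocabulary
visual_features = [[0.2, 0.8], [0.5, 0.5]]  # made-up values, one row per character
spliced = splice(visual_features, vectorize_text("ab", vocab))
# Each row of `spliced` now holds 2 visual values followed by 3 one-hot values
# and would be the input to a (not shown) pre-trained correction model.
```

In a real system the visual vectors would come from the recognition network and the spliced features would be passed to the pre-trained correction model, which is outside the scope of this sketch.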

S103, determining text structural information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

In this embodiment, the text structured information may refer to document information in the form of key-value pairs; for example, the key is "hospital" and the corresponding value is "XX Hospital".

Specifically, the field category corresponding to the field image can be associated with the corrected text recognition result, and the resulting pair can be used as the text structured information of the field image.
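As a minimal sketch of this pairing, the field category can serve as the key and the corrected text as the value. The function name `build_structured_info` and the second example field are hypothetical and only illustrate the key-value form.

```python
from typing import Dict, List, Tuple


def build_structured_info(fields: List[Tuple[str, str]]) -> Dict[str, str]:
    """Pair each field category (key) with its corrected text (value)."""
    info: Dict[str, str] = {}
    for category, corrected_text in fields:
        info[category] = corrected_text
    return info


# Hypothetical detected fields: (field category, corrected recognition result).
structured = build_structured_info([
    ("hospital", "XX Hospital"),
    ("department", "XX Department"),
])
```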

According to the technical scheme provided by the embodiment of the disclosure, the visual characteristics of the field image and the initial text recognition result of the field image are determined, then the initial text recognition result is corrected according to the visual characteristics and the initial text recognition result to obtain the corrected text recognition result, and further the text structural information of the field image is determined according to the field category corresponding to the field image and the corrected text recognition result. According to the technical scheme, the visual characteristics of the field image and the text recognition result of the field are combined, the text recognition result of the field image is corrected, and the structured information of the document can be extracted more accurately end to end.

On the basis of the above embodiment, as an optional manner of the present disclosure, before determining the visual feature of the field image and the initial text recognition result of the field image, field detection may be performed on the text image to obtain the field category and the position information of the field; and determining a field image of the field from the text image according to the position information.

Wherein, the text image refers to an image containing text. The field class refers to type information to which the field belongs. The position information refers to position information of a field in a text image, and may be vertex coordinate information of a bounding box containing the field.

Specifically, based on the text detection model, field detection can be performed on the text image to obtain field category and position information of the field. Furthermore, according to the position information of the fields, the region where each field is located can be cut from the text image, so as to obtain the field image of the field. The text detection model can be a model obtained by training a convolutional neural network.
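The cropping step can be sketched as follows, assuming the detection model has already returned the vertex coordinates of a field's bounding box. The image is represented as a plain list of pixel rows, the crop is axis-aligned with inclusive coordinates, and the name `crop_field` is hypothetical; a rotated or perspective-distorted box would need additional handling not shown here.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]   # rows of pixel values
Point = Tuple[int, int]   # (x, y) vertex of a bounding box


def crop_field(image: Image, vertices: Sequence[Point]) -> Image:
    """Cut the axis-aligned region enclosing the given vertices (inclusive)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Clamp the box to the image so out-of-range coordinates do not fail.
    left, right = max(min(xs), 0), min(max(xs), len(image[0]) - 1)
    top, bottom = max(min(ys), 0), min(max(ys), len(image) - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# A toy 4x6 "text image" whose pixel value encodes its row and column.
text_image = [[r * 10 + c for c in range(6)] for r in range(4)]
field_image = crop_field(text_image, [(1, 1), (3, 1), (3, 2), (1, 2)])
# field_image == [[11, 12, 13], [21, 22, 23]]
```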

It can be appreciated that, compared with a scheme that extracts text structured information from the whole text image, the present disclosure first crops the field images from the text image and then determines the text structured information of each field image. Determining the text structured information at this finer granularity avoids mutual interference among fields and makes the result more accurate.

Fig. 2 is a flow chart of another text structured information determination method provided in accordance with an embodiment of the present disclosure. The present embodiment provides an alternative embodiment based on the above embodiment by further optimizing the "correcting the initial text recognition result according to the visual characteristics and the initial text recognition result, to obtain the corrected text recognition result". As shown in fig. 2, the text structured information determination method of the present embodiment may include:

S201, determining visual characteristics of the field image and an initial text recognition result of the field image.

S202, extracting features of the initial text recognition result to obtain embedded features of the initial text recognition result.

In this embodiment, the embedded feature refers to a semantic feature obtained by processing an initial text recognition result, and may be represented in a matrix or vector form.

Alternatively, the initial text recognition result may be input into a text feature extraction model, and the embedded feature of the initial text recognition result may be obtained through model learning. The text feature extraction model may be obtained by training a convolutional neural network in advance.

And S203, correcting the initial text recognition result according to the visual characteristics and the embedded characteristics to obtain a corrected text recognition result.

Alternatively, the visual feature and the embedded feature may be fused, and the fused feature is used to correct the initial text recognition result, so as to obtain a corrected text recognition result. For example, the visual features and the embedded features can be spliced, and the spliced features are input into the text recognition model again to obtain a corrected text recognition result.

S204, determining text structural information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

According to the technical scheme provided by the embodiment of the disclosure, the visual characteristics of the field image and the initial text recognition result of the field image are determined, then the initial text recognition result is subjected to characteristic extraction to obtain the embedded characteristics of the initial text recognition result, the initial text recognition result is corrected according to the visual characteristics and the embedded characteristics to obtain the corrected text recognition result, and further the text structural information of the field image is determined according to the field category corresponding to the field image and the corrected text recognition result. According to the technical scheme, the embedded features, namely the semantic features, are introduced, and the initial text recognition result is corrected by combining the visual features and the semantic features, so that more accurate text structural information can be obtained.

On the basis of the above embodiment, as an optional manner of the present disclosure, performing feature extraction on the initial text recognition result to obtain the embedded features of the initial text recognition result may include: encoding the initial text recognition result to obtain encoding features of the initial text recognition result; and mapping the encoding features to obtain the embedded features of the initial text recognition result.

Specifically, the initial text recognition result can be encoded based on a preset encoding rule to obtain the encoding feature of the initial text recognition result, then the encoding feature is mapped by adopting the full-connection layer, and the feature output by the full-connection layer is used as the embedded feature of the initial text recognition result.
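The encode-then-map procedure can be sketched as below. The encoding rule (a vocabulary lookup followed by one-hot encoding), the tiny vocabulary, and the hand-written weights and bias are all assumptions for illustration; in practice the fully connected layer's parameters would be learned.

```python
from typing import Dict, List


def encode(text: str, vocab: Dict[str, int], unk: int = 0) -> List[int]:
    """Preset encoding rule (assumed here): map each character to a vocabulary index."""
    return [vocab.get(ch, unk) for ch in text]


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def fully_connected(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Compute y = W x + b, with W of shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


vocab = {"<unk>": 0, "a": 1, "b": 2}          # hypothetical vocabulary
weights = [[0.0, 1.0, 2.0], [0.5, 0.5, 0.5]]  # made-up parameters: 3 inputs, 2 outputs
bias = [0.5, 0.0]

codes = encode("ab?", vocab)                  # "?" is not in the vocabulary
embedded = [fully_connected(one_hot(c, len(vocab)), weights, bias) for c in codes]
```

Here `codes` is `[1, 2, 0]`, and each row of `embedded` is the two-dimensional embedded feature of one character.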

It can be understood that the coding mapping is performed on the initial text recognition result, so that semantic information of the initial text recognition result and correlation among characters can be fully learned, the obtained embedded features can more accurately reflect the initial text recognition result, and a foundation is laid for correcting the initial text recognition result.

Fig. 3 is a flow chart of yet another text structured information determination method provided in accordance with an embodiment of the present disclosure. Based on the above embodiment, the present embodiment further optimizes "correcting the initial text recognition result according to the visual feature and the embedded feature, and obtains the corrected text recognition result" to provide an optional manner. As shown in fig. 3, the text structured information determination method of the present embodiment may include:

S301, determining visual characteristics of the field image and an initial text recognition result of the field image.

S302, extracting features of the initial text recognition result to obtain embedded features of the initial text recognition result.

S303, determining multi-mode features according to the visual features and the embedded features.

S304, correcting the initial text recognition result according to the multi-mode characteristics to obtain a corrected text recognition result.

In this embodiment, the multi-modal feature refers to a feature that merges different modes, and may be represented in a vector or matrix form.

Alternatively, the visual features and the embedded features may be scale normalized, and then the normalized visual features and the embedded features may be superimposed, where the superimposed features are used as multi-modal features. And inputting the multi-modal characteristics into the text recognition model again to obtain a corrected text recognition result of the initial text recognition result.
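One possible reading of "scale normalized and superimposed" is sketched below: each feature vector is L2-normalized and the two modalities are then added element-wise. The text does not specify the normalization, so the L2 norm, the requirement that both feature sets already share the same shape, and the function names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def superimpose(visual: List[Vector], embedded: List[Vector]) -> List[Vector]:
    """Normalize both modalities, then add them element-wise."""
    if len(visual) != len(embedded) or any(
        len(a) != len(b) for a, b in zip(visual, embedded)
    ):
        raise ValueError("features must share the same shape before superimposing")
    return [
        [a + b for a, b in zip(l2_normalize(v), l2_normalize(e))]
        for v, e in zip(visual, embedded)
    ]


# Made-up features for a single character position.
multimodal = superimpose([[3.0, 4.0]], [[0.0, 2.0]])
# [3, 4] normalizes to [0.6, 0.8] and [0, 2] to [0.0, 1.0], so the sum is about [0.6, 1.8].
```

Normalizing first keeps one modality from dominating the sum simply because its values are numerically larger.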

S305, determining text structural information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

According to the technical scheme provided by the embodiment of the disclosure, the visual characteristics of the field image and the initial text recognition result of the field image are determined, then the initial text recognition result is subjected to characteristic extraction to obtain the embedded characteristics of the initial text recognition result, the multi-mode characteristics are determined according to the visual characteristics and the embedded characteristics, the initial text recognition result is corrected according to the multi-mode characteristics to obtain the corrected text recognition result, and further the text structural information of the field image is determined according to the field category corresponding to the field image and the corrected text recognition result. According to the technical scheme, the multi-mode features are introduced, the visual features such as the stroke structure and the text semantic features are fully fused, and the initial text recognition result is corrected, so that the correction result is more accurate.

On the basis of the foregoing embodiment, as an optional manner of the present disclosure, correcting the initial text recognition result according to the multi-modal features to obtain a corrected text recognition result includes: determining correction features of the initial text recognition result according to the multi-modal features; and determining the corrected text recognition result of the initial text recognition result according to the correction features.

The correction features can be features corresponding to all characters in the initial text recognition result and can be expressed in a matrix or vector form; for example, the correction probability vector corresponding to each character may be used.

Specifically, the multi-modal features can be input into the correction model to obtain the correction features of the initial text recognition result, namely the correction probability vectors corresponding to the characters in the initial text recognition result, and the correction probability vectors are then converted into text to obtain the corrected text recognition result. The correction model may be a deep learning model, such as a Transformer model.
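Converting correction probability vectors into text can be sketched as a per-character argmax over a vocabulary. Greedy argmax decoding, the four-symbol vocabulary and the probability values are assumptions for illustration; the text does not state which decoding rule is used.

```python
from typing import List, Sequence


def decode(probabilities: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Pick the most probable vocabulary entry for each character position."""
    chars: List[str] = []
    for row in probabilities:
        if len(row) != len(vocab):
            raise ValueError("each probability vector must cover the whole vocabulary")
        best = max(range(len(row)), key=row.__getitem__)
        chars.append(vocab[best])
    return "".join(chars)


vocab = ["0", "O", "1", "l"]         # hypothetical, visually confusable symbols
probs = [
    [0.10, 0.70, 0.10, 0.10],        # made-up correction probabilities, position 1
    [0.05, 0.05, 0.20, 0.70],        # position 2
]
corrected = decode(probs, vocab)     # "Ol"
```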

It can be appreciated that performing text correction based on multi-modal features makes full use of both the visual characteristics of the characters, such as stroke structure, and the semantic information of the text and its contextual correlations.

Fig. 4 is a schematic structural diagram of a text structured information determining apparatus according to an embodiment of the present disclosure. The embodiment is suitable for the situation of how to more accurately determine the structural information of the document in the document scene in the image form. The apparatus may be implemented in software and/or hardware and may be integrated in an electronic device, such as a server, carrying text structured information determination functions. As shown in fig. 4, the text structured information determination apparatus 400 includes:

an initial text result determining module 401, configured to determine a visual feature of the field image and an initial text recognition result of the field image;

the corrected text result determining module 402 is configured to correct the initial text recognition result according to the visual feature and the initial text recognition result, so as to obtain a corrected text recognition result;

the structured information determining module 403 is configured to determine text structured information of the field image according to the field category corresponding to the field image and the corrected text recognition result.

Further, the apparatus further comprises:

the field detection module is used for carrying out field detection on the text image to obtain field category and position information of the field;

and the field image determining module is used for determining a field image of the field from the text image according to the position information.

Further, the corrected text result determining module 402 includes:

the embedded feature determining unit is used for extracting features of the initial text recognition result to obtain embedded features of the initial text recognition result;

and the corrected text result determining unit is used for correcting the initial text recognition result according to the visual characteristics and the embedded characteristics to obtain a corrected text recognition result.

Further, the embedded feature determining unit is specifically configured to:

encoding the initial text recognition result to obtain the encoding characteristics of the initial text recognition result;

and mapping the coding features to obtain embedded features of the initial text recognition result.

Further, the corrected text result determining unit includes:

a multi-modal feature determination subunit configured to determine multi-modal features according to the visual features and the embedded features;

and the corrected text result determining subunit is used for correcting the initial text recognition result according to the multi-mode characteristics to obtain a corrected text recognition result.

Further, the corrected text result determination subunit is specifically configured to:

according to the multi-mode characteristics, determining correction characteristics of an initial text recognition result;

and determining a corrected text recognition result of the initial text recognition result according to the correction features.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

FIG. 5 is a schematic block diagram of an example electronic device 500 that may be used to implement the text structured information determination method of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as the text structured information determination method. For example, in some embodiments, the text structured information determination method can be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the text structured information determination method described above can be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the text structured information determination method by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

Artificial intelligence is the discipline that studies how to make a computer mimic certain mental processes and intelligent behaviors of a person (e.g., learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.

Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and where the resources can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, and for model training.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
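As a non-limiting illustration of the preceding paragraph, the following minimal Python sketch runs two independent steps either sequentially or in parallel and obtains the same result. The functions `extract_visual_features` and `recognize_text` are hypothetical stand-ins introduced only for this illustration; they are not the models of the present disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_visual_features(field_image):
    # Hypothetical stand-in for a visual encoder: mean intensity of each column.
    height = len(field_image)
    width = len(field_image[0])
    return [sum(row[c] for row in field_image) / height for c in range(width)]


def recognize_text(field_image):
    # Hypothetical stand-in for a text recognizer: "#" for a column that
    # contains any non-zero pixel, "-" for an empty column.
    width = len(field_image[0])
    return "".join(
        "#" if any(row[c] for row in field_image) else "-" for c in range(width)
    )


def run_sequentially(field_image):
    # The two steps are executed one after the other.
    return extract_visual_features(field_image), recognize_text(field_image)


def run_in_parallel(field_image):
    # The two steps do not depend on each other, so they may run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        features = pool.submit(extract_visual_features, field_image)
        text = pool.submit(recognize_text, field_image)
        return features.result(), text.result()
```

Because neither step consumes the output of the other, the execution order does not affect the result in this sketch.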

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A text structured information determination method, comprising:

2. The method of claim 1, further comprising, prior to determining the visual features of the field image and the initial text recognition result of the field image:

performing field detection on the text image to obtain field category and position information of the field;

and determining a field image of the field from the text image according to the position information.
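As a non-limiting illustration of determining a field image from position information, the following minimal Python sketch crops a rectangular region out of a text image. It assumes, purely for illustration, that the image is a list of pixel rows and that the position information is an axis-aligned box `(left, top, right, bottom)` with exclusive right and bottom edges; the names `DetectedField` and `crop_field_image` are hypothetical, and the field detector itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: rows of pixel values


@dataclass
class DetectedField:
    # Hypothetical container for one field detection result.
    category: str                    # field category, e.g. "name"
    box: Tuple[int, int, int, int]   # (left, top, right, bottom), exclusive


def crop_field_image(text_image: Image, field: DetectedField) -> Image:
    """Return the sub-image of text_image covered by the field's box."""
    height = len(text_image)
    width = len(text_image[0]) if text_image else 0
    left, top, right, bottom = field.box
    # Clamp the box to the image bounds so slicing never wraps around.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    return [row[left:right] for row in text_image[top:bottom]]
```

For example, cropping the box `(1, 1, 3, 3)` from a 4x5 image returns the two-by-two block of pixels in rows 1 to 2 and columns 1 to 2.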

3. The method of claim 1, wherein the correcting the initial text recognition result according to the visual feature and the initial text recognition result to obtain a corrected text recognition result comprises:

extracting features of the initial text recognition result to obtain embedded features of the initial text recognition result;

and correcting the initial text recognition result according to the visual feature and the embedded feature to obtain a corrected text recognition result.

4. The method of claim 3, wherein the extracting features of the initial text recognition result to obtain the embedded feature of the initial text recognition result comprises:

coding the initial text recognition result to obtain coding features of the initial text recognition result;

5. The method of claim 3, wherein the correcting the initial text recognition result according to the visual feature and the embedded feature to obtain a corrected text recognition result comprises:

determining a multi-modal feature from the visual feature and the embedded feature;

and correcting the initial text recognition result according to the multi-modal feature to obtain a corrected text recognition result.
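As a non-limiting illustration of determining a multi-modal feature, the following minimal Python sketch combines a visual feature and an embedded feature position by position. Concatenation along the feature dimension is an assumption made only for this sketch; the claim does not specify the fusion operation, and the name `fuse_features` is hypothetical.

```python
from typing import List

Sequence = List[List[float]]  # assumed shape: one feature vector per position


def fuse_features(visual: Sequence, embedded: Sequence) -> Sequence:
    """Concatenate the visual and embedded vectors at each sequence position.

    Assumption for illustration only: both inputs have already been aligned
    to the same sequence length.
    """
    if len(visual) != len(embedded):
        raise ValueError("visual and embedded features must have equal length")
    return [v + e for v, e in zip(visual, embedded)]
```

A downstream correction model would then consume the fused sequence; that model is outside the scope of this sketch.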

6. The method of claim 5, wherein the correcting the initial text recognition result according to the multi-modal feature to obtain a corrected text recognition result comprises:

determining a correction feature of the initial text recognition result according to the multi-modal feature;

7. A text structured information determination apparatus comprising:

8. The apparatus of claim 7, wherein the apparatus further comprises:

and a field image determining module, configured to determine a field image of the field from the text image according to the position information.

9. The apparatus of claim 7, wherein the corrected text result determination module comprises:

10. The apparatus according to claim 9, wherein the embedded feature determination unit is specifically configured to:

11. The apparatus of claim 9, wherein the corrected text result determination unit comprises:

12. The apparatus of claim 11, wherein the corrected text result determination subunit is specifically configured to:

13. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text structured information determination method of any one of claims 1-6.

14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the text structured information determination method according to any one of claims 1-6.

15. A computer program product comprising a computer program which, when executed by a processor, implements the text structured information determination method of any one of claims 1-6.