CN113627439A - Text structuring method, processing device, electronic device and storage medium - Google Patents

Text structuring method, processing device, electronic device and storage medium

Info

Publication number
CN113627439A
Authority
CN
China
Prior art keywords
text
detection box
category
target
image
Prior art date
Legal status
Pending
Application number
CN202110921811.5A
Other languages
Chinese (zh)
Inventor
于海鹏
梁思远
李煜林
钦夏孟
姚锟
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110921811.5A
Publication of CN113627439A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure provides a text structuring processing method, a processing device, an electronic device and a storage medium. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as optical character recognition (OCR). The specific implementation scheme is as follows: performing text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category; determining a text image corresponding to a target text detection box in the at least one text detection box; performing text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box; performing text classification on the text recognition result to obtain a semantic category result corresponding to the text recognition result; and generating a text structured result, wherein the text structured result comprises a value corresponding to the keyword category and a value corresponding to the numerical value category.

Description

Text structuring method, processing device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly to computer vision and deep learning, and can be applied to scenarios such as optical character recognition (OCR). In particular, the disclosure relates to a text structuring processing method, a processing device, an electronic device and a storage medium.
Background
With the continuous development and popularization of information technology, many industries use it widely to improve efficiency, and a large amount of text data is generated as a result. This text data may contain a large amount of structured information, and acquiring that structured information supports deeper applications built on the text data.
Disclosure of Invention
The disclosure provides a text structuring processing method, a processing device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a text structuring processing method, including: performing text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category; determining a text image corresponding to a target text detection box in the at least one text detection box, wherein the target text detection box is a text detection box whose category information is the numerical value category; performing text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box; performing text classification on the text recognition result to obtain a semantic category result corresponding to the text recognition result; and generating a text structured result, wherein the text structured result includes a value corresponding to the keyword category and a value corresponding to the numerical value category, the value corresponding to the keyword category includes the semantic category result, and the value corresponding to the numerical value category includes the text recognition result.
According to another aspect of the present disclosure, there is provided a text structuring processing apparatus including: a text detection module, configured to perform text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category; a determining module, configured to determine a text image corresponding to a target text detection box in the at least one text detection box, wherein the target text detection box is a text detection box whose category information is the numerical value category; a text recognition module, configured to perform text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box; a text classification module, configured to perform text classification on the text recognition result to obtain a semantic category result corresponding to the text recognition result; and a generating module, configured to generate a text structured result, wherein the text structured result includes a value corresponding to the keyword category and a value corresponding to the numerical value category, the value corresponding to the keyword category includes the semantic category result, and the value corresponding to the numerical value category includes the text recognition result.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which a text structuring processing method and processing apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of a text structuring process method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a schematic diagram of a text structuring process according to an embodiment of the present disclosure;
FIG. 4 schematically shows a block diagram of a text structuring processing device according to an embodiment of the present disclosure; and
FIG. 5 shows a block diagram of an electronic device suitable for a text structuring processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Text structuring may be understood as processing text content into a form comprising values corresponding to a keyword category and values corresponding to a numerical value category. The keyword category (Key) and the numerical value category (Value) together can be understood as a key-value pair. Text data may be presented in the form of an image, that is, as a text image, and text structuring of a text image may be achieved in two ways.
In the first way, the text image is processed with a text detection algorithm to obtain a category recognition result for a target text included in the text image; the target text is processed with a text recognition algorithm to obtain a text recognition result; and a text structured result for the target text is obtained from the category recognition result and the text recognition result. The text structured result includes a value corresponding to the keyword category and a value corresponding to the numerical value category, the value corresponding to the keyword category includes the category recognition result, and the value corresponding to the numerical value category includes the text recognition result.
In the second way, the text image is processed with a text detection algorithm to obtain position information of a target text included in the text image; the target text is processed with a text recognition algorithm to obtain a text recognition result; a category recognition result corresponding to the target text is determined from the position information and a preset positional relationship; and a text structured result for the target text is obtained from the category recognition result and the text recognition result. The preset positional relationship may be understood as the positional relationship between a keyword category and the numerical value category corresponding to that keyword category.
In the process of implementing the concept of the present disclosure, it was found that when the layout of text images varies widely, the first way has difficulty recognizing categories and may misrecognize them, because the visual difference between different texts is relatively small. The second way is not robust, because the relative position between a keyword category and its corresponding numerical value category in the text image is not fixed. Therefore, text structuring that relies only on a text detection algorithm and a text recognition algorithm has limited accuracy.
In the process of implementing the concept of the present disclosure, it was also found that, because the target text carries semantic information, semantic extraction can be performed on the target text to obtain a semantic category recognition result (that is, a category recognition result) corresponding to the target text. Text classification can therefore be combined with text detection and text recognition to structure the text of a text image.
Therefore, the embodiments of the present disclosure provide a text structuring processing scheme that combines text detection, text recognition and text classification: the text recognition result of a text image is determined using text detection and text recognition, the semantic category recognition result of the text image is determined using text classification, and the text structured result of the text image is obtained from the text recognition result and the semantic category recognition result. Because category recognition uses the semantic information contained in the characters, the accuracy of text structuring is improved.
Based on the foregoing, the embodiments of the present disclosure provide a text structuring processing method, a processing device, an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product. The text structuring processing method may include the following steps: performing text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category; determining a text image corresponding to a target text detection box in the at least one text detection box, wherein the target text detection box is a text detection box whose category information is the numerical value category; performing text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box; performing text classification on the text recognition result to obtain a semantic category result corresponding to the text recognition result; and generating a text structured result, wherein the text structured result comprises a value corresponding to the keyword category and a value corresponding to the numerical value category, the value corresponding to the keyword category comprises the semantic category result, and the value corresponding to the numerical value category comprises the text recognition result.
Fig. 1 schematically shows an exemplary system architecture to which a text structuring processing method and processing apparatus may be applied according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, the exemplary system architecture 100 to which the text structuring method and processing apparatus may be applied may include a terminal device, but the terminal device may implement the text structuring method and processing apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
The Server 105 may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a conventional physical host and a VPS (Virtual Private Server). The server 105 may also be an edge server. Server 105 may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that the text structuring processing method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the text structuring processing device provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
Alternatively, the text structuring processing method provided by the embodiment of the present disclosure may also be generally executed by the server 105. Accordingly, the text structuring processing device provided by the embodiment of the present disclosure may be generally disposed in the server 105. The text structuring method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text structuring processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the server 105 performs text detection on a text image, obtains category information of at least one text detection box corresponding to the text image, determines a text image corresponding to a target text detection box, performs text recognition on the text image corresponding to the target text detection box, obtains a text recognition result of the text image corresponding to the target text detection box, performs text classification on the text recognition result, obtains a semantic category result corresponding to the text recognition result, and generates a text structured result. Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 performs text detection on the text image and finally generates the text structured result.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a text structuring processing method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S250.
In operation S210, text detection is performed on the text image to obtain category information of at least one text detection box corresponding to the text image, where the category information includes a keyword category or a numerical value category.
In operation S220, a text image corresponding to a target text detection box among the at least one text detection box is determined, wherein the target text detection box is a text detection box whose category information is a numerical category.
In operation S230, text recognition is performed on the text image corresponding to the target text detection box, and a text recognition result of the text image corresponding to the target text detection box is obtained.
In operation S240, text classification is performed on the text recognition result, and a semantic category result corresponding to the text recognition result is obtained.
In operation S250, a text structured result is generated, wherein the text structured result includes a value corresponding to a keyword category and a value corresponding to a numerical category, the value corresponding to the keyword category includes a semantic category result, and the value corresponding to the numerical category includes a text recognition result.
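Operations S210 to S250 can be sketched as a small pipeline. This is only an illustrative sketch: the function name `structure_text`, the dictionary layout of a detection box, the category strings "key" and "value", and the toy stand-ins for the three models are all assumptions and are not taken from the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical box layout: {"category": "key" or "value", "crop": <image region>}
Box = Dict[str, object]


def structure_text(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object], str],
    classify: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run S210 to S250 and return (semantic category, recognized text) pairs."""
    boxes = detect(image)                                      # S210: detect boxes and categories
    targets = [b for b in boxes if b["category"] == "value"]   # S220: keep numerical value boxes
    results = []
    for box in targets:
        text = recognize(box["crop"])                          # S230: text recognition
        semantic_category = classify(text)                     # S240: text classification
        results.append((semantic_category, text))              # S250: key-value pair
    return results


# Toy stand-ins for the three models (assumptions, for illustration only).
def fake_detect(image: object) -> List[Box]:
    return [
        {"category": "key", "crop": "name"},
        {"category": "value", "crop": "Zhang San"},
    ]


pairs = structure_text(
    image=None,
    detect=fake_detect,
    recognize=lambda crop: str(crop),
    classify=lambda text: "name" if text == "Zhang San" else "unknown",
)
print(pairs)
```

Passing the three stages in as callables keeps the sketch independent of any particular detection, recognition or classification model.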
According to an embodiment of the present disclosure, a text image may refer to an image including text content. There may be various types of text image; for example, the text image may be a medical text image, a goods listing text image, or a financial text image. The text detection box may be a four-corner box, that is, it may be characterized by the coordinates of four points. The category information of the text detection box may be a keyword category, which characterizes a category attribute of the text content included in the text detection box, or a numerical value category, which characterizes a content attribute of the text content included in the text detection box. The text recognition result of the text image corresponding to the target text detection box may be used to characterize the value of the numerical value category corresponding to the text image. The semantic category result may be used to characterize the value of the keyword category corresponding to the text image.
For example, if the text content included in a text detection box is "x1x2 City Center Hospital", the category information of the text detection box is the numerical value category. If the text content included in a text detection box is "name", the category information of the text detection box is the keyword category. If the text content included in a text detection box is "Zhang San", the category information of the text detection box is the numerical value category.
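A four-corner text detection box with its category information could be represented as follows. This is a minimal sketch: the class name `TextDetectionBox`, the corner ordering (clockwise from top-left) and the category strings "key" and "value" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class TextDetectionBox:
    # Four (x, y) corners; clockwise order starting at top-left is an assumption.
    corners: Tuple[Point, Point, Point, Point]
    # "key" for the keyword category, "value" for the numerical value category.
    category: str

    def __post_init__(self) -> None:
        if self.category not in ("key", "value"):
            raise ValueError(f"unknown category: {self.category!r}")

    def bounding_rect(self) -> Tuple[float, float, float, float]:
        """Return the axis-aligned rectangle (x_min, y_min, x_max, y_max)."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


box = TextDetectionBox(
    corners=((10, 20), (110, 22), (109, 52), (9, 50)),
    category="value",
)
rect = box.bounding_rect()
print(rect)
```

Keeping all four corners, rather than only a bounding rectangle, preserves the tilt of the text line, which matters when the region is later rectified.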
According to the embodiment of the disclosure, the text image can be processed by using the text detection model, and the category information of at least one text detection box corresponding to the text image is obtained. The text detection model may be obtained by training a first preset model using a first training sample set and a first label set. The first preset model may include a deep learning model or a conventional model. The deep learning model may include a candidate box based text detection model, a segmentation based text detection model, or a hybrid of both. The conventional model may include a text detection model based on SWT (Stroke Width Transform), a text detection model based on EdgeBox (i.e., edge box), or the like.
According to an embodiment of the present disclosure, after the text detection boxes corresponding to the text image are obtained, each text detection box whose category information is the numerical value category may be selected from the at least one text detection box according to the category information and determined as a target text detection box. After the target text detection box is determined, the text image corresponding to the target text detection box may be extracted from the text image, and text recognition may then be performed on it. The text image corresponding to the target text detection box may be processed using a text recognition model. The text recognition model may be obtained by training a second preset model using a second training sample set and a second label set. The second preset model may include a pattern matching model, a machine learning model, or a deep learning model. The deep learning model may include a text recognition model based on single-character recognition or a text recognition model based on recognizing the text as a whole.
For example, the text recognition model may be a text recognition model based on single-character recognition. Text recognition is performed on the text image, corresponding to the target text detection box, that contains "x1x2市中心医院" ("x1x2 City Center Hospital"), yielding a text recognition identifier for each character: the text recognition identifier corresponding to "x1" is 2, that corresponding to "x2" is 3, that corresponding to "市" (city) is 4, that corresponding to "中" (middle) is 5, that corresponding to "心" (heart) is 7, that corresponding to "医" (medical) is 8, and that corresponding to "院" (institution) is 6. According to the mapping between characters and text recognition identifiers, the text recognition result of the text image corresponding to the target text detection box is determined to be "x1x2市中心医院", that is, "x1x2 City Center Hospital".
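The final decoding step of single-character recognition, turning per-character identifiers back into text, can be sketched as a table lookup. The identifier values come from the example above; the Chinese characters 市中心医院 are an assumption about the original text behind the translated glosses "city", "medium", "heart", "doctor" and "hospital", and the names `ID_TO_CHAR` and `decode` are hypothetical.

```python
from typing import Dict, List

# Identifier-to-character table from the example; the characters are assumed.
ID_TO_CHAR: Dict[int, str] = {
    2: "x1",
    3: "x2",
    4: "市",  # city
    5: "中",  # middle
    7: "心",  # heart
    8: "医",  # medical
    6: "院",  # institution
}


def decode(identifiers: List[int], id_to_char: Dict[int, str]) -> str:
    """Map each text recognition identifier to its character and join them."""
    return "".join(id_to_char[i] for i in identifiers)


text = decode([2, 3, 4, 5, 7, 8, 6], ID_TO_CHAR)
print(text)
```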
According to the embodiment of the disclosure, after the text recognition result corresponding to the target text detection box is obtained, the text recognition result may be processed by using a text classification model, that is, semantic features included in the text recognition result are extracted by using the text classification model, and a semantic category result corresponding to the text recognition result is determined according to the semantic features. The text classification model may be obtained by training a third preset model using a third training sample set and a third label set. The third preset model may include a machine learning model or a deep learning model. The machine learning model may include a text classification model based on a naive Bayes algorithm or a text classification model based on a decision tree.
For example, if the text recognition result is "Zhang San", text classification is performed on the text recognition result to obtain the semantic category result "name".
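As one illustration of the naive Bayes option mentioned above, the following sketch classifies a recognized string by its characters. It is a toy character-level multinomial naive Bayes with add-one smoothing, not the model of the disclosure; the class name, the training strings and the labels "name", "date" and "hospital" are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class NaiveBayesTextClassifier:
    """Character-level multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.char_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[str, str]]) -> None:
        for text, label in samples:
            self.class_counts[label] += 1
            for ch in text:
                self.char_counts[label][ch] += 1
                self.vocab.add(ch)

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods of each character
            score = math.log(count / total)
            denom = sum(self.char_counts[label].values()) + len(self.vocab)
            for ch in text:
                score += math.log((self.char_counts[label][ch] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training texts and labels (assumptions, for illustration only).
classifier = NaiveBayesTextClassifier()
classifier.fit([
    ("Zhang San", "name"),
    ("Li Si", "name"),
    ("Wang Wu", "name"),
    ("2021-08-11", "date"),
    ("2020-01-05", "date"),
    ("City Center Hospital", "hospital"),
    ("People's Hospital", "hospital"),
])
print(classifier.predict("Zhang San"))
print(classifier.predict("2021-03-09"))
```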
According to an embodiment of the present disclosure, the semantic category result obtained in operation S240 and the text recognition result obtained in operation S230 may be combined into a text structured result in which a value corresponding to the keyword category is a semantic category result and a value corresponding to the numeric category is a text recognition result.
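Combining the two results into a structured output can be sketched as below. The dictionary-of-lists layout and the function name `build_structured_result` are assumptions; the disclosure only states that the key is the semantic category result and the value is the text recognition result.

```python
import json
from typing import Dict, List, Tuple


def build_structured_result(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group recognized texts (values) under their semantic category (key)."""
    result: Dict[str, List[str]] = {}
    for semantic_category, recognized_text in pairs:
        result.setdefault(semantic_category, []).append(recognized_text)
    return result


structured = build_structured_result([
    ("name", "Zhang San"),
    ("hospital", "x1x2 City Center Hospital"),
])
print(json.dumps(structured, ensure_ascii=False, indent=2))
```

Values are kept in lists so that two target text detection boxes classified into the same semantic category do not overwrite each other; that is a design choice of this sketch.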
It should be noted that, in the technical solution of the embodiment of the present disclosure, the acquisition and use of the text image, the category information, the text recognition result, the semantic category result, and the text structured result all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
According to the embodiment of the disclosure, text detection is performed on the text image to obtain the category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category. The text image corresponding to the target text detection box is determined, and text recognition is performed on it to obtain the text recognition result of the text image corresponding to the target text detection box. Text classification is performed on the text recognition result to obtain the semantic category result corresponding to the text recognition result, and a text structured result is generated. Computer vision and a language model are thus combined, so that category recognition uses the semantic information contained in the text, and the accuracy of text structuring is improved.
According to an embodiment of the present disclosure, operation S240 may include the following operations.
The text recognition result is processed by using the text classification model to obtain a semantic category result corresponding to the text recognition result.
According to embodiments of the present disclosure, the text classification model may include a deep learning model or a machine learning model. The third training sample set may include a plurality of training texts, and the third label set may include a third label corresponding to each training text.
According to an embodiment of the present disclosure, training the third preset model using the third training sample set and the third label set to obtain the text classification model may include the following. Each of the plurality of training texts is input into the third preset model to obtain a semantic category result corresponding to that training text. The semantic category result and the third label corresponding to each training text are input into a first loss function to obtain a first output value. The model parameters of the third preset model are adjusted according to the first output value until the first output value converges. The third preset model obtained when the first output value converges is determined as the text classification model.
According to the embodiment of the disclosure, the text recognition result is processed by using the text classification model to obtain the semantic category result corresponding to the text recognition result, so that the semantic information contained in the text recognition result is fully utilized, and the accuracy and practicability of text structured extraction are further improved.
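The training procedure described above (predict, compute a loss against the labels, adjust parameters, stop when the loss converges) can be sketched with a tiny logistic regression. Everything concrete here is an assumption for illustration: the hand-made features, binary cross-entropy standing in for the first loss function, the learning rate, the tolerance, and the toy samples (label 1 meaning "name", label 0 meaning "number").

```python
import math
from typing import List, Tuple


def featurize(text: str) -> List[float]:
    """Hypothetical features: digit ratio, letter ratio, and a bias term."""
    n = max(len(text), 1)
    digits = sum(ch.isdigit() for ch in text) / n
    letters = sum(ch.isalpha() for ch in text) / n
    return [digits, letters, 1.0]


def predict_proba(weights: List[float], features: List[float]) -> float:
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[str, int]], lr: float = 1.0,
          tol: float = 1e-6, max_epochs: int = 5000) -> Tuple[List[float], float]:
    weights = [0.0, 0.0, 0.0]
    prev_loss = math.inf
    loss = math.inf
    for _ in range(max_epochs):
        grads = [0.0, 0.0, 0.0]
        loss = 0.0
        for text, label in samples:
            x = featurize(text)
            p = predict_proba(weights, x)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
            for i in range(3):
                grads[i] += (p - label) * x[i]
        loss /= len(samples)
        if abs(prev_loss - loss) < tol:  # loss has converged
            break
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grads)]
        prev_loss = loss
    return weights, loss


training_set = [("Zhang San", 1), ("Li Si", 1), ("2021-08-11", 0), ("13800000000", 0)]
weights, final_loss = train(training_set)
print(final_loss)
print(predict_proba(weights, featurize("Wang Wu")) > 0.5)
```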
According to embodiments of the present disclosure, the text classification model may include a deep learning model.
According to embodiments of the present disclosure, the deep learning model may include a FastText-based text classification model, a text convolutional neural network (TextCNN) based text classification model, a text recurrent neural network (TextRNN) based text classification model, or a dilated gated convolutional neural network (DGCNN) based text classification model.
According to an embodiment of the present disclosure, operation S210 may include the following operations.
And performing text detection on the text image to obtain the category information and the position information of at least one text detection box corresponding to the text image.
According to an embodiment of the present disclosure, the position information corresponding to the text detection box may be used to characterize the position of the text detection box on the text image. The position information may be characterized by coordinate information of a four-corner box.
According to an embodiment of the present disclosure, operation S220 may include the following operations.
And extracting a text image corresponding to the target text detection box from the text image according to the position information corresponding to the target text detection box in the at least one text detection box.
According to an embodiment of the present disclosure, the position information may be used as a basis for extracting a text image corresponding to the target text detection box from the text image.
According to an embodiment of the present disclosure, extracting the text image corresponding to the target text detection box from the text image according to the position information corresponding to the target text detection box of the at least one text detection box may include the following operations.
The position information corresponding to the target text detection box of the at least one text detection box is converted into target position information by using affine transformation. The text image corresponding to the target text detection box is then extracted from the text image according to the target position information.
According to an embodiment of the present disclosure, the affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that maintains the "straightness" and "parallelism" of a two-dimensional figure. Straightness can be understood as follows: after transformation, a straight line remains a straight line without bending, and a circular arc remains a circular arc. Parallelism can be understood as follows: the relative positional relationship between different two-dimensional figures is unchanged, parallel lines remain parallel lines, and the included angle of intersecting straight lines is unchanged. The affine transformation may be achieved by at least one of translation, scaling, flipping, rotation, shearing, and the like.
According to an embodiment of the present disclosure, converting the position information corresponding to the target text detection box into the target position information by using affine transformation may include: converting the text detection box in the form of a four-corner box into a text detection box in the form of a rectangular box by affine transformation, and determining the position information corresponding to the text detection box in the form of the rectangular box as the target position information, so that the text image corresponding to the target text detection box can be extracted from the text image according to the target position information.
For example, the target text detection box is a four-corner box, which may be characterized by $\{P_1, P_2, P_3, P_4\}$, where $P_1$ represents the point at the upper left corner of the four-corner box, $P_2$ represents the point at the upper right corner, $P_3$ represents the point at the lower left corner, and $P_4$ represents the point at the lower right corner. $P_1$ can be characterized as $\{x_1, y_1\}$, $P_2$ as $\{x_2, y_2\}$, $P_3$ as $\{x_3, y_3\}$, and $P_4$ as $\{x_4, y_4\}$. By the affine transformation $P_1 \to P'_1$, $P_2 \to P'_2$, $P_3 \to P'_3$, $P_4 \to P'_4$, a rectangular box $\{P'_1, P'_2, P'_3, P'_4\}$ is obtained. $P'_1$ can be characterized as $\{x'_1, y'_1\}$, $P'_2$ as $\{x'_2, y'_2\}$, $P'_3$ as $\{x'_3, y'_3\}$, and $P'_4$ as $\{x'_4, y'_4\}$.
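The corner mapping in this example can be sketched as follows. This is an illustrative sketch only: it assumes the affine transformation is fixed by three of the four corners (upper left, upper right, lower left), so the fourth corner lands exactly on the rectangle only when the four-corner box is a parallelogram. The function names `solve_affine`, `apply_affine`, and `quad_to_rect` are hypothetical and do not appear in the disclosure.

```python
import math

def solve_affine(src, dst):
    """Return (a, b, c, d, e, f) such that x' = a*x + b*y + c and
    y' = d*x + e*y + f for three non-collinear point pairs (Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")

    def solve_row(v1, v2, v3):
        # Solve one output coordinate from its three target values.
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    (u1, w1), (u2, w2), (u3, w3) = dst
    return solve_row(u1, u2, u3) + solve_row(w1, w2, w3)

def apply_affine(m, point):
    """Apply the six-parameter affine transformation to one point."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

def quad_to_rect(p1, p2, p3, p4):
    """Map a four-corner box (upper left, upper right, lower left, lower right)
    to an axis-aligned rectangle whose upper left corner is the origin."""
    width = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    height = math.hypot(p3[0] - p1[0], p3[1] - p1[1])
    m = solve_affine([p1, p2, p3], [(0.0, 0.0), (width, 0.0), (0.0, height)])
    return [apply_affine(m, p) for p in (p1, p2, p3, p4)]
```

For a rotated rectangle with corners (10, 10), (13, 14), (2, 16), and (5, 20), `quad_to_rect` returns the corners of a 5 by 10 rectangle. An image-processing library would normally also resample the pixels inside the box with the same transformation; that step is omitted here.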
According to an embodiment of the present disclosure, operation S210 may include the following operations.
And processing the text image by using the text detection model to obtain the category information of at least one text detection box corresponding to the text image.
According to an embodiment of the present disclosure, the text detection model may include a deep learning model, which may include a candidate box-based text detection model, a segmentation-based text detection model, or a hybrid text detection model based on both, and the like. The basic idea of realizing text detection based on the text detection model of the candidate frames is to generate a plurality of candidate text detection frames in advance, and then obtain category information and position information corresponding to the text detection frames by utilizing non-maximum suppression. The basic idea of the text detection model based on segmentation is to segment a text image at a pixel level by using a segmentation network and then process the segmented text image to obtain category information and position information corresponding to a text detection frame.
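The non-maximum suppression step mentioned for candidate-box-based detection can be sketched as below. The sketch assumes axis-aligned boxes given as (x1, y1, x2, y2) with a confidence score and a category label per candidate, and an overlap threshold of 0.5; these representations and the names `iou` and `non_max_suppression` are illustrative assumptions, not details taken from the disclosure.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(candidates, iou_threshold=0.5):
    """Keep the highest-scoring candidates, dropping any candidate that
    overlaps an already kept box by more than the threshold.

    candidates: list of (box, score, category) tuples.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(iou(cand[0], k[0]) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept
```

Each surviving tuple carries both the position information (the box) and the category information (keyword or numerical value) of one text detection box.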
According to the embodiment of the disclosure, a first preset model may be trained by using a first training sample set and a first label set to obtain a text detection model, where the first training sample set includes a plurality of training text images, the first label set includes a first label corresponding to each training text image, and the first label represents real position information and real category information corresponding to at least one text detection box included in the training text images.
According to an embodiment of the present disclosure, training the first preset model by using the first training sample set and the first label set to obtain the text detection model may include: inputting each training text image of the plurality of training text images into the first preset model to obtain category information and position information corresponding to at least one text detection box included in each training text image; inputting the category information, the position information, and the first label corresponding to each text detection box into a second loss function to obtain a second output value; adjusting the model parameters of the first preset model according to the second output value until the second output value converges; and determining the first preset model obtained when the second output value converges as the text detection model.
According to an embodiment of the present disclosure, operation S230 may include the following operations.
The text image corresponding to the target text detection box is processed by using the text recognition model to obtain the text recognition result of the text image corresponding to the target text detection box.
According to the embodiment of the disclosure, a second preset model may be trained by using a second training sample set and a second label set to obtain a text recognition model, where the second training sample set may include a plurality of training text image slices, and the second label set includes a second label corresponding to each training text image slice.
According to an embodiment of the present disclosure, training the second preset model by using the second training sample set and the second label set to obtain the text recognition model may include: inputting each training text image slice of the plurality of training text image slices into the second preset model to obtain a text recognition result corresponding to each training text image slice; inputting the text recognition result and the second label corresponding to each training text image slice into a third loss function to obtain a third output value; adjusting the model parameters of the second preset model according to the third output value until the third output value converges; and determining the second preset model obtained when the third output value converges as the text recognition model.
According to the embodiment of the disclosure, the trained text detection model, the trained text recognition model and the trained text classification model can be determined as the text structured model.
According to an embodiment of the present disclosure, the text structuring processing method may further include the following operations.
The text image is obtained by using data preprocessing.
According to embodiments of the present disclosure, the data pre-processing may include at least one of: noise reduction processing, tilt correction processing, and sharpening processing. For example, before text detection is performed on a text image, for a text image shot in an inclined manner, the text image may be corrected by some inclination correction algorithms and then input into a text detection model for text detection.
According to the embodiment of the disclosure, the quality of the text image can be improved by performing data preprocessing on the text image, so that the text structuring result is more accurate and practical.
According to an embodiment of the present disclosure, the text image may include a medical text image.
According to the embodiment of the disclosure, medical text is an important way of saving information in a medical scene and contains a large amount of structured information about a user. Acquiring this structured information helps in understanding the health condition of the user so that targeted analysis and processing can be performed. It also allows a complete database and user profile to be established. Medical text may exist in image form, and extracting the required structured information from a medical text image is a technical difficulty in the medical scene, which can be addressed by the text structuring scheme provided by the embodiment of the disclosure.
The text structuring processing method according to the embodiment of the present disclosure is further described with reference to fig. 3.
Fig. 3 schematically shows a schematic diagram of a text structuring process according to an embodiment of the present disclosure.
As shown in fig. 3, in the text structuring process 300, the text detection model 302 performs text detection on the text image 301 to obtain category information and position information of at least one text detection box corresponding to the text image 301, where the category information may include a keyword category or a numerical value category, and the at least one text detection box may include at least one of a text detection box 303, a text detection box 304, a text detection box 305, a text detection box 306, and a text detection box 307. The category information of the text detection box 304 and the text detection box 306 may be a keyword category, and the category information of the text detection box 303, the text detection box 305, and the text detection box 307 may be a numeric category.
A text image corresponding to the target text detection box is then determined, where the target text detection box may be a text detection box whose category information is the numerical value category. The target text detection box may include at least one of the text detection box 303, the text detection box 305, and the text detection box 307. The operations of text recognition, text classification, and text structured result generation are described below, taking the text detection box 303 as the target text detection box.
The text recognition model 308 performs text recognition on the text image 303-1 corresponding to the text detection box 303 to obtain a text recognition result 303-2 of the text image 303-1 corresponding to the text detection box 303.
The text classification model 309 performs text classification on the text recognition result 303-2 to obtain a semantic classification result 310 corresponding to the text recognition result 303-2.
The semantic category result 310 and the text recognition result 303-2 are combined into the structured result 311. The structured result 311 includes a value corresponding to the keyword category and a value corresponding to the numerical value category, where the value corresponding to the keyword category includes the semantic category result 310 and the value corresponding to the numerical value category includes the text recognition result 303-2.
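The flow of fig. 3 can be summarized in a short sketch. The three models and the cropping step are passed in as plain callables, so nothing about their internals is assumed; the `DetectionBox` type, the `structure_text` function, and the dictionary keys `"keyword"` and `"numerical"` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class DetectionBox:
    category: str  # category information: "keyword" or "numerical"
    position: Any  # position information, e.g. four corner points

def structure_text(
    image: Any,
    detect: Callable[[Any], List[DetectionBox]],
    crop: Callable[[Any, Any], Any],
    recognize: Callable[[Any], str],
    classify: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run detection, select numerical boxes, recognize and classify their
    text, and pair each semantic category with its recognized value."""
    results = []
    for box in detect(image):
        if box.category != "numerical":
            continue  # only numerical boxes are target text detection boxes
        patch = crop(image, box.position)   # text image of the target box
        text = recognize(patch)             # text recognition result
        semantic = classify(text)           # semantic category result
        results.append({"keyword": semantic, "numerical": text})
    return results
```

With stub callables, for instance a detector returning one keyword box and one numerical box, a recognizer returning "5.6 mmol/L", and a classifier mapping that string to "blood glucose", the function returns a single entry pairing "blood glucose" with "5.6 mmol/L". Note that the keyword boxes themselves are not recognized in this flow; the key is inferred from the semantics of the value.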
Fig. 4 schematically shows a block diagram of a text structuring processing device according to an embodiment of the present disclosure.
As shown in fig. 4, the text structuring processing device 400 may include a text detecting module 410, a determining module 420, a text recognizing module 430, a text classifying module 440, and a generating module 450.
The text detection module 410 is configured to perform text detection on the text image to obtain category information of at least one text detection box corresponding to the text image, where the category information includes a keyword category or a numerical value category.
A determining module 420, configured to determine a text image corresponding to a target text detection box of the at least one text detection box, where the target text detection box is a text detection box whose category information is a numeric category.
And the text recognition module 430 is configured to perform text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box.
And the text classification module 440 is configured to perform text classification on the text recognition result to obtain a semantic classification result corresponding to the text recognition result.
The generating module 450 is configured to generate a text structured result, where the text structured result includes a value corresponding to the keyword category and a value corresponding to the numerical category, the value corresponding to the keyword category includes a semantic category result, and the value corresponding to the numerical category includes a text recognition result.
According to an embodiment of the present disclosure, the text classification module 440 may include a first obtaining sub-module.
The first obtaining submodule is configured to process the text recognition result by using the text classification model to obtain the semantic classification result corresponding to the text recognition result.
According to embodiments of the present disclosure, the text classification model may include a deep learning model.
According to an embodiment of the present disclosure, the text detection module 410 may include a second obtaining sub-module.
And the second obtaining submodule is used for carrying out text detection on the text image to obtain the category information and the position information of at least one text detection box corresponding to the text image.
According to an embodiment of the present disclosure, the determination module 420 may include an extraction sub-module.
And the extraction submodule is used for extracting the text image corresponding to the target text detection box from the text image according to the position information corresponding to the target text detection box in the at least one text detection box.
According to an embodiment of the present disclosure, the extraction sub-module may include a conversion unit and an extraction unit.
A conversion unit configured to convert position information corresponding to a target text detection box of the at least one text detection box into target position information using affine transformation.
And the extraction unit is used for extracting the text image corresponding to the target text detection box from the text image according to the target position information.
According to an embodiment of the present disclosure, the text detection module 410 may include a third obtaining sub-module.
And the third obtaining submodule is used for processing the text image by using the text detection model to obtain the category information of at least one text detection box corresponding to the text image.
According to an embodiment of the present disclosure, the text recognition module 430 may include a fourth obtaining sub-module.
And the fourth obtaining submodule is used for processing the text image corresponding to the target text detection box in the at least one text detection box by using the text recognition model to obtain a text recognition result of the text image corresponding to the target text detection box.
According to an embodiment of the present disclosure, the text structuring processing device 400 may further include an obtaining module.
An obtaining module, configured to obtain a text image by using data preprocessing, where the data preprocessing includes at least one of: noise reduction processing, tilt correction processing, and sharpening processing.
According to an embodiment of the present disclosure, the text image includes a medical text image.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as above.
According to an embodiment of the present disclosure, a non-transitory computer readable storage medium stores computer instructions for causing a computer to perform the method as above.
According to an embodiment of the disclosure, a computer program product comprises a computer program which, when executed by a processor, implements the method as above.
Fig. 5 shows a block diagram of an electronic device suitable for a text structuring processing method according to an embodiment of the present disclosure. The electronic device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 executes the respective methods and processes described above, such as the text structuring processing method. For example, in some embodiments, the text structuring processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the text structuring processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the text structuring processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. A text structuring processing method comprises the following steps:
performing text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical value category;
determining a text image corresponding to a target text detection box in the at least one text detection box, wherein the target text detection box is a text detection box of which the category information is the numerical category;
performing text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box;
performing text classification on the text recognition result to obtain a semantic classification result corresponding to the text recognition result; and
generating a text structured result, wherein the text structured result comprises a value corresponding to the keyword category and a value corresponding to the numerical category, the value corresponding to the keyword category comprises the semantic category result, and the value corresponding to the numerical category comprises the text recognition result.
2. The method of claim 1, wherein the text classifying the text recognition result to obtain a semantic category result corresponding to the text recognition result comprises:
and processing the text recognition result by using a text classification model to obtain a semantic classification result corresponding to the text recognition result.
3. The method of claim 2, wherein the text classification model comprises a deep learning model.
4. The method according to any one of claims 1 to 3, wherein the text detection on the text image to obtain the category information of at least one text detection box corresponding to the text image comprises:
performing text detection on the text image to obtain category information and position information of at least one text detection box corresponding to the text image;
wherein the determining a text image corresponding to a target text detection box of the at least one text detection box comprises:
and extracting a text image corresponding to the target text detection box from the text image according to the position information corresponding to the target text detection box in the at least one text detection box.
5. The method of claim 4, wherein the extracting a text image corresponding to a target text detection box from the text image according to the position information corresponding to the target text detection box comprises:
converting position information corresponding to a target text detection box in the at least one text detection box into target position information by affine transformation; and
extracting a text image corresponding to the target text detection box from the text image according to the target position information.
6. The method according to any one of claims 1 to 5, wherein the text detection on the text image to obtain the category information of at least one text detection box corresponding to the text image comprises:
and processing the text image by using a text detection model to obtain the category information of at least one text detection box corresponding to the text image.
7. The method according to any one of claims 1 to 6, wherein the performing text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box comprises:
and processing the text image corresponding to the target text detection box by using a text recognition model to obtain a text recognition result of the text image corresponding to the target text detection box.
8. The method of any of claims 1-7, further comprising:
obtaining the text image by using data preprocessing, wherein the data preprocessing comprises at least one of the following steps: noise reduction processing, tilt correction processing, and sharpening processing.
9. The method of any of claims 1-8, wherein the text image comprises a medical text image.
10. A text structuring processing device comprising:
a text detection module configured to perform text detection on a text image to obtain category information of at least one text detection box corresponding to the text image, wherein the category information comprises a keyword category or a numerical category;
a determining module configured to determine a text image corresponding to a target text detection box in the at least one text detection box, wherein the target text detection box is a text detection box whose category information is the numerical category;
a text recognition module configured to perform text recognition on the text image corresponding to the target text detection box to obtain a text recognition result of the text image corresponding to the target text detection box;
a text classification module configured to perform text classification on the text recognition result to obtain a semantic classification result corresponding to the text recognition result; and
a generating module configured to generate a text structured result, wherein the text structured result includes a value corresponding to the keyword category and a value corresponding to the numerical category, the value corresponding to the keyword category includes the semantic classification result, and the value corresponding to the numerical category includes the text recognition result.
11. The apparatus of claim 10, wherein the text classification module comprises:
a first obtaining submodule configured to process the text recognition result by using a text classification model to obtain a semantic classification result corresponding to the text recognition result.
12. The apparatus of claim 11, wherein the text classification model comprises a deep learning model.
13. The apparatus of any of claims 10-12, wherein the text detection module comprises:
a second obtaining submodule configured to perform text detection on the text image to obtain the category information and the position information of at least one text detection box corresponding to the text image;
wherein the determining module comprises:
an extraction submodule configured to extract the text image corresponding to the target text detection box from the text image according to the position information corresponding to the target text detection box in the at least one text detection box.
14. The apparatus of claim 13, wherein the extraction submodule comprises:
a conversion unit configured to convert, by affine transformation, position information corresponding to a target text detection box of the at least one text detection box into target position information; and
an extraction unit configured to extract the text image corresponding to the target text detection box from the text image according to the target position information.
15. The apparatus of any of claims 10-14, wherein the text detection module comprises:
a third obtaining submodule configured to process the text image by using a text detection model to obtain the category information of at least one text detection box corresponding to the text image.
16. The apparatus of any of claims 10-15, wherein the text recognition module comprises:
a fourth obtaining submodule configured to process the text image corresponding to the target text detection box by using a text recognition model to obtain a text recognition result of the text image corresponding to the target text detection box.
17. The apparatus of any of claims 10-16, further comprising:
an obtaining module, configured to obtain the text image by using data preprocessing, where the data preprocessing includes at least one of: noise reduction processing, tilt correction processing, and sharpening processing.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-9.
20. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 9.
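As a rough illustration of how the modules recited in claim 10 and the extraction step of claims 13 and 14 could fit together, the following Python sketch wires them into one function. It is a minimal sketch under stated assumptions, not the applicant's implementation: the names `Detection`, `to_target_box`, `crop`, and `structure_text` are hypothetical, the detection, recognition, and classification models are passed in as plain callables because the claims do not specify them, and the affine step is simplified to mapping the box corners through a 2x3 matrix and taking their bounding rectangle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # 2x3 matrix

KEYWORD = "keyword"      # category labels are assumptions for this sketch
NUMERICAL = "numerical"
IDENTITY: Affine = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))


@dataclass
class Detection:
    """One text detection box with its position and category information."""
    box: Box
    category: str


def to_target_box(box: Box, m: Affine) -> Box:
    """Map the box corners through the affine matrix; return their bounding rectangle."""
    left, top, right, bottom = box
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    xs = [m[0][0] * x + m[0][1] * y + m[0][2] for x, y in corners]
    ys = [m[1][0] * x + m[1][1] * y + m[1][2] for x, y in corners]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


def crop(image: Image, box: Box) -> Image:
    """Extract the sub-image covered by the box, clamping negative coordinates to 0."""
    left, top, right, bottom = box
    left, top = max(left, 0), max(top, 0)
    return [row[left:right] for row in image[top:bottom]]


def structure_text(
    image: Image,
    detect: Callable[[Image], Sequence[Detection]],
    recognize: Callable[[Image], str],
    classify: Callable[[str], str],
    affine: Affine = IDENTITY,
) -> List[Dict[str, str]]:
    """Run detection, extraction, recognition, and classification in sequence."""
    results: List[Dict[str, str]] = []
    for det in detect(image):
        if det.category != NUMERICAL:
            continue  # only numerical-category boxes are target boxes
        patch = crop(image, to_target_box(det.box, affine))
        text = recognize(patch)   # text recognition result
        label = classify(text)    # semantic classification result
        # keyword value = semantic classification, numerical value = recognized text
        results.append({KEYWORD: label, NUMERICAL: text})
    return results
```

In use, `detect`, `recognize`, and `classify` would wrap whatever trained models are available; passing them as arguments keeps the sketch independent of any particular framework.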
CN202110921811.5A 2021-08-11 2021-08-11 Text structuring method, processing device, electronic device and storage medium Pending CN113627439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110921811.5A CN113627439A (en) 2021-08-11 2021-08-11 Text structuring method, processing device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113627439A true CN113627439A (en) 2021-11-09

Family

ID=78384674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110921811.5A Pending CN113627439A (en) 2021-08-11 2021-08-11 Text structuring method, processing device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113627439A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN111753727A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Method, device, equipment and readable storage medium for extracting structured information
US20210081729A1 (en) * 2019-09-16 2021-03-18 Beijing Baidu Netcom Science Technology Co., Ltd. Method for image text recognition, apparatus, device and storage medium
WO2021051553A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Certificate information classification and positioning method and apparatus
CN112989995A (en) * 2021-03-10 2021-06-18 北京百度网讯科技有限公司 Text detection method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
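周炫余; 刘娟; 卢笑; 邵鹏; 罗飞: "A pedestrian detection method combining text and image information", Acta Electronica Sinica (电子学报), no. 01 *
唐三立; 程战战; 钮毅; 雷鸣: "A deep learning model for structured text image recognition", Journal of Hangzhou Dianzi University (Natural Sciences) (杭州电子科技大学学报(自然科学版)), no. 02 *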

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187435A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium
CN114299522A (en) * 2022-01-10 2022-04-08 北京百度网讯科技有限公司 Image recognition method, device and storage medium
CN114299522B (en) * 2022-01-10 2023-08-29 北京百度网讯科技有限公司 Image recognition method device, apparatus and storage medium
CN114495103A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Text recognition method, text recognition device, electronic equipment and medium
CN114495103B (en) * 2022-01-28 2023-04-04 北京百度网讯科技有限公司 Text recognition method and device, electronic equipment and medium
CN114724156A (en) * 2022-04-20 2022-07-08 北京百度网讯科技有限公司 Form identification method and device and electronic equipment
CN116110056A (en) * 2022-12-29 2023-05-12 北京百度网讯科技有限公司 Information extraction method and device, electronic equipment and storage medium
CN116110056B (en) * 2022-12-29 2023-09-26 北京百度网讯科技有限公司 Information extraction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113627439A (en) Text structuring method, processing device, electronic device and storage medium
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
US20210295088A1 (en) Image detection method, device, storage medium and computer program product
CN113780098B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113204615A (en) Entity extraction method, device, equipment and storage medium
CN114612743A (en) Deep learning model training method, target object identification method and device
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
US20230096921A1 (en) Image recognition method and apparatus, electronic device and readable storage medium
CN113553428B (en) Document classification method and device and electronic equipment
CN114596188A (en) Watermark detection method, model training method, device and electronic equipment
CN114445826A (en) Visual question answering method and device, electronic equipment and storage medium
CN113610809A (en) Fracture detection method, fracture detection device, electronic device, and storage medium
CN115082598B (en) Text image generation, training, text image processing method and electronic equipment
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN114842489A (en) Table analysis method and device
CN115101069A (en) Voice control method, device, equipment, storage medium and program product
CN114419613A (en) Image sample generation method, text recognition method, device, equipment and medium
CN114708580A (en) Text recognition method, model training method, device, apparatus, storage medium, and program
CN114724144A (en) Text recognition method, model training method, device, equipment and medium
CN113971810A (en) Document generation method, device, platform, electronic equipment and storage medium
Hu et al. Mathematical formula detection in document images: A new dataset and a new approach
CN114187448A (en) Document image recognition method and device, electronic equipment and computer readable medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114661904A (en) Method, apparatus, device, storage medium, and program for training document processing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination