CN112926577B - Medical bill image structuring method and device and computer readable medium - Google Patents

Publication number
CN112926577B
Authority
CN
China
Prior art keywords
character string
clustering
information
data
medical bill
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110193283.6A
Other languages
Chinese (zh)
Other versions
CN112926577A (en)
Inventor
康帅兵
褚一平
陈建勇
郑义
朱华山
郁星星
张雪妮
陈士春
潘翔
赵小敏
郑河荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hailiang Information Technology Co ltd
Original Assignee
Hangzhou Hailiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hailiang Information Technology Co ltd filed Critical Hangzhou Hailiang Information Technology Co ltd
Priority to CN202110193283.6A priority Critical patent/CN112926577B/en
Publication of CN112926577A publication Critical patent/CN112926577A/en
Application granted granted Critical
Publication of CN112926577B publication Critical patent/CN112926577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention discloses a medical bill image structuring method and device based on mean value clustering and character recognition, and a computer readable medium. The method comprises the following steps: step S1, performing OCR character recognition on the acquired medical bill image to obtain full-text character string information of the bill; step S2, performing KMeans clustering on the bill full-text character string information; step S3, determining the title position according to the clustering result, and extracting the entry data of the corresponding column according to the title position information; and step S4, performing validity check and correction on the entry data to obtain the structured data of the medical bill. The technical scheme of the invention can greatly improve the bill structuring effect.

Description

Medical bill image structuring method and device and computer readable medium
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a medical bill image structuring method and device based on mean value clustering and character recognition and a computer readable medium.
Background
In recent years, with the continued development of medical informatization in China, the digitization of medical bills has become a trend. However, the reimbursing organization cannot directly obtain the user's detailed medical information, so the user must submit the original medical documents when claiming reimbursement. Reimbursement staff then enter these documents into the system manually, check them item by item, and reimburse according to a specific reimbursement proportion and amount. Manual entry has many drawbacks: on the one hand, entry errors and omissions are inevitable; on the other hand, a large amount of human resources must be assigned to highly repetitive work. This puts great pressure on medical staff and makes the reimbursement process time-consuming, laborious, and inefficient.
For automatic bill recognition, the character information in the image is recognized by OCR technology, and the text recognition result is then structured according to the structural information of the bill to form a detailed medical bill result. However, existing table recognition technology mainly relies on features such as table lines to segment the image and obtain the table structure. Many medical bills have no table lines, so existing methods cannot complete the structuring process for them.
Disclosure of Invention
The invention aims to solve the technical problem of providing a medical bill image structuring method and device based on mean value clustering and character recognition, which can greatly improve the bill structuring effect.
In order to achieve the purpose, the invention adopts the following technical scheme:
a medical bill image structuring method based on mean value clustering and character recognition comprises the following steps:
step S1, performing OCR character recognition on the acquired medical bill image to obtain full-text character string information of the bill;
step S2, performing KMeans clustering on the bill full-text character string information;
step S3, determining title position information according to the clustering result, and extracting entry data of the corresponding column according to the title position information;
and step S4, carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
Preferably, the obtaining of the full-text character string information in step S1 includes:
preprocessing the medical bill image;
calculating the rotation angle of the preprocessed medical bill image;
performing rotation correction on the preprocessed medical bill image according to the rotation angle;
performing OCR recognition on the corrected medical bill image to obtain the bill full-text character string information, which comprises: character string content, character string coordinate positions, recognition confidence of the character strings, and candidate characters;
and filtering the full text character string information of the bill.
Preferably, the clustering of the full-text character string information of the ticket in step S2 includes:
step 2.1, extracting all the bill full-text character string information obtained in step S1, and initializing the vector of each character string with the left position of that character string;
step 2.2, setting the number of center points k to 10, wherein k is the number of clusters in the final clustering result;
step 2.3, randomly selecting k points as initial clustering centers, calculating the distance from each character string vector to each clustering center,
Figure BDA0002945174480000021
wherein x and y are the coordinate values of the left position of the character,
step 2.4, comparing the distances from each character string vector to the clustering centers, and assigning the character string to the cluster whose center is nearest;
step 2.5, recalculating each clustering center until convergence, and outputting the clustering result, wherein the objective function is as follows:
Figure BDA0002945174480000031
wherein u_i is the mean of all points in S_i, and S_i denotes the i-th cluster.
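The two formula images referenced in steps 2.3 and 2.5 are not reproduced in this text. As a hedged reconstruction, assuming the standard K-means formulation with a Euclidean distance (the symbols c_x, c_y, p, and J are introduced here for illustration and do not appear in the original), they would read roughly as follows:

```latex
% Assumed distance from a character string's left position (x, y)
% to a clustering center (c_x, c_y)
d = \sqrt{(x - c_x)^2 + (y - c_y)^2}

% Assumed objective: within-cluster sum of squared distances over k clusters,
% where u_i is the mean of the points p in cluster S_i
J = \sum_{i=1}^{k} \sum_{p \in S_i} \lVert p - u_i \rVert^2
```

Under this formulation, neither the assignment in step 2.4 nor the center update in step 2.5 can increase J, which is why the iteration converges.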
Preferably, in step S3, column segmentation is performed according to the clustering result and the OCR full-text recognition data, and the attributes corresponding to each column of data are counted, the attributes including the number of Chinese characters, the number of digits, and the sum of the numbers; semantic analysis is then performed on the attributes to determine the title position information.
Preferably, in step S3, extracting the entry data of the corresponding column specifically includes: sequentially extracting data in the vertical direction according to the coordinates of the left and right boundaries of the title; matching, according to the position of each extracted item name, the amount information at the nearest distance in the vertical direction, so as to extract the unit price or quantity; and merging item names that wrap onto a new line and meet the merging condition, to obtain the specific entry data.
A medical bill image structuring device based on mean value clustering and character recognition comprises:
the recognition module is used for carrying out OCR character recognition on the acquired medical bill image to obtain the full-text character string information of the bill;
the clustering module is used for performing KMeans clustering on the bill full-text character string information;
the extraction module is used for determining title position information according to the clustering result and extracting the entry data of the corresponding column according to the title position information;
and the correction module is used for carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
A computer readable medium having stored thereon instructions which, when executed by a processor, implement the steps of the medical bill image structuring method based on mean value clustering and character recognition.
The method first obtains character strings using OCR technology, then performs cluster analysis on the character strings according to their coordinates to determine the column data. At the same time, according to semantic features of the medical bill, including title keywords, Chinese character statistics, and numerical statistics, four fields are extracted from the medical bill: item name, unit price, quantity, and total price. In addition, to ensure that the structured output data is accurate, the invention adds multiple check rules: based on the internal logical relationships between the fields, it automatically identifies character information with low confidence and heuristically corrects data that may be wrong. By combining these techniques, the data can be quickly checked and corrected, providing a complete, fast, and accurate data basis for medical insurance reimbursement.
Drawings
FIG. 1 is a flow chart of the medical bill image structuring method of the present invention;
FIG. 2 is a flow chart of character string clustering based on the left position X coordinate;
FIG. 3 is a flow chart of title location;
FIG. 4 is a flow chart of adaptive numeric correction;
FIG. 5 is a schematic structural diagram of the medical bill image structuring device of the present invention.
Detailed Description
In order to better explain the technical scheme of the invention, the invention is further described in detail by combining the drawings and the specific embodiment. It should be noted that the embodiments described herein are only for illustrating and explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, an embodiment of the present invention provides a medical bill image structuring method based on mean value clustering and character recognition, including the following steps:
step S1, performing OCR character recognition on the acquired medical bill image to obtain full-text character string information of the bill;
step S2, performing KMeans clustering on the bill full-text character string information;
step S3, determining the title position according to the clustering result, and extracting the entry data of the corresponding column according to the title position information;
and step S4, carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
Further, the obtaining of the full-text character string information of the ticket in step S1 includes:
step 1.1, preprocessing the medical bill image, including cropping, binarization, and scaling; the image is cropped so that black borders do not affect the calculation of the rotation angle; adaptive threshold binarization is applied to the cropped image, and the binarized image is then scaled to one fourth of the cropped image, which further increases the image processing speed;
step 1.2, calculating the rotation angle of the preprocessed medical bill image by a histogram method;
step 1.3, performing rotation correction on the preprocessed medical bill image according to the rotation angle;
step 1.4, performing OCR recognition on the corrected medical bill image to obtain the bill full-text character string information, which comprises: character string content, character string coordinate positions, recognition confidence of the character strings, and candidate characters;
step 1.5, filtering the bill full-text character string information, wherein the filtering comprises removing the subtotal and total character strings.
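As an illustration of steps 1.4 and 1.5, the following minimal sketch shows one way the per-string OCR output could be represented and filtered. The class name `OcrString`, its field names, and the English keyword list are assumptions made for this example; the patent does not specify a data layout, and a real bill would use the corresponding Chinese labels.

```python
from dataclasses import dataclass, field


@dataclass
class OcrString:
    """One recognized string, mirroring the four items listed in step 1.4."""
    content: str        # character string content
    box: tuple          # (left, top, right, bottom) position in pixels
    confidence: float   # recognition confidence, assumed to lie in [0, 1]
    candidates: list = field(default_factory=list)  # alternatives per character


# Hypothetical labels to remove in step 1.5 (exact match, case-insensitive).
SUMMARY_LABELS = ("subtotal", "total")


def filter_summary_strings(strings):
    """Drop strings whose whole content is a subtotal or total label."""
    return [s for s in strings
            if s.content.strip().lower() not in SUMMARY_LABELS]
```

Exact matching is used here so that a header such as "Total price" is not removed by accident; a production filter would likely need looser rules.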
Further, in step S2, because the columns of a medical bill are clearly separated from one another, initializing the clustering with k = 10 gives a good clustering effect even if the header contains some interfering information. The clustering samples are the x center point coordinates of the character strings recognized by OCR, and the initial number of clusters is set to 10. The clustering result is the class label of each character string, and each cluster mean is the center point coordinate of the corresponding class.
Clustering the full-text character string information of the bill, as shown in fig. 2, specifically comprises the following steps:
step 2.1, extracting all the character strings obtained in step S1, and initializing the vector of each character string with the left position of that character string;
step 2.2, setting the number of center points k to 10, wherein k is the number of clusters in the final clustering result;
step 2.3, randomly selecting k points as initial clustering centers, calculating the distance from each character string vector to each clustering center,
Figure BDA0002945174480000061
wherein x and y are the coordinate values of the left position of the character,
step 2.4, comparing the distances from each character string vector to the clustering centers, and assigning the character string to the cluster whose center is nearest;
step 2.5, recalculating each clustering center until convergence, and outputting the clustering result, wherein the objective function is as follows:
Figure BDA0002945174480000062
wherein u_i is the mean vector of all points in S_i, and S_i denotes the i-th cluster.
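Steps 2.1 to 2.5 can be sketched in plain Python as follows. This is a simplified illustration, not the patented implementation: it clusters only the x coordinate of each string (following the caption of FIG. 2), uses the absolute difference as the distance, and the function name `kmeans_1d` and its parameters are assumptions.

```python
import random


def kmeans_1d(xs, k=10, max_iter=100, seed=0):
    """Cluster x coordinates into at most k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    distinct = sorted(set(xs))
    k = min(k, len(distinct))            # no more clusters than distinct samples
    centers = rng.sample(distinct, k)    # step 2.3: random initial centers
    labels = [0] * len(xs)
    for _ in range(max_iter):
        # step 2.4: assign every sample to its nearest center
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # step 2.5: recompute each center as the mean of its members
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:       # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers
```

For example, `kmeans_1d([0, 1, 100, 101], k=2)` groups the first two and the last two coordinates into separate clusters. Because initialization is random, K-means can settle in a local optimum on less clearly separated data, which is one reason the description relies on bill columns being well separated.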
Further, in step S3, column segmentation is performed according to the clustering result and the OCR full-text recognition data, that is, the attributes corresponding to each column (category) of data are counted, the attributes including the number of Chinese characters, the number of digits, and the sum of the numbers. Semantic analysis is then performed on the segmentation information to determine the title position information, distinguishing two cases: the original bill has a title row, or it does not.
In the first case, where title row information exists, the title position is searched for according to the title candidate characters and verified against the clustering result, that is, the item name column should be the column with the most Chinese characters, and an amount column should contain more digits than Chinese characters; if the title position located by keyword differs greatly from the clustering result, the adjacent columns are verified and judged again.
In the second case, where no title row information exists, the item name and the amount are determined according to the classification attributes: the column with the most Chinese characters is the item name column, and the column with the largest numeric sum is the amount column. The influence of subtotal and total entries is excluded from these statistics. After the item name and amount positions are located, the data in the same row are checked, and the unit price or quantity column can be obtained by checking the numeric entries in that row.
Finally, after the title position information is located, the left and right boundaries of the title are determined from the corresponding column by traversing all data in that column to find the widest entry, subject to the condition that the entry lies within 30 pixels of the clustering center.
Determining the title position in step S3, as shown in fig. 3, specifically includes the following steps:
step 3.1, classifying the full-text results according to the clustering result, and recording the mean of each class;
step 3.2, counting the attributes corresponding to each class of data, wherein the attributes comprise the number of Chinese characters, the number of digits, and the sum of the numbers, and at the same time obtaining the title keywords of each column for identifying the semantics of that column;
step 3.3, locating the title keywords in the full-text result; a column whose title keywords include words such as item name or item code is defined as an item name candidate column, and a column that includes an amount keyword is defined as an amount candidate column. Typically, the sum of the amounts is much larger than the sum of the unit prices, so columns whose semantics cannot be recognized by keywords are identified according to their numeric sums in order to distinguish the amount column;
step 3.4, matching the column data obtained by semantic recognition according to the consistency of their row positions, to obtain the item name, amount, and unit price or quantity of each row.
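The attribute statistics of step 3.2 and the keyword-free fallback of step 3.3 can be sketched as follows. The function names, the dictionary layout, and the Unicode range used to detect Chinese characters are assumptions for this example; keyword matching and the verification against title candidates are not covered.

```python
import re


def column_attributes(texts):
    """Count Chinese characters, digits, and the numeric sum for one column."""
    chinese = sum(1 for t in texts for ch in t if "\u4e00" <= ch <= "\u9fff")
    digits = sum(1 for t in texts for ch in t if ch.isdigit())
    total = 0.0
    for t in texts:
        # Only strings that are entirely a plain number contribute to the sum.
        if re.fullmatch(r"\d+(\.\d+)?", t.strip()):
            total += float(t)
    return {"chinese": chinese, "digits": digits, "sum": total}


def locate_columns(columns):
    """columns maps a cluster label to the list of strings in that column.

    Returns (item_name_label, amount_label, attributes_per_label).
    """
    attrs = {label: column_attributes(texts) for label, texts in columns.items()}
    # The column with the most Chinese characters is taken as the item name column.
    item_col = max(attrs, key=lambda c: attrs[c]["chinese"])
    # Among the remaining columns, the largest numeric sum marks the amount column.
    amount_col = max((c for c in attrs if c != item_col),
                     key=lambda c: attrs[c]["sum"], default=None)
    return item_col, amount_col, attrs
```

The rule that the amount column has the largest sum follows the observation in step 3.3 that the sum of the amounts is typically much larger than the sum of the unit prices.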
Extracting the entry data of the corresponding column in step S3 specifically includes: first, sequentially extracting data in the vertical direction according to the coordinates of the left and right boundaries of the title; then, according to the position of each extracted item name, matching the amount information at the nearest distance in the vertical direction and extracting the unit price or quantity; and finally, identifying item names that wrap onto a new line and merging those that meet the merging condition, to obtain the specific entry data.
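The nearest-distance matching described above can be sketched as follows. The cell representation `(text, y_center)`, the function name, and the optional `max_gap` threshold are assumptions for this example; merging of wrapped item names is not covered because the source does not state the merging condition.

```python
def match_nearest_row(item_cells, value_cells, max_gap=None):
    """Pair each item-name cell with the value cell closest in the vertical direction.

    Each cell is a (text, y_center) tuple. If max_gap is given, a match whose
    vertical distance exceeds it is rejected and reported as None.
    """
    rows = []
    for name, y in item_cells:
        if not value_cells:
            rows.append((name, None))
            continue
        text, vy = min(value_cells, key=lambda c: abs(c[1] - y))
        if max_gap is not None and abs(vy - y) > max_gap:
            text = None
        rows.append((name, text))
    return rows
```

The same function can be applied once per numeric column (amount, unit price, quantity) to assemble a full row for every item name.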
Further, the step S4 of performing a validity check and matching correction on the entry data includes:
step 4.1, checking the number of decimal places of the numbers in the extracted entry data. The amount, unit price, and quantity in each column have a consistent number of decimal places, so the algorithm first counts the digits following the decimal point and determines the precision of each column of numbers. If the decimal point of an entry was not recognized, a decimal point is inserted according to the number of digits, which improves recognition accuracy. As shown in fig. 4, specifically: first, Chinese characters, dates, and special characters are removed; second, the number of entries in the column that contain a decimal point is counted, and when this number exceeds one half of the total number of rows, the position of the decimal point in each entry (counted from right to left) is recorded; then, the average of the first half of the counts (sorted from large to small) is taken as the decimal point position for the column, and entries lacking a decimal point are filled in accordingly;
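A simplified sketch of the decimal point repair in step 4.1 is given below. Note the substitution: instead of the averaging rule described in the text, this example uses the most common number of decimal places in the column, which is an assumption made to keep the example short. The function name is also hypothetical, and the removal of Chinese characters, dates, and special characters is assumed to have happened already.

```python
from collections import Counter


def fix_decimal_points(values):
    """Insert a missing decimal point using the column's dominant precision."""
    with_point = [v for v in values if "." in v]
    # Only repair when more than half of the entries carry a decimal point.
    if len(with_point) * 2 <= len(values):
        return list(values)
    # Number of digits after the decimal point, counted from the right.
    precision = Counter(
        len(v) - v.index(".") - 1 for v in with_point
    ).most_common(1)[0][0]
    fixed = []
    for v in values:
        if "." not in v and v.isdigit() and len(v) > precision > 0:
            v = v[:-precision] + "." + v[-precision:]
        fixed.append(v)
    return fixed
```

For instance, in a column where most entries have two decimal places, an entry recognized as `1600` would be repaired to `16.00`.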
step 4.2, if the amount, the unit price, and the quantity are all recognized in the current row, performing a consistency check; if the check fails, selecting the two values with the highest confidence according to the summed recognition confidence of their character strings, and back-calculating the third value from them to ensure data consistency;
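The consistency check and back calculation in step 4.2 can be sketched as follows, assuming that the relation being checked is amount = unit price × quantity (the source only speaks of a consistency check, so this relation, the tolerance, and the function name are assumptions).

```python
def reconcile(amount, unit_price, quantity, tol=0.01):
    """Each argument is a (value, confidence) pair.

    Returns (amount, unit_price, quantity) values; if the three are
    inconsistent, the least trusted one is recomputed from the other two.
    """
    a, p, q = amount[0], unit_price[0], quantity[0]
    if abs(p * q - a) <= tol:
        return a, p, q
    # Find the field with the lowest confidence; the other two are kept.
    weakest = min(
        ("amount", amount[1]),
        ("unit_price", unit_price[1]),
        ("quantity", quantity[1]),
        key=lambda t: t[1],
    )[0]
    if weakest == "amount":
        a = round(p * q, 2)
    elif weakest == "unit_price" and q != 0:
        p = round(a / q, 2)
    elif weakest == "quantity" and p != 0:
        q = round(a / p, 2)
    return a, p, q
```

Keeping the two most confident values and recomputing the third is equivalent to the selection rule in the text when all three fields are present.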
step 4.3, performing a matching check that combines the medical bill semantic dictionary with the recognition result: based on the medical bill semantic dictionary (which contains medical-specific titles and common combination information) and the candidate characters recognized by OCR, the data of specific titles are corrected.
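One possible reading of step 4.3 is sketched below: the OCR candidate characters for each position are combined, and the first combination found in the dictionary replaces the top-1 reading. The function name, the `max_combinations` cap, and the representation of candidates as per-character lists are assumptions; the source does not describe the matching procedure in detail.

```python
from itertools import product


def correct_with_dictionary(candidates, dictionary, max_combinations=10000):
    """candidates[i] lists the OCR alternatives for character i, best first.

    Returns the first candidate combination found in the dictionary, or the
    top-1 reading if no combination matches. Each inner list must be non-empty.
    """
    top1 = "".join(c[0] for c in candidates)
    if top1 in dictionary:
        return top1
    # Enumerate combinations in order of preference, with a cap on the work done.
    for n, combo in enumerate(product(*candidates)):
        if n >= max_combinations:
            break
        word = "".join(combo)
        if word in dictionary:
            return word
    return top1
```

Falling back to the top-1 reading keeps the original OCR output whenever the dictionary offers no support for a correction.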
The invention combines mean value clustering with common OCR recognition technology to locate the title position accurately, then extracts the corresponding entry data, and finally applies a multiple matching rule engine to accurately recognize the name, amount, unit price, and quantity of each item. This saves entry time for medical staff, avoids a large amount of repetitive labor, and increases entry speed; the recognition accuracy is high and errors caused by manual entry are avoided, which improves work efficiency and saves social resources.
As shown in fig. 5, an embodiment of the present invention further provides a medical bill image structuring apparatus based on mean value clustering and character recognition, including:
the recognition module is used for carrying out OCR character recognition on the acquired medical bill image to obtain the full-text character string information of the bill;
the clustering module is used for performing KMeans clustering on the bill full-text character string information;
the extraction module is used for determining title position information according to the clustering result and extracting the entry data of the corresponding column according to the title position information;
and the correction module is used for carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
Embodiments of the present invention also provide a computer readable medium having stored thereon instructions that, when executed by a processor, implement the steps of the medical bill image structuring method based on mean value clustering and character recognition of the present invention.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed by a computer, the computer instructions cause the computer to perform, in whole or in part, the procedures or functions described in the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer readable medium or transmitted from one computer readable medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer readable medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps of the method in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable medium, where the storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (4)

1. A medical bill image structuring method based on mean value clustering and character recognition is characterized by comprising the following steps:
step S1, performing OCR character recognition on the acquired medical bill image to obtain full-text character string information of the bill;
step S2, performing KMeans clustering on the bill full-text character string information, comprising the following steps:
step 2.1, extracting all the bill full-text character string information obtained in step S1, and initializing the vector of each character string with the left position of that character string;
step 2.2, setting the number of center points k to 10, wherein k is the number of clusters in the final clustering result;
step 2.3, randomly selecting k points as initial clustering centers, calculating the distance from each character string vector to each clustering center,
Figure FDA0003306364580000011
wherein x and y are the coordinate values of the left position of the character,
step 2.4, comparing the distances from each character string vector to the clustering centers, and assigning the character string to the cluster whose center is nearest;
step 2.5, recalculating each clustering center until convergence, and outputting the clustering result, wherein the objective function is as follows:
Figure FDA0003306364580000012
wherein u_i is the mean of all points in S_i, and S_i denotes the i-th cluster;
step S3, determining title position information according to the clustering result, and extracting entry data where the corresponding column is located according to the title position information;
in step S3, column segmentation is performed according to the clustering result and the OCR full-text recognition data, and the attributes corresponding to each column of data are counted, wherein the attributes include the number of Chinese characters, the number of digits, and the sum of the numbers; semantic analysis is performed on the attributes to determine the title position information;
in step S3, extracting the entry data of the corresponding column specifically includes: sequentially extracting data in the vertical direction according to the coordinates of the left and right boundaries of the title; matching, according to the position of each extracted item name, the amount information at the nearest distance in the vertical direction, so as to extract the unit price or quantity; and merging item names that wrap onto a new line and meet the merging condition, to obtain the specific entry data;
and step S4, carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
2. The medical bill image structuring method based on mean value clustering and character recognition according to claim 1, wherein the obtaining of the full-text character string information in the step S1 comprises:
preprocessing the medical bill image;
calculating the rotation angle of the preprocessed medical bill image;
performing rotation correction on the preprocessed medical bill image according to the rotation angle;
performing OCR recognition on the corrected medical bill image to obtain the bill full-text character string information, which comprises: character string content, character string coordinate positions, recognition confidence of the character strings, and candidate characters;
and filtering the full text character string information of the bill.
3. A medical bill image structuring device based on mean value clustering and character recognition is characterized by comprising:
the recognition module is used for carrying out OCR character recognition on the acquired medical bill image to obtain the full-text character string information of the bill;
the clustering module is used for performing KMeans clustering on the bill full-text character string information, specifically comprising:
extracting all bill full-text character string information, and initializing the vector of each character string with the left position of that character string;
setting the number of center points k to 10, wherein k is the number of clusters in the final clustering result;
randomly selecting k points as initial clustering centers, calculating the distance from each character string vector to each clustering center,
Figure FDA0003306364580000031
wherein x and y are the coordinate values of the left position of the character,
comparing the distances from each character string vector to the clustering centers, and assigning the vector to the cluster whose center is nearest;
recalculating each clustering center until convergence, and outputting a clustering result, wherein the objective function is as follows:
Figure FDA0003306364580000032
wherein u_i is the mean of all points in S_i, and S_i represents the i-th cluster;
the extraction module is used for determining title position information according to the clustering result and extracting the entry data of the corresponding column according to the title position information, specifically:
performing row segmentation according to the clustering result and the data recognized by full-text OCR, and counting the attributes corresponding to each row of data;
performing semantic analysis on the attributes to determine the title position information;
the specific steps for extracting the entry data of the corresponding column are as follows: sequentially extracting data in the vertical direction according to the coordinate information of the left and right boundaries of the title; matching, according to the extracted item name position information, the amount information at the nearest distance in the vertical direction to extract the unit price or quantity; and merging item name data that wraps onto a new line and meets the merging condition, to obtain the specific entry data;
and the correction module is used for carrying out validity check and correction on the entry data to obtain the structured data of the medical bill.
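The clustering module of claim 3 can be sketched in plain Python. The two formula images are not reproduced in this text, so the sketch assumes they are the standard Euclidean distance d = sqrt((x - c_x)^2 + (y - c_y)^2) between a left-position vector (x, y) and a center (c_x, c_y), and the standard K-means objective, the sum over clusters S_i of the squared distances from each point in S_i to its mean u_i. The function names, the `seed` parameter, and the `max_iter` cap are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) of a string's left position


def euclidean(p: Point, c: Point) -> float:
    """Assumed distance formula: straight-line distance in the image plane."""
    return math.hypot(p[0] - c[0], p[1] - c[1])


def kmeans(points: Sequence[Point], k: int = 10,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Cluster left-position vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    k = min(k, len(points))                 # cannot have more centers than points
    centers = rng.sample(list(points), k)   # randomly chosen initial centers
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every vector to the cluster whose center is nearest.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                      for p in points]
        if new_labels == labels:            # assignments stable: converged
            break
        labels = new_labels
        # Recompute each center as the mean u_i of its members S_i.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels, centers


def inertia(points: Sequence[Point], labels: List[int],
            centers: List[Point]) -> float:
    """Assumed objective: total squared distance of points to their centers."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    lefts = [(10, 40), (12, 80), (11, 120), (200, 41), (203, 79), (400, 42)]
    labels, centers = kmeans(lefts, k=3)
    print(labels, centers, inertia(lefts, labels, centers))
```

Because the initial centers are chosen at random, the result is a local optimum and can differ between seeds; a production system would typically run several initializations and keep the one with the lowest objective value.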
4. A computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method of any of claims 1-2.
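The column extraction steps named in the extraction module of claim 3 can be illustrated with a short sketch. It assumes image coordinates with y increasing downward, treats "within the left and right boundaries of the title" as horizontal overlap with the title box, and measures vertical distance between box centers. The `Box` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def extract_column(title: Box, boxes: List[Box]) -> List[Box]:
    """Collect boxes below the title that overlap its left/right boundaries,
    ordered from top to bottom."""
    column = [
        b for b in boxes
        if b is not title
        and b.top >= title.bottom                       # strictly below the title
        and b.left < title.right and b.right > title.left  # horizontal overlap
    ]
    return sorted(column, key=lambda b: b.top)


def nearest_vertical(item: Box, amounts: List[Box]) -> Optional[Box]:
    """Pick the amount whose vertical center is closest to the item name's."""
    if not amounts:
        return None
    cy = (item.top + item.bottom) / 2
    return min(amounts, key=lambda a: abs((a.top + a.bottom) / 2 - cy))


if __name__ == "__main__":
    name_title = Box("Item", 10, 0, 100, 20)
    price_title = Box("Unit price", 200, 0, 260, 20)
    boxes = [
        name_title, price_title,
        Box("Gauze", 12, 60, 80, 80), Box("3.00", 205, 61, 240, 79),
        Box("Aspirin", 12, 30, 90, 50), Box("12.50", 205, 31, 250, 49),
    ]
    names = extract_column(name_title, boxes)
    prices = extract_column(price_title, boxes)
    for n in names:
        match = nearest_vertical(n, prices)
        print(n.text, match.text if match else None)
```

Merging of wrapped item names is left out because the claim does not define the merging condition.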
CN202110193283.6A 2021-02-20 2021-02-20 Medical bill image structuring method and device and computer readable medium Active CN112926577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110193283.6A CN112926577B (en) 2021-02-20 2021-02-20 Medical bill image structuring method and device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110193283.6A CN112926577B (en) 2021-02-20 2021-02-20 Medical bill image structuring method and device and computer readable medium

Publications (2)

Publication Number Publication Date
CN112926577A (en) 2021-06-08
CN112926577B (en) 2021-11-26

Family

ID=76170014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110193283.6A Active CN112926577B (en) 2021-02-20 2021-02-20 Medical bill image structuring method and device and computer readable medium

Country Status (1)

Country Link
CN (1) CN112926577B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762100B (en) * 2021-08-19 2024-02-09 杭州米数科技有限公司 Method, device, computing equipment and storage medium for extracting and standardizing names in medical notes

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10984337B2 (en) * 2012-02-29 2021-04-20 Microsoft Technology Licensing, Llc Context-based search query formation
US10796357B2 (en) * 2017-04-17 2020-10-06 Walmart Apollo, Llc Systems to fulfill a picked sales order and related methods therefor
CN111062259B (en) * 2019-11-25 2023-08-25 泰康保险集团股份有限公司 Table identification method and apparatus
CN111461062B (en) * 2020-04-23 2023-12-19 国网吉林省电力有限公司 Structured extraction method for bill image text information
CN111784587B (en) * 2020-06-30 2023-08-01 杭州师范大学 Invoice photo position correction method based on deep learning network

Also Published As

Publication number Publication date
CN112926577A (en) 2021-06-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant