CN117116409A - Test report structured recognition method based on deep learning and automatic error correction - Google Patents

Test report structured recognition method based on deep learning and automatic error correction Download PDF

Info

Publication number
CN117116409A
Authority
CN
China
Prior art keywords
recognition
text
error correction
information
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311143625.9A
Other languages
Chinese (zh)
Inventor
徐增敏
吴金儒
李明辉
刘龙飞
原孟炜
杜声茂
蒙儒省
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin Anview Technology Co ltd
Guilin University of Electronic Technology
Original Assignee
Guilin Anview Technology Co ltd
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin Anview Technology Co ltd, Guilin University of Electronic Technology filed Critical Guilin Anview Technology Co ltd
Priority to CN202311143625.9A priority Critical patent/CN117116409A/en
Publication of CN117116409A publication Critical patent/CN117116409A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of medical laboratory sheet recognition, and in particular to a method for structured recognition of test report sheets based on deep learning and automatic error correction. A differentiable binarization network is improved for the scene characteristics of medical laboratory sheet detection and recognition: text detection and recognition are performed on the test report image by a differentiable binarization network with cascade sparse query, yielding a structured recognition result of the test report. In addition, to address problems in the structured result output by the OCR module, such as misaligned items, single-item information split across rows, and multiple items stuck together in one row, the item information in the sorted structured table is automatically corrected by means of a BK tree and an AC automaton. The recognized item information is then merged into one new table, which is finally output in JSON format. The invention makes up for the shortcomings of the prior art, realizes automatic acquisition of test report information, and improves the efficiency of data collection and organization.

Description

Test report structured recognition method based on deep learning and automatic error correction
Technical Field
The invention relates to the technical field of medical laboratory sheet recognition, and in particular to a structured recognition method for test report sheets based on deep learning and automatic error correction.
Background
Optical character recognition (OCR) technology spans the two major areas of computer vision and natural language processing. Over more than 40 years of development, OCR has always had a strong industrial application background, and it is one of the few computer-science fields that was driven jointly by industry and academia from the start. Conventional OCR is mainly based on feature extraction methods from signal processing, image-structure methods, various operators, and various mapping techniques; for some special character types and special application scenarios, purpose-built hand-crafted feature extraction techniques also exist. In recent years deep learning has advanced rapidly, and OCR based on deep neural networks has achieved excellent results: feature learning is performed automatically through a multi-layer network structure, overturning the traditional hand-crafted feature extraction process and noticeably improving accuracy.
OCR based on deep neural networks mainly comprises two parts: a text detection network and a text recognition network. Modern text detection networks are mostly built on the idea of image segmentation so that they can adapt to text regions of various irregular shapes. The central idea is to first classify at the pixel level, deciding whether each pixel belongs to a text object, to obtain a probability map of the text regions; the minimum enclosing polygon of each candidate region is then drawn, linking the scattered pixel blocks into a bounding box. Modern text recognition networks are widely based on improvements and adjustments of the convolutional recurrent neural network (CRNN). Its central idea is to use deep convolutional layers to generate basic image features, feed them into a deep bidirectional long short-term memory network that absorbs contextual semantic information for sequence feature training, and finally introduce a connectionist temporal classification (CTC) loss function to achieve end-to-end recognition of variable-length sequences, solving the problem that characters cannot be aligned during training.
Existing OCR methods for medical laboratory sheets place high demands on image quality: when the image quality is poor, or when impurities, stains or colour distortion are present, recognition performance degrades sharply. In addition, specific characters or problematic annotations are easily misrecognized during character recognition, leading to misinterpretation of the results.
Disclosure of Invention
The invention aims to provide a test report structured recognition method based on deep learning and automatic error correction, so as to remedy the shortcomings of existing intelligent recognition of detection items and results in medical test reports.
In order to achieve the above object, the present invention provides a method for structured recognition of test reports based on deep learning and automatic error correction, comprising the following steps:
step 1: preprocessing the test report photo to improve its imaging quality;
step 2: deskewing and cropping the table using the Hough transform;
step 3: passing the image through the OCR module a first time, locating key information and cropping regions accordingly;
step 4: passing the cropped regions through the OCR module a second time for text detection and recognition, obtaining a structured recognition result;
step 5: automatically checking and sorting the item information;
step 6: automatically correcting the item information in the sorted structured table by means of a BK tree and an AC automaton;
step 7: merging and outputting the item information.
Optionally, preprocessing the test report photo comprises the following steps:
inputting the test report image and converting the RGB image into a grayscale image by grayscale conversion;
processing the image with Gaussian filtering to reduce image noise;
processing images with non-uniform illumination or background noise using adaptive threshold binarization.
Optionally, in the process of deskewing and cropping the table with the Hough transform, the position of the test report table is found by the Hough transform, the tilted test report photo is straightened, it is determined whether the report is a single-column or multi-column test sheet, and the region occupied by the content of each column of the table is then cropped.
Optionally, the first pass through the OCR module to locate key information and crop regions comprises the following steps:
locating the detection box coordinates of all text regions with the text detection network in the OCR module;
recognizing the text content within all text regions with the text recognition network in the OCR module;
desensitizing sensitive information according to the text detection box coordinates and contents obtained by the OCR module;
according to the text detection box coordinates and contents obtained by the OCR module, finding the detection box corresponding to the 'result' field in the table and cropping the image into a region A1, covering everything to the left of (and including) the column of that box, and a region A2, covering only the column in which the box lies, wherein region A1 contains the key item-name information (the item's Chinese or English name) and region A2 contains the assay results corresponding to the item names.
Optionally, when the cropped regions A1 and A2 are passed through the OCR module a second time for text detection and recognition, the recognition result of region A1 comprises an excel table of item information and the recognition result of region A2 comprises a single-column data frame of assay results.
Optionally, the text detection network in the OCR module is a differentiable binarization network with cascade sparse query, used to locate text positions in the test report table and output the rectangular box coordinates of those positions; the text recognition network is a CRNN, used to recognize the text content inside the rectangular boxes detected by the text detection network.
Optionally, automatically checking and sorting the item information comprises the following steps:
merging the contents of all columns into the first column and unifying the table information as strings;
correctly merging the contents of the cells within a column, gluing back English, numeric or Chinese fragments that ended up alone on a row;
correctly splitting the contents of the cells within a column, separating the names of two detection items that should be two entries but were stuck together on the same row.
Optionally, step 6 is carried out as follows: common item information is first collected and a medical laboratory sheet item information dictionary is built; the result obtained in step 5 is post-processed and the item information is extracted with an AC automaton; a BK tree is then built from all correct item information of the medical laboratory sheet, and for the incorrect item information left over after the AC automaton has extracted the correct results, the correct item information that best matches the erroneous item information is quickly found in the built BK tree, thereby achieving error correction.
Optionally, in merging and outputting the item information, the number of items is counted, the corresponding number of entries is taken from the table of item recognition results, the recognized item information is merged into one new table, and the consolidated new table is finally output in JSON format.
Optionally, if the test report is determined in step 2 to be a multi-column test report, the processing of steps 4 to 7 is applied to each column of the report, obtaining the structured recognition result of each column separately.
The invention provides a structured recognition method for test reports based on deep learning and automatic error correction. A differentiable binarization network is improved for the scene characteristics of medical laboratory sheet detection and recognition: a differentiable binarization network with cascade sparse query performs text detection and recognition on the test report image, which has first been preprocessed with techniques such as Gaussian filtering and the Hough transform, to obtain the structured recognition result of the test report. In addition, to address problems in the structured result output by the OCR module such as misaligned items, single-item information split across rows, and multiple items stuck together in one row, the item information in the sorted structured table is automatically corrected by means of a BK tree and an AC automaton; the recognized item information is then merged into one new table, which is finally output in JSON format. Building on intelligent recognition of medical test report detection items and results, the invention makes up for the shortcomings of the prior art, realizes automatic acquisition of test report information, and improves the efficiency of data collection and organization.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the steps of a method for structured recognition of an assay report based on deep learning and automatic error correction according to the present invention.
FIG. 2 is a schematic diagram of the model structure of the cascade sparse query differentiable binarization network (CSQ-DBNet) of the present invention.
Fig. 3 is a schematic representation of the sparse convolution of the present invention.
Fig. 4 is a schematic view of region clipping for key information according to "result" frame coordinates in an embodiment of the present invention.
FIG. 5 is a schematic representation of the item name dictionary tree structure of the present invention.
FIG. 6 is a schematic representation of node attributes in a dictionary tree of the present invention.
FIG. 7 is a schematic diagram of a process for constructing a fail table in accordance with an embodiment of the present invention.
FIG. 8 is a schematic diagram of target string matching for an embodiment of the present invention.
Fig. 9 is a schematic view of an item name BK tree structure according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
Referring to FIG. 1, the invention provides a method for structured recognition of test reports based on deep learning and automatic error correction, comprising the following steps:
s1: preprocessing the test report photo to improve its imaging quality;
s2: deskewing and cropping the table using the Hough transform;
s3: passing the image through the OCR module a first time, locating key information and cropping regions accordingly;
s4: passing the cropped regions through the OCR module a second time for text detection and recognition, obtaining a structured recognition result;
s5: automatically checking and sorting the item information;
s6: automatically correcting the item information in the sorted structured table by means of a BK tree and an AC automaton;
s7: merging and outputting the item information.
The following further describes the implementation steps and embodiments:
step 1: the pretreatment of imaging effects such as image erosion and the like is carried out according to the image characteristics of the test report photo, and the specific implementation process is as follows:
step 1.1: the picture is input to read the gray scale map.
The process of converting a colour image into a grayscale image is called graying of the image. The colour of each pixel in a colour image is determined by its three components R, G and B, each of which can take 256 values, so a single pixel can take about 16.78 million (256×256×256) different colours. A grayscale image is a special colour image whose R, G and B components are equal, so a single pixel has a range of only 256 values; in digital image processing, images of various formats are therefore usually converted to grayscale first, to reduce the amount of subsequent computation. Like a colour image, a grayscale image still reflects the distribution and characteristics of the chromaticity and brightness levels of the image as a whole and locally:
Gray = 0.299×R + 0.587×G + 0.114×B (1)
In this common grayscale formula the weights 0.299, 0.587 and 0.114 are estimated from the perceived intensities of the red, green and blue channels by the human eye. Following the formula, the R, G and B values of each pixel are read in turn, the gray value is computed (rounded to an integer) and assigned to the corresponding position of the new image; the conversion is finished after every pixel has been visited once.
Step 1.2: using gaussian filtering, the picture noise is reduced, the size of the filter kernel is 7 x 7.
The whole image is weight-averaged: the value of each pixel is obtained as a weighted average of that pixel and the other pixel values in its neighbourhood. The concrete operation of Gaussian filtering is: each pixel in the image is scanned with a template (convolution kernel, or mask), and the value of the pixel at the centre of the template is replaced by the weighted average gray value of the pixels in the neighbourhood defined by the template.
The most important parameter of the Gaussian filter template is the standard deviation σ of the Gaussian distribution, which expresses how dispersed the data are. If σ is small, the centre coefficient of the generated template is large and the surrounding coefficients are small, so the smoothing effect on the image is weak; conversely, if σ is large, the coefficients of the generated template differ little from one another, the template approaches a mean filter, and the smoothing effect on the image is strong.
Step 1.3: binarization. The principle of adaptive binarization is as follows: first, the original gray-scale image is divided into small blocks (usually square or rectangular) of equal size, called local areas. For each local region, an adaptive threshold is calculated for classifying pixels of the region into foreground (white) and background (black). The calculation of the adaptive threshold is typically based on local region pixel value statistics, such as local region average pixel values or gaussian weighted average pixel values. For each pixel, it is compared to an adaptive threshold for the corresponding local region. If the pixel value is greater than the threshold value, the pixel is set to white (foreground), otherwise set to black (background). By using an adaptive threshold, adaptive binarization is able to better preserve useful image information when processing images with non-uniform illumination conditions or background noise.
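A minimal sketch of this preprocessing stage (steps 1.1 to 1.3), assuming OpenCV (cv2) is used; the 7×7 Gaussian kernel follows step 1.2, while the block size and offset of the adaptive threshold are illustrative values rather than parameters given in the patent:

```python
import cv2

def preprocess_report(path: str):
    """Step 1 sketch: grayscale conversion, Gaussian denoising, adaptive binarization."""
    image = cv2.imread(path)                                   # BGR test report photo
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)             # Gray = 0.299R + 0.587G + 0.114B
    blurred = cv2.GaussianBlur(gray, (7, 7), 0)                # 7x7 kernel as in step 1.2
    # Adaptive threshold: each pixel is compared with a Gaussian-weighted mean of its
    # local neighbourhood, which copes with uneven illumination and background noise.
    binary = cv2.adaptiveThreshold(
        blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, blockSize=31, C=10)                 # blockSize/C are illustrative
    return binary
```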
Step 2: the position of the three-line table of the test report is found by the Hough transform, the tilted laboratory sheet photo is straightened, it is determined whether the sheet is a single-column or multi-column laboratory sheet, and the region occupied by the content of each column of the three-line table is then cropped. The specific implementation is as follows:
and finding out straight lines in all horizontal directions by using Hough transformation, calculating the slope of the straight lines, and calculating the inclination angle by taking the average value of the slope of the obtained straight lines in all horizontal directions, wherein the calculation formula is as follows:
wherein θ represents the tilt angle,representing the average of the slopes. After the tilt angle is obtained, the image is rotated so as to be kept horizontal.
The principle of detecting a straight line by Hough transform is as follows:
Using a polar coordinate system to represent a straight line, the line can be expressed as:
y = (-cosθ / sinθ)·x + r / sinθ (4)
which simplifies to:
r = x·cosθ + y·sinθ (5)
where r is the radius in polar form of the point (x, y) and θ represents the angle of the ray from the positive half axis (typically the x-axis) rotated counterclockwise to the line segment OQ where O is the origin and Q is the point (x, y).
Generally, for any point (x0, y0), equation (4) defines a family of straight lines passing through that point: each pair (r, θ) represents one line through (x0, y0). Viewed in the parameter space (where r and θ are the variables), a fixed image point therefore traces out a curve, and all image points lying on the same straight line produce parameter-space curves that pass through a common point (r, θ), which is exactly the parameter pair of that line. In practice the data are affected by noise, so the curves do not all pass through exactly the same point, and there may be more than one such point of concentration; the parameter space is therefore divided into sub-regions, the number of curves crossing each sub-region is counted, and the sub-region receiving the most intersections gives the parameters of the original line.
The Hough transform is then used on the straightened image to obtain the straight lines in the horizontal and vertical directions. First, from all detected horizontal lines, the maximum and minimum of the vertical coordinates of their left and right endpoints are taken as cropping boundaries. The Hough transform is then used to check whether vertical lines exist; if so, the horizontal coordinates of the two endpoints of each vertical line are obtained, their average is taken as a cropping boundary, and the image is cut into left and right parts.
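A sketch of the deskew part of this step, again assuming OpenCV; the Hough parameters (threshold, minimum line length, near-horizontal slope cut-off) are illustrative and would need tuning to real laboratory sheet images, and the column cropping described above is omitted for brevity:

```python
import cv2
import numpy as np

def deskew(binary):
    """Step 2 sketch: estimate the tilt angle from near-horizontal Hough lines and rotate."""
    lines = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=150,
                            minLineLength=binary.shape[1] // 3, maxLineGap=10)
    if lines is None:
        return binary, 0.0
    slopes = []
    for x1, y1, x2, y2 in lines[:, 0]:
        if x2 != x1 and abs((y2 - y1) / (x2 - x1)) < 0.3:     # keep near-horizontal lines only
            slopes.append((y2 - y1) / (x2 - x1))
    if not slopes:
        return binary, 0.0
    angle = np.degrees(np.arctan(np.mean(slopes)))            # theta = arctan(mean slope), eq. (2)
    h, w = binary.shape
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, rot, (w, h), flags=cv2.INTER_LINEAR), angle
```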
Step 3: on its first pass through the OCR module, key information in the test report table is located in the laboratory sheet image, sensitive information is desensitized, and the image is cropped into column regions according to the key information. The specific implementation is as follows:
step 3.1: the detection box coordinates of all text regions are located through a text detection network in the OCR module.
The invention provides a cascade sparse query differentiable binarization network (CSQ-DBNet), which improves the differentiable binarization network (DBNet) for the scene characteristics of medical laboratory sheet detection and recognition; the model structure is shown in FIG. 2.
First, the complete image is fed into the differentiable binarization text detection model to find the text box positions of all text in the image. Image features with length and width equal to 1/4 of the original image are extracted through a multi-level feature pyramid network (FPN). Because, in the laboratory sheet scene, the detection model needs to be more sensitive to fine-grained table boundaries and text, more fine-grained local features are extracted.
On top of the original DBNet, and considering that most text boxes in the laboratory sheet recognition scene are small-size targets, the invention adopts a cascade sparse query (CSQ) module to improve the speed and accuracy of small-target detection. The basic idea of CSQ is to determine the approximate positions of small objects on a coarse-grained feature map and then, on the next (finer) layer, predict the possible positions of the small objects using sparse convolution: the convolution is evaluated and an output produced only where the centre of the kernel falls on a non-zero value, so in the example of FIG. 3 only two convolution computations are needed. This greatly reduces the convolution operations required on the high-resolution feature maps and improves small-object detection accuracy.
In the invention, sparse convolution is applied to the feature maps at 1/4, 1/8, 1/16 and 1/32 of the original image size. For the first three scales, the sparse convolution is driven by a small-target query (Sq) derived from the following (coarser) feature map; the query keeps the original values at positions where a small target is likely to exist and sets the values at positions where a small target is unlikely to 0. These small-target queries are trainable, so unlike the original DBNet the model adds, for each feature map, the loss of the small-target query to the training loss; this query loss is a binary classification loss that decides whether a position belongs to a small target.
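A toy illustration of the sparse convolution idea of FIG. 3, assuming a plain NumPy implementation written for clarity rather than speed; the real CSQ module operates on network feature maps, not raw arrays like this:

```python
import numpy as np

def sparse_conv2d(feat: np.ndarray, kernel: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Evaluate the convolution only where the small-target query is non-zero;
    every other output position stays 0, so most of the dense computation is skipped."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(feat, pad)
    out = np.zeros_like(feat, dtype=float)
    for y, x in zip(*np.nonzero(query)):          # candidate small-target positions only
        out[y, x] = float((padded[y:y + k, x:x + k] * kernel).sum())
    return out

# With a 5x5 feature map and a query marking two positions, only two kernel
# evaluations are performed, as in the example of FIG. 3.
```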
A channel attention module is then inserted at the feature-map concatenation step, so that the model can learn to assign different weights to feature maps of different scales.
Next, the concatenated image features pass through a 3×3 convolution layer and two deconvolution layers with stride 2 to generate a probability map and a threshold map. Finally, the approximate binary map obtained by differentiable binarization from the predicted probability map and threshold map is used to generate accurate text boxes, and the coordinate information of the text boxes is returned. The calculation formula is as follows:
B(i,j) = 1 / (1 + e^(-l·(P(i,j) - T(i,j))))
where B is the approximate binary map; P is the probability map generated by the network; T is the threshold map generated by the network; and l is an amplification factor, set to 50 according to experimental results.
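A one-line sketch of this differentiable binarization step in NumPy (the standard DBNet formulation; the amplification factor l = 50 follows the text above):

```python
import numpy as np

def approximate_binary_map(P: np.ndarray, T: np.ndarray, l: float = 50.0) -> np.ndarray:
    """B = 1 / (1 + exp(-l * (P - T))): a steep sigmoid that approximates hard binarization
    of the probability map P against the threshold map T while staying differentiable."""
    return 1.0 / (1.0 + np.exp(-l * (P - T)))
```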
Step 3.2: text content within all text regions is identified through a text recognition network in the OCR module.
The text regions obtained in step 3.1 are fed into a convolutional recurrent neural network to recognize the text content. The detected and localized text region image first enters the convolutional layers, which convert the original image into a feature sequence; this sequence is then fed into the recurrent network, a deep bidirectional long short-term memory (BiLSTM) network, which processes the feature sequence of the image and gives the predicted text content.
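A compact sketch of the CRNN text recognition network described above, assuming PyTorch; the layer counts and channel widths are illustrative, not the configuration used in the patent:

```python
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Conv feature extractor -> bidirectional LSTM -> per-timestep character logits (for a CTC loss)."""
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                    # keep horizontal resolution for the sequence
        )
        feat_h = img_height // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)                # num_classes includes the CTC blank

    def forward(self, x):                                    # x: (N, 1, H, W) grayscale text line
        f = self.cnn(x)                                      # (N, C, H', W')
        n, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(n, w, c * h)     # one feature vector per horizontal step
        out, _ = self.rnn(seq)
        # (N, W', num_classes); transpose to (W', N, num_classes) before feeding nn.CTCLoss
        return self.fc(out).log_softmax(-1)
```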
Step 3.3: sensitive information is desensitized according to the text detection box coordinates and contents obtained by the OCR module. The sensitive information is compared with each text in the OCR recognition result through a sequence-matching algorithm, the edit distance between the sensitive information and each text is computed, and the text with the shortest edit distance is masked.
The edit distance is the minimum number of edits needed to change one string into another, where the allowed edit operations are inserting, deleting, or replacing a character. The texts of the detection and recognition results from step 3.2 are sequence-matched against the sensitive information; the result is a candidate text list sorted in ascending order of the edit distance between each text and the sensitive information. The first item of the list has the shortest edit distance to the sensitive information and is taken as the text to be processed.
Depending on the string lengths of the sensitive information and of the text to be processed, different masking modes are selected: if the text to be processed is longer than the sensitive information, the rectangular area of the text to be processed is masked; if it is shorter, the rectangular areas of the text to be processed and of the next text in the candidate list are masked together; if the sensitive information and the text to be processed have equal length, or the sensitive information equals the text to be processed plus a colon, the rectangular area of the text to be processed is extended to twice its width and then masked.
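A sketch of this desensitization logic with a plain dynamic-programming edit distance; the masking is shown for the simplest case only (a filled rectangle over the best-matching box), and the three length-dependent modes above are collapsed for brevity:

```python
import cv2

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: insert, delete and replace each cost 1."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def mask_sensitive(image, ocr_results, sensitive: str):
    """ocr_results: list of (text, (x1, y1, x2, y2), confidence) from the first OCR pass.
    The candidate whose text is closest (smallest edit distance) to the sensitive string is masked."""
    candidates = sorted(ocr_results, key=lambda r: edit_distance(r[0], sensitive))
    _, (x1, y1, x2, y2), _ = candidates[0]
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 0, 0), thickness=-1)   # fill the box
    return image
```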
Step 3.4: according to the text detection box coordinates and contents obtained by the OCR module, the detection box corresponding to the 'result' field in the three-line table header is found, and the image is cropped into a region A1, covering everything to the left of (and including) the column of that box, and a region A2, covering only the column in which the box lies. Region A1 contains the key item-name information, such as the item's Chinese name and English name, and region A2 contains the assay results corresponding to the item names.
All detection and recognition results are collected; the data format is ['text content', (x1, y1, x2, y2), confidence], where (x1, y1) and (x2, y2) are the coordinates of the upper-left and lower-right corners of the text region box. The text box whose text content is 'result' is searched for among the detection and recognition results, and its coordinate information is extracted and provided to the subsequent cropping step.
A second cropping is then performed according to the coordinates of the text box whose detected content is 'result', cutting the original laboratory sheet image into a region that contains both the '(detection) item' column and the 'result' column and a region that contains only the 'result' column; the coordinate information of 'result' is (x11, y11, x12, y12). In computer vision the upper-left corner of an image is the origin of the two-dimensional coordinate system, the horizontal direction from left to right is the positive x-axis, and the vertical direction from top to bottom is the positive y-axis. Therefore, after fine-tuning the value ranges experimentally, the slice [y11:, :x12] is taken as region A1, containing both the '(detection) item' column and the 'result' column, and the slice [y11:, x11:x12] is taken as region A2, containing only the 'result' column, as shown in FIG. 4.
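A sketch of this cropping step as NumPy-style slicing, assuming the deskewed image array and the 'result' header box (x11, y11, x12, y12) from the first OCR pass:

```python
def crop_regions(image, result_box):
    """Split the report into region A1 (item-name columns plus result column) and
    region A2 (result column only), based on the header box of the 'result' field."""
    x11, y11, x12, y12 = result_box
    a1 = image[y11:, :x12]        # below the header row, everything left of the result column's right edge
    a2 = image[y11:, x11:x12]     # the result column only
    return a1, a2
```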
Step 4: on its second pass through the OCR module, text detection and recognition are performed on the cropped regions of the laboratory sheet image to obtain the structured recognition result of the laboratory sheet. The specific implementation is as follows:
step 4.1: and (3) inputting the image A1 which is obtained in the step (3) and contains the item column and the result column into a differentiable binarization text detection network-convolution cyclic neural network of the cascade sparse query to perform text region identification, and obtaining detection identification results of all texts in the region.
Step 4.2: and calling a pandas toolkit through the structured output module, and writing the text recognition result into an excel table file according to the relative position in the image to form structured form data.
Step 4.3: and inputting the picture A2 only comprising the 'result' column into a differentiable binarized text detection network-convolution cyclic neural network of the cascade sparse query to obtain the identification results of all texts in the 'result' column. And calling the pandas toolkit to convert the pandas toolkit into a single-column data frame and writing the single-column data frame into an excel table file.
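A sketch of the structured output of steps 4.2 and 4.3, assuming pandas; grouping the recognized boxes by their relative positions is simplified here to sorting by the box's top-left coordinates, and the output file names are illustrative:

```python
import pandas as pd

def write_structured_tables(a1_results, a2_results):
    """a1_results / a2_results: lists of (text, (x1, y1, x2, y2), confidence) from the second OCR pass."""
    # Region A1: order boxes top-to-bottom, then left-to-right (simplified row grouping).
    a1_sorted = sorted(a1_results, key=lambda r: (r[1][1], r[1][0]))
    items = pd.DataFrame([[r[0]] for r in a1_sorted], columns=["item"])
    items.to_excel("items.xlsx", index=False)

    # Region A2: a single-column data frame of assay results, ordered top-to-bottom.
    a2_sorted = sorted(a2_results, key=lambda r: r[1][1])
    results = pd.DataFrame([r[0] for r in a2_sorted], columns=["result"])
    results.to_excel("results.xlsx", index=False)
    return items, results
```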
Step 5: the structured table containing the item names is automatically checked and sorted. The specific implementation is as follows:
Step 5.1: the contents of all columns are merged into the first column, and the table information is unified as strings.
Step 5.2: the contents of the cells within a column are correctly merged, gluing back English, numeric or Chinese fragments that ended up alone on a row.
Step 5.3: the contents of the cells within a column are correctly split, separating the names of two detection items that should be two entries but were stuck together on the same row.
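A sketch of the check-and-sort step using simple heuristics; the regular expressions that decide what counts as a stray fragment (step 5.2) or a glued pair of item names (step 5.3) are illustrative assumptions, not rules given in the patent:

```python
import re

def tidy_items(rows):
    """rows: list of recognized cell strings from region A1, top to bottom."""
    tidy = []
    for cell in (str(r).strip() for r in rows):
        # Step 5.2: a row that is only English letters / digits is treated as the tail
        # of the previous item name and glued back onto it.
        if tidy and re.fullmatch(r"[A-Za-z0-9%.#\-]+", cell):
            tidy[-1] = tidy[-1] + cell
            continue
        # Step 5.3: two item names stuck together on one row are split on whitespace
        # between two Chinese character runs.
        parts = re.split(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])", cell)
        tidy.extend(p for p in parts if p)
    return tidy
```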
Step 6: the item information in the sorted structured table is automatically corrected through an AC automaton and a BK tree. The specific implementation is as follows:
Step 6.1: common item names are first collected and a medical laboratory sheet item-name dictionary is built; the result obtained in step 5 is post-processed and the item names are extracted with an AC automaton.
Step 6.1.1: and establishing a dictionary tree.
A dictionary tree (trie) is an ordered tree used to hold an associative array. All descendants of a node share the same prefix, namely the string corresponding to that node; the string represented by a child node consists of the node's own string plus the characters on the path leading to that child, and the root node corresponds to the empty string. The underlying idea, as in prefix-function matching, is to find a prefix of the pattern that equals some suffix of the text already read, so that the current matching progress is preserved as far as possible. Transferred to the dictionary tree, what needs to be found is the node representing the string that is the longest proper suffix of the string represented by the current node. The correct item-name information base is unified into strings and denoted the pattern strings P; the table information obtained in step 4 is denoted the target string T. The dictionary tree is built from the pattern strings, and the AC automaton later performs state jumps character by character over the input string until the whole input has been scanned, as shown in FIG. 5.
Step 6.1.2: a fail pointer and an exit array are established.
Before the fail table is built, each node's attributes must be specified: its parent node, child node list, fail node, node value, and whether it is a terminal node, as shown in FIG. 6. The fail table stores the fail pointer used on a mismatch: it gives the position from which matching can continue after the current position fails to match, so that no backtracking is needed during matching. The construction uses breadth-first search (BFS): following the definition of the fail pointer, the built dictionary tree is traversed level by level and the fail pointer of each state is recorded, which facilitates the subsequent recognition; the fail pointer of a node points to the node representing the longest proper suffix of that node's string that also exists in the dictionary tree. The exit array stores the length of each item string, which makes it convenient to output the recognized item name correctly, as shown in FIG. 7.
Step 6.1.3: pattern matching. The core of matching is to scan the target string from its head, character by character, against the AC automaton: on a match, counting proceeds; on a mismatch, the automaton jumps via the fail pointer and tries to continue matching, until the whole string has been scanned. In the matching logic, the pointer p initially points to the root and then traverses each character of the target string. The traversal procedure is as follows:
(1) Let the current target string character be i. While condition 1 holds (i is not among p's child nodes and p is not the root node), repeatedly move p to its fail node, until condition 1 no longer holds;
(2) If i is among p's child nodes, p moves to that child node; otherwise p points to the root node;
(3) Define temp to point to p and start attempting a match; if temp is not the root node, enter the loop, otherwise exit:
(3.a) if temp is the tail of a pattern string, check whether temp has already been matched successfully; if not, record the pattern string's successful match count as 1, otherwise add one to its existing count;
(3.b) set temp to temp.fail.
The characters of the T string are recognized in order against the AC automaton built from the dictionary tree; when the keyword at the current node can no longer be matched, matching continues from the node pointed to by the current node's fail pointer. After a match succeeds, the length of the pattern string is obtained from the exit array's return value, so the item name can be output, as shown in FIG. 8.
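A compact sketch of the trie, fail pointers and matching logic of steps 6.1.1 to 6.1.3, assuming a plain Python implementation; dictionaries of children stand in for the node attributes of FIG. 6, and the "exit" information is stored as the lengths of the patterns ending at each node:

```python
from collections import deque

class ACAutomaton:
    """Aho-Corasick automaton: build a trie of correct item names, add fail pointers by BFS,
    then scan the recognized table text and report which item names occur where."""
    def __init__(self, patterns):
        self.goto = [{}]                  # one child map per node
        self.fail = [0]
        self.out = [[]]                   # "exit" info: lengths of patterns ending at this node
        for p in patterns:
            self._insert(p)
        self._build_fail()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({}); self.fail.append(0); self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(len(pattern))

    def _build_fail(self):
        queue = deque(self.goto[0].values())            # depth-1 nodes keep fail = root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]                    # follow fail pointers upwards
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] += self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Yield (end_index, matched_item_name) for every dictionary item found in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]                  # jump via fail pointer on mismatch
            node = self.goto[node].get(ch, 0)
            for length in self.out[node]:
                yield i, text[i - length + 1:i + 1]
```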
Step 6.2: a BK tree is built from all correct item names of the test report; for the incorrect item names left over after the AC automaton has extracted the correctly recognized OCR results, the correct item name that best matches the erroneous one is quickly found in the built BK tree, thereby achieving error correction.
Step 6.2.1: and establishing a BK tree.
The BK tree is built from the correct item names and a suitable edit-distance threshold is chosen; for an incorrect item name, the BK tree search algorithm is then used to find the corresponding correct item name in the built tree. The specific BK tree construction procedure is as follows (taking FIG. 9 as an example, where the numbers above the arrows are edit distances):
(1) Randomly taking a word from the entered words as a root node, e.g. "neutrophil absolute value";
(2) Randomly continuing to select the next word, such as 'eosinophil absolute value', calculating the edit distance between the words, and taking the 'eosinophil absolute value' node as a new branch of the 'neutrophil absolute value' root node;
(3) Continuing to select the next word, e.g. "absolute basophil value", still traversing from the root node, the edit distances of "absolute basophil value" and "neutrophil absolute value" are calculated, at which point the branch of edit distance 1 is found to already exist ("eosinophil absolute value" and edit distance of "neutrophil absolute value" is 1). At this time, the edit distance of "absolute basophil value" and "eosinophil absolute value" is calculated along this branch, whereas the "eosinophil absolute value" node has no branch whose edit distance is 1, and therefore "absolute basophil value" is mounted under the "eosinophil absolute value" node as a new branch of the "eosinophil absolute value" node.
(4) The remaining words are sequentially selected and continuously expanded according to the steps 2 and 3, so that a BK tree is constructed.
The BK tree thus constructed has the following characteristics: all descendant nodes under the branch 1 of the root node 'neutrophil absolute value' and the editing distance thereof are 1, and all descendant nodes under the branch 2 and the editing distance thereof are 2; similarly, all offspring nodes under node "eosinophil absolute" branch 1 and its edit distance are 1. This property can reduce the amount of computation when querying with the BK tree.
Step 6.2.2: query error correction is performed using a BK tree.
The core idea of the BK tree is: let d(x, y) denote the edit distance between strings x and y; then d(x, y) + d(y, z) ≥ d(x, z) (the minimum number of edits needed to change x into z does not exceed the number needed to change x into y and then y into z; this property is called the triangle inequality, the sum of two sides being at least as large as the third side).
If all words whose edit distance from the target word does not exceed n must be returned, and the node at which the query word itself is located has been found, then by the BK tree property described above all nodes in the subtrees reached through branches numbered at most n are known to have edit distance at most n.
The remaining cases are the ancestor nodes of the query word and the other branches; here the BK tree's core idea, the triangle inequality, is used to reduce the amount of computation. Specifically, the goal is to find all words with d(q, t) ≤ n, where q denotes the query word, t denotes a target word, and n is the chosen edit-distance threshold. By the triangle inequality, d(r, t) ≤ d(r, q) + d(q, t) ≤ d + n and d(r, t) ≥ d(r, q) - d(q, t) ≥ d - n, where d denotes the edit distance between the query word q and another node r (for example the root node); hence d - n ≤ d(r, t) ≤ d + n, so only the branches of each node numbered from d - n to d + n (inclusive) need to be searched recursively. Since n is usually small, many subtrees can be excluded at every node that is compared.
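A sketch of BK tree construction and query along these lines, reusing the edit_distance function sketched earlier; the threshold n is a caller-supplied parameter:

```python
class BKTree:
    """Each node stores a word and a dict of children keyed by the edit distance to that word."""
    def __init__(self, words):
        it = iter(words)
        self.root = [next(it), {}]                 # [word, {distance: child_node}]
        for w in it:
            self._add(w)

    def _add(self, word):
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return                             # word already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = [word, {}]            # mount the word as a new branch labelled d
                return
            node = child                           # walk down the existing branch

    def query(self, q, n):
        """Return all stored words t with d(q, t) <= n, pruning with the triangle inequality."""
        matches, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = edit_distance(q, word)
            if d <= n:
                matches.append((d, word))
            # only branches labelled d-n .. d+n can contain candidates
            stack.extend(child for dist, child in children.items() if d - n <= dist <= d + n)
        return sorted(matches)
```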
Step 7: the project information is combined in a table and the result is output into a JSON format, and the specific implementation process is as follows:
counting the number of the items, taking out the data with the corresponding number from the table of the item information identification result, merging the identified item information into the same new table, and finally outputting the tidied new table in a JSON format.
The above disclosure is only a preferred embodiment of the present invention, and it should be understood that the scope of the invention is not limited thereto, and those skilled in the art will appreciate that all or part of the procedures described above can be performed according to the equivalent changes of the claims, and still fall within the scope of the present invention.

Claims (10)

1. A method for structured recognition of test reports based on deep learning and automatic error correction, characterized by comprising the following steps:
step 1: preprocessing the test report photo to improve its imaging quality;
step 2: deskewing and cropping the table using the Hough transform;
step 3: passing the image through the OCR module a first time, locating key information and cropping regions accordingly;
step 4: passing the cropped regions through the OCR module a second time for text detection and recognition, obtaining a structured recognition result;
step 5: automatically checking and sorting the item information;
step 6: automatically correcting the item information in the sorted structured table by means of a BK tree and an AC automaton;
step 7: merging and outputting the item information.
2. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 1,
wherein the process of preprocessing the test report photo comprises the following steps:
inputting the test report image and converting the RGB image into a grayscale image by grayscale conversion;
processing the image with Gaussian filtering to reduce image noise;
processing images with non-uniform illumination or background noise using adaptive threshold binarization.
3. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 2,
wherein, in the process of deskewing and cropping the table with the Hough transform, the position of the test report table is found by the Hough transform, the tilted test report photo is straightened, it is determined whether the report is a single-column or multi-column test sheet, and the region occupied by the content of each column of the table is then cropped.
4. The method for structured recognition of an assay report based on deep learning and automatic error correction according to claim 3,
wherein the first pass through the OCR module to locate key information and crop regions comprises the following steps:
locating the detection box coordinates of all text regions with the text detection network in the OCR module;
recognizing the text content within all text regions with the text recognition network in the OCR module;
desensitizing sensitive information according to the text detection box coordinates and contents obtained by the OCR module;
according to the text detection box coordinates and contents obtained by the OCR module, finding the detection box corresponding to the 'result' field in the table and cropping the image into a region A1, covering everything to the left of (and including) the column of that box, and a region A2, covering only the column in which the box lies, wherein region A1 contains the key item-name information, namely the item's Chinese name or English name, and region A2 contains the assay results corresponding to the item names.
5. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 4,
wherein, in step 4, text detection and recognition are performed on the cropped regions A1 and A2 respectively; the recognition result of region A1 comprises an excel table of item information, and the recognition result of region A2 comprises a single-column data frame of assay results.
6. The method for structured recognition of an assay report based on deep learning and automatic error correction of claim 5,
wherein the text detection network in the OCR module is a differentiable binarization network with cascade sparse query, used to locate text positions in the test report table and output the rectangular box coordinates of those positions; and the text recognition network is a CRNN, used to recognize the text content inside the rectangular boxes detected by the text detection network.
7. The method for structured recognition of an assay report based on deep learning and automatic error correction of claim 6,
wherein the process of automatically checking and sorting the item information comprises the following steps:
merging the contents of all columns into the first column and unifying the table information as strings;
correctly merging the contents of the cells within a column, gluing back English, numeric or Chinese fragments that ended up alone on a row;
correctly splitting the contents of the cells within a column, separating the names of two detection items that should be two entries but were stuck together on the same row.
8. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 7,
wherein step 6 is carried out as follows: common item information is first collected and a medical laboratory sheet item information dictionary is built; the result obtained in step 5 is post-processed and the item information is extracted with an AC automaton; a BK tree is then built from all correct item information of the medical laboratory sheet, and for the incorrect item information left over after the AC automaton has extracted the correctly recognized OCR results, the correct item information that best matches the erroneous item information is quickly found in the built BK tree, thereby achieving error correction.
9. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 8,
wherein, in the process of merging and outputting the item information, the number of items is counted, the corresponding number of entries is taken from the table of item recognition results, the recognized item information is merged into one new table, and the consolidated new table is finally output in JSON format.
10. The method for structured recognition of an assay report based on deep learning and automatic error correction as set forth in claim 8,
wherein, if the test report is determined in step 2 to be a multi-column test report, the processing of steps 4 to 7 is applied to each column of the report, obtaining the structured recognition result of each column separately.
CN202311143625.9A 2023-09-06 2023-09-06 Test report structured recognition method based on deep learning and automatic error correction Pending CN117116409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311143625.9A CN117116409A (en) 2023-09-06 2023-09-06 Test report structured recognition method based on deep learning and automatic error correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311143625.9A CN117116409A (en) 2023-09-06 2023-09-06 Test report structured recognition method based on deep learning and automatic error correction

Publications (1)

Publication Number Publication Date
CN117116409A true CN117116409A (en) 2023-11-24

Family

ID=88794602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311143625.9A Pending CN117116409A (en) 2023-09-06 2023-09-06 Test report structured recognition method based on deep learning and automatic error correction

Country Status (1)

Country Link
CN (1) CN117116409A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination