WO2021168703A1 - Character processing and character recognition method, storage medium, and terminal device - Google Patents

Character processing and character recognition method, storage medium, and terminal device

Info

Publication number
WO2021168703A1
Authority
WO
WIPO (PCT)
Prior art keywords
characters
character
difference
spacing
determined
Prior art date
Application number
PCT/CN2020/076828
Other languages
English (en)
French (fr)
Inventor
苗新宇
王川
王洪
雷一鸣
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority to PCT/CN2020/076828 (WO2021168703A1)
Priority to CN202080000183.0A (CN113557520A)
Publication of WO2021168703A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • The embodiments of the present disclosure relate to, but are not limited to, text processing technology, and in particular to a character processing method, a character recognition method, a storage medium, and a terminal device.
  • After a patient undergoes a physical examination or a laboratory test at a physical examination institution or hospital, the paper test report is not easy to keep. Moreover, when the patient visits another hospital, a series of problems arise: the data in the paper report cannot be structured, and data cannot be shared between physical examination institutions and hospitals or between hospitals. As a result, hospitals currently cannot assess the patient's condition well from the available information. A new hospital often has to repeat the examinations, wasting a great deal of time, money, and manpower. Therefore, a method is needed to structure the data in a patient's paper physical examination report or laboratory test form and to integrate fragmented laboratory test information. This is of great significance for establishing electronic medical records for patients and for exchanging data between hospitals.
  • Tables are an important part of laboratory test forms and physical examination reports.
  • Recognizing the characters in these tables is a problem that needs to be solved.
  • An embodiment of the present disclosure provides a character processing method, including: recognizing characters in an image and obtaining their coordinates; and using a kernel density function to perform a clustering calculation on the difference between the first coordinate values of two adjacent characters to determine the characters belonging to the same row.
  • Embodiments of the present disclosure provide a character recognition method, including: preprocessing an image; recognizing the coordinates of characters in the image; and processing the recognized characters with the aforementioned character processing method to determine the characters belonging to the same row.
  • Embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions for implementing the foregoing method.
  • Embodiments of the present disclosure provide a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the steps of the above method are implemented.
  • FIG. 1 is a flowchart of a character processing method provided by an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a method for implementing step 120 of the steps shown in FIG. 1;
  • FIG. 3 is a flowchart of another method for implementing step 120 of the steps shown in FIG. 1;
  • FIG. 4 is a flowchart of a character recognition method provided by an embodiment of the present disclosure;
  • FIG. 5 is a flowchart of another character recognition method provided by an embodiment of the present disclosure;
  • FIG. 6 is a general flowchart of the algorithm of an exemplary embodiment of the present disclosure;
  • FIG. 7 is a flowchart of image preprocessing in an exemplary embodiment of the present disclosure;
  • FIG. 8 is an example of a tilted table image in an exemplary embodiment of the present disclosure;
  • FIG. 9 is an image after image preprocessing according to an embodiment of the present disclosure;
  • FIG. 10 is a flowchart of table reconstruction according to an exemplary embodiment of the present disclosure;
  • FIG. 11 is a density curve fitted to a discrete sequence by kernel density estimation in an exemplary embodiment of the present disclosure;
  • FIG. 12 shows a table recognition result of an exemplary embodiment of the present disclosure;
  • FIGS. 13a and 13b show the effect of an application program to which the method of an embodiment of the present disclosure is applied;
  • FIG. 14 is a schematic structural diagram of a terminal device provided by an embodiment of the present disclosure.
  • The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art.
  • The embodiments, features, and elements disclosed in the present disclosure can also be combined with any conventional features or elements to form a unique inventive solution defined by the claims.
  • Any feature or element of any embodiment can also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. Therefore, it should be understood that any feature shown and/or discussed in this disclosure can be implemented individually or in any suitable combination. Accordingly, the embodiments are not restricted except by the appended claims and their equivalents.
  • Various modifications and changes can be made within the protection scope of the appended claims.
  • The specification may have presented the method and/or process as a specific sequence of steps. However, to the extent that the method or process does not depend on the specific order of steps described herein, it should not be limited to that order. As those of ordinary skill in the art will understand, other sequences of steps are also possible. Therefore, the specific order of the steps set forth in the specification should not be construed as a limitation on the claims. In addition, claims directed to the method and/or process should not be limited to performing their steps in the written order; those skilled in the art can readily understand that these orders can be changed while remaining within the spirit and scope of the embodiments of the present disclosure.
  • OCR: Optical Character Recognition.
  • One method of OCR table recognition is to recognize the borders in the table, divide the text in the borders into text blocks, identify the content of each text block, and then combine the text blocks and the borders to form a table.
  • However, most medical examination and laboratory test forms are borderless tables, so the individual text blocks in the table cannot be distinguished by their borders.
  • In addition, photos of forms may be distorted to some degree. Although the perspective transformation in the Open Source Computer Vision Library (OpenCV) can correct the image to a certain extent, text that was originally on the same horizontal line will still be offset somewhat, which greatly affects the OCR table recognition result.
  • For example, text that should be on the same line may be recognized as several lines, which reduces recognition accuracy and precision.
  • The embodiments of the present disclosure provide a character processing method and a character recognition method that can correct the error of recognizing characters belonging to one row as multiple rows because of image deformation. They can be applied to an image containing characters, an image containing a borderless table, or an image containing a bordered table.
  • FIG. 1 is a flowchart of a character processing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps 110 and 120.
  • Step 110: Recognize the characters in the image and obtain their coordinates.
  • For example, an OCR text recognition method can be used to recognize the characters in a table image and obtain their coordinate positions in the image.
  • Step 120: Use the kernel density function to perform a clustering calculation on the difference between the first coordinate values of two adjacent characters, and determine the characters belonging to the same row.
  • Here, "the same row" covers both the same row and the same column; which one applies is determined by how the characters are arranged.
  • When "row" refers to a row, the method determines the characters that belong to the same row.
  • When "row" refers to a column, the method determines the characters that belong to the same column.
  • In the case where "row" refers to a row, using the kernel density function in step 120 to perform a clustering calculation on the difference between the first coordinate values of two adjacent characters and determining the characters belonging to the same row includes:
  • using the kernel density function to perform a clustering calculation on the differences between the first ordinates of laterally adjacent characters to determine a first lateral spacing, and determining the characters belonging to the same row according to the first lateral spacing.
  • Two laterally adjacent characters include characters that are adjacent in position, for example, adjacent along the x-axis, and characters that are adjacent in content.
  • For example, the last character of one line and the first character of the next line can be regarded as adjacent in content, that is, as laterally adjacent characters.
  • Take horizontal writing, from left to right and from top to bottom, as an example.
  • After the coordinates of all characters in the image are recognized, the coordinates of one or more preset points of each character can be collected into a list in reading order, with characters from left to right and lines from top to bottom. Each character is one item in the list, and two adjacent items in the list represent two adjacent characters.
  • The first ordinate refers to the ordinate of the one or more preset points.
  • The difference between the first ordinates of two laterally adjacent characters (a first character and a second character) may be the difference between the first ordinate of one or more points in the first character and the first ordinate of the corresponding one or more points in the second character, and includes one of the following cases:
  • Case 1: The difference between the ordinates of a point in the first character and the corresponding point in the second character, for example, the difference between the ordinates of the uppermost points of the first and second characters, the difference between the ordinates of their lowermost points, or the difference between the ordinates of their center points;
  • The uppermost point, lowermost point, and center point are only examples for illustration. In other embodiments, other points can be selected for calculation, as long as the standard for selecting points is unified;
  • Case 2: The difference between the mean ordinate of multiple points in the first character and the mean ordinate of the corresponding points in the second character, for example, the difference between the mean ordinate of multiple upper endpoints of the first character and the mean ordinate of the corresponding upper endpoints of the second character.
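The reading-order list and the Case 1 and Case 2 differences can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes, hypothetically, that the OCR step returns each character as a quadrilateral of four corners ordered top-left, top-right, bottom-right, bottom-left, and that the list is already in reading order. Choosing one corner reproduces Case 1 and several corners reproduce Case 2; the names `first_ordinate` and `lateral_differences` are illustrative only.

```python
from statistics import mean

# Assumed OCR output format (hypothetical): each character is a quadrilateral of four
# (x, y) corners ordered top-left, top-right, bottom-right, bottom-left, in image
# coordinates where y grows downward.
CORNER_INDEX = {"top_left": 0, "top_right": 1, "bottom_right": 2, "bottom_left": 3}


def first_ordinate(quad, corners):
    """Mean ordinate of the preset points; one corner gives Case 1, several give Case 2."""
    return mean(quad[CORNER_INDEX[name]][1] for name in corners)


def lateral_differences(quads, corners=("top_left", "top_right")):
    """First-ordinate differences of laterally adjacent characters.

    `quads` is the reading-order list (left to right, top to bottom), so consecutive
    items are laterally adjacent, including the last character of one line followed
    by the first character of the next line. Absolute values are taken (an assumption)
    so that upward and downward tilt offsets fall into the same cluster.
    """
    return [
        abs(first_ordinate(second, corners) - first_ordinate(first, corners))
        for first, second in zip(quads, quads[1:])
    ]


# Two characters on one slightly tilted line, then the first character of the next line.
quads = [
    [(0, 10), (10, 11), (10, 21), (0, 20)],
    [(12, 11), (22, 12), (22, 22), (12, 21)],
    [(0, 40), (10, 40), (10, 50), (0, 50)],
]
print(lateral_differences(quads))                 # [1.0, 28.5]
print(lateral_differences(quads, ("top_left",)))  # [1, 29]
```

The small value comes from the tilt within a line and the large one from the line break; these are the two kinds of difference the kernel density clustering is meant to separate.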
  • For any two laterally adjacent characters (a first character and a second character), calculate the difference between the first ordinate of one or more points in the first character and the first ordinate of the corresponding one or more points in the second character. Then use the kernel density function to perform a clustering calculation on all calculated differences, determine the first lateral spacing according to the calculation result (a local minimum) of the kernel density function, judge the difference between the first ordinates of two laterally adjacent characters against the first lateral spacing, and determine the characters belonging to the same character line.
  • Kernel density estimation is a method of estimating the underlying distribution of data.
  • A kernel function, such as a Gaussian kernel, is placed at each data point, and all the kernel functions are summed to obtain the kernel density estimate of the data set.
  • Different kernel bandwidth parameters yield different density functions.
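As a worked illustration of the kernel density estimate and the role of the bandwidth, the sketch below builds f(x) = 1/(n*h) * sum_i K((x - x_i)/h) with a standard Gaussian kernel K, using only the Python standard library. The sample differences and both bandwidth values are made up for illustration; the disclosure does not state which bandwidth it uses.

```python
import math


def gaussian_kde(data, bandwidth):
    """Kernel density estimate f(x) = 1/(n*h) * sum_i K((x - x_i) / h), K = standard normal."""
    if not data:
        raise ValueError("data must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One Gaussian kernel centred on each data point; the kernels are summed.
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical coordinate differences: small line offsets and one kind of line spacing.
diffs = [1.0, 1.5, 2.0, 28.0, 29.0, 30.0]
narrow = gaussian_kde(diffs, bandwidth=0.5)
wide = gaussian_kde(diffs, bandwidth=20.0)
# With a narrow bandwidth the density has a deep valley between the two groups;
# with a very wide bandwidth the valley disappears and the groups merge.
print(narrow(15.0) < narrow(1.5), wide(15.0) > wide(1.5))  # True True
```

This is why the bandwidth matters here: too wide a kernel erases the valley whose local minimum is used as the spacing threshold.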
  • Mean shift (a gradient ascent algorithm) moves each data point in the direction in which the density increases fastest until it converges at a local maximum; the points that converge to the same maximum form one cluster.
  • Each cluster represents one type of spacing.
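The mean-shift clustering described above can be sketched in one dimension as follows, assuming a Gaussian kernel. Each value is repeatedly replaced by the kernel-weighted mean of the data (the Gaussian mean-shift update), which climbs the density until it stops at a local maximum; values ending at the same maximum form one cluster, that is, one type of spacing. The rule for merging nearly identical modes (closer than half a bandwidth) is an assumption of this sketch, not something the disclosure specifies.

```python
import math


def mean_shift_1d(data, bandwidth, tol=1e-7, max_iter=500):
    """Cluster 1-D values with Gaussian-kernel mean shift.

    Each value is moved to the kernel-weighted mean of the data, which moves it
    uphill on the kernel density estimate until it stops at a local maximum (a mode).
    Values that stop at the same mode form one cluster.
    """
    modes = []
    for x in data:
        for _ in range(max_iter):
            weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
            shifted = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
            converged = abs(shifted - x) < tol
            x = shifted
            if converged:
                break
        modes.append(x)

    # Merge modes closer than half a bandwidth (an assumed tolerance, not from the source).
    centers, clusters = [], []
    for value, mode in sorted(zip(data, modes), key=lambda pair: pair[1]):
        if centers and mode - centers[-1] < bandwidth / 2:
            clusters[-1].append(value)
        else:
            centers.append(mode)
            clusters.append([value])
    return centers, clusters


centers, clusters = mean_shift_1d([1.0, 1.5, 2.0, 28.0, 29.0, 30.0], bandwidth=2.0)
print([round(c, 2) for c in centers])  # [1.5, 29.0]
print(clusters)
```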
  • The boundary of a cluster (a local minimum of the kernel density function) is also the boundary of that type of spacing, so the corresponding spacing can be determined by finding the local minima of the kernel density function.
  • Each local minimum represents the maximum value of one type of lateral spacing.
  • The first lateral spacing represents the maximum character-line offset, that is, the maximum y-axis deviation between two characters that are adjacent along the x-axis in the same line.
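Because each cluster boundary is a local minimum of the density, the first lateral spacing can be read off as the smallest local minimum. The sketch below is one hedged way to do this: it evaluates an unnormalised Gaussian KDE on a regular grid and reports the grid points that are lower than their left neighbour and no higher than their right neighbour. The grid step, bandwidth, and sample differences are assumptions for illustration.

```python
import math


def kde_local_minima(data, bandwidth, step=0.01):
    """Return the interior local minima of a Gaussian KDE of `data`, in ascending order.

    The density is evaluated on a regular grid between min(data) and max(data); the
    normalising constant is omitted because it does not move the minima.
    """
    lo, hi = min(data), max(data)
    xs = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    ys = [sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) for x in xs]
    return [xs[i] for i in range(1, len(xs) - 1) if ys[i - 1] > ys[i] <= ys[i + 1]]


# Hypothetical first-ordinate differences: tilt offsets (0.5 to 2) and line breaks (28 to 30).
diffs = [0.5, 1.0, 1.5, 2.0, 28.0, 29.0, 30.0]
minima = kde_local_minima(diffs, bandwidth=2.0)
first_lateral_spacing = minima[0]  # smallest local minimum = maximum line offset
print(len(minima), 14.0 < first_lateral_spacing < 17.0)  # 1 True
```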
  • In the case where "row" refers to a column, using the kernel density function in step 120 to perform a clustering calculation on the difference between the first coordinate values of two adjacent characters and determining the characters belonging to the same row includes: using the kernel density function to perform a clustering calculation on the differences between the first abscissas of longitudinally adjacent characters to determine a first longitudinal spacing, and determining the characters belonging to the same column according to the first longitudinal spacing.
  • Two longitudinally adjacent characters include characters that are adjacent in position, for example, adjacent along the y-axis, and characters that are adjacent in content.
  • For example, the last character of one column and the first character of the next column can be regarded as adjacent in content, that is, as longitudinally adjacent characters.
  • After the coordinates of all characters in the image are recognized, the coordinates of one or more preset points of each character can be collected into a list in the order of characters from top to bottom and columns from right to left. Each character is one item in the list, and two adjacent items in the list represent two adjacent characters.
  • The first abscissa refers to the abscissa of the one or more preset points.
  • The difference between the first abscissas of two longitudinally adjacent characters (a third character and a fourth character) may be the difference between the first abscissa of one or more points in the third character and the first abscissa of the corresponding one or more points in the fourth character, and includes one of the following cases:
  • Case 3: The difference between the abscissas of a point in the third character and the corresponding point in the fourth character, for example, the difference between the abscissas of the leftmost points of the third and fourth characters, the difference between the abscissas of their rightmost points, or the difference between the abscissas of their center points;
  • The leftmost point, rightmost point, and center point above are only examples. In other embodiments, other points can be selected for calculation, as long as the standard for selecting points is unified;
  • Case 4: The difference between the mean abscissa of multiple points in the third character and the mean abscissa of the corresponding points in the fourth character, for example, the difference between the mean abscissa of multiple left endpoints of the third character and the mean abscissa of the corresponding left endpoints of the fourth character.
  • The kernel density function is used to perform a clustering calculation on all calculated differences, the first longitudinal spacing is determined according to the calculation result (a local minimum) of the kernel density function, and the difference between the first abscissas of two longitudinally adjacent characters is judged against the first longitudinal spacing to determine the characters belonging to the same character column.
  • The result of this clustering calculation represents the one or more types of spacing that may appear in the image.
  • Each local minimum represents the maximum value of one type of longitudinal spacing.
  • The first longitudinal spacing represents the maximum character-column offset, that is, the maximum x-axis deviation between two characters that are adjacent along the y-axis in the same column.
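For vertical writing, the Case 3 and Case 4 differences follow the same pattern with abscissas. The sketch below assumes, hypothetically, axis-aligned boxes `(x_min, y_min, x_max, y_max)` that are already listed top to bottom within each column and with columns from right to left; the resulting list could then be clustered in the same way as the lateral differences. The names `first_abscissa` and `longitudinal_differences` are illustrative.

```python
from statistics import mean

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in image coordinates.
POINT_ABSCISSA = {
    "left": lambda box: box[0],
    "right": lambda box: box[2],
    "center": lambda box: (box[0] + box[2]) / 2,
}


def first_abscissa(box, points):
    """Mean abscissa of the chosen preset points (one point: Case 3; several: Case 4)."""
    return mean(POINT_ABSCISSA[p](box) for p in points)


def longitudinal_differences(boxes, points=("left",)):
    """First-abscissa differences of longitudinally adjacent characters.

    `boxes` is assumed to be in vertical reading order: top to bottom within a column,
    columns from right to left, so the last character of one column is followed by
    the first character of the next column.
    """
    return [
        abs(first_abscissa(b, points) - first_abscissa(a, points))
        for a, b in zip(boxes, boxes[1:])
    ]


# Two characters of the right-hand column (slightly skewed), then the next column.
boxes = [(100, 0, 110, 10), (101, 12, 111, 22), (60, 0, 70, 10)]
print(longitudinal_differences(boxes))               # [1, 41]
print(longitudinal_differences(boxes, ("center",)))  # [1.0, 41.0]
```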
  • In this way, the offset of the character rows in the image can be determined by a clustering calculation on the differences between the first coordinate values of adjacent characters. For example, the kernel density function may yield one or more local minima, of which the smallest (the first local minimum) represents the maximum character-row offset and can be used to determine the characters belonging to the same character row or character column: when the difference between the first coordinate values of two adjacent characters is less than or equal to the first local minimum, the two characters are determined to belong to the same character row or character column.
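The same-row rule (a difference at most equal to the first local minimum) amounts to cutting the reading-order list wherever a difference exceeds that threshold. A minimal sketch, assuming the differences have already been computed for consecutive items and the first local minimum is known; the sample form text and the threshold value are invented for illustration.

```python
def group_into_lines(items, differences, first_minimum):
    """Split a reading-order list into character lines (or columns).

    differences[i] is the first-coordinate difference between items[i] and items[i + 1].
    A difference less than or equal to first_minimum is treated as a tilt offset, so the
    next character stays in the current line; a larger difference starts a new line.
    """
    if not items:
        return []
    if len(differences) != len(items) - 1:
        raise ValueError("expected one difference per pair of adjacent items")
    lines = [[items[0]]]
    for item, diff in zip(items[1:], differences):
        if diff <= first_minimum:
            lines[-1].append(item)
        else:
            lines.append([item])
    return lines


# Hypothetical recognised text of a borderless test form with a slight tilt.
items = ["WBC", "6.2", "10^9/L", "RBC", "4.8", "10^12/L"]
diffs = [1.0, 2.0, 29.0, 1.5, 2.5]
print(group_into_lines(items, diffs, first_minimum=15.2))
# [['WBC', '6.2', '10^9/L'], ['RBC', '4.8', '10^12/L']]
```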
  • The method of the present disclosure can correct characters whose positions are shifted because the shooting angle is tilted.
  • The kernel density clustering algorithm classifies these shifts into their correct category; that is, the offset is removed, and characters that originally belong to the same row or column are recognized as the same row or column.
  • The clustering calculation of the coordinate differences with the kernel density function may yield a single local minimum, which can be used to correct the character-row offset, or it may yield multiple local minima.
  • In the latter case, the smallest local minimum (the first local minimum) is used to correct the character-row offset, while each of the other local minima (second local minima) reflects a type of spacing present in the image (for horizontal writing, the spacing includes line spacing; for vertical writing, it includes column spacing), and each second local minimum corresponds to the maximum value of one type of spacing.
  • In other words, each of the one or more local minima obtained by this clustering can be regarded as a spacing, and each local minimum corresponds to the maximum value of one type of spacing.
  • Take the case where "row" refers to rows as an example.
  • The character processing method further includes a step of determining the line spacing and a step of determining character column groups.
  • Step 120 above may include the following steps 121 to 124.
  • Step 121: For any two laterally adjacent characters (a first character and a second character), calculate the difference between the first ordinate of one or more points in the first character and the first ordinate of the corresponding one or more points in the second character;
  • Step 122: Use the kernel density function to perform a clustering calculation on all calculated differences, determine the first lateral spacing according to the calculation result of the kernel density function (for example, the first local minimum), judge the difference between the first ordinates of two laterally adjacent characters against the first lateral spacing, and determine the characters belonging to the same character line;
  • Step 123: Determine a second lateral spacing according to the calculation result of the kernel density function (for example, a second local minimum), judge the difference between the first ordinates of two laterally adjacent characters against the second lateral spacing, and determine the line spacing;
  • Steps 122 and 123 can be executed together.
  • The kernel density function is used to cluster all calculated differences. Because laterally adjacent characters include not only positionally adjacent characters but also characters adjacent in content, clustering their coordinate differences yields both the maximum character-line offset and the maximum value of each of one or more line spacings (that is, the second lateral spacing).
  • The calculation result of the kernel density function includes one or more local minima, such as the smallest, first local minimum and a second local minimum greater than it, where the first local minimum represents the maximum value of the smallest type of spacing present (the character-line offset).
  • That is, the first local minimum represents the maximum y-axis offset between two characters that are adjacent along the x-axis.
  • The first local minimum can therefore be used to correct slanted character lines.
  • There can be one or more second local minima, which represent one or more line spacings, that is, the spacing between two adjacent lines.
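When several local minima are found, each difference can be labelled with the spacing type it belongs to: at or below the first local minimum it is a line offset, and between consecutive minima it is one type of line spacing. The helper below is an illustrative sketch of that bookkeeping using Python's standard `bisect` module; the numeric label scheme is an assumption of this example.

```python
from bisect import bisect_left


def classify_lateral_differences(differences, minima):
    """Label each difference with the spacing type it falls into.

    `minima` are the local minima of the kernel density estimate. Label 0 means the
    difference is at most the first (smallest) local minimum, i.e. a character-line
    offset within one line; label k >= 1 means it lies above the k-th minimum and at or
    below the next one, i.e. the k-th type of line spacing (the last label is unbounded).
    """
    ordered = sorted(minima)
    # bisect_left counts how many minima are strictly smaller than d.
    return [bisect_left(ordered, d) for d in differences]


labels = classify_lateral_differences([1.0, 20.0, 29.0, 60.0, 15.0], minima=[15.0, 45.0])
print(labels)  # [0, 1, 1, 2, 0]
```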
  • Step 124: Use the kernel density function to perform a clustering calculation on the differences between the second coordinate values of laterally adjacent characters to determine a third lateral spacing; judge the difference between the second coordinate values of two laterally adjacent characters against the third lateral spacing to decide whether the two characters belong to the same character column group; and finally determine the characters belonging to each character column group.
  • The second coordinate value is the second abscissa, which may be the abscissa of one or more preset points.
  • The difference between the second abscissas of two laterally adjacent characters (a fifth character and a sixth character) may be the difference between the second abscissa of one or more points in the fifth character and the second abscissa of the corresponding one or more points in the sixth character, and includes one of the following cases:
  • Case 5: The difference between the abscissas of a point in the fifth character and the corresponding point in the sixth character, for example, the difference between the abscissas of the leftmost points of the fifth and sixth characters, the difference between the abscissas of their rightmost points, or the difference between the abscissas of their center points;
  • The leftmost point, rightmost point, and center point above are only examples. In other embodiments, other points can be selected for calculation, as long as the standard for selecting points is unified;
  • Case 7: The difference between the mean abscissa of multiple points in the fifth character and the mean abscissa of the corresponding points in the sixth character, for example, the difference between the mean abscissa of multiple left endpoints of the fifth character and the mean abscissa of the corresponding left endpoints of the sixth character;
  • Case 8: The difference between the mean abscissa of multiple points in the fifth character and the mean abscissa of the corresponding multiple points in the sixth character, for example, the difference between the mean abscissa of multiple right endpoints of the fifth character and the mean abscissa of multiple left endpoints of the sixth character, where the abscissa of the sixth character is greater than that of the fifth character;
  • In some cases, corresponding endpoints are endpoints at the same position or selected by the same rule.
  • In other cases, corresponding endpoints are mirror-symmetric endpoints.
  • The third lateral spacing represents the x-axis coordinate difference between two characters that are adjacent along the x-axis, where "adjacent" includes adjacency in content as described above.
  • The kernel density function is used to perform a clustering calculation on all calculated differences of the second coordinate values.
  • The calculation result of the kernel density function includes one or more local minima, each representing one type of third lateral spacing present in the image, for example, the smallest, third local minimum and a fourth local minimum greater than it. The third local minimum represents the spacing between ordinary characters, such as the spacing between "字" and "符" in the word "字符" ("character"). There may be one or more fourth local minima greater than the third local minimum, representing various other spacings.
  • For example, the spacing between the last character of the first column and the first character of the second column is one type of spacing, and the spacing between the last character of the second column and the first character of the third column may be another type. Because this embodiment uses horizontal writing, differences smaller than the third local minimum need not be considered; only differences between the third local minimum and the fourth local minima matter. When the difference between the second coordinate values of two laterally adjacent characters is greater than the third local minimum and less than a fourth local minimum, or lies between two fourth local minima, the two characters are determined to belong to two different character column groups.
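Within one character line, the column-group decision reduces to comparing each horizontal gap with the third local minimum. The sketch below uses the Case 8 style gap (left edge of the next character minus right edge of the current one) on hypothetical axis-aligned boxes `(x_min, y_min, x_max, y_max)`; it treats every gap above the third local minimum as a column-group boundary, which is how the rule above reads when only the distinction between ordinary character spacing and wider spacings matters. The function name is illustrative.

```python
def split_line_into_column_groups(boxes, third_minimum):
    """Split one character line into character column groups.

    boxes: (x_min, y_min, x_max, y_max) of the characters of one line, sorted left to
    right (assumed format). The gap used is the Case 8 style difference: left edge of
    the next character minus right edge of the current one. Gaps up to third_minimum
    count as ordinary character spacing; any larger gap falls into a wider spacing type
    (above the third local minimum) and starts a new character column group.
    """
    groups = []
    for box in boxes:
        if groups and box[0] - groups[-1][-1][2] <= third_minimum:
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups


line = [(0, 0, 10, 10), (12, 0, 22, 10), (80, 0, 90, 10), (92, 0, 102, 10)]
groups = split_line_into_column_groups(line, third_minimum=5.0)
print(len(groups), groups[1][0])  # 2 (80, 0, 90, 10)
```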
  • In this way, one or more character column groups in the image can be determined, and finally the characters belonging to each character column group. If the image contains a table (with or without ruling lines), a table column boundary may lie between the two characters, and each table column is a character column group.
  • The character column groups can be determined line by line. After the character column group judgment has been performed on all characters in the first character line, multiple character column groups are obtained; the judgment is then performed on all characters in the second character line, and their abscissas are compared with those of the characters in the previously obtained character column groups to determine which group each character belongs to. If the character column groups are table columns, combining them with the line-spacing judgment allows the table in the image to be fully recognized.
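The line-by-line assignment can be sketched as follows. The description only says that abscissas are compared with those of the existing character column groups; the nearest-horizontal-centre rule used here is a hypothetical choice for illustration, and the boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples.

```python
def assign_to_column_groups(column_groups, line_groups):
    """Merge the column groups of a later character line into existing column groups.

    column_groups: lists of boxes (x_min, y_min, x_max, y_max) found in earlier lines.
    line_groups: the groups found in the current line. Each new group joins the existing
    group whose horizontal centre is nearest; this nearest-centre rule is an assumed way
    of "comparing abscissas", not a rule stated in the source.
    """
    def x_center(group):
        return (min(b[0] for b in group) + max(b[2] for b in group)) / 2

    merged = [list(g) for g in column_groups]  # copy so the input is not modified
    centers = [x_center(g) for g in merged]    # reference centres from earlier lines
    for group in line_groups:
        c = x_center(group)
        nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - c))
        merged[nearest].extend(group)
    return merged


first_line = [[(0, 0, 10, 10), (12, 0, 22, 10)], [(80, 0, 90, 10), (92, 0, 102, 10)]]
second_line = [[(1, 30, 11, 40)], [(85, 30, 95, 40)]]
columns = assign_to_column_groups(first_line, second_line)
print([len(c) for c in columns])  # [3, 3]
```

A real table would also need a rule for groups that match no existing column (for example, a new column appearing only in later lines); this sketch does not handle that case.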
  • In the case where "row" refers to columns, the character processing method further includes a step of determining the column spacing and a step of determining character row groups.
  • Step 120 above may include steps 121' to 124'.
  • Step 121': For any two longitudinally adjacent characters (a third character and a fourth character), calculate the difference between the first abscissa of one or more points in the third character and the first abscissa of the corresponding one or more points in the fourth character;
  • Step 122': Use the kernel density function to perform a clustering calculation on all calculated differences, determine the first longitudinal spacing according to the calculation result of the kernel density function (for example, the fifth local minimum), judge the difference between the first abscissas of two longitudinally adjacent characters against the first longitudinal spacing, and determine the characters belonging to the same character column;
  • Step 123': Determine the second longitudinal spacing according to the calculation result of the kernel density function (for example, a sixth local minimum), judge the difference between the first abscissas of two longitudinally adjacent characters against the second longitudinal spacing, and determine the column spacing;
  • Steps 122' and 123' can be executed together.
  • The kernel density function is used to cluster all calculated differences. Because longitudinally adjacent characters include not only positionally adjacent characters but also characters adjacent in content, clustering their coordinate differences yields both the maximum character-column offset and the maximum value of each of one or more column spacings (that is, the second longitudinal spacing).
  • The calculation result of the kernel density function includes one or more local minima, for example, the smallest, fifth local minimum and a sixth local minimum greater than it, where the fifth local minimum represents the maximum value of the smallest type of spacing present (the character-column offset), that is, the maximum x-axis offset between two characters that are adjacent along the y-axis.
  • The fifth local minimum can therefore be used to correct slanted character columns. There can be one or more sixth local minima, which represent one or more column spacings, that is, the spacing between two adjacent columns.
  • When the difference between the first abscissas of two longitudinally adjacent characters is less than or equal to the fifth local minimum, the two characters are determined to belong to the same character column.
  • When the difference between the first abscissas of two longitudinally adjacent characters is greater than the fifth local minimum and less than the sixth local minimum, the two characters are determined to belong to two different character columns, that is, there is column spacing between them.
  • Step 124' Use the kernel density function to cluster the differences between the second coordinate values of longitudinally adjacent characters, which determines the third longitudinal spacing. Based on the third longitudinal spacing and the difference between the second coordinate values of two longitudinally adjacent characters, determine whether the two characters belong to the same character row group. Finally, determine the characters that belong to the same character row group.
  • the second coordinate value is the second ordinate, which may be the ordinate of one or more preset points.
  • the difference between the second ordinates of two longitudinally adjacent characters (a seventh character and an eighth character) may be the difference between the second ordinates of one or more corresponding points in the two characters. It covers one of the following cases:
  • Case 9 The difference between the ordinate of a point in the seventh character and that of the corresponding point in the eighth character. Examples are the difference between the ordinates of the uppermost points of the two characters, of their lowermost points, or of their center points;
  • the uppermost point, the lowermost point and the center point are only examples. In other embodiments, other points can be selected for the calculation, as long as the same point-selection standard is used;
  • Case 10 The difference between the ordinate of a point in the seventh character and that of a corresponding point in the eighth character. An example is the difference between the ordinate of the lowermost point of the seventh character and the ordinate of the uppermost point of the eighth character, where the ordinate of the eighth character is greater than that of the seventh character;
  • Case 11 The difference between the mean ordinate of multiple points in the seventh character and the mean ordinate of the corresponding points in the eighth character. An example is the difference between the mean ordinate of several upper endpoints of the seventh character and the mean ordinate of the corresponding upper endpoints of the eighth character;
  • Case 12 The difference between the mean ordinate of multiple points in the seventh character and the mean ordinate of the corresponding points in the eighth character. An example is the difference between the mean ordinate of several lower endpoints of the seventh character and the mean ordinate of several upper endpoints of the eighth character, where the ordinate of the eighth character is greater than that of the seventh character;
  • corresponding endpoints refer to endpoints that have the same position or the same rules for selecting positions.
  • alternatively, corresponding endpoints may refer to endpoints that are mirror-symmetric.
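The sketch below gives a rough illustration of Cases 9 to 12. It assumes each character is given as a quadrilateral of four corner points (top-left, top-right, bottom-right, bottom-left) in image coordinates. It also assumes y grows downward, so the eighth character lies below the seventh. The helper names and case labels are hypothetical. The function returns absolute values, matching the absolute differences used elsewhere in this description.

```python
from typing import Sequence, Tuple

# Hypothetical layout: four corners ordered top-left, top-right,
# bottom-right, bottom-left; image y grows downward.
Quad = Sequence[Tuple[float, float]]


def topmost(q: Quad) -> float:
    return min(y for _, y in q)


def bottommost(q: Quad) -> float:
    return max(y for _, y in q)


def center_y(q: Quad) -> float:
    return sum(y for _, y in q) / len(q)


def upper_mean(q: Quad) -> float:
    return (q[0][1] + q[1][1]) / 2  # mean ordinate of the two upper corners


def lower_mean(q: Quad) -> float:
    return (q[2][1] + q[3][1]) / 2  # mean ordinate of the two lower corners


def ordinate_difference(seventh: Quad, eighth: Quad, case: str) -> float:
    """|dy| between two longitudinally adjacent characters (eighth below seventh)."""
    if case == "9-top":        # uppermost point vs uppermost point
        d = topmost(eighth) - topmost(seventh)
    elif case == "9-center":   # center vs center
        d = center_y(eighth) - center_y(seventh)
    elif case == "10":         # lowermost of seventh vs uppermost of eighth
        d = topmost(eighth) - bottommost(seventh)
    elif case == "11":         # mean of upper corners vs mean of upper corners
        d = upper_mean(eighth) - upper_mean(seventh)
    elif case == "12":         # mean of lower corners vs mean of upper corners
        d = upper_mean(eighth) - lower_mean(seventh)
    else:
        raise ValueError(f"unknown case: {case}")
    return abs(d)
```

Cases 10 and 12 measure the gap between the characters, while Cases 9 and 11 measure the distance between matching reference points. The two variants therefore yield different thresholds after clustering.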
  • the third longitudinal spacing represents the y-axis coordinate difference between two characters that are adjacent in the y-axis direction. Here "adjacent" includes the content-adjacent case described above.
  • whether characters belong to the same character row group can be determined in the same way as in step 124, which is not repeated here.
  • the character row group is, for example, a table row.
  • FIG. 4 is a flowchart of a character recognition method provided by an embodiment of the disclosure. As shown in the figure, it includes step 310 to step 330.
  • Step 310 preprocessing the image
  • the preprocessing includes one or more of the following processes: color image conversion to grayscale image, Gaussian filtering, background extraction, contrast compensation, binarization, and perspective transformation.
  • Step 320 Identify the coordinates of the characters in the image
  • Step 330 Use the character processing method in the foregoing embodiment (for example, FIG. 1, FIG. 2 or FIG. 3) to perform recognition processing on the characters in the image, and determine the characters belonging to the same row.
  • the character recognition method further includes step 340 of displaying part or all of the row or rows of characters determined after the recognition processing.
  • in step 330, after recognition processing is performed on the characters in the image, the following results can be determined: which characters belong to the same row, and therefore how many rows exist in the image.
  • when a "row" is a horizontal line, the characters belonging to the same line and the number of lines in the image can be determined.
  • when a "row" is a column, the characters belonging to the same column and the number of columns in the image can be determined.
  • if the image includes a table, the table rows and/or table columns can also be determined in addition to the above.
  • part or all of the recognition processing results can be selected and displayed on the interface of the terminal. Taking the table included in the image as an example, it can be one or more of the following two situations:
  • the character processing and recognition method provided by the present disclosure has fast recognition speed and high recognition accuracy.
  • kernel density estimation is applied to the character recognition result to estimate the offset spacing and the spacing between characters in text blocks. These estimates are then used to reconstruct lines and text blocks. The approach is highly robust, resists interference well, and gives high recognition accuracy.
  • the method of the embodiments of the present disclosure can be used not only for character recognition and frameless form recognition, but also for framed form recognition. Because the embodiment of the present disclosure performs identification analysis on the row spacing and column spacing, the presence or absence of frame lines does not affect the identification result.
  • the method of the embodiments of the present disclosure can handle table photos in which text positions are offset because of a tilted shooting angle, uneven lighting, and so on. Because the kernel density algorithm is a clustering method, it can assign these offsets to their correct category (row or column).
  • FIG. 6 is an algorithm flow chart of the form recognition method of the exemplary embodiment, which includes the following steps 410 to 430.
  • Step 410 image preprocessing
  • the image preprocessing process may include one or more of the following steps as required: color image conversion to grayscale image, filtering (such as Gaussian filtering), background extraction, contrast compensation, binarization, and perspective transformation.
  • Step 411 Convert the input color image to be recognized into a grayscale image
  • Converting a color image to a grayscale image can be done with OpenCV.
  • Step 412 Use Gaussian filtering to remove noise in the grayscale image
  • Gaussian filtering to remove noise can be done with OpenCV.
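Steps 411 and 412 map onto standard OpenCV calls: `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` and `cv2.GaussianBlur(gray, (5, 5), 0)`. The NumPy-only sketch below shows what those calls compute without requiring OpenCV. The BT.601 weights are the ones OpenCV documents for color-to-gray conversion. The sigma and the kernel radius rule are illustrative assumptions, not values from the disclosure.

```python
import numpy as np


def to_gray(bgr: np.ndarray) -> np.ndarray:
    """BGR image (H, W, 3) -> float grayscale using BT.601 weights."""
    bgr = bgr.astype(float)
    return 0.114 * bgr[..., 0] + 0.587 * bgr[..., 1] + 0.299 * bgr[..., 2]


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    radius = max(1, int(3 * sigma + 0.5))  # covers roughly +/- 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # normalise so flat regions keep their value


def gaussian_blur(gray: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian filter with reflected borders; output keeps the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(gray.astype(float), r, mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along x
    return np.apply_along_axis(conv, 0, rows)    # then along y
```

A Gaussian kernel is separable, so two 1-D passes give the same result as one 2-D convolution at much lower cost.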
  • Step 413 Perform background extraction on the grayscale image after the noise is removed
  • in a photo taken in poor light, the background is relatively dark while the characters are usually black. The background therefore acts as interference and makes the characters less distinct. Background extraction estimates the gray level of the background so that the characters can be separated from it.
  • the background of a certain point in the picture can be estimated by the set of brighter points in the w*w neighborhood of the point, that is, some whiter points in an area can be used to represent the background of the area.
  • w is an empirical value
  • w is a number of pixels greater than 0 and less than the side length of the image, for example about one-tenth of a side length (such as the shortest side).
  • shrinking the picture before processing improves speed. For an image several thousand pixels on each side, proportional downscaling speeds up processing greatly without affecting the background extraction result.
  • the values of the neighborhood range and the number of brightness samples are empirical values determined according to the image size, and other parameters can be tried according to actual conditions.
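Below is a minimal sketch of the background estimate described above. It assumes that "brighter points" means a high percentile of the w×w neighbourhood, and that downscaling is done by block averaging. The parameters `w`, `q` and `shrink` are hypothetical stand-ins for the empirical values the text mentions. Here `w` is measured in pixels of the reduced image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def estimate_background(gray: np.ndarray, w: int = 15, q: float = 90.0,
                        shrink: int = 4) -> np.ndarray:
    """Per-pixel background gray level estimated from the brighter pixels nearby."""
    if w % 2 == 0:
        raise ValueError("w must be odd")
    h, wd = gray.shape
    hs, ws = h // shrink, wd // shrink
    if hs == 0 or ws == 0:
        raise ValueError("image too small for this shrink factor")
    # 1) downscale by averaging shrink x shrink blocks (speeds up the search)
    small = gray[:hs * shrink, :ws * shrink].astype(float)
    small = small.reshape(hs, shrink, ws, shrink).mean(axis=(1, 3))
    # 2) q-th percentile of each w x w neighbourhood stands for the "brighter points"
    padded = np.pad(small, w // 2, mode="edge")
    windows = sliding_window_view(padded, (w, w))  # shape (hs, ws, w, w)
    bg_small = np.percentile(windows, q, axis=(-2, -1))
    # 3) upscale back (nearest neighbour) and pad any leftover border rows/columns
    bg = np.repeat(np.repeat(bg_small, shrink, axis=0), shrink, axis=1)
    return np.pad(bg, ((0, h - bg.shape[0]), (0, wd - bg.shape[1])), mode="edge")
```

A high percentile is used rather than the maximum because it is less sensitive to isolated bright noise pixels.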
  • Step 414 Compensate the contrast of the gray image after noise removal according to the background extraction result
  • the uneven illumination background can be removed by contrast compensation, which can be calculated by the following formula:
  • y is the gray value of any pixel after compensation
  • p_s is the gray value of the original image at the same pixel position, obtained in step 102
  • p_b is the gray value of the background image at the same pixel position, extracted in step 103.
  • the above-mentioned contrast compensation method is the realization method of contrast compensation in the open source tool ScanTailor, and the amount of calculation is small and the speed is fast.
  • other contrast compensation methods can also be selected, which is not limited in the present disclosure.
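The compensation formula itself did not survive extraction, so the sketch below does not reproduce it. As a stated assumption, it uses a common flat-field style normalisation: each pixel is divided by its estimated background so that uneven illumination cancels out. This may differ from ScanTailor's exact rule. The `eps` floor is a hypothetical guard against division by zero.

```python
import numpy as np


def compensate_contrast(p_s, p_b, eps: float = 1.0) -> np.ndarray:
    """ASSUMED rule (not the lost formula): y = 255 * p_s / p_b, clipped to [0, 255].

    Dividing by the local background brightens dimly lit regions and leaves
    well-lit ones nearly unchanged, which flattens uneven illumination.
    """
    p_s = np.asarray(p_s, dtype=float)
    p_b = np.maximum(np.asarray(p_b, dtype=float), eps)  # avoid division by zero
    return np.clip(255.0 * p_s / p_b, 0.0, 255.0)
```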
  • Step 415 Binarize the contrast-compensated image
  • binarization can use Sauvola local thresholding, an image binarization method that takes the local mean brightness into account.
  • the input of the Sauvola algorithm is a grayscale image, which takes the current pixel as the center, and dynamically calculates the threshold of the pixel according to the gray average and standard deviation in the neighborhood of the current pixel. It can be implemented with scikit-image.
  • scikit-image (abbreviated as skimage) is a collection of image processing and computer vision algorithms, including the Sauvola algorithm.
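Sauvola's threshold at each pixel is T = m·(1 + k·(s/R - 1)), where m and s are the local mean and standard deviation. scikit-image exposes this as `skimage.filters.threshold_sauvola(image, window_size=15, k=0.2)`. The NumPy-only sketch below computes the same kind of threshold with integral images so that the arithmetic is visible. The choice R = 128 (the usual value for 8-bit images) and the default window size are assumptions.

```python
import numpy as np


def local_mean_std(img: np.ndarray, w: int):
    """Mean and standard deviation over a w x w window (w odd), edge-padded."""
    if w % 2 == 0:
        raise ValueError("window size must be odd")
    p = np.pad(img.astype(float), w // 2, mode="edge")
    # integral images with a leading row and column of zeros
    s1 = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((p ** 2).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    H, W = img.shape

    def box(s):  # sum of every w x w window via four integral-image lookups
        return s[w:w + H, w:w + W] - s[:H, w:w + W] - s[w:w + H, :W] + s[:H, :W]

    n = w * w
    mean = box(s1) / n
    var = np.maximum(box(s2) / n - mean ** 2, 0.0)  # clamp rounding noise
    return mean, np.sqrt(var)


def sauvola_binarize(gray: np.ndarray, w: int = 15, k: float = 0.2,
                     R: float = 128.0) -> np.ndarray:
    """True where a pixel is brighter than its Sauvola threshold (background)."""
    mean, std = local_mean_std(gray, w)
    threshold = mean * (1.0 + k * (std / R - 1.0))
    return gray > threshold
```

In flat regions (s close to 0), the threshold drops to about (1 - k)·m, so faint background texture is not mistaken for ink.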
  • Step 416 Perform perspective transformation on the binarized image.
  • Perspective transformation addresses a limitation of affine transformation, which cannot change certain relative positional relationships within a shape (for example, it always keeps parallel lines parallel). It is similar to the "Free Transform" function in Photoshop or the "Perspective" tool in GIMP (GNU Image Manipulation Program). Both can be implemented with a perspective transformation matrix.
  • Perspective transformation here uses the four-vertex method: by finding a perspective transformation matrix, an oblique quadrilateral can be converted into a rectangle.
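The four-vertex method amounts to solving an 8×8 linear system for the eight unknown entries of the 3×3 perspective matrix; the ninth entry is fixed to 1. The sketch below does this with NumPy. In practice, `cv2.getPerspectiveTransform(src, dst)` followed by `cv2.warpPerspective(img, M, (width, height))` applies the same mapping to pixels. Taking the output rectangle's size from the longer of each pair of opposite sides is an assumption for illustration.

```python
import numpy as np


def perspective_matrix(src, dst) -> np.ndarray:
    """3x3 matrix M with M @ [x, y, 1] proportional to [u, v, 1] for four point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_perspective(M: np.ndarray, pts) -> np.ndarray:
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return homog[:, :2] / homog[:, 2:3]  # divide by the projective coordinate


def rectangle_for(quad):
    """Target corners (tl, tr, br, bl) for a quadrilateral given in the same order."""
    tl, tr, br, bl = (np.asarray(p, float) for p in quad)
    width = max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl))
    height = max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr))
    return [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
```

Usage: detect the four corners of the tilted form, build `dst = rectangle_for(corners)`, and compute `M = perspective_matrix(corners, dst)`. Then warp the image with M.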
  • Step 420 Perform general OCR text recognition on the preprocessed image
  • FIG. 8 is an example of a medical examination report photographed by a user at an oblique angle
  • FIG. 9 is an image after the above-mentioned picture preprocessing and general text OCR recognition.
  • Step 430 Perform table reconstruction on the content identified by OCR.
  • the table reconstruction process of the embodiment of the present disclosure uses the Gaussian kernel density estimation method to reconstruct the result of the general OCR character recognition process to form a table.
  • the reconstruction process includes text row reconstruction, table row reconstruction and table column reconstruction. Table row and table column reconstruction determine the cells of the table, and the text blocks inside the cells can then be reconstructed.
  • the reconstruction process does not depend on the borders inside and outside the form, and has strong anti-interference ability against tilt.
  • Figure 10 shows the table reconstruction process.
  • the kernel density estimation algorithm is used to reconstruct the table for OCR table recognition. This overcomes the drawbacks of earlier table recognition, which relied on table frame lines and had limited accuracy.
  • Kernel density estimation is a method used in probability theory to estimate unknown density functions. It is a non-parametric estimation method.
  • the kernel density estimation method does not use prior knowledge about the data distribution and does not attach any assumptions to the data distribution. It is a method to study the characteristics of the data distribution from the data sample itself.
  • the table to be identified may be compared with the table in the pre-established table format library before the table reconstruction is performed.
  • for example, the table header and/or table footer are compared. If the table header matches a pre-saved table format, the table border is determined from that format, which fixes the table range. Table reconstruction is then performed within the determined table range.
  • the table reconstruction process includes the following steps 801 to 806.
  • the description takes the example of determining rows first and then columns.
  • alternatively, columns may be determined first and then rows.
  • Step 801 After the pre-processed image is recognized by general characters, the upper, lower, left, and right coordinate positions (hereinafter referred to as coordinates) of each character in the image are obtained, and the upper coordinate of each character is recorded;
  • the horizontal axis is the x axis and the vertical axis is the y axis.
  • Each character can be regarded as a rectangle.
  • a rectangle contains at least one character.
  • a rectangle has four sides: the top side, the bottom side, the left side and the right side.
  • the upper coordinate of the text is the distance from the top of the rectangle to the x-axis
  • the left coordinate of the text is the distance from the left of the rectangle to the y-axis
  • the lower coordinate of the text is the distance from the bottom of the rectangle to the x-axis
  • the right coordinate of the text is the distance from the right side of the rectangle to the y-axis.
  • the upper coordinates of all text can be inserted into an upper coordinate list, and the upper coordinate list is recorded as list1;
  • Step 802 traverse list1, calculate the absolute value of the upper coordinate difference of two adjacent characters in list1, and insert the obtained absolute value into a new list—the upper coordinate difference list, the upper coordinate difference list is recorded as list_top;
  • calculating the upper coordinate difference is taken as an example for description.
  • alternatively, the lower coordinate difference of two adjacent characters can be calculated.
  • in either case, the subsequent steps for finding the spacing are the same.
  • Calculating the upper coordinate differences of adjacent characters gives a data distribution that may contain the following kinds of spacing:
    - the first horizontal spacing: the text line offset, or offset spacing. An example is the offset of a text line caused by a tilted shooting angle.
    - the second horizontal spacing: text line spacing, table row spacing or other line spacing. An example of other line spacing is the large distance between the last check item in a medical examination report or laboratory test report and the signature at the end of the form. There may be one or more kinds of other line spacing.
    The test report in this example has three types of horizontal spacing: offset spacing, table row spacing (here the text line spacing equals the table row spacing) and other spacing.
  • Step 803 Use Gaussian kernel density estimation to fit the density curve for the list list_top to find the minimum point of the discrete sequence list_top.
  • the fit starts from an initial bandwidth (the bandwidth is the smoothing parameter). The bandwidth is then increased by a preset step size, such as 0.1, and the Gaussian kernel density curve is refitted until the number of minimum values is 3.
  • the minimum points of the discrete sequence are the dividing points between the types of horizontal spacing. In other embodiments, the number of types of horizontal spacing may differ depending on the table. For example, there may be only one type, namely the first horizontal spacing. In that case, when the upper coordinate difference (or lower coordinate difference) of two characters is smaller than the first horizontal spacing, the two characters are considered to be on the same line. The first horizontal spacing can thus be used to correct character offsets caused by the shooting angle.
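Steps 802 and 803 can be sketched as follows, assuming list1 holds the characters' upper coordinates in OCR order. The density is evaluated in log space on an evenly spaced grid, which matches the logarithmic Y-axis of Figure 11. Working in log space also keeps the deep valleys between well-separated spacing clusters from underflowing to zero. Minima are grid points lower than both neighbours. The grid size, `max_iter`, and the rule of stopping as soon as the count drops to the target are hypothetical choices.

```python
import numpy as np


def adjacent_differences(list1):
    """Step 802: list_top = |top[i+1] - top[i]| for characters adjacent in list1."""
    return np.abs(np.diff(np.asarray(list1, float)))


def log_kde(samples, grid, h):
    """log of f(x) = 1/(n h) * sum_i K((x - x_i) / h) with a standard normal K."""
    x = np.asarray(samples, float)
    a = -0.5 * ((grid[:, None] - x[None, :]) / h) ** 2
    m = a.max(axis=1, keepdims=True)                  # log-sum-exp for stability
    lse = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return lse - np.log(len(x) * h * np.sqrt(2 * np.pi))


def find_spacing_minima(list_top, target=3, h0=0.5, step=0.1,
                        max_iter=1000, points=2000):
    """Step 803: widen the bandwidth until at most `target` minima remain."""
    diffs = np.asarray(list_top, float)
    grid = np.linspace(0.0, 1.2 * diffs.max() + 1.0, points)
    h = h0
    for _ in range(max_iter):
        f = log_kde(diffs, grid, h)
        inner = (f[1:-1] < f[:-2]) & (f[1:-1] < f[2:])  # lower than both neighbours
        minima = grid[1:-1][inner]
        if len(minima) <= target:
            return h, minima
        h += step
    raise RuntimeError("no bandwidth gave the requested number of minima")
```

A small bandwidth produces spurious minima inside a single spacing cluster. Widening it merges those wiggles while the valleys between genuinely different spacing types remain.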
  • Figure 11 shows the density curve fitted by kernel density estimation to the upper coordinate differences of adjacent characters.
  • the X-axis is the value of the input data, that is, the upper coordinate difference
  • the Y-axis is the logarithm of the kernel density estimate at a given upper coordinate difference.
  • For a kernel density estimate over many data points, the kernels of neighbouring peaks add together. As a result, the final curve shape depends little on the choice of kernel function. Because the Gaussian kernel makes this summation easy to compute, this embodiment uses the Gaussian kernel function (normal distribution curve) as the kernel for kernel density estimation.
  • the ordinate estimate can be replaced by its logarithm (lg f, i.e. log10 f) to compress the range of the estimated values.
  • Kernel density estimation superimposes the kernel functions centred on each coordinate difference x_i to form a density curve.
  • n the sample size (that is, the total number of coordinate differences)
  • h the bandwidth
  • K() the kernel function
  • This embodiment uses the Gaussian kernel function.
  • the calculation formula of the Gaussian kernel function is as follows:
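The formula images for the estimator and the kernel did not survive extraction. The standard Gaussian kernel density estimator, written with the symbols defined above (n, h, K and the coordinate differences x_i), is given below. The derivative is included because the minima mentioned below are the points where it vanishes and the second derivative is positive.

```latex
\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}

\hat{f}_h'(x) = -\frac{1}{nh^{2}}\sum_{i=1}^{n} u_i\, K(u_i),
\quad u_i = \frac{x - x_i}{h},
\qquad \text{minima: } \hat{f}_h'(x) = 0,\ \hat{f}_h''(x) > 0
```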
  • kernel density estimation makes it possible to distinguish multiple types of horizontal spacing.
  • Figure 11 only illustrates the density curve waveform and the corresponding minimum point in this example.
  • the minima of the kernel density function can also be calculated directly by differentiation.
  • Step 804 Determine the characters belonging to the same row and the characters belonging to the same table row according to the found minimum value
  • in this example there are 3 types of horizontal spacing, that is, 3 minimum values (the 3 minima in Figure 11). They are interpreted as follows:
    - the X coordinate of the first minimum is the maximum text offset spacing (hereinafter the text offset spacing).
    - the X coordinate of the second minimum is the maximum table row spacing (hereinafter the table row spacing).
    - the X coordinate of the third minimum is the other spacing.
  • the other spacing is a large line spacing, for example the distance between the last check item in a medical examination report or laboratory test report and the signature at the end of the form.
  • text offset spacing < table row spacing < large line spacing.
  • when the upper coordinate difference between two characters is less than or equal to the text offset spacing, the two characters are considered to be on the same text line.
  • when the upper coordinate difference between two characters is greater than the text offset spacing and less than or equal to the table row spacing, the two characters are considered to be in different table rows. That is, there is a table line between them.
  • when the upper coordinate difference between two characters is greater than the table row spacing and less than or equal to the large line spacing, there is considered to be a large line spacing between them.
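The three rules above can be written as a small classifier. The threshold names are hypothetical. Each threshold would be the X coordinate of the corresponding minimum found in step 803. The "unclassified" label for differences beyond the largest spacing is a choice made for this sketch only.

```python
def classify_vertical_gap(diff: float, text_offset: float, table_row: float,
                          large: float) -> str:
    """Spacing type for the upper coordinate difference of two characters (step 804)."""
    d = abs(diff)
    if d <= text_offset:
        return "same text line"
    if d <= table_row:
        return "different table rows"  # a table line lies between them
    if d <= large:
        return "large line spacing"
    return "unclassified"              # beyond every fitted spacing type
```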
  • let mi be the x-coordinate of the first minimum point. Characters whose upper coordinate difference is less than mi are regarded as being on the same text line.
  • the first character starts the first line. The remaining characters in list1 are then traversed. If the absolute difference between a character's upper coordinate and the mean upper coordinate of a line's characters is less than mi, the character is assigned to that line. Otherwise, it starts a new, independent text line. After all characters have been traversed, the characters in each line are sorted by left coordinate, which completes the text line reconstruction.
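The line-grouping procedure just described can be sketched as below. The `Char` record with left/top/right/bottom fields is an assumed data layout for the OCR output. When a character is close to more than one line, this sketch takes the first match; the text does not specify how to break such ties.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def build_text_lines(list1: List[Char], mi: float) -> List[List[Char]]:
    """Group characters into text lines by upper coordinate, then sort each line by left coordinate."""
    lines: List[List[Char]] = []
    for ch in list1:                  # the first character opens the first line
        for line in lines:
            mean_top = sum(c.top for c in line) / len(line)
            if abs(ch.top - mean_top) < mi:
                line.append(ch)       # close enough to this line's mean upper coordinate
                break
        else:
            lines.append([ch])        # otherwise start an independent new line
    for line in lines:
        line.sort(key=lambda c: c.left)
    return lines
```

Comparing against each line's running mean, rather than against the previous character only, keeps a slowly drifting tilted line from splitting into fragments.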
  • the x-coordinate value of the second minimum point is ni
  • ni is the maximum table row spacing
  • the upper coordinate difference less than ni is considered to be the same table row.
  • table lines can also be determined from the text lines already found. One way is to traverse the upper coordinate differences of the characters and look for a difference greater than mi and less than ni. Another is to compute the mean upper coordinate of each of two text lines: if the absolute difference between the two means is greater than mi and less than ni, there is a table line between the two text lines.
  • the determined table lines can also be verified, for example by using the above methods to cross-check each other. At this point, table row reconstruction is complete.
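The second check above, which compares the mean upper coordinates of neighbouring text lines, might look like the sketch below. It assumes the text lines have already been built and that mi and ni come from step 803. Sorting the lines by mean upper coordinate and the label strings are illustrative choices.

```python
from statistics import mean
from typing import List, Sequence


def table_row_breaks(line_tops: Sequence[Sequence[float]], mi: float,
                     ni: float) -> List[str]:
    """Label the gap between consecutive text lines, given each line's upper coordinates."""
    means = sorted(mean(tops) for tops in line_tops)
    labels = []
    for upper, lower in zip(means, means[1:]):
        d = abs(lower - upper)
        if mi < d < ni:
            labels.append("table line")
        elif d >= ni:
            labels.append("large spacing")
        else:
            labels.append("same text line")  # should already have been merged
    return labels
```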
  • Step 805 For each text line, calculate the absolute difference between the right coordinate of each character and the left coordinate of the adjacent character to its right (hereinafter the left-right coordinate difference of two adjacent characters). Take all these differences as a discrete sequence and, as in step 803, use Gaussian kernel density estimation to find the minimum points of the fitted curve. For the table in this embodiment, Gaussian kernel density estimation yields 3 minimum points, one per type of column spacing.
  • the x-axis coordinates of the 3 minimum points represent three spacings:
    - the text column spacing: the maximum spacing between characters in the same cell (the smallest unit of the table, consisting of one row and one column).
    - the table column spacing.
    - other spacing: a large column spacing greater than the table column spacing, such as the spacing on both sides of the table. There may be one or more kinds.
    Text column spacing < table column spacing < other large column spacing. In this embodiment, when the left-right coordinate difference of two adjacent characters is greater than the text column spacing and less than or equal to the table column spacing, the two characters are considered to be in different table columns.
  • the description assumes text written from left to right. If text is written from right to left, the absolute difference between the left coordinate of each character in each text line and the right coordinate of the adjacent character to its left is calculated instead. In addition, the last character of one line and the first character of another line may also be treated as adjacent characters.
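Step 805 and its left-to-right and right-to-left variants can be sketched as follows. Characters are assumed to be `(text, left, right)` tuples from one text line. `text_col` is assumed to be the first minimum of the column-gap density. In this sketch, any gap larger than `text_col` is treated as a cell boundary, whether it is a table column spacing or one of the larger spacings.

```python
from typing import List, Sequence, Tuple

Glyph = Tuple[str, float, float]  # hypothetical (text, left, right) from one text line


def column_gaps(line: Sequence[Glyph], right_to_left: bool = False) -> List[float]:
    """Left-right coordinate differences of neighbouring characters (step 805)."""
    chars = sorted(line, key=lambda g: g[1], reverse=right_to_left)
    if right_to_left:
        # left coordinate of each character minus right coordinate of its left neighbour
        return [abs(a[1] - b[2]) for a, b in zip(chars, chars[1:])]
    # left coordinate of the right neighbour minus right coordinate of each character
    return [abs(b[1] - a[2]) for a, b in zip(chars, chars[1:])]


def split_cells(line: Sequence[Glyph], text_col: float) -> List[str]:
    """Join left-to-right characters into cell texts; a gap wider than text_col starts a new cell."""
    chars = sorted(line, key=lambda g: g[1])
    if not chars:
        return []
    cells = [chars[0][0]]
    for prev, cur in zip(chars, chars[1:]):
        if cur[1] - prev[2] > text_col:
            cells.append(cur[0])
        else:
            cells[-1] += cur[0]
    return cells
```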
  • step 806 the text is reorganized according to the table row determined in step 804 and the table column determined in step 805, and the table is reconstructed.
  • the method of the embodiments of the present disclosure does not depend on whether the table has frame lines. Even without frame lines, the row spacing and the column spacing still exist.
  • FIG. 12 shows the recognition result of the exemplary embodiment.
  • the display styles of different types of forms on the terminal interface can be preset. The display style for the current form is selected by recognizing its header and/or footer. After the above recognition method produces the result shown in FIG. 12, the recognized content is displayed to the user in the preset display style.
  • FIGS. 13a and 13b show the display effect of part of the content.
  • the display style is set to black large font, and the unit and reference range are reference content, so the display style is set to gray small font.
  • the serial number and English abbreviation are not the content that the user cares about, in this example, they are not displayed.
  • the table is a standardized table
  • the standardized content such as item name, unit, reference range, etc., can be filled in advance when the style is preset, and the item name or serial number in the preset style can be compared.
  • FIGS. 13a and 13b are only a display example. In other embodiments, other styles can be set as required.
  • the technical solution of the present disclosure achieves good results on borderless tables in medical scenarios, such as physical examination reports and laboratory test reports.
  • the method according to the embodiment of the present disclosure has higher robustness and anti-interference ability, as well as higher recognition speed and accuracy.
  • the method of the embodiments of the present disclosure still performs well when table borders are missing. It is especially suitable for recognizing borderless tables such as medical examination reports and laboratory test reports.
  • a terminal device (referred to as a terminal for short) is also provided.
  • the terminal device may include a processor, a memory, and a computer program that is stored on the memory and can run on the processor.
  • when executing the computer program, the processor can implement the character processing method or the character recognition method of the embodiments of the present disclosure.
  • the terminal device 1300 may include a processor 1310, a memory 1320, a bus system 1330 and a transceiver 1340. The processor 1310, the memory 1320 and the transceiver 1340 are connected through the bus system 1330.
  • the memory 1320 is configured to store instructions, and the processor 1310 is configured to execute the instructions stored in the memory 1320 to implement the character processing or character recognition method described above.
  • the transceiver 1340 is configured to receive the coordinates of the characters in the image obtained by recognition and send them to the processor, and is also configured to output the character processing result of the processor.
  • the transceiver 1340 is configured to obtain the image to be processed and receive the character recognition result of the processor.
  • the terminal may be, for example, a mobile handheld device such as a mobile phone and a tablet computer.
  • the transceiver 1340 may include a camera that obtains the image to be processed by capturing it. Alternatively, the transceiver 1340 may obtain the image to be processed from another application installed on the mobile phone (for example, an application with an image transmission function).
  • alternatively, the transceiver 1340 may obtain the coordinates of the characters in the recognized image from another application installed on the mobile phone (for example, an application with a text recognition function).
  • the transceiver 1340 may also include a display module configured to receive and display the output of the processor. This output is either the character processing result (the result of performing the character processing method) or the character recognition result (the result of performing the character recognition method). Alternatively, the transceiver 1340 includes other applications configured to receive the character processing result or character recognition result output by the processor and to process the received result.
  • the processor 1310 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 1320 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1310. A part of the memory 1320 may also include a non-volatile random access memory. For example, the memory 1320 may also store device type information.
  • the bus system 1330 may also include a power bus, a control bus, and a status signal bus. However, for the sake of clear description, various buses are marked as the bus system 1330 in FIG. 14.
  • the processing performed by the terminal device may be completed by an integrated logic circuit of hardware in the processor 1310 or an instruction in the form of software. That is, the steps of the method disclosed in the embodiments of the present disclosure may be embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the software module can be located in storage media such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers, etc.
  • the storage medium is located in the memory 1320, and the processor 1310 reads the information in the memory 1320, and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • Such software may be distributed on a computer-readable medium, and the computer-readable medium may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
  • the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data).
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices. They also include any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules or other data in a modulated data signal, such as a carrier wave or other transmission mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

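A character processing and character recognition method, a storage medium, and a terminal device. The character processing method includes: recognizing and obtaining the coordinates of characters in an image; and performing clustering calculation, using a kernel density function, on the differences between the first coordinate values of adjacent pairs of characters, to determine the characters belonging to the same line.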

Description

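Character Processing and Character Recognition Method, Storage Medium, and Terminal Device
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, text processing technology, and in particular to a character processing and character recognition method, a storage medium, and a terminal device.
Background
After a patient has a physical examination or laboratory tests at a physical examination institution or hospital, the paper test report is hard to keep. When the patient then visits another hospital, a series of problems keeps the new hospital from properly assessing the patient's condition. The data in paper reports is not structured, and data cannot be shared between physical examination institutions and hospitals or between hospitals. Patients often have to repeat examinations when they change hospitals, which wastes a great deal of time, money, and manpower. A method is therefore needed to structure the data in a patient's paper physical examination report or laboratory test report and to consolidate fragmented test information. Such a method is of great significance for building electronic medical records and for connecting data between hospitals.
Tables are an important part of laboratory test reports and physical examination reports. Converting paper tables into structured electronic information requires recognizing the characters in those tables.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the protection scope of the claims.
In a first aspect, an embodiment of the present disclosure provides a character processing method, including: recognizing and obtaining the coordinates of characters in an image; and performing clustering calculation, using a kernel density function, on the differences between the first coordinate values of adjacent pairs of characters, to determine the characters belonging to the same line.
In a second aspect, an embodiment of the present disclosure provides a character recognition method, including: preprocessing an image; recognizing the coordinates of the characters in the image; and performing recognition processing on the characters in the image with the foregoing character processing method, to determine the characters belonging to the same line.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to implement the foregoing method.
In a fourth aspect, an embodiment of the present disclosure provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the foregoing method when executing the program.
Other aspects will become apparent after reading and understanding the drawings and the detailed description.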
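Brief Description of the Drawings
FIG. 1 is a flowchart of a character processing method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of one implementation of step 120 in FIG. 1;
FIG. 3 is a flowchart of another implementation of step 120 in FIG. 1;
FIG. 4 is a flowchart of a character recognition method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of another character recognition method according to an embodiment of the present disclosure;
FIG. 6 is an overall algorithm flowchart of an exemplary embodiment of the present disclosure;
FIG. 7 is a flowchart of image preprocessing in an exemplary embodiment of the present disclosure;
FIG. 8 is an example of a skewed table image in an exemplary embodiment of the present disclosure;
FIG. 9 is an image after image preprocessing according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of table reconstruction in an exemplary embodiment of the present disclosure;
FIG. 11 is a density curve fitted to a discrete sequence by kernel density estimation in an exemplary embodiment of the present disclosure;
FIG. 12 shows a table recognition result in an exemplary embodiment of the present disclosure;
FIGS. 13a and 13b show the display effect of an application that uses the method of an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure.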
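Detailed Description
The present disclosure describes multiple embodiments. The description is exemplary rather than restrictive. Those of ordinary skill in the art will see that more embodiments and implementations are possible within the scope of the described embodiments. The drawings show many possible combinations of features, and the detailed description discusses them, but many other combinations of the disclosed features are also possible. Unless specifically restricted, any feature or element of any embodiment may be combined with, or may replace, any other feature or element of any other embodiment.
The present disclosure includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The disclosed embodiments, features, and elements may also be combined with any conventional feature or element to form a unique inventive solution defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. It should therefore be understood that any feature shown and/or discussed in the present disclosure may be implemented individually or in any suitable combination. The embodiments are not limited except by the appended claims and their equivalents. Various modifications and changes may also be made within the protection scope of the appended claims.
In describing representative embodiments, the specification may present methods and/or processes as a particular sequence of steps. To the extent that a method or process does not rely on the particular order of steps described herein, it should not be limited to that order. As those of ordinary skill in the art will understand, other orders of steps are also possible. The particular order of steps in the specification should therefore not be construed as a limitation on the claims. Likewise, claims directed to the method and/or process should not be limited to performing their steps in the order written. Those skilled in the art will readily appreciate that the order may vary while remaining within the spirit and scope of the embodiments of the present disclosure.
Unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which the present disclosure belongs. "First", "second", and similar words do not denote any order, quantity, or importance; they only distinguish different components. "Include", "comprise", and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connect", "couple", and similar words are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
One way to convert a paper table into structured electronic information is to photograph the table and then recognize it with optical character recognition (OCR). One OCR table recognition method works as follows:
- detect the ruling lines inside the table;
- divide the text within the lines into text blocks;
- recognize each text block separately;
- reassemble the text blocks and the ruling lines into a table.
However, most tables in medical examination and laboratory test reports are borderless, so the individual text blocks cannot be distinguished this way. The photo may also be deformed to some extent by the user's shooting angle or technique. The perspective transformation in the Open Source Computer Vision Library (OpenCV) can partly correct the image. Even so, text that was originally on one horizontal line is offset to some degree afterwards. This strongly affects OCR table recognition: text that belongs in one row may be recognized as several rows, which lowers recognition accuracy and precision.
To address this, embodiments of the present disclosure provide a character processing method and a character recognition method. These methods correct the error of recognizing text that belongs to one line as several lines because of image deformation. They apply to images containing plain characters, borderless tables, or ruled tables.
The following embodiments may be combined with each other. Identical or similar concepts or processes may not be repeated in some embodiments.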
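FIG. 1 is a flowchart of a character processing method according to an embodiment of the present disclosure. As shown in the figure, the method includes the following steps 110 to 120.
Step 110: recognize and obtain the coordinates of the characters in an image.
For example, an OCR text recognition method can recognize the characters in a table image and return their coordinate positions.
Step 120: perform clustering calculation, using a kernel density function, on the differences between the first coordinate values of adjacent pairs of characters, to determine the characters belonging to the same line.
A "line" is either a row or a column, depending on how the characters are arranged:
- when the characters are arranged horizontally (usually horizontal writing), a line is a row, and the method determines the characters belonging to the same row;
- when the characters are arranged vertically (usually vertical writing), a line is a column, and the method determines the characters belonging to the same column.
In an exemplary embodiment, a line is a row. Step 120 then includes: performing clustering calculation, using the kernel density function, on the differences between the first y-coordinates of horizontally adjacent pairs of characters to determine a first horizontal spacing, and determining the characters belonging to the same row from the first horizontal spacing.
In this example, horizontally adjacent characters include characters adjacent in position (for example, along the x-axis) and characters adjacent in content. In horizontal writing, the last character of a row and the first character of the next row are adjacent in content, so they also count as horizontally adjacent. Consider horizontal writing in which characters run from left to right and rows run from top to bottom. After the coordinates of all characters have been recognized, the coordinates of one or more preset points of every character can be collected into one list. The list follows reading order: characters from left to right, rows from top to bottom. Each character is one item, and two consecutive items are two adjacent characters.
Take a horizontally adjacent first character and second character as an example. The first y-coordinate is the y-coordinate of one or more preset points. The difference between the first y-coordinates of the two characters compares one or more points in the first character with the corresponding points in the second character, in one of the following cases:
Case 1: the y-coordinate difference between one point in the first character and the corresponding point in the second character. Examples are the topmost points of the two characters, their bottommost points, or their center points. These points are only examples. Other embodiments may use other points, as long as the points are chosen by a uniform criterion.
Case 2: the difference between the mean y-coordinate of several points in the first character and the mean y-coordinate of the corresponding points in the second character. Examples are several upper endpoints of each character, or several lower endpoints of each character. These endpoints are only examples. Other embodiments may choose points as needed, as long as a uniform selection criterion is used.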
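The pairwise differences of cases 1 and 2 can be sketched as follows. This is an illustration only. It assumes each OCR result is a dict with hypothetical keys `top`, `bottom`, `left`, and `right`, with the origin at the upper-left corner and y growing downward. It also assumes the list is already in reading order, so consecutive items are horizontally adjacent characters.

```python
from typing import Dict, List, Sequence

Box = Dict[str, float]  # hypothetical OCR box: {"top", "bottom", "left", "right"}


def reference_y(box: Box, points: Sequence[str] = ("top",)) -> float:
    """Mean y-coordinate of the chosen reference points of one character.

    A single name gives case 1 (one point); several names give case 2
    (the mean of several points). "center" is the vertical midpoint.
    """
    values = []
    for name in points:
        if name == "center":
            values.append((box["top"] + box["bottom"]) / 2.0)
        else:
            values.append(box[name])
    return sum(values) / len(values)


def adjacent_y_differences(boxes: List[Box], points: Sequence[str] = ("top",)) -> List[float]:
    """Absolute y-differences between consecutive characters in reading order."""
    ys = [reference_y(b, points) for b in boxes]  # same criterion for every character
    return [abs(b - a) for a, b in zip(ys, ys[1:])]
```

Using one `points` argument for the whole list keeps the selection criterion uniform, which the text requires.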
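For any two horizontally adjacent first and second characters, compute the difference between the first y-coordinate of one or more points in the first character and that of the corresponding points in the second character. Cluster all the computed differences with the kernel density function and take the first horizontal spacing from the result (a local minimum). Then compare each pair's y-coordinate difference with the first horizontal spacing to determine the characters belonging to the same character row.
The kernel density function estimates the underlying distribution of data. A kernel function (for example, a Gaussian kernel) is placed at each sample point, and the kernels are summed to give the kernel density estimate of the data set. A different kernel bandwidth gives a different density function.
Mean shift, a gradient ascent algorithm, moves each data point in the direction of fastest density increase. The points finally converge at local maxima and form clusters, and points that converge to the same maximum belong to the same cluster. With a suitable bandwidth, one or more clusters that match the image are obtained, and each cluster represents one type of spacing. The boundary of a cluster, which is a local minimum of the kernel density function, is also the boundary of that type of spacing. The spacings can therefore be found by locating the local minima of the kernel density function.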
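Here is a minimal pure-Python sketch of the mean-shift idea described above. It is not the authors' implementation. Each 1-D sample repeatedly moves to the Gaussian-weighted mean of all samples, x ← Σ wᵢxᵢ / Σ wᵢ, which climbs the kernel density estimate until it stops at a local maximum. Samples that stop at the same maximum share a cluster. The merge tolerance `merge_tol` is an assumption of this sketch.

```python
import math
from typing import List


def mean_shift_1d(data: List[float], bandwidth: float, iters: int = 200, tol: float = 1e-6) -> List[float]:
    """Return, for every sample, the density maximum (mode) it converges to."""
    modes = []
    for x in data:
        for _ in range(iters):
            # Gaussian weights of all samples relative to the current position
            weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
            new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)
    return modes


def cluster_labels(modes: List[float], merge_tol: float) -> List[int]:
    """Give the same label to samples whose modes lie within merge_tol of each other."""
    centers: List[float] = []
    labels: List[int] = []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) <= merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

For example, the differences `[1, 1.5, 2, 20, 21, 22]` with bandwidth 1 form two clusters, one around 1.5 and one around 21.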
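Clustering all the computed differences yields the one or more types of spacing that may occur in the image. Each local minimum is the maximum value of one type of horizontal spacing. In this embodiment, the first horizontal spacing is the maximum offset of a character row: the y-axis deviation between two characters that are adjacent along the x-axis in the same row.
In another exemplary embodiment, a line is a column. Step 120 then includes: performing clustering calculation, using the kernel density function, on the differences between the first x-coordinates of vertically adjacent pairs of characters to determine a first vertical spacing, and determining the characters belonging to the same column from the first vertical spacing.
In this example, vertically adjacent characters include characters adjacent in position (for example, along the y-axis) and characters adjacent in content. In vertical writing, the last character of a column and the first character of the next column are adjacent in content, so they also count as vertically adjacent. Consider vertical writing in which characters run from top to bottom and columns run from right to left. After the coordinates of all characters have been recognized, the coordinates of one or more preset points of every character can be collected into one list. The list follows reading order: characters from top to bottom, columns from right to left. Each character is one item, and two consecutive items are two adjacent characters.
Take a vertically adjacent third character and fourth character as an example. The first x-coordinate is the x-coordinate of one or more preset points. The difference between the first x-coordinates of the two characters compares one or more points in the third character with the corresponding points in the fourth character, in one of the following cases:
Case 3: the x-coordinate difference between one point in the third character and the corresponding point in the fourth character. Examples are the leftmost points of the two characters, their rightmost points, or their center points. These points are only examples. Other embodiments may use other points, as long as the points are chosen by a uniform criterion.
Case 4: the difference between the mean x-coordinate of several points in the third character and the mean x-coordinate of the corresponding points in the fourth character. Examples are several left endpoints of each character, or several right endpoints of each character. These endpoints are only examples. Other embodiments may choose points as needed, as long as a uniform selection criterion is used.
For any two vertically adjacent third and fourth characters, compute the difference between the first x-coordinate of one or more points in the third character and that of the corresponding points in the fourth character. Cluster all the computed differences with the kernel density function and take the first vertical spacing from the result (a local minimum). Then compare each pair's x-coordinate difference with the first vertical spacing to determine the characters belonging to the same character column.
As described above, clustering all the computed differences with the kernel density function yields the one or more types of spacing that may occur in the image. Each local minimum is the maximum value of one type of vertical spacing. In this embodiment, the first vertical spacing is the maximum offset of a character column: the x-axis deviation between two characters that are adjacent along the y-axis in the same column.
A single clustering calculation on these differences determines how far the character lines in the image are offset. The kernel density function yields one or more local minima. The smallest one, the first local minimum, is the maximum offset of a character line, and it decides whether characters belong to the same row or column. If the difference between the first coordinate values of two adjacent characters is less than or equal to the first local minimum, the two characters belong to the same character row or column.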
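Once the first local minimum is known, the same-line test is a threshold on consecutive differences. The sketch below assumes the first coordinate values are already in reading order and that `first_min` comes from the kernel density estimate. Both names are illustrative.

```python
from typing import List, Sequence


def split_into_lines(coords: Sequence[float], first_min: float) -> List[List[int]]:
    """Group character indices (in reading order) into lines.

    coords[i] is the first coordinate value of character i (its y for rows,
    its x for columns). A new line starts whenever the difference to the
    previous character exceeds first_min; a difference equal to first_min
    stays in the same line.
    """
    if not coords:
        return []
    lines = [[0]]
    for i in range(1, len(coords)):
        if abs(coords[i] - coords[i - 1]) <= first_min:
            lines[-1].append(i)
        else:
            lines.append([i])
    return lines
```

For example, y-values `[10, 12, 13, 40, 41]` with `first_min = 3` split into the lines `[0, 1, 2]` and `[3, 4]`.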
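The method of the embodiments corrects characters whose positions are offset, for example because of a tilted shooting angle. Because the kernel density algorithm clusters the offsets, it assigns them to their correct classes. The offset is thereby removed, and characters that belong to the same row or column are recognized as such.
As described above, clustering the coordinate differences with the kernel density function may yield one local minimum or several:
- A single local minimum can be used to correct the offset of character lines.
- With several local minima, the smallest one (the first local minimum) corrects the line offset. The others (second local minima) show how many types of spacing the image contains, and each one is the maximum value of one type of spacing. For horizontal writing these spacings include row spacings; for vertical writing they include column spacings.
- In other embodiments, every local minimum may represent a spacing, each being the maximum value of one type of spacing.
In an exemplary embodiment where a line is a row, the character processing method also determines row spacing and character column groups. As shown in FIG. 2, step 120 may include the following steps 121 to 124.
Step 121: for any two horizontally adjacent first and second characters, compute the difference between the first y-coordinate of one or more points in the first character and that of the corresponding points in the second character.
This is implemented as described in the foregoing embodiments and is not repeated here.
Step 122: cluster all the computed differences with the kernel density function and take the first horizontal spacing from the result (for example, the first local minimum). Compare each pair's y-coordinate difference with the first horizontal spacing to determine the characters belonging to the same character row.
Step 123: take a second horizontal spacing from the same result (for example, a second local minimum). Compare each pair's y-coordinate difference with the second horizontal spacing to determine the row spacing.
Steps 122 and 123 may be performed together.
Horizontally adjacent characters include both positionally adjacent and content-adjacent characters. Clustering their coordinate differences therefore yields both the maximum offset of a character row and the maximum values of one or more types of row spacing (the second horizontal spacing). The result contains one or more local minima, for example a smallest first local minimum and larger second local minima. The first local minimum is the maximum value of the smallest type of spacing present, the character row offset: the largest y-axis offset between two characters adjacent along the x-axis. It can be used to correct skewed character rows. There may be one or more second local minima, each representing a type of row spacing, that is, the spacing between two adjacent rows.
Two horizontally adjacent characters belong to the same character row if the difference between their first y-coordinates is less than or equal to the first local minimum. If the difference is greater than the first local minimum and less than a second local minimum, they belong to two different character rows, with row spacing between them. In this way, the one or more types of row spacing in the image can be determined.
Consider several second local minima, for example a first local minimum T1 and second local minima T2 and T3:
- If the coordinate difference between adjacent characters S1 and S2 is greater than T1 and less than T2, they are in two rows. S1 is the last character of one row, S2 is the first character of the next, and the first type of row spacing lies between them.
- If the coordinate difference between adjacent characters S3 and S4 is greater than T2 and less than T3, they are in two rows with the second type of row spacing between them.
- The first type of row spacing is smaller than the second.
If the image contains a table (with or without ruling lines), a horizontal table line may lie where the second type of row spacing occurs. For a borderless table this line is implicit.
A coordinate difference exactly equal to a local minimum is assigned to the case of the smaller local minimum.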
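With the local minima sorted in ascending order, finding the interval a difference falls into, tie rule included, is a binary search. Below is a hedged sketch using Python's `bisect` module; the function name is hypothetical. A return value of 0 means "same row", and k ≥ 1 means the k-th type of row spacing.

```python
from bisect import bisect_left
from typing import Sequence


def spacing_class(diff: float, minima: Sequence[float]) -> int:
    """Index of the interval that diff falls into, given ascending minima T1 < T2 < ...

    0 means diff <= T1 (same character row); k >= 1 means T_k < diff <= T_(k+1),
    i.e. the k-th type of row spacing; len(minima) means diff exceeds every minimum.
    bisect_left sends a value equal to a minimum to the smaller interval.
    """
    return bisect_left(list(minima), diff)
```

With minima `[3, 20, 60]`, the difference 3 maps to 0 (same row), 10 maps to 1, and 30 maps to 2.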
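Step 124: cluster the differences between the second coordinate values of horizontally adjacent pairs of characters with the kernel density function to determine a third horizontal spacing. Compare each pair's difference with the third horizontal spacing to decide whether the two characters belong to the same character column group, and finally determine the characters of each character column group.
In this embodiment the writing is horizontal, so the second coordinate value is a second x-coordinate: the x-coordinate of one or more preset points. Take a horizontally adjacent fifth character and sixth character as an example. The difference between their second x-coordinates compares one or more points in the fifth character with the corresponding or mirrored points in the sixth character, in one of the following cases:
Case 5: the x-coordinate difference between one point in the fifth character and the corresponding point in the sixth character. Examples are the leftmost points of the two characters, their rightmost points, or their center points. These points are only examples. Other embodiments may use other points, as long as the points are chosen by a uniform criterion.
Case 6: the x-coordinate difference between one point in the fifth character and the mirrored point in the sixth character. An example is the rightmost point of the fifth character and the leftmost point of the sixth character, where the sixth character has the larger x-coordinate.
Case 7: the difference between the mean x-coordinate of several points in the fifth character and the mean x-coordinate of the corresponding points in the sixth character. Examples are several left endpoints of each character, or several right endpoints of each character. These endpoints are only examples. Other embodiments may choose points as needed, as long as a uniform selection criterion is used.
Case 8: the x-coordinate difference between several points in the fifth character and the mirrored points in the sixth character. An example is the mean x-coordinate of several right endpoints of the fifth character against the mean x-coordinate of several left endpoints of the sixth character, where the sixth character has the larger x-coordinate.
"Corresponding" endpoints are at the same position or are selected by the same rule. "Mirrored" endpoints are mirror-symmetric to each other.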
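Cases 5 and 6 can be illustrated with a small helper. It assumes hypothetical OCR boxes with `left` and `right` keys and only considers pairs within one row. The content-adjacent pair formed by the end of one row and the start of the next is left out for simplicity. Case 6 (`"gap"`) measures the blank space between two boxes.

```python
from typing import Dict, List

Box = Dict[str, float]  # hypothetical OCR box with "left" and "right" keys


def x_difference(a: Box, b: Box, mode: str = "gap") -> float:
    """Second x-coordinate difference between horizontally adjacent a (left) and b (right).

    mode="left":   case 5, left edge of b minus left edge of a (corresponding points)
    mode="center": case 5, center of b minus center of a
    mode="gap":    case 6, left edge of b minus right edge of a (mirrored points)
    """
    if mode == "left":
        return b["left"] - a["left"]
    if mode == "center":
        return (b["left"] + b["right"]) / 2.0 - (a["left"] + a["right"]) / 2.0
    if mode == "gap":
        return b["left"] - a["right"]
    raise ValueError(f"unknown mode: {mode}")


def row_x_differences(row: List[Box], mode: str = "gap") -> List[float]:
    """Differences for every adjacent pair in one row, after sorting left to right."""
    ordered = sorted(row, key=lambda box: box["left"])
    return [x_difference(a, b, mode) for a, b in zip(ordered, ordered[1:])]
```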
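These definitions show that the third horizontal spacing is the x-axis coordinate difference between two characters adjacent along the x-axis. "Adjacent" here also covers the content-adjacent case described above.
Clustering all the second-coordinate differences with the kernel density function yields one or more local minima. Each indicates one type of third horizontal spacing in the image, for example a smallest third local minimum and larger fourth local minima:
- The third local minimum is the spacing between ordinary characters, for example between the two characters of the word "字符" ("character").
- There may be one or more fourth local minima, each a different kind of spacing. In one row, for example, the gap from the last character of the first table column to the first character of the second table column is one kind. The gap from the last character of the second table column to the first character of the third table column may be another.
Because the writing in this embodiment is horizontal, differences below the third local minimum can be ignored. Only differences between the third local minimum and a fourth local minimum need to be considered. Two horizontally adjacent characters belong to two different character column groups if their second-coordinate difference is greater than the third local minimum and less than a fourth local minimum, or lies between two fourth local minima. Judging every pair of adjacent characters this way gives the character column groups in the image and, finally, the characters in each group. If the image contains a table (with or without ruling lines), a table line may lie between such a pair, and each table column is one character column group.
For example, column groups can be determined row by row. Judging all characters of the first character row yields several character column groups. The characters of the second character row are then judged, and their x-coordinates are compared with those of the groups already found to decide which group each belongs to. If the column groups are table columns, combining them with the row spacing result recognizes the whole table in the image.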
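The text does not fix how a later row's segments are matched to the column groups found so far. One possible rule, assumed here rather than taken from the disclosure, matches each segment to the existing group whose horizontal extent it overlaps most. A segment with no overlap opens a new group.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (left, right) extent of a character column group


def assign_to_columns(columns: List[Span], segments: List[Span]) -> List[int]:
    """Match each segment of a new row to the column it overlaps most.

    A segment that overlaps no existing column opens a new one. `columns` is
    updated in place (matched columns are widened) so later rows see it.
    """
    labels = []
    for left, right in segments:
        best, best_overlap = -1, 0.0
        for k, (c_left, c_right) in enumerate(columns):
            overlap = min(right, c_right) - max(left, c_left)
            if overlap > best_overlap:
                best, best_overlap = k, overlap
        if best == -1:
            columns.append((left, right))
            best = len(columns) - 1
        else:
            c_left, c_right = columns[best]
            columns[best] = (min(c_left, left), max(c_right, right))
        labels.append(best)
    return labels
```

Widening a matched column keeps ragged cell contents, such as long item names, attached to one group across rows. A stricter variant could instead require the left edges to align within a tolerance.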
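In an exemplary embodiment where a line is a column, the character processing method also determines column spacing and character row groups. As shown in FIG. 3, step 120 may include steps 121' to 124'.
Step 121': for any two vertically adjacent third and fourth characters, compute the difference between the first x-coordinate of one or more points in the third character and that of the corresponding points in the fourth character.
This is the same as in the foregoing embodiments and is not repeated here.
Step 122': cluster all the computed differences with the kernel density function and take the first vertical spacing from the result (for example, a fifth local minimum). Compare each pair's x-coordinate difference with the first vertical spacing to determine the characters belonging to the same character column.
Step 123': take a second vertical spacing from the same result (for example, a sixth local minimum). Compare each pair's x-coordinate difference with the second vertical spacing to determine the column spacing.
Steps 122' and 123' may be performed together.
Vertically adjacent characters include both positionally adjacent and content-adjacent characters. Clustering their coordinate differences therefore yields both the maximum offset of a character column and the maximum values of one or more types of column spacing (the second vertical spacing). The result contains one or more local minima, for example a smallest fifth local minimum and larger sixth local minima. The fifth local minimum is the maximum value of the smallest type of spacing present, the character column offset: the largest x-axis offset between two characters adjacent along the y-axis. It can be used to correct skewed character columns. There may be one or more sixth local minima, each representing a type of column spacing, that is, the spacing between two adjacent columns.
Two vertically adjacent characters belong to the same character column if the difference between their first x-coordinates is less than or equal to the fifth local minimum. If the difference is greater than the fifth local minimum and less than a sixth local minimum, they belong to two different character columns, with column spacing between them. In this way, the one or more types of column spacing in the image can be determined.
Consider several sixth local minima, for example a fifth local minimum T4 and sixth local minima T5 and T6:
- If the coordinate difference between adjacent characters S5 and S6 is greater than T4 and less than T5, they are in two columns. S5 is the last character of one column, S6 is the first character of the next, and the first type of column spacing lies between them.
- If the coordinate difference between adjacent characters S7 and S8 is greater than T5 and less than T6, they are in two columns with the second type of column spacing between them.
- The first type of column spacing is smaller than the second.
If the image contains a table (with or without ruling lines), a vertical table line may lie where the second type of column spacing occurs. For a borderless table this line is implicit.
Step 124': cluster the differences between the second coordinate values of vertically adjacent pairs of characters with the kernel density function to determine a third vertical spacing. Compare each pair's difference with the third vertical spacing to decide whether the two characters belong to the same character row group, and finally determine the characters of each character row group.
In this embodiment the writing is vertical, so the second coordinate value is a second y-coordinate: the y-coordinate of one or more preset points. Take a vertically adjacent seventh character and eighth character as an example. The difference between their second y-coordinates compares one or more points in the seventh character with the corresponding or mirrored points in the eighth character, in one of the following cases:
Case 9: the y-coordinate difference between one point in the seventh character and the corresponding point in the eighth character. Examples are the topmost points of the two characters, their bottommost points, or their center points. These points are only examples. Other embodiments may use other points, as long as the points are chosen by a uniform criterion.
Case 10: the y-coordinate difference between one point in the seventh character and the mirrored point in the eighth character. An example is the bottommost point of the seventh character and the topmost point of the eighth character, where the eighth character has the larger y-coordinate.
Case 11: the difference between the mean y-coordinate of several points in the seventh character and the mean y-coordinate of the corresponding points in the eighth character. Examples are several upper endpoints of each character, or several lower endpoints of each character. These endpoints are only examples. Other embodiments may choose points as needed, as long as a uniform selection criterion is used.
Case 12: the y-coordinate difference between several points in the seventh character and the mirrored points in the eighth character. An example is the mean y-coordinate of several lower endpoints of the seventh character against the mean y-coordinate of several upper endpoints of the eighth character, where the eighth character has the larger y-coordinate.
"Corresponding" endpoints are at the same position or are selected by the same rule. "Mirrored" endpoints are mirror-symmetric to each other.
These definitions show that the third vertical spacing is the y-axis coordinate difference between two characters adjacent along the y-axis. "Adjacent" here also covers the content-adjacent case described above.
Deciding whether characters belong to the same character row group follows the column-group procedure of step 124 and is not repeated here. A character row group is, for example, a table row.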
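FIG. 4 is a flowchart of a character recognition method according to an embodiment of the present disclosure. As shown in the figure, the method includes steps 310 to 330.
Step 310: preprocess the image.
The preprocessing includes one or more of the following: converting a color image to grayscale, Gaussian filtering, background extraction, contrast compensation, binarization, and perspective transformation.
Step 320: recognize the coordinates of the characters in the image.
Step 330: process the characters in the image with the character processing method of the foregoing embodiments (for example, FIG. 1, FIG. 2, or FIG. 3) to determine the characters belonging to the same line.
In an exemplary embodiment, as shown in FIG. 5, the character recognition method further includes step 340 after step 330: displaying some or all of the one or more lines of characters determined by the processing.
Step 330 determines which characters belong to the same line and therefore how many lines the image contains. When lines are rows, it gives the characters of each row and the number of rows. When lines are columns, it gives the characters of each column and the number of columns. When the image contains a table, the table rows and/or table columns can also be determined. Some or all of these results can then be selected for display on the terminal's interface. For an image containing a table, one or both of the following may apply:
selecting part or all of the content of one or more determined table rows (some or all of the characters in those rows) for display;
selecting part or all of the content of one or more determined table columns (some or all of the characters in those columns) for display.
The rows or columns to display can be chosen according to the character content. The display style can be preset as needed.
Preprocessing the image removes noise, which makes the table recognition result more accurate.
The character processing and recognition methods of the present disclosure are fast and accurate. A kernel density estimation function is applied to the text recognition results to estimate the offset distance and the spacing between characters within text blocks. Rows and text blocks are then reconstructed from these estimates. The approach is robust and resistant to interference.
The method applies to text recognition and borderless table recognition, and also to ruled tables. Because it analyzes row spacing and column spacing, the presence or absence of ruling lines does not affect the result. It can handle table photos in which text positions are offset because of a tilted shooting angle, uneven lighting, and similar conditions. The clustering nature of the kernel density algorithm assigns these offsets to their correct classes (rows and columns).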
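The method is described below with an exemplary embodiment that uses a borderless laboratory test table. FIG. 6 is the algorithm flowchart of the table recognition method of this embodiment, with the following steps 410 to 430.
Step 410: image preprocessing.
Image preprocessing may include one or more of the following steps as needed: converting a color image to grayscale, filtering (for example, Gaussian filtering), background extraction, contrast compensation, binarization, and perspective transformation. The description below includes all of them. Using all of them helps keep the subsequent OCR text recognition accurate under extreme conditions such as a tilted shooting angle or uneven lighting. As shown in FIG. 7, preprocessing includes the following steps 411 to 416.
Step 411: convert the input color image to a grayscale image.
The conversion can be done with OpenCV.
Step 412: remove noise from the grayscale image with Gaussian filtering.
Gaussian filtering can be done with OpenCV.
Step 413: extract the background from the denoised grayscale image.
Photos taken in poor light have a dim background, and the characters are usually black too. The background then acts as interference and makes the characters less distinct. Background extraction estimates the gray level of the background so that the character strokes can be separated from it.
The background at a point can be estimated from the brighter pixels in the w×w neighborhood of that point. In other words, the whiter pixels of a region represent that region's background. w is an empirical value greater than 0 and smaller than the number of pixels along a side. For example, it can be about one tenth of the length of a side, such as the shortest side.
Take w = 31 as an example:
1. Scale the image proportionally so that its shortest side is 300 pixels.
2. Scan the image row by row or column by column.
3. For each pixel, select the 7 brightest pixels in its 31×31 neighborhood.
4. Discard the brightest one to reduce the influence of white noise, and use the mean of the remaining 6 as that pixel's background value.
5. When all pixels are done, scale the background image back to the original size.
Scaling down and up can be done by interpolation.
Processing the downscaled image is much faster. For an image of several thousand by several thousand pixels, proportional downscaling speeds processing up greatly without affecting the background extraction result.
The neighborhood size and the number of brightness samples above are empirical values chosen from the image size. Other parameters may be tried depending on the actual situation.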
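The background estimate of step 413 can be sketched in pure Python on a small grayscale image stored as nested lists. This is illustrative only. It follows the w×w window and the "top 7, drop the brightest, average 6" rule described above. It omits the downscale to 300 px and the interpolated upscale, so it is slow on full-size photos.

```python
import heapq
from typing import List

Image = List[List[float]]  # grayscale image as a list of pixel rows


def estimate_background(gray: Image, w: int = 31, k: int = 7) -> Image:
    """Per-pixel background from the brightest pixels of the w*w window around it.

    For each pixel, take the k brightest values in its window (clipped at the
    image border), drop the single brightest one to suppress white noise, and
    average the rest (k - 1 values, or fewer in very small windows).
    """
    height, width = len(gray), len(gray[0])
    r = w // 2
    background = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[yy][xx]
                for yy in range(max(0, y - r), min(height, y + r + 1))
                for xx in range(max(0, x - r), min(width, x + r + 1))
            ]
            brightest = heapq.nlargest(k, window)
            rest = brightest[1:] or brightest  # keep something for 1-pixel windows
            background[y][x] = sum(rest) / len(rest)
    return background
```

A single dark stroke pixel or a single white noise pixel leaves the estimate unchanged, which is the point of averaging the brightest values after dropping the top one.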
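Step 414: apply contrast compensation to the denoised grayscale image, using the background extraction result.
Contrast compensation removes the uneven lighting background. It can be calculated with the following formula:
Figure PCTCN2020076828-appb-000001
where y is the compensated gray value of a pixel, $p_s$ is the gray value of the image from step 412 at that pixel, and $p_b$ is the gray value of the background image from step 413 at that pixel.
This contrast compensation is the method implemented in the open-source tool ScanTailor. It needs little computation and is fast. Other embodiments may use other contrast compensation methods, which the present disclosure does not limit.
Step 415: binarize the contrast-compensated image.
Binarization can use the Sauvola local-threshold algorithm, which takes the local mean brightness into account. Its input is a grayscale image. For each pixel, it computes a threshold from the mean and standard deviation of the gray values in the neighborhood centered on that pixel. It can be implemented with scikit-image (skimage), a collection of image processing and computer vision algorithms that includes Sauvola.
Instead of Sauvola, the adaptive binarization algorithm provided by OpenCV can also be used.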
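For reference, the classic Sauvola rule computes a per-pixel threshold T = m · (1 + k · (s / R - 1)) from the local mean m and standard deviation s. The pure-Python sketch below is not scikit-image's implementation. The window size 15, k = 0.2, and R = 128 (half the 8-bit range) are assumed values. The output uses 255 for background and 0 for text.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as a list of pixel rows


def sauvola_binarize(gray: Image, window: int = 15, k: float = 0.2, r: float = 128.0) -> List[List[int]]:
    """Binarize with the Sauvola local threshold T = m * (1 + k * (s / r - 1)).

    m and s are the mean and (population) standard deviation of the window
    centred on each pixel, clipped at the image border.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            vals = [
                gray[yy][xx]
                for yy in range(max(0, y - half), min(height, y + half + 1))
                for xx in range(max(0, x - half), min(width, x + half + 1))
            ]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            threshold = m * (1.0 + k * (s / r - 1.0))
            out[y][x] = 255 if gray[y][x] > threshold else 0  # brighter than T: background
    return out
```

In a flat region (s = 0) the threshold drops to (1 - k)·m, so uniform paper stays white. In high-contrast regions the threshold adapts to the local mean.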
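Step 416: apply a perspective transformation to the binarized image.
Edges can be detected with the Canny algorithm. OpenCV's built-in contour function findContours then finds the largest contour with four corner points, and the region enclosed by those corners is perspective-transformed. Perspective transformation addresses a limitation of affine transformation: an affine map preserves parallel lines, so it cannot map an arbitrary quadrilateral onto a rectangle. Tools such as "Free Transform" in Photoshop or "Perspective" in the GNU Image Manipulation Program (GIMP) are implemented with a perspective transformation matrix. Here the four-vertex method is used: finding the perspective transformation matrix lets a skewed quadrilateral be mapped onto a rectangle.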
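The four-vertex perspective transformation amounts to solving for a 3×3 matrix H, with H[2][2] fixed to 1, from eight linear equations, two per corner correspondence. OpenCV's `cv2.getPerspectiveTransform` solves the same system. The dependency-free sketch below only illustrates the algebra. The corner order (top-left, top-right, bottom-right, bottom-left) is an assumption, and corner detection with Canny and `findContours` is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 matrix H, with H[2][2] = 1, that maps the four src corners onto dst."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h: Matrix, p: Point) -> Point:
    """Apply H to one point, including the perspective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Warping a whole image would apply the inverse of H to every output pixel and interpolate. In practice that is what `cv2.warpPerspective` is for.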
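Implementations of several steps are given above. Other embodiments may implement them differently.
Step 420: perform general OCR text recognition on the preprocessed image.
General OCR text recognition is more mature than OCR table recognition. It focuses only on text, and its output is each character with its top, bottom, left, and right coordinates. There are many ways to implement it. This document uses the open-source Tesseract-OCR engine, whose development was sponsored by Google; other general text recognition services may also be called.
FIG. 8 is an example of a physical examination report photographed by a user at a tilted angle. FIG. 9 is the image after the preprocessing and general OCR text recognition described above.
Step 430: reconstruct the table from the OCR output.
The table reconstruction in the embodiments uses Gaussian kernel density estimation to rebuild the general OCR results into a table. Reconstruction covers text rows, table rows, and table columns. Table rows and columns determine the cells, from which the text blocks inside them can be rebuilt. The process does not depend on ruling lines inside or around the table and is robust to skew. FIG. 10 shows the reconstruction flow. Reconstructing tables through kernel density estimation removes earlier table recognition's dependence on ruling lines and improves on its low precision.
Kernel density estimation is a probability-theory method for estimating an unknown density function, and it is a nonparametric method. It uses no prior knowledge of the data distribution and makes no assumptions about it. Instead, it derives the characteristics of the distribution from the data samples themselves.
In an exemplary embodiment, the table boundary need not come only from the clustering method above. Before reconstruction, the table can be compared with tables in a pre-built table format library, for example by header and/or footer. If the header matches a stored table format, the table border and range are taken from that format, and reconstruction runs within that range.
As shown in FIG. 10, table reconstruction includes the following steps 801 to 806. This embodiment determines rows first and then columns; other embodiments may determine columns first.
Step 801: general text recognition of the preprocessed image returns the top, bottom, left, and right coordinate positions (hereinafter, coordinates) of each character. Record the top coordinate of each character.
Let the upper-left corner of the image be the origin, with the x-axis horizontal and the y-axis vertical. Each character is treated as a rectangle that encloses it. The rectangle has four sides: top (upper), bottom (lower), left, and right. The top coordinate is the distance from the top side to the x-axis, and the bottom coordinate is the distance from the bottom side to the x-axis. The left coordinate is the distance from the left side to the y-axis, and the right coordinate is the distance from the right side to the y-axis. For convenience, the top coordinates of all characters can be stored in a top-coordinate list, denoted list1.
Step 802: traverse list1 and compute the absolute difference between the top coordinates of each pair of adjacent characters. Store these values in a new top-coordinate difference list, denoted list_top.
This embodiment uses the top-coordinate difference. Other embodiments may use the bottom-coordinate difference of adjacent characters instead, and the subsequent search for spacings is the same.
The top-coordinate differences of adjacent characters give the data distribution, which may contain the following:
- the first horizontal spacing: the text row offset, or offset spacing, caused for example by a tilted shooting angle;
- the second horizontal spacing: text row spacing, table row spacing, or other row spacing. An example of other row spacing is the large gap between the last examination item and the signature at the end of a physical examination or laboratory test report. There may be one or more kinds.
The laboratory test table in this example has three types of horizontal spacing: offset spacing, table row spacing (equal here to the text row spacing), and other spacing.
Step 803: fit a density curve to list_top with Gaussian kernel density estimation and find the local minima of list_top.
1. Set the initial bandwidth, which is the smoothing parameter, to 1.
2. In this embodiment, if list_top has more than 3 local minima (3 being the number of horizontal spacing types), increase the bandwidth by a preset step, for example 0.1.
3. Refit the Gaussian kernel density estimate, and repeat until exactly 3 local minima remain.
The local minima of the discrete sequence are the boundaries between the types of horizontal spacing. Depending on the table, other embodiments may have a different number of horizontal spacing types, possibly only one, the first horizontal spacing. When the top-coordinate (or bottom-coordinate) difference of two characters is less than the first horizontal spacing, the two characters are treated as one row. The first horizontal spacing therefore corrects text offset caused by the shooting angle.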
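The bandwidth search of step 803 can be sketched as below. This is a hedged illustration, not the authors' code:
- the density is evaluated on an evenly spaced grid, with an assumed grid step of 0.05;
- only interior local minima of the fitted curve are counted (how the patent counts minima at the ends of the range is not specified);
- the search returns `None` if widening the bandwidth overshoots below the target count.

```python
import math
from typing import List, Optional, Tuple


def gaussian_kde(samples: List[float], h: float, x: float) -> float:
    """f_h(x) = 1/(n h) * sum K((x - x_i) / h) with the standard normal kernel K."""
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return total / (n * h * math.sqrt(2 * math.pi))


def kde_minima(samples: List[float], h: float, grid_step: float = 0.05) -> List[float]:
    """x positions of interior local minima of the KDE, found on an evenly spaced grid."""
    lo, hi = min(samples), max(samples)
    count = int((hi - lo) / grid_step) + 1
    xs = [lo + i * grid_step for i in range(count + 1)]
    fs = [gaussian_kde(samples, h, x) for x in xs]
    # a strict drop followed by a non-rise; a flat run counts only once
    return [xs[i] for i in range(1, len(xs) - 1) if fs[i - 1] > fs[i] <= fs[i + 1]]


def search_bandwidth(samples: List[float], target: int, h: float = 1.0,
                     step: float = 0.1, max_steps: int = 1000) -> Optional[Tuple[float, List[float]]]:
    """Widen the bandwidth from h in `step` increments until exactly `target` minima remain."""
    for _ in range(max_steps):
        minima = kde_minima(samples, h)
        if len(minima) == target:
            return h, minima
        if len(minima) < target:
            return None  # smoothing already merged too many clusters
        h += step
    return None
```

In practice, `scipy.stats.gaussian_kde` could replace the hand-written density. Note that SciPy scales its bandwidth by the sample standard deviation, so its `bw_method` value is not directly the h used here.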
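FIG. 11 is the density curve fitted by kernel density estimation to the top-coordinate differences of adjacent characters. The X-axis is the input value, the top-coordinate difference. The Y-axis is the estimated log kernel density at that difference. When a kernel density estimate covers many data points, the waveforms of adjacent peaks merge, so the final curve shape depends little on the choice of kernel. Because Gaussian waveforms are easy to combine, this embodiment uses the Gaussian kernel (the normal distribution curve). In FIG. 11 the logarithm of the estimate, log10(f), is plotted on the vertical axis to compress its range. Kernel density estimation superimposes one kernel at each coordinate difference $x_i$ to form the density curve.
The kernel density estimate is calculated as follows:
$$\hat{f}_h(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right)$$
where $\hat{f}_h(x)$ is the kernel density estimate, $n$ is the sample size (the total number of coordinate differences), $h>0$ is the bandwidth, also called the smoothing parameter, and $K(\cdot)$ is the kernel function. This embodiment uses the Gaussian kernel:
$$K(u)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2}\right)$$
Kernel density estimation thus separates the several types of horizontal spacing.
FIG. 11 only illustrates the density curve of this example and its local minima. In the actual calculation, the local minima of the kernel density function can be found directly by differentiation.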
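Because the Gaussian kernel is differentiable, the minima can also be found from the derivative f′(x) = -1/(n h³ √(2π)) Σ (x - xᵢ) exp(-((x - xᵢ)/h)² / 2). A local minimum is where f′ changes sign from negative to positive. The sketch below brackets such sign changes on a coarse grid and refines them by bisection; the grid step and tolerance are assumptions. For very wide gaps the exponentials can underflow to zero, and evaluating in log space would then be safer.

```python
import math
from typing import List


def kde_derivative(samples: List[float], h: float, x: float) -> float:
    """d/dx of the Gaussian KDE: -1/(n h^3 sqrt(2 pi)) * sum (x - x_i) exp(-((x - x_i)/h)^2 / 2)."""
    n = len(samples)
    total = sum((x - xi) * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return -total / (n * h ** 3 * math.sqrt(2 * math.pi))


def kde_minima_by_derivative(samples: List[float], h: float,
                             grid_step: float = 0.1, tol: float = 1e-9) -> List[float]:
    """Locate KDE minima where f' goes from negative to non-negative, refined by bisection."""
    lo, hi = min(samples), max(samples)
    minima = []
    a = lo
    while a < hi:
        b = min(a + grid_step, hi)
        if kde_derivative(samples, h, a) < 0 <= kde_derivative(samples, h, b):
            left, right = a, b
            while right - left > tol:
                mid = (left + right) / 2
                if kde_derivative(samples, h, mid) < 0:
                    left = mid  # still descending: minimum lies to the right
                else:
                    right = mid
            minima.append((left + right) / 2)
        a = b
    return minima
```

For the symmetric sample `[0, 1, 10, 11]` with h = 1, the single minimum lies at 5.5, halfway between the two clusters.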
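Step 804: use the local minima found to determine the characters belonging to the same row and to the same table row.
In this embodiment there are 3 types of horizontal spacing and therefore 3 local minima, shown in FIG. 11:
- The X-coordinate of the first local minimum is the maximum text offset spacing (hereinafter, the text offset spacing).
- The X-coordinate of the second is the maximum table row spacing (hereinafter, the table row spacing).
- The X-coordinate of the third is the other spacing, here the large row spacing, for example between the last examination item and the signature at the end of a physical examination or laboratory test report.
Text offset spacing < table row spacing < large row spacing. The top-coordinate difference of two characters is classified as follows:
- less than or equal to the text offset spacing: the two characters are in the same text row;
- greater than the text offset spacing and less than or equal to the table row spacing: they are in different table rows, with a table line between them;
- greater than the table row spacing and less than or equal to the large row spacing: a large row spacing lies between them.
In implementation, denote the x-value of the first local minimum as mi. mi is the maximum text offset spacing, and top-coordinate differences below mi are treated as the same text row. Take the first character as the first row and traverse the remaining characters in list1. A character is assigned to the current row if the absolute difference between its top coordinate and the mean top coordinate of that row's characters is less than mi. If the difference is greater than mi, the character starts a new text row. After the traversal, each row is sorted by left coordinate, which completes text row reconstruction. This example compares each character's top coordinate with the mean top coordinate of the row built so far. Other exemplary embodiments may instead compare the top-coordinate difference of two adjacent characters directly with mi.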
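Text row reconstruction with the running mean top coordinate might look like the sketch below. It assumes hypothetical OCR dicts with `top` and `left` keys and compares each character only with the row currently being built. It uses ≤ mi for membership, in line with step 804.

```python
from typing import Any, Dict, List

Char = Dict[str, Any]  # hypothetical OCR result with at least "top" and "left"


def rebuild_text_rows(chars: List[Char], mi: float) -> List[List[Char]]:
    """Group characters (in OCR order) into text rows, then sort each row by left."""
    rows: List[List[Char]] = []
    for ch in chars:
        if rows:
            current = rows[-1]
            mean_top = sum(c["top"] for c in current) / len(current)
            if abs(ch["top"] - mean_top) <= mi:
                current.append(ch)  # within the offset spacing: same text row
                continue
        rows.append([ch])  # start a new text row
    return [sorted(row, key=lambda c: c["left"]) for row in rows]
```

Comparing against the running mean, rather than only the previous character, lets a gently skewed row keep absorbing characters whose tops drift a little at a time.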
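Table rows are reconstructed the same way. Denote the x-value of the second local minimum as ni. ni is the maximum table row spacing, and top-coordinate differences below ni are treated as the same table row. Take the first character as the first table row and traverse the remaining characters in list1. If the absolute difference between a character's top coordinate and the mean top coordinate of the current table row is greater than mi and less than ni, a table line lies between them, and they belong to different table rows. Table lines can also be located in other ways:
- In one exemplary embodiment, the top-coordinate difference of two adjacent characters is compared directly with ni.
- In another, table lines are found from the text rows already determined. One way is to look for top-coordinate differences greater than mi and less than ni. Another is to compute the mean top coordinate of each of two text rows: if the absolute difference between the two means is greater than mi and less than ni, a table line lies between the rows.
In exemplary embodiments, the table lines found can also be verified, for example by using these methods to check one another. This completes table row reconstruction.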
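The variant that compares the mean top coordinates of consecutive text rows can be sketched as follows. The labels and the function name are illustrative, and `mi` and `ni` are the first two local minima found in step 803.

```python
from typing import List, Sequence


def classify_row_gaps(row_tops: Sequence[Sequence[float]], mi: float, ni: float) -> List[str]:
    """Label the gap between each pair of consecutive text rows.

    row_tops[k] holds the top coordinates of the characters in text row k.
    The gap is the absolute difference between the two rows' mean tops:
    <= mi: same row (should not happen after text-row reconstruction),
    (mi, ni]: table line between the rows, > ni: larger 'other' spacing.
    """
    means = [sum(tops) / len(tops) for tops in row_tops]
    labels = []
    for upper, lower in zip(means, means[1:]):
        gap = abs(lower - upper)
        if gap <= mi:
            labels.append("same row")
        elif gap <= ni:
            labels.append("table line")
        else:
            labels.append("other spacing")
    return labels
```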
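Step 805: for each text row, compute the absolute difference between each character's right coordinate and the left coordinate of the adjacent character to its right (hereinafter, the left-right coordinate difference). Treat all these differences as a discrete sequence, fit a Gaussian kernel density curve, and find its local minima, as in step 803. For the table in this embodiment, this gives 3 local minima, one per type of column spacing. Their x-coordinates are:
- the text column spacing, which can be regarded as the maximum gap between characters within one cell (the smallest table unit, formed by one row and one column);
- the table column spacing;
- other spacing larger than the table column spacing, such as the margins on both sides of the table. There may be one or more kinds of other spacing.
Text column spacing < table column spacing < other, large column spacing. In this embodiment, two adjacent characters are in different table columns if their left-right coordinate difference is greater than the text column spacing and less than or equal to the table column spacing.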
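For one text row written left to right, step 805 can be sketched as cutting the row wherever the gap between a character's right edge and the next character's left edge exceeds the text column spacing. The sketch assumes hypothetical OCR dicts with `left`, `right`, and `text` keys. Gaps beyond the table column spacing (the "other" spacing) are treated as cell boundaries as well.

```python
from typing import Any, Dict, List

Char = Dict[str, Any]  # hypothetical OCR result with "left", "right" and "text"


def split_row_into_cells(row: List[Char], text_gap: float) -> List[str]:
    """Cut one left-to-right text row into cells and return each cell's text.

    The gap is |left of the next character - right of the current one|; a gap
    larger than text_gap (the text column spacing) starts a new cell.
    """
    ordered = sorted(row, key=lambda c: c["left"])
    cells: List[List[Char]] = []
    for ch in ordered:
        if cells and abs(ch["left"] - cells[-1][-1]["right"]) <= text_gap:
            cells[-1].append(ch)
        else:
            cells.append([ch])
    return ["".join(c["text"] for c in cell) for cell in cells]
```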
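This embodiment assumes left-to-right writing. For right-to-left writing, compute for each text row the absolute difference between each character's left coordinate and the right coordinate of the adjacent character to its left. Adjacency also covers another case: the last character of one row and the first character of another row.
Step 806: regroup the characters by the table rows from step 804 and the table columns from step 805 to reconstruct the table.
The method does not depend on whether the table has internal ruling lines, because row spacing and column spacing are still present without them.
The method recognizes characters that belong to the same row, the same table row, and the same table column. FIG. 12 shows the recognition result of this exemplary embodiment.
When the method is applied to tables such as physical examination reports, the display style of each kind of table on the terminal can be preset. The style for the current table is chosen by recognizing its header and/or footer. After the recognition result of FIG. 12 is obtained, the recognized content is shown to the user in that style. FIGS. 13a and 13b show the display of part of the content:
- Item names and test results matter most to the user, so they are shown in a large black font.
- Units and reference ranges are reference information, so they are shown in a small gray font.
- Serial numbers and English abbreviations are not of interest to the user and are not shown in this example.
When displaying, table content can be filled into the display page row by row in table-row order. In another implementation, for a standard table, standard content such as item names, units, and reference ranges can be filled in when the style is preset. The matching test results are then selected from the recognition result, by comparing item names or serial numbers, and filled into the corresponding positions of the page. FIGS. 13a and 13b are only one display example, and other embodiments may use other styles as needed. The technical solution of the present disclosure works particularly well in medical settings, on borderless tables such as physical examination reports and laboratory test reports.
Compared with conventional table recognition, the method of the embodiments is more robust and more resistant to interference, and it recognizes faster and more accurately. It still works well when internal ruling lines are missing, which makes it especially suitable for borderless tables such as medical physical examination reports and laboratory test reports.
An exemplary embodiment of the present disclosure also provides a terminal device (terminal for short). The terminal device may include a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it can implement the character processing method or character recognition method of the embodiments.
As shown in FIG. 14, in one example, the terminal device 1300 may include a processor 1310, a memory 1320, a bus system 1330, and a transceiver 1340. The processor 1310, the memory 1320, and the transceiver 1340 are connected through the bus system 1330. The memory 1320 is configured to store instructions, and the processor 1310 is configured to execute those instructions to implement the character processing or character recognition method described above. In an exemplary embodiment, the transceiver 1340 is configured to receive the recognized coordinates of the characters in the image, pass them to the processor, and output the processor's character processing result. In other exemplary embodiments, the transceiver 1340 is configured to acquire the image to be processed and to receive the character recognition result from the processor.
The terminal may be a mobile handheld device such as a mobile phone or tablet computer. If the terminal is a mobile phone, the transceiver 1340 can obtain its input in several ways:
- through a camera that takes a picture of the image to be processed;
- from another application on the phone, for example one with an image transfer function, that supplies the image;
- from another application on the phone, for example one with a text recognition function, that supplies the recognized character coordinates.
The transceiver 1340 may also include a display module that receives and displays the processor's output. That output is the character processing result (from the character processing method) or the character recognition result (from the character recognition method). Alternatively, the transceiver 1340 may include another application that receives either result from the processor and processes it further.
It should be understood that the processor 1310 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 1320 may include a read-only memory and a random access memory, and it provides instructions and data to the processor 1310. Part of the memory 1320 may also include a non-volatile random access memory. For example, the memory 1320 may also store device type information.
Besides a data bus, the bus system 1330 may include a power bus, a control bus, a status signal bus, and the like. For clarity, all of these buses are labeled as the bus system 1330 in FIG. 14.
In implementation, the processing performed by the terminal device may be completed by integrated hardware logic circuits in the processor 1310 or by software instructions. That is, the steps of the disclosed methods may be executed by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may reside in a storage medium such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 1320. The processor 1310 reads the information in the memory 1320 and completes the steps of the foregoing method together with its hardware. To avoid repetition, details are not given here.
Those of ordinary skill in the art should understand that the technical solutions of the embodiments may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present disclosure. All such modifications shall fall within the scope of the claims of this application.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations of these. In a hardware implementation, the division into functional modules/units does not necessarily match the division into physical components. One physical component may have several functions, and one function or step may be carried out by several physical components together. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
Such software may be distributed on computer-readable media, which may include computer storage media (non-transitory media) and communication media (transitory media). As is well known to those of ordinary skill in the art, the term computer storage media covers volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, and any other medium that can store the desired information and be accessed by a computer. Also as is well known, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.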

Claims (16)

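  1. A character processing method, comprising:
    recognizing and obtaining coordinates of characters in an image; and
    performing clustering calculation, using a kernel density function, on differences between first coordinate values of adjacent pairs of characters, to determine characters belonging to the same line.
  2. The method according to claim 1, wherein
    the line comprises a row, and the performing clustering calculation, using a kernel density function, on differences between first coordinate values of adjacent pairs of characters, to determine characters belonging to the same line comprises: performing clustering calculation, using the kernel density function, on differences between first y-coordinates of horizontally adjacent pairs of characters to determine a first horizontal spacing, and determining characters belonging to the same row according to the first horizontal spacing; or
    the line comprises a column, and the performing clustering calculation, using a kernel density function, on differences between first coordinate values of adjacent pairs of characters, to determine characters belonging to the same line comprises: performing clustering calculation, using the kernel density function, on differences between first x-coordinates of vertically adjacent pairs of characters to determine a first vertical spacing, and determining characters belonging to the same column according to the first vertical spacing.
  3. The method according to claim 2, wherein
    the performing clustering calculation, using the kernel density function, on differences between first y-coordinates of horizontally adjacent pairs of characters to determine a first horizontal spacing, and determining characters belonging to the same row according to the first horizontal spacing comprises:
    for any two horizontally adjacent first and second characters, calculating a difference between a first y-coordinate of one or more points in the first character and a first y-coordinate of corresponding one or more points in the second character; performing clustering calculation on all calculated differences using the kernel density function; determining the first horizontal spacing according to a calculation result of the kernel density function; and judging the differences between the first y-coordinates of horizontally adjacent pairs of characters against the first horizontal spacing, to determine characters belonging to the same character row.
  4. The method according to claim 2, wherein
    the performing clustering calculation, using the kernel density function, on differences between first x-coordinates of vertically adjacent pairs of characters to determine a first vertical spacing, and determining characters belonging to the same column according to the first vertical spacing comprises:
    for any two vertically adjacent third and fourth characters, calculating a difference between a first x-coordinate of one or more points in the third character and a first x-coordinate of corresponding one or more points in the fourth character; performing clustering calculation on all calculated differences using the kernel density function; determining the first vertical spacing according to a calculation result of the kernel density function; and judging the differences between the first x-coordinates of vertically adjacent pairs of characters against the first vertical spacing, to determine characters belonging to the same character column.
  5. The method according to claim 3, further comprising:
    determining a second horizontal spacing according to the calculation result of the kernel density function, and judging the differences between the first y-coordinates of horizontally adjacent pairs of characters against the second horizontal spacing, to determine row spacing; and
    performing clustering calculation, using the kernel density function, on differences between second coordinate values of horizontally adjacent pairs of characters to determine a third horizontal spacing, and judging the differences between the second coordinate values of horizontally adjacent pairs of characters against the third horizontal spacing, to determine characters belonging to the same character column group.
  6. The method according to claim 5, wherein:
    the difference between the second coordinate values of a horizontally adjacent pair of characters comprises: a difference between a second x-coordinate of one or more points in a fifth character and a second x-coordinate of corresponding or mirrored one or more points in a sixth character, the fifth character and the sixth character being two horizontally adjacent characters.
  7. The method according to claim 5, wherein the judging the differences between the first y-coordinates of horizontally adjacent pairs of characters against the first horizontal spacing to determine characters belonging to the same character row, and the determining a second horizontal spacing according to the calculation result of the kernel density function and judging the differences between the first y-coordinates of horizontally adjacent pairs of characters against the second horizontal spacing to determine row spacing, comprise:
    the calculation result of the kernel density function comprises at least two local minima, including a smallest first local minimum and a second local minimum greater than the first local minimum; when the difference between the first y-coordinates of two horizontally adjacent characters is less than or equal to the first local minimum, determining that the two horizontally adjacent characters belong to the same character row; and when the difference between the first y-coordinates of the two horizontally adjacent characters is greater than the first local minimum and less than or equal to the second local minimum, determining that row spacing exists between the two horizontally adjacent characters.
  8. The method according to claim 4, further comprising:
    determining a second vertical spacing according to the calculation result of the kernel density function, and judging the differences between the first x-coordinates of vertically adjacent pairs of characters against the second vertical spacing, to determine column spacing; and
    performing clustering calculation, using the kernel density function, on differences between second coordinate values of vertically adjacent pairs of characters to determine a third vertical spacing, and judging the differences between the second coordinate values of vertically adjacent pairs of characters against the third vertical spacing, to determine character rows belonging to the same group.
  9. The method according to claim 8, wherein:
    the difference between the second coordinate values of a vertically adjacent pair of characters comprises: a difference between a second y-coordinate of one or more points in a seventh character and a second y-coordinate of corresponding or mirrored one or more points in an eighth character, the seventh character and the eighth character being two vertically adjacent characters.
  10. The method according to claim 8, wherein the judging the differences between the first x-coordinates of vertically adjacent pairs of characters against the first vertical spacing to determine characters belonging to the same character column, and the determining a second vertical spacing according to the calculation result of the kernel density function and judging the differences between the first x-coordinates of vertically adjacent pairs of characters against the second vertical spacing to determine column spacing, comprise:
    the calculation result of the kernel density function comprises at least two local minima, including a smallest fifth local minimum and a sixth local minimum greater than the fifth local minimum; when the difference between the first x-coordinates of two vertically adjacent characters is less than or equal to the fifth local minimum, determining that the two vertically adjacent characters belong to the same character column; and when the difference between the first x-coordinates of the two vertically adjacent characters is greater than the fifth local minimum and less than or equal to the sixth local minimum, determining that column spacing exists between the two vertically adjacent characters.
  11. The method according to any one of claims 1 to 10, wherein the kernel density function is a Gaussian kernel density function.
  12. A character recognition method, comprising:
    preprocessing an image; recognizing coordinates of characters in the image; and performing recognition processing on the characters in the image using the character processing method according to any one of claims 1 to 11, to determine characters belonging to the same line.
  13. The method according to claim 12, wherein the preprocessing an image comprises one or more of the following: converting a color image to grayscale, Gaussian filtering, background extraction, contrast compensation, binarization, and perspective transformation.
  14. The method according to claim 12, further comprising, after the determining characters belonging to the same line: displaying some or all of the one or more lines of characters determined by the recognition processing.
  15. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the method according to any one of claims 1 to 11 or any one of claims 12 to 14.
  16. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 11 or any one of claims 12 to 14.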
PCT/CN2020/076828 2020-02-26 2020-02-26 Character processing and character recognition method, storage medium, and terminal device WO2021168703A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
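PCT/CN2020/076828 WO2021168703A1 (zh) 2020-02-26 2020-02-26 Character processing and character recognition method, storage medium, and terminal device
CN202080000183.0A CN113557520A (zh) 2020-02-26 2020-02-26 Character processing and character recognition method, storage medium, and terminal device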

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/076828 WO2021168703A1 (zh) 2020-02-26 2020-02-26 Character processing and character recognition method, storage medium, and terminal device

Publications (1)

Publication Number Publication Date
WO2021168703A1 true WO2021168703A1 (zh) 2021-09-02

Family

ID=77489814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076828 WO2021168703A1 (zh) 2020-02-26 2020-02-26 Character processing and character recognition method, storage medium, and terminal device

Country Status (2)

Country Link
CN (1) CN113557520A (zh)
WO (1) WO2021168703A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
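CN116740059A (zh) * 2023-08-11 2023-09-12 济宁金康工贸股份有限公司 Intelligent control method for door and window machining
CN117037194A (zh) * 2023-05-10 2023-11-10 广州方舟信息科技有限公司 Table recognition method and apparatus for document images, electronic device, and storage medium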

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071771A (zh) * 2023-03-24 2023-05-05 南京燧坤智能科技有限公司 Table reconstruction method and apparatus, non-volatile storage medium, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222752A1 (en) * 2008-04-08 2011-09-15 Three Palm Software Microcalcification enhancement from digital mammograms
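CN108597605A (zh) * 2018-03-19 2018-09-28 特斯联(北京)科技有限公司 Personal health and lifestyle big data collection and analysis system
CN108628824A (zh) * 2018-04-08 2018-10-09 上海熙业信息科技有限公司 Entity recognition method based on Chinese electronic medical records
CN109815958A (zh) * 2019-02-01 2019-05-28 杭州睿琪软件有限公司 Laboratory test report recognition method and apparatus, electronic device, and storage medium
CN110837796A (zh) * 2019-11-05 2020-02-25 泰康保险集团股份有限公司 Image processing method and apparatus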

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
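CN117037194A (zh) * 2023-05-10 2023-11-10 广州方舟信息科技有限公司 Table recognition method and apparatus for document images, electronic device, and storage medium
CN116740059A (zh) * 2023-08-11 2023-09-12 济宁金康工贸股份有限公司 Intelligent control method for door and window machining
CN116740059B (zh) * 2023-08-11 2023-10-20 济宁金康工贸股份有限公司 Intelligent control method for door and window machining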

Also Published As

Publication number Publication date
CN113557520A (zh) 2021-10-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922251

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20922251

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20922251

Country of ref document: EP

Kind code of ref document: A1