CN117152778B

CN117152778B - Medical instrument registration certificate identification method, device and medium based on OCR

Info

Publication number: CN117152778B
Application number: CN202311423275.1A
Authority: CN
Inventors: 丁飞; 邓芳; 徐静; 谷昊
Original assignee: Anhui Provincial Hospital First Affiliated Hospital Of Ustc
Current assignee: Anhui Provincial Hospital First Affiliated Hospital Of Ustc
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-01-16
Anticipated expiration: 2043-10-31
Also published as: CN117152778A

Abstract

The invention discloses an OCR-based method, device and medium for identifying medical instrument registration certificates. The method comprises the following steps: S1, acquiring an image of a medical instrument registration certificate and preprocessing it to improve its clarity; S2, recognizing the preprocessed image and packaging the results to generate packaged data; S3, searching the packaged data for preset keywords; S4, calculating the coordinates of the found keywords according to the search result of step S3. Using OCR, the text on the registration certificate can be recognized quickly and accurately, which greatly improves the speed and accuracy of information acquisition and saves the time and labor cost of manual typing and checking. OCR can recognize text in a variety of fonts and formats, which reduces recognition errors caused by handwriting or printing defects, ensures the accuracy of the information, and allows the text on the registration certificate to be converted directly into electronic form.

Description

Medical instrument registration certificate identification method, device and medium based on OCR

Technical Field

The invention relates to the technical field of medical information data recognition, and in particular to an OCR-based method, device and medium for identifying medical instrument registration certificates.

Background

In procurement and supply workflows in the medical field, information about medical equipment and materials, such as devices, consumables, reagents and instruments, must be checked against the corresponding medical product registration certificates to ensure that procurement and supply comply with regulations.

In the prior art, this comparison against registration certificates is usually done manually. The workload is heavy, errors are likely, and a single error can cause significant economic and reputational loss for both hospitals and suppliers.

Reducing labor cost and the risk of human error is therefore an important problem in this field.

Disclosure of Invention

The invention aims to reduce labor cost and the risk of human error, and to this end provides an OCR-based method, device and medium for identifying medical instrument registration certificates.

To achieve this aim, the invention adopts the following technical solution. In a first aspect, the invention provides an OCR-based medical instrument registration certificate recognition method comprising the following steps:

S1, acquiring an image of a medical instrument registration certificate and preprocessing the image to improve its clarity;

S2, recognizing the preprocessed image and packaging the results to generate packaged data;

S3, searching the packaged data for preset keywords;

S4, calculating the coordinates of the found keywords according to the search result of step S3;

S5, marking the corresponding keywords in the image of the medical instrument registration certificate according to the coordinates calculated in step S4.

The OCR-based medical instrument registration certificate recognition method described above, wherein, optionally, in step S1 the image of the medical instrument registration certificate is preprocessed by graying, binarization and smoothing filtering to remove noise and enhance contrast, and the recognition threshold is adjusted or background interference is removed, so as to improve image sharpness.

The OCR-based medical instrument registration certificate identification method described above, wherein, optionally, step S2 comprises:

S21, recognizing the data in the preprocessed image using OCR; the data comprise the whole-line text strings, the whole-line text coordinates, the in-line characters, the character coordinate set, and recognition-accuracy evaluation data;

S22, packaging the data obtained in step S21 to obtain the packaged data.

The OCR-based medical instrument registration certificate identification method described above, wherein, optionally, the packaged data is JSON data.

The OCR-based medical instrument registration certificate identification method described above, wherein, optionally, in step S3 text data entered by a user is used as the preset keyword and is searched for in the packaged data.

The OCR-based medical instrument registration certificate identification method described above, wherein, optionally, step S3 comprises the following steps:

S31, searching for the preset keyword in the whole-line text strings of the packaged data;

S32, determining whether the preset keyword is found in a whole-line text string; if so, proceeding to step S4; if not, proceeding to step S33;

S33, splitting the preset keyword into a plurality of single-character keywords and searching for these keywords in the whole-line text strings;

S34, determining whether some of the keywords are found in the current whole-line text string; if so, proceeding to step S35; if not, proceeding to step S37;

S35, arranging the found keywords in order;

S36, determining whether the ordered keywords form continuous characters; if so, proceeding to step S38; if not, proceeding to step S37;

S37, searching for the keywords in the next whole-line text string and repeating steps S34 to S36 until all whole-line text strings have been traversed;

S38, determining whether the last of the found continuous characters is the last character of the whole-line text string; if so, proceeding to step S39; if not, proceeding to step S37;

S39, updating the keyword and searching for the updated keyword starting from the first character of the next line's text string;

S310, determining whether the updated keyword is found; if so, proceeding to step S311; if not, proceeding to step S37;

S311, combining the updated keyword and the corresponding continuous characters into a search result set.

The OCR-based medical instrument registration certificate recognition method described above, wherein, optionally, the preset keywords comprise the number of the medical instrument registration certificate and/or text information filled in by a supplier, and the search result comprises the characters, the coordinates of the characters in the file, and/or an evaluation of character recognition accuracy.

The OCR-based medical instrument registration certificate identification method described above, wherein, optionally, the coordinates of the found keyword calculated in step S4 comprise the start-stop coordinates of the keyword, namely the upper-left and lower-left corner coordinates of the first character and the upper-right and lower-right corner coordinates of the last character; if the characters wrap across a line, the start-stop coordinates of each of the two lines are calculated separately.

In a second aspect, the invention further provides an OCR-based medical instrument registration certificate recognition device comprising a memory and a processor, the memory storing computer-executable instructions and the processor being configured to execute them, wherein the computer-executable instructions, when executed by the processor, implement the OCR-based medical instrument registration certificate recognition method described in any of the above.

In a third aspect, the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the OCR-based medical instrument registration certificate identification method described in any of the above.

Compared with the prior art, the invention has the following beneficial effects:

1. Using OCR and image processing, the method rapidly and accurately recognizes and extracts the product information on the medical instrument registration certificate, compares it with the submitted product information, and calculates and displays the comparison result and its reliability. This greatly improves the speed and accuracy of information acquisition, saves the time spent on manual entry and checking, lowers labor cost, and greatly reduces economic losses caused by checking errors. Because OCR can recognize text in various fonts and formats, recognition errors caused by handwriting or printing defects are reduced and the accuracy of the information is ensured.

2. Marking the keywords and handling text-semantic issues and the keyword marking range improves the quality and efficiency of text processing. Formatting and marking-range issues in the text are handled in a normalized, standardized way, which improves the repeatability and consistency of text processing and facilitates subsequent data analysis and processing. Because the marking range is chosen deliberately, the marked keywords match the text content and the marked range is neither too small nor too large, which improves the consistency and accuracy of text processing.

3. By splitting the preset keyword into single-character keywords and searching for each separately, the method can find keywords that are broken by a line feed in the middle and calculate their coordinates, so keywords that span two lines in the image can be marked.

Drawings

FIG. 1 is a flowchart of the OCR-based medical instrument registration certificate recognition method in Embodiment 1 of the present invention;

FIG. 2 is a flowchart of the specific steps of step S3 in Embodiment 1 of the present invention.

Detailed Description

The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

Example 1:

This embodiment improves an OCR-based algorithm to solve the problems described in the Background.

OCR (optical character recognition) is a technique that optically converts the printed characters on paper documents into black-and-white dot-matrix image files and then, using image processing and character recognition software, converts the text in those images into electronic text that can be further edited with word-processing software.

Referring to FIG. 1, this embodiment discloses an OCR-based medical instrument registration certificate identification method comprising the following steps:

S1, acquiring an image of the medical instrument registration certificate and preprocessing it to improve image clarity.

In this step, the image of the medical instrument registration certificate is preprocessed by graying, binarization and smoothing filtering to remove noise and enhance contrast, and the recognition threshold is adjusted or background interference is removed. This improves the clarity of the image and, in turn, the accuracy of OCR recognition.
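The preprocessing described above can be sketched in pure Python on an image stored as a list of rows of (R, G, B) tuples. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the BT.601 gray weights, the 3x3 mean filter, and the use of Otsu's method to choose the binarization threshold are illustrative choices, and all function names are hypothetical. The sketch also smooths before thresholding, a common ordering; the text lists the three operations without fixing an order.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def to_gray(rgb: List[List[RGB]]) -> Gray:
    """Graying with ITU-R BT.601 luma weights (assumed; the text names no weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def mean_filter(gray: Gray, k: int = 3) -> Gray:
    """Smoothing: k x k box filter; the window is clipped at the image border."""
    h, w, r = len(gray), len(gray[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = round(sum(window) / len(window))
    return out


def otsu_threshold(gray: Gray) -> int:
    """Choose the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # pixels with value <= t (dark class)
        if w0 == 0:
            continue
        w1 = total - w0          # pixels with value > t (light class)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray: Gray, t: int) -> Gray:
    """Dark pixels (ink) become 0, light pixels (paper) become 255."""
    return [[255 if v > t else 0 for v in row] for row in gray]


def preprocess(rgb: List[List[RGB]], k: int = 3) -> Gray:
    """Graying, then smoothing, then binarization with an automatic threshold."""
    smooth = mean_filter(to_gray(rgb), k)
    return binarize(smooth, otsu_threshold(smooth))
```

A production system would more likely call an image library for these operations; the pure-Python version only makes each operation explicit.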

S2, recognizing the preprocessed image and packaging the results to generate packaged data. The main purpose of this step is to recognize the preprocessed image using OCR.

Specifically, this step comprises the following steps:

S21, recognizing the data in the preprocessed image using OCR; the data comprise the whole-line text strings, the whole-line text coordinates, the in-line characters, the character coordinate set, and recognition-accuracy evaluation data. That is, the recognized text is organized line by line, following the lines of the original image, into a plurality of whole-line text strings. The whole-line text coordinates are the coordinates of each whole-line text string in the original image. The in-line characters are the individual characters that make up a whole-line text string, and the character coordinate set may be the set of coordinates of each of those characters.
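One possible in-memory shape for the S21 output is sketched below. The class, field and helper names are hypothetical; character boxes are assumed to be four-corner quadrilaterals ordered top-left, top-right, bottom-right, bottom-left; and the whole-line box is derived here from the first and last character boxes, which assumes a roughly horizontal, unskewed line.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]
Quad = Tuple[Point, Point, Point, Point]  # top-left, top-right, bottom-right, bottom-left


def box(x: int, y: int, w: int, h: int) -> Quad:
    """Axis-aligned box helper for examples."""
    return ((x, y), (x + w, y), (x + w, y + h), (x, y + h))


@dataclass
class RecognizedLine:
    text: str               # whole-line text string
    line_box: Quad          # whole-line text coordinates
    chars: List[str]        # in-line characters, in reading order
    char_boxes: List[Quad]  # character coordinate set, one quad per character
    confidence: float       # recognition-accuracy evaluation (assumed in [0, 1])

    def __post_init__(self) -> None:
        if "".join(self.chars) != self.text or len(self.chars) != len(self.char_boxes):
            raise ValueError("characters and boxes must align with the line text")


def line_from_chars(chars: List[str], char_boxes: List[Quad],
                    confidence: float) -> RecognizedLine:
    """Build a line record; the line box spans the first and last character boxes."""
    first, last = char_boxes[0], char_boxes[-1]
    line_box = (first[0], last[1], last[2], first[3])
    return RecognizedLine("".join(chars), line_box, list(chars),
                          list(char_boxes), confidence)
```

For example, `line_from_chars(list("Model"), [box(10 * i, 0, 10, 12) for i in range(5)], 0.98)` yields a line whose box runs from (0, 0) to (50, 12).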

S22, packaging the data obtained in step S21 into JSON data to obtain the packaged data. Packaging the data as JSON facilitates the keyword search in the subsequent steps.
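A minimal sketch of the S22 packaging step using Python's standard `json` module. The JSON key names (`lines`, `text`, `line_box`, `chars`, `box`, `confidence`) and the input dictionary layout are hypothetical; the text only states that the packaged data is JSON.

```python
import json
from typing import Dict, List


def package(lines: List[Dict]) -> str:
    """Serialize S21 recognition results into a JSON string (hypothetical schema)."""
    doc = {"lines": [
        {"text": ln["text"],
         "line_box": ln["line_box"],
         "chars": [{"char": c, "box": b}
                   for c, b in zip(ln["chars"], ln["char_boxes"])],
         "confidence": ln["confidence"]}
        for ln in lines]}
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    return json.dumps(doc, ensure_ascii=False)


def line_texts(packaged: str) -> List[str]:
    """Return the whole-line strings that the keyword search in S3 operates on."""
    return [ln["text"] for ln in json.loads(packaged)["lines"]]
```

Keeping each character next to its box in the JSON means a later search hit on a character position can be turned into coordinates without a second lookup.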

S3, searching the packaged data for preset keywords. Searching the packaged data for the preset keywords is one of the key steps of this embodiment. The preset keywords comprise the number of the medical instrument registration certificate and/or text information filled in by the supplier, and the search results comprise the characters, the coordinates of the characters in the file, and/or an evaluation of character recognition accuracy.

In this step, text data entered by a user is used as the preset keyword and is searched for in the packaged data. That is, the operator enters the text to be searched for according to actual needs, and that text is used as the preset keyword for subsequent processing.

The main purpose of this step is to search the packaged data for the preset keywords. Note that a keyword on the image may be broken by a line feed: when a keyword spans two lines, prior-art methods may miss it.

To address this case, this embodiment designs the following steps to avoid missing keywords in the image. Specifically, step S3 comprises the following steps:

S31, searching for the preset keyword in the whole-line text strings of the packaged data; a keyword that lies entirely within one line can be found directly by this search.

On some certificates, a keyword to be searched for is split across two lines, i.e., a line break falls in the middle of the keyword on the image. The following steps handle this case.

S32, determining whether the preset keyword is found in a whole-line text string; if so, proceeding to step S4; if not, proceeding to step S33.
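Steps S31 and S32 amount to a substring search over the whole-line strings. A minimal sketch, assuming the packaged data has already been reduced to a list of line strings; `find_in_lines` is a hypothetical name, and it returns only the first hit, whereas a real implementation might collect every occurrence.

```python
from typing import List, Optional, Tuple


def find_in_lines(lines: List[str], keyword: str) -> Optional[Tuple[int, int, int]]:
    """S31/S32 sketch: return (line_index, start, end) of the first whole-line hit.

    `end` is exclusive. None means the keyword is not inside any single line,
    so the search falls through to the split-keyword steps (S33 onward).
    """
    for i, text in enumerate(lines):
        start = text.find(keyword)
        if start != -1:
            return i, start, start + len(keyword)
    return None
```

With `lines = ["Registrant: Example Co.", "Product name: infusion set"]`, searching for `"infusion"` returns `(1, 14, 22)`.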

S33, splitting the preset keyword into a plurality of keywords and searching for these keywords in the whole-line text strings. That is, the preset keyword is split into single characters and each character is searched for individually, so the keyword can still be found in the image when it is broken by a line feed. Searching for single characters, however, may return many non-target matches; to decide whether a match is a non-target keyword, it must be determined whether the match actually belongs to the keyword being searched for. For this purpose, this embodiment further provides the following steps:
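The splitting in S33 treats every character of the keyword as its own search term. The sketch below (hypothetical function name) records every position in one line whose character occurs in the keyword; the usage example shows why the filtering in S34 to S36 is needed, since most hits are unrelated characters.

```python
from typing import Dict


def locate_chars(line: str, keyword: str) -> Dict[int, str]:
    """S33 sketch: map each line position to its character if that character
    occurs anywhere in the keyword."""
    wanted = set(keyword)  # split the keyword into single characters
    return {pos: ch for pos, ch in enumerate(line) if ch in wanted}
```

For the keyword `"set"`, the line `"size: 20 drops"` yields `{0: 's', 3: 'e', 13: 's'}`: three hits, none of which is the keyword.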

S34, determining whether some of the keywords are found in the current whole-line text string; if so, proceeding to step S35; if not, proceeding to step S37. Because there are multiple whole-line text strings, they are searched line by line; when nothing is found in a line, the process proceeds to step S37 to search the next line.

S35, arranging the found keywords in order. One or more keywords may be found; when only one is found, it is treated directly as the continuous characters. When several are found, they are sorted by their order of appearance in the whole-line text string.

S36, determining whether the ordered keywords form continuous characters; if so, proceeding to step S38; if not, proceeding to step S37. Specifically, the ordered keywords are judged to be continuous characters when, in that order, they also appear in the preset keyword.
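Steps S35 and S36 can be sketched as: order the single-character hits by position, group hits at adjacent positions, and keep only groups whose text also occurs inside the keyword. This is one interpretation of "continuous characters", assumed here for illustration; `continuous_runs` is a hypothetical name and its input is a position-to-character mapping like the one produced in S33.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int, str]  # (start, end, text) in the line, end exclusive


def continuous_runs(hits: Dict[int, str], keyword: str) -> List[Run]:
    """S35/S36 sketch: group adjacent hits and keep groups found in the keyword."""
    groups: List[List[int]] = []
    for pos in sorted(hits):                      # S35: order by position in the line
        if groups and pos == groups[-1][-1] + 1:
            groups[-1].append(pos)                # extend the current adjacent group
        else:
            groups.append([pos])                  # start a new group
    runs: List[Run] = []
    for g in groups:                              # S36: continuity test
        text = "".join(hits[p] for p in g)
        if text in keyword:
            runs.append((g[0], g[-1] + 1, text))
    return runs
```

Single characters always pass this test (as S35 allows), which is why the end-of-line check in S38 is still needed to discard them.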

S37, searching for the keywords in the next whole-line text string and repeating steps S34 to S36 until all whole-line text strings have been traversed. If no whole-line text string contains continuous characters, the search ends without a match.

S38, determining whether the last of the found continuous characters is the last character of the whole-line text string; if so, proceeding to step S39; if not, proceeding to step S37. In some cases the continuous characters exist but do not appear at the end of the line, which indicates that they do not belong to the wrapped keyword; this step filters out such cases.
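The S38 check can be sketched as a comparison between the end of a run and the end of its line. As an assumption beyond the text, the sketch also requires the run to be a proper prefix of the keyword, since S39 carries the remaining characters over to the next line; trailing whitespace in the OCR line is ignored, which is likewise an assumption. The function name is hypothetical.

```python
def is_line_break_candidate(line: str, start: int, end: int, keyword: str) -> bool:
    """S38 sketch: does line[start:end] end at the last character of the line?

    Assumed extra condition: the run is a proper prefix of the keyword.
    """
    run = line[start:end]
    last = len(line.rstrip())  # ignore trailing whitespace left by OCR (assumption)
    return end == last and keyword.startswith(run) and 0 < len(run) < len(keyword)
```

`is_line_break_candidate("Product: infu", 9, 13, "infusion")` is `True`, while the same run at the start of `"infu set"` is rejected because it does not reach the end of the line.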

S39, updating the keyword and searching for the updated keyword starting from the first character of the next line's text string. Specifically, the updated keyword consists of the characters of the preset keyword that remain after the continuous characters are removed, kept in their original order.

S310, determining whether the updated keyword is found; if so, proceeding to step S311; if not, proceeding to step S37. That is, when the updated keyword appears at the start of the next line, it is considered successfully matched.
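Steps S39 and S310 can be sketched as two small helpers: one removes the matched continuous characters from the preset keyword (keeping the remaining characters in their original order), and the other checks that the remainder starts at the first character of the next line. Both function names are hypothetical.

```python
def remainder_after(keyword: str, run: str) -> str:
    """S39 sketch: drop the matched continuous characters, keep the rest in order."""
    idx = keyword.find(run)
    if idx == -1:
        raise ValueError(f"{run!r} does not occur in {keyword!r}")
    return keyword[:idx] + keyword[idx + len(run):]


def matches_next_line_head(next_line: str, remainder: str) -> bool:
    """S310 sketch: the updated keyword must begin at the first character of the next line."""
    return bool(remainder) and next_line.startswith(remainder)
```

For the keyword `"infusion set"` with `"infu"` at the end of one line, the updated keyword is `"sion set"`, which matches a next line such as `"sion set, 20 drops/ml"`.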

S311, combining the updated keyword and the corresponding continuous characters into a search result set. The combined result is still the keyword; note that, because it spans two lines, it has two groups of coordinates: one for the continuous characters and one for the updated keyword.
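Taken together, S31 to S311 can be approximated by the compact search below. This is a simplification, not the embodiment's exact procedure: instead of voting character by character as in S33 to S36, it directly tests each split of the keyword into a head that ends one line and a tail that starts the next, trying longer heads first. For the line-wrap case described above, this yields the same two segments. The function name and segment format are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, int]  # (line_index, start, end), end exclusive


def search_keyword(lines: List[str], keyword: str) -> Optional[List[Segment]]:
    """Return one segment for a single-line hit, two for a wrapped hit, else None."""
    # S31/S32: keyword entirely within one line
    for i, text in enumerate(lines):
        start = text.find(keyword)
        if start != -1:
            return [(i, start, start + len(keyword))]
    # Simplified S33-S311: head at the end of line i, tail at the start of line i + 1
    for i in range(len(lines) - 1):
        text = lines[i].rstrip()
        nxt = lines[i + 1]
        for cut in range(len(keyword) - 1, 0, -1):  # longest head first
            head, tail = keyword[:cut], keyword[cut:]
            if text.endswith(head) and nxt.startswith(tail):
                # S311: continuous characters plus updated keyword, one segment each
                return [(i, len(text) - cut, len(text)), (i + 1, 0, len(tail))]
    return None
```

For `["Product name: disposable infu", "sion set", "Model: A-1"]`, searching for `"infusion set"` returns `[(0, 25, 29), (1, 0, 8)]`: two segments, matching the two coordinate groups that S311 describes.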

S4, calculating the coordinates of the found keywords according to the search result of step S3.

The coordinates calculated in this step comprise the start-stop coordinates of the keyword, namely the upper-left and lower-left corner coordinates of the first character and the upper-right and lower-right corner coordinates of the last character. This allows the position of the text to be determined quickly.

If the characters wrap across a line, the start-stop coordinates of each of the two lines are calculated separately, i.e., those of the continuous characters and those of the corresponding updated keyword, so that the entire keyword can be marked.
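The S4 calculation can be sketched as follows: for each segment of the keyword, take the left-hand corners of its first character and the right-hand corners of its last character, producing one coordinate group per line. Character boxes are assumed to be quadrilaterals ordered top-left, top-right, bottom-right, bottom-left; the function names, dictionary keys and segment format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]
Quad = Tuple[Point, Point, Point, Point]  # top-left, top-right, bottom-right, bottom-left
Segment = Tuple[int, int, int]            # (line_index, start, end), end exclusive


def box(x: int, y: int, w: int, h: int) -> Quad:
    """Axis-aligned box helper for examples."""
    return ((x, y), (x + w, y), (x + w, y + h), (x, y + h))


def start_stop_coords(char_boxes: Sequence[Quad], start: int, end: int) -> Dict[str, Point]:
    """Corners of the first and last character of one line segment."""
    first, last = char_boxes[start], char_boxes[end - 1]
    return {"first_top_left": first[0], "first_bottom_left": first[3],
            "last_top_right": last[1], "last_bottom_right": last[2]}


def keyword_coords(segments: List[Segment],
                   boxes_per_line: List[Sequence[Quad]]) -> List[Dict[str, Point]]:
    """One coordinate group per segment: one for a single-line hit,
    two when the keyword wraps onto the next line."""
    return [start_stop_coords(boxes_per_line[i], s, e) for i, s, e in segments]
```

Because each group is computed from that line's own character boxes, a wrapped keyword gets two independent rectangles rather than one box stretched across both lines.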

S5, marking the corresponding keywords in the image of the medical instrument registration certificate according to the coordinates calculated in step S4. In practice, a keyword may be marked by underlining it, for example with a wavy or dash-dot line, in a color such as red or blue.

In a specific implementation, the method further comprises calculating the line spacing of the text and determining the thickness of the marking line from that spacing; specifically, red lines are drawn in the image at the start-stop coordinates of the keyword using an image processing method.
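The marking step can be sketched on an image stored as a list of rows of (R, G, B) tuples. The text does not say how line spacing is measured or how it maps to line thickness, so both are assumptions here: spacing is taken as the mean distance between the top edges of consecutive lines, and thickness as 10% of that spacing (at least one pixel). All names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RED: RGB = (255, 0, 0)


def line_spacing(line_tops: List[int]) -> float:
    """Mean distance between the top edges of consecutive text lines (assumed measure)."""
    gaps = [b - a for a, b in zip(line_tops, line_tops[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def underline_thickness(spacing: float, ratio: float = 0.1) -> int:
    """Hypothetical rule: thickness proportional to line spacing, at least 1 pixel."""
    return max(1, round(spacing * ratio))


def draw_underline(img: List[List[RGB]], x_left: int, x_right: int,
                   y_bottom: int, thickness: int, color: RGB = RED) -> None:
    """Paint `thickness` rows just below y_bottom, from x_left to x_right inclusive, in place."""
    h, w = len(img), len(img[0])
    for y in range(y_bottom + 1, min(h, y_bottom + 1 + thickness)):
        for x in range(max(0, x_left), min(w, x_right + 1)):
            img[y][x] = color
```

In use, `x_left` would come from the first character's bottom-left corner, `x_right` from the last character's bottom-right corner, and `draw_underline` would be called once per coordinate group, so a wrapped keyword receives two underlines.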

By recognizing the information on the medical instrument registration certificate and marking the key information, the method supports full-lifecycle management of medical instruments, facilitates traceability across production, distribution, use and other links, and strengthens the supervision and management of medical instruments. It also effectively improves the efficiency and accuracy of checking registration certificates, reduces labor cost, and greatly reduces economic losses caused by checking errors.

By integrating OCR, image processing and related technologies, the invention recognizes and extracts the product information in the medical instrument registration certificate, compares it with the submitted structured product information, calculates and displays the comparison result and its reliability, and marks the key information, thereby reducing labor cost and the risk of human error.

By recognizing the text and marking the preset keywords, the invention improves the quality and efficiency of text processing. Formatting and keyword-marking-range issues in the text are handled in a normalized, standardized way, which improves the repeatability and consistency of text processing and facilitates subsequent data analysis and processing. Because the marking range is chosen deliberately, the marked keywords match the text content and the range is neither too small nor too large, which improves the consistency and accuracy of text processing.

Example 2:

This embodiment discloses an OCR-based medical instrument registration certificate recognition device comprising a memory and a processor, the memory storing computer-executable instructions and the processor being configured to execute them, wherein the computer-executable instructions, when executed by the processor, implement the OCR-based medical instrument registration certificate recognition method disclosed in Embodiment 1.

Example 3:

This embodiment discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the OCR-based medical instrument registration certificate identification method disclosed in Embodiment 1.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. An OCR-based medical instrument registration certificate identification method, characterized by comprising the following steps:

S3, searching the packaged data for preset keywords;

wherein step S3 comprises the following steps:

S35, arranging the found keywords in order;

S311, combining the updated keyword and the corresponding continuous characters into a search result set;

2. The OCR-based medical instrument registration certificate recognition method according to claim 1, wherein in step S1 the image of the medical instrument registration certificate is preprocessed by graying, binarization and smoothing filtering to remove noise and enhance contrast, and the recognition threshold is adjusted or background interference is removed, so as to improve image sharpness.

3. The OCR-based medical instrument registration certificate recognition method according to claim 2, wherein step S2 comprises:

4. The OCR-based medical instrument registration certificate recognition method according to claim 1, wherein the packaged data is JSON data.

5. The OCR-based medical instrument registration certificate identification method according to claim 3, wherein in step S3 text data entered by a user is used as the preset keyword and is searched for in the packaged data.

6. The OCR-based medical instrument registration certificate recognition method according to claim 1, wherein the preset keywords comprise the number of the medical instrument registration certificate and/or text information filled in by a supplier, and the search result comprises the text characters, the coordinates of the text characters in the file, and/or an evaluation of character recognition accuracy.

7. The OCR-based medical instrument registration certificate recognition method according to any one of claims 1 to 6, wherein the start-stop coordinates of the found keyword calculated in step S4 comprise the upper-left and lower-left corner coordinates of the first character and the upper-right and lower-right corner coordinates of the last character, and, if the characters wrap across a line, the start-stop coordinates of each of the two lines are calculated separately.

8. An OCR-based medical instrument registration certificate recognition device, comprising a memory storing computer-executable instructions and a processor configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the OCR-based medical instrument registration certificate recognition method of any one of claims 1 to 7.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the OCR-based medical instrument registration certificate identification method of any one of claims 1 to 7.