CN114299529A - Identification method based on medical laboratory test report picture, storage medium and terminal


Publication number
CN114299529A
Authority
CN
China
Prior art keywords
picture
medical laboratory
characters
laboratory sheet
text
Legal status
Pending
Application number
CN202111565336.9A
Other languages
Chinese (zh)
Inventor
张少典
马汉东
肖威
朱珉
薛颜波
Current Assignee
Changsha Senyi Medical Instrument Co ltd
Original Assignee
Changsha Senyi Medical Instrument Co ltd

Abstract

The invention provides an identification method based on a medical laboratory sheet picture, a storage medium and a terminal, wherein the identification method based on the medical laboratory sheet picture comprises the following steps: acquiring a medical laboratory sheet picture, and compiling the medical laboratory sheet picture; preprocessing the compiled medical laboratory sheet picture to generate a picture to be identified; target detection is carried out on the character area in the picture to be identified; extracting characters based on the targets detected in the character areas, and determining coordinates corresponding to the characters; and outputting a text recognition result according to the characters and the coordinates. The invention can realize data extraction of a large number of medical laboratory test report pictures, also can customize rules for automatic processing, and improves the recognition efficiency of the medical laboratory test reports and the accuracy rate of extracting characters.

Description

Identification method based on medical laboratory test report picture, storage medium and terminal
Technical Field
The invention belongs to the technical field of character recognition, relates to a recognition method of characters in a picture, and particularly relates to a recognition method based on a medical laboratory sheet picture, a storage medium and a terminal.
Background
OCR (Optical Character Recognition) is a technology for automatically recognizing text in an image. It has a long research history and a wide range of applications. However, pictures of medical laboratory sheets vary widely, and recognizing them involves complex conditions such as illumination changes, low resolution, diverse fonts and layouts, and a large vocabulary of medical terms, so general-purpose OCR is difficult to apply in the medical field. In addition, general-purpose OCR lacks downstream data structuring and typesetting.
Therefore, how to provide an identification method, a storage medium and a terminal based on a medical laboratory sheet picture that overcome the inability of the prior art to identify medical laboratory sheet pictures automatically, comprehensively and accurately has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above disadvantages of the prior art, an object of the present invention is to provide an identification method, a storage medium, and a terminal based on medical laboratory test report pictures, which are used to solve the problem that the prior art cannot break through the limitation of professional skills and automatically, comprehensively, and accurately identify the medical laboratory test report pictures.
In order to achieve the above objects and other related objects, an aspect of the present invention provides a medical laboratory sheet picture-based identification method, including: acquiring a medical laboratory sheet picture, and compiling the medical laboratory sheet picture; preprocessing the compiled medical laboratory sheet picture to generate a picture to be identified; target detection is carried out on the character area in the picture to be identified; extracting characters based on the targets detected in the character areas, and determining coordinates corresponding to the characters; and outputting a text recognition result according to the characters and the coordinates.
In an embodiment of the present invention, the step of compiling the medical laboratory sheet picture includes: and compiling the medical laboratory sheet pictures into binary data to form a basic data format required by preprocessing.
In an embodiment of the present invention, the step of preprocessing the compiled medical laboratory sheet picture to generate a picture to be recognized includes: judging the compiled medical laboratory sheet pictures according to each judgment condition of the preset judgment logic; and preprocessing the medical laboratory sheet picture which is judged to be correct according to preset processing logic to generate a picture to be identified.
In an embodiment of the present invention, the determining condition includes: whether the picture is fuzzy or not, whether the picture meets the identification requirement of the laboratory test report or not, whether the picture is deformed or not and whether the picture is inclined or not; the preset processing logic comprises: at least one of picture enhancement, picture noise reduction, binarization processing, horizontal correction and deformation processing.
In an embodiment of the present invention, the step of performing target detection on the text region in the picture to be recognized includes: establishing a text recognition network through deep learning; detecting a character area in the picture to be recognized by utilizing the text recognition network; segmenting the text area to determine a single continuous text area; and taking a single continuous character area as a detection target.
In an embodiment of the present invention, the step of establishing the text recognition network through deep learning includes: establishing more than two layers of text recognition networks through deep learning; training samples of the text recognition network are obtained at least partially based on the medical association format samples; and carrying out forward and backward bidirectional propagation among the network layers, so that the network parameter weight is connected with the context.
In an embodiment of the present invention, the step of extracting a character based on the target detected in the text area and determining the coordinates corresponding to the character includes: extracting image convolution characteristics based on the target detected in the text area; extracting character sequence features based on the image convolution features; aligning the characters corresponding to the character sequence features, and determining coordinates corresponding to the characters; generating corresponding word vector codes from the medical term dictionary corpus aiming at the characters after the alignment treatment; and combining full text words and sentences to calculate the similarity of each word vector code, and taking the word with the highest similarity as the extracted character.
In an embodiment of the present invention, the step of outputting the text recognition result according to the character and the coordinate includes: initializing a two-dimensional matrix; mapping the characters to rows and columns corresponding to the two-dimensional matrix according to the coordinates of the extracted characters, and filling the matrix; judging the density degree of the filled characters, and adjusting the text interval proportion according to the judgment result; outputting the adjusted characters in sequence according to the rows of the matrix, and adding a line feed character into each row for typesetting; and taking the text content with the typesetting format as the text recognition result.
To achieve the above and other related objects, another aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the medical laboratory sheet picture-based identification method.
To achieve the above and other related objects, a final aspect of the present invention provides a terminal comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the medical laboratory sheet picture-based identification method.
As described above, the identification method, the storage medium and the terminal based on the medical laboratory sheet picture according to the present invention have the following advantages:
the invention combines deep learning and the medical term dictionary corpus to perform neural network training and calculation in a fusion manner, not only can realize data extraction of a large number of medical laboratory test report pictures, but also can customize rules for automatic processing, and improves the recognition efficiency of the medical laboratory test reports and the accuracy rate of extracting characters. Meanwhile, the character typesetting display and the structuring processing can be performed, a plurality of application products can be connected, the end-to-end real-time extraction of the test order data is realized, and the labor cost is reduced. The invention can flexibly cope with the complex conditions of illumination change, low resolution, font and arrangement diversity, various medical term characters and the like in the process of image recognition.
Drawings
Fig. 1 is a schematic flow chart of an identification method based on medical laboratory test report pictures according to an embodiment of the present invention.
Fig. 2 is a flow chart of the preprocessing of the identification method based on medical laboratory test report picture according to an embodiment of the present invention.
Fig. 3 is a schematic view of a medical laboratory sheet according to an embodiment of the identification method based on a medical laboratory sheet picture of the present invention.
Fig. 4 is a schematic text recognition diagram of the identification method based on medical laboratory sheet pictures according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a textual layout of the identification method based on medical laboratory sheet pictures according to an embodiment of the present invention.
Fig. 6 is a schematic structural connection diagram of the terminal according to an embodiment of the invention.
Description of the element reference numerals
6 terminal
61 processor
62 memory
S11-S15 steps
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The identification method, the storage medium and the terminal based on the medical laboratory sheet pictures can realize data extraction of a large number of medical laboratory sheet pictures, can customize rules for automatic processing, and improve the identification efficiency of the medical laboratory sheet and the accuracy rate of extracting characters.
The principles and embodiments of the identification method, the storage medium and the terminal based on the medical laboratory sheet picture according to the present embodiment will be described in detail below with reference to fig. 1 to 6, so that those skilled in the art can understand the identification method, the storage medium and the terminal based on the medical laboratory sheet picture without creative work.
Referring to fig. 1, a schematic flow chart of an identification method based on medical laboratory test report pictures according to an embodiment of the present invention is shown. As shown in fig. 1, the identification method based on medical laboratory test report picture specifically includes the following steps:
S11, acquiring the medical laboratory sheet picture, and compiling the medical laboratory sheet picture.
In one embodiment, the medical laboratory sheet pictures are compiled into binary data to form a basic data format required for preprocessing.
S12, preprocessing the compiled medical laboratory sheet picture to generate a picture to be identified.
Please refer to fig. 2, which is a flowchart illustrating a preprocessing process of the identification method based on medical laboratory test report pictures according to an embodiment of the present invention. As shown in fig. 2, the preset judgment logic and the preset processing logic constitute a preprocessing flow.
On one hand, the compiled medical laboratory sheet pictures are judged according to all judgment conditions of the preset judgment logic.
The judgment condition includes: whether the picture is fuzzy or not, whether the picture meets the identification requirement of the laboratory test report or not, whether the picture has deformation or not and whether the picture is inclined or not.
Whether the picture is deformed refers to whether the medical laboratory sheet shows paper marks caused by folding, wrinkling and the like. Whether the picture meets the laboratory sheet identification requirement refers to whether the encoding format of the picture or the language of its content meets the preprocessing requirement. For example, conventional picture formats such as jpg or png meet the requirement, while the animated format gif does not; widely used languages such as Chinese or English meet the requirement, while less widely used languages such as Arabic do not.
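The format part of this judgment condition can be sketched as a check on the leading bytes of the compiled binary data. This is a minimal sketch under stated assumptions: the function names are hypothetical, and checking file signatures is one possible way to implement the check, not something the description specifies.

```python
# Hypothetical helper names; the accepted set mirrors the jpg/png example in the text.
ACCEPTED_SIGNATURES = {
    "jpg": b"\xff\xd8\xff",          # JPEG files start with these bytes
    "png": b"\x89PNG\r\n\x1a\n",     # 8-byte PNG signature
}
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def detect_format(data: bytes) -> str:
    """Return 'jpg', 'png', 'gif' or 'unknown' based on the leading bytes."""
    for name, signature in ACCEPTED_SIGNATURES.items():
        if data.startswith(signature):
            return name
    if data.startswith(GIF_SIGNATURES):
        return "gif"
    return "unknown"


def meets_format_requirement(data: bytes) -> bool:
    """Judgment node: only static jpg/png pictures pass."""
    return detect_format(data) in ACCEPTED_SIGNATURES
```

A picture whose bytes begin with a GIF signature would be rejected by `meets_format_requirement`, matching the example in the text. The language check on the picture content is not covered by this sketch.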
And on the other hand, preprocessing the medical laboratory sheet picture which is judged to be correct according to the preset processing logic to generate a picture to be identified.
The preset processing logic comprises: at least one of picture enhancement, picture noise reduction, binarization processing, horizontal correction and deformation processing. Horizontal correction means that when the acquired medical laboratory sheet picture is tilted, it is rotated by a certain angle until it is horizontal.
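As an illustration of the binarization processing node, the sketch below thresholds a grayscale picture. The description does not say how the threshold is chosen; Otsu's method is swapped in here as a common choice, and the list-of-rows picture representation and function names are assumptions.

```python
from typing import List

Picture = List[List[int]]  # assumed representation: grayscale rows, values 0-255


def otsu_threshold(picture: Picture) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in picture:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]
        sum_bg += t * histogram[t]
        if weight_bg == 0:
            continue            # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break               # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def binarize(picture: Picture) -> Picture:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(picture)
    return [[255 if value > t else 0 for value in row] for row in picture]
```

For example, `binarize([[10, 10, 12], [200, 205, 210]])` returns `[[0, 0, 0], [255, 255, 255]]`.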
It should be noted that the judgment result of the preset judgment logic may be associated with the preset processing logic, or may be independent of each other. For example, when the picture is judged to be blurred, the picture can be enhanced according to a preset processing logic; when the image is judged to be deformed, deformation processing can be carried out according to preset processing logic so as to eliminate folding or wrinkle traces on the image.
In addition, for both the preset judgment logic and the preset processing logic, the invention lets the user register and delete the corresponding judgment nodes and processing nodes, and finally outputs the preprocessed binary data. For example, according to their own preprocessing requirements, a user can select two judgment conditions through an operation interface (whether the picture is blurred and whether the picture meets the laboratory sheet identification requirement) and can likewise select two processing nodes (picture noise reduction and binarization processing). The invention thereby realizes automatic processing with customized rules and improves the recognition efficiency of medical laboratory sheets and the accuracy of character extraction. For example, if picture enhancement is selected in preprocessing for a picture with illumination changes and low resolution, picture enhancement is then applied automatically when extracting data from a large number of medical laboratory sheet pictures with similar conditions.
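The registration and deletion of judgment nodes and processing nodes can be sketched as a small registry. This is only an illustration under assumptions: the class and method names are hypothetical, pictures are represented as lists of grayscale rows, and a picture that fails any registered judgment is simply rejected (the description also allows a judgment result to trigger a processing step instead).

```python
from typing import Callable, Dict, List, Optional

Picture = List[List[int]]  # assumed representation: grayscale rows, values 0-255


class PreprocessPipeline:
    """Hypothetical registry of user-selected judgment and processing nodes."""

    def __init__(self) -> None:
        self.judges: Dict[str, Callable[[Picture], bool]] = {}
        self.processors: Dict[str, Callable[[Picture], Picture]] = {}

    def register_judge(self, name: str, fn: Callable[[Picture], bool]) -> None:
        self.judges[name] = fn

    def delete_judge(self, name: str) -> None:
        self.judges.pop(name, None)

    def register_processor(self, name: str, fn: Callable[[Picture], Picture]) -> None:
        self.processors[name] = fn

    def delete_processor(self, name: str) -> None:
        self.processors.pop(name, None)

    def run(self, picture: Picture) -> Optional[Picture]:
        """Return the preprocessed picture, or None if any judgment fails."""
        if not all(judge(picture) for judge in self.judges.values()):
            return None
        # Processing nodes run in registration order (dicts keep insertion order).
        for process in self.processors.values():
            picture = process(picture)
        return picture


# Example nodes (illustrative only).
pipeline = PreprocessPipeline()
pipeline.register_judge("non_empty", lambda p: bool(p) and bool(p[0]))
pipeline.register_processor(
    "binarize", lambda p: [[255 if v >= 128 else 0 for v in row] for row in p]
)
```

With these nodes, `pipeline.run([[10, 200]])` returns `[[0, 255]]`, and an empty picture is rejected with `None`.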
In practical application, the invention can configure a system for executing the identification method based on the medical laboratory sheet picture: whether pictures are preprocessed, the judgment logic and processing logic used in preprocessing, and the environment the system depends on, including limiting parameters for the system CPU (Central Processing Unit) and GPU (Graphics Processing Unit), picture size limits, and the like.
S13, carrying out target detection on the character area in the picture to be recognized.
In one embodiment, S13 specifically includes the following steps:
(1) Establishing a text recognition network through deep learning.
Specifically, more than two layers of text recognition networks are established through deep learning; training samples of the text recognition network are obtained at least partially based on the medical association format samples; and carrying out forward and backward bidirectional propagation among the network layers, so that the network parameter weight is connected with the context.
In practical application, the segmentation-based text detection algorithm DBNet is adopted, which places no restriction on the shape of the text and therefore handles the diverse layouts of laboratory sheet pictures well. When the neural network, namely the text recognition network, is trained, a multi-layer recognition network is established through deep learning, and forward and backward bidirectional propagation is carried out between network layers so that the network parameter weights are connected with the context, yielding a more stable recognition network. More medical associated format samples are added to improve detection of targets specific to medical data.
(2) Detecting a character area in the picture to be recognized by utilizing the text recognition network.
(3) Segmenting the text area to determine a single continuous text area.
Specifically, determining the single continuous character area includes: applying morphological dilation and erosion so that the characters merge into one large area; then filtering out regions with small areas, identifying the overall outline, determining the edge of the outline, and marking the edge using a matrix.
(4) Taking a single continuous character area as a detection target.
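The segmentation in step (3) can be sketched on a binary mask as follows. This is a simplified illustration, not the described implementation: it applies only a horizontal dilation (no erosion), uses 4-connected regions, and returns bounding boxes as tuples; all names and the `min_area` default are assumptions.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # assumed representation: 1 = text pixel, 0 = background


def dilate(mask: Mask, radius_x: int = 1) -> Mask:
    """Horizontal dilation: a pixel becomes 1 if any pixel within radius_x columns is 1."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = max(0, x - radius_x), min(width, x + radius_x + 1)
            if any(mask[y][lo:hi]):
                out[y][x] = 1
    return out


def text_regions(mask: Mask, min_area: int = 3) -> List[Tuple[int, int, int, int]]:
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions with area >= min_area."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(x, y)])
            seen[y][x] = True
            area, x0, x1, y0, y1 = 0, x, x, y, y
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # filter out small regions
                boxes.append((x0, y0, x1, y1))
    return boxes
```

In this sketch, two separate characters on one line merge into a single region after dilation, while an isolated speck smaller than `min_area` is discarded.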
S14, extracting characters based on the targets detected in the character areas, and determining the coordinates corresponding to the characters.
In one embodiment, S14 specifically includes the following steps:
(1) Extracting image convolution characteristics based on the target detected in the character area.
Specifically, the present invention utilizes the CRNN + CTC (or CNN + RNN + CTC) framework. A CRNN (Convolutional Recurrent Neural Network) algorithm is used for the detection target, wherein the CNN algorithm extracts image convolution features.
(2) Extracting character sequence features based on the image convolution features.
Specifically, a deep bidirectional LSTM (Long Short-Term Memory) network extracts the character sequence features. LSTM is a special kind of RNN (Recurrent Neural Network) designed mainly to address the vanishing gradient and exploding gradient problems that arise when training on long sequences.
(3) Aligning the characters corresponding to the character sequence characteristics, and determining the coordinates corresponding to the characters.
Specifically, CTC (Connectionist Temporal Classification) is used to solve the problem that characters cannot be aligned during training.
(4) Generating corresponding word vector codes from the medical term dictionary corpus aiming at the characters after the alignment processing.
Specifically, a numeric character such as "5" may generate several word vector codes, for example the word vector code of the number "5" and that of the English letter "S". Likewise, an English letter such as "B" may generate several word vector codes, for example that of the number "8" and that of the English letter "B".
Therefore, the same content rendered in different fonts in the picture can be recognized through the medical term dictionary corpus. For example, the characters of "predicted value" can be correctly recognized through the medical term dictionary corpus whether they are printed in Song typeface or regular script. The problem of the large variety of medical term characters is thus well handled.
(5) Combining full text words and sentences to calculate the similarity of each word vector code, and taking the word with the highest similarity as the extracted character.
Specifically, for the numeric character "5", similarity is calculated for the word vector code of the number "5" and for that of the English letter "S", and the number "5", which has the highest similarity, is taken as the extracted character. For the English letter "B", similarity is calculated for the word vector code of the number "8" and for that of the English letter "B", and the English letter "B", which has the highest similarity, is taken as the extracted character.
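Steps (3) and (5) can be illustrated with two small functions. The first shows the standard greedy way of turning per-frame CTC scores into a character string (merge repeats, then drop blanks); the second picks among look-alike candidates by similarity to a context vector. Both are sketches under assumptions: the blank symbol, the toy vectors, the use of cosine similarity and all function names are illustrative and not taken from the description.

```python
import math
from typing import Dict, Sequence

BLANK = "-"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Pick the best label per frame, merge consecutive repeats, then drop blanks."""
    decoded = []
    previous = None
    for scores in frame_scores:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        symbol = alphabet[best_index]
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_candidate(candidates: Dict[str, Sequence[float]],
                   context: Sequence[float]) -> str:
    """Return the candidate whose vector is most similar to the context vector."""
    return max(candidates, key=lambda c: cosine_similarity(candidates[c], context))
```

For instance, with toy vectors `{"5": [1.0, 0.0], "S": [0.0, 1.0]}` and a context vector `[0.9, 0.1]` standing in for a numeric result column, `pick_candidate` returns `"5"`.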
Please refer to fig. 3, which is a schematic diagram of a medical laboratory sheet according to an embodiment of the identification method based on a medical laboratory sheet picture of the present invention. As shown in fig. 3, the medical laboratory sheet picture is a blood routine laboratory sheet picture. The upper part of the picture was dark when it was shot, so the picture needs to be enhanced in the preprocessing process. A horizontal crease lies between the platelet distribution width and the mean platelet volume, and a vertical crease runs through the column of detection results; these two creases indicate that the picture is deformed, so deformation correction is needed in the preprocessing process to eliminate the folding traces. The whole picture was also shot at a tilt, so horizontal correction is needed in the preprocessing process so that all recognized characters are displayed relative to a horizontal line.
Please refer to fig. 4, which is a schematic text recognition diagram of the identification method based on medical laboratory sheet pictures according to an embodiment of the present invention. As shown in fig. 4, the recognized characters corresponding to fig. 3 are individually presented, including information such as the department, the type of specimen, the bed number, the examination department collection time, the doctor applying for the examination, each examination item, and the corresponding examination result. The upper part of the blood routine laboratory sheet picture in fig. 3 is dark, which affects the recognition result, so the application time and the sample number are not accurately recognized.
S15, outputting a text recognition result according to the characters and the coordinates.
In one embodiment, S15 specifically includes the following steps:
(1) Initializing a two-dimensional matrix.
Specifically, according to the original resolution of the picture, a two-dimensional matrix with X rows and Y columns is initialized and filled with space characters.
(2) Mapping the characters to rows and columns corresponding to the two-dimensional matrix according to the coordinates of the extracted characters, and filling the matrix.
Specifically, the coordinate starting points of all detection targets are mapped to row X_i and column Y_j. Starting from that point, the characters of each single detection target are mapped one by one to the subsequent row and column positions of the text display, according to the target's character length, until the end of the character length.
(3) Judging the density degree of the filled characters, and adjusting the text interval proportion according to the judgment result.
Specifically, if the characters in the single detection target are dense, the text interval mapping ratio is increased appropriately, and if the characters in the single detection target are sparse, the text interval mapping ratio is decreased appropriately.
(4) Outputting the adjusted characters in sequence according to the rows of the matrix, and adding a line-feed character into each row for typesetting.
(5) Taking the text content with the typesetting format as the text recognition result.
Specifically, the characters finally filled into the matrix are output row by row, with a line-feed character added to each row, producing a txt text with a basic typesetting format.
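The matrix-based typesetting of S15 can be sketched as below. The cell sizes `cell_w` and `cell_h`, which convert pixel coordinates into matrix columns and rows, are assumptions; `cell_w` stands in for the text interval mapping ratio that step (3) adjusts, and the density judgment itself is not implemented here.

```python
from typing import List, Tuple

Detection = Tuple[str, int, int]  # (text, x, y): recognized text and its start point in pixels


def layout_text(detections: List[Detection], width_px: int, height_px: int,
                cell_w: int = 10, cell_h: int = 20) -> str:
    """Place each detection in a space-filled grid and emit it row by row."""
    rows = height_px // cell_h + 1
    cols = width_px // cell_w + 1
    grid = [[" "] * cols for _ in range(rows)]  # step (1): matrix filled with spaces

    for text, x, y in detections:               # step (2): map start point to row/column
        row, col = y // cell_h, x // cell_w
        for offset, character in enumerate(text):
            if row < rows and col + offset < cols:
                grid[row][col + offset] = character

    # Steps (4)-(5): output rows in order, one line feed per row.
    return "\n".join("".join(line).rstrip() for line in grid) + "\n"
```

For example, `layout_text([("WBC", 0, 0), ("5.2", 60, 0), ("RBC", 0, 20)], 100, 40)` produces a first line `WBC   5.2` and a second line `RBC`, keeping the item name and its result in separate columns as on the original sheet. Increasing `cell_w` packs columns closer together and decreasing it spreads them apart.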
Please refer to fig. 5, which is a schematic diagram of a textual layout of the identification method based on medical laboratory sheet pictures according to an embodiment of the present invention. As shown in fig. 5, a txt file obtained by typesetting the extracted characters according to their coordinates is presented, which includes information such as the department, the specimen type, the bed number, the examination department acquisition time, the doctor applying, each detection item and the corresponding detection result. The txt file is a preferred form of the text recognition result: it is convenient for a user to check and easy to import into various devices or applications for reading and use.
It should be noted that the txt file is only one embodiment of the text recognition result, and other formats such as table, WORD, mail, subscription message of various terminal software or applications, etc. are included in the scope of the present invention.
The protection scope of the identification method based on the medical laboratory sheet picture is not limited to the execution sequence of the steps listed in this embodiment; all schemes that add, remove or replace steps using the prior art according to the principle of the invention are included in the protection scope of the invention.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, implements the medical laboratory sheet picture-based identification method.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned computer-readable storage media comprise: various computer storage media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Please refer to fig. 6, which is a schematic diagram illustrating a structural connection of the terminal according to an embodiment of the present invention. As shown in fig. 6, the present embodiment provides a terminal 6, which specifically includes: a processor 61 and a memory 62; the memory 62 is used for storing a computer program, and the processor 61 is used for executing the computer program stored in the memory 62 to make the terminal 6 execute the steps of the identification method based on the medical laboratory sheet picture.
The processor 61 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The Memory 62 may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
In practice, the terminal may be a computer that includes all or a portion of the memory, memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports; the computer includes, but is not limited to, Personal computers such as desktop computers, notebook computers, tablet computers, smart phones, Personal Digital Assistants (PDAs), and the like. In other embodiments, the terminal may also be a server, where the server may be arranged on one or more entity servers according to various factors such as functions and loads, or may be a cloud server formed by a distributed or centralized server cluster, which is not limited in this embodiment.
In summary, the identification method, the storage medium and the terminal based on the medical laboratory sheet pictures can be used for performing network training and calculation by combining deep learning and the medical term dictionary corpus, so that not only can data extraction of a large number of medical laboratory sheet pictures be realized, but also rules can be customized for automatic processing, and the identification efficiency of the medical laboratory sheet pictures and the accuracy of character extraction are improved. Meanwhile, the character typesetting display and the structuring processing can be performed, a plurality of application products can be connected, the end-to-end real-time extraction of the test order data is realized, and the labor cost is reduced. The invention can flexibly cope with the complex conditions of illumination change, low resolution, font and arrangement diversity, various medical term characters and the like in the process of image recognition. The invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims (10)

1. An identification method based on a medical laboratory sheet picture, characterized by comprising the following steps:
acquiring a medical laboratory sheet picture, and compiling the medical laboratory sheet picture;
preprocessing the compiled medical laboratory sheet picture to generate a picture to be identified;
performing target detection on a text area in the picture to be identified;
extracting characters based on the targets detected in the text area, and determining coordinates corresponding to the characters; and
outputting a text recognition result according to the characters and the coordinates.
2. The identification method based on the medical laboratory sheet picture according to claim 1, wherein the step of compiling the medical laboratory sheet picture comprises:
compiling the medical laboratory sheet picture into binary data to form the basic data format required for preprocessing.
3. The identification method based on the medical laboratory sheet picture according to claim 1, wherein the step of preprocessing the compiled medical laboratory sheet picture to generate the picture to be identified comprises:
judging the compiled medical laboratory sheet picture against each judgment condition of a preset judgment logic; and
preprocessing the medical laboratory sheet picture that is judged to be correct according to a preset processing logic to generate the picture to be identified.
4. The identification method based on the medical laboratory sheet picture according to claim 3, characterized in that:
the judgment conditions include: whether the picture is blurry, whether the picture meets the identification requirements for a medical laboratory sheet, whether the picture is deformed, and whether the picture is tilted;
the preset processing logic comprises at least one of picture enhancement, picture noise reduction, binarization processing, horizontal correction and deformation processing.
5. The identification method based on the medical laboratory sheet picture according to claim 1, wherein the step of performing target detection on the text area in the picture to be identified comprises:
establishing a text recognition network through deep learning;
detecting the text area in the picture to be identified by using the text recognition network;
segmenting the text area to determine single continuous text areas; and
taking each single continuous text area as a detection target.
6. The identification method based on the medical laboratory sheet picture according to claim 5, wherein the step of establishing a text recognition network through deep learning comprises:
establishing a text recognition network with two or more layers through deep learning, wherein training samples of the text recognition network are obtained at least partially based on medical association format samples; and
performing forward and backward bidirectional propagation among the network layers, so that the network parameter weights are associated with the context.
7. The identification method based on the medical laboratory sheet picture according to claim 1, wherein the step of extracting characters based on the targets detected in the text area and determining the coordinates corresponding to the characters comprises:
extracting image convolution features based on the targets detected in the text area;
extracting character sequence features based on the image convolution features;
aligning the characters corresponding to the character sequence features, and determining the coordinates corresponding to the characters;
generating, for the aligned characters, corresponding word vector codes from a medical term dictionary corpus; and
calculating the similarity of each word vector code in combination with the words and sentences of the full text, and taking the word with the highest similarity as the extracted character.
8. The identification method based on the medical laboratory sheet picture according to claim 1, wherein the step of outputting a text recognition result according to the characters and the coordinates comprises:
initializing a two-dimensional matrix;
mapping the extracted characters to the corresponding rows and columns of the two-dimensional matrix according to their coordinates, and filling the matrix;
judging the density of the filled characters, and adjusting the text spacing ratio according to the judgment result;
outputting the adjusted characters in sequence, row by row of the matrix, and adding a line feed character to each row for typesetting; and
taking the text content in the typeset format as the text recognition result.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the identification method based on the medical laboratory sheet picture according to any one of claims 1 to 8.
10. A terminal, comprising: a processor and a memory;
the memory is used for storing a computer program, and the processor is used for executing the computer program stored by the memory to cause the terminal to execute the identification method based on the medical laboratory sheet picture according to any one of claims 1 to 8.
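The typesetting steps of claim 8 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: each recognized text fragment is assumed to arrive as a `(text, x, y)` tuple with pixel coordinates of its top-left corner, the function name `layout_text` and the parameters `row_height` and `col_width` are hypothetical, and the density judgment is reduced to a simple rule that shifts a fragment to the right when it would collide with text already placed in the same row.

```python
def layout_text(items, row_height=20.0, col_width=10.0):
    """Place (text, x, y) fragments into a character matrix and emit typeset text.

    Assumptions: x and y are non-negative pixel coordinates; row_height and
    col_width are hypothetical cell sizes that map pixels to matrix cells.
    """
    if not items:
        return ""

    def row_of(y):
        return int(y // row_height)

    # Initialize the two-dimensional matrix as one growable row per text line.
    n_rows = max(row_of(y) for _, _, y in items) + 1
    matrix = [[] for _ in range(n_rows)]

    # Fill the matrix row by row, left to right within each row.
    for text, x, y in sorted(items, key=lambda it: (row_of(it[2]), it[1])):
        row = matrix[row_of(y)]
        col = int(x // col_width)
        # Simplified density adjustment: if the target cell is occupied or
        # touches earlier text, move right and keep one blank as separator.
        if row and col <= len(row):
            col = len(row) + 1
        row.extend(" " * (col - len(row)))  # pad with blanks up to the column
        row.extend(text)                     # write the characters

    # Output the rows in order, ending each row with a line feed character.
    return "".join("".join(row) + "\n" for row in matrix)


if __name__ == "__main__":
    fragments = [
        ("Result", 120, 4),
        ("Item", 0, 2),
        ("WBC", 0, 24),
        ("6.1", 120, 22),
    ]
    print(layout_text(fragments), end="")
```

With the sample fragments, the two values in the second column start at the same character position in both rows, which preserves the tabular look of the original sheet. Mapping coordinates through fixed cell sizes is a design simplification; a fuller version would presumably derive the cell sizes from the measured character density.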
CN202111565336.9A 2021-12-20 2021-12-20 Identification method based on medical laboratory test report picture, storage medium and terminal Pending CN114299529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111565336.9A CN114299529A (en) 2021-12-20 2021-12-20 Identification method based on medical laboratory test report picture, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN114299529A (en) 2022-04-08

Family

ID=80967076

Country Status (1)

Country Link
CN (1) CN114299529A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1101141A (en) * 1993-06-25 1995-04-05 欧姆龙株式会社 Apparatus and method for adjusting space between character
CN102567303A (en) * 2010-12-24 2012-07-11 北京大学 Typesetting method and device for variable official document data
CN105930844A (en) * 2016-04-20 2016-09-07 西北工业大学 Method for improving paper medical test sheet mobile phone scanning identification rate
CN113569840A (en) * 2021-08-31 2021-10-29 平安医疗健康管理股份有限公司 Form recognition method and device based on self-attention mechanism and storage medium
CN113673519A (en) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 Character recognition method based on character detection model and related equipment thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
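Paolo Galeone (Italy): "TensorFlow 2.0 Neural Network Practice" (Chinese edition), Beijing: Higher Education Press, pages: 126 *
Minghui Liao et al.: "Real-time Scene Text Detection with Differentiable Binarization", arXiv:1911.08947v2 [cs.CV], pages 1 - 11 *
Li Hang (李杭): "Research and Implementation of an Intelligent Recognition System for Medical Laboratory Sheet Data", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 9, pages 1 - 70 *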

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115274095A (en) * 2022-08-02 2022-11-01 广州酷睿体悟科技有限公司 Intelligent APP laboratory sheet recognition early warning system
CN115984859A (en) * 2022-12-14 2023-04-18 广州市保伦电子有限公司 Image character recognition method and device and storage medium
CN117152778A (en) * 2023-10-31 2023-12-01 安徽省立医院(中国科学技术大学附属第一医院) Medical instrument registration certificate identification method, device and medium based on OCR
CN117152778B (en) * 2023-10-31 2024-01-16 安徽省立医院(中国科学技术大学附属第一医院) Medical instrument registration certificate identification method, device and medium based on OCR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination