CN114612915B - Method and device for extracting patient information of film image - Google Patents
- Publication number: CN114612915B (application CN202210511460.5A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting (Pattern recognition)
- G06F18/22—Matching criteria, e.g. proximity measures (Pattern recognition)
- G06N3/045—Combinations of networks (Neural networks)
- G06N3/08—Learning methods (Neural networks)
- G16H30/40—ICT specially adapted for processing medical images, e.g. editing
Abstract
The invention relates to the field of artificial intelligence, and in particular to a method and a device for extracting patient information from a film image. The method for extracting patient information from a film image comprises the following steps: calling a pre-trained image area detection model to perform image area detection on a film image to be recognized and identify the position information of one or more image areas; obtaining a sub-image set according to the identified position information of each image area; selecting a sub-image from the sub-image set and color-filling the image area in the sub-image with the background color of the sub-image, so as to generate a text information image containing only the patient information and the background color; and performing text detection on the text information image to obtain the patient information. The invention achieves a higher recognition rate and greater universality.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a method and a device for extracting patient information from a film image.
Background
After an imaging examination in a hospital, a patient usually has to wait for a period of time before the examination images are printed on film, and then takes the film to the corresponding doctor to obtain a diagnosis. This is inconvenient for many patients. In recent years, self-service film printing machines have appeared that allow a patient to print films at any time: the patient scans the bar code on a receipt voucher, a background virtual film printing server verifies the patient information encoded in the bar code, and the film is printed automatically once the match succeeds. Self-service printing recognizes the patient information in film images primarily through OCR (optical character recognition) technology.
At present, before OCR recognizes text, OCR text detection must first be performed: a text area is extracted from the film image and then passed to the OCR recognition module to recognize the text information. Because film sizes, film typesetting layouts, and patient information layouts are not standardized, the OCR recognition rate is poor; when the patient information recognized by OCR does not match the bar code information on the receipt voucher, automatic printing fails, and the effective utilization rate of self-service printing is low. In addition, during OCR recognition the conventional method needs to segment individual characters from the film image and combine them into patient information after recognition, so when the preset information font size of the film is changed, the OCR recognition rate is also affected.
The prior art presents no effective solution to the above problems of extracting patient information from film images based on OCR technology.
Disclosure of Invention
The embodiments of the invention provide a method and a device for extracting patient information from a film image, which at least improve the recognition rate when extracting patient information from film images.
In a first aspect, an embodiment of the present invention provides a method for extracting patient information from a film image, where the method includes:
calling a pre-trained image area detection model to perform image area detection on a film image to be identified, and identifying the position information of one or more image areas;
acquiring a sub-image set according to the identified position information of each image area; the sub-image set comprises one or more sub-images, each sub-image corresponding to an image region and patient information;
selecting a sub-image from the sub-image set, and performing color filling on an image area in the sub-image according to the background color of the sub-image to generate a text information image only containing patient information and the background color;
and performing text detection on the text information image to obtain the patient information.
Optionally, before the calling of a pre-trained image area detection model to perform image area detection on a film image to be recognized and the identifying of position information of one or more image areas, the method includes:
creating a film blank image composed of a plurality of cells according to a pre-obtained cell typesetting layout;
respectively filling medical image samples and patient information in each cell of the film blank image to generate a film training image, and recording the position information of each medical image sample in the filling process; wherein, the medical image sample and the patient information in each unit cell are positioned in different horizontal areas;
and training according to the film training image and the position information of each medical image sample to obtain the image area detection model.
Optionally, the obtaining a sub-image set according to the identified position information of each image area includes:
determining the cell layout format of the film image according to the identified position information of each image area;
and according to the cell typesetting format of the film image, performing equidistant cutting on the film image to obtain the sub-image set.
Optionally, the selecting a sub-image from the sub-image set, and performing color filling on an image area in the sub-image according to a background color of the sub-image to generate a text information image only including patient information and the background color includes:
selecting any sub-image from the sub-image set to perform similarity matching with the rest sub-images to obtain matched sub-images;
mapping the image area of the matched sub-image to the selected sub-image according to the position information of the image area of the matched sub-image to obtain a mapping area;
carrying out non-maximum suppression on the image area of the selected sub-image and the mapping area to obtain a final image area of the selected sub-image;
and filling the color of the final image area in the sub-image according to the background color of the sub-image to generate a text information image only containing the patient information and the background color.
Optionally, the position information includes a center point coordinate, a width, and a height of the image area;
the mapping the image area of the matching sub-image to the selected sub-image according to the position information of the image area of the matching sub-image includes:
matching the center point coordinates of the image area of the matched sub-image to the center point of the selected sub-image;
and mapping the image area of the matched sub-image into the selected sub-image according to the width and the height of the image area of the matched sub-image.
Optionally, the performing text detection on the text information image to obtain patient information includes:
according to image morphology, recognizing a text area in the text information image;
and performing text detection on the text area to obtain the patient information.
Optionally, the identifying a text region in the text information image according to image morphology includes:
performing image binarization processing on the text information image; highlighting the outline of the text region through dilation processing, erosion processing, and re-dilation processing;
extracting the outline of the text region by adopting an outline tracking algorithm;
and filtering out the outline which does not conform to the horizontal form from the outline of the text area to obtain the text area in the text information image.
Optionally, the performing text detection on the text region to obtain patient information includes:
recognizing text content from the text area according to a pre-trained text recognition model;
and matching the patient information from the identified text content by adopting a regular expression according to a preset patient information format.
Optionally, before recognizing the text content from the text region according to the pre-trained text recognition model, the method includes:
creating a text information image sample consisting of background color and horizontally arranged patient information according to the patient information format in the film training image;
training on the text information image samples based on a convolutional recurrent neural network (CRNN) and a connectionist temporal classification (CTC) framework to obtain the text recognition model.
In a second aspect, an embodiment of the present invention provides a patient information extraction apparatus for a film image, where the patient information extraction for the film image includes: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when executed by the processor, implements the steps of a method of patient information extraction of film images as described in any one of the above.
Various embodiments of the present invention call a pre-trained image area detection model to perform image area detection on a film image to be recognized and identify the position information of one or more image areas; a sub-image set is then obtained according to the identified position information of each image area; a sub-image is selected from the sub-image set, and the image area in the sub-image is color-filled with the background color of the sub-image to generate a text information image containing only the patient information and the background color; finally, text detection is performed on the text information image to obtain the patient information. The complex background in the film image is thereby removed, so that the image contains only text information, and text areas spanning multiple horizontal lines can be extracted with computationally inexpensive image morphology operations. The method therefore achieves a higher recognition rate and avoids a large amount of labor time and manual labeling errors. The embodiments of the invention effectively solve the poor universality of OCR text region detection and recognition in film images processed by the prior art, and improve universality.
Drawings
Fig. 1 is a flowchart of a method of extracting patient information from a film image according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a film image according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the effect of image area identification of a film image in accordance with an embodiment of the present invention;
fig. 4 is a diagram of recognition effects of sub-images according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of removing sub-images of an image region according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not to be construed as limiting it.
Example one
An embodiment of the present invention provides a method for extracting patient information from a film image. As shown in fig. 1, the method includes:
s101, calling a pre-trained image area detection model to perform image area detection on a film image to be identified, and identifying position information of one or more image areas;
s102, acquiring a sub-image set according to the identified position information of each image area; the sub-image set comprises one or more sub-images, each sub-image corresponding to an image region and patient information;
s103, selecting a sub-image from the sub-image set, and performing color filling on an image area in the sub-image according to the background color of the sub-image to generate a text information image only containing patient information and the background color;
and S104, performing text detection on the text information image to obtain the patient information.
As shown in fig. 2, the film image includes a plurality of image areas, with patient information placed horizontally above each image area. The patient information includes information such as a patient number. "Above" is determined by the font direction of the patient information, i.e. the top of the characters. Each image area corresponds to one medical image. The position information includes the coordinates of the center point of the image area, and the width and height of the image area. The background color is typically black.
The embodiment of the invention calls the pre-trained image area detection model to perform image area detection on the film image to be recognized and identifies the position information of one or more image areas; a sub-image set is then obtained according to the identified position information of each image area; a sub-image is selected from the sub-image set, and the image area in the sub-image is color-filled with the background color of the sub-image to generate a text information image containing only the patient information and the background color; finally, text detection is performed on the text information image to obtain the patient information. The complex background in the film image is thereby removed, so that the image contains only text information, and multiple horizontal text areas can be extracted with computationally inexpensive image morphology operations. The method therefore achieves a higher recognition rate and avoids a large amount of labor time and manual labeling errors. The embodiment of the invention effectively solves the poor universality of OCR text region detection and recognition in film images processed by the prior art, and improves universality.
Based on the above conception of the embodiments of the present invention, the embodiments of the present invention are described in detail by a specific embodiment.
In some embodiments, a deep-learning-based approach enables automatic OCR text region detection and text information recognition in film images. The OCR text region detection module uses a convolutional neural network to perform multi-target detection of the image areas of the film; the training samples of the network require no manual image annotation, because film images are created automatically from existing film printing and typesetting formats, so film training images can be generated in large batches. The trained image area detection model is called on the film image to be recognized to perform image area detection, yielding one or more image areas, each of which forms a recognized rectangular frame. A sub-image carrying the patient text information is cut out of the film image according to the position information of the recognized rectangular frames. Multi-line text detection is realized with binarization, dilation, erosion, and contour tracking algorithms. The character recognition module uses a CRNN + CTC framework (a convolutional recurrent neural network with connectionist temporal classification): based on the patient information content displayed by existing film images, text information images composed of fixed-format horizontal characters in a white font on a black background are created automatically, film training images and their corresponding text labels are generated in large batches, and a text recognition network is trained. The text region images obtained by the OCR text region detection module are fed into the text recognition network and the prediction results are output; a regular expression matches the preset patient information format, and the patient information text content is selected from the multiple prediction results. Specifically, the provided method for extracting patient information from a film image may include:
step 1: training OCR area detection networks
Optionally, a film blank image composed of a plurality of cells may be created according to a cell typesetting layout obtained in advance; medical image samples and patient information are filled into each cell of the film blank image to generate a film training image, and the position information of each medical image sample is recorded during filling; the patient information and the medical image sample lie in different horizontal areas within each cell; and the image area detection model is obtained by training on the film training images and the position information of each medical image sample.
Optionally, the following specific steps may be included:
1.1 generating training film image samples
In some embodiments, no manual image annotation is needed: film blank images are created automatically based on existing film cell layout formats, and training film image samples are generated in large batches.
For example, the layout format for printing a film is usually N × N cells, i.e., N cells per row and N cells per column, so one film can hold N × N medical images (e.g., radiological images) for printing. The steps for generating a training image sample are as follows: first create a film blank image with an N × N cell layout; fill each cell of the blank image in turn with medical image samples from device types such as CT and MR; and record, during filling, the position information of each medical image sample within the whole film image (e.g., center-point coordinates, width, and height), i.e., generate for each film image a PASCAL VOC annotation file containing the rectangular-frame coordinate information. Then fill the patient information content into the corner area of each cell to construct a complete film training image.
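The sample-generation step above can be sketched as follows. This is a minimal, hypothetical layout generator: the cell size, margin, and gray placeholder tile are illustrative assumptions, and a real pipeline would paste actual CT/MR samples and write the PASCAL VOC XML files.

```python
import numpy as np

def make_film_sample(n=3, cell=256, margin=16):
    """Create a blank n x n film layout and record where each medical-image
    tile is pasted. Returns the film array and a list of (cx, cy, w, h)
    boxes, mirroring the center/width/height annotation in the text."""
    film = np.zeros((n * cell, n * cell), dtype=np.uint8)  # black background
    boxes = []
    size = cell - 2 * margin  # image area leaves room for the patient text
    for row in range(n):
        for col in range(n):
            x0 = col * cell + margin
            y0 = row * cell + margin
            # paste a placeholder tile (a real pipeline pastes a CT/MR sample)
            film[y0:y0 + size, x0:x0 + size] = 128
            boxes.append((x0 + size / 2, y0 + size / 2, size, size))
    return film, boxes
```

Each (cx, cy, w, h) tuple is the rectangular-frame annotation the detection network would be trained on.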
1.2 training image area detection network to obtain image area detection model for identifying image area in film training image.
1.3 calling the trained image area detection model to the film image to be recognized to perform image area detection, as shown in fig. 3, obtaining a plurality of image areas, wherein each image area forms a rectangular frame.
Step 2: text region detection
2.1 film image cropping. Acquiring a sub-image set according to the identified position information of each image area; as shown in fig. 3, the set of sub-images includes one or more sub-images, each sub-image corresponding to an image region and patient information.
Optionally, determining a cell layout format of the film image according to the identified position information of each image area; and according to the cell typesetting format of the film image, performing equidistant cutting on the film image to obtain a sub-image set.
For example, the cell layout format of the film image (M rows by N columns) is determined from the plurality of identified image areas. If there is only one rectangular frame, the film layout is taken to be 1 row by 1 column. If there are multiple rectangular frames, the layout is determined by matching their coordinate information: compute the center point of each rectangular frame, cluster the x coordinates of the centers so that nearby x coordinates fall into the same class, and the number of classes is the number of columns N. Similarly, cluster the y coordinates of the centers, and the number of classes is the number of rows M. The film image is then cut at equal intervals into M rows by N columns, yielding the sub-image set of the film image, where each sub-image contains a text region and an image region.
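The row/column inference above can be sketched with a simple one-dimensional grouping of the box centers. The tolerance `tol` and the function names are illustrative assumptions; the patent does not fix a particular clustering method.

```python
def cluster_1d(values, tol=20):
    """Group coordinates that lie within `tol` pixels of their neighbor;
    the number of groups gives the row or column count."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def infer_layout(centers, tol=20):
    """centers: list of (x, y) box centers -> (rows M, cols N)."""
    cols = len(cluster_1d([x for x, _ in centers], tol))
    rows = len(cluster_1d([y for _, y in centers], tol))
    return rows, cols
```

Note that a missed detection does not change the result as long as each row and column still contains at least one recognized frame.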
In some embodiments, to ensure the completeness of the recognized rectangular frames corresponding to the sub-images, i.e., to ensure that the recognized frames cover the entire image area in the film image, further processing is needed to obtain the complete rectangular frame in each sub-image. If a recognized rectangular frame is smaller than the actual image area, the text region found in the subsequent text recognition step will include part of the image area, making the patient information recognition inaccurate. Therefore, as shown in fig. 4, a sub-image is selected from the sub-image set and checked further. The selection specifically comprises: selecting any sub-image from the sub-image set and performing similarity matching with the remaining sub-images to obtain matched sub-images; mapping the image areas of the matched sub-images into the selected sub-image according to their position information to obtain mapped areas; performing non-maximum suppression (NMS) on the image area of the selected sub-image together with the mapped areas to obtain the final image area of the selected sub-image; and then color-filling the final image area with the background color of the sub-image to generate a text information image containing only the patient information and the background color. Similarity matching can use an existing similarity algorithm: if the similarity reaches a preset threshold, the sub-images are considered matched. The threshold may be set to 90% similarity.
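The patent leaves the similarity algorithm open; a minimal stand-in (one minus the normalized mean absolute difference, with the 90% threshold from the text) might look like this. The function names are hypothetical.

```python
import numpy as np

def similarity(a, b):
    """Score in [0, 1] between two equally sized grayscale sub-images;
    1.0 means pixel-identical. A stand-in for any similarity algorithm."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.abs(a - b).mean() / 255.0

def matched_subimages(selected, others, thresh=0.9):
    """Return the sub-images whose similarity to the selected one reaches
    the preset threshold (90% in the text)."""
    return [s for s in others if similarity(selected, s) >= thresh]
```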
Optionally, the position information includes a center point coordinate, a width, and a height of the image area;
the mapping the image area of the matching sub-image to the selected sub-image according to the position information of the image area of the matching sub-image includes: matching the center point coordinates of the image area of the matched sub-image to the center point of the selected sub-image; and mapping the image area of the matched sub-image into the selected sub-image according to the width and the height of the image area of the matched sub-image.
For example, a film image contains many sub-images (e.g. 3 × 3 = 9 CT medical images in the illustration), and the image area detection model recognizes a number of rectangular frames; ideally it recognizes exactly 9 (no more, one rectangular frame per medical image). If only 8 rectangular frames are recognized, 1 sub-image has no recognized frame.
First, search for the target recognition rectangular frame A whose coordinates fall within the range of a given sub-image of the film image. Then use an image similarity algorithm to find the rectangular frames of the K sub-images similar to that sub-image. If the target detection rectangular frame A exists, compute the coordinates of its center point (x_center_A, y_center_A); if frame A does not exist, take the entire upper-left sub-image in the film as the rectangular area of frame A and compute its center point (x_center_A, y_center_A). Next, convert the coordinate information (x, y, w, h) of the K rectangular frames into the coordinates (x', y', w', h') in the selected sub-image, as follows: from the center point (x_center, y_center) of each rectangular frame, compute the offset of its upper-left corner (x, y) from the center, (x_span, y_span) = (x_center - x, y_center - y); map the rectangular frame's upper-left corner into the sub-image, (x', y') = (x_center_A - x_span, y_center_A - y_span); the width and height of the mapped frame are unchanged, giving the new position (x', y', w', h') after coordinate mapping.
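The corner-offset arithmetic just described amounts to re-centering each matched frame onto the anchor point of the selected sub-image; a sketch with a hypothetical function name:

```python
def map_box(box_xywh, anchor_center):
    """Map a rectangular frame (upper-left x, y, width w, height h) onto
    the anchor center (x_center_A, y_center_A), following the text:
    (x_span, y_span) = (x_center - x, y_center - y), then
    (x', y') = (x_center_A - x_span, y_center_A - y_span);
    width and height are unchanged."""
    x, y, w, h = box_xywh
    x_center, y_center = x + w / 2, y + h / 2
    x_span, y_span = x_center - x, y_center - y
    xa, ya = anchor_center
    return (xa - x_span, ya - y_span, w, h)
```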
Then perform non-maximum suppression (NMS) on the target detection rectangular frames: if the target detection rectangular frame A exists, apply NMS to frame A together with the K coordinate-mapped recognition frames; if frame A does not exist, apply NMS only to the K coordinate-mapped recognition frames. This finally yields the filtered rectangular frame of the selected sub-image.
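A minimal greedy NMS over the mapped frames might look like the following; using box area as the ranking key is an assumption, since the patent does not specify a confidence score for the suppression step.

```python
def iou(a, b):
    """Intersection over union of boxes in (x, y, w, h) corner form."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def nms(boxes, thresh=0.5):
    """Greedy non-maximum suppression: keep one box per overlapping
    cluster (largest area first, standing in for a detector score)."""
    boxes = sorted(boxes, key=lambda b: b[2] * b[3], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept
```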
2.2 Text detection in the sub-image. The cropped sub-image is obtained from step 2.1, and the coordinate information of its image area is known; as shown in fig. 5, the image area within that coordinate range is filled with black, thereby removing the complex background of the film image, so that the current sub-image contains only the patient information and the black background.
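Step 2.2 reduces to overwriting the final image area with the background color; a sketch, assuming a black background as in the text:

```python
import numpy as np

def mask_image_region(sub, box_xywh):
    """Fill the detected image area of a grayscale sub-image with black so
    that only the patient text and background remain, per step 2.2."""
    x, y, w, h = (int(v) for v in box_xywh)
    sub = sub.copy()          # keep the original sub-image intact
    sub[y:y + h, x:x + w] = 0
    return sub
```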
2.3 Horizontal text detection in the sub-image. Because the whole sub-image contains only the patient information and no complex scene, horizontal text detection can be realized with image morphology operations. In other words, the text region in the text information image is identified according to image morphology, and text detection is then performed on the text region to obtain the patient information. The steps may include: performing image binarization on the text information image; highlighting the outline of the text region through dilation, erosion, and re-dilation; extracting the outline of the text region with a contour tracking algorithm; and filtering out outlines that do not conform to the horizontal form, to obtain the text region in the text information image.
For example, binarize the image obtained in step 2.2; dilate once to make the text contour prominent; erode to remove redundant details; dilate again to further highlight the text contour; extract the text contour with a contour tracking algorithm; filter out contour regions that do not conform to the horizontal form; and extract the coordinate information of the outer bounding rectangle from each remaining contour region.
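The binarize/dilate/contour pipeline can be illustrated in plain NumPy. Two simplifying assumptions: a single wide-kernel dilation stands in for the dilate/erode/re-dilate sequence (it likewise merges the characters of one line into a blob), and a row projection replaces full contour tracking.

```python
import numpy as np

def binarize(img, thresh=128):
    """White text on black background -> 0/255 mask."""
    return np.where(img > thresh, 255, 0).astype(np.uint8)

def dilate(bw, kh=3, kw=15):
    """Morphological dilation with a wide kernel so the characters of one
    text line merge into a single horizontal blob."""
    h, w = bw.shape
    p = np.pad(bw, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(bw)
    for dy in range(kh):
        for dx in range(kw):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out

def horizontal_text_boxes(bw):
    """Find bounding boxes (x, y, w, h) of the white bands via a row
    projection, keeping only wide (w > h) regions, i.e. filtering out
    contours that do not conform to the horizontal form."""
    rows = bw.max(axis=1) > 0
    boxes, y, n = [], 0, len(rows)
    while y < n:
        if not rows[y]:
            y += 1
            continue
        y0 = y
        while y < n and rows[y]:
            y += 1
        cols = np.where(bw[y0:y].max(axis=0) > 0)[0]
        x0, x1 = int(cols[0]), int(cols[-1]) + 1
        if x1 - x0 > y - y0:  # horizontal-form filter
            boxes.append((x0, y0, x1 - x0, y - y0))
    return boxes
```

A production version would use OpenCV's `cv2.dilate`/`cv2.erode`/`cv2.findContours` instead of these hand-rolled loops.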
And step 3: and a character recognition module.
Optionally, a text information image sample consisting of the background color and horizontally arranged patient information is created according to the patient information format in the film training image; the text recognition model is then obtained by training on the text information image samples with a convolutional recurrent neural network (CRNN) and a connectionist temporal classification (CTC) framework.
For example, with the CRNN + CTC framework, text information image samples composed of fixed-format horizontal characters in a white font on a black background are created automatically from the patient information content displayed by existing film images, so no manual data annotation is needed during training. Commonly used patient information content includes the patient name, patient number, and examination date. For example, when the patient number of a CT examination is prefixed with C and that of an MR examination with M, followed by an N-digit number with N = 6, the numbers C000001, C000002, … C000009 … C999999 are generated in sequence.
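Generating the label strings for such samples is straightforward; a sketch of the fixed format described above (the function name is hypothetical, and rendering each string onto a black-background image would be a separate step):

```python
def patient_numbers(prefix="C", digits=6, count=5):
    """Generate synthetic patient-number labels of the fixed format in the
    text: a modality prefix followed by a zero-padded N-digit number."""
    return [f"{prefix}{i:0{digits}d}" for i in range(1, count + 1)]
```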
Text information image samples and the corresponding texts are generated in large batches, and the text recognition model is trained on them. The sub-images obtained in step 2 are then input into the text recognition model in turn, and the prediction results are output.
And 4, step 4: patient information content matching
Text content is recognized from the text area of the text information image by the pre-trained text recognition model of the previous step; the patient information is then matched from the recognized text content using a regular expression, according to the preset patient information format. That is, text detection and character recognition may yield the content of multiple text lines, and the patient information text is selected from those lines by matching the preset patient information format with a regular expression. For example, the text lines include the patient name, the patient number, the examination date, and so on; to match the patient number, a regular expression for a number prefixed with C and followed by 6 digits is created and matched against the text lines, and the text content that satisfies the condition is the patient number information.
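A regular expression for the "C prefix + 6 digits" patient-number format described above might look like this; including the M alternative for MR examinations follows the earlier example, and the function name is hypothetical.

```python
import re

# C for CT, M for MR, followed by exactly 6 digits, as in the example
PATIENT_NO = re.compile(r"\b[CM]\d{6}\b")

def match_patient_number(lines):
    """Scan the recognized text lines and return the first token matching
    the preset patient-number format, or None if absent."""
    for line in lines:
        m = PATIENT_NO.search(line)
        if m:
            return m.group(0)
    return None
```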
Based on the above description, some embodiments of the present invention implement OCR text region detection and text information recognition in film images using deep learning. The OCR text region detection module removes the complex background of the film image with a target detection method, so that the image contains only text information, and multi-line text regions can then be extracted with computationally inexpensive image morphological operations. The OCR text recognition module adopts a character recognition model based on the CRNN + CTC framework; because the text region image has no complex background and consists of fixed-format horizontal characters in a white font on a black background, recognition accuracy is high. Neither module requires manually labeled training samples, which saves substantial labor time and avoids manual labeling errors. The invention thus addresses the low generality of prior-art OCR text region detection and recognition for film images and improves upon it.
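The morphological extraction of text-line regions mentioned above can be sketched with a pure-NumPy dilation/erosion on a toy binary mask (illustrative only; a production implementation would typically use OpenCV's `cv2.dilate`/`cv2.erode` and contour tracking). Dilation merges nearby character blobs into one text-line blob, erosion restores the stroke width, and a final dilation highlights the line contour:

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(img, k=3):
    """Binary erosion as the dual of dilation (border treated as foreground)."""
    return 1 - dilate(1 - img, k)

# Toy binary "text" mask: two character blobs on one horizontal line.
text = np.zeros((5, 12), dtype=np.uint8)
text[1:4, 1:4] = 1
text[1:4, 6:9] = 1

merged = dilate(text, 3)    # dilation joins the nearby characters
cleaned = erode(merged, 3)  # erosion restores the stroke width
region = dilate(cleaned, 3) # re-dilation highlights the text-line contour
```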
Example Two
An embodiment of the invention provides a patient information extraction device for film images, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor;
when executed by the processor, the computer program implements the steps of the method for extracting patient information from a film image according to any one of the embodiments.
Example Three
An embodiment of the present invention provides a computer-readable storage medium on which an imaging program for medical radiology examination is stored; when executed by a processor, the imaging program implements the steps of the method for extracting patient information from a film image according to any one of the embodiments.
For the specific implementation of the second and third embodiments, reference may be made to the first embodiment; corresponding technical effects are achieved.
While the present invention has been described with reference to the embodiments shown in the drawings, these embodiments are illustrative rather than restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (9)
1. A method for extracting patient information from a film image, the method comprising:
calling a pre-trained image area detection model to perform image area detection on a film image to be identified, and identifying the position information of one or more image areas;
acquiring a sub-image set according to the identified position information of each image area; the sub-image set comprises one or more sub-images, each sub-image corresponding to an image area and patient information;
selecting a sub-image from the sub-image set, and performing color filling on an image area in the sub-image according to the background color of the sub-image to generate a text information image only containing patient information and the background color;
performing text detection on the text information image to obtain patient information;
wherein, before calling the pre-trained image area detection model to perform image area detection on the film image to be identified and identifying the position information of the one or more image areas, the method further comprises:
creating a film blank image composed of a plurality of cells according to a pre-obtained cell typesetting layout;
respectively filling medical image samples and patient information into each cell of the film blank image to generate a film training image, and recording the position information of each medical image sample during filling; wherein the medical image sample and the patient information in each cell are located in different horizontal areas;
and training according to the film training image and the position information of each medical image sample to obtain the image area detection model.
2. The method for extracting patient information from a film image according to claim 1, wherein the acquiring a sub-image set according to the identified position information of each image area comprises:
determining the cell typesetting format of the film image according to the identified position information of each image area;
and according to the cell typesetting format of the film image, performing equidistant cutting on the film image to obtain the sub-image set.
3. The method of claim 1, wherein selecting a sub-image from the set of sub-images, color filling an image area in the sub-image according to a background color of the sub-image, and generating a text information image containing only patient information and the background color comprises:
selecting any sub-image from the sub-image set to perform similarity matching with the rest sub-images to obtain matched sub-images;
mapping the image area of the matched sub-image to the selected sub-image according to the position information of the image area of the matched sub-image to obtain a mapping area;
carrying out non-maximum suppression on the image area of the selected sub-image and the mapping area to obtain a final image area of the selected sub-image;
and filling the color of the final image area in the sub-image according to the background color of the sub-image to generate a text information image only containing the patient information and the background color.
4. The method for extracting patient information from a film image according to claim 3, wherein the position information includes a center point coordinate, a width, and a height of the image area;
the mapping the image area of the matching sub-image to the selected sub-image according to the position information of the image area of the matching sub-image includes:
matching the center point coordinates of the image area of the matched sub-image to the center point of the selected sub-image;
and mapping the image area of the matched sub-image into the selected sub-image according to the width and the height of the image area of the matched sub-image.
5. The method for extracting patient information from a film image according to any one of claims 1 to 4, wherein the text detection of the text information image to obtain patient information comprises:
according to image morphology, recognizing a text area in the text information image;
and performing text detection on the text area to obtain the patient information.
6. The method of extracting patient information from a film image according to claim 5, wherein said identifying text regions in said text information image based on image morphology comprises:
carrying out image binarization processing on the text information image; highlighting the outline of the text region through dilation processing, erosion processing, and re-dilation processing;
extracting the outline of the text region by adopting an outline tracking algorithm;
and filtering out the outline which does not conform to the horizontal form from the outline of the text area to obtain the text area in the text information image.
7. The method for extracting patient information from a film image according to claim 5, wherein the performing text detection on the text region to obtain patient information includes:
recognizing text content from the text area according to a pre-trained text recognition model;
and matching the patient information from the identified text content by adopting a regular expression according to a preset patient information format.
8. The film image patient information extraction method according to claim 7, wherein before recognizing text content from the text region according to a pre-trained text recognition model, the method comprises:
creating a text information image sample consisting of background color and horizontally arranged patient information according to the patient information format in the film training image;
training on the text information image sample based on a convolutional recurrent neural network (CRNN) and connectionist temporal classification (CTC) framework to obtain the text recognition model.
9. A patient information extraction apparatus for a film image, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor;
the computer program, when executed by the processor, implements the steps of a method of patient information extraction of film images as defined in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210511460.5A CN114612915B (en) | 2022-05-12 | 2022-05-12 | Method and device for extracting patient information of film image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114612915A CN114612915A (en) | 2022-06-10 |
CN114612915B true CN114612915B (en) | 2022-08-02 |
Family
ID=81870542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210511460.5A Active CN114612915B (en) | 2022-05-12 | 2022-05-12 | Method and device for extracting patient information of film image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612915B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003058822A (en) * | 2001-08-15 | 2003-02-28 | Konica Corp | Character information extraction device and its method, program and storage medium |
CN104036292A (en) * | 2014-06-12 | 2014-09-10 | 西安华海盈泰医疗信息技术有限公司 | Medical imaging digital film text area extracting method and system |
CN104915668A (en) * | 2015-05-29 | 2015-09-16 | 深圳泓数科技有限公司 | Character information identification method for medical image and device thereof |
CN108734167A (en) * | 2018-05-08 | 2018-11-02 | 湖南开启时代电子信息技术有限公司 | A kind of contaminated film character recognition method |
CN112215845A (en) * | 2020-12-03 | 2021-01-12 | 虎丘影像(苏州)有限公司 | Medical image information identification method, device and system based on multi-neural network |
CN112257718A (en) * | 2020-09-24 | 2021-01-22 | 南阳柯丽尔科技有限公司 | Text recognition method and device for radiology department films |
CN113392844A (en) * | 2021-06-15 | 2021-09-14 | 重庆邮电大学 | Deep learning-based method for identifying text information on medical film |
CN113972000A (en) * | 2020-07-22 | 2022-01-25 | 皇家飞利浦有限公司 | Method, apparatus and medium for data processing |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6980680B2 (en) * | 2003-02-25 | 2005-12-27 | Howtek Devices Corporation | Scanning system for identifying and labeling X-rays |
US9965588B2 (en) * | 2014-03-06 | 2018-05-08 | Ricoh Co., Ltd. | Film to DICOM conversion |
CN105809161A (en) * | 2016-03-10 | 2016-07-27 | 深圳市依伴数字科技有限公司 | Optical recognition and reading method for medical film digital ID |
CN111291629A (en) * | 2020-01-17 | 2020-06-16 | 平安医疗健康管理股份有限公司 | Method and device for recognizing text in image, computer equipment and computer storage medium |
CN112036273A (en) * | 2020-08-19 | 2020-12-04 | 泰康保险集团股份有限公司 | Image identification method and device |
Non-Patent Citations (4)
Title |
---|
Anonymization of DICOM Electronic Medical Records for Radiation Therapy;Wayne Newhauser et al;《Computers in Biology and Medicine》;20141031;134-140 * |
SECURITY FILTERING OF MEDICAL IMAGES USING OCR;James Z. Wang;《http://infolab.stanford.edu/~wangz/project/meditext/RCDL/wang.pdf》;20071231;1-5 * |
Research on Key Technologies of a Medical Imaging On-Demand Printing System; Jia Bin; 《China Masters' Theses Full-text Database, Medicine and Health Sciences》; 20140515; Vol. 2014, No. 5; E080-59 *
Research on a Medical Image Data Management Method Based on Convolutional Neural Networks and Long Short-Term Memory Networks; Huang Jiangshan et al.; 《Medicine and Society》; 20200630; Vol. 33, No. 6; 84-89, 110 *
Also Published As
Publication number | Publication date |
---|---|
CN114612915A (en) | 2022-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110705534B (en) | Wrong problem book generation method suitable for electronic typoscope | |
CN109460735B (en) | Document binarization processing method, system and device based on graph semi-supervised learning | |
CN110807454B (en) | Text positioning method, device, equipment and storage medium based on image segmentation | |
CN110807775A (en) | Traditional Chinese medicine tongue image segmentation device and method based on artificial intelligence and storage medium | |
US20060008148A1 (en) | Character recognition device and method | |
CN112733858B (en) | Image character rapid identification method and device based on character region detection | |
CN113505781B (en) | Target detection method, target detection device, electronic equipment and readable storage medium | |
CN110767292A (en) | Pathological number identification method, information identification method, device and information identification system | |
CN109886978A (en) | A kind of end-to-end warning information recognition methods based on deep learning | |
CN114663897A (en) | Table extraction method and table extraction system | |
CN111368831B (en) | Positioning system and method for vertical text | |
JPH0256708B2 (en) | ||
CN114612915B (en) | Method and device for extracting patient information of film image | |
CN109635646B (en) | Head image processing method, system, equipment and storage medium | |
CN115797939A (en) | Two-stage italic character recognition method and device based on deep learning | |
CN111428446A (en) | Questionnaire generation method, questionnaire identification method and questionnaire identification system | |
CN113392833B (en) | Industrial ray film image type number identification method | |
CN112560866B (en) | OCR recognition method based on background suppression | |
CN115272055A (en) | Chromosome image analysis method based on knowledge representation | |
CN114119588A (en) | Method, device and system for training fundus macular lesion region detection model | |
CN115831354B (en) | Artificial intelligence auxiliary film reading method and system | |
JP7480997B2 (en) | Image processing device, image processing method, and program | |
CN112906693B (en) | Method for identifying subscript character and subscript character | |
JPH0991385A (en) | Character recognition dictionary adding method and terminal ocr device using same | |
CN105809161A (en) | Optical recognition and reading method for medical film digital ID |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||