CN113537097B - Information extraction method and device for image, medium and electronic equipment - Google Patents


Info

Publication number
CN113537097B
CN113537097B (application CN202110825008.1A)
Authority
CN
China
Prior art keywords
target
text box
image
information
image area
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202110825008.1A
Other languages
Chinese (zh)
Other versions
CN113537097A (en
Inventor
刘昊岳
王亚领
马文伟
刘设伟
Current Assignee (listing may be inaccurate)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd, Taikang Online Property Insurance Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN202110825008.1A priority Critical patent/CN113537097B/en
Publication of CN113537097A publication Critical patent/CN113537097A/en
Application granted granted Critical
Publication of CN113537097B publication Critical patent/CN113537097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163 Partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Abstract

Embodiments of the present disclosure provide an information extraction method for an image, an information extraction device for an image, a computer-readable medium, and an electronic device, relating to the technical field of information recognition. The method comprises the following steps: identifying text boxes in a target image, and fitting straight lines for dividing the target image into regions according to the target text boxes, among the identified text boxes, that contain keywords; dividing the target image into a plurality of image areas according to the straight lines; performing field recognition and field information recognition on the text boxes in each of the plurality of image areas to obtain the corresponding fields and field information in each image area; and generating structured information corresponding to the target image according to the corresponding fields and field information in each image area. By implementing this technical solution, both the extraction precision and the extraction efficiency of the structured information can be improved.

Description

Information extraction method and device for image, medium and electronic equipment
Technical Field
The present disclosure relates to the field of information recognition technology, and in particular, to an information extraction method for an image, an information extraction device for an image, a computer-readable medium, and an electronic device.
Background
Written records that are easily lost or damaged are commonly preserved in the form of photographs. To manage and store the characters in such photos uniformly, the characters generally have to be entered manually into a standardized list, so that they can be stored in a normalized manner as structured information. Manual entry, however, is often inefficient.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide an information extraction method for an image, an information extraction device for an image, a computer-readable medium, and an electronic device, which can automatically extract structured information by dividing an image into areas, thereby improving the extraction efficiency of the structured information.
A first aspect of an embodiment of the present disclosure provides an information extraction method for an image, the method including:
identifying text boxes in a target image, and fitting straight lines for dividing the target image into regions according to the target text boxes, among the text boxes, that contain keywords;
dividing the target image into a plurality of image areas according to the straight lines;
performing field recognition and field information recognition on the text boxes in each of the plurality of image areas to obtain the corresponding fields and field information in each image area;
and generating structured information corresponding to the target image according to the corresponding fields and field information in each image area.
According to a second aspect of embodiments of the present disclosure, there is provided an information extraction apparatus for an image, the apparatus including:
a text box recognition unit for recognizing a text box in the target image;
a straight line fitting unit for fitting a straight line used to divide the target image into regions according to the target text boxes, among the text boxes, that contain keywords;
an image area dividing unit for dividing the target image into a plurality of image areas according to the straight line;
the information identification unit is used for carrying out field identification and field information identification on the text box of each image area in the plurality of image areas to obtain corresponding fields and field information in each image area;
and the structured information generating unit is used for generating structured information corresponding to the target image according to the corresponding fields and the field information in each image area.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the information extraction method for an image as in the first aspect of the above embodiments.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information extraction method for an image as in the first aspect of the above embodiments.
According to a fifth aspect of the present application there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
The technical solutions provided in some embodiments of the present disclosure include: identifying text boxes in a target image, and fitting straight lines for dividing the target image into regions according to the target text boxes, among the text boxes, that contain keywords; dividing the target image into a plurality of image areas according to the straight lines; performing field recognition and field information recognition on the text boxes in each of the plurality of image areas to obtain the corresponding fields and field information in each image area; and generating structured information corresponding to the target image according to the corresponding fields and field information in each image area. According to the embodiments of the present disclosure, on the one hand, structured information can be extracted automatically by dividing the image into areas, which improves the extraction efficiency of the structured information. On the other hand, straight lines tailored to the target image can be fitted based on the target text boxes containing keywords, so that the structured information of each region can be accurately extracted from the regions divided by those lines, which improves the extraction precision of the structured information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 schematically illustrates a schematic diagram of an exemplary system architecture of an information extraction method for an image and an information extraction apparatus for an image to which embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a structural schematic of a computer system suitable for use in implementing electronic devices of embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of information extraction for an image according to one embodiment of the disclosure;
FIG. 4 schematically illustrates a schematic view of a target image after straight line segmentation in accordance with one embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic view of straight line distance features in a target image according to one embodiment of the disclosure;
FIG. 6 schematically illustrates a directional projection schematic in a target image according to one embodiment of the disclosure;
FIG. 7 schematically illustrates a case where a target image contains non-corresponding fields and field information that are printed close together, according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a case where "number/unit" is included in a target image according to one embodiment of the present disclosure;
FIG. 9 schematically illustrates a schematic diagram of a case in which "multi-line printing" is included in a target image according to one embodiment of the present disclosure;
FIG. 10 schematically illustrates a process flow diagram for a case where "multi-line printing" is included in a target image according to one embodiment of the disclosure;
FIG. 11 schematically illustrates a structured information schematic according to one embodiment of the present disclosure;
FIG. 12 schematically illustrates a multi-terminal interaction schematic of applying an image region planning model in accordance with one embodiment of the present disclosure;
FIG. 13 schematically illustrates a flowchart of an information extraction method for an image according to one embodiment of the present disclosure;
FIG. 14 schematically shows a structural block diagram of an information extraction apparatus for an image according to an embodiment of the present disclosure.
Detailed Description
Fig. 1 illustrates a schematic diagram of a system architecture of an exemplary application environment to which an information extraction method for an image and an information extraction apparatus for an image according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be various electronic devices with display screens including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers. Wherein the server 105 is configured to perform: identifying a text box in the target image, and fitting a straight line for dividing the target image according to the target text box containing the keywords in the text box; dividing a target image into a plurality of image areas according to a straight line; performing field identification and field information identification on text boxes in each image area in the plurality of image areas to obtain corresponding fields and field information in each image area; and generating structural information corresponding to the target image according to the corresponding fields and field information in each image area.
Fig. 2 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. The RAM 203 also stores various programs and data required for system operation. The CPU 201, ROM 202, and RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, and the like. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read therefrom is installed into the storage section 208 as needed.
In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 209, and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU) 201, performs the various functions defined in the method and apparatus of the present application.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the methods described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3, and so on.
The present exemplary embodiment provides an information extraction method for an image, which may include the following steps S310 to S340, specifically, referring to fig. 3:
step S310: and identifying the text boxes in the target image, and fitting a straight line for dividing the target image according to the target text boxes containing the keywords in the text boxes.
Step S320: the target image is divided into a plurality of image areas according to the straight line.
Step S330: and carrying out field identification and field information identification on the text boxes of each image area in the plurality of image areas to obtain corresponding fields and field information in each image area.
Step S340: and generating structured information corresponding to the target image according to the corresponding fields and field information in each image area.
By implementing the information extraction method for an image shown in fig. 3, structured information can be extracted automatically by dividing the image into areas, improving the extraction efficiency of the structured information. In addition, straight lines tailored to the target image can be fitted based on the target text boxes containing keywords, so that the structured information of each region can be accurately extracted from the regions divided by those lines, improving the extraction precision of the structured information.
Next, the above steps of the present exemplary embodiment will be described in more detail.
In step S310, text boxes in the target image are identified, and a straight line for dividing the target image into regions is fitted according to the target text boxes including keywords in the text boxes.
Specifically, the target image may be a medical bill image such as those shown in fig. 4 to 9. The text boxes in the target image contain text information, which may be fields (e.g., "electronic bill code", "electronic bill number", "payer", "billing date", "item name", "number/unit", "amount (yuan)", "remark", "subtotal" or "total") or field information (e.g., "ABC tablet", "7.00", "dose" or "5.60"). The number of straight lines used to divide the target image into regions may depend on the number of text boxes containing fields; in the present application, for example, the medical bill image may be divided into a detail region, a head region, and a tail region, which constitute the plurality of image areas. Specifically, fields and field information in the detail region are in a one-to-one or one-to-many relationship, while fields and field information in the head region and in the tail region are each in a one-to-one relationship. Alternatively, the target image may be another kind of image; the number of regions into which the target image is divided is not limited in this application.
Identifying the text boxes in the target image includes: determining each text region in the image by a deep-learning-based text detection algorithm, and taking the closed shape bounding each text region as a text box; the text detection algorithm may be implemented with existing network structures such as CTPN, EAST, PA-net, or DB-net.
As an alternative embodiment, fitting a straight line for dividing a region of a target image according to a target text box including a keyword in the text box includes: determining at least one type of target text boxes hitting keywords in a preset word stock from the text boxes; the number of the target text boxes in each type of target text boxes is at least one; determining the position information of each target text box in at least one type of target text boxes; determining a straight line corresponding to each type of target text box according to the position information; the straight lines corresponding to each type of target text boxes are used for dividing the target image into areas.
Specifically, the preset word stock is configured to store a plurality of predefined keywords. If a text box in the target image hits one of the keywords, that text box is determined to be a target text box; each target text box contains a keyword, and the keyword may be a field and/or field information as described above. Preferably, the keywords are fields. A sketch of this matching step is given below.
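As an illustration only (not the patented implementation), the following Python sketch groups recognized text boxes into classes of target text boxes by the lexicon keyword they hit; the lexicon contents, box format, and function names are assumptions added for illustration:

```python
from collections import defaultdict

# Hypothetical preset word stock: each keyword maps to the class of
# dividing line it contributes to (header fields vs. the subtotal line).
PRESET_LEXICON = {
    "item name": "header", "number/unit": "header",
    "amount (yuan)": "header", "remark": "header",
    "subtotal": "tail",
}

def group_target_boxes(text_boxes):
    """text_boxes: list of dicts like {"text": str, "corners": [(x, y)] * 4}.
    Returns a mapping from line class to the target text boxes whose
    recognized text hits a keyword of that class."""
    groups = defaultdict(list)
    for box in text_boxes:
        for keyword, line_class in PRESET_LEXICON.items():
            if keyword in box["text"].lower():
                groups[line_class].append(box)
                break  # one keyword hit is enough to classify the box
    return groups
```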
The position information of a target text box may include its center coordinates and/or corner coordinates relative to the target image, where a corner coordinate may be the upper-left, lower-left, upper-right, or lower-right corner coordinate of the target text box.
Referring to fig. 4, fig. 4 schematically illustrates a target image after straight-line division according to an embodiment of the present disclosure. As shown in fig. 4, the target image includes two types of target text boxes. One type includes the target text boxes "item name", "number/unit", "amount (yuan)", and "remark"; the other type includes the target text boxes "subtotal" and "26.77". The straight line 410 may be determined from the center points of the first type of target text boxes and connects the center points of all of them; the straight line 420 may be determined from the center points of the second type and connects the center points of all of them. Line 410 separates the head region from the detail region, and line 420 separates the tail region from the detail region, so that the target image may be divided into a head region, a detail region, and a tail region based on the straight lines 410 and 420.
It should be noted that the text boxes shown in fig. 4 to 9 are not displayed as visible "boxes"; rather, each piece of text information (e.g., a field or field information) corresponds to a virtual closed area containing it, i.e., a text box. In addition, in fig. 4 to 9, the text rendered in regular script (kaiti) is the original information in the image, while the text rendered in Song typeface (songti) is the recognition result of that original text; the recognition result may be displayed near the corresponding original text (e.g., below it).
Therefore, by implementing the alternative embodiment, the target text box for dividing the image area can be determined through the keywords, and the straight line for dividing the target image can be determined based on the position of the target text box, so that the subsequent structured information extraction is facilitated, corresponding information extraction can be performed in a targeted manner for each divided image area, and further the extraction precision and the extraction efficiency of the structured information are improved.
As an optional embodiment, determining, according to the location information, a straight line corresponding to each type of target text box includes:
determining the center point of the target text box in the class according to the position information, and determining the straight line corresponding to each class of target text box according to the center point of the target text box in the class; the straight line corresponding to each type of target text box is used for connecting the center point of the target text box in the type;
or, alternatively,
determining the boundary slope of the target text boxes in the class according to the position information, and determining the straight line corresponding to each class of target text boxes according to the boundary slope of the target text boxes in the class; the straight line corresponding to each type of target text box is used for penetrating through the target text boxes in the type, and the boundary slope comprises at least one of an upper boundary slope and a lower boundary slope.
Specifically, determining a center point of the target text box in the class according to the position information comprises: determining an upper left corner coordinate, a lower left corner coordinate, an upper right corner coordinate or a lower right corner coordinate used for representing the position of the target text box in the position information; and the center point coordinates of the target text box are calculated according to the upper left corner coordinates, the lower left corner coordinates, the upper right corner coordinates or the lower right corner coordinates, and the center point of the target text box is represented by the center point coordinates. In addition, determining a boundary slope of the target text box in the class according to the position information comprises: determining the left upper corner coordinate and the right upper corner coordinate used for representing the position of the target text box in the position information, and calculating the upper boundary slope of the target text box according to the left upper corner coordinate and the right upper corner coordinate; or determining the left lower corner coordinate and the right lower corner coordinate used for representing the position of the target text box in the position information, and calculating the lower boundary slope of the target text box according to the left lower corner coordinate and the right lower corner coordinate.
Therefore, this alternative embodiment provides two ways of determining the straight lines, and lines determined in these ways divide the image into regions more accurately, improving the extraction precision of the structured information.
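The two constructions just described can be sketched in a few lines. This is a minimal sketch under stated assumptions: the corner order, the least-squares fit for the "connect the centers" option, and all function names are illustration choices, not the patent's prescribed implementation:

```python
def center_point(top_left, top_right, bottom_left, bottom_right):
    # Center of the quadrilateral, taken here as the mean of the four corners.
    xs = [top_left[0], top_right[0], bottom_left[0], bottom_right[0]]
    ys = [top_left[1], top_right[1], bottom_left[1], bottom_right[1]]
    return (sum(xs) / 4.0, sum(ys) / 4.0)

def upper_boundary_slope(top_left, top_right):
    # Slope of the top edge; a vertical edge is guarded crudely.
    dx = top_right[0] - top_left[0]
    return float("inf") if dx == 0 else (top_right[1] - top_left[1]) / dx

def line_through_centers(boxes):
    """Least-squares line y = a*x + b through the center points of one class
    of target text boxes; boxes carry "corners" ordered (tl, tr, bl, br)."""
    pts = [center_point(*b["corners"]) for b in boxes]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    denom = sum((p[0] - mx) ** 2 for p in pts) or 1e-9  # degenerate guard
    a = sum((p[0] - mx) * (p[1] - my) for p in pts) / denom
    return a, my - a * mx
```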
In step S320, the target image is divided into a plurality of image areas according to straight lines.
Specifically, the number of image areas may be at least two, which is not limited in the embodiment of the present application.
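For illustration only, once the head and tail lines are fitted, assigning a text box to a region can be as simple as comparing its center against the two lines; the sketch below assumes image coordinates in which y grows downward, so the head line has the smaller y:

```python
def assign_region(center_y, head_line_y, tail_line_y):
    """Place a text box into the head, detail, or tail region by comparing
    its center's y-coordinate with the two fitted dividing lines."""
    if center_y < head_line_y:
        return "head"
    if center_y > tail_line_y:
        return "tail"
    return "detail"

# Example: a box centered at y=300 between lines at y=220 and y=580.
print(assign_region(300, 220, 580))  # -> detail
```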
In step S330, field recognition and field information recognition are performed on the text box in each of the plurality of image areas, so as to obtain corresponding fields and field information in each image area.
Specifically, performing field recognition and field information recognition on the text boxes of each of the plurality of image areas includes: performing field recognition and field information recognition on the text boxes of each image area based on a KNN classifier. The KNN classifier relies on the K-nearest-neighbour (KNN) algorithm, in which each sample is represented by its K nearest neighbours, K being a positive integer.
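A minimal sketch of this classification step follows. scikit-learn's KNeighborsClassifier is used as a stand-in (the patent does not name a library), and the feature values and labels are invented placeholders:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder features, one row per text box, e.g. [straight-line distance
# feature, horizontal distance feature]; values and labels are illustrative.
train_features = np.array([[0.12, 0.05], [0.48, 0.90], [0.50, 0.10]])
train_labels = np.array(["field", "field_info", "field_info"])

knn = KNeighborsClassifier(n_neighbors=1)  # K is a tunable positive integer
knn.fit(train_features, train_labels)
print(knn.predict([[0.49, 0.85]]))  # nearest neighbour -> ['field_info']
```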
As an alternative embodiment, performing field recognition and field information recognition on the text boxes of each of the plurality of image areas includes: calculating, by a pre-trained classifier, the straight-line distance feature of each text box in a target image area relative to the border of the target image, where the target image area is any one of the plurality of image areas; calculating, by the classifier, the horizontal distance feature of each text box in the target image area relative to each field in that area; performing specific-object recognition on each text box in the target image area by the classifier to obtain an object recognition result, where a specific object includes at least one of a symbol (such as "/"), a number, and a word; performing directional projection on each text box in the target image area by the classifier to obtain the projection area of each text box in a specific direction, and merging intersecting projection areas to obtain a plurality of fusion areas, where the fusion areas correspond one-to-one to the fields in the detail area; and determining the straight-line distance features, the horizontal distance features, the object recognition result, and the fusion areas as the feature recognition result of the target image area.
Specifically, the pre-trained classifier may be the KNN classifier described above. Calculating the straight-line distance feature of a text box in the target image area relative to the border of the target image by the pre-trained classifier includes: measuring the length of the bottom (or top) edge of the border in the target image, denoted distance; determining the length, denoted center_left, of a reference line segment parallel to the bottom edge of the border, where the reference line segment connects a specific point on the border of the target image with the center point of the text box to be processed, and the text box to be processed is any text box in the target image area; and calculating the straight-line distance feature as distance_left = center_left / distance.
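The expression distance_left = center_left / distance can be written directly as a helper; the axis-aligned geometry and the argument names below are simplifying assumptions for illustration:

```python
def straight_line_distance_feature(box_center, border_point, border_bottom_length):
    """Compute distance_left = center_left / distance, where center_left is
    the length of the reference segment, parallel to the border's bottom
    edge, from a fixed border point to the box center, and distance is the
    border's bottom-edge length."""
    center_left = abs(box_center[0] - border_point[0])
    return center_left / border_bottom_length
```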
Calculating, by the classifier, the horizontal distance feature of each text box in the target image area relative to each field in that area includes: determining the text boxes containing fields in the target image area, and calculating, by the KNN classifier, the horizontal distance feature of each text box in the target image area to each field on the same projection plane. For example, if the target image area includes the fields "item name", "number/unit", "amount (yuan)", and "remark" and the field information "ABC tablet", "7.00/dose", "5.00", "S pill", "42.00/dose", and "21.17", then the horizontal distance features of each piece of field information relative to each of these fields may be calculated.
In addition, performing directional projection on each text box in the target image area by the classifier to obtain the projection area of each text box in a specific direction includes: determining the text boxes containing field information to be projected in the target image area, and longitudinally projecting those text boxes by the KNN classifier to obtain the projection area corresponding to each of them.
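A sketch of the longitudinal projection and merging step follows; the interval-based box format is an assumption, and the merge is the standard overlapping-intervals fold:

```python
def merge_projection_areas(boxes):
    """Longitudinally project each text box onto the horizontal axis as the
    interval (x_min, x_max), then merge intersecting intervals; each merged
    interval is one fusion area (one detail-area column). Boxes are assumed
    to be dicts with "x_min"/"x_max" keys."""
    intervals = sorted((b["x_min"], b["x_max"]) for b in boxes)
    fused = []
    for lo, hi in intervals:
        if fused and lo <= fused[-1][1]:  # overlaps the previous interval
            fused[-1] = (fused[-1][0], max(fused[-1][1], hi))
        else:
            fused.append((lo, hi))
    return fused
```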
Therefore, by implementing the optional embodiment, the situation that the display distance between the non-corresponding field and the field information in the image is relatively large and the text box containing a plurality of fields can be considered, so that the correct feature recognition can be performed, and the extraction precision of the structured information can be improved.
Referring to fig. 5, fig. 5 schematically illustrates the straight-line distance feature in a target image according to one embodiment of the present disclosure. As shown in fig. 5, a straight line 510 may be determined based on the target text boxes "item name", "number/unit", "amount (yuan)", and "remark", and a straight line 520 may be determined based on the target text boxes "subtotal" and "26.77". Straight lines 510 and 520 divide the target image into a head region, a detail region, and a tail region. On this basis, the straight-line distance feature of each text box in the detail region relative to the border 511 of the target image may be calculated by the KNN classifier. The detail region includes text boxes containing "item name", "number/unit", "amount (yuan)", "remark", "ABC tablet", "7.00/dose", "5.00", "S pill", "42.00/dose", and "21.17". Taking the text box containing "ABC tablet" as an example, the KNN classifier may measure the length of the bottom edge 513 of the border in the target image; determine, from the center point of that text box, the length of a reference line segment 512 parallel to the bottom edge, where segment 512 connects a specific point on the border with the center point of the text box to be processed; and take the ratio of the length of segment 512 to the length of the bottom edge 513 as the straight-line distance feature of the text box containing "ABC tablet".
Referring to fig. 6, fig. 6 schematically illustrates directional projection in a target image according to one embodiment of the present disclosure. As shown in fig. 6, a straight line 610 may be determined based on the target text boxes "item name", "number/unit", "amount (yuan)", and "remark", and a straight line 620 may be determined based on the target text boxes "subtotal" and "26.77"; lines 610 and 620 divide the target image into a head region, a detail region, and a tail region. On this basis, the text boxes containing "ABC tablet", "7.00/dose", "5.00", "S pill", "42.00/dose", and "21.17" may be projected longitudinally to obtain their respective projection areas, and the intersecting projection areas may be merged to obtain the fusion areas 611, 612, 613, and 614, i.e., the plurality of fusion areas described above.
Referring to fig. 7, fig. 7 schematically illustrates a case where the target image contains non-corresponding fields and field information printed close together, according to an embodiment of the disclosure. As shown in fig. 7, a straight line 710 may be determined based on the target text boxes "item name", "number/unit", "amount (yuan)", and "remark", and a straight line 720 based on the target text boxes "subtotal" and "26.77"; lines 710 and 720 divide the target image into a head region, a detail region, and a tail region. On this basis, the KNN classifier may calculate the straight-line distance feature of each text box in the target image area relative to the border of the target image and the horizontal distance feature of each text box relative to each field in that area, perform specific-object recognition on each text box to obtain an object recognition result, and directionally project each text box to obtain its projection area in a specific direction, merging intersecting projection areas into a plurality of fusion areas. The text box in area 711 can thereby be determined to be field information corresponding to the field "amount (yuan)", avoiding it being assigned to the field "remark" merely because it is printed closer to "remark".
Referring to fig. 8, fig. 8 schematically illustrates a case where "number/unit" is included in the target image according to an embodiment of the present disclosure. As shown in fig. 8, a straight line 810 may be determined based on the target text boxes "item name", "number/unit", "amount (yuan)", and "remark", and a straight line 820 based on the target text boxes "subtotal" and "26.77"; lines 810 and 820 divide the target image into a head region, a detail region, and a tail region. Using the same features as above (straight-line distance, horizontal distance, specific-object recognition, and directional projection with merged fusion areas), the field information corresponding to the "number" part and to the "unit" part of "number/unit" can be identified; that is, the text box in area 811 is recognized correctly and assigned the correct correspondence. This avoids determining the text boxes containing "7.00/dose" and "42.00/dose" as a whole as field information of the field "number" or of the field "unit"; instead, the field information "7.00" and "42.00" corresponding to the field "number" and the field information "dose" corresponding to the field "unit" can be correctly identified, improving the extraction precision of the structured information.
As an optional embodiment, after determining the straight-line distance features, the horizontal distance features, the object recognition result, and the plurality of fusion areas as the feature recognition result of the target image area, the method further includes: training the classifier according to the target image and the feature recognition result of each image area until the loss function of the classifier converges.
Specifically, this training includes: inputting the target image into the classifier so that the classifier computes a feature recognition result for each image area, and adjusting the parameters of the classifier according to the loss between the annotated feature recognition result of each image area in the sample and the feature recognition result computed by the classifier, until the loss function of the classifier converges.
It can be seen that by implementing the alternative embodiment, the classifier can be continuously trained based on the feature recognition results of the target image and each image area, so that the classification precision of the classifier is improved.
As an optional embodiment, performing field recognition and field information recognition on the text boxes in each of the plurality of image areas to obtain the corresponding fields and field information in each image area includes: determining the corresponding reference fields and reference field information in each image area according to the feature recognition result of each image area and the text boxes it contains; comparing the text lengths of vertically adjacent pieces of reference field information in each image area to obtain a comparison result; if the comparison result indicates that the text length of the upper piece is greater than that of the lower piece, calculating, according to the field corresponding to the two pieces, the confidence of their fusion result as well as the confidences of the two pieces individually; if the confidence of the fusion result is greater than both individual confidences, fusing the corresponding text boxes into one complete text box; and updating the reference fields and reference field information in each image area according to the complete text box to obtain the corresponding fields and field information in each image area.
Specifically, if the comparison result indicates that the text length of the upper piece is less than or equal to that of the lower piece, the two pieces may be judged to be mutually independent reference field information that need not be fused. Likewise, if the confidence of the fusion result is not greater than both individual confidences, the two pieces may be judged to be mutually independent and need not be fused.
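The fusion rule above can be sketched as follows; the confidence function is assumed to be supplied by the classifier, and is replaced here with a toy vocabulary lookup purely for demonstration:

```python
def maybe_fuse(upper, lower, score):
    """Fuse vertically adjacent reference field information when the upper
    line is longer and the fused text scores higher than either part.
    `score(text)` is an assumed confidence function for the field."""
    if len(upper) <= len(lower):
        return None  # independent pieces of field information, no fusion
    fused = upper + lower
    if score(fused) > max(score(upper), score(lower)):
        return fused  # fuse into one complete text box
    return None

# Toy confidence: membership in an assumed item-name vocabulary.
VOCAB = {"ABCDEFGHHH tablet", "S pill"}
def demo_score(text):
    return 1.0 if text in VOCAB else 0.1

print(maybe_fuse("ABCDEFGHH", "H tablet", demo_score))  # -> ABCDEFGHHH tablet
```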
Referring to fig. 9, fig. 9 schematically illustrates a case where "multi-line printing" occurs in the target image according to an embodiment of the present disclosure. As shown in fig. 9, a straight line 910 may be determined based on the target text boxes "item name", "number/unit", "amount (yuan)", and "remark", and a straight line 920 based on the target text boxes "subtotal" and "26.77"; lines 910 and 920 divide the target image into a head region, a detail region, and a tail region. In the detail region shown in fig. 9 there is a "multi-line printing" case, shown in area 911: the text box containing the field information "ABCDEFGHHH tablet" is printed as two lines of text and is easily recognized as two text boxes, one containing "ABCDEFGHH" and one containing "H tablet". By determining the reference fields and reference field information from the feature recognition result of each image area, comparing the text lengths of vertically adjacent pieces of reference field information, calculating the fusion confidence when the upper piece is longer than the lower piece, and fusing when the fusion confidence exceeds both individual confidences, the two text boxes recognized as containing "ABCDEFGHH" and "H tablet" can be fused into one complete text box containing "ABCDEFGHHH tablet", improving the extraction precision of the structured information.
Referring to fig. 10, fig. 10 schematically illustrates a process flow diagram for a case where "multi-line printing" is included in a target image according to an embodiment of the present disclosure. As shown in fig. 10, the flow diagram may include: step S1010 to step S1050.
Step S1010: and determining corresponding reference fields and reference field information in each image area according to the characteristic identification result of each image area and the text box in each image area.
Step S1020: and comparing the text lengths of the upper and lower adjacent reference field information in each image area to obtain a comparison result. If the text length of the upper reference field information in the vertically adjacent reference field information is greater than the text length of the lower reference field information, step S1030 is performed. If the text length of the upper reference field information in the vertically adjacent reference field information is smaller than or equal to the text length of the lower reference field information, ending the flow.
Step S1030: and calculating the confidence coefficient of the fusion result of the upper and lower adjacent reference field information and the confidence coefficient respectively corresponding to the upper and lower adjacent reference field information according to the fields corresponding to the upper and lower adjacent reference field information.
Step S1040: and detecting whether the confidence coefficient of the fusion result is larger than the confidence coefficient corresponding to the upper and lower adjacent reference field information respectively. If so, step S1050 is performed. If not, the process is ended.
Step S1050: and fusing the text boxes respectively corresponding to the upper and lower adjacent reference field information into a complete text box.
Therefore, by implementing this alternative embodiment, text boxes can be fused so that multiple lines of text that together form one complete piece of content are not recognized as separate pieces of text information; the complete content is thus not split, and the extraction precision of the structured information is improved.
In step S340, structured information corresponding to the target image is generated from the corresponding fields and field information in each image area.
Specifically, generating structured information corresponding to the target image according to the corresponding field and field information in each image area includes: and arranging the corresponding fields and field information in each image area according to a preset typesetting rule, and outputting an arrangement result as structured information corresponding to the target image.
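As a sketch of this arrangement step, the following assembles output in the shape of FIG. 11 from per-region dictionaries; all key names are illustrative assumptions rather than the patent's prescribed schema:

```python
def build_structured_info(head, details, tail):
    """Arrange per-region recognition results under a preset layout rule.
    `head`/`tail` map field names to field information; `details` is a list
    of per-row dicts (one-to-many fields in the detail region)."""
    return {
        "electronic_bill_code": head.get("electronic bill code"),
        "electronic_bill_number": head.get("electronic bill number"),
        "payer": head.get("payer"),
        "billing_date": head.get("billing date"),
        "item_details": [
            {"name": row.get("item name"),
             "number_unit": row.get("number/unit"),
             "amount": row.get("amount (yuan)")}
            for row in details
        ],
        "subtotal": tail.get("subtotal"),
        "total": tail.get("total"),
    }
```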
Referring to fig. 11, fig. 11 schematically illustrates structured information according to one embodiment of the present disclosure. As shown in fig. 11, the structured information corresponding to the target image, generated from the corresponding fields and field information in each image area, may include: electronic bill code: 123456; electronic bill number: 789123; payer: xxx; billing date: January 1, 2020; item details: ABC tablet-7.00/dose-5.60, S pill-42.00/dose-21.17; subtotal: 26.77; total: 26.77; collection unit: XXXXXX; page number: page 1.
Referring to fig. 12, fig. 12 schematically illustrates multi-terminal interaction when applying the image region planning model according to one embodiment of the present disclosure. As shown in fig. 12, the interaction involves a client 1210, a claim core system 1220, and a claim processing end 1230. The claim core system 1220 is configured to receive the medical bill image uploaded by the client 1210 and to call the classifier 1221 to extract the structured information of the image, so that the extracted structured information can be fed back to the claim processing end 1230. The client 1210 and the claim processing end 1230 may be user terminals, and the claim core system 1220 may run on a server; the claim core system 1220 includes a visualization platform for receiving the uploaded medical bill image and displaying the extracted structured information.
Referring to fig. 13, fig. 13 schematically illustrates a flowchart of an information extraction method for an image according to one embodiment of the present disclosure. As shown in fig. 13, the information extraction method for an image includes: step S1310 to step S1370.
Step S1310: identifying text boxes in the target image, determining at least one type of target text boxes hitting keywords in a preset word stock from the text boxes, and determining the position information of each target text box in the at least one type of target text boxes; wherein the number of target text boxes in each type of target text boxes is at least one. Further, step S1320 or step S1330 is performed.
Step S1320: determining the center point of the target text box in the class according to the position information, and determining the straight line corresponding to each class of target text box according to the center point of the target text box in the class; the straight line corresponding to each type of target text box is used for connecting the center points of the target text boxes in the type. Further, step S1340 is performed.
Step S1330: determining the boundary slope of the target text boxes in the class according to the position information, and determining the straight line corresponding to each class of target text boxes according to the boundary slope of the target text boxes in the class; the straight line corresponding to each type of target text box is used for penetrating through the target text boxes in the type, and the boundary slope comprises at least one of an upper boundary slope and a lower boundary slope. Further, step S1340 is performed.
Step S1340: the target image is divided into a plurality of image areas including a detail area, a head area, and a tail area according to the straight line.
Step S1350: calculating, by a pre-trained classifier, the straight-line distance feature of each text box in the target image area relative to the border of the target image, where the target image area is any one of the plurality of image areas; calculating, by the classifier, the horizontal distance feature of each text box in the target image area relative to each field in that area; performing specific-object recognition on each text box by the classifier to obtain an object recognition result, where a specific object includes at least one of a symbol, a number, and a word; performing directional projection on each text box by the classifier to obtain its projection area in a specific direction, and merging intersecting projection areas to obtain a plurality of fusion areas corresponding one-to-one to the fields in the detail area; and determining the straight-line distance features, the horizontal distance features, the object recognition result, and the fusion areas as the feature recognition result of the target image area.
Step S1360: determining the corresponding reference fields and reference field information in each image area according to the feature recognition result of each image area and the text boxes it contains; comparing the text lengths of vertically adjacent pieces of reference field information in each image area to obtain a comparison result; if the upper piece is longer than the lower piece, calculating, according to the corresponding field, the confidence of their fusion result as well as the confidences of the two pieces individually; if the confidence of the fusion result is greater than both individual confidences, fusing the corresponding text boxes into one complete text box; and updating the reference fields and reference field information in each image area according to the complete text box to obtain the corresponding fields and field information in each image area.
Step S1370: and generating structural information corresponding to the target image according to the corresponding fields and field information in each image area.
It should be noted that steps S1310 to S1370 correspond to the steps and embodiments shown in fig. 3; for the specific implementation of steps S1310 to S1370, please refer to the steps and embodiments shown in fig. 3, and details are not repeated here.
Therefore, by implementing the information extraction method for an image shown in fig. 13, the structured information can be extracted automatically by dividing the image into areas, so that the extraction efficiency of the structured information is improved. In addition, a straight line individually matched to the target image can be fitted based on the target text boxes containing the keywords, so that the structured information of each area can be accurately extracted based on the areas divided by the straight line, thereby improving the extraction precision of the structured information.
Further, the present exemplary embodiment also provides an information extraction apparatus for an image. Referring to fig. 14, the information extraction apparatus 1400 for an image may include:
a text box recognition unit 1401 for recognizing a text box in the target image;
a straight line fitting unit 1402 for fitting a straight line for performing region division on the target image according to a target text box including a keyword in the text box;
an image region dividing unit 1403 for dividing the target image into a plurality of image regions according to a straight line;
an information identifying unit 1404, configured to perform field identification and field information identification on the text boxes in each of the plurality of image areas, so as to obtain the corresponding fields and field information in each image area;
a structured information generating unit 1405, configured to generate the structured information corresponding to the target image according to the corresponding fields and field information in each image area.
The plurality of image areas include a detail area, a header area, and a trailer area, wherein the fields and field information in the detail area are in a one-to-one or one-to-many relationship, the fields and field information in the header area are in a one-to-one relationship, and the fields and field information in the trailer area are in a one-to-one relationship.
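For example, for an invoice-like image, the generated structured information could take the following purely hypothetical shape, reflecting the relationships just described.

```python
# Hypothetical output for an invoice-like image; all values are invented.
structured_info = {
    "header":  {"Invoice No.": "20210721", "Date": "2021-07-21"},   # one-to-one
    "detail":  {"Item": ["Exam fee", "Medicine", "Nursing"],        # one-to-many
                "Amount": ["120.00", "86.50", "40.00"]},
    "trailer": {"Total": "246.50"},                                 # one-to-one
}
```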
Therefore, by implementing the information extraction apparatus for an image shown in fig. 14, the structured information can be extracted automatically by dividing the image into areas, so that the extraction efficiency of the structured information is improved. In addition, a straight line individually matched to the target image can be fitted based on the target text boxes containing the keywords, so that the structured information of each area can be accurately extracted based on the areas divided by the straight line, thereby improving the extraction precision of the structured information.
In an exemplary embodiment of the present disclosure, the straight line fitting unit 1402 fits the straight line for region division of the target image according to the target text boxes containing keywords among the text boxes by:
determining at least one type of target text boxes hitting keywords in a preset word stock from the text boxes; the number of the target text boxes in each type of target text boxes is at least one;
determining the position information of each target text box in at least one type of target text boxes;
determining a straight line corresponding to each type of target text box according to the position information; the straight lines corresponding to each type of target text boxes are used for dividing the target image into areas.
Therefore, by implementing this alternative embodiment, the target text boxes for dividing the image into areas can be determined through the keywords, and the straight line for dividing the target image can be determined based on the positions of the target text boxes. This facilitates the subsequent structured information extraction, since information extraction can then be performed in a targeted manner for each divided image area, thereby improving the extraction precision and extraction efficiency of the structured information.
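As an illustration of hitting keywords in a preset word stock, the following sketch groups OCR text boxes by the keyword class they hit. The word stock contents, class names, and function name are invented for illustration.

```python
# Invented word stock: each class gathers the keywords whose boxes anchor
# one dividing line.
WORD_STOCK = {
    "header_divider": {"Item", "Amount", "Quantity"},
    "trailer_divider": {"Total", "Amount in words"},
}

def classify_target_boxes(ocr_boxes):
    """ocr_boxes: list of (box, text) pairs from OCR.

    Returns {class_name: [box, ...]} for every class with at least one hit.
    """
    classes = {name: [] for name in WORD_STOCK}
    for box, text in ocr_boxes:
        for name, keywords in WORD_STOCK.items():
            if any(kw in text for kw in keywords):
                classes[name].append(box)
    return {name: boxes for name, boxes in classes.items() if boxes}
```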
In an exemplary embodiment of the present disclosure, the straight line fitting unit 1402 determines a straight line corresponding to each type of target text box according to the position information, including:
Determining the center point of the target text box in the class according to the position information, and determining the straight line corresponding to each class of target text box according to the center point of the target text box in the class; the straight line corresponding to each type of target text box is used for connecting the center point of the target text box in the type;
or, alternatively,
determining the boundary slope of the target text boxes in the class according to the position information, and determining the straight line corresponding to each class of target text boxes according to the boundary slope of the target text boxes in the class; the straight line corresponding to each type of target text box is used for penetrating through the target text boxes in the type, and the boundary slope comprises at least one of an upper boundary slope and a lower boundary slope.
Therefore, this alternative embodiment discloses two ways of determining the straight line, and a straight line determined in either way can divide the image into regions more accurately, thereby improving the extraction precision of the structured information.
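The boundary-slope way could be sketched as follows, assuming quadrilateral boxes given as four corner points with non-vertical upper boundaries; this is one plausible reading of the embodiment, not the patent's definitive algorithm.

```python
def line_from_boundary_slopes(quads):
    """Derive a dividing line y = k*x + b from the upper boundary slopes.

    quads: list of boxes, each as four (x, y) points ordered top-left,
    top-right, bottom-right, bottom-left. Upper boundaries are assumed
    non-vertical, so the slope is well defined.
    """
    slopes = []
    for tl, tr, br, bl in quads:
        slopes.append((tr[1] - tl[1]) / (tr[0] - tl[0]))  # upper boundary slope
    k = sum(slopes) / len(slopes)
    # Anchor the line at the mean center point so it passes through the boxes.
    cx = sum((tl[0] + br[0]) / 2.0 for tl, _, br, _ in quads) / len(quads)
    cy = sum((tl[1] + br[1]) / 2.0 for tl, _, br, _ in quads) / len(quads)
    b = cy - k * cx
    return k, b
```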
In an exemplary embodiment of the present disclosure, the information identifying unit 1404 performs field identification and field information identification on a text box of each of a plurality of image areas, including:
calculating the straight-line distance feature of the text boxes in a target image area relative to the frame in the target image through a pre-trained classifier; wherein the target image area is any one of the plurality of image areas;
calculating the horizontal distance feature of each text box in the target image area relative to each field in the target image area through the classifier;
performing specific object recognition on each text box in the target image area through the classifier to obtain an object recognition result; wherein the specific object comprises at least one of a symbol, a number, and a word;
performing directional projection on each text box in the target image area through the classifier to obtain the projection area corresponding to each text box in a specific direction, and merging the projection areas that intersect to obtain a plurality of fusion areas; wherein the plurality of fusion areas are in one-to-one correspondence with the fields in the detail area;
and determining the straight-line distance feature, the horizontal distance feature, the object recognition result, and the plurality of fusion areas as the feature recognition result of the target image area.
Therefore, by implementing this alternative embodiment, both the case where a field and its non-corresponding field information are displayed far apart in the image and the case where a single text box contains a plurality of fields are taken into account, so that correct feature recognition can be performed and the extraction precision of the structured information can be improved.
In an exemplary embodiment of the present disclosure, the above apparatus further includes:
a classifier training unit (not shown), configured to train the classifier according to the target image and the feature recognition result of each image area, after the information identifying unit 1404 determines the straight-line distance feature, the horizontal distance feature, the object recognition result, and the plurality of fusion areas as the feature recognition result of the target image area, until the loss function of the classifier converges.
It can be seen that, by implementing this alternative embodiment, the classifier can be continuously trained based on the target image and the feature recognition results of the image areas, so that the classification precision of the classifier is improved.
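A generic convergence-based training loop of this kind might look as follows; model.step and model.loss are placeholder hooks standing in for whatever classifier is used, not a real library API, and the convergence tolerance is an assumption.

```python
def train_until_convergence(model, X, y, tol=1e-4, max_epochs=1000):
    """Keep updating the classifier until the loss change falls below `tol`.

    X: feature vectors flattened from the feature recognition results;
    y: the corresponding field labels.
    """
    prev_loss = float("inf")
    for _ in range(max_epochs):
        model.step(X, y)                   # one parameter-update pass
        loss = model.loss(X, y)
        if abs(prev_loss - loss) < tol:    # loss function has converged
            break
        prev_loss = loss
    return model
```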
In an exemplary embodiment of the present disclosure, the information identifying unit 1404 performs field identification and field information identification on a text box of each of a plurality of image areas, to obtain a corresponding field and field information in each image area, including:
determining corresponding reference fields and reference field information in each image area according to the characteristic identification result of each image area and the text boxes in each image area;
comparing the text lengths of the upper and lower adjacent reference field information in each image area to obtain a comparison result;
if the comparison result shows that the text length of the upper reference field information in the upper and lower adjacent reference field information is greater than that of the lower reference field information, calculating, according to the fields corresponding to the upper and lower adjacent reference field information, the confidence of the fusion result of the upper and lower adjacent reference field information and the confidences respectively corresponding to the upper and lower adjacent reference field information;
if the confidence of the fusion result is greater than the confidences respectively corresponding to the upper and lower adjacent reference field information, fusing the text boxes respectively corresponding to the upper and lower adjacent reference field information into a complete text box;
and updating the corresponding reference fields and reference field information in each image area according to the complete text box to obtain the corresponding fields and field information in each image area.
Therefore, by implementing this alternative embodiment, text boxes can be fused, which prevents multiple lines of text that together form one complete content item from being recognized as separate pieces of text, avoids splitting the complete content, and improves the extraction precision of the structured information.

Claims (9)

1. An information extraction method for an image, comprising:
identifying a text box in a target image, and fitting a straight line for dividing the target image according to the target text box containing keywords in the text box;
dividing the target image into a plurality of image areas according to the straight line;
performing field identification and field information identification on text boxes of each image area in the plurality of image areas to obtain corresponding fields and field information in each image area;
generating structural information corresponding to the target image according to the corresponding fields and field information in each image area;
the method for identifying the field and identifying the field information of the text box of each image area in the plurality of image areas comprises the following steps:
calculating the straight-line distance feature of a text box in a target image area relative to a frame in the target image through a pre-trained classifier; wherein the target image area is any one of the plurality of image areas;
calculating the horizontal distance feature of each text box in the target image area relative to each field in the target image area through the classifier;
performing specific object recognition on each text box in the target image area through the classifier to obtain an object recognition result; wherein the specific object comprises at least one of a symbol, a number and a word;
performing directional projection on each text box in the target image area through the classifier to obtain projection areas corresponding to the text boxes in a specific direction, and merging the projection areas that intersect to obtain a plurality of fusion areas; wherein the fusion areas are in one-to-one correspondence with the fields in the detail area;
and determining the straight-line distance feature, the horizontal distance feature, the object recognition result and the plurality of fusion areas as the feature recognition result of the target image area.
2. The method of claim 1, wherein fitting a straight line for region division of the target image according to a target text box including a keyword in the text boxes comprises:
determining at least one type of target text boxes hitting keywords in a preset word stock from the text boxes; the number of the target text boxes in each type of target text boxes is at least one;
determining the position information of each target text box in the at least one type of target text boxes;
determining a straight line corresponding to each type of target text box according to the position information; the straight lines corresponding to each type of target text boxes are used for dividing the target image into areas.
3. The method according to claim 2, wherein determining the straight line corresponding to each type of target text box according to the position information comprises:
determining the center point of the target text box in the class according to the position information, and determining the straight line corresponding to each class of target text box according to the center point of the target text box in the class; the straight line corresponding to each type of target text box is used for connecting the center point of the target text box in the type;
or, alternatively,
determining the boundary slope of the target text boxes in the class according to the position information, and determining the straight line corresponding to each class of target text boxes according to the boundary slope of the target text boxes in the class; the straight line corresponding to each type of target text box is used for penetrating through the target text boxes in the type, and the boundary slope comprises at least one of an upper boundary slope and a lower boundary slope.
4. The method of claim 1, wherein the plurality of image regions comprises a detail region, a header region, and a trailer region, wherein the fields in the detail region are in a one-to-one or one-to-many relationship with the field information, wherein the fields in the header region are in a one-to-one relationship with the field information, and wherein the fields in the trailer region are in a one-to-one relationship with the field information.
5. The method of claim 1, wherein after determining the straight-line distance feature, the horizontal distance feature, the object recognition result, and the plurality of fusion regions as the feature recognition result of the target image region, the method further comprises:
and training the classifier according to the target image and the feature recognition result of each image area until the loss function of the classifier converges.
6. The method of claim 1, wherein performing field recognition and field information recognition on the text box of each of the plurality of image areas to obtain corresponding fields and field information in each of the plurality of image areas comprises:
determining corresponding reference fields and reference field information in each image area according to the characteristic identification result of each image area and the text boxes in each image area;
comparing the text lengths of the upper and lower adjacent reference field information in the image areas to obtain a comparison result;
if the comparison result shows that the text length of the upper reference field information is greater than that of the lower reference field information, calculating, according to the fields corresponding to the upper and lower adjacent reference field information, the confidence of the fusion result of the upper and lower adjacent reference field information and the confidences respectively corresponding to the upper and lower adjacent reference field information;
if the confidence of the fusion result is greater than the confidences respectively corresponding to the upper and lower adjacent reference field information, fusing the text boxes respectively corresponding to the upper and lower adjacent reference field information into a complete text box;
and updating the corresponding reference fields and the corresponding reference field information in each image area according to the complete text box to obtain the corresponding fields and the corresponding field information in each image area.
7. An information extraction apparatus for an image, comprising:
a text box recognition unit for recognizing a text box in the target image;
a straight line fitting unit, configured to fit a straight line for performing region division on the target image according to a target text box including a keyword in the text box;
an image area dividing unit configured to divide the target image into a plurality of image areas according to the straight line;
The information identification unit is used for carrying out field identification and field information identification on the text box of each image area in the plurality of image areas to obtain corresponding fields and field information in each image area;
a structured information generating unit, configured to generate structured information corresponding to the target image according to the corresponding field and field information in each image area;
the information identifying unit performs field identification and field information identification on text boxes of each image area in the plurality of image areas, and the information identifying unit includes:
calculating the straight-line distance feature of a text box in a target image area relative to a frame in the target image through a pre-trained classifier; wherein the target image area is any one of the plurality of image areas;
calculating the horizontal distance feature of each text box in the target image area relative to each field in the target image area through the classifier;
performing specific object recognition on each text box in the target image area through the classifier to obtain an object recognition result; wherein the specific object comprises at least one of a symbol, a number and a word;
performing directional projection on each text box in the target image area through the classifier to obtain projection areas corresponding to the text boxes in a specific direction, and merging the projection areas that intersect to obtain a plurality of fusion areas; wherein the fusion areas are in one-to-one correspondence with the fields in the detail area;
and determining the straight-line distance feature, the horizontal distance feature, the object recognition result and the plurality of fusion areas as the feature recognition result of the target image area.
8. A computer-readable medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the information extraction method for an image according to any one of claims 1 to 6.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the method of information extraction for an image as claimed in any one of claims 1 to 6.
CN202110825008.1A 2021-07-21 2021-07-21 Information extraction method and device for image, medium and electronic equipment Active CN113537097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110825008.1A CN113537097B (en) 2021-07-21 2021-07-21 Information extraction method and device for image, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113537097A CN113537097A (en) 2021-10-22
CN113537097B (en) 2023-08-22

Family

ID=78100710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110825008.1A Active CN113537097B (en) 2021-07-21 2021-07-21 Information extraction method and device for image, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113537097B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140007092A (en) * 2012-07-03 2014-01-17 한밭대학교 산학협력단 Method and system for selecting motion search region for image sequences using depth camera information
CN109657629A (en) * 2018-12-24 2019-04-19 科大讯飞股份有限公司 A kind of line of text extracting method and device
CN111753727A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Method, device, equipment and readable storage medium for extracting structured information

Also Published As

Publication number Publication date
CN113537097A (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant