CN114973221A - Information identification method, device, equipment and storage medium

Information identification method, device, equipment and storage medium

Info

Publication number: CN114973221A
Authority: CN (China)
Prior art keywords: information, information structure, field, image, structure body
Legal status: Pending
Application number: CN202110215064.3A
Other languages: Chinese (zh)
Inventor: 杨志博
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority application: CN202110215064.3A
Publication: CN114973221A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/40: Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/174: Form filling; Merging
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Abstract

Embodiments of the invention provide an information identification method, apparatus, device, and storage medium. The method includes: displaying an image of a target object in a first display area of an interface; determining at least one information structure contained in the image and the mark information corresponding to each information structure, where each information structure corresponds to a different field in the target object; and outputting, in a second display area of the interface, the at least one information structure and its corresponding mark information. Based on the mark information corresponding to each information structure, a user can quickly distinguish the information structures in order to check and correct them.

Description

Information identification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an information identification method, apparatus, device, and storage medium.
Background
In traditional financial reimbursement, claimants hand their various cards, tickets, and bills to financial staff, who manually audit, enter, and settle the reimbursement information. This wastes a great deal of manpower and time and is inefficient.
With the development of artificial intelligence and cloud computing, intelligent office solutions have entered more and more enterprises. Financial reimbursement is an important part of the intelligent office: relevant text information in cards and tickets can be automatically recognized based on Optical Character Recognition (OCR) technology, enabling electronic entry of that information.
However, many factors affect the accuracy of text recognition results, and entering inaccurate text information can lead to erroneous reimbursement results.
Disclosure of Invention
Embodiments of the present invention provide an information identification method, apparatus, device, and storage medium, which can help a user to quickly locate a possible error in a text information identification result.
In a first aspect, an embodiment of the present invention provides an information identification method, where the method includes:
displaying an image of a target object in a first display area of an interface;
determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the target object;
and outputting the at least one information structural body and the mark information corresponding to the at least one information structural body in a second display area of the interface.
In a second aspect, an embodiment of the present invention provides an information identification apparatus, including:
the display module is used for displaying the image of the target object in a first display area of the interface;
a detection module, configured to determine at least one information structure included in the image and label information corresponding to the at least one information structure, where each information structure corresponds to a different field in the target object;
the display module is further configured to output, in a second display area of the interface, the at least one information structure and the mark information corresponding to the at least one information structure.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the information identification method as described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is enabled to implement at least the information identification method according to the first aspect.
In a fifth aspect, an embodiment of the present invention provides an information identification method, where the method includes:
receiving a request for calling a target service interface by user equipment, wherein the request comprises an image of a target object;
executing the following steps by utilizing the processing resource corresponding to the target service interface:
determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to a different field in the target object;
and sending the at least one information structure body to the user equipment so that the user equipment displays the image in a first display area of an interface and outputs the at least one information structure body and the mark information corresponding to the at least one information structure body in a second display area.
In a sixth aspect, an embodiment of the present invention provides an information identification method, where the method includes:
acquiring a form image;
determining at least one information structure body contained in the form image and marking information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the form image;
and outputting the marking information corresponding to the at least one information structure body and the at least one information structure body respectively so that a user can complete the information input processing of the form image according to the marking information and the at least one information structure body.
In a seventh aspect, an embodiment of the present invention provides an information identification method, where the method includes:
acquiring an image containing commodity information;
determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the image;
and outputting the marking information corresponding to the at least one information structure body and the at least one information structure body respectively so that a user can check the input commodity information according to the marking information and the at least one information structure body.
In an eighth aspect, an embodiment of the present invention provides an information identification method, where the method includes:
acquiring a medical record image;
determining at least one information structure body contained in the medical record image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the medical record image;
and outputting the marking information corresponding to the at least one information structure body and the at least one information structure body respectively, so that a user can screen medical record images meeting the requirements according to the marking information and the at least one information structure body.
In a ninth aspect, an embodiment of the present invention provides an information identification method, where the method includes:
acquiring a teaching image;
determining at least one information structure body contained in the teaching image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the teaching image;
and outputting the marking information corresponding to the at least one information structure body and the at least one information structure body respectively, so that a user can screen out teaching images meeting the requirements according to the marking information and the at least one information structure body.
In a tenth aspect, an embodiment of the present invention provides an information identification method, where the method includes:
acquiring a reimbursement image containing a card and/or a bill;
determining at least one information structure body contained in the reimbursement image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the reimbursement image;
and outputting the at least one information structure body and the mark information corresponding to the at least one information structure body respectively so that a user can complete reimbursement processing according to the mark information and the at least one information structure body.
In the embodiments of the present invention, when an image containing a target object (such as a card or a ticket) is received, text information recognition is performed on the image to obtain at least one information structure contained in the target object and the mark information corresponding to each information structure, where each information structure corresponds to a different field in the target object and includes, for example, a field attribute, a field position, and field content. Finally, the recognized information structures and their corresponding mark information are output, so that a user can quickly check each information structure based on its mark information and correct any information structure that was recognized incorrectly, thereby ensuring the accuracy of the final recognition result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an image and information structures provided by an embodiment of the present invention;
FIG. 2 is a flowchart of an information identification method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a display effect of information structures according to an embodiment of the present invention;
FIG. 4 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 5 is a flowchart of an information structure recognition process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of plate type information provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of an identification process of information structures according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a display effect of information structures according to an embodiment of the present invention;
FIG. 9 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating an application of an information identification method according to an embodiment of the present invention;
FIG. 11 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 12 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 13 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 14 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 15 is a flowchart of another information identification method according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an information identification apparatus according to an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of an electronic device corresponding to the information identification apparatus provided in the embodiment shown in FIG. 16.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and the plural form typically includes at least two.
The word "if", as used herein, may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)", depending on the context.
In addition, the sequence of steps in the embodiments of the methods described below is merely an example, and is not strictly limited.
The information identification method provided by the embodiment of the invention can be executed by an electronic device, and the electronic device can be a terminal device such as a PC (personal computer), a notebook computer, a smart phone and the like, and can also be a server. The server may be a physical server including an independent host, or may also be a virtual server, or may also be a cloud server or a server cluster.
The information identification method provided by the embodiments of the invention can perform text information recognition on each object contained in an image and output the recognized text information in a structured form. Because some of the automatically recognized text may be wrong, mark information corresponding to the text information is output together with the structured text, so that the user can quickly and conveniently locate recognition results that may be wrong and focus on checking the accuracy of each result according to the mark information corresponding to the different recognition results.
Here, structured output of text information means outputting the text information recognized from an input image in the form of key-value pairs, that is, outputting each recognized information structure.
Take the input image to be recognized as an image of a target object; the target object is, for example, a card, a bill, a report, or a commodity leaflet. The target object typically contains a plurality of fields, and each information structure referred to herein corresponds to a different field in the target object. In the process of recognizing information in the image, the relevant information of each field is extracted and represented in a structured form, so one information structure represents one field contained in the target object. In addition, by means of the information structure, data in image format (the input image) is converted into data in character format (the information structure), so that the user can subsequently edit the data, enter it into a system, and so on.
In practical applications, each information structure consists of a corresponding set of field attribute, field position, and field content, which can be regarded as keys and values. Since one information structure is used to represent one field in the target object, these three types of information are used in the embodiments of the present invention to represent a field accurately. Suppose the target object contains a field X: the field position is the pixel position of field X in the image and is output automatically during recognition of the information structure; the field content is, as the name implies, the text actually filled in at field X, obtained automatically through character recognition during the same process. Because the target object is assumed in the embodiments of the present invention to have a fixed plate type (that is, a fixed layout), each field in the target object has a specific physical meaning, and this physical meaning is expressed by the field attribute; in other words, the field attribute describes the physical meaning of the field content.
An information structure may be represented in the following format: [field position, field attribute, field content]. From an information structure it is therefore possible to know that there is a field at a given position in the image of the target object, what the content of that field is, and what its physical meaning is.
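Purely for illustration, the sketch below shows one possible in-memory representation of such an information structure in Python; the class name, field names, and the example coordinates and values are assumptions made for this sketch, not definitions taken from the patent.

    from dataclasses import dataclass
    from typing import List, Tuple

    # One vertex of a field's bounding quadrilateral, in image pixel coordinates.
    Point = Tuple[int, int]

    @dataclass
    class InformationStructure:
        """One field of the target object: [field position, field attribute, field content]."""
        field_position: List[Point]  # e.g. the four vertices of the field's region in the image
        field_attribute: str         # physical meaning of the field content, e.g. "ticket price"
        field_content: str           # text actually recognized at that position, e.g. "500 yuan"

    # Example instance (values are made up for illustration):
    ticket_price = InformationStructure(
        field_position=[(410, 260), (520, 260), (520, 290), (410, 290)],
        field_attribute="ticket price",
        field_content="500 yuan",
    )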
For ease of understanding, the composition of an image and the meaning of an information structure are exemplified with reference to fig. 1.
In fig. 1, it is assumed that one image contains the illustrated identification card and train ticket, that is, a user's identification card and train ticket are photographed together to obtain a single image. The purpose of performing text information recognition on this image is to extract each information structure contained in the identification card image area and each information structure contained in the train ticket image area.
Based on the assumptions in fig. 1, the information structure identified in the identification card image area may include, but is not limited to:
[L1, name, Zhang XX], [L2, date of birth, 9/20/1990], [L3, address, a residential community in city A].
Here "L1" represents the field position of the name field in the identification card image area, "name" is the field attribute (also called the field type or field name), and "Zhang XX" is the field content.
Similarly, "L2" and "L3" respectively indicate the corresponding field positions of the date of birth field and the address field in the identification card image area.
Similarly, the information structure identified in the train ticket image area may include, but is not limited to:
[L4, start station, Guangzhou East Station], [L5, end station, Beijing South Station], [L6, ticket price, 500 yuan].
As shown in fig. 1, based on the recognition results for the information structures, the text recognition results corresponding to the identification card image area and the train ticket image area can be output on the interface. Specifically, when displaying the plurality of information structures recognized from the identification card image area, the positional relationships of those information structures can first be determined from the field positions they contain, and the field attribute and field content of each information structure are then displayed on the interface according to those positional relationships. The positional relationship of an information structure means its position relative to the other information structures; for example, the "date of birth" information structure illustrated in fig. 1 is located below the "name" information structure.
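As a minimal sketch of the positional ordering just described, and reusing the hypothetical InformationStructure class introduced above, the key-value pairs could be ordered top-to-bottom and left-to-right using the field positions before being rendered; the sorting rule below is an illustrative choice, not one prescribed by the patent.

    def display_order(structures):
        """Sort information structures top-to-bottom, then left-to-right, by the
        top-left corner of their field positions, so the key-value pairs roughly
        follow the layout of the original image."""
        def top_left(s):
            xs = [x for x, _ in s.field_position]
            ys = [y for _, y in s.field_position]
            return (min(ys), min(xs))
        return sorted(structures, key=top_left)

    # Render "field attribute: field content" lines in layout order.
    for s in display_order([ticket_price]):  # normally the full list of recognized structures
        print(f"{s.field_attribute}: {s.field_content}")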
In practical application, after the information structure is obtained, the field attribute and the field content included in each information structure can be recorded for subsequent use. For example, in a financial reimbursement scenario, an information structure identified from each card or ticket in the image may be entered into a reimbursement system for subsequent financial staff to perform relevant reimbursement processing.
Fig. 1 illustrates the case in which correct text recognition results are obtained after performing text information recognition on the input image. In practice, however, it cannot be guaranteed that the recognition results are completely correct and reliable, so the recognition results are displayed in a way that makes them easy to check and correct manually, ensuring that the finally obtained text information is accurate.
The following describes an exemplary implementation of the information recognition method provided herein with reference to the following embodiments.
Fig. 2 is a flowchart of an information identification method according to an embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
201. an image of the target object is displayed within a first display area of the interface.
202. And determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body respectively, wherein each information structure body corresponds to a different field in the target object.
203. And outputting the at least one information structural body and the mark information corresponding to the at least one information structural body in a second display area of the interface.
In an embodiment of the present invention, the input image may be an image including at least one object, and each object may include a plurality of fields therein. The target object may be any one of the at least one object, and the target object includes a plurality of fields therein.
In practical applications, at least one object including at least the target object may be photographed together to obtain the image of the target object. The target object differs across application scenarios. For example, in a reimbursement scenario, the target object may be one or more cards or tickets; in a medical scenario, the target object may be an electronic medical record; in an e-commerce scenario, the target object may be a product flyer.
In practice, across these scenarios, a user may need to extract the text information contained in the image of a target object. After obtaining the image, the user can use the method provided by the embodiments of the present invention to obtain the text information contained in the image, that is, the at least one information structure, and can quickly complete the review of the extracted text information.
In order to facilitate the user to check the recognition result, an initial input image before recognition and the recognition result may be displayed on an interface of the user equipment at the same time, where the initial input image is an image of the target object, and the recognition result is the at least one information structure obtained by recognition and the mark information corresponding to the at least one information structure.
The information structures can be recognized by a pre-trained network model; the specific recognition process is described in detail below.
The mark information corresponding to an information structure may be determined in several ways: according to the confidence of the information structure obtained during its recognition; according to the field position and field attribute contained in the information structure; or randomly, so long as different information structures receive different mark information.
The core idea is that different information structures receive different mark information, for example according to the field position and field attribute they contain. For instance, adjacent fields may be given different mark information so that they are easy to distinguish visually; likewise, fields with different field attributes may be given different mark information to distinguish them from one another.
For ease of understanding, the implementation of the present embodiment is exemplarily described with reference to fig. 3.
As shown in fig. 3, it is assumed that the image of the target object is an image including a train ticket and a taxi ticket, and in this case, the target object includes a train ticket and a taxi ticket. The image of the initial input, i.e., the image including the train ticket and the taxi ticket, may be displayed in a first display area of an interface of the user device.
In fig. 3, taking the recognition of the information structures contained in the train ticket as an example, the plurality of information structures illustrated in the second display area of fig. 3 are obtained. The relative positional relationships between the different information structures are kept unchanged according to the field positions recorded in them, and the field attribute and field content recorded in each information structure are displayed in the second display area in the form of key-value pairs. Meanwhile, according to the mark information corresponding to each information structure, the mark information is added to the display area of the corresponding field attribute and/or field content; in fig. 3, different mark information is represented by different line styles.
In practical applications, the representation form of the mark information may include various forms such as color, figure, symbol, and the like.
Based on the mark information corresponding to different information structures, the user can focus and distinguish different information structures, and compare the information structures with corresponding fields in the input image, so that whether the identification result of the information structure is correct or not can be known, and the information structure with the wrong identification can be corrected.
As described above, the mark information corresponding to the information structure reflects the reliability of the information contained in the information structure, so that the user can focus on the verification of the information structure with low reliability, and the user can select to trust the recognition result of the model for the information structure with high reliability. In this way, when receiving an editing operation triggered by a user for a target information structure according to the mark information corresponding to each information structure, the editing operation is executed to realize error correction of the target information structure. The target information structure is any one of the plurality of acquired information structures, and the user generally triggers an editing operation for an information structure with low reliability. The editing operation described above is often an operation of correcting the field contents and the field attributes.
In another alternative embodiment, if an image contains multiple objects and each object contains many fields to be recognized, the number of information structures finally obtained is also large, and it is obviously time-consuming and labor-intensive for a user to check the correctness of each one manually.
Therefore, the information identification scheme provided by the embodiments of the present invention can help the user quickly locate the information structures with lower reliability, so that the user can focus on reviewing the field content of those structures and skip the review of structures with higher reliability, improving efficiency. To this end, when the at least one information structure contained in an image is recognized, the confidence of each information structure is also obtained, and the mark information corresponding to each information structure is determined based on that confidence, so that the mark information reflects the reliability of the corresponding information structure. In this way, the user can focus on the text recognition results with lower reliability and check and correct them.
Fig. 4 is a flowchart of another information identification method according to an embodiment of the present invention, as shown in fig. 4, the method includes the following steps:
401. In the image of the target object, determine at least one information structure contained in the target object and the confidence corresponding to each information structure, where each information structure includes a field attribute, a field position, and field content.
The confidence of an information structure reflects the accuracy, or reliability, of its recognition result. For example, if an information structure is [L1, start station, Guangzhou East Station], its confidence reflects the probability that the image of the target object actually contains that information structure.
402. And determining marking information corresponding to the at least one information structural body according to the confidence degree corresponding to the at least one information structural body.
403. And outputting the at least one information structure body and the mark information corresponding to the at least one information structure body.
In this embodiment of the present invention, the image of the target object may be an image including at least one object, the target object may be any one of the at least one object, and each object may include a plurality of fields therein.
For example, in a financial reimbursement scenario, the image of the target object may include at least one card and/or ticket, that is, the at least one object may be at least one card and/or ticket.
In different application scenarios, when multiple objects are contained in the same image, the types of the multiple objects contained in the same image are not always the same.
In practical applications, a user can place the cards and tickets needed for reimbursement (such as an identity card, train ticket, taxi ticket, airline itinerary, and invoices) together, photograph them, and provide the photo to the relevant staff (such as financial staff). The staff input the photo into a functional module implementing the information identification method provided by the embodiments of the present invention and obtain its output, which, in short, is each information structure together with its mark information. The staff then locate the information structures with low reliability based on the mark information and confirm the accuracy of the field content they contain. The functional module may be a local application (APP) on the user terminal, a functional module within an APP, or a service provided in the cloud; in the latter case the user calls the invocation interface corresponding to the service, uploads the image to the cloud, and receives the recognition result fed back by the cloud: at least one information structure and the confidence corresponding to each.
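The patent does not define a concrete wire format for the cloud service variant; purely as an illustration, the sketch below assumes a hypothetical HTTP endpoint that accepts a base64-encoded image and returns the recognized information structures with their confidences. The endpoint URL, JSON field names, and response shape are all assumptions of this sketch.

    import base64
    import json
    import urllib.request

    def recognize_via_cloud(image_path: str, endpoint: str) -> list:
        """Upload an image to a hypothetical cloud recognition service and return
        the recognized information structures with their confidences."""
        with open(image_path, "rb") as f:
            body = json.dumps({"image": base64.b64encode(f.read()).decode("ascii")})
        request = urllib.request.Request(
            endpoint,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            result = json.loads(response.read())
        # Assumed response shape:
        # {"structures": [{"field_position": [...], "field_attribute": "...",
        #                  "field_content": "...", "confidence": 0.93}, ...]}
        return result["structures"]

    # Hypothetical usage:
    # structures = recognize_via_cloud("reimbursement.jpg", "https://example.com/recognize")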
The following specifically describes a process of determining at least one information structure included in a target object and a corresponding confidence of the at least one information structure, as shown in fig. 5, the process may include the following steps:
501. Identify, through an object detection model, the category corresponding to the target object and the position area of the target object in the image.
502. Input the target image area cropped according to that position area into a character recognition model, so as to recognize, through the character recognition model, at least one group of character recognition results together with the confidence of the field position and the confidence of the field content in each group, where each group of character recognition results includes a field position and field content.
503. Input the target image area and the at least one group of character recognition results into the plate type recognition model corresponding to the category of the target object, so as to output, through the plate type recognition model, at least one information structure and the confidence of the correspondence between field attribute and field content, where each information structure includes a field attribute, a field position, and field content.
It can be seen that the following three models need to be used in the embodiment of the present invention: an object detection model, a character recognition model, and a plate recognition model.
In practical applications, the three models can be implemented as neural network models, such as Convolutional Neural Network (CNN) models or Residual Network (ResNet) models, for example ResNet-18, DLA-34, and so on.
In summary, the object detection model detects the category and position of each object in its input image, the character recognition model (a model based on OCR technology) recognizes the position and content of each field in its input image, and the plate type recognition model recognizes the position and attribute of each field in its input image.
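The following sketch shows one way the three models could be chained in the order of steps 501-503; all model objects are placeholders with assumed duck-typed interfaces (detect, recognize, extract), since the patent does not prescribe concrete APIs.

    def crop(image, region):
        """Cut the rectangular position area (x0, y0, x1, y1) out of the image;
        assumes the image supports 2-D slicing (e.g. a numpy array)."""
        x0, y0, x1, y1 = region
        return image[y0:y1, x0:x1]

    def recognize_information_structures(image, object_detector, ocr_model, plate_models):
        """Chain the object detection, character recognition, and plate type
        recognition models as in steps 501-503."""
        results = []
        # Step 501: detect each object's category and position area in the image.
        for category, region in object_detector.detect(image):
            target_area = crop(image, region)
            # Step 502: recognize field positions and contents, with confidences.
            ocr_groups = ocr_model.recognize(target_area)  # [(pos, content, p_pos, p_content), ...]
            # Step 503: the plate type model for this category combines the image area
            # and OCR results into information structures, each with a confidence for
            # the attribute-content correspondence.
            structures = plate_models[category].extract(target_area, ocr_groups)
            results.append((category, structures))
        return results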
The plate type recognition model recognizes the plate type information contained in its input image; the plate type information reflects the position of each field in the input image and the corresponding field attribute. For example, cards and tickets such as train tickets, invoices, and taxi tickets have relatively fixed plate type information, and the plate type recognition model corresponding to a given kind of card or ticket learns that kind's plate type information so that the model has the capability to extract it.
For example, taking the train ticket illustrated in fig. 6 as an example, the plate-type information corresponding to the train ticket may be field attributes and field positions of a plurality of fields included in the train ticket, in fig. 6, the field positions are represented by black rectangular graphics, and the field attributes are represented by associated attribute tags at each field position, including the originating station, the destination station, the train number, the seat number, the departure date, the ticket price, and the like illustrated in fig. 6. In practical applications, the field position may be expressed in pixel coordinates of four vertices of a rectangular graph, and the coordinate system is shown in fig. 6.
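As a small illustration of what learned plate type information might look like for a train ticket, the dictionary below maps field attributes to four-vertex pixel regions of the kind described for Fig. 6; all attribute names and coordinates are invented for this sketch.

    # Hypothetical plate type (layout) information for a train ticket: each field
    # attribute is associated with the pixel coordinates of the four vertices of its
    # rectangular region, in a coordinate system like that of Fig. 6.
    TRAIN_TICKET_PLATE_INFO = {
        "originating station": [(40, 60), (220, 60), (220, 95), (40, 95)],
        "destination station": [(420, 60), (600, 60), (600, 95), (420, 95)],
        "train number":        [(260, 55), (400, 55), (400, 90), (260, 90)],
        "departure date":      [(40, 130), (260, 130), (260, 165), (40, 165)],
        "seat number":         [(420, 130), (600, 130), (600, 165), (420, 165)],
        "ticket price":        [(40, 200), (200, 200), (200, 235), (40, 235)],
    }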
For ease of understanding, the implementation of the embodiment shown in FIG. 5 is described in conjunction with the image illustrated in FIG. 7. In fig. 7, it is assumed that the input image contains one train ticket and one taxi ticket; in this case, the target objects mentioned above are the train ticket and the taxi ticket, respectively. After the image is input into the object detection model, the model identifies the object categories and object positions contained in the input image, giving the following result: the input image contains two categories of objects, a train ticket and a taxi ticket; the position area of the train ticket in the input image is area Q1 shown in fig. 7, and the position area of the taxi ticket is area Q2.
The train ticket image area and the taxi ticket image area can then be cropped according to area Q1 and area Q2 respectively, and the character recognition model is called to perform text recognition on each of them, yielding the field positions and the field content at each field position contained in the train ticket image area, and likewise for the taxi ticket image area; the recognition results are shown in fig. 7.
It should be noted that, taking the train ticket image area as an example, during text recognition the character recognition model outputs, on one hand, each field position and the field content at each field position contained in the train ticket image area and, on the other hand, the confidence of each field position and the confidence of each field content. The confidence of a field position reflects the probability that text is indeed present at that position, and the confidence of the field content reflects the probability that the text present at that position is indeed the recognized field content.
Thereafter, if the plate type recognition model M1 corresponding to the train ticket object has already been trained, the train ticket image area, together with the field positions and field contents recognized from it by the character recognition model, is input into the plate type recognition model M1, which outputs the information structures contained in the train ticket image area, each of the form [field position, field attribute, field content]; the recognition result for the train ticket image area is shown in fig. 7. In addition, the plate type recognition model M1 also outputs, for each information structure, the confidence of the correspondence between the field attribute and the field content, which reflects the probability that the field attribute and the field content in that information structure match. This confidence can be regarded as the confidence of the output of the plate type recognition model M1, the output being the information structure itself.
Assume that after the train ticket image area is input to the character recognition model, the model recognizes a field position Li and the corresponding field content Si (only one field is taken as an example; in practice the positions and contents of all fields are recognized). After the train ticket image area, the field position Li, and the field content Si are input to the plate type recognition model M1, the model recognizes a field position Li' and the corresponding field attribute Ci in the train ticket image area based on the plate type information it has learned for train tickets; assume the confidence of this recognition result is Pa. The plate type recognition model M1 may also determine, based on the distance between the two positions, the confidence that the field position Li and the field position Li' correspond to the same field, say Pb. If the field position Li is found to correspond to the field position Li', that is, the two positions correspond to the same field, an information structure consisting of the field position Li', the field attribute Ci, and the field content Si is obtained, along with the correspondence between the field attribute Ci and the field content Si. The confidence of this correspondence may be determined from Pa and Pb, for example as their product.
In fact, as shown in fig. 7, a certain field position L1 output by the character recognition model and the field position L1' output by the plate type recognition model M1 correspond to the same field, and similarly for the other field positions. The plate type recognition model M1 can be regarded as outputting the more accurate field position L1' with the aid of the field position L1 output by the character recognition model. The field positions and the field content at each position output by the character recognition model for the train ticket image area reflect the different field positions and their relative positional relationships and, combined with the field content at each position, also the semantic relationships between them. By feeding this output into the plate type recognition model M1, the model can more accurately output the plate type information recognized from the train ticket image area, based on the field positions, their relative positional relationships, and the semantic information in the field content at the different positions: at what position there is a field of a certain attribute and what its field content is, that is, each information structure.
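One way the position matching and the Pa·Pb combination described above might be realized is sketched below; the centre-distance criterion and the exponential decay are illustrative assumptions, not details given in the patent.

    import math

    def center(quad):
        """Geometric centre of a four-vertex field position."""
        xs = [x for x, _ in quad]
        ys = [y for _, y in quad]
        return (sum(xs) / 4.0, sum(ys) / 4.0)

    def position_match_confidence(ocr_position, plate_position, scale=100.0):
        """Confidence Pb that two field positions (Li and Li') refer to the same
        field, decaying with the distance between their centres."""
        (x1, y1), (x2, y2) = center(ocr_position), center(plate_position)
        return math.exp(-math.hypot(x1 - x2, y1 - y2) / scale)

    def correspondence_confidence(pa, pb):
        """Confidence of the attribute-content correspondence, here simply the
        product of the plate model's recognition confidence Pa and the
        position-match confidence Pb."""
        return pa * pb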
Similarly, if the plate type recognition model M2 corresponding to an object such as a taxi ticket has been trained, the image area of the taxi ticket and the field position and the field content recognized by the character recognition model from the image area of the taxi ticket are input to the plate type recognition model M2, and the output result of the plate type recognition model M2 is shown in fig. 7.
Therefore, each object class corresponds to one plate type recognition model.
It will be understood that the object detection model is trained to recognize a certain set of object categories. If, when detecting the objects contained in an input image, the model cannot recognize the category of some object X, this means that the model was not trained on that category. When the object detection model, the character recognition model, and the plate type recognition model are used together as a whole, the inability of the object detection model to recognize the category of object X also implies that no plate type recognition model exists for that category, so a plate type recognition model corresponding to the category of object X needs to be trained. The training process of the plate type recognition model is described in later embodiments.
In the embodiment of the present invention, the object detection model may adopt an existing detection model that can be used to identify a plurality of objects included in one image. The character recognition model may be an existing model that can perform character recognition.
After the at least one information structure included in the target object in the image and the confidence corresponding to each information structure are obtained in the above manner, the labeling information of the at least one information structure may be determined according to the respective confidence corresponding to the at least one information structure. As described above, the confidence level of an information structure may be determined by the confidence level of the field position in the information structure, the confidence level of the field content, and the confidence level of the corresponding relationship between the field attribute and the field content.
It should be noted that, in the working process of various machine learning models and neural network models including the object detection model, the character recognition model and the plate-type recognition model provided in the embodiment of the present invention, a certain result and a confidence corresponding to the result may be output, but in the embodiment of the present invention, in order to finally obtain an accurate information recognition result and improve the efficiency of information auditing and entry, the determination of the marking information is further performed based on the confidence (i.e., the above-mentioned several confidences) output by certain specific models.
Specifically, for any information structure i identified from the image, the confidence P of the information structure i can be determined according to at least one of the following confidences:
the confidence level P1 of the field position in the information structure i, the confidence level P2 of the field content in the information structure i, and the confidence level P3 of the corresponding relation between the field attribute and the field content in the information structure i.
That is, the confidence P of the information structure i can be determined from the above three confidences P1, P2, and P3.
Optionally, the confidence P of the information structure i may be determined according to a preset confidence weight, where the confidence weight includes at least one of: the first weight a1 corresponding to the confidence of the field position, the second weight a2 corresponding to the confidence of the field content, and the third weight a3 corresponding to the confidence of the correspondence between the field attribute and the field content.
Specifically, the confidence P of the information structure i may be determined as: p-a 1P 1+ a 2P 2+ a 3P 3.
In practical applications, the three weights are preset values corresponding to the three confidences. Optionally, the third weight a3 may be set greater than or equal to the second weight a2, and the second weight a2 greater than or equal to the first weight a1; alternatively, the third weight a3 may be set greater than both the second weight a2 and the first weight a1. This is because a confidence obtained later (i.e., closer to the final output) has a greater impact on the final result.
In the above example, all three confidences P1, P2, and P3 are considered when determining the confidence of an information structure i; in practice, only some of them may be used.
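A minimal sketch of the weighted combination, assuming illustrative weight values that satisfy a3 >= a2 >= a1:

    def structure_confidence(p_position, p_content, p_correspondence,
                             a1=0.2, a2=0.3, a3=0.5):
        """P = a1*P1 + a2*P2 + a3*P3, giving later confidences more weight.
        The concrete weight values here are placeholders."""
        return a1 * p_position + a2 * p_content + a3 * p_correspondence

    # Example: position and content were recognized confidently, but the
    # attribute-content correspondence is uncertain.
    p = structure_confidence(p_position=0.95, p_content=0.9, p_correspondence=0.6)
    # p = 0.2*0.95 + 0.3*0.9 + 0.5*0.6 = 0.76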
After the confidence of the information structure i is obtained, the mark information corresponding to the information structure i is determined according to that confidence.
In general, the correspondence between different confidence ranges and the mark information may be preset, and the mark information corresponding to the information structure i is determined according to the range into which its confidence falls.
For example, two thresholds may be preset: a first preset threshold and a second preset threshold, where the second preset threshold is greater than the first. These two thresholds define three confidence ranges: the range below the first preset threshold, (0, first preset threshold); the range between the two thresholds, [first preset threshold, second preset threshold]; and the range above the second preset threshold, (second preset threshold, 1).
Therefore, optionally, for any information structure i: if the confidence of the information structure i is smaller than the first preset threshold, its mark information is determined to be the first mark information; if the confidence is between the first preset threshold and the second preset threshold, its mark information is determined to be the second mark information; and if the confidence is greater than the second preset threshold, its mark information is determined to be the third mark information.
Alternatively, the first mark information, the second mark information, and the third mark information may be different colors, for example red for the first mark information, yellow for the second, and green for the third, so that the lowest-confidence structures are highlighted most prominently.
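A sketch of the threshold-based mapping from confidence to mark information, with illustrative threshold values and the colour convention above (red for the least reliable structures):

    def mark_for_confidence(p, first_threshold=0.6, second_threshold=0.85):
        """Map a structure confidence to mark information; the threshold values
        are placeholders, not values specified in the patent."""
        if p < first_threshold:
            return "red"     # first mark information: low confidence, check carefully
        if p <= second_threshold:
            return "yellow"  # second mark information: medium confidence
        return "green"       # third mark information: high confidence

    print(mark_for_confidence(0.76))  # -> "yellow" for the example confidence above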
Of course, the representation of the marking information may alternatively take other forms, not limited to colors, such as different figures, symbols, etc.
In the embodiment of the present invention, the tag information corresponding to one information structure is used to reflect the reliability of the information structure, and in short, to reflect whether the identification result of the field attribute, the field content, and the field position included in the information structure is accurate.
The above describes how the mark information of an information structure i is determined. Following this procedure, the mark information corresponding to each of the information structures identified from the target image region of the target object is obtained, and the information structures are then output together with their mark information; in other words, the information structures are output based on their respective mark information. The mark information may affect the visual appearance of an information structure: for example, if the mark information of an information structure is red, its display area may be rendered with a red background or highlighted in red. In this way, the user can judge the reliability of each information structure from its color, for example red indicating the lowest reliability and green the highest. For an information structure with low reliability, the user compares it with the original input image, checks the correctness of its field content and field attribute, and corrects them promptly if they are wrong.
For ease of understanding, with respect to the train ticket illustrated in fig. 7, the display result of the information structure is exemplarily described with reference to fig. 8.
As shown in fig. 8, the initially input image, i.e., the image containing the train ticket and the taxi ticket, may be displayed in the first display area of the interface. Since the object detection model obtains the position areas of the objects contained in the image (in this example, the train ticket and the taxi ticket) when detecting the input image, a selection frame (the bold frame illustrated in fig. 8) and controls for the frame, shown in fig. 8 as buttons labeled "next" and "previous", may also be displayed in the first display area. The selection frame frames an object according to the position area of each object output by the object detection model, and the controls move the selection frame so that it switches between different objects.
The text information recognition result of the currently framed object is displayed in the second display area of the interface: each information structure and the mark information corresponding to it.
In fig. 8, it is assumed that the currently framed object is the train ticket and that the plurality of information structures illustrated in fig. 7 have been recognized from the train ticket image area. The relative positional relationships between the different information structures are kept unchanged according to the field positions recorded in them, and the field attribute and field content recorded in each information structure are displayed in the second display area in the form of key-value pairs. Meanwhile, according to the mark information corresponding to each information structure, the mark information is added to the display area of the corresponding field attribute and/or field content; in fig. 8, different mark information is represented by different line styles.
As described above, the mark information corresponding to an information structure reflects the reliability of the information it contains, so the user can focus on verifying the information structures with low reliability and may choose to trust the model's recognition result for those with high reliability. On this basis, when an editing operation triggered by the user for a target information structure (selected according to the mark information of each information structure) is received, the editing operation is executed to correct the target information structure. The target information structure is any one of the acquired information structures, and the user generally triggers an editing operation for an information structure with low reliability. The editing operation is typically an operation of correcting the field content or the field attribute.
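Such a correction operation can be thought of as overwriting the recognized field content and/or field attribute of the selected structure with the user's input. The sketch below shows one possible shape of such an edit handler; the function name and dictionary keys are assumptions made for illustration only.

```python
# Minimal sketch of applying a user-triggered editing operation to a
# target information structure; keys and function name are illustrative.

def apply_edit(structures, target_index, new_content=None, new_attribute=None):
    """Correct the field content and/or field attribute of one structure."""
    target = structures[target_index]
    if new_content is not None:
        target["field_content"] = new_content
    if new_attribute is not None:
        target["field_attribute"] = new_attribute
    return target

structures = [{"field_attribute": "train number", "field_content": "G1O1", "mark": "red"}]
# The user notices the recognition confused '0' with 'O' in a red-marked structure.
apply_edit(structures, target_index=0, new_content="G101")
print(structures[0])
```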
In summary, based on the information structures contained in the target object, obtained by performing character information recognition processing on the target object in the image, and the confidence of each information structure, mark information can be set for each information structure according to its confidence to reflect its reliability. The user can thereby quickly locate the information structures that are prone to recognition errors and check and correct them in time, which is more efficient than manually checking all information structures one by one.
Fig. 9 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 9, the method may include the following steps:
901. The class corresponding to the target object is not identified in the image by the object detection model.
902. A first training sample corresponding to the target object is obtained.
903. First labeling information of the first training sample is determined, and a plate-type recognition model is trained according to the first training sample and the first labeling information, wherein the first labeling information comprises the information structures contained in the first training sample, and each information structure comprises a field attribute, a field position and field content.
904. A second training sample is acquired according to the image.
905. Second labeling information of the second training sample is determined, and the object detection model is trained according to the second training sample and the second labeling information, wherein the second labeling information comprises the category and position area of each object contained in the second training sample.
In this embodiment, the image refers to an image including at least one object, and the target object may be any one of the at least one object. For example, the at least one object is at least one card and/or ticket.
In practical applications, assume that the object detection model is initially trained to have the capability of recognizing N kinds of objects, where N is greater than or equal to 1. During subsequent use, a new requirement may arise, such as the need to recognize some other kind of object.
For example, the object detection model initially has the capability of identifying an identity card and a train ticket, and later a taxi ticket also needs to be identified. Since the initial object detection model cannot identify the taxi ticket, the object detection model needs to be optimized so that it has the capability of identifying taxi tickets.
Further, as described above, when the object detection model cannot recognize the class of an object, it can be assumed that there is no plate-type recognition model corresponding to that class either; therefore, in order to perform character information recognition processing on objects of that class, a plate-type recognition model corresponding to the class needs to be trained.
Based on this, when an image is input to the object detection model, the object detection model identifies the class and position area of each object included in the image. The following situation may be encountered during recognition: the object detection model is able to identify the position area of an object but unable to identify its class. At this time, training of a plate-type recognition model corresponding to the class of that object and optimization training of the object detection model are triggered.
For the training of the plate-type recognition model, assume that an object whose class cannot be recognized by the object detection model in the image is referred to as the target object. First, a plurality of training sample images corresponding to the class of the target object need to be obtained, referred to as first training samples; then, in order to perform supervised training, the first training samples need to be labeled with supervision information, referred to as first labeling information. Since the plate-type recognition model is used to learn the plate-type information of the target object, and the plate-type information can be reflected by the field positions and field attributes of the fields contained in the target object, the first labeling information may include the field position, field attribute and field content of each field contained in a first training sample.
The field position may be represented by the coordinates of the four vertices of a rectangular box surrounding the field. The various field attributes included in the target object may be encoded in advance, so that a field attribute may be represented by its encoding.
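By way of example only, the first labeling information for one first training sample might be serialized as sketched below, with each field position given as the four vertex coordinates of the surrounding rectangle and each field attribute given by a pre-assigned code. The JSON keys, attribute codes and coordinates are assumptions for illustration, not a format defined by this embodiment.

```python
import json

# Illustrative first labeling information for one first training sample
# (a train ticket image). Keys, codes and coordinates are assumed values.
first_label_info = {
    "image": "train_ticket_0001.jpg",
    "structures": [
        {
            "field_attribute": 0,          # e.g. code 0 = departure station
            "field_position": [[120, 40], [310, 40], [310, 78], [120, 78]],
            "field_content": "Beijing South",
        },
        {
            "field_attribute": 3,          # e.g. code 3 = ticket price
            "field_position": [[118, 160], [220, 160], [220, 190], [118, 190]],
            "field_content": "553.5",
        },
    ],
}
print(json.dumps(first_label_info, indent=2, ensure_ascii=False))
```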
In practical applications, the manner of obtaining the first training samples corresponding to the class of the target object is not particularly limited; they may be collected by the user or generated by a generative adversarial network model.
Then, the first training samples and the corresponding first labeling information are input into the plate-type recognition model to train it. Specifically, the plate-type recognition model can learn the semantic information between different fields based on the field attributes and field contents of the fields labeled in the first labeling information, and, combined with the field positions of those fields, can learn the relative positional relationship between fields with different field attributes, so that the plate-type recognition model acquires the capability of recognizing the field positions and field attributes of the different fields of the target object.
For the optimization training of the object detection model, first, a plurality of training sample images containing objects of the target class (i.e., the class of the target object) need to be obtained, referred to as second training samples; then, in order to perform supervised training, the second training samples need to be labeled with supervision information, referred to as second labeling information. Since the object detection model is used to identify the category and position of each object contained in an image, the second labeling information may include the position area and category of each object contained in a second training sample.
Assuming that the object detection model cannot identify the class C of object 1 in the input image X, the second training samples can be acquired as follows: images of objects of class C are collected, and if the object detection model is to be optimized only for class C, these images should not contain other object classes that the object detection model cannot recognize. Alternatively, assuming that the image X includes objects of class A, class B and class C, where C is the class that the object detection model currently cannot recognize, images containing objects of class A, class B and class C may be collected as the second training samples.
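For comparison with the first labeling information, the second labeling information only needs the category and position area of each object in a sample image. A possible serialization is sketched below; the keys and the (x, y, width, height) box convention are illustrative assumptions, not a prescribed format.

```python
import json

# Illustrative second labeling information for one second training sample.
# Keys and the (x, y, width, height) box convention are assumptions.
second_label_info = {
    "image": "mixed_tickets_0001.jpg",
    "objects": [
        {"category": "class_A", "position_area": [35, 60, 420, 260]},
        {"category": "class_B", "position_area": [35, 360, 420, 240]},
        {"category": "class_C", "position_area": [480, 60, 400, 250]},
    ],
}
print(json.dumps(second_label_info, indent=2))
```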
After the training of the object detection model and the plate-type recognition model is completed, for a subsequently input image containing an object of class C, the recognition and display of the information structures in the object can be completed based on the scheme provided by the foregoing embodiments.
As described above, the information identification method provided by the present invention can be executed in the cloud, where a plurality of computing nodes may be deployed, each having processing resources such as computing and storage resources. In the cloud, several computing nodes may be organized to provide a certain service, and of course one computing node may also provide one or more services. The cloud provides a service interface to the outside, and a user calls the service interface to use the corresponding service. The service interface may take the form of a Software Development Kit (SDK), an Application Programming Interface (API), or other forms.
For the scheme provided by the embodiment of the present invention, the cloud may provide a service interface of the information identification service, referred to as a target service interface. When a user needs to perform information identification on an image, the target service interface is called through the user equipment, thereby triggering a request for calling the target service interface to the cloud, the request carrying the image of the target object to be identified. The cloud determines the computing node that responds to the request and performs the following steps using the processing resources of that computing node:
determining at least one information structure body contained in the image and label information corresponding to the at least one information structure body, wherein each information structure body corresponds to a different field in the target object;
and sending the at least one information structural body to user equipment so that the user equipment can display the image in a first display area of the interface and output the at least one information structural body and the mark information corresponding to the at least one information structural body in a second display area.
For the detailed process of the target service interface performing the information identification processing by using the processing resource, reference may be made to the related description in the foregoing other embodiments, which is not described herein again. In addition, it is understood that the object detection model, the character recognition model, and the plate recognition model in the foregoing embodiments may run in one computing node or different computing nodes in the cloud.
For ease of understanding, an example is described with reference to fig. 10. In fig. 10, when the user wants to perform information identification processing on an image of a target object, the target service interface is called on the user equipment E1 to send a call request containing the image of the target object to the cloud computing node E2. In this embodiment, it is assumed that an object detection model, a character recognition model and a plate-type recognition model run on the cloud computing node E2. Through these network models, the cloud computing node E2 identifies at least one information structure contained in the image and the mark information corresponding to the at least one information structure, and feeds the recognition result back to the user equipment E1; the user equipment E1 then displays the at least one information structure and its corresponding mark information on the interface, so that the user can operate on the information structures based on the mark information. In practical applications, many application fields may involve the problem of image information identification, and the technical scheme of the embodiment of the present invention can be used in them.
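The interaction between the user equipment E1 and the cloud computing node E2 can be pictured as a simple request/response exchange over the target service interface. The sketch below shows one hypothetical client-side call; the endpoint URL, payload layout and response fields are assumptions for illustration and do not correspond to an actual published API.

```python
import base64
import json
import urllib.request

# Hypothetical client-side call to the target service interface.
# The URL, payload layout and response fields are illustrative assumptions.
def recognize_image(image_path, endpoint="https://example-cloud/api/info-recognition"):
    with open(image_path, "rb") as f:
        payload = {"image": base64.b64encode(f.read()).decode("ascii")}
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    # Assumed response shape: a list of information structures plus mark info.
    return result.get("structures", []), result.get("marks", [])

# structures, marks = recognize_image("target_object.jpg")
```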
In a reimbursement scenario, in order to improve financial reimbursement efficiency, support persistent storage and management of reimbursement information, and make reimbursement more convenient for users, reimbursement can be handled without delivering the relevant cards and tickets to the financial staff in person; instead, reimbursement processing can be assisted and completed based on the scheme provided by the embodiment of the present invention.
Fig. 11 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 11, the method may include the following steps:
1101. A reimbursement image containing a card and/or ticket is acquired.
1102. At least one information structure contained in the reimbursement image and mark information corresponding to the at least one information structure are determined, wherein each information structure corresponds to a different field in the reimbursement image.
1103. The at least one information structure and the mark information corresponding to the at least one information structure are output, so that a user can complete reimbursement processing according to the mark information and the at least one information structure.
A user who needs reimbursement can place the cards and tickets required for reimbursement together and photograph them to obtain a reimbursement image containing the cards and tickets, and transmit the reimbursement image to the financial staff online. The financial staff call the service interface of the information identification service to upload the reimbursement image. By identifying the information structures contained in each of the cards and tickets included in the reimbursement image, the information structures contained in each card and ticket can be obtained, where each information structure may consist of a field position, a field attribute and field content. Meanwhile, while the information structures are being identified, the mark information corresponding to each information structure is determined, and each information structure is displayed with its corresponding mark information, so that the financial staff can give each information structure the appropriate degree of attention according to its mark information, check the accuracy of the recognition result of each information structure, correct any information structure with a recognition error, and store the corrected information structures. Since each information structure is data in text format, the financial staff can copy, edit, or otherwise operate on the field contents contained in each information structure as needed. After the corrected information structures are obtained, the field attribute and field content contained in each information structure may be stored in the reimbursement database in the form of key-value pairs according to a set storage policy.
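As a concrete illustration of storing the corrected field attributes and field contents as key-value pairs, the following minimal sketch writes them into a local SQLite table. The table name, column names and storage policy are assumptions, not a prescribed schema.

```python
import sqlite3

# Illustrative storage of corrected information structures as key-value pairs.
# Table and column names are assumptions rather than a prescribed schema.
def store_reimbursement(structures, db_path="reimbursement.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reimbursement "
        "(field_attribute TEXT, field_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO reimbursement (field_attribute, field_content) VALUES (?, ?)",
        [(s["field_attribute"], s["field_content"]) for s in structures],
    )
    conn.commit()
    conn.close()

store_reimbursement([
    {"field_attribute": "invoice amount", "field_content": "86.00"},
    {"field_attribute": "invoice date", "field_content": "2021-02-20"},
])
```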
In practical applications, paper forms such as tax receipts and customs declarations are used in some scenarios, and these forms are usually of a fixed format. Paper forms are not conducive to long-term storage, so they need to be digitally converted. Digital conversion does not simply mean photographing a paper form to obtain a corresponding image, because in practice there may be a need to perform statistics and analysis on the form data; to support such needs, the data content contained in the paper form needs to be stored in text form. This can be achieved by using the information identification scheme provided by the embodiment of the present invention.
Fig. 12 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 12, the method may include the following steps:
1201. A form image is acquired.
1202. At least one information structure contained in the form image and mark information corresponding to the at least one information structure are determined, wherein each information structure corresponds to a different field in the form image.
1203. The at least one information structure and the mark information corresponding to the at least one information structure are output, so that a user can complete information entry processing of the form image according to the mark information and the at least one information structure.
The form image can be obtained by shooting a certain form. A form contains a plurality of cells, and each cell can be considered as a field. Thus, one information structure may correspond to one cell.
After the information structure corresponding to each cell is identified from the form image and the label information corresponding to each information structure is determined, each information structure can be displayed according to the label information corresponding to each information structure, and each information structure can include the position, the attribute and the content of each cell.
The user distinguishes the different information structures based on their mark information, checks each information structure against the form image, and corrects any information structure that contains an error. After that, the corrected cell attributes and cell contents included in each information structure may be stored in a database in the form of key-value pairs. On this basis, assuming that a form has a payment-amount attribute, once a large number of forms have been stored, the user may trigger a statistical operation in the database, such as searching for payment amounts larger than a set amount, so as to obtain the number of forms whose payment amount exceeds that amount.
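The statistical operation mentioned above can be pictured as a simple aggregate query over the stored key-value data. The sketch below counts forms whose payment amount exceeds a set amount, assuming one table row per form with a numeric payment_amount column; the schema and sample values are assumptions for illustration.

```python
import sqlite3

# Illustrative statistical query over stored form data.
# The table layout (one row per form, numeric payment_amount) is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_records (form_id TEXT, payment_amount REAL)"
)
conn.executemany(
    "INSERT INTO form_records VALUES (?, ?)",
    [("F001", 1200.0), ("F002", 300.0), ("F003", 9800.0)],
)
threshold = 1000.0
count = conn.execute(
    "SELECT COUNT(*) FROM form_records WHERE payment_amount > ?", (threshold,)
).fetchone()[0]
print(f"Forms with payment amount above {threshold}: {count}")
conn.close()
```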
In the field of electronic commerce, there may also be a need for information extraction. For example, when a merchant needs to release a new product, the merchant registers the commodity information on the e-commerce platform (i.e., the commodity information is entered into the commodity database), and may also make a commodity promotion image to promote the commodity on the commodity interface, so that consumers can understand the commodity more comprehensively and in detail; the commodity promotion image therefore contains the commodity information. When the merchant registers the commodity information, part of it may be entered incorrectly. In this case, identifying the commodity information from the commodity promotion image can help the staff of the e-commerce platform check whether the commodity information entered by the merchant is wrong.
Fig. 13 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 13, the method may include the following steps:
1301. An image containing commodity information is acquired.
1302. At least one information structure contained in the image and mark information corresponding to the at least one information structure are determined, wherein each information structure corresponds to a different field in the image.
1303. The at least one information structure and the mark information corresponding to the at least one information structure are output, so that a user can check the entered commodity information according to the mark information and the at least one information structure.
In this embodiment, the image containing the commodity information may be the commodity promotion image described above. The commodity information may include a plurality of commodity attributes, such as commodity name, model, size, color, price and manufacturer. In this case, the different commodity attributes serve as different field attributes, and the attribute value of each commodity attribute is the corresponding field content.
After the information structure corresponding to each commodity attribute is identified from the image and the mark information corresponding to each information structure is determined, each information structure can be displayed according to its mark information, where each information structure comprises a commodity attribute, its attribute value, and the position of the commodity attribute in the image.
The staff of the e-commerce platform distinguish the different information structures based on their mark information, query the corresponding commodity information in the database (i.e., the commodity attributes and attribute values entered into the database), and compare it with the commodity information revealed by the information structures to determine whether the commodity information entered into the database is correct, correcting any incorrectly entered commodity attributes and attribute values.
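One way to picture this check is as a field-by-field comparison between the commodity attributes registered in the database and the attribute values extracted from the promotion image. The sketch below reports mismatching attributes; the dictionary layouts and sample values are assumptions made for illustration.

```python
# Illustrative comparison between registered commodity info and the
# attribute values extracted from the promotion image; layouts are assumed.

def find_mismatches(registered, extracted_structures):
    """Return {attribute: (registered_value, extracted_value)} for mismatches."""
    extracted = {s["field_attribute"]: s["field_content"] for s in extracted_structures}
    return {
        attr: (registered[attr], extracted[attr])
        for attr in registered
        if attr in extracted and registered[attr] != extracted[attr]
    }

registered = {"name": "Widget X", "color": "blue", "price": "199"}
extracted_structures = [
    {"field_attribute": "name", "field_content": "Widget X"},
    {"field_attribute": "color", "field_content": "blue"},
    {"field_attribute": "price", "field_content": "189"},
]
print(find_mismatches(registered, extracted_structures))  # {'price': ('199', '189')}
```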
In the medical field, a large number of electronic medical record images can be generated, and when an organization needs to perform statistics and analysis on medical records, some key information contained in the medical record images can be extracted based on the assistance of the information identification scheme provided by the embodiment of the invention, so that the medical record images can be managed and analyzed in depth based on the key information.
Fig. 14 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 14, the method may include the following steps:
1401. A medical record image is acquired.
1402. At least one information structure contained in the medical record image and mark information corresponding to the at least one information structure are determined, wherein each information structure corresponds to a different field in the medical record image.
1403. The at least one information structure and the mark information corresponding to the at least one information structure are output, so that a user can screen out medical record images meeting the requirements according to the mark information and the at least one information structure.
The medical record image may be an image obtained by photographing a medical record having a fixed plate type. The fields contained in the medical record image can comprise a plurality of fields corresponding to basic information of a user and a plurality of fields related to diagnosis.
By performing recognition on the medical record image, the information structure corresponding to each field contained in the medical record image and the mark information corresponding to each information structure can be obtained. The different information structures are distinguished based on their mark information, and any information structure with a recognition error is corrected by referring to the medical record image.
The information storage processing can be performed on each corrected information structure. For example, the information included in the medical record image is stored in a database, which may be a relational database, in a key-value pair form according to the field attribute and the field content included in each information structure.
Then, the database can be queried as needed to meet the statistical analysis requirements for various diseases. For example, by using the visit time and a certain disease as query keywords (the medical record images contain a field corresponding to the visit time and a field corresponding to the disease type), the medical record images of that disease generated within a set period of time can be queried and viewed.
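As an illustration of such a query, the sketch below filters stored medical record entries by disease type and a visit-time window. The record layout and field names are assumptions made only for this example.

```python
from datetime import date

# Illustrative query over stored medical record entries; the record layout
# (disease_type, visit_time, image_path) is an assumption for this example.
records = [
    {"disease_type": "pneumonia", "visit_time": date(2021, 1, 15), "image_path": "rec_001.jpg"},
    {"disease_type": "diabetes",  "visit_time": date(2021, 2, 3),  "image_path": "rec_002.jpg"},
    {"disease_type": "pneumonia", "visit_time": date(2020, 11, 8), "image_path": "rec_003.jpg"},
]

def query_records(records, disease, start, end):
    return [r["image_path"] for r in records
            if r["disease_type"] == disease and start <= r["visit_time"] <= end]

print(query_records(records, "pneumonia", date(2021, 1, 1), date(2021, 3, 31)))
```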
In an education scenario, a teacher may use presentation tools such as blackboard writing and slides (PPT) during a lesson, and students may photograph these to obtain teaching images. When students have captured a large number of teaching images, they subsequently face the need to sort and search through them as required.
Fig. 15 is a flowchart of another information identification method according to an embodiment of the present invention, and as shown in fig. 15, the method may include the following steps:
1501. A teaching image is acquired.
1502. At least one information structure contained in the teaching image and mark information corresponding to the at least one information structure are determined, wherein each information structure corresponds to a different field in the teaching image.
1503. The at least one information structure and the mark information corresponding to the at least one information structure are output, so that a user can screen out teaching images meeting the requirements according to the mark information and the at least one information structure.
A teacher usually has his or her own blackboard-writing or slide-editing habits, so the layout characteristics of the teaching images are relatively fixed. A teaching image may contain various knowledge points organized in a tree-like relationship, such as two subheadings juxtaposed under a certain heading. For example, the main heading may be "trigonometric functions", with subheadings including "sine", "cosine" and "tangent". In this embodiment, different fields in the teaching image may correspond to different knowledge-point headings.
One information structure body can comprise field attributes, field positions and field contents. In this embodiment, the field attribute refers to a category corresponding to the knowledge point, the field content refers to a name of the knowledge point, and the field position refers to a pixel position corresponding to a certain knowledge point in the teaching image.
After the at least one information structure contained in a teaching image and its corresponding mark information are obtained by the information identification scheme, each information structure can be displayed with its corresponding mark information, so that a student can quickly see which knowledge points the teaching image contains and screen out the teaching images he or she needs for viewing.
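A student's screening step can be sketched as a search over the knowledge-point names recorded in each image's information structures. The per-image data layout below is an illustrative assumption.

```python
# Illustrative screening of teaching images by knowledge-point name.
# The per-image structure layout is an assumption for this example.
teaching_images = {
    "lesson_05.jpg": [{"field_attribute": "subheading", "field_content": "sine"},
                      {"field_attribute": "subheading", "field_content": "cosine"}],
    "lesson_06.jpg": [{"field_attribute": "subheading", "field_content": "derivative"}],
}

def find_images_with_topic(images, topic):
    return [name for name, structures in images.items()
            if any(topic in s["field_content"] for s in structures)]

print(find_images_with_topic(teaching_images, "cosine"))  # ['lesson_05.jpg']
```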
The application scenarios to which the information identification scheme provided in the embodiment of the present invention can be applied are illustrated above only by taking several application fields as examples, and in fact, the present invention is not limited thereto.
The information identification device of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these information recognizing means can be constructed by configuring the steps taught in the present embodiment using commercially available hardware components.
Fig. 16 is a schematic structural diagram of an information identification apparatus according to an embodiment of the present invention, and as shown in fig. 16, the apparatus includes: display module 11, detection module 12.
The display module 11 is configured to display an image of the target object in a first display area of the interface.
A detection module 12, configured to determine at least one information structure included in the image and label information corresponding to the at least one information structure, where each information structure corresponds to a different field in the target object.
The display module 11 is further configured to output, in a second display area of the interface, the at least one information structure and the mark information corresponding to the at least one information structure.
Optionally, the target object comprises at least one card and/or ticket.
Optionally, the apparatus further comprises: and the interaction module is used for responding to the correction operation input by the user aiming at the target information structure body according to the marking information corresponding to the at least one information structure body, and executing the correction operation.
Optionally, the detection module 12 may be specifically configured to: determining the confidence degree corresponding to each of the at least one information structure; and determining marking information corresponding to the at least one information structure body according to the confidence degree corresponding to the at least one information structure body.
Each information structure includes field attributes, field locations, and field contents. Thus, optionally, the detection module 12 may be specifically configured to: for any information structure body in the at least one information structure body, determining the confidence level of the any information structure body according to at least one confidence level as follows: the confidence of the field position in any information structure body, the confidence of the field content in any information structure body, and the confidence of the corresponding relation between the field attribute and the field content in any information structure body.
Optionally, the detection module 12 may be specifically configured to: determining the confidence coefficient of any information structure body according to a preset confidence coefficient weight; wherein the confidence weight comprises at least one of: the first weight corresponding to the confidence coefficient of the field position, the second weight corresponding to the confidence coefficient of the field content and the third weight corresponding to the confidence coefficient of the corresponding relation between the field attribute and the field content.
Optionally, the third weight is greater than or equal to the second weight, which is greater than or equal to the first weight.
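To make the weighted combination concrete, the sketch below computes an information structure's confidence as a weighted sum of the three component confidences, with the third weight greater than or equal to the second and the second greater than or equal to the first, as stated above. The specific weight values are assumptions chosen only for illustration.

```python
# Minimal sketch of combining the three component confidences with preset
# weights (third >= second >= first); the weight values are illustrative.

def structure_confidence(pos_conf, content_conf, attr_content_conf,
                         w_position=0.2, w_content=0.3, w_attr_content=0.5):
    """Weighted confidence of an information structure."""
    total = w_position + w_content + w_attr_content
    return (w_position * pos_conf
            + w_content * content_conf
            + w_attr_content * attr_content_conf) / total

print(structure_confidence(pos_conf=0.98, content_conf=0.90, attr_content_conf=0.75))
```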
Optionally, the detection module 12 may be specifically configured to: for any information structure body in the at least one information structure body, if the confidence coefficient of the information structure body is smaller than a first preset threshold value, determining that the mark information corresponding to the information structure body is first mark information; if the confidence of any information structure body is between a first preset threshold and a second preset threshold, determining that the marking information corresponding to any information structure body is second marking information; and if the confidence coefficient of any information structure body is greater than a second preset threshold value, determining that the marking information corresponding to any information structure body is third marking information.
Optionally, the first mark information, the second mark information and the third mark information are different colors.
Optionally, the detection module 12 may be specifically configured to: identifying a category corresponding to the target object and a position area of the target object in the image through an object detection model; inputting the target image area extracted according to the position area into a character recognition model so as to recognize at least one group of character recognition results through the character recognition model, wherein each group of character recognition results comprises field positions and field contents; and inputting the target image area and the at least one group of character recognition results into a plate type recognition model corresponding to the category, so as to output at least one information structural body through the plate type recognition model, wherein each information structural body comprises field attributes, field positions and field contents.
Optionally, the confidence of the field position and the confidence of the field content are output by the word recognition model, and the confidence of the correspondence between the field attribute and the field content is output by the plate-type recognition model.
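The cooperation of the three models can be pictured as a simple pipeline: the object detection model yields a category and position area, the image is cropped to the target image area, the character recognition model returns field positions and field contents with their confidences, and the plate-type recognition model corresponding to the category assembles the information structures. In the sketch below, detect_objects, recognize_text and plate_models are placeholders standing in for the trained models, not a real API.

```python
# Pipeline sketch; detect_objects, recognize_text and plate_models are
# placeholders standing in for the trained models described above.

def recognize_information_structures(image, detect_objects, recognize_text, plate_models):
    structures = []
    for category, position_area in detect_objects(image):
        target_region = crop(image, position_area)      # extract the target image area
        text_results = recognize_text(target_region)    # [(field_position, field_content, conf), ...]
        plate_model = plate_models[category]             # model matching the object category
        structures.extend(plate_model(target_region, text_results))
    return structures

def crop(image, box):
    # Assumes an array-like image and a (x0, y0, x1, y1) position area.
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]
```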
Optionally, the apparatus further comprises: a first training module, configured to obtain a first training sample corresponding to the target object if the class corresponding to the target object is not identified in the image through the object detection model; determining first labeling information of the first training sample, wherein the first labeling information comprises information structural bodies contained in the first training sample, and each information structural body comprises a field attribute, a field position and field content; and training the plate-type recognition model according to the first training sample and the first marking information.
Optionally, the apparatus further comprises: the second training module is used for acquiring a second training sample according to the image if the class corresponding to the target object is not identified in the image through the object detection model; determining second labeling information of the second training sample, wherein the second labeling information comprises the category and the position area of each object contained in the second training sample; and training the object detection model according to the second training sample and the second marking information.
The apparatus shown in fig. 16 may perform the information identification method provided in the foregoing embodiment, and the detailed implementation process and technical effect refer to the description in the foregoing embodiment, which is not described herein again.
In one possible design, the structure of the information recognition apparatus shown in fig. 16 may be implemented as an electronic device, as shown in fig. 17, which may include: processor 21, memory 22, display 23. Wherein the memory 22 has stored thereon executable code which, when executed by the processor 21, makes the processor 21 at least to implement the information identification method as provided in the previous embodiments.
Optionally, the electronic device may further include a communication interface 24 for communicating with other devices.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the information identification method as provided in the foregoing embodiments.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by means of a necessary general hardware platform, and of course also by a combination of hardware and software. Based on this understanding, the essence of the above technical solutions, or the part that contributes over the prior art, may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM and optical storage.
The information identification method provided in the embodiments of the present invention may be executed by a program/software, which may be provided by the network side. The electronic device mentioned in the foregoing embodiments may download the program/software to a local non-volatile storage medium; when the information identification method needs to be executed, the program/software is read into memory by the CPU, which then executes it to implement the information identification method provided in the foregoing embodiments. The execution process may refer to the illustrations in fig. 1 to fig. 15.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (15)

1. An information identification method, comprising:
displaying an image of a target object in a first display area of an interface;
determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to a different field in the target object;
and outputting the at least one information structural body and mark information corresponding to the at least one information structural body in a second display area of the interface.
2. The method of claim 1, further comprising:
and responding to the correction operation input by the user for the target information structure body according to the marking information corresponding to the at least one information structure body, and executing the correction operation.
3. The method of claim 1, wherein determining label information corresponding to each of the at least one information structure comprises:
determining the confidence degree corresponding to each of the at least one information structure;
and determining marking information corresponding to the at least one information structure body according to the confidence degree corresponding to the at least one information structure body.
4. The method of claim 3, wherein each information structure includes a field attribute, a field location, and a field content, and determining the confidence level of each of the at least one information structure comprises:
for any information structure body in the at least one information structure body, determining the confidence level of the any information structure body according to at least one confidence level as follows:
the confidence of the field position in any information structure body, the confidence of the field content in any information structure body, and the confidence of the corresponding relation between the field attribute and the field content in any information structure body.
5. The method of claim 4, wherein determining the confidence level of any of the information structures comprises:
determining the confidence of any information structure body according to a preset confidence weight;
wherein the confidence weight comprises at least one of: the first weight corresponding to the confidence coefficient of the field position, the second weight corresponding to the confidence coefficient of the field content and the third weight corresponding to the confidence coefficient of the corresponding relation between the field attribute and the field content.
6. The method of claim 5, wherein the third weight is greater than or equal to the second weight, and wherein the second weight is greater than or equal to the first weight.
7. The method according to claim 3, wherein the determining the labeling information corresponding to each of the at least one information structure according to the confidence corresponding to each of the at least one information structure comprises:
for any information structure body in the at least one information structure body, if the confidence coefficient of the information structure body is smaller than a first preset threshold value, determining that the mark information corresponding to the information structure body is first mark information;
if the confidence of any information structure body is between a first preset threshold and a second preset threshold, determining that the marking information corresponding to any information structure body is second marking information;
and if the confidence of any information structure body is greater than a second preset threshold value, determining that the marking information corresponding to any information structure body is third marking information.
8. The method of claim 1, wherein the determining at least one information structure included in the image comprises:
identifying a category corresponding to the target object and a position area of the target object in the image through an object detection model;
inputting the target image area intercepted according to the position area into a character recognition model so as to recognize at least one group of character recognition results through the character recognition model, wherein each group of character recognition results comprises field positions and field contents;
and inputting the target image area and the at least one group of character recognition results into a plate type recognition model corresponding to the category, so as to output the at least one information structural body through the plate type recognition model, wherein each information structural body comprises field attributes, field positions and field contents.
9. The method of claim 8, further comprising:
if the category corresponding to the target object is not identified in the image through the object detection model, acquiring a first training sample corresponding to the target object;
determining first labeling information of the first training sample, wherein the first labeling information comprises information structural bodies contained in the first training sample, and each information structural body comprises a field attribute, a field position and field content;
and training the plate-type recognition model according to the first training sample and the first marking information.
10. The method of claim 1, wherein the target object is at least one card and/or ticket.
11. An information identifying apparatus, comprising:
the display module is used for displaying the image of the target object in a first display area of the interface;
a detection module, configured to determine at least one information structure included in the image and label information corresponding to the at least one information structure, where each information structure corresponds to a different field in the target object;
the display module is further configured to output, in a second display area of the interface, the at least one information structure and the mark information corresponding to the at least one information structure.
12. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to carry out the information identification method according to any one of claims 1 to 10.
13. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the information identification method of any one of claims 1 to 10.
14. An information identification method, comprising:
receiving a request for calling a target service interface by user equipment, wherein the request comprises an image of a target object;
executing the following steps by utilizing the processing resource corresponding to the target service interface:
determining at least one information structure body contained in the image and mark information corresponding to the at least one information structure body, wherein each information structure body corresponds to different fields in the target object;
and sending the at least one information structure body to the user equipment so that the user equipment displays the image in a first display area of an interface and outputs the at least one information structure body and the mark information corresponding to the at least one information structure body in a second display area.
15. An information identification method, comprising:
acquiring a reimbursement image containing a card and/or a bill;
determining at least one information structure contained in the reimbursement image and mark information corresponding to the at least one information structure, wherein each information structure corresponds to different fields in the reimbursement image;
and outputting the at least one information structure body and the mark information corresponding to the at least one information structure body respectively so that a user can complete reimbursement processing according to the mark information and the at least one information structure body.