CN114937158A - Image analysis method and related device - Google Patents

Image analysis method and related device

Info

Publication number
CN114937158A
Authority
CN
China
Prior art keywords
image
data
template
target
name
Prior art date
Legal status
Pending
Application number
CN202210705841.7A
Other languages
Chinese (zh)
Inventor
卜丽
Current Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202210705841.7A
Publication of CN114937158A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image analysis method and a related device, relating to the technical field of image processing. In the method, a target image template matching the image type of an acquired image to be analyzed is screened out from a preset candidate image template set; then, at least one preselected data name among the original data names contained in the image to be analyzed is obtained, together with the original position information of each in a reference coordinate system; further, a corresponding position mapping relation is obtained based on the obtained original position information and the expected position information, in the reference coordinate system, of the corresponding preselected data names among the target data names contained in the target image template, and the position of the image to be analyzed is adjusted based on this mapping relation to obtain an adjusted image to be analyzed; finally, each original data name is identified from the adjusted image to be analyzed. In this way, the accuracy of image analysis is improved.

Description

Image analysis method and related device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image analysis method and a related apparatus.
Background
With the rapid development of information technology, images are widely used to display and carry data, so parsing images helps in acquiring data information.
At present, text recognition is an important development direction in the technical field of computer vision, and can realize the analysis of images so as to acquire data information carried by the images.
For example, in a credit voucher approval scenario, an image carrying the credit data of a target object is parsed by text recognition: a fixed text box (frame) is used to extract and determine the data name of the credit data, so that the credit data corresponding to the recognized data name can be acquired; it is then judged whether the acquired credit data of the target object matches the credit voucher requirements, yielding a conclusion as to whether the target object meets them.
However, with the above image analysis method, if the data name of the credit data is not completely contained in the fixed text box, that is, if the acquisition range of the fixed text box does not completely overlap the position range of the data name, the data name cannot be completely acquired, and the credit data corresponding to it cannot be accurately acquired from the recognized data name.
Therefore, the above method limits the accuracy of image analysis.
Disclosure of Invention
The embodiment of the application provides an image analysis method and a related device, which are used for improving the accuracy of image analysis.
In a first aspect, an embodiment of the present application provides an image parsing method, where the method includes:
acquiring an image to be analyzed, and screening out, from a preset candidate image template set, a target image template matching the image type of the image to be analyzed; wherein the target image template is used to indicate the expected position, in a reference coordinate system, of each of at least one data name contained in an image;
performing image recognition on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in the reference coordinate system;
obtaining a corresponding position mapping relation based on the obtained at least one piece of original position information and the expected position information, in the reference coordinate system, of the corresponding at least one preselected data name among the target data names contained in the target image template, and performing position adjustment on the image to be analyzed based on the position mapping relation to obtain an adjusted image to be analyzed;
and identifying each original data name from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text recognition box respectively set for it.
In a second aspect, an embodiment of the present application further provides an image analysis apparatus, where the apparatus includes:
the screening module is used for acquiring an image to be analyzed and screening out, from a preset candidate image template set, a target image template matching the image type of the image to be analyzed; wherein the target image template is used to indicate the expected position, in a reference coordinate system, of each of at least one data name contained in an image;
the acquisition module is used for performing image recognition on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in the reference coordinate system;
the adjusting module is used for obtaining a corresponding position mapping relation based on the obtained at least one piece of original position information and the expected position information, in the reference coordinate system, of the corresponding at least one preselected data name among the target data names contained in the target image template, and for performing position adjustment on the image to be analyzed based on the position mapping relation to obtain an adjusted image to be analyzed;
and the identification module is used for identifying each original data name from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text recognition box respectively set for it.
In a possible embodiment, before acquiring the image to be analyzed, the screening module is further configured to:
the following operations are performed for each image type, respectively:
selecting candidate template configuration information corresponding to an image type from a preset template configuration information set; wherein the candidate template configuration information at least comprises: each data name corresponding to one image type and the expected position of each data name in a reference coordinate system;
generating a corresponding candidate image template aiming at one image type based on each data name and each expected position thereof contained in the candidate template configuration information;
and storing the obtained candidate image template into a preset candidate image template set.
In a possible embodiment, when a target image template matching with an image type of an image to be analyzed is screened out from a preset candidate image template set, the screening module is specifically configured to:
acquiring a forward selection keyword and a reverse selection keyword associated with the image type of the image to be analyzed; wherein the forward selection keyword characterizes content satisfying a preset correlation constraint condition with the image to be analyzed, and the reverse selection keyword characterizes content satisfying a preset difference constraint condition with the image to be analyzed;
screening at least one candidate image template which meets the correlation constraint condition corresponding to the forward selection keyword from the image template set, and screening an alternative target template which meets the difference constraint condition corresponding to the reverse selection keyword from the obtained at least one candidate image template;
and taking the alternative target template as a target image template.
In a possible embodiment, when screening out, from the obtained at least one candidate image template, an alternative target template satisfying the difference constraint condition corresponding to the reverse selection keyword, the screening module is specifically configured to:
if a plurality of alternative target templates satisfying the difference constraint condition corresponding to the reverse selection keyword are screened out from the at least one candidate image template, respectively obtaining the number of data names contained in each alternative target template;
obtaining the number arrangement order of each alternative target template based on the number of data names each contains;
and retaining the alternative target template satisfying a preset number arrangement order condition.
In a possible embodiment, after identifying the respective original data names, the identification module is further configured to:
for each original data name, the following operations are respectively executed:
determining a target data name corresponding to an original data name from all target data names;
acquiring each sub-category name, and the sub-category data corresponding to it, arranged within the data framing range set for that target data name in the reference coordinate system;
obtaining an output arrangement order for each sub-category name and its corresponding sub-category data based on their sub-category positions within the data framing range (a sketch follows this list);
and outputting each sub-category name and its sub-category data in sequence according to the obtained output arrangement order.
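For illustration only, this ordering step might be sketched in Python as follows; the entry structure and the top-to-bottom, left-to-right reading order are assumptions of the sketch, not details fixed by the application:

    def output_order(entries):
        # entries: list of dicts such as
        #   {"name": "<sub-category name>", "data": "<sub-category data>",
        #    "left": 104, "top": 431}
        # Sort into an assumed reading order: top-to-bottom, then left-to-right.
        return sorted(entries, key=lambda e: (e["top"], e["left"]))

    # Output the names and their data in the obtained output arrangement order.
    for entry in output_order([{"name": "B", "data": "2", "left": 10, "top": 50},
                               {"name": "A", "data": "1", "left": 10, "top": 20}]):
        print(entry["name"], entry["data"])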
In one possible embodiment, after obtaining each sub-category name, and its corresponding sub-category data, arranged within the data framing range set for the corresponding target data name in the reference coordinate system, and before obtaining the output arrangement order based on them, the identification module is further configured to:
for each sub-category name and its corresponding sub-category data, the following operations are respectively performed:
acquiring the overlapping area, in the reference coordinate system, between the data display range of the sub-category name and its sub-category data and the data framing range;
obtaining a first coincidence degree based on the data framing range and the overlapping area, and obtaining a second coincidence degree based on the data display range and the overlapping area;
and when either the first coincidence degree or the second coincidence degree is not less than a preset coincidence degree threshold, retaining the sub-category name and its sub-category data (a sketch follows).
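A minimal Python sketch of this retention rule follows; the 0.5 threshold is an assumption, and the boxes reuse the (Left, Top, Height, Width) convention described later for the templates:

    def overlap_area(box_a, box_b):
        # Boxes use the (Left, Top, Height, Width) convention.
        ax1, ay1 = box_a[0], box_a[1]
        ax2, ay2 = box_a[0] + box_a[3], box_a[1] + box_a[2]
        bx1, by1 = box_b[0], box_b[1]
        bx2, by2 = box_b[0] + box_b[3], box_b[1] + box_b[2]
        w = min(ax2, bx2) - max(ax1, bx1)
        h = min(ay2, by2) - max(ay1, by1)
        return max(w, 0) * max(h, 0)

    def keep_subcategory(framing_box, display_box, threshold=0.5):
        # Keep the sub-category name and its data when either coincidence
        # degree reaches the preset threshold.
        inter = overlap_area(framing_box, display_box)
        first = inter / (framing_box[2] * framing_box[3])   # vs. data framing range
        second = inter / (display_box[2] * display_box[3])  # vs. data display range
        return first >= threshold or second >= threshold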
In a third aspect, an electronic device is proposed, which comprises a processor and a memory, wherein the memory stores program code, which, when executed by the processor, causes the processor to perform the steps of the image parsing method of the first aspect.
In a fourth aspect, a computer-readable storage medium is proposed, which comprises program code for causing an electronic device to perform the steps of the image analysis method of the first aspect when the program code runs on the electronic device.
In a fifth aspect, a computer program product is provided, which, when invoked by a computer, causes the computer to perform the image parsing method steps as described in the first aspect.
The beneficial effects of this application are as follows:
in the image analysis method provided in the embodiment of the present application, a target image template matched with an image type of an acquired image to be analyzed is screened out from a preset candidate image template set, where the target image template is used to indicate: the expected position of each of at least one data name contained in the image in a reference coordinate system; then, carrying out image recognition on the image to be analyzed to obtain at least one preselected data name in all original data names contained in the image to be analyzed and original position information of each original data name in a reference coordinate system; further, based on the obtained at least one original position information and at least one pre-selected data name in each target data name contained in the target image template and the expected position information of each target data name in the reference coordinate system, obtaining a corresponding position mapping relation, and based on the position mapping relation, performing position adjustment on the image to be analyzed to obtain an adjusted image to be analyzed; and finally, identifying each original data name from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text identification box which is respectively arranged.
In this way, a corresponding position mapping relation is obtained from at least one preselected data name among the original data names contained in the image to be analyzed, the original position information of each in the reference coordinate system, and the expected position information, in the same reference coordinate system, of the corresponding preselected data names among the target data names contained in the target image template; the position of the image to be analyzed is then adjusted based on this position mapping relation. This avoids the prior-art problem that a data name is not completely contained in the fixed text box, i.e., that the acquisition range of the fixed text box does not completely overlap the position range of the data name, so that the data name cannot be completely acquired; the technical disadvantage that the data information corresponding to a data name cannot be accurately acquired from the recognized data name is thereby overcome, and the accuracy of image analysis is improved.
Furthermore, other features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. In the drawings:
FIG. 1 is a schematic diagram illustrating an effect of OCR recognition provided by an embodiment of the present application;
FIG. 2 illustrates an alternative schematic diagram of a system architecture to which embodiments of the present application are applicable;
FIG. 3 is a flowchart illustrating a method for obtaining a candidate image template according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a structure of template configuration information provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a structure of a candidate image template provided by an embodiment of the present application;
FIG. 6 schematically illustrates a method flow of an image parsing method provided in an embodiment of the present application;
FIG. 7 is a logic diagram for acquiring a target image template according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating a method for obtaining a target image template according to an embodiment of the present application;
FIG. 9 is a logic diagram for obtaining an alternative target template according to an embodiment of the present application;
FIG. 10 schematically illustrates a specific application scenario for obtaining an alternative target template according to an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating another specific application scenario for acquiring a target image template according to an embodiment of the present application;
FIG. 12 is a schematic diagram illustrating a specific application scenario based on FIG. 6 according to an embodiment of the present application;
FIG. 13 is a schematic flowchart illustrating a method for outputting a sub-category name and its sub-category data according to an embodiment of the present application;
FIG. 14 is a logic diagram for acquiring a sub-category name and its sub-category data according to an embodiment of the present application;
FIG. 15 is a diagram illustrating an exemplary logical decision on whether to retain a sub-category name and its sub-category data according to an embodiment of the present application;
FIG. 16 is a schematic diagram illustrating a specific application scenario for the output arrangement order of each sub-category name and its corresponding sub-category data according to an embodiment of the present application;
FIG. 17 is a schematic structural diagram of an image analysis apparatus according to an embodiment of the present application;
FIG. 18 schematically shows a structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes the association relationship of associated objects, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In the description of the present application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
In addition, in the technical scheme of the application, the data acquisition, transmission, use and the like all meet the requirements of relevant national laws and regulations.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Optical Character Recognition (English: Optical Character Recognition, abbreviation: OCR): refers to a process in which an electronic device (e.g., a scanner or a digital camera) checks a character printed on paper, determines its shape by detecting dark and light patterns, and then translates the shape into a computer text by a character recognition method; that is, for print characters, the technology of converting characters in a paper document into an image file of a black-and-white dot matrix in an optical mode, converting characters in the image into a text format through recognition software, and further editing and processing the text format by the character processing software includes the technologies of adjusting the rotation angle of the image, recognizing characters in the image, table lines and the like.
(2) Intelligent Character Recognition (English: Intelligent Character Recognition, abbreviation: ICR): builds on OCR by incorporating artificial-intelligence techniques based on computer deep learning. Using semantic reasoning and semantic analysis, characters that OCR fails to recognize can be completed from the contextual sentence information, combined with a knowledge base of the EAI semantic network. In the semantic reasoning process, a matching result is output only when the variable nodes in the unrecognized character's context fragments are completely and successfully matched with sentences in the semantic knowledge base, thereby making up for the deficiencies of the OCR technology.
(3) Natural Language Processing (english: Natural Language Processing, abbreviation: NLP): the method is an important direction in the fields of computer science and artificial intelligence, can realize various theories and methods for effectively communicating between people and computers by using natural language, and generally refers to a technology for processing natural language and understanding content, including word segmentation, part of speech tagging, named entity recognition, semantic classification, similarity matching and the like.
(4) Affine transformation: in geometry, one vector space is linearly transformed and then translated into another vector space, and the transformation mainly comprises the following steps: translation transformation, rotation transformation, scaling transformation (also called scale transformation), tilt transformation (also called shear transformation, offset transformation), flip transformation.
(5) Hypertext Markup Language (English: Hyper Text Mark-up Language, abbreviation: HTML) document: a document that can be read by various web browsers to generate web pages transmitting various information. Essentially, the Internet is a collection of transmission protocols and various documents, of which HTML documents are only one kind; these HTML documents are stored on server hard disks distributed all over the world, and users can remotely obtain the information they transmit through the transmission protocols.
(6) Span tag: a layout tag commonly used in HTML. A Span tag does not start a new line; that is, consecutive Span tags are usually rendered on the same line. For convenience of understanding and description, each data name mentioned herein corresponds to a Span tag.
(7) Artificial Intelligence (English: Artificial Intelligence, abbreviation: AI): the method is a theory, method, technology and application system for simulating, extending and expanding human intelligence by using a digital computer or a machine controlled by the digital computer, sensing the environment, acquiring knowledge and obtaining the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that spans a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and the like.
The following briefly introduces the design concept of the embodiments of the present application:
at present, text recognition (for example, OCR technology) is an important development direction in the field of computer vision technology, and can analyze an image, thereby obtaining data information carried by the image.
For example, in the credit voucher approval process, checking the semantic and textual consistency of document elements usually takes a lot of time and energy; accordingly, using OCR from the artificial-intelligence toolbox to extract characters and then NLP to parse and map the semantic elements is the main way for AI to realize intelligent credit voucher approval.
OCR extracts the characters of the image and produces a file in HTML format, generally presented as character blocks whose attributes include the character content, the coordinates in the image and the character size; this HTML file is the input to the subsequent document parsing and, as shown in fig. 1, has the following characteristics (an illustrative sketch follows the list):
(1) Continuous text is usually contained in the same Span tag, but a multi-line entity is usually split across different Span tags (a single entity over multiple lines); conversely, several different elements, or a key and its value, that lie close together may remain unsplit in a single Span tag;
(2) OCR recognition errors are passed directly to the parsing step; for example, "GOODS" may be recognized as "6OODS" because a watermark in the image is picked up.
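For orientation, character blocks carrying text content, coordinates and size can be obtained, for example, with the open-source pytesseract wrapper; this is only an analogous sketch, since the application itself does not prescribe a particular OCR engine or output format:

    import pytesseract
    from PIL import Image
    from pytesseract import Output

    img = Image.open("document.jpg")  # hypothetical scanned document
    data = pytesseract.image_to_data(img, output_type=Output.DICT)
    for text, left, top, w, h in zip(data["text"], data["left"], data["top"],
                                     data["width"], data["height"]):
        if text.strip():
            # Each block: character content plus its coordinates and size.
            print(text, (left, top, h, w))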
Therefore, in the prior art, three image analysis methods, namely rule analysis, OCR slicing and deep learning, are generally included, wherein each image analysis method has the following problems:
1. Rule-based parsing is usually affected by ICR recognition: the recognition result is easily disturbed, which in turn affects the search for and identification of values in subsequent steps. In addition, an image usually contains many elements to be recognized, whose key values are often inconsistent: some elements are laid out left-right and others top-bottom, and a left-right relationship may turn into an actual top-bottom relationship for post-printing reasons.
2. OCR slicing can cover scenes where elements are independent and table lines are present, but cannot be applied to scenes without table lines, scenes where post-printed content crosses table lines, and the like; in addition, errors in identifying and recovering table lines are also passed on to the subsequent parsing step.
3. Image parsing based on deep learning can in theory be applied to any image parsing scenario; in practice, however, the more complex the image, the higher the demand on the amount of labeled samples. An image may contain more than 80 elements, some of which are common while the rest are long-tailed; a deep-learning model identifies the common elements with high accuracy but the long-tailed elements with a low recognition rate. In a multi-line single-entity scenario, such as a company name entity, a single element is split across multiple lines, so the key semantic information may be concentrated in one particular line (such as a keyword like "Co."). In addition, documents from different systems differ greatly, and the various systems require more samples to learn the elements' forms of expression.
In view of this, the embodiments of the present application, based on characteristics such as the fixity of an image's presentation form, the differences between images of different types, and the low frequency with which the fixed form of an image category changes, provide an image analysis method that needs no data annotation and is easy to configure and maintain. Specifically: an image to be analyzed is acquired, and a target image template matching the image type of the image to be analyzed is screened out from a preset candidate image template set, wherein the target image template is used to indicate the expected position, in a reference coordinate system, of each of at least one data name contained in an image; then, image recognition is performed on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in the reference coordinate system; further, a corresponding position mapping relation is obtained based on the obtained at least one piece of original position information and the expected position information, in the reference coordinate system, of the corresponding preselected data names among the target data names contained in the target image template, and the position of the image to be analyzed is adjusted based on the position mapping relation to obtain an adjusted image to be analyzed; finally, each original data name is identified from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text recognition box respectively set for it.
In particular, the preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are only for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
Referring to fig. 2, a schematic diagram of a system architecture provided in the present embodiment is shown, where the system architecture includes: target terminals (201a, 201b) and a server 202. The target terminals (201a, 201b) and the server 202 can exchange information through a communication network, where the communication modes adopted by the communication network can include: wireless communication and wired communication.
Illustratively, the target terminal (201a, 201b) may communicate with the server 202 by accessing a network via a cellular Mobile communication technology, e.g., including a 5th Generation Mobile Networks (5G) technology.
Optionally, the target terminal (201a, 201b) may access the network via a short-range Wireless communication mode, for example, including Wireless Fidelity (Wi-Fi) technology, to communicate with the server 202.
In the embodiment of the present application, the number of the above-mentioned devices is not limited at all, and as shown in fig. 2, the target terminals (201a, 201b) and the server 202 are only taken as an example for description, and the above-mentioned devices and their respective functions are briefly described below.
A target terminal (201a, 201b) is a device that can provide voice and/or data connectivity to a user, comprising: a hand-held terminal device, a vehicle-mounted terminal device, etc. having a wireless connection function.
Illustratively, the target terminals (201a, 201b) include, but are not limited to: the Mobile terminal Device comprises a Mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable Device, a Virtual Reality (VR) Device, an Augmented Reality (AR) Device, a wireless terminal Device in industrial control, a wireless terminal Device in unmanned driving, a wireless terminal Device in a smart grid, a wireless terminal Device in transportation safety, a wireless terminal Device in a smart city, a wireless terminal Device in a smart home, and the like.
Furthermore, the target terminals (201a, 201b) may have associated clients installed thereon, and the clients may be software (e.g., APP, browser, short video software, etc.), or may be web pages, applets, etc. In the embodiment of the application, the acquired image to be analyzed can be sent to the server 202 by the target terminal (201a, 201 b).
The server 202 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
It should be noted that, in the embodiment of the present application, the server 202 is configured to acquire an image to be analyzed and screen out, from a preset candidate image template set, a target image template matching the image type of the image to be analyzed, the target image template being used to indicate the expected position, in a reference coordinate system, of each of at least one data name contained in an image; then, perform image recognition on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in the reference coordinate system; further, obtain a corresponding position mapping relation based on the obtained at least one piece of original position information and the expected position information, in the reference coordinate system, of the corresponding preselected data names among the target data names contained in the target image template, and perform position adjustment on the image to be analyzed based on the position mapping relation to obtain an adjusted image to be analyzed; and finally, identify each original data name from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text recognition box respectively set for it.
The image analysis method provided by the exemplary embodiment of the present application is described below in conjunction with the system architecture described above and with reference to the drawings, it should be noted that the system architecture described above is only shown for the convenience of understanding the spirit and principle of the present application, and the embodiment of the present application is not limited in any way in this respect.
Before implementing the image analysis method on the above system architecture, the server needs to obtain, in advance, the candidate image template corresponding to each image type. Referring to fig. 3, the following operations are performed for each image type:
s301: candidate template configuration information corresponding to an image type is selected from a preset template configuration information set.
Specifically, in step S301, the server selects candidate template configuration information corresponding to the image type from preset template configuration information based on a correspondence between the image type and the candidate template configuration information, where the candidate template configuration information at least includes: the data names corresponding to the one image type and their respective expected locations in the reference coordinate system.
Exemplarily, 5 image types are taken as an example, that is, the preset template configuration information set includes candidate template configuration information corresponding to each of the 5 image types, and then the candidate template configuration information corresponding to each of the image types is shown in table 1:
TABLE 1

Image type     Candidate template configuration information
image.type1    te.con.inf1
image.type2    te.con.inf2
image.type3    te.con.inf3
image.type4    te.con.inf4
image.type5    te.con.inf5
Obviously, based on the table, after acquiring one image type, the server may determine the template configuration information corresponding to the one image type based on the correspondence between the image type and the template configuration information. For example, it is assumed that the server obtains the image type image.type3, and then obtains the corresponding template configuration information te.con.inf3, and so on, which is not described herein again.
It should be noted that, as shown in fig. 4, the template configuration information may further include: the template Name, the template-configuration original image base.jpg, the forward selection keyword Positive Word and reverse selection keyword Negative Word, candidate data names (also called anchor information, such as Anchor1, Anchor2 and Anchor3), and the image-parsing Templates.
Here, the template Name identifies a unique positioning configuration template, such as "Insurance_PICC_case1"; the template-configuration original image base.jpg is used for subsequent positioning and template adjustment, and for comparing template differences; a candidate image template is considered matched when the condition is satisfied that the forward selection keyword Positive Word appears and the reverse selection keyword Negative Word does not. It should be noted that Positive Word and Negative Word need to be chosen carefully to avoid misidentification, so that the correct candidate image template is matched accurately.
S302: and generating a corresponding candidate image template aiming at one image type based on each data name and each expected position thereof contained in the candidate template configuration information.
Specifically, in step S302, after selecting the candidate template configuration information corresponding to the one image type, the server may generate a corresponding candidate image template for the one image type based on each data name and its expected position included in the candidate template configuration information.
For example, referring to fig. 5, which is a schematic structural diagram of a candidate image template provided in an embodiment of the present application, the data information contained in the candidate image template generated by the server is essentially the candidate template configuration information (each data name and its expected position) corresponding to the above image type. Taking the data name INSURED and its expected position as a brief example: "INSURED", (104, 431, 81, 24), where (104, 431, 81, 24) are (Left, Top, Height, Width) respectively, that is, the left-side distance of the data name INSURED relative to the positioning reference, its top distance relative to the positioning reference, its height in the browser's visible window, and its width in the browser's visible window.
Based on the candidate image template, the following can be implemented: matching candidate image templates against the forward selection keyword Positive Word and the reverse selection keyword Negative Word and selecting a suitable candidate image template; identifying the coordinates of the candidate data names (anchor information) in the image to be analyzed and calculating an affine transformation matrix between the image to be analyzed and the corresponding candidate image template, so that the position deviation of data-name key values caused by scanning distance, angle and similar problems is avoided and the key values are pulled back to the same positions as in base.jpg; and searching for all possible value contents within the coordinate range associated with each data name based on the image-parsing Templates.
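Putting these fields together, a template configuration of the shape described above could be represented as follows; the field names mirror the description, while the keyword lists and the Templates coordinate range are illustrative assumptions:

    template_config = {
        "Name": "Insurance_PICC_case1",          # unique positioning configuration template
        "base": "base.jpg",                      # original image for positioning and comparison
        "PositiveWords": ["INSURED", "POLICY"],  # assumed forward selection keywords
        "NegativeWords": ["INVOICE"],            # assumed reverse selection keyword
        "Anchors": {                             # anchor data names with expected positions,
            "INSURED": (104, 431, 81, 24),       # given as (Left, Top, Height, Width)
        },
        "Templates": {                           # coordinate range associated with each data
            "INSURED": (190, 431, 81, 300),      # name, searched for its value content (assumed)
        },
    }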
S303: and storing the obtained candidate image template into a preset candidate image template set.
Specifically, in step S303, after the server generates the candidate image template of the image type, the server may store the obtained candidate image template in a preset candidate image template set, so that after the image to be analyzed is obtained in the following step, the candidate image template matched with the image type of the image to be analyzed may be called from the preset candidate image template set according to the image type of the image to be analyzed.
Further, based on the above steps of the method for generating the candidate image template, after the server obtains the candidate image templates corresponding to the respective image types, referring to fig. 6, which is a flowchart of a method for implementing the image analysis method provided in the embodiment of the present application, the method includes the following specific implementation flows:
s601: and acquiring an image to be analyzed, and screening out a target image template matched with the image type of the image to be analyzed from a preset candidate image template set.
Specifically, referring to fig. 7, when step S601 is executed, after the server acquires the image to be analyzed, it may determine the image type to which the image belongs by a preset image type identification method, and then screen out, from the preset candidate image template set, the target image template matching that image type according to the forward selection keyword and reverse selection keyword associated with the image type and the correspondence between these keywords and the candidate image templates; here, the forward selection keyword characterizes content satisfying a preset correlation constraint condition with the image to be analyzed, and the reverse selection keyword characterizes content satisfying a preset difference constraint condition with the image to be analyzed.
Further, referring to fig. 8, it is a flowchart illustrating an implementation of a method for obtaining a target image template according to an embodiment of the present application, where the method is implemented in the following specific steps:
s6011: and acquiring a forward selection keyword and a reverse selection keyword which are associated with the image type of the image to be analyzed.
Specifically, when step S6011 is executed, after the server determines the image type of the image to be analyzed by a preset image type determination method, the forward selection keyword and the reverse selection keyword associated with the image type of the image to be analyzed may be determined according to a correspondence between the image type and the forward selection keyword and the reverse selection keyword.
S6012: and screening out at least one candidate image template meeting the correlation constraint condition corresponding to the forward selected keyword from the image template set, and screening out an alternative target template meeting the difference constraint condition corresponding to the reverse selected keyword from the obtained at least one candidate image template.
Specifically, referring to fig. 9, when executing step S6012, the server first acquires the forward selection keyword Positive Word and the reverse selection keyword Negative Word associated with the image type of the image to be analyzed; next, at least one candidate image template (for example, can.image.T1, can.image.T2 and can.image.T3) satisfying the correlation constraint condition cor.constraint corresponding to the forward selection keyword Positive Word is screened out from the image template set image.tem.c, and the alternative target template can.image.T2 satisfying the difference constraint condition dif.constraint corresponding to the reverse selection keyword Negative Word is screened out from the obtained candidate image templates.
In a possible implementation manner, in the process that the server screens out the candidate target templates meeting the difference constraint condition corresponding to the reversely selected keyword from the obtained at least one candidate image template, if a plurality of candidate target templates meeting the difference constraint condition corresponding to the reversely selected keyword are screened out from the at least one candidate image template, the number of data names contained in each candidate target template is respectively obtained; then, acquiring the quantity arrangement sequence of each candidate target template based on the quantity of the data names contained in each candidate target template; and finally, reserving the alternative target templates meeting the preset quantity arrangement order condition.
For example, referring to fig. 10, which is a schematic diagram of a specific application scenario for obtaining an alternative target template provided in an embodiment of the present application: in the process of screening, from the obtained candidate image templates (e.g., can.image.T1, can.image.T2, can.image.T3, can.image.T4 and can.image.T5), the alternative target templates satisfying the difference constraint condition dif.constraint corresponding to the reverse selection keyword Negative Word, suppose three alternative target templates are screened out, namely can.image.T1, can.image.T3 and can.image.T4, containing 11, 13 and 7 data names respectively; next, the number arrangement order of these alternative target templates is obtained as 2, 1 and 3 respectively; finally, the alternative target template can.image.T3, whose number arrangement order of 1 is highest, is taken as the alternative target template satisfying the preset number arrangement order condition Quant.Order.
Obviously, based on the above method steps, by using the number arrangement order of the alternative target templates to retain the one satisfying the preset number arrangement order condition, the server effectively avoids the situation in which, after several alternative target templates satisfying the difference constraint condition corresponding to the reverse selection keyword have been screened out, it is unclear which should serve as the subsequent target image template; this ensures to the greatest extent that the selected target image template meets the parsing requirements of the image to be analyzed.
In an optional implementation manner, referring to fig. 11, which is a schematic diagram of another specific application scenario for acquiring a target image template provided in an embodiment of the present application: after obtaining the HTML file associated with the image type of the image to be analyzed, the server may traverse the template configuration information of each candidate image template and judge whether all the forward selection keywords of that candidate image template appear in the HTML file: if yes, record the current candidate image template; if not, move on and judge the next candidate image template, until all candidate image templates in the image template set have been checked. Further, after screening out from the image template set the at least one candidate image template whose forward selection keywords all appear in the HTML file, judge whether the reverse selection keywords of each such candidate image template are absent from the HTML file: if yes, keep the current candidate image template; if not, remove the current candidate image template from the obtained at least one candidate image template.
It should be noted that, in the process of obtaining the target image template, if there is no alternative target template satisfying the difference constraint condition corresponding to the reverse selection keyword, and/or the reverse selection keywords of every candidate image template appear in the HTML file, it may be considered that no corresponding candidate template configuration information has been configured for the preset candidate image template set, and a corresponding candidate image template then needs to be generated.
S6013: and taking the alternative target template as a target image template.
For example, after screening out, from the obtained candidate image templates can.image.T1, can.image.T2 and can.image.T3, the alternative target template can.image.T2 satisfying the difference constraint condition dif.constraint corresponding to the reverse selection keyword Negative Word, the server may take the alternative target template can.image.T2 as the target image template tra.image.T.
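Under the assumption that each candidate image template carries configuration fields like those sketched earlier, steps S6011-S6013, including the tie-break of fig. 10, might look as follows in Python:

    def select_target_template(html_text, candidate_templates):
        # Forward selection: every Positive Word must occur in the OCR HTML file.
        hits = [t for t in candidate_templates
                if all(w in html_text for w in t["PositiveWords"])]
        # Reverse selection: no Negative Word may occur in the OCR HTML file.
        hits = [t for t in hits
                if not any(w in html_text for w in t["NegativeWords"])]
        if not hits:
            # No configuration matches: a new candidate image template must be configured.
            return None
        # Tie-break: retain the alternative target template with the most data names.
        return max(hits, key=lambda t: len(t["Anchors"]))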
S602: perform image recognition on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in a reference coordinate system.
Specifically, when step S602 is executed, after the server obtains the target image template, the server may perform image recognition on the image to be analyzed by using a preset position information obtaining method, so as to obtain at least one preselected data name in each original data name included in the image to be analyzed, and original position information of each preselected data name in the reference coordinate system.
For example, after the server performs image recognition on the image to be analyzed, three preselected data names among the original data names contained in the image to be analyzed (e.g., "Policy No.", "Surveyed By" and "Amount Insured") can be obtained, together with the original position information of each in the reference coordinate system: (L1, T1, H1, W1), (L2, T2, H2, W2) and (L3, T3, H3, W3).
S603: and obtaining a corresponding position mapping relation based on the obtained at least one original position information and at least one preselected data name in each target data name contained in the target image template and the expected position information of each target data name in the reference coordinate system, and carrying out position adjustment on the image to be analyzed based on the position mapping relation to obtain the adjusted image to be analyzed.
Specifically, when step S603 is executed, after obtaining at least one preselected data name among the original data names contained in the image to be analyzed, the server may obtain a corresponding position mapping relation, such as an affine transformation matrix, by combining it with the expected position information, in the reference coordinate system, of the corresponding preselected data names among the target data names contained in the target image template, so as to perform position adjustment on the image to be analyzed based on the position mapping relation and obtain the adjusted image to be analyzed.
Illustratively, taking again the three preselected data names above (e.g., "Policy No.", "Surveyed By" and "Amount Insured") with their original position information (L1, T1, H1, W1), (L2, T2, H2, W2) and (L3, T3, H3, W3) in the reference coordinate system, and combining the expected position information in the reference coordinate system of the same three data names contained in the target image template, (L'1, T'1, H'1, W'1), (L'2, T'2, H'2, W'2) and (L'3, T'3, H'3, W'3), the position mapping relation between the image to be analyzed and the target image template, namely the affine transformation matrix H, is obtained.
Further, the server transforms each original data name contained in the image to be analyzed to an expected position consistent with the target image template based on the obtained position mapping relationship, namely, an affine transformation matrix H, wherein a transformation formula of the original position information and the expected position information is specifically as follows:
[x', y', z']^T = H [x, y, z]^T
where [x, y, z]^T denotes the original position coordinates, and [x', y', z']^T denotes the expected position coordinates corresponding to the expected position information.
For example, the reference coordinate system may take the third (homogeneous) coordinate as 1, i.e., the original position coordinates are [x, y, 1]^T and the corresponding expected position coordinates are [x', y', 1]^T.
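As a sketch only (the application does not prescribe a library), the affine transformation matrix H can be estimated from three matched anchor points and applied to the whole image, for example with OpenCV; all coordinate values below are illustrative:

    import cv2
    import numpy as np

    # (Left, Top) corners of three preselected data names: original positions in the
    # image to be analyzed and expected positions from the target image template.
    src = np.float32([[104, 431], [512, 433], [107, 760]])
    dst = np.float32([[100, 420], [505, 420], [100, 748]])

    H = cv2.getAffineTransform(src, dst)         # 2x3 affine transformation matrix
    image = cv2.imread("to_parse.jpg")           # hypothetical image to be analyzed
    h, w = image.shape[:2]
    adjusted = cv2.warpAffine(image, H, (w, h))  # image pulled back onto the template

    # Per-point form of the formula above, in homogeneous coordinates [x, y, 1]^T:
    x_new, y_new = H @ np.array([104.0, 431.0, 1.0])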
S604: and identifying each original data name from the adjusted image to be analyzed based on each target data name in the corresponding target image template and the text identification box which is respectively arranged.
Based on the above image analysis method steps, and referring to fig. 12, which is a schematic view of a specific application scenario of an image analysis method provided in an embodiment of the present application: after acquiring the image to be analyzed, the server may select, from a preset candidate image template set Image.Tem.C, a target image template Tra.Image.T that matches the image type Image.Type of the image to be analyzed; then, image recognition is performed on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed (for example, Pre.Data.Name1, Pre.Data.Name2 and Pre.Data.Name3), and the original position information of each in the reference coordinate system (for example, Ori.Loca.Inform1, Ori.Loca.Inform2 and Ori.Loca.Inform3); further, based on the obtained original position information and the same preselected data names among the target data names contained in the target image template, together with the expected position information of each in the reference coordinate system (for example, Exp.Loca.Inform1, Exp.Loca.Inform2 and Exp.Loca.Inform3), a corresponding position mapping relationship Loca.Map.Rela is obtained, and the position of the image to be analyzed is adjusted based on the position mapping relationship Loca.Map.Rela to obtain an adjusted image to be analyzed; finally, based on the text recognition boxes (for example, Text.Idfi.Box1 through Text.Idfi.Box5) respectively set for the target data names (for example, Tar.Data.Name1 through Tar.Data.Name5) in the corresponding target image template Tra.Image.T, the original data names (for example, Ori.Data.Name1, Ori.Data.Name2 and Ori.Data.Name3) are identified from the adjusted image to be analyzed.
In summary, in the image analysis method provided in the embodiments of the present application, a target image template matched with the image type of an acquired image to be analyzed is screened from a preset candidate image template set, where the target image template is used to indicate the expected position, in a reference coordinate system, of each of at least one data name contained in the image; then, image recognition is performed on the image to be analyzed to obtain at least one preselected data name among the original data names contained in the image to be analyzed, and the original position information of each in the reference coordinate system; further, a corresponding position mapping relationship is obtained based on the obtained original position information and the expected position information, in the reference coordinate system, of the same preselected data names among the target data names contained in the target image template, and the position of the image to be analyzed is adjusted based on the position mapping relationship to obtain an adjusted image to be analyzed; finally, each original data name is recognized from the adjusted image to be analyzed based on the text recognition boxes respectively set for the target data names in the corresponding target image template.
In this way, a corresponding position mapping relationship is obtained based on at least one preselected data name among the original data names contained in the image to be analyzed and the original position information of each in the reference coordinate system, together with the same preselected data names among the target data names contained in the target image template and the expected position information of each in the reference coordinate system; the position of the image to be analyzed is then adjusted based on the position mapping relationship. This avoids the problem in the prior art that a data name is not completely contained in a fixed text box, that is, the acquisition range corresponding to the fixed text box does not completely overlap the position range corresponding to the data name, so that the data name cannot be completely acquired; it thereby also overcomes the technical disadvantage that the data information corresponding to a data name cannot be accurately acquired from the identified data name, and so improves the accuracy of image analysis.
Further, after identifying each original data name contained in the image to be analyzed, the server may obtain, for each original data name, the corresponding sub-category names and their respective sub-category data. To this end, referring to fig. 13, the following operations are performed for each original data name:
s1301: from the target data names, a target data name corresponding to one original data name is determined.
Specifically, in step S1301, after obtaining an original name in the image to be parsed, the server may determine a target data name corresponding to the original name from the target data names included in the target image template.
It should be noted that the original data names contained in the image to be analyzed are a subset of the target data names contained in the target image template.
S1302: and acquiring the names of the corresponding target data and the sub-category data corresponding to the names of the sub-categories, which are arranged in the data framing range of the reference coordinate system.
Specifically, as shown in fig. 14, in step S1302, after obtaining the target data name corresponding to the one original data name, the server may determine the data framing range preset in the reference coordinate system for that target data name, so as to obtain each sub-category name and its corresponding sub-category data within this data framing range.
In one possible implementation manner, referring to fig. 15, after acquiring each sub-category name and its corresponding sub-category data, the server performs the following operations for each sub-category name and its corresponding sub-category data, respectively. First, for one sub-category name and its sub-category data, the overlapping area $S_c$ between the data display range and the data framing range in the reference coordinate system is acquired. Next, a first coincidence degree is obtained based on the data framing range $S_k$ and the overlapping area $S_c$:

$$\alpha_1 = \frac{S_c}{S_k},$$

and a second coincidence degree is obtained based on the data display range $S_z$ and the overlapping area $S_c$:

$$\alpha_2 = \frac{S_c}{S_z}.$$

Finally, when at least one of the first coincidence degree and the second coincidence degree is not less than the preset coincidence degree threshold $\alpha_Y$, the one sub-category name and its sub-category data are retained; when both the first coincidence degree and the second coincidence degree are less than the preset coincidence degree threshold $\alpha_Y$, the one sub-category name and its sub-category data are not retained.
Illustratively, assume a preset coincidence degree threshold $\alpha_Y = 0.5$ (an illustrative value; any threshold in $(0.25, 2/3]$ gives the same outcomes in the two examples below), and let the coincidence degree be $\alpha = \max(\alpha_1, \alpha_2)$. If the data framing range is $S_k = 10$ and the overlapping area is $S_c = 2$, the corresponding first coincidence degree is

$$\alpha_1 = \frac{S_c}{S_k} = \frac{2}{10} = 0.2,$$

and based on the data display range $S_z = 3$ and the overlapping area $S_c = 2$, the second coincidence degree is

$$\alpha_2 = \frac{S_c}{S_z} = \frac{2}{3} \approx 0.67.$$

It can then be seen that $\alpha_1 < \alpha_Y$ while $\alpha_2 \geq \alpha_Y$; since the second coincidence degree $\alpha_2$ is not less than the preset coincidence degree threshold $\alpha_Y$, the one sub-category name and its sub-category data are retained.
Similarly, if the data framing range is $S_k = 10$ and the overlapping area is $S_c = 2$, the corresponding first coincidence degree is

$$\alpha_1 = \frac{S_c}{S_k} = \frac{2}{10} = 0.2,$$

and based on the data display range $S_z = 8$ and the overlapping area $S_c = 2$, the second coincidence degree is

$$\alpha_2 = \frac{S_c}{S_z} = \frac{2}{8} = 0.25.$$

Both $\alpha_1 = 0.2$ and $\alpha_2 = 0.25$ are less than the preset coincidence degree threshold $\alpha_Y$; therefore, the one sub-category name and its sub-category data are not retained.
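The retention decision above may be sketched as follows; rectangles are (L, T, H, W) tuples in the reference coordinate system, and the default threshold is merely illustrative:

```python
# Sketch of the coincidence-degree check for one sub-category. Rectangles are
# (L, T, H, W) tuples in the reference coordinate system; the default threshold
# value is illustrative, not taken from the patent.
def rect_overlap_area(a, b):
    """Overlapping area of two (L, T, H, W) rectangles."""
    ax, ay, ah, aw = a
    bx, by, bh, bw = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)

def keep_subcategory(display_rect, framing_rect, threshold=0.5):
    """Retain the sub-category iff some coincidence degree reaches the threshold."""
    s_c = rect_overlap_area(display_rect, framing_rect)  # overlapping area S_c
    s_k = framing_rect[2] * framing_rect[3]              # framing range area S_k
    s_z = display_rect[2] * display_rect[3]              # display range area S_z
    alpha1, alpha2 = s_c / s_k, s_c / s_z                # first / second degree
    return max(alpha1, alpha2) >= threshold
```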
It is easy to see that this method is friendly to the long-tail elements of the image to be analyzed: a data name that appears only rarely belongs to the long tail of the whole sample set, yet it can still be parsed reliably simply by configuring constraint conditions such as its data framing range.
S1303: and obtaining the output arrangement sequence of each sub-category name and the sub-category data corresponding to the sub-category name based on each sub-category name and the sub-category data corresponding to the sub-category name respectively and the sub-category position of each sub-category within the data framing range.
Specifically, as shown in fig. 16, in step S1303, after the server obtains each sub-category name and its corresponding sub-category data, the server can obtain the output arrangement order of each sub-category name and its corresponding sub-category data based on each sub-category name and its corresponding sub-category data and its corresponding sub-category position within the data frame.
S1304: and outputting corresponding subcategory names and subcategory data thereof in sequence according to the obtained output arrangement sequence.
It should be further noted that, based on the above method steps, after the server obtains the corresponding sub-category names and their sub-category data, non-standard interference may still exist: for example, different data contents may share the same sub-category name, and/or a compound sub-category name may need to be split into a plurality of single sub-category names. Therefore, the subsequent steps further perform splitting of compound sub-category names and cleaning of single sub-category names, which specifically includes: the splitting of compound sub-category names can be configured in the templates separately, according to the different compound sub-category names of different images to be analyzed; the cleaning and standardized conversion of single sub-category names can be completed by deleting explicit key elements, deleting irrelevant spaces, and the like. In this way, each independent data-name key-value pair, after being sent to the subsequent steps, can complete the consistency audit of the corresponding data information.
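The splitting and cleaning just described might look like the following sketch; the compound-name table, the separator, and the cleaning rules are all per-template configuration, so the concrete values here are assumptions:

```python
# Sketch of compound-name splitting and single-name cleaning. The compound-name
# table, the "/" separator, and the cleaning rules are per-template
# configuration; the values here are hypothetical.
import re

COMPOUND_SPLIT_RULES = {"Name/Address": ["Name", "Address"]}  # assumed entry

def split_compound(name, value):
    """Split a compound sub-category into single key-value pairs when configured."""
    parts = COMPOUND_SPLIT_RULES.get(name)
    if not parts:
        return [(name, value)]
    values = re.split(r"\s*/\s*", value, maxsplit=len(parts) - 1)
    return list(zip(parts, values))

def clean_single(name, value):
    """Delete explicit key elements and irrelevant spaces to standardize the pair."""
    value = re.sub(rf"^{re.escape(name)}\s*[:：]?\s*", "", value)  # drop echoed key
    return name.strip(), re.sub(r"\s+", " ", value).strip()
```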
Therefore, based on the above method steps: because matching is performed against a target image template, each type of image corresponds to one candidate image template, and each data name has its own region division and value constraints, so data names do not interfere with one another and are easy to configure and maintain. Because the image to be analyzed is corrected through the preselected data names (anchor points) and the position mapping relationship (affine transformation), the influence of rotation, deformation and the like of the image to be analyzed is avoided, so no constraint is imposed on the usage scenario. Whether a piece of data information is needed is determined by the definition of the position and expression form of its data name, which reduces the interference of OCR errors in keys on the positioning and matching of data names with their corresponding data information, and thus improves recognition accuracy. In addition, the method requires no annotation, the model is lightweight, and its hardware resource requirements are low.
Further, based on the same technical concept, an embodiment of the present application further provides an image analysis apparatus, which is used for implementing the above method flows of the embodiments of the present application. Referring to fig. 17, the image analysis apparatus includes: a screening module 1701, an obtaining module 1702, an adjusting module 1703, and an identifying module 1704, wherein:
a screening module 1701, configured to acquire an image to be analyzed and screen out, from a preset candidate image template set, a target image template matched with the image type of the image to be analyzed; wherein the target image template is used to indicate: the expected position, in a reference coordinate system, of each of at least one data name contained in the image;
An obtaining module 1702, configured to perform image recognition on an image to be analyzed, to obtain at least one preselected data name among original data names included in the image to be analyzed, and original position information of each original data name in a reference coordinate system;
an adjusting module 1703, configured to obtain a corresponding position mapping relationship based on the obtained at least one piece of original position information, at least one preselected data name in each target data name included in the target image template, and expected position information of each in a reference coordinate system, and perform position adjustment on the image to be analyzed based on the position mapping relationship to obtain an adjusted image to be analyzed;
an identifying module 1704, configured to identify, based on the text identification boxes respectively set for the names of the target data in the corresponding target image template, the names of the original data from the adjusted image to be analyzed.
In one possible embodiment, before acquiring the image to be analyzed, the screening module 1701 is further configured to:
the following operations are performed for each image type, respectively:
selecting candidate template configuration information corresponding to an image type from a preset template configuration information set; wherein the candidate template configuration information at least comprises: each data name corresponding to one image type and the expected position of each data name in a reference coordinate system;
generating a corresponding candidate image template aiming at one image type based on each data name and each expected position thereof contained in the candidate template configuration information;
and storing the obtained candidate image template into a preset candidate image template set.
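For illustration, the candidate template generation performed by the screening module might operate on template configuration information of the following (assumed) shape:

```python
# Sketch of generating a candidate image template from template configuration
# information; the schema and the concrete values are assumptions for
# illustration only.
TEMPLATE_CONFIG = {
    "image_type": "insurance_policy",                            # hypothetical type
    "data_names": {
        "Policy No.":     {"expected_pos": (40, 60, 18, 120)},   # (L, T, H, W)
        "Surveyed By":    {"expected_pos": (40, 300, 18, 140)},
        "Amount Insured": {"expected_pos": (400, 60, 18, 160)},
    },
}

def build_candidate_template(config):
    """Map each data name to its expected position in the reference coordinate system."""
    return {name: spec["expected_pos"] for name, spec in config["data_names"].items()}

# Store the generated candidate image template in the preset candidate set.
CANDIDATE_TEMPLATE_SET = {
    TEMPLATE_CONFIG["image_type"]: build_candidate_template(TEMPLATE_CONFIG),
}
```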
In a possible embodiment, when a target image template matching with an image type of an image to be parsed is screened out from a preset candidate image template set, the screening module 1701 is specifically configured to:
acquiring a forward selection keyword and a reverse selection keyword associated with the image type of the image to be analyzed; wherein the forward selection keyword represents: satisfying a preset correlation constraint condition with the image to be analyzed; and the reverse selection keyword represents: satisfying a preset difference constraint condition with the image to be analyzed;
screening at least one candidate image template which meets the correlation constraint condition corresponding to the forward selection keyword from the image template set, and screening an alternative target template which meets the difference constraint condition corresponding to the reverse selection keyword from the obtained at least one candidate image template;
and taking the alternative target template as a target image template.
In a possible embodiment, when screening out, from the obtained at least one candidate image template, an alternative target template that satisfies a difference constraint condition corresponding to a reversely selected keyword, the screening module 1701 is specifically configured to:
if a plurality of candidate target templates meeting the difference constraint condition corresponding to the reversely selected keyword are screened out from at least one candidate image template, the number of data names contained in each candidate target template is respectively obtained;
obtaining the quantity arrangement sequence of each alternative target template based on the quantity of the data names contained in each alternative target template;
and reserving the alternative target templates meeting the preset quantity arrangement order condition.
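A sketch of this screening logic, assuming the correlation and difference constraint conditions are realized as keyword containment tests over the OCR text of the image, and the quantity arrangement order condition means keeping the template with the most data names:

```python
# Sketch of target-image-template screening with forward/reverse keywords; the
# containment test and the "most data names wins" tie-break are assumptions.
def screen_target_template(ocr_text, templates):
    """templates: dicts with 'forward' / 'reverse' keyword lists and 'data_names'."""
    candidates = [
        t for t in templates
        if all(k in ocr_text for k in t["forward"])        # correlation constraint
        and not any(k in ocr_text for k in t["reverse"])   # difference constraint
    ]
    if not candidates:
        return None
    # If several alternative target templates remain, rank them by the number of
    # data names they contain and keep the first in that quantity order.
    return max(candidates, key=lambda t: len(t["data_names"]))
```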
In a possible embodiment, after identifying the original data names, the identifying module 1704 is further configured to:
for each original data name, the following operations are respectively executed:
determining a target data name corresponding to an original data name from all target data names;
acquiring the names of the sub-categories and the sub-category data corresponding to the names of the target data, which are arranged in the data framing range of the reference coordinate system;
obtaining the output arrangement order of each sub-category name and its corresponding sub-category data, based on each sub-category name and its corresponding sub-category data and on the sub-category position of each within the data framing range;
and outputting corresponding subcategory names and subcategory data thereof in sequence according to the obtained output arrangement sequence.
In one possible embodiment, after acquiring each sub-category name and its corresponding sub-category data arranged within the data framing range of the reference coordinate system corresponding to the target data name, and before obtaining the output arrangement order based on each sub-category name and its corresponding sub-category data, the identifying module 1704 is further configured to:
aiming at each subcategory name and each corresponding subcategory data thereof, the following operations are respectively executed:
acquiring a subcategory name and subcategory data thereof, and acquiring the overlapping area of a data display range and a data framing range in a reference coordinate system;
obtaining a first coincidence degree based on the data framing range and the overlapping area, and obtaining a second coincidence degree based on the data display range and the overlapping area;
and when the coincidence degree which is not less than the preset coincidence degree threshold value exists in the first coincidence degree and the second coincidence degree, retaining one subcategory name and subcategory data thereof.
Based on the same technical concept, the embodiment of the present application further provides an electronic device, and the electronic device can implement the image analysis method provided by the above embodiment of the present application. In one embodiment, the electronic device may be a server, a terminal device, or other electronic device. As shown in fig. 18, the electronic apparatus may include:
at least one processor 1801, and a memory 1802 connected to the at least one processor 1801. The specific connection medium between the processor 1801 and the memory 1802 is not limited in the embodiments of the present application; fig. 18 takes the connection of the processor 1801 and the memory 1802 through a bus 1800 as an example. The bus 1800 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration it is represented in fig. 18 by only one thick line, but this does not mean that there is only one bus or one type of bus, and the connections between the other components are likewise merely illustrative and not limiting. Alternatively, the processor 1801 may also be referred to as a controller, and the name is not limiting.
In the embodiment of the present application, the memory 1802 stores instructions executable by the at least one processor 1801, and the at least one processor 1801 may execute the instructions stored in the memory 1802 to perform an image parsing method as discussed above. The processor 1801 may implement the functions of the various modules in the apparatus shown in fig. 17.
The processor 1801 is the control center of the apparatus; it may connect various parts of the entire control device by using various interfaces and lines, and performs the various functions of the apparatus and processes its data by running or executing the instructions stored in the memory 1802 and calling the data stored in the memory 1802, thereby monitoring the apparatus as a whole.
In one possible design, the processor 1801 may include one or more processing units, and the processor 1801 may integrate an application processor, which handles primarily operating systems, user interfaces, application programs, and the like, and a modem processor, which handles primarily wireless communications. It is to be appreciated that the modem processor described above may not be integrated into the processor 1801. In some embodiments, the processor 1801 and the memory 1802 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 1801 may be a general-purpose processor, such as a CPU, digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of an image analysis method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 1802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1802 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disc, and so on. The memory 1802 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1802 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
By programming the processor 1801, the code corresponding to the image analysis method described in the foregoing embodiments can be solidified into the chip, so that the chip can execute the steps of the image analysis method of the embodiment shown in fig. 6 when running. How to program the processor 1801 is a technique well known to those skilled in the art and will not be described herein.
Based on the same inventive concept, the present application also provides a storage medium storing computer instructions, which when run on a computer, cause the computer to execute an image parsing method as discussed above.
In some possible embodiments, various aspects of the image parsing method provided by the present application may also be implemented in the form of a program product comprising program code; when the program product runs on an apparatus, the program code is used to cause a control device to perform the steps of the image parsing method according to the various exemplary embodiments of the present application described above in this specification.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to the embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit; conversely, the features and functions of one unit described above may be further divided among, and embodied by, a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. An image analysis method, comprising:
acquiring an image to be analyzed, and screening out a target image template matched with the image type of the image to be analyzed from a preset candidate image template set; wherein the target image template is to indicate: the expected position of each of at least one data name contained in the image in a reference coordinate system;
carrying out image recognition on the image to be analyzed to obtain at least one preselected data name in all original data names contained in the image to be analyzed and original position information of each original data name in the reference coordinate system;
obtaining a corresponding position mapping relationship based on the obtained at least one piece of original position information and the expected position information, in the reference coordinate system, of each of the at least one preselected data name among the target data names contained in the target image template, and performing position adjustment on the image to be analyzed based on the position mapping relationship to obtain an adjusted image to be analyzed;
and identifying each original data name from the adjusted image to be analyzed based on text recognition boxes respectively arranged corresponding to each target data name in the target image template.
2. The method of claim 1, wherein before the acquiring the image to be analyzed, the method further comprises:
the following operations are performed for each image type, respectively:
selecting candidate template configuration information corresponding to an image type from a preset template configuration information set; wherein the candidate template configuration information at least comprises: each data name corresponding to the one image type and the expected position of each data name in the reference coordinate system;
generating a corresponding candidate image template for the image type based on each data name and each expected position thereof contained in the candidate template configuration information;
and storing the obtained candidate image template into a preset candidate image template set.
3. The method of claim 1, wherein the screening out a target image template matching the image type of the image to be parsed from a preset candidate image template set comprises:
acquiring a forward selection keyword and a reverse selection keyword associated with the image type of the image to be analyzed; wherein the forward selection keyword represents: satisfying a preset correlation constraint condition with the image to be analyzed; and the reverse selection keyword represents: satisfying a preset difference constraint condition with the image to be analyzed;
screening at least one candidate image template which meets the correlation constraint condition corresponding to the forward selection keyword from the image template set, and screening an alternative target template which meets the difference constraint condition corresponding to the reverse selection keyword from the obtained at least one candidate image template;
and taking the alternative target template as the target image template.
4. The method according to claim 3, wherein the screening, from the obtained at least one candidate image template, of an alternative target template that satisfies the difference constraint condition corresponding to the reversely selected keyword further comprises:
if a plurality of candidate target templates which meet the difference constraint condition corresponding to the reversely selected keyword are screened out from the at least one candidate image template, the number of data names contained in each candidate target template is respectively obtained;
obtaining the respective quantity arrangement sequence of each candidate target template based on the quantity of the data name contained in each candidate target template;
and reserving the alternative target templates meeting the preset quantity arrangement order condition.
5. The method of any of claims 1-4, wherein after the identifying each original data name, the method further comprises:
for each original data name, respectively executing the following operations:
determining a target data name corresponding to an original data name from the target data names;
acquiring each subcategory name and subcategory data corresponding to the subcategory name, which are arranged in a data framing range of the reference coordinate system and correspond to the target data name;
based on the sub-category names and the sub-category data corresponding to the sub-category names, obtaining output arrangement sequences of the sub-category names and the sub-category data corresponding to the sub-category names in the sub-category positions in the data framing range;
and outputting corresponding subcategory names and subcategory data thereof in sequence according to the obtained output arrangement sequence.
6. The method of claim 5, wherein after the acquiring each sub-category name and its corresponding sub-category data arranged within the data framing range of the reference coordinate system corresponding to the target data name, and before the obtaining based on each sub-category name and its corresponding sub-category data, the method further comprises:
aiming at each subcategory name and each corresponding subcategory data thereof, the following operations are respectively executed:
acquiring a subcategory name and subcategory data thereof, and acquiring the overlapping area of a data display range and a data framing range in the reference coordinate system;
obtaining a first overlap ratio based on the data framing range and the overlap area, and obtaining a second overlap ratio based on the data display range and the overlap area;
and when the coincidence degree which is not less than a preset coincidence degree threshold value exists in the first coincidence degree and the second coincidence degree, retaining the one subcategory name and the subcategory data thereof.
7. An image analysis apparatus, comprising:
the screening module is used for acquiring an image to be analyzed and screening out a target image template matched with the image type of the image to be analyzed from a preset candidate image template set; wherein the target image template is to indicate: the expected position of each of at least one data name contained in the image in a reference coordinate system;
the acquisition module is used for carrying out image identification on the image to be analyzed to obtain at least one preselected data name in all original data names contained in the image to be analyzed and original position information of each original data name in the reference coordinate system;
an adjusting module, configured to obtain a corresponding position mapping relationship based on the obtained at least one original position information and the expected position information of each of the at least one preselected data name in each target data name included in the target image template in the reference coordinate system, and perform position adjustment on the image to be analyzed based on the position mapping relationship to obtain an adjusted image to be analyzed;
and the identification module is used for identifying each original data name from the adjusted image to be analyzed based on text identification boxes which are respectively arranged corresponding to each target data name in the target image template.
8. The apparatus of claim 7, wherein before the acquiring the image to be analyzed, the screening module is further configured to:
the following operations are performed for each image type, respectively:
selecting candidate template configuration information corresponding to an image type from a preset template configuration information set; wherein the candidate template configuration information at least comprises: each data name corresponding to the one image type and the expected position of each data name in the reference coordinate system;
generating a corresponding candidate image template for the image type based on each data name and each expected position thereof contained in the candidate template configuration information;
and storing the obtained candidate image template into a preset candidate image template set.
9. The apparatus of claim 7, wherein when the target image template matching the image type of the image to be parsed is screened out from a preset set of candidate image templates, the screening module is specifically configured to:
acquiring a forward selection keyword and a reverse selection keyword associated with the image type of the image to be analyzed; wherein the forward selection keyword represents: satisfying a preset correlation constraint condition with the image to be analyzed; and the reverse selection keyword represents: satisfying a preset difference constraint condition with the image to be analyzed;
screening at least one candidate image template which meets the correlation constraint condition corresponding to the forward selection keyword from the image template set, and screening an alternative target template which meets the difference constraint condition corresponding to the reverse selection keyword from the obtained at least one candidate image template;
and taking the alternative target template as the target image template.
10. The apparatus according to claim 9, wherein when the candidate target templates that satisfy the difference constraint condition corresponding to the reversely-selected keyword are screened from the obtained at least one candidate image template, the screening module is specifically configured to:
if a plurality of candidate target templates meeting the difference constraint condition corresponding to the reversely selected keyword are screened out from the at least one candidate image template, respectively acquiring the number of data names contained in each candidate target template;
obtaining the respective quantity arrangement sequence of each candidate target template based on the quantity of the data name contained in each candidate target template;
and reserving the alternative target templates meeting the preset quantity arrangement sequence condition.
11. The apparatus of any of claims 7-10, wherein after the identifying each original data name, the identifying module is further configured to:
for each original data name, respectively executing the following operations:
determining a target data name corresponding to an original data name from the target data names;
acquiring each subcategory name and subcategory data corresponding to the subcategory name, which are arranged in a data framing range of the reference coordinate system and correspond to the target data name;
based on the sub-category names and the sub-category data corresponding to the sub-category names, obtaining output arrangement sequences of the sub-category names and the sub-category data corresponding to the sub-category names in the sub-category positions in the data framing range;
and outputting corresponding subcategory names and subcategory data thereof in sequence according to the obtained output arrangement sequence.
12. The apparatus of claim 11, wherein after the acquiring each sub-category name and its corresponding sub-category data arranged within the data framing range of the reference coordinate system corresponding to the target data name, and before the obtaining based on each sub-category name and its corresponding sub-category data, the identification module is further configured to:
for each subcategory name and each corresponding subcategory data thereof, respectively executing the following operations:
acquiring a subcategory name and subcategory data thereof, and acquiring the overlapping area of a data display range and a data framing range in the reference coordinate system;
obtaining a first overlap ratio based on the data framing range and the overlap area, and obtaining a second overlap ratio based on the data display range and the overlap area;
and when the coincidence degree which is not less than a preset coincidence degree threshold value exists in the first coincidence degree and the second coincidence degree, retaining the one subcategory name and the subcategory data thereof.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
15. A computer program product, which, when called by a computer, causes the computer to perform the method of any one of claims 1 to 6.
CN202210705841.7A 2022-06-21 2022-06-21 Image analysis method and related device Pending CN114937158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210705841.7A CN114937158A (en) 2022-06-21 2022-06-21 Image analysis method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210705841.7A CN114937158A (en) 2022-06-21 2022-06-21 Image analysis method and related device

Publications (1)

Publication Number Publication Date
CN114937158A (en) 2022-08-23

Family

ID=82867758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210705841.7A Pending CN114937158A (en) 2022-06-21 2022-06-21 Image analysis method and related device

Country Status (1)

Country Link
CN (1) CN114937158A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination