CN116246294B - Image information identification method, device, storage medium and electronic equipment - Google Patents
- Publication number
- CN116246294B (application CN202211552240.3A)
- Authority
- CN
- China
- Prior art keywords
- template
- image
- preset
- target image
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/1918—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present disclosure relates to an image information recognition method, an image information recognition apparatus, a storage medium, and an electronic device. The image information recognition method comprises: acquiring a target image; invoking a preset layout classification model to perform image layout recognition on the target image and obtain a first template recognition result of the target image; invoking a preset adaptation-degree calculation model to perform adaptation processing on the target image against a template database and obtain a second template recognition result of the target image; determining a target template based on the first template recognition result and the second template recognition result; and performing content recognition on the target image based on the template information of the target template to obtain content description information of the target image. The method and device adapt the template to the target image automatically, handle the structured-output problem of recognizing materials in multiple formats efficiently and accurately, and save the labor and material resources of the related business applications.
Description
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to an image information recognition method, an image information recognition device, a storage medium, and an electronic device.
Background
Business applications that use materials with fixed formats usually need to recognize those materials. When materials in multiple fixed formats are recognized in bulk, the different formats cannot be distinguished automatically and the recognition results cannot be structured, which makes such business applications difficult.
For example, in a cross-border payment scenario, each payment requires corresponding trade materials to prove the authenticity of the transaction. These trade materials are varied, of different types and in multiple formats, and lack distinguishing features. They are mainly licenses, bills, and contracts in picture form, such as enterprise registration certificates, business registration certificates, annual declarations, customs clearance slips, maritime bills of lading, air waybills, and freight bills, and the same type of material from different countries may also come in different formats. Structured information cannot be extracted efficiently and accurately from materials in so many formats, which affects auditing quality, business safety, and the payment experience.
Disclosure of Invention
In order to solve at least one technical problem set forth above, the present disclosure proposes an image information recognition method, an apparatus, a storage medium, and an electronic device.
According to an aspect of the present disclosure, there is provided an image information recognition method including:
acquiring a target image;
invoking a preset layout classification model to perform image layout recognition on the target image to obtain a first template recognition result of the target image;
invoking a preset adaptation-degree calculation model to perform adaptation processing on the target image against a template database to obtain a second template recognition result of the target image;
determining a target template based on the first template recognition result and the second template recognition result; and
performing content recognition on the target image based on the template information of the target template to obtain content description information of the target image.
In some possible embodiments, the invoking a preset adaptation-degree calculation model to perform adaptation processing on the target image against a template database to obtain a second template recognition result of the target image includes:
performing image transformation on the target image based on each preset template in the template database to obtain a first transformed image corresponding to each preset template;
performing adaptation processing on each preset template and its corresponding first transformed image to obtain the degree of adaptation between each preset template and the target image; and
determining the second template recognition result based on the degrees of adaptation.
In some possible embodiments, the performing adaptation processing on each preset template and its corresponding first transformed image to obtain the degree of adaptation between each preset template and the target image includes:
acquiring the preset labeling area of each preset template;
performing region matching between the preset labeling area and the first transformed image to obtain the region to be recognized in the first transformed image that corresponds to each preset template; and
determining the degree of adaptation based on the preset labeling area of each preset template and its corresponding region to be recognized, where the degree of adaptation characterizes the degree of coincidence between the preset labeling area and the region to be recognized.
In some possible implementations, the performing image transformation on the target image based on each preset template in the template database to obtain a first transformed image corresponding to each preset template includes:
performing perspective transformation on the target image to obtain a second transformed image;
determining a plurality of pieces of first vertex information of the second transformed image and a plurality of pieces of second vertex information of the preset template;
determining vertex distance information between the second transformed image and the preset template based on the first vertex information and the second vertex information; and
if the vertex distance information meets a first preset condition, determining the second transformed image as the first transformed image corresponding to the preset template.
In some possible embodiments, the method further comprises:
if the vertex distance information does not meet the first preset condition, repeating the perspective transformation, vertex information determination, and vertex distance determination steps to obtain an updated second transformed image and updated vertex distance information; and
if the updated vertex distance information meets the first preset condition, determining the updated second transformed image as the first transformed image corresponding to the preset template.
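The perspective-alignment loop above can be sketched as follows. This is a numpy-only sketch under stated assumptions: the patent does not specify how candidate perspective transformations are produced, so here they are assumed to come from an external estimator and are simply tried in turn against the vertex-distance condition.

```python
import numpy as np

def apply_homography(H, points):
    """Apply a 3x3 perspective transformation H to an (N, 2) array of points."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # back to Cartesian

def vertex_distance(H, image_corners, template_corners):
    """Mean distance between warped image vertices and template vertices --
    the 'vertex distance information' checked against the preset condition."""
    warped = apply_homography(H, np.asarray(image_corners, float))
    return float(np.mean(np.linalg.norm(warped - np.asarray(template_corners, float), axis=1)))

def align_until_fit(image_corners, template_corners, candidate_transforms, threshold=5.0):
    """Try candidate perspective transformations in turn and accept the first
    one whose vertex distance meets the first preset condition (< threshold)."""
    for H in candidate_transforms:
        d = vertex_distance(H, image_corners, template_corners)
        if d < threshold:
            return H, d
    return None, None  # no candidate met the condition
```

The threshold value and corner ordering are illustrative; in practice they would follow the template's annotation convention.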
In some possible implementations, the invoking a preset layout classification model to perform image layout recognition on the target image to obtain a first template recognition result of the target image includes:
performing convolution processing on the target image to obtain image features;
performing content region recognition on the target image to obtain a content region to be recognized;
performing content recognition on the content region to be recognized to obtain content description features;
performing feature fusion processing on the image features and the content description features to obtain target features; and
inputting the target features into a classifier for image layout classification to obtain the first template recognition result corresponding to the target image.
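The fusion-and-classification flow above can be sketched as follows. This is a minimal numpy sketch, not the patent's implementation: concatenation is assumed as the fusion operation, and the linear-head weights `W`, `b` stand in for a trained classifier.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_layout(image_features, content_features, W, b, template_names):
    """Fuse the image features and content description features by
    concatenation, apply a linear classification head, and return the
    template with the largest predicted probability -- the first
    template recognition result."""
    target_features = np.concatenate([image_features, content_features])  # feature fusion
    probs = softmax(W @ target_features + b)
    best = int(np.argmax(probs))
    return template_names[best], float(probs[best])
```

A real system would obtain `image_features` from a convolutional backbone and `content_features` from text recognition of the content regions; only the fusion-then-classify step is shown here.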
In some possible embodiments, before the acquiring a target image, the method further includes:
acquiring an initial image;
and performing perspective projection transformation on the initial image to obtain the target image.
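A perspective projection transformation of this kind is fully determined by four point correspondences. A minimal numpy sketch might look like the following; it solves the same 8-equation linear system that OpenCV's `getPerspectiveTransform` solves, so the correspondence points are the only assumed inputs.

```python
import numpy as np

def perspective_transform_from_points(src, dst):
    """Solve for the 3x3 perspective transformation that maps four src
    points onto four dst points, via the standard 8-equation linear system."""
    A, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)   # fix the scale so H[2][2] == 1

def warp_point(H, p):
    """Apply the perspective transformation H to a single 2-D point."""
    u, v, w = H @ np.array([p[0], p[1], 1.0])
    return (u / w, v / w)
```

In a full pipeline the transformation would then be applied to every pixel of the initial image (e.g. with `cv2.warpPerspective`) to produce the target image.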
According to a second aspect of the present disclosure, there is provided an image information identifying apparatus, the apparatus comprising:
the target image acquisition module is used for acquiring a target image;
the first template recognition result determining module is used for calling a preset layout classification model to perform image layout recognition on the target image so as to obtain a first template recognition result of the target image;
the second template recognition result determining module is used for calling a preset adaptation degree calculation model, and carrying out adaptation processing on the target image and a template database so as to obtain a second template recognition result of the target image;
the target template determining module is used for determining a target template based on the first template recognition result and the second template recognition result;
the content recognition module is used for performing content recognition on the target image based on the template information of the target template to obtain the content description information of the target image.
According to a third aspect of the present disclosure, there is provided an electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the image information identification method according to any one of the first aspects by executing the instructions stored by the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the image information identification method according to any one of the first aspects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The implementation of the present disclosure has the following beneficial effects:
A target image is acquired. A preset layout classification model is invoked to perform image layout recognition on the target image and obtain a first template recognition result, so that first template recognition is performed on the target image by the preset layout classification model. A preset adaptation-degree calculation model is invoked to perform adaptation processing on the target image against a template database and obtain a second template recognition result, so that second template recognition is performed on the target image by calculating degrees of adaptation. A target template is determined based on the first template recognition result and the second template recognition result: the two recognition results are compared, and the template corresponding to the better result is taken as the final target template. Content recognition is then performed on the target image based on the template information of the target template to obtain the content description information of the target image, i.e., a structured output result. The complementary advantages of the deep learning model and the adaptation-degree model improve the accuracy and efficiency of template classification, solve the problem that automatic structured recognition cannot be performed when materials in multiple formats are indistinguishable, and save labor and material resources.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
To explain the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the prior-art description are briefly introduced below. The drawings described below are obviously only some embodiments of the present specification, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 illustrates a schematic diagram of an application environment in accordance with an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of an image information identification method according to an embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of a second template recognition result determination method according to an embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a fitness computing method according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a first transformed image determination method according to an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a second transformed image update method according to an embodiment of the present disclosure;
FIG. 7 illustrates a flow diagram of a first template recognition result determination method according to an embodiment of the present disclosure;
FIG. 8 shows a flow diagram of a target image determination method according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram showing a configuration of an image information recognition apparatus according to an embodiment of the present disclosure;
fig. 10 shows a block diagram of another electronic device according to an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings in the embodiments of the present specification. The described embodiments are obviously only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the protection scope of the present disclosure.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that data used in this way may be interchanged where appropriate, so that the embodiments described herein can be implemented in sequences other than those illustrated or described. Furthermore, the terms "comprises", "comprising", "having", and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application, and as shown in fig. 1, the application environment may at least include a terminal 01 and a server 02. In practical applications, the terminal 01 and the server 02 may be directly or indirectly connected through a wired or wireless communication manner, so as to implement interaction between the server 02 and the terminal 01, which is not limited herein.
The server 02 in the embodiment of the present application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), big data, and artificial intelligence platforms. Specifically, the server 02 may include an entity device, which may include a network communication unit, a processor, a memory, and the like, and may include software running in the entity device, such as an application program. In the embodiment of the present application, the server 02 may be used to provide network services and data storage services for the terminal 01.
In this embodiment of the present application, the terminal 01 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a smart television, a smart speaker, a smart wearable device, a vehicle terminal device, and other types of entity devices, and may also include software running in the entity devices, such as an application program, and the like. Specifically, the terminal 01 may perform image information identification on the target image to obtain content description information of the target image.
Fig. 2 is a flowchart illustrating an image information recognition method according to an embodiment of the present disclosure, as shown in fig. 2, the method includes:
s101, acquiring a target image;
the target image corresponding to the material to be identified is obtained, the material to be identified can be a text document with a fixed format, and the material to be identified can also be a document comprising multi-mode document elements with the fixed format, wherein the document elements comprise but are not limited to picture elements, text elements and the like. The target image may be an image of the imaged data of the material to be identified after perspective projection transformation.
In some embodiments, the material to be recognized includes but is not limited to a license, document, or contract with a fixed format; illustratively, the fixed-format material may be an enterprise registration certificate, a business registration certificate, an annual declaration, a customs clearance slip, a first-class bill of lading, a second-class bill of lading, or a shipping manifest.
S102, invoking a preset layout classification model to perform image layout recognition on a target image to obtain a first template recognition result of the target image;
the preset layout classification model is a deep learning model which is constructed in advance based on a preset method and used for carrying out layout classification on an input image, a target image is input into the preset layout classification model, and a template type with the largest prediction probability, namely a first template recognition result, is output.
In some embodiments, the preset layout classification model may be trained on a convolutional neural network and multiple classified training image sets; illustratively, image layout recognition is performed on the target image based on the convolutional neural network model VGG16.
S103, invoking a preset adaptation-degree calculation model, and performing adaptation processing on the target image against a template database to obtain a second template recognition result of the target image;
the preset adaptation degree calculation model is used for calculating the adaptation degree of any template in the target image and the database based on a preset mode, and continuously sequencing all templates in the database based on the adaptation degree, so that a template with the highest adaptation degree, namely a second recognition result, is finally obtained.
In some embodiments, the degree of adaptation between the target image and each template is calculated using the intersection-over-union (IoU) method.
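A plain-Python sketch of IoU over axis-aligned boxes, the kind of overlap measure the degree of adaptation described here could use:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2);
    here it scores the overlap between a template's labeling area and the
    matched region in the transformed target image."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

IoU ranges from 0 (disjoint) to 1 (identical), which matches the percentage-style adaptation degrees used in the examples below.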
S104, determining a target template based on the first template recognition result and the second template recognition result;
and comparing the prediction probability corresponding to the first template recognition result with the adaptation degree corresponding to the second template recognition result, and taking the template corresponding to the optimal recognition result as a target template.
For example, if the maximum prediction probability of the first template recognition result is 90% with template category "second-class bill of lading", and the maximum degree of adaptation of the second template recognition result is 92% with template category "first-class bill of lading", then comparing the two results determines that the target template is the first-class bill of lading with a degree of adaptation of 92%.
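The comparison in this example can be sketched as follows, treating prediction probability and degree of adaptation as directly comparable scores, as the example does:

```python
def select_target_template(first_result, second_result):
    """Each result is a (template_name, score) pair: prediction probability
    for the layout classifier, degree of adaptation for the template match.
    The template with the higher score becomes the target template."""
    return max(first_result, second_result, key=lambda r: r[1])
```

With the figures from the example, the 92% adaptation-degree match wins over the 90% classifier prediction.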
S105, carrying out content recognition on the target image based on the template information of the target template to obtain content description information of the target image.
The template information of the target template is used to recognize the corresponding content of the target image, obtaining structured content description information. The content description information may be output in the form of key + value, where the key represents the type of information corresponding to the value, including but not limited to sender, recipient, or bill of lading number.
In some embodiments, based on each preset labeling area of the target template, an open-source character recognition tool is invoked on the corresponding region of the input target image to obtain a character recognition result; all preset labeling areas of the target template are traversed, finally yielding the structured key + value result of the target image.
Illustratively, the optimal target template corresponding to the target image is determined to be a first-class bill-of-lading template, which includes fields such as the first-class bill-of-lading number, shipper, receiver, and ship name. The target image is recognized based on the first-class bill-of-lading template to obtain structured content description information, for example, first-class bill-of-lading number: G154721; shipper: Zhang San; receiver: Li Si; ship name: Ideal. Here the first-class bill-of-lading number is the key and G154721 is the value.
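The traversal of labeling areas that produces such a key + value result can be sketched as follows. The `ocr` callable is a stand-in for whatever open-source character recognition tool is used, and the template layout and field names are illustrative, not taken from the patent.

```python
def extract_structured_content(template, aligned_image, ocr):
    """Traverse every preset labeling area of the target template, run
    character recognition on the corresponding region of the aligned target
    image, and collect the structured key -> value result."""
    return {key: ocr(aligned_image, box)
            for key, box in template["labeling_areas"].items()}
```

Each labeling area's name becomes the key and the recognized text in that region becomes the value, giving the structured output directly.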
Optical character recognition (OCR) of images is already applied across industries: sorting express packages in logistics, entering check documents in finance, license plate recognition in traffic, and card and bill recognition in daily life. OCR has become a common artificial intelligence capability, but general character recognition outputs a semi-structured result line by line. When processing the varied trade materials of different countries in cross-border payment, general character recognition only outputs individual lines of text; the result is unstructured, and in particular the structuring problem of indistinguishable multi-format materials cannot be handled, making business application difficult. The technical solution of the present disclosure integrates the advantages of the deep learning model's prediction result and the traditional image matching algorithm's adaptation-degree calculation. This improves the accuracy of finding the optimal template, reduces the training time of the deep learning model and the workload of sample labeling, and can quickly support recognition of new template material types without iteratively updating the deep model, improving the accuracy and efficiency of template classification through the complementary advantages of the deep learning model and the adaptation-degree model. A template self-adaptation strategy is proposed that, with its short development cycle and high recognition rate, efficiently and accurately solves the structuring problem of multi-template materials that cannot be distinguished.
Referring to fig. 3, in some embodiments, invoking a preset adaptation-degree calculation model to perform adaptation processing on the target image against the template database to obtain a second template recognition result of the target image includes:
S1031, performing image transformation on the target image based on each preset template in the template database to obtain a first transformed image corresponding to each preset template;
S1032, performing adaptation processing on each preset template and its corresponding first transformed image to obtain the degree of adaptation between each preset template and the target image;
S1033, determining the second template recognition result based on the degrees of adaptation.
The template database is traversed. Before the adaptation degree between the target image and a given preset template in the template database is calculated, image transformation is performed on the target image with respect to that preset template to obtain a first transformed image, so that the target image is aligned relative to the preset template. After the target image is aligned relative to the preset template, the adaptation degree between the aligned target image and the preset template is calculated, all preset templates in the template database are sorted by adaptation degree from large to small, and the template type with the largest adaptation degree is determined as the second template recognition result.
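The traversal described above can be sketched as follows; the alignment and fitness steps are passed in as functions, and all names are illustrative assumptions rather than identifiers from the original:

```python
def find_best_template(target_image, templates, align, fitness):
    # Traverse the template database: align the target image to each preset
    # template (yielding the first transformed image), compute the adaptation
    # degree, and keep the template with the largest adaptation degree.
    scored = []
    for template in templates:
        aligned = align(target_image, template)
        if aligned is None:  # alignment failed; this preset template is discarded
            continue
        scored.append((fitness(template, aligned), template))
    # Sort by adaptation degree from large to small; the best match is the
    # second template recognition result.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1] if scored else None
```

This keeps the transformation and the adaptation-degree computation pluggable, matching the two-step structure of S1031 and S1032.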
Illustratively, the preset templates of the template database include, but are not limited to, templates corresponding to licenses, documents or contracts, such as: enterprise registration certificate templates, business registration certificate templates, annual declaration templates, customs clearance templates, first-type bill of lading templates, second-type bill of lading templates, or shipping manifest templates.
According to the technical scheme, the template in the template database that best matches the target image is determined by calculating the adaptation degree between each preset template and the target image that has been aligned relative to that preset template.
Referring to fig. 4, in some embodiments, performing an adaptation process on each preset template and the corresponding first transformed image of each preset template to obtain an adaptation degree between each preset template and the target image, including:
s10321, obtaining a preset labeling area of each preset template;
s10322, performing region matching on the preset labeling region and the first transformation image to obtain a region to be identified, corresponding to each preset template, in the first transformation image;
s10323, determining the adaptation degree based on the preset labeling area of each preset template and the area to be identified corresponding to each preset template, wherein the adaptation degree represents the coincidence degree between the preset labeling area and the area to be identified.
A preset labeling area is set in advance for each preset template. Because the first transformed image is aligned relative to the preset template, the area to be identified in the first transformed image is determined based on the preset labeling area of each preset template. The coincidence ratio between the preset labeling area of each preset template and the corresponding area to be identified is then calculated by a preset calculation method.
In some embodiments, the preset labeling areas of each preset template are pre-labeled with anchor boxes. The coincidence ratio match_score between the preset labeling areas of each preset template and the corresponding areas to be identified is calculated using the intersection over union (IoU), with the following formula:

match_score = (1/N) * Σ_{i=1}^{N} IoU(Anchor_i, pre_anchor_i)

where N is the number of preset labeling areas in each preset template, Anchor_i is the i-th preset labeling area of each preset template, and pre_anchor_i is the i-th area to be identified in the first transformed image corresponding to each preset template.
Traversing preset templates in a template database, calculating the adaptation degree of each preset template and the first transformation image corresponding to each preset template, and finally obtaining the preset template with the highest adaptation degree.
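The coincidence-ratio calculation described above can be sketched in pure Python. The (x1, y1, x2, y2) box representation is an assumption for illustration; the original does not specify a box format:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_score(anchors, pred_anchors):
    # Average IoU between the N preset labeling areas and the corresponding
    # areas to be identified in the first transformed image.
    return sum(iou(a, p) for a, p in zip(anchors, pred_anchors)) / len(anchors)
```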
According to the technical scheme, the adaptation degree between the target image and the preset template is obtained by calculating the coincidence ratio; the calculation is simple and accurate, which improves the efficiency and accuracy of template self-adaptation.
Referring to fig. 5, in some embodiments, performing image transformation on a target image based on each preset template of a template database to obtain a first transformed image corresponding to each preset template includes:
s10311, performing perspective transformation on the target image to obtain a second transformed image;
s10312, determining a plurality of first vertex information of the second transformation image and a plurality of second vertex information of a preset template;
s10313, determining vertex distance information between the second transformation image and a preset template based on the plurality of first vertex information and the plurality of second vertex information;
s10314, if the vertex distance information meets the first preset condition, determining the second transformation image as the first transformation image corresponding to the preset template.
Content recognition is performed on the target image, and four reference contents are selected from the recognized content such that the quadrilateral area formed by the four reference contents is the largest. A perspective projection matrix of the target image is calculated from the quadrilateral formed by the four reference contents, and perspective transformation is performed to obtain a second transformed image. A plurality of first vertex information of the second transformed image and a plurality of second vertex information of the preset template are acquired, and the alignment between the second transformed image and the preset template is judged based on the first vertex information and the second vertex information. The first preset condition may be that the sum of the distances between corresponding vertices is smaller than or equal to a first threshold; that is, if the sum of the distances between the vertices of the second transformed image and the vertices of the preset template is smaller than or equal to the first threshold, it is determined that the second transformed image and the preset template satisfy the alignment condition, and the second transformed image is determined to be the first transformed image corresponding to the preset template.
In some embodiments, text recognition is performed on the target image, and four reference fields are selected from the recognized text such that the quadrilateral area formed by the four reference fields is the largest; a perspective projection matrix of the target image is calculated from this quadrilateral, and perspective transformation is performed to obtain a second transformed image. Four vertices of the second transformed image and four vertices of the preset template are determined, the distance between each vertex of the second transformed image and its corresponding vertex in the preset template is calculated, and the four vertex distances are added to obtain a final total distance best_loc, with the following formula:

best_loc = Σ_{i=1}^{4} ||p_i − s_i||

where s_1, s_2, s_3 and s_4 are the four vertices of the second transformed image, p_i is the i-th vertex in the preset template, and s_i is the i-th vertex in the second transformed image.
If the total distance best_loc between the four vertexes of the second transformation image and vertexes corresponding to the four vertexes of the second transformation image in the preset template is smaller than or equal to a first threshold value, determining that the second transformation image and the preset template meet the alignment condition, and determining the second transformation image as the first transformation image.
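The total-distance check can be sketched in a few lines; vertices are assumed to be (x, y) pairs in corresponding order:

```python
import math

def vertex_distance(template_vertices, image_vertices):
    # Sum of Euclidean distances between each vertex p_i of the preset
    # template and the corresponding vertex s_i of the second transformed
    # image: best_loc = sum(||p_i - s_i||).
    return sum(math.dist(p, s) for p, s in zip(template_vertices, image_vertices))

def is_aligned(template_vertices, image_vertices, threshold):
    # The first preset condition: total vertex distance <= first threshold.
    return vertex_distance(template_vertices, image_vertices) <= threshold
```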
According to the technical scheme, the target image is aligned relative to the preset template through perspective projection transformation, so that the accuracy of calculation of the adaptation degree between the target image and the preset template is facilitated, and the accuracy of self-adaptation of the template is guaranteed.
Referring to fig. 6, in some embodiments, the method further comprises:
s10316, if the vertex distance information does not meet the first preset condition, repeating the steps of perspective transformation, vertex information determination and vertex distance information determination to obtain an updated second transformed image and updated vertex distance information;
s10317, if the updated vertex distance information meets the first preset condition, determining that the updated second transformation image is the first transformation image corresponding to the preset template.
If the sum of the distances between the vertices of the second transformed image and the vertices of the preset template is larger than the first threshold, it is determined that the second transformed image and the preset template do not satisfy the alignment condition. At this time, perspective transformation is performed again on the second transformed image to obtain an updated second transformed image, and the vertex distance calculation is performed again on the updated second transformed image and the preset template. If the vertex distance information of the updated second transformed image and the preset template meets the first preset condition, the updated second transformed image is determined to be the first transformed image corresponding to the preset template. If it does not meet the first preset condition, projection transformation continues to be performed on the updated second transformed image, and the corresponding steps are repeated until the vertex distance information between the second transformed image and the preset template meets the first preset condition or the number of projection transformations reaches a preset number, at which point the loop stops. If the number of projection transformations of the second transformed image reaches the preset number and the vertex distance information between the second transformed image and the preset template still does not meet the first preset condition, the preset template is discarded, and the corresponding processing continues with the other preset templates in the template database.
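The retry loop with a bounded number of transformations can be sketched as follows; the transformation and distance functions are hypothetical stand-ins passed in as parameters:

```python
def align_with_retries(image, template, transform, distance, threshold, max_iters):
    # Repeatedly apply perspective transformation until the vertex distance
    # meets the first preset condition, or the preset number of iterations
    # is reached, in which case this preset template is discarded.
    current = image
    for _ in range(max_iters):
        current = transform(current, template)
        if distance(current, template) <= threshold:
            return current  # first transformed image for this template
    return None  # alignment never succeeded; caller skips this template
```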
According to the technical scheme, the target image and the preset template are aligned for a plurality of times, accuracy of calculation of the adaptation degree of the target image and the preset template is ensured, when the alignment operation reaches the preset times, the situation that the target image and the preset template cannot be aligned all the time is indicated that the preset template is not matched with the target image, the preset template is discarded, and calculation amount of calculation of the adaptation degree is reduced.
Referring to fig. 7, in some embodiments, invoking a preset layout classification model to perform image layout recognition on a target image to obtain a first template recognition result of the target image includes:
s1021, performing convolution processing on the target image to obtain image characteristics;
s1022, carrying out content area identification on the target image to obtain a content area to be identified;
s1023, carrying out content identification on the content area to be identified to obtain content description characteristics;
s1024, carrying out feature fusion processing on the image features and the content description features to obtain target features;
s1025, inputting the target features into a classifier, and classifying image formats to obtain a first template recognition result corresponding to the target image.
The preset layout classification model is used for predicting which template classification the target image belongs to. And inputting the target image into a convolution layer of a preset layout classification model, and carrying out convolution processing to obtain the image characteristics of the target image. And carrying out content area identification on the target image, and sequentially inputting the content area to be identified into a content identification model to obtain content description characteristics corresponding to the target image. And fusing the image features and the content description features by using a preset fusion method to obtain target features, inputting the target features into a full-connection layer, and classifying to obtain a first template recognition result.
In some embodiments, the preset layout classification model includes, but is not limited to, a layout classification model trained based on the convolutional neural network VGG16. The training image set with template class labels is input into the 13 convolutional layers to extract image features, and text features, namely content description features, are extracted with a text recognition model, which may include, by way of example and not limitation, the residual network ResNet and a sequence recurrent neural network Seq-RNN. The image features and the content description features are fused by concatenation (concat), and the 3 fully connected layers of VGG16 are responsible for classification. In a transfer learning manner, the parameters of the 13 convolutional layers are kept unchanged while the parameters of the 3 fully connected layers are adjusted, with the number of classes equal to the number of templates; training stops when the loss function converges or the early-stopping condition is met. The target image is input into the preset layout classification model to obtain the template class with the largest prediction probability, namely the first template recognition result.
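A minimal PyTorch sketch of the concat-fusion and frozen-backbone idea described above. The tiny convolutional stack here stands in for the 13 VGG16 convolutional layers, and all class names, dimensions and layer sizes are illustrative assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn

class LayoutClassifier(nn.Module):
    # Image features from a (stand-in) convolutional backbone are concatenated
    # with text/content-description features, then classified by fully
    # connected layers whose output size equals the number of templates.
    def __init__(self, text_dim=32, num_templates=7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        for p in self.backbone.parameters():  # transfer learning: freeze conv layers
            p.requires_grad = False
        self.classifier = nn.Sequential(     # only these layers are trained
            nn.Linear(8 + text_dim, 64), nn.ReLU(),
            nn.Linear(64, num_templates),
        )

    def forward(self, image, text_features):
        fused = torch.cat([self.backbone(image), text_features], dim=1)  # concat fusion
        return self.classifier(fused)
```

A real implementation would substitute `torchvision.models.vgg16(pretrained=True).features` for the stand-in backbone and train until the loss converges or early stopping triggers.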
In some embodiments, performing image segmentation processing on a target image to obtain a non-content image block and a to-be-identified content area corresponding to the target image; carrying out convolution processing on the non-content image blocks to obtain image characteristics; carrying out content identification on the content area to be identified to obtain content description characteristics; carrying out feature fusion processing on the image features and the content description features to obtain target features; and inputting the target features into a classifier, and classifying the image formats to obtain a first template recognition result corresponding to the target image.
In some embodiments, the image segmentation of the target image may be implemented by a semantic segmentation method, so as to obtain at least one content area to be identified corresponding to the text object in the material to be identified. The semantic segmentation method can be implemented based on a semantic segmentation network U-Net, a full convolutional neural network FCN, a residual network ResNet, an image segmentation network SegNet and the like, and the non-content image blocks can comprise but are not limited to a drawing image block, a table image block, a caption image block, a header image block, a footer image block and the like, and the text objects comprise but are not limited to characters, pictures or symbols.
According to the technical scheme, the image segmentation processing obtains fine-grained multi-modal features of the target image, extracts image blocks of various document elements in the material to be identified, such as text content, images, tables and captions, and provides fine-grained information for subsequent feature extraction and content recognition, which improves the accuracy and generalization of target image recognition. Predicting the template class of the target image through deep learning and transfer learning improves the precision and efficiency of template classification. Image analysis of the material to be identified is realized through the image segmentation processing.
Referring to fig. 8, in some embodiments, before acquiring the target image, the method further includes:
S201, acquiring an initial image;
s202, performing perspective projection transformation on the initial image to obtain a target image.
An initial image of the material to be identified is acquired, and the initial image can be acquired by shooting, scanning or format conversion. And detecting the picture outline of the initial image by using a preset detection method, obtaining the outline of the edge of the initial image by using a filtering method after all the outlines of the initial image are detected, calculating coordinates of four vertexes of the initial image after projection transformation, calculating a transformation matrix, and obtaining the target image after transformation of the initial image by using the transformation matrix.
For example, the edge contour of the initial image is obtained by using a contour detection function findContours, a contour perimeter calculation function arcLength and an approximation curve generation function approxPolyDP in the image processing toolkit OpenCV2, so that four vertex coordinates of the initial image are obtained, and the initial image is aligned by using a perspective projection matrix.
According to the technical scheme, the initial image of the material to be identified is subjected to position adjustment, so that the problem of possible rotation angle of the image is solved, and the efficiency and accuracy of the following template self-adaptation process are ensured.
The image information identification method of the application is introduced by combining the application scene:
s1: acquiring an initial image;
s2: performing perspective projection transformation on the initial image to obtain a target image;
s3: calling a preset format classification model, inputting the target image into a convolution layer of the preset format classification model, and carrying out convolution processing to obtain image characteristics of the target image; s4: carrying out content area identification on the target image, and sequentially inputting the content area to be identified into a content identification model to obtain character features corresponding to the target image, namely content description features;
s5: fusing the image characteristics and the content description characteristics by a preset fusion method to obtain target characteristics;
s6: inputting target features into a full-connection layer, and classifying to obtain a template category with the maximum prediction probability, namely a template recognition result;
s7: invoking an adaptation degree calculation model, traversing each preset template in a template database, and calculating the adaptation degree of each preset template and the target template aligned relative to the preset template to obtain a preset template with the maximum adaptation degree;
s8, determining a better template in the templates pointed by the first template recognition result and the second template recognition result as a target template by comparing the prediction probability and the adaptation degree;
S9: and identifying the target image based on the target template to obtain content description information comprising the key + value.
Referring to fig. 9, according to a second aspect of the present disclosure, there is provided an image information identifying apparatus, the apparatus including:
a target image acquisition module 10 for acquiring a target image;
the first template recognition result determining module 20 is configured to invoke a preset layout classification model to perform image layout recognition on the target image, so as to obtain a first template recognition result of the target image;
the second template recognition result determining module 30 is configured to invoke a preset fitness calculation model, and perform a matching process on the target image and the template database to obtain a second template recognition result of the target image;
a target template determination module 40 for determining a target template based on the first template recognition result and the second template recognition result;
the content recognition module 50 is configured to perform content recognition on the target image based on the template information of the target template, so as to obtain content description information of the target image.
In some embodiments, the second template recognition result determination module 30 includes:
an image transformation unit 31, configured to perform image transformation on the target image based on each preset template in the template database, so as to obtain a first transformed image corresponding to each preset template;
An adaptation processing unit 32, configured to perform an adaptation process on each preset template and the corresponding first transformed image of each preset template, so as to obtain an adaptation degree between each preset template and the target image;
and a result determination unit 33 for determining a second template recognition result based on the adaptation degree.
In some embodiments, the adaptation processing unit 32 comprises:
a preset labeling area obtaining unit 321, configured to obtain a preset labeling area of each preset template;
the to-be-identified region determining unit 322 is configured to perform region matching on the preset labeling region and the first transformation image, so as to obtain a to-be-identified region corresponding to each preset template in the first transformation image;
the fitness determining unit 323 is configured to determine a fitness based on a preset labeling area of each preset template and an area to be identified corresponding to each preset template, where the fitness characterizes a coincidence ratio between the preset labeling area and the area to be identified.
In some embodiments, the image transformation unit 31 includes:
a second transformed image obtaining unit 311, configured to perform perspective transformation on the target image to obtain a second transformed image;
a vertex information determining unit 312 for determining a plurality of first vertex information of the second transformed image and a plurality of second vertex information of the preset template;
A vertex distance information determining unit 313 for determining vertex distance information between the second transformed image and a preset template based on the plurality of first vertex information and the plurality of second vertex information;
the first transformed image determining unit 314 is configured to determine that the second transformed image is the first transformed image corresponding to the preset template if the vertex distance information meets the first preset condition.
In some embodiments, the image transformation unit 31 further comprises:
an updating unit 316, configured to repeatedly execute the steps of perspective transformation and vertex information determination and vertex distance information determination if the vertex distance information does not meet the first preset condition, to obtain an updated second transformed image and updated vertex distance information;
the first transformed image updating unit 317 is configured to determine that the updated second transformed image is the first transformed image corresponding to the preset template if the updated vertex distance information meets the first preset condition.
In some embodiments, the first template recognition result determination module 20 includes:
an image feature determining unit 21, configured to perform convolution processing on the target image to obtain an image feature;
a to-be-identified content area determining unit 22, configured to identify a content area of the target image, so as to obtain a to-be-identified content area;
A content description feature determining unit 23, configured to perform content recognition on the area to be recognized, so as to obtain content description features;
a target feature determining unit 24, configured to perform feature fusion processing on the image feature and the content description feature, so as to obtain a target feature;
and the classifying unit 25 is configured to input the target feature into a classifier, and perform image layout classification to obtain a first template recognition result corresponding to the target image.
In some embodiments, the apparatus further comprises:
an initial image acquisition module 70 for acquiring an initial image;
the target image determining module 80 is configured to perform perspective projection transformation on the initial image to obtain a target image.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The embodiment of the application provides an image information identification device, which can be a terminal or a server, and comprises a processor and a memory, wherein at least one instruction or at least one section of program is stored in the memory, and the at least one instruction or the at least one section of program is loaded and executed by the processor to realize the image information identification method provided by the embodiment of the method.
The memory may be used to store software programs and modules that the processor executes to perform various functional applications and data processing by executing the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided in the embodiments of the present application may be performed in an electronic device such as a mobile terminal, a computer terminal, a server, or a similar computing device. Fig. 10 is a block diagram of a hardware structure of an electronic device according to an image information recognition method according to an embodiment of the present application. As shown in fig. 10, the electronic device 900 may vary considerably in configuration or performance, and may include one or more central processing units (Central Processing Units, CPU) 910 (the processor 910 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA), a memory 930 for storing data, one or more storage media 920 (e.g., one or more mass storage devices) for storing applications 923 or data 922. Wherein memory 930 and storage medium 920 may be transitory or persistent storage. The program stored on the storage medium 920 may include one or more modules, each of which may include a series of instruction operations in the electronic device. Still further, the central processor 910 may be configured to communicate with a storage medium 920 and execute a series of instruction operations in the storage medium 920 on the electronic device 900. The electronic device 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input-output interface 940 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communications provider of the electronic device 900. In one example, the input-output interface 940 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices through a base station to communicate with the internet. In one example, the input/output interface 940 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 10 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, electronic device 900 may also include more or fewer components than shown in FIG. 10, or have a different configuration than shown in FIG. 10.
Embodiments of the present application also provide a computer readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program related to implementing an image information identifying method in a method embodiment, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the image information identifying method provided in the method embodiment.
Alternatively, in this embodiment, the storage medium may be located in at least one network server among a plurality of network servers of the computer network. Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
As can be seen from the above embodiments of the image information identification method, apparatus, device, terminal, server, storage medium or computer program provided by the present application: the present application acquires a target image; invokes a preset layout classification model to perform image layout recognition on the target image to obtain a first template recognition result, that is, first template recognition of the target image through the preset layout classification model; invokes a preset adaptation degree calculation model to match the target image against a template database to obtain a second template recognition result, that is, second template recognition by calculating the adaptation degree; determines a target template based on the first template recognition result and the second template recognition result, that is, compares the two recognition results and takes the template corresponding to the better result as the final target template; and performs content recognition on the target image based on the template information of the target template to obtain the content description information of the target image, that is, a structured output result. The accuracy and efficiency of template classification are improved by complementing the advantages of the deep learning model and the adaptation degree model. The problem that automatic structured recognition cannot be performed when various layout materials are not distinguished is solved, saving manpower and material resources.
It should be noted that the ordering of the above embodiments of the present application is for description only and does not imply any ranking of merit. The foregoing description has been directed to specific embodiments of this application; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The embodiments in this application are described in a progressive manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored on a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (8)
1. An image information recognition method, characterized in that the method comprises:
acquiring a target image;
carrying out convolution processing on the target image to obtain image characteristics;
performing content area identification on the target image to obtain a content area to be identified;
performing content identification on the content area to be identified to obtain content description characteristics;
performing feature fusion processing on the image features and the content description features to obtain target features;
inputting the target features into a classifier, and classifying image formats to obtain a first template recognition result corresponding to the target image;
performing image transformation on the target image based on each preset template of a template database to obtain a first transformation image corresponding to each preset template;
performing adaptation processing on each preset template and the corresponding first transformation image of each preset template to obtain the adaptation degree of each preset template and the target image;
determining a second template recognition result based on the adaptation degree;
determining a target template based on the first template recognition result and the second template recognition result;
and carrying out content recognition on the target image based on the template information of the target template to obtain content description information of the target image.
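For illustration only, the feature fusion processing step of claim 1 can be sketched as a simple concatenation of the convolutional image features with the content description features; concatenation is just one common fusion operator and is an assumption here, since the claim does not fix the operator:

```python
def fuse_features(image_features, content_features):
    # Concatenate the two feature vectors into a single target feature
    # vector for the downstream classifier (one simple fusion strategy; assumed).
    return list(image_features) + list(content_features)

image_features = [0.12, 0.53, 0.08]  # e.g. from convolution processing
content_features = [0.91, 0.27]      # e.g. from content recognition
target_features = fuse_features(image_features, content_features)
print(target_features)  # [0.12, 0.53, 0.08, 0.91, 0.27]
```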
2. The method according to claim 1, wherein the adapting each preset template and the corresponding first transformed image of each preset template to obtain the adaptation degree of each preset template and the target image includes:
acquiring a preset labeling area of each preset template;
performing region matching between the preset labeling region and the first transformation image to obtain, in the first transformation image, a region to be identified corresponding to each preset template;
and determining the adaptation degree based on the preset labeling area of each preset template and the area to be identified corresponding to each preset template, wherein the adaptation degree characterizes the coincidence degree between the preset labeling area and the area to be identified.
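One plausible reading of the "coincidence degree" in claim 2 is an intersection-over-union (IoU) score between the preset labeling region and the region to be identified; the sketch below uses axis-aligned rectangles and IoU as the adaptation degree, which is an illustrative choice not mandated by the patent:

```python
def adaptation_degree(labeled, recognized):
    """IoU-style coincidence between a template's preset labeling region and
    the region found in the first transformation image.
    Rectangles are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = labeled
    bx1, by1, bx2, by2 = recognized
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))   # width of the intersection
    ih = max(0, min(ay2, by2) - max(ay1, by1))   # height of the intersection
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(adaptation_degree((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... (half-overlap)
```

The template whose labeling regions yield the highest adaptation degree would then drive the second template recognition result.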
3. The method according to claim 1, wherein the performing image transformation on the target image based on each preset template of the template database to obtain a first transformed image corresponding to each preset template includes:
performing perspective transformation on the target image to obtain a second transformed image;
determining a plurality of first vertex information of the second transformation image and a plurality of second vertex information of the preset template;
determining vertex distance information between the second transformed image and the preset template based on the plurality of first vertex information and the plurality of second vertex information;
and if the vertex distance information meets a first preset condition, determining the second transformation image as a first transformation image corresponding to the preset template.
4. The method according to claim 3, characterized in that the method further comprises:
if the vertex distance information does not meet the first preset condition, repeating the perspective transformation, vertex information determination, and vertex distance information determination steps to obtain an updated second transformed image and updated vertex distance information;
and if the updated vertex distance information meets a first preset condition, determining the updated second transformation image as a first transformation image corresponding to the preset template.
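By way of illustration of the vertex distance check in claims 3-4, one plausible "vertex distance information" is the mean Euclidean distance between corresponding corner points of the transformed image and the preset template, compared against a threshold as the "first preset condition"; both the averaging and the threshold value below are assumptions:

```python
import math

def vertex_distance(img_vertices, tpl_vertices):
    # Mean Euclidean distance between corresponding vertices (corners)
    # of the transformed image and the preset template (assumed metric).
    return sum(math.dist(a, b) for a, b in zip(img_vertices, tpl_vertices)) / len(tpl_vertices)

tpl = [(0, 0), (100, 0), (100, 60), (0, 60)]   # preset template corners
img = [(2, 1), (101, 2), (99, 61), (1, 59)]    # corners of the transformed image
d = vertex_distance(img, tpl)
THRESHOLD = 5.0  # hypothetical first preset condition
print(d <= THRESHOLD)  # accept this transform as the first transformation image
```

If the condition fails, the perspective transformation would be re-estimated and the check repeated, as claim 4 describes.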
5. The method of any one of claims 1-4, further comprising, prior to the acquiring the target image:
acquiring an initial image;
and performing perspective projection transformation on the initial image to obtain the target image.
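For illustration only, the perspective projection transformation of claim 5 amounts to mapping each pixel through a 3x3 homography matrix; the sketch below applies such a matrix to a single point in pure Python, and the example matrices are made up for demonstration (in practice the matrix would be estimated, e.g. from detected document corners):

```python
def apply_homography(H, pt):
    """Map one pixel through a 3x3 perspective (homography) matrix H.
    Divides by the projective coordinate w, which is what distinguishes a
    perspective transform from an affine one."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Identity homography leaves points unchanged; a real H would rectify the scan.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(apply_homography(I, (12.0, 7.0)))  # (12.0, 7.0)
```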
6. An image information recognition apparatus, characterized in that the apparatus comprises:
the target image acquisition module is used for acquiring a target image;
the convolution processing module is used for carrying out convolution processing on the target image to obtain image characteristics;
the content area identification module is used for carrying out content area identification on the target image to obtain a content area to be identified;
the content identification module is used for carrying out content identification on the content area to be identified to obtain content description characteristics;
the feature fusion processing module is used for carrying out feature fusion processing on the image features and the content description features to obtain target features;
the classification module is used for inputting the target features into a classifier, classifying image formats and obtaining a first template recognition result corresponding to the target image;
the image transformation module is used for carrying out image transformation on the target image based on each preset template of the template database to obtain a first transformation image corresponding to each preset template;
the adaptation processing module is used for carrying out adaptation processing on each preset template and the corresponding first transformation image of each preset template to obtain the adaptation degree of each preset template and the target image;
the second template recognition result determining module is used for determining a second template recognition result based on the adaptation degree;
the target template determining module is used for determining a target template based on the first template recognition result and the second template recognition result;
and the content recognition module is used for carrying out content recognition on the target image based on the template information of the target template to obtain the content description information of the target image.
7. A computer-readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the image information identification method according to any one of claims 1-5.
8. An electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the image information identification method of any one of claims 1-5 by executing the instructions stored by the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211552240.3A CN116246294B (en) | 2022-12-05 | 2022-12-05 | Image information identification method, device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116246294A (en) | 2023-06-09 |
CN116246294B (en) | 2024-04-09 |
Family
ID=86633816
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211552240.3A Active CN116246294B (en) | 2022-12-05 | 2022-12-05 | Image information identification method, device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116246294B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116861865B (en) * | 2023-06-26 | 2024-10-01 | 江苏常熟农村商业银行股份有限公司 | EXCEL data processing method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112580707A (en) * | 2020-12-11 | 2021-03-30 | 北京巅峰科技有限公司 | Image recognition method, device, equipment and storage medium |
WO2022057471A1 (en) * | 2020-09-17 | 2022-03-24 | 深圳壹账通智能科技有限公司 | Bill identification method, system, computer device, and computer-readable storage medium |
Non-Patent Citations (2)
Title |
---|
Multi-label image classification and recognition method based on conditional random fields; Wang Li; Chen Zhaoxi; Yu Li; Computer Simulation; 2020-08-15 (08); full text *
Document image layout analysis with multi-feature fusion; Ying Zilu; Zhao Yihong; Xuan Chen; Deng Wenbo; Journal of Image and Graphics; 2020-02-16 (02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN116246294A (en) | 2023-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2020200251B2 (en) | Label and field identification without optical character recognition (OCR) | |
CN109543690B (en) | Method and device for extracting information | |
CN112016438B (en) | Method and system for identifying certificate based on graph neural network | |
US20230215125A1 (en) | Data identification method and apparatus | |
CN108174289A (en) | A kind of image data processing method, device, medium and electronic equipment | |
CN113378710A (en) | Layout analysis method and device for image file, computer equipment and storage medium | |
CN116246294B (en) | Image information identification method, device, storage medium and electronic equipment | |
CN110738238A (en) | certificate information classification positioning method and device | |
WO2022134580A1 (en) | Method and apparatus for acquiring certificate information, and storage medium and computer device | |
CN111738979A (en) | Automatic certificate image quality inspection method and system | |
Forczmański et al. | Stamps detection and classification using simple features ensemble | |
CN112580108A (en) | Signature and seal integrity verification method and computer equipment | |
CN113255651A (en) | Package security check method, device and system, node equipment and storage device | |
CN112232336A (en) | Certificate identification method, device, equipment and storage medium | |
CN114971294A (en) | Data acquisition method, device, equipment and storage medium | |
Liu et al. | A novel SVM network using HOG feature for prohibition traffic sign recognition | |
CN108090728B (en) | Express information input method and system based on intelligent terminal | |
CN105095889B (en) | Feature extraction, character recognition, engine generates, information determines method and device | |
CN113111882A (en) | Card identification method and device, electronic equipment and storage medium | |
CN116844182A (en) | Card character recognition method for automatically recognizing format | |
CN115858695A (en) | Information processing method and device and storage medium | |
CN113743327A (en) | Document identification method, document checking method, device and equipment | |
CN114443834A (en) | Method and device for extracting license information and storage medium | |
CN113591657A (en) | OCR (optical character recognition) layout recognition method and device, electronic equipment and medium | |
CN113780116A (en) | Invoice classification method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||