CN115952295A - Entity relation labeling model processing method based on image and related equipment thereof


Info

Publication number: CN115952295A
Application number: CN202211568748.2A
Authority: CN (China)
Prior art keywords: image, entity, information, vector, initial
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 梁凯程
Applicant/Assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority: CN202211568748.2A
Landscapes: Image Analysis

Abstract

The embodiment of the application belongs to the field of artificial intelligence, and relates to an entity relation labeling model processing method and apparatus based on an image, a computer device and a storage medium, wherein the method comprises the following steps: acquiring a training image, wherein the entity labeling information of the training image comprises text information and relation information of each entity in the image; generating text compound vectors of all entities according to the text information, and generating an initial compound vector of the training image; performing separable convolution on the training image to obtain a convolution feature vector; combining the initial compound vector and the convolution feature vector to obtain an image compound vector; inputting the text compound vectors and the image compound vector into an entity relation labeling network to obtain entity relation prediction information; calculating a model loss according to the relation information and the entity relation prediction information so as to train the model; and inputting an image to be annotated into the model to obtain entity relation information. The application also relates to blockchain technology, and the training image can be stored in a blockchain. The method and the device realize automatic identification of the entities in the image and automatic labeling of the entity relationships.

Description

Entity relation labeling model processing method based on image and related equipment thereof
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for processing an entity relationship labeling model based on an image, a computer device, and a storage medium.
Background
An image, as an information carrier, contains rich information and is very convenient to use. With the development of computer technology, content information can be acquired from images by means of computer vision algorithms. Beyond obtaining content information from images, it is desirable to further mine the relationship information in the image content, for example identifying entities from an image and obtaining the information and attributes they imply and the relationships between them. However, current entity relationship labeling is mainly implemented in text scenarios and cannot be directly implemented in image scenarios.
Disclosure of Invention
An embodiment of the application aims to provide an image-based entity relationship labeling model processing method and apparatus, a computer device and a storage medium, so as to realize entity relationship labeling in images.
In order to solve the above technical problem, an embodiment of the present application provides an entity relationship labeling model processing method based on an image, which adopts the following technical solutions:
acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
generating text compound vectors of the entities according to the text information, and generating initial compound vectors of the training images;
inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, and performing separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
combining the initial composite vector and the convolution characteristic vector to obtain an image composite vector;
inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
calculating model loss according to the relationship information and the entity relationship prediction information, and performing parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets training stopping conditions to obtain an entity relationship labeling model;
and acquiring an image to be annotated, and inputting the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
In order to solve the above technical problem, an embodiment of the present application further provides an entity relationship labeling model processing apparatus based on an image, which adopts the following technical solutions:
the image acquisition module is used for acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
the vector generation module is used for generating text compound vectors of the entities according to the text information and generating initial compound vectors of the training images;
the convolution processing module is used for inputting the text composite vector, the training image and the initial composite vector into an initial entity relationship labeling model so as to carry out separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
the vector merging module is used for merging the initial composite vector and the convolution characteristic vector to obtain an image composite vector;
the vector input module is used for inputting the text composite vector and the image composite vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
the model adjusting module is used for calculating model loss according to the relationship information and the entity relationship prediction information so as to carry out parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets a training stopping condition to obtain an entity relationship labeling model;
and the image annotation module is used for acquiring an image to be annotated and inputting the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
generating text compound vectors of the entities according to the text information, and generating initial compound vectors of the training images;
inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, and performing separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
combining the initial composite vector and the convolution characteristic vector to obtain an image composite vector;
inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
calculating model loss according to the relationship information and the entity relationship prediction information, and performing parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets training stopping conditions to obtain an entity relationship labeling model;
and acquiring an image to be annotated, and inputting the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
generating text compound vectors of the entities according to the text information, and generating initial compound vectors of the training images;
inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, and performing separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
combining the initial composite vector and the convolution characteristic vector to obtain an image composite vector;
inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
calculating model loss according to the relationship information and the entity relationship prediction information, and performing parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets training stopping conditions to obtain an entity relationship labeling model;
and acquiring an image to be annotated, and inputting the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects: acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises a text remark value and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities; generating text compound vectors of all entities according to the text information, and generating an initial compound vector of the training image; inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, wherein the initial entity relationship labeling model comprises a convolution network and an entity relationship labeling network; the convolution network performs separable convolution processing on the training image to obtain a convolution feature vector serving as a supplementary feature; separable convolution reduces the amount of computation and the number of parameters, and the inductive bias of the convolution network reduces the number of required samples and improves training efficiency; combining the initial compound vector and the convolution feature vector to obtain an image compound vector, and inputting the text compound vector and the image compound vector into the entity relationship labeling network to obtain entity relationship prediction information; calculating a model loss according to the relationship information and the entity relationship prediction information so as to perform parameter adjustment on the model until the model loss meets the training stopping condition, obtaining the entity relationship labeling model; and acquiring an image to be annotated, and inputting the image to be annotated into the entity relationship labeling model to obtain entity relationship information, thereby realizing automatic identification of entities in the image and automatic labeling of entity relationships.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for image-based entity relationship annotation model processing according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of an image-based entity relationship labeling model processing apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the entity relation annotation model processing method based on image provided in the embodiment of the present application is generally executed by a server, and accordingly, the entity relation annotation model processing apparatus based on image is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
With continuing reference to FIG. 2, a flowchart of one embodiment of a method for image-based entity-relationship annotation model processing according to the present application is shown. The entity relationship labeling model processing method based on the image comprises the following steps:
step S201, a training image with entity marking information is obtained, the entity marking information comprises text information and relation information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relation information comprises entity relations and incidence relations among the entities.
In this embodiment, an electronic device (for example, a server shown in fig. 1) on which the image-based entity relationship labeling model processing method operates may communicate with the terminal through a wired connection manner or a wireless connection manner. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a bluetooth connection, a WiMAX connection, a Zigbee connection, an UWB (ultra wideband) connection, and other wireless connection means now known or developed in the future.
Specifically, a training image with entity labeling information is obtained first. The entity labeling information serves as the label of the training image: the entity relationship labeling model is meant to identify the entities in the image and label the relationships between them, so the entity labeling information comprises the text information and relationship information of each entity in the training image.
The text information comprises the text remark value and coordinate information of each entity. The text remark value is text, a character representation of the entity; the entity lies within a region of the training image, and that region has coordinate information. For example, in an image of a medical charging bill there are two entities, "personal payment amount" and "4180". The two text remark values "personal payment amount" and "4180" need to be added to the image in text form, the image regions where the two entities are located are selected, and the coordinate information of "personal payment amount" and "4180" is obtained from the selected regions respectively.
An image can have multiple entities, and entities can have association relationships: two associated entities are related to each other, and each plays an entity relationship with respect to the other; the association relationships and entity relationships form the relationship information between entities. For example, in an image of a medical charging bill there may be the two entities "玖仟陆佰捌拾伍元" (the capital-form Chinese writing of the amount) and "9685", which are associated: within this association, "玖仟陆佰捌拾伍元" is the uppercase (capital) representation and "9685" is the lowercase (numeric) representation. An association relationship needs to be added between these two entities, and the entity relationship of "玖仟陆佰捌拾伍元" is labeled "uppercase" while the entity relationship of "9685" is labeled "lowercase".
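As a concrete illustration, the entity labeling information for such a bill image might be serialized as follows. This is a minimal sketch; the field names and structure are assumptions for illustration, not a format prescribed by the application.

    # Hypothetical annotation record for one training image; field names are
    # illustrative, not prescribed by this application.
    annotation = {
        "image": "medical_bill_001.png",
        "entities": [
            # text remark value + coordinate information (bounding box) per entity
            {"id": 0, "text": "玖仟陆佰捌拾伍元", "box": [120, 300, 360, 330]},
            {"id": 1, "text": "9685", "box": [400, 300, 470, 330]},
        ],
        "relations": [
            # association relationship between two entities, plus the entity
            # relationship (role) each side plays in it
            {"head": 0, "tail": 1, "head_role": "uppercase", "tail_role": "lowercase"},
        ],
    }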
It is emphasized that, to further ensure the privacy and security of the training images, the training images may also be stored in nodes of a blockchain.
The blockchain referred to by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with each other using cryptographic methods, where each data block contains the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Step S202, generating text compound vectors of all entities according to the text information, and generating initial compound vectors of training images.
Specifically, the information related to the entities and the training image each need to be converted into vectors. The text information of an entity includes the entity's text remark value and coordinate information, both of which need to be converted into vectors; after conversion, the two are combined, yielding the text composite vector of the entity.
When converting the training image into a vector, the training image first needs to be cut into a plurality of image blocks. Each image block has position information relative to the original training image, so the vector obtained from the training image includes this position information in addition to the image feature information of the training image, yielding the initial composite vector.
Step S203, inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, and performing separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector.
Specifically, entity relationship labeling is realized through an entity relationship labeling model; the initial entity relationship labeling model is the entity relationship labeling model before training is completed. Both may include two parts: a convolutional network and an entity relationship labeling network.
After the text compound vector, the training image and the initial compound vector are input into the initial entity relationship labeling model, the training image is first input into a Convolutional Neural Network (CNN). The convolutional network performs separable convolution processing on the training image to obtain a convolution feature vector. Compared with ordinary convolution, separable convolution reduces the amount of computation and the number of parameters, which helps keep the model lightweight. Meanwhile, a convolutional network carries inductive bias, that is, prior knowledge and assumptions built in ahead of time, for example that adjacent areas in an image have similar features and that features are translation invariant. With such prior information, a better model can be learned from relatively few samples. The relationships among some entities in an image are translation invariant, and the positions and shapes of the entities can change to a certain extent; the convolutional network handles such variation well. At the same time, the convolution feature vector output by the convolutional network can serve as a supplementary feature of the training image, which is equivalent to extracting more information from the training image.

Step S204, combining the initial compound vector and the convolution feature vector to obtain an image compound vector.
Specifically, the convolution feature vector may be used as a supplementary feature of the training image, and the convolution feature vector and the initial composite vector are merged (concat), so as to obtain an image composite vector of the training image.
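A minimal PyTorch sketch of this convolution branch and the merge step is given below, assuming a depthwise separable design (a depthwise convolution followed by a pointwise 1x1 convolution); the channel counts, pooling, and projection dimension are assumptions, since the application does not fix them.

    import torch
    from torch import nn

    class SeparableConvBranch(nn.Module):
        """Depthwise separable convolution branch (a sketch): extracts a
        supplementary convolution feature vector from the training image."""
        def __init__(self, in_ch: int = 3, mid_ch: int = 64, embed_dim: int = 768):
            super().__init__()
            # depthwise: one filter per input channel; pointwise: 1x1 channel mixing.
            # Together they need far fewer parameters than a dense convolution.
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.proj = nn.Linear(mid_ch, embed_dim)

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            x = torch.relu(self.pointwise(self.depthwise(image)))
            x = self.pool(x).flatten(1)       # (batch, mid_ch)
            return self.proj(x).unsqueeze(1)  # (batch, 1, embed_dim)

    # Merging with the initial composite vector of the image patches:
    # initial_vec has shape (batch, num_patches, embed_dim), so the concat yields
    # an image composite vector of shape (batch, 1 + num_patches, embed_dim).
    # image_composite = torch.cat([branch(image), initial_vec], dim=1)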
Step S205, inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information.
Specifically, the text compound vector and the image compound vector are input into the entity relationship labeling network in the initial entity relationship labeling model; the entity relationship labeling network processes the two vectors and outputs entity relationship prediction information, namely the entity relationships and association relationships among the entities in the training image as predicted by the network.
In one embodiment, the entity relationship prediction information may further include text remark values and coordinate information of each entity in the training image predicted by the entity relationship labeling network.
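The application does not detail the internal structure of the prediction head. As one plausible construction (an assumption for illustration, not the method claimed here), a pairwise classifier over the entity representations produced by the network could yield relation scores for every ordered entity pair:

    import torch
    from torch import nn

    class RelationHead(nn.Module):
        """Hypothetical pairwise relation classifier: scores every ordered pair
        of entity embeddings against a set of relation labels."""
        def __init__(self, hidden: int, num_relations: int):
            super().__init__()
            self.head = nn.Linear(hidden, hidden)  # projection for the head entity
            self.tail = nn.Linear(hidden, hidden)  # projection for the tail entity
            self.classifier = nn.Linear(2 * hidden, num_relations)

        def forward(self, entities: torch.Tensor) -> torch.Tensor:
            # entities: (num_entities, hidden)
            n = entities.size(0)
            h = self.head(entities).unsqueeze(1).expand(n, n, -1)
            t = self.tail(entities).unsqueeze(0).expand(n, n, -1)
            # (num_entities, num_entities, num_relations) pairwise relation logits
            return self.classifier(torch.cat([h, t], dim=-1))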
And S206, calculating model loss according to the relationship information and the entity relationship prediction information, and performing parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets the training stopping condition to obtain the entity relationship labeling model.
Specifically, the model loss is calculated from the relationship information in the entity labeling information and the entity relationship prediction information. When the entity relationship prediction information also includes text information, that is, the text remark value and coordinate information of each entity, the model loss is calculated from both the text information and the relationship information in the entity labeling information against their counterparts in the entity relationship prediction information.
After the model loss is obtained, the model parameters of the initial entity relationship labeling model are adjusted with minimizing the model loss as the objective, and the parameter-adjusted initial entity relationship labeling model is trained iteratively until the obtained model loss meets the training stopping condition (for example, the model loss converges, or falls below a preset loss threshold), yielding the entity relationship labeling model.
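A schematic training loop consistent with this step is shown below. The optimizer, learning rate and the exact form of the stopping condition are assumptions, and model.loss stands in for whatever loss the relationship information defines (for example, a cross-entropy over the predicted relations):

    import torch

    def train(model, loader, max_epochs: int = 10, loss_threshold: float = 0.01):
        """Sketch of the parameter-adjustment loop described above."""
        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
        for epoch in range(max_epochs):
            for text_vec, image, init_vec, relation_labels in loader:
                pred = model(text_vec, image, init_vec)   # entity relationship prediction
                loss = model.loss(pred, relation_labels)  # model loss vs. labeled relations
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            if loss.item() < loss_threshold:  # training-stop condition (assumed form)
                break
        return model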
Step S207, obtaining the image to be annotated, inputting the image to be annotated into the entity relationship annotation model, and obtaining entity relationship information.
Specifically, when the method is applied, an image to be labeled is obtained, the image to be labeled is input into the trained entity relationship labeling model, and entity relationship information output by the entity relationship labeling model is obtained. The entity relationship information may include relationship information between entities in the image to be annotated (i.e., entity relationship and association relationship between entities), and may also include text information of entities in the image to be annotated (text remark value and coordinate information of entities), thereby implementing automatic identification of entities in the image and automatic annotation of entity relationship.
In this embodiment, a training image with entity labeling information is obtained; the entity labeling information includes the text information and relationship information of each entity in the training image, the text information includes the text remark value and coordinate information of each entity, and the relationship information includes the entity relationships and association relationships among the entities. Text composite vectors of all entities are generated according to the text information, and an initial composite vector of the training image is generated. The text composite vector, the training image and the initial composite vector are input into the initial entity relationship labeling model, which includes a convolution network and an entity relationship labeling network. The convolution network performs separable convolution processing on the training image to obtain a convolution feature vector serving as a supplementary feature; separable convolution reduces the amount of computation and the number of parameters, and the inductive bias of the convolution network reduces the number of required samples and improves training efficiency. The initial composite vector and the convolution feature vector are combined to obtain an image composite vector, and the text composite vector and the image composite vector are input into the entity relationship labeling network to obtain entity relationship prediction information. The model loss is calculated according to the relationship information and the entity relationship prediction information so as to adjust the model parameters until the model loss meets the training stopping condition, yielding the entity relationship labeling model. An image to be annotated is acquired and input into the entity relationship labeling model to obtain entity relationship information, thereby realizing automatic identification of entities in the image and automatic labeling of entity relationships.
Further, before step S201, the method may further include: acquiring an initial training image with entity labeling information; carrying out image enhancement processing on the initial training image to obtain an enhanced image; and acquiring entity labeling information of the enhanced image according to image enhancement processing to obtain a training image.
Specifically, an initial training image with entity labeling information is obtained; the entity labeling information of the initial training image has the same content as the entity labeling information described above. To reduce the number of samples required for training and the labor cost of labeling, image enhancement processing, including rotation, blurring, brightness adjustment and the like, is performed on the initial training image according to preset image enhancement modes to obtain a plurality of enhanced images of the initial training image. It can be understood that the initial training image may be processed according to a single image enhancement mode, or multiple image enhancement processes may be superimposed. The initial training image itself may also be used as one of the enhanced images.
After the initial training image is subjected to image enhancement processing, the obtained entity labeling information of the enhanced image may be different from the entity labeling information of the initial training image, for example, after rotation processing, the position information of the entity may change, and the entity labeling information of the initial training image needs to be adjusted according to the image enhancement processing, so as to obtain the entity labeling information of each enhanced image, thereby obtaining an accurate training image.
In this embodiment, image enhancement processing is performed on the initial training image to expand the samples, and the entity annotation information of the initial training image is adjusted according to the image enhancement processing to obtain the entity annotation information of each enhanced image, so that a large number of training images are obtained, sample diversity is increased, and the adaptability and robustness of the model are improved.
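A sketch of this label adjustment for one enhancement mode follows, assuming axis-aligned bounding boxes and a 90° counter-clockwise rotation; blur and brightness adjustments would leave the coordinate information unchanged.

    from PIL import Image

    def rotate90_with_labels(image: Image.Image, entities: list[dict]):
        """Rotate an image 90 degrees counter-clockwise and remap each entity's
        bounding box so the entity labeling information stays consistent."""
        w, _ = image.size
        rotated = image.transpose(Image.Transpose.ROTATE_90)
        new_entities = []
        for ent in entities:
            x0, y0, x1, y1 = ent["box"]
            # a point (x, y) maps to (y, w - x) under a 90-degree CCW rotation
            new_entities.append({**ent, "box": [y0, w - x1, y1, w - x0]})
        return rotated, new_entities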
Further, the step of generating the text compound vector of each entity according to the text information may include: respectively converting the text remark values of all entities in the text information into word vectors; respectively generating a one-dimensional position vector of each entity according to the text remark value of each entity, and respectively generating a two-dimensional position vector of each entity according to the coordinate information of each entity; and generating a text composite vector of each entity according to the word vector, the one-dimensional position vector and the two-dimensional position vector which respectively correspond to each entity.
Specifically, the text remark value of each entity is obtained from the text information and converted into a word vector. The text remark values of all entities are combined into a text corresponding to the training image, and the one-dimensional position vector of each entity is obtained based on its position in that text. The coordinate information of an entity reflects its position in the training image and embodies the layout information of the image; the two-dimensional position vector of the entity is generated from this coordinate information. The word vector, one-dimensional position vector and two-dimensional position vector of each entity are added to obtain the text composite vector of that entity.
In one embodiment, the word vector may be generated by a pre-trained RoBERTa model, and the one-dimensional and two-dimensional position vectors may be generated likewise. The RoBERTa models that generate the word vectors, one-dimensional position vectors and two-dimensional position vectors may be the same model or different models.
In this embodiment, the text composite vector of an entity is obtained by adding the entity's word vector, one-dimensional position vector and two-dimensional position vector, taking into account the text semantics of the entity itself, its position in the text, and its position in the training image, so that the text composite vector characterizes the entity accurately and comprehensively.
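A minimal sketch of this addition is shown below; here plain embedding tables stand in for the pre-trained RoBERTa embeddings mentioned above, and the vocabulary size, coordinate bucketing and dimension are assumptions (the application fixes only that the three vectors are added).

    import torch
    from torch import nn

    class TextCompositeEmbedding(nn.Module):
        """Sketch: text composite vector = word vector + 1-D position vector
        + 2-D position vector."""
        def __init__(self, vocab: int = 30522, max_pos: int = 512,
                     grid: int = 1000, dim: int = 768):
            super().__init__()
            self.word = nn.Embedding(vocab, dim)      # text remark value tokens
            self.pos_1d = nn.Embedding(max_pos, dim)  # order within the combined text
            self.x_2d = nn.Embedding(grid, dim)       # bucketed x coordinate
            self.y_2d = nn.Embedding(grid, dim)       # bucketed y coordinate

        def forward(self, token_ids, positions, x_coords, y_coords):
            return (self.word(token_ids) + self.pos_1d(positions)
                    + self.x_2d(x_coords) + self.y_2d(y_coords))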
Further, the step of generating the initial composite vector of the training image may include: adjusting the training image to a preset size, and cutting the training image after size adjustment according to a preset cutting mode to obtain a plurality of image blocks; generating a one-dimensional position vector of a training image, and respectively generating image characteristics of each image block; and generating an initial composite vector of the training image according to the one-dimensional position vector and the image characteristics of each image block.
Specifically, the size of the training image is adjusted, the training image is scaled to a preset size, and then the training image after size adjustment is cut according to a preset cutting mode to obtain a plurality of image blocks, where the size of each image block may be the same, for example, the training image is cut into 3 × 3 image blocks of 16 × 16.
The image blocks are arranged in a line according to a preset order, such as the order of their positions in the training image, to generate a one-dimensional position vector of the training image; this one-dimensional position vector is learnable. Linear mapping is then performed on each image block to obtain its image features. The initial composite vector of the training image is obtained by adding the one-dimensional position vector to the image features.
In this embodiment, the image blocks are arranged to obtain a one-dimensional position vector of the training image, and the initial composite vector is obtained by adding the image features of the image blocks, so that the position information and the image features of the training image are considered, and the accuracy of the initial composite vector is ensured.
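The following sketch shows one standard way to realize this step; the 224-pixel input size and 16-pixel patch size are common ViT defaults, not values fixed by the application (its own example cuts the resized image into 3 × 3 blocks of 16 × 16).

    import torch
    from torch import nn

    class PatchEmbedding(nn.Module):
        """Sketch of the initial composite vector: cut the resized image into
        fixed-size blocks, linearly map each block to an image feature, and add
        a learnable one-dimensional position vector."""
        def __init__(self, img_size: int = 224, patch: int = 16, dim: int = 768):
            super().__init__()
            num_patches = (img_size // patch) ** 2
            # a strided convolution both cuts the patches and applies the linear map
            self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
            self.pos_1d = nn.Parameter(torch.zeros(1, num_patches, dim))  # learnable

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            x = self.to_patches(image)        # (B, dim, H/patch, W/patch)
            x = x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)
            return x + self.pos_1d            # initial composite vector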
Further, before the step of inputting the text compound vector, the training image and the initial compound vector into the initial entity relationship labeling model, the method may further include: acquiring an initial entity relationship labeling network, wherein the initial entity relationship labeling network is constructed based on a LayoutLMv3 network; and pre-training the initial entity relationship labeling network according to preset pre-training tasks to obtain the entity relationship labeling network, wherein the preset pre-training tasks comprise a masked language modeling task, a masked image modeling task and a word-patch alignment task.
Specifically, the initial entity relationship labeling model has an entity relationship labeling network, and this network needs to be pre-trained in advance. The initial entity relationship labeling network can be constructed based on a LayoutLMv3 network, a multi-modal Transformer architecture that combines text and images in a unified manner: it divides the image into image blocks, represents them as linear projections, and aligns the linear projections with the text tokens, which reduces the required parameters and the overall amount of computation.
The LayoutLMv3 network redesigns the image processing: it no longer relies on a separate visual backbone and instead adopts a ViT-style design, reducing model parameters. Because LayoutLMv3 works directly on the image blocks of an image, it greatly saves parameters and avoids complicated preprocessing (such as manually labeling target region boxes and text target detection). Its simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model suited to both text-centric and image-centric document tasks.
The initial entity relationship labeling network is pre-trained according to the preset pre-training tasks, learning multi-modal feature representations in a self-supervised manner, to obtain the entity relationship labeling network; the preset pre-training tasks comprise a masked language modeling task, a masked image modeling task and a word-patch alignment task.
Masked Language Modeling (MLM). To help the network learn the correspondence between the layout information and the text and image, this task randomly masks a certain proportion of the text word vectors while retaining the corresponding two-dimensional position (layout) information. As in BERT and LayoutLM, the training objective is to recover the masked words in the text from the unmasked image-text and layout information.
Masked Image Modeling (MIM). To encourage the model to infer image information from the context of the text and image, this task randomly masks a certain proportion of the image blocks. As in BEiT, the training objective is to recover the discretized IDs of the masked image blocks from the information of the unmasked text and image.
Word-Patch Alignment (WPA). In the text, each text word corresponds to an image block. Since the first two tasks randomly mask part of the text words and image blocks, the model cannot explicitly learn the fine-grained alignment between text words and image blocks from them. This objective learns fine-grained alignment between the language and vision modalities by explicitly predicting whether the image block corresponding to a text word is masked.
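The sketch below shows how the three pre-training targets could be constructed for one example; the masking ratios are assumptions, and the discretized image-block IDs are taken to come from a BEiT-style image tokenizer as described above.

    import torch

    def build_pretraining_targets(token_ids, patch_ids, word_to_patch,
                                  mlm_ratio: float = 0.3, mim_ratio: float = 0.4):
        """Sketch: build MLM, MIM and WPA targets for one sequence.
        token_ids: (T,) text token ids; patch_ids: (P,) discretized image-block
        ids; word_to_patch: (T,) index of the image block each text word covers."""
        mlm_mask = torch.rand(token_ids.size(0)) < mlm_ratio  # words to mask
        mim_mask = torch.rand(patch_ids.size(0)) < mim_ratio  # blocks to mask

        # -100 marks positions excluded from the loss, as is conventional
        mlm_labels = torch.where(mlm_mask, token_ids, torch.full_like(token_ids, -100))
        mim_labels = torch.where(mim_mask, patch_ids, torch.full_like(patch_ids, -100))

        # WPA: for each unmasked text word, predict whether its image block is masked
        wpa_labels = mim_mask[word_to_patch].long()
        wpa_labels[mlm_mask] = -100  # masked words are excluded from the WPA loss

        return mlm_labels, mim_labels, wpa_labels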
In this application, the convolutional network CNN carries inductive bias, while the Transformer in the LayoutLMv3 network has strong global modeling capability. Adding the convolutional network CNN breaks through the Transformer's lack of inductive bias: a better transfer effect can be obtained on downstream tasks, the model's ability on small-sample learning tasks is greatly improved, the Transformer is freed from its dependence on large amounts of sample data in small-sample scenarios, the number of samples required for training is reduced, and training efficiency is improved.
In this embodiment, the initial entity relationship labeling network is pre-trained according to the masked language modeling task, the masked image modeling task and the word-patch alignment task, so that the obtained entity relationship labeling network can perform entity relationship labeling.
Further, after step S206, the method may further include: carrying out image detection on the image to be annotated according to the entity relationship information to obtain an image detection result; and performing service processing on the image to be annotated according to the image detection result.
Specifically, after the entity relationship information is obtained, image detection may be performed on the image to be annotated according to the entity relationship information. The image detection may be a business audit of the image according to its image type, for example detecting whether an entity of a specific type is missing or whether an entity is wrong. In an image of a medical charging bill, if there is no text remark value for the personal payment fee, that is, no specific personal payment amount is recorded, the medical charging bill may contain an error; likewise, if two entities representing amounts have an association relationship, one in capital form and the other in lowercase (numeric) form, but the amounts corresponding to their text remark values are not equal, the medical charging bill may contain an error.
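To illustrate the uppercase/lowercase consistency check, a minimal sketch follows. The parser handles only simple capital-form integer amounts (no 万/亿 sections, no 角/分 fractions) and is an assumption for illustration, not part of the application.

    CN_DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
                 "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
    CN_UNITS = {"拾": 10, "佰": 100, "仟": 1000}

    def parse_capital_amount(text: str) -> int:
        """Parse a simple capital-form amount, e.g. '玖仟陆佰捌拾伍元' -> 9685."""
        total = digit = 0
        for ch in text:
            if ch in CN_DIGITS:
                digit = CN_DIGITS[ch]
            elif ch in CN_UNITS:
                total += digit * CN_UNITS[ch]
                digit = 0
            elif ch == "元":
                break
        return total + digit

    def amounts_consistent(capital_text: str, lowercase_text: str) -> bool:
        """Check an associated uppercase/lowercase entity pair for equality."""
        return parse_capital_amount(capital_text) == int(lowercase_text)

    # amounts_consistent("玖仟陆佰捌拾伍元", "9685") -> True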
An image detection result is obtained after the image detection; it can indicate whether the image to be annotated passes the detection and, if not, record which entities are in error. Business processing can then be performed on the image to be annotated according to the image detection result: for example, when the result indicates that the image passes the detection, the business processing of the next process node is performed (for instance, after an image of a medical charging bill passes the detection, the reimbursement process is carried out); if the image does not pass the detection, an error prompt is generated and the image is returned to the previous process node.
In the embodiment, the image to be annotated is subjected to image detection according to the entity relationship information to obtain an image detection result, and the image to be annotated is subjected to service processing according to the image detection result, enters the next process node or returns to the previous process node, so that the automatic processing of the service is realized, and the service processing efficiency is improved.
The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The application can be applied to the field of intelligent medical treatment, and therefore the construction of a smart city is promoted.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by computer-readable instructions instructing relevant hardware; the instructions can be stored in a computer-readable storage medium, and the programs, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turns or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an image-based entity relationship labeling model processing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the entity relationship labeling model processing apparatus 300 based on image according to this embodiment includes: an image obtaining module 301, a vector generating module 302, a convolution processing module 303, a vector merging module 304, a vector input module 305, a model adjusting module 306, and an image labeling module 307, wherein:
the image obtaining module 301 is configured to obtain a training image with entity tagging information, where the entity tagging information includes text information and relationship information of each entity in the training image, the text information includes text remark values and coordinate information of each entity, and the relationship information includes entity relationships and association relationships between the entities.
And a vector generation module 302, configured to generate a text compound vector for each entity according to the text information, and generate an initial compound vector of the training image.
And the convolution processing module 303 is configured to input the text composite vector, the training image, and the initial composite vector into the initial entity relationship labeling model, and perform separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector.
And a vector merging module 304, configured to merge the initial composite vector and the convolution feature vector to obtain an image composite vector.
The vector input module 305 is configured to input the text composite vector and the image composite vector into an entity relationship labeling network in the initial entity relationship labeling model, so as to obtain entity relationship prediction information.
And the model adjusting module 306 is configured to calculate a model loss according to the relationship information and the entity relationship prediction information, and perform parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets a training stopping condition, so as to obtain an entity relationship labeling model.
And the image annotation module 307 is configured to obtain an image to be annotated, and input the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
In this embodiment, a training image with entity labeling information is obtained; the entity labeling information includes the text information and relationship information of each entity in the training image, the text information includes the text remark value and coordinate information of each entity, and the relationship information includes the entity relationships and association relationships among the entities. Text composite vectors of all entities are generated according to the text information, and an initial composite vector of the training image is generated. The text composite vector, the training image and the initial composite vector are input into the initial entity relationship labeling model, which includes a convolution network and an entity relationship labeling network. The convolution network performs separable convolution processing on the training image to obtain a convolution feature vector serving as a supplementary feature; separable convolution reduces the amount of computation and the number of parameters, and the inductive bias of the convolution network reduces the number of required samples and improves training efficiency. The initial composite vector and the convolution feature vector are combined to obtain an image composite vector, and the text composite vector and the image composite vector are input into the entity relationship labeling network to obtain entity relationship prediction information. The model loss is calculated according to the relationship information and the entity relationship prediction information so as to adjust the model parameters until the model loss meets the training stopping condition, yielding the entity relationship labeling model. An image to be annotated is acquired and input into the entity relationship labeling model to obtain entity relationship information, thereby realizing automatic identification of entities in the image and automatic labeling of entity relationships.
In some optional implementations of the present embodiment, the image-based entity relationship annotation model processing apparatus 300 may further include: the device comprises an initial acquisition module, an image enhancement module and an image generation module, wherein:
and the initial acquisition module is used for acquiring the initial training image with the entity labeling information.
And the image enhancement module is used for carrying out image enhancement processing on the initial training image to obtain an enhanced image.
And the image generation module is used for acquiring the entity marking information of the enhanced image according to the image enhancement processing so as to obtain the training image.
In this embodiment, image enhancement processing is performed on the initial training image to expand the samples, and the entity annotation information of the initial training image is adjusted according to the image enhancement processing to obtain the entity annotation information of the enhanced image, so that a large number of training images are obtained, sample diversity is increased, and the adaptability and robustness of the model are improved.
In some optional implementations of this embodiment, the vector generation module 302 may include: the system comprises a text conversion submodule, a vector generation submodule and a composite generation submodule, wherein:
and the text conversion sub-module is used for converting the text remark values of the entities in the text information into word vectors respectively.
And the vector generation submodule is used for respectively generating a one-dimensional position vector of each entity according to the text remark value of each entity and respectively generating a two-dimensional position vector of each entity according to the coordinate information of each entity.
And the compound generation submodule is used for generating text compound vectors of the entities according to the word vectors, the one-dimensional position vectors and the two-dimensional position vectors which respectively correspond to the entities.
In this embodiment, the text compound vector of an entity is obtained by adding the entity's word vector, one-dimensional position vector and two-dimensional position vector, taking into account the text semantics of the entity itself, its position in the text, and its position in the training image, so that the text compound vector characterizes the entity accurately and comprehensively.
In some optional implementations of this embodiment, the vector generation module 302 may further include: the image adjusting submodule, the generating submodule and the initial generating submodule, wherein:
and the image adjusting submodule is used for adjusting the training image to a preset size and cutting the training image after size adjustment according to a preset cutting mode to obtain a plurality of image blocks.
And the generation submodule is used for generating a one-dimensional position vector of the training image and respectively generating the image characteristics of each image block.
And the initial generation sub-module is used for generating an initial composite vector of the training image according to the one-dimensional position vector and the image characteristics of each image block.
In this embodiment, the image blocks are arranged to obtain a one-dimensional position vector of the training image, and the initial composite vector is obtained by adding the image features of the image blocks, so that the position information and the image features of the training image are considered, and the accuracy of the initial composite vector is ensured.
In some optional implementations of the present embodiment, the image-based entity relationship annotation model processing apparatus 300 may further include: network acquisition module and pre-training module, wherein:
and the network acquisition module is used for acquiring an initial entity relationship labeling network, wherein the initial entity relationship labeling network is constructed based on the layout LMv3 network.
And the pre-training module is used for pre-training the initial entity relationship labeling network according to a preset pre-training task to obtain the entity relationship labeling network, wherein the preset pre-training task comprises a mask language modeling task, a mask image modeling task and a word block alignment task.
In the embodiment, the initial entity relationship labeling network is pre-trained according to the mask language modeling task, the mask image modeling task and the word block alignment task, so that the obtained entity relationship labeling network can perform entity relationship labeling.
In some optional implementations of this embodiment, the image-based entity relationship annotation model processing apparatus 300 may further include: an image detection module and a business processing module, wherein:
The image detection module is used for carrying out image detection on the image to be annotated according to the entity relationship information to obtain an image detection result.
The business processing module is used for carrying out business processing on the image to be annotated according to the image detection result.
In this embodiment, image detection is carried out on the image to be annotated according to the entity relationship information to obtain an image detection result, and business processing is carried out on the image to be annotated according to that result, with the image entering the next process node or being returned to the previous process node; automatic business processing is thereby realized and business processing efficiency is improved.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components need be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to instructions that are set or stored in advance, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The computer device can carry out human-computer interaction with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device or the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of the image-based entity relationship labeling model processing method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the image-based entity relationship labeling model processing method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment may execute the above image-based entity relationship labeling model processing method, which may be the image-based entity relationship labeling model processing method of any of the above embodiments.
In this embodiment, a training image with entity labeling information is obtained, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises a text remark value and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities; text compound vectors of the entities are generated according to the text information, and an initial compound vector of the training image is generated; the text compound vectors, the training image and the initial compound vector are input into an initial entity relationship labeling model, which comprises a convolution network and an entity relationship labeling network; the convolution network carries out separable convolution processing on the training image to obtain a convolution feature vector serving as a supplementary feature, where separable convolution reduces the computational cost and the number of parameters, and the inductive bias of the convolution network reduces the number of required samples, improving training efficiency; the initial compound vector and the convolution feature vector are combined to obtain an image compound vector, and the text compound vectors and the image compound vector are input into the entity relationship labeling network to obtain entity relationship prediction information; a model loss is calculated according to the relationship information and the entity relationship prediction information, and the parameters of the model are adjusted until the model loss meets the training stopping condition, yielding the entity relationship labeling model; finally, an image to be annotated is acquired and input into the entity relationship labeling model to obtain entity relationship information, thereby realizing automatic identification of entities in the image and automatic labeling of entity relationships.
The present application further provides another embodiment: a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to execute the steps of the image-based entity relationship labeling model processing method described above.
In this embodiment, a training image with entity labeling information is obtained, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises a text remark value and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities; text compound vectors of the entities are generated according to the text information, and an initial compound vector of the training image is generated; the text compound vectors, the training image and the initial compound vector are input into an initial entity relationship labeling model, which comprises a convolution network and an entity relationship labeling network; the convolution network carries out separable convolution processing on the training image to obtain a convolution feature vector serving as a supplementary feature, where separable convolution reduces the computational cost and the number of parameters, and the inductive bias of the convolution network reduces the number of required samples, improving training efficiency; the initial compound vector and the convolution feature vector are combined to obtain an image compound vector, and the text compound vectors and the image compound vector are input into the entity relationship labeling network to obtain entity relationship prediction information; a model loss is calculated according to the relationship information and the entity relationship prediction information, and the parameters of the model are adjusted until the model loss meets the training stopping condition, yielding the entity relationship labeling model; finally, an image to be annotated is acquired and input into the entity relationship labeling model to obtain entity relationship information, thereby realizing automatic identification of entities in the image and automatic labeling of entity relationships.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus the necessary general hardware platform, and certainly can also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some preferred embodiments of the present application and do not limit its scope; the appended drawings likewise illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms, and these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions of the foregoing embodiments may still be modified, or some of their features may be replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An image-based entity relationship labeling model processing method, characterized by comprising the following steps:
acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
generating text compound vectors of the entities according to the text information, and generating an initial compound vector of the training image;
inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, and performing separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
combining the initial compound vector and the convolution feature vector to obtain an image compound vector;
inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
calculating model loss according to the relationship information and the entity relationship prediction information, and performing parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets training stopping conditions to obtain an entity relationship labeling model;
and acquiring an image to be annotated, and inputting the image to be annotated into the entity relationship annotation model to obtain entity relationship information.
2. The image-based entity relationship labeling model processing method of claim 1, further comprising, before the step of acquiring the training image with entity labeling information:
acquiring an initial training image with entity labeling information;
carrying out image enhancement processing on the initial training image to obtain an enhanced image;
and acquiring entity labeling information of the enhanced image according to the image enhancement processing to obtain a training image.
3. The image-based entity relationship labeling model processing method of claim 1, wherein the step of generating the text compound vector of each entity according to the text information comprises:
respectively converting the text remark values of the entities in the text information into word vectors;
respectively generating one-dimensional position vectors of the entities according to the text remark values of the entities, and respectively generating two-dimensional position vectors of the entities according to the coordinate information of the entities;
and generating a text compound vector of each entity according to the word vector, the one-dimensional position vector and the two-dimensional position vector respectively corresponding to each entity.
4. The method of claim 1, wherein the step of generating the initial compound vector of the training image comprises:
adjusting the training image to a preset size, and cutting the resized training image according to a preset cutting mode to obtain a plurality of image blocks;
generating a one-dimensional position vector of the training image, and respectively generating image features of each image block;
and generating an initial compound vector of the training image according to the one-dimensional position vector and the image features of each image block.
5. The image-based entity relationship labeling model processing method of claim 1, further comprising, before the step of inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model:
acquiring an initial entity relationship labeling network, wherein the initial entity relationship labeling network is constructed based on the LayoutLMv3 network;
and pre-training the initial entity relationship labeling network according to a preset pre-training task to obtain an entity relationship labeling network, wherein the preset pre-training task comprises a masked language modeling task, a masked image modeling task and a word-patch alignment task.
6. The image-based entity relationship labeling model processing method according to claim 1, further comprising, after the step of inputting the image to be annotated into the entity relationship labeling model to obtain entity relationship information:
carrying out image detection on the image to be annotated according to the entity relationship information to obtain an image detection result;
and carrying out business processing on the image to be annotated according to the image detection result.
7. An image-based entity relationship labeling model processing device, characterized by comprising:
the image acquisition module is used for acquiring a training image with entity labeling information, wherein the entity labeling information comprises text information and relationship information of each entity in the training image, the text information comprises text remark values and coordinate information of each entity, and the relationship information comprises entity relationships and association relationships among the entities;
the vector generation module is used for generating text compound vectors of the entities according to the text information and generating an initial compound vector of the training image;
the convolution processing module is used for inputting the text compound vector, the training image and the initial compound vector into an initial entity relationship labeling model, so as to carry out separable convolution processing on the training image through a convolution network in the initial entity relationship labeling model to obtain a convolution feature vector;
the vector merging module is used for merging the initial compound vector and the convolution feature vector to obtain an image compound vector;
the vector input module is used for inputting the text compound vector and the image compound vector into an entity relationship labeling network in the initial entity relationship labeling model to obtain entity relationship prediction information;
the model adjusting module is used for calculating model loss according to the relationship information and the entity relationship prediction information, and carrying out parameter adjustment on the initial entity relationship labeling model according to the model loss until the model loss meets a training stopping condition to obtain an entity relationship labeling model;
and the image annotation module is used for acquiring an image to be annotated and inputting the image to be annotated into the entity relationship labeling model to obtain entity relationship information.
8. The apparatus according to claim 7, further comprising:
the initial acquisition module is used for acquiring an initial training image with entity labeling information;
the image enhancement module is used for carrying out image enhancement processing on the initial training image to obtain an enhanced image;
and the image generation module is used for acquiring the entity labeling information of the enhanced image according to the image enhancement processing, so as to obtain a training image.
9. A computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions, and the processor, when executing the computer readable instructions, implements the steps of the image-based entity relationship labeling model processing method of any one of claims 1 to 6.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the image-based entity relationship labeling model processing method according to any one of claims 1 to 6.