CN115130545A - Data processing method, electronic device, program product, and medium


Info

Publication number
CN115130545A
CN115130545A
Authority
CN
China
Prior art keywords
feature
event
pair
image
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210483147.5A
Other languages
Chinese (zh)
Inventor
王硕
鞠美芝
张云燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210483147.5A priority Critical patent/CN115130545A/en
Publication of CN115130545A publication Critical patent/CN115130545A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

Embodiments of the present application disclose a data processing method, an electronic device, a program product, and a medium, which can be applied to the field of data processing technologies. The method includes the following steps: acquiring a sample data set; invoking a prediction model to generate a text feature of each text data and an image feature of each image data; generating a first feature difference for each of a plurality of first feature pairs; generating a second feature difference for each of a plurality of second feature pairs; and correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model. The embodiments of the present application improve the accuracy of predicting the event indicated by text data, and can also be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving.

Description

Data processing method, electronic device, program product, and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, an electronic device, a program product, and a medium.
Background
At present, natural language processing (NLP) is widely used in various fields. In one subtask of natural language processing, the event indicated by text data can be predicted based on the text data. This generally involves training an event prediction model based on text data and predicting the event indicated by input text data based on the trained event prediction model. In practice, the inventors found that text data is strongly influenced by writing habits: when the event prediction model is trained on text data alone, the accuracy with which the trained model predicts the event indicated by input text data is low.
Disclosure of Invention
Embodiments of the present application provide a data processing method, an electronic device, a program product, and a medium, which are helpful for improving accuracy of prediction of an event indicated by text data.
In one aspect, an embodiment of the present application discloses a data processing method, including:
acquiring a sample data set; sample data in the sample data set comprises N text data and M image data, any text data and any image data are provided with event tags, and N and M are positive integers;
calling a prediction model to generate text characteristics of each text data and image characteristics of each image data;
respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; for any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels;
respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; for any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used to predict an event indicated by the text data based on the input text data.
Optionally, the method further includes:
combining the plurality of first feature pairs and the plurality of second feature pairs to obtain first combined feature pairs and second combined feature pairs; a first feature pair and a second feature pair in the first combined feature pair contain the same text feature, and a first feature pair and a second feature pair in the second combined feature pair contain the same image feature;
generating a first prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair and the second feature difference corresponding to the second feature pair in the first combined feature pair;
generating a second prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair and the second feature difference corresponding to the second feature pair in the second combined feature pair;
and determining a first feature prediction deviation of the prediction model according to the first prediction loss value and the second prediction loss value, and correcting the model parameters of the prediction model according to the first feature prediction deviation to obtain the trained prediction model.
Optionally, the method further includes:
calling a prediction model to respectively predict events indicated by the N text data, and generating event prediction deviation based on the predicted events respectively indicated by the N text data and event labels respectively carried by the N text data;
and correcting the model parameters of the prediction model based on the first feature prediction deviation and the event prediction deviation to obtain the trained prediction model.
In one aspect, an embodiment of the present application discloses a data processing apparatus, including:
an acquisition unit, configured to acquire a sample data set; sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers;
the processing unit is used for calling the prediction model to generate the text characteristic of each text data and the image characteristic of each image data;
the processing unit is further used for respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; for any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels;
the processing unit is further used for respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; for any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
the processing unit is further used for correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used for predicting the event indicated by the text data according to the input text data.
In one aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to perform the following steps:
acquiring a sample data set; sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers;
calling a prediction model to generate text characteristics of each text data and image characteristics of each image data;
respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; for any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels;
respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; for any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used for predicting the event indicated by the text data according to the input text data.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, in which computer program instructions are stored, and when the computer program instructions are executed by a processor, the computer program instructions are configured to perform the following steps:
acquiring a sample data set; sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers;
calling a prediction model to generate text characteristics of each text data and image characteristics of each image data;
respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; for any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels;
respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; for any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used for predicting the event indicated by the text data according to the input text data.
In one aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the above aspects.
The embodiments of the present application provide a data processing scheme. A prediction model can be invoked to generate a text feature of each text data and an image feature of each image data; the model parameters of the prediction model are then corrected based on the first feature differences corresponding to a plurality of first feature pairs, in each of which the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels, and the second feature differences corresponding to second feature pairs, in each of which the text data and the image data have different event labels, to obtain a trained prediction model. In this way, the feature difference between the features of text data and image data describing the same event can be contrasted, for learning, with the feature difference between the features of text data and image data describing different events. Adding image data to the training of the prediction model reduces the influence of writing habits on the text features of the text data, enables the prediction model to generate text features that predict events more accurately, and thus improves the accuracy of predicting the event indicated by text data.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating the results of a trigger extractor provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a training framework of a prediction model according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 6 is a diagram illustrating the results of a trigger extractor according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training framework of a prediction model provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of an application scenario provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The embodiments of the present application provide a data processing scheme. A prediction model can be invoked to generate a text feature of each text data and an image feature of each image data; the model parameters of the prediction model are then corrected based on the first feature differences corresponding to a plurality of first feature pairs, in each of which the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels, and the second feature differences corresponding to second feature pairs, in each of which the text data and the image data have different event labels, to obtain a trained prediction model. In this way, the feature difference between the features of text data and image data describing the same event can be contrasted, for learning, with the feature difference between the features of text data and image data describing different events. Adding image data to the training of the prediction model reduces the influence of writing habits on the text features of the text data, enables the prediction model to generate text features that predict events more accurately, and thus improves the accuracy of predicting the event indicated by text data.
It should be noted that, before and during the collection of user-related data (such as the text data and the image data), a prompt interface or pop-up window may be displayed to inform the user that the relevant data is currently being collected. The steps for obtaining the user-related data begin only after a confirmation operation on the prompt interface or pop-up window is received from the user; otherwise (that is, when no confirmation operation on the prompt interface or pop-up window is received from the user), the steps for obtaining the user-related data end, and the user's data is not obtained. In other words, all user data collected in the present application is collected with the consent and authorization of the user, and the collection, use, and processing of relevant user data comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The technical solution of the present application can be applied to an electronic device. The electronic device may be a terminal, a server, or another device for performing data processing, which is not limited in this application. Optionally, the server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart household appliance, a vehicle-mounted terminal, an aircraft, and the like.
In one possible implementation, the embodiments of the present application may be used in the field of Artificial Intelligence (AI). Artificial intelligence is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to achieve the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. The embodiments may in particular be applied in the technical field of Natural Language Processing (NLP), an important direction in the fields of computer science and artificial intelligence, which studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, machine question answering, knowledge graphs, and the like.
In a possible implementation, the embodiments of the present application may be applied in the field of blockchain technology, for example, storing the extracted text data indicating the event and the associated information of the event on blockchain nodes. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with each other using cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform can comprise processing modules such as user management, basic services, smart contracts, and operation management. The user management module is responsible for the identity information management of all blockchain participants, including the generation and maintenance of public and private keys (account management), key management, and maintenance of the correspondence between a user's real identity and blockchain address (authority management); with authorization, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control auditing). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and record valid requests to storage after consensus is completed; for a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic through a programming language and issue it to the blockchain (contract registration), where the contract clauses are executed when triggered by a key or other events, completing the contract logic; the module also provides functions for upgrading and cancelling contracts. The operation management module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, as well as the visual output of real-time status during product operation, for example: alarms, management of network conditions, and management of node device health status.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
Based on the above description, the embodiments of the present application provide a data processing method. Referring to fig. 1, fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present disclosure. The method may be performed by the electronic device described above. The data processing method may include the following steps.
S101, acquiring a sample data set.
The sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers. Wherein, the sample data set refers to a data set used for training a prediction model. M and N may be the same or different and are not limited herein.
The event labels of the sample data in the sample data set are labeled based on an event set, and the event set comprises a plurality of events. That is, the event label of a sample data is labeled based on the events in the event set that the sample data has. The events included in the event set may be preset events that need to be predicted, also called event types. For example, the events in the event set may be the 8 event types and 33 subtypes defined by ACE2005 (an event type definition). The events in the event set may also be defined according to the field in which the prediction model to be trained will be applied; for example, if the prediction model is applied to the medical field, events related to disease types, disease causes, and the like may be preset, such as "jaundice", "effusion", and "fever", which is not limited herein.
In a possible implementation manner, each event in the event set has a corresponding event number, and the event label of a sample data can then be labeled according to the event number corresponding to the event that the sample data has. For example, if the event number of the event "jaundice" in the event set is "1", then sample data having the event "jaundice" has an event tag including "1".
In a possible implementation manner, to obtain the sample data set, an event sample set may be obtained first, and the sample data set is then generated based on the event sample set, where each event sample in the event sample set includes text data and image data. An event sample may be text data and image data describing the same event information. For example, the event sample may be an examination report: the report may include text data describing symptoms, the reasons for the symptoms, and other information, together with corresponding image data showing events (e.g., symptoms) and other information, such as an electrocardiogram or a nuclear magnetic resonance image; both the text data and the image data in the event sample indicated by the examination report express the event information of the event sample. The event tags of the text data and the image data in an event sample can then be determined based on the events present in the text data of the event sample, and the text data and the image data with event tags are determined as sample data to obtain the sample data set. It will be appreciated that the above-described examination reports are collected with the consent and authorization of the user, and that the collection, use, and processing of relevant user data comply with the relevant laws, regulations, and standards of the relevant countries and regions.
For example, in one event sample, the text data is "ball-game injury two months ago; examination shows knee effusion", and the image data is a magnetic resonance image of the knee. If the event "effusion" exists in the event set, then the event "effusion" exists in the text data, so the text data in the event sample is labeled with the event label corresponding to the event "effusion", and the image data in the event sample is labeled with the same event label; one text data and one image data in the sample data set are thereby obtained. By analogy, a plurality of text data and image data can be obtained from a plurality of event samples as sample data in the sample data set, each with its corresponding event label.
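For illustration only, the structure of such a sample data set could be sketched in Python as follows; the EventSample type, the event numbers, and the file path are assumptions made for this sketch, not details of the embodiments described above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical event set: event number -> event name
EVENT_SET = {1: "jaundice", 2: "effusion", 3: "fever"}

@dataclass
class EventSample:
    text: str                  # text data, e.g. a finding from an examination report
    image_path: str            # image data from the same report, e.g. an MRI scan
    event_labels: List[int] = field(default_factory=list)  # event numbers

# Both modalities of one event sample carry the same event labels.
sample = EventSample(
    text="Ball-game injury two months ago; examination shows knee effusion.",
    image_path="reports/knee_mri_001.png",
    event_labels=[2],          # event number of "effusion"
)
```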
S102, calling a prediction model to generate text features of each text data and image features of each image data.
The text feature may be a feature for representing semantics of the text data, and the image feature may be a feature for representing content in the image data.
In one possible implementation, the prediction model may include a text encoder, and the text feature of each text data is generated based on the text encoder in the prediction model. The text encoder may be a pre-trained BERT model (a language representation model), so the text features of the text data may be obtained based on the pre-trained BERT model. A sequence representation of the text data is obtained directly from the BERT model; to facilitate the subsequent contrastive learning between image features and text features, the dimension of the text feature needs to match that of the image feature, so average pooling can be performed on the sequence representation obtained from the BERT model to obtain the text feature of the text data.
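A minimal sketch of this pooling step is given below, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint; both are assumptions for illustration, as the embodiment only requires some pre-trained BERT model.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def text_feature(text: str) -> torch.Tensor:
    """Mean-pool the BERT sequence representation into one d-dimensional text feature."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state     # (1, seq_len, d)
    mask = inputs["attention_mask"].unsqueeze(-1)        # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, d)
```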
In a possible implementation, the prediction model may include an image encoder, and the image feature of each image data is generated based on the image encoder in the prediction model. The image encoder may include a ResNet-50 model (a neural network model) for generating a global feature of the image data, and a Faster R-CNN model (a neural network model) for identifying entities in the image and their regions. An entity in an image is an objectively existing and distinguishable thing, such as the skull in a magnetic resonance image of the skull. The ResNet-50 model may be pre-trained on the ImageNet-1k dataset (an image dataset), enabling extraction of global features of the image data; the Faster R-CNN model may be pre-trained on a Visual Genome-based dataset (an image dataset), enabling extraction of entity features and entity position features of the image data. The entity features, entity position features, and global feature of the image data can then be concatenated to obtain the image feature of the image data. In a possible implementation, when generating the image features, an existing open-source tool may also be used to extract fine-grained information from the image. For example, performing a scene recognition task on the image (events and scenes are strongly associated) can help the model collect more information about the event; in addition, semantically annotating the information obtained by scene recognition with an image-based semantic tool yields semantic information alongside the event-related information, helping the prediction model inject image-modality information and improving the accuracy with which the trained prediction model recognizes events.
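The concatenation described above could look like the following sketch; the single-entity assumption, the feature sizes, and the projection layer are illustrative assumptions, since the embodiment does not fix these details.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # keep the 2048-dim global feature
backbone.eval()

# Project the concatenated feature to the same dimension d as the text feature.
proj = torch.nn.Linear(2048 + 256 + 4, 768)

def image_feature(img: torch.Tensor, entity_feat: torch.Tensor,
                  entity_box: torch.Tensor) -> torch.Tensor:
    """Concatenate the global feature with one entity feature (assumed 256-dim,
    e.g. from a Faster R-CNN head) and its box coordinates, then project to d."""
    with torch.no_grad():
        global_feat = backbone(img.unsqueeze(0))         # (1, 2048)
    parts = torch.cat([global_feat,
                       entity_feat.unsqueeze(0),          # (1, 256) entity feature
                       entity_box.unsqueeze(0)], dim=1)   # (1, 4) entity position
    return proj(parts)                                    # (1, 768)

feat = image_feature(torch.randn(3, 224, 224), torch.randn(256), torch.randn(4))
```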
S103, first feature differences between text features and image features included in each of the plurality of first feature pairs are generated respectively.
For any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels.
In a possible embodiment, any text data may have at least one event tag and any image data may have at least one event tag; the text data and the image data having the same event tags means that the at least one event tag of the text data is identical to the at least one event tag of the image data.
For example, if the event labels of a text data S1 include "1" and "2", and the event labels of an image data V1 include "1" and "2", then the text data S1 and the image data V1 clearly have the same event labels, so the text feature of the text data S1 and the image feature of the image data V1 can be grouped into a first feature pair.
The first feature difference is used to characterize the difference between the text feature and the image feature contained in the first feature pair. The first feature difference may be a feature distance or a feature similarity between the text feature and the image feature: the feature distance may be, for example, the Euclidean distance, and the feature similarity may be, for example, the cosine similarity. If the first feature difference is characterized by a feature distance, the larger the first feature difference (i.e., the feature distance), the larger the difference between the text feature and the image feature; the smaller the first feature difference, the smaller the difference between the text feature and the image feature. If the first feature difference is characterized by a feature similarity, the larger the first feature difference (i.e., the feature similarity), the smaller the difference between the text feature and the image feature; the smaller the first feature difference, the larger the difference between the text feature and the image feature.
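Both characterizations of the feature difference can be sketched in a few lines; this is an illustrative sketch, not the embodiment's prescribed implementation.

```python
import torch
import torch.nn.functional as F

def feature_difference(s: torch.Tensor, v: torch.Tensor, mode: str = "cosine") -> torch.Tensor:
    """Feature difference between a text feature s and an image feature v, each (1, d)."""
    if mode == "cosine":
        # Similarity characterization: a larger value means a smaller difference.
        return F.cosine_similarity(s, v, dim=-1)
    # Distance characterization: a larger value means a larger difference.
    return torch.norm(s - v, dim=-1)
```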
And S104, respectively generating a second feature difference between the text feature and the image feature contained in each of the plurality of second feature pairs.
For any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels.
In one possible embodiment, having different event tags means that at least one event tag differs between the at least one event tag of the text data and the at least one event tag of the image data.
For example, if the event tags of a text data S2 include "1" and "2" and the event tags of an image data V2 include "3" and "4", then the event tags of the text data S2 and the image data V2 are clearly different, so the text feature of the text data S2 and the image feature of the image data V2 may be grouped into a second feature pair. For another example, if the event labels of a text data S3 include "1" and "2" while the event label of an image data V3 includes only "1", the text data S3 has one more event label "2" than the image data V3, i.e., the text data S3 and the image data V3 have different event labels, so the text feature of the text data S3 and the image feature of the image data V3 may be grouped into a second feature pair.
The second feature difference is used to characterize the difference between the text feature and the image feature contained in the second feature pair. As with the first feature difference, the second feature difference may be a feature distance (e.g., Euclidean distance) or a feature similarity (e.g., cosine similarity) between the text feature and the image feature. If the second feature difference is characterized by a feature distance, the larger the second feature difference (i.e., the feature distance), the larger the difference between the text feature and the image feature of the second feature pair; the smaller the second feature difference, the smaller that difference. If the second feature difference is characterized by a feature similarity, the larger the second feature difference (i.e., the feature similarity), the smaller the difference between the text feature and the image feature of the second feature pair; the smaller the second feature difference, the larger that difference.
And S105, correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain the trained prediction model.
And the trained prediction model is used for predicting the event indicated by the text data according to the input text data.
It can be understood that, in the training process, correcting the model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair needs to make the difference between the text feature and the image feature of each first feature pair gradually decrease (i.e., the feature distance decreases and the feature similarity increases), and make the difference between the text feature and the image feature of each second feature pair gradually increase (i.e., the feature distance increases and the feature similarity decreases). In this way, the text encoder in the trained prediction model can generate text features that better characterize the events in the text data. In general, the style of the image data in an event sample is relatively fixed: it is not influenced by writing habits the way text data is, and its representation of the event information is more intuitive and comprehensive. For example, when the event sample is an examination report, the image is usually an electrocardiogram, a nuclear magnetic resonance image, or the like generated by a machine; it does not change from doctor to doctor, yet it still represents the information in the examination report. Therefore, by introducing the feature differences between image data and text data for contrastive learning, the accuracy of feature extraction on text data can be improved, the uncertainty caused by variation in the description style of text data is prevented from affecting the event prediction effect, and the accuracy of event prediction on text data is improved.
In a possible implementation, invoking the trained prediction model to predict the event indicated by input text data may specifically include the following steps: first, target text data is obtained; second, the trained prediction model is invoked to generate the text feature of the target text data; third, the trained prediction model is invoked to predict the event indicated by the target text data based on its text feature. The target text data is the text data requiring event prediction. Its text feature can be determined based on the text encoder in the trained prediction model; after the contrastive learning between text data and image data, the text encoder can represent the events in the text well, so the event indicated by the target text data can be determined based on the text feature.
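A minimal inference sketch is shown below; the classification head and the sigmoid thresholding are assumptions for the sketch, as the embodiment does not specify how the event is decoded from the text feature.

```python
import torch

def predict_events(text_feat: torch.Tensor, event_head: torch.nn.Linear,
                   threshold: float = 0.5) -> list:
    """Score the text feature of the target text data against every event in the
    event set and keep the events whose probability exceeds the threshold."""
    logits = event_head(text_feat)             # (1, |event set|)
    probs = torch.sigmoid(logits).squeeze(0)   # multi-label: one probability per event
    return [k for k, p in enumerate(probs.tolist()) if p > threshold]

# Usage: a hypothetical head over a 768-dim text feature and a 3-event set.
head = torch.nn.Linear(768, 3)
events = predict_events(torch.randn(1, 768), head)
```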
The embodiments of the present application provide a data processing scheme. A prediction model can be invoked to generate a text feature of each text data and an image feature of each image data; the model parameters of the prediction model are then corrected based on the first feature differences corresponding to a plurality of first feature pairs, in each of which the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event labels, and the second feature differences corresponding to second feature pairs, in each of which the text data and the image data have different event labels, to obtain a trained prediction model. In this way, the feature difference between the features of text data and image data describing the same event can be contrasted, for learning, with the feature difference between the features of text data and image data describing different events. Adding image data to the training of the prediction model reduces the influence of writing habits on the text features of the text data, enables the prediction model to generate text features that predict events more accurately, and thus improves the accuracy of predicting the event indicated by text data.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure. The method may be performed by the electronic device described above. The data processing method may include the following steps.
S201, acquiring a sample data set.
S202, calling a prediction model to generate text features of each text data and image features of each image data.
The steps S201 to S202 may refer to the related descriptions of the steps S101 to S102, which are not described herein again.
And S203, respectively generating a first feature difference between the text feature and the image feature contained in each of the plurality of first feature pairs.
In a possible implementation manner, as described above, the first feature difference may be characterized by a feature similarity, and step S203 may include the following steps: first, generating the feature similarity between the text feature and the image feature contained in each first feature pair, where the feature similarity may be, for example, the cosine similarity between the text feature and the image feature; and second, determining the feature similarity between the text feature and the image feature contained in each first feature pair as the first feature difference corresponding to that first feature pair. The feature difference of a first feature pair can thus be characterized by the feature similarity of the text feature and the image feature it contains.
In a possible implementation, as mentioned above, the first feature difference may instead be characterized by a feature distance, and step S203 may include the following steps: first, generating the feature distance between the text feature and the image feature contained in each first feature pair, where the feature distance may be, for example, the Euclidean distance between the text feature and the image feature; and second, determining the feature distance between the text feature and the image feature contained in each first feature pair as the first feature difference corresponding to that first feature pair. The feature difference of a first feature pair can thus be characterized by the feature distance of the text feature and the image feature it contains.
And S204, respectively generating a second feature difference between the text feature and the image feature contained in each of the plurality of second feature pairs.
In a possible implementation manner, as described above, the second feature difference may be characterized by a feature similarity, and step S204 may include the following steps: first, generating the feature similarity between the text feature and the image feature contained in each second feature pair, where the feature similarity may be, for example, the cosine similarity between the text feature and the image feature; and second, determining the feature similarity between the text feature and the image feature contained in each second feature pair as the second feature difference corresponding to that second feature pair. The feature difference of a second feature pair can thus be characterized by the feature similarity of the text feature and the image feature it contains.
In a possible implementation, as described above, the second feature difference may instead be characterized by a feature distance, and step S204 may include the following steps: first, generating the feature distance between the text feature and the image feature contained in each second feature pair, where the feature distance may be, for example, the Euclidean distance between the text feature and the image feature; and second, determining the feature distance between the text feature and the image feature contained in each second feature pair as the second feature difference corresponding to that second feature pair. The feature difference of a second feature pair can thus be characterized by the feature distance of the text feature and the image feature it contains.
It is understood that, for the first feature pair and the second feature pair, determining whether the text data to which the text feature belongs and the image data to which the image feature belongs have the same event label is, in effect, using event labels to characterize whether they have the same event content. In one possible implementation, event content may also be characterized by trigger words combined with events. An event may correspond to a plurality of different trigger words, and a trigger word may take different word forms; therefore, an existing open-source tool may be used to restore the trigger word to its base word form, and the restored trigger word may be concatenated with the event indicated by the event label to represent the specific event content. The first feature pairs and the second feature pairs can then be determined by comparing whether the event contents are the same.
S205, combining the plurality of first feature pairs and the plurality of second feature pairs to obtain first combined feature pairs and second combined feature pairs.
Wherein a first feature pair and a second feature pair of the first combined feature pair comprise the same text feature and a first feature pair and a second feature pair of the second combined feature pair comprise the same image feature. The number of the first combined feature pairs may be multiple, and the number of the second combined feature pairs may also be multiple.
For example, the plurality of first feature pairs may include (s1, v1), (s2, v2), (s3, v3), and (s4, v4), and the plurality of second feature pairs may include (s1, v2), (s2, v3), (s1, v4), and (s4, v3), where s1 to s4 represent text features and v1 to v4 represent image features. Combining the plurality of first feature pairs and the plurality of second feature pairs may yield a plurality of first combined feature pairs, such as (s1, v1)-(s1, v2), (s1, v1)-(s1, v4), (s2, v2)-(s2, v3), and (s4, v4)-(s4, v3), and a plurality of second combined feature pairs, such as (s2, v2)-(s1, v2), (s3, v3)-(s2, v3), (s4, v4)-(s1, v4), and (s3, v3)-(s4, v3). A plurality of first combined feature pairs and a plurality of second combined feature pairs can thereby be obtained.
It is to be understood that the shared text feature contained in a first combined feature pair may be referred to as an anchor sample feature; the image feature contained in the first feature pair of the first combined feature pair may be referred to as a positive sample feature of the anchor sample feature, and the image feature contained in the second feature pair may be referred to as a negative sample feature of the anchor sample feature. Likewise, the shared image feature contained in a second combined feature pair may be referred to as an anchor sample feature; the text feature contained in the first feature pair of the second combined feature pair is then a positive sample feature of the anchor sample feature, and the text feature contained in the second feature pair is a negative sample feature of the anchor sample feature. In the training process of the prediction model, the difference between an anchor sample feature and its positive sample features needs to be gradually reduced, and the difference between the anchor sample feature and its negative sample features gradually increased, thereby realizing the contrastive learning of text data and image data.
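The pairing logic can be sketched as follows over the indexed example above; representing each feature pair as a (text_index, image_index) tuple is an assumption made for this sketch.

```python
from itertools import product

def combine_pairs(first_pairs, second_pairs):
    """Build first combined pairs (shared text feature) and second combined pairs
    (shared image feature) from (text_index, image_index) feature pairs."""
    first_combined = [(fp, sp) for fp, sp in product(first_pairs, second_pairs)
                      if fp[0] == sp[0]]   # same text feature: the text is the anchor
    second_combined = [(fp, sp) for fp, sp in product(first_pairs, second_pairs)
                       if fp[1] == sp[1]]  # same image feature: the image is the anchor
    return first_combined, second_combined

first = [(1, 1), (2, 2), (3, 3), (4, 4)]    # pairs with the same event labels
second = [(1, 2), (2, 3), (1, 4), (4, 3)]   # pairs with different event labels
fc, sc = combine_pairs(first, second)
# fc contains e.g. ((1, 1), (1, 2)); sc contains e.g. ((2, 2), (1, 2)).
```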
S206, generating a first prediction loss value of the prediction model for the sample characteristic according to the first characteristic difference corresponding to the first characteristic pair in the first combined characteristic pair and the second characteristic difference corresponding to the second characteristic pair.
The sample features are the features corresponding to the sample data, namely the above text features and image features. The first prediction loss value for the sample features may characterize the difference between the first feature difference and the second feature difference corresponding to a first combined feature pair. During the training of the prediction model, the first prediction loss value should be made gradually smaller until convergence.
In a possible implementation manner, if the first feature difference and the second feature difference are both characterized by feature similarity, the first prediction loss value may be obtained by subtracting the first feature difference from the second feature difference corresponding to the first combined feature pair. If there are multiple first combined feature pairs, the first feature difference may be subtracted from the second feature difference for each first combined feature pair to obtain a feature pair difference value for each, and the expectation of these feature pair difference values is calculated to obtain the first prediction loss value. Calculating the expectation of the feature pair difference values is also called calculating their mean over the first combined feature pairs.
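Under the similarity characterization, this unweighted loss reduces to a one-liner; the tensor layout is an assumption for the sketch.

```python
import torch

def first_prediction_loss(sim_pos: torch.Tensor, sim_neg: torch.Tensor) -> torch.Tensor:
    """sim_pos[i] / sim_neg[i]: similarity of the first / second feature pair in the
    i-th first combined feature pair. Minimizing the mean of (sim_neg - sim_pos)
    pulls positive pairs together and pushes negative pairs apart."""
    return (sim_neg - sim_pos).mean()

loss = first_prediction_loss(torch.tensor([0.8, 0.7]), torch.tensor([0.3, 0.5]))
```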
In one possible embodiment, for any second feature pair, the event labels of the text data and of the image data are different. Clearly, the smaller the proportion of shared event labels to the union of the event labels of the text data and the image data, the larger the difference between the events they describe. During training, the feature difference of a second feature pair whose text data and image data describe very different events should be made larger, while the feature difference of a second feature pair whose events differ less needs to grow less. Therefore, when calculating the first prediction loss value, different weights may be given to the second feature differences corresponding to the second feature pairs in different first combined feature pairs; the above object is achieved by weighting each second feature difference according to the degree of overlap between the event labels corresponding to the text feature and the image feature in the second feature pair.
Specifically, step S206 may include the following steps: determining text features contained in a second feature pair in the first combined feature pair as first text features, and determining image features contained in the second feature pair in the first combined feature pair as first image features.
Next, the prediction model is invoked to generate, based on the first image feature, a first label weight for the event labels of the text data to which the first text feature belongs, and a second label weight for the event labels of the image data to which the first image feature belongs. The first label weight is the weight corresponding to an event label of the text data to which the first text feature belongs, and the second label weight is the weight corresponding to an event label of the image data to which the first image feature belongs. It can be understood that each of the at least one event label corresponding to the first text feature has a first label weight, and each of the at least one event label corresponding to the first image feature has a second label weight. To determine the first label weight and the second label weight, the weight of each event in the event set may first be determined based on the first image feature; the first label weight is then determined from the weights of the events indicated by the event labels of the text data to which the first text feature belongs, and the second label weight from the weights of the events indicated by the event labels of the image data to which the first image feature belongs. The weight of each event is determined based on the first image feature because image data expresses the significance of an event better. This significance, also known as statistical significance, represents the ability to distinguish between populations, here the ability to distinguish between different events. For example, if the image data is a magnetic resonance image of a skull, the indicated event is more likely to be a head injury or the like, i.e., the significance is higher; the present application therefore weights each event in the event set based on the image data in the second feature pair. Specifically, the event weight of each event obtained based on the first image feature may be determined according to the following formula (i.e., Formula 1).
$$\phi(v_o) = W v_o + b \qquad \text{(formula 1)}$$

where $\phi(v_o)$ represents the event weights of the events obtained based on the first image feature; each entry of $\phi(v_o)$ corresponds to one event in the event set and holds that event's weight. $W$ is a parameter matrix learned during training of the prediction model, with dimension $|E_a| \times d$, where $E_a$ denotes the event set and $d$ equals the number of columns of the image feature; that is, each row of $W$ corresponds to one event in the event set. $v_o$ denotes the first image feature, i.e., the image feature in the second feature pair, and $b$ denotes a bias constant.
After the event weight of each event is obtained based on the first image feature, the first label weight of the at least one event label corresponding to the first text feature and the second label weight of the at least one event label corresponding to the first image feature may be calculated based on the following formula (formula 2).
$$\hat{w}_k = \frac{\exp\!\left(\phi(v_o)_k\right)}{\sum_{l \in E} \exp\!\left(\phi(v_o)_l\right)} \qquad \text{(formula 2)}$$

where $\hat{w}_k$ denotes the label weight of event label $k$; $\phi(v_o)_k$ is the weight of the event indicated by event label $k$; $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base, so $\exp(\phi(v_o)_k)$ raises $e$ to the weight of the event indicated by event label $k$; and $E$ denotes the set of events indicated by the event labels of the image data to which the first image feature belongs, so the denominator $\sum_{l \in E} \exp(\phi(v_o)_l)$ sums, over each event label corresponding to the first image feature, $e$ raised to the weight of the indicated event. It can be understood that, to calculate the first label weights corresponding to the first text feature, the weight of the event indicated by each event label possessed by the first text feature is substituted for $\phi(v_o)_k$ in formula 2; to calculate the second label weights corresponding to the first image feature, the weight of the event indicated by each event label possessed by the first image feature is substituted for $\phi(v_o)_k$ in formula 2.
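For illustration, the following is a minimal sketch of formula 1 and formula 2 in PyTorch; the dimensions and the module and function names (event_scorer, label_weights, and so on) are assumptions for this sketch and are not part of the disclosure.

```python
import torch
import torch.nn.functional as F

d, num_events = 256, 32  # assumed feature dimension d and event set size |E_a|
event_scorer = torch.nn.Linear(d, num_events)  # holds W (|E_a| x d) and bias b

def event_weights(v_o: torch.Tensor) -> torch.Tensor:
    """Formula 1: phi(v_o) = W v_o + b, one event weight per event in the set."""
    return event_scorer(v_o)

def label_weights(v_o: torch.Tensor, label_idx: torch.Tensor) -> torch.Tensor:
    """Formula 2: softmax of phi(v_o) restricted to the events indicated by
    the given event labels (label_idx)."""
    phi = event_weights(v_o)
    return F.softmax(phi[label_idx], dim=-1)

v_o = torch.randn(d)                  # the first image feature
img_labels = torch.tensor([3, 7, 9])  # events indicated by its event labels
w = label_weights(v_o, img_labels)    # one label weight per event label
```

The same label_weights routine is applied over the event labels of the text data to obtain the first label weights, and over those of the image data to obtain the second label weights, in both cases using the event weights derived from the first image feature.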
And generating, according to the first label weight and the second label weight, a first difference weight for the second feature difference corresponding to the first combined feature pair. The first difference weight indicates the weight given to the second feature difference corresponding to the first combined feature pair. When determining the first difference weight, the second feature difference may be weighted by comparing the event labels possessed by the negative sample feature and by the anchor sample feature in the first combined feature pair. Specifically, the calculation may be performed by the following formula (formula 3).
$$\mu_T(i,j) = \delta \cdot \frac{\sum_{k \in E_i \setminus E_j} \hat{w}_k + \sum_{k \in E_j \setminus E_i} \hat{w}_k}{\sum_{k \in E_i} \hat{w}_k + \sum_{k \in E_j} \hat{w}_k} \qquad \text{(formula 3)}$$

where $\mu_T(i,j)$ denotes the first difference weight corresponding to the second feature difference between the text feature with sample number $i$ and the image feature with sample number $j$; $\hat{w}_k$ denotes a label weight computed by formula 2 (a first label weight when $k$ is an event label corresponding to the first text feature, and a second label weight when $k$ is an event label corresponding to the first image feature). $E_i$ denotes the set of events indicated by the at least one event label possessed by the text data to which the first text feature belongs, and $E_j$ denotes the set of events indicated by the at least one event label possessed by the image data to which the first image feature belongs. $E_i \setminus E_j$ denotes the events in $E_i$ excluding the events shared by $E_i$ and $E_j$, so $\sum_{k \in E_i \setminus E_j} \hat{w}_k$ is the sum of the first label weights of the event labels, among those corresponding to the first text feature, whose events are not shared; likewise, $E_j \setminus E_i$ denotes the events in $E_j$ excluding the shared events, and $\sum_{k \in E_j \setminus E_i} \hat{w}_k$ is the sum of the corresponding second label weights. These two sums together characterize the difference between the event labels of the negative sample feature and the anchor sample feature. $\delta$ is a preset constant. In the denominator, $\sum_{k \in E_i} \hat{w}_k$ is the sum of the first label weights of all event labels corresponding to the first text feature, and $\sum_{k \in E_j} \hat{w}_k$ is the sum of the second label weights of all event labels corresponding to the first image feature.
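As a sketch, the reconstructed formula 3 can be computed as follows, assuming the label weights from formula 2 are given as dictionaries mapping event name to weight; all names are illustrative, and the placement of the preset constant delta follows the reconstruction above.

```python
def difference_weight(w_text: dict, w_img: dict, delta: float = 1.0) -> float:
    """mu_T(i, j): weight for the second feature difference between the text
    feature with sample number i and the image feature with sample number j."""
    E_i, E_j = set(w_text), set(w_img)
    only_text = sum(w_text[k] for k in E_i - E_j)  # first label weights, E_i \ E_j
    only_img = sum(w_img[k] for k in E_j - E_i)    # second label weights, E_j \ E_i
    total = sum(w_text.values()) + sum(w_img.values())
    return delta * (only_text + only_img) / total

# e.g. text data labeled {effusion, injury}, image data labeled {effusion}:
mu = difference_weight({"effusion": 0.6, "injury": 0.4}, {"effusion": 1.0})
```

The less the two label sets overlap, the larger the numerator and hence the weight, which matches the intuition that more different events should yield more different features.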
And generating a first prediction loss value according to the first difference weight, the first characteristic difference corresponding to the first characteristic pair in the first combined characteristic pair and the second characteristic difference corresponding to the second characteristic pair in the first combined characteristic pair. If the first feature difference and the second feature difference are both characterized by the feature similarity, the first prediction loss value may be obtained by multiplying the first difference weight by the second feature difference corresponding to the first combined feature pair and subtracting the first feature difference corresponding to the first combined feature pair.
And S207, generating a second prediction loss value of the prediction model for the sample characteristics according to the first characteristic difference corresponding to the first characteristic pair in the second combined characteristic pair and the second characteristic difference corresponding to the second characteristic pair.
Wherein the second predicted loss value for the sample feature may characterize a difference between the first feature difference and the second feature difference corresponding to the second combined feature pair. During the training process of the prediction model, the second prediction loss value is gradually reduced until convergence.
The manner of generating the second predicted loss value may refer to the above-mentioned description of generating the first predicted loss value.
In one possible implementation, if the first feature difference and the second feature difference are both characterized by feature similarity, the second predicted loss value may be obtained by subtracting the first feature difference from the second feature difference corresponding to the second combined feature pair. If there are multiple second combined feature pairs, the first feature difference may be subtracted from the second feature difference corresponding to each second combined feature pair to obtain a feature pair difference value corresponding to each second combined feature pair, and an expectation of the feature pair difference value corresponding to each second combined feature pair is calculated to obtain a second predicted loss value.
In a possible embodiment, when calculating the second prediction loss value, different weights may be given to the second feature differences corresponding to the second feature pairs in different second combined feature pairs; that is, each second feature difference is weighted according to the degree of overlap between the event labels corresponding to the text feature and the image feature in the second feature pair, so that the features of a second feature pair whose text data and image data describe more different events are pushed further apart than the features of a second feature pair whose text data and image data describe more similar events. Specifically, the manner of determining the second prediction loss value may refer to the above description of determining the first prediction loss value based on the first difference weight; that is, the method may include the following steps: first, determining the text feature contained in the second feature pair in the second combined feature pair as a second text feature, and the image feature contained in the second feature pair in the second combined feature pair as a second image feature. Secondly, calling the prediction model to generate, based on the second image feature, a third label weight for the event label of the text data to which the second text feature belongs, and a fourth label weight for the event label of the image data to which the second image feature belongs. Then, generating, according to the third label weight and the fourth label weight, a second difference weight for the second feature difference corresponding to the second combined feature pair. Finally, generating the second prediction loss value according to the second difference weight, the first feature difference corresponding to the first feature pair in the second combined feature pair, and the second feature difference corresponding to the second feature pair in the second combined feature pair. The manner of determining the third label weight and the fourth label weight may refer to the description of the first label weight and the second label weight, and the manner of determining the second difference weight may refer to the description of the first difference weight, which are not repeated here.
S208, determining a first characteristic prediction deviation of the prediction model according to the first prediction loss value and the second prediction loss value, and correcting model parameters of the prediction model according to the first characteristic prediction deviation to obtain the trained prediction model.
The first feature prediction bias may be a bias value for correcting a model parameter of the prediction model based on comparison between the text feature and the image feature. During the training of the prediction model, the first feature prediction bias should be gradually reduced until convergence.
In one possible embodiment, the first characteristic prediction deviation may be a sum of the first prediction loss value and the second prediction loss value. Specifically, the determination may be made according to the following formula (formula 4).
$$\mathcal{L}_T = \mathbb{E}_{v'}\!\left[\mu_T(i,j)\,S(s,v') - S(s,v) + \epsilon\right]_+ + \mathbb{E}_{s'}\!\left[\mu_T(k,l)\,S(s',v) - S(s,v) + \epsilon\right]_+ \qquad \text{(formula 4)}$$

where $\mathcal{L}_T$ is the first feature prediction deviation determined based on each text feature and image feature, $s$ denotes a text feature, and $v$ denotes an image feature. $S(s, v')$ denotes the feature similarity between the text feature $s$ and the image feature $v'$ in the second feature pair of the first combined feature pair, and $S(s, v)$ denotes the feature similarity between the text feature $s$ and the image feature $v$ in the first feature pair of the first combined feature pair. $\mu_T(i, j)$ denotes the first difference weight, i.e., the weight corresponding to the second feature difference between the text feature $s$ with sample number $i$ and the image feature $v'$ with sample number $j$. $\mathbb{E}_{v'}[\,\cdot\,]_+$ denotes the expectation, over $v'$, of the bracketed value clipped at zero, so $\mathbb{E}_{v'}[\mu_T(i,j)\,S(s,v') - S(s,v) + \epsilon]_+$ is the first prediction loss value described above. $\epsilon$ denotes a boundary value; it ensures that, during training of the prediction model, the text feature and image feature in a first feature pair do not become completely identical and those in a second feature pair do not become completely different, and it also allows the first feature prediction deviation to converge. Similarly, $S(s', v)$ denotes the feature similarity between the text feature $s'$ and the image feature $v$ in the second feature pair of the second combined feature pair, $S(s, v)$ denotes the feature similarity between the text feature $s$ and the image feature $v$ in the first feature pair of the second combined feature pair, and $\mu_T(k, l)$ denotes the second difference weight, i.e., the weight corresponding to the second feature difference between the text feature $s'$ with sample number $k$ and the image feature $v$ with sample number $l$; $\mathbb{E}_{s'}[\mu_T(k,l)\,S(s',v) - S(s,v) + \epsilon]_+$ is the second prediction loss value described above. The first feature prediction deviation is thereby obtained as the sum of the first prediction loss value and the second prediction loss value.
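A minimal sketch of formula 4, assuming cosine similarity as the feature difference, precomputed difference weights, and a batch of negatives per anchor; the expectation is approximated by a batch mean, and the margin default is an assumption.

```python
import torch

def first_feature_prediction_deviation(
    s_pos: torch.Tensor,    # S(s, v): similarity of the first (positive) pair
    s_neg_v: torch.Tensor,  # S(s, v'): similarities to negative image features
    s_neg_s: torch.Tensor,  # S(s', v): similarities of negative text features
    mu_v: torch.Tensor,     # mu_T(i, j), one weight per negative image feature
    mu_s: torch.Tensor,     # mu_T(k, l), one weight per negative text feature
    eps: float = 0.2,       # boundary value (margin); the default is assumed
) -> torch.Tensor:
    first_loss = torch.clamp(mu_v * s_neg_v - s_pos + eps, min=0).mean()
    second_loss = torch.clamp(mu_s * s_neg_s - s_pos + eps, min=0).mean()
    return first_loss + second_loss  # formula 4
```

The second feature prediction deviation of formula 5 below has the same form, with event features in place of text features.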
In a possible implementation manner, in the process of training the model parameters of the prediction model, in addition to correcting the model parameters based on the first feature prediction deviation, the model parameters may also be corrected based on the difference between the predicted events indicated by the text data and the events indicated by the event labels, so that the trained prediction model can accurately predict, for input text data, the events the text data indicates. Specifically, correcting the model parameters of the prediction model according to the first feature prediction deviation to obtain the trained prediction model may include the following steps: first, calling the prediction model to predict the events indicated by each of the N text data, and generating an event prediction deviation based on the predicted events indicated by each of the N text data and the event labels carried by each of the N text data. Secondly, correcting the model parameters of the prediction model according to the first feature prediction deviation and the event prediction deviation to obtain the trained prediction model.
The event prediction deviation may be a deviation determined based on a difference between an event indicated by each of the predicted N text data and an event tag carried by each of the predicted N text data. In the training process, the event prediction deviation needs to be gradually reduced until convergence, so that the trained prediction model can accurately determine the event indicated by the text data.
In a possible implementation manner, the prediction model may include a trigger word extractor, and the N text data respectively carry trigger word tags. A trigger word tag indicates which characters in the text data trigger the corresponding event. After the text encoder of the prediction model produces the text feature of the text data, the trigger word extractor is called to determine, based on the text feature, the probability that each participle (token) in the text data belongs to each event in the event set. If the probability of an event corresponding to a participle is greater than a threshold, the participle can be determined to be a character in the trigger word of that event (i.e., the token can trigger the event). In this way, it can be determined which events in the event set each participle triggers, and the event prediction deviation is obtained based on the events each participle is predicted to trigger, the event labels, and the trigger word tags; the model parameters are then corrected based on the event prediction deviation. During training, the events each participle is predicted to trigger gradually become consistent with the trigger word tags and event labels, so that the trained prediction model can accurately determine the events indicated by the text data. If a participle can trigger an event, it is a character in a trigger word; consecutive participles triggering the same event form that event's trigger word, and the one or more events triggered by the trigger words are the one or more events indicated by the text data. For example, for the text data "Xiao Ming was injured playing ball two months ago; knee effusion was found on examination", if the probability that the participle "knee" corresponds to the "effusion" event is greater than the threshold and the probability that the participle "effusion" corresponds to the "effusion" event is greater than the threshold, then "knee effusion" can be a trigger word of the "effusion" event, and the events indicated by the text data are predicted to include the "effusion" event. It can be understood that the text feature input to the trigger word extractor may be the sequence features output directly by the text encoder, without the average pooling process described above.
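A sketch of the per-token trigger prediction just described, assuming a simple linear scoring head with a sigmoid and threshold in place of the CRF decoding described below; names and dimensions are illustrative.

```python
import torch

class TriggerExtractor(torch.nn.Module):
    def __init__(self, hidden: int, num_events: int):
        super().__init__()
        self.head = torch.nn.Linear(hidden, num_events)

    def forward(self, token_feats: torch.Tensor) -> torch.Tensor:
        # token_feats: (seq_len, hidden) sequence features from the text
        # encoder (no average pooling, as noted above)
        return torch.sigmoid(self.head(token_feats))  # (seq_len, num_events)

probs = TriggerExtractor(256, 32)(torch.randn(10, 256))
triggers = probs > 0.5  # token t triggers event k where probs[t, k] > threshold
# consecutive tokens triggering the same event form that event's trigger word
```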
In one possible implementation, the trigger word extractor in the prediction model may adopt a sequence labeling method. Referring to fig. 3, fig. 3 is a schematic structural diagram of a trigger word extractor according to an embodiment of the present application. As shown in fig. 3, in the trigger word extractor, the text feature shown at 301 in fig. 3 may be input into the linear layer shown at 302 in fig. 3 to obtain a low-dimensional feature whose dimension equals the number of events in the event set. A Conditional Random Field (CRF) is then used to perform a joint probability calculation for each token, obtaining the probability that each token corresponds to each event, from which the event corresponding to each token and the trigger word of that event (shown at 304 in fig. 3) are determined; the events corresponding to the trigger words are the predicted events indicated by the text data (shown at 305 in fig. 3). Model parameters of the prediction model can therefore be optimized based on the events predicted by the trigger word extractor and the event labels, so that the trained prediction model can accurately predict the events indicated by the text data.
Here, the training process of a prediction model is described by example; please refer to fig. 4, which is a schematic diagram of a training framework of a prediction model provided in an embodiment of the present application. First, a sample data set is obtained, that is, a plurality of text data shown at 401 in fig. 4 and a plurality of image data shown at 402 in fig. 4. The text feature of each text data can then be obtained based on a text encoder (shown at 403 in fig. 4), and the image feature of each image data based on an image encoder (shown at 404 in fig. 4), so as to determine a first feature prediction deviation (shown at 405 in fig. 4) from the text features and image features. A trigger word extractor (shown at 406 in fig. 4) is invoked to determine, based on the text feature of each text data, the predicted events indicated by that text data (shown at 407 in fig. 4), so as to determine an event prediction deviation (shown at 408 in fig. 4) from the predicted events and the event labels carried by the text data. The model parameters of the prediction model, such as those of the text encoder and the trigger word extractor, are then optimized based on the event prediction deviation and the first feature prediction deviation. It can be understood that when the model parameters are corrected based on the first feature prediction deviation, the feature difference between text data and image data with the same event labels is made small and the feature difference between text data and image data with different event labels is made large; contrastive learning is thus performed on the text data and the image data, so that the text features better represent events, improving the accuracy of prediction of events in text data.
In a possible implementation manner, based on the prediction model trained in the embodiment of the present application, performing event prediction on input text data may include the following steps: first, obtaining target text data; secondly, calling the text encoder in the trained prediction model to generate the text feature of the target text data; and then calling the trigger word extractor of the trained prediction model to predict, based on the text feature of the target text data, the events and trigger words indicated by the target text data. The target text data refers to the text data requiring event prediction. When the trigger word extractor of the trained prediction model is called to predict the events and trigger words indicated by the target text data based on its text feature, the probability that each participle in the target text data corresponds to each event in the event set is determined based on the trigger word extractor, and the events whose probability is greater than the threshold are determined as the events corresponding to the participle, thereby obtaining the events and trigger words indicated by the target text data. The text feature of the target text data is determined based on the text encoder in the trained prediction model; after contrastive learning on text data and image data, the text encoder can represent events in text well, so the events indicated by the target text data can be determined based on the text feature.
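A hedged end-to-end inference sketch of these steps; encoder, extractor, and tokenizer are hypothetical stand-ins for the trained components described above.

```python
def predict_events(target_text: str, encoder, extractor, tokenizer,
                   event_names, threshold: float = 0.5):
    tokens = tokenizer(target_text)        # segment the target text data
    token_feats = encoder(tokens)          # (seq_len, hidden) sequence features
    probs = extractor(token_feats)         # (seq_len, num_events) probabilities
    events = {event_names[k]
              for t in range(probs.shape[0])
              for k in range(probs.shape[1])
              if probs[t, k] > threshold}  # events over the threshold
    return events                          # events indicated by the target text
```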
The embodiment of the application provides a data processing scheme, which can call a prediction model to generate text features of each text data and image features of each image data, and further correct model parameters of the prediction model based on first feature differences corresponding to a plurality of first feature pairs of which the text data containing the text features and the image data containing the image features have the same event labels, and second feature differences corresponding to second feature pairs of which the text data containing the text features and the image data containing the image features have different event labels, so as to obtain a trained prediction model. Therefore, the feature difference between the text data with the same event and the feature corresponding to the image data can be compared with the feature difference between the text data with different events and the feature corresponding to the image data for learning, the prediction model is trained by adding the image data, the influence of writing habits on the text feature corresponding to the text data is reduced, the text feature for more accurately predicting the event can be generated by the prediction model, and the accuracy of prediction of the event indicated by the text data is improved.
Referring to fig. 5, fig. 5 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure. The method may be performed by the electronic device described above. The data processing method may include the following steps.
S501, acquiring a sample data set.
S502, calling a prediction model to generate text features of each text data and image features of each image data.
And S503, respectively generating a first feature difference between the text feature and the image feature contained in each of the plurality of first feature pairs.
And S504, respectively generating a second feature difference between the text feature and the image feature contained in each of the plurality of second feature pairs.
The steps S501 to S504 can refer to the related descriptions of the steps S101 to S104, which are not described herein again.
And S505, generating a first feature prediction deviation of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair.
The relevant description of step S505 may refer to the relevant description of steps S205-S208 in the embodiment of fig. 2, and is not repeated here.
S506, calling an information extractor to generate the event characteristics of each event in the event set.
The event feature may be a semantic feature used to represent text corresponding to the event. For example, the event feature of the event "jaundice" is used to represent the semantic feature of the text "jaundice".
It will be appreciated that the prediction model may also include an information extractor, which may be used to extract the associated information of the events in the text data. The associated information of an event can comprise the arguments and argument roles associated with the event. An argument is a participant of the event, and an argument role is the role the argument plays in the event. It can be understood that each event has several corresponding argument roles, so when determining the associated information, the argument for each argument role corresponding to each event in the text data needs to be determined from the text data. For example, for the text data "Xiao Ming was injured playing ball two months ago; knee effusion was found on examination", the argument roles of the "effusion" event can include roles such as cause and time; then, for the "effusion" event, the argument corresponding to the "cause" role, "injured playing ball", and the argument corresponding to the "time" role, "two months ago", can be extracted from the text data.
In a possible implementation manner, the information extractor of the prediction model may include an event encoder, and the event encoder in the prediction model may be invoked to generate an event feature corresponding to each event. The event encoder may be an Embedding encoding network, and further may obtain an event feature corresponding to each event based on the Embedding encoding network.
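A minimal sketch of such an event encoder, assuming a learnable embedding table with one d-dimensional event feature per event in the event set; names and sizes are illustrative.

```python
import torch

num_events, d = 32, 256
event_encoder = torch.nn.Embedding(num_events, d)  # Embedding encoding network

event_ids = torch.arange(num_events)
event_feats = event_encoder(event_ids)  # one event feature per event in the set
```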
S507, a second feature prediction bias of the information extractor is generated based on the event feature of each event and the image feature of each image data.
The second feature prediction deviation may be a deviation value for correcting the model parameters of the prediction model based on the comparison between the image features and the event features.
In one possible implementation, step S507 may include the following steps: and calling the information extractor to generate a third feature difference between the event feature and the image feature contained in each of the plurality of third feature pairs. The event to which the event feature included in any third feature pair belongs is the same as the event indicated by the event label corresponding to the included image feature, that is, the event indicated by the event label corresponding to the image feature in the third feature pair is the event to which the event feature in the third feature pair belongs. For example, if the event to which the event feature e1 belongs is event 1, and the event tag corresponding to the image feature v1 only includes the event tag corresponding to the event 1, the event feature e1 and the image feature v1 may form a third feature pair.
The third feature difference is used to characterize the difference between the event feature and the image feature included in the third feature pair. Similar to the first feature difference described above, the third feature difference may be a feature distance or a feature similarity between the event feature and the image feature; the feature distance may be, for example, a Euclidean distance, and the feature similarity may be, for example, a cosine similarity. If the third feature difference is characterized by a feature distance, a larger third feature difference (i.e., feature distance) means a larger difference between the event feature and the image feature, and a smaller one means a smaller difference. If the third feature difference is characterized by a feature similarity, a larger third feature difference (i.e., feature similarity) means a smaller difference between the event feature and the image feature, and a smaller one means a larger difference.
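For reference, the two characterizations of a feature difference can be computed as follows; this is a sketch assuming d-dimensional features.

```python
import torch
import torch.nn.functional as F

e, v = torch.randn(256), torch.randn(256)      # event feature, image feature
similarity = F.cosine_similarity(e, v, dim=0)  # larger => features more alike
distance = torch.dist(e, v, p=2)               # Euclidean: larger => less alike
```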
And invoking an information extractor to generate a fourth feature difference between the event feature and the image feature contained in each of the plurality of fourth feature pairs. The event to which the event feature included in any fourth feature pair belongs is different from the event indicated by the event label corresponding to the image feature included in the fourth feature pair, that is, the event indicated by the event label corresponding to the image feature is not identical to the event to which the event feature belongs. For example, if the event to which the event feature e1 belongs is event 1, and if the event tags corresponding to the image feature v2 include event tags corresponding to event 1 and event 2, the event feature e1 and the image feature v2 may form a fourth feature pair; if the event labels corresponding to the image feature v3 include event labels corresponding to event 3 and event 4, the event feature e1 and the image feature v3 may be combined into a fourth feature pair.
The fourth feature difference is used for characterizing the difference between the event feature and the image feature included in the fourth feature pair. Similar to the third feature difference, the fourth feature difference may be a feature distance or a feature similarity between the event feature and the image feature, and is not repeated herein.
And thirdly, calling the information extractor to generate the second feature prediction deviation for the information extractor based on the third feature difference corresponding to each third feature pair and the fourth feature difference corresponding to each fourth feature pair. During the training of the prediction model, the second feature prediction deviation should be gradually reduced until convergence.
In one possible implementation, generating the second feature prediction deviation for the information extractor may include the following steps: combining the plurality of third feature pairs and the plurality of fourth feature pairs to obtain third combined feature pairs and fourth combined feature pairs. The third feature pair and the fourth feature pair in a third combined feature pair comprise the same event feature, and the third feature pair and the fourth feature pair in a fourth combined feature pair comprise the same image feature. It is to be understood that the shared event feature in a third combined feature pair may be referred to as an anchor sample feature, the image feature of its third feature pair as a positive sample feature of the anchor sample feature, and the image feature of its fourth feature pair as a negative sample feature of the anchor sample feature. Likewise, the shared image feature in a fourth combined feature pair may be referred to as an anchor sample feature, the event feature of its third feature pair as a positive sample feature of the anchor sample feature, and the event feature of its fourth feature pair as a negative sample feature of the anchor sample feature. In the training process of the prediction model, the difference between an anchor sample feature and its positive sample features needs to be gradually reduced, and the difference between an anchor sample feature and its negative sample features gradually increased, thereby realizing contrastive learning between events and image data.
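A sketch of how third and fourth feature pairs might be assembled from labeled samples, following the examples above; the data layout is an assumption for this sketch.

```python
def build_pairs(event_ids, image_samples):
    """image_samples: list of (image_feature, set_of_event_ids)."""
    third, fourth = [], []  # positive pairs / negative pairs
    for e in event_ids:
        for v, labels in image_samples:
            # third pair: the image's event labels indicate exactly this event;
            # otherwise the label sets are not identical, giving a fourth pair
            (third if labels == {e} else fourth).append((e, v))
    return third, fourth
```

Third combined feature pairs then group a third and a fourth pair sharing the same event feature, and fourth combined feature pairs group a third and a fourth pair sharing the same image feature.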
And generating a third prediction loss value of the prediction model for the sample features according to the third feature difference corresponding to the third feature pair in the third combined feature pair and the fourth feature difference corresponding to the fourth feature pair. The third prediction loss value may characterize the difference between the third feature difference and the fourth feature difference corresponding to the third combined feature pair. During the training of the prediction model, the third prediction loss value should be gradually reduced. The calculation of the third prediction loss value may refer to the above description of the first prediction loss value. In a possible implementation manner, if the third feature difference and the fourth feature difference are both characterized by feature similarity, the third prediction loss value may be obtained by subtracting the third feature difference from the fourth feature difference corresponding to the third combined feature pair.
And generating a fourth prediction loss value of the prediction model for the sample features according to the third feature difference corresponding to the third feature pair in the fourth combined feature pair and the fourth feature difference corresponding to the fourth feature pair. The fourth prediction loss value may characterize the difference between the third feature difference and the fourth feature difference corresponding to the fourth combined feature pair. In the training process of the prediction model, the fourth prediction loss value should be gradually reduced. The calculation of the fourth prediction loss value may refer to the above description of the third prediction loss value. In a possible implementation manner, if the third feature difference and the fourth feature difference are both characterized by feature similarity, the fourth prediction loss value may be obtained by subtracting the third feature difference from the fourth feature difference corresponding to the fourth combined feature pair.
And determining a second characteristic prediction deviation of the prediction model according to the third prediction loss value and the fourth prediction loss value. The second characteristic prediction deviation may be a sum of the third prediction loss value and the fourth prediction loss value. Specifically, the determination may be made according to the following formula (formula 5).
$$\mathcal{L}_E = \mathbb{E}_{v'}\!\left[\mu_E(e,j)\,S(e,v') - S(e,v) + \epsilon\right]_+ + \mathbb{E}_{e'}\!\left[\mu_E(e',i)\,S(e',v) - S(e,v) + \epsilon\right]_+ \qquad \text{(formula 5)}$$

where $\mathcal{L}_E$ is the second feature prediction deviation determined based on each event feature and image feature, $e$ denotes an event feature, and $v$ denotes an image feature. $S(e, v')$ denotes the feature similarity between the event feature $e$ and the image feature $v'$ in the fourth feature pair of the third combined feature pair, and $S(e, v)$ denotes the feature similarity between the event feature $e$ and the image feature $v$ in the third feature pair of the third combined feature pair. $\mu_E(e, j)$ denotes the weight given to the fourth feature difference between the event feature $e$ and the image feature $v'$ with sample number $j$ in the third combined feature pair. $\mathbb{E}_{v'}[\,\cdot\,]_+$ denotes the expectation, over $v'$, of the bracketed value clipped at zero, so $\mathbb{E}_{v'}[\mu_E(e,j)\,S(e,v') - S(e,v) + \epsilon]_+$ is the third prediction loss value described above. $\epsilon$ denotes a boundary value; it ensures that the event feature and image feature in a third feature pair do not become completely identical and those in a fourth feature pair do not become completely different, and it also allows the second feature prediction deviation to converge. Similarly, $S(e', v)$ denotes the feature similarity between the event feature $e'$ and the image feature $v$ in the fourth feature pair of the fourth combined feature pair, $S(e, v)$ denotes the feature similarity between the event feature $e$ and the image feature $v$ in the third feature pair of the fourth combined feature pair, and $\mu_E(e', i)$ denotes the weight given to the fourth feature difference between the event feature $e'$ and the image feature $v$ with sample number $i$ in the fourth combined feature pair; $\mathbb{E}_{e'}[\mu_E(e',i)\,S(e',v) - S(e,v) + \epsilon]_+$ is the fourth prediction loss value described above. The second feature prediction deviation is thereby obtained as the sum of the third prediction loss value and the fourth prediction loss value.
In a possible implementation manner, for any fourth feature pair, the event to which its event feature belongs differs from the events indicated by the event labels corresponding to its image feature. Clearly, the smaller the proportion of the event to which the event feature belongs among the events indicated by the at least one event label corresponding to the image feature, the larger the difference between the image feature and the event feature should be. Accordingly, when calculating the third prediction loss value, different weights may be given to the fourth feature differences corresponding to different fourth feature pairs; that is, each fourth feature difference is weighted according to the degree of overlap between the event of the event feature and the event labels corresponding to the image feature in the fourth feature pair. Generating the third prediction loss value of the prediction model for the sample features according to the third feature difference corresponding to the third feature pair in the third combined feature pair and the fourth feature difference corresponding to the fourth feature pair may then include the following steps: first, determining the event feature contained in the fourth feature pair in the third combined feature pair as a third event feature, and the image feature contained in the fourth feature pair in the third combined feature pair as a third image feature. Secondly, calling the prediction model to generate, based on the third image feature, a fifth label weight for the event to which the third event feature belongs, and a sixth label weight for the event labels of the image data to which the third image feature belongs. The manner of determining the fifth label weight and the sixth label weight may refer to the above description of determining the first label weight and the second label weight, that is, they are calculated with reference to formula 1 and formula 2, which is not repeated here. Then, generating, according to the fifth label weight and the sixth label weight, a third difference weight for the fourth feature difference corresponding to the third combined feature pair. The third difference weight may be determined according to the following formula.
$$\mu_E(e_k, j) = \delta \cdot \frac{\sum_{l \in E_j - e_k} \hat{w}_l}{\hat{w}_k + \sum_{l \in E_j - e_k} \hat{w}_l}$$

where $\mu_E(e_k, j)$ denotes the third difference weight corresponding to the fourth feature difference between the event feature $e_k$ and the image feature with sample number $j$, i.e., the weight given to the fourth feature difference corresponding to the third combined feature pair; $\hat{w}_l$ denotes a sixth label weight corresponding to the third image feature; $E_j - e_k$ denotes the events in $E_j$ other than the event to which the event feature $e_k$ belongs, so $\sum_{l \in E_j - e_k} \hat{w}_l$ is the sum of the sixth label weights of the event labels, among those corresponding to the third image feature, whose events are not the event of $e_k$; $\hat{w}_k$ denotes the fifth label weight corresponding to the event of the event feature $e_k$; and $\delta$ denotes a preset constant.
And generating a third prediction loss value according to the third difference weight, the third feature difference corresponding to the third feature pair in the third combined feature pair and the fourth feature difference corresponding to the fourth feature pair in the third combined feature pair. The manner of generating the third predicted loss value may be determined by referring to the above equation 5, which is not described herein again.
Similarly, when calculating the fourth prediction loss value, different weights may be given to the fourth feature differences corresponding to the fourth feature pairs in different fourth combined feature pairs. Generating the fourth prediction loss value of the prediction model for the sample features according to the third feature difference corresponding to the third feature pair in the fourth combined feature pair and the fourth feature difference corresponding to the fourth feature pair may specifically include the following steps: 1. determining the event feature contained in the fourth feature pair in the fourth combined feature pair as a fourth event feature, and the image feature contained in the fourth feature pair in the fourth combined feature pair as a fourth image feature. 2. Calling the prediction model to generate, based on the fourth image feature, a seventh label weight for the event to which the fourth event feature belongs, and an eighth label weight for the event labels of the image data to which the fourth image feature belongs. 3. Generating, according to the seventh label weight and the eighth label weight, a fourth difference weight for the fourth feature difference corresponding to the fourth combined feature pair. 4. Generating the fourth prediction loss value according to the fourth difference weight, the third feature difference corresponding to the third feature pair in the fourth combined feature pair, and the fourth feature difference corresponding to the fourth feature pair in the fourth combined feature pair. The manner of determining the seventh label weight and the eighth label weight may refer to the description of the fifth label weight and the sixth label weight, and the manner of determining the fourth difference weight may refer to the description of the third difference weight, which are not repeated here.
S508, calling the prediction model to predict the events indicated by each of the N text data, and calling the information extractor to extract, from the N text data, the prediction associated information of the predicted events indicated by each of the N text data.
Each of the N text data carries an associated information tag, which indicates the associated information in the text data, namely the argument indicated by each argument role corresponding to the events in the text data. The prediction associated information indicates the arguments, extracted based on the prediction model, for each argument role corresponding to the events indicated by the text data.
In one possible implementation, the information extractor of the prediction model may include an argument extractor, which may be used to extract the associated information of the events indicated by the text data. When extracting the prediction associated information of an event indicated by the text data, the event in the text data may first be determined based on the trigger word extractor, and the event feature of that event determined based on the event encoder in the information extractor. The trigger word features obtained by the trigger word extractor, the text features, and the event feature are then concatenated and input into the argument extractor, which determines the probability that each participle in the text data belongs to each argument role corresponding to the event of the input event feature. If the probability of an argument role corresponding to a participle is greater than a threshold, the participle is determined to be part of an argument of that role for the event; consecutive participles predicted to belong to the same argument role then form the corresponding argument, yielding the prediction associated information of the event indicated by the text data.
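A sketch of the argument extractor input just described, assuming per-token text and trigger word features are concatenated with the event feature before a linear scoring layer with a sigmoid threshold in place of the CRF decoding described below; names are illustrative.

```python
import torch

class ArgumentExtractor(torch.nn.Module):
    def __init__(self, hidden: int, event_dim: int, num_roles: int):
        super().__init__()
        self.head = torch.nn.Linear(hidden * 2 + event_dim, num_roles)

    def forward(self, text_feats, trigger_feats, event_feat):
        # text_feats / trigger_feats: (seq_len, hidden); event_feat: (event_dim,)
        seq_len = text_feats.shape[0]
        event_rep = event_feat.unsqueeze(0).expand(seq_len, -1)
        x = torch.cat([text_feats, trigger_feats, event_rep], dim=-1)
        return torch.sigmoid(self.head(x))  # (seq_len, num_roles) probabilities

# tokens whose probability for a role exceeds a threshold are predicted as
# arguments of that role; consecutive such tokens form the argument span
```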
In one possible implementation, the argument extractor in the prediction model may adopt a sequence labeling method. Referring to fig. 6, fig. 6 is a schematic structural diagram of an argument extractor according to an embodiment of the present application. As shown in fig. 6, in the information extractor (shown as 601 in fig. 6), the events indicated by the text data and their trigger words may be predicted based on the trigger word extractor (shown as 605 in fig. 6); the event feature of a predicted event (shown as 604 in fig. 6), the corresponding trigger word feature, and the text feature (shown as 603 in fig. 6) are then input into the argument extractor (shown as 602 in fig. 6). In the argument extractor, the event feature, trigger word feature, and text feature may first be concatenated and input into a linear layer (shown as 606 in fig. 6), and a Conditional Random Field (CRF) is then used to perform a joint probability calculation on the outputs of the linear layer, obtaining the probability that each participle corresponds to each argument role of the event. The argument role corresponding to each participle is determined based on these probabilities, and the participles predicted to have a corresponding argument role are determined as the arguments of those roles; the arguments and argument roles are the associated information of the event. For example, in the text data, the event triggered by trigger word 1 has arguments 1-1 and 1-2, the event triggered by trigger word 2 has arguments 2-1 and 2-2, and the event triggered by trigger word 3 has no corresponding argument in the text data. In the training process, the model parameters of the prediction model can be optimized based on the associated information of the events indicated by the text data predicted by the information extractor and the associated information tags, so that the trained prediction model can accurately predict the associated information of the events indicated by the text data.
It can be understood that the above manner of first predicting the events indicated by the text data and then the associated information of those events may be referred to as a pipeline manner of event extraction. The embodiment of the present application may also implement event prediction based on text features in a joint extraction manner. For example, the event extraction task can be converted into a question-and-answer task to extract the event information. Alternatively, the event extraction task can be converted into a graph generation task, i.e., generating a graph from the text data; the joint manner preserves the information of the event structure to the maximum extent, and events sharing trigger words or arguments can be linked through latent connections in the graph, so that more information is retained.
S509, an information extraction deviation of the information extractor is generated based on the associated information tags and the prediction associated information of the N text data.
The information extraction deviation may be determined based on the difference between the prediction associated information of the N text data and their associated information tags. In the training process, the information extraction deviation needs to be gradually reduced until convergence, that is, the predicted argument role corresponding to each character gradually becomes consistent with the associated information tags, so that the trained prediction model can accurately determine the associated information of the events indicated by the text data.
And S510, correcting model parameters of the prediction model based on the first characteristic prediction deviation, the second characteristic prediction deviation and the information extraction deviation to obtain a trained prediction model.
The information extractor in the trained prediction model is used for extracting the associated information of the event indicated by the input text data, and then the event indicated by the text data and the associated information of the indicated event, namely the argument corresponding to each argument role corresponding to the indicated event, can be determined through the trained prediction model. It is understood that, in the training process of the prediction model, the first feature prediction deviation, the second feature prediction deviation and the information extraction deviation can be gradually reduced until convergence. In a possible embodiment, the trained prediction model may be obtained by correcting the model parameters of the prediction model based on the first feature prediction bias, the second feature prediction bias, the information extraction bias, and the event prediction bias.
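As a sketch, the overall training objective may simply sum the deviations described above; the loss terms are assumed to be precomputed scalars, and the equal weighting is an assumption, as the disclosure does not specify one.

```python
import torch

def total_training_loss(first_feature_dev: torch.Tensor,
                        second_feature_dev: torch.Tensor,
                        info_extraction_dev: torch.Tensor,
                        event_prediction_dev: torch.Tensor) -> torch.Tensor:
    return (first_feature_dev + second_feature_dev
            + info_extraction_dev + event_prediction_dev)

# usage: loss = total_training_loss(...); loss.backward(); optimizer.step()
```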
Here, the training process of a prediction model is described by example; please refer to fig. 7, which is a schematic diagram of a training framework of a prediction model according to an embodiment of the present application. First, a sample data set is obtained, that is, a plurality of text data shown at 701 in fig. 7 and a plurality of image data shown at 702 in fig. 7. The text feature of each text data can be obtained based on a text encoder (shown at 703 in fig. 7) and the image feature of each image data based on an image encoder (shown at 704 in fig. 7), so as to determine a first feature prediction deviation (shown at 705 in fig. 7) from the text features and image features; the event feature of each event in the event set can be obtained based on an event encoder (shown at 709 in fig. 7), so as to determine a second feature prediction deviation from the image features and event features. The trigger word extractor (shown at 706 in fig. 7) is invoked to determine, based on the text feature of each text data, the predicted events indicated by that text data (shown at 707 in fig. 7), and the event features of the predicted events are determined based on the event encoder, so that the prediction associated information of the events indicated by the text data (shown at 710 in fig. 7) is obtained based on the information extractor composed of the event encoder and the argument extractor (shown at 708 in fig. 7). An information extraction deviation (shown at 711 in fig. 7) is thereby determined based on the prediction associated information and the associated information tags carried by the text data. The model parameters of the prediction model can then be corrected based on the first feature prediction deviation, the second feature prediction deviation, and the information extraction deviation, so that the prediction model can predict the events indicated by text data and the associated information of those events.
It can be understood that when the model parameters are corrected based on the first feature prediction deviation, the feature difference between text data and image data with the same event labels is made small and the feature difference between text data and image data with different event labels is made large; when the model parameters are corrected based on the second feature prediction deviation, the feature difference between an event and image data whose event labels indicate that same event is made small, and the feature difference between an event and image data whose event labels do not indicate it is made large. The text features can thus better characterize events, and the extraction effect of the arguments corresponding to the events is improved, thereby improving both the accuracy of predicting the events in text data and the accuracy of predicting the associated information corresponding to those events.
In a possible scenario, the embodiment of the present application can be applied to an online inquiry scenario in the medical technology field. Referring to fig. 8, fig. 8 is a schematic flowchart of an application scenario provided in the embodiment of the present application. As shown in fig. 8, if a user needs to perform an online inquiry, the home page of an inquiry website may be opened (as shown in step S801), and the department requiring registration is determined based on the intelligent navigation of the website (as shown in step S802). After the user finishes registration, a pre-inquiry process is entered (as shown in step S803), in which basic information about the user, such as the present medical history, family history, and cause of illness, is obtained through preset inquiry questions. The events indicated by the text data produced in the pre-inquiry process and their associated information can then be determined based on the prediction model (as shown in step S804); an electronic examination report can be generated from these events and their associated information and pushed to the doctor assigned to the user for the inquiry (as shown in step S805), so that the doctor has a preliminary understanding of the user's condition before the diagnosis, optimizing the whole diagnosis and treatment process. It can be understood that the embodiments of the present application are not limited to the medical technology field and may be applied to other fields, which are not limited here.
The embodiment of the application provides a data processing scheme, which can call a prediction model to generate text features of each text data and image features of each image data, and further correct model parameters of the prediction model based on first feature differences corresponding to a plurality of first feature pairs of which the text data containing the text features and the image data containing the image features have the same event labels, and second feature differences corresponding to second feature pairs of which the text data containing the text features and the image data containing the image features have different event labels, so as to obtain a trained prediction model. Therefore, the feature difference between the text data with the same event and the feature corresponding to the image data can be compared with the feature difference between the text data with different events and the feature corresponding to the image data for learning, the prediction model is trained by adding the image data, the influence of writing habits on the text feature corresponding to the text data is reduced, the text feature for more accurately predicting the event can be generated by the prediction model, and the accuracy of prediction of the event indicated by the text data is improved.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure. Alternatively, the data processing apparatus may be disposed in the electronic device. As shown in fig. 9, the data processing apparatus described in the present embodiment may include:
an obtaining unit 901, configured to obtain a sample data set; the sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers;
a processing unit 902, configured to invoke a prediction model to generate a text feature of each text data and an image feature of each image data;
the processing unit 902 is further configured to generate a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; in any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event label;
the processing unit 902 is further configured to generate a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; in any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
the processing unit 902 is further configured to modify a model parameter of the prediction model based on a first feature difference corresponding to each first feature pair and a second feature difference corresponding to each second feature pair, so as to obtain a trained prediction model; the trained predictive model is used for predicting the event indicated by the text data according to the input text data.
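For illustration only, the following is a minimal sketch of this pairwise training signal, written in Python with PyTorch. The cosine-distance form of the feature difference, the margin, and the tensor shapes are assumptions for the sketch, not choices prescribed by the present embodiment.

    import torch
    import torch.nn.functional as F

    def pairwise_feature_loss(text_feats, image_feats, text_labels, image_labels, margin=0.5):
        # text_feats: [N, D] text features; image_feats: [M, D] image features.
        # text_labels / image_labels: integer event labels of shape [N] / [M].
        # The feature difference is taken as 1 - cosine similarity (an assumption).
        diff = 1.0 - F.cosine_similarity(
            text_feats.unsqueeze(1), image_feats.unsqueeze(0), dim=-1)   # [N, M]
        same = text_labels.unsqueeze(1) == image_labels.unsqueeze(0)     # [N, M] mask
        # First feature pairs (same event label): make the difference small.
        loss_first = diff[same].mean()
        # Second feature pairs (different event labels): make the difference
        # large, up to the margin.
        loss_second = F.relu(margin - diff[~same]).mean()
        return loss_first + loss_second

Correcting the model parameters then amounts to back-propagating this loss through the text and image encoders of the prediction model.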
In an implementation manner, the processing unit 902 is specifically configured to:
combining the plurality of first feature pairs and the plurality of second feature pairs to obtain first combined feature pairs and second combined feature pairs; a first feature pair and a second feature pair in the first combined feature pair comprise the same text feature, and a first feature pair and a second feature pair in the second combined feature pair comprise the same image feature;
generating a first prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair in the first combined feature pair and the second feature difference corresponding to the second feature pair;
generating a second prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair in the second combined feature pair and the second feature difference corresponding to the second feature pair;
and determining a first feature prediction deviation of the prediction model according to the first prediction loss value and the second prediction loss value, and correcting the model parameters of the prediction model according to the first feature prediction deviation to obtain the trained prediction model.
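One hedged reading of the combined feature pairs is a pair of triplet-style loss terms, one anchored on the shared text feature and one on the shared image feature. The sketch below (reusing the imports and cosine-distance convention of the previous snippet) is one possible realization; the margin and the plain sum are assumptions.

    def first_feature_prediction_deviation(anchor_text, pos_image, neg_image,
                                           anchor_image, pos_text, neg_text,
                                           margin=0.5):
        d = lambda a, b: 1.0 - F.cosine_similarity(a, b, dim=-1)
        # First combined feature pair: the same text feature appears in a first
        # pair (same-label image) and a second pair (different-label image);
        # the first prediction loss pushes the former difference below the latter.
        loss_1 = F.relu(d(anchor_text, pos_image)
                        - d(anchor_text, neg_image) + margin).mean()
        # Second combined feature pair: the same image feature appears with a
        # same-label text feature and a different-label text feature.
        loss_2 = F.relu(d(anchor_image, pos_text)
                        - d(anchor_image, neg_text) + margin).mean()
        # The first feature prediction deviation is determined from both loss
        # values; a plain sum is assumed here.
        return loss_1 + loss_2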
In one implementation, the processing unit 902 is specifically configured to:
calling the prediction model to respectively predict the events indicated by the N text data, and generating an event prediction deviation based on the predicted events respectively indicated by the N text data and the event labels respectively carried by the N text data;
and correcting the model parameters of the prediction model according to the first feature prediction deviation and the event prediction deviation to obtain a trained prediction model.
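As a sketch under the same assumptions, the event prediction deviation could be an ordinary cross-entropy over the N text data, combined additively with the feature deviation; the weighting factor alpha is an assumption, not a prescribed value.

    def combined_deviation(event_logits, event_labels, feature_deviation, alpha=1.0):
        # event_logits: [N, C] predicted event scores for the N text data;
        # event_labels: [N] event labels carried by the N text data.
        event_deviation = F.cross_entropy(event_logits, event_labels)
        # Correct the model parameters with both deviations together.
        return feature_deviation + alpha * event_deviation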
In one implementation, the processing unit 902 is specifically configured to:
determining text features contained in a second feature pair in the first combined feature pair as first text features, and determining image features contained in the second feature pair in the first combined feature pair as first image features;
calling a prediction model to generate a first label weight of an event label of the text data to which the first text feature belongs based on the first image feature, and generating a second label weight of the event label of the image data to which the first image feature belongs;
generating a first difference weight of the second feature difference corresponding to the first combined feature pair according to the first label weight and the second label weight;
and generating a first prediction loss value according to the first difference weight, the first feature difference corresponding to the first feature pair in the first combined feature pair, and the second feature difference corresponding to the second feature pair in the first combined feature pair.
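The label weights can be read as a hard-negative weighting: given the mismatched image feature, the model scores how plausible the text's event label and the image's event label are, and the second feature difference is weighted accordingly. The sketch below is one such interpretation; the softmax classifier head and the additive combination of the two label weights are assumptions.

    def weighted_first_prediction_loss(anchor_text, pos_image, neg_image,
                                       classifier, text_label, image_label,
                                       margin=0.5):
        d = lambda a, b: 1.0 - F.cosine_similarity(a, b, dim=-1)
        # Label weights generated from the first image feature (the image in
        # the second feature pair of the first combined feature pair).
        probs = F.softmax(classifier(neg_image), dim=-1)                   # [B, C]
        w_text = probs.gather(-1, text_label.unsqueeze(-1)).squeeze(-1)    # first label weight
        w_image = probs.gather(-1, image_label.unsqueeze(-1)).squeeze(-1)  # second label weight
        # First difference weight: confusable pairs contribute more.
        w = (w_text + w_image).detach()
        return (w * F.relu(d(anchor_text, pos_image)
                           - d(anchor_text, neg_image) + margin)).mean()

The second prediction loss value over the second combined feature pairs is symmetric, with the roles of the text and image features exchanged.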
In one implementation, the processing unit 902 is specifically configured to:
determining the text features contained in the second feature pair in the second combined feature pair as second text features, and determining the image features contained in the second feature pair in the second combined feature pair as second image features;
calling the prediction model to generate a third label weight of an event label of the text data to which the second text feature belongs based on the second image feature, and generating a fourth label weight of the event label of the image data to which the second image feature belongs;
generating a second difference weight of the second feature difference corresponding to the second combined feature pair according to the third label weight and the fourth label weight;
and generating a second prediction loss value according to the second difference weight, the first feature difference corresponding to the first feature pair in the second combined feature pair, and the second feature difference corresponding to the second feature pair in the second combined feature pair.
In one implementation, event tags of sample data in a sample data set are labeled based on an event set, wherein the event set comprises a plurality of events; the N text data carry associated information labels; the prediction model comprises an information extractor; the processing unit 902 is specifically configured to:
generating a first feature prediction deviation of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair;
calling the information extractor to generate an event feature of each event in the event set;
generating a second feature prediction deviation of the information extractor based on the event feature of each event and the image feature of each image data;
calling the prediction model to respectively predict the events indicated by the N text data, and calling the information extractor to respectively extract the predicted associated information of the events indicated by the N text data from the N text data;
generating an information extraction deviation of the information extractor based on the associated information labels and the predicted associated information of the N text data;
correcting the model parameters of the prediction model based on the first feature prediction deviation, the second feature prediction deviation, and the information extraction deviation to obtain a trained prediction model; the information extractor in the trained prediction model is used to extract the associated information of the event indicated by the input text data.
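A hedged sketch of the joint correction step follows, assuming token-level cross-entropy as the information extraction deviation and a plain sum of the three deviations; neither choice is fixed by the present embodiment.

    def joint_training_step(first_deviation, second_deviation,
                            extraction_logits, extraction_labels, optimizer):
        # extraction_logits: [N, T, K] per-token scores for the associated
        # information (e.g. argument spans); extraction_labels: [N, T].
        extraction_deviation = F.cross_entropy(
            extraction_logits.flatten(0, 1), extraction_labels.flatten())
        loss = first_deviation + second_deviation + extraction_deviation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()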
In one implementation, the processing unit 902 is specifically configured to:
calling the information extractor to generate a third feature difference between the event feature and the image feature contained in each of a plurality of third feature pairs; any third feature pair contains an event feature which belongs to the same event as the event indicated by the event label corresponding to the contained image feature;
calling an information extractor to generate a fourth feature difference between the event feature and the image feature contained in each of a plurality of fourth feature pairs; any fourth feature pair contains an event feature which belongs to an event different from the event indicated by the event label corresponding to the contained image feature;
and calling the information extractor to generate the second feature prediction deviation of the information extractor based on the third feature difference corresponding to each third feature pair and the fourth feature difference corresponding to each fourth feature pair.
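Under the same cosine-distance assumption, the third and fourth feature pairs admit the following minimal sketch, in which every event feature in the event set is contrasted against every image feature:

    def second_feature_prediction_deviation(event_feats, event_ids,
                                            image_feats, image_event_labels,
                                            margin=0.5):
        # event_feats: [E, D], one feature per event in the event set;
        # image_feats: [M, D]; image_event_labels: [M] event ids of the images.
        diff = 1.0 - F.cosine_similarity(
            event_feats.unsqueeze(1), image_feats.unsqueeze(0), dim=-1)   # [E, M]
        same = event_ids.unsqueeze(1) == image_event_labels.unsqueeze(0)  # [E, M]
        # Third feature pairs: event feature vs. images labeled with that event.
        loss_third = diff[same].mean()
        # Fourth feature pairs: event feature vs. images labeled with other events.
        loss_fourth = F.relu(margin - diff[~same]).mean()
        return loss_third + loss_fourth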
In one implementation, the processing unit 902 is further configured to:
acquiring target text data;
calling the trained prediction model to generate text features of the target text data;
and calling the trained prediction model to predict the event indicated by the target text data based on the text features of the target text data.
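At inference time the flow reduces to two calls. In the sketch below, the tokenizer, encode_text, and event_classifier names are hypothetical placeholders for whatever interfaces the trained prediction model exposes.

    @torch.no_grad()
    def predict_event(model, tokenizer, target_text, event_names):
        # Generate the text features of the target text data, then predict
        # the event indicated by the target text data.
        inputs = tokenizer(target_text, return_tensors="pt")
        text_feat = model.encode_text(inputs)         # hypothetical encoder call
        logits = model.event_classifier(text_feat)    # hypothetical classifier head
        return event_names[logits.argmax(dim=-1).item()]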
Referring to fig. 10, fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device described in this embodiment includes: a processor 1001 and a memory 1002. Optionally, the electronic device may further include a network interface or a power supply module. The processor 1001 and the memory 1002 may exchange data with each other.
The processor 1001 may be a Central Processing Unit (CPU), or another general-purpose processor such as a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The network interface may include an input device, such as a control panel, a microphone, or a receiver, and/or an output device, such as a display screen or a transmitter. For example, in an application embodiment, the network interface may include a receiver and a transmitter.
The memory 1002 may include a read-only memory and a random access memory, and provides program instructions and data to the processor 1001. A portion of the memory 1002 may also include non-volatile random access memory. When the processor 1001 calls the program instruction, it is configured to:
acquiring a sample data set; sample data in the sample data set comprises N text data and M image data, any text data and any image data are provided with event tags, and N and M are positive integers;
calling a prediction model to generate text characteristics of each text data and image characteristics of each image data;
respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; in any first feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have the same event label;
respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; in any second feature pair, the text data to which the contained text feature belongs and the image data to which the contained image feature belongs have different event labels;
correcting model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used to predict an event indicated by the text data based on the input text data.
In one implementation, the processor 1001 is specifically configured to:
combining the plurality of first feature pairs and the plurality of second feature pairs to obtain first combined feature pairs and second combined feature pairs; a first feature pair and a second feature pair in the first combined feature pair comprise the same text feature, and a first feature pair and a second feature pair in the second combined feature pair comprise the same image feature;
generating a first prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair in the first combined feature pair and the second feature difference corresponding to the second feature pair;
generating a second prediction loss value of the prediction model for the sample features according to the first feature difference corresponding to the first feature pair in the second combined feature pair and the second feature difference corresponding to the second feature pair;
and determining a first feature prediction deviation of the prediction model according to the first prediction loss value and the second prediction loss value, and correcting the model parameters of the prediction model according to the first feature prediction deviation to obtain the trained prediction model.
In one implementation, the processor 1001 is specifically configured to:
calling the prediction model to respectively predict the events indicated by the N text data, and generating an event prediction deviation based on the predicted events respectively indicated by the N text data and the event labels respectively carried by the N text data;
and correcting the model parameters of the prediction model according to the first feature prediction deviation and the event prediction deviation to obtain a trained prediction model.
In one implementation, the processor 1001 is specifically configured to:
determining text features contained in a second feature pair in the first combined feature pair as first text features, and determining image features contained in the second feature pair in the first combined feature pair as first image features;
calling a prediction model to generate a first label weight of an event label of the text data to which the first text feature belongs based on the first image feature, and generating a second label weight of the event label of the image data to which the first image feature belongs;
generating a first difference weight of the second feature difference corresponding to the first combined feature pair according to the first label weight and the second label weight;
and generating a first prediction loss value according to the first difference weight, the first feature difference corresponding to the first feature pair in the first combined feature pair, and the second feature difference corresponding to the second feature pair in the first combined feature pair.
In one implementation, the processor 1001 is specifically configured to:
determining the text features contained in the second feature pair in the second combined feature pair as second text features, and determining the image features contained in the second feature pair in the second combined feature pair as second image features;
calling the prediction model to generate a third label weight of an event label of the text data to which the second text feature belongs based on the second image feature, and generating a fourth label weight of the event label of the image data to which the second image feature belongs;
generating a second difference weight of the second feature difference corresponding to the second combined feature pair according to the third label weight and the fourth label weight;
and generating a second prediction loss value according to the second difference weight, the first feature difference corresponding to the first feature pair in the second combined feature pair, and the second feature difference corresponding to the second feature pair in the second combined feature pair.
In one implementation, event tags of sample data in a sample data set are labeled based on an event set, wherein the event set comprises a plurality of events; the N text data carry associated information labels; the prediction model comprises an information extractor; the processor 1001 is specifically configured to:
generating a first feature prediction deviation of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair;
calling the information extractor to generate an event feature of each event in the event set;
generating a second feature prediction deviation of the information extractor based on the event feature of each event and the image feature of each image data;
calling the prediction model to respectively predict the events indicated by the N text data, and calling the information extractor to respectively extract the predicted associated information of the events indicated by the N text data from the N text data;
generating an information extraction deviation of the information extractor based on the associated information labels and the predicted associated information of the N text data;
correcting the model parameters of the prediction model based on the first feature prediction deviation, the second feature prediction deviation, and the information extraction deviation to obtain a trained prediction model; the information extractor in the trained prediction model is used to extract the associated information of the event indicated by the input text data.
In one implementation, the processor 1001 is specifically configured to:
calling the information extractor to generate a third feature difference between the event feature and the image feature contained in each of a plurality of third feature pairs; any third feature pair contains an event feature which belongs to the same event as the event indicated by the event label corresponding to the contained image feature;
calling an information extractor to generate a fourth feature difference between the event feature and the image feature contained in each of a plurality of fourth feature pairs; any fourth feature pair contains an event feature which belongs to an event different from the event indicated by the event label corresponding to the contained image feature;
and calling the information extractor to generate the second feature prediction deviation of the information extractor based on the third feature difference corresponding to each third feature pair and the fourth feature difference corresponding to each fourth feature pair.
In one implementation, the processor 1001 is further configured to:
acquiring target text data;
calling the trained prediction model to generate text features of the target text data;
and calling the trained prediction model to predict the event indicated by the target text data based on the text features of the target text data.
Optionally, the program instructions may also implement other steps of the method in the above embodiments when executed by the processor, and details are not described here.
The present application further provides a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the above method, such as executing the method executed by the electronic device, which is not described herein in detail.
Optionally, the storage medium, such as a computer-readable storage medium, referred to herein may be non-volatile or volatile.
Alternatively, the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, and the like, and the storage data area may store data created according to the use of the blockchain node, and the like. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in other orders or simultaneously. Those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a program, and the program may be stored in a computer-readable storage medium, which may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps performed in the embodiments of the methods described above. For example, the computer device may be a terminal, or may be a server.
The foregoing has described in detail the data processing method, electronic device, program product, and medium provided in the embodiments of the present application. Specific examples have been applied herein to explain the principles and implementations of the present application, and the description of the foregoing embodiments is only intended to help understand the method and its core ideas. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation of the present application.

Claims (10)

1. A method of data processing, the method comprising:
acquiring a sample data set; the sample data in the sample data set comprises N text data and M image data, any text data and any image data have event tags, and N and M are positive integers;
calling a prediction model to generate text characteristics of each text data and image characteristics of each image data;
respectively generating a first feature difference between the text feature and the image feature contained in each of a plurality of first feature pairs; in any first feature pair, the text data to which the included text feature belongs and the image data to which the included image feature belongs have the same event label;
respectively generating a second feature difference between the text feature and the image feature contained in each of a plurality of second feature pairs; in any second feature pair, the text data to which the included text feature belongs and the image data to which the included image feature belongs have different event labels;
correcting the model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model; the trained predictive model is used for predicting the event indicated by the text data according to the input text data.
2. The method of claim 1, wherein the modifying the model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain the trained prediction model comprises:
combining the plurality of first feature pairs and the plurality of second feature pairs to obtain a first combined feature pair and a second combined feature pair; a first feature pair and a second feature pair of the first combined feature pair comprise the same text feature, and a first feature pair and a second feature pair of the second combined feature pair comprise the same image feature;
generating a first prediction loss value of the prediction model for the sample features according to a first feature difference corresponding to a first feature pair in the first combined feature pair and a second feature difference corresponding to a second feature pair;
generating a second prediction loss value of the prediction model for the sample features according to a first feature difference corresponding to a first feature pair in the second combined feature pair and a second feature difference corresponding to a second feature pair;
and determining a first feature prediction deviation of the prediction model according to the first prediction loss value and the second prediction loss value, and correcting the model parameters of the prediction model according to the first feature prediction deviation to obtain a trained prediction model.
3. The method of claim 2, wherein generating a first prediction loss value of the prediction model for the sample features according to a first feature difference corresponding to a first feature pair in the first combined feature pair and a second feature difference corresponding to a second feature pair comprises:
determining text features contained in a second feature pair in the first combined feature pair as first text features, and determining image features contained in the second feature pair in the first combined feature pair as first image features;
calling the prediction model to generate a first label weight of an event label of the text data to which the first text feature belongs based on the first image feature, and generating a second label weight of the event label of the image data to which the first image feature belongs;
generating a first difference weight of a second feature difference corresponding to the first combined feature pair according to the first label weight and the second label weight;
and generating a first prediction loss value according to the first difference weight, the first feature difference corresponding to the first feature pair in the first combined feature pair and the second feature difference corresponding to the second feature pair in the first combined feature pair.
4. The method of claim 2, wherein generating a second prediction loss value of the prediction model for the sample features according to a first feature difference corresponding to a first feature pair and a second feature difference corresponding to a second feature pair in the second combined feature pair comprises:
determining text features contained in a second feature pair in the second combined feature pair as second text features, and determining image features contained in the second feature pair in the second combined feature pair as second image features;
calling the prediction model to generate a third label weight of an event label of the text data to which the second text feature belongs based on the second image feature, and generating a fourth label weight of the event label of the image data to which the second image feature belongs;
generating a second difference weight of a second feature difference corresponding to the second combined feature pair according to the third label weight and the fourth label weight;
and generating a second prediction loss value according to the second difference weight, the first feature difference corresponding to the first feature pair in the second combined feature pair and the second feature difference corresponding to the second feature pair in the second combined feature pair.
5. The method of claim 1, wherein event tags of sample data in the sample data set are labeled based on an event set, wherein the event set comprises a plurality of events; the N text data carry associated information labels; the predictive model includes an information extractor; the modifying the model parameters of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair to obtain a trained prediction model includes:
generating a first feature prediction deviation of the prediction model based on the first feature difference corresponding to each first feature pair and the second feature difference corresponding to each second feature pair;
calling the information extractor to generate an event feature of each event in the event set;
generating a second feature prediction deviation of the information extractor based on the event feature of each event and the image feature of each image data;
calling the prediction model to respectively predict the events indicated by the N text data, and calling the information extractor to respectively extract the predicted associated information of the events indicated by the N text data from the N text data;
generating an information extraction deviation of the information extractor based on the associated information tags and the predicted associated information of the N text data;
correcting the model parameters of the prediction model based on the first feature prediction deviation, the second feature prediction deviation, and the information extraction deviation to obtain a trained prediction model; and the information extractor in the trained prediction model is used for extracting the associated information of the event indicated by the input text data.
6. The method of claim 5, wherein generating a second feature prediction deviation of the information extractor based on the event feature of each event and the image feature of each image data comprises:
calling the information extractor to generate a third feature difference between the event feature and the image feature contained in each of a plurality of third feature pairs; any third feature pair contains an event feature which belongs to the same event as the event indicated by the event label corresponding to the contained image feature;
calling the information extractor to generate a fourth feature difference between the event feature and the image feature contained in each of a plurality of fourth feature pairs; any fourth feature pair contains an event feature which belongs to an event different from the event indicated by the event label corresponding to the contained image feature;
and invoking the information extractor to generate the second feature prediction deviation of the information extractor based on the third feature difference corresponding to each third feature pair and the fourth feature difference corresponding to each fourth feature pair.
7. The method of claim 1, further comprising:
acquiring target text data;
calling the trained prediction model to generate text features of the target text data;
and calling the trained prediction model to predict, based on the text features of the target text data, the event indicated by the target text data.
8. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of any one of claims 1-7.
9. A computer program product comprising computer programs/instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN202210483147.5A 2022-04-29 2022-04-29 Data processing method, electronic device, program product, and medium Pending CN115130545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210483147.5A CN115130545A (en) 2022-04-29 2022-04-29 Data processing method, electronic device, program product, and medium

Publications (1)

Publication Number Publication Date
CN115130545A true CN115130545A (en) 2022-09-30

Family

ID=83376904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210483147.5A Pending CN115130545A (en) 2022-04-29 2022-04-29 Data processing method, electronic device, program product, and medium

Country Status (1)

Country Link
CN (1) CN115130545A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116564538A (en) * 2023-07-05 2023-08-08 肇庆市高要区人民医院 Hospital information real-time query method and system based on big data
CN116564538B (en) * 2023-07-05 2023-12-19 肇庆市高要区人民医院 Hospital information real-time query method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination