CN116644743A - Information extraction, article identification and information extraction model training method

Information extraction, article identification and information extraction model training method

Info

Publication number
CN116644743A
Authority
CN
China
Prior art keywords
information extraction
text
prompt
information
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310431817.3A
Other languages
Chinese (zh)
Inventor
赵富邦
刘程远
康杨杨
孙常龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202310431817.3A
Publication of CN116644743A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the specification provides an information extraction method, an article identification method and an information extraction model training method, wherein the information extraction method comprises the following steps: receiving an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages; inputting the text to be extracted and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the information extraction task, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message; and determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and the at least two prompt messages. On the basis of inputting a plurality of prompt messages in parallel, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.

Description

Information extraction, article identification and information extraction model training method
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to an information extraction method.
Background
With the development of computer technology, more and more text processing relies on the internet. Text processing is the process of analyzing, understanding and extracting information from text, and is widely used in many areas of daily life. Taking information extraction as an example, information extraction refers to a text processing technology that extracts fact information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data.
At present, information extraction is generally performed by means of machine reading comprehension (MRC, Machine Reading Comprehension). However, for complex and diverse information extraction tasks, extraction can only be achieved by using different machine learning models, which results in large resource consumption during model training and long time consumption during information extraction, so an efficient information extraction scheme is needed.
Disclosure of Invention
In view of this, the present embodiment provides an information extraction method. One or more embodiments of the present disclosure relate to an article identification method, an information extraction model training method, an information extraction apparatus, an article identification apparatus, an information extraction model training apparatus, a computing device, a computer-readable storage medium, and a computer program, which solve the technical drawbacks of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided an information extraction method, including:
receiving an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages;
inputting the text to be extracted and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the information extraction task, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message;
and determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages.
According to a second aspect of embodiments of the present specification, there is provided an article identification method comprising:
receiving an article identification task, wherein the article identification task comprises a text to be identified and at least two prompt messages;
inputting the text to be identified and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the article identification task, wherein the information extraction matrix represents the corresponding relation between the text to be identified and each prompt message;
and determining a target recognition result corresponding to the object recognition task according to the information extraction matrix, the text to be recognized and at least two prompt messages.
According to a third aspect of embodiments of the present disclosure, there is provided an information extraction model training method applied to cloud-side equipment, including:
acquiring a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages;
extracting a first sample text from a plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
inputting the first sample text and first sample prompt information carried by the first sample into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text;
determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information;
comparing the first predicted extraction result with a first information extraction label carried by the first sample, and calculating a loss value;
adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting a first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the model parameters of the information extraction model;
and sending the model parameters of the information extraction model to the terminal equipment.
According to a fourth aspect of embodiments of the present specification, there is provided an information extraction apparatus including:
the first receiving module is configured to receive an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages;
the first input module is configured to input the text to be extracted and at least two prompt messages into the information extraction model, and determine an information extraction matrix corresponding to the information extraction task, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message;
the first determining module is configured to determine a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages.
According to a fifth aspect of embodiments of the present specification, there is provided an article identification device comprising:
the second receiving module is configured to receive an article identification task, wherein the article identification task comprises a text to be identified and at least two prompt messages;
the second input module is configured to input the text to be identified and at least two prompt messages into the information extraction model, and determine an information extraction matrix corresponding to the article identification task, wherein the information extraction matrix represents the corresponding relation between the text to be identified and each prompt message;
The second determining module is configured to determine a target recognition result corresponding to the object recognition task according to the information extraction matrix, the text to be recognized and at least two prompt messages.
According to a sixth aspect of embodiments of the present disclosure, there is provided an information extraction model training apparatus applied to cloud-side equipment, including:
the acquisition module is configured to acquire a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages;
an extraction module configured to extract a first sample text from a plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
the third input module is configured to input the first sample text and first sample prompt information carried by the first sample into the initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text;
the third determining module is configured to determine a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information;
the calculating module is configured to compare the first predicted extraction result with the first information extraction label carried by the first sample, and calculate a loss value;
The adjusting module is configured to adjust model parameters of the initial information extraction model according to the loss value, and returns to execute the step of extracting a first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the model parameters of the information extraction model;
and the sending module is configured to send the model parameters of the information extraction model to the end-side equipment.
According to a seventh aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer executable instructions that, when executed by the processor, implement the steps of the methods provided in the first, second or third aspects above.
According to an eighth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which when executed by a processor implement the steps of the method provided in the first or second or third aspects above.
According to a ninth aspect of embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the method provided in the first or second or third aspect described above.
According to the information extraction method provided by the embodiment of the specification, an information extraction task is received, wherein the information extraction task comprises a text to be extracted and at least two prompt messages; the text to be extracted and the at least two prompt messages are input into an information extraction model, and an information extraction matrix corresponding to the information extraction task is determined, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message; and a target extraction result corresponding to the information extraction task is determined according to the information extraction matrix, the text to be extracted and the at least two prompt messages. On the basis of inputting a plurality of prompt messages in parallel, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
Drawings
FIG. 1 is a block diagram of an information extraction system according to one embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for extracting information according to one embodiment of the present disclosure;
FIG. 3 is a flow chart of a method of identifying an item provided in one embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for training an information extraction model according to one embodiment of the present disclosure;
FIG. 5 is a flow chart of another information extraction method provided by one embodiment of the present disclosure;
FIG. 6 is a flowchart of a process of an information extraction method according to one embodiment of the present disclosure;
FIG. 7 is a flowchart of a process of another information extraction method according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an attention mask matrix in an information extraction method according to one embodiment of the present disclosure;
FIG. 9 is an interface diagram of an information extraction interface according to one embodiment of the present disclosure;
fig. 10 is a schematic structural view of an information extraction device according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an article identification device according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a training device for information extraction model according to one embodiment of the present disclosure;
FIG. 13 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. However, this description may be embodied in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from its spirit; therefore, the description is not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may, depending on the context, be interpreted as "when" or "upon" or "in response to determining".
Furthermore, it should be noted that the user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) involved in one or more embodiments of the present disclosure are information and data authorized by the user or fully authorized by all parties; the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
First, terms related to one or more embodiments of the present specification will be explained.
Prompt learning (prompt learning): a method that uses the capability of a pre-trained model to obtain better results in low-resource scenarios.
Universal information extraction (UIE, Universal Information Extraction): according to a specific extraction framework, information structures (entities, relationships, events, etc.) that meet the extraction requirements are extracted from a given piece of free text. Different extraction frameworks may extract different information structures from the same input text.
BERT model: the BERT (Bidirectional Encoder Representations from Transformers) model is a pre-trained language representation model that can be used as an encoder to extract features from the input text. Rather than pre-training with a conventional unidirectional language model or a shallow concatenation of two unidirectional language models, it uses a masked language model so as to produce deep bidirectional language representations.
seq2seq: seq2seq is a network with an encoder-decoder structure whose input is a sequence and whose output is a sequence. The encoder converts a variable-length input sequence into a fixed-length vector representation, and the decoder converts that fixed-length vector into a variable-length target sequence.
With the development of computer technology, more and more text processing relies on the internet. Text processing is the process of analyzing, understanding and extracting information from text, and is widely used in many areas of daily life. Taking information extraction as an example, information extraction refers to a text processing technology that extracts fact information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data.
Currently, common information extraction schemes include the following two. The first performs information extraction based on machine reading comprehension (MRC, Machine Reading Comprehension); however, the machine-reading-comprehension approach has a long CPU (Central Processing Unit) inference time. The second performs information extraction based on seq2seq; however, extracting information based on seq2seq suffers from poor extraction efficiency. Therefore, an efficient information extraction scheme is needed.
In order to solve the above problems, the present disclosure provides an information extraction method: an information extraction task is received, wherein the information extraction task comprises a text to be extracted and at least two prompt messages; the text to be extracted and the at least two prompt messages are input into an information extraction model, and an information extraction matrix corresponding to the information extraction task is determined, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message; and a target extraction result corresponding to the information extraction task is determined according to the information extraction matrix, the text to be extracted and the at least two prompt messages. On the basis of inputting a plurality of prompt messages in parallel, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
The present specification provides an information extraction method, and further relates to an article identification method, an information extraction model training method, an information extraction apparatus, an article identification apparatus, an information extraction model training apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 illustrates an architecture diagram of an information extraction system provided in one embodiment of the present disclosure, where the information extraction system may include a client 100 and a server 200;
the client 100 is configured to send an information extraction task to the server 200, where the information extraction task includes a text to be extracted and at least two prompt messages;
the server 200 is configured to input a text to be extracted and at least two prompt messages into the information extraction model, and determine an information extraction matrix corresponding to the information extraction task, where the information extraction matrix characterizes a corresponding relationship between the text to be extracted and each prompt message; determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages; sending a target extraction result corresponding to the information extraction task to the client 100;
the client 100 is further configured to receive a target extraction result corresponding to the information extraction task sent by the server 200.
By applying the scheme of the embodiment of the specification, an information extraction task is received, wherein the information extraction task comprises a text to be extracted and at least two prompt messages; the text to be extracted and the at least two prompt messages are input into an information extraction model, and an information extraction matrix corresponding to the information extraction task is determined, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message; and a target extraction result corresponding to the information extraction task is determined according to the information extraction matrix, the text to be extracted and the at least two prompt messages. On the basis of inputting a plurality of prompt messages in parallel, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
In practical applications, the information extraction system may include a plurality of clients 100 and a server 200. Communication connection can be established between the plurality of clients 100 through the server 200, and in the information extraction scenario, the server 200 is used to provide information extraction services between the plurality of clients 100, and the plurality of clients 100 can respectively serve as a transmitting end or a receiving end, so that communication is realized through the server 200.
The user may interact with the server 200 through the client 100 to receive data transmitted from other clients 100, or transmit data to other clients 100, etc. In the information extraction scenario, the user may issue a data stream to the server 200 through the client 100, and the server 200 generates an information extraction result according to the data stream and pushes the information extraction result to other clients that establish communication.
Wherein, the client 100 and the server 200 establish a connection through a network. The network provides a medium for a communication link between client 100 and server 200. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The data transmitted by the client 100 may need to be encoded, transcoded, compressed, etc. before being distributed to the server 200.
The client 100 may be a browser, an APP (Application), a web application such as an H5 (HTML5, HyperText Markup Language version 5) application, a light application (also called an applet, a lightweight application), or a cloud application, etc., and the client 100 may be developed based on a software development kit (SDK, Software Development Kit) of a corresponding service provided by the server 200, such as an SDK based on real-time communication (RTC, Real Time Communication), etc. The client 100 may be deployed in an electronic device and run depending on the device or on some APP in the device, etc. The electronic device may have a display screen and support information browsing, and may, for example, be a terminal-side device such as a personal mobile terminal, e.g., a mobile phone, a tablet computer, or a personal computer. Various other types of applications are also commonly deployed in electronic devices, such as human-machine conversation applications, model training applications, text processing applications, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The server 200 may include a server that provides various services, such as a server that provides communication services for multiple clients, a server for background training that provides support for a model used on a client, a server that processes data sent by a client, and so on. It should be noted that, the server 200 may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. The server may also be a server of a distributed system or a server that incorporates a blockchain. The server may also be a cloud server (cloud-side device) of a basic cloud computing service such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, domain name service, security service, content delivery network (CDN, content Delivery Network), big data, an artificial intelligence platform, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
It should be noted that, the information extraction method provided in the embodiments of the present disclosure is generally executed by the server, but in other embodiments of the present disclosure, the client may also have a similar function to the server, so as to execute the information extraction method provided in the embodiments of the present disclosure. In other embodiments, the information extraction method provided in the embodiments of the present disclosure may be performed by the client and the server together.
Referring to fig. 2, fig. 2 shows a flowchart of an information extraction method according to an embodiment of the present disclosure, which specifically includes the following steps:
step 202: and receiving an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages.
In one or more embodiments of the present disclosure, an information extraction task may be processed according to a text to be extracted and at least two hint information in the information extraction task, so as to implement information extraction.
Specifically, the information extraction task refers to a task of extracting a continuous text segment (span) from the text to be extracted. The information extraction task may correspond to various task types, including but not limited to a named entity recognition task, a relation extraction task, an event extraction task and an attribute sentiment extraction task. The information extraction task may come from different scenarios, including but not limited to a financial scenario, a conference scenario and an e-commerce scenario. The text to be extracted is the object of information extraction. The prompt information is information used to guide the information extraction process and can be understood as a schema, which is a general and abstract description of things, reflects the level of understanding of those things, and determines what the machine is able to extract.
It should be noted that the named entity recognition task is a two-tuple task including an entity type and an entity span. The relation extraction task is a triple task including a subject span, a relation type and an object span. The event extraction task is a four-tuple task including an event type, a trigger word span, an argument type and an argument span. The attribute sentiment extraction task is a triple task including a theme (aspect), a sentiment span and a sentiment polarity.
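For illustration only, the tuple structures described above might be organized as the following schema objects; the field names and example values are hypothetical and are not structures defined by this embodiment.

```python
# Hypothetical schema objects illustrating the tuple structures of the four task types;
# all names and values are examples for illustration only.
named_entity_schema = {"entity_type": "person", "entity_span": None}                        # 2-tuple
relation_schema = {"subject_span": None, "relation_type": "hometown", "object_span": None}  # triple
event_schema = {"event_type": "acquisition", "trigger_span": None,
                "argument_type": "acquirer", "argument_span": None}                         # 4-tuple
attribute_sentiment_schema = {"theme": "battery", "sentiment_span": None,
                              "sentiment_polarity": None}                                   # triple
```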
Step 204: inputting the text to be extracted and at least two prompt messages into a message extraction model, and determining a message extraction matrix corresponding to a message extraction task, wherein the message extraction matrix represents the corresponding relation between the text to be extracted and each prompt message.
In one or more embodiments of the present disclosure, after receiving the information extraction task, further, the text to be extracted and at least two prompt messages may be input into an information extraction model, to determine an information extraction matrix corresponding to the information extraction task.
Specifically, the information extraction model is obtained by training on a plurality of sample texts, the information extraction labels carried by the sample texts and at least two pieces of sample prompt information, wherein the sample prompt information is obtained by converting the schema corresponding to the sample text. The information extraction model is a machine learning model, which can be understood as a trained program that can find patterns in new data and make predictions. Such a model is represented as a mathematical function that accepts a request in the form of input data, makes a prediction on the input data, and then provides an output in response.
In practical application, the information extraction model comprises a feature extraction layer and an attention layer; the text to be extracted and at least two prompt messages can be input into the information extraction model, the feature extraction layer is utilized to extract features of the text to be extracted and the at least two prompt messages, the attention layer is utilized to process the prompt features and the text features output by the feature extraction layer, and an information extraction matrix is obtained, namely, the text to be extracted and the at least two prompt messages are input into the information extraction model, and the information extraction matrix corresponding to the information extraction task is determined, and the method comprises the following steps:
inputting the text to be extracted and at least two prompt messages into a feature extraction layer to obtain text features of the text to be extracted and prompt features corresponding to the prompt messages;
inputting text features of the text to be extracted and prompt features corresponding to the prompt messages into the attention layer to obtain an information extraction matrix corresponding to the information extraction task.
Specifically, the information extraction model includes an input layer, a feature extraction layer, an attention layer and an output layer. The input layer passes the input text to be extracted and the at least two prompt messages to the feature extraction layer; the feature extraction layer outputs the encoded features to the attention layer; the attention layer converts the encoded features into a two-dimensional score matrix, namely the information extraction matrix; and the output layer computes the extracted information according to the positions of the 0s and 1s in the information extraction matrix. The feature extraction layer may also be referred to as an encoder layer and is used to generate an embedded high-dimensional spatial representation of the input information, namely the text features of the text to be extracted and the prompt features corresponding to each prompt message. Text features may also be referred to as text vectors, and prompt features may also be referred to as prompt vectors. The feature extraction layer includes, but is not limited to, a recurrent neural network (RNN, Recurrent Neural Network), a convolutional neural network (CNN, Convolutional Neural Network), and the like, which is selected according to the actual situation; the embodiments of the present specification do not limit this in any way.
In the attention layer, attention distribution of the input information may be calculated by an attention mechanism, and a weighted average of the input information may be further calculated according to the attention distribution, so as to obtain an information extraction matrix corresponding to the information extraction task.
By applying the scheme of the embodiment of the specification, the text to be extracted and at least two prompt messages are input into a feature extraction layer, so that text features of the text to be extracted and prompt features corresponding to the prompt messages are obtained; the text features of the text to be extracted and the prompt features corresponding to the prompt messages are input into the attention layer, an information extraction matrix corresponding to the information extraction task is obtained, and the corresponding relation between the text to be extracted and the prompt messages is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
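As a rough illustration of the two-stage structure described above (a feature extraction layer followed by an attention layer that yields a two-dimensional information extraction matrix), a minimal PyTorch sketch is given below. The encoder interface, layer sizes and the simple dot-product scoring are assumptions made for illustration; this is not the patented implementation.

```python
import torch
import torch.nn as nn

class InformationExtractionModel(nn.Module):
    """Minimal sketch (illustrative only): encoder + pairwise scoring layer."""
    def __init__(self, encoder: nn.Module, hidden_size: int, head_size: int = 64):
        super().__init__()
        self.encoder = encoder                             # feature extraction layer (assumed HuggingFace-style encoder)
        self.ffnn_q = nn.Linear(hidden_size, head_size)    # first feed-forward network
        self.ffnn_k = nn.Linear(hidden_size, head_size)    # second feed-forward network

    def forward(self, input_ids, attention_mask):
        # h: (batch, seq_len, hidden) features of the concatenated prompts + text
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        q = self.ffnn_q(h)                                 # (batch, seq_len, head)
        k = self.ffnn_k(h)
        # two-dimensional score matrix: entry (j, k) scores the link between token j and token k
        z = torch.einsum("bjd,bkd->bjk", q, k)
        return z
```

A span between token j and token k of the concatenated prompt-plus-text input would then be read off wherever z[:, j, k] exceeds a decision threshold.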
In an optional embodiment of the present disclosure, the inputting the text feature of the text to be extracted and the prompt feature corresponding to each prompt message into the attention layer to obtain the information extraction matrix corresponding to the information extraction task may include the following steps:
in the attention layer, a feature matrix is constructed according to text features of the text to be extracted and prompt features corresponding to the prompt messages;
And determining an information extraction matrix corresponding to the information extraction task according to the attention mechanism and the feature matrix.
Specifically, the feature matrix includes a plurality of feature vectors, the information extraction matrix may be a "01" matrix, and based on the effective value "1" in the information extraction matrix, the extraction result corresponding to the information extraction task may be determined.
It should be noted that there are various ways of constructing the feature matrix according to the text features of the text to be extracted and the prompt features corresponding to each prompt message, which are selected according to the actual situation; the embodiments of the present specification do not limit this in any way. In one possible implementation of the present disclosure, the text features of the text to be extracted and the prompt features corresponding to each prompt message may be spliced to obtain spliced features, and the spliced features are then used as the rows and columns of a matrix to construct the feature matrix. In another possible implementation of the present disclosure, a first matrix may be constructed by using the text features as the rows and columns of a matrix, and a second matrix may be constructed by using the prompt features corresponding to each prompt message as the rows and columns of a matrix, so that the feature matrix is constructed based on the first matrix and the second matrix.
Further, after the feature matrix is constructed and obtained, the attention mechanism in the self-attention layer can be utilized to perform attention calculation on the feature matrix, so that an information extraction matrix corresponding to the information extraction task is obtained.
By applying the scheme of the embodiment of the specification, in the attention layer, a feature matrix is constructed according to text features of the text to be extracted and prompt features corresponding to the prompt messages; according to the attention mechanism and the feature matrix, an information extraction matrix corresponding to the information extraction task is determined, a two-dimensional information extraction matrix is obtained through vector product calculation, and the information extraction matrix is utilized to express the corresponding relation between the text to be extracted and each prompt message, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
In another alternative embodiment of the present disclosure, the position information of each character and a mask processing manner may be introduced during the processing of the feature extraction layer, so as to improve the accuracy of the information extraction model. That is, the feature extraction layer includes an embedding layer and at least one encoding layer, and inputting the text to be extracted and the at least two prompt messages into the feature extraction layer to obtain the text features of the text to be extracted and the prompt features corresponding to each prompt message comprises the following steps:
inputting the text to be extracted and each prompting message into an embedding layer, and determining a text characteristic sequence of the text to be extracted and a prompting characteristic sequence of each prompting message in the embedding layer by utilizing the position information of each character in the text to be extracted and the position information of each character in each prompting message;
Inputting the text feature sequence of the text to be extracted and the prompt feature sequence of each prompt message into at least one coding layer, and determining the text feature of the text to be extracted and the prompt feature corresponding to each prompt message in the at least one coding layer by using the attention mask matrix.
In particular, the feature extraction layer may be composed of an embedding layer and at least one coding layer, where the at least one coding layer is a Transformer structure using a pre-trained language model. The embedding layer converts the discrete characters in the text to be extracted and in each prompt message into embedded feature sequences, namely the text feature sequence and the prompt feature sequences, and can incorporate the position information (position embeddings) of the characters when computing the embedded feature sequences, so that information extraction is realized accurately. The at least one coding layer computes a high-dimensional vector representation of the embedded feature sequences, i.e. the text features and the prompt features corresponding to each prompt message, and an attention mask (attention mask) matrix may be used when computing the high-dimensional vector representation with the at least one coding layer.
By applying the scheme of the embodiment of the specification, the position information is incorporated in the processing of the embedding layer, and the attention mask matrix is incorporated in the processing of the at least one coding layer, so that the accuracy of the information extraction model is improved.
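The attention mask matrix of FIG. 8 is not reproduced here, but a hypothetical masking scheme over the concatenated [prompt_1 … prompt_n | text] sequence can be sketched as follows. The specific rule used (each prompt segment attends to itself and to the text, while the text attends to everything) is an assumption for illustration and is not taken from the figure.

```python
import torch

def build_attention_mask(prompt_lens, text_len):
    """Hypothetical mask over [prompt_1 ... prompt_n | text]; True means attention is allowed.
    Masking rule is an illustrative assumption, not the patent's scheme."""
    total = sum(prompt_lens) + text_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    text_start = sum(prompt_lens)
    mask[text_start:, :] = True                  # text tokens attend everywhere
    offset = 0
    for plen in prompt_lens:
        seg = slice(offset, offset + plen)
        mask[seg, seg] = True                    # each prompt attends to itself
        mask[seg, text_start:] = True            # each prompt attends to the text
        offset += plen
    return mask

# e.g. two prompts of 3 and 4 tokens plus a 9-token text give a 16 x 16 boolean mask
m = build_attention_mask([3, 4], 9)
```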
Further, the attention layer comprises a first feedforward neural network layer and a second feedforward neural network layer; the text feature of the text to be extracted and the prompt feature corresponding to each prompt message are input into the attention layer, and an information extraction matrix corresponding to the information extraction task is obtained, which may include the following steps:
inputting the text characteristics and the prompt characteristics corresponding to the prompt messages into a first feedforward neural network layer to obtain first network characteristics;
inputting the text features and the prompt features corresponding to the prompt messages into a second feedforward neural network layer to obtain second network features;
calculating relative position features according to the text features and the feature position information of the prompt features corresponding to the prompt information;
and determining an information extraction matrix corresponding to the information extraction task according to the first network characteristic, the second network characteristic and the relative position characteristic.
In particular, the attention layer may be a rotary attention layer (based on rotary position embedding), by which the high-dimensional feature representation is converted into the two-dimensional information extraction matrix, which is used to represent the score between every two characters. The rotary attention layer includes two feedforward neural networks, namely the first feedforward neural network and the second feedforward neural network. The two feedforward neural networks respectively convert the high-dimensional features into two network features, and the information extraction matrix can then be obtained by a vector product calculation over the outputs of the two feedforward neural networks and the relative position features.
In practical applications, the information extraction matrix may be determined in the attention layer by the following formula (1):
z^i_{jk} = (FFNN_q(h^i_j))^T ⊗ R^i_{j-k} ⊗ FFNN_k(h^i_k) + M    (1)

wherein T is the matrix transposition operation, ⊗ is the Kronecker product, Z represents the output two-dimensional information extraction matrix, z^i_{jk} is the value in the j-th row and k-th column, h is the output of the feature extraction layer (i.e. the input of the attention layer), h^i_j represents the high-dimensional feature of the j-th word of the i-th row, h^i_k represents the high-dimensional feature of the k-th word of the i-th row, FFNN_q represents the first feedforward neural network and FFNN_k represents the second feedforward neural network, the two feedforward neural networks converting h into two different network features respectively, p^i_k represents the position code of the k-th position of the i-th row, p^i_j represents the position code of the j-th position of the i-th row, R^i_{j-k} is the relative position feature calculated according to the distance between p^i_j and p^i_k (which may also be referred to as a relative position-coding vector), and M is the attention matrix.
By applying the scheme of the embodiment of the specification, the text features and the prompt features corresponding to each prompt message are input into the first feedforward neural network layer to obtain the first network features; the text features and the prompt features corresponding to each prompt message are input into the second feedforward neural network layer to obtain the second network features; the relative position features are calculated according to the feature position information of the text features and of the prompt features corresponding to each prompt message; and the information extraction matrix corresponding to the information extraction task is determined according to the first network features, the second network features and the relative position features. Introducing the relative position features into the rotary attention layer improves the accuracy of the information extraction model.
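As one plausible reading of formula (1), the sketch below applies a standard rotary position embedding (RoPE) to the outputs of the two feed-forward networks and scores every token pair with a vector product plus an additive mask M. It is offered only as an illustrative approximation of the rotary attention layer, not as the exact patented formulation.

```python
import torch

def rotary_position_embedding(x, base: float = 10000.0):
    """Apply standard RoPE to x of shape (batch, seq_len, dim); dim must be even."""
    b, n, d = x.shape
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(-1)        # (n, 1)
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))  # (d/2,)
    angles = pos * inv_freq                                         # (n, d/2)
    sin, cos = angles.sin(), angles.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return rotated.flatten(-2)                                      # (b, n, d)

def score_matrix(q, k, mask=None):
    """z[j, k] = (R_j q_j)^T (R_k k_k), i.e. a relative-position-aware vector product,
    plus an additive attention mask M (large negative values at forbidden positions)."""
    z = torch.einsum("bjd,bkd->bjk",
                     rotary_position_embedding(q),
                     rotary_position_embedding(k))
    if mask is not None:
        z = z + mask
    return z
```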
Step 206: and determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages.
In one or more embodiments of the present disclosure, an information extraction task is received, a text to be extracted and at least two pieces of prompt information are input into an information extraction model, and after an information extraction matrix corresponding to the information extraction task is determined, further, a target extraction result corresponding to the information extraction task may be determined according to the information extraction matrix, the text to be extracted and the at least two pieces of prompt information.
By applying the scheme of the embodiment of the specification, on the basis of parallel input of a plurality of prompt messages, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
In practical application, according to the information extraction matrix, the text to be extracted and at least two prompt messages, there are various ways of determining the target extraction result corresponding to the information extraction task, and the selection is specifically performed according to the practical situation, and the embodiment of the present disclosure does not limit the method.
In one possible implementation manner of the present disclosure, a text to be extracted and at least two prompt messages may be combined to obtain combined identification information, and the combined identification information is used to identify a row and a column of an information extraction matrix, and the combined identification information corresponding to an effective value in the information extraction matrix is searched, so as to obtain a target extraction result corresponding to an information extraction task.
For example, the text to be extracted and the at least two prompt messages are combined to obtain the combined identification information "<hint>city<hint>person<sep>Zhang San's hometown is in XY City". Assuming the information extraction matrix is a 16 x 16 "01" matrix, the rows and columns of the information extraction matrix are identified by the combined identification information, and the combined identification information corresponding to the effective value "1" in the information extraction matrix is looked up, so that the target extraction result corresponding to the information extraction task is obtained. Note that <hint> and <sep> are identifiers used to separate the text to be extracted from the prompt messages.
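A minimal sketch of this lookup is shown below. It assumes that an effective value at row j and column k marks a span running from token j to token k of the combined identification information; this head/tail convention is an assumption made for illustration, since the embodiment only states that the effective value "1" identifies the extracted content.

```python
def decode_spans(extraction_matrix, tokens, threshold: float = 0.0):
    """Look up spans in the combined [prompts | text] token sequence whose
    (head, tail) cell in the extraction matrix exceeds the threshold."""
    results = []
    n = len(tokens)
    for j in range(n):
        for k in range(j, n):                     # upper triangle only: head <= tail
            if extraction_matrix[j][k] > threshold:
                results.append((j, k, "".join(tokens[j:k + 1])))
    return results

# e.g. tokens = ["<hint>", "city", "<hint>", "person", "<sep>", "Zhang", "San", ...]
```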
In another possible implementation manner of the present disclosure, a text matrix may be constructed according to a text to be extracted and at least two pieces of prompt information, so that a target extraction result is extracted from the text matrix according to an effective value in the information extraction matrix, that is, the target extraction result corresponding to the information extraction task is determined according to the information extraction matrix, the text to be extracted and the at least two pieces of prompt information, and the method may include the following steps:
constructing a text matrix according to the text to be extracted and at least two prompt messages;
And extracting a target extraction result corresponding to the information extraction task from the text matrix according to the information extraction matrix.
When the text matrix is constructed according to the text to be extracted and at least two prompt messages, the text to be extracted and the at least two prompt messages can be spliced, and the text matrix can be constructed by taking the spliced text messages as rows and columns of the matrix.
For example, assuming that the text to be extracted is "Zhang San's hometown is in XY City" and the at least two prompt messages are "person" and "city", the text to be extracted and the prompt messages are combined to obtain "<hint>city<hint>person<sep>Zhang San's hometown is in XY City<sep>", and this combined string is used as the rows and columns of the text matrix to obtain a 16 x 16 text matrix. The target extraction result corresponding to the information extraction task is then extracted from the text matrix according to the information extraction matrix.
In an optional embodiment of the present disclosure, before extracting, according to the information extraction matrix, the target extraction result corresponding to the information extraction task from the text matrix, the text matrix and the information extraction matrix may be aligned, so that, according to the information extraction matrix, the target extraction result corresponding to the information extraction task is extracted from the text matrix.
By applying the scheme of the embodiment of the specification, a text matrix is constructed according to the text to be extracted and at least two prompt messages; according to the information extraction matrix, extracting a target extraction result corresponding to the information extraction task from the text matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
In an optional embodiment of the present disclosure, since the information extraction task is generally complex, in order to improve the extraction efficiency of the information extraction task, in this embodiment of the present disclosure, in a case where the information extraction task includes an information extraction subtask, at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask may be determined, and the information extraction subtask is processed by a recursive reasoning manner, that is, after the information extraction task is received, the method may further include the following steps:
analyzing the information extraction task, and determining at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask;
and determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask.
Specifically, the completed information extraction subtask refers to a subtask that has obtained an extraction result corresponding to the information extraction subtask. The current information extraction subtask refers to an information extraction subtask which is about to extract information but does not obtain an extraction result in the current information extraction process. The initial prompt information corresponding to each information extraction subtask may include one prompt information or may include a plurality of prompt information, and specifically, the initial prompt information is selected according to the actual situation, which is not limited in any way in the embodiment of the present specification.
For example, assuming that the prompt information included in the information extraction task is "person, city and hometown", the three prompt messages included in the information extraction task are classified, and it is determined that "person" and "city" are entities while "hometown" is a relation. The information extraction task may then be divided into two information extraction subtasks: information extraction subtask 1 extracts the "person" and "city" entities, and the initial prompt information corresponding to information extraction subtask 1 is "person" and "city"; information extraction subtask 2 extracts the hometown entity corresponding to the person, and the initial prompt information corresponding to information extraction subtask 2 is "person" and "hometown".
In the embodiment of the present disclosure, when at least two information extraction subtasks are processed in a recursive manner, the extraction result of a completed information extraction subtask may be needed when processing the current information extraction subtask. For example, in the above example, when information extraction subtask 2 is executed, the person entity must already be known so that the person's hometown entity can be determined. Therefore, the current prompt information corresponding to the current information extraction subtask may be determined according to the initial prompt information and the extraction result of the completed information extraction subtask.
With reference to the above example, assuming that the current information extraction subtask is information extraction subtask 2, the current prompt information corresponding to the current information extraction subtask is "person: Zhang San; hometown".
By applying the scheme of the embodiment of the specification, the information extraction task is analyzed, and at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask are determined; and determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask. By analyzing the information extraction task, the complex information extraction task is converted into at least two simple information extraction subtasks, so that the scheme can support the information extraction task combined by any information extraction subtasks, and the universality of information extraction is improved.
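The recursive reasoning described above might be organized as in the following sketch, where extract_fn stands in for one pass of the information extraction model and the subtask/dependency field names are hypothetical, used only for illustration.

```python
def run_extraction_task(text, subtasks, extract_fn):
    """Process subtasks in order, building each subtask's prompt from its initial
    prompt information plus the results of already-completed subtasks."""
    completed = {}                                   # subtask name -> extraction result
    for subtask in subtasks:                         # assumed ordered by dependency
        prompts = list(subtask["initial_prompts"])
        for dep in subtask.get("depends_on", []):    # e.g. subtask 2 needs the person entity from subtask 1
            for entity in completed.get(dep, []):
                prompts.append(f"{dep}: {entity}")
        completed[subtask["name"]] = extract_fn(text, prompts)
    return completed
```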
In practical application, there are various implementation modes for analyzing the information extraction task and determining at least two information extraction subtasks and the initial prompt information corresponding to each information extraction subtask; the implementation mode is specifically selected according to the actual situation, and the embodiment of the present specification does not limit it in any way. In one possible implementation manner of the present disclosure, the information extraction task may be divided according to at least one task type corresponding to the information extraction task, so as to determine at least two information extraction subtasks corresponding to the information extraction task and the initial prompt information corresponding to each information extraction subtask.
In another possible implementation manner of the present disclosure, the information extraction task may be split according to the at least two pieces of prompt information included in the information extraction task, so as to obtain at least two information extraction subtasks and the initial prompt information corresponding to each information extraction subtask. That is, the above analyzing of the information extraction task and determining of at least two information extraction subtasks and the initial prompt information corresponding to each information extraction subtask may include the following steps:
classifying at least two prompt messages in the information extraction task, and determining the prompt type corresponding to each prompt message;
And determining at least two information extraction subtasks corresponding to the information extraction tasks and initial prompt information corresponding to each information extraction subtask according to prompt types corresponding to each prompt information.
In practical application, there are various ways to classify at least two prompt messages in the information extraction task and determine the prompt type corresponding to each prompt message; the way is specifically selected according to the actual situation, and the embodiment of the present specification is not limited in any way. In one possible implementation manner of the present disclosure, the prompt type corresponding to each prompt message may be looked up in a preset prompt-type relationship table. In another possible implementation manner of the present disclosure, each prompt message may be input into a pre-trained information classification model to obtain the prompt type corresponding to each prompt message.
Further, after the prompt type corresponding to each prompt message is determined, the information extraction task can be divided according to the number of prompt types to determine at least two information extraction subtasks, and the prompt messages of the same prompt type are used as the initial prompt information of the information extraction subtask corresponding to that prompt type.
By applying the scheme of the embodiment of the specification, at least two prompt messages in the information extraction task are classified, and the prompt type corresponding to each prompt message is determined; at least two information extraction subtasks corresponding to the information extraction task and the initial prompt information corresponding to each information extraction subtask are then determined according to the prompt types corresponding to the prompt messages, which ensures the accuracy of the initial prompt information.
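A small sketch of the two classification options described above: a preset prompt-type relationship table, with a pre-trained classification model as the fallback. The table contents and the classifier interface are illustrative assumptions.

```python
PROMPT_TYPE_TABLE = {"person": "entity", "city": "entity", "hometown": "relation"}

def classify_prompt(prompt, classifier=None):
    """Return the prompt type via table lookup, else via a classification model."""
    if prompt in PROMPT_TYPE_TABLE:
        return PROMPT_TYPE_TABLE[prompt]
    if classifier is not None:
        return classifier(prompt)          # e.g. a fine-tuned text classifier (assumed interface)
    raise KeyError(f"no prompt type known for {prompt!r}")

print(classify_prompt("hometown"))          # relation
```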
In an optional embodiment of the present disclosure, when the information extraction task is divided into at least two information extraction subtasks, inputting the text to be extracted and at least two prompt information into the information extraction model, and determining an information extraction matrix corresponding to the information extraction task may include the following steps:
inputting the text to be extracted and the current prompt information into an information extraction model, and determining an information extraction matrix corresponding to a current information extraction subtask;
the determining the target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages may include the following steps:
determining an extraction result corresponding to the current information extraction subtask according to the information extraction matrix corresponding to the current information extraction subtask;
and determining a target extraction result corresponding to the information extraction task according to the extraction result corresponding to each information extraction subtask.
The implementation manner of inputting the text to be extracted and the current prompt information into the information extraction model and determining the information extraction matrix corresponding to the current information extraction subtask is the same as the implementation manner of inputting the text to be extracted and the at least two prompt information into the information extraction model and determining the information extraction matrix corresponding to the information extraction task, which is described above, and will not be described in detail in the embodiment of the present specification.
Further, after the information extraction matrix corresponding to the current information extraction subtask is determined, the extraction result corresponding to the current information extraction subtask can be determined according to that information extraction matrix, the text to be extracted and the current prompt information. The implementation manner of determining the extraction result corresponding to the current information extraction subtask according to the information extraction matrix corresponding to the current information extraction subtask is the same as the implementation manner, described above, of determining the target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages, and is not repeated in the embodiment of the present specification.
In the embodiment of the present disclosure, after determining the extraction result corresponding to each information extraction subtask, the extraction result corresponding to each information extraction subtask may be integrated, and each integrated extraction result may be used as the target extraction result.
Illustratively, assume that the information extraction task includes an information extraction subtask A and an information extraction subtask B. The initial prompt information corresponding to the information extraction subtask A is a, and the initial prompt information corresponding to the information extraction subtask B is b. When the current information extraction subtask is the information extraction subtask A, the current prompt information is a. The text to be extracted and the current prompt information a are input into the information extraction model to determine the information extraction matrix A corresponding to the information extraction subtask A, and the extraction result corresponding to the information extraction subtask A is determined as A according to the information extraction matrix A, the text to be extracted and the current prompt information a. When the current information extraction subtask is the information extraction subtask B, the current prompt information of the information extraction subtask B is determined as A+b according to the initial prompt information b of the information extraction subtask B and the extraction result A of the completed information extraction subtask A. The text to be extracted and the current prompt information A+b are input into the information extraction model to determine the information extraction matrix B corresponding to the information extraction subtask B, and the extraction result corresponding to the information extraction subtask B is determined as B according to the information extraction matrix B, the text to be extracted and the current prompt information A+b.
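The recursive processing of subtasks described above can be sketched as follows. The `extract` callable is a hypothetical stand-in for "input the text to be extracted and the current prompt information into the information extraction model and decode the resulting information extraction matrix"; its signature and its dictionary return format are assumptions made only so the sketch runs.

```python
def build_current_prompt(initial_prompt, finished):
    # Same helper as in the earlier sketch: completed results enrich the prompt.
    return [f"{p}: {finished[p]}" if p in finished else p for p in initial_prompt]

def run_subtasks(text, subtasks, extract):
    finished = {}                                   # extraction results of completed subtasks
    for subtask in subtasks:
        current_prompt = build_current_prompt(subtask["initial_prompt"], finished)
        finished.update(extract(text, current_prompt))
    return finished                                 # integrated target extraction result

# Dummy extractor used only to make the sketch executable.
def dummy_extract(text, prompt):
    if any("hometown" in p for p in prompt):
        return {"hometown": "XY city"}
    return {"person": "Zhang San", "city": "XY city"}

subtasks = [{"initial_prompt": ["person", "city"]},
            {"initial_prompt": ["person", "hometown"]}]
print(run_subtasks("Zhang San's hometown is in XY city", subtasks, dummy_extract))
# {'person': 'Zhang San', 'city': 'XY city', 'hometown': 'XY city'}
```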
By applying the scheme of the embodiment of the specification, the text to be extracted and the current prompt information are input into the information extraction model, and the information extraction matrix corresponding to the current information extraction subtask is determined; the extraction result corresponding to the current information extraction subtask is determined according to that information extraction matrix; and the target extraction result corresponding to the information extraction task is determined according to the extraction results corresponding to the respective information extraction subtasks. By analyzing the information extraction task, a complex information extraction task is converted into at least two simple information extraction subtasks, so the scheme can support information extraction tasks composed of arbitrary combinations of information extraction subtasks, which improves the universality of information extraction. Moreover, the correspondence between the text to be extracted and the current prompt information is expressed by the information extraction matrix, so that the information extraction speed is not affected by the complexity of the information extraction schema and the efficiency of information extraction is improved.
In an optional embodiment of the present disclosure, before the current prompt information corresponding to the current information extraction subtask is determined according to the initial prompt information and the extraction result of the completed information extraction subtask, it may be determined whether a completed information extraction subtask currently exists, and the current prompt information corresponding to the current information extraction subtask is determined according to the judgment result. That is, before determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask, the method may further include the following steps:
Searching extraction results corresponding to the information extraction subtasks to obtain search results;
and determining whether the completed information extraction subtask exists currently according to the search result.
Illustratively, it is assumed that the information extraction task includes an information extraction subtask A, an information extraction subtask B and an information extraction subtask C, where the information extraction subtask A corresponds to an extraction result A. When the current information extraction subtask is the information extraction subtask B, the information extraction subtask A, the information extraction subtask B and the information extraction subtask C are searched; since it is determined that the information extraction subtask A corresponds to an extraction result, it is determined that a completed information extraction subtask currently exists, and the completed information extraction subtask is the information extraction subtask A corresponding to that extraction result.
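A trivial sketch of the search described above, assuming the extraction results are kept in an in-memory dictionary keyed by subtask identifier; the storage format is an assumption.

```python
def completed_subtasks(result_store, subtask_ids):
    """Return the ids of subtasks for which an extraction result already exists."""
    return [sid for sid in subtask_ids if sid in result_store]

store = {"subtask_A": {"person": "Zhang San"}}
found = completed_subtasks(store, ["subtask_A", "subtask_B", "subtask_C"])
print(found)                 # ['subtask_A'] -> a completed information extraction subtask exists
```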
By applying the scheme of the embodiment of the specification, the extraction results corresponding to the information extraction subtasks are searched to obtain a search result, and whether a completed information extraction subtask currently exists is determined according to the search result. Each information extraction subtask is traversed comprehensively and completely, which ensures the accuracy of the completed information extraction subtask and further improves the accuracy of the current prompt information.
In practical application, it is judged whether a completed information extraction subtask currently exists; the judgment result is either that one exists or that none exists, and the current prompt information corresponding to the current information extraction subtask is then determined according to the different judgment results. That is, determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask may include the following steps:
Under the condition that the completed information extraction subtask does not exist, taking the initial prompt information as the current prompt information corresponding to the current information extraction subtask;
under the condition that the completed information extraction subtask exists, determining current prompt information corresponding to the current information extraction subtask according to the initial prompt information and an extraction result of the completed information extraction subtask.
If no completed information extraction subtask exists, the current information extraction subtask is the first subtask to start information extraction, and at this time the initial prompt information corresponding to the current information extraction subtask is directly used as its current prompt information. If a completed information extraction subtask exists, the current information extraction subtask is not the first subtask to start information extraction, and its current prompt information is influenced by the extraction result of the completed information extraction subtask; therefore, the current prompt information corresponding to the current information extraction subtask can be determined according to the initial prompt information corresponding to the current information extraction subtask and the extraction result of the completed information extraction subtask.
Further, when the current prompt information corresponding to the current information extraction subtask is determined according to the initial prompt information and the extraction result of the completed information extraction subtask, the initial prompt information and the extraction result of the completed information extraction subtask can be combined to obtain the current prompt information.
By applying the scheme of the embodiment of the specification, the current prompt information corresponding to the current information extraction subtask is determined in a recursive reasoning manner according to the judgment result of whether a completed information extraction subtask exists, which ensures the accuracy of the current prompt information.
In an alternative embodiment of the present disclosure, the training method of the information extraction model may include the following steps:
acquiring a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages;
extracting a first sample text from a plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
inputting the first sample text and first sample prompt information carried by the first sample into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text;
Determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information;
comparing the first predicted extraction result with a first information extraction label carried by the first sample, and calculating a loss value;
and adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting the first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the information extraction model.
Specifically, the model parameters of the initial information extraction model are initialized with a pre-trained model (e.g., a BERT model). The training mode of the information extraction model is supervised training based on prompt learning, that is, each sample text in the sample set carries a real information extraction label, and the information extraction label is the extraction target of the information extraction model and is used to guide the training process of the information extraction model. The sample set may be obtained by reading a large number of sample texts carrying information extraction labels and at least two sample prompt messages from other data acquisition devices or databases. The sample set may also be composed of a large number of sample texts carrying information extraction labels and at least two sample prompt messages that are input by a user. The manner in which the sample set is obtained is specifically selected according to the actual situation, which is not limited in any way in the embodiment of the present specification.
It is worth noting that the implementation manner of inputting the first sample text and the first sample prompt information carried by the first sample text into the initial information extraction model to obtain the first prediction information extraction matrix corresponding to the first sample text is the same as the implementation manner, described above, of inputting the text to be extracted and at least two prompt messages into the information extraction model to determine the information extraction matrix corresponding to the information extraction task; and the implementation manner of determining the first predicted extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information is the same as the implementation manner, described above, of determining the target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and at least two prompt messages. These are not repeated in the embodiment of the present specification.
In one possible implementation manner of the present disclosure, the preset stopping condition includes the loss value being less than or equal to a preset threshold value. After the loss value is calculated according to the first predicted extraction result and the first information extraction label carried by the first sample text, the loss value is compared with the preset threshold value.
Specifically, if the loss value is greater than the preset threshold value, the difference between the first predicted extraction result and the first information extraction label carried by the first sample text is large, and the prediction capability of the initial information extraction model on the first sample text is poor. In this case, the model parameters of the initial information extraction model can be adjusted, and the step of extracting a first sample text from the plurality of sample texts is executed again to continue training the initial information extraction model. When the loss value is less than or equal to the preset threshold value, the difference between the first predicted extraction result and the first information extraction label carried by the first sample text is small, the preset stopping condition is reached, and the trained information extraction model is obtained.
In another possible implementation manner of the present disclosure, in addition to comparing the magnitude relation between the loss value and the preset threshold, it may also be determined whether the training of the current initial information extraction model is completed in combination with the iteration number.
Specifically, if the loss value is greater than the preset threshold value, the model parameters of the initial information extraction model are adjusted, the step of extracting a first sample text from the plurality of sample texts is executed again, and the initial information extraction model continues to be trained; the iteration stops when the preset number of iterations is reached, and the trained information extraction model is obtained. The preset threshold value and the preset number of iterations are specifically selected according to the actual situation, and the embodiment of the present disclosure is not limited in any way.
In practical applications, there are many functions for calculating the loss value, such as cross entropy loss function, L1 norm loss function, maximum loss function, mean square error loss function, logarithmic loss function, and the like, which are specifically selected according to practical situations, and the embodiment of the present disclosure is not limited in any way.
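The training loop described in the steps above can be sketched as follows in PyTorch. The model interface, the use of a binary cross-entropy loss over a 0/1 information extraction matrix, the optimizer and the sampling strategy are all assumptions; the embodiment only requires that a loss value is computed between the prediction and the information extraction label and that training stops on a loss threshold or a preset number of iterations.

```python
import random
import torch
import torch.nn as nn

def train_extraction_model(model, samples, lr=1e-5, loss_threshold=0.01, max_steps=10000):
    """samples: list of (text_ids, prompt_ids, target_matrix); target_matrix is a float 0/1 tensor."""
    criterion = nn.BCEWithLogitsLoss()                 # one common choice; several losses are possible
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(max_steps):                      # preset number of iterations
        text_ids, prompt_ids, target = random.choice(samples)   # "extract a first sample text"
        predicted = model(text_ids, prompt_ids)        # first prediction information extraction matrix
        loss = criterion(predicted, target)            # compare prediction with the extraction label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() <= loss_threshold:              # preset stopping condition on the loss value
            break
    return model
```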
According to the scheme of the embodiment of the specification, according to the first prediction extraction result and the first information extraction label carried by the first sample, calculating to obtain a loss value, comparing the loss value with a preset stop condition, and continuing training the initial information extraction model under the condition that the preset stop condition is not met until the preset stop condition is met, and completing training to obtain the information extraction model. The model parameters of the initial information extraction model are continuously adjusted, so that the finally obtained information extraction model is more accurate.
Referring to fig. 3, fig. 3 shows a flowchart of a method for identifying an article according to an embodiment of the present disclosure, which specifically includes the following steps:
step 302: and receiving an article identification task, wherein the article identification task comprises a text to be identified and at least two prompt messages.
Step 304: inputting the text to be identified and at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the article identification task, wherein the information extraction matrix represents the corresponding relation between the text to be identified and each prompt message.
Step 306: determining a target recognition result corresponding to the article identification task according to the information extraction matrix, the text to be recognized and at least two prompt messages.
It should be noted that, the implementation manner of step 302 to step 306 is the same as the implementation manner of step 202 to step 206, and the description of the embodiment of the present disclosure is omitted.
In practical application, taking an e-commerce scenario as an example, a consumer can input the text to be identified through a client, where the text to be identified describes an article that the consumer wants to purchase. The client may generate an article identification request based on the text to be identified input by the consumer and send the article identification request to the server. The server can determine the target recognition result corresponding to the article identification task by using the above article identification method and send the target recognition result to the client, so that the client can display the target recognition result to the consumer. Further, the client can also query the corresponding article according to the target recognition result and recommend articles that meet the consumer's needs to the consumer.
By applying the scheme of the embodiment of the specification, on the basis of parallel input of a plurality of prompt messages, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
Referring to fig. 4, fig. 4 shows a flowchart of an information extraction model training method provided in an embodiment of the present disclosure, where the information extraction model training method is applied to cloud-side equipment, and specifically includes the following steps:
step 402: and obtaining a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages.
Step 404: a first sample text is extracted from a plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts.
Step 406: and inputting the first sample text and the first sample prompt information carried by the first sample into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text.
Step 408: and determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information.
Step 410: and comparing the first predicted extraction result with the first information extraction label carried by the first sample, and calculating a loss value.
Step 412: and adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting the first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the model parameters of the information extraction model.
Step 414: and sending the model parameters of the information extraction model to the terminal equipment.
It should be noted that steps 402 to 412 are implemented in the same way as the training method of the information extraction model in the information extraction method provided in fig. 2, and the description is not repeated here.
In practical application, after the cloud side device sends the model parameters of the information extraction model to the end side device, the end side device can locally construct the information extraction model according to the model parameters of the information extraction model, and further use the information extraction model to extract information.
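A short sketch of the end-side step described above, assuming the model parameters are shipped as a PyTorch state dict; the transport format, the file path and the `model_class` argument are assumptions.

```python
import torch

def build_local_model(model_class, param_path):
    """Reconstruct the information extraction model on the end side from received parameters."""
    model = model_class()                              # same architecture as on the cloud side
    state_dict = torch.load(param_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()                                       # ready for local information extraction
    return model
```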
According to the scheme of the embodiment of the specification, according to the first prediction extraction result and the first information extraction label carried by the first sample, calculating to obtain a loss value, comparing the loss value with a preset stop condition, and continuing training the initial information extraction model under the condition that the preset stop condition is not met until the preset stop condition is met, and completing training to obtain the information extraction model. The model parameters of the initial information extraction model are continuously adjusted, so that the finally obtained information extraction model is more accurate.
Referring to fig. 5, fig. 5 shows a flowchart of another information extraction method according to an embodiment of the present disclosure, which specifically includes the following steps:
step 502: and receiving an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages.
Step 504: analyzing the information extraction task, and determining at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask.
Step 506: and under the condition that the completed information extraction subtask does not exist, taking the initial prompt information as the current prompt information corresponding to the current information extraction subtask.
Step 508: under the condition that the completed information extraction subtask exists, determining current prompt information corresponding to the current information extraction subtask according to the initial prompt information and an extraction result of the completed information extraction subtask.
Step 510: inputting the text to be extracted and the current prompt information into a feature extraction layer to obtain text features of the text to be extracted and current prompt features corresponding to the current prompt information.
Step 512: in the attention layer, a feature matrix is constructed according to the text features and the current prompt features.
Step 514: and determining an information extraction matrix corresponding to the current information extraction subtask according to the attention mechanism and the feature matrix.
Step 516: and constructing a text matrix according to the text to be extracted and the current prompt information.
Step 518: and extracting an extraction result corresponding to the current information extraction subtask from the text matrix according to the information extraction matrix corresponding to the current information extraction subtask.
Step 520: and determining a target extraction result corresponding to the information extraction task according to the extraction result corresponding to each information extraction subtask.
It should be noted that, the specific implementation manner of the steps 502 to 520 is the same as the implementation manner of the information extraction method provided in fig. 2, and the description of the embodiment of the present disclosure is omitted.
By applying the scheme of the embodiment of the specification, the complex information extraction task is converted into at least two simple information extraction subtasks by analyzing the information extraction task, so that the scheme can support the information extraction task of any information extraction subtask combination, and the universality of information extraction is improved. And on the basis of parallel input of a plurality of prompt messages, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
Referring to fig. 6, fig. 6 is a flowchart illustrating a processing procedure of an information extraction method according to an embodiment of the present disclosure. Referring to fig. 7, fig. 7 is a flowchart illustrating a processing procedure of another information extraction method according to an embodiment of the present disclosure. Referring to fig. 8, fig. 8 is a schematic diagram of an attention mask matrix in an information extraction method according to an embodiment of the present disclosure.
Assuming that the prompt information included in the information extraction task is "person, city and hometown", the information extraction task is analyzed and divided into two information extraction subtasks: information extraction subtask 1 extracts entities of the "person" and "city" types, and its initial prompt information is "person and city"; information extraction subtask 2 extracts the hometown entity corresponding to the person, and its initial prompt information is "person and hometown".
For information extraction subtask 1: as shown in fig. 6, when information extraction subtask 1 is the current information extraction subtask, there is no completed information extraction subtask, so the current prompt information corresponding to information extraction subtask 1 is "person and city". The text to be extracted "Zhang San's hometown is in XY city" and the current prompt information "person and city" are input into the information extraction model to obtain an information extraction matrix, the information extraction matrix is updated according to the text to be extracted and the current prompt information to obtain the information extraction matrix shown in fig. 6, and based on the information extraction matrix shown in fig. 6, the extraction result corresponding to information extraction subtask 1 is determined as "person: Zhang San; city: XY city".
It should be noted that <hint> and <sep> are identifiers used to separate the text to be extracted from the prompt information. The values "H, T, THWS, NNW" in the information extraction matrix are valid values and may be converted to "1". The H part of the information extraction matrix represents the correspondence between the start position of an extraction result (span) and the prompt information, the T part represents the correspondence between the end position of an extraction result and the prompt information, THWS indicates that an entity exists from the j-th column to the i-th row, and NNW indicates the continuation relationship of the entity tokens from the i-th row to the j-th column.
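To make the matrix reading concrete, the following highly schematic decoder shows how span boundaries can be read out of a prompt-by-token matrix in which, per the description above, H marks a span start and T marks a span end for the prompt of that row. The concrete tensor layout of the information extraction matrix in fig. 6 may differ; this is an illustration only.

```python
def decode_spans(matrix, prompts, tokens):
    """matrix[i][j] in {"", "H", "T"}: row i = prompt, column j = token of the text to be extracted."""
    results = {}
    for i, prompt in enumerate(prompts):
        start = None
        for j, cell in enumerate(matrix[i]):
            if cell == "H":                           # start position of an extraction result (span)
                start = j
            elif cell == "T" and start is not None:   # end position of the span
                results[prompt] = " ".join(tokens[start:j + 1])
                start = None
    return results

tokens = ["Zhang", "San", "'s", "hometown", "is", "in", "XY", "city"]
prompts = ["person", "city"]
matrix = [["H", "T", "", "", "", "", "", ""],   # person -> "Zhang San"
          ["", "", "", "", "", "", "H", "T"]]   # city   -> "XY city"
print(decode_spans(matrix, prompts, tokens))
# {'person': 'Zhang San', 'city': 'XY city'}
```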
For information extraction subtask 2: as shown in fig. 7, when information extraction subtask 2 is the current information extraction subtask, information extraction subtask 1 is the completed information extraction subtask, so the current prompt information can be determined as "person: Zhang San; hometown" according to the initial prompt information of information extraction subtask 2 and the extraction result of information extraction subtask 1. The text to be extracted "Zhang San's hometown is in XY city" and the current prompt information "person: Zhang San; hometown" are input into the information extraction model to obtain an information extraction matrix, the information extraction matrix is updated according to the text to be extracted and the current prompt information to obtain the information extraction matrix shown in fig. 7, and based on the information extraction matrix shown in fig. 7, the extraction result corresponding to information extraction subtask 2 is determined as "person: Zhang San; hometown: XY city".
In an alternative embodiment of the present disclosure, as shown in fig. 8, assuming that the text to be extracted is "Zhang San's hometown is in XY city", an attention mask matrix as shown in fig. 8 can be determined by the above information extraction method, where a dark square in the attention mask matrix is 0, indicating that attention between the character of the corresponding row and the character of the corresponding column is masked, and a light square is 1. The attention mask matrix covers [CLS], person: Zhang San, [T], residence (place), workplace (institution), [P], X, Y, [Text], Zhang San's hometown is in XY city, and [SEP]. [CLS] is placed first and represents classification; it can be understood as serving a downstream classification task. [P] represents position information, [Text] marks the text content to be extracted, and [SEP] is used to separate two input sentences.
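An illustrative construction of a 0/1 attention mask of the kind shown in fig. 8 is given below, assuming (as an example policy) that text tokens may attend to everything while each prompt segment attends only to itself and to the text. The actual masking pattern is the one defined by fig. 8, which is not reproduced here.

```python
import numpy as np

def build_attention_mask(segment_ids):
    """segment_ids[i]: index of the prompt segment token i belongs to; -1 marks a text token."""
    n = len(segment_ids)
    mask = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            same_segment = segment_ids[i] == segment_ids[j]
            involves_text = segment_ids[i] == -1 or segment_ids[j] == -1
            mask[i, j] = 1 if (same_segment or involves_text) else 0   # 1 = attention allowed
    return mask

# Two prompt segments followed by three text tokens.
print(build_attention_mask([0, 0, 1, 1, -1, -1, -1]))
```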
Further, the extraction result corresponding to the prompt information such as the place, the number and the population can be determined in the same way.
By applying the scheme of the embodiment of the specification, the information extraction process does not need to enumerate prompt information one by one: a plurality of prompt messages are concatenated and input into the information extraction model, and the correspondence between the text to be extracted and each prompt message is expressed by a matrix structure, so that the information extraction speed is not affected by the complexity of the information extraction schema. Because the text length is often much greater than the length of the concatenated prompt information, the inference efficiency is improved by 70 percent. In addition, the scheme adopts an information extraction mode of recursive reasoning and can support the extraction of arbitrary tuples, thereby realizing general tuple-oriented information extraction.
Referring to fig. 9, fig. 9 is an interface schematic diagram of an information extraction interface according to an embodiment of the present disclosure. The information extraction interface is divided into an information extraction task input interface and an information extraction result display interface. The information extraction task input interface comprises an information extraction task input box, a 'determination' control and a 'cancellation' control. The information extraction result display interface comprises an information extraction result display frame.
A user inputs an information extraction task through the information extraction task input box displayed by the client and clicks the "determination" control; the server receives the information extraction task sent by the client, where the information extraction task includes a text to be extracted and at least two prompt messages; the text to be extracted and the at least two prompt messages are input into the information extraction model to determine an information extraction matrix corresponding to the information extraction task, where the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message; a target extraction result corresponding to the information extraction task is determined according to the information extraction matrix, the text to be extracted and the at least two prompt messages; and the target extraction result corresponding to the information extraction task is sent to the client. The client then displays the target extraction result in the information extraction result display frame.
In practical applications, the manner in which the user operates the control includes any manner such as clicking, double clicking, touch control, mouse hovering, sliding, long pressing, voice control or shaking, and the like, and the selection is specifically performed according to the practical situation, which is not limited in any way in the embodiments of the present disclosure.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of an information extraction device, and fig. 10 shows a schematic structural diagram of an information extraction device provided in one embodiment of the present disclosure. As shown in fig. 10, the apparatus includes:
the first receiving module 1002 is configured to receive an information extraction task, where the information extraction task includes a text to be extracted and at least two hint information;
the first input module 1004 is configured to input the text to be extracted and at least two prompt messages into the information extraction model, and determine an information extraction matrix corresponding to the information extraction task, wherein the information extraction matrix characterizes the corresponding relationship between the text to be extracted and each prompt message;
the first determining module 1006 is configured to determine a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted, and at least two prompt messages.
Optionally, the information extraction model includes a feature extraction layer and an attention layer; the first input module 1004 is further configured to input the text to be extracted and at least two prompt messages into the feature extraction layer, so as to obtain text features of the text to be extracted and prompt features corresponding to the prompt messages; inputting text features of the text to be extracted and prompt features corresponding to the prompt messages into the attention layer to obtain an information extraction matrix corresponding to the information extraction task.
Optionally, the first input module 1004 is further configured to construct, in the attention layer, a feature matrix according to text features of the text to be extracted and prompt features corresponding to each prompt message; and determining an information extraction matrix corresponding to the information extraction task according to the attention mechanism and the feature matrix.
Optionally, the feature extraction layer comprises an embedding layer and at least one coding layer; the first input module 1004 is further configured to input the text to be extracted and each prompt message into the embedding layer, and determine, in the embedding layer, a text feature sequence of the text to be extracted and a prompt feature sequence of each prompt message by using the position information of each character in the text to be extracted and the position information of each character in each prompt message; and input the text feature sequence of the text to be extracted and the prompt feature sequence of each prompt message into the at least one coding layer, and determine, in the at least one coding layer, the text feature of the text to be extracted and the prompt feature corresponding to each prompt message by using an attention mask matrix.
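A hedged sketch of such a feature extraction layer is given below: an embedding layer that adds position information, followed by encoder layers that apply an attention mask. The vocabulary size, hidden size, number of layers and mask policy are assumptions, and the real coding layers may be a pre-trained encoder rather than the generic Transformer used here.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, max_len=512, layers=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)        # position information of each character
        enc_layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, input_ids, attention_mask):
        # attention_mask: (seq, seq) bool matrix, True = attention at this position is masked out.
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.token_emb(input_ids) + self.pos_emb(positions)
        return self.encoder(x, mask=attention_mask)          # text and prompt features

model = FeatureExtractor()
ids = torch.randint(0, 30000, (1, 7))
mask = torch.zeros(7, 7, dtype=torch.bool)                   # nothing masked in this toy call
print(model(ids, mask).shape)                                # torch.Size([1, 7, 256])
```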
Optionally, the attention layer includes a first feedforward neural network layer and a second feedforward neural network layer; the first input module 1004 is further configured to input the text feature and the prompt feature corresponding to each prompt message into the first feedforward neural network layer to obtain a first network feature; inputting the text features and the prompt features corresponding to the prompt messages into a second feedforward neural network layer to obtain second network features; calculating relative position features according to the text features and the feature position information of the prompt features corresponding to the prompt information; and determining an information extraction matrix corresponding to the information extraction task according to the first network characteristic, the second network characteristic and the relative position characteristic.
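A hedged sketch of the attention-layer computation described above: two feed-forward projections of the concatenated text/prompt features are combined with a relative-position term to produce the information extraction matrix (a score for every feature pair). The dimensions and the particular relative-position term are assumptions; the embodiment only states that the first network feature, the second network feature and the relative position feature jointly determine the matrix.

```python
import torch
import torch.nn as nn

class ExtractionMatrixLayer(nn.Module):
    def __init__(self, hidden_size, head_size=64):
        super().__init__()
        self.ffn_q = nn.Linear(hidden_size, head_size)    # first feedforward neural network layer
        self.ffn_k = nn.Linear(hidden_size, head_size)    # second feedforward neural network layer

    def forward(self, features):
        # features: (batch, seq_len, hidden) - text features concatenated with prompt features.
        q = self.ffn_q(features)                           # first network feature
        k = self.ffn_k(features)                           # second network feature
        scores = torch.einsum("bih,bjh->bij", q, k)        # pairwise scores for every feature pair
        # Simple additive relative-position feature (an assumption; only its role is described).
        seq_len = features.size(1)
        pos = torch.arange(seq_len, device=features.device)
        rel = -(pos[None, :] - pos[:, None]).abs().float() * 0.01
        return scores + rel                                # information extraction matrix

layer = ExtractionMatrixLayer(hidden_size=768)
matrix = layer(torch.randn(1, 12, 768))
print(matrix.shape)                                        # torch.Size([1, 12, 12])
```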
Optionally, the first determining module 1006 is further configured to construct a text matrix according to the text to be extracted and at least two hint information; and extracting a target extraction result corresponding to the information extraction task from the text matrix according to the information extraction matrix.
Optionally, the apparatus further comprises: the analysis module is configured to analyze the information extraction task and determine at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask; and determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask.
Optionally, the first input module 1004 is further configured to input the text to be extracted and the current prompt information into the information extraction model, and determine an information extraction matrix corresponding to the current information extraction subtask; the first determining module 1006 is further configured to determine an extraction result corresponding to the current information extraction subtask according to the information extraction matrix corresponding to the current information extraction subtask; and determining a target extraction result corresponding to the information extraction task according to the extraction result corresponding to each information extraction subtask.
Optionally, the analysis module is further configured to classify at least two prompt messages in the message extraction task, and determine a prompt type corresponding to each prompt message; and determining at least two information extraction subtasks corresponding to the information extraction tasks and initial prompt information corresponding to each information extraction subtask according to prompt types corresponding to each prompt information.
Optionally, the apparatus further comprises: the searching module is configured to search the extraction results corresponding to the information extraction subtasks to obtain search results; and determining whether the completed information extraction subtask exists currently according to the search result.
Optionally, the parsing module is further configured to use the initial prompt information as the current prompt information corresponding to the current information extraction subtask when the completed information extraction subtask does not exist; under the condition that the completed information extraction subtask exists, determining current prompt information corresponding to the current information extraction subtask according to the initial prompt information and an extraction result of the completed information extraction subtask.
Optionally, the apparatus further comprises: the information extraction model training module is configured to acquire a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages; extracting a first sample text from a plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts; inputting the first sample text and first sample prompt information carried by the first sample into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text; determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information; comparing the first predicted extraction result with a first information extraction label carried by the first sample, and calculating a loss value; and adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting the first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the information extraction model.
By applying the scheme of the embodiment of the specification, on the basis of parallel input of a plurality of prompt messages, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
The above is a schematic scheme of an information extraction apparatus of the present embodiment. It should be noted that, the technical solution of the information extraction device and the technical solution of the information extraction method belong to the same concept, and details of the technical solution of the information extraction device, which are not described in detail, can be referred to the description of the technical solution of the information extraction method.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of an article identifying device, and fig. 11 shows a schematic structural diagram of an article identifying device provided in one embodiment of the present disclosure. As shown in fig. 11, the apparatus includes:
a second receiving module 1102 configured to receive an item identification task, wherein the item identification task includes a text to be identified and at least two hint information;
a second input module 1104 configured to input the text to be identified and at least two prompt messages into an information extraction model, and determine an information extraction matrix corresponding to the article identification task, wherein the information extraction matrix characterizes a corresponding relationship between the text to be identified and each prompt message;
the second determining module 1106 is configured to determine a target recognition result corresponding to the article identification task according to the information extraction matrix, the text to be recognized and at least two prompt messages.
By applying the scheme of the embodiment of the specification, on the basis of parallel input of a plurality of prompt messages, the corresponding relation between the text to be extracted and each prompt message is expressed through the information extraction matrix, so that the information extraction speed is not influenced by the complexity of the information extraction schema, and the information extraction efficiency is improved.
The above is an exemplary embodiment of an article identification device of the present embodiment. It should be noted that, the technical solution of the article identifying device and the technical solution of the article identifying method belong to the same concept, and details of the technical solution of the article identifying device, which are not described in detail, can be referred to the description of the technical solution of the article identifying method.
Corresponding to the method embodiment, the present disclosure further provides an embodiment of an information extraction model training apparatus, and fig. 12 shows a schematic structural diagram of an information extraction model training apparatus provided in one embodiment of the present disclosure. As shown in fig. 12, the apparatus is applied to cloud-side equipment, and includes:
an obtaining module 1202 configured to obtain a sample set, wherein the sample set includes a plurality of sample texts, the sample texts carrying an information extraction tag and at least two sample prompt messages;
An extraction module 1204 configured to extract a first sample text from the plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
a third input module 1206, configured to input the first sample text and the first sample prompt information carried by the first sample into an initial information extraction model, to obtain a first prediction information extraction matrix corresponding to the first sample text;
a third determining module 1208 configured to determine a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text, and the first sample prompt information;
a calculation module 1210 configured to compare the first predicted extraction result with the first information extraction tag carried by the first sample, and calculate a loss value;
an adjustment module 1212 configured to adjust model parameters of the initial information extraction model according to the loss value, and to return to performing the step of extracting the first sample text from the plurality of sample texts until a preset stop condition is reached, to obtain model parameters of the information extraction model;
a transmitting module 1214 configured to transmit model parameters of the information extraction model to the end-side device.
According to the scheme of the embodiment of the specification, according to the first prediction extraction result and the first information extraction label carried by the first sample, calculating to obtain a loss value, comparing the loss value with a preset stop condition, and continuing training the initial information extraction model under the condition that the preset stop condition is not met until the preset stop condition is met, and completing training to obtain the information extraction model. The model parameters of the initial information extraction model are continuously adjusted, so that the finally obtained information extraction model is more accurate.
The above is a schematic scheme of an information extraction model training apparatus of this embodiment. It should be noted that, the technical solution of the information extraction model training device and the technical solution of the information extraction model training method belong to the same concept, and details of the technical solution of the information extraction model training device which are not described in detail can be referred to the description of the technical solution of the information extraction model training method.
FIG. 13 illustrates a block diagram of a computing device provided in one embodiment of the present description. The components of computing device 1300 include, but are not limited to, a memory 1310 and a processor 1320. Processor 1320 is coupled to memory 1310 via bus 1330, and database 1350 is used to store data.
Computing device 1300 also includes an access device 1340, which access device 1340 enables computing device 1300 to communicate via one or more networks 1360. Examples of such networks include public switched telephone networks (PSTN, public Switched Telephone Network), local area networks (LAN, local Area Network), wide area networks (WAN, wide Area Network), personal area networks (PAN, personal Area Network), or combinations of communication networks such as the internet. Access device 1340 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC, network Interface Card), such as an IEEE802.11 wireless local area network (WLAN, wireless Local Area Networks) wireless interface, a worldwide interoperability for microwave access (Wi-MAX, world Interoperability for Microwave Access) interface, an ethernet interface, a universal serial bus (USB, universal Serial Bus) interface, a cellular network interface, a bluetooth interface, a near-field communication (NFC, near Field Communication) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 1300, as well as other components not shown in FIG. 13, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 13 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1300 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 1300 may also be a mobile or stationary server.
Wherein the processor 1320 is configured to execute computer-executable instructions that, when executed by the processor, perform the steps of the information extraction method or the article identification method or the information extraction model training method described above.
The foregoing is a schematic description of the computing device of this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solutions of the information extraction method, the article identification method, and the information extraction model training method; for details of the technical solution of the computing device that are not described in detail, reference may be made to the description of the technical solution of the information extraction method, the article identification method, or the information extraction model training method above.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the information extraction method, the article identification method, or the information extraction model training method described above.
The foregoing is a schematic description of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solutions of the information extraction method, the article identification method, and the information extraction model training method; for details of the technical solution of the storage medium that are not described in detail, reference may be made to the description of the technical solution of the information extraction method, the article identification method, or the information extraction model training method above.
An embodiment of the present disclosure further provides a computer program which, when executed in a computer, causes the computer to perform the steps of the information extraction method, the article identification method, or the information extraction model training method described above.
The foregoing is a schematic description of the computer program of this embodiment. It should be noted that the technical solution of the computer program belongs to the same concept as the technical solutions of the information extraction method, the article identification method, and the information extraction model training method; for details of the technical solution of the computer program that are not described in detail, reference may be made to the description of the technical solution of the information extraction method, the article identification method, or the information extraction model training method above.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combined actions. However, those skilled in the art should understand that the embodiments are not limited by the order of the actions described, because according to the embodiments of the present disclosure some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by every embodiment.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of an embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The alternative embodiments are not exhaustive and do not limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, thereby enabling others skilled in the art to understand and utilize the invention. This specification is to be limited only by the claims and their full scope and equivalents.

Claims (16)

1. An information extraction method, comprising:
receiving an information extraction task, wherein the information extraction task comprises a text to be extracted and at least two prompt messages;
inputting the text to be extracted and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the information extraction task, wherein the information extraction matrix represents the corresponding relation between the text to be extracted and each prompt message;
and determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted and the at least two prompt messages.
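By way of a non-limiting illustration only, the following Python sketch mirrors the flow of claim 1: an information extraction task with a text to be extracted and two prompt messages is scored into an information extraction matrix and decoded into a target extraction result. The `toy_model` function, the span-matrix shape, and the decoding threshold are hypothetical stand-ins for the trained information extraction model, not the patented implementation.

```python
import torch

def toy_model(text: str, prompts: list[str]) -> torch.Tensor:
    # Stand-in for the trained information extraction model: a real model would encode
    # the text together with each prompt message and return span scores.
    # Shape [num_prompts, text_len, text_len]; entry (p, i, j) scores text[i:j+1]
    # as an answer to prompts[p].
    return torch.randn(len(prompts), len(text), len(text))

def extract(text: str, prompts: list[str], threshold: float = 2.5) -> dict:
    scores = toy_model(text, prompts)                 # information extraction matrix
    result = {p: [] for p in prompts}
    for p_idx, prompt in enumerate(prompts):
        keep = torch.triu(scores[p_idx]) > threshold  # start <= end and score above threshold
        for i, j in keep.nonzero().tolist():
            result[prompt].append(text[i:j + 1])      # recover the answer span from the text
    return result                                     # target extraction result

task = {"text": "Alice joined Acme Corp in 2021.",
        "prompts": ["person name", "organization"]}
print(extract(task["text"], task["prompts"]))
```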
2. The method of claim 1, wherein the information extraction model comprises a feature extraction layer and an attention layer;
and wherein the inputting the text to be extracted and the at least two prompt messages into an information extraction model and determining an information extraction matrix corresponding to the information extraction task comprises:
inputting the text to be extracted and the at least two prompt messages into the feature extraction layer to obtain text features of the text to be extracted and prompt features corresponding to the prompt messages;
inputting the text features of the text to be extracted and the prompt features corresponding to the prompt messages into the attention layer to obtain an information extraction matrix corresponding to the information extraction task.
3. The method according to claim 2, wherein the inputting the text features of the text to be extracted and the prompt features corresponding to each prompt message into the attention layer to obtain the information extraction matrix corresponding to the information extraction task comprises:
constructing, in the attention layer, a feature matrix according to the text features of the text to be extracted and the prompt features corresponding to each prompt message;
and determining the information extraction matrix corresponding to the information extraction task according to an attention mechanism and the feature matrix.
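As a non-limiting illustration of claims 2-3, the sketch below builds a feature matrix from the text features and per-prompt features and derives the information extraction matrix with a scaled dot-product attention mechanism; the dimensions, the pooling of one feature vector per prompt, and the softmax form are assumptions made for illustration only.

```python
import torch

hidden, text_len, num_prompts = 128, 32, 2
text_feat = torch.randn(text_len, hidden)        # text features from the feature extraction layer
prompt_feat = torch.randn(num_prompts, hidden)   # one pooled feature per prompt message

# Construct the feature matrix from the prompt features and the text features.
feature_matrix = torch.cat([prompt_feat, text_feat], dim=0)   # [num_prompts + text_len, hidden]

# Score every pair with scaled dot-product attention and keep the prompt-to-text block
# as the information extraction matrix (correspondence between text and each prompt).
scores = feature_matrix @ feature_matrix.T / hidden ** 0.5
extraction_matrix = torch.softmax(scores[:num_prompts, num_prompts:], dim=-1)
print(extraction_matrix.shape)                   # torch.Size([2, 32])
```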
4. The method of claim 2, wherein the feature extraction layer comprises an embedding layer and at least one coding layer;
and wherein the inputting the text to be extracted and the at least two prompt messages into the feature extraction layer to obtain the text features of the text to be extracted and the prompt features corresponding to each prompt message comprises:
inputting the text to be extracted and each prompt message into the embedding layer, and determining, in the embedding layer, a text feature sequence of the text to be extracted and a prompt feature sequence of each prompt message by using position information of each character in the text to be extracted and position information of each character in each prompt message;
inputting the text feature sequence of the text to be extracted and the prompt feature sequence of each prompt message into the at least one coding layer, and determining the text feature of the text to be extracted and the prompt feature corresponding to each prompt message in the at least one coding layer by using an attention mask matrix.
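The following sketch illustrates, under assumed sizes and an assumed masking policy, how the claim-4 feature extraction layer could combine token and position embeddings and then encode with an attention mask so that each prompt message attends only to its own tokens and to the text. It is illustrative only and relies on a standard Transformer encoder layer rather than the patented architecture.

```python
import torch
import torch.nn as nn

vocab, d_model, max_len = 1000, 64, 128
tok_emb = nn.Embedding(vocab, d_model)
pos_emb = nn.Embedding(max_len, d_model)
encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

text_ids = torch.randint(0, vocab, (1, 20))      # text to be extracted (toy token ids)
prompt_ids = torch.randint(0, vocab, (1, 6))     # two prompts of 3 tokens each (toy ids)
ids = torch.cat([text_ids, prompt_ids], dim=1)   # [1, 26]
pos = torch.arange(ids.size(1)).unsqueeze(0)
x = tok_emb(ids) + pos_emb(pos)                  # embedding layer: token + position information

# Attention mask matrix: True marks positions that may NOT be attended to.
L, T = ids.size(1), text_ids.size(1)
mask = torch.zeros(L, L, dtype=torch.bool)
mask[T:, T:] = True                              # prompts cannot attend to each other ...
mask[T:T + 3, T:T + 3] = False                   # ... except to their own tokens (prompt 1)
mask[T + 3:T + 6, T + 3:T + 6] = False           # (prompt 2)

features = encoder(x, src_mask=mask)             # coding layer output
text_feat, prompt_feat = features[:, :T], features[:, T:]
print(text_feat.shape, prompt_feat.shape)        # text features and prompt features
```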
5. The method of claim 4, wherein the attention layer comprises a first feedforward neural network layer and a second feedforward neural network layer;
and wherein the inputting the text features of the text to be extracted and the prompt features corresponding to each prompt message into the attention layer to obtain the information extraction matrix corresponding to the information extraction task comprises:
inputting the text features and the prompt features corresponding to each prompt message into the first feedforward neural network layer to obtain first network features;
inputting the text features and the prompt features corresponding to each prompt message into the second feedforward neural network layer to obtain second network features;
calculating relative position features according to feature position information of the text features and the prompt features corresponding to each prompt message;
and determining the information extraction matrix corresponding to the information extraction task according to the first network features, the second network features, and the relative position features.
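A non-limiting sketch of the claim-5 attention layer follows: two feedforward neural network layers produce first and second network features, a relative position feature is added as a learned bias over token distances (an assumed simplification of whatever relative-position encoding is actually used), and their combination yields the information extraction matrix.

```python
import torch
import torch.nn as nn

hidden, out_dim, length = 64, 32, 26             # length = text tokens + prompt tokens
features = torch.randn(length, hidden)           # text features concatenated with prompt features

ffn_1 = nn.Sequential(nn.Linear(hidden, out_dim), nn.GELU())
ffn_2 = nn.Sequential(nn.Linear(hidden, out_dim), nn.GELU())
q = ffn_1(features)                              # first network features
k = ffn_2(features)                              # second network features

# Relative position feature: a learned bias indexed by the signed distance (i - j).
rel_bias = nn.Embedding(2 * length - 1, 1)
idx = torch.arange(length)
rel = rel_bias(idx[:, None] - idx[None, :] + length - 1).squeeze(-1)

extraction_matrix = q @ k.T / out_dim ** 0.5 + rel   # [length, length] pairwise scores
print(extraction_matrix.shape)
```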
6. The method according to claim 1, wherein the determining, according to the information extraction matrix, the text to be extracted, and the at least two prompt messages, a target extraction result corresponding to the information extraction task comprises:
constructing a text matrix according to the text to be extracted and the at least two prompt messages;
and extracting a target extraction result corresponding to the information extraction task from the text matrix according to the information extraction matrix.
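As a non-limiting illustration of claim 6, the sketch below constructs a text matrix whose entry (i, j) holds the candidate span text[i:j+1] and uses a boolean information extraction matrix to read the target extraction result out of it; the boolean form of the matrix and the single selected span are assumed simplifications.

```python
import numpy as np

text = "Alice joined Acme Corp."
n = len(text)
# Text matrix: entry (i, j) is the candidate span from position i to position j.
text_matrix = np.array([[text[i:j + 1] if i <= j else "" for j in range(n)]
                        for i in range(n)], dtype=object)

extraction_matrix = np.zeros((n, n), dtype=bool)   # normally produced by the model
extraction_matrix[0, 4] = True                     # pretend the model selected "Alice"

target_result = text_matrix[extraction_matrix].tolist()
print(target_result)                               # ['Alice']
```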
7. The method of claim 1, further comprising, after the receiving of the information extraction task:
parsing the information extraction task, and determining at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask;
and determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask.
8. The method of claim 7, wherein the inputting the text to be extracted and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the information extraction task comprises:
inputting the text to be extracted and the current prompt information into an information extraction model, and determining an information extraction matrix corresponding to the current information extraction subtask;
and the determining a target extraction result corresponding to the information extraction task according to the information extraction matrix, the text to be extracted, and the at least two prompt messages comprises:
determining an extraction result corresponding to the current information extraction subtask according to the information extraction matrix corresponding to the current information extraction subtask;
and determining a target extraction result corresponding to the information extraction task according to the extraction result corresponding to each information extraction subtask.
9. The method of claim 7, wherein the parsing the information extraction task to determine at least two information extraction subtasks and initial prompt information corresponding to each information extraction subtask comprises:
classifying the at least two prompt messages in the information extraction task, and determining a prompt type corresponding to each prompt message;
and determining at least two information extraction subtasks corresponding to the information extraction tasks and initial prompt information corresponding to each information extraction subtask according to the prompt types corresponding to each prompt information.
10. The method according to claim 7, wherein before determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask, the method further comprises:
searching for extraction results corresponding to each information extraction subtask to obtain a search result;
and determining, according to the search result, whether a completed information extraction subtask currently exists.
11. The method according to any one of claims 7-10, wherein the determining, according to the initial prompt information and the extraction result of the completed information extraction subtask, the current prompt information corresponding to the current information extraction subtask comprises:
in a case that no completed information extraction subtask exists, using the initial prompt information as the current prompt information corresponding to the current information extraction subtask;
and in a case that a completed information extraction subtask exists, determining the current prompt information corresponding to the current information extraction subtask according to the initial prompt information and the extraction result of the completed information extraction subtask.
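The following sketch illustrates claims 7-11 in simplified form: the information extraction task is split into subtasks, and when completed subtasks exist their extraction results are folded into the current prompt information, otherwise the initial prompt information is used as-is. `run_extraction` and the prompt template are hypothetical placeholders, not the patented implementation.

```python
def run_extraction(text: str, prompt: str) -> list[str]:
    # Placeholder: a real system would call the information extraction model here.
    return [f"<span answering '{prompt}'>"]

def extract_with_subtasks(text: str, initial_prompts: list[str]) -> dict:
    done: dict[str, list[str]] = {}              # extraction results of completed subtasks
    for initial_prompt in initial_prompts:       # one information extraction subtask per prompt
        if done:                                 # completed subtasks exist: enrich the prompt
            context = "; ".join(f"{k}: {', '.join(v)}" for k, v in done.items())
            current_prompt = f"{initial_prompt} (given {context})"
        else:                                    # no completed subtask: use the initial prompt
            current_prompt = initial_prompt
        done[initial_prompt] = run_extraction(text, current_prompt)
    return done                                  # merged into the target extraction result

print(extract_with_subtasks("Alice joined Acme Corp in 2021.",
                            ["person name", "organization of the person"]))
```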
12. The method of claim 1, wherein the training mode of the information extraction model comprises:
acquiring a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages;
extracting a first sample text from the plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
inputting the first sample text and first sample prompt information carried by the first sample text into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text;
determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information;
comparing the first prediction extraction result with a first information extraction label carried by the first sample text, and calculating a loss value;
and adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting the first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the information extraction model.
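A non-limiting training-loop sketch for claim 12 (and, on the cloud side, claim 14) follows. The toy model, the per-position scores used in place of a full prediction information extraction matrix, the binary cross-entropy loss computed directly on the scores rather than on a decoded result, and the fixed-step stopping condition are all assumptions made for illustration.

```python
import random
import torch
import torch.nn as nn

class ToyExtractionModel(nn.Module):
    # Stand-in for the initial information extraction model: embeds characters and
    # scores every text position against a pooled prompt feature.
    def __init__(self, vocab=256, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, text_ids, prompt_ids):
        t = self.emb(text_ids)                       # [text_len, hidden]
        p = self.emb(prompt_ids).mean(0)             # pooled prompt feature
        return self.score(t * p).squeeze(-1)         # [text_len] position scores

def encode(s: str) -> torch.Tensor:
    return torch.tensor([min(ord(c), 255) for c in s])

# One toy sample: text, sample prompt information, and a per-character label marking "Alice".
samples = [("Alice joined Acme.", "person name",
            torch.tensor([1., 1., 1., 1., 1.] + [0.] * 13))]

model = ToyExtractionModel()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):                              # preset stopping condition: fixed steps
    text, prompt, label = random.choice(samples)     # extract a first sample text
    pred = model(encode(text), encode(prompt))       # predicted extraction scores
    loss = loss_fn(pred, label)                      # compare prediction with the label
    optim.zero_grad()
    loss.backward()                                  # adjust model parameters by the loss value
    optim.step()
```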
13. An article identification method comprising:
receiving an article identification task, wherein the article identification task comprises a text to be identified and at least two prompt messages;
inputting the text to be identified and the at least two prompt messages into an information extraction model, and determining an information extraction matrix corresponding to the article identification task, wherein the information extraction matrix characterizes the corresponding relation between the text to be identified and each prompt message;
and determining a target identification result corresponding to the article identification task according to the information extraction matrix, the text to be identified, and the at least two prompt messages.
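As a non-limiting illustration of claim 13, the sketch below reuses the extraction machinery for article identification: the prompt messages describe the article attributes to recognize, and the target identification result is read off the best-scoring spans. `model_scores`, the threshold-free argmax decoding, and the example prompts are hypothetical placeholders.

```python
import torch

def model_scores(text: str, prompts: list[str]) -> torch.Tensor:
    # Placeholder returning per-prompt, per-span scores; a trained model goes here.
    return torch.randn(len(prompts), len(text), len(text))

def identify_article(text: str) -> dict:
    prompts = ["article name", "article category"]             # at least two prompt messages
    scores = torch.triu(model_scores(text, prompts))           # keep spans with start <= end
    best = scores.flatten(1).argmax(dim=1)                     # best span per prompt
    spans = [(int(b) // len(text), int(b) % len(text)) for b in best]
    return {p: text[i:j + 1] for p, (i, j) in zip(prompts, spans)}  # target identification result

print(identify_article("The parcel contained a folding knife."))
```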
14. An information extraction model training method, applied to a cloud-side device, comprising:
acquiring a sample set, wherein the sample set comprises a plurality of sample texts, and the sample texts carry information extraction labels and at least two sample prompt messages;
extracting a first sample text from the plurality of sample texts, wherein the first sample text is any one of the plurality of sample texts;
inputting the first sample text and first sample prompt information carried by the first sample text into an initial information extraction model to obtain a first prediction information extraction matrix corresponding to the first sample text;
determining a first prediction extraction result corresponding to the first sample text according to the first prediction information extraction matrix, the first sample text and the first sample prompt information;
comparing the first prediction extraction result with a first information extraction label carried by the first sample text, and calculating a loss value;
adjusting model parameters of the initial information extraction model according to the loss value, and returning to execute the step of extracting the first sample text from the plurality of sample texts until a preset stopping condition is reached, so as to obtain the model parameters of the information extraction model;
and sending the model parameters of the information extraction model to a terminal-side device.
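The cloud-side/terminal-side split of claim 14 can be pictured with the following minimal sketch: after training, the cloud-side device serializes the model parameters and the terminal-side device loads them into the same architecture. The file-based transfer and the trivial stand-in module are assumptions; in practice the parameters would be sent over a network rather than written to a local file.

```python
import torch
import torch.nn as nn

# Cloud side: after training finishes, export the parameters of the information
# extraction model (here a trivial stand-in module).
cloud_model = nn.Linear(64, 1)
torch.save(cloud_model.state_dict(), "extraction_model.pt")

# Terminal side: rebuild the same architecture and load the received parameters.
terminal_model = nn.Linear(64, 1)
terminal_model.load_state_dict(torch.load("extraction_model.pt"))
terminal_model.eval()
```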
15. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the method of any one of claims 1 to 12, claim 13, or claim 14.
16. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 12, claim 13, or claim 14.
CN202310431817.3A 2023-04-19 2023-04-19 Information extraction, article identification and information extraction model training method Pending CN116644743A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310431817.3A 2023-04-19 2023-04-19 Information extraction, article identification and information extraction model training method

Publications (1)

Publication Number Publication Date
CN116644743A (en) 2023-08-25

Family

ID=87617763

Family Applications (1)

Application Number Priority Date Filing Date Title
CN202310431817.3A Pending CN116644743A (en) 2023-04-19 2023-04-19 Information extraction, article identification and information extraction model training method

Country Status (1)

Country Link
CN (1) CN116644743A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination