CN113204616B - Training of text extraction model and text extraction method and device - Google Patents

Training of text extraction model and text extraction method and device Download PDF

Info

Publication number
CN113204616B
CN113204616B CN202110479305.5A
Authority
CN
China
Prior art keywords
extraction model
text
texts
training
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110479305.5A
Other languages
Chinese (zh)
Other versions
CN113204616A (en)
Inventor
刘同阳
王述
常万里
郑伟
冯知凡
柴春光
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110479305.5A priority Critical patent/CN113204616B/en
Publication of CN113204616A publication Critical patent/CN113204616A/en
Application granted granted Critical
Publication of CN113204616B publication Critical patent/CN113204616B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides a training method for a text extraction model and a text extraction method, and relates to the technical fields of deep learning, knowledge graphs, and natural language processing. The training method of the text extraction model comprises the following steps: acquiring training data; constructing a neural network model comprising a first extraction model and a second extraction model, wherein the output of the first extraction model is the input of the second extraction model; inputting a plurality of texts into the first extraction model respectively, to obtain the entity word prediction result output by the first extraction model for each text; and training the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, the first extraction model and the trained second extraction model together forming the text extraction model. The text extraction method comprises the following steps: acquiring a text to be processed; inputting the text to be processed into the text extraction model, and taking the output result of the text extraction model as the extraction result of the text to be processed.

Description

Training of text extraction model and text extraction method and device
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular to the technical fields of deep learning, knowledge graphs, and natural language processing. A method, apparatus, electronic device, and readable storage medium for training a text extraction model and for text extraction are provided.
Background
Entity words in a text have independent semantics and can clearly express a person, an object, or a concept; the aspect words corresponding to the entity words in the text are used to describe one aspect of those entity words.
In the prior art, various schemes exist for extracting triples corresponding to entity words from text, but these schemes cannot solve the technical problem of extracting, from text, the entity words together with their corresponding aspect words.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a training method of a text extraction model, including: acquiring training data, wherein the training data comprises a plurality of texts and aspect word labeling results of the plurality of texts; constructing a neural network model comprising a first extraction model and a second extraction model, wherein the output of the first extraction model is the input of the second extraction model; inputting the plurality of texts into the first extraction model respectively, to obtain the entity word prediction result output by the first extraction model for each text; and training the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, the first extraction model and the trained second extraction model forming the text extraction model.
According to a second aspect of the present disclosure, there is provided a method of text extraction, comprising: acquiring a text to be processed; inputting the text to be processed into a text extraction model, and taking the output result of the text extraction model as the extraction result of the text to be processed.
According to a third aspect of the present disclosure, there is provided a training apparatus for a text extraction model, including: a first acquisition unit, configured to acquire training data, wherein the training data comprises a plurality of texts and aspect word labeling results of the plurality of texts; a building unit, configured to build a neural network model including a first extraction model and a second extraction model, wherein the output of the first extraction model is the input of the second extraction model; a processing unit, configured to input the plurality of texts into the first extraction model respectively, to obtain the entity word prediction result output by the first extraction model for each text; and a training unit, configured to train the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, the first extraction model and the trained second extraction model forming the text extraction model.
According to a fourth aspect of the present disclosure, there is provided an apparatus for text extraction, including: a second acquisition unit, configured to acquire a text to be processed; and an extraction unit, configured to input the text to be processed into a text extraction model and take the output result of the text extraction model as the extraction result of the text to be processed.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
According to the technical scheme of the disclosure, by constructing a neural network model comprising a first extraction model and a second extraction model, the second extraction model in the neural network model obtains the aspect words corresponding to the entity words in the text according to the output of the first extraction model. Because the second extraction model takes the entity words obtained by the first extraction model as prior knowledge, the accuracy of the second extraction model in extracting aspect words is improved; and since the fields of the input text and the output aspect words are not limited, the text extraction model obtained through training has stronger generalization capability.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the training of a text extraction model and the text extraction method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. As shown in fig. 1, the training method of the text extraction model of the present embodiment may specifically include the following steps:
s101, acquiring training data, wherein the training data comprises a plurality of texts and aspect word labeling results of the texts;
s102, constructing a neural network model comprising a first extraction model and a second extraction model, wherein the output of the first extraction model is the input of the second extraction model;
s103, respectively inputting a plurality of texts into the first extraction model to obtain a predicted result of the entity word output by the first extraction model for each text;
and S104, training the second extraction model by using a plurality of texts, the entity word prediction results of the texts and the aspect word labeling results of the texts until the second extraction model converges, and forming the text extraction model by the first extraction model and the second extraction model obtained by training.
According to the training method of the text extraction model of this embodiment, by constructing a neural network model comprising a first extraction model and a second extraction model, the second extraction model in the neural network model obtains the aspect words corresponding to the entity words in the text according to the output of the first extraction model. Because the second extraction model takes the entity words obtained by the first extraction model as prior knowledge, the accuracy of the second extraction model in extracting aspect words is improved; and since the fields of the input text and the output aspect words are not limited, the text extraction model obtained through training has stronger generalization capability.
In the training data obtained in S101, the aspect word labeling result of the text includes the entity word in the text and the aspect word corresponding to the entity word. In addition, the training data obtained in S101 may further include a labeling result of the entity word of the text, where the labeling result of the entity word includes the entity word included in the text.
The entity words in this embodiment are words in the text that can clearly express a person, an object, or a concept; the aspect words in this embodiment correspond to different entity words in the text and are used to describe one aspect of the entity word to which they correspond. An entity word in the text may itself also be an aspect word corresponding to another entity word.
For example, suppose the text is "Zhang San starred in the movie xx", the entity words in the text are "Zhang San" and "xx", the aspect word corresponding to the entity word "Zhang San" is "movie", and the aspect word corresponding to the entity word "xx" is "lead actor"; in this embodiment, "Zhang San" and "xx" are used as the entity word labeling result of the text, and ("Zhang San", "movie") and ("xx", "lead actor") are used as the aspect word labeling result of the text.
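For concreteness, the labeled training sample built from the example above could be organized as in the following sketch; the field names and tuple layout are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical layout of one training sample; field names are assumptions.
sample = {
    "text": "Zhang San starred in the movie xx",
    "entity_words": ["Zhang San", "xx"],                             # entity word labeling result
    "aspect_words": [("Zhang San", "movie"), ("xx", "lead actor")],  # aspect word labeling result
}
```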
In this embodiment, after executing S101 to obtain the plurality of texts together with their entity word labeling results and aspect word labeling results, S102 is executed to construct a neural network model including a first extraction model and a second extraction model, where the neural network model uses the output of the first extraction model as the input of the second extraction model.
It can be understood that, in this embodiment, the first extraction model of the neural network model is used to extract the text for the first time, so as to obtain the entity word in the text; and the second extraction model of the neural network model is used for extracting the text for the second time according to the extraction result of the first extraction model so as to obtain the aspect words corresponding to the entity words in the text.
In the embodiment, when the text extraction model is trained, the extraction result obtained by the first extraction model is used as the input of the second extraction model, so that the second extraction model is more focused on extracting the aspect words corresponding to the entity words in the text after obtaining the entity word information in the text, and the accuracy of the second extraction model in extracting the aspect words is improved.
The first extraction model in this embodiment is a neural network model capable of extracting entity words from text, and includes a first coding layer, a second coding layer, and a first classification layer. The first coding layer may be a pre-trained language model, such as an ERNIE model, used to encode each character (token) in the text to obtain a semantic representation vector of each character; the second coding layer may be an LSTM (Long Short-Term Memory) network, for example a BiLSTM, used to encode the semantic representation vector of each character to obtain, for each character, a sentence representation vector rich in context and character-order information; the first classification layer is used to label the head or the tail of an entity word in the text according to the sentence representation vector of each character, for example, if a certain character is the head or the tail of an entity word, the first classification layer labels the classification result of that character as 1, and otherwise as 0.
The second extraction model in this embodiment is a neural network model capable of extracting the aspect words corresponding to entity words from text, and includes a third coding layer, a fourth coding layer, and a second classification layer. The third coding layer may be a pre-trained language model, such as an ERNIE model, used to encode each character (token) in the text to which the entity word information has been added, to obtain a semantic representation vector of each character; the fourth coding layer may be an LSTM, for example a BiLSTM, used to encode the semantic representation vector of each character to obtain, for each character, a sentence representation vector rich in context and character-order information; the second classification layer is used to label the head or the tail of an aspect word in the text according to the sentence representation vector of each character, for example, if a certain character is the head or the tail of an aspect word, the second classification layer labels the classification result of that character as 1, and otherwise as 0.
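As a rough illustration of the structure just described (pre-trained encoder, BiLSTM, and head/tail classification layer), a minimal PyTorch sketch might look as follows; the encoder is only a stand-in for ERNIE, and all names and dimensions are assumptions rather than the disclosed implementation.

```python
import torch.nn as nn

class SpanExtractionModel(nn.Module):
    """Hypothetical shared skeleton for the first and second extraction models."""

    def __init__(self, encoder, hidden_size=768, lstm_size=256):
        super().__init__()
        self.encoder = encoder                        # first/third coding layer, e.g. an ERNIE-style encoder
        self.bilstm = nn.LSTM(hidden_size, lstm_size,
                              batch_first=True, bidirectional=True)  # second/fourth coding layer
        self.head_classifier = nn.Linear(2 * lstm_size, 1)  # 1 if a character is a head, else 0
        self.tail_classifier = nn.Linear(2 * lstm_size, 1)  # 1 if a character is a tail, else 0

    def forward(self, token_ids, attention_mask):
        # The encoder is assumed to return one semantic representation vector per character.
        semantic = self.encoder(token_ids, attention_mask)
        sentence, _ = self.bilstm(semantic)           # context- and character-order-aware vectors
        head_logits = self.head_classifier(sentence).squeeze(-1)
        tail_logits = self.tail_classifier(sentence).squeeze(-1)
        return head_logits, tail_logits
```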
In this embodiment, after executing S102 to construct a neural network model including a first extraction model and a second extraction model, S103 is executed to input the plurality of texts into the first extraction model respectively, so as to obtain the entity word prediction result output by the first extraction model for each text.
The predicted result of the entity word obtained in S103 in this embodiment may be the position of the head and the tail of the entity word in the text, or may be the entity word in the text directly.
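A simple way to turn such head/tail predictions into entity words, assuming each predicted head is matched with the nearest following predicted tail, could look like this hypothetical helper (the matching rule and threshold are assumptions):

```python
def decode_spans(text, head_probs, tail_probs, threshold=0.5):
    """Turn per-character head/tail probabilities into (head, tail, word) spans."""
    heads = [i for i, p in enumerate(head_probs) if p > threshold]
    tails = [i for i, p in enumerate(tail_probs) if p > threshold]
    spans = []
    for h in heads:
        candidates = [t for t in tails if t >= h]
        if candidates:
            t = min(candidates)                     # nearest tail at or after the head
            spans.append((h, t, text[h:t + 1]))     # (head position, tail position, word)
    return spans
```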
After executing S103 to obtain the entity word prediction result output by the first extraction model for each text, executing S104 to train the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts and the aspect word labeling results of the plurality of texts until the second extraction model converges, and forming the text extraction model by the first extraction model and the trained second extraction model.
That is, in this embodiment, only the second extraction model in the neural network model may be trained, but the first extraction model may not be trained, and after the second extraction model is trained to converge, training of the entire neural network model is completed, so as to obtain the text extraction model. By using the text extraction model obtained in the embodiment, the entity words and the aspect words corresponding to the entity words in the text can be output according to the input text.
Specifically, when executing S104 to train the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts, the present embodiment may adopt the following alternative implementation manners: fusing a plurality of texts with the entity word prediction results of the texts; and training the second extraction model by using the fusion result of the texts and the aspect word labeling result of the texts until the second extraction model converges.
That is, the fusion result obtained from the text and the entity word prediction result of the text contains the prior knowledge of the entity words, so that the second extraction model focuses more on extracting the aspect words in the text, which improves the accuracy of the second extraction model in extracting aspect words.
In this embodiment, when executing S104, training the second extraction model by using the fusion result of the plurality of texts and the aspect word labeling result of the plurality of texts until the second extraction model converges, an optional implementation manner may be adopted as follows: respectively inputting the fusion results of the texts into a second extraction model to obtain aspect word prediction results output by the second extraction model for each text; and calculating a first loss function value according to the aspect word prediction results of the texts and the aspect word labeling results of the texts, and adjusting parameters in the second extraction model according to the calculated first loss function value until the second extraction model converges.
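Under these assumptions, one training step of the second extraction model might be sketched as below; the binary cross-entropy loss stands in for the first loss function value, and only the second extraction model's parameters are updated. This is a minimal sketch, not the disclosed training procedure.

```python
import torch.nn.functional as F

def train_step_second_model(second_model, optimizer, fused_token_ids, mask,
                            aspect_head_labels, aspect_tail_labels):
    """One hypothetical training step for the second extraction model (S104)."""
    head_logits, tail_logits = second_model(fused_token_ids, mask)
    loss = (F.binary_cross_entropy_with_logits(head_logits, aspect_head_labels)
            + F.binary_cross_entropy_with_logits(tail_logits, aspect_tail_labels))  # first loss function value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()          # adjust parameters in the second extraction model only
    return loss.item()
```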
In addition, the second extraction model in the present embodiment can acquire, in addition to the entity word prediction result output by the first extraction model, an aspect word in the text predicted by an entity word extracted from the text in other ways, for example, an entity word extraction result obtained by using a preset dictionary in addition to the neural network model.
It can be understood that the aspect word prediction result obtained by executing S104 in this embodiment may be the positions of the heads and tails of the entity words and of their corresponding aspect words in the text, or may directly be two-tuples each formed by an entity word in the text and its corresponding aspect word.
In this embodiment, when executing S104 to fuse the text with the predicted result of the entity word of the text, the entity word in the text may be labeled according to the predicted result of the entity word, for example, a tag indicating the head or tail of the entity is inserted into the text, and the text after labeling is used as the fused result of the text.
For example, if the text is "Zhang San starred in the movie xx" and the entity word prediction result obtained by the first extraction model is "Zhang San" and "xx", then when fusing the text with the entity word prediction result of the text, the tag "<e>" representing the head of an entity word may be inserted before "Zhang San" and "xx", and the tag "</e>" representing the tail of an entity word may be inserted after them, so that the obtained fusion result is "<e>Zhang San</e> starred in the movie <e>xx</e>", which is then input into the second extraction model to obtain the aspect word prediction result.
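A minimal sketch of this tag-insertion style of fusion, assuming the first extraction model's prediction is given as character-level (head, tail) positions, might be:

```python
def fuse(text, entity_spans):
    """Insert <e>/</e> tags around predicted entity words; spans are (head, tail) character positions."""
    pieces, last = [], 0
    for head, tail in sorted(entity_spans):
        pieces.append(text[last:head])
        pieces.append("<e>" + text[head:tail + 1] + "</e>")
        last = tail + 1
    pieces.append(text[last:])
    return "".join(pieces)

# Example (positions are illustrative):
# fuse("Zhang San starred in the movie xx", [(0, 8), (31, 32)])
# -> "<e>Zhang San</e> starred in the movie <e>xx</e>"
```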
According to the method of this embodiment, by constructing a neural network model comprising a first extraction model and a second extraction model, the second extraction model in the neural network model obtains the aspect words corresponding to the entity words in the text according to the output of the first extraction model. Because the second extraction model takes the entity words obtained by the first extraction model as prior knowledge, the accuracy of the second extraction model in extracting aspect words is improved; and since the method does not limit the fields of the input text and the output aspect words, the text extraction model obtained through training has stronger generalization capability.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure. As shown in fig. 2, in this embodiment, executing S104 "training the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, and forming the text extraction model from the first extraction model and the trained second extraction model" may specifically include the following steps:
s201, training the first extraction model according to the entity word prediction results of the texts and the entity word labeling results of the texts until the first extraction model converges;
s202, forming a text extraction model by the first extraction model obtained through training and the second extraction model obtained through training.
That is, in this embodiment, the first extraction model is trained at the same time as the second extraction model, and the two trained extraction models form the text extraction model once both are determined to have converged.
In this embodiment, in executing S201, training the first extraction model according to the predicted results of the entity words of the plurality of texts and the labeling results of the entity words of the plurality of texts until the first extraction model converges, an optional implementation manner may be adopted as follows: and calculating a second loss function value according to the entity word prediction results of the texts and the entity word labeling results of the texts, and adjusting parameters in the first extraction model according to the calculated second loss function value until the first extraction model converges.
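A corresponding sketch of one training step of the first extraction model, with binary cross-entropy standing in for the second loss function value, might be (again purely illustrative):

```python
import torch.nn.functional as F

def train_step_first_model(first_model, optimizer, token_ids, mask,
                           entity_head_labels, entity_tail_labels):
    """One hypothetical training step for the first extraction model (S201)."""
    head_logits, tail_logits = first_model(token_ids, mask)
    loss = (F.binary_cross_entropy_with_logits(head_logits, entity_head_labels)
            + F.binary_cross_entropy_with_logits(tail_logits, entity_tail_labels))  # second loss function value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()          # adjust parameters in the first extraction model
    return loss.item()
```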
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure. As shown in fig. 3, the left side in fig. 3 is a first extraction model among the text extraction models, and the right side is a second extraction model.
The first extraction model outputs the heads and tails of the entity words "Zhang San" and "xx" in the text according to the input text "Zhang San starred in the movie xx"; the second extraction model, according to the fusion result "<e>Zhang San</e> starred in the movie <e>xx</e>" obtained from the heads and tails of the entity words output by the first extraction model and the input text, outputs the aspect word "movie" corresponding to the entity word "Zhang San" in the text and the aspect word "lead actor" corresponding to the entity word "xx".
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure. As shown in fig. 4, the text extraction method of the present embodiment may specifically include the following steps:
s401, acquiring a text to be processed;
s402, inputting the text to be processed into a text extraction model, and taking an output result of the text extraction model as an extraction result of the text to be processed.
According to the text extraction method of this embodiment, the extraction result is obtained using a text extraction model trained in advance; because the text extraction model comprises the first extraction model and the second extraction model, the extraction result is obtained by first extracting the entity words in the text and then extracting the aspect words according to those entity words, which further improves the accuracy of the obtained extraction result.
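Putting the pieces together, an end-to-end inference sketch (reusing the hypothetical decode_spans and fuse helpers above, with an assumed tokenizer helper) could look like this; the way entity words are paired with aspect words here is a simplification, not the disclosed pairing rule.

```python
import torch

def extract(text, first_model, second_model, tokenizer):
    """Hypothetical inference pipeline: entity words first, then aspect words."""
    ids, mask = tokenizer(text)                         # tokenizer is an assumed helper
    head_logits, tail_logits = first_model(ids, mask)
    entity_spans = decode_spans(text, torch.sigmoid(head_logits)[0], torch.sigmoid(tail_logits)[0])
    fused = fuse(text, [(h, t) for h, t, _ in entity_spans])
    fused_ids, fused_mask = tokenizer(fused)
    head_logits, tail_logits = second_model(fused_ids, fused_mask)
    aspect_spans = decode_spans(fused, torch.sigmoid(head_logits)[0], torch.sigmoid(tail_logits)[0])
    # Pairing is shown positionally here; the actual pairing policy may differ.
    return list(zip([e for _, _, e in entity_spans], [a for _, _, a in aspect_spans]))
```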
Although the first extraction model in the text extraction model used in this embodiment can extract entity words from texts belonging to different fields, its extraction accuracy is lower for some entity words that appear in the texts as proper nouns.
In order to ensure the extraction accuracy of the entity words in the text and to ensure that the text extraction model can extract the aspect words corresponding to the entity words the user focuses on, in this embodiment, when executing S402 to input the text to be processed into the text extraction model, an optional implementation manner is: taking the entity words of a preset dictionary that appear in the text to be processed as target entity words, wherein the preset dictionary in this embodiment is user-defined and comprises a plurality of words; and inputting the text to be processed and the target entity words into the text extraction model.
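A minimal sketch of that dictionary lookup, under the assumption that the preset dictionary is simply a user-defined collection of words, might be:

```python
def target_entity_words(text, preset_dictionary):
    """Pick out the user-defined dictionary words that occur in the text to be processed."""
    return [word for word in preset_dictionary if word in text]
```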
The extraction result obtained in S402 includes two-tuples, each composed of an entity word in the text to be processed and its corresponding aspect word.
Fig. 5 is a schematic diagram according to a fifth embodiment of the present disclosure. As shown in fig. 5, the training device 500 of the text extraction model of the present embodiment includes:
a first obtaining unit 501, configured to obtain training data, where the training data includes a plurality of texts and aspect word labeling results of the plurality of texts;
a building unit 502, configured to build a neural network model including a first extraction model and a second extraction model, where an output of the first extraction model is an input of the second extraction model;
the processing unit 503 is configured to input a plurality of texts into the first extraction model respectively, so as to obtain a predicted result of the entity word output by the first extraction model for each text;
training unit 504, configured to train the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts, until the second extraction model converges, and form the text extraction model from the first extraction model and the second extraction model obtained by training.
In the training data acquired by the first acquiring unit 501, the term labeling result of the text includes the entity word in the text and the term corresponding to the entity word. In addition, the training data acquired by the first acquiring unit 501 may further include a labeling result of the entity word of the text, where the labeling result of the entity word includes the entity word included in the text.
In this embodiment, after the first obtaining unit 501 obtains the entity word labeling results and the aspect word labeling results of the texts, the building unit 502 builds a neural network model including a first extraction model and a second extraction model, where the neural network model uses the output of the first extraction model as the input of the second extraction model.
The first extraction model constructed by the construction unit 502 is a neural network model capable of extracting entity words from text, and includes a first coding layer, a second coding layer and a first classification layer; the first coding layer may be a pre-training language model, and is used for coding each character (token) in the text to obtain a semantic representation vector of each character; the second coding layer may be LSTM (Long Short-Term Memory network) for coding the semantic representation vector of each character to obtain a sentence representation vector rich in context and character sequence for each character; the first classification layer is used for marking the head or tail of the entity word in the text according to the sentence representation vector of each character.
The second extraction model constructed by the construction unit 502 is a neural network model capable of extracting aspect words corresponding to entity words from text, and includes a third coding layer, a fourth coding layer and a second classification layer; the third coding layer may be a pre-training language model, and is configured to code each character (token) in the text to which the entity word information is added, so as to obtain a semantic representation vector of each character; the fourth coding layer may be LSTM, configured to code a semantic representation vector of each character to obtain a sentence representation vector of each character that is rich in context and character order; the second classification layer is used for marking the head or tail of the aspect words in the text according to the sentence representation vector of each character.
In this embodiment, after the building unit 502 builds the neural network model including the first extraction model and the second extraction model, the processing unit 503 inputs the plurality of texts into the first extraction model, so as to obtain the entity word prediction result output by the first extraction model for each text.
The predicted result of the entity word obtained by the processing unit 503 may be the position of the head and the tail of the entity word in the text, or may be the entity word in the text directly.
In this embodiment, after the processing unit 503 obtains the predicted result of the entity word output by the first extraction model for each text, the training unit 504 trains the second extraction model by using the plurality of texts, the predicted result of the entity word of the plurality of texts and the labeling result of the aspect word of the plurality of texts until the second extraction model converges, and the first extraction model and the second extraction model obtained by training form the text extraction model.
Specifically, when training the second extraction model using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts, the training unit 504 may adopt the following alternative implementation manners: fusing a plurality of texts with the entity word prediction results of the texts; and training the second extraction model by using the fusion result of the texts and the aspect word labeling result of the texts until the second extraction model converges.
The training unit 504 trains the second extraction model by using the fusion result of the plurality of texts and the aspect word labeling result of the plurality of texts until the second extraction model converges, where the optional implementation manner may be: respectively inputting the fusion results of the texts into a second extraction model to obtain aspect word prediction results output by the second extraction model for each text; and calculating a first loss function value according to the aspect word prediction results of the texts and the aspect word labeling results of the texts, and adjusting parameters in the second extraction model according to the calculated first loss function value until the second extraction model converges.
In addition, when training the second extraction model, the training unit 504 may use, besides the entity word prediction result output by the first extraction model, entity words extracted from the text by other means to predict the aspect words in the text.
It may be understood that the aspect word prediction result obtained by the training unit 504 may be the positions of the heads and tails of the entity words and of their corresponding aspect words in the text, or may directly be two-tuples each formed by an entity word in the text and its corresponding aspect word.
When the training unit 504 fuses the text and the predicted result of the entity word of the text, the entity word in the text may be labeled according to the predicted result of the entity word, and the labeled text is used as the fused result of the text.
In addition, when training the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts, until the second extraction model converges, and forming the text extraction model from the first extraction model and the second extraction model obtained by training, the training unit 504 may adopt the following alternative implementation manners: training the first extraction model according to the entity word prediction results of the texts and the entity word labeling results of the texts until the first extraction model converges; and forming a text extraction model by the first extraction model obtained through training and the second extraction model obtained through training.
That is, the training unit 504 trains both the second extraction model and the first extraction model, and the two trained extraction models form the text extraction model once both are determined to have converged; because the trained first extraction model ensures higher accuracy of the entity word prediction, the accuracy of the aspect words extracted by the text extraction model for the entity words in the text is further improved.
The training unit 504 trains the first extraction model according to the predicted results of the entity words of the texts and the labeling results of the entity words of the texts until the first extraction model converges, and may adopt the following alternative implementation modes: and calculating a second loss function value according to the entity word prediction results of the texts and the entity word labeling results of the texts, and adjusting parameters in the first extraction model according to the calculated second loss function value until the first extraction model converges.
Fig. 6 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in fig. 6, the text extraction apparatus 600 of the present embodiment includes:
a second obtaining unit 601, configured to obtain a text to be processed;
and the extraction unit 602 is configured to input the text to be processed into a text extraction model, and take an output result of the text extraction model as an extraction result of the text to be processed.
In order to ensure the extraction accuracy of the entity words in the text and ensure that the text extraction model can extract the aspect words corresponding to the entity words focused by the user, when the extraction unit 602 inputs the text to be processed into the text extraction model, the following alternative implementation manners may be adopted: taking entity words in a preset dictionary in the text to be processed as target entity words; and inputting the text to be processed and the target entity word into a text extraction model.
The extraction result obtained by the extraction unit 602 includes two-tuples, each composed of an entity word in the text to be processed and its corresponding aspect word.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 is a block diagram of an electronic device for the training of a text extraction model and the text extraction method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as the training of the text extraction model and text extraction. For example, in some embodiments, the training of the text extraction model and the text extraction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708.
In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training of the text extraction model and the text extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the training of the text extraction model and the text extraction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility present in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A training method of a text extraction model, comprising:
acquiring training data, wherein the training data comprises a plurality of texts and aspect word labeling results of the texts;
constructing a neural network model comprising a first extraction model and a second extraction model, wherein the output of the first extraction model is the input of the second extraction model;
respectively inputting a plurality of texts into the first extraction model to obtain a predicted result of the entity word output by the first extraction model for each text;
training the second extraction model by using a plurality of texts, entity word prediction results of the texts and aspect word labeling results of the texts until the second extraction model converges, and forming a text extraction model by the first extraction model and the second extraction model obtained by training;
training the second extraction model by using the plurality of texts, the entity word prediction results of the plurality of texts and the aspect word labeling results of the plurality of texts until the second extraction model converges, wherein the training comprises the following steps:
fusing a plurality of texts with the entity word prediction results of the texts;
and training the second extraction model by using the fusion result of the texts and the aspect word labeling result of the texts until the second extraction model converges.
2. The method of claim 1, wherein the training data further comprises entity word labeling results of a plurality of texts.
3. The method of claim 2, wherein the training the second extraction model using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, and the composing the first extraction model and the trained second extraction model into a text extraction model comprises:
training the first extraction model according to the entity word prediction results of the texts and the entity word labeling results of the texts until the first extraction model converges;
and forming a text extraction model by the first extraction model obtained through training and the second extraction model obtained through training.
4. A method of text extraction, comprising:
acquiring a text to be processed;
inputting the text to be processed into a text extraction model, and taking the output result of the text extraction model as the extraction result of the text to be processed;
wherein the text extraction model is pre-trained according to the method of any one of claims 1-3.
5. The method of claim 4, wherein said inputting the text to be processed into a text extraction model comprises:
taking the entity words in the preset dictionary in the text to be processed as target entity words;
and inputting the text to be processed and the target entity word into the text extraction model.
6. A training device for a text extraction model, comprising:
the first acquisition unit is used for acquiring training data, wherein the training data comprises a plurality of texts and aspect word labeling results of the texts;
a building unit, configured to build a neural network model including a first extraction model and a second extraction model, where an output of the first extraction model is an input of the second extraction model;
the processing unit is used for respectively inputting a plurality of texts into the first extraction model to obtain a predicted result of the entity word output by the first extraction model for each text;
the training unit is used for training the second extraction model by using a plurality of texts, the entity word prediction results of the texts and the aspect word labeling results of the texts until the second extraction model converges, and forming a text extraction model by the first extraction model and the second extraction model obtained by training;
the training unit trains the second extraction model by using a plurality of texts, the entity word prediction results of the texts and the aspect word labeling results of the texts until the second extraction model converges, and specifically performs:
fusing a plurality of texts with the entity word prediction results of the texts;
and training the second extraction model by using the fusion result of the texts and the aspect word labeling result of the texts until the second extraction model converges.
7. The apparatus of claim 6, wherein the training data acquired by the first acquiring unit further includes entity word labeling results of a plurality of texts.
8. The apparatus of claim 7, wherein the training unit is configured to, when training the second extraction model using the plurality of texts, the entity word prediction results of the plurality of texts, and the aspect word labeling results of the plurality of texts until the second extraction model converges, combine the first extraction model with the trained second extraction model to form a text extraction model, specifically perform:
training the first extraction model according to the entity word prediction results of the texts and the entity word labeling results of the texts until the first extraction model converges;
and forming a text extraction model by the first extraction model obtained through training and the second extraction model obtained through training.
9. An apparatus for text extraction, comprising:
the second acquisition unit is used for acquiring the text to be processed;
the extraction unit is used for inputting the text to be processed into a text extraction model, and taking the output result of the text extraction model as the extraction result of the text to be processed;
wherein the text extraction model is pre-trained according to the apparatus of any one of claims 6-8.
10. The apparatus according to claim 9, wherein the extraction unit, when inputting the text to be processed into a text extraction model, specifically performs:
taking the entity words in the text to be processed that appear in the preset dictionary as target entity words;
and inputting the text to be processed and the target entity words into the text extraction model.
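Claims 9 and 10 describe the extraction device at inference time: obtain the text to be processed, look up target entity words in the preset dictionary, and feed both into the pre-trained text extraction model. The sketch below wires those steps together around a hypothetical model callable; the callable's signature and the returned fields are assumptions for illustration.

```python
# Minimal sketch of the extraction device in claims 9-10 (model interface is assumed).
from typing import Callable, Dict, List, Set

def extract(text: str,
            preset_dictionary: Set[str],
            text_extraction_model: Callable[[str, List[str]], Dict]) -> Dict:
    """Run the extraction pipeline on one text to be processed."""
    target_entity_words = [w for w in preset_dictionary if w in text]   # claim 10
    return text_extraction_model(text, target_entity_words)             # claim 9

# Usage with a stand-in model that simply echoes its inputs:
def dummy_model(text: str, entities: List[str]) -> Dict:
    return {"text": text, "entity_words": entities, "aspect_words": []}

result = extract("The camera focuses slowly.", {"camera", "battery"}, dummy_model)
```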
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202110479305.5A 2021-04-30 2021-04-30 Training of text extraction model and text extraction method and device Active CN113204616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110479305.5A CN113204616B (en) 2021-04-30 2021-04-30 Training of text extraction model and text extraction method and device

Publications (2)

Publication Number Publication Date
CN113204616A CN113204616A (en) 2021-08-03
CN113204616B true CN113204616B (en) 2023-11-24

Family

ID=77028602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110479305.5A Active CN113204616B (en) 2021-04-30 2021-04-30 Training of text extraction model and text extraction method and device

Country Status (1)

Country Link
CN (1) CN113204616B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595686B (en) * 2022-03-11 2023-02-03 Beijing Baidu Netcom Science and Technology Co Ltd Knowledge extraction method, and training method and device of knowledge extraction model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633007B1 (en) * 2016-03-24 2017-04-25 Xerox Corporation Loose term-centric representation for term classification in aspect-based sentiment analysis
CN109597997A (en) * 2018-12-07 2019-04-09 上海宏原信息科技有限公司 Based on comment entity, aspect grade sensibility classification method and device and its model training
CN111339260A (en) * 2020-03-02 2020-06-26 北京理工大学 BERT and QA thought-based fine-grained emotion analysis method
CN111581981A (en) * 2020-05-06 2020-08-25 西安交通大学 Evaluation object strengthening and constraint label embedding based aspect category detection system and method
CN111666761A (en) * 2020-05-13 2020-09-15 北京大学 Fine-grained emotion analysis model training method and device
CN112487826A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Information extraction method, extraction model training method and device and electronic equipment
CN112579778A (en) * 2020-12-23 2021-03-30 重庆邮电大学 Aspect-level emotion classification method based on multi-level feature attention

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2635257C1 (en) * 2016-07-28 2017-11-09 Общество с ограниченной ответственностью "Аби Продакшн" Sentiment analysis at level of aspects and creation of reports using machine learning methods
US10755174B2 (en) * 2017-04-11 2020-08-25 Sap Se Unsupervised neural attention model for aspect extraction
US20200159863A1 (en) * 2018-11-20 2020-05-21 Sap Se Memory networks for fine-grain opinion mining

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Aspect Extraction with Automated Prior Knowledge Learning; Chen, Zhiyuan et al.; ACL; full text *
Entity and Aspect Extraction for Organizing News Comments; Prasojo R E et al.; the 24th ACM International; full text *
A deep hierarchical network model for aspect-based sentiment analysis; Liu, Quan et al.; Chinese Journal of Computers; full text *

Also Published As

Publication number Publication date
CN113204616A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN112487173B (en) Man-machine conversation method, device and storage medium
CN112926306B (en) Text error correction method, device, equipment and storage medium
CN113407610B (en) Information extraction method, information extraction device, electronic equipment and readable storage medium
CN112506359B (en) Method and device for providing candidate long sentences in input method and electronic equipment
CN113407698B (en) Method and device for training and recognizing intention of intention recognition model
CN115640520B (en) Pre-training method, device and storage medium of cross-language cross-modal model
CN112786108B (en) Training method, device, equipment and medium of molecular understanding model
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
CN113053367A (en) Speech recognition method, model training method and device for speech recognition
CN113850080A (en) Rhyme word recommendation method, device, equipment and storage medium
CN114490998A (en) Text information extraction method and device, electronic equipment and storage medium
CN113641829B (en) Training and knowledge graph completion method and device for graph neural network
CN112948584B (en) Short text classification method, device, equipment and storage medium
CN113204616B (en) Training of text extraction model and text extraction method and device
CN114490985A (en) Dialog generation method and device, electronic equipment and storage medium
CN113468857A (en) Method and device for training style conversion model, electronic equipment and storage medium
CN113408273A (en) Entity recognition model training and entity recognition method and device
CN112906368A (en) Industry text increment method, related device and computer program product
CN114758649B (en) Voice recognition method, device, equipment and medium
CN113051926B (en) Text extraction method, apparatus and storage medium
CN112560481B (en) Statement processing method, device and storage medium
CN113723120B (en) Display method and device of reference information and electronic equipment
CN113553863B (en) Text generation method, device, electronic equipment and storage medium
CN113408300B (en) Model training method, brand word recognition device and electronic equipment
CN112507712B (en) Method and device for establishing slot identification model and slot identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant