CN114881033A - Text abstract generation method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN114881033A
CN114881033A
Authority
CN
China
Prior art keywords
entity
data
text
fusion
abstract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210489164.XA
Other languages
Chinese (zh)
Inventor
陈焕坤
王伟
黄勇其
张黔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd filed Critical Runlian Software System Shenzhen Co Ltd
Priority to CN202210489164.XA priority Critical patent/CN114881033A/en
Publication of CN114881033A publication Critical patent/CN114881033A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application belongs to the technical field of natural language processing in artificial intelligence, and relates to a text abstract generation method and device fusing entity information, a computer device, and a storage medium. An entity embedding layer (hereinafter EntityEmbedding) and an entity type embedding layer (hereinafter TypeEmbedding) are added to the original network, and the entity information of the input text is mapped to two vectors of the same dimension, enhancing the amount of information the model receives; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's ability to represent entities and enables the decoder to extract important information accurately.

Description

Text abstract generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text summary generation method and apparatus, a computer device, and a storage medium, which integrate entity information.
Background
With the advent of the big data age, the amount of text data on the Internet has grown explosively, so that people must spend a great deal of time browsing and understanding the corresponding text, and inevitably miss certain important information. How to acquire important information quickly and efficiently from large amounts of text has therefore become increasingly important. Automatic summarization techniques are an effective way to alleviate this problem.
One existing text summary generation method uses a deep-learning encoder-decoder model. Specifically, the encoder is responsible for vector-encoding the original text and extracting semantic information; the decoder decodes the compressed information to generate the abstract of the original text.
However, the applicant has found that conventional text abstract generation methods rely on a word-based dictionary or pre-trained model, and the encoding process inevitably loses entity information, so that the decoded abstract does not sufficiently reflect the central content of the original text. Conventional text abstract generation methods therefore suffer from low accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a text abstract generating method, a text abstract generating device, computer equipment and a storage medium for fusing entity information, so as to solve the problem that the traditional text abstract generating method is low in accuracy.
In order to solve the above technical problem, an embodiment of the present application provides a text summary generating method for merging entity information, which adopts the following technical scheme:
acquiring original text data to be processed;
performing entity extraction operation on the original text data to obtain entity text data and entity type data;
performing a first fusion operation on the entity text data and the entity type data to obtain entity fusion data;
performing vector conversion operation on the original text data to obtain text vector data;
respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module;
and decoding the abstract coded data to obtain a target text abstract.
In order to solve the above technical problem, an embodiment of the present application further provides a text summary generating device fusing entity information, which adopts the following technical scheme:
the data acquisition module is used for acquiring original text data to be processed;
the entity extraction module is used for carrying out entity extraction operation on the original text data to obtain entity text data and entity type data;
the first fusion module is used for carrying out first fusion operation on the entity text data and the entity type data to obtain entity fusion data;
the vector conversion module is used for carrying out vector conversion operation on the original text data to obtain text vector data;
the abstract coding module is used for respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module;
and the abstract decoding module is used for decoding the abstract coded data to obtain a target text abstract.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
the text abstract generating method comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and the processor realizes the steps of the text abstract generating method for the converged entity information when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the text abstract generating method for merging entity information as described above.
The application provides a text abstract generation method fusing entity information, which comprises the following steps: acquiring original text data to be processed; performing an entity extraction operation on the original text data to obtain entity text data and entity type data; performing a first fusion operation on the entity text data and the entity type data to obtain entity fusion data; performing a vector conversion operation on the original text data to obtain text vector data; respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module; and decoding the abstract coding data to obtain the target text abstract. Compared with the prior art, an entity embedding layer (hereinafter EntityEmbedding) and an entity type embedding layer (hereinafter TypeEmbedding) are added to the original network, and the entity information of the input text is mapped to two vectors of the same dimension, enhancing the amount of information the model receives; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's ability to represent entities and enables the decoder to extract important information accurately.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application are briefly introduced below. It is obvious that the drawings in the following description illustrate only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
fig. 2 is a flowchart illustrating an implementation of a text abstract generating method for merging entity information according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an embodiment of an Encoder module according to an embodiment of the present application;
FIG. 4 is a flowchart of one embodiment of step S203 in FIG. 2;
FIG. 5 is a flowchart of one embodiment of step S205 of FIG. 2;
FIG. 6 is a flowchart of one embodiment of step S502 in FIG. 5;
fig. 7 is a schematic structural diagram of a text abstract generating device for merging entity information according to a second embodiment of the present application;
fig. 8 is a schematic structural diagram of an embodiment of the first fusion module 230 according to the second embodiment of the present disclosure;
FIG. 9 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the text abstract generating method for merging entity information provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the text abstract generating apparatus for merging entity information is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Example one
Continuing to refer to fig. 2, an implementation flowchart of a text abstract generation method for merging entity information provided in an embodiment of the present application is shown, and for convenience of description, only the relevant portions of the present application are shown.
The text abstract generating method of the fused entity information comprises the following steps:
step S201: and acquiring original text data to be processed.
Step S202: and performing entity extraction operation on the original text data to obtain entity text data and entity type data.
Step S203: and carrying out first fusion operation on the entity text data and the entity type data to obtain entity fusion data.
In the embodiment of the present application, referring to fig. 3, a schematic structural diagram of an Encoder module provided in the embodiment of the present application is shown. Suppose the original text to be summarized is D = [x_1, x_2, …, x_n], where x_i is the i-th word of the document and n is the length of the document. First, a text tool (such as HanLP) is used to extract the entities in the original text (entity extraction); each entity carries an entity type (person name, place name, etc.), forming an entity sequence E = [e_1, e_2, …, e_m] and an entity type sequence E′ = [e′_1, e′_2, …, e′_m], where e_i denotes the i-th entity of the sequence and e′_i its type. The EntityEmbedding layer and the TypeEmbedding layer map them to fixed vectors of the same dimension, generating the vector sequences T = [t_1, t_2, …, t_m] and K = [k_1, k_2, …, k_m] respectively. Considering the importance of each entity and the influence of its length, T and K need to be fused.
Step S204: and carrying out vector conversion operation on the original text data to obtain text vector data.
In the embodiment of the application, the original text data D is mapped to vectors of the same dimension through a WordEmbedding layer, giving the matrix H = [h_1, h_2, …, h_n].
Step S205: respectively inputting the text vector data and the entity fusion data into a language representation model for abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by overlapping 12 layers of transform Encoder modules, and a word-entity cross attention layer is arranged between a multi-head attention layer and a feedforward neural network layer of the Encoder modules.
Step S206: and decoding the abstract coded data to obtain a target text abstract.
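The data flow of steps S201 to S206 can be sketched as below. This is a minimal orchestration sketch, not the patented implementation: the five callables are stand-ins (assumptions) for the components described in the embodiment, namely a HanLP-style entity extractor, the EntityEmbedding/TypeEmbedding fusion, the WordEmbedding layer, the 12-layer Transformer encoder, and the decoder.

```python
def generate_summary(text, extract_entities, fuse, embed, encode, decode):
    """Sketch of the step sequence S201-S206; all callables are placeholders."""
    entities, types = extract_entities(text)   # S202: entity extraction
    g = fuse(entities, types)                  # S203: first fusion -> entity fusion data
    h = embed(text)                            # S204: vector conversion -> text vector data
    coded = encode(h, g)                       # S205: abstract coding with both inputs
    return decode(coded)                       # S206: decoding -> target text abstract
```

The key design point the steps encode is that the encoder in S205 receives two inputs, the text vectors and the entity fusion data, rather than the text alone.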
In an embodiment of the present application, a method for generating a text abstract fusing entity information is provided, including: acquiring original text data to be processed; performing an entity extraction operation on the original text data to obtain entity text data and entity type data; performing a first fusion operation on the entity text data and the entity type data to obtain entity fusion data; performing a vector conversion operation on the original text data to obtain text vector data; respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module; and decoding the abstract coding data to obtain the target text abstract. Compared with the prior art, an entity embedding layer (hereinafter EntityEmbedding) and an entity type embedding layer (hereinafter TypeEmbedding) are added to the original network, and the entity information of the input text is mapped to two vectors of the same dimension, enhancing the amount of information the model receives; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's ability to represent entities and enables the decoder to extract important information accurately.
Continuing to refer to fig. 4, a flowchart of one embodiment of step S203 of fig. 2 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, step S203 specifically includes:
step S401: performing fusion calculation on the entity text data and the entity type data according to an entity fusion algorithm to obtain entity fusion data, wherein the entity fusion algorithm is expressed as:
(the entity fusion formula is published as an image in the original document and is not reproduced here)

wherein g_i denotes the entity fusion data of the i-th entity; w_i denotes the i-th entity text data; tfidf(w_i) denotes the term frequency-inverse document frequency (TF-IDF) index of the i-th entity text data; max(tfidf(w_i)), i ∈ [1, m], denotes the maximum TF-IDF index over all entities in the original text data; |w_i| denotes the length of the i-th entity; and max(|w_i|), i ∈ [1, m], denotes the length of the longest entity in the original text data.
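Since the fusion formula itself is published only as an image, the sketch below is an assumption built solely from the quantities its where-clause names: the TF-IDF and length terms, each normalized by its maximum over all entities, are averaged into a scalar weight applied to the sum of the entity and type embeddings. The actual patented formula may combine these terms differently.

```python
import numpy as np

def entity_fusion(T, K, tfidf, lengths):
    """First fusion operation (step S203), hedged sketch.

    T, K     : (m, d) entity and entity-type embedding sequences.
    tfidf    : per-entity TF-IDF indices tfidf(w_i).
    lengths  : per-entity lengths |w_i|.
    """
    tfidf = np.asarray(tfidf, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # Normalize importance and length against their maxima over all m entities,
    # as the where-clause of the formula suggests; average them into one weight.
    w = 0.5 * (tfidf / tfidf.max() + lengths / lengths.max())
    return w[:, None] * (T + K)   # one fused vector g_i per entity
```

Under this assumption, an entity with both the highest TF-IDF and the greatest length gets weight 1, so its fused vector is exactly t_i + k_i.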
Continuing to refer to fig. 5, a flowchart of one embodiment of step S205 of fig. 2 is shown, and for convenience of illustration, only the relevant portions of the present application are shown.
In some optional implementation manners of this embodiment, step S205 specifically includes:
step S501: and performing self-attention layer calculation and splicing operation in the multi-head attention layer according to the text vector data to obtain a multi-head attention layer result.
Step S502: and performing second fusion operation in the word-entity cross attention layer according to the multi-head attention layer result and the entity fusion data to obtain a cross attention layer result.
Step S503: and inputting the cross attention layer result into a residual error neural network and a feedforward neural network to obtain abstract coding data.
In the embodiment of the application, the input matrix H first undergoes the self-attention-layer calculation; the results of the multiple self-attention heads are then spliced together, and finally passed through the residual and feedforward neural network layers.
In some alternative implementations of the present embodiment, the multi-head attention layer result Z^L is computed from H^L, the text vector data input to the L-th Transformer layer (the formulas are published as images in the original document and are not reproduced here).
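The formulas for Z^L are published as images, but the surrounding description (per-head self-attention followed by splicing) matches standard scaled dot-product multi-head attention, sketched below. The projection matrices Wq, Wk, Wv are assumed parameters, not names from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(H, Wq, Wk, Wv, n_heads):
    """Multi-head self-attention over the text vectors H: (n, d), sketched."""
    n, d = H.shape
    dh = d // n_heads
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    heads = []
    for i in range(n_heads):
        s = slice(i * dh, (i + 1) * dh)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))  # per-head attention weights
        heads.append(A @ V[:, s])
    return np.concatenate(heads, axis=-1)  # spliced multi-head result Z^L: (n, d)
```

Splicing the per-head outputs restores the model dimension d, so Z^L has the same shape as H and can flow into the residual connection.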
Continuing to refer to fig. 6, a flowchart of one embodiment of step S502 in fig. 5 is shown, and for ease of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, step S502 specifically includes:
step S601: performing a softmax operation on the multi-head attention layer result and the entity fusion data according to a softmax algorithm to obtain a first weight distribution result a_g and a second weight distribution result a_z, wherein the softmax algorithm is expressed as:

A = Z^L · G

a_g = softmax-line(A)

a_z = softmax-row(A)

wherein softmax-row denotes performing the softmax operation along the column direction of A, and softmax-line likewise along the row direction;
step S602: performing a weighted summation of the first weight distribution result a_g with the entity fusion data G, and of the second weight distribution result a_z with the multi-head attention layer result Z^L, to obtain a first weighted summation result and a second weighted summation result;
step S603: performing second fusion operation on the first weighted summation result and the second weighted summation result according to a second fusion algorithm to obtain a second fusion matrix;
step S604: and inputting the second fusion matrix into the full connection layer to carry out dimension compression operation, and obtaining a cross attention layer result.
In the embodiment of the present application, let the multi-head attention output of the L-th layer be Z^L. To enable the model to fuse the semantic information of the text sequence and the entity sequence, the entity fusion data G and Z^L are combined in the cross attention layer as follows:

A = Z^L · G

a_g = softmax-line(A)

a_z = softmax-row(A)

wherein softmax-row denotes performing the softmax operation along the column direction of A, and softmax-line likewise along the row direction. The softmax operation yields the weight distributions a_g and a_z, where a_g represents the attention weight distribution of each word over all entities, and a_z vice versa.

a_g is then combined with G, and a_z with Z^L, by weighted summation (the formulas are published as images in the original document and are not reproduced here).
In the embodiment of the application, in order to further fuse the distributed representations of the text and the entity sequence, two fusion methods are applied to these vectors: one subtracts the two element by element, and the other multiplies them element by element. The results are spliced to obtain a vector Z′; repeating this process generates the output matrix Z″ (the formulas are published as images in the original document and are not reproduced here).

In the embodiment of the application, because the output matrix and the weight matrix of the following network have different dimensions, the output matrix is sent to a fully connected layer for dimension compression, giving the final output of the WEA layer.
in the embodiment of the application, the output of the WEA layer is sent to the residual error and feedforward neural network layer for operation, and the output of the encoder is obtained. Finally, 12 encoders perform the above operation, and the inputs of the WEA layer of each encoder share the information of the same EntityEmbedding layer.
In some optional implementations of this embodiment, the output of the top layer of the model is sent to the decoder for decoding; the text abstract model is trained in an autoregressive manner, and its parameters are optimized with the Adam optimization method (the training target formula is published as an image in the original document and is not reproduced here).
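Since the training target is published only as an image, the following is presumably its form: for an autoregressive summarizer, the standard objective is the negative log-likelihood of the reference summary Y = (y_1, …, y_T) given the document D, minimized over parameters θ with Adam.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p\left(y_t \mid y_{<t},\, D;\ \theta\right)
```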
it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least part of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily in sequence; they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
Example two
With further reference to fig. 7, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a text abstract generating apparatus for merging entity information, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices in particular.
As shown in fig. 7, the text summary generation apparatus 200 of the converged entity information of the present embodiment includes: a data acquisition module 210, an entity extraction module 220, a first fusion module 230, a vector conversion module 240, a digest encoding module 250, and a digest decoding module 260. Wherein:
a data obtaining module 210, configured to obtain original text data to be processed;
an entity extraction module 220, configured to perform entity extraction operation on the original text data to obtain entity text data and entity type data;
the first fusion module 230 is configured to perform a first fusion operation on the entity text data and the entity type data to obtain entity fusion data;
the vector conversion module 240 is configured to perform vector conversion operation on the original text data to obtain text vector data;
the abstract coding module 250 is used for respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module;
and the abstract decoding module 260 is configured to perform decoding operation on the abstract coded data to obtain a target text abstract.
In the embodiment of the present application, referring to fig. 3, a schematic structural diagram of an Encoder module provided in the embodiment of the present application is shown. Suppose the original text to be summarized is D = [x_1, x_2, …, x_n], where x_i is the i-th word of the document and n is the length of the document. First, a text tool (such as HanLP) is used to extract the entities in the original text (entity extraction); each entity carries an entity type (person name, place name, etc.), forming an entity sequence E = [e_1, e_2, …, e_m] and an entity type sequence E′ = [e′_1, e′_2, …, e′_m], where e_i denotes the i-th entity of the sequence and e′_i its type. The EntityEmbedding layer and the TypeEmbedding layer map them to fixed vectors of the same dimension, generating the vector sequences T = [t_1, t_2, …, t_m] and K = [k_1, k_2, …, k_m] respectively. Considering the importance of each entity and the influence of its length, T and K need to be fused.
In the embodiment of the application, the original text data D is mapped to vectors of the same dimension through a WordEmbedding layer, giving the matrix H = [h_1, h_2, …, h_n].
In an embodiment of the present application, there is provided a text summary generating apparatus 200 fusing entity information, including: a data obtaining module 210, configured to obtain original text data to be processed; an entity extraction module 220, configured to perform an entity extraction operation on the original text data to obtain entity text data and entity type data; a first fusion module 230, configured to perform a first fusion operation on the entity text data and the entity type data to obtain entity fusion data; a vector conversion module 240, configured to perform a vector conversion operation on the original text data to obtain text vector data; an abstract coding module 250, configured to respectively input the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder modules, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module; and an abstract decoding module 260, configured to decode the abstract coding data to obtain the target text abstract.
Compared with the prior art, the present application adds an Entity mapping layer (EntityEmbedding) and an Entity type mapping layer (TypeEmbedding) to the original network, mapping the entity information of the input text to two vectors of the same dimension and thereby enriching the amount of information received by the model; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's representation of entities and enables the decoder to extract important information accurately.
Continuing to refer to fig. 8, a schematic structural diagram of a specific implementation of the first fusion module 230 provided in the second embodiment of the present application is shown, and for convenience of illustration, only the portions related to the present application are shown.
In some optional implementations of this embodiment, the first fusion module 230 includes: a first fusion submodule 231, wherein:
the first fusion submodule 231 is configured to perform fusion calculation on the entity text data and the entity type data according to an entity fusion algorithm to obtain entity fusion data, where the entity fusion algorithm is expressed as:
g_i = (tfidf(w_i) / max(tfidf(w_i), i ∈ [1, m])) · t_i + (|w_i| / max(|w_i|, i ∈ [1, m])) · k_i
wherein g_i represents the entity fusion data of the ith entity; w_i represents the ith entity text data; tfidf(w_i) represents the term frequency-inverse document frequency (TF-IDF) index of the ith entity text data; max(tfidf(w_i), i ∈ [1, m]) represents the maximum TF-IDF index over all entities in the original text data; |w_i| represents the length of the ith entity; and max(|w_i|, i ∈ [1, m]) represents the length of the longest entity in the original text data.
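The fusion above can be sketched as follows. The TF-IDF term models entity importance and the length term models entity-length influence, as in the "wherein" clause; combining the two weighted vectors by addition is an assumption, since the patent's fusion formula appears only as an image.

```python
import numpy as np

def fuse_entity_vectors(T, K, tfidf_scores, entities):
    """Fuse each entity vector t_i with its type vector k_i into g_i.

    The additive combination of the TF-IDF-weighted and length-weighted
    terms is an assumption, not the patent's exact formula.
    """
    max_tfidf = max(tfidf_scores)
    max_len = max(len(w) for w in entities)
    G = [
        (s / max_tfidf) * t_i + (len(w) / max_len) * k_i
        for t_i, k_i, s, w in zip(T, K, tfidf_scores, entities)
    ]
    return np.stack(G)
```

For instance, an entity with the maximum TF-IDF score and maximum length would receive its t_i and k_i vectors at full weight, while less important, shorter entities are attenuated.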
In some optional implementations of this embodiment, the digest encoding module 250 includes: the calculation splicing submodule, the second fusion submodule and the abstract coding submodule are connected, wherein:
the calculation splicing submodule is used for performing self-attention layer calculation and splicing operation in the multi-head attention layer according to the text vector data to obtain a multi-head attention layer result;
the second fusion submodule is used for carrying out second fusion operation in the word-entity cross attention layer according to the multi-head attention layer result and the entity fusion data to obtain a cross attention layer result;
and the abstract coding sub-module is used for inputting the cross attention layer result into the residual error neural network and the feedforward neural network to obtain abstract coding data.
In some alternative implementations of the present embodiment, the multi-head attention layer result Z_L is expressed as:
Z_L = Concat(head_1, head_2, …, head_h) · W^O
wherein
head_i = softmax(Q_i K_i^T / √d_k) · V_i, with Q_i = H_L W_i^Q, K_i = H_L W_i^K, V_i = H_L W_i^V,
and H_L represents the text vector data of the Lth Transformer layer.
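The self-attention calculation and splicing operation of the multi-head attention layer can be sketched with the standard scaled dot-product formulation; the weight names, shapes, and random values here are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(H, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention per head, followed by the
    splicing (concatenation) of all heads and an output projection."""
    n, d = H.shape
    dh = d // n_heads
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    heads = []
    for i in range(n_heads):
        q, k, v = (M[:, i * dh:(i + 1) * dh] for M in (Q, K, V))
        heads.append(softmax(q @ k.T / np.sqrt(dh)) @ v)
    return np.concatenate(heads, axis=1) @ Wo  # Z_L

rng = np.random.default_rng(0)
H_L = rng.normal(size=(5, 8))                  # text vector data of layer L
W = [rng.normal(size=(8, 8)) for _ in range(4)]
Z_L = multi_head_attention(H_L, *W, n_heads=2)
```

The concatenation of the per-head outputs is the "splicing operation" named in the text; its result Z_L then enters the word-entity cross attention layer.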
In some optional implementations of this embodiment, the second fusion submodule includes: the device comprises a softmax operation unit, a weighted summation unit, a second fusion unit and a dimension compression unit, wherein:
a softmax operation unit, configured to perform a softmax operation on the multi-head attention layer result and the entity fusion data according to a softmax algorithm to obtain a first weight distribution result a_g and a second weight distribution result a_z, wherein the softmax algorithm is expressed as:
A=Z L *G
a g =softmax-line(A)
a z =softmax-row(A)
wherein softmax-row represents performing the softmax operation along the column direction of A, and softmax-line likewise along the row direction;
a weighted summation unit, configured to perform weighted summation on the first weight distribution result a_g together with the entity fusion data, and on the second weight distribution result a_z together with the multi-head attention layer result Z_L, to obtain a first weighted summation result and a second weighted summation result;
the second fusion unit is used for performing second fusion operation on the first weighted summation result and the second weighted summation result according to a second fusion algorithm to obtain a second fusion matrix;
and the dimension compression unit is used for inputting the second fusion matrix to the full connection layer to carry out dimension compression operation so as to obtain a cross attention layer result.
In the embodiment of the present application, let the multi-head attention layer output of the Lth layer be Z_L. To enable the model to fuse the semantic information of the text sequence and the entity sequence, the entity fusion data G and Z_L are combined in the cross attention layer according to the following formulas:
A=Z L *G
a g =softmax-line(A)
a z =softmax-row(A)
wherein softmax-row represents performing the softmax operation along the column direction of A, and softmax-line likewise along the row direction. The weight distributions a_g and a_z are obtained after the softmax operations, where a_g represents the attention weight distribution of each word over all entities, and a_z, conversely, the attention weight distribution of each entity over all words.
a_g and G, and a_z and Z_L, are respectively weighted and summed as follows:
Z_g = a_g · G
Z_z = (a_z)^T · Z_L
In the embodiment of the application, in order to further fuse the distributed representations of the text and the entity sequence, two fusion methods are applied on the basis of the above vectors and their results are spliced to obtain a vector Z′; repeating this process generates the output matrix Z_L″. The formula is as follows:
Z′ = Concat(Z_g ⊖ Z_z, Z_g ⊙ Z_z)
wherein Z_g and Z_z denote the first and second weighted summation results, ⊖ represents the element-by-element subtraction of the two, and ⊙ represents the element-by-element multiplication of the two.
In the embodiment of the application, because the dimension of the output matrix differs from that of the weight matrices of the subsequent network layers, the output matrix is sent to a full connection layer for dimension compression, giving the final output of the word-entity cross attention (WEA) layer. The calculation formula is as follows:
Z_L″ = Z′ · W_c + b_c, where W_c and b_c are the parameters of the full connection layer.
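A minimal numerical sketch of the whole WEA computation follows, under several stated assumptions: the product of Z_L and G is read as the matrix product Z_L·Gᵀ, the entity-side weighted sum is mapped back to word positions through a_g so that the two fusion inputs share a shape, and the full-connection weights are random; none of these choices is confirmed by the patent text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 5, 3, 8                 # words, entities, hidden size (toy sizes)
Z_L = rng.normal(size=(n, d))     # multi-head attention layer result
G = rng.normal(size=(m, d))       # entity fusion data

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

A = Z_L @ G.T                     # word-entity affinity (assumes Z_L * G means Z_L @ G.T)
a_g = softmax(A, axis=1)          # softmax-line: each word's weights over all entities
a_z = softmax(A, axis=0)          # softmax-row: each entity's weights over all words

Z_g = a_g @ G                     # first weighted summation result: entity context per word
Z_z = a_g @ (a_z.T @ Z_L)         # second weighted summation result, mapped back to word
                                  # positions via a_g (shape-compatibility assumption)

# two fusion methods, results spliced: element-wise difference and product
Z_prime = np.concatenate([Z_g - Z_z, Z_g * Z_z], axis=1)   # n x 2d

W_c = rng.normal(size=(2 * d, d)) # full connection layer for dimension compression
Z_out = Z_prime @ W_c             # WEA layer output, back to n x d
```

Splicing the difference and product doubles the feature dimension, which is why the full connection layer is needed to compress the result back to the hidden size d before the residual and feedforward layers.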
in the embodiment of the application, the output of the WEA layer is sent to the residual error and feedforward neural network layer for operation, and the output of the encoder is obtained. Finally, 12 encoders perform the above operation, and the inputs of the WEA layer of each encoder share the information of the same EntityEmbedding layer.
In some optional implementations of this embodiment, the output of the top-layer model is sent to a decoder for decoding; the text abstract model is trained in an autoregressive manner, and the parameters of the text abstract model are optimized using the Adam optimization method. The training target formula is as follows:
L = −Σ_t log P(y_t | y_1, …, y_{t−1}, D)
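Assuming the standard autoregressive target (the patent's training-target formula appears only as an image), the objective minimises the negative log-likelihood of the reference abstract given the document:

```python
import numpy as np

def autoregressive_nll(token_probs):
    """Negative log-likelihood of the reference abstract under the model,
    L = -sum_t log P(y_t | y_1..y_{t-1}, D).  The standard form is an
    assumption; the patent's exact formula is not reproduced in the text."""
    return -sum(np.log(p) for p in token_probs)

# e.g. per-step probabilities the decoder assigns to the reference tokens
loss = autoregressive_nll([0.9, 0.8, 0.95])
```

During training, these per-step probabilities come from the decoder's softmax over the vocabulary, and the loss gradient is fed to the Adam optimizer.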
in order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 300 includes a memory 310, a processor 320, and a network interface 330 communicatively coupled to each other via a system bus. It is noted that only a computer device 300 having components 310-330 is shown, but it should be understood that not all of the shown components are required, and more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 310 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 310 may be an internal storage unit of the computer device 300, such as a hard disk or a memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the computer device 300. Of course, the memory 310 may also include both the internal storage unit and the external storage device of the computer device 300. In this embodiment, the memory 310 is generally used for storing the operating system installed in the computer device 300 and various types of application software, such as computer-readable instructions of a text summary generation method for fusing entity information. In addition, the memory 310 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 320 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 320 is generally operative to control overall operation of the computer device 300. In this embodiment, the processor 320 is configured to execute computer readable instructions stored in the memory 310 or computer readable instructions for processing data, such as executing the text summary generation method of the converged entity information.
The network interface 330 may include a wireless network interface or a wired network interface, and the network interface 330 is generally used to establish a communication connection between the computer device 300 and other electronic devices.
According to the computer device, an Entity mapping layer (EntityEmbedding) and an Entity type mapping layer (TypeEmbedding) are added on the basis of the original network, and the entity information of the input text is mapped to two vectors of the same dimension, enhancing the amount of information received by the model; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's representation of entities and enables the decoder to extract important information accurately.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the text summary generation method for fused entity information as described above.
According to the computer-readable storage medium provided by the application, an Entity mapping layer (EntityEmbedding) and an Entity type mapping layer (TypeEmbedding) are added on the basis of the original network, and the entity information of the input text is mapped to two vectors of the same dimension, enhancing the amount of information received by the model; meanwhile, a word-entity cross attention layer is added between the multi-head attention layer and the feedforward neural network layer of each sub-module, which strengthens the model's representation of entities and enables the decoder to extract important information accurately.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. A text abstract generating method for fusing entity information is characterized by comprising the following steps:
acquiring original text data to be processed;
performing entity extraction operation on the original text data to obtain entity text data and entity type data;
performing a first fusion operation on the entity text data and the entity type data to obtain entity fusion data;
performing vector conversion operation on the original text data to obtain text vector data;
respectively inputting the text vector data and the entity fusion data into a language representation model for an abstract coding operation to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder layers, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module;
and decoding the abstract coded data to obtain a target text abstract.
2. The method for generating the text abstract of the converged entity information according to claim 1, wherein the step of performing the first fusion operation on the entity text data and the entity type data to obtain the entity converged data specifically comprises the following steps:
performing fusion calculation on the entity text data and the entity type data according to an entity fusion algorithm to obtain the entity fusion data, wherein the entity fusion algorithm is represented as:
g_i = (tfidf(w_i) / max(tfidf(w_i), i ∈ [1, m])) · t_i + (|w_i| / max(|w_i|, i ∈ [1, m])) · k_i
wherein g_i represents the entity fusion data of the ith entity; w_i represents the ith entity text data; tfidf(w_i) represents the term frequency-inverse document frequency (TF-IDF) index of the ith entity text data; max(tfidf(w_i), i ∈ [1, m]) represents the maximum TF-IDF index over all entities in the original text data; |w_i| represents the length of the ith entity; and max(|w_i|, i ∈ [1, m]) represents the length of the longest entity in the original text data.
3. The method for generating the text abstract of the fused entity information according to claim 1, wherein the step of inputting the text vector data and the entity fused data into a trained language representation model for abstract coding to obtain abstract coded data comprises the following steps:
performing self-attention layer calculation and splicing operation in the multi-head attention layer according to the text vector data to obtain a multi-head attention layer result;
performing a second fusion operation in the word-entity cross attention layer according to the multi-head attention layer result and the entity fusion data to obtain a cross attention layer result;
and inputting the cross attention layer result into a residual error neural network and a feed-forward neural network to obtain the summary coding data.
4. The method for generating the text abstract of the fused entity information according to claim 3, wherein the multi-head attention layer result Z_L is expressed as:
Z_L = Concat(head_1, head_2, …, head_h) · W^O
wherein
head_i = softmax(Q_i K_i^T / √d_k) · V_i, with Q_i = H_L W_i^Q, K_i = H_L W_i^K, V_i = H_L W_i^V,
and H_L represents the text vector data of the Lth Transformer layer.
5. The method for generating a text summary of fused entity information according to claim 3, wherein the step of performing a second fusion operation in the word-entity cross attention layer according to the multi-head attention layer result and the entity fusion data to obtain a cross attention layer result specifically includes the following steps:
performing a softmax operation on the multi-head attention layer result and the entity fusion data according to a softmax algorithm to obtain a first weight distribution result a_g and a second weight distribution result a_z, wherein the softmax algorithm is expressed as:
A=Z L *G
a g =softmax-line(A)
a z =softmax-row(A)
wherein softmax-row represents performing the softmax operation along the column direction of A, and softmax-line likewise along the row direction;
performing weighted summation on the first weight distribution result a_g together with the entity fusion data, and on the second weight distribution result a_z together with the multi-head attention layer result Z_L, to obtain a first weighted summation result and a second weighted summation result;
performing second fusion operation on the first weighted sum result and the second weighted sum result according to a second fusion algorithm to obtain a second fusion matrix;
and inputting the second fusion matrix into a full connection layer to perform a dimension compression operation, so as to obtain the cross attention layer result.
6. The method for generating a text excerpt of converged entity information according to claim 1, wherein after the step of performing a decoding operation on the excerpt encoded data to obtain the target text excerpt, the method further comprises the steps of:
training the language characterization model according to an autoregressive mode, and optimizing parameters of the language characterization model according to an Adam optimization method, wherein a training target of the autoregressive mode is represented as:
L = −Σ_t log P(y_t | y_1, …, y_{t−1}, D)
7. a text abstract generating device fusing entity information is characterized by comprising:
the data acquisition module is used for acquiring original text data to be processed;
the entity extraction module is used for carrying out entity extraction operation on the original text data to obtain entity text data and entity type data;
the first fusion module is used for carrying out first fusion operation on the entity text data and the entity type data to obtain entity fusion data;
the vector conversion module is used for carrying out vector conversion operation on the original text data to obtain text vector data;
the abstract coding module is used for respectively inputting the text vector data and the entity fusion data into a language representation model to carry out an abstract coding operation so as to obtain abstract coding data, wherein the language representation model is formed by stacking 12 Transformer Encoder layers, and a word-entity cross attention layer is arranged between the multi-head attention layer and the feedforward neural network layer of each Encoder module;
and the abstract decoding module is used for decoding the abstract coded data to obtain a target text abstract.
8. The apparatus for generating a text summary of converged entity information according to claim 7, wherein the first convergence module comprises:
the first fusion submodule is used for performing fusion calculation on the entity text data and the entity type data according to an entity fusion algorithm to obtain the entity fusion data, wherein the entity fusion algorithm is expressed as:
g_i = (tfidf(w_i) / max(tfidf(w_i), i ∈ [1, m])) · t_i + (|w_i| / max(|w_i|, i ∈ [1, m])) · k_i
wherein g_i represents the entity fusion data of the ith entity; w_i represents the ith entity text data; tfidf(w_i) represents the term frequency-inverse document frequency (TF-IDF) index of the ith entity text data; max(tfidf(w_i), i ∈ [1, m]) represents the maximum TF-IDF index over all entities in the original text data; |w_i| represents the length of the ith entity; and max(|w_i|, i ∈ [1, m]) represents the length of the longest entity in the original text data.
9. A computer device comprising a memory having computer readable instructions stored therein and a processor that when executed performs the steps of the method for generating a text excerpt of converged entity information according to any one of claims 1 to 6.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the text abstract generating method for converged entity information according to any one of claims 1 to 6.
CN202210489164.XA 2022-05-06 2022-05-06 Text abstract generation method and device, computer equipment and storage medium Pending CN114881033A (en)


Publications (1)

Publication Number Publication Date
CN114881033A true CN114881033A (en) 2022-08-09


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407051A (en) * 2023-12-12 2024-01-16 Wuhan University Code automatic abstracting method based on structure position sensing
CN117407051B (en) * 2023-12-12 2024-03-08 Wuhan University Code automatic abstracting method based on structure position sensing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination