CN111191023A - Automatic generation method, device and system for topic labels - Google Patents



Publication number
CN111191023A
CN111191023A (application CN201911395888.2A)
Authority
CN
China
Prior art keywords
topic
content
cls
source text
transformer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911395888.2A
Other languages
Chinese (zh)
Other versions
CN111191023B (en)
Inventor
李建欣
毛乾任
李熙
黄洪仁
钟盛海
朱洪东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201911395888.2A priority Critical patent/CN111191023B/en
Publication of CN111191023A publication Critical patent/CN111191023A/en
Application granted granted Critical
Publication of CN111191023B publication Critical patent/CN111191023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

A method, a device and a system for automatically generating topic labels comprise the following steps: step one, constructing a training data set and preprocessing the data; step two, implementing a Transformer encoder with a content selection mechanism based on content segments; step three, building a topic abstract generator model with a Transformer decoder; step four, training on the data, tuning according to cross validation, and realizing model encapsulation and the device interface. The invention realizes automatic generation of topic labels through text abstract generation technology and provides a new scenario for topic label generation. It proposes Transformer coding with a content selection mechanism that extracts important source text segments and inputs them into a decoder for text generation; this design captures the effective core semantic segments and reduces the cost of model training.

Description

Automatic generation method, device and system for topic labels
Technical Field
The invention relates to the technical field of computers, in particular to a method, a device and a system for automatically generating topic labels.
Background
The rapid development of the internet is accompanied by the generation of a large amount of text data every day: every internet user, as a content producer, can publish and modify content, so online content grows at an unimaginable rate. This mass of content brings great convenience to people seeking information, since almost any content can be found on the internet; at the same time, it contains a large amount of invalid and junk content, so the internet information era has fundamentally changed the way new information is acquired and brought new challenges. Balancing the relationship between content generation and content acquisition has become a research focus in Natural Language Processing (NLP). How to efficiently acquire useful information from large volumes of disordered, unstructured text is an urgent problem. Among many information processing methods, text generation offers users a way to quickly understand the content of the original text, helps people quickly grasp the information it contains, and is key to balancing the relationship between content generation and content acquisition.
Topic generation is a high-level generalization of the upper semantic content of a text. A topic is the central semantics obtained by condensing a text. To generate text with such high-order semantics, the word and sentence levels of the text must first be effectively represented, and the representation vectors of words and sentences must then be abstracted into higher-order feature words to form the topic.
In text analysis of microblog social data, a large amount of text may contain no topic tags. To solve this problem, topics need to be generated automatically during topic summarization. At present, thorough research on automatic topic generation and induction is lacking; domestic research on social media analysis focuses mainly on topic keyword extraction and topic content clustering. Microblogs report the latest topics, from daily life to world events, reflecting real-time happenings in our lives. Given the huge volume of microblog content and its inherent redundancy and noise, generating topics by clustering information is unsuitable for practical deployment and is limited by clustering performance. Most current topic-label generation methods organize semantic material with rule-based templates, and the whole pipeline suffers from error accumulation and error propagation. Such topic summarization methods have low availability, narrow adaptability, low accuracy and high cost.
Disclosure of Invention
In order to solve the technical problem, the invention provides an end-to-end topic label automatic generation method based on a Transformer model.
A topic label automatic generation method comprises the following steps:
the method comprises the following steps: constructing a training data set and preprocessing data;
step two: implementing a Transformer feature encoder with a content selection mechanism based on content segments;
step three: a topic abstract generator model of a Transformer decoder;
step four: training data and adjusting and optimizing according to cross validation, and realizing model encapsulation and interface realization of the device.
Further, in the first step, the method for constructing the training data set and preprocessing the data includes:
dividing the microblog topic and the microblog content, and generating a topic label by using a Source text;
screening sentences with topic semantics, and generating topics by using the screened sentences;
dividing the microblog content into segments, segmenting the source text content, and presenting the source text content in a segment form;
semantically coding the source text in the form of the fragments, and adding [ cls ] and [ eos ] labels to each fragment;
combining each fragment with the starting tag and the ending tag of the fragment, and adding a [ senten ] tag at the beginning of a sentence for learning the semantics of the sentence to obtain Source data;
and constructing a training data set, processing data taking the topic as Target, and simultaneously filling the data into a model for training to obtain an initial training corpus.
Further, in the second step, the method for implementing the Transformer encoder with the content selection mechanism based on content segments is as follows:
the method comprises the following steps that a Transformer based on a content selection mechanism encodes microblog content, obtains microblog content vector representation, and obtains a source text sentence feature encoding vector:
source_embedding = Transformer(weibo content)
extracting the embeddings corresponding to the sentence tag [senten] and the segment tags [cls_i]:

T_senten = GetSenten(source_embedding)
T_cls = GetCls(source_embedding)

where T_senten denotes the feature vector of the source text sentence output by the Transformer encoder, and T_cls denotes the set of feature vectors of each content segment output by the Transformer encoder;
performing Transformer feature coding with the content selection mechanism: the importance of each segment representation [cls_i] with respect to the sentence representation [senten] is computed, mainly by a bilinear-function attention mechanism over T_senten and T_cls:

R_i = T_senten · W · T_cls^i

α_i = Softmax(R_i)

where R_i denotes the feature-weight score that integrates T_senten and T_cls^i, the semantic correlation of the two tags being learned through the weight matrix W; normalization by the Softmax function yields α_i, i.e. the importance weight of each T_cls^i relative to T_senten;
extracting the 3 [cls_i] tags with the highest importance weights and taking the corresponding content segment texts as the input of the generator:

T^[3] = Top3(α_1, ..., α_n)

where T^[3] denotes all token vectors of the three selected important [cls_i] segments.
Further, in the third step, the topic abstract generator model of the Transformer decoder is implemented as follows:

encoding the topic text using a Transformer:

target_embedding = Transformer(weibo topic);

inputting the feature codes obtained in step two together with the topic codes into the Transformer abstract generator to generate the abstract:

y = Decoder(target_embedding, T^[1-l]).
Further, in the fourth step, the model is trained on the data and tuned according to cross validation, and model encapsulation and the device interface are implemented: a loss function is designed to train the model, and after parameter tuning the trained model is used for the device interface packaging.
The invention also provides a technical scheme, which comprises the following steps:
an automatic generation device of a topic label comprises an information input module, a source text preprocessing module and a topic label generating module, wherein the information input module is used for preprocessing the content of a source text and inputting the source text; the topic label automatic generation module is used for carrying out abstract generation on an input source text by applying the topic label automatic generation method based on the content segment content selection mechanism; and the information output module outputs the automatically generated abstract through an interface program.
The invention also provides a technical scheme, which comprises the following steps:
when the server executes the summary generation process, the server obtains a source text from a data input module through the automatic topic label generation device of the content selection mechanism based on the content fragments, and executes the method to obtain a final topic summary output with the source text.
Aiming at the application scenario of extending generative text summarization to the generation of upper-layer semantic topic labels, the invention provides an automatic topic label generation method: it designs a Transformer feature coding model with a content selection mechanism and realizes topic text generation by training a generative Transformer abstract generation model. Traditional topic labels are all set by manual editing; the invention realizes automatic generation of topic labels through text abstract generation technology and provides a new scenario for topic label generation.
The invention proposes Transformer coding with a content selection mechanism, which extracts important source text segments and inputs them into a decoder for text generation; this not only captures the effective core semantic segments but also reduces the cost of model training.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a topic tag generation model based on a content selection mechanism according to the present invention.
Detailed Description
So that the manner in which the features and aspects of the embodiments of the present invention can be understood in detail, a more particular description of the embodiments of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings.
To clearly illustrate the design concept of the present invention, the present invention will be described with reference to the following examples.
Example one
As shown in fig. 1, an automatic generation method of topic labels includes:
the method comprises the following steps: constructing a training data set and preprocessing data, wherein the method comprises the following steps:
taking the microblog topic as an example, the main form is as follows: "[ China International Explorer # countdown for 8 days! The second international introduction exposition of china will be held in Shanghai in 11 months, 5 days-10 days. Eating, playing, using, global good will be converged here! At that time, watching news at the heart will turn on multi-live broadcasting, take you to step into this "buying and buying at Splendid"! Forward tells more people the Bar! ". The portion included in the "#" is topic content. Then the microblog topic is "# international import exposition #".
Firstly, the microblog topic and the microblog content are divided: the part enclosed by "#" is extracted as the topic content (Target), and the remaining part is cleaned and used as the microblog content (Source). The invention generates the topic label from the Source text.
Secondly, it can be seen from this embodiment that the semantics of a microblog topic can be covered by some of the sentences in the source text. The microblog content is therefore divided into segments: the source text is split on punctuation such as periods or commas and presented in segment form.
Thirdly, the source text is semantically coded: for the segments obtained after division, [cls] and [eos] tags are added before and after each segment respectively. The [cls] tag, placed at the start of the segment, learns coding information during sentence modeling and can represent the semantics of the whole segment, while [eos] learns the semantics of the segment ending.
Fourthly, each segment is combined with its start and end tags, and a [senten] tag is added at the beginning of the sentence to learn the semantics of the entire sentence. This yields Source data with the structure {[senten], [cls_1], x_1, [cls_2], x_2, ..., [eos]}, where x denotes the word vector representations within a segment.
Fifthly, the training data are constructed. The aim of the invention is to generate topic labels, and the modeling process requires a large number of real samples. After collecting a large number of real topic labels and their corresponding source texts from the microblog platform, the training data set is constructed: the data with the topic as Target are processed and fed into the model for training, where the source text is the encoded content and the topic is the decoded content. The topic Target data have the structure {[senten], y, [eos]}. This yields the initial training corpus.
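The preprocessing of step one (splitting the topic from the content, segmenting the content on punctuation, and wrapping each segment in [cls]/[eos] tags under a leading [senten] tag) can be sketched as follows. This is a minimal illustration, not code from the patent; the helper name, tag spellings and punctuation set are assumptions:

```python
import re

def preprocess_weibo(post: str):
    """Split a microblog post into topic (Target) and content (Source),
    segment the content on punctuation, and tag the pieces as described
    in step one. Helper and tag names are illustrative assumptions."""
    match = re.search(r"#([^#]+)#", post)          # topic is enclosed in '#...#'
    topic = match.group(1).strip() if match else ""
    content = re.sub(r"#[^#]+#", "", post).strip()  # remaining text is Source
    # split the source content into segments on sentence-level punctuation
    segments = [s.strip() for s in re.split(r"[。，,.!！?？]", content) if s.strip()]
    source = ["[senten]"]                           # sentence-level tag
    for seg in segments:
        source += ["[cls]", seg, "[eos]"]           # segment start/end tags
    target = ["[senten]", topic, "[eos]"]           # topic Target structure
    return source, target

src, tgt = preprocess_weibo("#International Import Expo# 8-day countdown! Held in Shanghai.")
```

Running this on a post of the example form yields Source data of the shape {[senten], [cls], segment, [eos], ...} and Target data {[senten], topic, [eos]}, matching the structures described above.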
Step two: a Transformer characteristic encoder based on a content segment selection mechanism is realized, and the method comprises the following steps:
after semantic coding of the source text and the topic, a generation model needs to be established for model modeling of topic label data generation. The invention provides a Transformer Encoder characteristic coding based on a content fragment content selection mechanism by using sequence-to-sequence characteristic coding capability of a Transformer model for reference, selects three content fragments with the highest importance scores as compressed representation of the whole microblog content, and inputs vectors of the compressed representation into a decoder to assist a Transformer decoder at a decoder end of the sequence model to realize topic generation. The method comprises the following specific steps:
firstly, as shown in fig. 2, a Transformer based on the content selection mechanism provided by the present invention encodes microblog content to obtain a microblog content vector representation. And obtaining a feature coding vector of the source text sentence.
sourceembedding=Transformer(weibo content)
Secondly, the sentence tag [senten] and the segment tags [cls_i] are extracted; they respectively carry the overall features of the sentence and the features of each content segment. This step extracts the embeddings corresponding to the [senten] tag and the [cls_i] tags:

T_senten = GetSenten(source_embedding)
T_cls = GetCls(source_embedding)

where T_senten denotes the feature vector of the source text sentence output by the Transformer encoder, and T_cls denotes the set of feature vectors of each content segment output by the Transformer encoder.
Thirdly, Transformer feature coding with the content selection mechanism is applied: the importance of each segment representation [cls_i] with respect to the sentence representation [senten] is computed, mainly by a bilinear-function attention mechanism over T_senten and T_cls:

R_i = T_senten · W · T_cls^i

α_i = Softmax(R_i)

where R_i denotes the feature-weight score that integrates T_senten and T_cls^i, with the semantic correlation of the two tags learned through the weight matrix W. Normalization by the Softmax function yields α_i, i.e. the importance weight of each T_cls^i relative to T_senten.
Fourthly, the 3 [cls_i] tags with the highest importance weights are extracted, and the corresponding content segment texts are used as the input of the generator:

T^[3] = Top3(α_1, ..., α_n)

where T^[3] denotes all token vectors of the three selected important [cls_i] segments.
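A minimal numerical sketch of the bilinear-attention content selection above, with NumPy standing in for the learned Transformer outputs: the vectors T_senten and T_cls and the weight matrix W are random placeholders (an untrained W is used here), and Top3 is realized by sorting the softmax weights.

```python
import numpy as np

def select_top_segments(t_senten, t_cls, W, k=3):
    """Score each segment vector T_cls[i] against the sentence vector via
    R_i = T_senten . W . T_cls_i, softmax-normalise the scores into
    importance weights alpha_i, and return the indices of the top-k
    segments. Shapes and W are illustrative placeholders."""
    scores = t_cls @ W.T @ t_senten       # R_i: one bilinear score per segment
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # Softmax importance weights
    top = np.argsort(alpha)[::-1][:k]     # indices of the k most important segments
    return top, alpha

rng = np.random.default_rng(0)
t_senten = rng.normal(size=16)            # sentence-level [senten] vector
t_cls = rng.normal(size=(5, 16))          # five candidate [cls_i] vectors
W = np.eye(16)                            # bilinear weight matrix (untrained stand-in)
top, alpha = select_top_segments(t_senten, t_cls, W, k=3)
```

In the described method, the token vectors of the three selected segments would then form T^[3] and be passed to the decoder as the compressed source representation.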
Step three: the topic abstract generator model based on the Transformer is realized, and the method comprises the following steps:
the topic abstract generator is based on a Transformer Decoder structure, and has two input parts: source data and Target data. In the generation model, Source data is a microblog content feature code generated by a Transformer based on the content selection mechanism of the topic segment in the step two, and Target is obtained by performing Transformer coding on the topic text.
Firstly, the topic text is encoded using a Transformer:

target_embedding = Transformer(weibo topic).
Secondly, the feature codes from step two and the topic codes are input into the Transformer abstract generator to generate the abstract:

y = Decoder(target_embedding, T^[1-l]).
Step four: training data and adjusting and optimizing according to cross validation, and realizing model encapsulation and interface realization of the device.
A loss function is designed to train the model; after parameter tuning, the trained model is packaged behind the device interface.
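The patent does not spell out its loss formula; a standard choice for training such a generator is token-level cross-entropy over the decoder's vocabulary distribution, sketched below under that assumption. Shapes and names are illustrative:

```python
import numpy as np

def cross_entropy_loss(logits, targets):
    """Average negative log-likelihood of the gold topic tokens under the
    decoder's output distribution. logits: (seq_len, vocab_size) array of
    unnormalised scores; targets: gold token ids of the topic label."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 10))      # 4 decoding steps over a vocabulary of 10
targets = np.array([1, 3, 5, 7])       # gold topic token ids
loss = cross_entropy_loss(logits, targets)
```

During cross-validation, such a loss would be monitored on held-out folds to tune hyperparameters before the trained model is encapsulated behind the device interface.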
The invention first observes that topic texts are often composed of several phrases or word combinations, and that their grammatical structure differs from the semantic continuity of news headlines or news topic sentences. Different semantic units, such as Chinese characters and words, are fed into the model as representation-learning objects to examine their respective generation effects. Since topic content usually comes from fragments of the original text, i.e. some of its clauses, secondary screening is performed at the clause level, and the screened clause content is input into the encoder as the context semantics of the generated text. The invention proposes a Transformer structure with a content selection mechanism: the bottom layer models the feature representation vectors of semantic units, the upper layer models clause-level feature representation vectors, the important semantic segments are obtained by computing clause-level feature weights through Attention weight calculation, and these segments are then input into the decoder as context feature representation vectors. Selecting and integrating the important content saves model training cost.
Aiming at the application scenario of extending generative text summarization to the generation of upper-layer semantic topic labels, the invention provides an automatic topic label generation method: it designs a Transformer feature coding model with a content selection mechanism and realizes topic text generation by training a generative Transformer abstract generation model. Traditional topic labels are all set by manual editing; the invention realizes automatic generation of topic labels through text abstract generation technology and provides a new scenario for topic label generation.
The invention proposes Transformer coding with a content selection mechanism, which extracts important source text segments and inputs them into a decoder for text generation; this not only captures the effective core semantic segments but also reduces the cost of model training.
Example two
An automatic topic label generation device comprises an information input module, a topic label automatic generation module and an information output module, wherein the information input module is used for preprocessing the source text content and inputting the source text; the topic label automatic generation module generates an abstract from the input source text by applying the above topic label automatic generation method based on the content-segment content selection mechanism; and the information output module outputs the automatically generated abstract through an interface program.
EXAMPLE III
When the server executes the abstract generation process, it obtains a source text from the data input module through the automatic topic label generation device based on the content-segment content selection mechanism, and executes the above method to obtain the final topic abstract output for the source text.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (7)

1. A method for automatically generating a topic tag, the method comprising:
the method comprises the following steps: constructing a training data set and preprocessing data;
step two: implementing a Transformer feature encoder with a content selection mechanism based on content segments;
step three: a topic abstract generator model of a Transformer decoder;
step four: training data and adjusting and optimizing according to cross validation, and realizing model encapsulation and interface realization of the device.
2. The method for automatically generating topic labels according to claim 1, wherein in the first step, the method for constructing the training data set and preprocessing the data comprises:
dividing the microblog topic and the microblog content, and generating a topic label by using a Source text;
screening sentences with topic semantics, and generating topics by using the screened sentences;
dividing the microblog content into segments, segmenting the source text content, and presenting the source text content in a segment form;
semantically coding the source text in the form of the fragments, and adding [ cls ] and [ eos ] labels to each fragment;
combining each fragment with the starting tag and the ending tag of the fragment, and adding a [ senten ] tag at the beginning of a sentence for learning the semantics of the sentence to obtain Source data;
and constructing a training data set, processing data taking the topic as Target, and simultaneously filling the data into a model for training to obtain an initial training corpus.
3. The method for automatically generating a topic tag according to claim 1, wherein in the second step, the method for implementing the Transformer encoder with the content selection mechanism based on content segments comprises:
the method comprises the following steps that a Transformer based on a content selection mechanism encodes microblog content, obtains microblog content vector representation, and obtains a source text sentence feature encoding vector:
source_embedding = Transformer(weibo content)
extracting the embeddings corresponding to the sentence tag [senten] and the segment tags [cls_i]:

T_senten = GetSenten(source_embedding)
T_cls = GetCls(source_embedding)

where T_senten denotes the feature vector of the source text sentence output by the Transformer encoder, and T_cls denotes the set of feature vectors of each content segment output by the Transformer encoder;
performing Transformer feature coding with the content selection mechanism: the importance of each segment representation [cls_i] with respect to the sentence representation [senten] is computed, mainly by a bilinear-function attention mechanism over T_senten and T_cls:

R_i = T_senten · W · T_cls^i

α_i = Softmax(R_i)

where R_i denotes the feature-weight score that integrates T_senten and T_cls^i, the semantic correlation of the two tags being learned through the weight matrix W; normalization by the Softmax function yields α_i, i.e. the importance weight of each T_cls^i relative to T_senten;
extracting the 3 [cls_i] tags with the highest importance weights and taking the corresponding content segment texts as the input of the generator:

T^[3] = Top3(α_1, ..., α_n)

where T^[3] denotes all token vectors of the three selected important [cls_i] segments.
4. The method for automatically generating the topic tag according to claim 1, wherein in the third step, the topic abstract generator model of the transform decoder is implemented by:
encoding the topic text using a Transformer:

target_embedding = Transformer(weibo topic);

inputting the feature codes obtained in step two together with the topic codes into the Transformer abstract generator to generate the abstract:

y = Decoder(target_embedding, T^[1-l]).
5. The method for automatically generating topic labels according to claim 1, wherein the fourth step comprises training on the data and tuning according to cross validation, and realizing model encapsulation and the device interface.
6. An automatic topic tag generation apparatus applying the method according to any one of claims 1 to 5, characterized by comprising an information input module, a topic label automatic generation module and an information output module, wherein: the information input module is used for preprocessing the source text content and inputting the source text; the topic label automatic generation module generates an abstract from the input source text by applying the topic label automatic generation method based on the content-segment content selection mechanism; and the information output module outputs the automatically generated abstract through an interface program.
7. An automatic topic tag generation system comprising at least one server and the automatic topic tag generation apparatus according to claim 6 connected to the server, wherein when the server executes the abstract generation process, it obtains a source text from the data input module through the automatic topic tag generation apparatus based on the content-segment content selection mechanism, and executes the method to obtain the final topic abstract output for the source text.
CN201911395888.2A 2019-12-30 2019-12-30 Automatic generation method, device and system for topic labels Active CN111191023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395888.2A CN111191023B (en) 2019-12-30 2019-12-30 Automatic generation method, device and system for topic labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911395888.2A CN111191023B (en) 2019-12-30 2019-12-30 Automatic generation method, device and system for topic labels

Publications (2)

Publication Number Publication Date
CN111191023A true CN111191023A (en) 2020-05-22
CN111191023B CN111191023B (en) 2022-07-26

Family

ID=70709465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395888.2A Active CN111191023B (en) 2019-12-30 2019-12-30 Automatic generation method, device and system for topic labels

Country Status (1)

Country Link
CN (1) CN111191023B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753497A (en) * 2020-06-29 2020-10-09 西交利物浦大学 Method and system for generating abstract by utilizing hierarchical layer Transformer based on multiple texts
CN111897965A (en) * 2020-09-29 2020-11-06 北京三快在线科技有限公司 Topic generation method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260472A1 (en) * 2017-03-10 2018-09-13 Eduworks Corporation Automated tool for question generation
CN109635279A (en) * 2018-11-22 2019-04-16 桂林电子科技大学 A neural-network-based Chinese named entity recognition method
CN109885673A (en) * 2019-02-13 2019-06-14 北京航空航天大学 An automatic text summarization method based on a pre-trained language model
CN110032729A (en) * 2019-02-13 2019-07-19 北京航空航天大学 An automatic summary generation method based on a neural Turing machine
US10380236B1 (en) * 2017-09-22 2019-08-13 Amazon Technologies, Inc. Machine learning system for annotating unstructured text

Also Published As

Publication number Publication date
CN111191023B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
Zhang et al. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary
CN107944027B (en) Method and system for creating semantic key index
WO2021114745A1 (en) Named entity recognition method employing affix perception for use in social media
JP7106802B2 (en) Resource sorting method, method for training a sorting model and corresponding apparatus
CN110390103A Short-text automatic summarization method and system based on a dual encoder
CN111858932A Transformer-based multi-feature Chinese and English sentiment classification method and system
CN104391842A (en) Translation model establishing method and system
CN112765345A Automatic text summary generation method and system fusing a pre-trained model
CN110619043A Automatic text summary generation method based on dynamic word vectors
CN112749253B (en) Multi-text abstract generation method based on text relation graph
CN110276071A Text matching method, apparatus, computer device and storage medium
CN115495568B (en) Training method and device for dialogue model, dialogue response method and device
CN111723295B (en) Content distribution method, device and storage medium
CN114625866A (en) Method, device, equipment and medium for training abstract generation model
CN111191023B (en) Automatic generation method, device and system for topic labels
CN109815485A Method, apparatus and storage medium for recognizing sentiment polarity of microblog short texts
CN114969304A Multi-document abstractive summarization method for case-related public opinion based on element graph attention
CN111984782A (en) Method and system for generating text abstract of Tibetan language
CN111782810A (en) Text abstract generation method based on theme enhancement
CN113626584A (en) Automatic text abstract generation method, system, computer equipment and storage medium
Zhou et al. DynamicRetriever: a pre-trained model-based IR system without an explicit index
Zhang et al. A method of constructing a fine-grained sentiment lexicon for the humanities computing of classical Chinese poetry
CN113392647B (en) Corpus generation method, related device, computer equipment and storage medium
CN116958997B Graphic summary method and system based on a heterogeneous graph neural network
CN116432653A (en) Method, device, storage medium and equipment for constructing multilingual database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant