CN111046672A - Multi-scene text abstract generation method - Google Patents
- Publication number
- CN111046672A CN111046672A CN201911264821.5A CN201911264821A CN111046672A CN 111046672 A CN111046672 A CN 111046672A CN 201911264821 A CN201911264821 A CN 201911264821A CN 111046672 A CN111046672 A CN 111046672A
- Authority
- CN
- China
- Prior art keywords
- sentence
- vector
- character
- model
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A multi-scene text abstract generation method comprises model learning and model use. The method fully considers the different information preferences of different scenes and enables differentiated abstract extraction of the same document under different scenes; moreover, training the text abstract generation system requires no one-to-one document-abstract pairs, which reduces data cost.
Description
Technical Field
The invention relates to the technical field of natural language processing and text data mining, and in particular to a multi-scene text abstract generation method.
Background
With the rapid development of informatization, people increasingly face the problem of information explosion; quickly and accurately extracting the desired content from large amounts of data has become the key to improving the efficiency of information acquisition.
Existing text abstract generation systems, whether supervised or unsupervised, have a fixed information preference when generating an abstract, which makes them hard to adapt to fields where the scene must switch constantly. In the medical field, for example, doctors in different departments focus on clearly different information when reviewing documents, while a traditional abstract system trained by supervised or unsupervised methods has a fixed extraction preference and cannot meet the needs of doctors across departments.
Disclosure of Invention
In order to overcome the above deficiencies of the prior art, the invention provides a multi-scene text abstract generation method for extracting different abstracts of the same document in different scenes.
The technical scheme adopted by the invention to overcome the above technical problem is as follows:
a multi-scene text abstract generation method comprises model learning and model use, wherein the specific model learning comprises the following steps:
a-1) acquiring an unlabeled original corpus data set consisting of a number of complete articles; after duplicates are removed, every distinct character appearing in the data set is assigned a unique code, the codes being consecutive positive integers, and the one-to-one mapping between characters and codes is stored as a dictionary;
a-2) training a neural-network fluency discriminant model on the obtained original corpus data set so as to minimize its error;
a-3) training a high-dimensional semantic space on the obtained original corpus data set;
a-4) acquiring the scene corpus data sets for the required abstracts, expressed as {T1, T2, T3, ..., Tm}, where Ti is the article set of the i-th scene, 1 ≤ i ≤ m, i is a positive integer, and m is the number of scenes; recording the number of articles of each scene as the vector {l1, l2, l3, ..., lm}, where li is the number of articles in Ti; and constructing for each scene a weight vector {λi0, λi1, λi2, ..., λin} over the dictionary, where λij is the abstract weight, under the i-th scene, of the character coded j in the dictionary, 0 ≤ j ≤ n, and n is the number of characters in the dictionary; λij is computed from Nij, the number of articles of the i-th scene in which the character coded j appears, and lk, the number of articles in Tk, the article set of the k-th scene (the weight formula itself is given only as an image in the source);
a-5) initializing a neural-network encoding-decoding model, extracting an article from the original corpus data set, extracting several sentences from the article, and forming them into a sentence set;
a-6) inputting the sentence set into the encoder of the encoding-decoding model, decoding with the decoder using a decoding algorithm, and recording the decoding result and the character probability distribution at each position of the decoding process;
a-7) inputting the characters of the decoding result into the fluency discriminant model in sequence, and recording the character probability distribution output by the model at each position;
a-8) computing, position by position, the error between the character probability distribution recorded during decoding and the corresponding distribution output by the fluency discriminant model;
a-9) adjusting the encoding-decoding model with a neural-network optimization algorithm to minimize the error of step a-8); stop training when the error reaches its minimum, otherwise jump back to step a-5);
the model use comprises the following steps:
b-1) given an article awaiting summarization, breaking it into sentences in their order of appearance to form the set {S1, S2, S3, ..., So}, where o is the number of sentences and the i-th sentence Si has length Li, 1 ≤ i ≤ o;
b-2) for the first sentence S1 of the article and its corresponding character set (given as an image in the source), looking up the code of each character of the sentence in the dictionary, taking the corresponding vector from the trained high-dimensional semantic space, and arranging the retrieved vectors in the order of appearance of their characters to form a vector sequence;
b-3) repeating step b-2) for every sentence S1 to So of the article, using VSij to denote the j-th vector of the i-th sentence;
b-4) taking, from the weight vectors {λi0, λi1, λi2, ..., λin} of step a-4), the character weight vector of the k-th scene, written {λk0, λk1, λk2, ..., λkn};
b-5) defining a sentence selection vector {h1, h2, h3, ..., ho} of length o, where hi = 0 indicates that the i-th sentence Si of {S1, S2, S3, ..., So} is not in the extracted key-sentence set and hi = 1 indicates that it is;
b-6) computing the sentence selection vector that maximizes the objective function (given only as an image in the source), in which λki is the abstract weight, under the k-th scene, of the character coded i, hj is the selection value of the j-th sentence, Vi is the vector in the high-dimensional semantic space of the character coded i, VSjt is the vector in the high-dimensional semantic space of the t-th character of the j-th sentence of the sentence set to be abstracted, |Vi| is the modulus of the vector Vi, and |VSjt| is the modulus of VSjt;
b-7) extracting the sentences whose value in the sentence selection vector computed in b-6) equals 1 to form the key-sentence set;
b-8) returning to step a-6) with the key-sentence set in place of the sentence set: inputting it into the trained encoding-decoding model and decoding with the decoder to obtain the final document abstract.
Furthermore, the characters in step a) are Chinese characters, English words, or numbers; consecutive digits containing a decimal point and consecutive English letters are extracted as single units, tab and space characters are deleted, and the codes of all characters are stored in json format.
Further, the training method of the fluency discriminant model in step a-2) comprises the following steps:
a-2.1) initializing the fluency discriminant model of the neural network, extracting an article from the obtained original corpus data set, and inputting the article's start symbol into the model;
a-2.2) having the model output the probability distribution of the first character of the article, computing the error between the first character of the article and the distribution output by the model, and recording the error;
a-2.3) forming a sequence of length 2 from the start symbol and the first character of the article, inputting it into the model, having the model output the probability distribution of the 2nd character of the article, and computing and recording the error between the 2nd character and the distribution output by the model;
a-2.4) repeating step a-2.3) until the model has output probability distributions for all characters, then computing and recording the error between the final distribution and the article's end symbol;
a-2.5) optimizing the parameters of the fluency discriminant model with the errors recorded through step a-2.4); stop training when the error is at its minimum, otherwise jump back to step a-2.1).
Further, the high-dimensional semantic space training of step a-3) comprises the following steps:
a-3.1) initializing the vector set {V0, V1, V2, V3, ..., Vn}, where Vi is the high-dimensional vector representation of the i-th character of the dictionary from a-1), 1 ≤ i ≤ n, V0 is the high-dimensional vector representation of characters not present in the dictionary, and n is the number of characters in the dictionary;
a-3.2) extracting k consecutive characters from the original corpus data set to form a character fragment;
a-3.3) converting each character of the fragment into its code using the dictionary, forming the code sequence corresponding to the fragment;
a-3.4) taking the corresponding vectors, in order, from the vector set {V0, V1, V2, V3, ..., Vn} according to the code sequence, forming a vector sequence;
a-3.5) optimizing the vector set; stop training when the cosine similarity between any two vectors within the vector sequence is maximized and their cosine similarity with vectors outside the sequence is minimized, otherwise jump back to step a-3.2).
The beneficial effects of the invention are: the method fully considers the different information preferences of different scenes and enables differentiated abstract extraction of the same document under different scenes; moreover, training the text abstract generation system requires no one-to-one document-abstract pairs, which reduces data cost.
Detailed Description
The present invention is further explained below.
A multi-scene text abstract generation method comprises model learning and model use, wherein the specific model learning comprises the following steps:
a-1) acquiring an unlabeled original corpus data set consisting of a number of complete articles; after duplicates are removed, every distinct character appearing in the data set is assigned a unique code, the codes being consecutive positive integers, and the one-to-one mapping between characters and codes is stored as a dictionary. The original corpus data set can be sourced from an existing data set, news articles, encyclopedia articles, medical records, and the like.
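Step a-1) can be sketched in a few lines of Python. The toy corpus, the function name, and the JSON persistence below are illustrative assumptions, not data or identifiers from the patent:

```python
import json

def build_char_dictionary(articles):
    """Assign consecutive positive-integer codes to the distinct
    characters of an unlabeled corpus, as described in step a-1)."""
    dictionary = {}
    next_code = 1  # codes are consecutive positive integers
    for article in articles:
        for ch in article:
            if ch not in dictionary:
                dictionary[ch] = next_code
                next_code += 1
    return dictionary

corpus = ["the cat sat", "the dog ran"]      # stand-in for the real articles
vocab = build_char_dictionary(corpus)
# persist the character-to-code mapping in json format, as the patent suggests
encoded = json.dumps(vocab, ensure_ascii=False)
```

First-seen order determines the codes, so the mapping is reproducible for a fixed corpus ordering.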
a-2) training a neural-network fluency discriminant model on the obtained original corpus data set so as to minimize its error. The fluency discriminant model can be trained with deep-learning methods such as RNN, GRU, LSTM, or Transformer.
a-3) training a high-dimensional semantic space on the obtained original corpus data set. The training method can be skip-gram, CBOW, GloVe, or the like.
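Of the methods listed, skip-gram is the simplest to illustrate: each character predicts its neighbors within a context window. The pair-generation step can be sketched as follows (the window size and the code sequences are illustrative assumptions; the patent does not fix them):

```python
def skipgram_pairs(codes, window=2):
    """Enumerate (center, context) code pairs for skip-gram-style training
    over a sequence of character codes."""
    pairs = []
    for i, center in enumerate(codes):
        lo = max(0, i - window)
        hi = min(len(codes), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a character is not its own context
                pairs.append((center, codes[j]))
    return pairs
```

These pairs would then feed whatever embedding optimizer is chosen; the actual vector updates are performed by the training method of step a-3.5).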
a-4) acquiring the scene corpus data sets for the required abstracts, expressed as {T1, T2, T3, ..., Tm}, where Ti is the article set of the i-th scene, 1 ≤ i ≤ m, i is a positive integer, and m is the number of scenes; recording the number of articles of each scene as the vector {l1, l2, l3, ..., lm}, where li is the number of articles in Ti; and constructing for each scene a weight vector {λi0, λi1, λi2, ..., λin} over the dictionary, where λij is the abstract weight, under the i-th scene, of the character coded j in the dictionary, 0 ≤ j ≤ n, and n is the number of characters in the dictionary; λij is computed from Nij, the number of articles of the i-th scene in which the character coded j appears, and lk, the number of articles in Tk, the article set of the k-th scene (the weight formula itself is given only as an image in the source);
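The counting that step a-4) relies on can be sketched as follows. Note that the patent's actual weight formula survives only as an image in the source, so `illustrative_weights` below (a simple per-scene document frequency, Nij / li) is an assumption used purely for illustration, not the patent's formula:

```python
from collections import Counter

def scene_char_doc_freq(scene_articles, vocab):
    """Nij of step a-4): for one scene, the number of articles in which
    each dictionary-coded character appears. Returns {code: count}."""
    freq = Counter()
    for article in scene_articles:
        for ch in set(article):       # count each character once per article
            if ch in vocab:
                freq[vocab[ch]] += 1
    return dict(freq)

def illustrative_weights(freq, n_articles):
    """ASSUMPTION: a stand-in normalization Nij / li, not the patent's formula."""
    return {code: count / n_articles for code, count in freq.items()}

vocab = {"a": 1, "b": 2, "c": 3}      # toy dictionary
scene = ["ab", "ac"]                   # li = 2 articles in this scene
freq = scene_char_doc_freq(scene, vocab)
weights = illustrative_weights(freq, len(scene))
```

Characters that appear in every article of a scene receive the highest weight under this stand-in rule, which matches the intuition that scene-characteristic characters should dominate the abstract.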
a-5) initializing a neural-network encoding-decoding model, extracting an article from the original corpus data set, extracting several sentences from the article, and forming them into a sentence set;
a-6) inputting the sentence set into the encoder of the encoding-decoding model, decoding with the decoder using a decoding algorithm, and recording the decoding result and the character probability distribution at each position of the decoding process;
a-7) inputting the characters of the decoding result into the fluency discriminant model in sequence, and recording the character probability distribution output by the model at each position;
a-8) computing, position by position, the error between the character probability distribution recorded during decoding and the corresponding distribution output by the fluency discriminant model;
a-9) adjusting the encoding-decoding model with a neural-network optimization algorithm to minimize the error of step a-8); stop training when the error reaches its minimum, otherwise jump back to step a-5);
the model use comprises the following steps:
b-1) given an article awaiting summarization, breaking it into sentences in their order of appearance to form the set {S1, S2, S3, ..., So}, where o is the number of sentences and the i-th sentence Si has length Li, 1 ≤ i ≤ o;
b-2) for the first sentence S1 of the article and its corresponding character set (given as an image in the source), looking up the code of each character of the sentence in the dictionary, taking the corresponding vector from the trained high-dimensional semantic space, and arranging the retrieved vectors in the order of appearance of their characters to form a vector sequence;
b-3) repeating step b-2) for every sentence S1 to So of the article, using VSij to denote the j-th vector of the i-th sentence;
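Steps b-2) and b-3) amount to a dictionary lookup followed by a table lookup. A minimal sketch (the toy vocabulary and 2-dimensional vectors are illustrative assumptions):

```python
def sentence_to_vectors(sentence, vocab, vectors):
    """Steps b-2)/b-3): look up each character's code in the dictionary and
    take its vector from the trained semantic space, preserving character
    order. Characters absent from the dictionary map to code 0 (vector V0)."""
    return [vectors[vocab.get(ch, 0)] for ch in sentence]

vocab = {"a": 1, "b": 2}
vectors = {0: [0.0, 0.0],              # V0: out-of-dictionary fallback
           1: [1.0, 0.0],
           2: [0.0, 1.0]}              # toy 2-d semantic space
seq = sentence_to_vectors("abx", vocab, vectors)  # "x" falls back to V0
```

Applying this to every sentence yields the VSij vectors used by the objective of step b-6).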
b-4) taking, from the weight vectors {λi0, λi1, λi2, ..., λin} of step a-4), the character weight vector of the k-th scene, written {λk0, λk1, λk2, ..., λkn};
b-5) defining a sentence selection vector {h1, h2, h3, ..., ho} of length o, where hi = 0 indicates that the i-th sentence Si of {S1, S2, S3, ..., So} is not in the extracted key-sentence set and hi = 1 indicates that it is;
b-6) computing the sentence selection vector that maximizes the objective function (given only as an image in the source), in which λki is the abstract weight, under the k-th scene, of the character coded i, hj is the selection value of the j-th sentence, Vi is the vector in the high-dimensional semantic space of the character coded i, VSjt is the vector in the high-dimensional semantic space of the t-th character of the j-th sentence of the sentence set to be abstracted, |Vi| is the modulus of the vector Vi, and |VSjt| is the modulus of VSjt;
b-7) extracting the sentences whose value in the sentence selection vector computed in b-6) equals 1 to form the key-sentence set;
b-8) returning to step a-6) with the key-sentence set in place of the sentence set: inputting it into the trained encoding-decoding model and decoding with the decoder to obtain the final document abstract.
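The selection of steps b-5) to b-7) can be sketched as follows. Because the patent's objective function survives only as an image, the scoring rule here (summing scene-weight-scaled cosine similarities between each sentence's character vectors and the scene's weighted character vectors, then keeping the top-scoring sentences) is an illustrative assumption built from the symbols the text does define (λki, hj, Vi, VSjt, and the moduli |Vi|, |VSjt|):

```python
import math

def cosine(u, v):
    """Cosine similarity V·W / (|V||W|), the quantity named in step b-6)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def select_sentences(sentence_vecs, weighted_char_vecs, top_k=1):
    """ASSUMED scoring: each sentence's score is the sum, over its character
    vectors VSjt and over the scene's (λki, Vi) pairs, of λki·cos(Vi, VSjt).
    Returns the selection vector h with h_i = 1 for the top_k sentences."""
    scores = []
    for vecs in sentence_vecs:
        s = sum(w * cosine(cv, v)
                for (w, cv) in weighted_char_vecs
                for v in vecs)
        scores.append(s)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    h = [0] * len(scores)
    for i in order[:top_k]:
        h[i] = 1
    return h

# two one-character sentences; the scene weights only the [1,0] direction
h = select_sentences([[[1.0, 0.0]], [[0.0, 1.0]]],
                     [(1.0, [1.0, 0.0])], top_k=1)
```

Exhaustively maximizing over all 2^o selection vectors, as the patent's wording implies, is exponential; the greedy top-k shown here is one tractable stand-in.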
By this method, the different information preferences of different scenes are fully considered and differentiated abstract extraction of the same document under different scenes becomes possible; moreover, training the text abstract generation system requires no one-to-one document-abstract pairs, which reduces data cost.
Further, the characters in step a) are Chinese characters, English words, or numbers; consecutive digits containing a decimal point and consecutive English letters are extracted as single split units, tab and space characters are deleted, split units of the same type are merged, the whole data set is traversed to obtain a set of non-repeating split units, and the final set is encoded and stored in json format (for example, a json object mapping each character to its code; the sample characters in the source were lost in translation).
Further, the training method of the fluency discriminant model in step a-2) comprises the following steps:
a-2.1) initializing the fluency discriminant model of the neural network, extracting an article from the obtained original corpus data set, and inputting the article's start symbol into the model;
a-2.2) having the model output the probability distribution of the first character of the article, computing the error between the first character of the article and the distribution output by the model, and recording the error;
a-2.3) forming a sequence of length 2 from the start symbol and the first character of the article, inputting it into the model, having the model output the probability distribution of the 2nd character of the article, and computing and recording the error between the 2nd character and the distribution output by the model;
a-2.4) repeating step a-2.3) until the model has output probability distributions for all characters, then computing and recording the error between the final distribution and the article's end symbol;
a-2.5) optimizing the parameters of the fluency discriminant model with the errors recorded through step a-2.4); stop training when the error is at its minimum, otherwise jump back to step a-2.1).
The finally trained model can infer, from given preceding text, the probability of the next character; this probability is a vector whose length equals that of the dictionary stored in json format, with the probability of the character at the corresponding dictionary index stored at the corresponding position.
Given a character sequence: input the start symbol and obtain the probability distribution of the first character; input the first character and obtain the distribution of the second; and so on in turn until all characters have been input, yielding the probability distribution corresponding to each position.
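The prefix-feeding loop just described can be sketched independently of the model architecture; `model` below is any callable mapping a prefix to a next-character distribution, and the uniform stand-in is an illustrative assumption (in the patent it would be an RNN/GRU/LSTM/Transformer):

```python
def position_distributions(model, article, start="<s>"):
    """The teacher-forced loop of steps a-2.1)-a-2.4): feed the start
    symbol, then ever-longer prefixes of the article, recording the
    model's predicted next-character distribution at each position."""
    prefix = [start]
    dists = []
    for ch in article:
        dists.append(model(prefix))   # distribution over dictionary codes
        prefix.append(ch)
    return dists

# stand-in model: uniform over a 2-character dictionary, for illustration
uniform = lambda prefix: {"a": 0.5, "b": 0.5}
dists = position_distributions(uniform, "ab")
```

One further call with the full article as prefix would give the distribution compared against the end symbol in step a-2.4); the recorded distributions are what the errors of step a-2.5) are computed from.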
Further, the high-dimensional semantic space training of step a-3) comprises the following steps:
a-3.1) initializing the vector set {V0, V1, V2, V3, ..., Vn}, where Vi is the high-dimensional vector representation of the i-th character of the dictionary from a-1), 1 ≤ i ≤ n, V0 is the high-dimensional vector representation of characters not present in the dictionary, and n is the number of characters in the dictionary;
a-3.2) extracting k consecutive characters from the original corpus data set to form a character fragment;
a-3.3) converting each character of the fragment into its code using the dictionary, forming the code sequence corresponding to the fragment;
a-3.4) taking the corresponding vectors, in order, from the vector set {V0, V1, V2, V3, ..., Vn} according to the code sequence, forming a vector sequence;
a-3.5) optimizing the vector set; stop training when the cosine similarity between any two vectors within the vector sequence is maximized and their cosine similarity with vectors outside the sequence is minimized, otherwise jump back to step a-3.2).
Claims (4)
1. A multi-scene text abstract generation method is characterized by comprising model learning and model use, wherein the specific model learning comprises the following steps:
a-1) acquiring an unlabeled original corpus data set consisting of a number of complete articles; after duplicates are removed, every distinct character appearing in the data set is assigned a unique code, the codes being consecutive positive integers, and the one-to-one mapping between characters and codes is stored as a dictionary;
a-2) training a neural-network fluency discriminant model on the obtained original corpus data set so as to minimize its error;
a-3) training a high-dimensional semantic space on the obtained original corpus data set;
a-4) acquiring the scene corpus data sets for the required abstracts, expressed as {T1, T2, T3, ..., Tm}, where Ti is the article set of the i-th scene, 1 ≤ i ≤ m, i is a positive integer, and m is the number of scenes; recording the number of articles of each scene as the vector {l1, l2, l3, ..., lm}, where li is the number of articles in Ti; and constructing for each scene a weight vector {λi0, λi1, λi2, ..., λin} over the dictionary, where λij is the abstract weight, under the i-th scene, of the character coded j in the dictionary, 0 ≤ j ≤ n, and n is the number of characters in the dictionary; λij is computed from Nij, the number of articles of the i-th scene in which the character coded j appears, and lk, the number of articles in Tk, the article set of the k-th scene (the weight formula itself is given only as an image in the source);
a-5) initializing a neural-network encoding-decoding model, extracting an article from the original corpus data set, extracting several sentences from the article, and forming them into a sentence set;
a-6) inputting the sentence set into the encoder of the encoding-decoding model, decoding with the decoder using a decoding algorithm, and recording the decoding result and the character probability distribution at each position of the decoding process;
a-7) inputting the characters of the decoding result into the fluency discriminant model in sequence, and recording the character probability distribution output by the model at each position;
a-8) computing, position by position, the error between the character probability distribution recorded during decoding and the corresponding distribution output by the fluency discriminant model;
a-9) adjusting the encoding-decoding model with a neural-network optimization algorithm to minimize the error of step a-8); stop training when the error reaches its minimum, otherwise jump back to step a-5);
the model use comprises the following steps:
b-1) given an article awaiting summarization, breaking it into sentences in their order of appearance to form the set {S1, S2, S3, ..., So}, where o is the number of sentences and the i-th sentence Si has length Li, 1 ≤ i ≤ o;
b-2) for the first sentence S1 of the article and its corresponding character set (given as an image in the source), looking up the code of each character of the sentence in the dictionary, taking the corresponding vector from the trained high-dimensional semantic space, and arranging the retrieved vectors in the order of appearance of their characters to form a vector sequence;
b-3) repeating step b-2) for every sentence S1 to So of the article, using VSij to denote the j-th vector of the i-th sentence;
b-4) taking, from the weight vectors {λi0, λi1, λi2, ..., λin} of step a-4), the character weight vector of the k-th scene, written {λk0, λk1, λk2, ..., λkn};
b-5) defining a sentence selection vector {h1, h2, h3, ..., ho} of length o, where hi = 0 indicates that the i-th sentence Si of {S1, S2, S3, ..., So} is not in the extracted key-sentence set and hi = 1 indicates that it is;
b-6) computing the sentence selection vector that maximizes the objective function (given only as an image in the source), in which λki is the abstract weight, under the k-th scene, of the character coded i, hj is the selection value of the j-th sentence, Vi is the vector in the high-dimensional semantic space of the character coded i, VSjt is the vector in the high-dimensional semantic space of the t-th character of the j-th sentence of the sentence set to be abstracted, |Vi| is the modulus of the vector Vi, and |VSjt| is the modulus of VSjt;
b-7) extracting the sentences whose value in the sentence selection vector computed in b-6) equals 1 to form the key-sentence set;
b-8) returning to step a-6) with the key-sentence set in place of the sentence set: inputting it into the trained encoding-decoding model and decoding with the decoder to obtain the final document abstract.
2. The multi-scene text abstract generation method of claim 1, characterized in that: the characters in step a) are Chinese characters, English words, or numbers; consecutive digits containing a decimal point and consecutive English letters are extracted as single units, tab and space characters are deleted, and the codes of all characters are stored in json format.
3. The multi-scene text abstract generation method of claim 1, characterized in that the training method of the fluency discriminant model in step a-2) comprises the following steps:
a-2.1) initializing the fluency discriminant model of the neural network, extracting an article from the obtained original corpus data set, and inputting the article's start symbol into the model;
a-2.2) having the model output the probability distribution of the first character of the article, computing the error between the first character of the article and the distribution output by the model, and recording the error;
a-2.3) forming a sequence of length 2 from the start symbol and the first character of the article, inputting it into the model, having the model output the probability distribution of the 2nd character of the article, and computing and recording the error between the 2nd character and the distribution output by the model;
a-2.4) repeating step a-2.3) until the model has output probability distributions for all characters, then computing and recording the error between the final distribution and the article's end symbol;
a-2.5) optimizing the parameters of the fluency discriminant model with the errors recorded through step a-2.4); stop training when the error is at its minimum, otherwise jump back to step a-2.1).
4. The multi-scene text abstract generation method of claim 1, characterized in that the high-dimensional semantic space training of step a-3) comprises the following steps:
a-3.1) initializing the vector set {V0, V1, V2, V3, ..., Vn}, where Vi is the high-dimensional vector representation of the i-th character of the dictionary from a-1), 1 ≤ i ≤ n, V0 is the high-dimensional vector representation of characters not present in the dictionary, and n is the number of characters in the dictionary;
a-3.2) extracting k consecutive characters from the original corpus data set to form a character fragment;
a-3.3) converting each character of the fragment into its code using the dictionary, forming the code sequence corresponding to the fragment;
a-3.4) taking the corresponding vectors, in order, from the vector set {V0, V1, V2, V3, ..., Vn} according to the code sequence, forming a vector sequence;
a-3.5) optimizing the vector set; stop training when the cosine similarity between any two vectors within the vector sequence is maximized and their cosine similarity with vectors outside the sequence is minimized, otherwise jump back to step a-3.2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911264821.5A CN111046672B (en) | 2019-12-11 | 2019-12-11 | Multi-scene text abstract generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911264821.5A CN111046672B (en) | 2019-12-11 | 2019-12-11 | Multi-scene text abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111046672A true CN111046672A (en) | 2020-04-21 |
CN111046672B CN111046672B (en) | 2020-07-14 |
Family
ID=70235588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911264821.5A Active CN111046672B (en) | 2019-12-11 | 2019-12-11 | Multi-scene text abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111046672B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102945228A (en) * | 2012-10-29 | 2013-02-27 | 广西工学院 | Multi-document summarization method based on text segmentation |
CN107092674A (en) * | 2017-04-14 | 2017-08-25 | 福建工程学院 | The automatic abstracting method and system of a kind of Chinese medicine acupuncture field event trigger word |
CN108062351A (en) * | 2017-11-14 | 2018-05-22 | 厦门市美亚柏科信息股份有限公司 | Text snippet extracting method, readable storage medium storing program for executing on particular topic classification |
US20180373986A1 (en) * | 2017-06-26 | 2018-12-27 | QbitLogic, Inc. | Machine learning using dynamic multilayer perceptrons |
CN109977981A (en) * | 2017-12-27 | 2019-07-05 | 深圳市优必选科技有限公司 | Scene analytic method, robot and storage device based on binocular vision |
CN110134964A (en) * | 2019-05-20 | 2019-08-16 | 中国科学技术大学 | A kind of text matching technique based on stratification convolutional neural networks and attention mechanism |
CN110162778A (en) * | 2019-04-02 | 2019-08-23 | 阿里巴巴集团控股有限公司 | The generation method and device of text snippet |
CN110196903A (en) * | 2019-05-06 | 2019-09-03 | 中国海洋大学 | A kind of method and system for for article generation abstract |
CN110210037A (en) * | 2019-06-12 | 2019-09-06 | 四川大学 | Category detection method towards evidence-based medicine EBM field |
CN110263257A (en) * | 2019-06-24 | 2019-09-20 | 北京交通大学 | Multi-source heterogeneous data mixing recommended models based on deep learning |
CN110287309A (en) * | 2019-06-21 | 2019-09-27 | 深圳大学 | The method of rapidly extracting text snippet |
CN110347819A (en) * | 2019-06-21 | 2019-10-18 | 同济大学 | A kind of text snippet generation method based on positive negative sample dual training |
CN110362654A (en) * | 2018-04-09 | 2019-10-22 | 谢碧青 | Target group data acquires classification method |
CN110473636A (en) * | 2019-08-22 | 2019-11-19 | 山东众阳健康科技集团有限公司 | Intelligent doctor's advice recommended method and system based on deep learning |
CN110491465A (en) * | 2019-08-20 | 2019-11-22 | 山东众阳健康科技集团有限公司 | Classification of diseases coding method, system, equipment and medium based on deep learning |
CN110532328A (en) * | 2019-08-26 | 2019-12-03 | 哈尔滨工程大学 | A kind of text concept figure building method |
Non-Patent Citations (3)
Title |
---|
YANG LIU et al.: "Text Summarization with Pretrained Encoders", https://arxiv.org/pdf/1908.08345.pdf * |
ZHANG Chi: "Text Summarization Algorithm Based on Semantic Reconstruction", China Master's Theses Full-text Database, Information Science & Technology * |
HU Yingxi: "Research and Implementation of Multi-entity Relation Recognition and Automatic Text Summarization Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science & Technology * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116976290A (en) * | 2023-06-19 | 2023-10-31 | 珠海盈米基金销售有限公司 | Multi-scene information abstract generation method and device based on autoregressive model |
CN116976290B (en) * | 2023-06-19 | 2024-03-19 | 珠海盈米基金销售有限公司 | Multi-scene information abstract generation method and device based on autoregressive model |
Similar Documents
Publication | Title |
---|---|
CN111897949B (en) | Guided text abstract generation method based on Transformer |
CN110209801B (en) | Text abstract automatic generation method based on self-attention network |
CN110795556B (en) | Abstract generation method based on fine-grained plug-in decoding |
CN109190131B (en) | Neural machine translation-based English word and case joint prediction method thereof |
CN111694924B (en) | Event extraction method and system |
CN110275936B (en) | Similar legal case retrieval method based on self-coding neural network |
CN111178093B (en) | Neural machine translation system training acceleration method based on stacking algorithm |
CN111209749A (en) | Method for applying deep learning to Chinese word segmentation |
CN112329482A (en) | Machine translation method, device, electronic equipment and readable storage medium |
CN112749253B (en) | Multi-text abstract generation method based on text relation graph |
CN115438154A (en) | Chinese automatic speech recognition text restoration method and system based on representation learning |
CN115034218A (en) | Chinese grammar error diagnosis method based on multi-stage training and editing level voting |
CN115310448A (en) | Chinese named entity recognition method based on combining bert and word vector |
CN113065349A (en) | Named entity recognition method based on conditional random field |
CN114912453A (en) | Chinese legal document named entity identification method based on enhanced sequence features |
CN113423004A (en) | Video subtitle generating method and system based on decoupling decoding |
CN113221542A (en) | Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening |
CN115658898A (en) | Chinese and English book entity relation extraction method, system and equipment |
CN111046672B (en) | Multi-scene text abstract generation method |
CN111444720A (en) | Named entity recognition method for English text |
CN115759119B (en) | Financial text emotion analysis method, system, medium and equipment |
CN112364647A (en) | Duplicate checking method based on cosine similarity algorithm |
CN112989839A (en) | Keyword feature-based intent recognition method and system embedded in language model |
CN110704664A (en) | Hash retrieval method |
CN116069924A (en) | Text abstract generation method and system integrating global and local semantic features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 12/F, Building 1, Aosheng Building, 1166 Xinluo Street, High-tech Zone, Jinan City, Shandong Province; Patentee after: Zhongyang Health Technology Group Co., Ltd. | Address before: 12/F, Building 1, Aosheng Building, 1166 Xinluo Street, High-tech Zone, Jinan City, Shandong Province; Patentee before: SHANDONG MSUNHEALTH TECHNOLOGY GROUP Co., Ltd. |