CN110362797A - Research report generation method and related device - Google Patents
- Publication number
- CN110362797A (application CN201910513763.9A)
- Authority
- CN
- China
- Prior art keywords
- research report
- report
- research
- word
- dictionary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a research report generation method and related devices. A research report dictionary is constructed from a plurality of acquired research reports. The corresponding research report is then output automatically from an event text, the research report dictionary, an outline generation model and a report generation model. The outline generation model and the report generation model each select words from the research report dictionary according to a probability-optimality principle and compose them into word sequences, which serve as the report outline and the research report respectively. This overcomes the technical problem in the prior art that writing reports by hand costs great effort and labor, and the research reports generated from a report outline are of higher quality.
Description
Technical field
The present invention relates to the field of report generation, and in particular to a research report generation method and related device.
Background art
LSTM (Long Short-Term Memory) is a temporal recurrent neural network. It is suited to processing and predicting important events that are separated by relatively long intervals and delays in a time series.
A Variational Auto-Encoder (VAE) is a deep generative model.
The financial field involves writing large numbers of reports in fixed formats, such as research reports, prospectuses and investment letters of intent, and reports on different companies in different industries have different requirements. Writing these reports usually demands high timeliness as well as extensive data collection and analysis. In the prior art, data are usually collected and analyzed, and reports written, by hand. Labor costs are therefore high and a great deal of effort is required.
Summary of the invention
The present invention aims to solve, at least in part, one of the technical problems in the related art. To this end, one object of the present invention is to provide a research report generation method and related device for automatically generating a research report from an event text.
The technical solution adopted by the present invention is as follows:
In a first aspect, the present invention provides a research report generation method, comprising:
Research report acquisition step: obtaining a plurality of research reports from a plurality of information sources;
Dictionary construction step: performing data preprocessing and feature selection on the plurality of research reports to construct a research report dictionary;
Outline acquisition step: obtaining a report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects a plurality of words from the research report dictionary according to a probability-optimality principle and composes them into a word sequence that serves as the report outline;
Report generation step: obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects a plurality of words from the research report dictionary according to the probability-optimality principle and composes them into a word sequence that serves as the research report.
Further, the report generation model selects words from the research report dictionary and outputs them one by one to generate the research report.
Further, the outline generation model updates the report outline according to the event text, the report outline, the research report dictionary and the previous word output by the report generation step.
Further, the dictionary construction step further comprises: adding a start tag and an end tag to the research report dictionary.
Further, the outline generation model comprises:
performing vector representation on the event text and the start tag according to the research report dictionary to obtain an event vector and a start-tag vector;
obtaining the hidden-layer state of the event text and the hidden-layer state of the start tag according to the event vector, the start-tag vector and an LSTM network;
obtaining the report outline according to the hidden-layer state of the event text, the hidden-layer state of the start tag and an attention mechanism.
Further, the outline generation model further comprises:
performing vector representation on the previous word output by the report generation step according to the research report dictionary to obtain a word vector;
obtaining the hidden-layer state of the word according to the word vector and the LSTM network;
updating the report outline according to the hidden-layer state of the word, the hidden-layer state of the event text and the attention mechanism.
Further, the outline generation model comprises:
obtaining the report outline according to the event text, the start tag and a Transformer model;
updating the report outline according to the previous word output by the report generation step, the event text and the Transformer model.
Further, the report generation model comprises a VAE generative model.
Further, the research report generation method further comprises:
performing event entity recognition on the research reports obtained in the research report acquisition step to obtain the corresponding event texts, wherein the plurality of research reports and the corresponding event texts form a training dataset used to train the outline generation model and the report generation model.
In a second aspect, the present invention provides a research report generation device, comprising:
a research report acquisition module, configured to obtain a plurality of research reports from a plurality of information sources;
a dictionary construction module, configured to perform data preprocessing and feature selection on the plurality of research reports to construct a research report dictionary;
an outline acquisition module, configured to obtain the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects a plurality of words from the research report dictionary according to the probability-optimality principle and composes them into a word sequence that serves as the report outline;
a report generation module, configured to obtain a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects a plurality of words from the research report dictionary according to the probability-optimality principle and composes them into a word sequence that serves as the research report.
In a third aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions, which cause a computer to execute the research report generation method.
The beneficial effects of the present invention are as follows:
The present invention constructs a research report dictionary from a plurality of acquired research reports. It then automatically outputs the corresponding research report according to the event text, the research report dictionary, the outline generation model and the report generation model. The outline generation model and the report generation model select words from the research report dictionary according to the probability-optimality principle and compose them into word sequences, which serve as the report outline and the research report respectively. This overcomes the technical problem in the prior art that writing reports by hand costs great effort and labor. In addition, the research reports generated from the report outline are of higher quality.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the research report generation method of the present invention;
Fig. 2 is a flowchart of a first specific embodiment of the research report generation method of the present invention;
Fig. 3 is a schematic diagram of the training process of Fig. 2;
Fig. 4 is a schematic diagram of an example research report generated with the research report generation method of the present invention;
Fig. 5 is a flowchart of a second specific embodiment of the research report generation method of the present invention;
Fig. 6 is a schematic diagram of the training process of Fig. 5;
Fig. 7 is a structural block diagram of an embodiment of the research report generation device of the present invention.
Detailed description of the embodiments
It should be noted that, where there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with each other.
Embodiment 1
The main idea of the research report generation method in this embodiment follows how a writer drafts an outline before writing. An outline of the research report is first generated from the event text, and the final research report is then generated from the outline and the event text. Here, an event text means an event narrative text, i.e. a text describing one or more events (or problems). In the financial field, for example, the event text may be a financial news article (hereinafter "news"), and analysts perform macroeconomic analysis based on the news to produce macroeconomic research reports (hereinafter "macro research reports"). As another example, in the argumentative-essay task of the Chinese-language college entrance examination, the prompt gives a text about some problem or event and requires an essay based on it; that essay plays the role of the research report in this document. Specifically, referring to Fig. 1, which is a flowchart of an embodiment of the research report generation method of the present invention, the research report generation method comprises:
Research report acquisition step: obtaining a plurality of research reports from a plurality of information sources. An information source may be the Internet or a paper data source (such as a library). Taking the financial field as an example, the sources of research reports may be financial websites such as Sina Finance, East Money and Tonghuashun, and the sources of essays may be libraries or essay resources on the Internet;
Dictionary construction step: performing data preprocessing and feature selection on the acquired research reports to construct a research report dictionary. Data preprocessing includes word segmentation of the research reports. Feature selection then counts how often each word occurs in the reports and decides, according to that frequency, whether to put the word into the research report dictionary;
Outline acquisition step: obtaining the report outline corresponding to the event text according to the event text, the research report dictionary and the outline generation model. The outline generation model selects a plurality of words from the research report dictionary according to the probability-optimality principle and composes them into a word sequence that serves as the report outline. One way to select words under this principle is to search globally for the single most probable word, i.e. to output only one word at a time, with the successive words forming the word sequence. Another way is to use a beam search algorithm to obtain a preset number of the most probable words, i.e. to output a preset number of words at a time, and then to select, again by the probability-optimality principle, the words from these repeated outputs that form the word sequence serving as the report outline;
Report generation step: obtaining the research report according to the event text, the report outline, the research report dictionary and the report generation model. The report generation model selects a plurality of words from the research report dictionary according to the probability-optimality principle and composes them into a word sequence that serves as the research report. As with outline generation, words may be output by global search for the most probable word or selected with a beam search algorithm; the details are not repeated here.
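The "output only one word at a time" option in these steps can be sketched as a greedy decoding loop. This is a minimal illustration, not the patented implementation. The `step` callable is a hypothetical stand-in for the trained outline and report generation models: it maps the words generated so far to a probability for every dictionary entry. The ids passed for the start and end tags are parameters chosen by the caller.

```python
from typing import Callable, List, Sequence


def greedy_decode(step: Callable[[List[int]], Sequence[float]],
                  start_id: int, end_id: int, max_len: int = 200) -> List[int]:
    """Generate word ids one at a time, always taking the most probable word.

    step(prefix) is a hypothetical model call returning one probability per
    dictionary entry, given the ids generated so far (beginning with <start>).
    """
    prefix = [start_id]
    for _ in range(max_len):
        probs = step(prefix)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == end_id:  # stop as soon as the end tag is produced
            break
        prefix.append(next_id)  # feed the chosen word back as the next input
    return prefix[1:]  # drop the <start> tag
```

Each chosen word is appended to the prefix and passed back on the next call, matching the description of the output word being returned to the input. Generation stops at the end tag or after `max_len` words.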
The present invention constructs a research report dictionary from a plurality of acquired research reports and then automatically outputs the corresponding research report according to the event text, the research report dictionary, the outline generation model and the report generation model. This overcomes the technical problem in the prior art that writing reports by hand costs great effort and labor. It frees up manpower, speeds up the delivery of research reports and reduces labor costs. Moreover, research reports generated from a report outline are of higher quality. The overall framework of the method is an Encoder-Decoder structure: encoding converts the input sequence into a fixed-length vector, and decoding converts that fixed vector into an output sequence, so one sequence goes in and one sequence comes out. The Encoder turns the event text sequence into a fixed-length vector representation. The Decoder turns this vector representation into a variable-length text sequence, i.e. the research report.
The acquisition of research reports and the construction of the research report dictionary are described below, taking research reports in the financial field as an example:
First, research report texts in the macroeconomics sections of websites such as Sina Finance, East Money and Tonghuashun are crawled (110,000 records in total). Chinese characters and Chinese punctuation marks in the reports are then extracted with regular expressions. Numbers and symbols are discarded because they are difficult to generate directly with high accuracy, so generating them would not give usable results. The text data are then segmented into individual words with the jieba tokenizer to obtain a word set. To improve computational efficiency, only words that occur more than 5 times in the word set are put into the research report dictionary. In addition, in the dictionary construction step, a start tag and an end tag are added to the research report dictionary. Specifically, the first four keys of the research report dictionary (a key can be understood as an element of the dictionary) are the mask tag (mask), the unknown tag (unk), the start tag (start) and the end tag (end). The mask tag (mask) is used to mask out unneeded information. The unknown tag (unk) stands for words in the word set that are not in the dictionary, usually infrequent but meaningful words such as institution names and person names. The start tag (start) and the end tag (end) are added at the beginning and end of each text record, respectively, to mark where the text begins and ends. The keys of the dictionary are words (including single punctuation marks), and the values of the dictionary are the serial numbers of the words.
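Below is a minimal sketch of this dictionary construction. It assumes the reports have already been segmented into word lists; the embodiment uses jieba for that step. Several details are illustrative assumptions: the regular expression, the token spellings such as `<mask>`, and the use of the mask id as padding. The frequency rule keeps words that occur more than 5 times, and the first four ids are the mask, unknown, start and end tags described above. The `encode` helper also shows the truncate-or-pad step used later in training (news 30 words, reports 200 words).

```python
import re
from collections import Counter

# The first four dictionary keys, in the order described: mask, unknown, start, end.
SPECIAL_TOKENS = ["<mask>", "<unk>", "<start>", "<end>"]

# Illustrative filter: runs of CJK ideographs and common full-width Chinese punctuation.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff，。！？；：、“”‘’（）《》]+")


def clean(text):
    """Keep only Chinese characters and Chinese punctuation; digits and Latin text are dropped."""
    return "".join(CHINESE_RUN.findall(text))


def build_vocab(segmented_reports, threshold=5):
    """Map each word to a serial number; a word enters only if it occurs more than `threshold` times."""
    counts = Counter(word for report in segmented_reports for word in report)
    vocab = {token: index for index, token in enumerate(SPECIAL_TOKENS)}
    for word, n in counts.most_common():
        if n > threshold and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(words, vocab, max_len):
    """Words -> ids; unknown words become <unk>, then truncate or pad with <mask> (assumed pad id)."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in words][:max_len]
    return ids + [vocab["<mask>"]] * (max_len - len(ids))
```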
In addition, the research report generation method further comprises:
performing event entity recognition on the research reports obtained in the research report acquisition step to obtain the corresponding event texts. The plurality of research reports and the corresponding event texts form a training dataset for training the outline generation model and the report generation model. Taking financial research reports as an example, once the macro research reports have been crawled, they are processed with rule matching and part-of-speech matching to obtain news texts. Text containing event-related content is extracted as the news, so the data become a set of one-to-one news and macro research report pairs that serves as the training dataset. Specifically, each research report can be split into paragraphs and saved as an array. The text in the array is then matched against preset event rules to obtain event paragraphs, which serve as the news text.
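The paragraph-splitting and rule-matching step might look like the sketch below. The keyword patterns in `EVENT_RULES` are hypothetical placeholders, because the actual event rules are not given. The part-of-speech matching mentioned above is also left out.

```python
import re

# Hypothetical event rules: the patent does not list the rules it actually uses.
EVENT_RULES = [re.compile(pattern) for pattern in (r"发布", r"宣布", r"公布", r"央行")]


def split_paragraphs(report):
    """Split a report into an array of non-empty paragraphs."""
    return [p.strip() for p in report.split("\n") if p.strip()]


def extract_event_text(report):
    """Concatenate the paragraphs that match any event rule; the result serves as the news text."""
    hits = [p for p in split_paragraphs(report) if any(rule.search(p) for rule in EVENT_RULES)]
    return "".join(hits)


def build_dataset(reports):
    """Pair each report with its extracted event text, skipping reports with no matching paragraph."""
    pairs = []
    for report in reports:
        event = extract_event_text(report)
        if event:
            pairs.append((event, report))
    return pairs
```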
This embodiment provides two outline generation models. The first outline generation model is shown in Fig. 2, which is a flowchart of the first specific embodiment of the research report generation method of the present invention. Taking financial news as the event text, the first outline generation model comprises:
performing vector representation on the event text and the start tag (<start>) respectively according to the research report dictionary to obtain an event vector and a start-tag vector; in this embodiment, word embedding converts each word of a text into a fixed-length vector representation;
obtaining the hidden-layer state of the event text from the event vector with a bidirectional LSTM network, and obtaining the hidden-layer state of the start tag from the start-tag vector with a unidirectional LSTM network; a unidirectional LSTM network needs less processing time, whereas a bidirectional LSTM network better captures semantic dependencies in both directions;
obtaining the report outline according to the hidden-layer state of the event text, the hidden-layer state of the start tag and the attention mechanism;
then performing vector representation on the previous word output by the report generation step according to the research report dictionary to obtain a word vector; word embedding can likewise be used to convert the word into a word vector;
obtaining the hidden-layer state of the word from the word vector with the unidirectional LSTM network;
updating the report outline according to the hidden-layer state of the word, the hidden-layer state of the event text and the attention mechanism.
In this embodiment, the report generation model comprises a VAE generative model. In the first outline generation model, the attention mechanism yields a sequence of probability distributions over words. From these distributions, the outline generation model selects the combination of dictionary words with the optimal probability and outputs it as the report outline. Similarly, in the report generation model, the VAE generative model yields another probability distribution, from which the report generation model selects the combination of dictionary words with the optimal probability and outputs it as the research report. Specifically, referring to Fig. 2, the beginning of a research report is marked with <start> and its end with <end>. Starting from the <start> tag, the first outline generation model and the VAE generative model predict the probability distribution of the next word of the research report. Words are then selected from the research report dictionary according to that distribution to generate the research report.

In brief, when the method runs, the news and a <start> tag are input first. After processing by the outline generation model and the report generation model, the first word of the research report is output. This first word is fed back to the input in place of the start tag to generate the second output. The second output is in turn fed back in place of the first, and so on, with words output one by one. Generation stops when the end tag <end> is output, and the research report is then complete. Each word currently output depends on the previously output word, and the report outline is updated according to the previous word and the input news; this effectively improves the quality of the generated research report.

When words are selected from the research report dictionary according to the probability distribution, a global search for the optimal solution can pick the most probable word in the dictionary, i.e. output only one word at each step. However, when the research report dictionary is very large, the space efficiency of this global search is very low. A beam search algorithm can therefore be used to improve search efficiency. Beam search uses the beam size parameter to limit the number of candidate words kept at each step. It considers not only the probability of a single word but also the probability of neighboring words occurring together. Therefore, after processing by the VAE generative model, beam search obtains a preset number of the most probable candidate words. In this embodiment, beam size = 3, i.e. the preset number is 3, so the three most likely results are kept at each step. Each prediction yields 3 outputs, which are fed back to the input for the next prediction. After prediction finishes, one final output is selected from each set of 3 outputs according to the probability-optimality principle, taking into account all the words output for the research report, to obtain the final research report.
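Here is a sketch of beam search as described above, with beam_size = 3 as in this embodiment. It is a standard beam search over summed log-probabilities, not necessarily the exact procedure in the patent. `step` is a hypothetical stand-in for the trained models and returns one probability per dictionary entry for a given prefix. Hypotheses that end in the end tag are set aside as finished. No length normalization is applied, which is a simplification.

```python
import math
from typing import Callable, List, Sequence


def beam_search(step: Callable[[List[int]], Sequence[float]],
                start_id: int, end_id: int,
                beam_size: int = 3, max_len: int = 200) -> List[int]:
    """Keep the beam_size best partial sequences at every step.

    Scores are summed log-probabilities, so each word is judged together
    with the words before it rather than on its own probability alone.
    """
    beams = [([start_id], 0.0)]  # (word ids so far, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word_id, p in enumerate(step(seq)):
                if p > 0:
                    candidates.append((seq + [word_id], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end_id:
                finished.append((seq, score))  # complete hypothesis
            else:
                beams.append((seq, score))  # still growing
        if not beams:
            break
    finished.extend(beams)  # include unfinished beams if max_len was reached
    best_seq, _ = max(finished, key=lambda c: c[1])
    return [w for w in best_seq[1:] if w != end_id]
```

With `beam_size=1` the function reduces to greedy per-step selection. A wider beam can recover a sequence whose first word was not the single most probable one.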
The outline generation model and the VAE generative model must be trained before formal use. Fig. 3 is a schematic diagram of the training process of Fig. 2. The training process of the first outline generation model is described below, taking financial news as the event text:
First, the input news is defined as X, where x denotes a word in the news:
$$X = (x_1, \dots, x_m) \tag{1}$$
Decoding has two stages. The first stage generates the outline O of the research report, where o is a word in the outline and serves as the decoding of the latent variable. The second stage generates the final research report Y, where y is a word in the report. With the length of the generated text defined as L:
$$O = (o_1, \dots, o_L) \tag{2}$$
$$Y = (y_1, \dots, y_L) \tag{3}$$
Next, Chinese text is extracted from the news and macro research reports in the training dataset by regular-expression matching. The texts are segmented into words with jieba, and a preliminary length count is made. The resulting length statistics are shown in Table 1:
Table 1: Length statistics of news and research reports
Then, for training, the news and research reports are truncated or padded to fixed lengths; for example, the news length is set to 30 words and the research report length to 200 words. The input news text and research report text are converted into vectors by word embedding to obtain news vectors and research report vectors; a single Embedding layer is used directly here. A unidirectional LSTM network cannot encode information from back to front when modeling a sentence, whereas a bidirectional LSTM network better captures bidirectional semantic dependencies. The embedded news vectors are therefore fed into a bidirectional LSTM network. The hidden-layer states of the news obtained from the bidirectional LSTM network are denoted H, where h_t is the hidden state at time step t (formula (4)).
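A standard bidirectional LSTM formulation consistent with this description is given below. It is an assumed form, not a transcription of the patent's formula (4), and e(·) (the embedding lookup) is introduced notation:

$$\overrightarrow{h}_t = \mathrm{LSTM}\big(e(x_t), \overrightarrow{h}_{t-1}\big), \qquad \overleftarrow{h}_t = \mathrm{LSTM}\big(e(x_t), \overleftarrow{h}_{t+1}\big)$$
$$h_t = \big[\overrightarrow{h}_t ; \overleftarrow{h}_t\big], \qquad H = (h_1, \dots, h_m)$$

Concatenating the two directions lets each h_t reflect the words both before and after x_t.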
To generate high-quality research reports, the model's encoding and decoding must fully absorb the structure and content of the corresponding research reports. In the Decoder, an LSTM network predicts the probability distribution over the vocabulary for the next word of the research report. The hidden-layer states of the macro research report are denoted S. The hidden state at each time step depends on the input and the hidden state at the previous time step (formula (5)).
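Under standard LSTM assumptions, the decoder recurrence described here can be written as follows. y_{t-1} is the previous word of the macro research report, and e(·) denotes the embedding lookup. This is an assumed form of formula (5):

$$s_t = \mathrm{LSTM}\big(e(y_{t-1}), s_{t-1}\big)$$

During training, y_{t-1} is the true previous word. At generation time, it is the word the model output at the previous step.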
The attention mechanism computes attention scores from the hidden states of the news and of the macro research report (formula (6)); multiplicative attention is used here. The attention weights are computed with the softmax function (formula (7)). The context vector is the weighted average of the news hidden states under these weights: each news hidden state is multiplied by its attention weight and the products are summed (formula (8)).
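Multiplicative attention is commonly written in the Luong "general" form below. W_a is an assumed parameter matrix, and the patent's formulas (6) to (8) may differ in detail:

$$\mathrm{score}(h_i, s_t) = h_i^{\top} W_a\, s_t$$
$$\alpha_{t,i} = \frac{\exp\big(\mathrm{score}(h_i, s_t)\big)}{\sum_{j=1}^{m} \exp\big(\mathrm{score}(h_j, s_t)\big)}$$
$$c_t = \sum_{i=1}^{m} \alpha_{t,i}\, h_i$$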
The context vector is then concatenated with the hidden state of the macro research report to compute the attentional hidden state. The prediction for each input word is computed from this attentional hidden state (formula (9)), where W_c is a model parameter.
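The following NumPy sketch runs one decoder step of the attention computation described in formulas (6) to (9). The text names only W_c; the rest are assumptions. These assumptions are the multiplicative score h_i^T W_a s_t, the attentional state tanh(W_c [c_t; s_t]), and an output projection W_s followed by softmax to give the word distribution.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_step(s_t, H, W_a, W_c, W_s):
    """One decoder step of multiplicative attention (cf. formulas (6) to (9)).

    s_t: decoder hidden state, shape (d_s,)
    H:   encoder hidden states h_1..h_m, shape (m, d_h)
    W_a: (d_h, d_s) score matrix (assumed "general" form)
    W_c: (d, d_h + d_s) combination matrix, the W_c named in the text
    W_s: (V, d) assumed projection onto the V dictionary entries
    """
    scores = H @ W_a @ s_t                                # score_i = h_i^T W_a s_t
    alpha = softmax(scores)                               # attention weights, sum to 1
    c_t = alpha @ H                                       # context vector: weighted average of h_i
    s_tilde = np.tanh(W_c @ np.concatenate([c_t, s_t]))  # attentional hidden state
    p_words = softmax(W_s @ s_tilde)                      # probability of each dictionary word
    return alpha, c_t, p_words
```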
The objective function of the first decoding stage is given by formula (10), where P denotes probability.
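If the first stage is trained by maximum likelihood, its objective takes the usual negative log-likelihood form over the outline words. This is an assumed form of formula (10):

$$\mathcal{L}_1 = -\sum_{t=1}^{L} \log P\big(o_t \mid o_1, \dots, o_{t-1}, X\big)$$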
At this point, a probability distribution over the words in the research report dictionary has been obtained from the input news, and the report outline can be generated from this distribution.
In the second decoding stage, which generates the final research report, the inputs are the news and the outline generated from the news.
The decoder uses a variational auto-encoder (VAE) model. It generates the target variable by fusing the dual inputs, news X and outline O, and learns the posterior probability distribution of the latent variable z, P(z | X, O), which can be rewritten as formula (11).
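Formula (11) is presumably Bayes' rule applied to the latent variable. Stated as an assumption:

$$P(z \mid X, O) = \frac{P(X, O \mid z)\, P(z)}{P(X, O)}$$

The evidence P(X, O) in the denominator requires integrating over all z and is generally intractable. A VAE therefore approximates the posterior with q(z | X, O) and optimizes the lower bound given in formula (12).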
The latent distribution is assumed to be a standard normal distribution; a sample is drawn at random from it and decoded back into the original text. Training measures a reconstruction loss and a regularization loss, and the ELBO (evidence lower bound) can be expressed as:
$$\log P(X,O) \ge \mathbb{E}_{q(z \mid X,O)}\big[\log p(X,O \mid z)\big] - \mathrm{KL}\big(q(z \mid X,O)\,\|\,p(z)\big) \tag{12}$$
The right-hand side of the inequality is the ELBO. The first term samples the attention from P(z | X, O) and uses the sampled attention as the decoder input to compute a cross-entropy loss. The second term uses the KL divergence to measure how similar two probability distributions are, which keeps the posterior distribution close to the prior distribution.
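Below is a minimal NumPy sketch of two ingredients of formula (12). It assumes, as is standard for VAEs but not stated in the text, that q(z | X, O) is a diagonal Gaussian whose mean `mu` and log-variance `log_var` come from the encoder. The sketch covers the reparameterization trick used to sample z and the closed-form KL divergence to a standard normal prior. The reconstruction (cross-entropy) term depends on the decoder, so it is passed in as a number.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed form of KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)


def negative_elbo(recon_nll, mu, log_var):
    """Loss to minimize: reconstruction cross-entropy plus the KL regularizer (formula (12) negated)."""
    return recon_nll + kl_to_standard_normal(mu, log_var)
```

The KL term is zero exactly when the posterior equals the prior (mu = 0, log_var = 0), and it grows as the posterior moves away. This is the "keep the posterior close to the prior" effect described above.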
The objective function of the second decoding stage is given by formula (13).
At this point, a probability distribution over words has again been obtained, this time from the news and the report outline, and the final research report can be generated from this distribution.
The global objective function is the sum of the loss functions of the two decoding stages (formula (14)).
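Suppose the second-stage loss is the negative of the ELBO in formula (12); this is an assumption about the form of formula (13). The second-stage loss and the global objective can then be written as:

$$\mathcal{L}_2 = -\mathbb{E}_{q(z \mid X,O)}\big[\log p(X,O \mid z)\big] + \mathrm{KL}\big(q(z \mid X,O)\,\|\,p(z)\big)$$
$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$$

Here \mathcal{L}_1 denotes the first-stage objective. Minimizing \mathcal{L}_2 is equivalent to maximizing the ELBO.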
The training dataset of one-to-one news and macro research report pairs obtained above is input into the model of Fig. 3 and trained for 50 epochs, after which the final model parameters can be tuned and fixed. Specifically, taking global optimal search as an example, only one word is output at a time during research report generation. Each macro research report is fed into the model in the order start tag, macro research report, end tag, and training proceeds word by word. For example, when the start tag is input first, the model produces an output, which is the predicted first word of the research report. This output word is compared with the first word of the real research report, and the model parameters are modified according to the comparison. The first word of the macro research report is then input into the model, producing another output, which is the predicted second word. It is compared with the second word of the real macro research report, and the model parameters are adjusted again. This process continually reduces the error between the model output and the real words. After the model has been trained on many news and macro research report pairs, its structure and parameters are saved. The model with the final parameters can then make predictions for newly input news. Predicting with the model of Fig. 2 on input news yields the research report of Fig. 4, which is a schematic diagram of an example research report generated with the research report generation method of the present invention.
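The word-by-word training scheme above is usually implemented with teacher forcing. The true previous word is fed in at every position, and the prediction is scored against the true next word. The sketch below uses hypothetical helper names. It builds the shifted input/target pair from the start tag + report + end tag layout and computes the average cross-entropy that the parameter updates reduce.

```python
import math


def teacher_forcing_pair(report_ids, start_id, end_id):
    """Shifted (input, target) pair: the model sees <start> y1..yL and must predict y1..yL <end>."""
    decoder_input = [start_id] + list(report_ids)
    target = list(report_ids) + [end_id]
    return decoder_input, target


def sequence_cross_entropy(step_probs, target):
    """Mean negative log-probability assigned to each true next word; training lowers this error."""
    return -sum(math.log(probs[t]) for probs, t in zip(step_probs, target)) / len(target)
```

Because every position sees the true prefix, all predictions for one report can be scored in a single pass rather than waiting for the model's own outputs.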
It is the method flow diagram of second of specific embodiment of research report generation method in the present invention with reference to Fig. 5, Fig. 5;
In Fig. 5, by taking the news of financial field as an example, second of outline generates model and includes: event text
Framework for reporting is obtained according to event text, beginning label and Transformer model;
It is big according to a upper word, event text and Transformer the model modification report of report generation step output
Guiding principle.Wherein, report generation model is also to generate model using VAE.And Transformer model is equivalent to and LSTM network is substituted
With attention mechanism, the attention score of Transformer model is solved such as formula (15),
where Q, K and V are three matrices obtained by transforming the input X. As with Fig. 2, the model of Fig. 5 works as follows. After the news and the start label <start> are input, the outline generation model and the report generation model process them and output the first word of the research report. This first output word is fed back to the input in place of the start label, and so on, so that the model of Fig. 5 outputs the words of the research report one by one. Likewise, the outline generation model and the report generation model may select words by searching for the globally optimal solution, and the beam search algorithm can also be applied to the model of Fig. 5 to improve search efficiency.
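Formula (15), cited above for the Transformer attention score, is missing from this text. The standard scaled dot-product attention of the original Transformer is Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where d_k is the key dimension. It is an assumption that formula (15) takes this form. The pure-Python sketch below uses matrices as lists of rows and omits the learned projections that turn X into Q, K and V.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights
```

When all keys are identical the weights are uniform, so each output row is the mean of the rows of V. A query aligned with one key concentrates almost all of its weight on that key's value.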
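The word-by-word decoding loop can be sketched as follows, with beam search as the alternative to keeping only the single most probable word at each step. This is a generic beam search over a hypothetical `next_probs(tokens)` callback standing in for the outline and report models. It scores sequences by summed log-probability without length normalisation. The toy probability table is invented purely for illustration. With `beam_width=1` the search reduces to greedy decoding, in which each output word is fed back as the next input.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    next_probs: Callable[[List[str]], Dict[str, float]],
    start: str = "<start>",
    end: str = "<end>",
    beam_width: int = 2,
    max_len: int = 10,
) -> List[str]:
    """Keep the `beam_width` best partial sequences by summed log-probability."""
    beams: List[Tuple[List[str], float]] = [([start], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            # Each candidate word is appended and fed back as input for the next step.
            for word, p in next_probs(tokens).items():
                if p > 0:
                    candidates.append((tokens + [word], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished beams still count if max_len is reached
    best_tokens, _ = max(finished, key=lambda c: c[1])
    return [w for w in best_tokens if w not in (start, end)]


# Invented toy distribution: greedy picks "rates" first, but "stocks rally" is more probable overall.
TOY_TABLE = {
    ("<start>",): {"rates": 0.6, "stocks": 0.4},
    ("<start>", "rates"): {"rise": 0.4, "fall": 0.3, "stay": 0.3},
    ("<start>", "stocks"): {"rally": 0.9, "dip": 0.1},
}


def toy_next_probs(tokens: List[str]) -> Dict[str, float]:
    return TOY_TABLE.get(tuple(tokens), {"<end>": 1.0})
```

With this table, `beam_search(toy_next_probs, beam_width=1)` returns `["rates", "rise"]` (probability 0.24). With `beam_width=2` it returns `["stocks", "rally"]` (probability 0.36), which shows why a wider beam can recover a better sequence than greedy selection.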
Referring to Fig. 6, Fig. 6 is a schematic diagram of the training process of Fig. 5. The training data set is fed into the model in sequence for training. As with Fig. 4, each macro research report is input into the model in the form "<start> label, macro research report, <end> label", and the model parameters are adjusted by comparing the model output with the words of the real research report.
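The input format described above, a report wrapped in start and end labels, can be illustrated by turning one report into a decoder-input/target pair for training. The sketch below assumes every word is already in the research report dictionary, represented here as a word-to-id mapping. The helper names and the example ids are hypothetical.

```python
from typing import Dict, List, Tuple

START, END = "<start>", "<end>"


def encode_report(tokens: List[str], word_to_id: Dict[str, int]) -> List[int]:
    """Wrap a report as <start> ... <end> and map each word to its dictionary id.

    Assumes every word is in the dictionary (a KeyError is raised otherwise).
    """
    return [word_to_id[w] for w in [START, *tokens, END]]


def make_teacher_forcing_pair(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Decoder input drops the last id; the target is the same sequence shifted by one."""
    return ids[:-1], ids[1:]
```

For the dictionary `{"<start>": 0, "<end>": 1, "rates": 2, "rise": 3}`, the report `["rates", "rise"]` encodes to `[0, 2, 3, 1]`. That gives the input `[0, 2, 3]` and the target `[2, 3, 1]`, so each position's target is the word that follows its input.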
Embodiment 2
Embodiment 2 is based on Embodiment 1 and provides a research report generation device. Referring to Fig. 7, Fig. 7 is a structural block diagram of an embodiment of the research report generation device of the present invention. The research report generation device includes:
A research report acquisition module, configured to obtain multiple research reports from multiple information sources;
A dictionary acquisition module, configured to perform data preprocessing and feature selection on the multiple research reports to construct a research report dictionary;
An outline acquisition module, configured to obtain the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, where the outline generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the report outline;
A report generation module, configured to obtain a research report according to the event text, the report outline, the research report dictionary and a report generation model, where the report generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the research report.
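The four modules of the device can be pictured as a small pipeline. The sketch below is hypothetical. It assumes whitespace tokenisation and a simple word-frequency threshold as the "feature selection", neither of which the patent specifies. It takes the outline and report models as injected callables, and it adds the start and end labels to the dictionary as claim 4 describes. All function and class names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List

START, END = "<start>", "<end>"


def acquire_reports(sources: Iterable[Callable[[], List[str]]]) -> List[str]:
    """Research report acquisition module: collect reports from several sources."""
    reports: List[str] = []
    for fetch in sources:
        reports.extend(fetch())
    return reports


def build_dictionary(reports: List[str], min_count: int = 2) -> List[str]:
    """Dictionary module: keep words seen at least `min_count` times (assumed criterion),
    plus the start and end labels."""
    counts = Counter(word for report in reports for word in report.split())
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return [START, END] + kept


@dataclass
class ReportGenerator:
    """Outline module followed by report module; both models are injected callables."""

    outline_model: Callable[[str, List[str]], List[str]]
    report_model: Callable[[str, List[str], List[str]], List[str]]
    dictionary: List[str]

    def generate(self, event_text: str) -> List[str]:
        outline = self.outline_model(event_text, self.dictionary)
        return self.report_model(event_text, outline, self.dictionary)
```

Injecting the models keeps the device structure independent of whether the outline model is the LSTM/attention variant or the Transformer variant described above.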
For the specific working process of the research report generation device, refer to the description of Embodiment 1; it is not repeated here. The research report generation device can generate research reports automatically, freeing up manpower and improving the efficiency of research report output.
Embodiment 3
Embodiment 3 is based on Embodiment 1 and provides a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions that cause a computer to execute the research report generation method described in Embodiment 1. For a specific description of the research report generation method, refer to Embodiment 1; it is not repeated here.
The above describes preferred implementations of the present invention, but the invention is not limited to the above embodiments. Those skilled in the art may also make various equivalent variations or substitutions without departing from the spirit of the invention, and these equivalent variations or substitutions are all included in the scope defined by the claims of the present application.
Claims (10)
1. A research report generation method, characterized by comprising:
A research report acquisition step: obtaining multiple research reports from multiple information sources;
A dictionary acquisition step: performing feature selection on the multiple research reports to construct a research report dictionary;
An outline acquisition step: obtaining the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the report outline;
A report generation step: obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the research report.
2. The research report generation method according to claim 1, characterized in that the report generation model selects and outputs words one by one from the research report dictionary to generate the research report.
3. The research report generation method according to claim 2, characterized in that the outline generation model updates the report outline according to the event text, the report outline, the research report dictionary and the previous word output by the report generation step.
4. The research report generation method according to claim 3, characterized in that the dictionary acquisition step further comprises:
Adding a start label and an end label to the research report dictionary.
5. The research report generation method according to claim 4, characterized in that the outline generation model comprises:
Performing vector representation of the event text and the start label according to the research report dictionary to obtain an event vector and a start label vector;
Obtaining the hidden layer state of the event text and the hidden layer state of the start label according to the event vector, the start label vector and an LSTM network;
Obtaining the report outline according to the hidden layer state of the event text, the hidden layer state of the start label and an attention mechanism;
Performing vector representation of the previous word output by the report generation step according to the research report dictionary to obtain a word vector;
Obtaining the hidden layer state of the word according to the word vector and the LSTM network;
Updating the report outline according to the hidden layer state of the word, the hidden layer state of the event text and the attention mechanism.
6. The research report generation method according to claim 4, characterized in that the outline generation model comprises:
Obtaining the report outline according to the event text, the start label and a Transformer model;
Updating the report outline according to the previous word output by the report generation step, the event text and the Transformer model.
7. The research report generation method according to any one of claims 1 to 6, characterized in that the report generation model comprises a VAE generation model.
8. The research report generation method according to any one of claims 1 to 6, characterized in that the research report generation method further comprises:
Performing event entity recognition on the research reports obtained in the research report acquisition step to obtain the corresponding event texts, wherein the multiple research reports and the corresponding event texts form a training data set, and the training data set is used to train the outline generation model and the report generation model.
9. A research report generation device, characterized by comprising:
A research report acquisition module, configured to obtain multiple research reports from multiple information sources;
A dictionary acquisition module, configured to perform feature selection on the multiple research reports to construct a research report dictionary;
An outline acquisition module, configured to obtain the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the report outline;
A report generation module, configured to obtain a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to a probability-optimality principle to form a word sequence as the research report.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the research report generation method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910513763.9A CN110362797B (en) | 2019-06-14 | 2019-06-14 | Research report generation method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110362797A true CN110362797A (en) | 2019-10-22 |
CN110362797B CN110362797B (en) | 2023-10-13 |
Family ID: 68216086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910513763.9A Active CN110362797B (en) | 2019-06-14 | 2019-06-14 | Research report generation method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110362797B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140142939A1 (en) * | 2012-11-21 | 2014-05-22 | Algotes Systems Ltd. | Method and system for voice to text reporting for medical image software |
CN109635302A (en) * | 2018-12-17 | 2019-04-16 | 北京百度网讯科技有限公司 | Method and apparatus for training a text summarization generation model |
Non-Patent Citations (2)
Title |
---|
DARRYL GRIFFITHS ET AL.: "A Self-Report Study that Gauges Perceived and Induced Emotion with Music", 2015 INTERNET TECHNOLOGIES AND APPLICATIONS (ITA) * |
Wu Jie et al.: "Quantitative evaluation method for operating experience reports of nuclear power plants", Power Equipment (发电设备) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046629A (en) * | 2019-12-16 | 2020-04-21 | 北大方正集团有限公司 | Outline display method, device and equipment |
CN111046629B (en) * | 2019-12-16 | 2022-03-01 | 北大方正集团有限公司 | Outline display method, device and equipment |
CN112242185A (en) * | 2020-09-09 | 2021-01-19 | 山东大学 | Medical image report automatic generation method and system based on deep learning |
CN112417846A (en) * | 2020-11-25 | 2021-02-26 | 中译语通科技股份有限公司 | Text automatic generation method and device, electronic equipment and storage medium |
WO2022110454A1 (en) * | 2020-11-25 | 2022-06-02 | 中译语通科技股份有限公司 | Automatic text generation method and apparatus, and electronic device and storage medium |
CN113160963A (en) * | 2020-12-18 | 2021-07-23 | 中电云脑(天津)科技有限公司 | Event determination method and device, electronic equipment and storage medium |
CN114490778A (en) * | 2022-02-15 | 2022-05-13 | 北京固加数字科技有限公司 | Financial research and report automatic generation system and method |
Also Published As
Publication number | Publication date |
---|---|
CN110362797B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109359293B (en) | | Neural-network-based Mongolian named entity recognition method and recognition system |
CN110362797A (en) | | Research report generation method and related device |
CN111241294B (en) | | Relation extraction method using a graph convolutional network based on dependency analysis and keywords |
CN110348016B (en) | | Text summary generation method based on a sentence-correlation attention mechanism |
CN111177394B (en) | | Knowledge graph relation data classification method based on a syntactic attention neural network |
CN109635124B (en) | | Distant-supervision relation extraction method combined with background knowledge |
Yu et al. | | Transition-based neural RST parsing with implicit syntax features |
CN107844469A (en) | | Text simplification method based on a word vector query model |
CN105843801B (en) | | Construction system for a multi-translation parallel corpus |
WO2018218708A1 (en) | | Deep-learning-based public opinion hotspot category classification method |
CN110222188A (en) | | Multi-task-learning-based company announcement processing method and server |
CN110287482B (en) | | Semi-automatic word segmentation corpus labeling training device |
CN111858932A (en) | | Transformer-based multi-feature Chinese and English sentiment classification method and system |
CN109800310A (en) | | Electric power O&M text analysis method based on structured representation |
CN111881677A (en) | | Address matching algorithm based on a deep learning model |
CN107798624A (en) | | Technical tag recommendation method for software Q&A communities |
CN109858042A (en) | | Translation quality determination method and device |
CN103631858A (en) | | Science and technology project similarity calculation method |
CN104699797A (en) | | Structured webpage data parsing method and device |
CN108920446A (en) | | Engineering document processing method |
CN111666373A (en) | | Chinese news classification method based on Transformer |
CN115357719A (en) | | Power audit text classification method and device based on an improved BERT model |
CN117112850A (en) | | Address standardization method, device, equipment and storage medium |
CN109325243A (en) | | Character-level Mongolian word segmentation method and system based on a sequence model |
CN112036179B (en) | | Electric power plan information extraction method based on text classification and semantic frame |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||