CN110362797A - Research report generation method and related device - Google Patents

Info

Publication number
CN110362797A
Authority
CN
China
Prior art keywords
research report
report
research
word
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910513763.9A
Other languages
Chinese (zh)
Other versions
CN110362797B (en)
Inventor
胡文馨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201910513763.9A priority Critical patent/CN110362797B/en
Publication of CN110362797A publication Critical patent/CN110362797A/en
Application granted granted Critical
Publication of CN110362797B publication Critical patent/CN110362797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a research report generation method and related device. A research report dictionary is constructed from multiple acquired research reports, and a corresponding research report is then output automatically according to an event text, the research report dictionary, an outline generation model and a report generation model. The outline generation model and the report generation model each select words from the research report dictionary according to a probability-optimal principle, forming word sequences that serve as the report outline and the research report respectively. This overcomes the technical problem in the prior art that writing reports manually consumes a great deal of effort and labor cost, and a research report generated from a report outline is of higher quality.

Description

Research report generation method and related device
Technical field
The present invention relates to the field of report generation, and in particular to a research report generation method and related device.
Background
LSTM (Long Short-Term Memory) is a long short-term memory network, a type of recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
A variational auto-encoder (Variational Auto-Encoder, VAE) is a deep generative model.
The financial field involves writing a large number of reports in fixed formats, such as research reports, prospectuses and investment letters of intent, and reports for different industries and different companies have different requirements. Writing these reports usually demands high timeliness and a large amount of data collection and analysis. In the prior art, the data is usually collected and analyzed and the report written by hand, so labor cost is high and a great deal of human effort is consumed.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the invention is to provide a research report generation method and related device for automatically generating a research report from an event text.
The technical solution adopted by the invention is as follows:
In a first aspect, the present invention provides a research report generation method, comprising:
Research report acquisition step: obtaining multiple research reports from multiple information sources;
Dictionary acquisition step: performing data preprocessing and feature selection on the multiple research reports to construct a research report dictionary;
Outline acquisition step: obtaining the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimal principle to form a word sequence that serves as the report outline;
Report generation step: obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to the probability-optimal principle to form a word sequence that serves as the research report.
Further, the report generation model selects and outputs words from the research report dictionary one at a time to generate the research report.
Further, the outline generation model updates the report outline according to the event text, the report outline, the research report dictionary and the previous word output by the report generation step.
Further, the dictionary acquisition step also includes adding a start tag and an end tag to the research report dictionary.
Further, the outline generation model includes:
Converting the event text and the start tag into vector representations according to the research report dictionary, to obtain an event vector and a start tag vector;
Obtaining the hidden state of the event text and the hidden state of the start tag according to the event vector, the start tag vector and an LSTM network;
Obtaining the report outline according to the hidden state of the event text, the hidden state of the start tag and an attention mechanism.
Further, the outline generation model also includes:
Converting the previous word output by the report generation step into a vector representation according to the research report dictionary, to obtain a word vector;
Obtaining the hidden state of the word according to the word vector and the LSTM network;
Updating the report outline according to the hidden state of the word, the hidden state of the event text and the attention mechanism.
Further, the outline generation model includes:
Obtaining the report outline according to the event text, the start tag and a Transformer model;
Updating the report outline according to the previous word output by the report generation step, the event text and the Transformer model.
Further, the report generation model includes a VAE generative model.
Further, the research report generation method also includes:
Performing event entity recognition on the research reports obtained in the research report acquisition step to obtain the corresponding event texts, wherein the multiple research reports and their corresponding event texts form a training dataset, and the training dataset is used to train the outline generation model and the report generation model.
In a second aspect, the present invention provides a research report generation device, comprising:
A research report acquisition module, for obtaining multiple research reports from multiple information sources;
A dictionary acquisition module, for performing data preprocessing and feature selection on the multiple research reports to construct a research report dictionary;
An outline acquisition module, for obtaining the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimal principle to form a word sequence that serves as the report outline;
A report generation module, for obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to the probability-optimal principle to form a word sequence that serves as the research report.
In a third aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the research report generation method.
The beneficial effects of the present invention are:
The present invention constructs a research report dictionary from multiple acquired research reports and then automatically outputs the corresponding research report according to an event text, the research report dictionary, an outline generation model and a report generation model. The outline generation model and the report generation model each select words from the research report dictionary according to a probability-optimal principle, forming word sequences that serve as the report outline and the research report respectively. This overcomes the technical problem in the prior art that writing reports manually consumes a great deal of effort and labor cost, and a research report generated from a report outline is of higher quality.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the research report generation method of the present invention;
Fig. 2 is a flow chart of the first specific embodiment of the research report generation method of the present invention;
Fig. 3 is a schematic diagram of the training process of Fig. 2;
Fig. 4 is a schematic diagram of an example research report generated with the research report generation method of the present invention;
Fig. 5 is a flow chart of the second specific embodiment of the research report generation method of the present invention;
Fig. 6 is a schematic diagram of the training process of Fig. 5;
Fig. 7 is a structural block diagram of one embodiment of the research report generation device of the present invention.
Detailed description
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Embodiment 1
The main idea of the research report generation method in this embodiment is as follows: just as a writer drafts an outline before writing, an outline of the research report is first generated from the event text, and the final research report is then generated from the outline and the event text. Here, an event text is an event narrative text, that is, a collection of text describing one or more events (or problems). One example is financial event news (hereinafter, news) in the financial field: people carry out macro analysis based on the news to produce a macro research report. Another example is argumentative essay writing in the Chinese language paper of the college entrance examination: the prompt gives a text about some problem or event and requires an essay based on it, and the resulting essay is also equivalent to a research report in the sense used here. Specifically, referring to Fig. 1, which is a flow chart of one embodiment of the research report generation method of the present invention, the method includes:
Research report acquisition step: multiple research reports are obtained from multiple information sources. An information source may be the internet or a paper data source (such as a library). Taking the financial field as an example, research reports may be obtained from financial websites such as Sina Finance, Eastmoney and Tonghuashun, while essays may be obtained from libraries or online essay collections;
Dictionary acquisition step: data preprocessing and feature selection are performed on the acquired research reports to construct a research report dictionary. Data preprocessing includes segmenting the research reports into words; feature selection then counts how often each word occurs in the research reports and decides, based on that count, whether to put the word into the research report dictionary;
Outline acquisition step: the report outline corresponding to the event text is obtained according to the event text, the research report dictionary and the outline generation model. The outline generation model selects multiple words from the research report dictionary according to the probability-optimal principle to form a word sequence that serves as the report outline. Word selection may use a global search that outputs the word with the optimal probability, so that only one word is output at a time and the successive words form the report outline. Alternatively, a beam search algorithm may be used to obtain a preset number of words with the best probabilities at each step, so that a preset number of words is output each time; from the words output over multiple steps, multiple words are then selected according to the probability-optimal principle to form the word sequence that serves as the report outline;
Report generation step: a research report is obtained according to the event text, the report outline, the research report dictionary and the report generation model. The report generation model selects multiple words from the research report dictionary according to the probability-optimal principle to form a word sequence that serves as the research report. As with outline generation, words may be output using a global search for the optimal probability or selected with a beam search algorithm, which is not repeated here.
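The "global search" variant of word selection in the steps above can be sketched as a greedy loop that emits one word at a time. This is a minimal illustration only: `greedy_decode`, `toy_model` and `TOY_TABLE` are hypothetical names, and the toy lookup table stands in for the trained outline generation and report generation models, which are not reproduced here.

```python
from typing import Callable, Dict, List

START, END = "<start>", "<end>"


def greedy_decode(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    max_len: int = 50,
) -> List[str]:
    """Pick the single most probable word at every step until <end>."""
    words: List[str] = [START]
    for _ in range(max_len):
        probs = next_word_probs(words)    # distribution over the dictionary
        best = max(probs, key=probs.get)  # probability-optimal choice
        if best == END:
            break
        words.append(best)
    return words[1:]  # drop the start tag


# Toy stand-in for the trained models (an assumption for illustration):
# the next-word distribution depends only on the last word produced.
TOY_TABLE = {
    "<start>": {"rates": 0.7, "prices": 0.2, "<end>": 0.1},
    "rates": {"rise": 0.6, "fall": 0.3, "<end>": 0.1},
    "rise": {"<end>": 0.9, "rates": 0.1},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    return TOY_TABLE[prefix[-1]]


result = greedy_decode(toy_model)
print(result)
```

The `max_len` cap is an added safeguard so the loop terminates even if the model never produces the end tag.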
By constructing a research report dictionary from multiple acquired research reports, and then automatically outputting the corresponding research report according to the event text, the research report dictionary, the outline generation model and the report generation model, the present invention overcomes the technical problem in the prior art that writing reports manually consumes a great deal of effort and labor cost. It frees up manpower, improves the efficiency of producing research reports and reduces labor cost. Moreover, a research report generated from a report outline is of higher quality. Further, the overall framework of the research report generation method is an Encoder-Decoder (encoding-decoding) structure: encoding converts the input sequence into a fixed-length vector, and decoding converts that fixed vector into an output sequence. That is, a sequence is input and a sequence is output. The Encoder turns the event text sequence into a fixed-length vector representation, and the Decoder turns this fixed-length vector representation into a variable-length text sequence, namely the research report.
Taking research reports in the financial field as an example, the acquisition of research reports and the construction of the research report dictionary are described below:
First, the research report texts in the macroeconomy sections of websites such as Sina Finance, Eastmoney and Tonghuashun are crawled (110,000 items in total). The Chinese characters and Chinese punctuation marks in the research reports are then extracted with regular expressions. Because numbers and symbols are difficult to generate directly with high accuracy, they are discarded. The text data is then segmented into individual words with jieba to obtain a word set. To improve computational efficiency, only words that occur more than 5 times in the word set are put into the research report dictionary. In addition, in the dictionary acquisition step, a start tag and an end tag are added to the research report dictionary. Specifically, the first four keys of the research report dictionary (a key can be understood as an element of the dictionary) are the mask tag (mask), the unknown tag (unk), the start tag (start) and the end tag (end). The mask tag (mask) masks out information that is not needed. The unknown tag (unk) represents words in the word set that are not in the dictionary, typically words that occur infrequently but carry meaning, such as institution names and personal names. The start tag (start) and end tag (end) are added at the beginning and end of each piece of text data, respectively, to mark where the text data begins and ends. The keys of the dictionary are words (including individual punctuation marks), and the values of the dictionary are the serial numbers of the words.
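The dictionary construction just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's code: the function names, the exact punctuation set in the regular expression, and the choice to number the four special tags 0 to 3 are all assumptions. To keep the example free of third-party dependencies, the default tokenizer splits text into single characters; the source uses jieba, and `jieba.lcut` could be passed as `tokenize` instead.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List

# The first four keys of the dictionary, in the order given in the text.
SPECIAL_TOKENS = ["<mask>", "<unk>", "<start>", "<end>"]

# Keep Chinese characters and a small set of Chinese punctuation marks;
# digits, Latin letters and other symbols are discarded.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff，。！？；：、（）《》“”‘’]")


def clean(text: str) -> str:
    return NON_CHINESE.sub("", text)


def build_dictionary(
    reports: Iterable[str],
    tokenize: Callable[[str], List[str]] = list,  # e.g. jieba.lcut
    min_count: int = 5,
) -> Dict[str, int]:
    """Map each word occurring more than min_count times to a serial number."""
    counts: Counter = Counter()
    for report in reports:
        counts.update(tokenize(clean(report)))
    vocab = {token: index for index, token in enumerate(SPECIAL_TOKENS)}
    for word, count in counts.most_common():
        if count > min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Wrap a token list in start/end tags and map unseen words to <unk>."""
    unk = vocab["<unk>"]
    return [vocab["<start>"]] + [vocab.get(t, unk) for t in tokens] + [vocab["<end>"]]


reports = ["经济增长123，经济稳定。"] * 6
vocab = build_dictionary(reports)
print(encode(["经", "股"], vocab))
```

Here "股" never appears in the toy corpus, so it is encoded with the serial number of the unknown tag.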
In addition, the research report generation method further includes:
Event entity recognition is performed on the research reports obtained in the research report acquisition step to obtain the corresponding event texts. The multiple research reports and their corresponding event texts form a training dataset, which is used to train the outline generation model and the report generation model. Taking research reports in the financial field as an example, after the macro research reports have been crawled, they are processed with rule matching and part-of-speech matching to obtain the news text: the text containing event-related content is extracted as the news, so that the data becomes a set of one-to-one news and macro research report pairs that serves as the training dataset. Specifically, a research report may be split into paragraphs and saved as an array, and the text in the array is then matched against preset event rules to obtain the event paragraphs as the news text.
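A rough sketch of the paragraph-level rule matching might look like the following. The two regular expressions in `EVENT_RULES` are purely hypothetical examples of "event rules" (a date phrase and a few announcement verbs); the source does not list its actual rules, and the part-of-speech matching it mentions is omitted here.

```python
import re
from typing import List, Tuple

# Hypothetical event rules, for illustration only.
EVENT_RULES = [
    re.compile(r"\d{1,2}月\d{1,2}日"),      # a date such as 3月5日
    re.compile(r"(发布|公布|宣布|召开)"),   # announcement-style verbs
]


def split_paragraphs(report: str) -> List[str]:
    """Split a report into non-empty paragraphs (the 'array' in the text)."""
    return [p.strip() for p in report.split("\n") if p.strip()]


def extract_event_text(report: str) -> str:
    """Keep the paragraphs that match at least one event rule."""
    hits = [
        p for p in split_paragraphs(report)
        if any(rule.search(p) for rule in EVENT_RULES)
    ]
    return "\n".join(hits)


def build_pairs(reports: List[str]) -> List[Tuple[str, str]]:
    """Build one-to-one (news, research report) training pairs."""
    pairs = []
    for report in reports:
        event = extract_event_text(report)
        if event:  # reports with no event paragraph are skipped
            pairs.append((event, report))
    return pairs


report_a = "3月5日，央行宣布下调利率。\n我们认为这将利好债券市场。"
report_b = "市场整体平稳。"
print(build_pairs([report_a, report_b]))
```

Skipping reports that yield no event paragraph is a design choice of this sketch; it keeps every training pair one-to-one.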
This embodiment provides two outline generation models. For the first, refer to Fig. 2, which is a flow chart of the first specific embodiment of the research report generation method of the present invention. Taking news in the financial field as the event text, the first outline generation model includes:
Converting the event text and the start tag (<start>) into vector representations according to the research report dictionary, to obtain an event vector and a start tag vector. In this embodiment, a text can be converted into a fixed-length vector representation by word embedding.
Obtaining the hidden state of the event text from the event vector and a bidirectional LSTM network, and obtaining the hidden state of the start tag from the start tag vector and a unidirectional LSTM network. A unidirectional LSTM network needs less processing time, whereas a bidirectional LSTM network captures bidirectional semantic dependencies better;
Obtaining the report outline from the hidden state of the event text, the hidden state of the start tag and the attention mechanism;
Converting the previous word output by the report generation step into a vector representation according to the research report dictionary, to obtain a word vector; word embedding can likewise be used to convert the word into a word vector;
Obtaining the hidden state of the word from the word vector and the unidirectional LSTM network;
Updating the report outline from the hidden state of the word, the hidden state of the event text and the attention mechanism.
In this embodiment, the report generation model includes a VAE generative model. In the first outline generation model, the attention mechanism yields a probability distribution over words, and the outline generation model then uses this distribution to select the combination of words with the optimal probability from the research report dictionary and outputs them as the report outline. Similarly, in the report generation model, processing by the VAE generative model yields another probability distribution, and the report generation model uses it to select the combination of words with the optimal probability from the research report dictionary and outputs them to form the research report. Specifically, referring to Fig. 2, the beginning of a research report is marked with the <start> tag and its end with the <end> tag. Starting from the <start> tag, the first outline generation model and the VAE generative model predict the probability distribution of the next word of the research report, and a word is selected from the research report dictionary according to this distribution and output to generate the research report. In brief, when the method is executed, a news item and a <start> tag are input first; after processing by the outline generation model and the report generation model, the first word of the research report is output. This first output word is fed back to the input in place of the start tag to produce the second output; the second output is fed back to the input in place of the first, and so on. Words are output one by one until the output is the end tag <end>, at which point generation stops and the final research report is complete. Each output word depends on the previously output word: the report outline is updated according to the previous output word and the input news, which effectively improves the quality of the generated research report.
When words are selected from the research report dictionary according to the probability distribution, a global search for the optimal solution can pick the word with the highest probability in the dictionary (that is, only one word is output each time), but when the research report dictionary is very large, the space efficiency of a global search is very low. A beam search algorithm can therefore be used to improve search efficiency. Beam search uses a beam size parameter to limit the number of candidate words retained at each step, and it considers not only the probability of a single word but also the joint probability of consecutive words. After processing by the VAE generative model, beam search is thus used to obtain a preset number of the most probable words. In this embodiment, beam size = 3, that is, the preset number is 3, and each step retains the three most probable results. Each prediction yields 3 outputs, which are fed back to the input for the next prediction. When prediction ends, one of the 3 outputs at each step is selected as the final output according to the probability-optimal principle, based on all the words output for generating the research report, giving the final research report.
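Beam search with beam size 3, as described above, can be sketched in a few lines. This is a generic textbook formulation rather than the patent's implementation: `beam_search`, `toy_model` and `TOY_TABLE` are hypothetical names, and the toy table again stands in for the trained models. The table is chosen so that the greedy choice ("a", probability 0.6) leads to a less probable full sequence than the alternative ("b" then "z", joint probability 0.4).

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (words so far, summed log-probability)


def beam_search(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 3,
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> List[str]:
    """Keep the beam_size most probable partial sequences at every step."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for words, score in beams:
            for word, p in next_word_probs(words).items():
                if p > 0:
                    candidates.append((words + [word], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_size]:
            (finished if words[-1] == end else beams).append((words, score))
        if not beams:
            break
    best_words, _ = max(finished or beams, key=lambda c: c[1])
    return [w for w in best_words if w not in (start, end)]


# Toy next-word table (an assumption for illustration).
TOY_TABLE = {
    "<start>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 1.0},
    "x": {"<end>": 1.0},
    "y": {"<end>": 1.0},
    "z": {"<end>": 1.0},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    return TOY_TABLE[prefix[-1]]


print(beam_search(toy_model, beam_size=3))
print(beam_search(toy_model, beam_size=1))
```

With `beam_size=1` the procedure degenerates to the one-word-at-a-time search, which makes the trade-off between the two strategies easy to compare on the same model.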
The outline generation model and the VAE generative model must be trained before formal use. Refer to Fig. 3, which is a schematic diagram of the training process of Fig. 2. Taking news in the financial field as the event text, the training process of the first outline generation model is described below:
First, the input news is defined as X, where x is a word in the news. The news is expressed as:
$$X = (x_1, \ldots, x_m) \quad (1)$$
Decoding of the latent variable has two stages. The first stage generates the outline O of the research report, where o is a word in the outline; the second stage generates the final research report Y, where y is a word in the research report. Defining the length of the generated text as L, the expressions are:
$$O = (o_1, \ldots, o_L) \quad (2)$$
$$Y = (y_1, \ldots, y_L) \quad (3)$$
Next, for the news and macro research reports in the training dataset, Chinese characters and Chinese punctuation are extracted by regular expression matching, the text is segmented into words with jieba, and preliminary length statistics are computed. The resulting length statistics are shown in Table 1:
Table 1: Length statistics of news and research reports
Then, for training purposes, the news and the research reports are truncated or padded to preset fixed lengths, for example 30 words for news and 200 words for research reports. The input news text and research report text are converted into vector representations by word embedding to obtain the news vector and the research report vector; a single Embedding layer is used directly here. A unidirectional LSTM network cannot encode information from back to front when modeling a sentence, whereas a bidirectional LSTM network captures bidirectional semantic dependencies better, so the embedded news vector is fed into a bidirectional LSTM network. The hidden state of the news obtained from the bidirectional LSTM network is denoted H, where h is a hidden sub-state and t is the time step. The expression for h is as follows:
To generate a high-quality research report, the encoding and decoding processes of the model need to fully absorb the structure and content of the corresponding research report. In the Decoder, an LSTM network predicts the probability distribution of the next word of the research report over the vocabulary. The hidden state of the macro research report is denoted S, and the hidden state at each time step depends on the input and the hidden state of the previous time step. The expression is as follows:
The hidden state of the news and the hidden state of the macro research report are passed through the attention mechanism to compute the attention score (formula (6)); multiplicative attention is used here. The attention weights are computed with the softmax function (formula (7)). The context vector is the average of the news hidden states weighted by the attention weights (formula (8)), that is, the attention weights are multiplied by the news hidden states. Formulas (6), (7) and (8) are as follows:
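The formula images for (6) to (8) are not reproduced in this text. Based only on the description above (multiplicative attention, softmax weights, and a weighted average of the news hidden states), a standard formulation would read as follows. The symbols $W_a$, $\alpha_{t,i}$ and $c_t$ are assumed names, $s_t$ and $h_i$ denote the hidden states of the research report and the news, $m$ is the news length from formula (1), and the patent's exact notation may differ.

```latex
\begin{aligned}
\mathrm{score}(s_t, h_i) &= s_t^{\top} W_a h_i \\
\alpha_{t,i} &= \frac{\exp\big(\mathrm{score}(s_t, h_i)\big)}{\sum_{j=1}^{m} \exp\big(\mathrm{score}(s_t, h_j)\big)} \\
c_t &= \sum_{i=1}^{m} \alpha_{t,i}\, h_i
\end{aligned}
```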
The context vector and the hidden state of the macro research report are then concatenated to obtain the updated attentional hidden state. The predicted output for an input word is computed from the attentional hidden state. The expression is as follows:
where Wc is a model parameter.
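The attention computation described above can be illustrated numerically with plain Python lists. This is a sketch under assumptions: `w_a` is a hypothetical score matrix for the multiplicative attention, and the `tanh` nonlinearity in `attentional_state` follows a common formulation rather than anything stated in the source, which only says that the context vector and the hidden state are concatenated and combined with the parameter Wc.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(xs: Vector) -> Vector:
    top = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(s_t: Vector, news_states: List[Vector], w_a: Matrix) -> Tuple[Vector, Vector]:
    """Multiplicative attention: scores, softmax weights, weighted average."""
    scores = [dot(s_t, matvec(w_a, h)) for h in news_states]
    weights = softmax(scores)
    dim = len(news_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, news_states)) for k in range(dim)]
    return weights, context


def attentional_state(context: Vector, s_t: Vector, w_c: Matrix) -> Vector:
    """Concatenate context and decoder state, then project with W_c.

    The tanh here is an assumption of this sketch.
    """
    return [math.tanh(x) for x in matvec(w_c, context + s_t)]


s_t = [1.0, 0.0]                        # decoder (research report) hidden state
news_states = [[1.0, 0.0], [0.0, 1.0]]  # encoder (news) hidden states
w_a = [[1.0, 0.0], [0.0, 1.0]]          # identity score matrix for the demo
w_c = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]

weights, context = attend(s_t, news_states, w_a)
state = attentional_state(context, s_t, w_c)
print(weights, context, state)
```

In the demo the decoder state aligns with the first news state, so that state receives the larger attention weight.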
The objective function of the first decoding stage is as follows:
where P is the probability.
At this point, the probability distribution over the words in the research report dictionary has been obtained from the input news, and the outline of the report can be generated from this probability distribution.
In the second decoding stage, in order to generate the final research report, the inputs are the news and the outline generated from the news.
The decoding model is a variational auto-encoder (VAE), which generates the target variable by fusing the two inputs, the news X and the outline O, and learns the posterior probability distribution of the latent variable z. P(z | X) can be rewritten as the following expression:
Assuming that the posterior distribution is a standard normal distribution, a sample is drawn at random from the distribution and decoded back into the original text. Training measures the reconstruction loss and the regularization loss, and the ELBO can then be expressed as:
$$\log P(X, O) \ge \mathbb{E}_{q(z \mid x, o)}\big[\log p(x, o \mid z)\big] - \mathrm{KL}\big(q(z \mid x, o) \,\|\, p(z)\big) \quad (12)$$
The right-hand side of the inequality above is the ELBO. The first term samples the attention from P(z | X, O) and uses the sampled attention as the decoder input to compute the cross-entropy loss; the second term uses the KL divergence to measure the similarity of two probability distributions, ensuring that the posterior distribution stays close to the prior distribution.
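The two ELBO terms can be made concrete with a small numeric sketch. It assumes the common VAE setup of a diagonal Gaussian posterior (mean `mu`, log-variance `log_var`) and a standard normal prior p(z), for which the KL term has a closed form; the function names and the use of the reparameterization trick for sampling are assumptions of this sketch and are not taken from the source.

```python
import math
import random
from typing import List


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def sample_latent(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def negative_elbo(reconstruction_nll: float, mu: List[float], log_var: List[float]) -> float:
    """Loss = reconstruction (cross-entropy) term + KL regularization term."""
    return reconstruction_nll + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = sample_latent([0.0, 1.0], [0.0, 0.0], rng)
print(len(z), kl_to_standard_normal([1.0], [0.0]), negative_elbo(2.0, [1.0], [0.0]))
```

The KL term is zero exactly when the posterior equals the prior, which is what "keeping the posterior close to the prior" penalizes deviations from.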
The objective function of the second decoding stage is as follows:
At this point, the probability distribution over words can again be obtained from the news and the report outline, and the final research report can be obtained from this probability distribution.
The global objective function is the sum of the loss functions of the two decoding stages, expressed as follows:
The training dataset of one-to-one news and macro research reports obtained above is input into the model of Fig. 3; after 50 rounds of training, the final model parameters can be tuned and determined. Specifically, taking global optimum search as an example, only one word is output at a time during generation. Each macro research report is fed into the model in the order start tag, macro research report, end tag, and training proceeds one input word at a time. For example, the start tag is input first, and the model produces an output, which is the predicted first word of the research report. This output word is compared with the first word of the true research report, and the model parameters are adjusted according to the comparison. The first word of the macro research report is then input into the model, producing another output, which is the predicted second word of the research report; this is compared with the second word of the true macro research report and the model parameters are adjusted again, continually reducing the error between the model output and the true words. After the model has been trained on many news and macro research report pairs, its structure and parameters are saved. The model with the finally determined parameters can then make predictions for newly input news: predicting with the model of Fig. 2, inputting a news item yields the research report of Fig. 4, which is a schematic diagram of an example research report generated with the research report generation method of the present invention.
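The word-by-word training scheme above amounts to pairing each decoder input with the word that should follow it. The sketch below builds such input and target sequences, combined with the fixed-length truncation and padding mentioned earlier. It assumes that the four special tags have serial numbers 0 to 3 and that padding uses the mask tag; both are assumptions, as are the function names.

```python
from typing import List, Tuple

# Assumed serial numbers of the first four dictionary keys.
MASK, UNK, START, END = 0, 1, 2, 3


def fit_length(ids: List[int], length: int) -> List[int]:
    """Truncate to `length`, or pad on the right with the mask id."""
    return ids[:length] + [MASK] * max(0, length - len(ids))


def teacher_forcing_pair(report_ids: List[int], length: int) -> Tuple[List[int], List[int]]:
    """Return (decoder_input, target) so that target[t] follows decoder_input[t]."""
    body = report_ids[: length - 1]  # leave room for one tag
    decoder_input = fit_length([START] + body, length)
    target = fit_length(body + [END], length)
    return decoder_input, target


short_input, short_target = teacher_forcing_pair([10, 11, 12], length=6)
long_input, long_target = teacher_forcing_pair([10, 11, 12, 13, 14, 15, 16], length=4)
print(short_input, short_target)
print(long_input, long_target)
```

At position 0 the input is the start tag and the target is the first true word, matching the comparison described in the text; padded positions would normally be excluded from the loss.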
Referring to Fig. 5, Fig. 5 is a flowchart of a second specific embodiment of the research report generation method of the present invention. In Fig. 5, taking news in the financial field as the event text, the second outline generation model includes:
obtaining the report outline according to the event text, the start tag and a Transformer model;
updating the report outline according to the previous word output by the report generation step, the event text and the Transformer model. Here, the report generation model is again a VAE generation model. The Transformer model in effect replaces the LSTM network and the attention mechanism; the attention score of the Transformer model is computed according to formula (15),
where Q, K and V are three matrices obtained by transforming the input X. As with Fig. 2, the model of Fig. 5 operates as follows: after the news and the start tag <start> are input, the outline generation model and the report generation model process them and output the first word of the research report; this first output word is returned to the input to replace the start tag, and so on, so that the model of Fig. 5 outputs the words of the research report one by one. Likewise, the outline generation model and the report generation model can select words by searching for the globally optimal solution, and the beam search algorithm can also be applied to the model of Fig. 5 to improve search efficiency.
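As an illustration of the attention score, the following sketch computes the standard scaled dot-product attention of the Transformer, softmax(QKᵀ/√d_k)·V, in plain Python. It assumes that formula (15) takes this standard form, which this text does not confirm; the projection matrices `w_q`, `w_k`, `w_v` and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention over the input X.

    Q, K and V are obtained by transforming X with three (assumed) weight
    matrices; the function returns the attended output and the weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: two positions with two features each, identity projections.
x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, and with identity projections each position attends most strongly to itself.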
Referring to Fig. 6, Fig. 6 is a schematic diagram of the training process for Fig. 5. The training dataset is input to the model in sequence for training. As with Fig. 4, each macro research report is input to the model in the form start tag, macro research report, end tag, and the model parameters are adjusted by comparing the model output with the words of the real research report.
Embodiment 2
Embodiment 2 is provided on the basis of embodiment 1 and provides a research report generating device. Referring to Fig. 7, Fig. 7 is a structural block diagram of a specific embodiment of the research report generating device of the present invention. The research report generating device includes:
a research report acquisition module, for obtaining multiple research reports from multiple information sources;
a dictionary acquisition module, for performing data preprocessing and feature selection on the multiple research reports to construct a research report dictionary;
an outline acquisition module, for obtaining the report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, where the outline generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the report outline;
a report generation module, for obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, where the report generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the research report.
For the specific working process of the research report generating device, refer to the description of embodiment 1; it is not repeated here. The research report generating device can generate research reports automatically, freeing up manpower and improving the efficiency of research report output.
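The way the modules fit together can be sketched as follows. This is only an illustrative skeleton under stated assumptions: the real outline generation and report generation models are neural networks, whereas `toy_outline_model` and `toy_report_model` here are hypothetical stand-ins, and the frequency threshold `min_count` is an assumed, simplified form of feature selection.

```python
from collections import Counter

START, END = "<start>", "<end>"  # "<end>" is an assumed name for the end tag


def build_dictionary(reports, min_count=1):
    """Dictionary module: keep words seen at least min_count times, plus the tags."""
    counts = Counter(word for report in reports for word in report.split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return [START, END] + words


def most_probable(distribution):
    """Probability-optimal rule in its simplest form: take the likeliest word."""
    return max(distribution, key=distribution.get)


def generate_report(event_text, dictionary, outline_model, report_model, max_words=50):
    """Generate a report word by word, feeding each output word back as input."""
    previous, report = START, []
    for _ in range(max_words):
        outline = outline_model(event_text, previous, dictionary)
        word = most_probable(report_model(event_text, outline, dictionary))
        if word == END:
            break
        report.append(word)
        previous = word
    return report


def toy_outline_model(event_text, previous, dictionary):
    # Hypothetical stand-in: the "outline" is just the previous word.
    return [previous]


def toy_report_model(event_text, outline, dictionary):
    # Hypothetical stand-in: a fixed chain of words instead of a trained model.
    chain = {START: "rates", "rates": "fall", "fall": END}
    next_word = chain.get(outline[-1], END)
    return {w: (1.0 if w == next_word else 0.0) for w in dictionary}


dictionary = build_dictionary(["rates fall", "rates rise"])
report = generate_report("central bank cuts rates", dictionary,
                         toy_outline_model, toy_report_model)
```

Passing the two models in as callables keeps the word-by-word loop independent of whether the outline model is the LSTM-based or the Transformer-based variant.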
Embodiment 3
Embodiment 3 is provided on the basis of embodiment 1 and provides a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions, which are used to cause a computer to execute the research report generation method described in embodiment 1. For a specific description of the research report generation method, refer to the description of embodiment 1; it is not repeated here.
The above describes preferred implementations of the present invention, but the invention is not limited to the above embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of this application.

Claims (10)

1. A research report generation method, characterized by comprising:
a research report acquisition step: obtaining multiple research reports from multiple information sources;
a dictionary acquisition step: performing feature selection on the multiple research reports to construct a research report dictionary;
an outline acquisition step: obtaining a report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the report outline;
a report generation step: obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the research report.
2. The research report generation method according to claim 1, characterized in that the report generation model selects and outputs words from the research report dictionary one by one to generate the research report.
3. The research report generation method according to claim 2, characterized in that the outline generation model updates the report outline according to the event text, the report outline, the research report dictionary and the previous word output by the report generation step.
4. The research report generation method according to claim 3, characterized in that the dictionary acquisition step further comprises: adding a start tag and an end tag to the research report dictionary.
5. The research report generation method according to claim 4, characterized in that the outline generation model comprises:
representing the event text and the start tag as vectors according to the research report dictionary to obtain an event vector and a start tag vector;
obtaining the hidden state of the event text and the hidden state of the start tag according to the event vector, the start tag vector and an LSTM network;
obtaining the report outline according to the hidden state of the event text, the hidden state of the start tag and an attention mechanism;
representing the previous word output by the report generation step as a vector according to the research report dictionary to obtain a word vector;
obtaining the hidden state of the word according to the word vector and the LSTM network;
updating the report outline according to the hidden state of the word, the hidden state of the event text and the attention mechanism.
6. The research report generation method according to claim 4, characterized in that the outline generation model comprises:
obtaining the report outline according to the event text, the start tag and a Transformer model;
updating the report outline according to the previous word output by the report generation step, the event text and the Transformer model.
7. The research report generation method according to any one of claims 1 to 6, characterized in that the report generation model comprises a VAE generation model.
8. The research report generation method according to any one of claims 1 to 6, characterized in that the research report generation method further comprises:
performing event entity recognition on the research reports obtained in the research report acquisition step to obtain the corresponding event texts, wherein the multiple research reports and the corresponding event texts form a training dataset, and the training dataset is used to train the outline generation model and the report generation model.
9. A research report generating device, characterized by comprising:
a research report acquisition module, for obtaining multiple research reports from multiple information sources;
a dictionary acquisition module, for performing feature selection on the multiple research reports to construct a research report dictionary;
an outline acquisition module, for obtaining a report outline corresponding to an event text according to the event text, the research report dictionary and an outline generation model, wherein the outline generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the report outline;
a report generation module, for obtaining a research report according to the event text, the report outline, the research report dictionary and a report generation model, wherein the report generation model selects multiple words from the research report dictionary according to a probability-optimal rule to form a word sequence as the research report.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the research report generation method according to any one of claims 1 to 8.
CN201910513763.9A 2019-06-14 2019-06-14 Research report generation method and related equipment Active CN110362797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910513763.9A CN110362797B (en) 2019-06-14 2019-06-14 Research report generation method and related equipment

Publications (2)

Publication Number Publication Date
CN110362797A true CN110362797A (en) 2019-10-22
CN110362797B CN110362797B (en) 2023-10-13

Family

ID=68216086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910513763.9A Active CN110362797B (en) 2019-06-14 2019-06-14 Research report generation method and related equipment

Country Status (1)

Country Link
CN (1) CN110362797B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140142939A1 (en) * 2012-11-21 2014-05-22 Algotes Systems Ltd. Method and system for voice to text reporting for medical image software
CN109635302A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 A kind of method and apparatus of training text summarization generation model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DARRYL GRIFFITHS ET AL.: "A Self-Report Study that Gauges Perceived and Induced Emotion with Music", 《2015 INTERNET TECHNOLOGIES AND APPLICATIONS (ITA)》 *
吴杰 等: "核电厂运行经验报告的定量评估方法", 《发电设备》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046629A (en) * 2019-12-16 2020-04-21 北大方正集团有限公司 Outline display method, device and equipment
CN111046629B (en) * 2019-12-16 2022-03-01 北大方正集团有限公司 Outline display method, device and equipment
CN112242185A (en) * 2020-09-09 2021-01-19 山东大学 Medical image report automatic generation method and system based on deep learning
CN112417846A (en) * 2020-11-25 2021-02-26 中译语通科技股份有限公司 Text automatic generation method and device, electronic equipment and storage medium
WO2022110454A1 (en) * 2020-11-25 2022-06-02 中译语通科技股份有限公司 Automatic text generation method and apparatus, and electronic device and storage medium
CN113160963A (en) * 2020-12-18 2021-07-23 中电云脑(天津)科技有限公司 Event determination method and device, electronic equipment and storage medium
CN114490778A (en) * 2022-02-15 2022-05-13 北京固加数字科技有限公司 Financial research and report automatic generation system and method

Also Published As

Publication number Publication date
CN110362797B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN109359293B (en) Mongolian name entity recognition method neural network based and its identifying system
CN110362797A (en) A kind of research report generation method and relevant device
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
CN111177394B (en) Knowledge map relation data classification method based on syntactic attention neural network
CN109635124B (en) Remote supervision relation extraction method combined with background knowledge
Yu et al. Transition-based neural RST parsing with implicit syntax features
CN107844469A (en) The text method for simplifying of word-based vector query model
CN105843801B (en) The structure system of more translation Parallel Corpus
WO2018218708A1 (en) Deep-learning-based public opinion hotspot category classification method
CN110222188A (en) A kind of the company's bulletin processing method and server-side of multi-task learning
CN110287482B (en) Semi-automatic participle corpus labeling training device
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN109800310A (en) A kind of electric power O&M text analyzing method based on structuring expression
CN111881677A (en) Address matching algorithm based on deep learning model
CN107798624A (en) A kind of technical label in software Ask-Answer Community recommends method
CN109858042A (en) A kind of determination method and device of translation quality
CN103631858A (en) Science and technology project similarity calculation method
CN104699797A (en) Webpage data structured analytic method and device
CN108920446A (en) A kind of processing method of Engineering document
CN111666373A (en) Chinese news classification method based on Transformer
CN115357719A (en) Power audit text classification method and device based on improved BERT model
CN117112850A (en) Address standardization method, device, equipment and storage medium
CN109325243A (en) Mongolian word cutting method and its word cutting system of the character level based on series model
CN112036179B (en) Electric power plan information extraction method based on text classification and semantic frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant