CN111666769A - Method for extracting financial field event sentences in annual reports - Google Patents
- Publication number
- CN111666769A (application CN202010528238.7A)
- Authority
- CN
- China
- Prior art keywords
- textrank
- financial field
- model
- field event
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for extracting financial field event sentences from annual reports, comprising the following steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, applying named entity recognition based on perceptron sequence labeling; step 4, applying an improved TextRank-based keyword extraction algorithm; and step 5, outputting the resulting text keywords. The invention relates to the technical field of financial field event sentence extraction. The method addresses three problems: named entities are ignored when the TextRank keyword extraction algorithm performs word segmentation, the keyword extraction calculation is not ideal, and the extracted keywords are easily distorted by noise information, which causes errors.
Description
Technical Field
The invention relates to the technical field of financial field event sentence extraction, and in particular to a method for extracting financial field event sentences from annual reports.
Background
With the rise of the internet and the development of information technology, a large amount of data and text is now presented through computers. Most of the resulting internet short texts are complicated and require users to spend a great deal of time reading and understanding them. How to use a computer to process these short texts quickly and to extract text keywords or abstracts accurately has therefore become a research hotspot and a central problem in natural language processing, and information extraction technology can effectively address it.
However, when the TextRank keyword extraction algorithm performs word segmentation, named entities are ignored; the keyword extraction calculation is not ideal; and the extracted keywords are easily distorted by noise information, which causes errors.
Disclosure of Invention
To address the defects of the prior art, the invention provides a method for extracting financial field event sentences from annual reports. It solves the problems that named entities are ignored when the TextRank keyword extraction algorithm performs word segmentation, that the keyword extraction calculation is not ideal, and that the extracted keywords are easily distorted by noise information, which causes errors.
To achieve this purpose, the invention adopts the following technical scheme: a method for extracting financial field event sentences from annual reports, comprising the following specific steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, applying named entity recognition based on perceptron sequence labeling;
step 4, applying an improved TextRank-based keyword extraction algorithm;
step 5, outputting the resulting text keywords.
Preferably, the named entity recognition based on perceptron sequence labeling in step 3 proceeds as follows:
A. training a perceptron model;
B. labeling the text word sequence;
C. performing word segmentation with named entity recognition.
Preferably, the improved TextRank-based keyword extraction algorithm in step 4 comprises the following specific steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
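Steps a to d can be sketched as follows. This is a minimal illustration of the standard TextRank procedure on a word co-occurrence graph, not the improved variant of the invention, whose details are not disclosed here. The function name `textrank_keywords`, the window size, the damping factor of 0.85 and the convergence tolerance are all assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85,
                      iterations=50, tol=1e-6, top_k=5):
    """Rank tokens with plain TextRank (hypothetical helper, for illustration)."""
    # a. Build an undirected, weighted co-occurrence graph: two words are
    #    linked when they appear within `window` positions of each other.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0
    if not weights:
        return []

    # b. Iterate the PageRank-style update until the scores stabilise.
    scores = {word: 1.0 for word in weights}
    for _ in range(iterations):
        new_scores = {}
        for word in weights:
            # c. Each neighbour passes on its score in proportion to the
            #    weight of the edge it shares with `word`.
            rank = sum(
                weights[nbr][word] / sum(weights[nbr].values()) * scores[nbr]
                for nbr in weights[word]
            )
            new_scores[word] = (1 - damping) + damping * rank
        converged = max(abs(new_scores[w] - scores[w]) for w in scores) < tol
        scores = new_scores
        if converged:
            break

    # d. Rank the words by score, highest first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Calling `textrank_keywords(tokens)` on a preprocessed token list returns up to `top_k` `(word, score)` pairs. In this sketch a word that co-occurs with many different neighbours accumulates the highest score.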
Preferably, the perceptron model in step A is a basic machine learning model based on the linear perceptron algorithm, and the model is adjusted by calculating an error.
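The error-driven adjustment of a perceptron sequence labeler might look like the sketch below. It is an assumption-laden illustration, not the implementation of the invention: the BIO tag set, the feature template in `features`, the class name `PerceptronTagger`, the greedy left-to-right decoding and the absence of weight averaging are all choices made here for brevity.

```python
def features(words, i, prev_tag):
    """Hypothetical feature template: current word, neighbours, previous tag."""
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"w-1={prev_word}",
            f"w+1={next_word}", f"t-1={prev_tag}"]


class PerceptronTagger:
    """Linear perceptron that assigns one tag per word (illustrative only)."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.weights = {}  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def best_tag(self, feats):
        return max(self.tags, key=lambda t: self.score(feats, t))

    def predict(self, words):
        """Greedy decoding: each tag depends on the previously predicted tag."""
        tags, prev = [], "<s>"
        for i in range(len(words)):
            prev = self.best_tag(features(words, i, prev))
            tags.append(prev)
        return tags

    def train(self, sentences, epochs=5):
        for _ in range(epochs):
            for words, gold in sentences:
                prev = "<s>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    guess = self.best_tag(feats)
                    if guess != gold_tag:
                        # Error-driven update: reward the correct tag and
                        # penalise the wrongly predicted one.
                        for f in feats:
                            key_good, key_bad = (f, gold_tag), (f, guess)
                            self.weights[key_good] = self.weights.get(key_good, 0.0) + 1.0
                            self.weights[key_bad] = self.weights.get(key_bad, 0.0) - 1.0
                    prev = gold_tag  # condition on the gold history while training
```

The weights change only when a prediction is wrong, which is one way to read "the model is adjusted by calculating an error". A production tagger would usually average the weights and decode with Viterbi search rather than greedily.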
Preferably, the TextRank in step 4 is a text ranking algorithm adapted from the PageRank web page ranking algorithm.
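For reference, the relation between the two algorithms can be written out. The patent text does not state a formula, so the notation below is an assumption that follows the commonly used formulation of weighted TextRank:

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j)
```

Here $V_i$ is a word node, $\mathrm{In}(V_i)$ and $\mathrm{Out}(V_j)$ are the sets of nodes linking to $V_i$ and linked from $V_j$, $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$, and $d$ is the damping factor, commonly set to 0.85. With all edge weights equal to 1 the expression reduces to the PageRank form $S(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} S(V_j) / |\mathrm{Out}(V_j)|$.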
Preferably, the perceptron model in step C is adjusted by calculating an error.
Advantageous effects
The invention provides a method for extracting financial field event sentences from annual reports, with the following beneficial effects:
By constructing the TextRank graph model, performing iterative computation, calculating word weights and ranking, the method remedies defects of the TextRank graph model. It solves the problems that named entities are ignored when the TextRank keyword extraction algorithm performs word segmentation, that the keyword extraction calculation is not ideal, and that the extracted keywords are easily distorted by noise information, which causes errors.
Drawings
Fig. 1 is a flowchart of the method of the present invention for extracting financial field event sentences from annual reports.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a method for extracting financial field event sentences from annual reports, comprising the following specific steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, applying named entity recognition based on perceptron sequence labeling;
step 4, applying an improved TextRank-based keyword extraction algorithm;
step 5, outputting the resulting text keywords.
Further, the named entity recognition based on perceptron sequence labeling in step 3 proceeds as follows:
A. training a perceptron model;
B. labeling the text word sequence;
C. performing word segmentation with named entity recognition.
Further, the improved TextRank-based keyword extraction algorithm in step 4 comprises the following specific steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
Further, the perceptron model in step A is specifically a basic machine learning model based on the linear perceptron algorithm, and the model is adjusted by calculating an error.
Further, the TextRank in step 4 is a text ranking algorithm adapted from the PageRank web page ranking algorithm.
Further, the perceptron model in step C is adjusted by calculating an error.
In summary, the method for extracting financial field event sentences from annual reports comprises the following specific steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, applying named entity recognition based on perceptron sequence labeling; step 4, applying an improved TextRank-based keyword extraction algorithm; and step 5, outputting the resulting text keywords.
The named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps: A. training a perceptron model; B. labeling the text word sequence; C. performing word segmentation with named entity recognition.
The improved TextRank-based keyword extraction algorithm in step 4 comprises the following steps: a. constructing a TextRank graph model; b. performing iterative computation; c. calculating the weight of the words; d. ranking.
The perceptron model in step A is specifically a basic machine learning model based on the linear perceptron algorithm, and the model is adjusted by calculating an error. The named entity recognition method based on perceptron sequence labeling improves the word segmentation used to preprocess text for the TextRank algorithm: recognizing named entities increases the granularity of the segmented text, which in turn improves the accuracy of the TextRank-based keyword extraction algorithm. The TextRank in step 4 is a text ranking algorithm adapted from the PageRank web page ranking algorithm, and the perceptron model in step C is adjusted by calculating an error.
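One way to picture the increased granularity is to merge each recognized entity into a single token before the keyword graph is built, so that the entity is ranked as one unit instead of as separate fragments. The sketch below assumes BIO-style tags and a hypothetical helper named `merge_entities`; neither is specified in the patent text.

```python
def merge_entities(tokens, tags, sep=" "):
    """Join tokens tagged B-X / I-X into one token per named entity.

    `sep` is the joiner between entity parts; an empty string would be
    the natural choice for Chinese text (assumption, for illustration).
    """
    merged, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current:
            # Continue the entity that is currently open.
            current.append(token)
            continue
        if current:
            # Any other tag closes the open entity.
            merged.append(sep.join(current))
            current = []
        if tag.startswith("B-"):
            current = [token]
        else:
            merged.append(token)
    if current:
        merged.append(sep.join(current))
    return merged
```

The merged token list can then be passed to a keyword ranker in place of the raw segmentation output.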
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A method for extracting financial field event sentences from annual reports, comprising the following specific steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, applying named entity recognition based on perceptron sequence labeling;
step 4, applying an improved TextRank-based keyword extraction algorithm;
step 5, outputting the resulting text keywords.
2. The method for extracting financial field event sentences from annual reports according to claim 1, characterized in that: the named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps:
A. training a perceptron model;
B. labeling the text word sequence;
C. performing word segmentation with named entity recognition.
3. The method for extracting financial field event sentences from annual reports according to claim 1, characterized in that: the improved TextRank-based keyword extraction algorithm in step 4 comprises the following specific steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
4. The method for extracting financial field event sentences from annual reports according to claim 2, characterized in that: the perceptron model in step A is specifically a basic machine learning model based on the linear perceptron algorithm, and the model is adjusted by calculating an error.
5. The method for extracting financial field event sentences from annual reports according to claim 1, characterized in that: the TextRank in step 4 is a text ranking algorithm adapted from the PageRank web page ranking algorithm.
6. The method for extracting financial field event sentences from annual reports according to claim 4, characterized in that: the perceptron model in step C is adjusted by calculating an error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010528238.7A CN111666769A (en) | 2020-06-11 | 2020-06-11 | Method for extracting financial field event sentences in annual reports |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010528238.7A CN111666769A (en) | 2020-06-11 | 2020-06-11 | Method for extracting financial field event sentences in annual reports |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111666769A (en) | 2020-09-15 |
Family
ID=72387110
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010528238.7A Pending CN111666769A (en) | 2020-06-11 | 2020-06-11 | Method for extracting financial field event sentences in annual reports |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666769A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202042A (en) * | 2016-07-06 | 2016-12-07 | 中央民族大学 | A kind of keyword abstraction method based on figure |
CN109165380A (en) * | 2018-07-26 | 2019-01-08 | 咪咕数字传媒有限公司 | A kind of neural network model training method and device, text label determine method and device |
CN110162592A (en) * | 2019-05-24 | 2019-08-23 | 东北大学 | A kind of news keyword extracting method based on the improved TextRank of gravitation |
CN110232112A (en) * | 2019-05-31 | 2019-09-13 | 北京创鑫旅程网络技术有限公司 | Keyword extracting method and device in article |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200915 |