CN111666769A

CN111666769A - Method for extracting financial-domain event sentences from annual reports

Info

Publication number: CN111666769A
Application number: CN202010528238.7A
Authority: CN
Inventors: 温秋华; 潘定; 梁倬骞
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2020-09-15

Abstract

The invention discloses a method for extracting financial-domain event sentences from annual reports, comprising the following steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, performing named entity recognition based on perceptron sequence labeling; step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm; and step 5, outputting the text keywords. The invention relates to the technical field of financial-domain event sentence extraction. The method addresses three problems: named entities are ignored when text is segmented for the TextRank keyword extraction algorithm, the computation used by the keyword extraction algorithm is not ideal, and the extracted keywords are easily disturbed by noise, which leads to errors.

Description

Method for extracting financial-domain event sentences from annual reports

Technical Field

The invention relates to the technical field of financial-domain event sentence extraction, and in particular to a method for extracting financial-domain event sentences from annual reports.

Background

With the rise of the internet and the development of information technology, large volumes of data and text are now presented through computers. Most of the numerous short texts on the internet require users to spend considerable time reading and understanding them. How to process such short texts quickly and accurately distill keywords or summaries from them by computer has become a research hotspot and a central problem in natural language processing, and information extraction technology can effectively address it.

However, when text is segmented for the TextRank keyword extraction algorithm, named entities are ignored; the computation used by the keyword extraction algorithm is not ideal; and keyword extraction is easily disturbed by noise, which leads to errors.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a method for extracting financial-domain event sentences from annual reports. It solves the problems that named entities are ignored when text is segmented for the TextRank keyword extraction algorithm, that the computation used by the keyword extraction algorithm is not ideal, and that keyword extraction is easily disturbed by noise, which leads to erroneous keywords.

To achieve this purpose, the invention adopts the following technical solution: a method for extracting financial-domain event sentences from annual reports, comprising the following steps:

step 1, inputting financial report data;

step 2, preprocessing the data;

step 3, performing named entity recognition based on perceptron sequence labeling;

step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;

step 5, outputting the text keywords.

Preferably, the named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps:

A. training a perceptron model;

B. labeling the word sequence of the text;

C. segmenting the text into words using the recognized named entities.
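Steps A to C can be illustrated with a minimal sketch of a greedy perceptron sequence labeler. The source does not specify the feature template, tag set, or training data, so the `features` function, the BIO tags (`B-ORG`, `I-ORG`, `O`), and the two toy sentences below are all assumptions made for illustration, not the inventors' implementation.

```python
from collections import defaultdict


def features(words, i, prev_tag):
    """Hypothetical feature template: current word, neighbours, previous tag."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "bias",
        f"w={word}",
        f"w-1={prev_word}",
        f"w+1={next_word}",
        f"t-1={prev_tag}",
        f"t-1,w={prev_tag},{word}",
    ]


class PerceptronTagger:
    def __init__(self, tags):
        self.tags = list(tags)
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def predict_tag(self, feats):
        return max(self.tags, key=lambda t: self.score(feats, t))

    def train(self, sentences, epochs=10):
        """Step A: adjust the weights whenever the prediction is wrong."""
        for _ in range(epochs):
            for words, gold in sentences:
                prev = "<s>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    guess = self.predict_tag(feats)
                    if guess != gold_tag:  # error-driven update
                        for f in feats:
                            self.weights[(f, gold_tag)] += 1.0
                            self.weights[(f, guess)] -= 1.0
                    prev = gold_tag  # use the gold tag as history in training

    def tag(self, words):
        """Step B: label a word sequence greedily, left to right."""
        prev, out = "<s>", []
        for i in range(len(words)):
            prev = self.predict_tag(features(words, i, prev))
            out.append(prev)
        return out


# Toy training data (assumed, for illustration only).
TRAIN = [
    (["Jinan", "University", "acquired", "shares"], ["B-ORG", "I-ORG", "O", "O"]),
    (["the", "company", "issued", "bonds"], ["O", "O", "O", "O"]),
]

tagger = PerceptronTagger(["O", "B-ORG", "I-ORG"])
tagger.train(TRAIN)
print(tagger.tag(["Jinan", "University", "acquired", "shares"]))
```

A production tagger would normally average the weights over all updates and decode with Viterbi rather than greedily; both are omitted here to keep the error-driven update visible.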

Preferably, the improved TextRank-based keyword extraction algorithm in step 4 comprises the following steps:

a. constructing a TextRank graph model;

b. performing iterative computation;

c. calculating the weights of the words;

d. ranking the words.
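Steps a to d follow the general shape of TextRank keyword extraction, which can be sketched as below. This is the standard co-occurrence formulation, not the improved variant claimed here, whose details the source does not give. The function name `textrank_keywords` and the parameters `window`, `damping`, `iterations`, `tol`, and `top_k` are assumptions chosen for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50,
                      tol=1e-6, top_k=5):
    """Rank tokens with a plain weighted TextRank (illustrative sketch)."""
    # a. Build an undirected co-occurrence graph: two distinct tokens are
    #    linked when they fall within the same sliding window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v != w:
                weights[w][v] += 1.0
                weights[v][w] += 1.0

    nodes = list(weights)
    if not nodes:
        return []

    scores = {n: 1.0 for n in nodes}
    out_sum = {n: sum(weights[n].values()) for n in nodes}

    # b. Iterate the score update until it stabilises or the budget runs out.
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            rank = sum(weights[m][n] / out_sum[m] * scores[m]
                       for m in weights[n])
            # c. The word weight is the damped sum of neighbour contributions.
            new_scores[n] = (1 - damping) + damping * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break

    # d. Rank words by weight, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


tokens = ["profit", "growth", "profit", "revenue", "profit", "loss"]
ranked = textrank_keywords(tokens, window=2)
print(ranked[0][0])  # the most central word in this toy sequence
```

In this toy input every other word co-occurs only with "profit", so "profit" receives the highest score.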

Preferably, the perceptron model in step A is a basic machine learning model based on a linear perceptron algorithm, and the model is adjusted by computing the error.

Preferably, the TextRank in step 4 is a text ranking algorithm derived from the PageRank web page ranking algorithm.
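For reference, the relationship between the two algorithms can be written out using their commonly published forms. These are the standard formulations, not the improved scoring used by the invention, which the source does not specify. Here $d$ is the damping factor, $In(V_i)$ and $Out(V_j)$ are the sets of nodes linking into $V_i$ and out of $V_j$, and $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$.

```latex
% PageRank: each page V_j splits its score evenly among its outgoing links.
S(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{S(V_j)}{\lvert Out(V_j) \rvert}

% TextRank: the same recurrence on a word graph, with edge weights w_{ji}
% (for example, co-occurrence counts) replacing the even split.
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```

When every edge weight equals 1, the second equation reduces to the first.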

Preferably, the perceptron model in step C is adjusted by computing the error.

Advantageous effects

The invention provides a method for extracting financial-domain event sentences from annual reports, which has the following beneficial effects:

By constructing a TextRank graph model, performing iterative computation, calculating word weights, and ranking the words, the method remedies the shortcomings of the TextRank graph model. It solves the problems that named entities are ignored when text is segmented for the TextRank keyword extraction algorithm, that the computation used by the keyword extraction algorithm is not ideal, and that keyword extraction is easily disturbed by noise, which leads to errors.

Drawings

Fig. 1 is a flowchart of the method for extracting financial-domain event sentences from annual reports according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to Fig. 1, the present invention provides the following technical solution: a method for extracting financial-domain event sentences from annual reports, comprising the following steps:

step 1, inputting financial report data;

step 2, preprocessing the data;

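step 3, performing named entity recognition based on perceptron sequence labeling;

step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;

step 5, outputting the text keywords.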

Further, the named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps:

A. training a perceptron model;

B. labeling the word sequence of the text;

C. segmenting the text into words using the recognized named entities.

Further, the improved TextRank-based keyword extraction algorithm in step 4 comprises the following steps:

a. constructing a TextRank graph model;

b. performing iterative computation;

c. calculating the weights of the words;

d. ranking the words.

Further, the perceptron model in step A is a basic machine learning model based on a linear perceptron algorithm, and the model is adjusted by computing the error.

Further, the TextRank in step 4 is a text ranking algorithm derived from the PageRank web page ranking algorithm.

Further, the perceptron model in step C is adjusted by computing the error.

In this embodiment, the method for extracting financial-domain event sentences from annual reports comprises the following steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, performing named entity recognition based on perceptron sequence labeling; step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm; and step 5, outputting the text keywords.

The named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps: A. training a perceptron model; B. labeling the word sequence of the text; C. segmenting the text into words using the recognized named entities.

The improved TextRank-based keyword extraction algorithm in step 4 comprises the following steps: a. constructing a TextRank graph model; b. performing iterative computation; c. calculating the weights of the words; d. ranking the words.

The perceptron model in step A is a basic machine learning model based on a linear perceptron algorithm, and the model is adjusted by computing the error. The named entity recognition method based on perceptron sequence labeling improves the word segmentation used in TextRank preprocessing: recognizing named entities increases the granularity of the tokens produced by segmentation, which improves the accuracy of the TextRank-based keyword extraction algorithm. The TextRank in step 4 is a text ranking algorithm derived from the PageRank web page ranking algorithm, and the perceptron model in step C is adjusted by computing the error.
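One plausible reading of "increasing the granularity" is that each recognized entity is kept as a single token, so that the graph-building step treats it as one node instead of several fragments. The sketch below illustrates that reading under the assumption that the labeler emits BIO tags. The function name `merge_entities` and the `joiner` parameter are hypothetical, and the source does not state how the merge is actually performed.

```python
def merge_entities(tokens, tags, joiner=" "):
    """Collapse BIO-tagged entity spans into single tokens.

    `joiner` is the separator used inside a merged entity; an empty string
    would suit unsegmented scripts such as Chinese.
    """
    merged, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity still being collected.
            if current:
                merged.append(joiner.join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # Ordinary token (or a stray I- tag with no open entity).
            if current:
                merged.append(joiner.join(current))
                current = []
            merged.append(token)
    if current:
        merged.append(joiner.join(current))
    return merged


words = ["Jinan", "University", "acquired", "shares"]
labels = ["B-ORG", "I-ORG", "O", "O"]
print(merge_entities(words, labels))
```

The merged token list can then be passed to the keyword ranking step, so that an entity competes for rank as a whole rather than as separate words.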

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for extracting financial-domain event sentences from annual reports, comprising the following steps:

step 1, inputting financial report data;

step 2, preprocessing the data;

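step 3, performing named entity recognition based on perceptron sequence labeling;

step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;

step 5, outputting the text keywords.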

2. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps:

A. training a perceptron model;

B. labeling the word sequence of the text;

C. segmenting the text into words using the recognized named entities.

3. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the improved TextRank-based keyword extraction algorithm in step 4 comprises the following steps:

a. constructing a TextRank graph model;

b. performing iterative computation;

c. calculating the weights of the words;

d. ranking the words.

4. The method for extracting financial-domain event sentences from annual reports according to claim 2, wherein the perceptron model in step A is a basic machine learning model based on a linear perceptron algorithm, and the model is adjusted by computing the error.

5. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the TextRank in step 4 is a text ranking algorithm derived from the PageRank web page ranking algorithm.

6. The method for extracting financial-domain event sentences from annual reports according to claim 4, wherein the perceptron model in step C is adjusted by computing the error.