CN111666769A - Method for extracting financial field event sentences in annual newspaper - Google Patents


Info

Publication number
CN111666769A
Authority
CN
China
Prior art keywords
textrank
financial field
model
field event
algorithm
Prior art date
2020-06-11
Legal status
Pending
Application number
CN202010528238.7A
Other languages
Chinese (zh)
Inventor
温秋华
潘定
梁倬骞
Current Assignee
Jinan University
Original Assignee
Jinan University
Priority date
2020-06-11
Filing date
2020-06-11
Publication date
2020-09-15
Application filed by Jinan University
Priority to CN202010528238.7A
Publication of CN111666769A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for extracting financial-domain event sentences from annual reports, comprising the following steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, performing named entity recognition based on perceptron sequence labeling; step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm; and step 5, outputting the extracted text keywords. The invention relates to the technical field of financial-domain event sentence extraction. The method solves three problems of the TextRank keyword extraction algorithm: named entities are ignored during word segmentation, the keyword calculation algorithm is not ideal, and the extracted keywords are easily disturbed by noise, which causes errors.

Description

Method for extracting financial-domain event sentences from annual reports
Technical Field
The invention relates to the technical field of financial-domain event sentence extraction, and in particular to a method for extracting financial-domain event sentences from annual reports.
Background
With the rise of the Internet and the development of information technology, large amounts of data and text are now presented through computers. Reading and understanding the many complex short texts on the Internet takes users a great deal of time, so how to process such texts quickly and use computers to accurately extract their keywords or summaries has become a research hotspot and a central problem in natural language processing. Within natural language processing, information extraction techniques can address this problem effectively.
However, when the TextRank keyword extraction algorithm is used, named entities are ignored during word segmentation, the keyword calculation algorithm is not ideal, and keyword extraction is easily disturbed by noise, which causes errors.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method for extracting financial-domain event sentences from annual reports, which solves the problems that named entities are ignored during word segmentation in TextRank keyword extraction, that the keyword calculation algorithm is not ideal, and that keyword extraction is easily disturbed by noise, causing errors in the extracted keywords.
To achieve this purpose, the invention adopts the following technical solution: a method for extracting financial-domain event sentences from annual reports, comprising the following steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, performing named entity recognition based on perceptron sequence labeling;
step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;
and step 5, outputting the extracted text keywords.
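Step 2 is not specified further in the text. As a minimal sketch, assuming the annual report arrives as raw Chinese text, preprocessing might strip line breaks and page markers and then split the text into sentences on end-of-sentence punctuation. The noise patterns (including the `第 N 页` page-marker rule) and the function name `preprocess` are illustrative assumptions, not part of the patent.

```python
from __future__ import annotations

import re

# Assumed noise patterns: line breaks, tabs, full-width spaces and "第 N 页" page markers.
NOISE = re.compile(r"[\r\n\t\u3000]+|第\s*\d+\s*页")
# Zero-width split point right after Chinese or ASCII sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def preprocess(report_text: str) -> list[str]:
    """Hypothetical step 2: clean raw annual-report text and split it into sentences."""
    cleaned = NOISE.sub("", report_text)
    # re.split accepts zero-width patterns on Python 3.7+; drop empty pieces.
    return [s for s in SENTENCE_END.split(cleaned) if s.strip()]


if __name__ == "__main__":
    raw = "本公司2019年营业收入增长12%。\n第 3 页\n董事会决定分红！"
    print(preprocess(raw))
```

Splitting on the full-width semicolon treats clauses as separate units, which is a design choice; a real system would tune these rules to the report layout.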
Preferably, the named entity recognition based on perceptron sequence labeling in step 3 is performed as follows:
A. training a perceptron model;
B. labeling a text word sequence;
C. performing word segmentation with named entity recognition.
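Steps A to C can be illustrated with a small greedy perceptron tagger. This is a minimal sketch, assuming the text has already been split into words and that entities are tagged with a BIO scheme; the feature templates, the greedy left-to-right decoding (used here instead of Viterbi search) and the names `PerceptronTagger` and `merge_entities` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from collections import defaultdict

TAGS = ("B", "I", "O")  # assumed BIO scheme: Begin / Inside / Outside a named entity


def features(words: list[str], i: int, prev_tag: str) -> list[str]:
    """Hypothetical feature templates for the word at position i."""
    return [
        f"w={words[i]}",
        f"prev_w={words[i - 1] if i > 0 else '<S>'}",
        f"next_w={words[i + 1] if i + 1 < len(words) else '</S>'}",
        f"prev_t={prev_tag}",
    ]


class PerceptronTagger:
    def __init__(self) -> None:
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats: list[str], tag: str) -> float:
        # .get avoids inserting zero entries for unseen (feature, tag) pairs
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def best_tag(self, feats: list[str]) -> str:
        return max(TAGS, key=lambda t: self.score(feats, t))

    def train(self, data, epochs: int = 5) -> None:
        """Step A: error-driven training on (words, gold_tags) pairs."""
        for _ in range(epochs):
            for words, gold in data:
                prev = "<S>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    guess = self.best_tag(feats)
                    if guess != gold_tag:  # adjust weights only on a prediction error
                        for f in feats:
                            self.weights[(f, gold_tag)] += 1.0
                            self.weights[(f, guess)] -= 1.0
                    prev = gold_tag  # condition on the gold previous tag while training

    def tag(self, words: list[str]) -> list[str]:
        """Step B: label a segmented word sequence greedily, left to right."""
        tags, prev = [], "<S>"
        for i in range(len(words)):
            prev = self.best_tag(features(words, i, prev))
            tags.append(prev)
        return tags


def merge_entities(words: list[str], tags: list[str]) -> list[str]:
    """Step C: join each B/I span into one token so entities survive segmentation."""
    merged, in_entity = [], False
    for word, t in zip(words, tags):
        if t == "I" and in_entity:
            merged[-1] += word  # continue the current entity
        else:
            merged.append(word)
            in_entity = t in ("B", "I")
    return merged


if __name__ == "__main__":
    train_data = [
        (["中国", "平安", "发布", "年报"], ["B", "I", "O", "O"]),
        (["招商", "银行", "披露", "利润"], ["B", "I", "O", "O"]),
    ]
    tagger = PerceptronTagger()
    tagger.train(train_data)
    words = ["中国", "平安", "发布", "年报"]
    print(merge_entities(words, tagger.tag(words)))  # ['中国平安', '发布', '年报']
```

Merging B/I spans in step C is what coarsens the segmentation granularity: the company name 中国平安 becomes one candidate keyword instead of two separate fragments.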
Preferably, the improved TextRank-based keyword extraction in step 4 comprises the following steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
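Steps a to d can be sketched with baseline TextRank over a word co-occurrence graph. This is a minimal sketch, assuming an unweighted, undirected graph, a co-occurrence window of 2 and a damping factor d = 0.85; it does not reproduce the patent's unspecified improvement, and the function name `textrank` is illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def textrank(words: list[str], window: int = 2, d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> list[tuple[str, float]]:
    """Rank candidate words with the standard (unweighted) TextRank recurrence."""
    # a. Build an undirected co-occurrence graph: words fewer than `window` positions apart are linked.
    neighbors: dict[str, set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    if not neighbors:
        return []
    # b. Iterate S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)| until scores stabilise.
    score = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        new = {
            v: (1 - d) + d * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
        delta = max(abs(new[v] - score[v]) for v in neighbors)
        score = new
        if delta < tol:
            break
    # c./d. The converged scores are the word weights; sort descending (ties by word).
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for word, weight in textrank(["利润", "增长", "利润", "分红", "利润", "回购"]):
        print(word, round(weight, 3))
```

In this toy input, 利润 co-occurs with every other word, so it becomes the hub of the graph and receives the highest weight.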
Preferably, the perceptron model in step A is a basic machine-learning model based on the linear perceptron algorithm, and the model is adjusted according to the computed error.
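The error-driven adjustment described here corresponds to the classic perceptron learning rule, w <- w + lr * (y - y_hat) * x. The sketch below assumes binary labels in {-1, +1} and a learning rate of 1; it shows the linear perceptron on its own, not the patent's full sequence-labeling model, and the function names are illustrative.

```python
from __future__ import annotations


def predict(w: list[float], b: float, x: list[float]) -> int:
    """Linear perceptron decision: sign of w.x + b (a score of exactly 0 maps to -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_perceptron(samples: list[tuple[list[float], int]], lr: float = 1.0,
                     epochs: int = 20) -> tuple[list[float], float]:
    """Classic perceptron: adjust weights by the error y - y_hat on each mistake."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            error = y - predict(w, b, x)  # 0 if correct, +2 or -2 if wrong
            if error:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w, b


if __name__ == "__main__":
    # Linearly separable toy data (logical AND with labels -1/+1).
    data = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
    w, b = train_perceptron(data)
    print(w, b, [predict(w, b, x) for x, _ in data])
```

Weights change only when a sample is misclassified, so training stops as soon as the data are separated; for data that are not linearly separable, the `epochs` cap bounds the loop.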
Preferably, the TextRank in step 4 is a text ranking algorithm derived from the PageRank web-page ranking algorithm.
Preferably, in step C, the perceptron model is adjusted according to the computed error.
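For reference, the standard TextRank recurrence for an undirected, unweighted word graph, which carries PageRank's damping factor over to word nodes, is shown below. The patent does not state which modification its improved TextRank makes, so this is only the baseline form. Here S(V_i) is the score of word V_i, N(V_i) is the set of words that co-occur with it, and d is the damping factor, commonly set to 0.85.

```latex
S(V_i) = (1 - d) + d \sum_{V_j \in N(V_i)} \frac{S(V_j)}{\lvert N(V_j) \rvert}
```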
Advantageous effects
The invention provides a method for extracting financial-domain event sentences from annual reports, with the following beneficial effects:
by constructing a TextRank graph model, performing iterative computation, calculating word weights and ranking, the method remedies defects of the TextRank graph model and solves the problems that named entities are ignored during word segmentation in TextRank keyword extraction, that the keyword calculation algorithm is not ideal, and that keyword extraction is easily disturbed by noise, causing errors.
Drawings
Fig. 1 is a flowchart of the method for extracting financial-domain event sentences from annual reports according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides the following technical solution: a method for extracting financial-domain event sentences from annual reports, comprising the following steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, performing named entity recognition based on perceptron sequence labeling;
step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;
and step 5, outputting the extracted text keywords.
Further, the named entity recognition based on perceptron sequence labeling in step 3 is performed as follows:
A. training a perceptron model;
B. labeling a text word sequence;
C. performing word segmentation with named entity recognition.
Further, the improved TextRank-based keyword extraction in step 4 comprises the following steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
Further, the perceptron model in step A is specifically a basic machine-learning model based on the linear perceptron algorithm, and the model is adjusted according to the computed error.
Further, the TextRank in step 4 is a text ranking algorithm derived from the PageRank web-page ranking algorithm.
Further, in step C, the perceptron model is adjusted according to the computed error.
A method for extracting financial-domain event sentences from annual reports comprises the following steps: step 1, inputting financial report data; step 2, preprocessing the data; step 3, performing named entity recognition based on perceptron sequence labeling; step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm; and step 5, outputting the extracted text keywords.
The named entity recognition based on perceptron sequence labeling in step 3 of the invention comprises the following steps: A. training a perceptron model; B. labeling the text word sequence; C. performing word segmentation with named entity recognition.
The improved TextRank-based keyword extraction in step 4 of the invention specifically comprises the following steps: a. constructing a TextRank graph model; b. performing iterative computation; c. calculating word weights; d. ranking.
The perceptron model in step A is specifically a basic machine-learning model based on the linear perceptron algorithm and is adjusted according to the computed error. Named entity recognition based on perceptron sequence labeling improves the word segmentation used to preprocess text for the TextRank algorithm: recognizing named entities after the text has been segmented increases the segmentation granularity, so that an entity is kept as a single unit rather than split into fragments, which improves the accuracy of TextRank-based keyword extraction. The TextRank in step 4 is a text ranking algorithm derived from the PageRank web-page ranking algorithm, and the perceptron model in step C is likewise adjusted according to the computed error.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for extracting financial-domain event sentences from annual reports, comprising the following steps:
step 1, inputting financial report data;
step 2, preprocessing the data;
step 3, performing named entity recognition based on perceptron sequence labeling;
step 4, extracting keywords with an improved TextRank-based keyword extraction algorithm;
and step 5, outputting the extracted text keywords.
2. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the named entity recognition based on perceptron sequence labeling in step 3 comprises the following steps:
A. training a perceptron model;
B. labeling a text word sequence;
C. performing word segmentation with named entity recognition.
3. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the improved TextRank-based keyword extraction in step 4 specifically comprises the following steps:
a. constructing a TextRank graph model;
b. performing iterative computation;
c. calculating the weight of the words;
d. ranking.
4. The method for extracting financial-domain event sentences from annual reports according to claim 2, wherein the perceptron model in step A is specifically a basic machine-learning model based on the linear perceptron algorithm, and the model is adjusted according to the computed error.
5. The method for extracting financial-domain event sentences from annual reports according to claim 1, wherein the TextRank in step 4 is a text ranking algorithm derived from the PageRank web-page ranking algorithm.
6. The method for extracting financial-domain event sentences from annual reports according to claim 4, wherein in step C the perceptron model is adjusted according to the computed error.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010528238.7A CN111666769A (en) 2020-06-11 2020-06-11 Method for extracting financial field event sentences in annual newspaper

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010528238.7A CN111666769A (en) 2020-06-11 2020-06-11 Method for extracting financial field event sentences in annual newspaper

Publications (1)

Publication Number Publication Date
CN111666769A true CN111666769A (en) 2020-09-15

Family

ID=72387110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010528238.7A Pending CN111666769A (en) 2020-06-11 2020-06-11 Method for extracting financial field event sentences in annual newspaper

Country Status (1)

Country Link
CN (1) CN111666769A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202042A (en) * 2016-07-06 2016-12-07 中央民族大学 Graph-based keyword extraction method
CN109165380A (en) * 2018-07-26 2019-01-08 咪咕数字传媒有限公司 Neural network model training method and device, and text label determination method and device
CN110162592A (en) * 2019-05-24 2019-08-23 东北大学 News keyword extraction method based on gravitation-improved TextRank
CN110232112A (en) * 2019-05-31 2019-09-13 北京创鑫旅程网络技术有限公司 Method and device for extracting keywords from articles

Similar Documents

Publication Publication Date Title
CN107291723B (en) Method and device for classifying webpage texts and method and device for identifying webpage texts
CN113076431B (en) Question and answer method and device for machine reading understanding, computer equipment and storage medium
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
US8275600B2 (en) Machine learning for transliteration
CN112434691A (en) HS code matching and displaying method and system based on intelligent analysis and identification and storage medium
CN113254574A (en) Method, device and system for auxiliary generation of customs official documents
CN112507711A (en) Text abstract extraction method and system
CN101127042A (en) Sensibility classification method based on language model
CN113632092A (en) Entity recognition method and device, dictionary establishing method, equipment and medium
CN111046660B (en) Method and device for identifying text professional terms
Chen et al. Information extraction from resume documents in pdf format
CN111444704B (en) Network safety keyword extraction method based on deep neural network
CN106202255A (en) Merge the Vietnamese name entity recognition method of physical characteristics
CN111160019A (en) Public opinion monitoring method, device and system
Ha et al. Information extraction from scanned invoice images using text analysis and layout features
Sun et al. Chinese new word identification: a latent discriminative model with global features
JP2020098592A (en) Method, device and storage medium of extracting web page content
CN113672731A (en) Emotion analysis method, device and equipment based on domain information and storage medium
CN111241824A (en) Method for identifying Chinese metaphor information
CN111339457A (en) Method and apparatus for extracting information from web page and storage medium
CN114510923B (en) Text theme generation method, device, equipment and medium based on artificial intelligence
CN113987175B (en) Text multi-label classification method based on medical subject vocabulary enhancement characterization
Kadagadkai et al. Summarization tool for multimedia data
CN110717029A (en) Information processing method and system
Tanaka et al. Corpus Construction for Historical Newspapers: A Case Study on Public Meeting Corpus Construction Using OCR Error Correction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200915