CN112507190B - Method and system for extracting keywords of financial and economic news - Google Patents


Info

Publication number
CN112507190B
Authority
CN
China
Prior art keywords
financial
character
news
text
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011495561.5A
Other languages
Chinese (zh)
Other versions
CN112507190A (en)
Inventor
李明玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-17
Filing date
2020-12-17
Publication date
2023-04-07
Application filed by Xinhua Zhiyun Technology Co ltd
Priority to CN202011495561.5A
Publication of CN112507190A
Application granted
Publication of CN112507190B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/951 Indexing; Web crawling techniques
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/216 Parsing using statistical methods
            • G06F40/30 Semantic analysis
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a keyword extraction method and system for financial news flashes. The method comprises the following steps: acquiring financial news flash text data and labeling the text; inputting the labeled text data into a pre-trained convolutional neural network to obtain font embedded feature vectors of the text characters; inputting the labeled text data into a pre-trained RoBERTa-wwm model to obtain semantic embedded feature vectors of the text characters; concatenating the font embedded feature vectors with the semantic embedded feature vectors and reducing the dimension of the result to obtain combined character feature vectors; inputting the combined character feature vectors into a conditional random field layer and obtaining the output character labels by adjusting the training parameters; and extracting keywords according to the character labels. The method and system represent the characters of financial news flash text with the Chinese pre-trained RoBERTa-wwm model and combine this representation with glyph features derived from the Chinese five-stroke (Wubi) input method, which improves keyword extraction accuracy.

Description

Method and system for extracting keywords of financial and economic news
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method and a system for extracting keywords of financial and economic news.
Background
Most existing text keyword extraction algorithms are unsupervised. Existing approaches include methods based on statistical features, methods based on word-graph features, methods based on topic models, and combinations of these. However, they depend heavily on the performance of the Chinese word segmenter, which mis-segments a large proportion of domain-specific proper nouns in finance, so the extracted keywords are inaccurate. Moreover, for short texts such as financial news flashes, some only a few dozen characters long, the statistical, word-graph and topic features used by existing schemes are weak, so the extracted keywords fail to capture the core of the news flash, and both the precision and the recall of the keyword algorithm are low.
Disclosure of Invention
One of the main purposes of the invention is to provide a method and system for extracting keywords of financial news flashes that represent the characters of the news flash text with the Chinese pre-trained RoBERTa-wwm model and combine this representation with the glyph features of Chinese characters from the five-stroke (Wubi) input method, thereby improving keyword extraction accuracy.
Another purpose of the invention is to provide a method and system for extracting keywords of financial news flashes that feed the combined character vectors of the news flash text into a conditional random field (CRF), which enforces part-of-speech and syntactic constraints on keywords, so that the label of each character can be determined from the output.
Another purpose of the invention is to provide a method and system for extracting keywords of financial news flashes that represent each news flash by combining the font features and the semantic features of its characters, which improves the relevance of the extracted keywords.
Another purpose of the invention is to provide a method and system for extracting keywords of financial news flashes that obtain the keyword extraction model by supervised learning: keywords in the news flash text are sequence-labeled according to the naming rules for financial news flashes, and the text is cleaned before labeling, which improves the accuracy with which the model extracts news flash keywords.
To achieve at least one of the above objects, the present invention provides a method for extracting keywords of financial news flashes, comprising the steps of:
acquiring financial news flash text data and labeling the text;
inputting the labeled text data into a pre-trained convolutional neural network to obtain font embedded feature vectors of the text characters;
inputting the labeled text data into a pre-trained RoBERTa-wwm model to obtain semantic embedded feature vectors of the text characters;
concatenating the font embedded feature vectors and the semantic embedded feature vectors and reducing the dimension of the result to obtain combined character feature vectors;
inputting the combined character feature vectors into a conditional random field layer and obtaining the output character labels by adjusting the training parameters;
and extracting keywords according to the character labels.
According to a preferred embodiment of the present invention, a five-stroke font feature vector is obtained for each character of a single financial news flash text, and a five-stroke font feature vector matrix of the news flash is built, from which the font embedded feature vector of the news flash is obtained.
According to a preferred embodiment of the present invention, at least three convolution kernel sliding windows of different sizes are established, the feature map of each sliding window over the five-stroke font feature vector matrix is computed, and pooling is applied to the resulting feature maps.
According to a preferred embodiment of the present invention, a max-pooling training parameter α and an average-pooling training parameter β are obtained, and the pooled output features of the windows are computed as:
$$[O_1, O_2, O_3] = \alpha\,\mathrm{MaxPool}[m_1, m_2, m_3] + \beta\,\mathrm{MeanPool}[m_1, m_2, m_3]$$
where $[m_1, m_2, m_3]$ are the feature maps of the different windows and $[O_1, O_2, O_3]$ are the output features of the different windows.
According to a preferred embodiment of the present invention, the pooled output features of the different windows are concatenated to obtain the five-stroke embedded feature vector, i.e. the concatenation of $O_1$, $O_2$ and $O_3$.
According to a preferred embodiment of the present invention, the semantic embedded feature vectors are obtained by feeding the same single financial news flash text into the encoder of the RoBERTa-wwm model, which outputs a semantic embedded feature vector for each character.
According to a preferred embodiment of the present invention, a dimension-reduction training parameter $W_O$ is obtained, the semantic embedded feature vector and the font embedded feature vector are concatenated, and the concatenated result is reduced in dimension with $W_O$ to obtain the final combined character feature vector.
According to a preferred embodiment of the present invention, the label probability distribution of each character in the single financial news flash text is computed from the labels output by the conditional random field layer, and the keywords of the news flash are obtained from this distribution using the BIEO labeling scheme.
According to a preferred embodiment of the present invention, the CRF layer is decoded with a first-order Viterbi algorithm and the whole model is trained with a log-likelihood loss function that includes a regularization term, where λ is a training parameter, N is the total number of news flashes labeled with keywords, θ denotes the overall model parameters, and $P(y_i \mid s_i)$ is the label probability distribution of the characters.
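For reference, a common way to write a linear-chain CRF label distribution and its regularized negative log-likelihood is shown below. This is an assumed standard form, not necessarily the patent's exact expression: $\mathbf{h}_j$ (a notation introduced here) is the combined character feature vector of the $j$-th character, the emission weights $W_{\mathrm{CRF}}$ are indexed by the label and the bias $b_{\mathrm{CRF}}$ by the label transition, and the regularizer is taken to be $\frac{\lambda}{2}\lVert\theta\rVert_2^2$.

$$
P(y \mid s) = \frac{\exp\left(\sum_{j}\left(W_{\mathrm{CRF}}^{(y_j)}\,\mathbf{h}_j + b_{\mathrm{CRF}}^{(y_{j-1},\,y_j)}\right)\right)}{\sum_{y'}\exp\left(\sum_{j}\left(W_{\mathrm{CRF}}^{(y'_j)}\,\mathbf{h}_j + b_{\mathrm{CRF}}^{(y'_{j-1},\,y'_j)}\right)\right)}
$$

$$
\mathcal{L}(\theta) = -\sum_{i=1}^{N}\log P(y_i \mid s_i) + \frac{\lambda}{2}\lVert\theta\rVert_2^2
$$

Minimizing $\mathcal{L}$ maximizes the log-likelihood of the gold label sequences of the $N$ labeled news flashes while the L2 term penalizes large parameters.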
To achieve at least one of the above objects, the present invention further provides a system for extracting keywords of financial news flashes, which employs the above method for extracting keywords of financial news flashes.
Drawings
FIG. 1 is a schematic flow diagram of the keyword extraction method for financial news flashes;
FIG. 2 is a schematic diagram of the model structure of the keyword extraction system for financial news flashes according to the present invention;
FIG. 3 is a schematic diagram of the convolution used to obtain the five-stroke font feature vectors in the keyword extraction method for financial news flashes.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships shown in the drawings, and are used only for convenience and simplicity of description; they do not indicate or imply that the device or component referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus are not to be construed as limiting the invention.
It is understood that the term "a" or "an" should be interpreted as "at least one" or "one or more": the number of an element may be one in one embodiment and plural in another, so the term should not be interpreted as limiting the number.
Referring to FIGS. 1-3, the present invention provides a schematic flow diagram and a schematic model structure diagram of the keyword extraction method for financial news flashes. The method extracts features of the characters in the financial text with pre-trained models, obtains a label for each character of the news flash through sequence labeling, and extracts keywords according to these labels. Because the invention extracts keywords with the characters of the news flash as units and uses the structural features of Chinese characters to identify them, it avoids the noise introduced by inaccurate segmentation in conventional Chinese word segmenters, which gives it a clear advantage.
The keyword extraction method specifically comprises the following steps. First, financial news flash text data are acquired; the text of each single news flash is the basic unit of keyword extraction. The acquired text data are then cleaned as follows:
deleting special characters and invisible characters left over from the crawled news flash web pages;
removing leading and trailing whitespace, line breaks and the like from the flash text;
removing URLs from the flash text;
removing datelines at the head and tail of the flash text using rules, for example "(Cailian Press, day XX)";
discarding news flashes with fewer than 10 characters;
truncating news flash texts that still exceed 512 characters after the above steps.
In this way, every financial news flash conforms to the length and format requirements.
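The cleaning steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the dateline pattern `HEADER_RE` (head datelines only; tail rules are omitted), the set of invisible characters in `INVISIBLE_RE`, and the function name `clean_flash` are all hypothetical, and length is counted in characters.

```python
import re

MIN_CHARS = 10   # flashes shorter than this are discarded
MAX_CHARS = 512  # longer flashes are truncated to this length

# Hypothetical head-dateline rule: a bracketed prefix ending in 讯 or 电,
# e.g. "（财联社12月17日讯）". The real rules are not disclosed.
HEADER_RE = re.compile(r"^[（(【\[][^）)】\]]{0,30}?(?:讯|电)[）)】\]]")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Zero-width, bidi-control and other invisible characters often left by crawlers.
INVISIBLE_RE = re.compile(r"[\u200b-\u200f\u2028-\u202f\u2060\ufeff\u00ad]")


def clean_flash(text):
    """Return the cleaned flash text, or None if the flash should be discarded."""
    text = INVISIBLE_RE.sub("", text)
    text = URL_RE.sub("", text)
    # Replace line breaks and whitespace runs with one space, then trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Strip a leading dateline, then trim again.
    text = HEADER_RE.sub("", text).strip()
    if len(text) < MIN_CHARS:
        return None
    return text[:MAX_CHARS]
```

For example, `clean_flash("（财联社12月17日讯）央行今日开展1000亿元逆回购操作")` strips the bracketed dateline and keeps the rest, while a text with fewer than 10 characters after cleaning returns `None`.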
The cleaned data are then labeled: the text of each financial news flash is annotated with entities according to the naming rules for financial news flashes. Entities such as person names, place names, organization names and dates in the text need to be represented. Keywords of a financial news flash should reflect market fluctuations, the impact on industries, financial concepts and the like, so the entities to be labeled include keywords such as futures, financial sectors, industries, industry-chain terms and financial event terms.
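As a hypothetical illustration of the sequence labeling step, the sketch below converts annotated keyword strings into per-character BIEO tags, the tag scheme the method later decodes. It assumes annotations are given as keyword strings, tags every non-overlapping occurrence, and rejects single-character keywords because BIEO as described has no single-character tag; `spans_to_bieo` is an invented name.

```python
def spans_to_bieo(text, keywords):
    """Tag each character of `text` with B/I/E/O given annotated keyword strings."""
    tags = ["O"] * len(text)
    for kw in keywords:
        if len(kw) < 2:
            raise ValueError("BIEO has no single-character tag: %r" % kw)
        start = text.find(kw)
        while start != -1:
            end = start + len(kw) - 1
            # Only tag occurrences that do not overlap an already tagged keyword.
            if all(t == "O" for t in tags[start:end + 1]):
                tags[start] = "B"
                for j in range(start + 1, end):
                    tags[j] = "I"
                tags[end] = "E"
            start = text.find(kw, start + 1)
    return tags
```

For instance, `spans_to_bieo("央行开展逆回购操作", ["逆回购"])` yields `O O O O B I E O O`.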
Next, features are extracted from the labeled financial news flash text. The font feature of each character in the text is extracted with a convolutional neural network (CNN) that is pre-trained with the relevant training parameters configured. Specifically, the character information of each financial news flash is represented as $S_i = \{w_1, w_2, \ldots, w_n\}$, where $S_i$ is the character information of a single news flash and $n$ is its number of characters, with $10 \le n \le 512$. Denoting a single Chinese character by $w_j$, its five-stroke input is $\mathrm{wubi}(w_j) = \{b_{j1}, b_{j2}, \ldots, b_{jk}\}$, where $b_{jk}$ is the font structure of the five-stroke input of the character, $k$ indexes the five-stroke font structure, and $j$ indexes the character. The five-stroke input feature vectors are obtained through the trained convolutional neural network and converted into exponential form: with the five-stroke vector dimension set to $d$, the five-stroke vector matrix $B_i \in \mathbb{R}^{k \times d}$ of each character in the single financial news flash is computed from the five-stroke input feature vectors $\mathrm{wubi}(c_{jk})$ of the character, using $e$, the base of the natural exponential.
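One way to build the per-character matrix $B \in \mathbb{R}^{k \times d}$ is sketched below, assuming each Wubi key ('a' to 'z') has a learned $d$-dimensional embedding and codes are padded to $k = 4$ keys. The code table, the embedding table `key_emb` and both function names are hypothetical, and the exponential transform mentioned above is omitted because its exact form is not given in the text.

```python
import numpy as np

K = 4  # Wubi 86 codes use at most four keys


def wubi_index_matrix(chars, code_table, k=K):
    """Map each character's Wubi code to k key indices: 'a'..'z' -> 1..26, padding -> 0."""
    idx = np.zeros((len(chars), k), dtype=np.int64)
    for j, ch in enumerate(chars):
        # Characters missing from the table stay all-padding (an assumption).
        code = code_table.get(ch, "")[:k]
        for t, key in enumerate(code):
            idx[j, t] = ord(key) - ord("a") + 1
    return idx


def wubi_matrices(chars, code_table, key_emb, k=K):
    """Return one (k x d) matrix B per character, stacked with shape (n, k, d).

    key_emb has shape (27, d); row 0 is the padding vector.
    """
    return key_emb[wubi_index_matrix(chars, code_table, k)]
```

Keeping row 0 of `key_emb` at zero makes padded positions contribute nothing to the convolution that follows.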
Further, convolution kernels with different sliding windows $[a_1, a_2, a_3]$ are established; in a preferred embodiment of the present invention the sliding window sizes are set to [2, 3, 4]. Each sliding window is moved over the five-stroke vector matrix to obtain the feature maps $m_1, m_2, m_3$ under the windows of different sizes. Average pooling and max pooling are then applied to these feature maps, with a trainable average-pooling parameter β and a trainable max-pooling parameter α, and the output features are:
$$[O_1, O_2, O_3] = \alpha\,\mathrm{MaxPool}[m_1, m_2, m_3] + \beta\,\mathrm{MeanPool}[m_1, m_2, m_3]$$
where MaxPool is the max-pooling operation, MeanPool is the average-pooling operation, and $[O_1, O_2, O_3]$ are the output features under the different sliding windows. The output feature vectors are further concatenated to obtain the final font embedded feature vector, i.e. the concatenation of $O_1$, $O_2$ and $O_3$.
It should be noted that vector concatenation (splicing) in the present invention means joining vectors end to end along the horizontal or vertical direction. For example, concatenating the one-dimensional vectors $m_1 = [1, 2]$ and $m_2 = [3, 4]$ gives $m_3 = [1, 2, 3, 4]$. Vectors of two or more dimensions are concatenated in the same way.
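The multi-window convolution, the α/β pooling and the final concatenation can be sketched in NumPy as below. Assumptions: the windows slide over the $k$ rows of one character's matrix $B$, each window size $a$ has $F$ filters stored as an array of shape `(a, d, F)`, α and β are scalars, and no nonlinearity is applied because the text does not name one; the function names are hypothetical.

```python
import numpy as np


def conv_feature_map(B, W, b):
    """Valid 1-D convolution over the rows of B (k x d) with F filters.

    W has shape (a, d, F) for window size a; b has shape (F,).
    Returns the feature map m with shape (k - a + 1, F).
    """
    a = W.shape[0]
    steps = B.shape[0] - a + 1
    # Each position sums B[t:t+a] * W over the window and embedding axes.
    return np.stack([np.tensordot(B[t:t + a], W, axes=([0, 1], [0, 1])) + b
                     for t in range(steps)])


def font_embedding(B, filters, alpha, beta):
    """O = alpha * MaxPool(m) + beta * MeanPool(m) per window, then concatenate."""
    outputs = []
    for W, b in filters:  # one (W, b) pair per window size, e.g. 2, 3, 4
        m = conv_feature_map(B, W, b)
        outputs.append(alpha * m.max(axis=0) + beta * m.mean(axis=0))
    # Splice [O_1, O_2, O_3] end to end into one vector.
    return np.concatenate(outputs)
```

With $k = 4$ and window sizes [2, 3, 4], the feature maps have 3, 2 and 1 positions, and the result is the $3F$-dimensional concatenation of $O_1$, $O_2$ and $O_3$. A window larger than $k$ would leave no valid positions, so window sizes should not exceed the padded code length.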
The invention further uses a pre-trained RoBERTa-wwm model to obtain semantic embedded feature vectors: the character information $S_i$ of the same financial news flash text is fed into the encoder of the RoBERTa-wwm model, which outputs the semantic embedded feature vector of each character.
The semantic embedded feature vectors and the font embedded feature vectors of the same news flash are concatenated, and the concatenated result is reduced in dimension by the trainable dimension-reduction parameter $W_O$, giving the combined character feature vectors of the news flash. Here $W_O$ is a matrix that is multiplied with the concatenated semantic and font embedded feature vectors to produce a combined character feature vector of smaller dimension.
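A sketch of the concatenation and dimension reduction follows, assuming per-character row vectors and a projection matrix `W_O` of shape `(d_sem + d_font, d_out)`; multiplying on the right is equivalent, up to transposition, to the column-vector form $W_O x$. Shapes and the function name are illustrative only.

```python
import numpy as np


def combine_features(sem, font, W_O):
    """Concatenate per-character semantic and font vectors, then project with W_O.

    sem:  (n, d_sem)   per-character encoder outputs (768 for a base-size model)
    font: (n, d_font)  per-character font embedded feature vectors
    W_O:  (d_sem + d_font, d_out) trainable dimension-reduction matrix
    """
    if sem.shape[0] != font.shape[0]:
        raise ValueError("expected one semantic and one font vector per character")
    spliced = np.concatenate([sem, font], axis=1)  # (n, d_sem + d_font)
    return spliced @ W_O                           # (n, d_out), smaller than the input
```

Choosing `d_out` smaller than `d_sem + d_font` is what makes this step a dimension reduction before the CRF layer.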
The combined character feature vectors obtained after dimension reduction are input into a conditional random field layer (CRF layer), whose constrained decoding over lexical patterns yields the label probability distribution of each character in the news flash text sequence. In this distribution, y is the target label sequence, s is the character information, y' ranges over all possible label sequences, w_j' ranges over the corresponding Chinese characters, and W_CRF and b_CRF are the weight parameters and bias terms of the CRF layer. The CRF layer is decoded with a first-order Viterbi algorithm, and the whole model is trained with a log-likelihood loss function with an L2 regularization term.
In this loss, N is the total number of news flash samples labeled with keywords, θ denotes the overall model parameters, λ is a training parameter, and P(y_i|s_i) is the label probability distribution of the characters; each character is assigned the label with the highest probability. Keywords are then extracted from the output of the CRF layer using the BIEO labeling scheme, in which B marks the first character of a keyword, I a middle character, E the last character, and O a character outside any keyword; each character span running from a B to the following E is taken as a keyword of the financial news flash.
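A minimal NumPy sketch of a linear-chain CRF layer with first-order Viterbi decoding, a regularized negative log-likelihood, and BIEO keyword extraction is given below. It assumes emission scores `emit` of shape `(n, L)` (for example, the combined character vectors projected onto the L tags), a transition matrix `trans[prev, cur]`, the tag order B, I, E, O, no start or stop transitions, and an L2 term of the form $\frac{\lambda}{2}\lVert\theta\rVert^2$; the exact parameterization of W_CRF and b_CRF is not spelled out in the text, and all names here are hypothetical.

```python
import numpy as np

TAGS = ["B", "I", "E", "O"]


def _logsumexp(x, axis):
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sequence_score(emit, trans, tags):
    """Unnormalised score: emission of each tag plus transitions between tags."""
    s = emit[0, tags[0]]
    for i in range(1, len(tags)):
        s = s + trans[tags[i - 1], tags[i]] + emit[i, tags[i]]
    return s


def log_partition(emit, trans):
    """Forward algorithm: log of the sum of exp(score) over all tag sequences."""
    alpha = emit[0].copy()
    for i in range(1, emit.shape[0]):
        # alpha[j] = logsumexp_k(alpha[k] + trans[k, j]) + emit[i, j]
        alpha = _logsumexp(alpha[:, None] + trans, axis=0) + emit[i]
    return _logsumexp(alpha, axis=0)


def crf_nll(emit, trans, gold, params, lam):
    """-log P(gold | s) plus (lam / 2) * sum of squared parameters (assumed form)."""
    nll = log_partition(emit, trans) - sequence_score(emit, trans, gold)
    return nll + 0.5 * lam * sum(np.sum(p ** 2) for p in params)


def viterbi(emit, trans):
    """First-order Viterbi decoding: the highest-scoring tag sequence."""
    score = emit[0].copy()
    back = []
    for i in range(1, emit.shape[0]):
        cand = score[:, None] + trans        # cand[k, j]: previous tag k -> tag j
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + emit[i]
    best = [int(score.argmax())]
    for ptr in reversed(back):
        best.append(int(ptr[best[-1]]))
    return best[::-1]


def extract_keywords(text, tag_ids):
    """Collect each span from a B to the next E (only I in between) as a keyword."""
    keywords, start = [], None
    for i, t in enumerate(TAGS[k] for k in tag_ids):
        if t == "B":
            start = i
        elif t == "E" and start is not None:
            keywords.append(text[start:i + 1])
            start = None
        elif t == "O":
            start = None
    return keywords
```

Decoding one news flash with `viterbi` and passing the resulting tag indices to `extract_keywords` yields its keyword strings; during training, `crf_nll` would be summed over the N labeled flashes. Invalid tag orders such as O followed by I are not forbidden here; the learned transition scores are assumed to discourage them.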
It should be noted that the whole model, shown in FIG. 2, comprises the trained convolutional neural network for obtaining the five-stroke input feature vectors, the RoBERTa-wwm model, and the CRF layer.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and illustrated in the accompanying drawings are illustrative only and not restrictive of the broad invention, and that the objects of the invention have been fully and effectively achieved and that the functional and structural principles of the present invention have been shown and described in the embodiments and that modifications and variations may be resorted to without departing from the principles described herein.

Claims (9)

1. A method for extracting keywords of financial news flashes is characterized by comprising the following steps:
acquiring financial news flash text data and labeling the financial text;
inputting the labeled text data into a pre-trained convolutional neural network to obtain font embedded feature vectors of the text data characters;
inputting the labeled text data into a pre-trained RoBERTa-wwm model to obtain semantic embedded feature vectors of the text data characters;
concatenating the font embedded feature vectors and the semantic embedded feature vectors and reducing their dimension to obtain combined character feature vectors;
inputting the combined character feature vectors into a conditional random field layer and obtaining the output character labels by adjusting training parameters;
extracting keywords according to the character labels;
and acquiring a five-stroke font feature vector of each character from the single financial news flash text, and building a five-stroke font feature vector matrix of the single financial news flash for obtaining the font embedded feature vector of the single financial news flash.
2. The method of claim 1, wherein at least three convolution kernel sliding windows of different sizes are established, the feature map of each sliding window over the five-stroke font feature vector matrix is computed, and pooling is performed on the obtained feature maps.
3. The method of claim 2, wherein a max-pooling training parameter α and an average-pooling training parameter β are obtained, and the pooled output features of the windows are further calculated as:
$$[O_1, O_2, O_3] = \alpha\,\mathrm{MaxPool}[m_1, m_2, m_3] + \beta\,\mathrm{MeanPool}[m_1, m_2, m_3]$$
wherein $[m_1, m_2, m_3]$ are the feature maps of the different windows and $[O_1, O_2, O_3]$ are the output features of the different windows.
4. The method of claim 3, wherein the pooled output features of the different windows are concatenated to obtain the five-stroke embedded feature vector.
5. The method for extracting keywords of financial news flashes as claimed in claim 4, wherein the semantic embedded feature vectors are obtained by feeding the same single financial news flash text into the encoder of the RoBERTa-wwm model, which outputs the semantic embedded feature vector of each character.
6. The method of claim 5, wherein a dimension-reduction training parameter $W_O$ is obtained, the semantic embedded feature vector and the font embedded feature vector are concatenated, and the concatenated result is reduced in dimension according to $W_O$ to obtain the final combined character feature vector.
7. The method as claimed in claim 6, wherein the label probability distribution of each character in the single financial news flash text is calculated from the labels output by the conditional random field layer, and the keywords of the single financial news flash are obtained from the probability distribution using the BIEO labeling scheme.
8. The method of claim 7, wherein the CRF layer is decoded using a first-order Viterbi algorithm and the entire model is trained using a log-likelihood loss function:
[formula image]
[formula image]
where [formula image] is a regularization term, [formula image] is a training parameter, N is the total number of news flash samples labeled with keywords, [formula image] denotes the overall parameters of the model, and [formula image] is the label probability distribution of a character.
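Claim 8 names the standard components of a linear-chain CRF: first-order Viterbi decoding and training by log-likelihood. Because the loss formula is only an image, the sketch below assumes the usual formulation: per-character emission scores, a tag-to-tag transition matrix, a loss equal to the mean negative log-likelihood over the N labelled samples, and an L2 penalty standing in for the regularization term. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def sequence_score(emissions, transitions, tags):
    """Unnormalised score of one tag path; emissions (T, K), transitions (K, K)."""
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp(score) over all tag paths."""
    alpha = emissions[0]
    for t in range(1, emissions.shape[0]):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0)

def viterbi(emissions, transitions):
    """First-order Viterbi decoding: the highest-scoring tag path."""
    score = emissions[0]
    backptr = []
    for t in range(1, emissions.shape[0]):
        cand = score[:, None] + transitions      # [previous tag, next tag]
        backptr.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + emissions[t]
    best = [int(score.argmax())]
    for bp in reversed(backptr):                 # follow back-pointers
        best.append(int(bp[best[-1]]))
    return best[::-1]

def nll_loss(batch, transitions, params, lam=1e-4):
    """Mean negative log-likelihood over (emissions, tags) pairs plus assumed L2 term."""
    nll = [log_partition(e, transitions) - sequence_score(e, transitions, y)
           for e, y in batch]
    return float(np.mean(nll)) + lam / 2 * sum(float((p ** 2).sum()) for p in params)
```

Both recursions cost O(T·K²) for T characters and K tags, which is what makes the CRF practical compared with enumerating all K^T tag paths.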
9. A keyword extraction system for financial news flashes, characterized in that the system adopts the keyword extraction method for financial news flashes according to any one of claims 1 to 8.
CN202011495561.5A 2020-12-17 2020-12-17 Method and system for extracting keywords of financial and economic news Active CN112507190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011495561.5A CN112507190B (en) 2020-12-17 2020-12-17 Method and system for extracting keywords of financial and economic news

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011495561.5A CN112507190B (en) 2020-12-17 2020-12-17 Method and system for extracting keywords of financial and economic news

Publications (2)

Publication Number Publication Date
CN112507190A CN112507190A (en) 2021-03-16
CN112507190B true CN112507190B (en) 2023-04-07

Family

ID=74922127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011495561.5A Active CN112507190B (en) 2020-12-17 2020-12-17 Method and system for extracting keywords of financial and economic news

Country Status (1)

Country Link
CN (1) CN112507190B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268953A (en) * 2021-07-15 2021-08-17 中国平安人寿保险股份有限公司 Text key word extraction method and device, computer equipment and storage medium
CN113822061B (en) * 2021-08-13 2023-09-08 国网上海市电力公司 Small sample patent classification method based on feature map construction
CN113887206B (en) * 2021-09-15 2023-04-28 北京三快在线科技有限公司 Model training and keyword extraction method and device
CN114757184B (en) * 2022-04-11 2023-11-10 中国航空综合技术研究所 Method and system for realizing knowledge question and answer in aviation field

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492215A (en) * 2018-09-18 2019-03-19 平安科技(深圳)有限公司 News property recognition methods, device, computer equipment and storage medium
CN110287483B (en) * 2019-06-06 2023-12-05 广东技术师范大学 Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112507190A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN112507190B (en) Method and system for extracting keywords of financial and economic news
CN109299273B (en) Multi-source multi-label text classification method and system based on improved seq2seq model
US11501182B2 (en) Method and apparatus for generating model
US11392838B2 (en) Method, equipment, computing device and computer-readable storage medium for knowledge extraction based on TextCNN
CN110442707B (en) Seq2seq-based multi-label text classification method
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111639175B (en) Self-supervision dialogue text abstract method and system
CN112800776B (en) Bidirectional GRU relation extraction data processing method, system, terminal and medium
CN111078887B (en) Text classification method and device
CN108920461B (en) Multi-type entity extraction method and device containing complex relationships
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN113076739A (en) Method and system for realizing cross-domain Chinese text error correction
CN111428485A (en) Method and device for classifying judicial literature paragraphs, computer equipment and storage medium
CN112287672A (en) Text intention recognition method and device, electronic equipment and storage medium
CN110569505A (en) text input method and device
CN111753086A (en) Junk mail identification method and device
CN111177375A (en) Electronic document classification method and device
CN114529903A (en) Text refinement network
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113076720B (en) Long text segmentation method and device, storage medium and electronic device
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN115906835A (en) Chinese question text representation learning method based on clustering and contrast learning
CN114416981A (en) Long text classification method, device, equipment and storage medium
CN114925175A (en) Abstract generation method and device based on artificial intelligence, computer equipment and medium
CN111460105B (en) Topic mining method, system, equipment and storage medium based on short text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant