CN109558492A - A listed-company knowledge graph construction method and device suitable for event attribution - Google Patents
- Publication number
- CN109558492A (application number CN201811205312.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- news
- listed company
- real
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention discloses a listed-company knowledge graph construction method and device suitable for event attribution; the device is used to implement the method. The method includes: generating a financial dictionary from the acquired individual-stock fundamentals information of listed companies and the related individual-stock historical news; generating a real-time news database from the acquired real-time news related to each listed company; classifying the real-time news with a text classifier; extracting information from the news text; and building the listed-company entity-relationship graph in the graph database Neo4J, so that nodes on the graph can be traced according to specific news content, thereby constructing a listed-company knowledge graph with an event attribution function.
Description
Technical field
The present invention relates to the field of knowledge graph construction, and in particular to a listed-company knowledge graph construction method and device suitable for event attribution.
Background art
With the rapid development of the Internet, the volume of financial information available to us has grown explosively, and major finance and securities portal websites have sprung up one after another like mushrooms after rain. To keep news timely and rich, and to compete more effectively for users, the major financial websites have all increased the publication density and scope of their financial news, which has further intensified the explosion of news information. However, most investors in China are currently retail investors, who have neither the time and energy to browse large volumes of news nor sufficient retrieval and analysis ability to track the degree of association between individual news items. It is therefore both necessary and valuable to extract the news related to the major listed companies and to construct a graph network for event attribution. This helps ordinary retail investors recognize quickly and accurately which news events may be driving the rise or fall of a given listed company or stock, and thus make better-informed investment judgments. In addition, such an event-attribution knowledge graph can also be applied to quantitative trading: quantitative practitioners can extract the associated news event content from the graph and, in combination with relevant natural language processing techniques, form a series of valuable indicators to guide quantitative investment.
Current knowledge graph construction mainly involves two key technologies: entity relation recognition and knowledge reasoning.
Entity relation recognition refers to extracting the nouns in an article that carry specific informational meaning and analyzing them as specific processing units. It was first proposed at the MUC conference in 1998, with the aim of extracting specific relations from text by filling the slots of relation templates. With the development of statistical methods, the problem of identifying relations between entities in text was gradually converted into a classification problem. Zelenko [3] et al. proposed expressing a relation instance by the minimum common subtree of a shallow parse tree, computing a kernel function between two subtrees, and classifying the instances through training (for example, with an SVM classifier). However, because the matching constraints in the kernel similarity computation are rather strict, and in particular because listed-company names are expressed with considerable redundancy, kernel-based methods generally have low recall. As time passed and corpora grew, information extraction increasingly turned to research based on neural models, and relevant corpora were proposed as test standards. A prominent feature of neural-network-based models is that they do not require many hand-added features; the features generally used include word vectors, positions, and so on. Later, joint extraction models were proposed, which extract entities and the relations between them simultaneously. However, both neural-model methods and joint-extraction methods require a large training corpus, and financial news does not carry enough label information to satisfy this condition for model training. These classification-based methods are therefore not suitable for constructing a knowledge graph that integrates listed companies with the related news information.
The general idea of knowledge reasoning is that, using the node relationships and node information already in the graph, the corresponding changes in associated nodes can be inferred when certain nodes change. Specifically, researchers proposed a symbol-based reasoning method with an easy-to-process concept language and developed several commercial semantic network systems, so that semantic networks offer both formal semantics and efficient reasoning. Later, researchers used multi-core and multi-processor technology, as well as distributed computing based on network communication (such as the MapReduce computing framework and peer-to-peer network frameworks), to address the efficiency problems of reasoning over formal semantics. However, because the volume of financial news is growing explosively, the reasoning efficiency of these systems still cannot keep up with the growing data demand, and they are difficult to apply well. Moreover, besides information about the listed company such as shareholders and basic executive information, the knowledge graph here also needs to include some implicit information, such as the content of the company's main products and the upstream and downstream relationships of those products. Upstream industries involve raw materials and suppliers, while downstream industries involve consumer goods and their buyers; the current industry situation of the main products is also a key information point, since it bears on the relevance of industry competitors. Therefore, a symbol-based reasoning method alone cannot integrate the corresponding financial news information deeply into the graph, which impairs the graph's ability to trace event attribution.
Summary of the invention
The main object of the present invention is to propose a listed-company knowledge graph construction method suitable for event attribution, aiming to overcome the above problems.
To achieve the above object, the present invention proposes a listed-company knowledge graph construction method suitable for event attribution, characterized by comprising:
S10, generating a financial dictionary: obtaining the individual-stock fundamentals information and historical news of a number of listed companies, and extracting key words and sentences to generate the financial dictionary;
S20, generating a real-time news database: obtaining real-time news about the listed companies and generating the real-time news database;
S30, designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database for training the text classifier, and classifying the real-time news with a first convolutional neural network model;
S40, text information extraction: using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
S50, constructing the entity-relationship graph: using the graph concept in the data structure of the Neo4J graph database to establish an initial model of the listed-company knowledge graph, with the individual-stock fundamentals information of the listed companies as nodes and the relationships between listed companies as edges, and inputting the entity news information obtained by the information extraction of S40 to generate the listed-company knowledge graph.
Preferably, before S10 the method further comprises:
S01, linking to well-known stock-trading websites and using a crawler program to obtain the stock list of listed companies, the individual-stock fundamentals information, and the historical news related to each stock;
and after S10 and before S20 the method further comprises:
S02, linking to the websites of major securities and financial information providers and using a crawler program to obtain the real-time news of each listed company.
Preferably, the first convolutional neural network model is divided into four layers:
The first layer is the embedding layer, which maps each word to a low-dimensional vector representation;
The second layer is the convolutional layer, composed of filters with different window sizes; the parameters of a given filter are shared, each filter acts as a feature detector, and the window size determines the n-gram information it detects;
The third layer is the pooling layer, which extracts the maximum value of the column vector produced by each convolution, yielding a row vector whose length equals the number of filters;
The fourth layer is the fully connected layer, that is, a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
Preferably, the embedding layer maps each word to a low-dimensional vector representation using the open-source word2vec toolkit.
Preferably, before the real-time news is classified with the first convolutional neural network in S30, the method further comprises:
S301, preprocessing stage: performing word segmentation on each piece of real-time news and filtering out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup information.
Preferably, the step in S40 of converting unstructured information into structured information adapted to the news database comprises:
S401, entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402, relation extraction: using a deep-learning-based method, looking up a pre-trained word vector table to generate the word vector matrix of each sentence, adding position vector features, obtaining the keyword features that characterize the category by a keyword extraction algorithm, and extracting semantic relations between entities with a second convolutional neural network. That is, the word vectors and the position vectors of the words are used as the input of the second convolutional neural network to obtain a sentence representation. The structure of the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: a series of features is first obtained from the category-characterizing keyword features by convolution, the key features of each sentence are selected by the pooling layer and combined into a feature vector, and this vector finally passes through the non-linear layer into a classifier for classification;
S403, event extraction: presenting the unstructured text that contains event information in structured form, and judging from the company name information, the financial-domain verb information, and the sentence position whether the current sentence is a news event sentence.
Preferably, S403 is specifically:
(1) Company name information: the company name is taken as an important feature of an event sentence, and its score is obtained by the following formula:
Score_company(S_i) = Count(S_i);
(2) Financial-domain verb information: using the financial dictionary, the weight of the verb information is calculated; the calculation formula is as follows:
(3) Sentence position: the sentence position weight is calculated by the following formula:
The invention also discloses a listed-company knowledge graph construction device based on event attribution, comprising:
A first generation module, for obtaining the individual-stock fundamentals information and historical news of a number of listed companies and extracting key words and sentences to generate the financial dictionary;
A second generation module, for obtaining real-time news about the listed companies and generating the real-time news database;
A classification module, for using the financial dictionary to extract a real-time news corpus from the real-time news database for training the text classifier, and for classifying the real-time news with the first convolutional neural network model; the classification module further includes a preprocessing unit which, before the real-time news is classified, performs word segmentation on each piece of real-time news and filters out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup information;
An extraction module, for using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
A third generation module, for using the graph concept in the data structure of the Neo4J graph database to establish an initial model of the listed-company knowledge graph, with the individual-stock fundamentals information of the listed companies as nodes and the relationships between listed companies as edges, and for inputting the entity news information obtained by the information extraction of S40 to generate the listed-company knowledge graph.
Preferably, the device further comprises a link crawling module, for linking to well-known stock-trading websites and using a crawler program to obtain the stock list of listed companies, the individual-stock fundamentals information, and the historical news related to each stock; and for linking to the websites of major securities and financial information providers and using a crawler program to obtain the real-time news of each listed company.
Preferably, the extraction module comprises:
An entity annotation unit, for using the financial dictionary to identify the corresponding entities in each news item and annotate them;
A relation extraction unit, for using a deep-learning-based method to look up a pre-trained word vector table and generate the word vector matrix of each sentence, adding position vector features, obtaining the keyword features that characterize the category by a keyword extraction algorithm, and extracting semantic relations between entities with a second convolutional neural network. That is, the word vectors and the position vectors of the words are used as the input of the second convolutional neural network to obtain a sentence representation. The structure of the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: a series of features is first obtained from the category-characterizing keyword features by convolution, the key features of each sentence are selected by the pooling layer and combined into a feature vector, and this vector finally passes through the non-linear layer into a classifier for classification;
An event extraction unit, for presenting the unstructured text that contains event information in structured form, and for judging from the company name information, the financial-domain verb information, and the sentence position whether the current sentence is a news event sentence.
The purpose of the present invention is to construct a listed-company graph with an event attribution function. It gives investors in the financial market a clear view of the underlying links between financial news and the corresponding listed companies, and helps them spend less time while understanding the connections among the financial news of the major listed companies more comprehensively, so that they can make more accurate value-investment judgments. It can also provide quantitative trading practitioners with important news-related indicators.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.
Fig. 1 is a method flowchart of one embodiment of the listed-company knowledge graph construction method suitable for event attribution of the present invention;
Fig. 2 is a method flowchart of converting unstructured information into structured information adapted to the news database in S40;
Fig. 3 is a method flowchart of another embodiment of the listed-company knowledge graph construction method suitable for event attribution of the present invention;
Fig. 4 is a functional module diagram of one embodiment of the listed-company knowledge graph construction device suitable for event attribution of the present invention;
Fig. 5 is a detailed functional diagram of the extraction module;
Fig. 6 is a structural schematic diagram of the first convolutional neural network model;
Fig. 7 is a structural schematic diagram of the second convolutional neural network;
Fig. 8 is the knowledge graph framework of a specific liquor stock.
The realization of the object, the functional characteristics, and the advantages of the present invention will be further described with reference to the drawings in conjunction with the embodiments.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that if the embodiments of the present invention involve directional indications (such as up, down, left, right, front, back, ...), these indications are only used to explain the relative positions, motion, and so on of the components in a particular posture (as shown in the drawings); if that posture changes, the directional indications change accordingly.
In addition, if the embodiments of the present invention involve descriptions such as "first" and "second", these descriptions are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. Furthermore, the technical solutions of the embodiments can be combined with each other, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and is not within the protection scope claimed by the present invention.
As shown in Figs. 1-7, the present invention proposes a listed-company knowledge graph construction method suitable for event attribution, characterized by comprising:
S10, generating a financial dictionary: obtaining the individual-stock fundamentals information and historical news of a number of listed companies, and extracting key words and sentences to generate the financial dictionary;
S20, generating a real-time news database: obtaining real-time news about the listed companies and generating the real-time news database;
S30, designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database for training the text classifier, and classifying the real-time news with a first convolutional neural network model;
S40, text information extraction: using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
S50, constructing the entity-relationship graph: using the graph concept in the data structure of the Neo4J graph database to establish an initial model of the listed-company knowledge graph, with the individual-stock fundamentals information of the listed companies as nodes and the relationships between listed companies as edges, and inputting the entity news information obtained by the information extraction of S40 to generate the listed-company knowledge graph.
In the embodiments of the present invention, the financial dictionary and the real-time news database are constructed in advance. The financial dictionary prepares for the subsequent word segmentation of real-time news sentences and the extraction of keywords and key sentences; the real-time news database supports the subsequent graph analysis that traces event attribution for listed companies. The text classifier classifies the real-time news: each real-time news item has a specific subject category, such as a particular stock or a particular industry concept, so the news items need to be sorted into the corresponding categories in preparation for subsequent public-opinion analysis on the graph. Text information extraction converts unstructured information into structured information adapted to the news database. The entity-relationship graph is used to trace and attribute nodes on the graph according to specific news content.
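Step S50 above can be sketched as a small set of Cypher statements built in Python. This is only an illustration under stated assumptions: the node labels `Company` and `News`, the relationship types `SUPPLIES` and `MENTIONS`, the property names, and the sample stock codes are all hypothetical and do not come from the patent text, which only specifies that company fundamentals form the nodes and inter-company relationships form the edges.

```python
from typing import Any, Dict, Tuple

Query = Tuple[str, Dict[str, Any]]


def company_node(code: str, fundamentals: Dict[str, Any]) -> Query:
    """Create or update one listed-company node keyed by its stock code."""
    cypher = "MERGE (c:Company {code: $code}) SET c += $props"
    return cypher, {"code": code, "props": fundamentals}


def company_edge(src: str, dst: str, rel_type: str) -> Query:
    """Link two companies. Cypher cannot parameterize a relationship type,
    so the type is validated and then placed directly in the query text."""
    if not rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    cypher = (
        "MATCH (a:Company {code: $src}), (b:Company {code: $dst}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return cypher, {"src": src, "dst": dst}


def news_link(code: str, news_id: str, title: str, category: str) -> Query:
    """Attach one extracted news item (output of S40) to the company it mentions."""
    cypher = (
        "MATCH (c:Company {code: $code}) "
        "MERGE (n:News {id: $news_id}) "
        "SET n.title = $title, n.category = $category "
        "MERGE (n)-[:MENTIONS]->(c)"
    )
    params = {"code": code, "news_id": news_id, "title": title, "category": category}
    return cypher, params


# Made-up sample data, for illustration only.
statements = [
    company_node("X001", {"name": "Example Liquor Co", "industry": "liquor"}),
    company_node("X002", {"name": "Example Glass Co", "industry": "packaging"}),
    company_edge("X002", "X001", "SUPPLIES"),
    news_link("X001", "n-1", "Example Liquor Co raises prices", "pricing"),
]
```

Each `(query, params)` pair could then be sent to a running Neo4J instance, for example through `session.run(query, params)` in the official Python driver. Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-crawled news does not duplicate nodes.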
Preferably, before S10 the method further comprises:
S01, linking to well-known stock-trading websites and using a crawler program to obtain the stock list of listed companies, the individual-stock fundamentals information, and the historical news related to each stock;
and after S10 and before S20 the method further comprises:
S02, linking to the websites of major securities and financial information providers and using a crawler program to obtain the real-time news of each listed company.
Preferably, the first convolutional neural network model is divided into four layers:
The first layer is the embedding layer, which maps each word to a low-dimensional vector representation;
The second layer is the convolutional layer, composed of filters with different window sizes; the parameters of a given filter are shared, each filter acts as a feature detector, and the window size determines the n-gram information it detects;
The third layer is the pooling layer, which extracts the maximum value of the column vector produced by each convolution, yielding a row vector whose length equals the number of filters;
The fourth layer is the fully connected layer, that is, a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
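The four layers described above can be traced in a minimal forward pass written in plain Python. This is a sketch, not the patented model: the weights are random rather than trained, the embedding table is a toy stand-in for word2vec vectors, and the dimensions, vocabulary, and the three output categories are assumptions chosen only to keep the example small.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Layer 1: look up a low-dimensional vector per word (zeros if unknown)."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def conv_max(sentence: List[Vector], filt: List[Vector], bias: float) -> float:
    """Layers 2 and 3: slide one filter over every n-gram window, keep the maximum."""
    window = len(filt)
    scores = []
    for start in range(len(sentence) - window + 1):
        total = bias
        for offset in range(window):
            total += sum(w * x for w, x in zip(filt[offset], sentence[start + offset]))
        scores.append(total)
    return max(scores) if scores else 0.0  # sentence shorter than the window


def softmax(logits: Vector) -> Vector:
    peak = max(logits)  # subtract the peak for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens, table, dim, filters, biases, out_weights, out_bias) -> Vector:
    sentence = embed(tokens, table, dim)
    # One pooled value per filter, so the vector length equals the filter count.
    pooled = [conv_max(sentence, f, b) for f, b in zip(filters, biases)]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)  # layer 4: probabilities over news categories


def rand_vec(n: int) -> Vector:
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
DIM, NUM_CLASSES = 4, 3  # hypothetical sizes
table = {w: rand_vec(DIM) for w in ["merger", "announced", "profit", "rises"]}
filters = [[rand_vec(DIM) for _ in range(size)] for size in (2, 3)]  # bigram, trigram
biases = [0.0 for _ in filters]
out_weights = [rand_vec(len(filters)) for _ in range(NUM_CLASSES)]
out_bias = [0.0] * NUM_CLASSES
probs = classify(["merger", "announced", "profit", "rises"], table, DIM,
                 filters, biases, out_weights, out_bias)
```

The index of the largest entry in `probs` would be the predicted category. With max-over-time pooling, sentences of any length produce a fixed-size vector, which is why the softmax layer can be a plain fully connected layer.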
Preferably, the embedding layer maps each word to a low-dimensional vector representation using the open-source word2vec toolkit.
Preferably, before the real-time news is classified with the first convolutional neural network in S30, the method further comprises:
S301, preprocessing stage: performing word segmentation on each piece of real-time news and filtering out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup information.
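The filtering part of S301 can be sketched as follows. The example assumes the news has already been segmented into tokens (for Chinese text a word segmenter would be needed first, which is not shown), and the threshold `min_count`, the stop-word set, and the sample documents are illustrative assumptions. Removal of markup such as HTML tags is not covered here.

```python
import re
from collections import Counter
from typing import List, Set

# A token made only of non-word characters is treated as punctuation or a special symbol.
SYMBOL_ONLY = re.compile(r"^\W+$")


def preprocess(docs: List[List[str]], stop_words: Set[str],
               min_count: int = 2) -> List[List[str]]:
    """Drop low-frequency words, stop words, and punctuation/symbol tokens."""
    counts = Counter(tok for doc in docs for tok in doc)  # corpus-wide frequency
    cleaned = []
    for doc in docs:
        kept = [tok for tok in doc
                if counts[tok] >= min_count
                and tok not in stop_words
                and not SYMBOL_ONLY.match(tok)]
        cleaned.append(kept)
    return cleaned


# Made-up, pre-segmented sample documents.
docs = [
    ["the", "company", "profit", "rises", ",", "profit", "!"],
    ["the", "company", "announces", "merger", "."],
]
cleaned = preprocess(docs, stop_words={"the"})
```

Because `\W` is Unicode-aware for Python string patterns, full-width punctuation such as "，" and "。" is filtered in the same way as ASCII punctuation.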
Preferably, the step in S40 of converting unstructured information into structured information adapted to the news database comprises:
S401, entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402, relation extraction: using a deep-learning-based method, looking up a pre-trained word vector table to generate the word vector matrix of each sentence, adding position vector features, obtaining the keyword features that characterize the category by a keyword extraction algorithm, and extracting semantic relations between entities with a second convolutional neural network. That is, the word vectors and the position vectors of the words are used as the input of the second convolutional neural network to obtain a sentence representation. The structure of the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: a series of features is first obtained from the category-characterizing keyword features by convolution, the key features of each sentence are selected by the pooling layer and combined into a feature vector, and this vector finally passes through the non-linear layer into a classifier for classification;
S403, event extraction: presenting the unstructured text that contains event information in structured form, and judging from the company name information, the financial-domain verb information, and the sentence position whether the current sentence is a news event sentence.
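The input construction in S402, word vectors combined with position features, can be illustrated with a short sketch. The text does not state how the position vector is defined; the example assumes one common choice, the relative offset of each word to the two annotated entities, and appends the raw offsets instead of learned position embeddings. The tokens, vectors, and entity indices are made up.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def position_features(num_tokens: int, e1_index: int, e2_index: int) -> List[Tuple[int, int]]:
    """Relative offset of every token to entity 1 and to entity 2 (assumed definition)."""
    return [(i - e1_index, i - e2_index) for i in range(num_tokens)]


def build_input(tokens: List[str], e1_index: int, e2_index: int,
                word_vectors: Dict[str, Vector], dim: int) -> List[Vector]:
    """One row per token: its word vector followed by its two position features."""
    offsets = position_features(len(tokens), e1_index, e2_index)
    rows = []
    for tok, (d1, d2) in zip(tokens, offsets):
        vec = list(word_vectors.get(tok, [0.0] * dim))  # zeros for unknown words
        rows.append(vec + [float(d1), float(d2)])
    return rows


# Made-up sentence with two entities annotated in S401 (indices 0 and 4).
tokens = ["CompanyA", "supplies", "parts", "to", "CompanyB"]
word_vectors = {"supplies": [0.2, -0.1], "parts": [0.5, 0.3], "to": [0.0, 0.1]}
matrix = build_input(tokens, 0, 4, word_vectors, dim=2)
```

The resulting matrix has one row per word and `dim + 2` columns, and would serve as the input to the convolutional layer of the second network.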
Preferably, S403 is specifically:
(1) Company name information: the company name is taken as an important feature of an event sentence, and its score is obtained by the following formula:
Score_company(S_i) = Count(S_i);
(2) Financial-domain verb information: using the financial dictionary, the weight of the verb information is calculated; the calculation formula is as follows:
(3) Sentence position: the sentence position weight is calculated by the following formula:
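Of the three features, only the company-name score in item (1) has its formula written out in this text. A minimal sketch of that score follows, assuming that Count(S_i) means the number of occurrences of known company names in sentence S_i; the function name, company names, and sentence are hypothetical. The verb and position weights are not sketched because their formulas are not available here.

```python
from typing import Iterable


def score_company(sentence: str, company_names: Iterable[str]) -> int:
    """Score_company(S_i) = Count(S_i): occurrences of known company names in S_i."""
    return sum(sentence.count(name) for name in company_names)


# Made-up example data.
names = ["Acme Corp", "Beta Ltd"]
sentence = "Acme Corp signed a supply deal with Beta Ltd, and Acme Corp raised guidance."
score = score_company(sentence, names)
```

In practice the name list would come from the financial dictionary built in S10; names that are substrings of one another would need extra care to avoid double counting.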
The invention also discloses a kind of listed company's knowledge mapping construction device based on event attribution, for realizing above-mentioned
Method at least has above-mentioned implementation since the present apparatus uses whole technical solutions of all embodiments of the above method
All beneficial effects brought by the technical solution of example, this is no longer going to repeat them.The present apparatus includes:
First generation module 10, for obtaining the fundamental information of individual stocks and the historical news of several listed companies, and extracting key words and phrases to generate a financial dictionary;
Second generation module 20, for obtaining real-time news of listed companies and generating a real-time news database;
Classification module 30, for using the financial dictionary to extract a real-time news corpus from the real-time news database to train a text classifier, and performing text classification on the real-time news with a first convolutional neural network model; the module further includes a preprocessing unit which, before the real-time news is classified, performs word segmentation on each real-time news item and filters out low-frequency words, stop words, special characters, punctuation marks and irrelevant markup;
Extraction module 40, for using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
Third generation module 50, for establishing an initial model of the listed company knowledge graph using the graph concept in the data structure of the Neo4J graph database, in which the fundamental information of individual stocks of listed companies serves as nodes and the relationships between listed companies serve as edges, and for inputting the entity news information obtained by the information extraction of S40 to generate the listed company knowledge graph.
Preferably, the device further includes a link crawling module 01, for linking to well-known securities websites and using a crawler program to obtain the stock list of listed companies, the fundamental information of individual stocks, and the related historical news of individual stocks; and for linking to the websites of the major securities and financial information providers and using a crawler program to obtain the real-time news of each listed company.
Preferably, the extraction module 40 includes:
Entity annotation unit 401, for using the financial dictionary to identify the corresponding entities in each news item and annotate them;
Relation extraction unit 402, for applying a deep-learning-based method: looking up a pre-trained word vector table to generate the word vector matrix of each sentence, adding position vector features, and obtaining the keyword features that characterize the category through a keyword extraction algorithm; and for extracting semantic relations between entities with a second convolutional neural network, i.e. the word vectors and the position vectors of the words are the input of the second convolutional neural network, which yields a sentence representation. The second convolutional neural network includes a convolutional layer, a pooling layer and a non-linear layer: the convolution operation is first applied to the keyword features characterizing the category to obtain a series of features, the pooling layer selects the key features of each sentence, these are combined into a feature vector, and the vector finally passes through the non-linear layer into the classifier for classification;
Event extraction unit 403, for presenting the unstructured text containing event information in structured form, and judging from company name information, financial-domain verb information and sentence position whether the current sentence is a news event sentence.
Practical operation example of the invention:
The URLs of websites such as Tonghuashun and East Money are obtained in advance, and a crawler program is then used to obtain the A-share stock list, the fundamental information of individual stocks, and a large amount of related historical news on individual stocks, for building the financial dictionary. The financial dictionary mainly contains two parts: one part contains the various entities, including company names, company codes, directors, senior executives, industry information and so on; the other part contains the specific behavior vocabulary with which financial news describes the current state of an individual stock. In addition, the websites of the major securities and financial information providers are obtained in advance, and a similar crawler program is used to obtain the real-time news related to each listed company, forming the corresponding news database for later inclusion in the graph analysis.
A real-time news corpus is extracted from the real-time news database for training. Before classification there is a preprocessing stage: each real-time news item is word-segmented, and low-frequency words, stop words, special characters, punctuation marks and some irrelevant markup are filtered out. Text classification is implemented here with a CNN, and the whole model is divided into four layers. The first layer is the embedding layer, which maps each word to a low-dimensional vector representation (using the word2vec method). The second layer is the convolutional layer, composed of Filters with different window sizes; the parameters of a given Filter are shared across positions, which greatly reduces the number of parameters, and one Filter can recognize only one kind of feature, so a Filter is a feature detector and its window size is the n-gram width it recognizes. The third layer is the pooling layer; the pooling operation extracts the maximum value of the column vector produced by each convolution, giving a row vector whose length equals the number of Filters. The fourth layer is the fully connected layer, i.e. a softmax layer added after the pooling layer, whose purpose is to convert the vector output by the pooling layer into the required output, namely the news category label. The structure of the first convolutional neural network is shown in Figure 6.
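The four layers described above can be sketched as a forward pass in plain Python. This is only an illustration with randomly initialised weights, not the trained model of the invention; the dimensions, window sizes, category labels and the helper names `build_model` and `classify` are all assumptions made for the example.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4            # assumed embedding size (word2vec vectors in practice)
WINDOW_SIZES = (2, 3)    # assumed Filter window sizes, i.e. n-gram widths
FILTERS_PER_WINDOW = 2   # assumed number of Filters per window size
LABELS = ["performance", "product", "other"]  # hypothetical news categories


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


def build_model(vocab):
    """Randomly initialised parameters; a real model would be trained."""
    embedding = {w: rand_vec(EMBED_DIM) for w in vocab}
    filters = [(k, rand_vec(k * EMBED_DIM))
               for k in WINDOW_SIZES for _ in range(FILTERS_PER_WINDOW)]
    dense = [rand_vec(len(filters)) for _ in LABELS]
    return embedding, filters, dense


def classify(tokens, model):
    embedding, filters, dense = model
    # Layer 1: embedding lookup (unknown words map to a zero vector).
    vectors = [embedding.get(t, [0.0] * EMBED_DIM) for t in tokens]
    pooled = []
    for k, weights in filters:
        # Layer 2: slide a window of k words; the same weights are reused
        # at every position (parameter sharing).
        feature_map = []
        for i in range(len(vectors) - k + 1):
            window = [x for vec in vectors[i:i + k] for x in vec]
            feature_map.append(sum(w * x for w, x in zip(weights, window)))
        # Layer 3: max pooling keeps one value per Filter.
        pooled.append(max(feature_map) if feature_map else 0.0)
    # Layer 4: fully connected layer followed by softmax.
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return pooled, probs


vocab = ["soda", "ash", "sales", "rose", "net", "profit"]
model = build_model(vocab)
pooled, probs = classify(["net", "profit", "rose"], model)
print(len(pooled), LABELS[probs.index(max(probs))])
```

The pooled vector has one entry per Filter, which matches the statement that pooling yields a row vector whose length equals the number of Filters.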
After the text classification of the real-time news is complete, news information must be extracted within each category. The purpose of information extraction is to convert the existing unstructured news information into general structured information; the process can be divided into the following three steps.
(1) Entity annotation. Using the financial dictionary built in S1, the corresponding entities can be identified in each news item and annotated. A simple news example illustrates this below, in which 'Shandong Haihua', 'net profit', 'soda ash' and other related entities are each identified with a different category.
For example: Shandong Haihua [company name] announced on the evening of the 28th that it expects the 2017 net profit [performance index] attributable to shareholders of the listed company to be 630,000,000 to 690,000,000 yuan, turning losses into profits. Loss in the same period last year: 123,086,800 yuan (12,308.68 ten-thousand yuan). During the reporting period, the production and sales volume of the leading product soda ash [principal product] increased significantly compared with the same period last year, and the selling price also rose substantially.
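Dictionary-based entity annotation of this kind can be illustrated with a small longest-match lookup. The sketch below is an assumption about one simple way to do it, not the method of the invention: the dictionary entries, the company name "Acme Chemical" and the function names `mark_entities` and `annotate` are hypothetical.

```python
# Hypothetical financial dictionary: surface form -> entity category.
FINANCIAL_DICT = {
    "Acme Chemical": "company name",
    "net profit": "performance index",
    "soda ash": "principal product",
}


def mark_entities(text, dictionary):
    """Return sorted (start, end, term, category) spans, preferring longer terms."""
    spans = []
    taken = [False] * len(text)
    for term in sorted(dictionary, key=len, reverse=True):
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            # Skip a hit that overlaps a span already claimed by a longer term.
            if not any(taken[start:end]):
                spans.append((start, end, term, dictionary[term]))
                for i in range(start, end):
                    taken[i] = True
            start = text.find(term, start + 1)
    return sorted(spans)


def annotate(text, dictionary):
    """Insert a bracketed category tag after every recognised entity."""
    out, cursor = [], 0
    for start, end, term, category in mark_entities(text, dictionary):
        out.append(text[cursor:end])
        out.append(f" [{category}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


news = "Acme Chemical expects net profit to rise as soda ash sales grow."
print(annotate(news, FINANCIAL_DICT))
```

For Chinese news text the lookup would run over segmented words rather than raw substrings, which this sketch does not attempt.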
(2) Relation extraction. Its main purpose is to identify and extract entities from the text and then identify the semantic relations between them. Here relation extraction mainly uses a deep-learning-based method, with a convolutional neural network performing the extraction. Specifically, the word vectors and the position vectors of the words are the input of the convolutional neural network, and the sentence representation is obtained through a convolutional layer, a pooling layer and a non-linear layer. First, the pre-trained word vector table is looked up to generate the word vector matrix of each sentence, position vector features are added, and the keyword features characterizing the category are obtained through a keyword extraction algorithm. The convolution operation then yields a series of features, the pooling layer selects the key features of each sentence, these are combined into a feature vector, and the vector finally passes through the fully connected layer into the classifier for classification. The second convolutional neural network is shown in Figure 7.
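How word vectors and position vectors might be combined into the network input can be sketched as follows. The text only states that position vectors are added; encoding each word's relative distance to the two entities is a common convention and is an assumption here, as are the dimensions, the random lookup tables and the function name `sentence_matrix`.

```python
import random

random.seed(1)

WORD_DIM, POS_DIM, MAX_DIST = 4, 2, 10  # assumed sizes


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


# Hypothetical lookup tables; the text uses a pre-trained word vector table.
word_table = {}
pos_table = {d: rand_vec(POS_DIM) for d in range(-MAX_DIST, MAX_DIST + 1)}


def word_vec(token):
    if token not in word_table:
        word_table[token] = rand_vec(WORD_DIM)
    return word_table[token]


def clip(distance):
    """Keep relative distances inside the range covered by pos_table."""
    return max(-MAX_DIST, min(MAX_DIST, distance))


def sentence_matrix(tokens, e1_idx, e2_idx):
    """One row per token: word vector, then position vectors
    relative to entity 1 and entity 2 (assumed encoding)."""
    rows = []
    for i, tok in enumerate(tokens):
        rows.append(word_vec(tok)
                    + pos_table[clip(i - e1_idx)]
                    + pos_table[clip(i - e2_idx)])
    return rows


tokens = ["Acme", "Chemical", "produces", "soda", "ash"]
matrix = sentence_matrix(tokens, e1_idx=0, e2_idx=3)
print(len(matrix), len(matrix[0]))
```

Each row has `WORD_DIM + 2 * POS_DIM` entries; this matrix is what the convolutional layer would slide over.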
(3) Event extraction, which presents the unstructured text containing event information in structured form. To decide whether a sentence is a news event sentence, three features are mainly considered: company name information, domain verb information and sentence position.
(a) Company name information. The main subject of a news event is a company, so the company name is used as an important feature of an event sentence. It can be scored with the following formula:
Score_company(S_i) = Count(S_i)
(b) Financial-domain verb information. The verb is generally the core of an event, and the weight of the verb information can be calculated from the financial-domain dictionary built earlier. The calculation formula is as follows:
(c) Sentence position. In financial news, the sentences with high information content usually appear among the first few sentences, so the weight is calculated as follows:
After the above processing, the event extraction content finally obtained for the news provided in S2 is as follows:
<performance information> Shandong Haihua is expected to turn losses into profits.
<product information> The output of the leading product soda ash increased significantly compared with the same period last year, and the selling price also rose substantially.
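The three features used in step (3) could be combined along the following lines. Only the company-name score, Score_company(S_i) = Count(S_i), is given explicitly in the text, and reading Count as the number of company-name mentions is an interpretation. The verb score, the position score, the way the scores are summed and the threshold are placeholder assumptions, and every name and weight below is hypothetical.

```python
COMPANY_NAMES = {"Acme Chemical"}                          # hypothetical entries
VERB_WEIGHTS = {"expects": 1.0, "rose": 0.8, "said": 0.2}  # hypothetical weights


def score_company(sentence):
    """Score_company(S_i) = Count(S_i), read here as company-name mentions."""
    return sum(sentence.count(name) for name in COMPANY_NAMES)


def score_verb(sentence):
    """Assumed stand-in: sum of dictionary weights of the verbs present."""
    words = sentence.lower().replace(".", "").replace(",", "").split()
    return sum(VERB_WEIGHTS.get(w, 0.0) for w in words)


def score_position(index):
    """Assumed stand-in: earlier sentences receive a higher weight."""
    return 1.0 / (index + 1)


def event_sentences(sentences, threshold=2.0):
    """Keep sentences whose combined score reaches the (assumed) threshold."""
    picked = []
    for i, sentence in enumerate(sentences):
        total = score_company(sentence) + score_verb(sentence) + score_position(i)
        if total >= threshold:
            picked.append(sentence)
    return picked


article = [
    "Acme Chemical expects a profit this year.",
    "Soda ash prices rose sharply.",
    "Analysts said little.",
]
print(event_sentences(article))
```

With these placeholder weights only the first sentence is kept: it mentions a company, contains a heavily weighted verb and appears first.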
(4) Construction of the knowledge graph
After the news information has been extracted, the knowledge graph is created from the existing content. The graph database Neo4J is used here to implement the listed company knowledge graph. Neo4J models data using the graph concept from data structures, whose most basic concepts are nodes and edges. Nodes represent entities, such as individual stocks, shareholders and senior executives; edges represent the relationships between entities. Through the corresponding Neo4J interface, the event extraction content obtained in S3 can be added to the graph. Figure 8 shows the knowledge graph framework of a specific liquor stock. Relationships exist between entities, and each entity has a list of real-time news events associated with it. Based on the information in this graph, event attribution analysis can be carried out. For example, if the price of the liquor stock rises, one can trace from its node to the related nodes whose news events may have caused the rise, and so attribute the event clearly and accurately.
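The attribution trace described above can be illustrated without a database. The sketch below uses a small in-memory graph as a stand-in for Neo4J; it does not use any Neo4J interface, and the class `KnowledgeGraph`, its methods, the node names, the relation names and the events are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """In-memory stand-in for the graph database (illustration only)."""

    def __init__(self):
        self.nodes = {}                  # node id -> {"label": str, "events": list}
        self.edges = defaultdict(list)   # node id -> [(relation, neighbour id)]

    def add_node(self, node_id, label):
        self.nodes.setdefault(node_id, {"label": label, "events": []})

    def add_edge(self, src, relation, dst):
        # Stored in both directions so a trace can move either way.
        self.edges[src].append((relation, dst))
        self.edges[dst].append((relation, src))

    def add_event(self, node_id, event):
        self.nodes[node_id]["events"].append(event)

    def trace(self, start, max_hops=1):
        """Collect events on the start node and on nodes within max_hops."""
        seen, frontier, found = {start}, [start], []
        for hop in range(max_hops + 1):
            next_frontier = []
            for node in frontier:
                for event in self.nodes[node]["events"]:
                    found.append((hop, node, event))
                for _, neighbour in self.edges[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return found


kg = KnowledgeGraph()
kg.add_node("LiquorCo", "stock")
kg.add_node("Supplier A", "company")
kg.add_node("Ms. Li", "senior executive")
kg.add_edge("LiquorCo", "SUPPLIED_BY", "Supplier A")
kg.add_edge("LiquorCo", "HAS_EXECUTIVE", "Ms. Li")
kg.add_event("LiquorCo", "Profit forecast raised")
kg.add_event("Supplier A", "Grain contract renewed")

for hop, node, event in kg.trace("LiquorCo"):
    print(hop, node, event)
```

Raising `max_hops` widens the search to more distant nodes, at the cost of pulling in events that are less likely to be related to the price move.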
The above is only a preferred embodiment of the present invention and does not limit its patent scope. Any equivalent structural transformation made using the contents of the description and drawings under the inventive concept of the present invention, or any direct or indirect application in other related technical fields, is included in the scope of patent protection of the present invention.
Claims (10)
1. A listed company knowledge graph construction method suitable for event attribution, characterized by comprising:
S10 generating a financial dictionary: obtaining the fundamental information of individual stocks and the historical news of several listed companies, and extracting key words and phrases to generate the financial dictionary;
S20 generating a real-time news database: obtaining real-time news of listed companies and generating the real-time news database;
S30 designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database to train the text classifier, and performing text classification on the real-time news with a first convolutional neural network model;
S40 extracting text information: using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
S50 constructing the entity graph: establishing an initial model of the listed company knowledge graph using the graph concept in the data structure of the Neo4J graph database, in which the fundamental information of individual stocks of listed companies serves as nodes and the relationships between listed companies serve as edges, and inputting the entity news information obtained by the information extraction of S40 to generate the listed company knowledge graph.
2. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 1, characterized in that, before S10, the method further comprises:
S01 linking to well-known securities websites, and using a crawler program to obtain the stock list of listed companies, the fundamental information of individual stocks, and the related historical news of individual stocks;
and after S10 and before S20, the method further comprises:
S02 linking to the websites of the major securities and financial information providers, and using a crawler program to obtain the real-time news of each listed company.
3. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 1, characterized in that the first convolutional neural network model is divided into four layers:
the first layer is the embedding layer, which maps each word to a low-dimensional vector representation;
the second layer is the convolutional layer, composed of Filters with different window sizes, the parameters of a given Filter being shared; one Filter is one feature detector, and its window size is the n-gram width it recognizes;
the third layer is the pooling layer, whose operation extracts the maximum value of the column vector produced by each convolution, giving a row vector whose length equals the number of Filters;
the fourth layer is the fully connected layer, i.e. a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
4. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 3, characterized in that the embedding layer maps each word to a low-dimensional vector representation using the open-source Word2vec toolkit.
5. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 1, characterized in that, before the text classification of the real-time news with the convolutional neural network in S30, the method further comprises:
S301 preprocessing stage: performing word segmentation on each real-time news item, and filtering out low-frequency words, stop words, special characters, punctuation marks and irrelevant markup.
6. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 1, characterized in that the step in S40 of converting unstructured information into structured information adapted to the news database comprises:
S401 entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402 relation extraction: applying a deep-learning-based method, looking up a pre-trained word vector table to generate the word vector matrix of each sentence, adding position vector features, and obtaining the keyword features that characterize the category through a keyword extraction algorithm; extracting semantic relations between entities with a second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation, wherein the second convolutional neural network includes a convolutional layer, a pooling layer and a non-linear layer: the convolution operation is first applied to the keyword features characterizing the category to obtain a series of features, the pooling layer selects the key features of each sentence, these are combined into a feature vector, and the vector finally passes through the non-linear layer into the classifier for classification;
S403 event extraction: presenting the unstructured text containing event information in structured form, and judging from company name information, financial-domain verb information and sentence position whether the current sentence is a news event sentence.
7. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 1, characterized in that S403 specifically comprises:
(1) company name information: the company name is used as an important feature of an event sentence, scored by the following formula:
Score_company(S_i) = Count(S_i);
(2) financial-domain verb information: the weight of the verb information is calculated with the help of the financial dictionary, using the following formula:
(3) sentence position: the sentence position weight is calculated using the following formula:
8. A listed company knowledge graph construction device based on event attribution, characterized by comprising:
a first generation module, for obtaining the fundamental information of individual stocks and the historical news of several listed companies, and extracting key words and phrases to generate a financial dictionary;
a second generation module, for obtaining real-time news of listed companies and generating a real-time news database;
a classification module, for using the financial dictionary to extract a real-time news corpus from the real-time news database to train a text classifier, and performing text classification on the real-time news with a first convolutional neural network model, the module further including a preprocessing unit which, before the real-time news is classified, performs word segmentation on each real-time news item and filters out low-frequency words, stop words, special characters, punctuation marks and irrelevant markup;
an extraction module, for using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information adapted to the news database;
a third generation module, for establishing an initial model of the listed company knowledge graph using the graph concept in the data structure of the Neo4J graph database, in which the fundamental information of individual stocks of listed companies serves as nodes and the relationships between listed companies serve as edges, and for inputting the entity news information obtained by the information extraction of S40 to generate the listed company knowledge graph.
9. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 8, characterized by further comprising a link crawling module, for linking to well-known securities websites and using a crawler program to obtain the stock list of listed companies, the fundamental information of individual stocks, and the related historical news of individual stocks; and for linking to the websites of the major securities and financial information providers and using a crawler program to obtain the real-time news of each listed company.
10. The listed company knowledge graph construction method suitable for event attribution as claimed in claim 8, characterized in that the extraction module includes:
an entity annotation unit, for using the financial dictionary to identify the corresponding entities in each news item and annotate them;
a relation extraction unit, for applying a deep-learning-based method: looking up a pre-trained word vector table to generate the word vector matrix of each sentence, adding position vector features, and obtaining the keyword features that characterize the category through a keyword extraction algorithm; and for extracting semantic relations between entities with a second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation, wherein the second convolutional neural network includes a convolutional layer, a pooling layer and a non-linear layer: the convolution operation is first applied to the keyword features characterizing the category to obtain a series of features, the pooling layer selects the key features of each sentence, these are combined into a feature vector, and the vector finally passes through the non-linear layer into the classifier for classification;
an event extraction unit, for presenting the unstructured text containing event information in structured form, and judging from company name information, financial-domain verb information and sentence position whether the current sentence is a news event sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811205312.0A CN109558492A (en) | 2018-10-16 | 2018-10-16 | A kind of listed company's knowledge mapping construction method and device suitable for event attribution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109558492A true CN109558492A (en) | 2019-04-02 |
Family
ID=65865034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811205312.0A Pending CN109558492A (en) | 2018-10-16 | 2018-10-16 | A kind of listed company's knowledge mapping construction method and device suitable for event attribution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109558492A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066599A (en) * | 2017-04-20 | 2017-08-18 | 北京文因互联科技有限公司 | A kind of similar enterprise of the listed company searching classification method and system of knowledge based storehouse reasoning |
CN107665252A (en) * | 2017-09-27 | 2018-02-06 | 深圳证券信息有限公司 | A kind of method and device of creation of knowledge collection of illustrative plates |
CN108596439A (en) * | 2018-03-29 | 2018-09-28 | 北京中兴通网络科技股份有限公司 | A kind of the business risk prediction technique and system of knowledge based collection of illustrative plates |
- 2018-10-16 CN CN201811205312.0A patent/CN109558492A/en active Pending
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110377693A (en) * | 2019-06-06 | 2019-10-25 | 新华智云科技有限公司 | The model training method and generation method of financial and economic news, device, equipment and medium |
CN110399339A (en) * | 2019-06-18 | 2019-11-01 | 平安科技(深圳)有限公司 | File classifying method, device, equipment and the storage medium of knowledge base management system |
CN110377756B (en) * | 2019-07-04 | 2020-03-17 | 成都迪普曼林信息技术有限公司 | Method for extracting event relation of mass data set |
CN110377756A (en) * | 2019-07-04 | 2019-10-25 | 成都迪普曼林信息技术有限公司 | Mass data collection event relation abstracting method |
CN110543562A (en) * | 2019-08-19 | 2019-12-06 | 武大吉奥信息技术有限公司 | Event map-based automatic urban management event distribution method and system |
CN110990525A (en) * | 2019-11-15 | 2020-04-10 | 华融融通(北京)科技有限公司 | Natural language processing-based public opinion information extraction and knowledge base generation method |
CN111626898A (en) * | 2020-03-20 | 2020-09-04 | 贝壳技术有限公司 | Method, device, medium and electronic equipment for realizing attribution of events |
CN111626898B (en) * | 2020-03-20 | 2022-03-15 | 贝壳找房(北京)科技有限公司 | Method, device, medium and electronic equipment for realizing attribution of events |
CN111475625A (en) * | 2020-05-09 | 2020-07-31 | 山东舜网传媒股份有限公司 | News manuscript generation method and system based on knowledge graph |
CN111612633A (en) * | 2020-05-27 | 2020-09-01 | 佛山市知识图谱科技有限公司 | Stock analysis method, stock analysis device, computer equipment and storage medium |
CN112286772A (en) * | 2020-10-14 | 2021-01-29 | 北京易观智库网络科技有限公司 | Attribution analysis method and device and electronic equipment |
CN112612899A (en) * | 2020-11-24 | 2021-04-06 | 中国传媒大学 | Knowledge graph construction method and device, storage medium and electronic equipment |
CN112819308B (en) * | 2021-01-23 | 2024-04-02 | 罗家德 | Head enterprise identification method based on bidirectional graph convolution neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190402 |