CN109558492A - Method and device for constructing a listed-company knowledge graph suitable for event attribution - Google Patents


Info

Publication number: CN109558492A
Authority: CN (China)
Prior art keywords: information, news, listed company, real, layer
Legal status: Pending
Application number: CN201811205312.0A
Other languages: Chinese (zh)
Inventors: 郑子彬 (Zheng Zibin), 梁宇轩 (Liang Yuxuan)
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Original assignee: National Sun Yat Sen University
Application CN201811205312.0A filed by National Sun Yat Sen University; published as CN109558492A.


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The present invention discloses a method and device for constructing a listed-company knowledge graph suitable for event attribution; the device implements the method. The method comprises: generating a financial dictionary from the acquired fundamental information of listed companies' individual stocks and the related historical news for those stocks; generating a real-time news database from the acquired real-time news related to each listed company; classifying the real-time news with a text classifier; extracting information from the news text; and using the graph database Neo4j to build an entity model graph of listed-company knowledge, in which nodes on the graph are traced according to specific news content, thereby constructing a listed-company knowledge graph with an event-attribution function.

Description

Method and device for constructing a listed-company knowledge graph suitable for event attribution
Technical field
The present invention relates to the field of knowledge graph construction, and in particular to a method and device for constructing a listed-company knowledge graph suitable for event attribution.
Background art
With the rapid development of the Internet, the volume of financial information available to us has grown explosively, and financial and securities portal websites have sprung up one after another. To keep their news timely and rich, and to compete for users, the major financial websites have steadily increased the frequency and scope of their financial news, so the flood of financial news keeps intensifying. However, most investors in China are currently retail investors, who have neither the time and energy to browse large volumes of financial news nor the retrieval and analysis skills needed to track how individual news items relate to one another. It is therefore necessary, and highly valuable, to extract the news related to listed companies and build a network graph that supports event attribution. Such a graph would help ordinary retail investors recognize accurately and quickly which listed companies or stocks may rise or fall because of which news events, and so make better-informed investment decisions. In addition, a knowledge graph based on event attribution can be applied to quantitative trading: quantitative traders can extract the associated news-event content from the graph and, combined with natural language processing techniques, derive a series of valuable indicators to guide quantitative investment.
Knowledge graph construction currently relies on two key technologies: entity-relationship recognition and knowledge reasoning.
Entity-relationship recognition refers to extracting nouns that carry specific informational meaning from an article and analyzing them as distinct processing units. It was first proposed at the MUC conference in 1998, where the goal was to extract specific relations from text by filling the slots of relation templates. With the development of statistical methods, identifying relations between entities in text was gradually recast as a classification problem. Zelenko et al. [3] proposed representing a relation instance by the smallest common subtree of a shallow parse tree, computing a kernel function between two subtrees, and classifying the instances with a trained classifier (for example, an SVM). However, the kernel similarity computation imposes strict matching constraints, and listed-company names in particular are expressed with considerable redundancy, so kernel-based methods generally have low recall. Over time, as corpora grew, information extraction turned increasingly toward neural models, and related corpora were proposed as evaluation benchmarks. A notable feature of neural models is that they need few hand-crafted features; the commonly used ones are word vectors, positions, and the like. Later, joint extraction models were proposed that extract entities and the relations between them simultaneously. However, both neural and joint-extraction methods require large amounts of training data, and financial news lacks enough labels to train such models on a large corpus. These classification-based methods are therefore unsuitable for constructing a knowledge graph that integrates listed companies with related news information.
The general idea of knowledge reasoning is that, given the existing node relationships and node information in a graph, when some nodes change, the corresponding changes in their associated nodes can be inferred. Specifically, researchers have proposed a symbol-based reasoning method together with a tractable concept language, and have developed several commercial semantic network systems, so that semantic networks offer both formal semantics and efficient reasoning. Later, researchers used multi-core multiprocessing and network-based distributed computing (such as the MapReduce framework and peer-to-peer network frameworks) to address the efficiency of reasoning over formal semantics. However, because the volume of financial news is growing explosively, the reasoning efficiency of these systems still struggles to keep up with the growing data, which makes them hard to use in practice. Moreover, besides explicit listed-company information such as shareholders and basic information on senior executives, the knowledge graph here must also include implicit information, such as the content of a company's main business and the upstream and downstream relationships of its main products. Upstream industries involve raw materials and suppliers, and downstream industries involve consumer goods and consumers. The current state of the industry in which the main business operates is also a key piece of information, because it determines which competitors are relevant. Symbol-based reasoning alone therefore cannot add the corresponding financial news to the graph in sufficient depth, which limits the graph's ability to trace event attribution.
Summary of the invention
The main object of the present invention is to propose a method for constructing a listed-company knowledge graph suitable for event attribution, with the aim of overcoming the above problems.
To achieve the above object, the present invention proposes a method for constructing a listed-company knowledge graph suitable for event attribution, characterized by comprising:
S10, generating a financial dictionary: obtaining the fundamental information and historical news of the individual stocks of a number of listed companies, and extracting key words and sentences to generate a financial dictionary;
S20, generating a real-time news database: obtaining real-time news about listed companies and generating a real-time news database;
S30, designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database for training a text classifier, and classifying the real-time news with a first convolutional neural network model;
S40, extracting text information: using the financial dictionary to extract information from the classified real-time news, converting unstructured information into structured information that fits the news database;
S50, constructing the entity model graph: using the graph concept in the data structure of the Neo4j graph database to establish an initial model of the listed-company knowledge graph, in which the fundamental information of listed companies' individual stocks serves as nodes and the relationships between listed companies serve as edges, and inputting the entity news information obtained by the information extraction in S40 to generate the listed-company knowledge graph.
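The patent names Neo4j for S50 but gives no schema or queries. As a minimal sketch, assuming node labels `Company` and `News` and relationship types such as `UPSTREAM_OF` and `MENTIONS` (all hypothetical), the following builds parameterized Cypher statements that create company nodes, company-to-company edges, and news nodes linked to the companies they mention:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Relationship types cannot be passed as Cypher parameters, so they are
# checked against an allow-list before being formatted into the query text.
ALLOWED_RELATIONS = {"UPSTREAM_OF", "COMPETES_WITH", "SHARES_EXECUTIVE"}  # hypothetical


@dataclass
class Company:
    code: str                                         # stock code, used as the node key
    name: str
    fundamentals: dict = field(default_factory=dict)  # fundamental-information properties


def company_node(c: Company) -> tuple[str, dict]:
    """Create or update one company node; fundamentals become node properties."""
    query = "MERGE (c:Company {code: $code}) SET c.name = $name, c += $props"
    return query, {"code": c.code, "name": c.name, "props": c.fundamentals}


def company_edge(src: str, dst: str, rel_type: str) -> tuple[str, dict]:
    """Create a directed relationship (an edge of the graph) between two companies."""
    if rel_type not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relation type: {rel_type}")
    query = ("MATCH (a:Company {code: $src}), (b:Company {code: $dst}) "
             f"MERGE (a)-[:{rel_type}]->(b)")
    return query, {"src": src, "dst": dst}


def news_node(news_id: str, title: str, company_codes: list[str]) -> tuple[str, dict]:
    """Attach an extracted news item to every company it mentions."""
    query = ("MERGE (n:News {id: $id}) SET n.title = $title "
             "WITH n UNWIND $codes AS code "
             "MATCH (c:Company {code: code}) "
             "MERGE (n)-[:MENTIONS]->(c)")
    return query, {"id": news_id, "title": title, "codes": company_codes}
```

Each returned pair can be executed with the official `neo4j` Python driver as `session.run(query, params)`. Using `MERGE` keeps repeated crawls idempotent; fundamentals values must be types Neo4j can store as properties (strings, numbers, booleans, or lists of these).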
Preferably, before S10, the method further comprises:
S01, connecting to well-known stock websites and using a web crawler to obtain the stock list of listed companies, the fundamental information of individual stocks, and the historical news related to each stock;
After S10 and before S20, the method further comprises:
S02, connecting to the websites of major securities and financial information providers and using a web crawler to obtain real-time news about each listed company.
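The patent does not say how the real-time news database is stored. A minimal sketch, assuming SQLite from Python's standard library and a hypothetical `news` table keyed by source URL so that items seen again on a later crawl are not stored twice:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS news (
    url          TEXT PRIMARY KEY,  -- source URL doubles as the de-duplication key
    stock_code   TEXT NOT NULL,     -- listed company the item was crawled for
    title        TEXT NOT NULL,
    body         TEXT NOT NULL,
    published_at TEXT NOT NULL      -- ISO-8601 timestamp, sorts chronologically as text
)
"""


def open_news_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_news_stock ON news (stock_code, published_at)")
    return conn


def save_news(conn, items):
    """Insert crawled items (dicts); URLs already stored are skipped. Returns rows added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, stock_code, title, body, published_at) "
        "VALUES (:url, :stock_code, :title, :body, :published_at)",
        items,
    )
    conn.commit()
    return conn.total_changes - before


def news_for_stock(conn, stock_code):
    """All stored headlines for one stock, oldest first."""
    rows = conn.execute(
        "SELECT title, published_at FROM news WHERE stock_code = ? ORDER BY published_at",
        (stock_code,),
    )
    return rows.fetchall()
```

The composite index on `(stock_code, published_at)` serves the per-company, time-ordered lookups that later graph analysis would need.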
Preferably, the first convolutional neural network model is divided into four layers:
The first layer is the embedding layer, which maps each word to a low-dimensional vector representation;
The second layer is the convolutional layer, composed of filters with different window sizes; the parameters of a given filter are shared across positions, each filter acts as one kind of feature detector, and its window size is the length of the n-gram it detects;
The third layer is the pooling layer, which takes the maximum value of each column vector produced by the convolution, yielding a row vector whose length equals the number of filters;
The fourth layer is the fully connected layer, i.e. a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
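The four layers can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions: the embedding table is randomly initialised here (the patent uses word2vec vectors), ReLU is assumed as the convolution activation, training is omitted, and the names `TextCNN` and `conv_max_pool` are hypothetical.

```python
import math
import random


def conv_max_pool(embedded, weights, bias):
    """Slide one filter (h x d weights) over the sentence and max-pool its outputs.

    The same weights are reused at every window position (parameter sharing);
    h is the n-gram length the filter responds to.
    """
    h = len(weights)
    best = float("-inf")
    for start in range(len(embedded) - h + 1):
        s = bias
        for k in range(h):
            s += sum(w * x for w, x in zip(weights[k], embedded[start + k]))
        best = max(best, max(0.0, s))  # ReLU activation, then max-over-time pooling
    return best


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    def __init__(self, vocab_size, dim, window_sizes, filters_per_size, num_classes, seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        self.dim = dim
        # Layer 1: embedding table (random here; word2vec vectors in the patent)
        self.embedding = [rand_vec(dim) for _ in range(vocab_size)]
        # Layer 2: filters of several window sizes, each an (h x dim) matrix plus a bias
        self.filters = [([rand_vec(dim) for _ in range(h)], 0.0)
                        for h in window_sizes for _ in range(filters_per_size)]
        # Layer 4: fully connected weights mapping pooled features to class scores
        self.out_w = [rand_vec(len(self.filters)) for _ in range(num_classes)]
        self.out_b = [0.0] * num_classes

    def forward(self, token_ids):
        embedded = [self.embedding[t] for t in token_ids]
        widest = max(len(w) for w, _ in self.filters)
        if len(embedded) < widest:  # pad short sentences so every filter fits at least once
            embedded += [[0.0] * self.dim for _ in range(widest - len(embedded))]
        # Layers 2 + 3: one pooled value per filter, so len(pooled) == number of filters
        pooled = [conv_max_pool(embedded, w, b) for w, b in self.filters]
        logits = [sum(w * p for w, p in zip(row, pooled)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)  # probabilities over news categories
```

For example, `TextCNN(50, 8, (2, 3, 4), 4, 5).forward([1, 5, 7])` returns five category probabilities derived from a 12-element pooled vector, matching the statement that the pooled row vector has one entry per filter.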
Preferably, the embedding layer maps each word to a low-dimensional vector using the open-source word2vec toolkit.
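word2vec's skip-gram variant learns word vectors by predicting each word's neighbours. The data-preparation step, pairing every centre word with the context words inside a window, can be sketched as follows; the toolkit does this internally, the function name is hypothetical, and the window is fixed here whereas the reference implementation randomly shrinks it per position.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every token and each neighbour
    at most `window` positions away on either side."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs
```

For a segmented headline such as `["贵州茅台", "股价", "上涨"]` with `window=1`, the middle word contributes two pairs and each end word one.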
Preferably, before the real-time news is classified with the first convolutional neural network in S30, the method further comprises:
S301, preprocessing: segmenting each real-time news item into words and filtering out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup.
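A sketch of the S301 filtering, assuming word segmentation has already produced token lists (for Chinese text a segmenter such as jieba would typically be used; that choice is an assumption) and that HTML or other markup was stripped when the page was crawled. Punctuation and symbols are detected by Unicode category, so full-width Chinese marks are handled too; `preprocess` is a hypothetical name.

```python
import unicodedata
from collections import Counter


def is_symbol_or_punct(token):
    """True when every character is Unicode punctuation (P*) or a symbol (S*)."""
    return all(unicodedata.category(ch)[0] in "PS" for ch in token)


def preprocess(corpus, stop_words, min_count=2):
    """Filter already-segmented news items.

    corpus: list of token lists, one per news item.
    Drops empty tokens, punctuation/symbols and stop words, then drops
    tokens seen fewer than min_count times across the whole corpus.
    """
    cleaned = []
    for doc in corpus:
        kept = []
        for tok in (t.strip() for t in doc):
            if tok and not is_symbol_or_punct(tok) and tok not in stop_words:
                kept.append(tok)
        cleaned.append(kept)
    freq = Counter(tok for doc in cleaned for tok in doc)  # corpus-wide counts
    return [[tok for tok in doc if freq[tok] >= min_count] for doc in cleaned]
```

Counting frequencies after the stop-word pass means the low-frequency threshold is applied only to content words.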
Preferably, the step in S40 of converting unstructured information into structured information that fits the news database comprises:
S401, entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402, relation extraction: using a deep-learning-based method, looking up a pre-trained word-vector table to generate the word-vector matrix of each sentence, adding position-vector features, and obtaining keyword features that characterize the category with a keyword extraction algorithm; then extracting the semantic relations between entities with a second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation. The second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: convolution is first applied to the keyword features that characterize the category to obtain a series of features; the pooling layer then selects the key features of each sentence and combines them into a feature vector, which finally passes through the non-linear layer into a classifier for classification;
S403, event extraction: representing unstructured text that contains event information in a structured form, and judging whether the current sentence is a news event sentence according to company-name information, financial-domain verb information, and sentence position.
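S401 and S402 leave the dictionary-matching algorithm and the position encoding unspecified. A sketch under two assumptions: dictionary entities are found by forward maximum matching over the raw characters, and each token's position feature is its offset from the two entities, clipped to a fixed range (a common choice in CNN relation extraction, not confirmed by the patent). All function names are hypothetical.

```python
def tag_entities(text, entity_dict):
    """Forward maximum matching against the financial dictionary.

    entity_dict maps a surface string to its entity type. Scanning left to
    right, the longest dictionary entry starting at each position is taken.
    Returns (start, end, surface, type) tuples with character offsets.
    """
    max_len = max(map(len, entity_dict), default=0)
    spans, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in entity_dict:
                spans.append((i, i + length, piece, entity_dict[piece]))
                i += length
                break
        else:  # no entry starts here: move on one character
            i += 1
    return spans


def clip(value, limit):
    return max(-limit, min(limit, value))


def relative_positions(n_tokens, e1_index, e2_index, limit=30):
    """Per-token (offset to entity 1, offset to entity 2), clipped to [-limit, limit].

    Each offset would index a learned position-embedding table whose rows are
    concatenated with the word vector before the convolution.
    """
    return [(clip(i - e1_index, limit), clip(i - e2_index, limit)) for i in range(n_tokens)]
```

Longest-match scanning prefers a full company name over a shorter entry nested inside it, which matters because listed-company names often contain product or brand names.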
Preferably, S403 specifically comprises:
(1) Company-name information: the company name is an important feature of an event sentence and is scored by the following formula: $\mathrm{Score}_{\mathrm{company}}(S_i) = \mathrm{Count}(S_i)$;
(2) Financial-domain verb information: the financial dictionary is used to compute the weight of the verb information, with the following formula:
(3) Sentence position: the sentence-position weight is computed as follows:
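Only the company-name formula survives in this text. A minimal sketch of that one feature, assuming $\mathrm{Count}(S_i)$ is the number of company-name occurrences in sentence $S_i$ and that a longer name takes precedence over a shorter name it contains; the verb and position weights are not sketched because their formulas are missing here. Function names are hypothetical.

```python
import re

# Split after Chinese or Western sentence-final punctuation (zero-width split).
_SENTENCE_END = re.compile(r"(?<=[。！？!?；;])")


def split_sentences(text):
    return [s for s in _SENTENCE_END.split(text) if s.strip()]


def company_pattern(company_names):
    """Alternation with longest names first, so '贵州茅台' wins over '茅台'."""
    names = sorted({n for n in company_names if n}, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, names))) if names else None


def company_score(sentence, pattern):
    """Score_company(S_i) = Count(S_i): number of company-name mentions in S_i."""
    return len(pattern.findall(sentence)) if pattern else 0


def score_sentences(text, company_names):
    """(score, sentence) for every sentence of one news item."""
    pattern = company_pattern(company_names)
    return [(company_score(s, pattern), s) for s in split_sentences(text)]
```

Because `re` tries alternatives left to right at each position, ordering names by length gives longest-match behaviour without double-counting a nested name.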
The invention also discloses a device for constructing a listed-company knowledge graph based on event attribution, comprising:
a first generation module, configured to obtain the fundamental information and historical news of the individual stocks of a number of listed companies and to extract key words and sentences to generate the financial dictionary;
a second generation module, configured to obtain real-time news about listed companies and generate the real-time news database;
a classification module, configured to use the financial dictionary to extract a real-time news corpus from the real-time news database for training a text classifier, and to classify the real-time news with the first convolutional neural network model; the classification module further comprises a preprocessing unit, which, before the real-time news is classified, segments each real-time news item into words and filters out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup;
an extraction module, configured to use the financial dictionary to extract information from the classified real-time news and convert unstructured information into structured information that fits the news database;
a third generation module, configured to use the graph concept in the data structure of the Neo4j graph database to establish an initial model of the listed-company knowledge graph, in which the fundamental information of listed companies' individual stocks serves as nodes and the relationships between listed companies serve as edges, and to input the entity news information obtained by the information extraction in S40 to generate the listed-company knowledge graph.
Preferably, the device further comprises a link-and-crawl module, configured to connect to well-known stock websites and use a web crawler to obtain the stock list of listed companies, the fundamental information of individual stocks, and the historical news related to each stock; and to connect to the websites of major securities and financial information providers and use a web crawler to obtain real-time news about each listed company.
Preferably, the extraction module comprises:
an entity annotation unit, configured to use the financial dictionary to identify the corresponding entities in each news item and annotate them;
a relation extraction unit, configured to use a deep-learning-based method to look up a pre-trained word-vector table and generate the word-vector matrix of each sentence, add position-vector features, and obtain keyword features that characterize the category with a keyword extraction algorithm; and to extract the semantic relations between entities with the second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation, where the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: convolution is first applied to the keyword features that characterize the category to obtain a series of features, the pooling layer then selects the key features of each sentence and combines them into a feature vector, which finally passes through the non-linear layer into a classifier for classification;
an event extraction unit, configured to represent unstructured text that contains event information in a structured form, and to judge whether the current sentence is a news event sentence according to company-name information, financial-domain verb information, and sentence position.
The purpose of the present invention is to construct a listed-company graph with an event-attribution function that shows investors in the financial market the underlying links between financial news and the corresponding listed companies, helping them spend less time yet understand the connections among the financial news of major listed companies more comprehensively, and so make more accurate value-investment judgments. It can also provide quantitative-trading practitioners with important indicators related to financial news.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.
Fig. 1 is a flowchart of one embodiment of the method of the invention for constructing a listed-company knowledge graph suitable for event attribution;
Fig. 2 is a flowchart of the method in S40 for converting unstructured information into structured information that fits the news database;
Fig. 3 is a flowchart of another embodiment of the method of the invention for constructing a listed-company knowledge graph suitable for event attribution;
Fig. 4 is a functional block diagram of one embodiment of the device of the invention for constructing a listed-company knowledge graph suitable for event attribution;
Fig. 5 is a detailed functional diagram of the extraction module;
Fig. 6 is a schematic structural diagram of the first convolutional neural network model;
Fig. 7 is a schematic structural diagram of the second convolutional neural network;
Fig. 8 shows the knowledge graph framework of a specific liquor stock.
The realization of the object of the invention, its functional features, and its advantages are further described below in connection with the embodiments and with reference to the drawings.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
It should be noted that if the embodiments of the invention involve directional indications (such as up, down, left, right, front, back, ...), these indications serve only to explain the relative positions, motion, and so on of the components in a particular posture (as shown in the drawings); if that posture changes, the directional indications change accordingly.
In addition, descriptions such as "first" and "second" in the embodiments of the invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features concerned. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. Furthermore, the technical solutions of the embodiments may be combined with one another, but only where a person of ordinary skill in the art can implement the combination; where a combination of technical solutions is contradictory or cannot be implemented, that combination shall be deemed not to exist and is not within the protection scope claimed by the invention.
As shown in Figs. 1-7, the method proposed by the invention for constructing a listed-company knowledge graph suitable for event attribution is characterized by comprising:
S10, generating a financial dictionary: obtaining the fundamental information and historical news of the individual stocks of a number of listed companies, and extracting key words and sentences to generate a financial dictionary;
S20, generating a real-time news database: obtaining real-time news about listed companies and generating a real-time news database;
S30, designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database for training a text classifier, and classifying the real-time news with a first convolutional neural network model;
S40, extracting text information: using the financial dictionary to extract information from the classified real-time news, converting unstructured information into structured information that fits the news database;
S50, constructing the entity model graph: using the graph concept in the data structure of the Neo4j graph database to establish an initial model of the listed-company knowledge graph, in which the fundamental information of listed companies' individual stocks serves as nodes and the relationships between listed companies serve as edges, and inputting the entity news information obtained by the information extraction in S40 to generate the listed-company knowledge graph.
In the embodiments of the invention, the financial dictionary and the real-time news database are built in advance. The financial dictionary prepares for the subsequent word segmentation of real-time news sentences and the extraction of keywords, key sentences, and so on; the real-time news database supports the subsequent graph analysis that traces the attribution of listed-company events. The text classifier classifies the real-time news: each news item has a specific subject and concerns a particular stock or industry concept, so the items must be sorted into the corresponding categories in preparation for the later public-opinion analysis on the graph. Text information extraction converts unstructured information into structured information that fits the news database. The entity model graph is used to trace and attribute nodes on the graph according to specific news content.
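The patent does not spell out how nodes are traced. One plausible sketch, assuming an in-memory adjacency list and a breadth-first search limited by hop count, returns every node reachable from the entities a news item mentions together with the relation path that links them. In Neo4j the same idea would be a variable-length pattern such as `MATCH (n:News {id: $id})-[:MENTIONS]->(:Company)-[*1..2]-(c:Company) RETURN DISTINCT c` (labels hypothetical). `trace_affected` and the example node names are hypothetical.

```python
from collections import deque


def trace_affected(graph, seeds, max_hops=2):
    """Breadth-first search from the entities a news item mentions.

    graph maps a node to a list of (neighbour, relation) pairs.
    Returns {node: (hops, path)} for every node within max_hops, where path
    alternates node names and relation labels.
    """
    found = {s: (0, [s]) for s in seeds}
    queue = deque(found)  # start from each distinct seed once
    while queue:
        node = queue.popleft()
        hops, path = found[node]
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, relation in graph.get(node, []):
            if neighbour not in found:  # BFS: first visit is a shortest path
                found[neighbour] = (hops + 1, path + [relation, neighbour])
                queue.append(neighbour)
    return found
```

Keeping the path alongside each node gives the attribution explanation directly, for instance that news about an industry reaches a supplier through the company that belongs to that industry.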
Preferably, before S10, the method further comprises:
S01, connecting to well-known stock websites and using a web crawler to obtain the stock list of listed companies, the fundamental information of individual stocks, and the historical news related to each stock;
After S10 and before S20, the method further comprises:
S02, connecting to the websites of major securities and financial information providers and using a web crawler to obtain real-time news about each listed company.
Preferably, the first convolutional neural network model is divided into four layers:
The first layer is the embedding layer, which maps each word to a low-dimensional vector representation;
The second layer is the convolutional layer, composed of filters with different window sizes; the parameters of a given filter are shared across positions, each filter acts as one kind of feature detector, and its window size is the length of the n-gram it detects;
The third layer is the pooling layer, which takes the maximum value of each column vector produced by the convolution, yielding a row vector whose length equals the number of filters;
The fourth layer is the fully connected layer, i.e. a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
Preferably, the embedding layer maps each word to a low-dimensional vector using the open-source word2vec toolkit.
Preferably, before the real-time news is classified with the first convolutional neural network in S30, the method further comprises:
S301, preprocessing: segmenting each real-time news item into words and filtering out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup.
Preferably, the step in S40 of converting unstructured information into structured information that fits the news database comprises:
S401, entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402, relation extraction: using a deep-learning-based method, looking up a pre-trained word-vector table to generate the word-vector matrix of each sentence, adding position-vector features, and obtaining keyword features that characterize the category with a keyword extraction algorithm; then extracting the semantic relations between entities with a second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation. The second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: convolution is first applied to the keyword features that characterize the category to obtain a series of features; the pooling layer then selects the key features of each sentence and combines them into a feature vector, which finally passes through the non-linear layer into a classifier for classification;
S403, event extraction: representing unstructured text that contains event information in a structured form, and judging whether the current sentence is a news event sentence according to company-name information, financial-domain verb information, and sentence position.
Preferably, S403 specifically comprises:
(1) Company-name information: the company name is an important feature of an event sentence and is scored by the following formula: $\mathrm{Score}_{\mathrm{company}}(S_i) = \mathrm{Count}(S_i)$;
(2) Financial-domain verb information: the financial dictionary is used to compute the weight of the verb information, with the following formula:
(3) Sentence position: the sentence-position weight is computed as follows:
The invention also discloses a device for constructing a listed-company knowledge graph based on event attribution, which implements the above method. Because the device adopts all the technical solutions of all the embodiments of the above method, it has at least all the beneficial effects of those technical solutions, which are not repeated here. The device comprises:
a first generation module 10, configured to obtain the fundamental information and historical news of the individual stocks of a number of listed companies and to extract key words and sentences to generate the financial dictionary;
a second generation module 20, configured to obtain real-time news about listed companies and generate the real-time news database;
a classification module 30, configured to use the financial dictionary to extract a real-time news corpus from the real-time news database for training a text classifier, and to classify the real-time news with the first convolutional neural network model; the classification module further comprises a preprocessing unit, which, before the real-time news is classified, segments each real-time news item into words and filters out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup;
an extraction module 40, configured to use the financial dictionary to extract information from the classified real-time news and convert unstructured information into structured information that fits the news database;
a third generation module 50, configured to use the graph concept in the data structure of the Neo4j graph database to establish an initial model of the listed-company knowledge graph, in which the fundamental information of listed companies' individual stocks serves as nodes and the relationships between listed companies serve as edges, and to input the entity news information obtained by the information extraction in S40 to generate the listed-company knowledge graph.
Preferably, the device further comprises a link-and-crawl module 01, configured to connect to well-known stock websites and use a web crawler to obtain the stock list of listed companies, the fundamental information of individual stocks, and the historical news related to each stock; and to connect to the websites of major securities and financial information providers and use a web crawler to obtain real-time news about each listed company.
Preferably, the extraction module 40 comprises:
an entity annotation unit 401, configured to use the financial dictionary to identify the corresponding entities in each news item and annotate them;
a relation extraction unit 402, configured to use a deep-learning-based method to look up a pre-trained word-vector table and generate the word-vector matrix of each sentence, add position-vector features, and obtain keyword features that characterize the category with a keyword extraction algorithm; and to extract the semantic relations between entities with the second convolutional neural network, i.e. using the word vectors and the position vectors of the words as the input of the second convolutional neural network to obtain a sentence representation, where the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: convolution is first applied to the keyword features that characterize the category to obtain a series of features, the pooling layer then selects the key features of each sentence and combines them into a feature vector, which finally passes through the non-linear layer into a classifier for classification;
an event extraction unit 403, configured to represent unstructured text that contains event information in a structured form, and to judge whether the current sentence is a news event sentence according to company-name information, financial-domain verb information, and sentence position.
Practical example of the invention:
The website URLs of financial sites such as Tonghuashun and East Money are collected in advance. A crawler is then used to obtain the A-share stock list, fundamental information for each stock, and a large volume of related historical news for individual stocks, which are used to build the financial dictionary. The financial dictionary has two main parts. The first part contains various entities, including company names, stock codes, directors, senior executives, and industry information. The second part contains the concrete behavior verbs used in financial news to describe the current status of individual stocks. In addition, the websites of major securities and financial news providers are collected in advance, and a similar crawler obtains the real-time news related to each listed company. This forms a news database that is later added to the knowledge graph for analysis.
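The two-part financial dictionary described above could be held in a structure like the following. This is a minimal Python sketch and not the patent's implementation. The record keys (`name`, `code`, `directors`, `executives`, `industry`), the category labels, and the `build_dictionary` helper are all hypothetical. The crawling step that would produce the records is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FinancialDictionary:
    """Part 1: entity terms grouped by category. Part 2: behaviour verbs.
    Category names are illustrative assumptions."""
    entities: dict[str, set[str]] = field(default_factory=dict)  # category -> terms
    verbs: set[str] = field(default_factory=set)

    def add_entity(self, category: str, term: str) -> None:
        if term:  # skip empty fields in scraped records
            self.entities.setdefault(category, set()).add(term)

    def category_of(self, term: str) -> str | None:
        for category, terms in self.entities.items():
            if term in terms:
                return category
        return None


def build_dictionary(stock_records, behaviour_verbs):
    """stock_records: iterable of dicts produced by a (hypothetical) crawler."""
    fd = FinancialDictionary()
    for rec in stock_records:
        fd.add_entity("company name", rec.get("name", ""))
        fd.add_entity("company code", rec.get("code", ""))
        for person in rec.get("directors", []):
            fd.add_entity("director", person)
        for person in rec.get("executives", []):
            fd.add_entity("executive", person)
        fd.add_entity("industry", rec.get("industry", ""))
    fd.verbs.update(behaviour_verbs)
    return fd
```

Keeping entities keyed by category means the same structure can later drive entity annotation. Each matched term maps directly to the tag that is written next to it.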
A real-time news corpus is extracted from the real-time news database for training. Before classification, a preprocessing stage is carried out. Each real-time news item is segmented into words, and low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup are filtered out. Text classification is implemented here with a CNN, and the whole model is divided into four layers. The first layer is the embedding layer, which maps each word to a low-dimensional vector representation (using the word2vec method). The second layer is the convolutional layer, composed of filters of different window sizes. Each filter shares its parameters across positions, which greatly reduces the number of parameters. A single filter can only recognize one type of feature, so each filter acts as a feature detector, and its window size is the n-gram width it recognizes. The third layer is the pooling layer. The pooling operation extracts the maximum value of each column vector produced by the convolution, yielding a row vector whose length equals the number of filters. The fourth layer is the fully connected layer, i.e., a softmax layer added after the pooling layer. Its purpose is to convert the vector output by the pooling layer into the required output, namely the news category label. The structure of the first convolutional neural network is shown in Fig. 6.
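The four-layer text classifier can be sketched as an untrained forward pass in plain Python. This sketch rests on several assumptions. The weights are random placeholders rather than word2vec vectors and trained parameters. ReLU is applied after the convolution, although the text does not name an activation. Training with a loss function is left out. What the sketch does show is the shape of the computation: shared filters of several window sizes, max-pooling to one value per filter, and then a softmax over news categories.

```python
import math
import random


def conv_max_pool(embedded, filt, bias):
    """Layers 2 and 3 for one filter.

    embedded: n x d sentence matrix; filt: k x d weights shared across all
    window positions, so one filter detects one kind of k-gram feature.
    Returns the maximum over all window positions (max-pooling over time).
    """
    k = len(filt)
    scores = []
    for start in range(len(embedded) - k + 1):
        s = bias
        for row_w, row_x in zip(filt, embedded[start:start + k]):
            s += sum(w * x for w, x in zip(row_w, row_x))
        scores.append(max(0.0, s))  # ReLU after convolution (assumed)
    return max(scores) if scores else 0.0  # sentence shorter than the window


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Untrained forward pass; all weights are random placeholders."""

    def __init__(self, vocab, dim, window_sizes, filters_per_size, n_classes, seed=0):
        rng = random.Random(seed)

        def rnd():
            return rng.uniform(-0.1, 0.1)

        self.index = {w: i for i, w in enumerate(vocab)}
        # Layer 1: embedding table (word2vec vectors would be loaded here).
        self.emb = [[rnd() for _ in range(dim)] for _ in vocab]
        # Layer 2: `filters_per_size` filters for each window size (n-gram width).
        self.filters = [([[rnd() for _ in range(dim)] for _ in range(k)], 0.0)
                        for k in window_sizes for _ in range(filters_per_size)]
        # Layer 4: fully connected weights feeding the softmax.
        self.W = [[rnd() for _ in self.filters] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def forward(self, tokens):
        embedded = [self.emb[self.index[t]] for t in tokens if t in self.index]
        # Layer 3: one pooled value per filter, so len(pooled) == number of filters.
        pooled = [conv_max_pool(embedded, f, b) for f, b in self.filters]
        logits = [sum(w * p for w, p in zip(row, pooled)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)  # one probability per news category
```

Each filter contributes exactly one pooled value, so the pooled vector always has as many entries as there are filters, whatever the sentence length. In this sketch, a sentence shorter than a filter's window contributes 0.0. Padding the sentence would be an equally reasonable convention.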
After text classification of the real-time news is complete, information extraction still has to be performed on the news within each category. The purpose of information extraction is to convert the existing unstructured news information into generic structured information. The process can be divided into the following three steps.
(1) Entity annotation. Using the financial dictionary constructed in S1, the corresponding entities in each news item can be identified and annotated. This is illustrated below with a simple news example. Related entities such as 'Shandong Haihua', 'net profit', and 'soda ash' are each tagged with a different category.
For example: Shandong Haihua [company name] announced on the evening of the 28th that it expects net profit [performance indicator] attributable to shareholders of the listed company of 630 million to 690 million yuan for 2017, turning a loss into a profit. The loss in the same period last year was 123.0868 million yuan. During the reporting period, the production and sales volume of its main product soda ash [main business product] increased significantly year on year, and its selling price also rose substantially year on year.
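Dictionary-based entity annotation of this kind is commonly done with forward maximum matching, where the longest dictionary term is tried first at each position. The sketch below assumes that technique, because the text does not say which matching rule is used. It also assumes a flat `lexicon` that maps each term to its category. For Chinese text, the same loop works character by character without prior word segmentation.

```python
def tag_entities(text, lexicon):
    """Annotate dictionary terms with "[category]" tags using forward maximum
    matching. lexicon maps term -> category. Returns (tagged_text, spans)."""
    max_len = max((len(t) for t in lexicon), default=0)
    out, spans = [], []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match:
            category = lexicon[match]
            out.append(f"{match} [{category}]")
            spans.append((match, category))
            i += len(match)
        else:
            out.append(text[i])
            i += 1
    return "".join(out), spans


if __name__ == "__main__":
    lexicon = {"Shandong Haihua": "company name",
               "net profit": "performance indicator",
               "soda ash": "main business product"}
    tagged, _ = tag_entities("Shandong Haihua expects net profit to rise.", lexicon)
    print(tagged)
    # Shandong Haihua [company name] expects net profit [performance indicator] to rise.
```

Plain substring matching like this can fire inside longer words. A production tagger would normally run on segmented tokens or check word boundaries.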
(2) Relation extraction. Its main purpose is to identify entities in the text and then identify the semantic relations between them. Here, relation extraction mainly uses a deep-learning-based method in which a convolutional neural network performs the extraction. Specifically, the word vectors and the word position vectors serve as the input of the convolutional neural network, and a sentence representation is obtained through a convolutional layer, a pooling layer, and a non-linear layer. First, a pre-trained word-vector table is looked up to generate the word-vector matrix of each sentence. At the same time, position-vector features are added, and keyword features characterizing the category are obtained through a keyword extraction algorithm. A convolution operation then yields a series of features, and the pooling layer selects the key features of each sentence. These are combined into a feature vector, which finally passes through a fully connected layer into the classifier for classification. The second convolutional neural network is shown in Fig. 7.
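The position features are the easiest part of this step to misread. The plain-Python sketch below shows how word vectors and position vectors could be combined and then passed through convolution, max-pooling, a non-linearity, and a softmax classifier. It follows the widely used relative-position scheme, in which each token's distance to each of the two entities is clipped to `max_dist` and used as an index into a position-embedding table. That scheme, the `tanh` non-linearity, and all function names are assumptions rather than details confirmed by the text. Training is omitted.

```python
import math


def relative_positions(n_tokens, e1_idx, e2_idx, max_dist=10):
    """Signed distance of every token to each entity, clipped to
    [-max_dist, max_dist] and shifted into [0, 2 * max_dist] so it can
    index a position-embedding table."""
    def clip(d):
        return max(-max_dist, min(max_dist, d)) + max_dist
    return [(clip(i - e1_idx), clip(i - e2_idx)) for i in range(n_tokens)]


def build_input(tokens, e1_idx, e2_idx, word_vecs, pos_table, max_dist=10):
    """Each row: word vector + position vector (entity 1) + position vector (entity 2)."""
    positions = relative_positions(len(tokens), e1_idx, e2_idx, max_dist)
    return [word_vecs[tok] + pos_table[p1] + pos_table[p2]
            for tok, (p1, p2) in zip(tokens, positions)]


def relation_probs(rows, filters, W, b):
    """Convolution -> max-pooling -> tanh non-linearity -> softmax over relations."""
    pooled = []
    for filt in filters:  # each filt is a k x row_dim weight matrix
        k = len(filt)
        scores = [sum(w * x for fw, fx in zip(filt, rows[s:s + k]) for w, x in zip(fw, fx))
                  for s in range(len(rows) - k + 1)]
        pooled.append(math.tanh(max(scores)) if scores else 0.0)
    logits = [sum(w * p for w, p in zip(wrow, pooled)) + bias for wrow, bias in zip(W, b)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Two tokens with the same word vector but different distances to the entities therefore enter the network as different rows. This is what lets the convolution tell which words sit between, before, or after the entity pair.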
(3) Event extraction represents unstructured text containing event information in structured form. Three features are mainly considered when deciding whether a sentence is a news event sentence: company name information, domain verb information, and sentence position.
(a) Company name information. The central subject of a news event is a company, so the company name is used as an important feature of an event sentence. It can be computed with the following formula:
$$\mathrm{Score}_{\mathrm{company}}(S_i) = \mathrm{Count}(S_i)$$
(b) Financial-domain verb information. The verb generally serves as the core of an event, and the weight of the verb information can be computed from the financial-domain dictionary constructed earlier. The calculation formula is as follows:
(c) Sentence position. In financial news, sentences with high information content usually appear among the first few sentences, so the position weight is calculated as follows:
After the above processing, the event extraction result finally obtained for the news provided in S2 is as follows:
<performance information> Shandong Haihua expects to turn a loss into a profit.
<product information> Output of the main product soda ash increased significantly year on year, and its selling price also rose substantially year on year.
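The event-sentence decision can be sketched as a weighted sum of the three features. Only the company term follows the formula given above, and it reads Count(S_i) as the number of company names found in sentence S_i. The verb and position formulas are not reproduced in this text. The `verb_score` and `position_score` functions below are therefore hypothetical stand-ins, and the weights and threshold are illustrative values rather than values taken from the patent.

```python
def company_score(sentence, company_names):
    """Score_company(S_i) = Count(S_i), interpreted as the number of
    company-name occurrences in the sentence."""
    return sum(sentence.count(name) for name in company_names)


def verb_score(sentence, finance_verbs):
    """HYPOTHETICAL stand-in: share of dictionary verbs found in the sentence."""
    if not finance_verbs:
        return 0.0
    return sum(1 for v in finance_verbs if v in sentence) / len(finance_verbs)


def position_score(index, n_sentences):
    """HYPOTHETICAL stand-in: weight decays linearly from 1.0 for sentence 0."""
    return 1.0 - index / n_sentences


def event_sentences(sentences, company_names, finance_verbs,
                    weights=(1.0, 1.0, 1.0), threshold=0.8):
    """Return (index, score) for sentences whose weighted score reaches the
    threshold. Weights and threshold are illustrative assumptions."""
    w_c, w_v, w_p = weights
    picked = []
    for i, s in enumerate(sentences):
        score = (w_c * company_score(s, company_names)
                 + w_v * verb_score(s, finance_verbs)
                 + w_p * position_score(i, len(sentences)))
        if score >= threshold:
            picked.append((i, round(score, 3)))
    return picked
```

With this scoring, a sentence that mentions no company can still qualify if it contains a financial verb and appears early in the article. The product-information sentence in the example above is of this kind.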
(4) Building the knowledge graph
After news information extraction is complete, the knowledge graph is built from the extracted content. Here the graph database Neo4J is used to implement the listed-company knowledge graph. Neo4J models data using the concept of a graph from data structures, whose most basic concepts are nodes and edges. Nodes represent entities such as individual stocks, shareholders, and senior executives, and edges represent relationships between entities. Through the corresponding Neo4J interfaces, the event extraction content obtained in S3 can be added to the graph. Fig. 8 shows the knowledge-graph framework for a specific liquor stock. Entities are linked to one another by corresponding relationships, and each entity has an associated list of real-time news events. Event attribution analysis can be performed using the information in this graph. For example, if the liquor stock's price rises, one can trace from that node back to the related nodes whose news events may have caused the rise, and thereby attribute the event clearly and accurately.
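Attribution over the graph amounts to walking outward from the node whose price moved and collecting the news events attached to nearby nodes. So that it runs on its own, the sketch below uses a small in-memory graph in place of Neo4J. It also includes a helper that builds a parameterised Cypher `MERGE` statement of the kind one might send through a Neo4J driver. The class, the `Entity` label, the property names, and the hop limit are all hypothetical.

```python
from collections import deque


class StockGraph:
    """In-memory stand-in for the Neo4J graph: each node carries a list of
    extracted news events, and edges carry a relation label."""

    def __init__(self):
        self.events = {}  # node -> list of event strings
        self.adj = {}     # node -> list of (neighbour, relation)

    def add_node(self, node):
        self.events.setdefault(node, [])
        self.adj.setdefault(node, [])

    def add_edge(self, a, rel, b):
        self.add_node(a)
        self.add_node(b)
        self.adj[a].append((b, rel))
        self.adj[b].append((a, rel))  # traverse both directions for attribution

    def add_event(self, node, event):
        self.add_node(node)
        self.events[node].append(event)

    def attribute(self, start, max_hops=2):
        """Breadth-first walk from `start`, returning (node, hops, event) for
        every news event attached to a node within `max_hops`."""
        seen = {start: 0}
        queue = deque([start])
        found = []
        while queue:
            node = queue.popleft()
            for ev in self.events.get(node, []):
                found.append((node, seen[node], ev))
            if seen[node] == max_hops:
                continue
            for nb, _rel in self.adj.get(node, []):
                if nb not in seen:
                    seen[nb] = seen[node] + 1
                    queue.append(nb)
        return found


def cypher_merge_edge(rel_type):
    """Parameterised Cypher writing one relationship (run with parameters a, b).
    The relationship type is interpolated, so it is validated first."""
    if not rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    return ("MERGE (x:Entity {name: $a}) "
            "MERGE (y:Entity {name: $b}) "
            f"MERGE (x)-[:{rel_type}]->(y)")
```

In most Neo4J versions, relationship types cannot be supplied as Cypher query parameters. For that reason the helper checks `rel_type` before interpolating it, while node names remain ordinary parameters. Results are ordered by hop count, which gives a natural ranking of candidate causes from nearest to farthest.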
The above description is only a preferred embodiment of the present invention and does not limit its scope. Any equivalent structural transformation made using the description and drawings of the present invention under its inventive concept, and any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A method for constructing a listed-company knowledge graph suitable for event attribution, characterized by comprising:
S10 generating a financial dictionary: obtaining fundamental information and historical news for the individual stocks of a number of listed companies, and extracting key words and phrases to generate a financial dictionary;
S20 generating a real-time news database: obtaining real-time news about listed companies and generating a real-time news database;
S30 designing a text classifier: using the financial dictionary to extract a real-time news corpus from the real-time news database for training the text classifier, and performing text classification on the real-time news using a first convolutional neural network model;
S40 extracting text information: using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information suited to the news database;
S50 constructing the entity model graph: establishing an initial model of the listed-company knowledge graph using the graph concept of the Neo4J graph database's data structure, wherein fundamental information of listed companies' individual stocks serves as nodes and relationships between listed companies serve as edges; and inputting the entity news information obtained by the S40 information extraction to generate the listed-company knowledge graph.
2. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 1, characterized in that, before S10, the method further comprises:
S01 accessing well-known stock-trading websites and using a crawler to obtain the stock list of listed companies, fundamental information for individual stocks, and historical news related to individual stocks;
and after S10 and before S20, the method further comprises:
S02 accessing the websites of major securities and financial news providers and using a crawler to obtain real-time news for each listed company.
3. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 1, characterized in that the first convolutional neural network model is divided into four layers:
the first layer is an embedding layer, which maps each word to a low-dimensional vector representation;
the second layer is a convolutional layer composed of filters of different window sizes, wherein each filter shares its parameters, each filter acts as a feature detector, and the window size is the n-gram width it recognizes;
the third layer is a pooling layer, whose operation extracts the maximum value of each column vector obtained by the convolution, yielding a row vector whose length equals the number of filters;
the fourth layer is a fully connected layer, i.e., a softmax layer added after the pooling layer, which converts the vector output by the pooling layer into the required output, namely the news category label.
4. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 3, characterized in that the embedding layer maps each word to a low-dimensional vector representation using the open-source word2vec toolkit.
5. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 1, characterized in that, in S30, before performing text classification on the real-time news using the convolutional neural network, the method further comprises:
S301 preprocessing stage: performing word segmentation on each real-time news item and filtering out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup.
6. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 1, characterized in that, in S40, the step of converting unstructured information into structured information suited to the news database comprises:
S401 entity annotation: using the financial dictionary to identify the corresponding entities in each news item and annotate them;
S402 relation extraction: using a deep-learning-based method, looking up a pre-trained word-vector table to generate the word-vector matrix of each sentence, while adding position-vector features and obtaining keyword features characterizing the category through a keyword extraction algorithm; and performing semantic relation extraction between entities using a second convolutional neural network, i.e., using the word vectors and the word position vectors as the input of the second convolutional neural network to obtain a sentence representation, wherein the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: a series of features is first obtained by a convolution operation on the keyword features characterizing the category, the pooling layer selects the key features of each sentence, which are combined into a feature vector, and finally the non-linear layer feeds the feature vector into a classifier for classification;
S403 event extraction: representing unstructured text containing event information in structured form, and judging whether the current sentence is a news event sentence according to company name information, financial-domain verb information, and sentence position.
7. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 1, characterized in that S403 specifically comprises:
(1) company name information: taking the company name as an important feature of an event sentence, computed by the following formula:
$$\mathrm{Score}_{\mathrm{company}}(S_i) = \mathrm{Count}(S_i);$$
(2) financial-domain verb information: using the financial dictionary to compute the weight of the verb information by the following formula:
(3) sentence position: the sentence-position weight is computed by the following formula:
8. A device for constructing a listed-company knowledge graph based on event attribution, characterized by comprising:
a first generation module, for obtaining fundamental information and historical news for the individual stocks of a number of listed companies, and extracting key words and phrases to generate a financial dictionary;
a second generation module, for obtaining real-time news about listed companies and generating a real-time news database;
a classification module, for using the financial dictionary to extract a real-time news corpus from the real-time news database for training a text classifier, and performing text classification on the real-time news using a first convolutional neural network model, the classification module further comprising a preprocessing unit which, before text classification of the real-time news, performs word segmentation on each real-time news item and filters out low-frequency words, stop words, special symbols, punctuation marks, and irrelevant markup;
an extraction module, for using the financial dictionary to perform information extraction on the classified real-time news, converting unstructured information into structured information suited to the news database;
a third generation module, for establishing an initial model of the listed-company knowledge graph using the graph concept of the Neo4J graph database's data structure, wherein fundamental information of listed companies' individual stocks serves as nodes and relationships between listed companies serve as edges, and for inputting the entity news information obtained by the S40 information extraction to generate the listed-company knowledge graph.
9. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 8, characterized by further comprising a link crawling module, for accessing well-known stock-trading websites and using a crawler to obtain the stock list of listed companies, fundamental information for individual stocks, and historical news related to individual stocks; and for accessing the websites of major securities and financial news providers and using a crawler to obtain real-time news for each listed company.
10. The method for constructing a listed-company knowledge graph suitable for event attribution according to claim 8, characterized in that the extraction module comprises:
an entity annotation unit, for using the financial dictionary to identify the corresponding entities in each news item and annotate them;
a relation extraction unit, for using a deep-learning-based method to look up a pre-trained word-vector table and generate the word-vector matrix of each sentence, while adding position-vector features and obtaining keyword features characterizing the category through a keyword extraction algorithm; and for performing semantic relation extraction between entities using a second convolutional neural network, i.e., using the word vectors and the word position vectors as the input of the second convolutional neural network to obtain a sentence representation, wherein the second convolutional neural network comprises a convolutional layer, a pooling layer, and a non-linear layer: a series of features is first obtained by a convolution operation on the keyword features characterizing the category, the pooling layer selects the key features of each sentence, which are combined into a feature vector, and finally the non-linear layer feeds the feature vector into a classifier for classification;
an event extraction unit, for representing unstructured text containing event information in structured form, and for judging whether the current sentence is a news event sentence according to company name information, financial-domain verb information, and sentence position.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811205312.0A CN109558492A (en) 2018-10-16 2018-10-16 A kind of listed company's knowledge mapping construction method and device suitable for event attribution

Publications (1)

Publication Number Publication Date
CN109558492A true CN109558492A (en) 2019-04-02

Family ID: 65865034

Country Status (1)

Country Link
CN (1) CN109558492A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066599A (en) * 2017-04-20 2017-08-18 北京文因互联科技有限公司 A kind of similar enterprise of the listed company searching classification method and system of knowledge based storehouse reasoning
CN107665252A (en) * 2017-09-27 2018-02-06 深圳证券信息有限公司 A kind of method and device of creation of knowledge collection of illustrative plates
CN108596439A (en) * 2018-03-29 2018-09-28 北京中兴通网络科技股份有限公司 A kind of the business risk prediction technique and system of knowledge based collection of illustrative plates

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377693A (en) * 2019-06-06 2019-10-25 新华智云科技有限公司 The model training method and generation method of financial and economic news, device, equipment and medium
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN110377756B (en) * 2019-07-04 2020-03-17 成都迪普曼林信息技术有限公司 Method for extracting event relation of mass data set
CN110377756A (en) * 2019-07-04 2019-10-25 成都迪普曼林信息技术有限公司 Mass data collection event relation abstracting method
CN110543562A (en) * 2019-08-19 2019-12-06 武大吉奥信息技术有限公司 Event map-based automatic urban management event distribution method and system
CN110990525A (en) * 2019-11-15 2020-04-10 华融融通(北京)科技有限公司 Natural language processing-based public opinion information extraction and knowledge base generation method
CN111626898A (en) * 2020-03-20 2020-09-04 贝壳技术有限公司 Method, device, medium and electronic equipment for realizing attribution of events
CN111626898B (en) * 2020-03-20 2022-03-15 贝壳找房(北京)科技有限公司 Method, device, medium and electronic equipment for realizing attribution of events
CN111475625A (en) * 2020-05-09 2020-07-31 山东舜网传媒股份有限公司 News manuscript generation method and system based on knowledge graph
CN111612633A (en) * 2020-05-27 2020-09-01 佛山市知识图谱科技有限公司 Stock analysis method, stock analysis device, computer equipment and storage medium
CN112286772A (en) * 2020-10-14 2021-01-29 北京易观智库网络科技有限公司 Attribution analysis method and device and electronic equipment
CN112612899A (en) * 2020-11-24 2021-04-06 中国传媒大学 Knowledge graph construction method and device, storage medium and electronic equipment
CN112819308B (en) * 2021-01-23 2024-04-02 罗家德 Head enterprise identification method based on bidirectional graph convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190402