CN107943847A - Enterprise relation extraction method, device and storage medium - Google Patents
Enterprise relation extraction method, device and storage medium
- Publication number
- CN107943847A CN107943847A CN201711061205.0A CN201711061205A CN107943847A CN 107943847 A CN107943847 A CN 107943847A CN 201711061205 A CN201711061205 A CN 201711061205A CN 107943847 A CN107943847 A CN 107943847A
- Authority
- CN
- China
- Prior art keywords
- vector
- sentence
- business
- word
- sample sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Abstract
The invention discloses an enterprise relation extraction method, device and storage medium. The method includes: extracting sentences containing enterprise entities that have a relation from a knowledge base as training sample sentences to build a sample library; extracting from the sample library all training sample sentences containing a given enterprise entity pair and segmenting them into words, mapping each word to a word vector xi and each training sample sentence to a sentence vector Si; computing with an LSTM the first hidden-layer state vector hi and the second hidden-layer state vector hi' of each word vector xi, concatenating the two into a comprehensive hidden-layer state vector, and then obtaining the feature vector Ti; substituting the feature vectors Ti into an average-vector expression to compute the average vector S; substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight ai of each training sample sentence; and extracting sentences containing two enterprise entities, obtaining their feature vectors Ti through a bi-LSTM, and inputting them into the trained RNN model to predict the relation between the two enterprises. This reduces labor cost and predicts the relation between the two enterprise entities more accurately.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to an enterprise relation extraction method, device and computer-readable storage medium.
Background art
Identifying associations between different enterprises in news, such as investment and financing, supply-chain and cooperation relations, is of great significance for enterprise risk early warning. However, the currently common entity relation extraction methods require a large amount of training data to be labeled manually, and corpus labeling work is generally very time-consuming and laborious.
Summary of the invention
In view of the above, the present invention provides an enterprise relation extraction method, device and computer-readable storage medium. The neural-network-based relation extraction model can be extended to distantly supervised data, which effectively reduces the model's dependence on manually labeled data; and this supervised enterprise relation extraction method achieves better precision and recall than semi-supervised or unsupervised methods.
To achieve the above object, the present invention provides an enterprise relation extraction method, which includes:
Sample library establishment step: extracting sentences containing enterprise entities that have a relation from a knowledge base as training sample sentences to build a sample library;
Word segmentation step: extracting from the sample library all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector xi, and mapping each training sample sentence to a sentence vector Si as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, using a long short-term memory (LSTM) module to compute, from left to right, the first hidden-layer state vector hi of the current word vector xi and, from right to left, the second hidden-layer state vector hi' of the current word vector xi; obtaining the comprehensive hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors; and obtaining the feature vector Ti of each training sample sentence from the comprehensive hidden-layer state vectors of all words in that sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from the feature vector Ti of each training sample sentence using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight ai of each training sample sentence;
Prediction step: extracting sentences containing two enterprise entities from the current text, obtaining the feature vector Ti of each sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting the feature vector Ti into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the word segmentation step includes:
representing each segmented word in one-hot form to obtain an initial word vector; marking each training sample sentence with a sentence ID and mapping the sentence ID to an initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence into a continuous bag-of-words model, and predicting the word vector xi of that word; updating the sentence vector of the training sample sentence with each prediction, until the word vector xi of every word in the training sample sentence has been predicted; and taking the sentence vector after the last update as the sentence vector Si of the training sample sentence.
Preferably, the concatenation step includes:
from left to right, computing the first hidden-layer state vector hi of the current word vector xi from the hidden-layer state vector hi-1 of the previous word vector xi-1; and from right to left, computing the second hidden-layer state vector hi' of the current word vector xi from the hidden-layer state vector hi+1 of the next word vector xi+1.
Preferably, the average-vector expression is:
S = sum(ai*Ti)/n
where ai represents the weight of a training sample sentence, Ti represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
Preferably, the softmax classification function is:
σ(z)j = e^(zj) / Σ(k=1..K) e^(zk),  j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector whose enterprise relation type is to be predicted, zj represents the score of a given enterprise relation type, and σ(z)j represents the probability of the relation type to be predicted among all enterprise relation types.
In addition, the present invention also provides an electronic device, which includes a memory, a processor, and an enterprise relation extraction program stored on the memory and executable on the processor. When the enterprise relation extraction program is executed by the processor, the following steps are realized:
Sample library establishment step: extracting sentences containing enterprise entities that have a relation from a knowledge base as training sample sentences to build a sample library;
Word segmentation step: extracting from the sample library all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector xi, and mapping each training sample sentence to a sentence vector Si as the input of the first layer of the RNN model;
Concatenation step: in the second layer of the RNN model, using an LSTM module to compute, from left to right, the first hidden-layer state vector hi of the current word vector xi and, from right to left, the second hidden-layer state vector hi' of the current word vector xi; obtaining the comprehensive hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors; and obtaining the feature vector Ti of each training sample sentence from the comprehensive hidden-layer state vectors of all words in that sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from the feature vector Ti of each training sample sentence using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight ai of each training sample sentence;
Prediction step: extracting sentences containing two enterprise entities from the current text, obtaining the feature vector Ti of each sentence through a bi-LSTM module, and inputting the feature vector Ti into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the concatenation step includes:
using an LSTM module, from left to right, computing the first hidden-layer state vector hi of the current word vector xi from the hidden-layer state vector hi-1 of the previous word vector xi-1; and from right to left, computing the second hidden-layer state vector hi' of the current word vector xi from the hidden-layer state vector hi+1 of the next word vector xi+1.
Preferably, the average-vector expression is:
S = sum(ai*Ti)/n
where ai represents the weight of a training sample sentence, Ti represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
Preferably, the softmax classification function is:
σ(z)j = e^(zj) / Σ(k=1..K) e^(zk),  j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector whose enterprise relation type is to be predicted, zj represents the score of a given enterprise relation type, and σ(z)j represents the probability of the relation type to be predicted among all enterprise relation types.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium includes an enterprise relation extraction program; when the enterprise relation extraction program is executed by a processor, any step of the above enterprise relation extraction method can be realized.
With the enterprise relation extraction method, electronic device and computer-readable storage medium proposed by the present invention, sentences in unstructured text containing enterprise entity pairs that have a relation in the knowledge base are extracted as training sample sentences to build a sample library. All training sample sentences containing a given enterprise entity pair are then extracted from the sample library and segmented, the sentence vector Si of each training sample sentence is obtained, and the feature vector Ti of each training sample sentence is computed by an LSTM module. The average vector S of the training sample sentences is then computed from the feature vectors Ti, the average vector S is substituted into the softmax classification function, and the weight ai of each training sample sentence is determined according to the relation type of the enterprise entity pair. Finally, sentences containing two enterprise entities are extracted from the current text, their feature vectors are obtained through a bi-LSTM module, and the feature vectors are input into the trained recurrent neural network model to predict the relation between the two enterprise entities. This improves the ability to recognize relations between different enterprises in news and reduces the dependence on manual labeling of training data.
Brief description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 2 is a module diagram of a preferred embodiment of the enterprise relation extraction program in Fig. 1;
Fig. 3 is a flow chart of a preferred embodiment of the enterprise relation extraction method of the present invention;
Fig. 4 is a frame diagram of the prediction module of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, it is a schematic diagram of a preferred embodiment of the electronic device 1 of the present invention.
In this embodiment, the electronic device 1 can be a server, a smartphone, a tablet computer, a personal computer, a portable computer, or other electronic equipment with computing functions.
The electronic device 1 includes a memory 11, a processor 12, a knowledge base 13, a network interface 14 and a communication bus 15. The knowledge base 13 is stored on the memory 11, and sentences containing enterprise entity pairs are extracted from the knowledge base 13 as training sample sentences to build the sample library.
The network interface 14 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The communication bus 15 is used to realize connection and communication between these components.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium can be a non-volatile storage medium such as a flash memory, hard disk, multimedia card or card-type memory. In some embodiments, the memory 11 can be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 can also be an external storage unit of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card equipped on the electronic device 1.
In this embodiment, the memory 11 can be used not only to store the application software installed on the electronic device 1 and various kinds of data, such as the enterprise relation extraction program 10, the knowledge base 13 and the sample library, but also to temporarily store data that has been or will be output.
In some embodiments, the processor 12 can be a central processing unit (CPU), microprocessor or other data processing chip, used to run the program code or process the data stored in the memory 11, for example to execute the computer program code of the enterprise relation extraction program 10 and the training of the various models.
Preferably, the electronic device 1 can also include a display, which can be called a display screen or display unit. In some embodiments the display can be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display is used to show the information processed in the electronic device 1 and to show a visual working interface, for example to display the result of model training and the optimal values of the weights ai.
Preferably, the electronic device 1 can also include a user interface. The user interface can include an input unit such as a keyboard and a voice output device such as a speaker or earphone; optionally, the user interface can also include a standard wired interface and a wireless interface.
In the device embodiment shown in Fig. 1, the memory 11, which is a kind of computer storage medium, stores the program code of the enterprise relation extraction program 10. When the processor 12 executes the program code of the enterprise relation extraction program 10, the following steps are realized:
Sample library establishment step: extracting sentences containing enterprise entities that have a relation from a knowledge base as training sample sentences to build a sample library;
Word segmentation step: extracting from the sample library all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector xi, and mapping each training sample sentence to a sentence vector Si as the input of the first layer of the RNN model;
Concatenation step: in the second layer of the RNN model, using an LSTM module to compute, from left to right, the first hidden-layer state vector hi of the current word vector xi and, from right to left, the second hidden-layer state vector hi' of the current word vector xi; obtaining the comprehensive hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors; and obtaining the feature vector Ti of each training sample sentence from the comprehensive hidden-layer state vectors of all words in that sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from the feature vector Ti of each training sample sentence using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight ai of each training sample sentence;
Prediction step: extracting sentences containing two enterprise entities from the current text, obtaining the feature vector Ti of each sentence through a bi-LSTM module, and inputting the feature vector Ti into the above trained RNN model to predict the relation between the two enterprise entities.
In this embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then an unstructured sentence containing the two enterprise entities can express this relation. Therefore, when the association between two given enterprise entities in news needs to be identified, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base, and a sample library is built with these sentences as training sample sentences. The knowledge base itself is built by collecting unstructured sentences containing any two enterprise entities from historical news data. The relations of enterprise entity pairs include investment and financing, supply-chain and cooperation relations. For example, the sentence "Foxconn is the supplier of Mobike" contains the enterprise entity pair "Foxconn" and "Mobike", and the relation between the enterprise entities, "supplier", belongs to a supply-chain relation.
All training sample sentences containing a given enterprise entity pair are extracted from the sample library; each training sample sentence contains the names of the pair of enterprise entities and the relation type of the enterprise entity pair. Each training sample sentence is segmented with a word segmentation tool, such as the Stanford Chinese word segmenter or jieba. Each segmented word is then represented in one-hot form to obtain an initial word vector. In the one-hot representation, each word is represented as a very long vector whose dimension is the number of words in the vocabulary; only one dimension has the value 1 and the rest are 0, and that dimension represents the current word. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample library, and each training sample sentence contains the names of the two enterprise entities Foxconn and Mobike and the relation type of the enterprise entity pair (supplier). Segmenting "Foxconn is the supplier of Mobike" gives the result "Foxconn | is | Mobike | 's | supplier". The initial word vector of, e.g., "Foxconn" is [0100000000], and the initial word vector of "is" is [0010000000]. Then each training sample sentence is marked with a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
The initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence are input into the continuous bag-of-words (CBOW) model, and the word vector xi of that word is predicted. The initial sentence vector is then updated and replaced by a first updated sentence vector. The first updated sentence vector and the initial word vectors of the words adjacent to the left and right of the next word in the training sample sentence are input into the CBOW model, the word vector xi+1 of that word is predicted, and the first updated sentence vector is updated and replaced by a second updated sentence vector. Training is repeated in this way, updating the sentence vector of the training sample sentence at each step, until the word vector xi, i = (0, 1, 2, 3, …, m), of every word in the training sample sentence has been predicted; the sentence vector after the last training update is taken as the sentence vector Si, i = (0, 1, 2, 3, …, n), of the training sample sentence, which serves as the input of the first layer of the recurrent neural network (RNN) model. For example, the initial word vector of "Foxconn" (the left-adjacent word of "is"), that of "Mobike" (its right-adjacent word) and the initial sentence vector are input into the CBOW model; the word vector x2 of "is" is predicted, and the initial sentence vector is updated once to give the first updated sentence vector. Then the initial (or current) word vector of "is" (the left-adjacent word of "Mobike"), the initial word vector of "'s" (its right-adjacent word) and the first updated sentence vector are input into the CBOW model; the word vector x3 of "Mobike" is predicted, and the first updated sentence vector is updated to give the second updated sentence vector. Training is repeated in this way until the word vectors xi of all the words have been predicted and the sentence vector Si of the training sample sentence is obtained by updating. Throughout this process, the sentence ID of each news sentence remains unchanged.
In the second layer of the RNN model, a long short-term memory (LSTM) module then computes, from left to right, the first hidden-layer state vector hi of the current word vector xi from the hidden-layer state vector hi-1 of the previous word vector xi-1, and, from right to left, the second hidden-layer state vector hi' of the current word vector xi from the hidden-layer state vector hi+1 of the next word vector xi+1. The two hidden-layer state vectors are concatenated by a concatenate function to obtain the comprehensive hidden-layer state vector of each word in the training sample sentence, and the feature vector Ti, i = (0, 1, 2, 3, …, n), of each training sample sentence is obtained from the comprehensive hidden-layer state vectors of all words in the sentence. For example, in the sentence "Foxconn is the supplier of Mobike", the LSTM computes, from left to right, the first hidden-layer state vector h2 of the word vector x2 of "is" from the hidden-layer state vector h1 of the word vector x1 of "Foxconn", and, from right to left, the second hidden-layer state vector h2' of the word vector x2 of "is" from the hidden-layer state vector h3 of the word vector x3 of "Mobike". The two hidden-layer state vectors (h2 and h2') are concatenated by the concatenate function to obtain the comprehensive hidden-layer state vector of each word in the training sample sentence, and the feature vector Ti of each training sample sentence is obtained from the comprehensive hidden-layer state vectors of all words in the sentence.
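The two directional passes and the concatenation can be sketched with a standard LSTM cell. The weight matrices, dimensions and random inputs are illustrative assumptions; the patent does not specify the cell's internals, so a textbook gated formulation is used here:

```python
import numpy as np

rng = np.random.default_rng(1)
x_dim, d = 5, 3                        # toy word-vector and hidden dimensions
W_fwd = rng.normal(0, 0.2, (4 * d, x_dim + d))
W_bwd = rng.normal(0, 0.2, (4 * d, x_dim + d))

def lstm_step(x, h, c, W):
    """One LSTM cell step: gate pre-activations from [x; h], then state update."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

def bilstm_states(word_vecs):
    h, c, fwd = np.zeros(d), np.zeros(d), []
    for x in word_vecs:                # left-to-right pass: h_i
        h, c = lstm_step(x, h, c, W_fwd)
        fwd.append(h)
    h, c, bwd = np.zeros(d), np.zeros(d), []
    for x in reversed(word_vecs):      # right-to-left pass: h_i'
        h, c = lstm_step(x, h, c, W_bwd)
        bwd.append(h)
    bwd.reverse()
    # comprehensive hidden-layer state of each word: concatenation [h_i ; h_i']
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

sentence = [rng.normal(size=x_dim) for _ in range(5)]   # 5 toy word vectors
states = bilstm_states(sentence)
```

Each element of `states` is the comprehensive hidden-layer state of one word; the feature vector Ti of the sentence would then be derived from these per-word states (for example by pooling, which the text leaves unspecified).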
In the third layer of the RNN model, the average vector S of the training sample sentences is computed from the feature vector Ti of each training sample sentence using the average-vector formula S = sum(ai*Ti)/n, where ai represents the weight of a training sample sentence, Ti represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
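The average-vector formula is a weighted sum over the sentence feature vectors divided by their count; with hypothetical weights and n = 3 two-dimensional feature vectors:

```python
import numpy as np

# Hypothetical feature vectors T_i of n = 3 training sample sentences
T = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
a = np.array([0.5, 1.0, 1.5])             # per-sentence weights a_i (illustrative)

# S = sum(a_i * T_i) / n
S = (a[:, None] * T).sum(axis=0) / len(T)
```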
In the last layer of the RNN model, the average vector S is substituted into the softmax classification function:
σ(z)j = e^(zj) / Σ(k=1..K) e^(zk),  j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector whose enterprise relation type is to be predicted, zj represents the score of a given enterprise relation type, and σ(z)j represents the probability of the relation type to be predicted among all enterprise relation types. According to the relation type of the enterprise entity pair in the training sample sentences, the weights ai of the training sample sentences are determined. Through continuous learning, the weights ai are continuously optimized, so that informative sentences obtain higher weights and noisy sentences obtain smaller weights.
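The softmax classification function itself is straightforward to sketch; the score vector z over K = 3 hypothetical relation types (e.g. investment/financing, supply chain, cooperation) is illustrative:

```python
import numpy as np

def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k), computed with the usual
    max-subtraction for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical scores z over K = 3 enterprise relation types
z = np.array([2.0, 1.0, 0.1])
probs = softmax(z)
predicted = int(np.argmax(probs))   # index of the most probable relation type
```

The outputs are positive and sum to one, so each σ(z)j can be read directly as the probability of relation type j.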
In this embodiment, after the RNN model is determined, relation prediction can be performed on any unstructured sentence containing an enterprise entity pair, and the model's prediction is not tied to specific enterprise names.
Sentences containing the two enterprise entities whose relation is to be predicted are extracted from the current text, and these sentences are segmented to obtain sentence vectors. For example, S1, S2, S3, S4 represent the set of sentence vectors corresponding to the two enterprise entities. The feature vectors T1, T2, T3, T4 of the sentences are extracted by a bidirectional long short-term memory (bi-LSTM) module, and the feature vector of each sentence is input into the trained RNN model to obtain the relation prediction result between the two enterprise entities.
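The final prediction stage can be sketched end to end: weight and average the per-sentence feature vectors, score them against the relation types, and take the most probable one. The feature vectors, learned weights and toy classifier below are all hypothetical placeholders for the trained model's parameters:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical feature vectors T_1..T_4 of four sentences mentioning the
# entity pair (as produced by a bi-LSTM), plus learned sentence weights a_i
T = np.array([[0.2, 0.8, 0.1],
              [0.3, 0.7, 0.0],
              [0.1, 0.9, 0.2],
              [0.4, 0.6, 0.1]])
a = np.array([1.2, 0.9, 1.1, 0.3])        # noisy sentences get low weight
W_cls = np.eye(3)                          # toy classifier over K = 3 relation types

S = (a[:, None] * T).sum(axis=0) / len(T)  # average vector of the sentence set
probs = softmax(W_cls @ S)
relation = int(np.argmax(probs))           # predicted relation type index
```

With these toy values the second relation type dominates, matching the intuition that sentences with high weights pull the average vector toward the true relation.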
With the enterprise relation extraction method proposed in the above embodiment, training sample sentences of enterprise entity pairs that have a relation in the knowledge base are extracted from unstructured text to build a sample library. All training sample sentences containing a given enterprise entity pair are extracted from the sample library and segmented, the sentence vector Si of each training sample sentence is obtained, and the feature vector Ti of each training sample sentence is computed with the LSTM. The average vector S of the training sample sentences is computed by the average-vector formula, the average vector S is substituted into the softmax classification function, and the weights ai of the training sample sentences are determined according to the relation type of the enterprise entity pair. Finally, sentences containing two enterprise entities are extracted from the current text, their feature vectors Ti are obtained through the bi-LSTM, and the feature vectors Ti are input into the trained RNN model to predict the relation between the two enterprise entities. This not only removes the cumbersome step of manually labeling training data, but also achieves better precision and recall than semi-supervised or unsupervised methods.
As shown in Fig. 2, it is a module diagram of a preferred embodiment of the enterprise relation extraction program 10 in Fig. 1. A module as referred to in the present invention is a series of computer program instruction segments that complete a specific function.
In this embodiment, the enterprise relation extraction program 10 includes: an establishment module 110, a word segmentation module 120, a concatenation module 130, a calculation module 140, a weight determination module 150 and a prediction module 160. The functions or operation steps realized by the modules 110-160 are similar to those described above and are not described in detail here. Exemplarily:
Module 110 is established, sentence is built as training sample sentence there are the business entity of relation for being extracted from knowledge base
Vertical sample storehouse;
Word-dividing mode 120, for extracting all trained sample sentences for including a business entity pair from sample storehouse, using pre-
If participle instrument each trained sample sentence is segmented, each word after participle is mapped to term vector xi, and will each instruct
Practice sample sentence and be mapped to sentence vector Si, the input as RNN model first layers;
Concatenation module 130, for the second layer in RNN models, current word vector x is calculated with LSTM from left to righti
One hidden layer state vector hi, and current word vector x is calculated from right to leftiThe second hidden layer state vector hi', pass through splicing
Two hidden layer state vectors obtain training the synthesis hidden layer state vector of each word in sample sentence, further according to institute in training sample sentence
The synthesis hidden layer state vector for having word obtains the feature vector T of each trained sample sentencei;
Computing module 140, for the third layer in RNN models, according to the feature vector T of each trained sample sentencei, using flat
Equal vector expression calculates the average vector S of each trained sample sentence;
Weight determination module 150, for last layer in RNN models, by the average vector S and the business entity
To relationship type substitute into softmax classification functions the weight a of each trained sample sentence be calculatedi;
Prediction module 160, for extracting the sentence for including Liang Ge business entities from current text, obtains by bi-LSTM
To the feature vector T of sentencei, by this feature vector TiAbove-mentioned trained RNN models are inputted, prediction obtains Liang Ge enterprises reality
Relation between body.
As shown in Fig. 3, it is a flow chart of a preferred embodiment of the enterprise relation extraction method of the present invention.

In this embodiment, when the processor 12 executes the computer program of the enterprise relation extraction program 10 stored in the memory 11, the following steps of the enterprise relation extraction method are realized:

Step S10: extract from the knowledge base sentences containing enterprise entities that have a relation, and build a sample database from them as training sample sentences.

Step S20: extract from the sample database all training sample sentences containing a given enterprise-entity pair, segment each training sample sentence with a preset word segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i as the input to the first layer of the RNN model.

Step S30: in the second layer of the RNN model, compute with an LSTM, from left to right, the first hidden-state vector h_i of the current word vector x_i and, from right to left, the second hidden-state vector h_i'; obtain the composite hidden-state vector of each word in the training sample sentence by concatenating the two hidden-state vectors, and obtain the feature vector T_i of each training sample sentence from the composite hidden-state vectors of all words in the sentence.

Step S40: in the third layer of the RNN model, compute the average vector S of the training sample sentences from the feature vectors T_i using the average-vector expression.

Step S50: in the last layer of the RNN model, substitute the average vector S and the relation type of the enterprise-entity pair into the softmax classification function to compute the weight a_i of each training sample sentence.

Step S60: extract from the current text sentences containing two enterprise entities, obtain the feature vector T_i of each sentence with a bi-LSTM, and input the feature vector T_i into the trained RNN model to predict the relation between the two enterprise entities.
In this embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then unstructured sentences containing those two entities can express that relation. When the association between two enterprise entities in the news needs to be identified, all unstructured sentences containing the two entities are extracted from the knowledge base and used as training sample sentences to build a sample database. The knowledge base itself is built by collecting, from historical news data, the unstructured sentences that contain any two enterprise entities. For example, to identify the association between two enterprise entities in the news, all unstructured sentences containing the two entities are extracted from the knowledge base, and a sample database is built from these sentences as training sample sentences. The relations between enterprise entities include investment and financing, supply-chain, and cooperation relations, among others. For example, sentences containing the enterprise-entity pair "Foxconn" and "Mobike" are extracted from unstructured text as training sample sentences; in the sentence "Foxconn is a supplier of Mobike", the enterprise-entity pair is ("Foxconn", "Mobike"), and the relation "supplier" belongs to the supply-chain relation type.
All training sample sentences containing a given enterprise-entity pair are extracted from the sample database; each training sample sentence contains the names of the two enterprise entities and the relation type of the pair. Each training sample sentence is then segmented with a word segmentation tool. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, each containing the names of the two enterprise entities and the relation type of the pair (supplier). Word segmentation tools such as the Stanford Chinese word segmenter or jieba are used to segment each training sample sentence. For example, segmenting "Foxconn is a supplier of Mobike" yields "Foxconn | is | Mobike | 's | supplier". Each segmented word is represented in one-hot form to obtain its initial word vector. In the one-hot scheme, each word is represented as a very long vector whose dimensionality equals the vocabulary size; exactly one dimension has the value 1, all other dimensions are 0, and that dimension identifies the word. For example, the initial word vector of "Foxconn" is [0 1 0 0 0 0 0 0 0 0], and the initial word vector of "is" is [0 0 1 0 0 0 0 0 0 0]. Each training sample sentence is then assigned a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
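The one-hot mapping described above can be sketched as follows; the tiny vocabulary and the English stand-in tokens for the segmented sentence are illustrative assumptions, not part of the patent's implementation:

```python
def build_vocab(tokens):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(token, vocab):
    """Return the one-hot vector for a token: all zeros except a single 1."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

# Hypothetical segmentation of "Foxconn is a supplier of Mobike".
tokens = ["Foxconn", "is", "Mobike", "'s", "supplier"]
vocab = build_vocab(tokens)
vec = one_hot("is", vocab)  # -> [0, 1, 0, 0, 0]
```

A real vocabulary has tens of thousands of entries, which is why the method immediately replaces these sparse vectors with dense word vectors learned by the continuous bag-of-words model.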
The initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence are input into the continuous bag-of-words (CBOW) model, which predicts the word vector x_i of that word. The initial sentence vector is then updated and replaced with a first updated sentence vector; the first updated sentence vector and the initial word vectors of the words adjacent to the next word in the training sample sentence are input into the CBOW model, which predicts the word vector x_{i+1} of that word, and the first updated sentence vector is replaced with a second updated sentence vector. Training is repeated in this way, updating the sentence vector of the training sample sentence at each step, until the word vector x_i of every word in the sample sentence, i = 0, 1, 2, 3, ..., m, has been predicted; the sentence vector after the last training update is taken as the sentence vector S_i of the training sample sentence, i = 0, 1, 2, 3, ..., n. For example, in the sentence "Foxconn is a supplier of Mobike", the initial word vectors of the left-adjacent word "Foxconn" and the right-adjacent word "Mobike" of "is", together with the initial sentence vector, are input into the CBOW model; the model predicts the word vector x_2 of "is", and the initial sentence vector is updated once, yielding the first updated sentence vector. Then the initial (or current) word vector of the left-adjacent word "is" and the initial word vector of the right-adjacent word "'s" of "Mobike", together with the first updated sentence vector, are input into the CBOW model; the model predicts the word vector x_3 of "Mobike", and the first updated sentence vector is updated, yielding the second updated sentence vector. Training is repeated in this way until the word vectors x_i of all the words have been predicted and the updates yield the sentence vector S_i of the training sample sentence. Throughout this process, the sentence ID of each news sentence remains unchanged.
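One forward pass of this sentence-vector-augmented CBOW prediction might look like the following sketch (akin to the paragraph-vector variant of CBOW, where the sentence vector joins the word context); the two-dimensional vectors and the output weight matrix are illustrative assumptions:

```python
def cbow_step(context_vecs, sentence_vec, out_weights):
    """One forward pass: average the context word vectors together with the
    sentence vector, then score every vocabulary word; the highest-scoring
    word is the model's prediction for the center position."""
    dim = len(sentence_vec)
    inputs = context_vecs + [sentence_vec]
    hidden = [sum(v[d] for v in inputs) / len(inputs) for d in range(dim)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in out_weights]
    return hidden, scores

# Illustrative 2-dimensional vectors for the context of "is":
left = [0.4, 0.1]   # current vector of "Foxconn"
right = [0.2, 0.5]  # current vector of "Mobike"
sent = [0.0, 0.3]   # current sentence vector
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one output row per vocabulary word
hidden, scores = cbow_step([left, right], sent, W)
```

In training, the prediction error at `scores` would be backpropagated to update both the word vectors and the sentence vector, which is how the sentence vector S_i is refined at every step while the sentence ID stays fixed.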
In the second layer of the RNN model, an LSTM computes, from left to right, the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}. The two hidden-state vectors are concatenated with a Concatenate function to obtain the composite hidden-state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence, i = 0, 1, 2, 3, ..., n, is obtained from the composite hidden-state vectors of all the words in the sentence. For example, in the sentence "Foxconn is a supplier of Mobike", an LSTM computes, from left to right, the first hidden-state vector h_2 of the word vector x_2 of "is" from the hidden-state vector h_1 of the word vector x_1 of "Foxconn", and, from right to left, the second hidden-state vector h_2' of the word vector x_2 of "is" from the hidden-state vector h_3 of the word vector x_3 of "Mobike". The two hidden-state vectors (h_2 and h_2') are concatenated with a Concatenate function to obtain the composite hidden-state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence is obtained from the composite hidden-state vectors of all the words.
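A minimal sketch of this bidirectional pass, using scalar word vectors and a single set of scalar gate weights for readability (a real implementation would use weight matrices, e.g. a bidirectional LSTM layer in a deep-learning framework):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM cell step over scalar states; w holds illustrative
    input/forget/output/candidate gate weights."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev)
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev)
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def bilstm_features(xs, w):
    """Run the cell left-to-right and right-to-left, then concatenate the
    forward and backward hidden states into a composite state per word."""
    fwd, h, c = [], 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        fwd.append(h)
    bwd, h, c = [], 0.0, 0.0
    for x in reversed(xs):
        h, c = lstm_step(x, h, c, w)
        bwd.append(h)
    bwd.reverse()
    return [(hf, hb) for hf, hb in zip(fwd, bwd)]

w = {k: 0.5 for k in ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
states = bilstm_features([0.1, 0.4, 0.3], w)  # one composite state per word
```

The composite state of each position mixes left context (forward pass) and right context (backward pass), which is exactly why h_2 and h_2' for "is" are concatenated in the example above.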
In the third layer of the RNN model, the average vector of the training sample sentences is computed from the feature vectors T_i using the average-vector formula S = sum(a_i * T_i) / n, where a_i is the weight of a training sample sentence, T_i is the feature vector of each training sample sentence, and n is the number of training sample sentences. Suppose 50,000 training sample sentences for the entity pair "Foxconn" and "Mobike" are extracted from the knowledge base; then the feature vector T_i of every training sample sentence, i = 0, 1, 2, 3, ..., n, is substituted into the average-vector formula S = sum(a_i * T_i) / n, with n equal to 50,000, to compute the average vector S.
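The average-vector formula S = sum(a_i * T_i) / n is a direct elementwise computation, sketched here with three illustrative feature vectors and equal weights:

```python
def average_vector(weights, features):
    """S = sum(a_i * T_i) / n over the sentence feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(a * t[d] for a, t in zip(weights, features)) / n
            for d in range(dim)]

# Three illustrative sentence feature vectors with equal weights a_i = 1.
T = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a = [1.0, 1.0, 1.0]
S = average_vector(a, T)  # -> [3.0, 4.0]
```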
In the last layer of the RNN model, the average vector S is substituted into the softmax classification function:

σ(z)_j = e^(S_j) / Σ_{k=1..K} e^(S_k)

where K represents the number of enterprise relation types, S represents the average vector whose relation type is to be predicted, S_j represents a particular relation type, and σ(z)_j represents the probability of each relation type for the relation to be predicted. The weight a_i of each training sample sentence is determined according to the relation type of the enterprise-entity pair in the sample. Through continuous iterative learning, the weights a_i of the training sample sentences are continually optimized so that informative sentences receive higher weights and noisy sentences receive lower weights, yielding a reliable RNN model.
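The softmax classification function itself can be sketched as follows; the scores standing in for the K = 3 relation-type logits are illustrative:

```python
import math

def softmax(scores):
    """sigma(z)_j = e^(S_j) / sum_k e^(S_k), computed in a numerically
    stable way by subtracting the maximum score first."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores of the average vector against three relation types.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)  # probabilities over the relation types, summing to 1
```

The predicted relation type is simply the index with the highest probability; during training, the gap between this distribution and the known relation type drives the updates to the sentence weights a_i.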
In this embodiment, once the RNN model has been determined, relation prediction can be performed on any unstructured sentence containing an enterprise-entity pair; the model's prediction does not depend on the specific enterprise names.
Finally, as shown in Fig. 4, which is a frame diagram of the prediction module of the present invention, sentences containing the two enterprise entities whose relation is to be predicted are extracted from the current text, for example sentences containing "Ping An Group" and "Bank of China" extracted from the news, and these sentences are segmented to obtain sentence vectors. For example, S_1, S_2, S_3, S_4 represent the set of sentence vectors corresponding to the two enterprise entities. The feature vectors T_1, T_2, T_3, T_4 of the sentences are extracted with a bi-LSTM; then the weight of each T_i within the whole sentence set is assigned by computing the similarity between T_i and the relation-type vector r; finally, the weighted sum of the sentence features is taken, and the softmax classifier predicts the relation between "Ping An Group" and "Bank of China".
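The similarity-based weighting and aggregation shown in Fig. 4 can be sketched as follows; the dot product as the similarity measure, the concrete feature vectors, and the relation-type vector r are illustrative assumptions:

```python
import math

def attention_pool(features, r):
    """Weight each sentence feature vector by its (softmax-normalized)
    similarity to the relation-type vector r, then return the weighted sum."""
    sims = [sum(t * ri for t, ri in zip(T, r)) for T in features]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(r)
    pooled = [sum(w * T[d] for w, T in zip(weights, features))
              for d in range(dim)]
    return weights, pooled

T = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # bi-LSTM sentence features
r = [1.0, 0.0]                            # relation-type vector (illustrative)
weights, pooled = attention_pool(T, r)    # weights sum to 1
```

The pooled vector then goes through the softmax classifier; sentences most similar to the relation-type vector dominate the prediction, which is how effective sentences come to outweigh noisy ones.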
The enterprise relation extraction method proposed in the above embodiment extracts, from unstructured text, sentences of the enterprise-entity pairs that have a relation in the knowledge base, uses them as training sample sentences, and builds a sample database. All training sample sentences containing a given enterprise-entity pair are extracted from the sample database and segmented, yielding the sentence vector S_i of each training sample sentence, and an LSTM is used to compute the feature vector T_i of each training sample sentence. The average vector S of the training sample sentences is then computed by the average-vector formula and substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise-entity pair. Finally, sentences containing the two enterprise entities are extracted from the current text, the feature vector T_i of each sentence is obtained with a bi-LSTM, and the feature vectors T_i are input into the trained RNN model to predict the relation between the two enterprise entities. This improves the ability to recognize the different relations between enterprises in the news and to give early warning of enterprise risk, while reducing the tedious step of manually annotating training data.
In addition, an embodiment of the present invention further proposes a computer-readable storage medium. The computer-readable storage medium includes an enterprise relation extraction program 10, and the following operations are realized when the enterprise relation extraction program 10 is executed by a processor:

A sample database establishment step: extract from the knowledge base sentences containing enterprise entities that have a relation, and build a sample database from them as training sample sentences.

A word segmentation step: extract from the sample database all training sample sentences containing a given enterprise-entity pair, segment each training sample sentence with a preset word segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i as the input to the first layer of the RNN model.

A concatenation step: in the second layer of the RNN model, compute with an LSTM, from left to right, the first hidden-state vector h_i of the current word vector x_i and, from right to left, the second hidden-state vector h_i'; obtain the composite hidden-state vector of each word in the training sample sentence by concatenating the two hidden-state vectors, and obtain the feature vector T_i of each training sample sentence from the composite hidden-state vectors of all words in the sentence.

A computation step: in the third layer of the RNN model, compute the average vector S of the training sample sentences from the feature vectors T_i using the average-vector expression.

A weight determination step: in the last layer of the RNN model, substitute the average vector S and the relation type of the enterprise-entity pair into the softmax classification function to compute the weight a_i of each training sample sentence.

A prediction step: extract from the current text sentences containing two enterprise entities, obtain the feature vector T_i of each sentence with a bi-LSTM, and input the feature vector T_i into the trained RNN model to predict the relation between the two enterprise entities.
Preferably, the word segmentation step includes:

representing each segmented word in one-hot form to obtain its initial word vector; assigning a sentence ID to each training sample sentence and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence into the continuous bag-of-words model to predict the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vectors x_i of all words in the training sample sentence have been predicted; and taking the sentence vector after the last update as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step includes:

computing, from left to right, the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:

S = sum(a_i * T_i) / n

where a_i represents the weight of a training sample sentence, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
Preferably, the expression of the softmax classification function is:

σ(z)_j = e^(S_j) / Σ_{k=1..K} e^(S_k)

where K represents the number of enterprise relation types, S represents the average vector whose relation type is to be predicted, S_j represents a particular relation type, and σ(z)_j represents the probability of each relation type for the relation to be predicted.
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as the specific embodiments of the above enterprise relation extraction method, and are not described again here.

The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product stored in a storage medium as described above (such as ROM/RAM, magnetic disk, or optical disc), including several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention; any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
- 1. An enterprise relation extraction method, characterized in that the method comprises:
a sample database establishment step: extracting from a knowledge base sentences containing enterprise entities that have a relation, and building a sample database from them as training sample sentences;
a word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise-entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i as the input to the first layer of a recurrent neural network model;
a concatenation step: in the second layer of the recurrent neural network model, computing with a long short-term memory module, from left to right, the first hidden-state vector h_i of the current word vector x_i and, from right to left, the second hidden-state vector h_i' of the current word vector x_i; obtaining the composite hidden-state vector of each word in the training sample sentence by concatenating the two hidden-state vectors, and obtaining the feature vector T_i of each training sample sentence from the composite hidden-state vectors of all words in the training sample sentence;
a computation step: in the third layer of the recurrent neural network model, computing the average vector S of the training sample sentences from the feature vectors T_i using the average-vector expression;
a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relation type of the enterprise-entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
a prediction step: extracting from the current text sentences containing two enterprise entities, obtaining the feature vector T_i of each sentence with a bidirectional long short-term memory module, and inputting the feature vector T_i into the trained recurrent neural network model to predict the relation between the two enterprise entities.
- 2. The enterprise relation extraction method according to claim 1, characterized in that the word segmentation step comprises: representing each segmented word in one-hot form to obtain its initial word vector; assigning a sentence ID to each training sample sentence and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence into the continuous bag-of-words model to predict the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vectors x_i of all words in the training sample sentence have been predicted; and taking the sentence vector after the last update as the sentence vector S_i of the training sample sentence.
- 3. The enterprise relation extraction method according to claim 1, characterized in that the concatenation step comprises: computing, from left to right, the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}.
- 4. The enterprise relation extraction method according to claim 1, characterized in that the expression of the average vector is: S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
- 5. The enterprise relation extraction method according to claim 4, characterized in that the expression of the softmax classification function is: σ(z)_j = e^(S_j) / Σ_{k=1..K} e^(S_k), where K represents the number of enterprise relation types, S represents the average vector whose relation type is to be predicted, S_j represents a particular relation type, and σ(z)_j represents the probability of each relation type for the relation to be predicted.
- 6. An electronic device, characterized in that the device comprises a memory and a processor, the memory storing an enterprise relation extraction program which, when executed by the processor, implements the following steps:
a sample database establishment step: extracting from a knowledge base sentences containing enterprise entities that have a relation, and building a sample database from them as training sample sentences;
a word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise-entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i as the input to the first layer of a recurrent neural network model;
a concatenation step: in the second layer of the recurrent neural network model, computing with a long short-term memory module, from left to right, the first hidden-state vector h_i of the current word vector x_i and, from right to left, the second hidden-state vector h_i' of the current word vector x_i; obtaining the composite hidden-state vector of each word in the training sample sentence by concatenating the two hidden-state vectors, and obtaining the feature vector T_i of each training sample sentence from the composite hidden-state vectors of all words in the training sample sentence;
a computation step: in the third layer of the recurrent neural network model, computing the average vector S of the training sample sentences from the feature vectors T_i using the average-vector expression;
a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relation type of the enterprise-entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
a prediction step: extracting from the current text sentences containing two enterprise entities, obtaining the feature vector T_i of each sentence with a bidirectional long short-term memory module, and inputting the feature vector T_i into the trained recurrent neural network model to predict the relation between the two enterprise entities.
- 7. The electronic device according to claim 6, characterized in that the concatenation step comprises: computing, from left to right, the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}.
- 8. The electronic device according to claim 6, characterized in that the expression of the average vector is: S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
- 9. The electronic device according to claim 8, characterized in that the expression of the softmax classification function is: σ(z)_j = e^(S_j) / Σ_{k=1..K} e^(S_k), where K represents the number of enterprise relation types, S represents the average vector whose relation type is to be predicted, S_j represents a particular relation type, and σ(z)_j represents the probability of each relation type for the relation to be predicted.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium includes an enterprise relation extraction program which, when executed by a processor, implements the steps of the enterprise relation extraction method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061205.0A CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
PCT/CN2018/076119 WO2019085328A1 (en) | 2017-11-02 | 2018-02-10 | Enterprise relationship extraction method and device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061205.0A CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107943847A true CN107943847A (en) | 2018-04-20 |
CN107943847B CN107943847B (en) | 2019-05-17 |
Family
ID=61934111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711061205.0A Active CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107943847B (en) |
WO (1) | WO2019085328A1 (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876044A (en) * | 2018-06-25 | 2018-11-23 | 中国人民大学 | Online content popularity prediction method based on knowledge-enhanced neural network |
CN108920587A (en) * | 2018-06-26 | 2018-11-30 | 清华大学 | Open-domain visual question answering method and device fusing external knowledge |
CN108985501A (en) * | 2018-06-29 | 2018-12-11 | 平安科技(深圳)有限公司 | Stock index prediction method, server and storage medium based on index feature extraction |
CN109063032A (en) * | 2018-07-16 | 2018-12-21 | 清华大学 | Noise reduction method for remote supervision and retrieval data |
CN109243616A (en) * | 2018-06-29 | 2019-01-18 | 东华大学 | Deep learning-based joint relation extraction and structuring system for breast electronic health records |
CN109376250A (en) * | 2018-09-27 | 2019-02-22 | 中山大学 | Joint entity-relation extraction method based on reinforcement learning |
CN109582956A (en) * | 2018-11-15 | 2019-04-05 | 中国人民解放军国防科技大学 | Text representation method and device applied to sentence embedding |
CN109597851A (en) * | 2018-09-26 | 2019-04-09 | 阿里巴巴集团控股有限公司 | Feature extraction method and device based on association relations |
CN109710768A (en) * | 2019-01-10 | 2019-05-03 | 西安交通大学 | Taxpayer industry two-level classification method based on MIMO recurrent neural network |
CN110188202A (en) * | 2019-06-06 | 2019-08-30 | 北京百度网讯科技有限公司 | Training method, device and terminal for semantic relation recognition model |
CN110188201A (en) * | 2019-05-27 | 2019-08-30 | 上海上湖信息技术有限公司 | Information matching method and device |
CN110209836A (en) * | 2019-05-17 | 2019-09-06 | 北京邮电大学 | Remote supervision relation extraction method and device |
CN110427624A (en) * | 2019-07-30 | 2019-11-08 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
CN110737758A (en) * | 2018-07-03 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating a model |
CN111476035A (en) * | 2020-05-06 | 2020-07-31 | 中国人民解放军国防科技大学 | Chinese open relation prediction method and device, computer equipment and storage medium |
CN111581387A (en) * | 2020-05-09 | 2020-08-25 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111680127A (en) * | 2020-06-11 | 2020-09-18 | 暨南大学 | Annual report-oriented company name and relationship extraction method |
CN111784488A (en) * | 2020-06-28 | 2020-10-16 | 中国工商银行股份有限公司 | Enterprise capital risk prediction method and device |
CN111950279A (en) * | 2019-05-17 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Entity relationship processing method, device, equipment and computer readable storage medium |
CN112036181A (en) * | 2019-05-14 | 2020-12-04 | 上海晶赞融宣科技有限公司 | Entity relationship identification method and device and computer readable storage medium |
CN112215288A (en) * | 2020-10-13 | 2021-01-12 | 中国光大银行股份有限公司 | Target enterprise category determination method and device, storage medium and electronic device |
CN112418320A (en) * | 2020-11-24 | 2021-02-26 | 杭州未名信科科技有限公司 | Enterprise association relation identification method and device and storage medium |
CN113486630A (en) * | 2021-09-07 | 2021-10-08 | 浙江大学 | Supply chain data vectorization and visualization processing method and device |
CN113806538A (en) * | 2021-09-17 | 2021-12-17 | 平安银行股份有限公司 | Label extraction model training method, device, equipment and storage medium |
CN116562303A (en) * | 2023-07-04 | 2023-08-08 | 之江实验室 | Coreference resolution method and device referencing external knowledge |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110619053A (en) * | 2019-09-18 | 2019-12-27 | 北京百度网讯科技有限公司 | Training method of entity relation extraction model and method for extracting entity relation |
CN110879938A (en) * | 2019-11-14 | 2020-03-13 | 中国联合网络通信集团有限公司 | Text emotion classification method, device, equipment and storage medium |
CN111382843B (en) * | 2020-03-06 | 2023-10-20 | 浙江网商银行股份有限公司 | Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160217393A1 (en) * | 2013-09-12 | 2016-07-28 | Hewlett-Packard Development Company, L.P. | Information extraction |
CN106372058A (en) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | Short text emotion factor extraction method and device based on deep learning |
CN106407211A (en) * | 2015-07-30 | 2017-02-15 | 富士通株式会社 | Method and device for classifying semantic relationships among entity words |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106855853A (en) * | 2016-12-28 | 2017-06-16 | 成都数联铭品科技有限公司 | Entity relation extraction system based on deep neural network |
CN107220237A (en) * | 2017-05-24 | 2017-09-29 | 南京大学 | Business entity relation extraction method based on convolutional neural networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194422A (en) * | 2017-06-19 | 2017-09-22 | 中国人民解放军国防科学技术大学 | Convolutional neural network relation classification method combining forward and reverse examples |
- 2017-11-02 CN CN201711061205.0A patent/CN107943847B/en active Active
- 2018-02-10 WO PCT/CN2018/076119 patent/WO2019085328A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160217393A1 (en) * | 2013-09-12 | 2016-07-28 | Hewlett-Packard Development Company, L.P. | Information extraction |
CN106407211A (en) * | 2015-07-30 | 2017-02-15 | 富士通株式会社 | Method and device for classifying semantic relationships among entity words |
CN106372058A (en) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | Short text emotion factor extraction method and device based on deep learning |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106855853A (en) * | 2016-12-28 | 2017-06-16 | 成都数联铭品科技有限公司 | Entity relation extraction system based on deep neural network |
CN107220237A (en) * | 2017-05-24 | 2017-09-29 | 南京大学 | Business entity relation extraction method based on convolutional neural networks |
Non-Patent Citations (7)
Title |
---|
LEI MENG ET AL: "An Improved Method for Chinese Company Name and Abbreviation Recognition", 《 KNOWLEDGE MANAGEMENT IN ORGANIZATIONS》 * |
PENG ZHOU ET AL: "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification", 《PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 * |
XIAOYUN HOU ET AL: "Classifying Relation via Bidirectional Recurrent Neural Network Based on Local Information", 《WEB TECHNOLOGIES AND APPLICATIONS》 * |
YONGHUI WU ET AL: "Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network", 《STUDIES IN HEALTH TECHNOLOGY AND INFORMATICS》 *
HU XINCHEN: "Research on Semantic Relation Classification Based on LSTM", 《China Master's Theses Full-text Database, Information Science and Technology》 *
GUO XIYUE ET AL: "Chinese Entity Relation Extraction Based on Syntactic and Semantic Features", 《Journal of Chinese Information Processing》 *
HUANG BEIJING ET AL: "Research on Denoising in Distantly Supervised Person Relation Extraction", 《Computer Applications and Software》 *
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876044A (en) * | 2018-06-25 | 2018-11-23 | 中国人民大学 | Online content popularity prediction method based on knowledge-enhanced neural network |
CN108876044B (en) * | 2018-06-25 | 2021-02-26 | 中国人民大学 | Online content popularity prediction method based on knowledge-enhanced neural network |
CN108920587A (en) * | 2018-06-26 | 2018-11-30 | 清华大学 | Open-domain visual question answering method and device fusing external knowledge |
CN108985501A (en) * | 2018-06-29 | 2018-12-11 | 平安科技(深圳)有限公司 | Stock index prediction method, server and storage medium based on index feature extraction |
CN109243616A (en) * | 2018-06-29 | 2019-01-18 | 东华大学 | Deep learning-based joint relation extraction and structuring system for breast electronic health records |
CN108985501B (en) * | 2018-06-29 | 2022-04-29 | 平安科技(深圳)有限公司 | Index feature extraction-based stock index prediction method, server and storage medium |
CN110737758B (en) * | 2018-07-03 | 2022-07-05 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating a model |
CN110737758A (en) * | 2018-07-03 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating a model |
US11501182B2 (en) | 2018-07-03 | 2022-11-15 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for generating model |
CN109063032A (en) * | 2018-07-16 | 2018-12-21 | 清华大学 | Noise reduction method for remote supervision and retrieval data |
CN109063032B (en) * | 2018-07-16 | 2020-09-11 | 清华大学 | Noise reduction method for remote supervision and retrieval data |
CN109597851B (en) * | 2018-09-26 | 2023-03-21 | 创新先进技术有限公司 | Feature extraction method and device based on association relations |
CN109597851A (en) * | 2018-09-26 | 2019-04-09 | 阿里巴巴集团控股有限公司 | Feature extraction method and device based on association relations |
CN109376250A (en) * | 2018-09-27 | 2019-02-22 | 中山大学 | Joint entity-relation extraction method based on reinforcement learning |
CN109582956A (en) * | 2018-11-15 | 2019-04-05 | 中国人民解放军国防科技大学 | Text representation method and device applied to sentence embedding |
CN109710768B (en) * | 2019-01-10 | 2020-07-28 | 西安交通大学 | Taxpayer industry two-level classification method based on MIMO recurrent neural network |
CN109710768A (en) * | 2019-01-10 | 2019-05-03 | 西安交通大学 | Taxpayer industry two-level classification method based on MIMO recurrent neural network |
CN112036181A (en) * | 2019-05-14 | 2020-12-04 | 上海晶赞融宣科技有限公司 | Entity relationship identification method and device and computer readable storage medium |
CN110209836A (en) * | 2019-05-17 | 2019-09-06 | 北京邮电大学 | Remote supervision relation extraction method and device |
CN110209836B (en) * | 2019-05-17 | 2022-04-26 | 北京邮电大学 | Remote supervision relation extraction method and device |
CN111950279A (en) * | 2019-05-17 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Entity relationship processing method, device, equipment and computer readable storage medium |
CN110188201A (en) * | 2019-05-27 | 2019-08-30 | 上海上湖信息技术有限公司 | Information matching method and device |
CN110188202A (en) * | 2019-06-06 | 2019-08-30 | 北京百度网讯科技有限公司 | Training method, device and terminal for semantic relation recognition model |
CN110427624B (en) * | 2019-07-30 | 2023-04-25 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
CN110427624A (en) * | 2019-07-30 | 2019-11-08 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
CN111476035B (en) * | 2020-05-06 | 2023-09-05 | 中国人民解放军国防科技大学 | Chinese open relation prediction method, device, computer equipment and storage medium |
CN111476035A (en) * | 2020-05-06 | 2020-07-31 | 中国人民解放军国防科技大学 | Chinese open relation prediction method and device, computer equipment and storage medium |
CN111581387B (en) * | 2020-05-09 | 2022-10-11 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111581387A (en) * | 2020-05-09 | 2020-08-25 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111680127A (en) * | 2020-06-11 | 2020-09-18 | 暨南大学 | Annual report-oriented company name and relationship extraction method |
CN111784488B (en) * | 2020-06-28 | 2023-08-01 | 中国工商银行股份有限公司 | Enterprise fund risk prediction method and device |
CN111784488A (en) * | 2020-06-28 | 2020-10-16 | 中国工商银行股份有限公司 | Enterprise capital risk prediction method and device |
CN112215288A (en) * | 2020-10-13 | 2021-01-12 | 中国光大银行股份有限公司 | Target enterprise category determination method and device, storage medium and electronic device |
CN112215288B (en) * | 2020-10-13 | 2024-04-30 | 中国光大银行股份有限公司 | Method and device for determining category of target enterprise, storage medium and electronic device |
CN112418320A (en) * | 2020-11-24 | 2021-02-26 | 杭州未名信科科技有限公司 | Enterprise association relation identification method and device and storage medium |
CN112418320B (en) * | 2020-11-24 | 2024-01-19 | 杭州未名信科科技有限公司 | Enterprise association relation identification method, device and storage medium |
CN113486630A (en) * | 2021-09-07 | 2021-10-08 | 浙江大学 | Supply chain data vectorization and visualization processing method and device |
CN113806538A (en) * | 2021-09-17 | 2021-12-17 | 平安银行股份有限公司 | Label extraction model training method, device, equipment and storage medium |
CN113806538B (en) * | 2021-09-17 | 2023-08-22 | 平安银行股份有限公司 | Label extraction model training method, device, equipment and storage medium |
CN116562303A (en) * | 2023-07-04 | 2023-08-08 | 之江实验室 | Coreference resolution method and device referencing external knowledge |
CN116562303B (en) * | 2023-07-04 | 2023-11-21 | 之江实验室 | Coreference resolution method and device referencing external knowledge |
Also Published As
Publication number | Publication date |
---|---|
CN107943847B (en) | 2019-05-17 |
WO2019085328A1 (en) | 2019-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107943847B (en) | Enterprise relationship extraction method, device and storage medium | |
CN107330011B (en) | Named entity recognition method and device with multi-strategy fusion | |
CN108647205B (en) | Fine-grained emotion analysis model construction method and device and readable storage medium | |
CN110489555A (en) | Language model pre-training method combining word-class information | |
CN108563703A (en) | Charge determination method and device, computer equipment, and storage medium | |
CN107729309A (en) | Chinese semantic analysis method and device based on deep learning | |
CN106202010A (en) | Method and apparatus for building legal-text syntax trees based on deep neural networks | |
CN110222178A (en) | Text sentiment classification method and device, electronic equipment, and readable storage medium | |
CN108647191B (en) | Sentiment dictionary construction method based on supervised sentiment text and word vector | |
CN113051356B (en) | Open relation extraction method and device, electronic equipment and storage medium | |
CN110222184A (en) | Text emotion information recognition method and related apparatus | |
CN110502626A (en) | Aspect-level sentiment analysis method based on convolutional neural networks | |
CN113378970B (en) | Sentence similarity detection method and device, electronic equipment and storage medium | |
CN110059924A (en) | Contract terms checking method, device, equipment and computer-readable storage medium | |
CN113360654B (en) | Text classification method, apparatus, electronic device and readable storage medium | |
CN115392237B (en) | Emotion analysis model training method, device, equipment and storage medium | |
CN115357719A (en) | Power audit text classification method and device based on improved BERT model | |
CN108920446A (en) | Engineering document processing method | |
CN114528398A (en) | Emotion prediction method and system based on interactive double-graph convolutional network | |
CN113204967A (en) | Resume named entity identification method and system | |
CN103678318A (en) | Multi-word unit extraction method and equipment and artificial neural network training method and equipment | |
CN107943788A (en) | Enterprise abbreviation generation method, device and storage medium | |
CN116681082A (en) | Discrete text semantic segmentation method, device, equipment and storage medium | |
CN117290515A (en) | Training method of text annotation model, method and device for generating text graph | |
CN113688232B (en) | Method and device for classifying bid-inviting text, storage medium and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||