CN107943847B - Business connection extracting method, device and storage medium - Google Patents
- Publication number
- CN107943847B CN107943847B CN201711061205.0A CN201711061205A CN107943847B CN 107943847 B CN107943847 B CN 107943847B CN 201711061205 A CN201711061205 A CN 201711061205A CN 107943847 B CN107943847 B CN 107943847B
- Authority
- CN
- China
- Prior art keywords
- vector
- sentence
- trained
- sample
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Abstract
The invention discloses an enterprise relation extraction method, device, and storage medium. The method comprises: extracting from a knowledge base sentences containing enterprise entities with known relations, as training sample sentences, to build a sample database; extracting from the sample database all training sample sentences containing a given enterprise entity pair and segmenting them into words, mapping each word to a word vector x_i and each training sample sentence to a sentence vector S_i; calculating with an LSTM the first hidden-state vector h_i and the second hidden-state vector h_i' of each word vector x_i, concatenating them into a combined hidden-state vector, and then obtaining the feature vector T_i; substituting the feature vectors T_i into the average-vector expression to calculate the average vector S; substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence; extracting a sentence containing the two enterprise entities, obtaining its feature vector T_i through a bi-LSTM, and inputting it into the trained RNN model to predict the relation between the two enterprises. Labor cost is reduced, and the relation between the two enterprise entities is predicted more accurately.
Description
Technical field
The present invention relates to the technical field of data information processing, and more particularly, to an enterprise relation extraction method, a device, and a computer-readable storage medium.
Background technique
Identifying associations between different enterprises in news, such as investment and financing, supply-chain, and cooperation relations, is of great significance for enterprise risk early warning. However, common entity relation extraction methods require manual annotation of large amounts of training data, and corpus annotation work is generally very time-consuming and laborious.
Summary of the invention
In view of the foregoing, the present invention provides an enterprise relation extraction method, device, and computer-readable storage medium. The neural-network-based relation extraction model can be extended to distantly supervised data, effectively reducing the model's dependence on manually annotated data; and, compared with semi-supervised or unsupervised approaches, this supervised enterprise relation extraction method achieves better precision and recall.
To achieve the above object, the present invention provides an enterprise relation extraction method, the method comprising:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities with known relations, as training sample sentences, to build a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each word after segmentation to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as input to the first layer of the recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, calculating with a long short-term memory (LSTM) module, from left to right, the first hidden-state vector h_i of the current word vector x_i, and, from right to left, the second hidden-state vector h_i'; concatenating the two hidden-state vectors to obtain the combined hidden-state vector of each word in the training sample sentence, and then obtaining the feature vector T_i of each training sample sentence from the combined hidden-state vectors of all the words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair from the feature vectors T_i of the training sample sentences using the average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained recurrent neural network model;
Prediction step: extracting from the current text a sentence containing the two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
Preferably, the word segmentation step includes:
representing each word after segmentation as a one-hot vector to obtain its initial word vector, and marking each training sample sentence with a sentence ID; mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector together with the initial word vectors of the left and right neighbors of a given word in the training sample sentence into the continuous bag-of-words model, predicting the word vector x_i of that word, and updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the last updated sentence vector as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step includes:
from left to right, calculating the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}; and, from right to left, calculating the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence (the value to be solved for), T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
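As a concrete illustration, the average-vector expression above can be computed directly. This is a minimal sketch with made-up two-dimensional feature vectors and fixed weights; in the patent the weights a_i are learned, not given.

```python
def weighted_average(weights, features):
    """Compute S = sum(a_i * T_i) / n over the n training sample sentences."""
    n = len(features)
    dim = len(features[0])
    S = [0.0] * dim
    for a, T in zip(weights, features):
        for d in range(dim):
            S[d] += a * T[d] / n
    return S

# Three hypothetical sentence feature vectors T_i with illustrative weights a_i.
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [0.5, 0.3, 0.2]
S = weighted_average(a, T)   # -> [0.7/3, 0.5/3]
```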
Preferably, the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), for j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the score of the j-th enterprise relation type for the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types.
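The softmax function above can be computed in plain form as follows. This is a generic sketch; the mapping from the average vector S to the scores z is not spelled out in the text, and the three relation-type labels in the comment are illustrative.

```python
import math

def softmax(z):
    """Softmax classification: sigma(z)_j = exp(z_j) / sum_k exp(z_k)."""
    m = max(z)                            # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Scores z for K = 3 hypothetical relation types
# (investment/financing, supply chain, cooperation).
probs = softmax([1.0, 2.0, 0.5])
# the probabilities sum to 1; the largest score gets the largest probability
```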
In addition, the present invention also provides an electronic device, comprising a memory, a processor, and an enterprise relation extraction program stored on the memory and runnable on the processor; when executed by the processor, the enterprise relation extraction program implements the following steps:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities with known relations, as training sample sentences, to build a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each word after segmentation to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as input to the first layer of the recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, calculating with a long short-term memory (LSTM) module, from left to right, the first hidden-state vector h_i of the current word vector x_i, and, from right to left, the second hidden-state vector h_i'; concatenating the two hidden-state vectors to obtain the combined hidden-state vector of each word in the training sample sentence, and then obtaining the feature vector T_i of each training sample sentence from the combined hidden-state vectors of all the words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair from the feature vectors T_i of the training sample sentences using the average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained recurrent neural network model;
Prediction step: extracting from the current text a sentence containing the two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
Preferably, the concatenation step includes:
with the long short-term memory module, from left to right, calculating the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}; and, from right to left, calculating the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence (the value to be solved for), T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
Preferably, the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), for j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the score of the j-th enterprise relation type for the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium containing an enterprise relation extraction program; when executed by a processor, the enterprise relation extraction program can implement any of the steps of the enterprise relation extraction method described above.
With the enterprise relation extraction method, electronic device, and computer-readable storage medium proposed by the present invention, sentences in unstructured text that contain enterprise entity pairs with known relations in the knowledge base are extracted as training sample sentences to build a sample database. All training sample sentences containing a given enterprise entity pair are then extracted from the sample database and segmented into words, yielding the sentence vector S_i of each training sample sentence, and the feature vector T_i of each training sample sentence is calculated with the long short-term memory module. The average vector S is then obtained from the feature vectors T_i of the training sample sentences and substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair, yielding a trained recurrent neural network model. Finally, a sentence containing the two enterprise entities is extracted from the current text, its feature vector T is obtained through the bidirectional long short-term memory module, and this feature vector T is input into the trained recurrent neural network model to predict the relation between the two enterprise entities. This improves the ability to recognize relations between different enterprises in news and reduces dependence on manual annotation of training data.
Brief description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 2 is a module diagram of a preferred embodiment of the enterprise relation extraction program in Fig. 1;
Fig. 3 is a flow chart of a preferred embodiment of the enterprise relation extraction method of the present invention;
Fig. 4 is a frame diagram of the prediction module of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, it is a schematic diagram of a preferred embodiment of the electronic device 1 of the present invention.
In the present embodiment, the electronic device 1 may be a server, a smart phone, a tablet computer, a personal computer, a portable computer, or another electronic device with computing capability.
The electronic device 1 includes a memory 11, a processor 12, a knowledge base 13, a network interface 14, and a communication bus 15. The knowledge base 13 is stored on the memory 11, and sentences containing enterprise entity pairs are extracted from the knowledge base 13 as training sample sentences to build the sample database.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 15 is used to realize connection and communication between these components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, or card-type memory. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage unit of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the electronic device 1.
In the present embodiment, the memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the enterprise relation extraction program 10, the knowledge base 13, and the sample database, but also to temporarily store data that has been or will be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 11, for example to execute the computer program code of the enterprise relation extraction program 10 and the training of the various models.
Preferably, the electronic device 1 may also include a display, which may be called a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-control liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display is used to show the information processed in the electronic device 1 and to show a visual working interface, for example to display the model training results and the optimal values of the weights a_i.
Preferably, the electronic device 1 may also include a user interface, which may include an input unit such as a keyboard and a voice output device such as a loudspeaker or earphones; optionally, the user interface may also include a standard wired interface and a wireless interface.
In the device embodiment shown in Fig. 1, the memory 11, as a kind of computer storage medium, stores the program code of the enterprise relation extraction program 10; when the processor 12 executes the program code of the enterprise relation extraction program 10, the following steps are realized:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities with known relations, as training sample sentences, to build a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each word after segmentation to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as input to the first layer of the recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, calculating with a long short-term memory (LSTM) module, from left to right, the first hidden-state vector h_i of the current word vector x_i, and, from right to left, the second hidden-state vector h_i'; concatenating the two hidden-state vectors to obtain the combined hidden-state vector of each word in the training sample sentence, and then obtaining the feature vector T_i of each training sample sentence from the combined hidden-state vectors of all the words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair from the feature vectors T_i of the training sample sentences using the average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained recurrent neural network model;
Prediction step: extracting from the current text a sentence containing the two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
In the present embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then an unstructured sentence containing the two enterprise entities can represent this relation. Therefore, when we need to identify the association between two given enterprise entities in news, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base, and these sentences are used as training sample sentences to build a sample database. The knowledge base is built by collecting unstructured sentences containing any two enterprise entities from historical news data. The relations between enterprise entity pairs include investment and financing, supply-chain, and cooperation relations. For example, the sentence "Foxconn is a supplier of Mobike" contains the enterprise entity pair "Foxconn" and "Mobike", and the relation "supplier" between the enterprise entities belongs to the supply-chain relation type.
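The distant-supervision sampling described above can be sketched as follows. The entity names, corpus sentences, and relation label are illustrative, and plain substring matching stands in for whatever entity recognition the real system would use.

```python
# Knowledge base: entity pairs with a known relation (illustrative data).
knowledge_base = {("Foxconn", "Mobike"): "supply chain"}

corpus = [
    "Foxconn is a supplier of Mobike.",
    "Mobike announced a new city launch.",
    "Foxconn reported quarterly earnings.",
]

def build_sample_database(kb, sentences):
    """Collect, for each related entity pair, every sentence that mentions
    both entities; these become the training sample sentences."""
    samples = {}
    for (e1, e2), relation in kb.items():
        hits = [s for s in sentences if e1 in s and e2 in s]
        samples[(e1, e2)] = {"relation": relation, "sentences": hits}
    return samples

db = build_sample_database(knowledge_base, corpus)
# only the first corpus sentence mentions both entities
```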
All training sample sentences containing a given enterprise entity pair are extracted from the sample database; each training sample sentence contains the names of the pair of enterprise entities and the relation type of the enterprise entity pair, and each training sample sentence is segmented using a word segmentation tool such as the Stanford Chinese word segmenter or jieba. Each word after segmentation is represented as a one-hot vector to obtain its initial word vector. In the one-hot representation, each word is represented as a very long vector whose dimension is the number of words in the vocabulary; only one dimension has the value 1 and the rest are 0, and that dimension identifies the current word. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, and each training sample sentence contains the two enterprise entity names, Foxconn and Mobike, and the relation type of the enterprise entity pair (supplier). Segmenting "Foxconn is a supplier of Mobike" gives the result "Foxconn | is | Mobike | 's | supplier". The initial word vector of "Foxconn" might then be [0100000000] and the initial word vector of "is" [0010000000]. Each training sample sentence is then marked with a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
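The one-hot encoding in the example can be sketched directly; the ten-word vocabulary below is made up, chosen only so that "Foxconn" lands at index 1 as in the example vector above.

```python
def one_hot(vocab, word):
    """Represent `word` as a vocabulary-sized vector with a single 1
    at the word's position and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Toy vocabulary (illustrative; positions match the text's example in spirit).
vocab = ["<pad>", "Foxconn", "is", "Mobike", "of", "supplier",
         "the", "a", "bicycle", "company"]
v = one_hot(vocab, "Foxconn")   # 1 at index 1, 0 elsewhere
```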
The initial sentence vector, together with the initial word vectors of the left and right neighbors of a given word in the training sample sentence, is input into the continuous bag-of-words (CBOW) model, which predicts the word vector x_i of that word. The initial sentence vector is then updated and replaced with the first updated sentence vector; the first updated sentence vector, together with the initial word vectors of the left and right neighbors of the next word in the training sample sentence, is input into the CBOW model, which predicts the word vector x_{i+1} of that word, and the first updated sentence vector is updated and replaced with the second updated sentence vector. Training is repeated in this way, the sentence vector of the training sample sentence being updated at each step, until the word vector x_i of every word in the training sample sentence, i = (0, 1, 2, 3, ..., m), has been predicted; the sentence vector updated in the last round of training is taken as the sentence vector S_i of the training sample sentence, i = (0, 1, 2, 3, ..., n), and serves as the input to the first layer of the recurrent neural network (RNN) model. For example, the initial word vectors of "Foxconn" (the left neighbor of "is") and "Mobike" (its right neighbor), together with the initial sentence vector, are input into the CBOW model, which predicts the word vector x_2 of "is", and the initial sentence vector is updated once to give the first updated sentence vector. Then the initial (or current) word vector of "is" (the left neighbor of "Mobike"), the initial word vector of "'s" (its right neighbor), and the first updated sentence vector are input into the CBOW model, which predicts the word vector x_3 of "Mobike", and the first updated sentence vector is updated to give the second updated sentence vector; and so on, until the word vectors x_i of all the words have been predicted and the sentence vector S_i of the training sample sentence has been obtained by updating. Throughout this process, the sentence ID of each news sentence remains constant.
In the second layer of the RNN model, a long short-term memory (LSTM) module then calculates, from left to right, the first hidden-state vector h_i of the current word vector x_i from the hidden-state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-state vector h_i' of the current word vector x_i from the hidden-state vector h_{i+1} of the next word vector x_{i+1}. The two hidden-state vectors are concatenated with a Concatenate function to obtain the combined hidden-state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence, i = (0, 1, 2, 3, ..., n), is obtained from the combined hidden-state vectors of all the words in the training sample sentence. For example, in the sentence "Foxconn is a supplier of Mobike", the LSTM calculates, from left to right, the first hidden-state vector h_2 of the word vector x_2 of "is" from the hidden-state vector h_1 of the word vector x_1 of "Foxconn", and, from right to left, the second hidden-state vector h_2' of the word vector x_2 of "is" from the hidden-state vector h_3 of the word vector x_3 of "Mobike"; the two hidden-state vectors (h_2 and h_2') are concatenated with the Concatenate function to obtain the combined hidden-state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence is obtained from the combined hidden-state vectors of all the words in the training sample sentence.
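The bidirectional wiring and concatenation can be sketched as follows. For brevity a plain tanh RNN cell stands in for the patent's LSTM cell, and max-pooling over words is one plausible way to reduce the per-word combined states to a sentence feature vector; the patent does not specify the pooling, so both choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 4, 3                      # word-vector and hidden-state dimensions

# Simplified recurrent cell parameters (a tanh RNN standing in for LSTM).
Wx = rng.normal(0, 0.3, (H, D))
Wh = rng.normal(0, 0.3, (H, H))

def rnn_pass(xs):
    """Run the recurrence h_i = tanh(Wx x_i + Wh h_{i-1}) over word vectors."""
    h = np.zeros(H)
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return states

words = [rng.normal(size=D) for _ in range(5)]     # word vectors x_1 .. x_5
fwd = rnn_pass(words)                # first hidden states h_i (left to right)
bwd = rnn_pass(words[::-1])[::-1]    # second hidden states h_i' (right to left)

# Concatenate h_i and h_i' into the combined hidden state of each word,
# then max-pool over words to get a sentence feature vector T.
combined = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
T = np.max(np.stack(combined), axis=0)
```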
In the third layer of the RNN model, the average vector S of the enterprise entity pair is expressed from the feature vectors T_i of the training sample sentences using the average-vector formula S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence (the value to be solved for), T_i represents the feature vector of each training sample sentence of the enterprise entity pair, and n represents the number of training sample sentences.
In the last layer of the RNN model, the average vector S is substituted into the softmax classification function:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), for j = 1, …, K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the score of the j-th enterprise relation type for the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types. The weights a_i of the training sample sentences are determined according to the relation type of the enterprise entity pair in the training sample sentences. Through continuous learning, the weights a_i of the training sample sentences are continuously optimized, so that effective sentences obtain higher weights while noisy sentences obtain smaller weights.
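The weighting behaviour described above, with informative sentences up-weighted and noisy sentences down-weighted, is commonly realized as attention over the sentence feature vectors. The following sketch shows one such scoring; the relation query vector stands in for a learned relation representation and is an assumption, not the patent's exact parameterization (the patent learns the a_i jointly through the softmax classifier).

```python
import numpy as np

def sentence_weights(features, relation_query):
    """Score each sentence feature vector against a relation representation
    and normalize with softmax, so sentences aligned with the relation
    get larger weights a_i and noisy ones smaller weights."""
    scores = np.array([f @ relation_query for f in features])
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Two sentence features: the first aligned with the relation, the second noise.
feats = [np.array([1.0, 0.0]), np.array([-1.0, 0.2])]
r = np.array([1.0, 0.0])             # hypothetical relation vector
a = sentence_weights(feats, r)       # a[0] > a[1]
```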
In the present embodiment, once the RNN model is determined, relation prediction can be performed on any unstructured sentence containing an enterprise entity pair, and the model's prediction is not tied to specific enterprise names.
Sentences containing the two enterprise entities whose relation is to be predicted are extracted from the current text, and these sentences are segmented to obtain sentence vectors. For example, S_1, S_2, S_3, S_4 denote the vector set of the sentences corresponding to the two enterprise entities. The feature vectors T_1, T_2, T_3, T_4 of the sentences are extracted through the bidirectional long short-term memory (bi-LSTM) module, and the feature vector of each sentence is input into the trained RNN model to obtain the relation prediction result between the two enterprise entities.
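The prediction step can be sketched end to end as follows, under the assumption of a linear classifier over the averaged feature vector (the text does not spell out the classifier's form) and a hypothetical 3-type label set; the feature vectors stand in for bi-LSTM outputs.

```python
import numpy as np

RELATIONS = ["investment/financing", "supply chain", "cooperation"]

def predict_relation(features, W, weights=None):
    """Average the sentence feature vectors T_i (uniform weights at
    prediction time unless weights are given, normalized to sum to 1,
    absorbing the 1/n of the average-vector formula), apply the classifier
    matrix W and softmax, and return the most probable relation type."""
    feats = np.stack(features)
    n = len(feats)
    a = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    S = (a[:, None] * feats).sum(axis=0)
    z = W @ S
    p = np.exp(z - z.max())
    p /= p.sum()
    return RELATIONS[int(p.argmax())], p

# Feature vectors as they would come out of the bi-LSTM encoder (stubbed).
feats = [np.array([0.9, 0.1]), np.array([0.8, 0.3])]
W = np.array([[0.0, 1.0],      # hypothetical trained classifier rows,
              [1.0, 0.0],      # one per relation type
              [0.2, 0.2]])
label, probs = predict_relation(feats, W)
```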
With the enterprise relation extraction method proposed in the above embodiment, sentences in unstructured text that contain enterprise entity pairs with known relations in the knowledge base are extracted as training sample sentences to build a sample database. All training sample sentences containing a given enterprise entity pair are extracted from the sample database and segmented, yielding the sentence vector S_i of each training sample sentence, and the feature vector T_i of each training sample sentence is calculated with the LSTM. The average vector S of the training sample sentences is calculated with the average-vector formula and substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair. Finally, a sentence containing the two enterprise entities is extracted from the current text, its feature vector T_i is obtained through the bi-LSTM, and this feature vector T_i is input into the trained RNN model to predict the relation between the two enterprise entities. This not only removes the cumbersome step of manually annotating training data, but also achieves better precision and recall than other supervision modes.
As shown in Fig. 2, it is a module diagram of a preferred embodiment of the enterprise relation extraction program 10 in Fig. 1. A module, as referred to in the present invention, is a series of computer program instruction segments that accomplish a specific function.
In the present embodiment, the enterprise relation extraction program 10 includes: an establishment module 110, a word segmentation module 120, a concatenation module 130, a calculation module 140, a weight determination module 150, and a prediction module 160. The functions or operational steps realized by the modules 110-160 are similar to those described above and will not be detailed here; illustratively:
The establishment module 110 is used to extract from the knowledge base sentences containing enterprise entities with known relations, as training sample sentences, to build the sample database;
the word segmentation module 120 is used to extract from the sample database all training sample sentences containing a given enterprise entity pair, segment each training sample sentence with a preset word segmentation tool, map each word after segmentation to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as input to the first layer of the RNN model;
the concatenation module 130 is used, in the second layer of the RNN model, to calculate with the LSTM, from left to right, the first hidden-state vector h_i of the current word vector x_i, and, from right to left, the second hidden-state vector h_i'; to obtain the combined hidden-state vector of each word in the training sample sentence by concatenating the two hidden-state vectors; and to obtain the feature vector T_i of each training sample sentence from the combined hidden-state vectors of all the words in the training sample sentence;
the calculation module 140 is used, in the third layer of the RNN model, to express the average vector S of the enterprise entity pair from the feature vectors T_i of the training sample sentences using the average-vector expression;
the weight determination module 150 is used, in the last layer of the RNN model, to substitute the average vector S and the relation type of the enterprise entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence of the enterprise entity pair, obtaining the trained RNN model;
the prediction module 160 is used to extract from the current text a sentence containing the two enterprise entities, obtain the feature vector T_i of the sentence through the bi-LSTM, and input this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
As shown in Fig. 3, it is the flowchart of a preferred embodiment of the business connection extracting method of the present invention.
In this embodiment, when the processor 12 executes the computer program of the business connection extraction program 10 stored in the memory 11, the following steps of the business connection extracting method are realized:
Step S10: extract from the knowledge base sentences of business entity pairs that have a relationship as training sample sentences and establish a sample database;
Step S20: extract from the sample database all training sample sentences containing a business entity pair, segment each training sample sentence with a preset word segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as input of the first layer of the RNN model;
Step S30: in the second layer of the RNN model, compute from left to right with an LSTM the first hidden-layer state vector h_i of the current word vector x_i, and from right to left the second hidden-layer state vector h_i'; splice the two hidden-layer state vectors to obtain the composite hidden-layer state vector of each word in the training sample sentence, and from the composite hidden-layer state vectors of all words in the training sample sentence obtain the feature vector T_i of each training sample sentence;
Step S40: in the third layer of the RNN model, express the average vector S of the business entity pair with the average-vector expression, according to the feature vector T_i of each training sample sentence;
Step S50: in the last layer of the RNN model, substitute the average vector S and the relationship type of the business entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence of the business entity pair, obtaining a trained RNN model;
Step S60: extract sentences containing the two business entities from the current text, obtain the feature vector T_i of each sentence through bi-LSTM, and input this feature vector T_i into the above trained RNN model, which predicts the relationship between the two business entities.
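Step S10 can be sketched in miniature. The record layout and helper name below are hypothetical, chosen only to illustrate how knowledge-base sentences whose entity pair has a known relationship are filtered into a sample database:

```python
def build_sample_db(knowledge_base):
    """Step S10 sketch: keep only sentences whose business entity pair
    has a known relationship in the knowledge base (hypothetical layout)."""
    return [rec for rec in knowledge_base if rec.get("relation")]

kb = [
    {"pair": ("Foxconn", "Mobike"),
     "text": "Foxconn is Mobike's supplier",
     "relation": "supply chain"},
    {"pair": ("A", "B"), "text": "A met B", "relation": None},
]
db = build_sample_db(kb)
assert len(db) == 1 and db[0]["relation"] == "supply chain"
```

Sentences lacking a knowledge-base relationship are excluded; this filtering is the distant-supervision assumption the embodiment relies on.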
In this embodiment, it is assumed that if two business entities have a certain relationship in the knowledge base, then an unstructured sentence containing the two business entities can represent this relationship. When we need to identify the association between two particular business entities in the news, all unstructured sentences containing the two business entities are extracted from the knowledge base, and these sentences are used as training sample sentences to establish a sample database. Here, the knowledge base is established by collecting, from historical news data, the unstructured sentences containing any two business entities. The relationships between the business entity pairs include trade financing, supply chain, cooperation, and the like. For example, sentences containing the business entity pair "Foxconn" and "Mobike" are extracted from unstructured text as training sample sentences. The sentence "Foxconn is Mobike's supplier" contains the business entity pair "Foxconn" and "Mobike", and the relationship "supplier" between the business entities belongs to the supply-chain type.
All training sample sentences containing a business entity pair are extracted from the sample database; each training sample sentence contains the names of the pair of business entities and the relationship type of the business entity pair, and each training sample sentence is segmented with a word segmentation tool. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, and each training sample sentence contains the names of the two business entities Foxconn and Mobike and the relationship type (supplier) of the business entity pair. Word segmentation tools such as the Stanford Chinese word segmenter or jieba are used to segment each training sample sentence. For example, segmenting "Foxconn is Mobike's supplier" gives the result "Foxconn | is | Mobike | 's | supplier". Each segmented word is represented in the form of a one-hot vector to obtain an initial word vector. The one-hot method represents each word as a very long vector whose dimension equals the number of words in the vocabulary; only one dimension has the value 1 and the rest are 0, and that dimension identifies the current word. For example, the initial word vector of "Foxconn" is [0 1 0 0 0 0 0 0 0 0] and the initial word vector of "is" is [0 0 1 0 0 0 0 0 0 0]. Then each training sample sentence is marked with a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
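The one-hot scheme above can be sketched as follows. The tiny vocabulary and helper name are illustrative only; a real vocabulary would be far larger:

```python
def build_one_hot(vocab):
    """Map each word to a one-hot vector of length len(vocab):
    a single 1 at the word's vocabulary index, 0 elsewhere."""
    dim = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    return {w: [1 if j == i else 0 for j in range(dim)]
            for w, i in index.items()}

vocab = ["<pad>", "Foxconn", "is", "Mobike", "'s", "supplier"]
one_hot = build_one_hot(vocab)
assert one_hot["Foxconn"].index(1) == 1   # the 1 sits at the word's index
assert sum(one_hot["is"]) == 1            # exactly one dimension is 1
```

Each initial word vector therefore carries no similarity information by itself; the continuous bag-of-words training described next densifies these vectors.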
The initial sentence vector and the initial word vectors of the left and right adjacent words of some word in the training sample sentence are input into the continuous bag-of-words model, which predicts the word vector x_i of that word. The initial sentence vector is then replaced with a first updated sentence vector; the first updated sentence vector and the initial word vectors of the left and right adjacent words of the next word in the training sample sentence are input into the continuous bag-of-words model, which predicts the word vector x_{i+1} of that word, and the first updated sentence vector is replaced with a second updated sentence vector. Training iterates in this way, updating the sentence vector of the training sample sentence on every round, until the word vector x_i of every word in the training sample sentence, i = (0, 1, 2, 3, ..., m), has been predicted; the sentence vector updated by the last round of training is taken as the sentence vector S_i of the training sample sentence, i = (0, 1, 2, 3, ..., n). For example, in the sentence "Foxconn is Mobike's supplier", the initial word vectors of the left neighbour "Foxconn" and the right neighbour "Mobike" of the word "is", together with the initial sentence vector, are input into the continuous bag-of-words model, which predicts the word vector x_2 of "is"; the initial sentence vector is updated once to obtain the first updated sentence vector. Then the initial (or current) word vector of the left neighbour "is" and the initial word vector of the right neighbour "'s" of "Mobike", together with the first updated sentence vector, are input into the continuous bag-of-words model, which predicts the word vector x_3 of "Mobike", and the first updated sentence vector is updated to obtain the second updated sentence vector. The iteration continues in this way until the word vectors x_i of all the words above have been predicted and the sentence vector S_i of the training sample sentence has been obtained through the updates. Throughout this process, the sentence ID of each news sentence remains unchanged.
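A minimal, gradient-free stand-in for one round of the update described above: the centre word's vector is predicted from its two neighbours plus the current sentence vector, and the sentence vector is then nudged toward the prediction. The averaging rule and learning rate are illustrative assumptions, not the trained model's actual equations:

```python
def cbow_step(sent_vec, left_vec, right_vec, lr=0.5):
    """One toy CBOW-style round: predict the centre word's vector as
    the mean of its neighbours and the sentence vector, then move the
    sentence vector part of the way toward that prediction (a stand-in
    for the real gradient update)."""
    word_vec = [(l + r + s) / 3.0
                for l, r, s in zip(left_vec, right_vec, sent_vec)]
    new_sent = [s + lr * (w - s) for s, w in zip(sent_vec, word_vec)]
    return word_vec, new_sent

# one round for the word "is", with hypothetical 2-d vectors
w, s = cbow_step([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])
assert w == [1.0, 1.0]    # predicted word vector
assert s == [0.5, 0.5]    # sentence vector after the update
```

Iterating this step word by word, the sentence vector accumulates context from the whole sentence, mirroring how the final updated sentence vector becomes S_i.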
In the second layer of the RNN model, the LSTM then computes from left to right the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and from right to left the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}. The two hidden-layer state vectors are spliced by a Concatenate function to obtain the composite hidden-layer state vector of each word in the training sample sentence, and from the composite hidden-layer state vectors of all words in the training sample sentence the feature vector T_i of each training sample sentence, i = (0, 1, 2, 3, ..., n), is obtained. For example, in the sentence "Foxconn is Mobike's supplier", the LSTM computes from left to right the first hidden-layer state vector h_2 of the word vector x_2 of "is" from the hidden-layer state vector h_1 of the word vector x_1 of "Foxconn", and from right to left the second hidden-layer state vector h_2' of the word vector x_2 of "is" from the hidden-layer state vector h_3 of the word vector x_3 of "Mobike". The two hidden-layer state vectors (h_2 and h_2') are spliced by the Concatenate function to obtain the composite hidden-layer state vector of each word in the training sample sentence, and from the composite hidden-layer state vectors of all words the feature vector T_i of each training sample sentence is obtained.
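The splicing of forward and backward hidden states can be sketched with a toy elementwise recurrence standing in for the LSTM cell (a real LSTM has gates and a cell state; the fixed weights here are illustrative assumptions):

```python
import math

def rnn_cell(x, h_prev, w_x=0.5, w_h=0.3):
    """Toy elementwise recurrence standing in for an LSTM cell."""
    return [math.tanh(w_x * xi + w_h * hp) for xi, hp in zip(x, h_prev)]

def bi_states(words):
    """Per word, return [h_i ; h_i'] - the forward and backward hidden
    states concatenated, as in the second layer described above."""
    dim = len(words[0])
    fwd, h = [], [0.0] * dim
    for x in words:                 # left-to-right pass: h_i
        h = rnn_cell(x, h)
        fwd.append(h)
    bwd, h = [], [0.0] * dim
    for x in reversed(words):       # right-to-left pass: h_i'
        h = rnn_cell(x, h)
        bwd.append(h)
    bwd.reverse()                   # re-align with the word order
    return [f + b for f, b in zip(fwd, bwd)]

states = bi_states([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
assert len(states) == 3 and all(len(s) == 4 for s in states)
```

Note the composite state of each word has twice the hidden dimension, since concatenation (not addition) is used, matching the Concatenate function in the text.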
In the third layer of the RNN model, according to the feature vector T_i of each training sample sentence, the average vector S of the business entity pair is expressed with the average-vector formula S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence, T_i represents the feature vector of each training sample sentence of the business entity pair, and n represents the number of training sample sentences. Suppose 50,000 training sample sentences of the entity pair "Foxconn" and "Mobike" are extracted from the knowledge base; then the feature vector T_i of every training sample sentence, i = (0, 1, 2, 3, ..., n), is substituted into the average-vector formula S = sum(a_i * T_i) / n to calculate the average vector S of the entity pair "Foxconn" and "Mobike", where n equals 50,000.
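The average-vector formula S = sum(a_i * T_i) / n is straightforward to express directly:

```python
def average_vector(weights, features):
    """S = sum(a_i * T_i) / n over the n training sample sentences."""
    n = len(features)
    dim = len(features[0])
    s = [0.0] * dim
    for a, t in zip(weights, features):
        for j in range(dim):
            s[j] += a * t[j]
    return [v / n for v in s]

# two hypothetical sentence feature vectors with unit weights
assert average_vector([1.0, 1.0], [[2.0, 4.0], [4.0, 2.0]]) == [3.0, 3.0]
```

With all weights a_i equal to 1 this reduces to the plain mean of the feature vectors; training moves the a_i away from uniformity so that informative sentences dominate the average.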
In the last layer of the RNN model, the average vector S is then substituted into the softmax classification function:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), j = 1, ..., K
where K represents the number of business relationship types, S represents the average vector of the business entity pair, j identifies the business relationship type of the business entity pair, and σ(z)_j represents the probability of the relationship type to be predicted among all business relationship types. The weight a_i of each training sample sentence is determined according to the relationship type of the business entity pair in the training sample sentence. Through continuous iterative learning, the weights a_i of the training sample sentences are continuously optimized, so that informative sentences obtain higher weights while noisy sentences obtain lower weights, yielding a reliable RNN model.
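The softmax classification function can be written directly; the max-shift below is a standard numerical-stability trick and is not part of the patent text:

```python
import math

def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k), shifted by max(z)
    so large scores cannot overflow exp()."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([2.0, 2.0])
assert abs(p[0] - 0.5) < 1e-12           # equal scores, equal probability
assert abs(sum(softmax([3.0, 1.0, 0.2])) - 1.0) < 1e-12  # sums to 1
```

The output is a probability distribution over the K relationship types, so the predicted relationship is simply the index with the largest σ(z)_j.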
In this embodiment, after the RNN model is determined, relationship prediction can be performed on any unstructured sentence containing a business entity pair, and the model's prediction is not tied to specific enterprise names.
Finally, as shown in Fig. 4, it is the frame diagram of the prediction module of the present invention. Sentences containing the two business entities whose relationship is to be predicted are extracted from the current text; for example, sentences containing "Ping An Group of China" and "Bank of China" are extracted from the news, and these sentences are segmented to obtain sentence vectors. For example, S1, S2, S3, S4 denote the set of sentence vectors corresponding to the two business entities. The feature vectors T1, T2, T3, T4 of the sentences are extracted by bi-LSTM; then, by computing the similarity between each T_i and the relation-type vector r, T_i is assigned a weight within the whole sentence set; finally, the weighted sum over the sentences is taken and passed through the softmax classifier to predict the relationship between "Ping An Group of China" and "Bank of China".
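The prediction-side weighting just described can be sketched as a simple dot-product attention over sentence feature vectors: each T_i is scored against a relation-type vector r, the scores are softmax-normalized into weights, and the weighted sum is what the classifier receives. The dot-product scoring function is an illustrative assumption:

```python
import math

def attention_pool(features, r):
    """Weight each sentence feature vector T_i by its softmax-normalized
    dot-product similarity with the relation-type vector r, then return
    the weights and the weighted sum."""
    scores = [sum(t_j * r_j for t_j, r_j in zip(t, r)) for t in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(r)
    pooled = [sum(a * t[j] for a, t in zip(alphas, features))
              for j in range(dim)]
    return alphas, pooled

# the first sentence aligns with r, so it receives the larger weight
alphas, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
assert alphas[0] > alphas[1]
```

The pooled vector is then a single representation of the entity pair, ready to be passed through the softmax classifier.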
The business connection extracting method proposed by the above embodiments extracts from unstructured text the sentences of business entity pairs that have a relationship in the knowledge base as training sample sentences and establishes a sample database. All training sample sentences containing a business entity pair are extracted from the sample database and segmented to obtain the sentence vector S_i of each training sample sentence, and the feature vector T_i of each training sample sentence is calculated with an LSTM. The average vector S of the business entity pair is then expressed with the average-vector formula, the average vector S is substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relationship type of the business entity pair, obtaining a trained RNN model. Finally, sentences containing the two business entities are extracted from the current text, their feature vectors T_i are obtained through bi-LSTM, and each feature vector T_i is input into the trained RNN model to predict the relationship between the two business entities. This improves the ability to recognize relationships between different enterprises in the news and the early warning of business risks, and removes the tedious step of manually annotating training data.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium. The computer-readable storage medium contains a business connection extraction program 10, and when the business connection extraction program 10 is executed by a processor, the following operations are realized:
Sample database establishment step: extract from the knowledge base sentences of business entity pairs that have a relationship as training sample sentences and establish a sample database;
Word segmentation step: extract from the sample database all training sample sentences containing a business entity pair, segment each training sample sentence with a preset word segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as input of the first layer of the RNN model;
Splicing step: in the second layer of the RNN model, compute from left to right with an LSTM the first hidden-layer state vector h_i of the current word vector x_i, and from right to left the second hidden-layer state vector h_i'; splice the two hidden-layer state vectors to obtain the composite hidden-layer state vector of each word in the training sample sentence, and from the composite hidden-layer state vectors of all words in the training sample sentence obtain the feature vector T_i of each training sample sentence;
Computing step: in the third layer of the RNN model, express the average vector S of the business entity pair with the average-vector expression, according to the feature vector T_i of each training sample sentence;
Weight determination step: in the last layer of the RNN model, substitute the average vector S and the relationship type of the business entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained RNN model;
Prediction step: extract sentences containing the two business entities from the current text, obtain the feature vector T_i of each sentence through bi-LSTM, and input this feature vector T_i into the above trained RNN model, which predicts the relationship between the two business entities.
Preferably, the word segmentation step includes:
representing each segmented word in the form of a one-hot vector to obtain an initial word vector, marking each training sample sentence with a sentence ID, and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the left and right adjacent words of some word in the training sample sentence into the continuous bag-of-words model, which predicts the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the sentence vector updated last as the sentence vector S_i of the training sample sentence.
Preferably, the splicing step includes:
computing from left to right the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence of the business entity pair, and n represents the number of training sample sentences.
Preferably, the expression of the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), j = 1, ..., K
where K represents the number of business relationship types, S represents the average vector of the business entity pair, j identifies the business relationship type of the business entity pair, and σ(z)_j represents the probability of the relationship type to be predicted among all business relationship types.
The specific embodiments of the computer-readable storage medium of the present invention are roughly the same as the specific embodiments of the above business connection extracting method and are not detailed here.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium as described above (such as ROM/RAM, magnetic disk, optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.
Claims (8)
1. A business connection extracting method, characterized in that the method includes:
a sample database establishment step: extracting from the knowledge base sentences of business entity pairs that have a relationship as training sample sentences and establishing a sample database;
a word segmentation step: extracting from the sample database all training sample sentences containing a business entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as input of the first layer of a recurrent neural network model;
a splicing step: in the second layer of the recurrent neural network model, computing from left to right with a long short-term memory module the first hidden-layer state vector h_i of the current word vector x_i, and from right to left the second hidden-layer state vector h_i'; splicing the two hidden-layer state vectors to obtain the composite hidden-layer state vector of each word in the training sample sentence, and obtaining from the composite hidden-layer state vectors of all words in the training sample sentence the feature vector T_i of each training sample sentence;
a computing step: in the third layer of the recurrent neural network model, according to the feature vector T_i of each training sample sentence of the business entity pair, expressing the average vector S of the business entity pair with the average-vector expression S = sum(a_i * T_i) / n, where a_i represents the weight of each training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences;
a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relationship type of the business entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained recurrent neural network model;
a prediction step: extracting sentences containing the two business entities from the current text, obtaining the feature vector T_i of each sentence through a bidirectional long short-term memory module, and inputting this feature vector T_i into the above trained recurrent neural network model, which predicts the relationship between the two business entities.
2. The business connection extracting method according to claim 1, characterized in that the word segmentation step includes:
representing each segmented word in the form of a one-hot vector to obtain an initial word vector, marking each training sample sentence with a sentence ID, and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the left and right adjacent words of some word in the training sample sentence into a continuous bag-of-words model, which predicts the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the sentence vector updated last as the sentence vector S_i of the training sample sentence.
3. The business connection extracting method according to claim 1, characterized in that the splicing step includes:
computing from left to right the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}.
4. The business connection extracting method according to claim 1, characterized in that the expression of the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), j = 1, ..., K
where K represents the number of business relationship types, S represents the average vector of the business entity pair, j identifies the business relationship type of the business entity pair, and σ(z)_j represents the probability of the relationship type to be predicted among all business relationship types.
5. An electronic device, characterized in that the device includes a memory and a processor; a business connection extraction program is stored on the memory, and when the business connection extraction program is executed by the processor, the following steps can be realized:
a sample database establishment step: extracting from the knowledge base sentences of business entity pairs that have a relationship as training sample sentences and establishing a sample database;
a word segmentation step: extracting from the sample database all training sample sentences containing a business entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as input of the first layer of a recurrent neural network model;
a splicing step: in the second layer of the recurrent neural network model, computing from left to right with a long short-term memory module the first hidden-layer state vector h_i of the current word vector x_i, and from right to left the second hidden-layer state vector h_i'; splicing the two hidden-layer state vectors to obtain the composite hidden-layer state vector of each word in the training sample sentence, and obtaining from the composite hidden-layer state vectors of all words in the training sample sentence the feature vector T_i of each training sample sentence;
a computing step: in the third layer of the recurrent neural network model, according to the feature vector T_i of each training sample sentence of the business entity pair, expressing the average vector S of the business entity pair with the average-vector expression S = sum(a_i * T_i) / n, where a_i represents the weight of each training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences;
a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relationship type of the business entity pair into the softmax classification function to calculate the weight a_i of each training sample sentence, obtaining a trained recurrent neural network model;
a prediction step: extracting sentences containing the two business entities from the current text, obtaining the feature vector T_i of each sentence through a bidirectional long short-term memory module, and inputting this feature vector T_i into the above trained recurrent neural network model, which predicts the relationship between the two business entities.
6. The electronic device according to claim 5, characterized in that the splicing step includes:
computing from left to right the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}.
7. The electronic device according to claim 5, characterized in that the expression of the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), j = 1, ..., K
where K represents the number of business relationship types, S represents the average vector of the business entity pair, j identifies the business relationship type of the business entity pair, and σ(z)_j represents the probability of the relationship type to be predicted among all business relationship types.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a business connection extraction program, and when the business connection extraction program is executed by a processor, the steps of the business connection extracting method according to any one of claims 1 to 5 can be realized.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061205.0A CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
PCT/CN2018/076119 WO2019085328A1 (en) | 2017-11-02 | 2018-02-10 | Enterprise relationship extraction method and device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061205.0A CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107943847A CN107943847A (en) | 2018-04-20 |
CN107943847B true CN107943847B (en) | 2019-05-17 |
Family
ID=61934111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711061205.0A Active CN107943847B (en) | 2017-11-02 | 2017-11-02 | Business connection extracting method, device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107943847B (en) |
WO (1) | WO2019085328A1 (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876044B (en) * | 2018-06-25 | 2021-02-26 | 中国人民大学 | Online content popularity prediction method based on knowledge-enhanced neural network |
CN108920587B (en) * | 2018-06-26 | 2021-09-24 | 清华大学 | Open domain visual question-answering method and device fusing external knowledge |
CN108985501B (en) * | 2018-06-29 | 2022-04-29 | 平安科技(深圳)有限公司 | Index feature extraction-based stock index prediction method, server and storage medium |
CN109243616A (en) * | 2018-06-29 | 2019-01-18 | 东华大学 | Mammary gland electronic health record joint Relation extraction and architectural system based on deep learning |
CN110737758B (en) * | 2018-07-03 | 2022-07-05 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating a model |
CN109063032B (en) * | 2018-07-16 | 2020-09-11 | 清华大学 | Noise reduction method for remote supervision and retrieval data |
CN109597851B (en) * | 2018-09-26 | 2023-03-21 | 创新先进技术有限公司 | Feature extraction method and device based on incidence relation |
CN109376250A (en) * | 2018-09-27 | 2019-02-22 | 中山大学 | Entity relationship based on intensified learning combines abstracting method |
CN109582956B (en) * | 2018-11-15 | 2022-11-11 | 中国人民解放军国防科技大学 | Text representation method and device applied to sentence embedding |
CN109710768B (en) * | 2019-01-10 | 2020-07-28 | 西安交通大学 | Tax payer industry two-level classification method based on MIMO recurrent neural network |
CN112036181A (en) * | 2019-05-14 | 2020-12-04 | 上海晶赞融宣科技有限公司 | Entity relationship identification method and device and computer readable storage medium |
CN110209836B (en) * | 2019-05-17 | 2022-04-26 | 北京邮电大学 | Remote supervision relation extraction method and device |
CN111950279B (en) * | 2019-05-17 | 2023-06-23 | 百度在线网络技术(北京)有限公司 | Entity relationship processing method, device, equipment and computer readable storage medium |
CN110188201A (en) * | 2019-05-27 | 2019-08-30 | 上海上湖信息技术有限公司 | A kind of information matching method and equipment |
CN110188202B (en) * | 2019-06-06 | 2021-07-20 | 北京百度网讯科技有限公司 | Training method and device of semantic relation recognition model and terminal |
CN110427624B (en) * | 2019-07-30 | 2023-04-25 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
CN110619053A (en) * | 2019-09-18 | 2019-12-27 | 北京百度网讯科技有限公司 | Training method of entity relation extraction model and method for extracting entity relation |
CN110879938A (en) * | 2019-11-14 | 2020-03-13 | 中国联合网络通信集团有限公司 | Text emotion classification method, device, equipment and storage medium |
CN111382843B (en) * | 2020-03-06 | 2023-10-20 | 浙江网商银行股份有限公司 | Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship |
CN111476035B (en) * | 2020-05-06 | 2023-09-05 | 中国人民解放军国防科技大学 | Chinese open relation prediction method, device, computer equipment and storage medium |
CN111581387B (en) * | 2020-05-09 | 2022-10-11 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111680127A (en) * | 2020-06-11 | 2020-09-18 | 暨南大学 | Annual report-oriented company name and relationship extraction method |
CN111784488B (en) * | 2020-06-28 | 2023-08-01 | 中国工商银行股份有限公司 | Enterprise fund risk prediction method and device |
CN112418320B (en) * | 2020-11-24 | 2024-01-19 | 杭州未名信科科技有限公司 | Enterprise association relation identification method, device and storage medium |
CN113486630B (en) * | 2021-09-07 | 2021-11-19 | 浙江大学 | Supply chain data vectorization and visualization processing method and device |
CN113806538B (en) * | 2021-09-17 | 2023-08-22 | 平安银行股份有限公司 | Label extraction model training method, device, equipment and storage medium |
CN116562303B (en) * | 2023-07-04 | 2023-11-21 | 之江实验室 | Reference resolution method and device for reference external knowledge |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106372058A (en) * | 2016-08-29 | 2017-02-01 | 中译语通科技(北京)有限公司 | Short text emotion factor extraction method and device based on deep learning |
CN106407211A (en) * | 2015-07-30 | 2017-02-15 | 富士通株式会社 | Method and device for classifying semantic relationships among entity words |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106855853A (en) * | 2016-12-28 | 2017-06-16 | 成都数联铭品科技有限公司 | Entity relation extraction system based on deep neural network |
CN107220237A (en) * | 2017-05-24 | 2017-09-29 | 南京大学 | Business entity relation extraction method based on convolutional neural networks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160217393A1 (en) * | 2013-09-12 | 2016-07-28 | Hewlett-Packard Development Company, L.P. | Information extraction |
CN107194422A (en) * | 2017-06-19 | 2017-09-22 | 中国人民解放军国防科学技术大学 | Convolutional neural network relation classification method combining forward and reverse instances |
- 2017
- 2017-11-02: CN application CN201711061205.0A filed; granted as patent CN107943847B (status: Active)
- 2018
- 2018-02-10: PCT application PCT/CN2018/076119 filed; published as WO2019085328A1 (status: Application Filing)
Non-Patent Citations (5)
Title |
---|
An Improved Method for Chinese Company Name and Abbreviation Recognition; Lei Meng et al.; Knowledge Management in Organizations; 2017-07-12; pp. 435-447 |
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; pp. 207-212 |
Classifying Relation via Bidirectional Recurrent Neural Network Based on Local Information; Xiaoyun Hou et al.; Web Technologies and Applications; 2016-09-17; pp. 420-430 |
Research on Semantic Relation Classification Based on LSTM; Hu Xinchen; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15; Vol. 2016, No. 02; p. I138-2096 |
Research on Denoising in Distantly Supervised Person Relation Extraction; Huang Beijing et al.; Computer Applications and Software; 2017-07-31; Vol. 34, No. 7; pp. 11-18 |
Also Published As
Publication number | Publication date |
---|---|
CN107943847A (en) | 2018-04-20 |
WO2019085328A1 (en) | 2019-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107943847B (en) | Business connection extracting method, device and storage medium | |
CN107330011B (en) | Multi-strategy fusion named entity recognition method and device | |
CN107832299B (en) | Title rewriting processing method and device based on artificial intelligence and readable medium | |
CN110489555A (en) | Language model pre-training method incorporating word-class information | |
CN108563703A (en) | Charge determination method and device, computer equipment, and storage medium | |
CN108647205A (en) | Fine-grained sentiment analysis model construction method, device and readable storage medium | |
CN108304468A (en) | Text classification method and text classification device | |
CN113051356B (en) | Open relation extraction method and device, electronic equipment and storage medium | |
CN104809105B (en) | Maximum-entropy-based recognition method and system for event arguments and argument roles | |
CN113378970B (en) | Sentence similarity detection method and device, electronic equipment and storage medium | |
CN116097250A (en) | Layout aware multimodal pre-training for multimodal document understanding | |
WO2021139316A1 (en) | Method and apparatus for establishing expression recognition model, and computer device and storage medium | |
CN110059924A (en) | Contract terms checking method, device, equipment and computer-readable storage medium | |
CN115392237B (en) | Emotion analysis model training method, device, equipment and storage medium | |
CN113707299A (en) | Auxiliary diagnosis method and device based on inquiry session and computer equipment | |
CN107943788A (en) | Enterprise abbreviation generation method, device and storage medium | |
CN113821622B (en) | Answer retrieval method and device based on artificial intelligence, electronic equipment and medium | |
CN113627797B (en) | Method, device, computer equipment and storage medium for generating employee profiles | |
CN110489765A (en) | Machine translation method, device and computer readable storage medium | |
CN117290515A (en) | Training method of text annotation model, method and device for generating text graph | |
CN112632227A (en) | Resume matching method, resume matching device, electronic equipment, storage medium and program product | |
CN116681082A (en) | Discrete text semantic segmentation method, device, equipment and storage medium | |
CN116821373A (en) | Map-based prompt recommendation method, device, equipment and medium | |
CN116628162A (en) | Semantic question-answering method, device, equipment and storage medium | |
CN115510188A (en) | Text keyword association method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||