CN107943847B - Enterprise relation extraction method, device and storage medium - Google Patents

Enterprise relation extraction method, device and storage medium

Info

Publication number
CN107943847B
Authority
CN
China
Prior art keywords
vector
sentence
trained
sample
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711061205.0A
Other languages
Chinese (zh)
Other versions
CN107943847A (en)
Inventor
徐冰
汪伟
罗傲雪
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201711061205.0A
Priority to PCT/CN2018/076119
Publication of CN107943847A
Application granted
Publication of CN107943847B
Legal status: Active (current)
Anticipated expiration legal status

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 - Operations research, analysis or management
    • G06Q10/0635 - Risk analysis of enterprise or organisation activities

Abstract

The invention discloses an enterprise relation extraction method, device and storage medium. The method comprises: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, and using them as training sample sentences to establish a sample database; extracting from the sample database all training sample sentences containing a given enterprise entity pair and segmenting them, mapping each word to a word vector x_i and each sentence to a sentence vector S_i; computing with an LSTM the first hidden-layer state vector h_i and the second hidden-layer state vector h_i' of each word vector x_i, concatenating them into a combined hidden-layer state vector, and obtaining the feature vector T_i of each training sample sentence; substituting the feature vectors T_i into an average-vector expression to compute the average vector S; substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence; and extracting a sentence containing two enterprise entities, obtaining its feature vector T_i through a bi-LSTM, and inputting it into the trained RNN model to predict the relation between the two enterprises. Labour cost is reduced, and the relation between the two enterprise entities is predicted more accurately.

Description

Enterprise relation extraction method, device and storage medium
Technical field
The present invention relates to the technical field of data and information processing, and more particularly to an enterprise relation extraction method, device and computer-readable storage medium.
Background technique
Identifying the associations between different enterprises in news, such as capital transactions, supply chains and cooperation, is of great significance for enterprise risk early warning. However, commonly used entity relation extraction methods require a large amount of training data to be labelled manually, and corpus annotation is generally very time-consuming and labour-intensive.
Summary of the invention
In view of the foregoing, the present invention provides an enterprise relation extraction method, device and computer-readable storage medium. A relation extraction model based on convolutional neural networks can be extended to distantly supervised data, which effectively reduces the model's dependence on manually labelled data; moreover, compared with semi-supervised or unsupervised approaches, this supervised enterprise relation extraction method achieves better precision and recall.
To achieve the above object, the present invention provides an enterprise relation extraction method, the method comprising:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, and using these sentences as training sample sentences to establish a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, computing from left to right, with a long short-term memory (LSTM) module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair with an average-vector expression, according to the feature vector T_i of each training sample sentence;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained RNN model;
Prediction step: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the word segmentation step comprises:
Representing each segmented word in one-hot form to obtain an initial word vector, and assigning each training sample sentence a sentence ID; mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the left and right adjacent words of a given word in the training sample sentence into the continuous bag-of-words (CBOW) model, and predicting the word vector x_i of that word; updating the sentence vector of the training sample sentence with every prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the last updated sentence vector as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step comprises:
Computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
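As an illustration of this expression, the following is a minimal NumPy sketch of computing S for one enterprise entity pair; the array shapes and values are assumptions for illustration, not taken from the patent.

```python
import numpy as np

# Hypothetical shapes: 5 training sample sentences of one entity pair, 128-dim features.
T = np.random.rand(5, 128)   # feature vectors T_i of the training sample sentences
a = np.ones(5)               # weights a_i (the values to be learned during training)
n = len(T)

S = (a[:, None] * T).sum(axis=0) / n   # S = sum(a_i * T_i) / n
print(S.shape)                          # (128,)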
Preferably, the softmax classification function has the expression:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, ..., K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the enterprise relation type of the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types.
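For concreteness, a small NumPy sketch of a standard softmax layer over K relation types is shown below; the weight matrix W and bias b are assumed classifier parameters that the text does not name explicitly.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

K = 4                         # number of enterprise relation types (illustrative)
W = np.random.rand(K, 128)    # hypothetical classifier weights mapping S to K relation scores
b = np.zeros(K)
S = np.random.rand(128)       # average vector of the enterprise entity pair

probs = softmax(W @ S + b)    # sigma(z)_j: probability of each relation type
print(probs, probs.argmax())
```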
In addition, the present invention also provides an electronic device, the device comprising a memory, a processor, and an enterprise relation extraction program stored in the memory and runnable on the processor; when the enterprise relation extraction program is executed by the processor, the following steps can be realised:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, and using these sentences as training sample sentences to establish a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, computing from left to right, with a long short-term memory (LSTM) module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair with an average-vector expression, according to the feature vector T_i of each training sample sentence;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained RNN model;
Prediction step: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the concatenation step comprises:
Computing, with the LSTM module, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences.
Preferably, the softmax classification function has the expression:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, ..., K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the enterprise relation type of the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium contains an enterprise relation extraction program, and when the enterprise relation extraction program is executed by a processor, any of the steps of the enterprise relation extraction method described above can be realised.
According to the enterprise relation extraction method, electronic device and computer-readable storage medium proposed by the present invention, sentences in the knowledge base that contain an enterprise entity pair with a relation are extracted from unstructured text and used as training sample sentences to establish a sample database. All training sample sentences containing a given enterprise entity pair are then extracted from the sample database and segmented, the sentence vector S_i of each training sample sentence is obtained, and the feature vector T_i of each training sample sentence is computed with the LSTM module. The average vector S of the enterprise entity pair is then obtained from the feature vectors T_i of the training sample sentences, the average vector S is substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair, yielding a trained recurrent neural network model. Finally, a sentence containing two enterprise entities is extracted from current text, the feature vector T of the sentence is obtained through the bi-LSTM module, and this feature vector T is input into the trained recurrent neural network model to predict the relation between the two enterprise entities. This improves the ability to recognise relations between different enterprises in news and reduces the dependence on manual annotation of training data.
Detailed description of the invention
Fig. 1 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 2 is a module diagram of a preferred embodiment of the enterprise relation extraction program in Fig. 1;
Fig. 3 is a flowchart of a preferred embodiment of the enterprise relation extraction method of the present invention;
Fig. 4 is a frame diagram of the prediction module of the present invention.
The realisation of the objectives, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
As shown in Fig. 1, it is a schematic diagram of a preferred embodiment of the electronic device 1 of the present invention.
In the present embodiment, the electronic device 1 may be a server, a smartphone, a tablet computer, a personal computer, a portable computer, or another electronic device with computing capability.
The electronic device 1 comprises a memory 11, a processor 12, a knowledge base 13, a network interface 14 and a communication bus 15. The knowledge base 13 is stored in the memory 11, and sentences containing enterprise entity pairs are extracted from the knowledge base 13 as training sample sentences to establish the sample database.
The network interface 14 may optionally comprise a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 15 is used to realise connection and communication between these components.
The memory 11 comprises at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage unit of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 1.
In the present embodiment, the memory 11 may be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the enterprise relation extraction program 10, the knowledge base 13 and the sample database, but also to temporarily store data that have been output or will be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the computer program code of the enterprise relation extraction program 10 and the training of each kind of model.
Preferably, the electronic device 1 may further comprise a display, which may be referred to as a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-control liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display is used to show the information processed in the electronic device 1 and a visual working interface, for example to display the model training results and the optimal values of the weights a_i.
Preferably, the electronic device 1 may further comprise a user interface. The user interface may comprise an input unit such as a keyboard and a voice output device such as a loudspeaker or earphones; optionally, the user interface may also comprise a standard wired interface and a wireless interface.
In the apparatus embodiment shown in Fig. 1, the memory 11, which is a kind of computer storage medium, stores the program code of the enterprise relation extraction program 10; when the processor 12 executes the program code of the enterprise relation extraction program 10, the following steps are realised:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, and using these sentences as training sample sentences to establish a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, computing from left to right, with a long short-term memory (LSTM) module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair with an average-vector expression, according to the feature vector T_i of each training sample sentence;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained RNN model;
Prediction step: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory (bi-LSTM) module, and inputting this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
In the present embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then an unstructured sentence containing these two enterprise entities can represent this relation. Therefore, when we need to identify the association between two particular enterprise entities in news, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base and used as training sample sentences to establish the sample database. The knowledge base is built by collecting unstructured sentences containing any two enterprise entities from historical news data. For example, if the association between two particular enterprise entities in news needs to be identified, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base, and these sentences are used as training sample sentences to establish a sample database. The relations of an enterprise entity pair include capital transactions, supply chain, cooperation and the like. For example, the sentence "Foxconn is a supplier of Mobike" contains the enterprise entity pair "Foxconn" and "Mobike", and the relation "supplier" between the enterprise entities belongs to the supply-chain relation type.
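A minimal sketch of this distant-supervision style construction of the sample database is given below; the function and field names are assumptions for illustration, not identifiers from the patent.

```python
def build_sample_database(knowledge_base, entity_pairs):
    """knowledge_base: list of raw sentences; entity_pairs: {(e1, e2): relation_type}."""
    sample_db = []
    for (e1, e2), relation in entity_pairs.items():
        for sentence in knowledge_base:
            # A sentence mentioning both entities is assumed to express their relation.
            if e1 in sentence and e2 in sentence:
                sample_db.append({"sentence": sentence, "pair": (e1, e2), "relation": relation})
    return sample_db

kb = ["Foxconn is a supplier of Mobike", "Foxconn opened a new factory"]
pairs = {("Foxconn", "Mobike"): "supply chain"}
print(build_sample_database(kb, pairs))   # only the first sentence becomes a training sample sentence
```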
All training sample sentences containing a given enterprise entity pair are extracted from the sample database; each training sample sentence includes the names of the two enterprise entities and the relation type of the enterprise entity pair, and each training sample sentence is segmented with a word segmentation tool. Word segmentation tools such as the Stanford Chinese word segmenter or jieba may be used to segment each training sample sentence. Each segmented word is represented in one-hot form to obtain an initial word vector. In the one-hot representation, each word is represented as a very long vector whose dimensionality equals the size of the vocabulary; only one dimension has the value 1 and all the others are 0, and that dimension represents the current word. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, and each training sample sentence includes the names of the two enterprise entities Foxconn and Mobike and the relation type of the enterprise entity pair (supplier). Segmenting "Foxconn is a supplier of Mobike" gives the result "Foxconn | is | Mobike | 's | supplier". Suppose the initial word vector of "Foxconn" is [0100000000] and the initial word vector of "is" is [0010000000]. Each training sample sentence is then assigned a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
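A sketch of the segmentation and one-hot mapping step with the jieba tool named above; the tiny single-sentence vocabulary is only for illustration.

```python
import jieba   # the jieba segmenter mentioned above; assumed to be installed

sentence = "富士康是摩拜单车的供应商"        # "Foxconn is a supplier of Mobike"
words = jieba.lcut(sentence)                 # e.g. ['富士康', '是', '摩拜单车', '的', '供应商']

vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}   # toy vocabulary from one sentence

def one_hot(word):
    v = [0] * len(vocab)
    v[vocab[word]] = 1
    return v

initial_word_vectors = [one_hot(w) for w in words]
print(words)
print(initial_word_vectors[0])               # initial word vector of '富士康' (Foxconn)
```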
The initial sentence vector and the initial word vectors of the left and right adjacent words of a given word in the training sample sentence are input into the continuous bag-of-words (CBOW) model, and the word vector x_i of that word is predicted. The initial sentence vector is then replaced by the first updated sentence vector; the first updated sentence vector and the initial word vectors of the left and right adjacent words of the next word in the training sample sentence are input into the CBOW model, the word vector x_{i+1} of that word is predicted, and the first updated sentence vector is replaced by the second updated sentence vector. Training is repeated in this way, and the sentence vector of the training sample sentence is updated at every step, until the word vector x_i, i = (0, 1, 2, 3, ..., m), of every word in the training sample sentence has been predicted; the sentence vector obtained from the last update is taken as the sentence vector S_i, i = (0, 1, 2, 3, ..., n), of the training sample sentence, which serves as the input of the first layer of the recurrent neural network (RNN) model. For example, the initial word vectors of "Foxconn" (the available word adjacent on the left of "is") and "Mobike" (the available word adjacent on the right), together with the initial sentence vector, are input into the CBOW model; the word vector x_2 of "is" is predicted, and the initial sentence vector is updated once to obtain the first updated sentence vector. Then the initial (or current) word vector of "is" (the available word adjacent on the left of "Mobike"), the initial word vector of "'s" (the available word adjacent on the right) and the first updated sentence vector are input into the CBOW model; the word vector x_3 of "Mobike" is predicted, and the first updated sentence vector is updated to obtain the second updated sentence vector. Training is repeated in this way until the word vectors x_i of all the above available words have been predicted and the sentence vector S_i of the training sample sentence has been obtained through the updates. Throughout this process, the sentence ID of each news sentence remains unchanged.
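This joint learning of a per-sentence vector alongside CBOW-style word vectors resembles the paragraph-vector approach; the sketch below uses gensim's Doc2Vec as an off-the-shelf stand-in, not the patent's own implementation, and the documents and hyperparameters are assumptions.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each training sample sentence is tagged with its sentence ID, whose vector is
# trained together with the word vectors from the surrounding context words.
docs = [
    TaggedDocument(words=["富士康", "是", "摩拜单车", "的", "供应商"], tags=["sent_0"]),
    TaggedDocument(words=["富士康", "为", "摩拜单车", "供货"], tags=["sent_1"]),
]
model = Doc2Vec(documents=docs, vector_size=50, window=1, min_count=1, dm=1, epochs=20)

word_vec = model.wv["是"]        # a trained word vector x_i
sent_vec = model.dv["sent_0"]    # a trained sentence vector S_i (model.docvecs in older gensim)
print(word_vec.shape, sent_vec.shape)
```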
In the second layer of the RNN model, a long short-term memory (LSTM) module computes, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computes, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}. The two hidden-layer state vectors are concatenated with the Concatenate function to obtain the combined hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i, i = (0, 1, 2, 3, ..., n), of each training sample sentence is obtained from the combined hidden-layer state vectors of all words in the training sample sentence. For example, for the sentence "Foxconn is a supplier of Mobike", the LSTM computes, from left to right, the first hidden-layer state vector h_2 of the word vector x_2 of "is" from the hidden-layer state vector h_1 of the word vector x_1 of "Foxconn", and computes, from right to left, the second hidden-layer state vector h_2' of the word vector x_2 of "is" from the hidden-layer state vector h_3 of the word vector x_3 of "Mobike". The two hidden-layer state vectors (h_2 and h_2') are concatenated with the Concatenate function to obtain the combined hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence is obtained from the combined hidden-layer state vectors of all words in the training sample sentence.
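A sketch of this second layer using PyTorch's bidirectional LSTM; the pooling used to collapse the per-word combined states into the sentence feature vector T_i is an assumption, since the text does not specify it.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, seq_len = 50, 64, 5
bilstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                 bidirectional=True, batch_first=True)

word_vectors = torch.randn(1, seq_len, embed_dim)   # x_1 ... x_5 of one training sample sentence
combined, _ = bilstm(word_vectors)                  # (1, 5, 2*hidden_dim): [h_i ; h_i'] per word
T_i, _ = combined.max(dim=1)                        # assumed max pooling -> feature vector T_i
print(T_i.shape)                                    # torch.Size([1, 128])
```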
In the third layer of the RNN model, the average vector S of the enterprise entity pair is expressed, according to the feature vector T_i of each training sample sentence, with the average-vector formula S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence of the enterprise entity pair, and n represents the number of training sample sentences.
In the last layer of the RNN model, the average vector S is substituted into the softmax classification function:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, ..., K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the enterprise relation type of the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types. The weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair in the training sample sentences. Through continual learning, the weights a_i of the training sample sentences are optimised, so that effective sentences receive higher weights and noisy sentences receive smaller weights.
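One way to realise these last two layers is a small PyTorch module in which the sentence weights a_i are produced by a learned scoring layer and optimised jointly with the softmax classifier against the pair's known relation type. The scoring-layer parameterisation below is an assumption; the text only states that the weights are optimised through learning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairRelationClassifier(nn.Module):
    def __init__(self, feat_dim=128, num_relations=4):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)              # unnormalised score per sample sentence
        self.classify = nn.Linear(feat_dim, num_relations)

    def forward(self, T):                                # T: (n_sentences, feat_dim)
        a = torch.softmax(self.score(T).squeeze(-1), 0)  # sentence weights a_i (sum to 1)
        S = (a.unsqueeze(-1) * T).sum(0)                 # weighted average vector S
        return self.classify(S), a                       # relation scores and learned weights

model = PairRelationClassifier()
T = torch.randn(50, 128)                                 # features of 50 sample sentences of one pair
logits, weights = model(T)
loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([2]))   # known relation type of the pair
loss.backward()                                          # training pushes noisy sentences to low weight
print(weights.shape, logits.shape)
```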
In the present embodiment, once the RNN model has been determined, relation prediction can be carried out on any unstructured sentence containing an enterprise entity pair, and the model's prediction is not tied to specific enterprise names.
Sentences containing the two enterprise entities whose relation is to be predicted are extracted from current text, and these sentences are segmented to obtain sentence vectors. For example, S_1, S_2, S_3, S_4 denote the vector set of the sentences corresponding to the two enterprise entities. The feature vectors T_1, T_2, T_3, T_4 of the sentences are extracted with a bidirectional long short-term memory (bi-LSTM) module, and the feature vector of each sentence is input into the trained RNN model to obtain the relation prediction result between the two enterprise entities.
In the enterprise relation extraction method proposed by the above embodiment, sentences in the knowledge base that contain an enterprise entity pair with a relation are extracted from unstructured text as training sample sentences to establish a sample database. All training sample sentences containing a given enterprise entity pair are extracted from the sample database and segmented, the sentence vector S_i of each training sample sentence is obtained, and the feature vector T_i of each training sample sentence is computed with the LSTM module. The average vector S of the training sample sentences is computed with the average-vector formula, the average vector S is substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair. Finally, a sentence containing two enterprise entities is extracted from current text, the feature vector T_i of the sentence is obtained through the bi-LSTM module, and this feature vector T_i is input into the trained RNN model to predict the relation between the two enterprise entities. This not only removes the cumbersome step of manually annotating training data, but also achieves better precision and recall than other supervision paradigms.
As shown in Fig. 2, it is a module diagram of a preferred embodiment of the enterprise relation extraction program 10 in Fig. 1. A module in the present invention refers to a series of computer program instruction segments that complete a specific function.
In the present embodiment, the enterprise relation extraction program 10 comprises an establishment module 110, a word segmentation module 120, a concatenation module 130, a calculation module 140, a weight determination module 150 and a prediction module 160. The functions or operation steps realised by the modules 110-160 are similar to those described above and are not described in detail here; by way of example:
The establishment module 110 is used to extract from the knowledge base sentences containing enterprise entities between which a relation exists, and to use these sentences as training sample sentences to establish a sample database;
The word segmentation module 120 is used to extract from the sample database all training sample sentences containing a given enterprise entity pair, to segment each training sample sentence with a preset word segmentation tool, to map each segmented word to a word vector x_i, and to map each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model;
The concatenation module 130 is used, in the second layer of the RNN model, to compute from left to right, with the LSTM module, the first hidden-layer state vector h_i of the current word vector x_i, and to compute from right to left the second hidden-layer state vector h_i' of the current word vector x_i; the combined hidden-layer state vector of each word in the training sample sentence is obtained by concatenating the two hidden-layer state vectors, and the feature vector T_i of each training sample sentence is obtained from the combined hidden-layer state vectors of all words in the training sample sentence;
The calculation module 140 is used, in the third layer of the RNN model, to express the average vector S of the enterprise entity pair with the average-vector expression, according to the feature vector T_i of each training sample sentence;
The weight determination module 150 is used, in the last layer of the RNN model, to substitute the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence of the enterprise entity pair, thereby obtaining a trained RNN model;
The prediction module 160 is used to extract from current text a sentence containing two enterprise entities, to obtain the feature vector T_i of the sentence through the bi-LSTM module, and to input this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
As shown in Fig. 3, it is a flowchart of a preferred embodiment of the enterprise relation extraction method of the present invention.
In the present embodiment, when the processor 12 executes the computer program of the enterprise relation extraction program 10 stored in the memory 11, the following steps of the enterprise relation extraction method are realised:
Step S10: extracting from the knowledge base sentences containing enterprise entities between which a relation exists, as training sample sentences, to establish a sample database;
Step S20: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model;
Step S30: in the second layer of the RNN model, computing from left to right, with the LSTM module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
Step S40: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair with the average-vector expression, according to the feature vector T_i of each training sample sentence;
Step S50: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence of the enterprise entity pair, thereby obtaining a trained RNN model;
Step S60: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through the bi-LSTM module, and inputting this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
In the present embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then an unstructured sentence containing these two enterprise entities can represent this relation. When we need to identify the association between two particular enterprise entities in news, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base and used as training sample sentences to establish the sample database. The knowledge base is built by collecting unstructured sentences containing any two enterprise entities from historical news data. For example, if the association between two particular enterprise entities in news needs to be identified, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base, and these sentences are used as training sample sentences to establish a sample database. The relations of an enterprise entity pair include capital transactions, supply chain, cooperation and the like. For example, sentences containing the enterprise entity pair "Foxconn" and "Mobike" are extracted from unstructured text as training sample sentences; the sentence "Foxconn is a supplier of Mobike" contains the enterprise entity pair "Foxconn" and "Mobike", and the relation "supplier" between the enterprise entities belongs to the supply-chain relation type.
All training sample sentences containing a given enterprise entity pair are extracted from the sample database; each training sample sentence includes the names of the two enterprise entities and the relation type of the enterprise entity pair, and each training sample sentence is segmented with a word segmentation tool. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, and each training sample sentence includes the names of the two enterprise entities Foxconn and Mobike and the relation type of the enterprise entity pair (supplier). Word segmentation tools such as the Stanford Chinese word segmenter or jieba are used to segment each training sample sentence. For example, segmenting "Foxconn is a supplier of Mobike" gives the result "Foxconn | is | Mobike | 's | supplier". Each segmented word is represented in one-hot form to obtain an initial word vector. In the one-hot representation, each word is represented as a very long vector whose dimensionality equals the size of the vocabulary; only one dimension has the value 1 and all the others are 0, and that dimension represents the current word. For example, the initial word vector of "Foxconn" is [0100000000] and the initial word vector of "is" is [0010000000]. Each training sample sentence is then assigned a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
The initial sentence vector and the initial word vectors of the left and right adjacent words of a given word in the training sample sentence are input into the continuous bag-of-words (CBOW) model, and the word vector x_i of that word is predicted. The initial sentence vector is then replaced by the first updated sentence vector; the first updated sentence vector and the initial word vectors of the left and right adjacent words of the next word in the training sample sentence are input into the CBOW model, the word vector x_{i+1} of that word is predicted, and the first updated sentence vector is replaced by the second updated sentence vector. Training is repeated in this way, and the sentence vector of the training sample sentence is updated at every step, until the word vector x_i, i = (0, 1, 2, 3, ..., m), of every word in the training sample sentence has been predicted; the sentence vector obtained from the last update is taken as the sentence vector S_i, i = (0, 1, 2, 3, ..., n), of the training sample sentence. For example, for the sentence "Foxconn is a supplier of Mobike", the initial word vectors of "Foxconn" (the available word adjacent on the left of "is") and "Mobike" (the available word adjacent on the right), together with the initial sentence vector, are input into the CBOW model; the word vector x_2 of "is" is predicted, and the initial sentence vector is updated once to obtain the first updated sentence vector. Then the initial (or current) word vector of "is" (the available word adjacent on the left of "Mobike"), the initial word vector of "'s" (the available word adjacent on the right) and the first updated sentence vector are input into the CBOW model; the word vector x_3 of "Mobike" is predicted, and the first updated sentence vector is updated to obtain the second updated sentence vector. Training is repeated in this way until the word vectors x_i of all the above available words have been predicted and the sentence vector S_i of the training sample sentence has been obtained through the updates. Throughout this process, the sentence ID of each news sentence remains unchanged.
In the second layer of the RNN model, the LSTM module then computes, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computes, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}. The two hidden-layer state vectors are concatenated with the Concatenate function to obtain the combined hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i, i = (0, 1, 2, 3, ..., n), of each training sample sentence is obtained from the combined hidden-layer state vectors of all words in the training sample sentence. For example, for the sentence "Foxconn is a supplier of Mobike", the LSTM computes, from left to right, the first hidden-layer state vector h_2 of the word vector x_2 of "is" from the hidden-layer state vector h_1 of the word vector x_1 of "Foxconn", and computes, from right to left, the second hidden-layer state vector h_2' of the word vector x_2 of "is" from the hidden-layer state vector h_3 of the word vector x_3 of "Mobike". The two hidden-layer state vectors (h_2 and h_2') are concatenated with the Concatenate function to obtain the combined hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence is obtained from the combined hidden-layer state vectors of all words in the training sample sentence.
In the third layer of the RNN model, the average vector S of the enterprise entity pair is expressed, according to the feature vector T_i of each training sample sentence, with the average-vector formula S = sum(a_i * T_i) / n, where a_i represents the weight of a training sample sentence, T_i represents the feature vector of each training sample sentence of the enterprise entity pair, and n represents the number of training sample sentences. Suppose 50,000 training sample sentences of the entity pair "Foxconn" and "Mobike" are extracted from the knowledge base; then the feature vector T_i, i = (0, 1, 2, 3, ..., n), of every training sample sentence is substituted into the average-vector formula S = sum(a_i * T_i) / n to compute the average vector S of the entity pair "Foxconn" and "Mobike", where n equals 50,000.
In the last layer of the RNN model, the average vector S is then substituted into the softmax classification function:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, ..., K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the enterprise relation type of the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types. The weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair in the training sample sentences. Through continual iterative learning, the weights a_i of the training sample sentences are optimised, so that effective sentences receive higher weights and noisy sentences receive smaller weights, thereby obtaining a reliable RNN model.
In the present embodiment, once the RNN model has been determined, relation prediction can be carried out on any unstructured sentence containing an enterprise entity pair, and the model's prediction is not tied to specific enterprise names.
Finally, as shown in Fig. 4, which is a frame diagram of the prediction module of the present invention, sentences containing the two enterprise entities whose relation is to be predicted are extracted from current text; for example, sentences containing "Ping An Group" and "Bank of China" are extracted from news, and these sentences are segmented to obtain sentence vectors. For example, S_1, S_2, S_3, S_4 denote the vector set of the sentences corresponding to the two enterprise entities. The feature vectors T_1, T_2, T_3, T_4 of the sentences are extracted with the bi-LSTM module; then the weight of each T_i within the whole sentence set is assigned by computing the similarity between T_i and the relation-type vector r; finally, the weighted sentence vectors are summed, and the softmax classifier predicts the relation between "Ping An Group" and "Bank of China".
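A sketch of the weighting described for Fig. 4, assuming the similarity between each feature vector T_i and the relation-type vector r is a dot product (the exact similarity function is not given in the text); the classifier matrix W is a hypothetical stand-in for the trained softmax layer.

```python
import torch

T = torch.randn(4, 128)           # T_1..T_4: bi-LSTM features of the extracted sentences
r = torch.randn(128)              # embedding of a candidate relation type
W = torch.randn(3, 128)           # hypothetical softmax classifier over 3 relation types

a = torch.softmax(T @ r, dim=0)   # weight of each T_i from its similarity to r
S = (a.unsqueeze(-1) * T).sum(0)  # weighted combination over the sentence set
probs = torch.softmax(W @ S, dim=0)
print(probs)                      # probability of each relation type for the enterprise pair
```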
In the enterprise relation extraction method proposed by the above embodiment, sentences in the knowledge base that contain an enterprise entity pair with a relation are extracted from unstructured text as training sample sentences, and a sample database is established. All training sample sentences containing a given enterprise entity pair are extracted from the sample database and segmented, the sentence vector S_i of each training sample sentence is obtained, and the feature vector T_i of each training sample sentence is computed with the LSTM module. The average vector S of the enterprise entity pair is then expressed with the average-vector formula, the average vector S is substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair, yielding a trained RNN model. Finally, a sentence containing two enterprise entities is extracted from current text, the feature vector T_i of the sentence is obtained through the bi-LSTM module, and this feature vector T_i is input into the trained RNN model to predict the relation between the two enterprise entities. This improves the ability to recognise relations between different enterprises in news and the early warning of enterprise risk, and removes the cumbersome step of manually annotating training data.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium. The computer-readable storage medium contains an enterprise relation extraction program 10, and when the enterprise relation extraction program 10 is executed by a processor, the following operations are realised:
Sample database establishment step: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, and using these sentences as training sample sentences to establish a sample database;
Word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model;
Concatenation step: in the second layer of the RNN model, computing from left to right, with the LSTM module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
Calculation step: in the third layer of the RNN model, expressing the average vector S of the enterprise entity pair with the average-vector expression, according to the feature vector T_i of each training sample sentence;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained RNN model;
Prediction step: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through the bi-LSTM module, and inputting this feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the word segmentation step comprises:
Representing each segmented word in one-hot form to obtain an initial word vector, and assigning each training sample sentence a sentence ID; mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the left and right adjacent words of a given word in the training sample sentence into the continuous bag-of-words model, and predicting the word vector x_i of that word; updating the sentence vector of the training sample sentence with every prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the last updated sentence vector as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step comprises:
Computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i represents the weight of a training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence of the enterprise entity pair, and n represents the number of training sample sentences.
Preferably, the softmax classification function has the expression:
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k), for j = 1, ..., K
where K represents the number of enterprise relation types, S represents the average vector of the enterprise entity pair, z_j represents the enterprise relation type of the enterprise entity pair, and σ(z)_j represents the probability of the relation type to be predicted among all enterprise relation types.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the above specific embodiment of the enterprise relation extraction method, and is not described in detail here.
The serial numbers of the above embodiments of the invention are for description only and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realised by means of software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (8)

1. An enterprise relation extraction method, characterised in that the method comprises:
a sample database establishment step: extracting from a knowledge base sentences containing enterprise entities between which a relation exists, as training sample sentences, to establish a sample database;
a word segmentation step: extracting from the sample database all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network model;
a concatenation step: in the second layer of the recurrent neural network model, computing from left to right, with a long short-term memory module, the first hidden-layer state vector h_i of the current word vector x_i, and computing from right to left the second hidden-layer state vector h_i' of the current word vector x_i; obtaining the combined hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and obtaining the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all words in the training sample sentence;
a calculation step: in the third layer of the recurrent neural network model, expressing, according to the feature vector T_i of each training sample sentence of the enterprise entity pair, the average vector S of the enterprise entity pair with the average-vector expression S = sum(a_i * T_i) / n, where a_i represents the weight of each training sample sentence and is the value to be determined, T_i represents the feature vector of each training sample sentence, and n represents the number of training sample sentences;
a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained recurrent neural network model;
a prediction step: extracting from current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence through a bidirectional long short-term memory module, and inputting this feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
2. The enterprise relation extraction method according to claim 1, characterised in that the word segmentation step comprises:
representing each segmented word in one-hot form to obtain an initial word vector, and assigning each training sample sentence a sentence ID; mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the left and right adjacent words of a given word in the training sample sentence into a continuous bag-of-words model, and predicting the word vector x_i of that word; updating the sentence vector of the training sample sentence with every prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the last updated sentence vector as the sentence vector S_i of the training sample sentence.
3. The enterprise relationship extraction method according to claim 1, characterized in that the concatenation step comprises:
from left to right, computing the first hidden state vector h_i of the current word vector x_i from the hidden state vector h_{i-1} of the previous word vector x_{i-1}; and from right to left, computing the second hidden state vector h_i' of the current word vector x_i from the hidden state vector h_{i+1} of the next word vector x_{i+1}.
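The two recurrences in claim 3 can be made concrete with explicit LSTM cells: one pass consumes the word vectors left to right, the other right to left, and the two states of each word are concatenated. The PyTorch fragment below is a sketch with toy dimensions and random inputs; none of it comes from the patent itself.

```python
# Illustrative two-pass recurrence for claim 3 (PyTorch LSTM cells, toy dimensions).
import torch
import torch.nn as nn

emb_dim, hidden_dim, seq_len = 100, 128, 6
words = torch.randn(seq_len, emb_dim)                      # word vectors x_1 .. x_n of one sentence

fwd_cell = nn.LSTMCell(emb_dim, hidden_dim)                # left-to-right pass
bwd_cell = nn.LSTMCell(emb_dim, hidden_dim)                # right-to-left pass

h, c = torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim)
forward_states = []
for i in range(seq_len):                                   # h_i depends on x_i and h_{i-1}
    h, c = fwd_cell(words[i].unsqueeze(0), (h, c))
    forward_states.append(h)

h, c = torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim)
backward_states = [None] * seq_len
for i in reversed(range(seq_len)):                         # h_i' depends on x_i and h_{i+1}
    h, c = bwd_cell(words[i].unsqueeze(0), (h, c))
    backward_states[i] = h

# Composite hidden state of each word: [h_i ; h_i']
composite = [torch.cat([f, b], dim=-1) for f, b in zip(forward_states, backward_states)]
```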
4. The enterprise relationship extraction method according to claim 1, characterized in that the softmax classification function has the expression:
σ(z)_j = e^{z_j} / sum_{k=1}^{K} e^{z_k}
where K is the number of enterprise relationship types, S is the average vector of the enterprise entity pair, z_j is the score corresponding to the relationship type of the enterprise entity pair, and σ(z)_j is the probability of the relationship type to be predicted among all enterprise relationship types.
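As a numeric illustration of the classification in claim 4, the fragment below maps the average vector S to one score per relationship type and normalises the scores with a softmax. The linear projection from S to the scores z, the dimensions, and all names are assumptions made for the example only.

```python
# Toy softmax over K relationship types (illustrative; the projection W, b is assumed).
import numpy as np

K = 4                                        # number of enterprise relationship types
S = np.random.randn(256)                     # average vector of the entity pair
W, b = np.random.randn(K, 256), np.zeros(K)  # hypothetical classifier parameters

z = W @ S + b                                # one score z_j per relationship type
sigma = np.exp(z) / np.exp(z).sum()          # sigma(z)_j = e^{z_j} / sum_k e^{z_k}
predicted_type = int(np.argmax(sigma))       # relationship type with the highest probability
```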
5. An electronic device, characterized in that the device comprises a memory and a processor, an enterprise relationship extraction program being stored on the memory, and the enterprise relationship extraction program, when executed by the processor, implementing the following steps:
A sample database establishment step: extracting from a knowledge base, as training sample sentences, sentences containing enterprise entities between which a relationship exists, and building a sample database from them;
A word segmentation step: extracting from the sample database all training sample sentences that contain a given enterprise entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i and each training sample sentence to a sentence vector S_i, and using these as the input of the first layer of a recurrent neural network model;
A concatenation step: in the second layer of the recurrent neural network model, computing with a long short-term memory module the first hidden state vector h_i of the current word vector x_i from left to right and the second hidden state vector h_i' of the current word vector x_i from right to left, concatenating the two hidden state vectors to obtain the composite hidden state vector of each word in the training sample sentence, and then obtaining the feature vector T_i of each training sample sentence from the composite hidden state vectors of all the words in that sentence;
A calculation step: in the third layer of the recurrent neural network model, expressing the average vector S of the enterprise entity pair from the feature vector T_i of each of its training sample sentences by the average-vector expression S = sum(a_i * T_i) / n, where a_i is the weight of each training sample sentence and is the quantity to be solved for, T_i is the feature vector of each training sample sentence, and n is the number of training sample sentences;
A weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relationship type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence, thereby obtaining a trained recurrent neural network model;
A prediction step: extracting from a current text a sentence containing two enterprise entities, obtaining the feature vector T_i of the sentence with a bidirectional long short-term memory module, inputting this feature vector T_i into the trained recurrent neural network model, and predicting the relationship between the two enterprise entities.
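For the prediction step, a new sentence mentioning two enterprises is segmented, mapped to word indices, and passed through the trained model as a single-sentence bag. The helper below reuses the hypothetical RelationExtractor and word_to_id names from the sketch after claim 1 and is likewise only an illustration, not the patent's implementation.

```python
# Illustrative inference helper; RelationExtractor and word_to_id are the hypothetical
# names introduced in the sketch after claim 1, relation_names is a toy label list.
import torch

def predict_relation(model, word_to_id, segmented_sentence, relation_names):
    ids = torch.tensor([[word_to_id[w] for w in segmented_sentence]])  # (1, seq_len)
    with torch.no_grad():
        log_probs = model([ids])      # the sentence forms a one-element bag at prediction time
    return relation_names[int(log_probs.argmax())]

# Example call with toy labels:
# predict_relation(model, word_to_id,
#                  ["company_A", "acquires", "company_B"],
#                  ["parent-subsidiary", "supplier", "competitor", "none"])
```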
6. The electronic device according to claim 5, characterized in that the concatenation step comprises:
from left to right, computing the first hidden state vector h_i of the current word vector x_i from the hidden state vector h_{i-1} of the previous word vector x_{i-1}; and from right to left, computing the second hidden state vector h_i' of the current word vector x_i from the hidden state vector h_{i+1} of the next word vector x_{i+1}.
7. The electronic device according to claim 5, characterized in that the softmax classification function has the expression:
σ(z)_j = e^{z_j} / sum_{k=1}^{K} e^{z_k}
where K is the number of enterprise relationship types, S is the average vector of the enterprise entity pair, z_j is the score corresponding to the relationship type of the enterprise entity pair, and σ(z)_j is the probability of the relationship type to be predicted among all enterprise relationship types.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium contains an enterprise relationship extraction program which, when executed by a processor, implements the steps of the enterprise relationship extraction method according to any one of claims 1 to 5.
CN201711061205.0A 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium Active CN107943847B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711061205.0A CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium
PCT/CN2018/076119 WO2019085328A1 (en) 2017-11-02 2018-02-10 Enterprise relationship extraction method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711061205.0A CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium

Publications (2)

Publication Number Publication Date
CN107943847A (en) 2018-04-20
CN107943847B (en) 2019-05-17

Family

ID=61934111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711061205.0A Active CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium

Country Status (2)

Country Link
CN (1) CN107943847B (en)
WO (1) WO2019085328A1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876044B (en) * 2018-06-25 2021-02-26 中国人民大学 Online content popularity prediction method based on knowledge-enhanced neural network
CN108920587B (en) * 2018-06-26 2021-09-24 清华大学 Open domain visual question-answering method and device fusing external knowledge
CN108985501B (en) * 2018-06-29 2022-04-29 平安科技(深圳)有限公司 Index feature extraction-based stock index prediction method, server and storage medium
CN109243616A * 2018-06-29 2019-01-18 东华大学 Joint relation extraction and structuring system for breast electronic health records based on deep learning
CN110737758B (en) * 2018-07-03 2022-07-05 百度在线网络技术(北京)有限公司 Method and apparatus for generating a model
CN109063032B (en) * 2018-07-16 2020-09-11 清华大学 Noise reduction method for remote supervision and retrieval data
CN109597851B (en) * 2018-09-26 2023-03-21 创新先进技术有限公司 Feature extraction method and device based on incidence relation
CN109376250A * 2018-09-27 2019-02-22 中山大学 Joint entity and relation extraction method based on reinforcement learning
CN109582956B (en) * 2018-11-15 2022-11-11 中国人民解放军国防科技大学 Text representation method and device applied to sentence embedding
CN109710768B (en) * 2019-01-10 2020-07-28 西安交通大学 Tax payer industry two-level classification method based on MIMO recurrent neural network
CN112036181A (en) * 2019-05-14 2020-12-04 上海晶赞融宣科技有限公司 Entity relationship identification method and device and computer readable storage medium
CN110209836B (en) * 2019-05-17 2022-04-26 北京邮电大学 Remote supervision relation extraction method and device
CN111950279B (en) * 2019-05-17 2023-06-23 百度在线网络技术(北京)有限公司 Entity relationship processing method, device, equipment and computer readable storage medium
CN110188201A (en) * 2019-05-27 2019-08-30 上海上湖信息技术有限公司 A kind of information matching method and equipment
CN110188202B (en) * 2019-06-06 2021-07-20 北京百度网讯科技有限公司 Training method and device of semantic relation recognition model and terminal
CN110427624B (en) * 2019-07-30 2023-04-25 北京百度网讯科技有限公司 Entity relation extraction method and device
CN110619053A (en) * 2019-09-18 2019-12-27 北京百度网讯科技有限公司 Training method of entity relation extraction model and method for extracting entity relation
CN110879938A (en) * 2019-11-14 2020-03-13 中国联合网络通信集团有限公司 Text emotion classification method, device, equipment and storage medium
CN111382843B (en) * 2020-03-06 2023-10-20 浙江网商银行股份有限公司 Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship
CN111476035B (en) * 2020-05-06 2023-09-05 中国人民解放军国防科技大学 Chinese open relation prediction method, device, computer equipment and storage medium
CN111581387B (en) * 2020-05-09 2022-10-11 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111680127A (en) * 2020-06-11 2020-09-18 暨南大学 Annual report-oriented company name and relationship extraction method
CN111784488B (en) * 2020-06-28 2023-08-01 中国工商银行股份有限公司 Enterprise fund risk prediction method and device
CN112418320B (en) * 2020-11-24 2024-01-19 杭州未名信科科技有限公司 Enterprise association relation identification method, device and storage medium
CN113486630B (en) * 2021-09-07 2021-11-19 浙江大学 Supply chain data vectorization and visualization processing method and device
CN113806538B (en) * 2021-09-17 2023-08-22 平安银行股份有限公司 Label extraction model training method, device, equipment and storage medium
CN116562303B (en) * 2023-07-04 2023-11-21 之江实验室 Reference resolution method and device for reference external knowledge

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106407211A (en) * 2015-07-30 2017-02-15 富士通株式会社 Method and device for classifying semantic relationships among entity words
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106855853A (en) * 2016-12-28 2017-06-16 成都数联铭品科技有限公司 Entity relation extraction system based on deep neural network
CN107220237A * 2017-05-24 2017-09-29 南京大学 Enterprise entity relation extraction method based on convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217393A1 (en) * 2013-09-12 2016-07-28 Hewlett-Packard Development Company, L.P. Information extraction
CN107194422A * 2017-06-19 2017-09-22 中国人民解放军国防科学技术大学 Convolutional neural network relation classification method combining forward and reverse instances

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407211A (en) * 2015-07-30 2017-02-15 富士通株式会社 Method and device for classifying semantic relationships among entity words
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106855853A (en) * 2016-12-28 2017-06-16 成都数联铭品科技有限公司 Entity relation extraction system based on deep neural network
CN107220237A * 2017-05-24 2017-09-29 南京大学 Enterprise entity relation extraction method based on convolutional neural networks

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
An Improved Method for Chinese Company Name and Abbreviation Recognition; Lei Meng et al.; Knowledge Management in Organizations; 2017-07-12; pp. 435-447
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; pp. 207-212
Classifying Relation via Bidirectional Recurrent Neural Network Based on Local Information; Xiaoyun Hou et al.; Web Technologies and Applications; 2016-09-17; pp. 420-430
Research on Semantic Relation Classification Based on LSTM; Hu Xinchen; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15; Vol. 2016, No. 02; pp. I138-2096
Research on Denoising in Distantly Supervised Person Relation Extraction; Huang Beijing et al.; Computer Applications and Software; 2017-07-31; Vol. 34, No. 7; pp. 11-18

Also Published As

Publication number Publication date
CN107943847A (en) 2018-04-20
WO2019085328A1 (en) 2019-05-09

Similar Documents

Publication Publication Date Title
CN107943847B (en) Business connection extracting method, device and storage medium
CN107330011B Multi-strategy fusion based named entity recognition method and device
CN107832299B (en) Title rewriting processing method and device based on artificial intelligence and readable medium
CN110489555A Language model pre-training method combining word class information
CN108563703A Charge determination method, device, computer equipment and storage medium
CN108647205A Fine-grained sentiment analysis model construction method, device and readable storage medium
CN108304468A Text classification method and text classification device
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
CN104809105B Maximum entropy based method and system for recognizing event arguments and argument roles
CN113378970B (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN116097250A (en) Layout aware multimodal pre-training for multimodal document understanding
WO2021139316A1 (en) Method and apparatus for establishing expression recognition model, and computer device and storage medium
CN110059924A Contract clause checking method, device, equipment and computer readable storage medium
CN115392237B (en) Emotion analysis model training method, device, equipment and storage medium
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN107943788A Enterprise abbreviation generation method, device and storage medium
CN113821622B (en) Answer retrieval method and device based on artificial intelligence, electronic equipment and medium
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
CN110489765A (en) Machine translation method, device and computer readable storage medium
CN117290515A (en) Training method of text annotation model, method and device for generating text graph
CN112632227A (en) Resume matching method, resume matching device, electronic equipment, storage medium and program product
CN116681082A (en) Discrete text semantic segmentation method, device, equipment and storage medium
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN116628162A (en) Semantic question-answering method, device, equipment and storage medium
CN115510188A (en) Text keyword association method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant