CN107943847A - Business connection extracting method, device and storage medium - Google Patents

Business connection extracting method, device and storage medium Download PDF

Info

Publication number
CN107943847A
CN107943847A (application CN201711061205.0A / CN201711061205A; granted as CN107943847B)
Authority
CN
China
Prior art keywords
vector
sentence
business
word
sample sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711061205.0A
Other languages
Chinese (zh)
Other versions
CN107943847B (en)
Inventor
徐冰
汪伟
罗傲雪
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201711061205.0A priority Critical patent/CN107943847B/en
Priority to PCT/CN2018/076119 priority patent/WO2019085328A1/en
Publication of CN107943847A publication Critical patent/CN107943847A/en
Application granted granted Critical
Publication of CN107943847B publication Critical patent/CN107943847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0635 Risk analysis of enterprise or organisation activities


Abstract

The invention discloses a business relationship extraction method, device and storage medium. The method includes: extracting sentences that contain business entity pairs with a known relationship from a knowledge base as training sample sentences and building a sample database; extracting from the sample database all training sample sentences that contain a given business entity pair and segmenting them, mapping each word to a word vector x_i and each sentence to a sentence vector S_i; computing with an LSTM the first hidden-layer state vector h_i and the second hidden-layer state vector h_i' of each word vector x_i, concatenating them into a combined hidden-layer state vector and deriving the feature vector T_i of each sentence; substituting the feature vectors T_i into an average-vector expression to compute the average vector S; substituting the average vector S and the relationship type of the business entity pair into a softmax classification function to compute the weight a_i of each training sample sentence; and extracting sentences that contain two business entities, obtaining their feature vectors T_i with a bi-LSTM and inputting them into the trained RNN model to predict the relationship between the two enterprises. The method reduces labor cost and predicts the relationship between the two business entities more accurately.

Description

Business connection extracting method, device and storage medium
Technical field
The present invention relates to the technical field of information processing, and in particular to a business relationship extraction method, device and computer-readable storage medium.
Background technology
Identify the association between different enterprises, such as treasury trade, supply chain, cooperation, to business risk early warning in news There is very great meaning.But now common entity relation extraction method needs manually to carry out the mark of a large amount of training datas, And corpus labeling work generally takes time and effort very much.
Summary of the invention
In view of the foregoing, the present invention provides a business relationship extraction method, device and computer-readable storage medium. The relation extraction model based on neural networks can be extended to distantly supervised data, which greatly reduces the model's dependence on manually labeled data; and compared with semi-supervised or unsupervised approaches, this supervised business relationship extraction method achieves better accuracy and recall.
To achieve the above object, the present invention provides a business relationship extraction method, which includes:
Sample database establishment step: extracting, from a knowledge base, sentences that contain business entity pairs with a known relationship as training sample sentences, and building a sample database;
Word segmentation step: extracting from the sample database all training sample sentences that contain a given business entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, which serves as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, using a long short-term memory (LSTM) module to compute, from left to right, the first hidden-layer state vector h_i of the current word vector x_i, and, from right to left, the second hidden-layer state vector h_i'; concatenating the two hidden-layer state vectors to obtain the combined hidden-layer state vector of each word in the training sample sentence, and then deriving the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all the words in the sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from their feature vectors T_i using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relationship type of the business entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
Prediction step: extracting sentences that contain two business entities from the current text, obtaining the feature vector T_i of each sentence with a bidirectional LSTM module, and inputting the feature vectors into the trained RNN model to predict the relationship between the two business entities.
Preferably, the word segmentation step includes:
representing each segmented word as a one-hot vector to obtain an initial word vector; assigning each training sample sentence a sentence ID and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector together with the initial word vectors of the words adjacent to a given word in the training sample sentence into a continuous bag-of-words model to predict the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the last updated sentence vector as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step includes:
computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and computing, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
Preferably, the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_k exp(z_k), for j = 1, 2, …, K
where K denotes the number of business relationship types, S denotes the average vector whose relationship type is to be predicted, z_j corresponds to a certain business relationship type, and σ(z)_j denotes the probability that the relationship to be predicted is the j-th business relationship type.
In addition, the present invention also provides an electronic device, which includes a memory, a processor, and a business relationship extraction program stored on the memory and runnable on the processor. When executed by the processor, the business relationship extraction program implements the following steps:
Sample database establishment step: extracting, from a knowledge base, sentences that contain business entity pairs with a known relationship as training sample sentences, and building a sample database;
Word segmentation step: extracting from the sample database all training sample sentences that contain a given business entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, which serves as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, using a long short-term memory (LSTM) module to compute, from left to right, the first hidden-layer state vector h_i of the current word vector x_i, and, from right to left, the second hidden-layer state vector h_i'; concatenating the two hidden-layer state vectors to obtain the combined hidden-layer state vector of each word in the training sample sentence, and then deriving the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all the words in the sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from their feature vectors T_i using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relationship type of the business entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
Prediction step: extracting sentences that contain two business entities from the current text, obtaining the feature vector T_i of each sentence with a bidirectional LSTM module, and inputting the feature vectors into the trained RNN model to predict the relationship between the two business entities.
Preferably, the concatenation step includes:
using an LSTM module to compute, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and to compute, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
Preferably, the softmax classification function is:
σ(z)_j = exp(z_j) / Σ_k exp(z_k), for j = 1, 2, …, K
where K denotes the number of business relationship types, S denotes the average vector whose relationship type is to be predicted, z_j corresponds to a certain business relationship type, and σ(z)_j denotes the probability that the relationship to be predicted is the j-th business relationship type.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium includes a business relationship extraction program which, when executed by a processor, can implement any of the steps of the business relationship extraction method described above.
The business relationship extraction method, electronic device and computer-readable storage medium proposed by the present invention extract, from unstructured text, the sentences of the business entity pairs that have a relationship in the knowledge base as training sample sentences and build a sample database. All training sample sentences containing a given business entity pair are then extracted from the sample database and segmented, the sentence vector S_i of each training sample sentence is obtained, and the feature vector T_i of each training sample sentence is computed with an LSTM module. The average vector S of the training sample sentences is then computed from the feature vectors T_i and substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relationship type of the business entity pair. Finally, sentences containing two business entities are extracted from the current text, their feature vectors T are obtained with a bidirectional LSTM module and input into the trained recurrent neural network model, and the relationship between the two business entities is predicted. This improves the ability to recognize relationships between different enterprises in news and reduces the dependence on manual labeling of training data.
Brief description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 2 is a module diagram of a preferred embodiment of the business relationship extraction program in Fig. 1;
Fig. 3 is a flow chart of a preferred embodiment of the business relationship extraction method of the present invention;
Fig. 4 is a frame diagram of the prediction module of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention, it is not intended to limit the present invention.
As shown in Fig. 1, it is a schematic diagram of a preferred embodiment of the electronic device 1 of the present invention.
In this embodiment, the electronic device 1 may be a server, a smartphone, a tablet computer, a personal computer, a portable computer, or another electronic device with computing capability.
The electronic device 1 includes a memory 11, a processor 12, a knowledge base 13, a network interface 14 and a communication bus 15. The knowledge base 13 is stored in the memory 11, and sentences containing business entity pairs are extracted from the knowledge base 13 as training sample sentences to build a sample database.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The communication bus 15 is used to realize connection and communication between these components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage unit of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 1.
In this embodiment, the memory 11 can be used not only to store the application software installed on the electronic device 1 and various types of data, such as the business relationship extraction program 10, the knowledge base 13 and the sample database, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, and is used to run the program code or process the data stored in the memory 11, for example to execute the computer program code of the business relationship extraction program 10 and the training of the models.
Preferably, the electronic device 1 may further include a display, which may also be called a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display is used to show the information processed in the electronic device 1 and to show a visual working interface, for example the results of model training and the optimal values of the weights a_i.
Preferably, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard and a voice output device such as a loudspeaker or an earphone; optionally, the user interface may also include a standard wired interface and a wireless interface.
In the device embodiment shown in Fig. 1, the memory 11, which is a kind of computer storage medium, stores the program code of the business relationship extraction program 10. When the processor 12 executes this program code, the following steps are implemented:
Sample database establishment step: extracting, from a knowledge base, sentences that contain business entity pairs with a known relationship as training sample sentences, and building a sample database;
Word segmentation step: extracting from the sample database all training sample sentences that contain a given business entity pair, segmenting each training sample sentence with a preset word segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, which serves as the input of the first layer of a recurrent neural network (RNN) model;
Concatenation step: in the second layer of the RNN model, using a long short-term memory (LSTM) module to compute, from left to right, the first hidden-layer state vector h_i of the current word vector x_i, and, from right to left, the second hidden-layer state vector h_i'; concatenating the two hidden-layer state vectors to obtain the combined hidden-layer state vector of each word in the training sample sentence, and then deriving the feature vector T_i of each training sample sentence from the combined hidden-layer state vectors of all the words in the sentence;
Calculation step: in the third layer of the RNN model, computing the average vector S of the training sample sentences from their feature vectors T_i using an average-vector expression;
Weight determination step: in the last layer of the RNN model, substituting the average vector S and the relationship type of the business entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
Prediction step: extracting sentences that contain two business entities from the current text, obtaining the feature vector T_i of each sentence with a bidirectional LSTM module, and inputting the feature vectors into the trained RNN model to predict the relationship between the two business entities.
In this embodiment, it is assumed that if two business entities have a certain relationship in the knowledge base, then an unstructured sentence containing the two business entities can express this relationship. Therefore, when we need to identify the association between two business entities in news, all unstructured sentences containing the two business entities are extracted from the knowledge base, and a sample database is built with these sentences as training sample sentences. The knowledge base is built by collecting unstructured sentences containing any two business entities from historical news data. The relationships between business entity pairs include capital transactions, supply-chain relationships, cooperation and the like. For example, the sentence "Foxconn is a supplier of Mobike" contains the business entity pair "Foxconn" and "Mobike", and the relationship "supplier" between the two entities belongs to the supply-chain type.
All training sample sentences containing a given business entity pair are extracted from the sample database; each training sample sentence contains the names of the two business entities and the relationship type of the business entity pair. Each training sample sentence is segmented with a word segmentation tool, such as the Stanford Chinese word segmenter or jieba. Each segmented word is represented as a one-hot vector to obtain its initial word vector. In the one-hot scheme, each word is represented as a long vector whose dimensionality equals the size of the vocabulary; exactly one dimension has the value 1 and all the others are 0, and that dimension identifies the current word. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample database, and each of them contains the entity names Foxconn and Mobike and the relationship type (supplier) of the business entity pair. Segmenting "Foxconn is a supplier of Mobike" yields "Foxconn | is | Mobike | 's | supplier". The initial word vector of "Foxconn" is, for example, [0100000000], and the initial word vector of "is" is [0010000000]. Each training sample sentence is then assigned a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
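The one-hot encoding described above can be sketched in a few lines; the five-word vocabulary below is illustrative (the patent's example uses ten-dimensional vectors over a larger vocabulary):

```python
# Toy vocabulary from the segmented example sentence (illustrative order)
vocab = ["Foxconn", "is", "Mobike", "'s", "supplier"]

def one_hot(word, vocab):
    """Vector of vocabulary length with a single 1 marking the word's index."""
    v = [0] * len(vocab)
    v[vocab.index(word)] = 1
    return v

print(one_hot("Foxconn", vocab))  # [1, 0, 0, 0, 0]
print(one_hot("is", vocab))       # [0, 1, 0, 0, 0]
```

Because only one dimension is nonzero, these vectors carry no similarity information; the CBOW training below maps them into dense vectors that do.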
The initial sentence vector and the initial word vectors of the words adjacent to a given word in the training sample sentence are input into the continuous bag-of-words (CBOW) model, which predicts the word vector x_i of that word. The initial sentence vector is then replaced by a first updated sentence vector; the first updated sentence vector and the initial word vectors of the words adjacent to the next word in the training sample sentence are input into the CBOW model, which predicts the word vector x_{i+1} of that word, and the first updated sentence vector is replaced by a second updated sentence vector. Training iterates in this way, updating the sentence vector of the training sample sentence at each step, until the word vector x_i, i = (0, 1, 2, 3, …, m), of every word in the training sample sentence has been predicted. The last updated sentence vector is taken as the sentence vector S_i, i = (0, 1, 2, 3, …, n), of the training sample sentence, which serves as the input of the first layer of the recurrent neural network (RNN) model. For example, the initial word vectors of the left neighbor "Foxconn" and the right neighbor "Mobike" of "is", together with the initial sentence vector, are input into the CBOW model, which predicts the word vector x_2 of "is"; the sentence vector is updated once, giving the first updated sentence vector. The word vectors of the left neighbor "is" and the right neighbor "'s" of "Mobike", together with the first updated sentence vector, are input into the CBOW model, which predicts the word vector x_3 of "Mobike"; the sentence vector is updated again, giving the second updated sentence vector. Training iterates in this way until the word vectors x_i of all the words have been predicted and the sentence vector S_i of the training sample sentence is obtained. Throughout this process, the sentence ID of each news sentence remains unchanged.
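The word- and sentence-vector training described above resembles the distributed-memory paragraph-vector scheme: a CBOW-style predictor whose input is augmented with a per-sentence vector that is updated after every prediction. The sketch below is a minimal illustration under stated assumptions (toy vocabulary, random initialization, and only the sentence vector and output weights are trained), not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary for the segmented example sentence (illustrative)
vocab = ["Foxconn", "is", "Mobike", "'s", "supplier"]
V, D = len(vocab), 8                     # vocabulary size, embedding dimension

word_vecs = rng.normal(0, 0.1, (V, D))   # initial word vectors
sent_vec = rng.normal(0, 0.1, D)         # sentence vector tied to the sentence ID
out_w = rng.normal(0, 0.1, (V, D))       # output (softmax) weights

def train_step(center, left, right, lr=0.1):
    """Predict the center word from its neighbors plus the sentence vector,
    then update the sentence vector and output weights by one gradient step."""
    global sent_vec, out_w
    h = (word_vecs[left] + word_vecs[right] + sent_vec) / 3.0
    scores = out_w @ h
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                 # softmax over the vocabulary
    err = probs.copy()
    err[center] -= 1.0                   # gradient of cross-entropy wrt scores
    grad_h = out_w.T @ err
    out_w = out_w - lr * np.outer(err, h)
    sent_vec = sent_vec - lr * grad_h / 3.0   # sentence vector updated each step
    return probs[center]

contexts = [(1, 0, 2), (2, 1, 3), (3, 2, 4)]  # (center, left, right) indices

first = [train_step(c, l, r) for c, l, r in contexts]
for _ in range(300):
    last = [train_step(c, l, r) for c, l, r in contexts]

improved = all(b > a for a, b in zip(first, last))
print(improved)   # after repeated updates each center word is predicted better
```

A full CBOW implementation would also update the input word vectors; freezing them here keeps the sketch short while still showing the per-step sentence-vector update the text describes.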
In the second layer of the RNN model, a long short-term memory (LSTM) module computes, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the next word vector x_{i+1}. The two hidden-layer state vectors are concatenated by a Concatenate function to obtain the combined hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i, i = (0, 1, 2, 3, …, n), of each training sample sentence is derived from the combined hidden-layer state vectors of all the words in the sentence. For example, in the sentence "Foxconn is a supplier of Mobike", the LSTM computes, from left to right, the first hidden-layer state vector h_2 of the word vector x_2 of "is" from the hidden-layer state vector h_1 of the word vector x_1 of "Foxconn", and, from right to left, the second hidden-layer state vector h_2' of the word vector x_2 of "is" from the hidden-layer state vector h_3 of the word vector x_3 of "Mobike". The two hidden-layer state vectors (h_2 and h_2') are concatenated by the Concatenate function to obtain the combined hidden-layer state vector of each word, and the feature vector T_i of each training sample sentence is derived from the combined hidden-layer state vectors of all the words in the sentence.
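A minimal numeric sketch of the bidirectional pass: one LSTM run left to right, one right to left, with the two hidden states of each word concatenated. The patent does not specify how the per-word combined states are pooled into the sentence feature T_i; mean pooling below is an assumption, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 4, 3    # word-vector and hidden dimensions (illustrative)

def make_params():
    """One set of LSTM weights; gate order [input, forget, output, candidate]."""
    return (rng.normal(0, 0.1, (4 * H, D)),   # input-to-gate weights
            rng.normal(0, 0.1, (4 * H, H)),   # hidden-to-gate weights
            np.zeros(4 * H))                  # gate biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, params):
    Wx, Wh, b = params
    z = Wx @ x + Wh @ h + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c + i * g                         # cell-state update
    return o * np.tanh(c), c                  # new hidden state, new cell state

def bilstm_feature(xs):
    fwd_p, bwd_p = make_params(), make_params()
    h = c = np.zeros(H); fwd = []
    for x in xs:                              # left to right: first hidden state h_i
        h, c = lstm_step(x, h, c, fwd_p)
        fwd.append(h)
    h = c = np.zeros(H); bwd = []
    for x in reversed(xs):                    # right to left: second hidden state h_i'
        h, c = lstm_step(x, h, c, bwd_p)
        bwd.append(h)
    bwd.reverse()
    combined = [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
    return np.mean(combined, axis=0)          # T_i via mean pooling (an assumption)

xs = [rng.normal(0, 1.0, D) for _ in range(5)]   # word vectors of a 5-word sentence
T = bilstm_feature(xs)
print(T.shape)   # (6,) -- each word's combined state has dimension 2 * H
```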
In the third layer of the RNN model, the average vector S of the training sample sentences is computed from their feature vectors T_i using the average-vector formula S = sum(a_i * T_i) / n, where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
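A worked instance of the average-vector formula S = sum(a_i * T_i) / n, with made-up numbers:

```python
import numpy as np

# Feature vectors T_i of n = 3 training sample sentences (illustrative values)
T = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [2.0, 4.0]])
a = np.array([0.5, 0.3, 0.2])         # per-sentence weights a_i

n = len(T)
S = (a[:, None] * T).sum(axis=0) / n  # S = sum(a_i * T_i) / n
print(S)                              # approximately [0.6 0.6]
```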
In the last layer of the RNN model, the average vector S is substituted into the softmax classification function:
σ(z)_j = exp(z_j) / Σ_k exp(z_k), for j = 1, 2, …, K
where K denotes the number of business relationship types, S denotes the average vector whose relationship type is to be predicted, z_j corresponds to a certain business relationship type, and σ(z)_j denotes the probability that the relationship to be predicted is the j-th business relationship type. The weight a_i of each training sample sentence is determined according to the relationship type of the business entity pair in the training sample sentences. Through continuous learning, the weights a_i of the training sample sentences are continuously optimized, so that informative sentences obtain higher weights and noisy sentences obtain lower weights.
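The softmax step can be sketched as follows; mapping the average vector S to K relation scores through a linear layer W is an assumption about a detail the text leaves open:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

K, d = 4, 6                          # number of relationship types, feature dim (illustrative)
W = rng.normal(0, 0.1, (K, d))       # assumed linear layer producing scores z = W @ S
S_avg = rng.normal(0, 1.0, d)        # average vector of the sentences being classified

z = W @ S_avg
p = softmax(z)                       # p[j] = exp(z_j) / sum_k exp(z_k)
print(round(float(p.sum()), 6))      # 1.0 -- a distribution over the K types
print(int(np.argmax(p)))             # index of the most probable relationship type
```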
In this embodiment, after the RNN model is determined, relationship prediction can be performed on any unstructured sentence containing a business entity pair; the prediction of the model is not tied to specific enterprise names.
Sentences containing the two business entities whose relationship is to be predicted are extracted from the current text, and these sentences are segmented to obtain sentence vectors. For example, S1, S2, S3 and S4 denote the vectors of the sentences corresponding to a business entity pair. A bidirectional long short-term memory module (bi-LSTM) extracts the feature vectors T1, T2, T3 and T4 of the sentences, and the feature vector of each sentence is input into the trained RNN model to obtain the relationship prediction result for the two business entities.
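The prediction path above, chained end to end, can be sketched schematically; the feature extractor is stubbed with a fixed projection, and every name and dimension here is illustrative rather than from the patent:

```python
import numpy as np

rng = np.random.default_rng(3)

K, d = 4, 6                                    # relationship types, feature dimension
PROJ = rng.normal(0, 0.5, (d, d))              # stand-in for the trained bi-LSTM extractor

def bilstm_feature(sentence_vec):
    """Stub for the bi-LSTM feature vector T_i (fixed projection, illustrative only)."""
    return PROJ @ sentence_vec

def predict_relation(sentence_vecs, weights, W):
    """Weight and average the sentence features, then classify with softmax."""
    T = np.stack([bilstm_feature(s) for s in sentence_vecs])
    S_avg = (weights[:, None] * T).sum(axis=0) / len(T)   # S = sum(a_i * T_i) / n
    z = W @ S_avg
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.argmax(p)), p

# Four sentences S1..S4 mentioning the same entity pair (illustrative vectors)
sents = [rng.normal(0, 1.0, d) for _ in range(4)]
a = np.array([0.4, 0.3, 0.2, 0.1])             # learned per-sentence weights a_i
W = rng.normal(0, 0.1, (K, d))                 # trained classifier weights (assumed linear)

rel, probs = predict_relation(sents, a, W)
print(0 <= rel < K, bool(np.isclose(probs.sum(), 1.0)))   # True True
```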
In the business relationship extraction method proposed by the above embodiment, training sample sentences of business entity pairs that have a relationship in the knowledge base are extracted from unstructured text to build a sample database. All training sample sentences containing a given business entity pair are extracted from the sample database and segmented, the sentence vector S_i of each training sample sentence is obtained, and the feature vector T_i of each training sample sentence is computed with an LSTM. The average vector S of the training sample sentences is computed with the average-vector formula and substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relationship type of the business entity pair. Finally, sentences containing two business entities are extracted from the current text, their feature vectors T_i are obtained with a bi-LSTM and input into the trained RNN model, and the relationship between the two business entities is predicted. This not only removes the cumbersome step of manually labeling training data, but also achieves better accuracy and recall than other supervision modes.
As shown in Fig. 2, it is a module diagram of a preferred embodiment of the enterprise relation extraction program 10 in Fig. 1. A module referred to in the present invention is a series of computer program instruction segments that accomplish a specific function.
In this embodiment, the enterprise relation extraction program 10 comprises: a building module 110, a segmentation module 120, a concatenation module 130, a computing module 140, a weight determination module 150, and a prediction module 160. The functions or operation steps implemented by modules 110-160 are similar to those described above and are not detailed again here; by way of example:
The building module 110 is used to extract, from the knowledge base, sentences about enterprise entity pairs that have a relation as training sample sentences and to build a sample library with them;
The segmentation module 120 is used to extract from the sample library all training sample sentences containing a given enterprise entity pair, segment each training sample sentence with a preset segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model;
The concatenation module 130 is used, in the second layer of the RNN model, to compute with an LSTM the first hidden-layer state vector h_i of the current word vector x_i from left to right and the second hidden-layer state vector h_i' of the current word vector x_i from right to left, obtain the synthesized hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and then obtain the feature vector T_i of each training sample sentence from the synthesized hidden-layer state vectors of all the words in the training sample sentence;
The computing module 140 is used, in the third layer of the RNN model, to compute the average vector S of the training sample sentences from the feature vector T_i of each training sample sentence using the average-vector expression;
The weight determination module 150 is used, in the last layer of the RNN model, to substitute the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence;
The prediction module 160 is used to extract sentences containing the two enterprise entities from the current text, obtain the feature vector T_i of each sentence with the bi-LSTM, and input the feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
As shown in Fig. 3, it is a flowchart of a preferred embodiment of the enterprise relation extraction method of the present invention.
In this embodiment, when the processor 12 executes the computer program of the enterprise relation extraction program 10 stored in the memory 11, the following steps of the enterprise relation extraction method are implemented:
Step S10: extract, from the knowledge base, sentences about enterprise entity pairs that have a relation as training sample sentences, and build a sample library.
Step S20: extract from the sample library all training sample sentences containing a given enterprise entity pair, segment each training sample sentence with a preset segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model.
Step S30: in the second layer of the RNN model, compute with an LSTM the first hidden-layer state vector h_i of the current word vector x_i from left to right and the second hidden-layer state vector h_i' of the current word vector x_i from right to left, obtain the synthesized hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and then obtain the feature vector T_i of each training sample sentence from the synthesized hidden-layer state vectors of all the words in the training sample sentence.
Step S40: in the third layer of the RNN model, compute the average vector S of the training sample sentences from the feature vector T_i of each training sample sentence using the average-vector expression.
Step S50: in the last layer of the RNN model, substitute the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence.
Step S60: extract sentences containing the two enterprise entities from the current text, obtain the feature vector T_i of each sentence with the bi-LSTM, and input the feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
In this embodiment, it is assumed that if two enterprise entities have a certain relation in the knowledge base, then any unstructured sentence containing those two enterprise entities can express that relation. When the association between two particular enterprise entities in the news needs to be identified, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base, and these sentences are used as training sample sentences to build a sample library. The knowledge base is built by collecting the unstructured sentences in historical news data that contain any two enterprise entities. For example, to identify the association between two particular enterprise entities in the news, all unstructured sentences containing the two enterprise entities are extracted from the knowledge base and used as training sample sentences to build a sample library. The relations between enterprise entity pairs include funding transactions, supply chain, cooperation, and the like. For example, sentences containing the enterprise entity pair "Foxconn" and "Mobike" are extracted from unstructured text as training sample sentences; in the sentence "Foxconn is the supplier of Mobike", the enterprise entity pair is ("Foxconn", "Mobike"), and the relation "supplier" between the two enterprise entities belongs to the supply-chain relation type.
All training sample sentences containing a given enterprise entity pair are extracted from the sample library; each training sample sentence contains the names of the two enterprise entities of the pair and the relation type of the pair, and each training sample sentence is segmented with a segmentation tool. For example, all training sample sentences containing Foxconn and Mobike are extracted from the sample library, and each training sample sentence contains the names of the two enterprise entities Foxconn and Mobike and the relation type (supplier) of the pair. Each training sample sentence is segmented with a segmentation tool such as the Stanford Chinese word segmenter or jieba. For example, segmenting "Foxconn is the supplier of Mobike" yields "Foxconn | is | Mobike | 's | supplier". Each segmented word is represented as a one-hot vector to obtain its initial word vector. In the one-hot scheme, each word is represented as a very long vector whose dimensionality is the vocabulary size; exactly one dimension has the value 1, representing the current word, and all other dimensions are 0. For example, the initial word vector of "Foxconn" is [0 1 0 0 0 0 0 0 0 0] and the initial word vector of "is" is [0 0 1 0 0 0 0 0 0 0]. Each training sample sentence is then assigned a sentence ID, and the sentence ID is mapped to the initial sentence vector of the corresponding training sample sentence.
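As a concrete illustration of the one-hot mapping described above, the following sketch builds initial word vectors; the toy vocabulary is an assumption for illustration, since the patent does not fix a vocabulary:

```python
# A minimal sketch of the one-hot word representation described above.
def one_hot(vocab, word):
    """Return a one-hot vector: all zeros except a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# illustrative vocabulary from the segmented example sentence
vocab = ["<pad>", "Foxconn", "is", "Mobike", "'s", "supplier"]
print(one_hot(vocab, "Foxconn"))  # [0, 1, 0, 0, 0, 0]
print(one_hot(vocab, "is"))       # [0, 0, 1, 0, 0, 0]
```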
The initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence are input into the continuous bag-of-words model, which predicts the word vector x_i of that word. The initial sentence vector is then updated and replaced by the first updated sentence vector; the first updated sentence vector and the initial word vectors of the words adjacent to the next word in the training sample sentence are input into the continuous bag-of-words model, which predicts the word vector x_{i+1} of that word, and the first updated sentence vector is updated and replaced by the second updated sentence vector. Training is repeated in this way, the sentence vector of the training sample sentence being updated at every step, until the word vector x_i of every word in the training sample sentence, i = (0, 1, 2, 3, ..., m), has been predicted; the sentence vector after the last training update is taken as the sentence vector S_i of the training sample sentence, i = (0, 1, 2, 3, ..., n). For example, in the sentence "Foxconn is the supplier of Mobike", the initial word vectors of the available left neighbour "Foxconn" and right neighbour "Mobike" of "is", together with the initial sentence vector, are input into the continuous bag-of-words model, which predicts the word vector x_2 of "is"; the initial sentence vector is updated once, giving the first updated sentence vector. The current word vector of the left neighbour "is" of "Mobike", the initial word vector of the available right neighbour "'s", and the first updated sentence vector are input into the continuous bag-of-words model, which predicts the word vector x_3 of "Mobike", and the first updated sentence vector is updated to the second updated sentence vector. Training is repeated in this way until the word vectors x_i of all the words have been predicted and the updates yield the sentence vector S_i of the training sample sentence. Throughout this process, the sentence ID of each news sentence remains unchanged.
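The iterative procedure above — predicting each word from its neighbouring words plus a per-sentence vector that is refreshed at every step — resembles the PV-DM variant of doc2vec. The following is a minimal sketch under that reading; the toy sentence, vector dimensions, learning rate and number of sweeps are all illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["Foxconn", "is", "Mobike", "'s", "supplier"]
V, D = len(vocab), 8
W_in = rng.normal(scale=0.1, size=(V, D))   # input word vectors x_i
W_out = rng.normal(scale=0.1, size=(V, D))  # output (scoring) vectors
sent_vec = rng.normal(scale=0.1, size=D)    # the per-sentence (sentence-ID) vector

def predict(context):
    """Average the context word vectors with the sentence vector, score the vocab."""
    h = (W_in[context].sum(axis=0) + sent_vec) / (len(context) + 1)
    scores = W_out @ h
    p = np.exp(scores - scores.max())
    return h, p / p.sum()

def train_step(center, context, lr=0.1):
    """One CBOW-style gradient step; the sentence vector is updated too."""
    global W_out, sent_vec
    h, probs = predict(context)
    grad = probs.copy()
    grad[center] -= 1.0                     # d(cross-entropy)/d(scores)
    dh = W_out.T @ grad
    W_out = W_out - lr * np.outer(grad, h)
    W_in[context] -= lr * dh / (len(context) + 1)
    sent_vec = sent_vec - lr * dh / (len(context) + 1)

# sweep the toy sentence "Foxconn is Mobike 's supplier"; every prediction
# also refreshes the sentence vector, as in the procedure above
for _ in range(200):
    for i in range(1, V - 1):
        train_step(center=i, context=[i - 1, i + 1])

_, probs = predict([1, 3])                  # the neighbours of "Mobike"
print(round(float(probs[2]), 3))            # probability assigned to "Mobike"
```

The final sentence vector `sent_vec` plays the role of S_i; the trained rows of `W_in` play the role of the word vectors x_i.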
In the second layer of the RNN model, an LSTM computes, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}. The two hidden-layer state vectors are concatenated by the Concatenate function to obtain the synthesized hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence, i = (0, 1, 2, 3, ..., n), is then obtained from the synthesized hidden-layer state vectors of all the words in the training sample sentence. For example, in the sentence "Foxconn is the supplier of Mobike", the LSTM computes, from left to right, the first hidden-layer state vector h_2 of the word vector x_2 of "is" from the hidden-layer state vector h_1 of the word vector x_1 of "Foxconn", and, from right to left, the second hidden-layer state vector h_2' of the word vector x_2 of "is" from the hidden-layer state vector h_3 of the word vector x_3 of "Mobike". The two hidden-layer state vectors (h_2 and h_2') are concatenated by the Concatenate function to obtain the synthesized hidden-layer state vector of each word in the training sample sentence, and the feature vector T_i of each training sample sentence is then obtained from the synthesized hidden-layer state vectors of all the words.
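The bidirectional pass and the Concatenate step can be sketched as follows. For brevity a plain tanh recurrence stands in for the LSTM cell (the direction handling and concatenation are the same), and max-pooling over the concatenated states is an assumed way of reducing all words' synthesized states to the sentence feature vector T_i, since the patent does not specify the reduction:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 8, 4   # word-vector and hidden-state sizes (illustrative)
Wx_f, Wh_f = rng.normal(scale=0.3, size=(H, D)), rng.normal(scale=0.3, size=(H, H))
Wx_b, Wh_b = rng.normal(scale=0.3, size=(H, D)), rng.normal(scale=0.3, size=(H, H))

def run(xs, Wx, Wh):
    """One directional pass: h_i = tanh(Wx @ x_i + Wh @ h_{i-1})."""
    h, out = np.zeros(H), []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    return out

xs = [rng.normal(size=D) for _ in range(5)]  # word vectors x_1..x_5 of a sentence
h_fwd = run(xs, Wx_f, Wh_f)                  # first hidden-layer states h_i
h_bwd = run(xs[::-1], Wx_b, Wh_b)[::-1]      # second hidden-layer states h_i'
# synthesized hidden state of each word: Concatenate(h_i, h_i')
h_cat = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
# sentence feature vector T from all words' synthesized states
T = np.max(h_cat, axis=0)                    # max-pooling (an assumed reduction)
print(T.shape)                               # (8,)
```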
In the third layer of the RNN model, the average vector S of the training sample sentences is computed from the feature vector T_i of each training sample sentence using the average-vector formula S = sum(a_i * T_i) / n, where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences. Suppose 50,000 training sample sentences for the entity pair "Foxconn" and "Mobike" are extracted from the knowledge base; then the feature vector T_i, i = (0, 1, 2, 3, ..., n), of every training sample sentence is substituted into the average-vector formula S = sum(a_i * T_i) / n, with n equal to 50,000, to compute the average vector S.
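On toy numbers, the average-vector formula S = sum(a_i * T_i) / n works out as follows; the feature vectors and weights here are illustrative assumptions:

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # feature vectors T_i, one row per sample sentence
a = np.array([0.5, 1.0, 1.5])  # sentence weights a_i (illustrative)
n = len(T)
S = (a[:, None] * T).sum(axis=0) / n  # S = sum(a_i * T_i) / n
print(S)                              # [3.66666667 4.66666667]
```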
In the last layer of the RNN model, the average vector S is then substituted into the softmax classification function:

σ(z)_j = e^S / Σ_{k=1}^{K} e^{S_k}

where K denotes the number of enterprise relation types, S denotes the average vector whose enterprise relation type is to be predicted, S_k denotes a particular enterprise relation type, and σ(z)_j denotes the probability, among all enterprise relation types, of the enterprise relation type to be predicted. The weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair in the training sample sentences. Through continual iterative learning, the weights a_i of the training sample sentences are continually optimized, so that informative sentences receive higher weights and noisy sentences receive lower weights, yielding a reliable RNN model.
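A minimal sketch of the softmax classification function, with K = 3 illustrative relation-type scores (the score values themselves are assumptions for the example):

```python
import numpy as np

def softmax(scores):
    """sigma(z)_j = e^{S_j} / sum_k e^{S_k}, shifted by the max for stability."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # one score per relation type, K = 3
probs = softmax(scores)
print(int(probs.argmax()))          # 0 -> the first relation type is most probable
```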
In this embodiment, once the RNN model has been trained, relation prediction can be performed on any sentence that contains an unstructured mention of an enterprise entity pair; the model's prediction is not tied to any specific enterprise name.
Finally, as shown in Fig. 4, which is a block diagram of the prediction module of the present invention, sentences containing the two enterprise entities whose relation is to be predicted are extracted from the current text, for example sentences containing "Ping An Group of China" and "Bank of China" extracted from the news, and these sentences are segmented to obtain sentence vectors. For example, S_1, S_2, S_3, S_4 denote the set of vectors of the sentences corresponding to the two enterprise entities. The feature vectors T_1, T_2, T_3, T_4 of the sentences are extracted by the bi-LSTM; the similarity between each T_i and the vector of the relation type r is then computed to assign T_i its weight within the whole sentence set; finally the weighted sum of the sentence features is taken and passed through the softmax classifier to predict the relation between "Ping An Group of China" and "Bank of China".
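The prediction stage described above — similarity-based weighting of sentence features followed by a softmax over relation types — can be sketched as follows. All vectors and the scoring matrix here are randomly initialized placeholders standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 8))  # sentence feature vectors T_1..T_4 from the bi-LSTM
r = rng.normal(size=8)       # vector of the candidate relation type
W = rng.normal(size=(3, 8))  # one scoring row per relation type (K = 3)

sim = T @ r                  # similarity of each sentence feature to r
alpha = np.exp(sim - sim.max())
alpha /= alpha.sum()         # weight of each T_i within the whole sentence set
bag = alpha @ T              # weighted sum of the sentence features
scores = W @ bag
probs = np.exp(scores - scores.max())
probs /= probs.sum()         # softmax over the relation types
print(int(probs.argmax()))   # index of the predicted relation type
```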
The enterprise relation extraction method proposed in the above embodiment extracts from unstructured text, as training sample sentences, the sentences of enterprise entity pairs that have a relation in the knowledge base and builds a sample library. All training sample sentences containing a given enterprise entity pair are extracted from the sample library and segmented to obtain the sentence vector S_i of each training sample sentence, and the feature vector T_i of each training sample sentence is computed with an LSTM. The average vector S of the training sample sentences is then computed with the average-vector formula, S is substituted into the softmax classification function, and the weight a_i of each training sample sentence is determined according to the relation type of the enterprise entity pair. Finally, sentences containing the two enterprise entities are extracted from the current text, the feature vector T_i of each sentence is obtained with the bi-LSTM, and T_i is input into the trained RNN model to predict the relation between the two enterprise entities. This improves the ability to recognize different relations between enterprises in the news and the early warning of enterprise risk, and removes the tedious step of manually labelling training data.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium that contains an enterprise relation extraction program 10; when the enterprise relation extraction program 10 is executed by a processor, the following operations are implemented:
Sample library building step: extract, from the knowledge base, sentences about enterprise entity pairs that have a relation as training sample sentences, and build a sample library;
Segmentation step: extract from the sample library all training sample sentences containing a given enterprise entity pair, segment each training sample sentence with a preset segmentation tool, map each segmented word to a word vector x_i, and map each training sample sentence to a sentence vector S_i, as the input of the first layer of the RNN model;
Concatenation step: in the second layer of the RNN model, compute with an LSTM the first hidden-layer state vector h_i of the current word vector x_i from left to right and the second hidden-layer state vector h_i' of the current word vector x_i from right to left, obtain the synthesized hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and then obtain the feature vector T_i of each training sample sentence from the synthesized hidden-layer state vectors of all the words in the training sample sentence;
Computing step: in the third layer of the RNN model, compute the average vector S of the training sample sentences from the feature vector T_i of each training sample sentence using the average-vector expression;
Weight determination step: in the last layer of the RNN model, substitute the average vector S and the relation type of the enterprise entity pair into the softmax classification function to compute the weight a_i of each training sample sentence;
Prediction step: extract sentences containing the two enterprise entities from the current text, obtain the feature vector T_i of each sentence with the bi-LSTM, and input the feature vector T_i into the above trained RNN model to predict the relation between the two enterprise entities.
Preferably, the segmentation step comprises:
representing each segmented word as a one-hot vector to obtain its initial word vector; assigning each training sample sentence a sentence ID and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence into the continuous bag-of-words model, which predicts the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the sentence vector after the last update as the sentence vector S_i of the training sample sentence.
Preferably, the concatenation step comprises:
computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
Preferably, the average-vector expression is:
S = sum(a_i * T_i) / n
where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
Preferably, the softmax classification function expression is:
σ(z)_j = e^S / Σ_{k=1}^{K} e^{S_k}
where K denotes the number of enterprise relation types, S denotes the average vector whose enterprise relation type is to be predicted, S_k denotes a particular enterprise relation type, and σ(z)_j denotes the probability, among all enterprise relation types, of the enterprise relation type to be predicted.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as that of the above enterprise relation extraction method and is not detailed here.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.

Claims (10)

  1. An enterprise relation extraction method, characterized in that the method comprises:
    a sample library building step: extracting, from a knowledge base, sentences about enterprise entity pairs that have a relation as training sample sentences, and building a sample library;
    a segmentation step: extracting from the sample library all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network model;
    a concatenation step: in the second layer of the recurrent neural network model, computing with a long short-term memory module the first hidden-layer state vector h_i of the current word vector x_i from left to right and the second hidden-layer state vector h_i' of the current word vector x_i from right to left, obtaining the synthesized hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and then obtaining the feature vector T_i of each training sample sentence from the synthesized hidden-layer state vectors of all the words in the training sample sentence;
    a computing step: in the third layer of the recurrent neural network model, computing the average vector S of the training sample sentences from the feature vector T_i of each training sample sentence using an average-vector expression;
    a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
    a prediction step: extracting sentences containing two enterprise entities from the current text, obtaining the feature vector T_i of each sentence with a bidirectional long short-term memory module, and inputting the feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
  2. The enterprise relation extraction method according to claim 1, characterized in that the segmentation step comprises:
    representing each segmented word as a one-hot vector to obtain its initial word vector; assigning each training sample sentence a sentence ID and mapping the sentence ID to the initial sentence vector of the corresponding training sample sentence; inputting the initial sentence vector and the initial word vectors of the words adjacent to the left and right of a given word in the training sample sentence into the continuous bag-of-words model, which predicts the word vector x_i of that word; updating the sentence vector of the training sample sentence after each prediction, until the word vector x_i of every word in the training sample sentence has been predicted; and taking the sentence vector after the last update as the sentence vector S_i of the training sample sentence.
  3. The enterprise relation extraction method according to claim 1, characterized in that the concatenation step comprises:
    computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
  4. The enterprise relation extraction method according to claim 1, characterized in that the expression of the average vector is:
    S = sum(a_i * T_i) / n
    where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
  5. The enterprise relation extraction method according to claim 4, characterized in that the expression of the softmax classification function is:
    σ(z)_j = e^S / Σ_{k=1}^{K} e^{S_k}
    where K denotes the number of enterprise relation types, S denotes the average vector whose enterprise relation type is to be predicted, S_k denotes a particular enterprise relation type, and σ(z)_j denotes the probability, among all enterprise relation types, of the enterprise relation type to be predicted.
  6. An electronic device, characterized in that the device comprises a memory and a processor, the memory storing an enterprise relation extraction program which, when executed by the processor, implements the following steps:
    a sample library building step: extracting, from a knowledge base, sentences about enterprise entity pairs that have a relation as training sample sentences, and building a sample library;
    a segmentation step: extracting from the sample library all training sample sentences containing a given enterprise entity pair, segmenting each training sample sentence with a preset segmentation tool, mapping each segmented word to a word vector x_i, and mapping each training sample sentence to a sentence vector S_i, as the input of the first layer of a recurrent neural network model;
    a concatenation step: in the second layer of the recurrent neural network model, computing with a long short-term memory module the first hidden-layer state vector h_i of the current word vector x_i from left to right and the second hidden-layer state vector h_i' of the current word vector x_i from right to left, obtaining the synthesized hidden-layer state vector of each word in the training sample sentence by concatenating the two hidden-layer state vectors, and then obtaining the feature vector T_i of each training sample sentence from the synthesized hidden-layer state vectors of all the words in the training sample sentence;
    a computing step: in the third layer of the recurrent neural network model, computing the average vector S of the training sample sentences from the feature vector T_i of each training sample sentence using an average-vector expression;
    a weight determination step: in the last layer of the recurrent neural network model, substituting the average vector S and the relation type of the enterprise entity pair into a softmax classification function to compute the weight a_i of each training sample sentence;
    a prediction step: extracting sentences containing two enterprise entities from the current text, obtaining the feature vector T_i of each sentence with a bidirectional long short-term memory module, and inputting the feature vector T_i into the above trained recurrent neural network model to predict the relation between the two enterprise entities.
  7. The electronic device according to claim 6, characterized in that the concatenation step comprises:
    computing, from left to right, the first hidden-layer state vector h_i of the current word vector x_i from the hidden-layer state vector h_{i-1} of the previous word vector x_{i-1}, and, from right to left, the second hidden-layer state vector h_i' of the current word vector x_i from the hidden-layer state vector h_{i+1} of the following word vector x_{i+1}.
  8. The electronic device according to claim 6, characterized in that the expression of the average vector is:
    S = sum(a_i * T_i) / n
    where a_i denotes the weight of a training sample sentence, T_i denotes the feature vector of each training sample sentence, and n denotes the number of training sample sentences.
  9. The electronic device according to claim 8, characterized in that the expression of the softmax classification function is:
    σ(z)_j = e^S / Σ_{k=1}^{K} e^{S_k}
    where K denotes the number of enterprise relation types, S denotes the average vector whose enterprise relation type is to be predicted, S_k denotes a particular enterprise relation type, and σ(z)_j denotes the probability, among all enterprise relation types, of the enterprise relation type to be predicted.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains an enterprise relation extraction program which, when executed by a processor, implements the steps of the enterprise relation extraction method according to any one of claims 1 to 5.
CN201711061205.0A 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium Active CN107943847B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711061205.0A CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium
PCT/CN2018/076119 WO2019085328A1 (en) 2017-11-02 2018-02-10 Enterprise relationship extraction method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711061205.0A CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium

Publications (2)

Publication Number Publication Date
CN107943847A true CN107943847A (en) 2018-04-20
CN107943847B CN107943847B (en) 2019-05-17

Family

ID=61934111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711061205.0A Active CN107943847B (en) 2017-11-02 2017-11-02 Business connection extracting method, device and storage medium

Country Status (2)

Country Link
CN (1) CN107943847B (en)
WO (1) WO2019085328A1 (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876044A (en) * 2018-06-25 2018-11-23 中国人民大学 Online content popularity prediction method based on knowledge-enhanced neural network
CN108920587A (en) * 2018-06-26 2018-11-30 清华大学 Open-domain visual question answering method and device fusing external knowledge
CN108985501A (en) * 2018-06-29 2018-12-11 平安科技(深圳)有限公司 Stock index prediction method, server and storage medium based on index feature extraction
CN109063032A (en) * 2018-07-16 2018-12-21 清华大学 Noise reduction method for remote supervision and retrieval data
CN109243616A (en) * 2018-06-29 2019-01-18 东华大学 Deep-learning-based joint relation extraction and structuring system for breast electronic medical records
CN109376250A (en) * 2018-09-27 2019-02-22 中山大学 Joint entity-relation extraction method based on reinforcement learning
CN109582956A (en) * 2018-11-15 2019-04-05 中国人民解放军国防科技大学 Text representation method and device applied to sentence embedding
CN109597851A (en) * 2018-09-26 2019-04-09 阿里巴巴集团控股有限公司 Feature extraction method and device based on incidence relation
CN109710768A (en) * 2019-01-10 2019-05-03 西安交通大学 Taxpayer industry two-level classification method based on MIMO recurrent neural network
CN110188202A (en) * 2019-06-06 2019-08-30 北京百度网讯科技有限公司 Training method, device and terminal for a semantic relation recognition model
CN110188201A (en) * 2019-05-27 2019-08-30 上海上湖信息技术有限公司 Information matching method and device
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervision relation extraction method and device
CN110427624A (en) * 2019-07-30 2019-11-08 北京百度网讯科技有限公司 Entity relation extraction method and device
CN110737758A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating a model
CN111476035A (en) * 2020-05-06 2020-07-31 中国人民解放军国防科技大学 Chinese open relation prediction method and device, computer equipment and storage medium
CN111581387A (en) * 2020-05-09 2020-08-25 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111680127A (en) * 2020-06-11 2020-09-18 暨南大学 Annual report-oriented company name and relationship extraction method
CN111784488A (en) * 2020-06-28 2020-10-16 中国工商银行股份有限公司 Enterprise capital risk prediction method and device
CN111950279A (en) * 2019-05-17 2020-11-17 百度在线网络技术(北京)有限公司 Entity relationship processing method, device, equipment and computer readable storage medium
CN112036181A (en) * 2019-05-14 2020-12-04 上海晶赞融宣科技有限公司 Entity relationship identification method and device and computer readable storage medium
CN112215288A (en) * 2020-10-13 2021-01-12 中国光大银行股份有限公司 Target enterprise category determination method and device, storage medium and electronic device
CN112418320A (en) * 2020-11-24 2021-02-26 杭州未名信科科技有限公司 Enterprise association relation identification method and device and storage medium
CN113486630A (en) * 2021-09-07 2021-10-08 浙江大学 Supply chain data vectorization and visualization processing method and device
CN113806538A (en) * 2021-09-17 2021-12-17 平安银行股份有限公司 Label extraction model training method, device, equipment and storage medium
CN116562303A (en) * 2023-07-04 2023-08-08 之江实验室 Reference resolution method and device for reference external knowledge

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619053A (en) * 2019-09-18 2019-12-27 北京百度网讯科技有限公司 Training method of entity relation extraction model and method for extracting entity relation
CN110879938A (en) * 2019-11-14 2020-03-13 中国联合网络通信集团有限公司 Text emotion classification method, device, equipment and storage medium
CN111382843B (en) * 2020-03-06 2023-10-20 浙江网商银行股份有限公司 Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217393A1 (en) * 2013-09-12 2016-07-28 Hewlett-Packard Development Company, L.P. Information extraction
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106407211A (en) * 2015-07-30 2017-02-15 富士通株式会社 Method and device for classifying semantic relationships among entity words
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106855853A (en) * 2016-12-28 2017-06-16 成都数联铭品科技有限公司 Entity relation extraction system based on deep neural network
CN107220237A (en) * 2017-05-24 2017-09-29 南京大学 Business entity relation extraction method based on convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194422A (en) * 2017-06-19 2017-09-22 中国人民解放军国防科学技术大学 Convolutional neural network relation classification method combining forward and reverse instances

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
LEI MENG ET AL: "An Improved Method for Chinese Company Name and Abbreviation Recognition", 《KNOWLEDGE MANAGEMENT IN ORGANIZATIONS》 *
PENG ZHOU ET AL: "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification", 《PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *
XIAOYUN HOU ET AL: "Classifying Relation via Bidirectional Recurrent Neural Network Based on Local Information", 《WEB TECHNOLOGIES AND APPLICATIONS》 *
YONGHUI WU ET AL: "Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network", 《STUDIES IN HEALTH TECHNOLOGY & INFORMATICS》 *
HU XINCHEN: "Research on Semantic Relation Classification Based on LSTM", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY》 *
GUO XIYUE ET AL: "Chinese Entity Relation Extraction Based on Syntactic and Semantic Features", 《JOURNAL OF CHINESE INFORMATION PROCESSING》 *
HUANG BEIJING ET AL: "Research on Denoising in Distantly Supervised Person Relation Extraction", 《COMPUTER APPLICATIONS AND SOFTWARE》 *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876044A (en) * 2018-06-25 2018-11-23 中国人民大学 Online content popularity prediction method based on knowledge-enhanced neural network
CN108876044B (en) * 2018-06-25 2021-02-26 中国人民大学 Online content popularity prediction method based on knowledge-enhanced neural network
CN108920587A (en) * 2018-06-26 2018-11-30 清华大学 Open-domain visual question answering method and device fusing external knowledge
CN108985501A (en) * 2018-06-29 2018-12-11 平安科技(深圳)有限公司 Stock index prediction method, server and storage medium based on index feature extraction
CN109243616A (en) * 2018-06-29 2019-01-18 东华大学 Deep-learning-based joint relation extraction and structuring system for breast electronic medical records
CN108985501B (en) * 2018-06-29 2022-04-29 平安科技(深圳)有限公司 Index feature extraction-based stock index prediction method, server and storage medium
CN110737758B (en) * 2018-07-03 2022-07-05 百度在线网络技术(北京)有限公司 Method and apparatus for generating a model
CN110737758A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating a model
US11501182B2 (en) 2018-07-03 2022-11-15 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating model
CN109063032A (en) * 2018-07-16 2018-12-21 清华大学 Noise reduction method for remote supervision and retrieval data
CN109063032B (en) * 2018-07-16 2020-09-11 清华大学 Noise reduction method for remote supervision and retrieval data
CN109597851B (en) * 2018-09-26 2023-03-21 创新先进技术有限公司 Feature extraction method and device based on incidence relation
CN109597851A (en) * 2018-09-26 2019-04-09 阿里巴巴集团控股有限公司 Feature extraction method and device based on incidence relation
CN109376250A (en) * 2018-09-27 2019-02-22 中山大学 Joint entity-relation extraction method based on reinforcement learning
CN109582956A (en) * 2018-11-15 2019-04-05 中国人民解放军国防科技大学 Text representation method and device applied to sentence embedding
CN109710768B (en) * 2019-01-10 2020-07-28 西安交通大学 Taxpayer industry two-level classification method based on MIMO recurrent neural network
CN109710768A (en) * 2019-01-10 2019-05-03 西安交通大学 Taxpayer industry two-level classification method based on MIMO recurrent neural network
CN112036181A (en) * 2019-05-14 2020-12-04 上海晶赞融宣科技有限公司 Entity relationship identification method and device and computer readable storage medium
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervision relation extraction method and device
CN110209836B (en) * 2019-05-17 2022-04-26 北京邮电大学 Remote supervision relation extraction method and device
CN111950279A (en) * 2019-05-17 2020-11-17 百度在线网络技术(北京)有限公司 Entity relationship processing method, device, equipment and computer readable storage medium
CN110188201A (en) * 2019-05-27 2019-08-30 上海上湖信息技术有限公司 Information matching method and device
CN110188202A (en) * 2019-06-06 2019-08-30 北京百度网讯科技有限公司 Training method, device and terminal for a semantic relation recognition model
CN110427624B (en) * 2019-07-30 2023-04-25 北京百度网讯科技有限公司 Entity relation extraction method and device
CN110427624A (en) * 2019-07-30 2019-11-08 北京百度网讯科技有限公司 Entity relation extraction method and device
CN111476035B (en) * 2020-05-06 2023-09-05 中国人民解放军国防科技大学 Chinese open relation prediction method, device, computer equipment and storage medium
CN111476035A (en) * 2020-05-06 2020-07-31 中国人民解放军国防科技大学 Chinese open relation prediction method and device, computer equipment and storage medium
CN111581387B (en) * 2020-05-09 2022-10-11 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111581387A (en) * 2020-05-09 2020-08-25 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111680127A (en) * 2020-06-11 2020-09-18 暨南大学 Annual report-oriented company name and relationship extraction method
CN111784488B (en) * 2020-06-28 2023-08-01 中国工商银行股份有限公司 Enterprise fund risk prediction method and device
CN111784488A (en) * 2020-06-28 2020-10-16 中国工商银行股份有限公司 Enterprise capital risk prediction method and device
CN112215288A (en) * 2020-10-13 2021-01-12 中国光大银行股份有限公司 Target enterprise category determination method and device, storage medium and electronic device
CN112215288B (en) * 2020-10-13 2024-04-30 中国光大银行股份有限公司 Method and device for determining category of target enterprise, storage medium and electronic device
CN112418320A (en) * 2020-11-24 2021-02-26 杭州未名信科科技有限公司 Enterprise association relation identification method and device and storage medium
CN112418320B (en) * 2020-11-24 2024-01-19 杭州未名信科科技有限公司 Enterprise association relation identification method, device and storage medium
CN113486630A (en) * 2021-09-07 2021-10-08 浙江大学 Supply chain data vectorization and visualization processing method and device
CN113806538A (en) * 2021-09-17 2021-12-17 平安银行股份有限公司 Label extraction model training method, device, equipment and storage medium
CN113806538B (en) * 2021-09-17 2023-08-22 平安银行股份有限公司 Label extraction model training method, device, equipment and storage medium
CN116562303A (en) * 2023-07-04 2023-08-08 之江实验室 Reference resolution method and device for reference external knowledge
CN116562303B (en) * 2023-07-04 2023-11-21 之江实验室 Reference resolution method and device for reference external knowledge

Also Published As

Publication number Publication date
CN107943847B (en) 2019-05-17
WO2019085328A1 (en) 2019-05-09

Similar Documents

Publication Publication Date Title
CN107943847B (en) Business connection extracting method, device and storage medium
CN107330011B (en) Named entity recognition method and device with multi-strategy fusion
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
CN110489555A (en) Language model pre-training method combining word-class information
CN108563703A (en) Charge determination method and device, computer equipment and storage medium
CN107729309A (en) Deep-learning-based Chinese semantic analysis method and device
CN106202010A (en) Method and apparatus for building legal-text syntax trees based on deep neural networks
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium
CN108647191B (en) Sentiment dictionary construction method based on supervised sentiment text and word vector
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
CN110222184A (en) Text emotion information recognition method and related apparatus
CN110502626A (en) Aspect-level sentiment analysis method based on convolutional neural networks
CN113378970B (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN110059924A (en) Contract clause checking method, device, equipment and computer-readable storage medium
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN115392237B (en) Emotion analysis model training method, device, equipment and storage medium
CN115357719A (en) Power audit text classification method and device based on improved BERT model
CN108920446A (en) Engineering document processing method
CN114528398A (en) Emotion prediction method and system based on interactive double-graph convolutional network
CN113204967A (en) Resume named entity identification method and system
CN103678318A (en) Multi-word unit extraction method and equipment and artificial neural network training method and equipment
CN107943788A (en) Enterprise abbreviation generation method, device and storage medium
CN116681082A (en) Discrete text semantic segmentation method, device, equipment and storage medium
CN117290515A (en) Training method of text annotation model, method and device for generating text graph
CN113688232B (en) Method and device for classifying bid-inviting text, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant