CN105975456A - Enterprise entity name analysis and identification system - Google Patents
Enterprise entity name analysis and identification system
- Publication number: CN105975456A
- Application number: CN201610286191.1A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to the field of natural language processing, and in particular to an enterprise entity name analysis and identification system comprising a bidirectional recurrent neural network module. The system automatically learns features of the basic elements of a text, such as characters, words and punctuation marks, and uses a bidirectionally propagating RNN so that the classification of each element of the natural-language sequence to be recognized depends on its context, yielding higher accuracy of extraction and judgment.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to an enterprise entity name analysis and identification system.
Background art
With the rapid development of the Internet, a huge amount of public web data has been created, which in turn has fostered a variety of new industries based on big-data technology, such as Internet healthcare, Internet education, and enterprise or personal credit reporting. The rise of these Internet industries depends on the analysis of massive amounts of information, and the value of such analysis lies in accuracy and timeliness: new information must be found quickly. However, most of the data obtained directly from web pages is unstructured, so before it can be used it must be cleaned, a task on which major companies spend a great deal of time and effort. Within data cleaning, the extraction of specific information, and in particular of named entities, is a constantly recurring task; in enterprise credit reporting, for example, the most common task is to extract company names from long passages of text.
Besides names that follow the common pattern "province/city + keyword + industry + organization type", there are many exceptions: a company name may not begin with a province or city, and in informal text it may appear in simplified or abbreviated form. As a result, information analysis carried out with traditional rule-based methods suffers from low recall. Moreover, as the market economy flourishes, newly registered enterprises appear continuously, and these new market entities in turn appear in all kinds of network data and media news. Finding and extracting new organization names quickly and accurately from massive web-page information is of particular importance for the timely analysis of related issues.
Traditional natural language processing methods use conditional random fields (CRF) to model text as a sequence, analyzing and identifying the text to find company names. To use a conditional random field, feature templates must first be designed according to the characteristics of the entities to be identified; such templates include state features like first-order words or higher-order phrases within a context window of a specified size, word prefixes and suffixes, and part-of-speech tags. Constructing feature templates is very time-consuming, the recognition result depends heavily on the templates, and manually designed templates usually reflect only the characteristics of part of the samples, so they generalize poorly. CRFs can typically use only local context, the feature templates are applied independently of one another, predictions cannot depend on long-range historical state, and there is no way to use feedback from the more distant future to correct possible past mistakes; the prediction process is laborious, and the result is hard to make globally optimal.
To analyze massive information in a timely manner and discover the new information subjects within it, a system that can promptly find and collect new company names is of great value.
Summary of the invention
The object of the present invention is to overcome the above-mentioned deficiencies of the prior art. The present invention provides an enterprise entity name analysis and identification system that trains a bidirectional recurrent neural network on samples labeled from existing company-name data, uses the recurrent neural network to predict the enterprise entity names in a text, finds the company names in the text to be processed, and further extracts new company names.
To achieve the above object, the invention provides the following technical scheme:
An enterprise entity name analysis and identification system, the system comprising a bidirectional recurrent neural network module. The system trains the bidirectional recurrent neural network with training samples labeled using the company names stored in an existing company-name database; after training, the bidirectional recurrent neural network recognizes the company names in the text to be identified, and names that do not belong to the existing set are extracted as new company names. When the system labels training samples with the company names stored in the existing database, each company name in a sample is segmented and labeled as a beginning part, middle parts and an end part, while everything that does not belong to a company name is labeled as irrelevant.
Specifically, the bidirectional recurrent neural network module uses a forward algorithm with the following quantities: $I$ is the dimension of the vectorized character or word, $H$ is the number of hidden-layer neurons, and $K$ is the number of output-layer neurons; $\vec{a}_t^{\,h}$ is the input of hidden neuron $h$ at time $t$ during the forward pass, and $\overleftarrow{a}_t^{\,h}$ during the backward pass; $\vec{b}_t^{\,h}$ and $\overleftarrow{b}_t^{\,h}$ are the corresponding hidden-neuron outputs; $\theta(\cdot)$ is the nonlinear activation function of the hidden neurons; $a_t^k$ is the input and $y_t^k$ the output of output neuron $k$ at time $t$, where $y_t^k$ is a probability value representing the ratio of the $k$-th neuron's output to the sum over all $K$ output values; $\vec{b}_0$ and $\overleftarrow{b}_{T+1}$ are vectors whose every component is 0, where $T$ is the length of the input word sequence.
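The forward-algorithm formulas themselves appear in the patent only as images. Under the symbol definitions above they correspond to the standard bidirectional RNN forward pass, which can be sketched as follows; the weight names $w_{ih}$, $w_{h'h}$, $w_{hk}$, $\tilde{w}_{hk}$ and the use of separate output weights per direction are assumptions, not a quotation of the patent's formula:

```latex
% Forward hidden pass, t = 1..T, with \vec{b}_0 = \mathbf{0}:
\vec{a}_t^{\,h} = \sum_{i=1}^{I} w_{ih}\, x_t^i + \sum_{h'=1}^{H} w_{h'h}\, \vec{b}_{t-1}^{\,h'},
\qquad \vec{b}_t^{\,h} = \theta\!\left(\vec{a}_t^{\,h}\right)

% Backward hidden pass, t = T..1, with \overleftarrow{b}_{T+1} = \mathbf{0}:
\overleftarrow{a}_t^{\,h} = \sum_{i=1}^{I} w_{ih}\, x_t^i + \sum_{h'=1}^{H} w_{h'h}\, \overleftarrow{b}_{t+1}^{\,h'},
\qquad \overleftarrow{b}_t^{\,h} = \theta\!\left(\overleftarrow{a}_t^{\,h}\right)

% Output layer combines both directions; y_t^k is a softmax probability:
a_t^k = \sum_{h=1}^{H} w_{hk}\, \vec{b}_t^{\,h} + \sum_{h=1}^{H} \tilde{w}_{hk}\, \overleftarrow{b}_t^{\,h},
\qquad
y_t^k = \frac{\exp\left(a_t^k\right)}{\sum_{k'=1}^{K} \exp\left(a_t^{k'}\right)}
```

The softmax form of $y_t^k$ matches the definition above: the $k$-th output relative to the sum of all $K$ outputs.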
When predicting the class of the input vector at each time step, the bidirectional recurrent neural network combines the output signals of the hidden-layer neurons at that time step from both the forward and the backward propagation. During both passes, the input signal of each hidden-layer neuron at a given time step includes, in addition to the vectorized character or word signal, the output signal of the hidden-layer neurons at the previous time step.
From the prediction result of the bidirectional recurrent neural network, the system extracts as a company name the words corresponding to an adjacent run consisting of a company-name beginning part, K middle parts and an end part, where K is an integer ≥ 0.
Further, the system comprises a word segmentation module that segments the existing company names and the texts to be processed, the texts to be processed including both the training samples and the texts to be identified.
As a preference, the word segmentation module is the stanford-segmenter.
Further, the system comprises a dictionary mapping module that converts the segmented characters, words or punctuation marks of the text to be identified into vector data before they are input into the bidirectional recurrent neural network.
Further, the recurrent neural network module runs on a computer, server or mobile intelligent terminal loaded with a program implementing the above functions.
Further, the system as a whole runs on a computer, server or mobile intelligent terminal loaded with the above program.
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention provides an enterprise entity name analysis and identification system that trains the bidirectional recurrent neural network on samples labeled from existing company-name data, predicts the enterprise entity names in a text with the recurrent neural network, finds the company names in the text to be processed, and further extracts new company names. In the forward algorithm, the text sequence is first input into the recurrent neural network from head to tail, and then input again in reverse from tail to head; during both passes, the input of the bidirectional recurrent neural network at each time step also includes the network's output signal from the previous time step. The prediction of an enterprise entity name therefore depends on both the preceding and the following context, the prediction achieves global optimization, and recognition is more reliable. Moreover, with a bidirectional recurrent neural network there is no need to design feature templates by hand, which saves labor and generalizes better: company names can be found and extracted in texts of all kinds, and the recall of recognition is significantly higher than with traditional rule-based processing methods. On the basis of the names it finds, the invention compares against the existing company-name database, treats any name absent from the existing data as a newly discovered company name, and adds it to the database. With the system of the invention, new company names can be found quickly in massive Internet data, providing a powerful tool for the timely capture of relevant information.
Brief description of the drawings:
Fig. 1 is a diagram of the connections between the functional modules of the enterprise entity name analysis and identification system.
Fig. 2 is a diagram of the steps by which the system identifies enterprise entity names.
Fig. 3 is a diagram of the signal flow in Embodiment 1 of the system.
It should be understood that the drawings of the present description are only schematic and do not represent a real embodiment.
Detailed description of the invention
The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the present invention to the following examples; all techniques realized on the basis of the present invention fall within the scope of the invention.
An enterprise entity name analysis and identification system is provided. The system trains its bidirectional recurrent neural network module on samples labeled from existing company-name data, predicts the enterprise entity names in a text with the recurrent neural network, and finds the company names in the text to be processed; on the basis of the analyzed names, it compares against the existing company-name database and stores names not included in the existing set into the database as new company names. Because the system labels its training samples automatically from the data in the existing company-name database, the labor cost of manual annotation during the use of the neural network is greatly reduced, and the use of the network is much simplified. Furthermore, because the bidirectional recurrent neural network module predicts an enterprise entity name using both the preceding and the following context, the prediction achieves global optimization and recognition is more reliable; no feature templates need to be designed by hand, new company names can be found and extracted in texts of all kinds, and the timely analysis of relevant information is technically supported.
An enterprise entity name analysis and identification system, the system comprising a bidirectional recurrent neural network module. The system trains the bidirectional recurrent neural network with training samples labeled using the company names stored in an existing company-name database; after training, the network recognizes the company names in the text to be identified, and names not in the existing set are extracted as new company names. When labeling training samples with the stored company names, the system segments each company name in a sample into a beginning part, middle parts and an end part, and labels everything that does not belong to a company name as irrelevant. From the prediction result of the bidirectional recurrent neural network, the system extracts as a company name the words corresponding to an adjacent run of a beginning part, K middle parts and an end part, where K is an integer ≥ 0.
The system realizes automatic analysis of new enterprise entity names through the following steps, as shown in Fig. 2:
(1) A number of texts containing company names (for example 5000) are chosen, and the company-name fields in the texts are labeled automatically using the existing enterprise data; according to the structure of each name, the company name is segmented and labeled as a beginning part, middle parts and an end part, while all other parts not belonging to a company name are labeled as irrelevant. Specifically, each enterprise or organization name in the text is segmented and labeled B (beginning), M (middle part) and E (end part), and every character not belonging to an enterprise or organization is labeled N (non-name). Using letters or digits to label the word sequence is simple and easy to process, and is convenient for the subsequent sequence operations. Labeling samples automatically from existing enterprise data, and then training the neural network on them, greatly saves the labor and time cost of manual annotation and simplifies the application of neural network technology.
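Step (1) can be sketched as follows; the tokens, company names and the `label_tokens` helper are illustrative assumptions, not part of the patent:

```python
# Hypothetical sketch of the automatic labeling in step (1): each token of a
# known company name is tagged B (beginning), M (middle) or E (end); all other
# tokens are tagged N. Tokenization and names are illustrative assumptions.

def label_tokens(tokens, known_names):
    """Tag each token B/M/E if it falls inside a known company name, else N."""
    labels = ["N"] * len(tokens)
    for name in known_names:                 # name is a pre-segmented token tuple
        n = len(name)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) == tuple(name):
                labels[start] = "B"
                for j in range(start + 1, start + n - 1):
                    labels[j] = "M"
                labels[start + n - 1] = "E"
    return labels

tokens = ["据悉", "ABCD", "医疗", "投资", "管理", "有限公司", "成立", "。"]
known = [("ABCD", "医疗", "投资", "管理", "有限公司")]
print(label_tokens(tokens, known))   # ['N', 'B', 'M', 'M', 'M', 'E', 'N', 'N']
```

The letter sequence produced this way is exactly the kind of training target fed to the network in step (2).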
(2) The word sequences of the labeled training samples are input into the bidirectional recurrent neural network first in the forward and then in the reverse direction, and the network is trained. (Forward input means that the characters or words of the sequence are fed, in order from first to last, into the recurrent neural network at the corresponding time steps; reverse input means that they are fed in inverted order.) At each time step, the input signal of the bidirectional recurrent neural network also includes the network's output signal from the previous time step; the recursion stops once both the forward and the reverse information have been fully fed in.
(3) The word sequence of the document to be analyzed is input into the bidirectional recurrent neural network, which classifies the input word sequence and identifies the type (N, B, M or E) of each element to be extracted; the words corresponding to a B M E subsequence between two adjacent N labels in the classification result are extracted as one complete company name.
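The span-extraction rule of step (3), a beginning part followed by K ≥ 0 middle parts and an end part, can be sketched as a regular-expression match over the label string; the helper name and example tokens are assumptions:

```python
# Hedged sketch of step (3): after the network labels each token, the tokens
# covered by a B M* E run are joined into one complete company name.
import re

def extract_names(tokens, labels):
    """Return the token spans whose labels match B M* E (K >= 0 middle parts)."""
    tag_string = "".join(labels)
    names = []
    for m in re.finditer(r"BM*E", tag_string):
        names.append("".join(tokens[m.start():m.end()]))
    return names

tokens = ["公司", "拟", "设立", "ABCD", "新能源", "有限公司", "。"]
labels = ["N", "N", "N", "B", "M", "E", "N"]
print(extract_names(tokens, labels))  # ['ABCD新能源有限公司']
```

Because each label covers exactly one token, positions in the label string line up one-to-one with positions in the token list, which is what makes the regular-expression shortcut valid.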
(4) On the basis of extracting the company names from the text to be identified, the extracted names are compared with the company names stored in the existing company-name database, and names not included in the database are saved as new company names for use in data analysis.
Specifically, the bidirectional recurrent neural network module uses a forward algorithm in which $I$ is the dimension of each character or word of the word sequence after vectorization, $H$ is the number of hidden-layer neurons, and $K$ is the number of output-layer neurons. During forward input (the word sequence fed into the network from first to last), $\vec{a}_t^{\,h}$ is the input of hidden neuron $h$ of the bidirectional recurrent neural network at time $t$; the time index of the network corresponds to the position index of the input word sequence, so that, for example, the character or word in the third position of the sequence is input into the network at the third time step. During reverse input (the word sequence fed in backwards), $\overleftarrow{a}_t^{\,h}$ is the corresponding hidden-neuron input at time $t$. $\vec{b}_t^{\,h}$ is the output of hidden neuron $h$ at time $t$ during forward input, and $\overleftarrow{b}_t^{\,h}$ during reverse input; $\theta(\cdot)$ is the nonlinear activation function of the hidden neurons. $a_t^k$ is the input of output neuron $k$ at time $t$; it combines the output signals of the hidden-layer neurons at time $t$ from both the forward and the reverse pass, and the result propagates forward until the network outputs the classification result for that time step. The classification of the character or word at the current time step therefore combines the historical sequence information with the future sequence information, relying on the context of the whole text rather than on local information, so that the prediction reaches a global optimum. $y_t^k$ is the output of output neuron $k$ at time $t$: a probability value representing the ratio of the $k$-th neuron's output to the sum over all $K$ output values; the class corresponding to the neuron with the largest $y_t^k$ is usually taken as the final classification predicted by the bidirectional recurrent neural network at that time step. $\vec{b}_0$ and $\overleftarrow{b}_{T+1}$ are vectors whose every component is 0, where $T$ is the length of the input sequence.
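A minimal numpy sketch of the bidirectional forward pass just described, assuming tanh as the hidden activation, separate forward and backward weight matrices, and a softmax output; all sizes and weights are illustrative and not taken from the patent:

```python
# Minimal sketch: a forward hidden pass over t = 1..T, a backward pass over
# t = T..1, and a softmax output combining both directions at every step.
import numpy as np

def brnn_forward(x, W_in_f, W_rec_f, W_in_b, W_rec_b, W_out_f, W_out_b):
    T, I = x.shape
    H = W_rec_f.shape[0]
    h_f = np.zeros((T + 1, H))          # h_f[0] is the all-zero initial state
    h_b = np.zeros((T + 2, H))          # h_b[T+1] is the all-zero initial state
    for t in range(1, T + 1):           # forward direction
        h_f[t] = np.tanh(x[t - 1] @ W_in_f + h_f[t - 1] @ W_rec_f)
    for t in range(T, 0, -1):           # backward direction
        h_b[t] = np.tanh(x[t - 1] @ W_in_b + h_b[t + 1] @ W_rec_b)
    logits = h_f[1:T + 1] @ W_out_f + h_b[1:T + 1] @ W_out_b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # y[t, k]: per-step probabilities

rng = np.random.default_rng(0)
I, H, K, T = 4, 3, 4, 5                 # e.g. K = 4 classes: N, B, M, E
x = rng.normal(size=(T, I))
y = brnn_forward(x, rng.normal(size=(I, H)), rng.normal(size=(H, H)),
                 rng.normal(size=(I, H)), rng.normal(size=(H, H)),
                 rng.normal(size=(H, K)), rng.normal(size=(H, K)))
print(y.shape)                          # (5, 4); each row sums to 1
```

Each row of `y` is the per-class probability distribution for one time step; taking the argmax of each row yields the N/B/M/E label sequence used in step (3).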
When predicting company names, the system's bidirectional recurrent neural network first inputs the text sequence into the network from head to tail, and then inputs it again in reverse from tail to head. During both the forward and the reverse pass, the input signal of the bidirectional recurrent neural network at each time step includes the vectorized character or word of that step together with the network's output signal from the previous step; only during the reverse pass does the network output the classification result of the character or word of each time step. The prediction of an enterprise entity name thus depends on both the preceding and the following context, the prediction achieves global optimization, and recognition is more reliable. With a bidirectional recurrent neural network there is also no need to design feature templates by hand, which saves labor and generalizes better: company names can be found and extracted in texts of all kinds, and the recall of recognition is significantly higher than with traditional rule-based processing methods.
Further, the present invention uses the above forward algorithm to propagate data layer by layer through the bidirectional recurrent neural network and obtains the recognition (prediction) result at the output layer. When the prediction deviates from the annotation of the training sample, each weight of the network is adjusted by the classical error back-propagation algorithm, which distributes the error backwards, layer by layer, over all the neurons of each layer, obtains the error signal of every neuron, and then revises each neuron's weight. Passing data forward with the forward algorithm and gradually revising the weights with the backward algorithm is precisely the training process of the neural network; the process is repeated until the accuracy of the prediction reaches a set threshold, at which point training stops and the bidirectional recurrent neural network model is considered trained.
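The training loop just described, forward pass, comparison with the labeled sample, back-propagated weight update, repeat until an accuracy threshold, can be sketched schematically; a single softmax layer stands in for the full bidirectional network here, and the data, learning rate and threshold are illustrative assumptions:

```python
# Schematic training loop: forward pass, compare with labels, back-propagate
# the error to adjust the weights, repeat until accuracy reaches a threshold.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y_true = (X @ rng.normal(size=(6, 4))).argmax(axis=1)  # synthetic N/B/M/E labels

W = np.zeros((6, 4))
acc = 0.0
for epoch in range(500):
    logits = X @ W                                      # forward pass
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)                # softmax probabilities
    acc = float((p.argmax(axis=1) == y_true).mean())
    if acc >= 0.95:                                     # set accuracy threshold
        break
    grad = p.copy()
    grad[np.arange(len(y_true)), y_true] -= 1.0         # output-layer error signal
    W -= 0.1 * (X.T @ grad) / len(y_true)               # back-propagated update
print(acc > 0.5)
```

The `break` on the accuracy threshold mirrors the patent's stopping criterion: training halts once the prediction accuracy reaches the set value.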
Further, the system comprises a word segmentation module that segments the existing company names and the texts to be processed, the texts to be processed including both the training samples and the texts to be identified.
As a preference, the word segmentation module is the stanford-segmenter. Many segmentation tools are currently available, such as the stanford-segmenter, ICTCLAS, the Pangu segmenter and the Paoding segmenter; segmentation breaks longer text into relatively independent word units, discretizing and serializing the content to be processed and laying the foundation for the application of the recurrent neural network, and the stanford-segmenter gives good segmentation results.
Further, the system comprises a dictionary mapping module that converts the segmented characters, words or punctuation marks of the text to be identified into vector data before they are input into the bidirectional recurrent neural network. The dictionary mapping module contains a dictionary mapping table, which is a two-dimensional matrix in which each row vector corresponds to one character, word or punctuation mark; the correspondence between row vectors and characters, words or punctuation marks is fixed when the dictionary mapping table is constructed. (The system may use the functional module connection structure shown in Fig. 1.)
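The dictionary mapping module can be sketched as a lookup table whose rows are the vectors of the individual tokens; the class name, vector dimension and random initialization are illustrative assumptions:

```python
# Sketch of the dictionary mapping module: a two-dimensional table with one row
# vector per character, word or punctuation mark; the row-to-token
# correspondence is fixed when the table is built.
import numpy as np

class DictionaryMapper:
    def __init__(self, vocab, dim=8, seed=0):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        rng = np.random.default_rng(seed)
        self.table = rng.normal(size=(len(vocab), dim))  # one row per token

    def vectorize(self, tokens):
        """Map a token sequence to the matrix of its row vectors."""
        return self.table[[self.index[t] for t in tokens]]

mapper = DictionaryMapper(["公司", "拟", "设立", "子公司", "。"])
vecs = mapper.vectorize(["公司", "设立", "子公司"])
print(vecs.shape)   # (3, 8)
```

Each row of `vecs` plays the role of one of the vec-a, vec-b, ... row vectors shown in Fig. 3 of the patent.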
Further, the recurrent neural network module runs on a computer, server or mobile intelligent terminal loaded with a program implementing the above functions.
Further, the system runs on a computer, server or mobile intelligent terminal loaded with the above program. The computer, server or mobile intelligent terminal provides the hardware foundation for realizing the functions of the system.
Embodiment 1
The process by which the system discovers new organization names is as follows. Suppose the following news text is captured from the network: "XXXX announced on March 15 that the fifth meeting of the seventh board of directors of the company reviewed and approved the 'Proposal on the investment by the company and its wholly-owned subsidiary in the establishment of subsidiaries'. The six wholly-owned subsidiaries the company intends to establish are ABCD Medical Investment Management Co., Ltd., ABCD Medicine E-commerce Co., Ltd., ABCD Investment Fund Management Co., Ltd., ABCD New Energy Co., Ltd., ABCD Infrastructure Investment Co., Ltd., and ABCD Investment Co., Ltd. Investment amount: the total investment converts to approximately RMB 630 million."
After segmentation this becomes: "XXXX / March / 15 / announced / , / company / seventh / board of directors / fifth / meeting / , / reviewed / and approved / ' / on / company / and / wholly-owned / subsidiary / investment / establishment / subsidiary / proposal / ' / , / company / intends / establishment / six / wholly-owned / subsidiary / are respectively / AB / CD / medical / investment management / Co., Ltd. / , / AB / CD / medicine / e-commerce / Co., Ltd. / , / AB / CD / investment / fund management / Co., Ltd. / , / AB / CD / new energy / Co., Ltd. / , / AB / CD / infrastructure / investment / Co., Ltd. / , / AB / CD / investment / Co., Ltd. / . / investment / amount / : / total / investment / amount / equivalent / RMB / approximately / 630 million yuan / ."
The word sequence formed by segmentation is input into the bidirectional recurrent neural network, and after prediction by the network the output is: "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNMMMMENMMMMENMMMMENMMMENMMMMENMMMMENNNNNNNNNNNN". The word sequences corresponding to the segments MMMME, MMMME, MMMME, MMME, MMMME and MMMME of the classification sequence, namely "ABCD Medical Investment Management Co., Ltd.", "ABCD Medicine E-commerce Co., Ltd.", "ABCD Investment Fund Management Co., Ltd.", "ABCD New Energy Co., Ltd.", "ABCD Infrastructure Investment Co., Ltd." and "ABCD Investment Co., Ltd.", are extracted as complete company names. The signal flow by which this embodiment extracts the company names is shown in Fig. 3 (where vec-a, vec-b, vec-c, vec-d, vec-e, vec-f, vec-g, vec-h, vec-i, vec-j, vec-k, vec-l, vec-m ... vec-z, etc., represent row vectors of the two-dimensional matrix of the dictionary mapping table). The extracted company names are compared with the existing company-name database; if an extracted name is not present in the database, it is added to the database as a new enterprise, providing a basis for the related data analysis.
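The final comparison step of the embodiment amounts to a set difference against the existing database; the concrete names below are illustrative assumptions:

```python
# Sketch of the final step: extracted names not present in the existing
# company-name database are recorded as newly discovered names and added.
existing = {"ABCD投资有限公司"}
extracted = ["ABCD医疗投资管理有限公司", "ABCD投资有限公司", "ABCD新能源有限公司"]

new_names = [n for n in extracted if n not in existing]
existing.update(new_names)          # add the new names to the database
print(new_names)
```

In a deployment the `existing` set would be backed by the company-name database rather than an in-memory set, but the membership test and insertion are the same operations.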
Claims (10)
1. business entity's name analysis identification system, described system includes forward-backward recutrnce neural network module, and its feature exists
In: described system uses the training sample of the enterprise name mark of storage in existing enterprise's name database to train forward-backward recutrnce
Neutral net, the forward-backward recutrnce neural network recognization after having trained goes out the enterprise name in text to be identified, and will not belong to
Existing denominative enterprise name extracts as new firms title.
2. The system as claimed in claim 1, characterised in that: when the system annotates training samples with the enterprise names stored in the existing enterprise name database, the segmented enterprise names in a sample are labelled as beginning, middle and end parts, and words not belonging to any enterprise name are labelled as irrelevant parts.
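As an illustration of the labelling scheme in claim 2, a minimal sketch follows. The single-letter tags B/M/E/N are an assumed encoding of the four categories (beginning, middle, end, irrelevant); the claim names the categories but not their encoding, and the sample words are invented.

```python
# Sketch: tag each segmented word of a training sample with one of four labels,
# given the (start, end) word-index spans of the enterprise names it contains.

def label_sample(words, name_spans):
    """name_spans: list of (start, end) index pairs, inclusive on both ends."""
    labels = ["N"] * len(words)          # default: irrelevant part
    for start, end in name_spans:
        labels[start] = "B"              # beginning part
        for i in range(start + 1, end):
            labels[i] = "M"              # middle part(s)
        labels[end] = "E"                # end part
    return labels

words = ["Today", "ABCD", "New", "Energy", "Co.Ltd", "listed"]
# the enterprise name spans words 1..4 inclusive
tags = label_sample(words, [(1, 4)])     # ['N', 'B', 'M', 'M', 'E', 'N']
```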
3. The system as claimed in claim 2, characterised in that: the bidirectional recurrent neural network module uses the following forward-pass formulas:

$$\overrightarrow{a}_h^t = \sum_{i=1}^{I} w_{ih}\, x_i^t + \sum_{h'=1}^{H} \overrightarrow{w}_{h'h}\, \overrightarrow{b}_{h'}^{\,t-1}, \qquad \overrightarrow{b}_h^t = \theta\!\left(\overrightarrow{a}_h^t\right)$$

$$\overleftarrow{a}_h^t = \sum_{i=1}^{I} w_{ih}\, x_i^t + \sum_{h'=1}^{H} \overleftarrow{w}_{h'h}\, \overleftarrow{b}_{h'}^{\,t+1}, \qquad \overleftarrow{b}_h^t = \theta\!\left(\overleftarrow{a}_h^t\right)$$

$$a_k^t = \sum_{h=1}^{H} \overrightarrow{w}_{hk}\, \overrightarrow{b}_h^t + \sum_{h=1}^{H} \overleftarrow{w}_{hk}\, \overleftarrow{b}_h^t, \qquad y_k^t = \frac{\exp\left(a_k^t\right)}{\sum_{k'=1}^{K} \exp\left(a_{k'}^t\right)}$$

where I is the dimension of the vectorised word or character, H is the number of hidden-layer neurons, and K is the number of output-layer neurons; $\overrightarrow{a}_h^t$ is the input of hidden-layer neuron h of the bidirectional recurrent neural network at time t in the forward pass, and $\overleftarrow{a}_h^t$ is the corresponding input at time t in the backward pass; $\overrightarrow{b}_h^t$ and $\overleftarrow{b}_h^t$ are the outputs of hidden-layer neuron h at time t in the forward and backward passes respectively; θ(·) is the nonlinear activation function of the hidden-layer neurons; $a_k^t$ is the input of output-layer neuron k at time t and $y_k^t$ is its output, a probability value representing the ratio of the k-th neuron's output to the sum of all K neuron outputs; $\overrightarrow{b}_h^0$ and $\overleftarrow{b}_h^{T+1}$ are vectors in which every component is 0, where T is the length of the input word sequence.
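The forward pass of claim 3 can be sketched in NumPy as follows. This is not the patent's implementation: the weight shapes, the use of shared input weights for both directions, and tanh as the activation θ are all my assumptions; only the structure (a forward sweep, a backward sweep with zero initial states, and a softmax output combining both hidden layers) follows the claim.

```python
import numpy as np

def brnn_forward(x, W_in, W_fwd, W_bwd, W_out_f, W_out_b):
    """Bidirectional RNN forward pass. x: (T, I) inputs; returns (T, K) softmax outputs."""
    T, _ = x.shape
    H = W_fwd.shape[0]
    b_f = np.zeros((T + 2, H))   # b_f[0] is the all-zero forward initial state
    b_b = np.zeros((T + 2, H))   # b_b[T+1] is the all-zero backward initial state
    for t in range(1, T + 1):    # forward sweep: t = 1 .. T
        b_f[t] = np.tanh(x[t - 1] @ W_in + b_f[t - 1] @ W_fwd)
    for t in range(T, 0, -1):    # backward sweep: t = T .. 1
        b_b[t] = np.tanh(x[t - 1] @ W_in + b_b[t + 1] @ W_bwd)
    # output layer combines both hidden layers, then softmax over K classes
    a_out = b_f[1:T + 1] @ W_out_f + b_b[1:T + 1] @ W_out_b
    e = np.exp(a_out - a_out.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # each row sums to 1

# example dimensions: T=4 steps, I=3 input dims, H=5 hidden neurons, K=2 classes
rng = np.random.default_rng(0)
y = brnn_forward(rng.standard_normal((4, 3)),
                 rng.standard_normal((3, 5)), rng.standard_normal((5, 5)),
                 rng.standard_normal((5, 5)), rng.standard_normal((5, 2)),
                 rng.standard_normal((5, 2)))  # y.shape == (4, 2)
```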
4. The system as claimed in claim 3, characterised in that: when the bidirectional recurrent neural network predicts the category of the input vector data at each time step, it combines the output signals of the hidden-layer neurons from both the forward and backward passes at that time step; in both passes, the input signal of the hidden-layer neurons at each time step comprises, in addition to the vectorised word or character signal, the output signal of the hidden-layer neurons at the previous time step.
5. The system as claimed in claim 4, characterised in that: the system extracts, as one enterprise name, the words corresponding to adjacent labels in the bidirectional recurrent neural network's prediction result that form a beginning part, one or more middle parts and an end part of an enterprise name.
6. The system as claimed in any one of claims 1 to 5, characterised in that: the system comprises a word-segmentation module, which segments the existing enterprise names and the text to be processed; the text to be processed includes the training samples and the text to be identified.
7. The system as claimed in claim 6, characterised in that: the word-segmentation module is the stanford-segmenter.
8. The system as claimed in claim 6, characterised in that: the system comprises a dictionary mapping module, which converts the words, characters or punctuation obtained by segmenting the text to be identified into vector data before inputting them into the bidirectional recurrent neural network.
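The dictionary mapping of claim 8 amounts to a row-vector lookup in a two-dimensional matrix (the vec-a, vec-b, ... rows of the embodiment's Figure 3). A minimal sketch follows; the random initial vectors and the shared unknown-token row are my assumptions for illustration, not details from the patent.

```python
import numpy as np

def build_table(vocab, dim, seed=0):
    """Map each vocabulary token to one row of a (len(vocab)+1, dim) matrix."""
    rng = np.random.default_rng(seed)
    matrix = rng.standard_normal((len(vocab) + 1, dim))  # last row: unknown token
    index = {tok: i for i, tok in enumerate(vocab)}
    return matrix, index

def tokens_to_vectors(tokens, matrix, index):
    """Convert segmented tokens to the vector data fed into the network."""
    unk = len(matrix) - 1
    return np.stack([matrix[index.get(t, unk)] for t in tokens])

matrix, index = build_table(["ABCD", "Investment", "Co.Ltd"], dim=4)
vecs = tokens_to_vectors(["ABCD", "Co.Ltd", "???"], matrix, index)  # shape (3, 4)
```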
9. The system as claimed in claim 8, characterised in that: the recurrent neural network module is a computer, server or mobile intelligent terminal loaded with a program implementing the functions of any one of claims 1 to 4.
10. The system as claimed in claim 9, characterised in that: the system is a computer, server or mobile intelligent terminal loaded with a program implementing the functions of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610286191.1A CN105975456A (en) | 2016-05-03 | 2016-05-03 | Enterprise entity name analysis and identification system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610286191.1A CN105975456A (en) | 2016-05-03 | 2016-05-03 | Enterprise entity name analysis and identification system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105975456A true CN105975456A (en) | 2016-09-28 |
Family
ID=56994292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610286191.1A Pending CN105975456A (en) | 2016-05-03 | 2016-05-03 | Enterprise entity name analysis and identification system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105975456A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9263036B1 (en) * | 2012-11-29 | 2016-02-16 | Google Inc. | System and method for speech recognition using deep recurrent neural networks |
CN104615983A (en) * | 2015-01-28 | 2015-05-13 | 中国科学院自动化研究所 | Behavior identification method based on recurrent neural network and human skeleton movement sequences |
CN104615589A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Named-entity recognition model training method and named-entity recognition method and device |
CN104952448A (en) * | 2015-05-04 | 2015-09-30 | 张爱英 | Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks |
Non-Patent Citations (3)
Title |
---|
ALEX GRAVES ET AL.: "Speech recognition with deep recurrent neural networks", 2013 IEEE International Conference on Acoustics * |
JASON P.C. CHIU ET AL.: "Named Entity Recognition with Bidirectional LSTM-CNNs", arXiv:1511.08308v1 * |
HU XINCHEN: "Research on semantic relation classification based on LSTM", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779467A (en) * | 2016-12-31 | 2017-05-31 | 成都数联铭品科技有限公司 | Enterprises ' industry categorizing system based on automatic information screening |
CN106777336A (en) * | 2017-01-13 | 2017-05-31 | 深圳爱拼信息科技有限公司 | A kind of exabyte composition extraction system and method based on deep learning |
WO2019041529A1 (en) * | 2017-08-31 | 2019-03-07 | 平安科技(深圳)有限公司 | Method, electronic apparatus, and computer readable storage medium for identifying company as subject of news report |
CN109165275A (en) * | 2018-07-24 | 2019-01-08 | 国网浙江省电力有限公司电力科学研究院 | Intelligent substation operation order information intelligent search matching process based on deep learning |
CN109165275B (en) * | 2018-07-24 | 2021-03-02 | 国网浙江省电力有限公司电力科学研究院 | Intelligent substation operation ticket information intelligent search matching method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105976056A (en) | Information extraction system based on bidirectional RNN | |
CN105955952A (en) | Information extraction method based on bidirectional recurrent neural network | |
CN105975555A (en) | Enterprise abbreviation extraction method based on bidirectional recurrent neural network | |
CN111325029B (en) | Text similarity calculation method based on deep learning integrated model | |
CN105975455A (en) | information analysis system based on bidirectional recurrent neural network | |
CN106886580B (en) | Image emotion polarity analysis method based on deep learning | |
CN107943911A (en) | Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing | |
CN113312461A (en) | Intelligent question-answering method, device, equipment and medium based on natural language processing | |
CN111309910A (en) | Text information mining method and device | |
CN103207855A (en) | Fine-grained sentiment analysis system and method specific to product comment information | |
CN106682089A (en) | RNNs-based method for automatic safety checking of short message | |
US11741318B2 (en) | Open information extraction from low resource languages | |
CN105975457A (en) | Information classification prediction system based on full-automatic learning | |
CN105975456A (en) | Enterprise entity name analysis and identification system | |
CN109783644A (en) | A kind of cross-cutting emotional semantic classification system and method based on text representation study | |
Li et al. | Event extraction for criminal legal text | |
WO2023108985A1 (en) | Method for recognizing proportion of green asset and related product | |
CN112257444B (en) | Financial information negative entity discovery method, device, electronic equipment and storage medium | |
CN112685513A (en) | Al-Si alloy material entity relation extraction method based on text mining | |
CN112561718A (en) | Case microblog evaluation object emotion tendency analysis method based on BilSTM weight sharing | |
CN115309864A (en) | Intelligent sentiment classification method and device for comment text, electronic equipment and medium | |
CN114648029A (en) | Electric power field named entity identification method based on BiLSTM-CRF model | |
Ruan et al. | Effective learning model of user classification based on ensemble learning algorithms | |
CN115510188A (en) | Text keyword association method, device, equipment and storage medium | |
Touati-Hamad et al. | Arabic quran verses authentication using deep learning and word embeddings |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160928 |