CN105975456A - Enterprise entity name analysis and identification system - Google Patents

Enterprise entity name analysis and identification system

Info

Publication number
CN105975456A
CN105975456A (application CN201610286191.1A)
Authority
CN
China
Prior art keywords
word
enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610286191.1A
Other languages
Chinese (zh)
Inventor
刘世林
何宏靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN201610286191.1A priority Critical patent/CN105975456A/en
Publication of CN105975456A publication Critical patent/CN105975456A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of natural language processing, and in particular to an enterprise entity name analysis and identification system comprising a bidirectional recurrent neural network module. The system automatically learns features of the basic elements of the text, such as characters, words and punctuation marks, and applies a bidirectionally propagating RNN so that the classification of the natural language sequence to be recognized depends on its context, giving a higher accuracy of extraction and judgment.

Description

Enterprise entity name analysis and identification system
Technical field
The present invention relates to the field of natural language processing, and in particular to an enterprise entity name analysis and identification system.
Background technology
With the rapid development of the Internet, a huge amount of publicly available web data has been produced, which in turn has fostered new industries based on big data technology, such as Internet healthcare, Internet education, and enterprise or personal credit reporting. The prosperity of these Internet industries depends on analysing large volumes of information, and the value of that analysis lies in being accurate and timely: new information must be found quickly. However, most of the data obtained directly from web pages is unstructured; in order to use it, data cleaning has become the part of the work on which major companies spend the most time and effort. Within data cleaning, the extraction of specific information, and in particular of named entities, is a recurring task: in enterprise credit reporting, for example, the most common task is to extract company names from long passages of text.
Besides the common naming rule of "province/city + keyword + industry + organization type", there are many exceptions: a company name may not start with a province or city, and in informal text it may appear in simplified or abbreviated form, which directly leads to low recall when information analysis is performed in the traditional way. In addition, with the prosperity of the market economy, newly established enterprises appear constantly, and these new market entities show up in all kinds of network data and media news; finding and extracting new organization names quickly and accurately from massive amounts of web information is of particular importance for the timeliness of the related analysis.
Traditional natural language processing methods use a conditional random field (CRF) to model the text as a sequence, analysing and identifying it to find company names. Using a CRF first requires designing and building feature templates according to the characteristics of the entities to be identified; the templates include state features such as first-order words or higher-order phrases within a context window of a specified size, word prefixes and suffixes, and part-of-speech tags. Building the feature templates is very time-consuming and laborious, the recognition result depends heavily on the templates, and manually designed templates often reflect only the characteristics of part of the samples, so they generalize poorly. Moreover, typically only local context can be used and each feature template is used independently, so predictions cannot rely on long-range historical state information, nor can feedback from further in the future be used to correct historical mistakes; the prediction process is slow and laborious, and the result can hardly reach a global optimum.
In order to analyse massive information in a timely and incisive manner and to discover newly emerging information subjects, it is of great value to develop a system that can promptly find and collect new enterprise names.
Summary of the invention
The object of the present invention is to overcome the above-mentioned deficiencies of the prior art. The present invention provides an enterprise entity name analysis and identification system that uses samples labelled from existing enterprise name data to train a bidirectional recurrent neural network, predicts the enterprise entity names in text by means of the recurrent neural network, finds the enterprise names in the text to be processed, and further extracts new enterprise names.
In order to achieve the above objective, the invention provides the following technical solution:
An enterprise entity name analysis and identification system, the system comprising a bidirectional recurrent neural network module. The system uses training samples labelled with the enterprise names stored in an existing enterprise name database to train the bidirectional recurrent neural network; after training, the bidirectional recurrent neural network recognizes the enterprise names in the text to be identified, and the enterprise names that do not belong to the existing names are extracted as new enterprise names. When the system labels the training samples using the enterprise names stored in the existing enterprise name database, each enterprise name in a sample is labelled segment by segment as a beginning part, middle parts and an end part, and everything not belonging to an enterprise name is labelled as an irrelevant part.
Specifically, the bidirectional recurrent neural network module uses the following forward-pass formulas:
$$a_{\overrightarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overrightarrow{h}}\, x_i^{t} + \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h}\overrightarrow{h'}}\, b_{\overrightarrow{h'}}^{\,t-1}$$

$$b_{\overrightarrow{h}}^{\,t} = \theta\!\left(a_{\overrightarrow{h}}^{\,t}\right)$$

$$a_{\overleftarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overleftarrow{h}}\, x_i^{t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h}\overleftarrow{h'}}\, b_{\overleftarrow{h'}}^{\,t+1}$$

$$b_{\overleftarrow{h}}^{\,t} = \theta\!\left(a_{\overleftarrow{h}}^{\,t}\right)$$

$$a_{k}^{t} = \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h'}k}\, b_{\overrightarrow{h'}}^{\,t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h'}k}\, b_{\overleftarrow{h'}}^{\,t}$$

$$y_{k}^{t} = \frac{\exp\!\left(a_{k}^{t}\right)}{\sum_{k'}^{K} \exp\!\left(a_{k'}^{t}\right)}$$
I is the dimension of a vectorized word or character, H is the number of hidden-layer neurons and K is the number of output-layer neurons, where $a_{\overrightarrow{h}}^{\,t}$ is the input of the hidden-layer neurons of the bidirectional recurrent neural network at time t during forward input, $a_{\overleftarrow{h}}^{\,t}$ is the input of the hidden-layer neurons at time t during reverse input, $b_{\overrightarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during forward input, $b_{\overleftarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during reverse input, $\theta(\cdot)$ is the nonlinear excitation function of the hidden-layer neurons, $a_{k}^{t}$ is the input of output-layer neuron k at time t, and $y_{k}^{t}$ is its output; $y_{k}^{t}$ is a probability value that represents the ratio of the output of the k-th neuron to the sum of the outputs of the K output neurons. $b_{\overrightarrow{h'}}^{\,0}$ and $b_{\overleftarrow{h'}}^{\,T+1}$ are vectors whose components are all 0, where T is the length of the input word sequence.
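For illustration only, the following NumPy sketch implements the forward pass defined by the formulas above; the tanh activation, the weight-matrix shapes and all names in it are assumptions made for the example and are not specified in the patent.

```python
import numpy as np

def brnn_forward(X, W_xf, W_hf, W_xb, W_hb, W_fo, W_bo):
    """Bidirectional RNN forward pass following the formulas above.

    X    : (T, I) sequence of vectorized words/characters
    W_xf : (I, H) input-to-hidden weights, forward direction
    W_hf : (H, H) hidden-to-hidden weights, forward direction
    W_xb : (I, H) input-to-hidden weights, backward direction
    W_hb : (H, H) hidden-to-hidden weights, backward direction
    W_fo : (H, K) forward-hidden-to-output weights
    W_bo : (H, K) backward-hidden-to-output weights
    Returns a (T, K) array of per-step class probabilities y_k^t.
    """
    T, I = X.shape
    H = W_xf.shape[1]
    b_fwd = np.zeros((T, H))   # b_h^t, forward direction
    b_bwd = np.zeros((T, H))   # b_h^t, backward direction

    # Forward direction: each step depends on b^{t-1}; b^0 is the zero vector.
    prev = np.zeros(H)
    for t in range(T):
        a = X[t] @ W_xf + prev @ W_hf        # a_h^t (forward)
        b_fwd[t] = np.tanh(a)                # theta(.) assumed to be tanh
        prev = b_fwd[t]

    # Backward direction: each step depends on b^{t+1}; b^{T+1} is the zero vector.
    nxt = np.zeros(H)
    for t in reversed(range(T)):
        a = X[t] @ W_xb + nxt @ W_hb         # a_h^t (backward)
        b_bwd[t] = np.tanh(a)
        nxt = b_bwd[t]

    # Output layer combines both directions, then softmax over the K classes.
    a_out = b_fwd @ W_fo + b_bwd @ W_bo      # a_k^t
    e = np.exp(a_out - a_out.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # y_k^t
```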
When the bidirectional recurrent neural network predicts the class of the input vector data at each time step, it combines the output signals of the hidden-layer neurons of the network at that time step from both the forward and the reverse propagation; during forward and reverse propagation, the input signal of the hidden-layer neurons at each time step includes, besides the vectorized word or character signal, the output signal of the hidden-layer neurons at the previous time step.
From the prediction results of the bidirectional recurrent neural network, the system extracts as an enterprise name the words corresponding to an adjacent beginning part, K middle parts and an end part, where K is an integer ≥ 0; a minimal extraction sketch is given below.
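A minimal sketch of this extraction rule, assuming the per-word predictions have already been mapped to the labels B, M, E and N used later in the description; the function name and the regular-expression approach are illustrative choices, not the authors' implementation. The pattern also accepts a label group that starts with M, as in the printed output of Embodiment 1 below.

```python
import re

def extract_names(words, labels):
    """Extract enterprise names from per-word labels: a beginning label
    followed by K >= 0 middle labels and a closing end label."""
    assert len(words) == len(labels)
    tag_string = "".join(labels)                        # e.g. "NNBMMMENN..."
    names = []
    for match in re.finditer(r"[BM]M*E", tag_string):   # B (or a leading M) + K middles + E
        start, end = match.span()                       # label indices equal word indices
        names.append("".join(words[start:end]))
    return names
```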
Further, the system includes a word segmentation module, which segments the existing enterprise names and the text to be processed into words; the text to be processed includes the training samples and the text to be identified.
As a preference, the word segmentation module is the stanford-segmenter.
Further, the system includes a dictionary mapping module, which converts the segmented words, characters or punctuation marks in the text to be identified into vector data before inputting them into the bidirectional recurrent neural network.
Further, the recurrent neural network module is a computer, server or mobile intelligent terminal loaded with a program implementing the above functions.
Further, the system is a computer, server or mobile intelligent terminal loaded with a program implementing the above functions.
Compared with the prior art, the beneficial effects of the present invention are as follows. The present invention provides an enterprise entity name analysis and identification system that trains the bidirectional recurrent neural network on samples labelled from existing enterprise name data, predicts the enterprise entity names in text by means of the recurrent neural network, finds the enterprise names in the text to be processed, and further extracts new enterprise names. In the forward pass, the text sequence is first input into the recurrent neural network in order from head to tail and then input again in reverse from tail to head; during both forward and reverse input, the input signal of the bidirectional recurrent neural network at each time step also includes the output signal of the recurrent neural network at the previous time step. In this way the prediction of an enterprise entity name relies on both the preceding and the following context, the prediction result achieves global optimization, and the recognition is more reliable. Because of the processing mode of the bidirectional recurrent neural network, no feature templates need to be designed by hand, which saves labour and gives better generality: enterprise names can be found and extracted in texts of all kinds, and the recall of the recognition is significantly higher than that of traditional rule-based processing methods. On the basis of the enterprise names it has found, the present invention compares them with the existing enterprise name database, treats the names not present in the existing data as newly discovered enterprise names, and adds them to the enterprise name database. Using the system of the present invention, new enterprise names can be found quickly in massive Internet data, providing a powerful tool for grasping relevant information in time.
Description of the drawings:
Fig. 1 is a schematic diagram of the connections between the functional modules of the enterprise entity name analysis and identification system.
Fig. 2 is a schematic diagram of the steps by which the enterprise entity name analysis and identification system identifies enterprise entity names.
Fig. 3 is a schematic diagram of the signal flow in Embodiment 1 of the enterprise entity name analysis and identification system.
It should be understood that the drawings of the present description are only schematic and do not represent a real embodiment.
Detailed description of the invention
The present invention is described in further detail below in conjunction with test examples and specific embodiments. This should not, however, be understood as limiting the scope of the above subject matter of the present invention to the following examples; all techniques realized on the basis of the content of the present invention fall within the scope of the present invention.
An enterprise entity name analysis and identification system is provided. The system uses samples labelled from existing enterprise name data to train the bidirectional recurrent neural network module, predicts the enterprise entity names in text by means of the recurrent neural network, and finds the enterprise names in the text to be processed; on the basis of the analysed enterprise names, it compares them with the existing enterprise name database and stores the names not included among the existing enterprise names in the database as new enterprise names. The system uses the data in the existing enterprise name database to label the training samples automatically, which greatly reduces the time cost of manual labelling during the use of the neural network and makes the use of the neural network much simpler. Moreover, when the bidirectional recurrent neural network module predicts enterprise entity names, it relies on both the preceding and the following context, so the prediction result achieves global optimization and the recognition is more reliable; no feature templates need to be designed by hand, and new enterprise names can be found and extracted in texts of all kinds, providing technical support for the timely analysis of relevant information.
An enterprise entity name analysis and identification system, the system comprising a bidirectional recurrent neural network module. The system uses training samples labelled with the enterprise names stored in an existing enterprise name database to train the bidirectional recurrent neural network; after training, the bidirectional recurrent neural network recognizes the enterprise names in the text to be identified, and the enterprise names that do not belong to the existing names are extracted as new enterprise names. When the system labels the training samples using the enterprise names stored in the existing enterprise name database, each enterprise name in a sample is labelled segment by segment as a beginning part, middle parts and an end part, and everything not belonging to an enterprise name is labelled as an irrelevant part. From the prediction results of the bidirectional recurrent neural network, the system extracts as an enterprise name the words corresponding to an adjacent beginning part, K middle parts and an end part, where K is an integer ≥ 0.
The system realizes automatic analysis of new enterprise entity names through the following steps, as shown in Fig. 2:
(1) A certain number of texts containing enterprise names (for example 5000) are chosen, and the enterprise name fields in the texts are labelled automatically using the existing enterprise data; according to the concrete form of each enterprise name, its segments are labelled as a beginning part, middle parts and an end part, and the other parts not belonging to an enterprise name are labelled as irrelevant parts. Specifically, the segments of an enterprise or organization name in the text are labelled B (beginning part), M (middle part) and E (end part), and the characters not belonging to any enterprise or organization are labelled N (non-enterprise-name); labelling the word sequence with letters or digits is simple and easy to handle, and is convenient for the subsequent operations on the sequence (a minimal labelling sketch is given below). Using the existing enterprise data to label the samples automatically, and then training the neural network, greatly saves the labour and time cost of manual labelling and simplifies the application of neural network technology.
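As a rough illustration of this automatic labelling step, the sketch below matches known enterprise names against an already-segmented token list and assigns B/M/E/N labels; the greedy matching strategy and the requirement of at least two tokens per name are simplifying assumptions, not details given in the patent.

```python
def label_tokens(tokens, known_names):
    """Label each segmented token as B (beginning), M (middle) or E (end) of a
    known enterprise name, or N (not part of any enterprise name)."""
    labels = ["N"] * len(tokens)
    for i in range(len(tokens)):
        for name in known_names:
            # Greedily join tokens starting at position i until the joined string
            # is at least as long as the candidate name, then compare.
            joined, j = "", i
            while j < len(tokens) and len(joined) < len(name):
                joined += tokens[j]
                j += 1
            if joined == name and j - i >= 2:
                labels[i] = "B"
                labels[i + 1:j - 1] = ["M"] * (j - i - 2)
                labels[j - 1] = "E"
    return labels

# Hypothetical example:
# tokens      = ["AB", "CD", "医疗", "投资管理", "有限公司", "发布", "公告"]
# known_names = ["ABCD医疗投资管理有限公司"]
# label_tokens(tokens, known_names) -> ["B", "M", "M", "M", "E", "N", "N"]
```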
(2) The word sequences of the labelled training samples are input into the bidirectional recurrent neural network first in forward and then in reverse order, and the bidirectional recurrent neural network is trained. (Forward input means that the words or characters of a sequence are input into the recurrent neural network at the corresponding time steps in their original order; reverse input means that they are input at the corresponding time steps in reversed order.) The input signal of the bidirectional recurrent neural network at each current time step also includes the output signal of the bidirectional recurrent neural network at the previous time step; the recursion stops after both the forward and the reverse information have been fed in completely.
(3) The word sequence of the document to be analysed is input into the bidirectional recurrent neural network, which classifies the input word sequence and identifies the type (N, B, M or E) of each word to be extracted; the words corresponding to a B M E sequence between two adjacent N labels in the classification result are extracted as a whole enterprise name.
(4) On the basis of extracting the enterprise names from the text to be identified, the extracted enterprise names are compared with the enterprise names stored in the existing enterprise name database, and the enterprise names not included in the database are saved as new enterprise names for use in data analysis. Specifically, the bidirectional recurrent neural network module uses the following forward-pass formulas:
$$a_{\overrightarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overrightarrow{h}}\, x_i^{t} + \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h}\overrightarrow{h'}}\, b_{\overrightarrow{h'}}^{\,t-1}$$

$$b_{\overrightarrow{h}}^{\,t} = \theta\!\left(a_{\overrightarrow{h}}^{\,t}\right)$$

$$a_{\overleftarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overleftarrow{h}}\, x_i^{t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h}\overleftarrow{h'}}\, b_{\overleftarrow{h'}}^{\,t+1}$$

$$b_{\overleftarrow{h}}^{\,t} = \theta\!\left(a_{\overleftarrow{h}}^{\,t}\right)$$

$$a_{k}^{t} = \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h'}k}\, b_{\overrightarrow{h'}}^{\,t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h'}k}\, b_{\overleftarrow{h'}}^{\,t}$$

$$y_{k}^{t} = \frac{\exp\!\left(a_{k}^{t}\right)}{\sum_{k'}^{K} \exp\!\left(a_{k'}^{t}\right)}$$
I is the dimension of a word or character after vectorization, H is the number of hidden-layer neurons, and K is the number of output-layer neurons. $a_{\overrightarrow{h}}^{\,t}$ is the input of the hidden-layer neurons of the bidirectional recurrent neural network at time t during forward input (when the word sequence is fed into the network in forward order); the time index of the bidirectional recurrent neural network in the system corresponds to the position index of the input word sequence, so that, for example, the word or character at the third position of the sequence is input into the network at the third time step. $a_{\overleftarrow{h}}^{\,t}$ is the input of the hidden-layer neurons at time t during reverse input (when the word sequence is fed in reversed order), $b_{\overrightarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during forward input, $b_{\overleftarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during reverse input, and $\theta(\cdot)$ is the nonlinear activation function of the hidden-layer neurons. $a_{k}^{t}$ is the input of output-layer neuron k at time t; it can be seen that $a_{k}^{t}$ combines the output signals of the hidden-layer neurons at time t from both the forward and the reverse pass, and the result is propagated onward until the bidirectional recurrent neural network outputs the classification result for that time step. In this way the classification of the word or character at the current time step takes both the historical and the future sequence information into account, relying on the context of the whole text rather than on local information, so that the prediction reaches a global optimum. $y_{k}^{t}$ is the output of output-layer neuron k at time t; it is a probability value representing the ratio of the output of the k-th neuron to the sum of the outputs of the K output neurons, and the class corresponding to the neuron with the largest $y_{k}^{t}$ is usually taken as the final class predicted by the bidirectional recurrent neural network at that time step. $b_{\overrightarrow{h'}}^{\,0}$ and $b_{\overleftarrow{h'}}^{\,T+1}$ are vectors whose components are all 0, where T is the length of the input sequence.
When predicting enterprise names with the bidirectional recurrent neural network, the system first inputs the text sequence into the recurrent neural network in order from head to tail and then in reverse from tail to head in the forward pass. During both forward and reverse input, the input signal of the bidirectional recurrent neural network at each time step includes the vectorized word or character signal of that step and the output signal of the recurrent neural network at the previous step; only during the reverse input does the bidirectional recurrent neural network output the classification result of the word or character corresponding to each time step. In this way the prediction of an enterprise entity name relies on both the preceding and the following context, so the prediction result achieves global optimization and the recognition is more reliable. Because of the processing mode of the bidirectional recurrent neural network, no feature templates need to be designed by hand, which saves labour and gives better generality; enterprise names can be found and extracted in texts of all kinds, and the recall of the recognition is significantly higher than that of traditional rule-based processing methods.
Further, the present invention uses the above forward-pass formulas to propagate the data through the bidirectional recurrent neural network layer by layer and obtains the recognition (prediction) result at the output layer. When the prediction deviates from the labels of the training samples, each weight in the neural network is adjusted by the classical error back-propagation algorithm: the error is propagated back layer by layer and shared among all the neurons of each layer, the error signal of each neuron is obtained, and the weight of each neuron is then corrected. Propagating the data forward with the forward-pass formulas and gradually correcting the neuron weights with the backward pass is exactly the training process of the neural network; this process is repeated until the accuracy of the prediction reaches a set threshold, at which point training stops and the bidirectional recurrent neural network model is considered trained.
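A hedged sketch of this training loop follows, using PyTorch's built-in bidirectional RNN as a stand-in for the network defined by the forward-pass formulas above; the layer sizes, the SGD optimizer, the learning rate and the 95% accuracy threshold are assumptions made for illustration and are not values given in the patent.

```python
import torch
import torch.nn as nn

I, H, K = 200, 128, 4          # vector dim, hidden units, classes (B/M/E/N): assumed sizes

class BiRNNTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(I, H, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * H, K)       # combines forward and backward hidden outputs

    def forward(self, x):                    # x: (batch, T, I) vectorized word sequences
        h, _ = self.rnn(x)                   # h: (batch, T, 2H)
        return self.out(h)                   # per-step class scores: (batch, T, K)

def train(model, batches, threshold=0.95):
    """Repeat forward pass + error back-propagation until the tagging accuracy
    on the labelled samples reaches the set threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    while True:
        correct = total = 0
        for x, y in batches:                 # y: (batch, T) integer labels
            scores = model(x)
            loss = loss_fn(scores.reshape(-1, K), y.reshape(-1))
            optimizer.zero_grad()
            loss.backward()                  # back-propagate the error to every weight
            optimizer.step()
            correct += (scores.argmax(-1) == y).sum().item()
            total += y.numel()
        if correct / total >= threshold:     # stop training once accuracy reaches the threshold
            break
```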
Further, the system includes a word segmentation module, which segments the existing enterprise names and the text to be processed into words; the text to be processed includes the training samples and the text to be identified.
As a preference, the word segmentation module is the stanford-segmenter. Many segmentation tools are currently available, such as the stanford-segmenter, ICTCLAS, the Pangu segmenter and the Paoding segmenter. Segmentation breaks longer text into relatively independent word units, making the text to be processed discrete and sequential, which provides the basis for applying the recurrent neural network; the stanford-segmenter gives good segmentation results.
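As a small illustration of how segmentation discretizes the text, the snippet below uses the jieba segmenter purely as a stand-in; the patent itself prefers the stanford-segmenter, and the sample sentence echoes the hypothetical announcement text of Embodiment 1.

```python
import jieba   # stand-in segmenter; the patent prefers the stanford-segmenter

text = "公司拟设立的全资子公司分别为ABCD医疗投资管理有限公司、ABCD投资有限公司。"
tokens = jieba.lcut(text)      # break the text into relatively independent word units
print("/".join(tokens))
# e.g. 公司/拟/设立/的/全资/子公司/分别/为/ABCD/医疗/投资/管理/有限公司/...
# (the exact segmentation depends on the tool and its dictionary)
```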
Further, the system includes a dictionary mapping module, which converts the segmented words, characters or punctuation marks in the text to be identified into vector data before inputting them into the bidirectional recurrent neural network. The dictionary mapping module contains a dictionary mapping table, which is a two-dimensional matrix in which each row vector corresponds to one word, character or punctuation mark; the correspondence between row vectors and words, characters or punctuation marks is fixed when the dictionary mapping table is built. (The system may use the functional module connection structure shown in Fig. 1.)
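A minimal sketch of such a dictionary mapping table as a two-dimensional matrix whose rows correspond to words, characters or punctuation marks; the vocabulary, the vector dimension, the random initialization and the fallback row for unknown tokens are assumptions made for the example.

```python
import numpy as np

class DictionaryMapper:
    """Maps segmented tokens to row vectors of a fixed two-dimensional matrix."""
    def __init__(self, vocabulary, dim=200, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {tok: i for i, tok in enumerate(vocabulary)}
        self.table = rng.normal(size=(len(vocabulary), dim))  # one row per token

    def __call__(self, tokens):
        # Unknown tokens fall back to row 0 here; the patent does not specify this case.
        rows = [self.index.get(t, 0) for t in tokens]
        return self.table[rows]          # (len(tokens), dim) array fed to the BRNN

# mapper = DictionaryMapper(["公司", "有限公司", "，", "。"])   # illustrative vocabulary
# X = mapper(tokens)   # vector data input to the bidirectional recurrent neural network
```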
Further, the recurrent neural network module is a computer, server or mobile intelligent terminal loaded with a program implementing the above functions.
Further, the system is a computer, server or mobile intelligent terminal loaded with a program implementing the above functions; the computer, server or mobile intelligent terminal provides the hardware basis for realizing the functions of the system.
Embodiment 1
The procedure by which the system discovers new enterprise names is as follows. Take, for example, the following news text captured from the network: "XXXX announced on March 15 that the fifth meeting of the seventh board of directors of the company has approved the Proposal on the Company and its Wholly-owned Subsidiary Investing in the Establishment of Subsidiaries. The six wholly-owned subsidiaries the company intends to establish are ABCD Medical Investment Management Co., Ltd., ABCD Pharmaceutical E-commerce Co., Ltd., ABCD Investment Fund Management Co., Ltd., ABCD New Energy Co., Ltd., ABCD Infrastructure Investment Co., Ltd. and ABCD Investment Co., Ltd. Investment amount: the total investment converts into RMB of about 630 million yuan." After word segmentation this becomes: "XXXX / March 15 / announced / , / company / seventh / board of directors / fifth / meeting / , / reviewed / approved / " / about / company / and / wholly-owned / subsidiary / investing / establishing / subsidiaries / proposal / " / , / company / intends / to establish / six / wholly-owned / subsidiaries / respectively / AB / CD / medical / investment management / Co., Ltd. / , / AB / CD / pharmaceutical / e-commerce / Co., Ltd. / , / AB / CD / investment / fund management / Co., Ltd. / , / AB / CD / new energy / Co., Ltd. / , / AB / CD / infrastructure / investment / Co., Ltd. / , / AB / CD / investment / Co., Ltd. / . / investment / amount / : / total / investment / amount / equivalent / RMB / about / 630 million yuan / .". The word sequence formed by this segmentation is input into the bidirectional recurrent neural network, and after prediction by the recurrent neural network the output is: "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNMMMMENMMMMENMMMMENMMMENMMMMENMMMMENNNNNNNNNNNN". The word sequences corresponding to the label groups MMMME, MMMME, MMMME, MMME, MMMME and MMMME in the classification sequence, namely "ABCD Medical Investment Management Co., Ltd.", "ABCD Pharmaceutical E-commerce Co., Ltd.", "ABCD Investment Fund Management Co., Ltd.", "ABCD New Energy Co., Ltd.", "ABCD Infrastructure Investment Co., Ltd." and "ABCD Investment Co., Ltd.", are each extracted as a whole enterprise name. The signal flow by which this embodiment extracts enterprise names is shown in Fig. 3 (where vec-a, vec-b, vec-c, vec-d, vec-e, vec-f, vec-g, vec-h, vec-i, vec-j, vec-k, vec-l, vec-m, ..., vec-z represent the row vectors of the two-dimensional matrix in the dictionary mapping table). The extracted enterprise names are compared with the existing enterprise name database; if an extracted enterprise name does not exist in the existing enterprise name database, it is added to the existing enterprise name database as a new enterprise, providing a basis for the related data analysis.
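To tie the steps of this embodiment together, a brief end-to-end sketch follows; it reuses the extract_names helper sketched earlier in this description, and the token list, label list and database contents are simplified stand-ins for the example above rather than the authors' code.

```python
# Reuses extract_names() from the sketch earlier in the description (an assumption).
words  = ["AB", "CD", "医疗", "投资管理", "有限公司", "，",
          "AB", "CD", "投资", "有限公司", "。"]
labels = ["M",  "M",  "M",   "M",        "E",        "N",
          "M",  "M",  "M",   "E",        "N"]

predicted = extract_names(words, labels)
# -> ["ABCD医疗投资管理有限公司", "ABCD投资有限公司"]

existing_names = {"ABCD投资有限公司"}             # contents of the existing enterprise name database
new_names = [n for n in predicted if n not in existing_names]
existing_names.update(new_names)                  # newly discovered names are stored in the database
print(new_names)                                  # ["ABCD医疗投资管理有限公司"]
```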

Claims (10)

1. An enterprise entity name analysis and identification system, the system comprising a bidirectional recurrent neural network module, characterized in that: the system uses training samples labelled with the enterprise names stored in an existing enterprise name database to train the bidirectional recurrent neural network; after training, the bidirectional recurrent neural network recognizes the enterprise names in the text to be identified, and the enterprise names that do not belong to the existing names are extracted as new enterprise names.
2. The system as claimed in claim 1, characterized in that: when the system labels the training samples using the enterprise names stored in the existing enterprise name database, each enterprise name in a sample is labelled segment by segment as a beginning part, middle parts and an end part, and everything not belonging to an enterprise name is labelled as an irrelevant part.
3. The system as claimed in claim 2, characterized in that: the bidirectional recurrent neural network module uses the following forward-pass formulas:
$$a_{\overrightarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overrightarrow{h}}\, x_i^{t} + \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h}\overrightarrow{h'}}\, b_{\overrightarrow{h'}}^{\,t-1}$$

$$b_{\overrightarrow{h}}^{\,t} = \theta\!\left(a_{\overrightarrow{h}}^{\,t}\right)$$

$$a_{\overleftarrow{h}}^{\,t} = \sum_{i}^{I} w_{i\overleftarrow{h}}\, x_i^{t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h}\overleftarrow{h'}}\, b_{\overleftarrow{h'}}^{\,t+1}$$

$$b_{\overleftarrow{h}}^{\,t} = \theta\!\left(a_{\overleftarrow{h}}^{\,t}\right)$$

$$a_{k}^{t} = \sum_{\overrightarrow{h'}}^{H} w_{\overrightarrow{h'}k}\, b_{\overrightarrow{h'}}^{\,t} + \sum_{\overleftarrow{h'}}^{H} w_{\overleftarrow{h'}k}\, b_{\overleftarrow{h'}}^{\,t}$$

$$y_{k}^{t} = \frac{\exp\!\left(a_{k}^{t}\right)}{\sum_{k'}^{K} \exp\!\left(a_{k'}^{t}\right)}$$
I is the dimension of a vectorized word or character, H is the number of hidden-layer neurons and K is the number of output-layer neurons, where $a_{\overrightarrow{h}}^{\,t}$ is the input of the hidden-layer neurons of the bidirectional recurrent neural network at time t during forward input, $a_{\overleftarrow{h}}^{\,t}$ is the input of the hidden-layer neurons at time t during reverse input, $b_{\overrightarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during forward input, $b_{\overleftarrow{h}}^{\,t}$ is the output of the hidden-layer neurons at time t during reverse input, $\theta(\cdot)$ is the nonlinear excitation function of the hidden-layer neurons, $a_{k}^{t}$ is the input of output-layer neuron k at time t, and $y_{k}^{t}$ is its output; $y_{k}^{t}$ is a probability value that represents the ratio of the output of the k-th neuron to the sum of the outputs of the K output neurons. $b_{\overrightarrow{h'}}^{\,0}$ and $b_{\overleftarrow{h'}}^{\,T+1}$ are vectors whose components are all 0, where T is the length of the input word sequence.
4. The system as claimed in claim 3, characterized in that: when the bidirectional recurrent neural network predicts the class of the input vector data at each time step, it combines the output signals of the hidden-layer neurons of the network at that time step from both the forward and the reverse propagation; during forward and reverse propagation, the input signal of the hidden-layer neurons at each time step includes, besides the vectorized word or character signal, the output signal of the hidden-layer neurons at the previous time step.
5. The system as claimed in claim 4, characterized in that: from the prediction results of the bidirectional recurrent neural network, the system extracts as an enterprise name the words corresponding to an adjacent beginning part, K middle parts and an end part.
6. The system as claimed in any one of claims 1 to 5, characterized in that: the system includes a word segmentation module, which segments the existing enterprise names and the text to be processed into words; the text to be processed includes the training samples and the text to be identified.
7. The system as claimed in claim 6, characterized in that: the word segmentation module is the stanford-segmenter.
8. The system as claimed in claim 6, characterized in that: the system includes a dictionary mapping module, which converts the segmented words, characters or punctuation marks in the text to be identified into vector data before inputting them into the bidirectional recurrent neural network.
9. The system as claimed in claim 8, characterized in that: the recurrent neural network module is a computer, server or mobile intelligent terminal loaded with a program implementing the functions of any one of claims 1 to 4.
10. The system as claimed in claim 9, characterized in that: the system is a computer, server or mobile intelligent terminal loaded with a program implementing the functions of any one of claims 1 to 8.
CN201610286191.1A 2016-05-03 2016-05-03 Enterprise entity name analysis and identification system Pending CN105975456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610286191.1A CN105975456A (en) 2016-05-03 2016-05-03 Enterprise entity name analysis and identification system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610286191.1A CN105975456A (en) 2016-05-03 2016-05-03 Enterprise entity name analysis and identification system

Publications (1)

Publication Number Publication Date
CN105975456A true CN105975456A (en) 2016-09-28

Family

ID=56994292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610286191.1A Pending CN105975456A (en) 2016-05-03 2016-05-03 Enterprise entity name analysis and identification system

Country Status (1)

Country Link
CN (1) CN105975456A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779467A (en) * 2016-12-31 2017-05-31 成都数联铭品科技有限公司 Enterprise industry classification system based on automatic information screening
CN106777336A (en) * 2017-01-13 2017-05-31 深圳爱拼信息科技有限公司 Company name component extraction system and method based on deep learning
CN109165275A (en) * 2018-07-24 2019-01-08 国网浙江省电力有限公司电力科学研究院 Intelligent substation operation order information intelligent search matching process based on deep learning
WO2019041529A1 (en) * 2017-08-31 2019-03-07 平安科技(深圳)有限公司 Method, electronic apparatus, and computer readable storage medium for identifying company as subject of news report

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615983A (en) * 2015-01-28 2015-05-13 中国科学院自动化研究所 Behavior identification method based on recurrent neural network and human skeleton movement sequences
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
US9263036B1 (en) * 2012-11-29 2016-02-16 Google Inc. System and method for speech recognition using deep recurrent neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9263036B1 (en) * 2012-11-29 2016-02-16 Google Inc. System and method for speech recognition using deep recurrent neural networks
CN104615983A (en) * 2015-01-28 2015-05-13 中国科学院自动化研究所 Behavior identification method based on recurrent neural network and human skeleton movement sequences
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Alex Graves et al.: "Speech recognition with deep recurrent neural networks", 2013 IEEE International Conference on Acoustics *
Jason P.C. Chiu et al.: "Named Entity Recognition with Bidirectional LSTM-CNNs", arXiv:1511.08308v1 *
Hu Xinchen (胡新辰): "Research on semantic relation classification based on LSTM", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779467A (en) * 2016-12-31 2017-05-31 成都数联铭品科技有限公司 Enterprise industry classification system based on automatic information screening
CN106777336A (en) * 2017-01-13 2017-05-31 深圳爱拼信息科技有限公司 Company name component extraction system and method based on deep learning
WO2019041529A1 (en) * 2017-08-31 2019-03-07 平安科技(深圳)有限公司 Method, electronic apparatus, and computer readable storage medium for identifying company as subject of news report
CN109165275A (en) * 2018-07-24 2019-01-08 国网浙江省电力有限公司电力科学研究院 Intelligent substation operation order information intelligent search matching process based on deep learning
CN109165275B (en) * 2018-07-24 2021-03-02 国网浙江省电力有限公司电力科学研究院 Intelligent substation operation ticket information intelligent search matching method based on deep learning

Similar Documents

Publication Publication Date Title
CN105976056A (en) Information extraction system based on bidirectional RNN
CN105955952A (en) Information extraction method based on bidirectional recurrent neural network
CN105975555A (en) Enterprise abbreviation extraction method based on bidirectional recurrent neural network
CN111325029B (en) Text similarity calculation method based on deep learning integrated model
CN105975455A (en) information analysis system based on bidirectional recurrent neural network
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN107943911A Data extraction method, apparatus, computer equipment and readable storage medium
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
CN111309910A (en) Text information mining method and device
CN103207855A (en) Fine-grained sentiment analysis system and method specific to product comment information
CN106682089A (en) RNNs-based method for automatic safety checking of short message
US11741318B2 (en) Open information extraction from low resource languages
CN105975457A (en) Information classification prediction system based on full-automatic learning
CN105975456A (en) Enterprise entity name analysis and identification system
CN109783644A Cross-domain sentiment classification system and method based on text representation learning
Li et al. Event extraction for criminal legal text
WO2023108985A1 (en) Method for recognizing proportion of green asset and related product
CN112257444B (en) Financial information negative entity discovery method, device, electronic equipment and storage medium
CN112685513A (en) Al-Si alloy material entity relation extraction method based on text mining
CN112561718A (en) Case microblog evaluation object emotion tendency analysis method based on BilSTM weight sharing
CN115309864A (en) Intelligent sentiment classification method and device for comment text, electronic equipment and medium
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
Ruan et al. Effective learning model of user classification based on ensemble learning algorithms
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN105912720A (en) Method for analyzing emotion-involved text data in computer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160928