CN110287495A - A power marketing professional word recognition method and system - Google Patents
- Publication number
- CN110287495A (application number CN201910584443.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- power marketing
- professional
- identification model
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a power marketing professional word recognition method comprising: step 1, training an initial recognition model on power marketing external data, extracting the professional words in the power marketing data, and constructing an initial professional word dictionary; step 2, labeling the words in the power marketing data based on the latest professional word dictionary and recognition model, and training a new recognition model with the labeled power marketing data; step 3, if the number of iterations is less than a threshold, recognizing the power marketing data with the new recognition model, constructing a new professional word dictionary, and returning to step 2; otherwise, outputting the new recognition model for professional word recognition. A corresponding system is also disclosed. Without requiring a large background corpus, the method feeds the predictions of the recognition model back as labels, which improves professional word recognition accuracy on power marketing data and the working efficiency of power industry staff.
Description
Technical field
The present invention relates to a power marketing professional word recognition method and system, and belongs to the field of Chinese information processing technology.
Background
With the development of electronic information, electric power data have multiplied, and power marketing data contain a large amount of important information. Recognizing power marketing professional words is a basic and critical task in the power domain: extracting the professional words from marketing data makes it possible to uncover information hidden behind the data, such as fault detection and prevention or market operating conditions.
At present, professional word recognition in the power domain mainly relies on two approaches. The first is statistics-based: it extracts terms by computing the correlation between words, is biased toward frequently occurring terms, and has low accuracy. The second uses deep learning algorithms, which require a large amount of manually labeled data, so the workload for staff is heavy and efficiency is low.
Summary of the invention
The present invention provides a power marketing professional word recognition method and system that solve the above problems of existing recognition techniques.
To solve the above technical problems, the technical solution adopted by the invention is as follows.
A power marketing professional word recognition method includes the following steps:
Step 1: train an initial recognition model on power marketing external data; extract the professional words in the power marketing data and construct an initial professional word dictionary.
Step 2: based on the latest professional word dictionary and recognition model, label the words in the power marketing data, and train a new recognition model with the labeled power marketing data.
Step 3: if the number of iterations is less than a threshold, recognize the power marketing data with the new recognition model, construct a new professional word dictionary, and return to step 2; otherwise, output the new recognition model for professional word recognition.
The recognition model is a BILSTM-CRF model with an added self-attention mechanism.
The initial professional word dictionary is constructed as follows:
The professional words in the power marketing data are extracted using left and right information entropy and the K-Means clustering method.
The results extracted by all methods are merged with the domain dictionary provided by external power business personnel, and the initial professional word dictionary is obtained after manual filtering.
The professional words in the power marketing data are labeled as follows:
The power marketing data are recognized with the recognition model, and the words in the power marketing data are labeled.
Each word labeled as a non-entity is looked up in the professional word dictionary; if it is found, the word is relabeled as a professional word.
A Trie tree is constructed from the professional word dictionary, and the words labeled as non-entities are looked up through the Trie tree. Each node in the Trie tree represents one character of a word in the professional word dictionary, and the root node stores no character.
A power marketing professional word recognition system includes:
Construction module: trains an initial recognition model on power marketing external data; extracts the professional words in the power marketing data and constructs an initial professional word dictionary.
Labeling and training module: based on the latest professional word dictionary and recognition model, labels the words in the power marketing data, and trains a new recognition model with the labeled power marketing data.
Back-labeling module: if the number of iterations is less than a threshold, recognizes the power marketing data with the new recognition model, constructs a new professional word dictionary, and returns control to the labeling and training module; otherwise, outputs the new recognition model for professional word recognition.
The recognition model is a BILSTM-CRF model with an added self-attention mechanism.
The construction module includes an initial professional word dictionary construction module, which in turn includes a professional word extraction module and a merging and filtering module.
Professional word extraction module: extracts the professional words in the power marketing data using left and right information entropy and the K-Means clustering method.
Merging and filtering module: merges the results extracted by all methods with the domain dictionary provided by external power business personnel, and obtains the initial professional word dictionary after manual filtering.
The labeling and training module includes a labeling module, which in turn includes a preliminary labeling module and a correction module.
Preliminary labeling module: recognizes the power marketing data with the recognition model and labels the words in the power marketing data.
Correction module: looks up each word labeled as a non-entity in the professional word dictionary; if it is found, relabels the word as a professional word.
The correction module includes a lookup module.
Lookup module: constructs a Trie tree from the professional word dictionary and looks up the words labeled as non-entities through the Trie tree. Each node in the Trie tree represents one character of a word in the professional word dictionary, and the root node stores no character.
Advantageous effects of the invention: without requiring a large background corpus, the method feeds the predictions of the recognition model back as labels, which improves professional word recognition accuracy on power marketing data and the working efficiency of power industry staff.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a structural diagram of the Trie tree;
Fig. 3 is a diagram of the BILSTM-CRF model with the self-attention mechanism.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings. The following embodiments are only intended to illustrate the technical solution of the present invention more clearly and do not limit its scope of protection.
As shown in Fig. 1, a power marketing professional word recognition method comprises the following steps:
Step 1: train an initial recognition model on power marketing external data; extract the professional words in the power marketing data and construct an initial professional word dictionary.
The recognition model is a BILSTM-CRF model with an added self-attention mechanism. This model is trained on the power marketing external data to obtain the initial recognition model. The power marketing external data include MSRA data, Money Data, and an external power marketing professional word dictionary, the last of which is provided by power marketing business personnel.
The initial professional word dictionary is constructed as follows:
11) The professional words in the power marketing data are extracted using left and right information entropy and the K-Means clustering method.
The power marketing data are clustered with the K-Means clustering method. On the basis of the clusters, keywords are extracted with TF-IDF, the top 5 words of each cluster are retained, and the term dictionary Cluster_dict (cluster dictionary) is constructed, as shown in Table 1.
Table 1: Cluster_dict
Cluster name | Candidate professional word | Score |
---|---|---|
Cluster 1 | Ammeter | 0.0274005 |
Cluster 2 | Ammeter | 0.0151421 |
Cluster 3 | Telegram in reply | 0.0162745 |
… | … | … |
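The keyword step described before Table 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the K-Means cluster assignment of each document is already available, treats each cluster as one bag of tokens, and uses a plain TF-IDF variant; the function name `top_terms_per_cluster` and its parameters are hypothetical.

```python
import math
from collections import Counter


def top_terms_per_cluster(docs, cluster_ids, k=5):
    """Rank the tokens of each cluster by a simple TF-IDF score.

    docs: list of token lists (one list per document, already segmented).
    cluster_ids: cluster index of each document (assumed to come from K-Means).
    Returns {cluster_id: [(token, score), ...]} with at most k entries each.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Term counts aggregated per cluster.
    clusters = {}
    for tokens, cid in zip(docs, cluster_ids):
        clusters.setdefault(cid, Counter()).update(tokens)
    result = {}
    for cid, tf in clusters.items():
        total = sum(tf.values())
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[cid] = ranked[:k]
    return result
```

Calling `top_terms_per_cluster(docs, cluster_ids, k=5)` yields the candidate words and scores that would populate a table shaped like Cluster_dict.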
The left and right information entropy of each word is calculated. Taking word w as the center, the set of words appearing to the left of w is α = (a_1, a_2, ..., a_s) and the set of words appearing to the right of w is β = (b_1, b_2, ..., b_s). The calculation formulas are as follows:
$$L\text{-}E(w) = -\sum_{i=1}^{s} \frac{C(a_i, w)}{n_1} \log \frac{C(a_i, w)}{n_1}$$
$$R\text{-}E(w) = -\sum_{i=1}^{s} \frac{C(w, b_i)}{n_1} \log \frac{C(w, b_i)}{n_1}$$
where n_1 is the number of times w occurs in the corpus, C(a_i, w) is the number of times a_i and w occur together in the corpus, C(w, b_i) is the number of times w and b_i occur together in the corpus, L-E(w) is the left information entropy, and R-E(w) is the right information entropy.
The smaller of the left and right information entropies is selected, a new phrase is formed with word w accordingly, and the term dictionary Entropy_dict (entropy dictionary) is constructed, as shown in Table 2.
Table 2: Entropy_dict
Candidate professional word | Score |
---|---|
Electricity payment | 0.288829687518 |
Way to pay dues | 0.035226661968 |
Urge expense notification sheet | 0.224742098999 |
… | … |
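The left and right information entropy computation can be sketched as below. It is an illustrative reading of the definitions in the text (n_1 as the occurrence count of w, C as adjacent co-occurrence counts), assuming the corpus is already segmented into token lists and using the natural logarithm; the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def left_right_entropy(sentences):
    """Compute (left entropy, right entropy) for every word in the corpus.

    sentences: list of token lists.
    Returns {word: (L_E, R_E)}.
    """
    occurrences = Counter()        # n_1 for each word w
    left = defaultdict(Counter)    # left[w][a] = C(a, w)
    right = defaultdict(Counter)   # right[w][b] = C(w, b)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            occurrences[w] += 1
            if i > 0:
                left[w][tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[w][tokens[i + 1]] += 1

    def entropy(neighbour_counts, n1):
        # -sum (C / n1) * log(C / n1) over the observed neighbours
        return -sum((c / n1) * math.log(c / n1) for c in neighbour_counts.values())

    return {
        w: (entropy(left[w], n1), entropy(right[w], n1))
        for w, n1 in occurrences.items()
    }
```

A word whose neighbours on one side are highly predictable gets a low entropy on that side, which is the signal the text uses to decide which side to extend into a phrase.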
12) The results extracted by all methods are merged with the domain dictionary provided by external power business personnel, and the initial professional word dictionary is obtained after manual filtering. That is, Cluster_dict, Entropy_dict, and the domain dictionary provided by external power business personnel are merged, and the initial professional word dictionary is obtained after manual review and deduplication, for example: electricity meter, utility pole, indicator light, payment reminder notice, account cancellation, and so on.
Step 2: based on the latest professional word dictionary and recognition model, label the words in the power marketing data, and train a new recognition model with the labeled power marketing data.
The professional words in the power marketing data are labeled as follows:
21) The power marketing data are recognized with the recognition model, and the words in the power marketing data are labeled.
The labels fall into five classes: person names, place names, organization names, power professional words, and words that are not entities. Person names are labeled PER, place names LOC, organization names ORG, power professional words ELECT, and non-entity words O.
22) Each word labeled as a non-entity is looked up in the professional word dictionary; if it is found, the word is relabeled as a professional word.
As shown in Fig. 2, to speed up lookup, a Trie tree is constructed from the professional word dictionary, and the words labeled as non-entities are looked up through the Trie tree. Each node in the Trie tree represents one character of a word in the professional word dictionary, and the root node stores no character. Take the sentence "Xiao Ming received the payment reminder slip issued by staff on November 20" as an example: both "payment reminder" and "payment reminder slip" are power marketing professional words, so the Trie lookup quickly and efficiently selects the longest matching phrase, "payment reminder slip", for labeling.
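The longest-match lookup described above can be sketched with a character-level Trie. This is a minimal illustration under assumptions: the class and function names are hypothetical, matching is a simple left-to-right scan, and the Chinese strings in the usage note are an assumed back-translation of the example rather than text taken from the patent.

```python
class TrieNode:
    """One node per character; the root itself stores no character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word character by character."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def longest_matches(text, root):
    """Scan text left to right and return (start, end, word) for the
    longest dictionary word starting at each matched position."""
    matches = []
    i = 0
    while i < len(text):
        node = root
        end = None
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                end = j  # remember the longest complete word so far
        if end is not None:
            matches.append((i, end, text[i:end]))
            i = end
        else:
            i += 1
    return matches
```

For instance, with a dictionary containing both 催费 and 催费单, `longest_matches("小明收到催费单", build_trie(["催费", "催费单"]))` selects the longer entry; spans found this way inside text labeled O are the ones that would be relabeled as professional words.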
Original data: "The calling customer reports that Xiao Ming went to the Bian Cang town power supply station to apply for timesharing, but the T system shows that it has not yet been activated. The customer objects to this and asks the relevant department of the power supply company to verify the matter as soon as possible and reply to the customer."
Data format after labeling: "The calling customer reports that <PER>Xiao Ming</PER> went to the <ORG>Bian Cang town power supply station</ORG> to apply for <ELECT>timesharing</ELECT>, but the T system shows that it has not yet been activated. The customer objects to this and asks the relevant department of the power supply company to verify the matter as soon as possible and reply to the customer."
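The inline-tagged format shown above can be turned into per-character training labels. The sketch below assumes the B/I/E/S/O labeling scheme and a `B-PER` style of label name; the function name and the regular expression are illustrative assumptions, not part of the patent.

```python
import re

# Matches <PER>...</PER>, <LOC>...</LOC>, <ORG>...</ORG>, <ELECT>...</ELECT>
TAG_RE = re.compile(r"<(PER|LOC|ORG|ELECT)>(.*?)</\1>")


def to_bioes(annotated):
    """Convert inline-tagged text to a list of (character, label) pairs."""
    pairs = []
    pos = 0
    for match in TAG_RE.finditer(annotated):
        # Everything between two tagged spans is a non-entity (O).
        pairs.extend((ch, "O") for ch in annotated[pos:match.start()])
        tag, span = match.group(1), match.group(2)
        if len(span) == 1:
            pairs.append((span, f"S-{tag}"))            # single-character entity
        elif span:
            pairs.append((span[0], f"B-{tag}"))         # beginning
            pairs.extend((ch, f"I-{tag}") for ch in span[1:-1])  # middle
            pairs.append((span[-1], f"E-{tag}"))        # end
        pos = match.end()
    pairs.extend((ch, "O") for ch in annotated[pos:])
    return pairs
```

Each pair corresponds to one row of a column-formatted training file; a part-of-speech column would have to be added by a separate tagger.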
Of the labeled power marketing data, 80% is used as the training set and 20% as the validation set to train the BILSTM-CRF model with the self-attention mechanism, yielding a new recognition model. In the input and label format, the first column is the input character feature, the second column is the part-of-speech feature (explained in Table 3), and the third column is the label, where B denotes the beginning of an entity, I the middle, E the end, and S a single-character entity.
Table 3: Explanation of part-of-speech features
As shown in Fig. 3, the model mainly includes 6 layers: an input layer, a lookup layer, a BILSTM layer, a self-attention layer, a fully connected layer, and a CRF layer. The details are as follows:
(1) Input layer: the training set and validation set are processed and the data are labeled with the BIOES scheme. The character feature Char = [c_1, c_2, ..., c_n] and the part-of-speech feature pos = [p_1, p_2, ..., p_n] are used as model inputs. The character feature captures the basic features of the text, the part-of-speech feature captures the semantic features of the sentence in different contexts, and n is the number of features.
(2) Lookup layer: the character features and part-of-speech features are converted to the corresponding character vectors and part-of-speech feature vectors, and the resulting vectors are spliced, i.e. X_j = [char_j, pos_j], where X_j denotes the j-th spliced vector of the sentence, j ∈ [1, n].
(3) BILSTM layer: the contextual information of the sentence is obtained by the BILSTM, and the contextual information learned in the two directions is spliced to obtain the feature vector of the BILSTM layer. Suppose the output of the BILSTM hidden layer is
$$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$$
where h_i is the feature vector output by the BILSTM layer for the i-th character of the sentence, \overrightarrow{h_i} is the hidden vector output by the forward LSTM for the i-th character, and \overleftarrow{h_i} is the hidden vector output by the backward LSTM for the i-th character.
(4) Self-attention layer: a self-attention mechanism is used to capture the correlations within the sequence itself. Attention weights are computed for the feature vectors h_i output by the BILSTM layer to obtain the output of the self-attention layer. The self-attention mechanism is computed as:
Q_t = f(W h_i + b)
K_t = f(W h_i + b)
V_t = f(W h_i + b)
$$\mathrm{self\_attention} = \mathrm{softmax}\left(\frac{Q_t K_t^{T}}{\sqrt{d}}\right) V_t$$
where Q_t, K_t, and V_t are the outputs of fully connected layers, d is the dimension of Q_t, the softmax is taken over the m positions of the sentence (m is the sentence length), self_attention is the self-attention value of the i-th character, W is a weight matrix to be trained, b is a bias term, and f is the ReLU activation function.
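A self-attention step of this kind can be sketched in plain Python. The sketch assumes the standard scaled dot-product form, softmax(Q K^T / sqrt(d)) V, with the softmax taken over the sentence positions; it takes Q, K, and V as already-computed lists of vectors and leaves out the trainable projections, so it is an illustration rather than the patent's exact layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(Q, K, V):
    """Scaled dot-product attention.

    Q, K: lists of m vectors of dimension d; V: list of m vectors.
    Returns one attended vector per query position.
    """
    d = len(Q[0])
    outputs = []
    for q in Q:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ])
    return outputs
```

Each output row is a convex combination of the value vectors, so positions that are strongly related to the current character contribute more to its representation.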
(5) Fully connected layer: the output of the self-attention layer is mapped to the label space to give the fully connected output, computed as follows:
M = W * self_attention + b
where M is the output of the fully connected layer.
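(6) CRF layer: the fully connected output M is passed to the CRF layer. For an input sentence X = (x_1, x_2, ..., x_m), the score of the actual label sequence is:
$$Score(X, y) = \sum_{i=1}^{m} M_{i, y_i} + \sum_{i=1}^{m-1} A_{y_i, y_{i+1}}$$
where Score(X, y) is the score of the actual label sequence y of sentence X, M_{i, y_i} is the state score, i.e. the score of labeling character x_i with label y_i, and A_{y_i, y_{i+1}} is the transition matrix entry, i.e. the score of moving from label y_i to label y_{i+1}.
The scores Score(X, y) of all possible label sequences are normalized to obtain the probability P(y | X):
$$P(y \mid X) = \frac{\exp(Score(X, y))}{\sum_{\tilde{y} \in Y_X} \exp(Score(X, \tilde{y}))}$$
where Y_X is the set of possible output label sequences for X and \tilde{y} is a label sequence in Y_X.
The objective function of model training is the log-likelihood of the actual label sequence:
$$\log P(y \mid X) = Score(X, y) - \log \sum_{\tilde{y} \in Y_X} \exp(Score(X, \tilde{y}))$$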
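The CRF quantities can be sketched as follows, assuming the usual linear-chain form: the sequence score is the sum of state scores M plus transition scores A (without separate start or end transitions), and the training objective is the log-likelihood of the actual label sequence. The normalizer is computed with the forward algorithm; all function names are hypothetical.

```python
import math


def _logsumexp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_score(M, A, y):
    """Score(X, y): state scores plus transition scores along label path y.

    M[i][k]: score of label k at position i; A[j][k]: score of j -> k.
    """
    state = sum(M[i][y[i]] for i in range(len(y)))
    transition = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return state + transition


def log_partition(M, A):
    """log of the sum of exp(Score) over all label paths (forward algorithm)."""
    num_labels = len(M[0])
    alpha = list(M[0])
    for i in range(1, len(M)):
        alpha = [
            M[i][k] + _logsumexp([alpha[j] + A[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return _logsumexp(alpha)


def log_likelihood(M, A, y):
    """log P(y | X) = Score(X, y) - log sum over all paths of exp(Score)."""
    return sequence_score(M, A, y) - log_partition(M, A)
```

Training would maximize `log_likelihood` over the labeled sentences; at prediction time the best path is normally found with the Viterbi algorithm, which is not shown here.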
Step 3: if the number of iterations is less than the threshold (usually set to 3), recognize the power marketing data with the new recognition model, construct a new professional word dictionary, and return to step 2; otherwise, output the new recognition model for professional word recognition.
When constructing the new professional word dictionary, the newly recognized professional words only need to be added to the professional word dictionary used in the previous iteration; the new professional word dictionary is obtained after manual review and deduplication.
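The control flow of steps 2 and 3 can be sketched as a loop over caller-supplied functions. Everything named here is hypothetical: `label_fn`, `train_fn`, `recognize_fn`, and `review_fn` stand in for the labeling, training, recognition, and manual review stages, and counting one iteration per training pass is an assumption, since the text does not define how iterations are counted.

```python
def iterative_training(data, initial_model, initial_dict,
                       label_fn, train_fn, recognize_fn, review_fn,
                       threshold=3):
    """Alternate labeling/training with dictionary updates.

    label_fn(data, model, dictionary) -> labeled data        (step 2)
    train_fn(labeled) -> new model                            (step 2)
    recognize_fn(data, model) -> newly recognized words       (step 3)
    review_fn(words) -> reviewed, deduplicated dictionary     (step 3)
    """
    model = initial_model
    dictionary = set(initial_dict)
    iteration = 0
    while True:
        # Step 2: label with the latest dictionary and model, then retrain.
        labeled = label_fn(data, model, dictionary)
        model = train_fn(labeled)
        iteration += 1
        # Step 3: stop once the iteration count reaches the threshold.
        if iteration >= threshold:
            return model, dictionary
        # Otherwise add the newly recognized words to the previous dictionary.
        new_words = recognize_fn(data, model)
        dictionary = review_fn(dictionary | set(new_words))
```

With `threshold=3` this sketch trains three models in total and returns the last one together with the final dictionary.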
The recognition results of the above method were compared with those of the two existing approaches, where the statistics-based approach used a TF-IDF model and the deep learning approach used a BILSTM-CRF model. The comparison results are shown in Table 4.
Table 4: Comparison of power marketing professional word recognition results
As the table shows, without requiring a large background corpus, the above method uses semi-automatic labeling and feeds the predictions of the recognition model back as labels, which improves professional word recognition accuracy on power marketing data and the working efficiency of power industry staff.
A power marketing professional word recognition system comprises:
Construction module: trains an initial recognition model on power marketing external data; extracts the professional words in the power marketing data and constructs an initial professional word dictionary.
The recognition model is a BILSTM-CRF model with an added self-attention mechanism.
The construction module includes an initial professional word dictionary construction module, which in turn includes a professional word extraction module and a merging and filtering module.
Professional word extraction module: extracts the professional words in the power marketing data using left and right information entropy and the K-Means clustering method.
Merging and filtering module: merges the results extracted by all methods with the domain dictionary provided by external power business personnel, and obtains the initial professional word dictionary after manual filtering.
Labeling and training module: based on the latest professional word dictionary and recognition model, labels the words in the power marketing data, and trains a new recognition model with the labeled power marketing data.
The labeling and training module includes a labeling module, which in turn includes a preliminary labeling module and a correction module.
Preliminary labeling module: recognizes the power marketing data with the recognition model and labels the words in the power marketing data.
Correction module: looks up each word labeled as a non-entity in the professional word dictionary; if it is found, relabels the word as a professional word.
The correction module includes a lookup module. Lookup module: constructs a Trie tree from the professional word dictionary and looks up the words labeled as non-entities through the Trie tree. Each node in the Trie tree represents one character of a word in the professional word dictionary, and the root node stores no character.
Back-labeling module: if the number of iterations is less than a threshold, recognizes the power marketing data with the new recognition model, constructs a new professional word dictionary, and returns control to the labeling and training module; otherwise, outputs the new recognition model for professional word recognition.
A computer-readable storage medium stores one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform the power marketing professional word recognition method.
A computing device comprises one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the power marketing professional word recognition method.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only an embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention falls within the scope of the pending claims of the present invention.
Claims (10)
1. a kind of power marketing profession word recognition method, it is characterised in that: include the following steps,
Step 1, power marketing external data, training initial identification model are based on;Extract the professional word in power marketing data, structure
Build initial professional word dictionary;
Step 2, based on newest professional word dictionary and identification model, the word in power marketing data is labeled, and with marking
The power marketing data being poured in, the new identification model of training;
Step 3, it is less than threshold value in response to the number of iterations, power marketing data is identified based on new identification model, constructs
New professional word dictionary, goes to step 2, otherwise, exports new identification model and identifies for professional word.
2. a kind of power marketing profession word recognition method according to claim 1, it is characterised in that: identification model is to be added
From the BILSTM-CRF model of attention mechanism.
3. a kind of power marketing profession word recognition method according to claim 1, it is characterised in that: the initial professional word of building
The process of dictionary is,
The professional word in power marketing data is extracted using left and right comentropy and K-Means clustering method;
The result that all methods extract and the domain lexicon that external power business personnel provides are merged, after artificial filter
To initial professional word dictionary.
4. a kind of power marketing profession word recognition method according to claim 1, it is characterised in that: to power marketing data
In the process that is labeled of professional word be,
Power marketing data are identified with identification model, and the word in power marketing data is labeled;
The word for being labeled as non-physical is searched in professional word dictionary, and if it exists, be then labeled as the word for being labeled as non-physical specially
Industry word.
5. a kind of power marketing profession word recognition method according to claim 4, it is characterised in that: according to professional word word
Allusion quotation constructs Trie tree, and the word for being labeled as non-physical is searched by Trie tree;Wherein, each node indicates professional word in Trie tree
Each word in dictionary, root node do not store any word.
6. a kind of power marketing profession word identifying system, it is characterised in that: including,
It constructs module: being based on power marketing external data, training initial identification model;Extract the profession in power marketing data
Word constructs initial professional word dictionary;
It marks training module: based on newest professional word dictionary and identification model, the word in power marketing data being labeled,
And with the power marketing data marked, the new identification model of training;
It returns mark module: being less than threshold value in response to the number of iterations, power marketing data are identified based on new identification model, structure
New professional word dictionary is built, mark training module is gone to, otherwise, new identification model is exported and is identified for professional word.
7. a kind of power marketing profession word identifying system according to claim 6, it is characterised in that: identification model is to be added
From the BILSTM-CRF model of attention mechanism.
8. The power marketing professional word recognition system according to claim 6, characterized in that: the construction module includes an initial professional word dictionary creation module, and the initial professional word candidate dictionary creation module includes a professional word extraction module and a merge-and-filter module;
professional word extraction module: extracts the professional words in the power marketing data using left-right information entropy and a K-Means clustering method;
merge-and-filter module: merges the results extracted by all methods with the domain dictionary provided by external power business personnel, and obtains the initial professional word dictionary after manual filtering.
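As a rough illustration of the left-right information entropy used by the extraction module, the sketch below computes the Shannon entropy of the characters adjacent to a candidate string on each side. The function names, the character-level neighborhood, and the natural-logarithm base are assumptions; the claim does not give the formula, and the K-Means step is not covered here.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy (natural log) of a frequency table."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def left_right_entropy(corpus, candidate):
    """Entropy of the characters seen immediately left and right of `candidate`."""
    left, right = Counter(), Counter()
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left[sentence[start - 1]] += 1
            if end < len(sentence):
                right[sentence[end]] += 1
            # Continue searching after the current match start (allows overlaps).
            start = sentence.find(candidate, start + 1)
    return entropy(left), entropy(right)
```

A candidate whose neighbors vary widely on both sides (high entropy left and right) behaves like a free-standing word, whereas a candidate that is always followed by the same character is more likely a fragment of a longer term.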
9. The power marketing professional word recognition system according to claim 6, characterized in that: the labeling training module includes a labeling module, and the labeling module includes a preliminary labeling module and a correction module;
preliminary labeling module: recognizes the power marketing data with the recognition model and labels the words in the power marketing data;
correction module: looks up words labeled as non-entities in the professional word dictionary and, if a word is found, relabels it as a professional word.
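The correction step of claim 9 amounts to a dictionary check on every word the model tagged as a non-entity. The sketch below is one possible reading; the function name `correct_labels` and the tag strings `"O"` and `"PRO"` are hypothetical, and a plain set stands in for the dictionary for brevity.

```python
def correct_labels(labeled_words, dictionary, non_entity_tag="O", professional_tag="PRO"):
    """Relabel non-entity words that appear in the professional word dictionary."""
    return [
        (word, professional_tag if tag == non_entity_tag and word in dictionary else tag)
        for word, tag in labeled_words
    ]


# Hypothetical model output: "tariff" was missed by the model but is in the dictionary.
model_output = [("tariff", "O"), ("hello", "O"), ("meter", "PRO")]
corrected = correct_labels(model_output, {"tariff"})
```

Words the model already tagged as professional are left unchanged, so the dictionary can only add professional labels, never remove them.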
10. The power marketing professional word recognition system according to claim 9, characterized in that: the correction module includes a lookup module;
lookup module: constructs a Trie tree from the professional word candidate dictionary, and looks up words labeled as non-entities through the Trie tree; wherein each node in the Trie tree represents one character of a word in the professional word dictionary, and the root node does not store any character.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910584443.2A CN110287495A (en) | 2019-07-01 | 2019-07-01 | A kind of power marketing profession word recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287495A (en) | 2019-09-27 |
Family ID: 68021484
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910584443.2A Pending CN110287495A (en) | 2019-07-01 | 2019-07-01 | A kind of power marketing profession word recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287495A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572758A (en) * | 2013-10-24 | 2015-04-29 | 山东大学 | Method and system for automatically extracting power field specialized vocabularies |
CN107133220A (en) * | 2017-06-07 | 2017-09-05 | 东南大学 | Name entity recognition method in a kind of Geography field |
CN107527073A (en) * | 2017-09-05 | 2017-12-29 | 中南大学 | The recognition methods of entity is named in electronic health record |
CN109710947A (en) * | 2019-01-22 | 2019-05-03 | 福建亿榕信息技术有限公司 | Power specialty word stock generating method and device |
CN109710926A (en) * | 2018-12-12 | 2019-05-03 | 内蒙古电力(集团)有限责任公司电力调度控制分公司 | Dispatching of power networks professional language semantic relation extraction method, apparatus and electronic equipment |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738054A (en) * | 2019-10-14 | 2020-01-31 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for identifying hotel information in mail |
CN111339268A (en) * | 2020-02-19 | 2020-06-26 | 北京百度网讯科技有限公司 | Entity word recognition method and device |
CN111339268B (en) * | 2020-02-19 | 2023-08-15 | 北京百度网讯科技有限公司 | Entity word recognition method and device |
CN113762716A (en) * | 2021-07-30 | 2021-12-07 | 国网山东省电力公司营销服务中心(计量中心) | Method and system for evaluating running state of transformer area based on deep learning and attention |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Jung | Semantic vector learning for natural language understanding | |
CN109635117B (en) | Method and device for recognizing user intention based on knowledge graph | |
CN109145294B (en) | Text entity identification method and device, electronic equipment and storage medium | |
CN105244029B (en) | Voice recognition post-processing method and system | |
Xie et al. | Detecting duplicate bug reports with convolutional neural networks | |
CN110457676B (en) | Evaluation information extraction method and device, storage medium and computer equipment | |
CN107861951A (en) | Session subject identifying method in intelligent customer service | |
CN106776538A (en) | The information extracting method of enterprise's noncanonical format document | |
CN109598517B (en) | Commodity clearance processing, object processing and category prediction method and device thereof | |
CN112434535B (en) | Element extraction method, device, equipment and storage medium based on multiple models | |
CN111899090B (en) | Enterprise associated risk early warning method and system | |
Yuan-jie et al. | Web service classification based on automatic semantic annotation and ensemble learning | |
CN110287495A (en) | A kind of power marketing profession word recognition method and system | |
CN112818093A (en) | Evidence document retrieval method, system and storage medium based on semantic matching | |
CN113672718B (en) | Dialogue intention recognition method and system based on feature matching and field self-adaption | |
CN113761218A (en) | Entity linking method, device, equipment and storage medium | |
CN110334186A (en) | Data query method, apparatus, computer equipment and computer readable storage medium | |
CN110222192A (en) | Corpus method for building up and device | |
Tripathi et al. | SimNER–an accurate and faster algorithm for named entity recognition | |
KR20230163983A (en) | Similar patent extraction methods using neural network model and device for the method | |
CN117422074A (en) | Method, device, equipment and medium for standardizing clinical information text | |
Thuy et al. | Leveraging foreign language labeled data for aspect-based opinion mining | |
CN110287396A (en) | Text matching technique and device | |
Spichakova et al. | Using machine learning for automated assessment of misclassification of goods for fraud detection | |
CN114708073A (en) | Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190927 |