CN105930347A - Text analysis based power outage cause recognition system - Google Patents


Info

Publication number
CN105930347A
Authority
CN
China
Prior art keywords
text
rule
identification
theme
data
Legal status
Granted
Application number
CN201610209966.5A
Other languages
Chinese (zh)
Other versions
CN105930347B (en)
Inventor
李虎
程树华
牛良涛
王伟凯
吴文先
徐进澎
嵇望
Current Assignee
Hangzhou Yuanchuan Xinye Technology Co ltd
Original Assignee
Zhejiang Utry Information Technology Co Ltd
Application filed by Zhejiang Utry Information Technology Co Ltd
Priority to CN201610209966.5A
Publication of CN105930347A
Application granted
Publication of CN105930347B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards

Abstract

The invention relates to the field of pattern recognition and discloses a power outage cause recognition system based on text analysis. The system comprises a database and a processor. The database stores power outage data generated when customer service staff provide customer service and record complaints about frequent power outages. The processor contains a text segmentation and filtering expert system module, a root cause identification expert system module, and an HDSP identification module. The system can help a client pin down the cause of a power outage from disorganized work orders and assign responsibility clearly, which strengthens customer service center management, creates conditions for improving user satisfaction, and allows the enterprise to handle power outage events promptly. The model and system run automatically, use objective evaluation criteria, and integrate well; they greatly reduce the workload of staff and solve the problem of inconsistent results caused by staff members' subjective judgments.

Description

Power outage cause recognition system based on text analysis
Technical field
The present invention relates to the field of pattern recognition, and in particular to a power outage cause recognition system based on text analysis.
Technical background
Of the roughly 1.8 trillion GB of big data that currently exists, unstructured data already accounts for about 80 to 90 percent, and it is expected to grow 44-fold by 2020. How to effectively manage, mine, and analyze the information contained in massive amounts of unstructured data has become a major challenge in the big data field. Text occupies a central position within unstructured data, and for enterprises that hold large amounts of text, how effectively they use this data resource will shape their future development. In the data of a power industry customer service center, processing the frequent-outage records in complaint work orders to find the corresponding power outage causes plays a vital role in improving internal control in the power industry and increasing customer satisfaction.
Existing patent literature includes Chinese patent application No. 201210281754.X, "An intelligent switchboard fault diagnosis system and method", and Chinese patent application No. 201110281938.1, "A method and device for classifying subjective and objective texts".
Chinese patent application No. 201210281754.X has the following shortcomings. It proposes an intelligent switchboard fault diagnosis system and method that uses a machine learning algorithm to maintain the rule base of an expert system, but it does not account for the incompleteness of that rule base, so its recognition accuracy depends on how representative the rules in the rule base are. Faults that the system cannot identify are neither judged nor output; only faults that the existing rule base can match are output. Because the method improves performance only by improving the rule base, the performance of the system depends largely on the completeness of the rule base.
Chinese patent application No. 201110281938.1 has the following shortcoming. It proposes a method and device for classifying subjective and objective texts using a machine learning algorithm, but the method can assign only a single label and cannot identify multiple labels for one text. It is therefore applicable only where a single label is output.
However, because big data processing methods are not yet widespread, analysts at power industry customer service centers currently extract power outage causes from frequent-outage text records by hand and then analyze the results. This is feasible when the data volume is small. But because everything is done manually, result quality may suffer from long, tedious work. As data volumes keep growing, this manual approach is time-consuming and labor-intensive, and result quality can vary widely depending on how long staff have been working. In addition, when responsibility for power outage causes is assigned, different handlers may hold different views, so the results are inconsistent.
Identifying the power outage causes in each work order is essentially a classification task. The mainstream techniques for classifying text data are text classification based on machine learning and text classification based on expert systems. However, each frequent-outage text record may contain several power outage causes. This is a serious challenge for traditional machine learning classifiers, which can identify only a single power outage cause per text. Expert systems, in turn, suffer from rules that are hard to extract and from low search efficiency when there are many rules, so traditional expert rule systems cannot fully solve the problem of identifying multiple power outage causes either.
Summary of the invention
Aiming at the low analysis efficiency of the prior art, the present invention provides a power outage cause recognition system based on text analysis.
To solve the above technical problem, the present invention adopts the following technical solution:
A power outage cause recognition system based on text analysis comprises a database and a processor. The database stores power outage data generated when customer service staff provide customer service and record complaints about power outages. The processor contains a text segmentation and filtering expert system module, a root cause identification expert system module, and an HDSP identification module;
The text segmentation and filtering expert system module segments and filters the power outage data so that each segmented and filtered record contains one and only one power outage cause. The module comprises a text segmentation unit and a filtering expert system unit: the text segmentation unit splits the power outage data successively at commas, full stops, and semicolons, and the filtering expert system unit filters the segmented data to remove content unrelated to power outage causes;
The root cause identification expert system module extracts common rules from the segmented and filtered power outage data and uses these common rules to analyze the data and produce identification text;
The HDSP identification module performs a second analysis on the power outage data that the text segmentation and filtering expert system module and the root cause identification expert system module failed to identify, and produces identification text for it.
Preferably, the root cause identification expert system module further comprises a rule generation unit, a rule base unit, and a fact base unit;
The rule generation unit extracts common rules from the segmented and filtered power outage data and compares the performance parameters of each common rule with a first threshold preset in the rule base unit. When the recognition accuracy of the rule is higher than the first threshold, its accuracy is further compared with a second threshold in the fact base unit; if it is also higher than the second threshold, the rule is added to the rule base; otherwise, the rule continues to be optimized;
The rule base unit contains match words for identifying different power outage causes; matching a common rule against these match words yields the identification text for the power outage data;
The fact base contains industry background knowledge, initial text data, later-stage labeled data, and recognition performance data produced while the root cause identification expert system module runs.
Preferably, the root cause identification expert system module further comprises an inference engine, a human-machine interaction unit, and an explanation unit. The inference engine infers the logical relationships among the rules in the rule base unit. The human-machine interaction unit includes a human-machine interface through which engineers refine the data in the rule base unit and the fact base unit and create new rules. The explanation unit presents the power outage cause recognition results directly to the user on the human-machine interface.
Preferably, the HDSP identification module extracts the unidentified power outage data to generate training text, derives performance parameters by analyzing the training text, and uses these parameters to generate identification text and identify the power outage causes of the remaining unidentified power outage data.
Preferably, θ and p(θ) are obtained from the training text, where θ is the topic vector, each component of which is the probability that the corresponding topic occurs in a document, and p(θ) is the Dirichlet distribution of the topic vector θ. Two control parameters α and β are then derived: α is the parameter of the distribution p(θ) and is used to generate a topic vector θ; β is the word probability distribution matrix p(w|z) of each topic. The control parameters α and β determine the topic model, which generates identification text as follows: (1) select a topic vector θ, which determines the probability that each topic is selected; (2) select a topic z from the topic distribution vector θ and generate a word from the word probability distribution of topic z; this word is the identification text.
Preferably, the HDSP identification module extracts the unidentified power outage data to generate test text; the power outage causes in the test text are labeled manually, and the result is used to judge whether the control parameters α and β learned from the training text are reasonable and to adjust them.
By adopting the above technical solution, the present invention achieves significant technical effects. Because existing machine learning classifiers cannot identify multiple power outage causes in a single text, this patent first processes the text data with the text segmentation and filtering expert system, and then applies the root cause identification expert system and the HDSP identification model together to identify root causes in the text, so that multiple power outage causes can be identified in a single text. Text segmentation and filtering narrow the scope that the root cause identification expert system must cover, which makes rules easier to build and substantially improves that system's recognition performance; and because each segmented and filtered text contains only one power outage cause, machine learning classifiers can be used effectively. Since building expert system rules is an iterative process, relying on the root cause identification expert system alone may leave some text data without an identified power outage cause. The HDSP identification model is therefore used to identify this remaining text, which greatly reduces the amount of unidentified text and compensates for the shortcomings of the root cause identification expert system. This patent can help clients pin down power outage causes in disorganized work orders and clearly assign responsibility, creating conditions for improving service quality, strengthening customer service center management, and raising user satisfaction. It helps enterprises handle power outage events promptly and build a good corporate image. The model and system run automatically, their evaluation criteria are objective, and they integrate well; they greatly reduce staff workload and solve the problem of inconsistent results caused by staff members' subjective judgments.
Brief description of the drawings
Fig. 1 is a schematic diagram of the present invention.
Fig. 2 is a schematic diagram of the text segmentation and filtering expert system module of Fig. 1.
Fig. 3 is a schematic diagram of the rule generation process of the rule generation unit of Fig. 2.
Fig. 4 is a schematic diagram of the topic model.
Fig. 5 is a flowchart of model training in the HDSP identification module.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
Embodiment 1
A power outage cause recognition system based on text analysis comprises a database and a processor. The database stores power outage data generated when customer service staff provide customer service and record complaints about power outages. The processor contains a text segmentation and filtering expert system module, a root cause identification expert system module, and an HDSP identification module;
The text segmentation and filtering expert system module segments and filters the power outage data so that each segmented and filtered record contains one and only one power outage cause. The module comprises a text segmentation unit and a filtering expert system unit: the text segmentation unit splits the power outage data successively at commas, full stops, and semicolons, and the filtering expert system unit filters the segmented data to remove content unrelated to power outage causes;
The root cause identification expert system module extracts common rules from the segmented and filtered power outage data and uses these common rules to analyze the data and produce identification text;
The HDSP identification module performs a second analysis on the power outage data that the text segmentation and filtering expert system module and the root cause identification expert system module failed to identify, and produces identification text for it.
Text segmentation first splits the text at commas, then splits the results at full stops, and finally splits those results at semicolons. Text filtering removes the irrelevant segments produced by this segmentation according to the following main rules: 1. segments shorter than 6 characters are removed; 2. segments that contain only time expressions such as year, month, and day are removed; 3. segments containing a word on the blacklist are removed; 4. segments containing a word on the whitelist are not removed.
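The segmentation and filtering rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the delimiter sets (full-width and ASCII commas and semicolons, full-width full stop), the regular expression for time-only segments, and the blacklist and whitelist contents are hypothetical, and the whitelist is assumed to override all other filtering rules, which the text does not state explicitly.

```python
import re

# Assumed delimiters for each stage: commas, then full stops, then semicolons.
SPLIT_STAGES = ["，,", "。", "；;"]

# Hypothetical pattern: a segment made only of digits, whitespace and
# date/time characters (year, month, day, hour, minute, separators).
TIME_ONLY = re.compile(r"^[\d\s年月日时分:：\-/]+$")


def segment(text):
    """Split text at commas, then full stops, then semicolons."""
    pieces = [text]
    for delimiters in SPLIT_STAGES:
        pattern = "[" + re.escape(delimiters) + "]"
        pieces = [part for piece in pieces for part in re.split(pattern, piece)]
    return [p.strip() for p in pieces if p.strip()]


def keep_segment(seg, blacklist, whitelist, min_len=6):
    """Apply the four filtering rules; True means the segment is kept."""
    if any(word in seg for word in whitelist):  # rule 4 (assumed to take precedence)
        return True
    if len(seg) < min_len:                      # rule 1: too short
        return False
    if TIME_ONLY.match(seg):                    # rule 2: only a time description
        return False
    if any(word in seg for word in blacklist):  # rule 3: contains a blacklisted word
        return False
    return True


def segment_and_filter(text, blacklist, whitelist):
    return [s for s in segment(text) if keep_segment(s, blacklist, whitelist)]


record = "线路故障导致停电跳闸，已处理，2015年01月13日。客户表示理解；计划检修停电"
print(segment_and_filter(record, blacklist=["客户"], whitelist=["检修"]))
# ['线路故障导致停电跳闸', '计划检修停电']
```

Here "已处理" is dropped by the length rule, the date by the time-only rule, and "客户表示理解" by the (hypothetical) blacklist, while "计划检修停电" survives because of the whitelist.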
As shown in Fig. 2, the root cause identification expert system is built mainly as a rule-based expert system, with the rule base and fact base at its core. Through human-machine interaction with users, domain experts, and engineers, the rule acquisition stage runs as an iterative cycle of rule creation, testing, refinement, testing, refinement, and so on, until the rule is updated. The inference engine makes explicit the logical relationships among the rules in the rule base, and the explanation module attaches to each result output by the expert system a description of the rule it matched, so that users can judge the matching results manually.
The root cause identification expert system module also includes a rule generation unit, a rule base unit, and a fact base unit, as shown in Fig. 3;
The rule generation unit extracts common rules from the segmented and filtered power outage data and compares the performance parameters of each common rule with a first threshold preset in the rule base unit. When the recognition accuracy of the rule is higher than the first threshold, its accuracy is further compared with a second threshold in the fact base unit. If the accuracy is also higher than the second threshold, the common rule is added to the rule base; otherwise, the rule continues to be optimized until it satisfies the update condition.
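The two-threshold acceptance loop described above can be sketched as follows. This is a hypothetical illustration: the text does not specify how accuracy is measured, what optimizing a rule involves, or the threshold values, so `evaluate`, `refine`, and the numbers in the example are placeholders.

```python
def check_rule(accuracy, first_threshold, second_threshold):
    """Two-stage acceptance test for a candidate common rule."""
    if accuracy <= first_threshold:
        return "below_first_threshold"   # fails the rule-base threshold
    if accuracy <= second_threshold:
        return "below_second_threshold"  # passes the first check but not the second
    return "accepted"                    # rule may be added to the rule base


def refine_until_accepted(rule, evaluate, refine, first_threshold,
                          second_threshold, max_rounds=20):
    """Iterate test -> refine until the rule passes both thresholds.

    evaluate(rule) returns the rule's recognition accuracy;
    refine(rule) returns an optimized version of the rule.
    """
    for _ in range(max_rounds):
        if check_rule(evaluate(rule), first_threshold, second_threshold) == "accepted":
            return rule, True
        rule = refine(rule)
    return rule, False


# Made-up accuracies for three successive versions of one rule.
measured = {1: 0.60, 2: 0.80, 3: 0.95}
print(refine_until_accepted(1, measured.get, lambda n: n + 1, 0.7, 0.9))  # (3, True)
```

Version 1 fails the first threshold, version 2 passes it but fails the second, and version 3 is accepted.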
The rule base unit contains match words for identifying different power outage causes; matching a common rule against these match words yields the identification text for the power outage data;
The fact base contains industry background knowledge, initial text data, later-stage labeled data, and recognition performance data produced while the root cause identification expert system module runs.
The industry background knowledge includes:
1. Analysis of general power grid accident types in recent years
1.1 By cause: the main causes of general power grid accidents are relay protection, severe weather, external damage, misoperation, poor quality, personnel responsibility, and other causes.
1.2 By responsibility: power grid accidents can be divided into natural disasters, manufacturing quality, external damage, operating personnel, design, personnel responsibility, and others. Statistics show that natural disasters (lightning strikes, fog flashover, conductor galloping caused by ice coating), personnel responsibility (operating personnel and other staff), external damage, and manufacturing quality are, in that order, the main responsibility categories of general power grid accidents.
1.3 By technical category: power grid accidents can be divided into relay protection, lightning strikes, ground short circuits, serious misoperation, accidental contact and unintended operation, equipment faults, and others. Among these, ground short circuits (external damage, discharge to ground), relay protection (protection maloperation, relay failure, secondary circuit faults, etc.), and lightning strikes are the main technical causes of general power grid accidents.
1.4 By equipment category: power grid accidents can generally be divided into transmission lines, relay protection, other electrical equipment, switches, disconnectors, combined electrical apparatus, and so on. Practice shows that transmission lines and relay protection are, in that order, the main equipment causes of power grid accidents.
For example, one piece of initial text data reads:
"On February 6, 2015, Hu Weihua, head of the external-line crew of the Baisha power supply station of Yangxin County Electric Power Company, verified that the customer experienced 3 power outages during the reported period, caused as follows: 1. relocation at Mayuan on the Qingshui line, in the Shengxing trunk-line transformer area of branch line Bai 16; outage time: 2015-01-13 08:20-16:25; 2. connection of a newly added distribution transformer at the Lianggongpu Mayuan 2# transformer area of the Shengxing trunk line of branch line Bai 16; outage time: 2015-01-27 08:20-18:05; 3. related fault ticket: 2015020542186467; power outage cause: tripping of the low-voltage main air circuit breaker of the transformer area; outage time: February 5, 2015, 20:07-21:08; this outage was resolved by repair and reset. The power outage causes were explained to the customer (15272057988), who expressed understanding." Later-stage labeled data: the labels for this example are scheduled outage, scheduled outage, and fault outage. Recognition performance data: the labels identified by the model for this example are scheduled outage, scheduled outage, and fault outage, so the text was identified correctly.
The rule base contains a large number of rules that meet the requirements. Their format consists mainly of the following two parts:
1) Constraint symbols between match words:
During rule matching, the match words in the rule base are matched against the corresponding text content. Matching can involve several cases: the text contains a word, does not contain it, contains several words at once, or contains only one of them. A rule must therefore express the required relationship between the match words and the text content, and the constraint symbols shown in Table 1 were defined to cover the cases that can arise during matching.
Table 1 Constraint symbols between match words
Note: each constraint symbol may be followed by only one match word. To connect several match words, a rule is built by combining them in the form "constraint symbol A + word + space + constraint symbol B + word".
2) Mutual-exclusion symbols between categories:
Because rules are established separately for each category, the rules of two categories may be mutually exclusive. For a given text (assuming classes A and B are mutually exclusive), as shown in Table 2, if the text is judged to belong to class A, it cannot also be judged to belong to class B, and a rule symbol was defined for this situation.
Table 2 Mutual-exclusion symbols
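Tables 1 and 2 are not reproduced here, so the concrete constraint and mutual-exclusion symbols are unknown. The sketch below models the matching cases described above (contains all of several words, contains at least one, contains none) and the A/B mutual exclusion with plain Python data structures instead of symbols. The `Rule` class, its field names, the example match words, and the rule-order tie-breaking are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical rule: a category plus match-word constraints."""
    category: str
    all_of: list = field(default_factory=list)   # every word must appear
    any_of: list = field(default_factory=list)   # at least one must appear (if non-empty)
    none_of: list = field(default_factory=list)  # none may appear

    def matches(self, text: str) -> bool:
        return (all(w in text for w in self.all_of)
                and (not self.any_of or any(w in text for w in self.any_of))
                and not any(w in text for w in self.none_of))


def classify(text, rules, exclusive_pairs):
    """Return the categories whose rules match, honoring mutual exclusion.

    Rules are tried in list order, so an earlier rule wins a conflict
    between two mutually exclusive categories.
    """
    labels = []
    for rule in rules:
        if rule.category in labels or not rule.matches(text):
            continue
        conflict = any(frozenset((rule.category, got)) in exclusive_pairs
                       for got in labels)
        if not conflict:
            labels.append(rule.category)
    return labels


rules = [
    Rule("scheduled outage", all_of=["计划"], any_of=["检修", "迁移"]),
    Rule("fault outage", any_of=["故障", "跳闸", "停电"]),
]
exclusive = {frozenset(("scheduled outage", "fault outage"))}
print(classify("计划检修停电", rules, exclusive))        # ['scheduled outage']
print(classify("线路故障导致停电跳闸", rules, exclusive))  # ['fault outage']
```

In the first call both rules match ("停电" also satisfies the fault rule), but the mutual-exclusion pair keeps only the earlier category.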
The root cause identification expert system module also includes an inference engine, a human-machine interaction unit, and an explanation unit. The inference engine infers the logical relationships among the rules in the rule base unit. The human-machine interaction unit includes a human-machine interface through which engineers refine the data in the rule base unit and the fact base unit and create new rules. The explanation unit presents the power outage cause recognition results directly to the user on the human-machine interface.
HDSP is mainly a topic model based on the LDA algorithm that further incorporates supervised classification learning, so that the algorithm can also learn labels autonomously while extracting topics. The simplest traditional way to judge whether two documents are similar is to compare the words they share, as in TF-IDF. However, this ignores the semantic content of words, so it can miss two documents that are semantically similar but share few words. Judging document similarity therefore also requires considering the semantics of the documents themselves, and topic models are the main tool for this kind of semantic mining. In a topic model, a topic can be a concept or an aspect; it can also be viewed as a set of related words, expressed as the conditional probabilities of those words. In short, a topic consists of the words strongly associated with it (that is, words with a high probability of appearing in documents about that topic).
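The weakness of word-overlap similarity noted above shows up in a small example. The sketch below computes plain TF-IDF vectors (term frequency times the logarithm of inverse document frequency, one common variant) and their cosine similarity; the three pre-segmented token lists are made up for illustration. Two weather-related records that share no words score 0, while two records with different causes score above 0 because they share the word "trip".

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {token: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({tok: (cnt / len(doc)) * math.log(n_docs / df[tok])
                        for tok, cnt in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["rainstorm", "line", "trip"],     # weather-related outage
    ["gale", "wire", "broken"],        # also weather-related, no shared words
    ["appliance", "leakage", "trip"],  # user equipment fault, shares "trip"
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))        # 0.0
print(cosine(vecs[0], vecs[2]) > 0)    # True
```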
On the basis of the LDA algorithm, the HDSP identification module extracts the unidentified power outage data and divides it into training text and test text. It derives performance parameters by training on the training text, then tests on the test text to find the parameters with the higher recognition accuracy, and uses these parameters to generate identification text and identify the power outage causes of the remaining unidentified power outage data.
The training algorithm yields the two control parameters α and β (α is the parameter of the distribution p(θ) and is used to generate a topic vector θ; β is the word probability distribution matrix p(w|z) of each topic). The control parameters α and β determine the topic model, which generates identification text with the following algorithm:
Choose θ ~ p(θ);
For each of the N words w_n:
    Choose a topic z_n ~ p(z|θ);
    Choose a word w_n ~ p(w|z_n);
where:
θ: the topic vector; each component is the probability that the corresponding topic occurs in the document
p(θ): the Dirichlet distribution of θ
N: the number of words in the document to be generated
w_n: the n-th generated word
z_n: the selected topic
p(z|θ): the probability distribution of topic z given θ
p(w|z): the distribution of word w given topic z
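The generative algorithm above can be written as a short sampling routine. This is a minimal NumPy sketch of the standard LDA generative process, assuming α is a length-K vector and β is a K×V matrix whose rows are the word distributions p(w|z); the toy values of α and β are invented for illustration.

```python
import numpy as np


def generate_document(alpha, beta, n_words, rng):
    """Sample one document from the LDA generative process.

    alpha: shape (K,), Dirichlet parameter of p(theta)
    beta:  shape (K, V), row k is the word distribution p(w | z = k)
    """
    n_topics, vocab_size = beta.shape
    theta = rng.dirichlet(alpha)                                  # theta ~ Dir(alpha)
    z = rng.choice(n_topics, size=n_words, p=theta)               # z_n ~ p(z | theta)
    w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])  # w_n ~ p(w | z_n)
    return theta, z, w


rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])
beta = np.array([[0.7, 0.3, 0.0, 0.0],   # topic 0 uses words 0 and 1
                 [0.0, 0.0, 0.4, 0.6]])  # topic 1 uses words 2 and 3
theta, z, w = generate_document(alpha, beta, n_words=20, rng=rng)
print(theta.round(3), z, w)
```

Every sampled word comes from the vocabulary of the topic drawn for it, which is exactly the "select a topic, then select a word from that topic" process the text describes.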
The problem a topic model addresses is how topics are generated. To this end, a topic model uses a generative model to connect documents and topics: each word of every document is assumed to be obtained by the process of "selecting a topic with a certain probability, and then selecting a word from that topic with a certain probability". Therefore, for a document d, the probability of each word w it contains is:
$$p(w \mid d) = \sum_{z} p(w \mid z)\, p(z \mid d)$$
This probability formula can be expressed in matrix form:
Formula one: $$[\text{document-word}] = [\text{document-topic}] \times [\text{topic-word}]$$
Here the "document-word" matrix gives the frequency of each word in each document, that is, its probability of occurrence; the "topic-word" matrix gives the probability of each word within each topic; and the "document-topic" matrix gives the probability of each topic within each document. Given a collection of documents, segmenting each document into words and computing the frequency of each word yields the "document-word" matrix on the left-hand side. The terms in Formula one are interpreted as follows. Word: the set of words obtained by word segmentation of the texts, such as rainstorm, heavy rain, magpie, nest, household appliance, and electric leakage. Document: the content of a frequent-outage record, such as "power outage caused by strong wind and heavy rain" or "power outage caused by leakage from a user's household appliance". Topic: a cause of frequent power outages, such as natural disaster, human external force, user equipment fault, or bird damage. Topic vector: the set formed by these topics.
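With documents as rows, Formula one is an ordinary matrix product. The sketch below uses made-up two-topic, three-word values to check that multiplying a row-stochastic "document-topic" matrix by a row-stochastic "topic-word" matrix gives a "document-word" matrix whose rows are again probability distributions.

```python
import numpy as np


def document_word_matrix(doc_topic, topic_word):
    """p(w | d) = sum_z p(w | z) p(z | d), for all documents and words at once.

    doc_topic:  shape (D, K), row d is p(z | d)
    topic_word: shape (K, V), row k is p(w | z = k)
    """
    return doc_topic @ topic_word


doc_topic = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
topic_word = np.array([[0.5, 0.5, 0.0],
                       [0.0, 0.25, 0.75]])
doc_word = document_word_matrix(doc_topic, topic_word)
print(doc_word)              # approximately [[0.45, 0.475, 0.075], [0.1, 0.3, 0.6]]
print(doc_word.sum(axis=1))  # each row sums to 1
```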
The method first selects a topic vector θ. Taking frequent power outage causes as an example, the corresponding topic types are natural disaster, scheduled outage, bird damage, human external force, and so on. These topic types form a set, and this set is the topic vector; its elements are the topic types just described. The probability that each topic is selected is then determined. When each word is generated, a topic z is selected from the topic distribution vector θ, and a word is generated from the word probability distribution of topic z. From the graphical model, the joint probability is:
$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$$
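The joint probability can be evaluated directly in log space. The sketch below is a plain-Python illustration that assumes θ has strictly positive components and represents β as a list of per-topic word distributions; the toy values are invented. With α = (1, 1) the Dirichlet density is 1, so the result reduces to the product of the four word-level factors.

```python
import math


def dirichlet_logpdf(theta, alpha):
    """log Dir(theta | alpha) for a point theta on the probability simplex."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta)))


def joint_log_prob(theta, z, w, alpha, beta):
    """log p(theta, z, w | alpha, beta)
       = log p(theta | alpha) + sum_n [log p(z_n | theta) + log p(w_n | z_n, beta)]."""
    log_p = dirichlet_logpdf(theta, alpha)
    for z_n, w_n in zip(z, w):
        log_p += math.log(theta[z_n]) + math.log(beta[z_n][w_n])
    return log_p


beta = [[0.5, 0.5, 0.0], [0.0, 0.25, 0.75]]
lp = joint_log_prob(theta=[0.5, 0.5], z=[0, 1], w=[0, 2], alpha=[1.0, 1.0], beta=beta)
print(math.exp(lp))  # approximately 0.09375 = (0.5 * 0.5) * (0.5 * 0.75)
```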
Mapping the above formula onto the graphical model gives the structure shown in Fig. 4; Table 3 describes the three representation levels of the topic model:
Table 3 Description of the graphical model parameters
From the discussion above, a topic model mainly learns the two control parameters α and β from the given input corpus; once these two parameters have been learned, the model is determined and can be used to generate documents.
DSP identification module create one based on topic model improve model, consist predominantly of supervised classification and Without supervision two aspects of Subject Clustering.Three below process is the generation process of this model Supervised classification:
1) Obtain a certain amount of power outage data by sampling as sample data, manually label each sample with its power outage cause, and divide the labeled data into a training part and a test part;
2) Train the HDSP model with the labeled training samples, use the trained HDSP model to identify the power outage causes of the test samples, and output the causes identified by the model, as shown in Figure 5. The training method is:
1. Perform word segmentation on the document content and compute the probability of each word occurring in each document; combined with Formula 1, this yields the "document-word" matrix.
2. Initialize the parameters α and β, the "document-topic" matrix and the "topic-word" matrix.
3. Use β and the "document-topic" matrix to compute the "topic-word" matrix of the documents.
4. Use α and the "topic-word" matrix to compute the "document-topic" matrix.
5. Update parameter β using the "topic-word" matrix obtained in step 3.
6. Update parameter α using the "document-topic" matrix obtained in step 4.
7. Repeat steps 3 to 6 until convergence, at which point training ends.
3) According to the test results, compare the test-text identification results output by the trained HDSP model with the manually labeled test samples and calculate the accuracy with which the HDSP model identifies power outage causes. A recognition accuracy threshold is first set as the performance standard; comparing the test accuracy with this threshold shows whether the current model meets the standard. If it does not, the model parameters are adjusted and process 2) is repeated; once the test results meet the performance standard, the trained model file is saved.
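The patent does not give the update formulas for steps 3 to 6 or code for the acceptance test in 3). As a stand-in, the Python sketch below runs expectation-maximization for a topic model with fixed symmetric Dirichlet smoothing (MAP estimation), alternating between the "document-topic" and "topic-word" matrices until the relative log-likelihood gain falls below a tolerance, and then checks test accuracy against a preset threshold. Unlike steps 5 and 6, it keeps α and β fixed rather than re-estimating them, and it is not the HDSP model itself; all function names are hypothetical.

```python
import math
import random


def normalise(values):
    """Scale non-negative numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def train_topic_model(docs, n_topics, vocab_size, alpha=1.1, beta=1.01,
                      tol=1e-4, max_iter=200, seed=0):
    """MAP-EM for a smoothed topic model (a stand-in, not HDSP).

    docs: list of documents, each a list of integer word ids.
    alpha, beta: fixed symmetric Dirichlet priors (> 1 keeps estimates positive).
    Returns (doc_topic, topic_word, loglik).
    """
    rng = random.Random(seed)
    # Random initialisation of the "document-topic" and "topic-word" matrices.
    doc_topic = [normalise([rng.random() + 0.1 for _ in range(n_topics)]) for _ in docs]
    topic_word = [normalise([rng.random() + 0.1 for _ in range(vocab_size)])
                  for _ in range(n_topics)]
    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        nd = [[0.0] * n_topics for _ in docs]
        nw = [[0.0] * vocab_size for _ in range(n_topics)]
        loglik = 0.0
        for d, doc in enumerate(docs):
            for w in doc:
                # E-step: responsibility of each topic for this token.
                p = [doc_topic[d][k] * topic_word[k][w] for k in range(n_topics)]
                s = sum(p)
                loglik += math.log(s)
                for k in range(n_topics):
                    nd[d][k] += p[k] / s
                    nw[k][w] += p[k] / s
        # M-step: re-estimate both matrices from expected counts plus smoothing.
        doc_topic = [normalise([c + alpha - 1 for c in row]) for row in nd]
        topic_word = [normalise([c + beta - 1 for c in row]) for row in nw]
        # Stop once the relative log-likelihood gain falls below tol.
        if loglik - prev < tol * abs(loglik):
            break
        prev = loglik
    return doc_topic, topic_word, loglik


def accuracy(predicted, gold):
    """Share of test samples whose predicted outage cause equals the manual label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def meets_standard(predicted, gold, threshold):
    """Acceptance test: does test accuracy reach the preset performance threshold?"""
    acc = accuracy(predicted, gold)
    return acc >= threshold, acc
```

If `meets_standard` returns `False`, the parameters (here `n_topics`, `alpha`, `beta`) would be adjusted and training repeated, mirroring process 2).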
In addition, the HDSP identification module can perform unsupervised clustering of topics. During clustering, the algorithm generates a cluster label for each instance according to the configured number of topic words and number of topics. The algorithm also supports manual intervention in the generated cluster labels: after intervention, it automatically learns the label knowledge from the corrections, retrains the model and re-clusters the texts. As the iterations proceed, the clustering precision of the algorithm progressively improves.
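A minimal sketch of the cluster-labeling step, assuming each document is assigned to its most probable topic and each cluster is named by that topic's top `n_top_words` words; manual corrections are modeled as a hypothetical `overrides` mapping from document index to label. The retraining on corrected labels described above is not shown; one option would be to feed the corrected labels back as labeled training data.

```python
def cluster_labels(doc_topic, topic_word, vocab, n_top_words, overrides=None):
    """Label each document with the top words of its most probable topic.

    overrides: optional {doc_index: label} of manual corrections, which win.
    """
    # Name each topic by joining its n_top_words most probable words.
    names = []
    for row in topic_word:
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n_top_words]
        names.append("/".join(vocab[i] for i in top))
    # Assign each document to its argmax topic.
    labels = [names[max(range(len(row)), key=row.__getitem__)] for row in doc_topic]
    # Apply manual interventions last so they take precedence.
    for d, label in (overrides or {}).items():
        labels[d] = label
    return labels
```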
In summary, the foregoing describes only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope covered by this patent.

Claims (6)

1. A text-analysis-based power outage cause identification system, characterized in that it comprises a database and a processor; the database stores power outage data recorded by customer service staff when handling power outage complaints while providing customer service; the processor is provided with a text splitting and filtering expert system module, a root cause identification expert system module and an HDSP identification module;
the text splitting and filtering expert system module splits and filters the power outage data so that each split and filtered power outage record has one and only one power outage cause; the text splitting and filtering expert system module comprises a text splitting unit and a filtering expert system unit; the text splitting unit splits the power outage data successively by comma, by clause and by semicolon, and the filtering expert system unit filters the split power outage data to remove data unrelated to the power outage cause;
the root cause identification expert system module extracts common rules from the split and filtered power outage data and analyzes the power outage data with the common rules to obtain identification text;
the HDSP identification module performs a secondary analysis of the power outage data that remain unidentified after analysis by the text splitting and filtering expert system module and the root cause identification expert system module, to obtain identification text.
2. The text-analysis-based power outage cause identification system according to claim 1, characterized in that the root cause identification expert system module further comprises a rule extraction unit, a rule base unit and a fact base unit;
the rule extraction unit extracts common rules and compares the performance parameter obtained by applying a common rule to the split and filtered power outage data with a first threshold preset in the rule base unit; when the identification accuracy given by the common rule's performance parameter is higher than the accuracy of the first threshold, the performance parameter of the common rule is compared in accuracy with a second threshold in the fact base unit, and if it is also higher than the accuracy of the second threshold, the common rule is updated into the rule base;
the rule base unit contains matching words for identifying the different power outage causes, and the common rule is matched against the matching words to obtain the identification text corresponding to the power outage data;
the fact base contains industry background knowledge, initial text data, later-labeled data, and the identification performance data generated while the root cause identification expert system module is running.
3. The text-analysis-based power outage cause identification system according to claim 2, characterized in that the root cause identification expert system module further comprises an inference engine, a human-machine interaction unit and an explanation unit; the inference engine performs logical reasoning according to the rules in the rule base unit; the human-machine interaction unit comprises a human-machine interface through which engineers refine the data in the rule base unit and the fact base unit and extract new rules; and the explanation unit presents the power outage cause identification results directly to the user on the human-machine interface.
4. The text-analysis-based power outage cause identification system according to claim 1, characterized in that:
the HDSP identification module extracts unidentified power outage data to generate training text, derives performance parameters by analyzing the training text, uses the performance parameters to generate identification text, and identifies the power outage causes of the remaining unidentified power outage data.
5. The text-analysis-based power outage cause identification system according to claim 4, characterized in that θ and p(θ) are obtained from the training text, where θ is the topic vector, representing the probability of each topic occurring in each document, and p(θ) is the Dirichlet distribution of the topic vector θ; two control parameters α and β are then obtained, where α is the parameter of the distribution p(θ), used to generate a topic vector θ, and β is the matrix p(w|z) of word probability distributions corresponding to the topics; the topic model is determined by the control parameters α and β, and the algorithm by which the model generates identification text is as follows: (1) select a topic vector θ and determine the probability of each topic being selected; (2) select a topic z from the topic distribution vector θ and generate a word from the word probability distribution of topic z; this word is the identification text.
6. The text-analysis-based power outage cause identification system according to claim 5, characterized in that the HDSP identification module extracts unidentified power outage data to generate test text, performs power outage cause identification on the test text, judges whether the control parameters α and β obtained from the training text are reasonable, and adjusts them accordingly.
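The splitting and filtering of claim 1 and the two-threshold rule promotion and matching-word lookup of claim 2 can be sketched together in Python. The delimiter set (commas, sentence-ending marks and semicolons, in ASCII and full-width forms), the keyword filter standing in for the filtering expert system, treating a rule as a cause plus a list of matching words, measuring its accuracy as precision on labeled (text, cause) pairs, and reducing the fact base to a single threshold are all assumptions; `RuleBase`, `FactBase`, `promote_rule` and the other names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed splitting order: commas, then sentence-ending marks, then semicolons.
DELIMITERS = (r"[,，]", r"[。！？.!?]", r"[;；]")


def split_record(text, delimiters=DELIMITERS):
    """Text splitting unit: split one outage record successively by each delimiter."""
    pieces = [text]
    for pattern in delimiters:
        pieces = [part for piece in pieces for part in re.split(pattern, piece)]
    return [p.strip() for p in pieces if p.strip()]


def filter_fragments(fragments, cause_keywords):
    """Stand-in filter: keep only fragments mentioning at least one cause keyword."""
    return [f for f in fragments if any(k in f for k in cause_keywords)]


@dataclass
class RuleBase:
    first_threshold: float                      # accuracy a rule must exceed first
    rules: dict = field(default_factory=dict)   # cause -> list of matching words


@dataclass
class FactBase:
    second_threshold: float                     # accuracy bar checked second


def rule_accuracy(cause, match_words, labelled):
    """Precision of a candidate rule: share of matched texts labelled with its cause."""
    hits = [c for text, c in labelled if any(w in text for w in match_words)]
    return sum(c == cause for c in hits) / len(hits) if hits else 0.0


def promote_rule(cause, match_words, labelled, rule_base, fact_base):
    """Add the rule's matching words to the rule base only if it beats both thresholds in turn."""
    acc = rule_accuracy(cause, match_words, labelled)
    if acc > rule_base.first_threshold:
        if acc > fact_base.second_threshold:
            existing = rule_base.rules.setdefault(cause, [])
            for w in match_words:
                if w not in existing:
                    existing.append(w)
            return True
    return False


def identify_cause(text, rule_base):
    """Return the first cause whose matching words occur in text, else None (left for HDSP)."""
    for cause, words in rule_base.rules.items():
        if any(w in text for w in words):
            return cause
    return None
```

Records that `identify_cause` leaves as `None` correspond to the unidentified data handed on to the HDSP identification module.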
CN201610209966.5A 2016-04-05 2016-04-05 Text analysis based power outage cause recognition system Active CN105930347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610209966.5A CN105930347B (en) 2016-04-05 2016-04-05 Text analysis based power outage cause recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610209966.5A CN105930347B (en) 2016-04-05 2016-04-05 Text analysis based power outage cause recognition system

Publications (2)

Publication Number Publication Date
CN105930347A true CN105930347A (en) 2016-09-07
CN105930347B CN105930347B (en) 2017-05-10

Family

ID=56840173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610209966.5A Active CN105930347B (en) 2016-04-05 2016-04-05 Text analysis based power outage cause recognition system

Country Status (1)

Country Link
CN (1) CN105930347B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530127A (en) * 2016-11-09 2017-03-22 国网江苏省电力公司南京供电公司 Complaint early warning and monitoring analysis system based on text mining
CN106529804A (en) * 2016-11-09 2017-03-22 国网江苏省电力公司南京供电公司 Client complaint early-warning monitoring analyzing method based on text mining technology
CN109389418A (en) * 2018-08-17 2019-02-26 国家电网有限公司客户服务中心 Electric service client's demand recognition methods based on LDA model
CN109919799A (en) * 2019-03-01 2019-06-21 广州供电局有限公司 Power off time data intelligent statistical analysis technique
CN110276699A (en) * 2019-06-21 2019-09-24 广州远正智能科技股份有限公司 A kind of public organizations' energy consumption quota index formulating method, system and storage medium
CN112837175A (en) * 2021-01-11 2021-05-25 佰聆数据股份有限公司 Frequent power failure work order information extraction method and system based on information extraction technology
CN112906729A (en) * 2019-12-04 2021-06-04 西安西电高压开关有限责任公司 Method, device and system for determining fault distribution of switch equipment
CN113128571A (en) * 2021-03-30 2021-07-16 国网甘肃省电力公司电力科学研究院 Method for detecting artificial intelligence technology in network security
CN113220875A (en) * 2021-04-09 2021-08-06 北京智慧星光信息技术有限公司 Internet information classification method and system based on industry label and electronic equipment
CN113298326A (en) * 2021-07-27 2021-08-24 成都西辰软件有限公司 Intelligent electronic event supervision method, equipment and storage medium
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101655838A (en) * 2009-09-10 2010-02-24 复旦大学 Method for extracting topic with quantifiable granularity
CN102981829A (en) * 2012-11-01 2013-03-20 宁波电业局 Graphic data displaying method and graphic data displaying device based on black out management system
JP2015002498A (en) * 2013-06-18 2015-01-05 中国電力株式会社 Outage situation reporting system

Cited By (16)

Publication number Priority date Publication date Assignee Title
CN106530127A (en) * 2016-11-09 2017-03-22 国网江苏省电力公司南京供电公司 Complaint early warning and monitoring analysis system based on text mining
CN106529804A (en) * 2016-11-09 2017-03-22 国网江苏省电力公司南京供电公司 Client complaint early-warning monitoring analyzing method based on text mining technology
CN106529804B (en) * 2016-11-09 2023-08-18 国网江苏省电力公司南京供电公司 Customer complaint early warning monitoring analysis method based on text mining technology
CN109389418A (en) * 2018-08-17 2019-02-26 国家电网有限公司客户服务中心 Electric service client's demand recognition methods based on LDA model
CN109919799A (en) * 2019-03-01 2019-06-21 广州供电局有限公司 Power off time data intelligent statistical analysis technique
CN110276699A (en) * 2019-06-21 2019-09-24 广州远正智能科技股份有限公司 A kind of public organizations' energy consumption quota index formulating method, system and storage medium
CN112906729B (en) * 2019-12-04 2024-01-26 西安西电高压开关有限责任公司 Fault distribution determination method, device and system of switch equipment
CN112906729A (en) * 2019-12-04 2021-06-04 西安西电高压开关有限责任公司 Method, device and system for determining fault distribution of switch equipment
CN112837175B (en) * 2021-01-11 2022-05-10 佰聆数据股份有限公司 Frequent power failure work order information extraction method and system based on information extraction technology
CN112837175A (en) * 2021-01-11 2021-05-25 佰聆数据股份有限公司 Frequent power failure work order information extraction method and system based on information extraction technology
CN113128571A (en) * 2021-03-30 2021-07-16 国网甘肃省电力公司电力科学研究院 Method for detecting artificial intelligence technology in network security
CN113220875A (en) * 2021-04-09 2021-08-06 北京智慧星光信息技术有限公司 Internet information classification method and system based on industry label and electronic equipment
CN113220875B (en) * 2021-04-09 2024-01-30 北京智慧星光信息技术有限公司 Internet information classification method and system based on industry labels and electronic equipment
CN113298326A (en) * 2021-07-27 2021-08-24 成都西辰软件有限公司 Intelligent electronic event supervision method, equipment and storage medium
CN113298326B (en) * 2021-07-27 2021-10-26 成都西辰软件有限公司 Intelligent electronic event supervision method, equipment and storage medium
CN113592040A (en) * 2021-09-27 2021-11-02 山东蓝湾新材料有限公司 Method and device for classifying dangerous chemical accidents

Also Published As

Publication number Publication date
CN105930347B (en) 2017-05-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180123

Address after: Rooms 501 and 502, North Building, No. 318 Baishi Lane, Xiacheng District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Yuanchuan New Technology Co.,Ltd.

Address before: 310007 Room 508, 5th Floor, 155 Zhonghe Middle Road, Shangcheng District, Hangzhou City, Zhejiang Province, China

Patentee before: ZHEJIANG UTRY INFORMATION TECHNOLOGY Co.,Ltd.

CP03 Change of name, title or address

Address after: Room 23011, Yuejiang commercial center, No. 857, Xincheng Road, Puyan street, Binjiang District, Hangzhou, Zhejiang 311611

Patentee after: Hangzhou Yuanchuan Xinye Technology Co.,Ltd.

Address before: 310006 rooms 501 and 502, North building, 318 Baishi lane, Xiacheng District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou Yuanchuan New Technology Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A text analysis-based power failure cause identification system

Effective date of registration: 20220912

Granted publication date: 20170510

Pledgee: Shanghai Pudong Development Bank Co.,Ltd. Hangzhou Chengxi Sub branch

Pledgor: Hangzhou Yuanchuan Xinye Technology Co.,Ltd.

Registration number: Y2022330002181