CN108182523A

CN108182523A - Method and apparatus for processing fault data, and computer-readable storage medium

Info

Publication number: CN108182523A
Application number: CN201711431807.0A
Authority: CN
Inventors: 宋明彦; 董兆宇; 马晓丽
Original assignee: Xinjiang Goldwind Science and Technology Co Ltd
Current assignee: Xinjiang Goldwind Science and Technology Co Ltd
Priority date: 2017-12-26
Filing date: 2017-12-26
Publication date: 2018-06-19

Abstract

Embodiments of the present invention disclose a method and apparatus for processing fault data, and a computer-readable storage medium. The processing method includes: obtaining original fault data of a wind turbine; performing word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record; and clustering the keyword sets corresponding to all the original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category. The technical solution in the embodiments of the present invention enables automatic reliability failure analysis of the historical fault handling information of wind turbines.

Description

Method and apparatus for processing fault data, and computer-readable storage medium

Technical field

The present invention relates to the technical field of wind power generation, and in particular to a method and apparatus for processing wind turbine fault data, and a computer-readable storage medium.

Background

Wind energy resources are generally concentrated in cold regions or highlands with harsh environments, and these harsh conditions mean that wind turbines inevitably suffer various types of faults during operation. After each fault is resolved, field personnel record the related fault handling information. To improve the fault removal efficiency of wind turbines and to carry out fault-oriented design optimization, R&D personnel need to perform reliability failure analysis on the historical fault handling information of the wind turbines. In the prior art, this analysis is mainly done manually. However, because the historical fault handling information of wind turbines involves a very large volume of data, manual analysis consumes a great deal of the R&D personnel's time and effort.

Summary

Embodiments of the present invention provide a method and apparatus for processing wind turbine fault data, and a computer-readable storage medium, which enable automatic reliability failure analysis of the historical fault handling information of wind turbines.

In a first aspect, an embodiment of the present invention provides a method for processing wind turbine fault data, the method including:

obtaining original fault data of a wind turbine;

performing word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record; and

clustering the keyword sets corresponding to all the original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category.

In some embodiments of the first aspect, performing word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record includes: cleaning each original fault record; and performing word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record.

In some embodiments of the first aspect, cleaning each original fault record includes: removing blank (null) characters from the original fault data; and/or using regular expressions to remove numbers in the original fault data that are unrelated to wind turbine faults; and/or removing, according to a predetermined fixed dictionary, fixed phrases in the original fault data that are unrelated to wind turbine faults.

In some embodiments of the first aspect, performing word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record includes: segmenting the cleaned fault data with the Jieba word segmentation package to obtain the keyword set corresponding to each original fault record, where the dictionaries used by the Jieba package include a wind turbine industry dictionary and/or a stop-word dictionary.

In some embodiments of the first aspect, clustering the keyword sets corresponding to all the original fault data to obtain the multiple fault categories and the feature word set characterizing each fault category includes:

selecting a first keyword set from an original to-be-clustered keyword set collection, the original to-be-clustered collection being the collection formed by the keyword sets corresponding to all the original fault data;

determining, for each keyword set in the original to-be-clustered collection other than the first keyword set, whether it can be clustered into the same class as the first keyword set;

combining the keywords of all keyword sets that can be clustered with the first keyword set into the feature word set characterizing the fault category corresponding to the first keyword set, and forming an updated to-be-clustered collection from all keyword sets that could not be clustered with the first keyword set;

selecting a second keyword set from the updated to-be-clustered collection;

determining, for each keyword set in the updated to-be-clustered collection other than the second keyword set, whether it can be clustered into the same class as the second keyword set; and

combining the keywords of all keyword sets that can be clustered with the second keyword set into the feature word set characterizing the fault category corresponding to the second keyword set, and repeating this process until the number of keyword sets in the updated to-be-clustered collection drops to 0.
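The seed-and-compare loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: keyword sets are Python sets keyed by fault number, the "can be clustered" check is the intersection-ratio test described in the next embodiment, the seed is simply the lowest remaining fault number, and the feature word set is taken as the union of the member sets (seed included). The names `can_merge` and `greedy_cluster` and the default threshold of 0.5 are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def can_merge(seed: Set[str], other: Set[str], min_ratio: float) -> bool:
    """Intersection-ratio test: shared keywords / size of the larger set > min_ratio."""
    larger = max(len(seed), len(other))      # "first total number"
    if larger == 0:
        return False
    shared = len(seed & other)               # "second total number"
    return shared / larger > min_ratio


def greedy_cluster(keyword_sets: Dict[int, Set[str]], min_ratio: float = 0.5
                   ) -> List[Tuple[Set[str], List[int]]]:
    """Return (feature word set, fault numbers) for each fault category."""
    pending = dict(keyword_sets)             # to-be-clustered collection
    categories = []
    while pending:                           # until the collection is empty
        seed_id = min(pending)               # seed: lowest remaining fault number
        seed = pending.pop(seed_id)
        members, features = [seed_id], set(seed)
        for fid in sorted(pending):          # compare every remaining set with the seed
            if can_merge(seed, pending[fid], min_ratio):
                members.append(fid)
                features |= pending[fid]     # combine keywords into the feature word set
        for fid in members[1:]:
            del pending[fid]                 # what is left is the updated collection
        categories.append((features, members))
    return categories
```

Each seed is compared only with the sets still pending, so every fault ends up in exactly one category and the loop terminates once the collection is empty.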

In some embodiments of the first aspect, determining whether each keyword set in the original to-be-clustered collection other than the first keyword set can be clustered with the first keyword set includes:

selecting, in turn, a third keyword set from the keyword sets in the original to-be-clustered collection other than the first keyword set;

obtaining the total number of keywords in the first keyword set and in each third keyword set, and taking the keyword count of whichever set has more keywords as a first total number;

obtaining, for each third keyword set, a second total number, namely the number of keywords shared by the first keyword set and that third keyword set;

calculating the ratio of each second total number to the corresponding first total number; and

if the ratio of a second total number to the corresponding first total number is greater than a predetermined ratio, determining that the third keyword set corresponding to that second total number can be clustered into the same class as the first keyword set.

In other embodiments of the first aspect, the clustering instead includes: calculating a first similarity value between every two keyword sets in the original to-be-clustered collection, the original to-be-clustered collection being the collection formed by the keyword sets corresponding to all the original fault data;

merging the two keyword sets with the largest first similarity value into one new keyword set, and forming an updated to-be-clustered collection from the new keyword set and all keyword sets in the original to-be-clustered collection other than those two; and

calculating a second similarity value between every two keyword sets in the updated to-be-clustered collection, and merging the two keyword sets with the largest second similarity value into one new keyword set, repeating until the largest of all the second similarity values falls below a predetermined threshold, or until the number of keyword sets in the updated to-be-clustered collection drops to a predetermined number.

In some embodiments of the first aspect, the keyword sets are in text format, and calculating the first similarity value between every two keyword sets includes: converting the keyword sets from text format into vector format; and calculating the first similarity value between the two vectors corresponding to each pair of keyword sets.
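A compact sketch of this alternative, merge-the-most-similar-pair clustering, including the text-to-vector step, is shown below. The patent does not fix the vector representation or the similarity measure, so these are assumptions: vectors are binary bag-of-words vectors over the vocabulary of all keywords, similarity is cosine similarity, two merged sets become their union, and merging stops when the best similarity drops below `min_similarity` or when `target_count` sets remain. All names are illustrative.

```python
import math
from typing import Dict, List, Set


def to_vector(words: Set[str], vocabulary: List[str]) -> List[int]:
    """Convert a keyword set (text) into a binary bag-of-words vector."""
    return [1 if word in words else 0 for word in vocabulary]


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerate(keyword_sets: Dict[int, Set[str]], min_similarity: float = 0.3,
                target_count: int = 1) -> List[List[int]]:
    """Repeatedly merge the most similar pair; return the fault numbers per cluster."""
    vocabulary = sorted(set().union(*keyword_sets.values()))
    clusters = [(set(words), [fid]) for fid, words in sorted(keyword_sets.items())]
    while len(clusters) > max(target_count, 1):
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(to_vector(clusters[i][0], vocabulary),
                               to_vector(clusters[j][0], vocabulary))
                if score > best:
                    best, pair = score, (i, j)
        if best < min_similarity:  # assumed stop rule: no pair is similar enough
            break
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for _, members in clusters]
```

Recomputing every pairwise score on each pass is quadratic per merge, which is acceptable for a sketch; a production version would cache the similarity matrix or use an existing hierarchical clustering library.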

In some embodiments of the first aspect, after the keyword sets corresponding to all the original fault data are clustered to obtain the multiple fault categories and the feature word set corresponding to each fault category, the method further includes: setting a class label for each fault category; and indexing the feature word set corresponding to a class label by that label.

In some embodiments of the first aspect, after the keyword sets corresponding to all the original fault data are clustered to obtain the multiple fault categories and the feature word set corresponding to each fault category, the method further includes: obtaining one or more phrases corresponding to a new wind turbine fault; and retrieving, according to the one or more phrases, the fault categories and feature word sets related to the new fault, or retrieving, according to combinations of the one or more phrases, the original fault data related to the new fault.

In a second aspect, an embodiment of the present invention provides an apparatus for processing wind turbine fault data, the apparatus including:

a first acquisition module, configured to obtain original fault data of a wind turbine;

a word segmentation module, configured to perform word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record; and

a clustering module, configured to cluster the keyword sets corresponding to all the original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category.

In some embodiments of the second aspect, the word segmentation module specifically includes: a cleaning unit, configured to clean each original fault record; and a segmentation unit, configured to perform word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record.

In some embodiments of the second aspect, the clustering module specifically includes:

a selection unit, configured to select a first keyword set from the original to-be-clustered keyword set collection, the original to-be-clustered collection being formed by the keyword sets corresponding to all the original fault data;

a judging unit, configured to determine, for each keyword set in the original to-be-clustered collection other than the first keyword set, whether it can be clustered into the same class as the first keyword set;

a first clustering unit, configured to combine the keywords of all keyword sets that can be clustered with the first keyword set into the feature word set characterizing the fault category corresponding to the first keyword set, and to form an updated to-be-clustered collection from all keyword sets that could not be clustered with the first keyword set;

the selection unit being further configured to select a second keyword set from the updated to-be-clustered collection;

the judging unit being further configured to determine, for each keyword set in the updated to-be-clustered collection other than the second keyword set, whether it can be clustered into the same class as the second keyword set; and

the first clustering unit being further configured to combine the keywords of all keyword sets that can be clustered with the second keyword set into the feature word set characterizing the fault category corresponding to the second keyword set, until the number of keyword sets in the updated to-be-clustered collection drops to 0.

In some embodiments of the second aspect, the clustering module further specifically includes:

a computing unit, configured to calculate a first similarity value between every two keyword sets in the original to-be-clustered collection, the original to-be-clustered collection being formed by the keyword sets corresponding to all the original fault data; and

a second clustering unit, configured to merge the two keyword sets with the largest first similarity value into one new keyword set, and to form an updated to-be-clustered collection from the new keyword set and all keyword sets in the original to-be-clustered collection other than those two;

the computing unit being further configured to calculate a second similarity value between every two keyword sets in the updated to-be-clustered collection, and to merge the two keyword sets with the largest second similarity value into one new keyword set, until the largest of all the second similarity values falls below a predetermined threshold, or until the number of keyword sets in the updated to-be-clustered collection drops to a predetermined number.

In a third aspect, an embodiment of the present invention provides an apparatus for processing wind turbine fault data, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method for processing wind turbine fault data described above.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a program that, when executed by a processor, implements the method for processing wind turbine fault data described above.

According to the embodiments of the present invention, the original fault data of a wind turbine are obtained and segmented to obtain a keyword set for each original fault record, and the keyword sets of all original fault data are then clustered to obtain multiple fault categories and a feature word set for each category. This automates the reliability failure analysis of the historical fault handling information of wind turbines, which both improves analysis efficiency and saves human resources.

In addition, when a fault occurs, staff can derive fault keywords from the fault symptoms and match them against the feature word sets obtained by clustering to find the category the fault belongs to and the related fault handling records. This makes it possible to lock onto the fault mode quickly and improves the efficiency of wind turbine fault handling.

Brief description of the drawings

The embodiments of the present invention can be better understood from the following description of specific embodiments with reference to the accompanying drawings, in which the same or similar reference numerals denote the same or similar features.

Fig. 1 is a schematic flowchart of a method for processing wind turbine fault data according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a word cloud display according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a word cloud display according to another embodiment of the present invention;

Fig. 4 is a schematic flowchart of a method for processing wind turbine fault data according to another embodiment of the present invention;

Fig. 5 is a schematic flowchart of a method for processing wind turbine fault data according to still another embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an apparatus for processing wind turbine fault data according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an apparatus for processing wind turbine fault data according to another embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an apparatus for processing wind turbine fault data according to still another embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an apparatus for processing wind turbine fault data according to yet another embodiment of the present invention.

Detailed description

Features and exemplary embodiments of various aspects of the present invention are described in detail below. In the following detailed description, many specific details are set forth to provide a thorough understanding of the embodiments of the present invention.

Embodiments of the present invention provide a method and apparatus for processing wind turbine fault data for use in wind turbine fault analysis. The processing method in the embodiments enables automatic reliability failure analysis of the historical fault handling information of wind turbines.

Fig. 1 is a schematic flowchart of a method for processing wind turbine fault data according to an embodiment of the present invention. As shown in Fig. 1, the processing method includes steps 101 to 103.

In step 101, the original fault data of the wind turbine are obtained. Each wind turbine fault can produce one original fault record. The original fault data may be in a variety of formats.

In one example, the original fault data are text data that mainly include a brief description of the fault, the cause of the fault, and how the fault was handled.

Table 1 shows multiple original fault records corresponding to multiple wind turbine faults. The first column is the wind turbine fault number, and the second to fourth columns are the fault description (a brief account of the fault), the fault cause, and the on-site handling, respectively. An original fault record is the fault data obtained by concatenating the fault description, fault cause, and on-site handling of a wind turbine fault.
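As a minimal sketch of how one original fault record might be assembled from the three Table 1 columns (the class name, field names, and the space separator are assumptions for illustration):

```python
from dataclasses import dataclass


@dataclass
class FaultRecord:
    """One row of Table 1 (field names are illustrative)."""
    fault_no: int
    description: str  # brief account of the fault
    cause: str        # fault cause
    handling: str     # on-site handling

    def raw_text(self) -> str:
        # Original fault record = description + cause + handling, concatenated;
        # empty columns are skipped so no stray separators appear
        return " ".join(p for p in (self.description, self.cause, self.handling) if p)
```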

Table 1

In step 102, word segmentation is performed on the original fault data to obtain a keyword set corresponding to each original fault record.

In one example, each original fault record may first be cleaned and the cleaned fault data then segmented to obtain the keyword set for each original fault record, which improves the speed and accuracy of subsequent computation.

Specifically, the original fault data can be cleaned in the following ways:

(1) Remove blank (null) characters from the original fault data.

(2) Use regular expressions to remove numbers in the original fault data that are unrelated to wind turbine faults. For example, numeric strings longer than 5 digits, such as phone numbers or web addresses, can be considered unrelated to wind turbine faults: in the string XXXXX2390452XXXXXX, where X stands for a Chinese character, the number is longer than 5 digits, so a regular expression can be used to remove "2390452". Time data can likewise be considered unrelated to wind turbine faults; for example, the timestamp 2016/09/19 15:44:45 can be removed with a regular expression.

(3) Remove, according to a predetermined fixed dictionary, fixed phrases in the original fault data that are unrelated to wind turbine faults. For example, item-name vocabulary such as "field data: X", "cabinet number: XX", and "unit operation time:" can be added to the predetermined fixed dictionary, as can title vocabulary such as "fault description" and "fault handling", and organization names such as "Central China Division".
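The three cleaning rules can be sketched with Python's standard `re` module. The patterns are assumptions rather than the patent's own: digit runs of six or more (that is, longer than 5) are dropped, timestamps are assumed to look like `2016/09/19 15:44:45`, and `FIXED_PHRASES` is a hypothetical stand-in for the predetermined fixed dictionary.

```python
import re

# Hypothetical fixed dictionary: item names and titles unrelated to the fault itself
FIXED_PHRASES = ["故障描述：", "故障处理：", "机组运行时间："]

# Assumed timestamp layout, e.g. 2016/09/19 15:44:45 (time part optional)
TIMESTAMP = re.compile(r"\d{4}[/-]\d{1,2}[/-]\d{1,2}(?:\s+\d{1,2}:\d{2}(?::\d{2})?)?")
# Digit runs longer than 5, e.g. phone numbers or IDs
LONG_NUMBER = re.compile(r"\d{6,}")


def clean(raw: str, fixed_phrases=FIXED_PHRASES) -> str:
    text = raw.replace("\x00", "")          # (1) null characters
    text = TIMESTAMP.sub("", text)          # (2) time data, removed before blanks vanish
    text = LONG_NUMBER.sub("", text)        # (2) numbers longer than 5 digits
    for phrase in fixed_phrases:            # (3) fixed phrases from the dictionary
        text = text.replace(phrase, "")
    return re.sub(r"\s+", "", text)         # (1) blank characters
```

Short numbers such as a blade number survive, so `clean("3号叶片")` is returned unchanged. Timestamps are stripped before whitespace is removed so that the date and time parts are still separated when the pattern is applied.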

Specifically, the cleaned fault data can be segmented with the Jieba word segmentation package to obtain the keyword set for each original fault record. The dictionaries used by Jieba include a wind turbine industry dictionary and/or a stop-word dictionary.

Referring to Table 1, the wind turbine industry dictionary includes terms such as: unit, control cabinet, generator bearing, low wind, tower-climbing inspection, and pitch communication.

The stop-word dictionary contains function words and non-search words used in computer retrieval. Stop words generally fall into two classes. One class consists of extremely common words, such as "i", "is", and "what" in English, or "I" and "just" in Chinese. The other class consists of words that occur very frequently but carry little meaning on their own and only play a role within a complete sentence, mainly modal particles, adverbs, prepositions, and conjunctions, such as Chinese particles and words meaning "and" or "then".

The embodiments of the present invention use Jieba ("stutter"), a Chinese text segmentation component for Python. Jieba can cut a sentence precisely, scan out every word that can be formed in the sentence, and further re-cut long words.

Jieba's segmentation algorithm works as follows: it performs efficient word-graph scanning based on a Trie structure to build a directed acyclic graph (DAG) of all possible words formed by the Chinese characters in a sentence, and then uses dynamic programming to find the maximum-probability path, i.e., the segmentation with the highest probability based on word frequency. To apply Jieba well in wind turbine fault analysis, those skilled in the art can install the tool and become familiar with its use.
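The DAG-plus-dynamic-programming principle can be illustrated with a small, self-contained re-implementation. This is not Jieba's source code: a plain dictionary lookup stands in for the Trie/prefix dictionary, the word frequencies in `TOY_FREQ` are made up, unknown single characters get a frequency of 1, and stop words are filtered afterwards to produce a keyword set. In practice one would call Jieba itself (for example `jieba.lcut`, with the industry dictionary loaded through `jieba.load_userdict`).

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical word frequencies: industry terms plus a few single characters
TOY_FREQ: Dict[str, int] = {
    "变桨": 50, "通讯": 60, "变桨通讯": 30, "故障": 100, "复位": 40,
    "变": 5, "桨": 2, "通": 10, "讯": 3, "故": 5, "障": 1, "复": 5, "位": 8,
}


def build_dag(sentence: str, freq: Dict[str, int]) -> Dict[int, List[int]]:
    """For each start index, list the end indices of all dictionary words starting there."""
    max_len = max((len(w) for w in freq), default=1)
    dag: Dict[int, List[int]] = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, min(len(sentence), i + max_len))
                if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]                 # unknown character: keep it on its own
    return dag


def segment(sentence: str, freq: Dict[str, int]) -> List[str]:
    """Choose the maximum-probability path through the DAG by dynamic programming."""
    n = len(sentence)
    log_total = math.log(sum(freq.values()))
    dag = build_dag(sentence, freq)
    route: Dict[int, Tuple[float, int]] = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):           # solve from the end of the sentence backwards
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:                             # follow the best path from the start
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def keyword_set(sentence: str, freq: Dict[str, int], stop_words: Set[str]) -> Set[str]:
    """Segment, then drop stop words and blanks to get the record's keyword set."""
    return {w for w in segment(sentence, freq) if w.strip() and w not in stop_words}
```

With these toy frequencies, `segment("变桨通讯故障复位", TOY_FREQ)` returns `["变桨通讯", "故障", "复位"]`: keeping 变桨通讯 (pitch communication) as one word scores higher than 变桨 + 通讯 because every extra word adds another log-probability penalty, which is why terms in the industry dictionary tend to survive segmentation intact.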

Taking Table 1 as an example, segmenting each original fault record in Table 1 yields the data in Table 2. The first column of Table 2 is the wind turbine fault number, and the second column is the keyword set characterizing the fault for each original fault record. Each keyword set in Table 2 is the result of segmenting together the fault description, fault cause, and on-site handling of the corresponding record in Table 1.

Table 2

In step 103, the keyword sets corresponding to all the original fault data are clustered to obtain multiple fault categories and a feature word set corresponding to each fault category.

Clustering is the process of dividing a set of physical or abstract objects into multiple classes of similar objects. The clusters produced are sets of data objects such that objects within the same cluster are similar to one another and objects in different clusters are dissimilar.

Because the original fault data in the embodiments of the present invention are text, text clustering can be used: a group of documents is divided into different groups (clusters) according to some rule, so that documents within a group are as similar as possible and documents in different groups are as different as possible.

Table 3 shows an example clustering result for the keyword sets in Table 2. The first column is the number of the fault category after clustering, the second column is the feature word set characterizing each fault category, and the third column lists the fault numbers included in each fault category.

In the clustering result shown in Table 3, faults 1 to 4 in Tables 1 and 2 are clustered into the first fault category, faults 5 to 8 into the second fault category, and faults 9, 10, … into the third fault category.

Table 3

Fault category	Feature word set	Fault numbers
1	Pitch communication, reset, fastening, fault, DP	1, 2, 3, 4…
2	Campaign, unit-caused, fastening, technical retrofit, 100-day	5, 6, 7, 8…
3	Inspection, turbine, resistance, damage, restored to normal, replacement	9, 10…
…	…	…

As described above, the embodiments of the present invention obtain the original fault data of a wind turbine, segment them into a keyword set per original fault record, and cluster all the keyword sets into multiple fault categories, each with a corresponding feature word set. This automates the reliability failure analysis of the historical fault handling information of wind turbines, which both improves analysis efficiency and saves human resources.

According to an embodiment of the present invention, once the feature word set of each fault category has been obtained, a class label (for example, a keyword) can be added to each category based on business understanding, to support fast searching later.

Specifically, a quick search may proceed as follows: a target keyword is entered and all categories matching the keyword are returned; then, for a selected category, all historical fault information under that category can be retrieved.

Taking the feature word sets in Table 3 as an example, the class label of the second fault category can be set to "100-day campaign", and the class label of the third fault category to "turbine resistance".
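A minimal sketch of label-based indexing and the two-step quick search, assuming categories are stored as a label-to-feature-word-set mapping plus a label-to-fault-number mapping. The sample data reuse the Table 3 labels for illustration only, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

CATEGORIES: Dict[str, Set[str]] = {
    "100-day campaign": {"campaign", "unit-caused", "fastening", "technical retrofit", "100-day"},
    "turbine resistance": {"inspection", "turbine", "resistance", "damage",
                           "restored to normal", "replacement"},
}
MEMBERS: Dict[str, List[int]] = {"100-day campaign": [5, 6, 7, 8], "turbine resistance": [9, 10]}


def build_keyword_index(categories: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Inverted index: keyword -> labels of the categories whose feature set contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for label, words in categories.items():
        for word in words:
            index[word].add(label)
    return index


def find_categories(keyword: str, index: Dict[str, Set[str]]) -> List[str]:
    """Step 1 of the quick search: all categories matching the target keyword."""
    return sorted(index.get(keyword, set()))


def faults_in_category(label: str, members: Dict[str, List[int]]) -> List[int]:
    """Step 2: all historical fault records filed under the selected category."""
    return members.get(label, [])
```

For example, searching for "fastening" returns the "100-day campaign" category, and selecting it lists faults 5 to 8.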

In one example, the retrieved feature word sets can be displayed as word clouds. To display the feature word set of the second fault category, the user selects the "100-day campaign" word cloud button; the result, shown in Fig. 2, presents the fault-related features more intuitively.

To display the feature word set of the third fault category, the user selects the "turbine resistance" word cloud button; the result, shown in Fig. 3, likewise presents the fault-related features more intuitively.

In another example, all historical fault category labels can be retrieved and, after the word cloud button is selected, the word clouds of these fault categories are displayed page by page.

According to an embodiment of the present invention, to quickly identify the category of a newly occurring fault, one or more phrases corresponding to the new wind turbine fault can be obtained based on business understanding, and the fault categories and feature word sets related to the new fault can then be retrieved according to those phrases.

Further, the original fault data related to the new wind turbine fault can be retrieved according to combinations of the one or more phrases. By referring to the fault handling schemes of the related original fault records, a method for handling the new fault can be obtained quickly.

In one example, phrases related to a historical fault can be entered to obtain the details of all historical faults containing those keywords, and the user can continue selecting among them to see how those faults were handled.

In another example, multiple fault-related phrases can be separated by spaces, and the search condition can be set either to match any of the phrases or to match all of them.
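The any/all phrase search can be sketched as follows, assuming each original fault record is available as a plain string keyed by fault number and that a phrase matches when it occurs as a substring of the record. The function name and the sample records used with it are hypothetical.

```python
from typing import Dict, List


def search_faults(query: str, records: Dict[int, str], match_all: bool = False) -> List[int]:
    """Return fault numbers whose record text contains any (or all) space-separated phrases."""
    phrases = query.split()                  # phrases are separated by spaces
    if not phrases:
        return []
    condition = all if match_all else any    # "contains one or more" vs "contains all"
    return sorted(fid for fid, text in records.items()
                  if condition(phrase in text for phrase in phrases))
```

With records such as `{1: "pitch communication fault, fastened and reset", 2: "pitch communication fault, DP replaced", 3: "resistance abnormal, module replaced"}`, the query `"pitch replaced"` matches faults 1, 2 and 3 in any-mode but only fault 2 in all-mode.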

In some embodiments, after new fault information is entered, segmentation and clustering can be performed automatically to classify and store it; alternatively, segmentation and clustering can be performed according to configured segmentation or clustering options to classify and store the new fault information.
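The patent does not say how a single new record is placed into existing categories. One simple possibility, offered here only as an assumption-laden sketch, is to compare the new keyword set with each category's feature word set using the same intersection ratio as the clustering step, file it under the best match above a threshold, and otherwise open a new category. The function name, the threshold, and the `new-<fault number>` placeholder label are hypothetical.

```python
from typing import Dict, List, Set


def assign_new_fault(fault_id: int, keywords: Set[str],
                     categories: Dict[str, Set[str]], members: Dict[str, List[int]],
                     min_ratio: float = 0.5) -> str:
    """File a new fault under the best-matching category, or open a new one."""
    best_label, best_ratio = None, min_ratio
    for label, features in categories.items():
        larger = max(len(keywords), len(features))
        ratio = len(keywords & features) / larger if larger else 0.0
        if ratio > best_ratio:               # must beat the threshold and earlier matches
            best_label, best_ratio = label, ratio
    if best_label is None:
        best_label = f"new-{fault_id}"       # hypothetical placeholder label
        categories[best_label] = set(keywords)
        members[best_label] = []
    members[best_label].append(fault_id)
    return best_label
```

Because feature word sets grow as categories absorb records, this ratio becomes stricter over time; periodic full re-clustering, as described in the next paragraph, would counteract that drift.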

It should be noted that the clustering process above may be performed once per period. After the categories have been established, field personnel can query them at any time with keyword searches, similar to a Baidu search: a query does not require re-clustering, and the keywords supplied by staff at query time need not be segmented every time.

The clustering process of step 103 is described in detail below. Based on different clustering principles, the embodiments of the present invention provide two clustering methods, shown in Fig. 4 and Fig. 5 respectively.

Fig. 4 is a schematic flowchart of a method for processing wind turbine fault data according to another embodiment of the present invention. Fig. 4 differs from Fig. 1 in that step 103 of Fig. 1 is refined into steps 1031 to 1036 of Fig. 4, which cluster the fault data based on the text intersection ratio principle. In the example of Fig. 4, the keyword sets corresponding to all the original fault data form the original to-be-clustered keyword set collection.

In step 1031, a first keyword set is selected from the original to-be-clustered collection.

In step 1032, it is determined, for each keyword set in the original to-be-clustered collection other than the first keyword set, whether it can be clustered into the same class as the first keyword set.

Specifically, a third keyword set is selected in turn from the keyword sets in the original to-be-clustered collection other than the first keyword set. The total number of keywords in the first keyword set and in each third keyword set is obtained, and the keyword count of whichever set has more keywords is taken as the first total number m. The second total number n, namely the number of keywords shared by the first keyword set and each third keyword set, is obtained. The ratio n/m of each second total number to the corresponding first total number is calculated. If n/m is greater than a predetermined ratio, the third keyword set corresponding to that second total number n is determined to be clusterable into the same class as the first keyword set.

In step 1033, the keywords in all keyword sets that can be clustered with the first keyword set are combined into a feature word set. This set characterizes the fault category corresponding to the first keyword set. All keyword sets that could not be clustered with the first keyword set form an updated group of keyword sets to be clustered (see Table 3).

In step 1034, a second keyword set is selected from the updated group of keyword sets to be clustered.

In step 1035, each keyword set in the updated group other than the second keyword set is checked to determine whether it can be clustered into the same category as the second keyword set.

In step 1036, the keywords in all keyword sets that can be clustered with the second keyword set are combined into a feature word set characterizing the fault category corresponding to the second keyword set (see Table 3). The process is repeated on the keyword sets that remain until the number of keyword sets in the updated group to be clustered drops to 0.

The clustering method of Fig. 4, based on the text intersection ratio principle, is illustrated below with an example. Suppose there are 100 fault texts in total. The clustering proceeds as follows:

(1) Select text 1 from the 100 fault texts.

(2) Take each of the remaining 99 texts in turn and compute its intersection ratio with text 1. For text 1 and text 2, the intersection ratio is computed as follows:

(2-1) Compute the number of phrases shared by text 1 and text 2: n = len(intersection(phrases in text 1, phrases in text 2)).

(2-2) Take the phrase count of whichever of text 1 and text 2 contains more phrases: m = max(number of phrases in text 1, number of phrases in text 2).

(2-3) Determine whether the ratio n/m exceeds the set threshold. If it does, text 1 and text 2 are highly similar and can be clustered into one category.

The same operations are then performed on each of the remaining 98 texts in turn. Suppose the result of (2) is that 10 of the 99 texts can be clustered with text 1. The operations are then repeated on the remaining 89 texts, and so on, until the number of texts not yet clustered is 0.
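The full loop of steps 1031 to 1036 (equivalently, steps (1) and (2) of the 100-text example) can be sketched as follows. This is a minimal sketch under stated assumptions:

- The seed of each round is taken to be the first keyword set still waiting to be clustered; the text only says one is "selected".
- The feature word set of a category is taken to be the union of the seed and every set merged with it.
- The names `intersection_ratio` and `cluster_by_intersection` are hypothetical.

```python
from __future__ import annotations


def intersection_ratio(a: set[str], b: set[str]) -> float:
    """Shared keywords divided by the size of the larger set (n / m)."""
    m = max(len(a), len(b))
    return len(a & b) / m if m else 0.0


def cluster_by_intersection(keyword_sets: list[set[str]],
                            threshold: float) -> list[tuple[list[int], set[str]]]:
    """Greedy intersection-ratio clustering (sketch of steps 1031-1036).

    Returns one (member indices, feature word set) pair per fault category.
    """
    remaining = list(range(len(keyword_sets)))
    clusters: list[tuple[list[int], set[str]]] = []
    while remaining:  # stop once no keyword set is left to cluster
        seed, candidates = remaining[0], remaining[1:]
        members, leftover = [seed], []
        for idx in candidates:
            if intersection_ratio(keyword_sets[seed], keyword_sets[idx]) > threshold:
                members.append(idx)
            else:
                leftover.append(idx)  # goes into the updated group
        features = set().union(*(keyword_sets[i] for i in members))
        clusters.append((members, features))
        remaining = leftover
    return clusters


records = [{"pitch", "communication", "fault"}, {"pitch", "communication", "reset"},
           {"converter", "overtemperature"}, {"converter", "overtemperature", "fan"}]
for members, features in cluster_by_intersection(records, 0.5):
    print(members, sorted(features))
```

Each round compares every remaining set only against the seed, never pairwise against each other. This keeps the cost low, which matches the efficiency claim made for this method.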

Clustering the fault data based on the text intersection ratio principle, as in this embodiment, involves little computation. It therefore has the advantage of high computational efficiency.

Fig. 5 is a schematic flowchart of a method for processing wind turbine generator set fault data according to a fifth embodiment of the present invention. Fig. 5 differs from Fig. 1 in that step 103 of Fig. 1 may alternatively be refined into steps 1037 to 1039 of Fig. 5. These steps cluster the fault data based on the agglomerative hierarchical clustering principle. The keyword sets corresponding to all original fault data form the original group of keyword sets to be clustered.

In step 1037, a first similarity value is calculated between every two keyword sets in the original group of keyword sets to be clustered.

To improve the clustering of wind turbine generator set fault data, the keyword sets may first be converted from text format to vector format. The first similarity value is then calculated between the two vectors corresponding to each pair of keyword sets.

Suppose there are three keyword texts in total:

- Text 1: problem, communication, disconnect, reset, communication, instantaneous, DP.
- Text 2: none, pitch, communication, check, tower climb, fasten, cabinet, problem, fault, again, is, abnormal, class, DP, head.
- Text 3: tackling key problems, cause, technical retrofit, one hundred days.

Text 1, text 2 and text 3 are converted from text format to vector format as follows.

First, a total text is built from text 1, text 2 and text 3. The words of text 1 are de-duplicated and copied into the total text. The words of each other text are then copied into the total text in turn, skipping any word that is already there. The resulting total text is: problem, communication, disconnect, reset, instantaneous, DP, none, pitch, check, tower climb, fasten, cabinet, fault, again, is, abnormal, class, head, tackling key problems, cause, technical retrofit, one hundred days.

Then the vector representations of text 1, text 2 and text 3 are obtained in turn. The words in each text are compared with the words in the total text. For example, if the word at position i of the total text occurs k times in text 1, position i of vector 1 is set to k. If the word does not occur in text 1, position i is set to 0.
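The two-step mapping (build the total text, then count occurrences per position) can be sketched as follows. The example inputs are English glosses of the keywords in text 1 and text 3 only. The helper names `build_vocabulary` and `to_count_vector` are hypothetical.

```python
from __future__ import annotations


def build_vocabulary(texts: list[list[str]]) -> list[str]:
    """Build the 'total text': every word once, in order of first appearance."""
    vocab: list[str] = []
    seen: set[str] = set()
    for words in texts:
        for w in words:
            if w not in seen:  # skip words already in the total text
                seen.add(w)
                vocab.append(w)
    return vocab


def to_count_vector(words: list[str], vocab: list[str]) -> list[int]:
    """Position i holds how often vocab[i] occurs in the text (0 if absent)."""
    return [words.count(w) for w in vocab]


text1 = ["problem", "communication", "disconnect", "reset",
         "communication", "instantaneous", "DP"]
text3 = ["tackling key problems", "cause", "technical retrofit", "one hundred days"]
vocab = build_vocabulary([text1, text3])
print(to_count_vector(text1, vocab))  # [1, 2, 1, 1, 1, 1, 0, 0, 0, 0]
```

"communication" appears twice in text 1, so its position holds 2. This is the same pattern as the leading entries of vector 1 in the example.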

Applying these two steps to texts 1 to 3 gives the following vectors:

Vector 1 = [1,2,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];

Vector 2 = [1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0];

Vector 3 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1].

The first similarity value between the two vectors corresponding to each pair of keyword sets can be calculated using either cosine similarity or Euclidean distance. The formulas for both can be found in standard mathematics references and are not repeated here.
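For concreteness, the two standard measures can be written in plain Python and applied to the example vectors. This is a sketch. Returning 0.0 for the cosine of a zero vector is an assumption, since the measure is undefined in that case.

```python
import math

# Example vectors 1-3 from the text (23 positions each).
VECTOR_1 = [1, 2, 1, 1, 1, 1] + [0] * 17
VECTOR_2 = [1, 1, 0, 0, 0, 1] + [1] * 13 + [0] * 4
VECTOR_3 = [0] * 19 + [1] * 4


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is dissimilar to everything
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


print(round(cosine_similarity(VECTOR_1, VECTOR_2), 3))  # 0.333
```

For vectors 1 and 2, the dot product is 4 and the norms are 3 and 4, so the cosine similarity is 4/12 = 1/3. Vectors 1 and 3 share no words, so their cosine similarity is 0. Cosine similarity grows as texts become more alike, while Euclidean distance shrinks. A stopping rule must therefore be stated in the matching direction.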

In step 1038, the two keyword sets with the largest first similarity value are merged into one new keyword set. The new keyword set, together with all keyword sets in the original group other than the two that were merged, forms an updated group of keyword sets to be clustered.

In step 1039, a second similarity value is calculated between every two keyword sets in the updated group, and the two keyword sets with the largest second similarity value are merged into one new keyword set. This loop terminates when the largest of all the second similarity values exceeds a predetermined threshold.

According to an embodiment of the present invention, the loop may alternatively terminate when the number of keyword sets in the updated group drops to a predetermined number. Those skilled in the art may choose either termination condition based on actual test results. No limitation is imposed here.

Using cosine similarity as an example:

1. Convert all texts into vector form.
2. Calculate the cosine similarity between every two texts.
3. Compare the minimum over all pairs, min(1 - cosine similarity), with a set threshold. When min(1 - cosine similarity) exceeds the threshold, the loop terminates and clustering is complete.

Using Euclidean distance as an example:

1. Convert all texts into vector form.
2. Calculate the Euclidean distance between every two texts.
3. Compare the minimum over all pairs, min(Euclidean distance), with a set threshold. When min(Euclidean distance) exceeds the threshold, the loop terminates and clustering is complete.
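Steps 1037 to 1039 with the cosine-based stopping rule can be sketched as below. The following are assumptions not stated in the text:

- A merged keyword set is represented by the element-wise sum of its members' count vectors, which is equivalent to concatenating their texts.
- The optional `min_clusters` parameter models the alternative termination condition, in which the number of keyword sets drops to a predetermined number.
- The function name `agglomerate` is hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors: list[list[float]], threshold: float,
                min_clusters: int = 1) -> list[list[int]]:
    """Agglomerative clustering with a min(1 - cosine) stopping rule (sketch).

    Each cluster is (member indices, summed count vector). The loop stops when
    min(1 - cosine similarity) over all pairs exceeds `threshold`, or when
    only `min_clusters` clusters remain.
    """
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]
    while len(clusters) > min_clusters:
        # Find the most similar pair, i.e. the smallest 1 - cosine similarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = 1.0 - cosine_similarity(clusters[a][1], clusters[b][1])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > threshold:
            break  # even the closest pair is too far apart: clustering done
        members = clusters[a][0] + clusters[b][0]
        merged = [x + y for x, y in zip(clusters[a][1], clusters[b][1])]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((members, merged))  # the new keyword set
    return [sorted(members) for members, _ in clusters]


v1 = [1, 2, 1, 1, 1, 1] + [0] * 17
v2 = [1, 1, 0, 0, 0, 1] + [1] * 13 + [0] * 4
v3 = [0] * 19 + [1] * 4
print(sorted(agglomerate([v1, v2, v3], threshold=0.7)))  # [[0, 1], [2]]
```

With the example vectors, 1 - cosine is about 0.667 for texts 1 and 2, and 1.0 for every pair involving text 3. A threshold of 0.7 therefore merges texts 1 and 2 and then stops. The all-pairs search makes each round quadratic in the number of clusters, which is why this method involves far more computation than the intersection ratio method.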

Clustering the fault data based on the agglomerative hierarchical clustering principle, as in this embodiment, can be performed by a computer. The computer can handle the much larger amount of computation involved, so the clustering result is highly accurate.

As described above, the fault data processing method of the embodiments is well suited to searching the large volume of documents produced by routine work. Intelligent text clustering organizes wind turbine fault data and makes it easier for R&D staff to classify and count the data, for example for reliability failure analysis of wind turbine generator sets.

In practical applications, a fault reference system can be built on the fault data processing method of the embodiments. Its main functions may include:

- intelligent classification of fault data;
- classification of new fault data;
- querying the diagnosis and handling methods corresponding to a given fault mode;
- querying detailed descriptions of historical faults under a given fault mode.
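Classifying a new fault against existing categories can be sketched as follows. Claim 9 assigns each fault category a class label, and claim 10 retrieves categories from the phrases of a new fault. This is a minimal illustration under stated assumptions:

- The label format ("C001", ...) is hypothetical.
- Ranking categories by how many of the new fault's phrases appear in their feature word set is an assumed scoring rule; the text does not specify one.

```python
from __future__ import annotations


def build_index(categories: list[set[str]]) -> dict[str, set[str]]:
    """Give each fault category a class label (hypothetical "C001" format)."""
    return {f"C{i:03d}": set(words) for i, words in enumerate(categories, start=1)}


def match_categories(index: dict[str, set[str]], phrases: list[str],
                     top_k: int = 3) -> list[str]:
    """Rank labels by how many phrases of a new fault hit each feature word set."""
    wanted = set(phrases)
    scored = [(len(words & wanted), label) for label, words in index.items()]
    # Most hits first; ties broken by label so the order is deterministic.
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [label for _, label in scored[:top_k]]


index = build_index([{"pitch", "communication", "reset"},
                     {"converter", "overtemperature", "fan"}])
print(match_categories(index, ["converter", "fan", "pitch"]))  # ['C002', 'C001']
```

A returned label can then serve as the key for looking up the diagnosis and handling methods, or the historical fault descriptions, stored for that category.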

Fig. 6 is a schematic structural diagram of an apparatus for processing wind turbine generator set fault data according to an embodiment of the present invention. The apparatus shown in Fig. 6 includes an acquisition module, a word segmentation module and a clustering module.

The acquisition module is configured to obtain original fault data of a wind turbine generator set.

The word segmentation module is configured to perform word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record.

The clustering module is configured to cluster the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category.

Fig. 7 is a schematic structural diagram of an apparatus for processing wind turbine generator set fault data according to another embodiment of the present invention. Fig. 7 differs from Fig. 6 in that the word segmentation module of Fig. 6 may be refined into a cleaning unit and a word segmentation unit in Fig. 7.

The cleaning unit is configured to clean each original fault record.

The word segmentation unit is configured to perform word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record.
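The cleaning unit and the word segmentation unit can be sketched together as follows. Claim 3 lists three optional cleaning operations: removing null characters, removing fault-irrelevant digits with a regular expression, and removing fault-irrelevant fixed phrases from a dictionary. The sketch makes these assumptions:

- The regular expression (date-like strings and digit runs of five or more) is a hypothetical placeholder.
- The fixed-phrase list is a hypothetical placeholder.
- For Chinese records, claim 4 segments text with the jieba package; for example, `jieba.lcut` can be used after loading an industry dictionary with `jieba.load_userdict`. The whitespace-based `segment` function here only stands in for that step, so the sketch runs without third-party dependencies.

```python
from __future__ import annotations

import re

# Hypothetical cleaning rules; the actual regular expression and fixed-phrase
# dictionary are not given in the text.
IRRELEVANT_DIGITS = re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}|\d{5,}")  # dates, long IDs
FIXED_PHRASES = ("work order", "handled by")


def clean_record(text: str) -> str:
    """Apply the three cleaning operations listed in claim 3 (sketch)."""
    text = text.replace("\x00", "")          # null characters
    text = IRRELEVANT_DIGITS.sub(" ", text)  # fault-irrelevant digits
    for phrase in FIXED_PHRASES:             # fault-irrelevant fixed phrases
        text = text.replace(phrase, " ")
    return " ".join(text.split())            # collapse leftover whitespace


def segment(text: str, stop_words: frozenset[str] = frozenset()) -> list[str]:
    """Stand-in for jieba segmentation: split on whitespace, drop stop words."""
    return [w for w in text.split() if w not in stop_words]


raw = "2017-12-26 work order 123456 pitch\x00 communication fault"
print(segment(clean_record(raw)))  # ['pitch', 'communication', 'fault']
```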

Fig. 8 is a schematic structural diagram of an apparatus for processing wind turbine generator set fault data according to a further embodiment of the present invention. Fig. 8 differs from Fig. 6 in that the clustering module of Fig. 6 may be refined into a selection unit, a judgment unit and a first clustering unit in Fig. 8.

The selection unit is configured to select a first keyword set from the original group of keyword sets to be clustered. The original group is formed by the keyword sets corresponding to all original fault data.

The judgment unit is configured to determine, for each keyword set in the original group other than the first keyword set, whether it can be clustered into the same category as the first keyword set.

The first clustering unit is configured to combine the keywords in all keyword sets that can be clustered with the first keyword set into a feature word set characterizing the fault category corresponding to the first keyword set. It also forms an updated group of keyword sets to be clustered from all keyword sets that could not be clustered with the first keyword set.

The selection unit is further configured to select a second keyword set from the updated group of keyword sets to be clustered.

The judgment unit is further configured to determine, for each keyword set in the updated group other than the second keyword set, whether it can be clustered into the same category as the second keyword set.

Fig. 9 is a schematic structural diagram of an apparatus for processing wind turbine generator set fault data according to yet another embodiment of the present invention. Fig. 9 differs from Fig. 6 in that the clustering module of Fig. 6 may alternatively be refined into a computing unit and a second clustering unit in Fig. 9.

The computing unit is configured to calculate a first similarity value between every two keyword sets in the original group of keyword sets to be clustered. The original group is formed by the keyword sets corresponding to all original fault data.

The second clustering unit is configured to merge the two keyword sets with the largest first similarity value into one new keyword set. The new keyword set, together with all keyword sets in the original group other than the two that were merged, forms an updated group of keyword sets to be clustered.

An embodiment of the present invention further provides an apparatus for processing wind turbine generator set fault data. The apparatus comprises a memory, a processor, and a program stored in the memory and executable on the processor. When the processor executes the program, it implements the method for processing wind turbine generator set fault data described above.

An embodiment of the present invention further provides a computer-readable storage medium on which a program is stored. When executed by a processor, the program implements the method for processing wind turbine generator set fault data described above.

It should be noted that the embodiments in this specification are described progressively. The embodiments may refer to one another for identical or similar parts, and each embodiment focuses on how it differs from the others. For the apparatus embodiments, refer to the description of the method embodiments for related details.

The embodiments of the present invention are not limited to the specific steps and structures described above and shown in the figures. After understanding the spirit of the embodiments, those skilled in the art may make various changes, modifications and additions, or change the order of the steps. For brevity, detailed descriptions of known methods and techniques are omitted here.

The functional blocks shown in the structural block diagrams above may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, or function cards. When implemented in software, the elements of the embodiments are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium. Alternatively, they may be transmitted over a transmission medium or communication link as a data signal carried in a carrier wave.

A "machine-readable medium" may include any medium capable of storing or transmitting information. Examples include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical discs, hard disks, fiber-optic media, and radio frequency (RF) links. Code segments may be downloaded via computer networks such as the Internet or an intranet.

The embodiments of the present invention may be implemented in other specific forms without departing from their spirit and essential characteristics. For example, the algorithms described in particular embodiments may be modified, and the system architecture may be changed, without departing from the essential spirit of the embodiments. The present embodiments are therefore to be regarded in all respects as illustrative rather than restrictive. The scope of the embodiments is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are embraced within that scope.

Claims

1. A method for processing wind turbine generator set fault data, comprising:

obtaining original fault data of a wind turbine generator set;

performing word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record; and

clustering the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category.

2. The method according to claim 1, wherein performing word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record comprises:

cleaning each original fault record; and

performing word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record.

3. The method according to claim 2, wherein cleaning each original fault record comprises:

removing null characters from the original fault data; and/or

removing, by using a regular expression, digits in the original fault data that are unrelated to wind turbine generator set faults; and/or

removing, according to a predetermined fixed dictionary, fixed phrases in the original fault data that are unrelated to wind turbine generator set faults.

4. The method according to claim 3, wherein performing word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record comprises:

performing word segmentation on the cleaned fault data using the jieba word segmentation package to obtain the keyword set corresponding to each original fault record, wherein the dictionaries of the jieba package include a wind turbine industry dictionary and/or a stop-word dictionary.

5. The method according to claim 1, wherein clustering the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category comprises:

selecting a first keyword set from an original group of keyword sets to be clustered, the original group being formed by the keyword sets corresponding to all original fault data;

determining, for each keyword set in the original group other than the first keyword set, whether it can be clustered into the same category as the first keyword set;

combining the keywords in all keyword sets that can be clustered with the first keyword set into a feature word set characterizing the fault category corresponding to the first keyword set, and forming an updated group of keyword sets to be clustered from all keyword sets that could not be clustered with the first keyword set;

selecting a second keyword set from the updated group of keyword sets to be clustered;

determining, for each keyword set in the updated group other than the second keyword set, whether it can be clustered into the same category as the second keyword set; and

combining the keywords in all keyword sets that can be clustered with the second keyword set into a feature word set characterizing the fault category corresponding to the second keyword set, until the number of keyword sets in the updated group of keyword sets to be clustered drops to 0.

6. The method according to claim 5, wherein determining, for each keyword set in the original group other than the first keyword set, whether it can be clustered into the same category as the first keyword set comprises:

selecting, in turn, a third keyword set from the keyword sets in the original group other than the first keyword set;

obtaining the total number of keywords in the first keyword set and in each third keyword set, and taking the total number of keywords of whichever set has more keywords as a first total number;

obtaining, as a second total number, the number of keywords shared by the first keyword set and each third keyword set; and

if the ratio of the second total number to the corresponding first total number is greater than a predetermined ratio, determining that the third keyword set corresponding to the second total number can be clustered into the same category as the first keyword set.

7. The method according to claim 1, wherein clustering the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category comprises:

calculating a first similarity value between every two keyword sets in an original group of keyword sets to be clustered, the original group being formed by the keyword sets corresponding to all original fault data;

merging the two keyword sets with the largest first similarity value into one new keyword set, and forming an updated group of keyword sets to be clustered from the new keyword set and the keyword sets in the original group other than the two keyword sets with the largest first similarity value; and

calculating a second similarity value between every two keyword sets in the updated group of keyword sets to be clustered, and merging the two keyword sets with the largest second similarity value into one new keyword set, until the largest of all second similarity values is greater than a predetermined threshold, or until the number of keyword sets in the updated group of keyword sets to be clustered drops to a predetermined number.

8. The method according to claim 7, wherein the keyword sets are in text format, and calculating the first similarity value between every two keyword sets comprises:

converting the keyword sets from text format to vector format; and

calculating the first similarity value between the two vectors corresponding to each pair of keyword sets.

9. The method according to claim 1, wherein after clustering the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category, the method further comprises:

setting a class label for each fault category; and

indexing, according to the class label, the feature word set corresponding to the class label.

10. The method according to claim 1, wherein after clustering the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category, the method further comprises:

obtaining one or more phrases corresponding to a new wind turbine generator set fault; and

indexing, according to the one or more phrases, the fault category and feature word set related to the new wind turbine generator set fault; or

searching, according to the one or more phrases, for original fault data related to the new wind turbine generator set fault.

11. An apparatus for processing wind turbine generator set fault data, comprising:

an acquisition module, configured to obtain original fault data of a wind turbine generator set;

a word segmentation module, configured to perform word segmentation on the original fault data to obtain a keyword set corresponding to each original fault record; and

a clustering module, configured to cluster the keyword sets corresponding to all original fault data to obtain multiple fault categories and a feature word set corresponding to each fault category.

12. The apparatus according to claim 11, wherein the word segmentation module specifically comprises:

a cleaning unit, configured to clean each original fault record; and

a word segmentation unit, configured to perform word segmentation on the cleaned fault data to obtain the keyword set corresponding to each original fault record.

13. The apparatus according to claim 11, wherein the clustering module specifically comprises:

a selection unit, configured to select a first keyword set from an original group of keyword sets to be clustered, the original group being formed by the keyword sets corresponding to all original fault data;

a judgment unit, configured to determine, for each keyword set in the original group other than the first keyword set, whether it can be clustered into the same category as the first keyword set; and

a first clustering unit, configured to combine the keywords in all keyword sets that can be clustered with the first keyword set into a feature word set characterizing the fault category corresponding to the first keyword set, and to form an updated group of keyword sets to be clustered from all keyword sets that could not be clustered with the first keyword set;

wherein the selection unit is further configured to select a second keyword set from the updated group of keyword sets to be clustered;

the judgment unit is further configured to determine, for each keyword set in the updated group other than the second keyword set, whether it can be clustered into the same category as the second keyword set; and

the first clustering unit is further configured to combine the keywords in all keyword sets that can be clustered with the second keyword set into a feature word set characterizing the fault category corresponding to the second keyword set, until the number of keyword sets in the updated group of keyword sets to be clustered drops to 0.

14. The apparatus according to claim 11, wherein the clustering module further specifically comprises:

a computing unit, configured to calculate a first similarity value between every two keyword sets in an original group of keyword sets to be clustered, the original group being formed by the keyword sets corresponding to all original fault data; and

a second clustering unit, configured to merge the two keyword sets with the largest first similarity value into one new keyword set, and to form an updated group of keyword sets to be clustered from the new keyword set and the keyword sets in the original group other than the two keyword sets with the largest first similarity value;

wherein the computing unit is further configured to calculate a second similarity value between every two keyword sets in the updated group of keyword sets to be clustered, and to merge the two keyword sets with the largest second similarity value into one new keyword set, until the largest of all second similarity values is greater than a predetermined threshold, or until the number of keyword sets in the updated group of keyword sets to be clustered drops to a predetermined number.

15. An apparatus for processing wind turbine generator set fault data, comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for processing wind turbine generator set fault data according to any one of claims 1 to 10.

16. A computer-readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the method for processing wind turbine generator set fault data according to any one of claims 1 to 10.