Disclosure of Invention
In view of the above, the present invention is directed to a system and a method for processing a power data file, so as to improve the reliability of classified storage.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical scheme:
a method of processing a power data file, comprising:
for each of a plurality of power data files to be stored, marking the power data file to be stored as a power data file to be processed;
analyzing target power abnormality characterization data corresponding to the power data file to be processed by utilizing a plurality of power data analysis networks, wherein the target power abnormality characterization data are used for reflecting the abnormal state of a power system corresponding to the power data file to be processed;
Performing first classification processing on the plurality of power data files to be stored based on the corresponding target power abnormality characterization data to form at least one first classification set, wherein each first classification set comprises at least one power data file to be stored;
based on the similarity between the power data files to be stored, respectively carrying out second classification processing in each first classification set to form at least one second classification set corresponding to each first classification set, wherein each second classification set comprises at least one power data file to be stored;
and respectively classifying and storing each obtained second classification set.
In some preferred embodiments, in the above power data file processing method, the step of performing a first classification process on the plurality of power data files to be stored based on the corresponding target power anomaly characterization data to form at least one first classification set includes:
carrying out consistency or similarity analysis on target power abnormality characterization data corresponding to each two to-be-stored power data files in the plurality of to-be-stored power data files;
And distributing the analyzed power data files to be stored, of which the corresponding target power abnormality characterization data are consistent, or the power data files to be stored, of which the corresponding target power abnormality characterization data belong to the same parameter interval, into the same first classification set to form at least one first classification set.
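As a non-limiting illustration, the interval-based variant of the first classification processing may be sketched as follows. The sketch assumes, purely for illustration, that the target power abnormality characterization data of each file is a single score in [0, 1] and that the parameter intervals have equal width; the names first_classification and interval_width are hypothetical and do not appear in the embodiments.

```python
from collections import defaultdict

def first_classification(scores, interval_width=0.2):
    """Group file ids whose anomaly score falls in the same parameter interval.

    `scores` maps a file id to a scalar in [0, 1]; this scalar form of the
    target power abnormality characterization data is an assumption.
    """
    n_intervals = round(1 / interval_width)
    groups = defaultdict(list)
    for file_id, score in scores.items():
        # Clamp so that a score of exactly 1.0 lands in the last interval.
        index = min(int(score / interval_width), n_intervals - 1)
        groups[index].append(file_id)
    return dict(groups)

scores = {"a.log": 0.05, "b.log": 0.15, "c.log": 0.55, "d.log": 0.95}
first_sets = first_classification(scores)
```

Each value of first_sets plays the role of one first classification set; files with consistent characterization data would be grouped in the same way by using the data itself as the grouping key.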
In some preferred embodiments, in the above processing method of a power data file, the step of performing a second classification process inside each of the first classification sets based on a similarity between power data files to be stored to form at least one second classification set corresponding to each of the first classification sets includes:
for each first classification set, performing a quantity statistics operation on the power data files to be stored, which are included in the first classification set, to form a corresponding file quantity statistics value, determining the first classification set as a corresponding second classification set if the file quantity statistics value is smaller than or equal to a predetermined first reference value, and determining the first classification set as a corresponding third classification set if the file quantity statistics value is greater than the first reference value;
And respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
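As a non-limiting illustration, the file quantity check described above may be sketched as follows. The function name split_by_size and the example reference value of 3 are hypothetical.

```python
def split_by_size(first_sets, first_reference_value):
    """Return (second_sets, third_sets) according to the file quantity rule."""
    second_sets, third_sets = [], []
    for file_ids in first_sets:
        if len(file_ids) <= first_reference_value:
            # Small enough: the first classification set is kept as a
            # second classification set without further processing.
            second_sets.append(file_ids)
        else:
            # Too large: treated as a third classification set, to be
            # refined by the similarity-based second classification.
            third_sets.append(file_ids)
    return second_sets, third_sets

second_sets, third_sets = split_by_size([["a", "b"], ["c", "d", "e", "f"]], 3)
```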
In some preferred embodiments, in the above processing method of a power data file, the step of performing a second classification process inside each of the third classification sets based on a similarity between the power data files to be stored to form at least one second classification set corresponding to each of the third classification sets includes:
performing keyword extraction operation on each power data file to be stored in the third classification set to form a keyword sequence corresponding to each power data file to be stored, wherein each keyword in the keyword sequence belongs to a reference keyword set configured for the power system field;
performing feature mining processing on the corresponding keyword sequences respectively to form keyword feature representations corresponding to the power data files to be stored;
and calculating the similarity between the corresponding power data files to be stored based on the keyword characteristic representation, and respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
In some preferred embodiments, in the above processing method of a power data file, the step of performing feature mining processing on the corresponding keyword sequences to form a keyword feature representation corresponding to the power data file to be stored includes:
for each keyword in the keyword sequence, carrying out embedding processing on the keyword to form word embedding feature representation corresponding to the keyword; determining whether each keyword in the keyword sequence has a related keyword or not based on target power data corpus, wherein the co-occurrence probability of the related keyword and the corresponding keyword in the target power data corpus is larger than a preset probability;
marking each keyword in the keyword sequence without related keywords as a first keyword, marking each keyword in the keyword sequence with related keywords as a second keyword, and marking a word embedding feature representation of each first keyword to be a target word embedding feature representation of the first keyword;
for each second keyword, marking the word embedding feature representations of the related keywords corresponding to the second keyword as the related word embedding feature representation corresponding to the second keyword, and performing a transposition operation on the related word embedding feature representation to form a transposed word embedding feature representation corresponding to the second keyword;
performing, for each second keyword, a fusion calculation on the word embedding feature representation corresponding to the second keyword, the corresponding transposed word embedding feature representation and the corresponding related word embedding feature representation, to form a target word embedding feature representation corresponding to the second keyword;
and performing splicing operation on the target word embedded feature representation corresponding to each keyword in the keyword sequence to form the keyword feature representation corresponding to the power data file to be stored.
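As a non-limiting illustration, the formation of the keyword feature representation may be sketched as follows. The embodiments do not specify the fusion operation; the sketch assumes an attention-style fusion in which the keyword embedding is multiplied by the transposed related embeddings to obtain weights, and the weighted related embeddings are then added to the keyword embedding. The related-keyword lookup (related_map) is assumed to have been computed beforehand from the co-occurrence probabilities, and all function and variable names are hypothetical.

```python
import math

def fuse(embedding, related):
    """Assumed fusion of a second keyword with its related keywords."""
    # scores = embedding x related^T: one score per related keyword.
    scores = [sum(e * r for e, r in zip(embedding, row)) for row in related]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # context = weights x related: same dimension as the embedding.
    context = [sum(w * row[i] for w, row in zip(weights, related))
               for i in range(len(embedding))]
    return [e + c for e, c in zip(embedding, context)]

def keyword_feature(sequence, embeddings, related_map):
    """Concatenate the target embeddings of all keywords in the sequence."""
    feature = []
    for keyword in sequence:
        related = [embeddings[r] for r in related_map.get(keyword, [])]
        # First keywords (no related keywords) keep their own embedding.
        target = fuse(embeddings[keyword], related) if related else embeddings[keyword]
        feature.extend(target)  # splicing is taken to mean concatenation
    return feature

embeddings = {"trip": [1.0, 0.0], "breaker": [0.0, 1.0], "load": [1.0, 1.0]}
related_map = {"trip": ["breaker"]}
feature = keyword_feature(["trip", "load"], embeddings, related_map)
```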
In some preferred embodiments, in the above method for processing a power data file, the step of calculating a similarity between corresponding power data files to be stored based on the keyword feature representation, and performing a second classification process inside each third classification set based on the similarity between the power data files to be stored, to form at least one second classification set corresponding to each third classification set, includes:
Performing average value calculation on the keyword characteristic representation corresponding to each to-be-stored power data file in the third classification set so as to output a corresponding average value keyword characteristic representation;
for each to-be-stored power data file in the third classification set, calculating cosine similarity between the keyword feature representation corresponding to the to-be-stored power data file and the average keyword feature representation to obtain cosine similarity corresponding to the to-be-stored power data file;
and performing second classification processing on each to-be-stored power data file in the third classification set based on a plurality of continuous similarity intervals configured for the cosine similarity, so as to form at least one second classification set corresponding to the third classification set, wherein the cosine similarities corresponding to the to-be-stored power data files in the same second classification set belong to the same similarity interval.
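As a non-limiting illustration, the mean calculation, the cosine similarity calculation and the interval-based grouping may be sketched as follows. The interval boundaries 0.6 and 0.9 and all names are hypothetical; the keyword feature representations are assumed to be vectors of equal length.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def second_classification(features, edges=(0.6, 0.9)):
    """Group files by the interval containing their similarity to the mean.

    `edges` are the boundaries of contiguous similarity intervals:
    [-1, 0.6), [0.6, 0.9) and [0.9, 1].
    """
    vectors = list(features.values())
    # Mean keyword feature representation of the third classification set.
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    groups = defaultdict(list)
    for file_id, feature in features.items():
        similarity = cosine(feature, mean)
        index = sum(similarity >= edge for edge in edges)  # interval index
        groups[index].append(file_id)
    return dict(groups)

features = {"a": [1.0, 0.0], "b": [1.0, 0.2], "c": [0.0, 1.0]}
second_sets = second_classification(features)
```

Each value of second_sets corresponds to one second classification set whose members share a similarity interval.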
In some preferred embodiments, in the above method for processing a power data file, the step of analyzing, using a plurality of power data analysis networks, target power anomaly characterization data corresponding to the power data file to be processed includes:
performing feature mining operation on a to-be-processed power data file by using a plurality of power data analysis networks to output a corresponding plurality of initial data feature representations, wherein each power data analysis network in the plurality of power data analysis networks is used for outputting corresponding power abnormality characterization data based on loaded data, and the to-be-processed power data file belongs to operation text data of a power system;
Performing feature representation fusion operation on the plurality of initial data feature representations to form corresponding aggregate data feature representations;
and analyzing target power abnormality characterization data corresponding to the power data file to be processed based on the aggregate data feature representation, wherein the target power abnormality characterization data is used for reflecting the abnormal state of the power system corresponding to the power data file to be processed.
The embodiment of the invention also provides a processing system of the power data file, which comprises:
the data file marking module is used for marking, for each of a plurality of power data files to be stored, the power data file to be stored as a power data file to be processed;
the power abnormality analysis module is used for analyzing target power abnormality characterization data corresponding to the power data file to be processed by utilizing a plurality of power data analysis networks, and the target power abnormality characterization data is used for reflecting the abnormal state of the power system corresponding to the power data file to be processed;
the first classification processing module is used for carrying out first classification processing on the plurality of power data files to be stored based on the corresponding target power abnormality characterization data so as to form at least one first classification set, wherein each first classification set comprises at least one power data file to be stored;
The second classification processing module is used for respectively carrying out second classification processing in each first classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each first classification set, wherein each second classification set comprises at least one power data file to be stored;
and the classification storage module is used for respectively carrying out classification storage on each obtained second classification set.
In some preferred embodiments, in the above power data file processing system, the first classification processing module is specifically configured to:
carrying out consistency or similarity analysis on target power abnormality characterization data corresponding to each two to-be-stored power data files in the plurality of to-be-stored power data files;
and distributing the analyzed power data files to be stored, of which the corresponding target power abnormality characterization data are consistent, or the power data files to be stored, of which the corresponding target power abnormality characterization data belong to the same parameter interval, into the same first classification set to form at least one first classification set.
In some preferred embodiments, in the above power data file processing system, the second classification processing module is specifically configured to:
For each first classification set, performing a quantity statistics operation on the power data files to be stored, which are included in the first classification set, to form a corresponding file quantity statistics value, determining the first classification set as a corresponding second classification set if the file quantity statistics value is smaller than or equal to a predetermined first reference value, and determining the first classification set as a corresponding third classification set if the file quantity statistics value is greater than the first reference value;
and respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
According to the processing system and the processing method for the power data files, for each power data file to be stored, the power data file to be stored is marked as a power data file to be processed; analyzing target power abnormality characterization data corresponding to a power data file to be processed by utilizing a plurality of power data analysis networks; performing first classification processing on a plurality of power data files to be stored based on corresponding target power abnormality characterization data to form at least one first classification set; based on the similarity between the power data files to be stored, respectively carrying out second classification processing in each first classification set to form at least one second classification set corresponding to each first classification set; and respectively storing each second classification set in a classification way. Based on the foregoing, two-stage classification processing can be performed through the similarity between the data files and the corresponding target power abnormality characterization data, so that the classification accuracy is higher, and the reliability of classification storage is improved.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a processing platform for a power data file. Wherein the processing platform of the power data file may include a memory and a processor.
In detail, the memory and the processor are electrically connected directly or indirectly to realize transmission or interaction of data. For example, electrical connection may be made to each other via one or more communication buses or signal lines. The memory may store at least one software functional module (computer program) that may exist in the form of software or firmware. The processor may be configured to execute an executable computer program stored in the memory, thereby implementing a method for processing a power data file provided by an embodiment of the present invention (as described below).
Optionally, in some embodiments, the memory may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and the like.
Optionally, in some embodiments, the processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a system on chip (System on Chip, SoC), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Alternatively, in some embodiments, the processing platform of the power data file may be a server with data processing capabilities.
With reference to fig. 2, an embodiment of the present invention further provides a method for processing a power data file, which is applicable to the above-mentioned processing platform for the power data file. The method steps defined by the flow related to the processing method of the power data file can be realized by a processing platform of the power data file.
The specific flow shown in fig. 2 will be described in detail.
Step S100, for each of a plurality of power data files to be stored, marking the power data file to be stored as a power data file to be processed.
In the embodiment of the present invention, the processing platform of the power data file may mark, for each of a plurality of power data files to be stored, the power data file to be stored as a power data file to be processed (may be performed sequentially or in parallel).
Step S200, analyzing target power abnormality characterization data corresponding to the power data file to be processed by utilizing a plurality of power data analysis networks.
In the embodiment of the invention, the processing platform of the power data file can analyze the target power abnormality characterization data corresponding to the power data file to be processed by utilizing a plurality of power data analysis networks. The target power abnormality characterization data is used for reflecting an abnormal state of the power system corresponding to the power data file to be processed.
Step S300, performing a first classification process on the plurality of power data files to be stored based on the corresponding target power anomaly characterization data, so as to form at least one first classification set.
In the embodiment of the present invention, the processing platform of the power data file may perform a first classification process on the plurality of power data files to be stored based on the corresponding target power anomaly characterization data, so as to form at least one first classification set. Each of the first classification sets includes at least one power data file to be stored.
Step S400, based on the similarity between the power data files to be stored, performing a second classification process inside each of the first classification sets, so as to form at least one second classification set corresponding to each of the first classification sets.
In the embodiment of the present invention, the processing platform of the power data file may perform a second classification process inside each of the first classification sets based on the similarity between the power data files to be stored, so as to form at least one second classification set corresponding to each of the first classification sets. Each second classification set includes at least one power data file to be stored.
Step S500, respectively storing each obtained second classification set in a classified manner.
In the embodiment of the invention, the processing platform of the power data file may respectively store each obtained second classification set in a classified manner. In this manner, the power data files to be stored in one second classification set are stored on the same storage device to facilitate subsequent retrieval and the like.
Based on the foregoing, that is, the steps S100 to S500, two-stage classification processing may be performed by using the similarity between the data files and the corresponding target power anomaly characterization data, so that the classification accuracy is higher, and the reliability of classification storage is improved.
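As a non-limiting illustration, the overall flow of steps S100 to S500 may be sketched as follows. The analysis of step S200 and the similarity measure of step S400 are replaced here by trivial stand-in functions (a keyword test and a word-count bucket) that are assumptions made only to keep the sketch runnable; the storage locations are represented as hypothetical path strings.

```python
from collections import defaultdict

def process(files, analyze, similarity_bucket):
    # S100/S200: every file to be stored is treated as a file to be
    # processed and analyzed for its anomaly characterization data.
    anomaly = {name: analyze(text) for name, text in files.items()}
    # S300: first classification by the anomaly characterization data.
    first_sets = defaultdict(list)
    for name, label in anomaly.items():
        first_sets[label].append(name)
    # S400: second classification inside each first classification set.
    second_sets = defaultdict(list)
    for label, names in first_sets.items():
        for name in names:
            second_sets[(label, similarity_bucket(files[name]))].append(name)
    # S500: one storage location per second classification set.
    return {f"store/{label}/{bucket}": names
            for (label, bucket), names in second_sets.items()}

files = {
    "a.txt": "breaker trip at bus 3",
    "b.txt": "breaker trip at bus 7",
    "c.txt": "routine inspection complete",
}
layout = process(
    files,
    analyze=lambda text: "abnormal" if "trip" in text else "normal",  # stand-in
    similarity_bucket=lambda text: len(text.split()) // 5,            # stand-in
)
```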
Optionally, in some embodiments, the step of analyzing the target power abnormality characterization data corresponding to the power data file to be processed by using a plurality of power data analysis networks may further include the following content, such as step S110, step S120 and step S130.
Step S110, performing feature mining operation on the power data file to be processed by using a plurality of power data analysis networks to output a plurality of corresponding initial data feature representations.
In the embodiment of the invention, the processing platform of the power data file can perform feature mining operation on the power data file to be processed by utilizing a plurality of power data analysis networks so as to output a plurality of corresponding initial data feature representations. Each of the plurality of power data analysis networks is configured to output corresponding power anomaly characterization data based on the loaded data, and the to-be-processed power data file belongs to operation text data of the power system, that is, the to-be-processed power data file is configured to describe an operation process of the power system.
Step S120, performing feature representation fusion operation on the plurality of initial data feature representations to form corresponding aggregate data feature representations.
In the embodiment of the invention, the processing platform of the power data file may perform a feature representation fusion operation on the plurality of initial data feature representations to form a corresponding aggregate data feature representation.
Step S130, analyzing target power abnormality characterization data corresponding to the power data file to be processed based on the aggregate data feature representation.
In the embodiment of the invention, the processing platform of the power data file may analyze, based on the aggregate data feature representation, target power abnormality characterization data corresponding to the power data file to be processed, where the target power abnormality characterization data is used to reflect an abnormal state of a power system corresponding to the power data file to be processed, such as whether the power system is abnormal, the degree of abnormality, and the like.
Based on the above, since the feature mining operation is performed by using the plurality of power data analysis networks, a plurality of initial data feature representations can be obtained, so that the aggregate data feature representation used for analyzing the target power abnormality characterization data can be obtained through fusion; that is, the basis for the power abnormality analysis is more sufficient, and the reliability of the power data analysis can therefore be improved.
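As a non-limiting illustration, steps S110 to S130 may be sketched as follows. The power data analysis networks are replaced by simple stand-in functions that return fixed-length feature vectors; the element-wise mean used for fusion, the averaged score used as the degree of abnormality, and the threshold of 0.5 are all assumptions made for illustration.

```python
def analyze_file(text, networks, threshold=0.5):
    # S110: each network yields one initial data feature representation.
    initial = [network(text) for network in networks]
    # S120: fusion, assumed here to be an element-wise mean.
    aggregate = [sum(column) / len(column) for column in zip(*initial)]
    # S130: assumed decision rule: average the aggregate representation
    # into a degree of abnormality and compare it with a threshold.
    degree = sum(aggregate) / len(aggregate)
    return {"abnormal": degree > threshold, "degree": degree}

# Stand-in "networks": keyword indicators instead of trained models.
def net_a(text):
    return [float("trip" in text), float("overload" in text)]

def net_b(text):
    return [float("fault" in text), float("alarm" in text)]

result = analyze_file("breaker trip after overload alarm", [net_a, net_b])
```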
Optionally, in some embodiments, the step of performing feature mining operation on the power data file to be processed by using the plurality of power data analysis networks to output a corresponding plurality of initial data feature representations may further include the following:
determining a to-be-processed power data file, and analyzing the to-be-processed power data file fragments included in the to-be-processed power data file, for example, by splitting the to-be-processed power data file to form the corresponding to-be-processed power data file fragments; illustratively, when a plurality of to-be-processed power data file fragments are formed, the fragments may have a temporal sequence relationship, that is, may reflect operation data of the power system at different times, or the fragments may have a device correspondence, that is, may reflect operation data of different power devices;
marking the to-be-processed power data file fragments as data to be loaded, and loading them into each of the plurality of power data analysis networks, wherein, illustratively, the network parameters of the plurality of power data analysis networks may differ; for example, the sizes of the filter matrices, the network architectures, and the numbers of filter matrices may differ between different power data analysis networks;
And mining one initial data characteristic representation in a plurality of initial data characteristic representations by utilizing each of the plurality of power data analysis networks, wherein the one initial data characteristic representation comprises an initial characteristic representation corresponding to the power data file segment to be processed.
Optionally, in some embodiments, the power data file to be processed includes a plurality of segments of the power data file to be processed, based on which the step of mining out one of a plurality of initial data feature representations with each of the plurality of power data analysis networks may further include:
performing data mining operation on the plurality of to-be-processed power data file fragments by using a data mining sub-network included in each power data analysis network, wherein the data mining sub-network is used for mining a plurality of initial feature representations in the plurality of to-be-processed power data file fragments, and the data mining operation can refer to mapping and filtering processing of feature spaces;
determining the related relation description data of a plurality of to-be-processed power data file fragments, wherein the related relation description data is used for reflecting the distribution related relation of the plurality of to-be-processed power data file fragments in the to-be-processed power data file, such as forming a time sequence relation;
And carrying out association mining operation on the plurality of initial characteristic representations based on the related relation description data so as to output corresponding initial data characteristic representations.
Optionally, in some embodiments, the step of performing a data mining operation on the plurality of power data file segments to be processed by using a data mining sub-network included in each of the power data analysis networks may further include the following steps:
loading a plurality of to-be-processed power data file fragments to be loaded into a plurality of data mining sub-networks included in each power data analysis network, wherein the plurality of data mining sub-networks are used for mining a plurality of groups of intermediate feature representations in the plurality of to-be-processed power data file fragments, the plurality of data mining sub-networks are in one-to-one correspondence with the plurality of groups of intermediate feature representations, each group of intermediate feature representations in the plurality of groups of intermediate feature representations comprises a plurality of intermediate feature representations, and the plurality of intermediate feature representations are in one-to-one correspondence with the plurality of to-be-processed power data file fragments;
and merging the intermediate feature representations corresponding to the same power data file segment to be processed in the plurality of sets of intermediate feature representations to form a plurality of initial feature representations, wherein the intermediate feature representations corresponding to the same power data file segment to be processed can be spliced.
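As a non-limiting illustration, the merging of intermediate feature representations by splicing may be sketched as follows. The nested-list layout, in which groups[k][j] is the intermediate feature representation produced by the k-th data mining sub-network for the j-th file segment, is an assumption.

```python
def merge_intermediate(groups):
    """Concatenate, per segment, the features from all sub-networks."""
    merged = []
    for parts in zip(*groups):  # parts: one feature per sub-network
        merged.append([value for part in parts for value in part])
    return merged

groups = [
    [[1, 2], [3, 4]],  # sub-network 0: features for segments 0 and 1
    [[5], [6]],        # sub-network 1: features for segments 0 and 1
]
initial_features = merge_intermediate(groups)
```

The result contains one initial feature representation per file segment.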
Optionally, in some embodiments, the step of performing, based on the correlation description data, an association mining operation on a plurality of the initial feature representations to output corresponding initial data feature representations may further include the following:
loading the plurality of initial feature representations into a data association mining unit in the order indicated by the related relation description data;
mining associated data feature representations based on the data associated mining unit, wherein the data associated mining unit can splice a plurality of initial feature representations according to the related relationship description data to form corresponding associated data feature representations;
utilizing a focusing characteristic analysis unit to perform focusing characteristic analysis operation on the associated data characteristic representations so as to output a plurality of to-be-processed data characteristic representations, wherein the focusing characteristic analysis unit is used for analyzing to-be-processed data characteristic representations corresponding to each to-be-processed power data file segment based on the content representation important parameter of each to-be-processed power data file segment, and illustratively, the initial characteristic representations in the associated data characteristic representations can be subjected to inter-modal focusing characteristic analysis operation based on adjacent initial characteristic representations so as to obtain corresponding to-be-processed data characteristic representations, wherein focusing characteristic weight parameters obtained by performing focusing characteristic analysis operation can be used as the content representation important parameters, so that weighting can be performed based on the content representation important parameters so as to obtain the corresponding to-be-processed data characteristic representations;
And carrying out feature integration operation on the multiple data feature representations to be processed by utilizing a feature integration unit included in each power data analysis network so as to output corresponding initial data feature representations, wherein the processing procedure of the feature integration unit can be opposite to the processing procedure of feature mining, such as carrying out inverse filtering processing (up-sampling) so as to obtain corresponding initial data feature representations.
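As a non-limiting illustration, the focusing feature analysis operation may be sketched as a weighting over adjacent initial feature representations. The embodiments do not fix the form of this operation; the sketch assumes dot-product scores normalized by softmax over a window consisting of each representation and its immediate neighbours, with the resulting weights playing the role of the content representation important parameters. The feature integration unit is not covered by this sketch, and all names are hypothetical.

```python
import math

def focus(representations):
    """Weight each representation together with its adjacent ones."""
    outputs = []
    for i, query in enumerate(representations):
        # Window: the representation itself plus its immediate neighbours.
        window = representations[max(0, i - 1):i + 2]
        scores = [sum(q * k for q, k in zip(query, key)) for key in window]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # assumed importance parameters
        outputs.append([
            sum(w * key[d] for w, key in zip(weights, window))
            for d in range(len(query))
        ])
    return outputs

pending = focus([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```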
Optionally, in some embodiments, the power data file to be processed includes a plurality of power data file segments to be processed; each of the plurality of initial data feature representations includes a plurality of preliminary feature representations having a one-to-one correspondence with a plurality of the power data file segments to be processed, based on which the step of performing a fusion operation of the feature representations on the plurality of initial data feature representations to form a corresponding aggregate data feature representation may further include:
screening a plurality of preliminary feature representation clusters which have one-to-one correspondence with a plurality of to-be-processed power data file fragments from the plurality of initial data feature representations, wherein each preliminary feature representation cluster in the plurality of preliminary feature representation clusters comprises a preliminary feature representation corresponding to one to-be-processed power data file fragment in the plurality of to-be-processed power data file fragments in the plurality of initial data feature representations;
determining a mean preliminary feature representation of each preliminary feature representation cluster in the plurality of preliminary feature representation clusters (i.e., averaging the preliminary feature representations in the preliminary feature representation cluster to obtain the mean preliminary feature representation) so as to output a plurality of mean preliminary feature representations having a one-to-one correspondence with the plurality of to-be-processed power data file segments;
and marking the feature representation comprising the plurality of mean preliminary feature representations as the corresponding aggregate data feature representation, that is, the aggregate data feature representation may comprise the plurality of mean preliminary feature representations.
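The mean-based fusion described above can be sketched as follows. This is only an illustrative sketch: the function name `aggregate_by_mean` and the nested-list layout (`initial_representations[k][s]` being the preliminary feature representation produced by network `k` for segment `s`) are assumptions made for the example and are not taken from the embodiment.

```python
from statistics import fmean


def aggregate_by_mean(initial_representations):
    """Fuse the outputs of several analysis networks by per-segment averaging.

    initial_representations[k][s] is assumed to be the preliminary feature
    vector that network k produced for file segment s (hypothetical layout).
    """
    num_segments = len(initial_representations[0])
    aggregate = []
    for s in range(num_segments):
        # Preliminary feature representation cluster for segment s:
        # one vector per power data analysis network.
        cluster = [rep[s] for rep in initial_representations]
        # Element-wise mean over the vectors in the cluster.
        mean_vec = [fmean(values) for values in zip(*cluster)]
        aggregate.append(mean_vec)
    return aggregate


if __name__ == "__main__":
    reps = [
        [[1.0, 2.0], [3.0, 4.0]],  # network 0, segments 0 and 1
        [[3.0, 4.0], [5.0, 8.0]],  # network 1, segments 0 and 1
    ]
    print(aggregate_by_mean(reps))
```

The returned list holds one mean preliminary feature representation per segment, which together play the role of the aggregate data feature representation.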
Alternatively, in some other embodiments, the power data file to be processed includes a plurality of power data file segments to be processed; each of the plurality of initial data feature representations includes a plurality of preliminary feature representations having a one-to-one correspondence with the plurality of power data file segments to be processed, based on which the step of performing a fusion operation of the feature representations on the plurality of initial data feature representations to form a corresponding aggregate data feature representation may further include:
Screening a plurality of preliminary feature representation clusters which have one-to-one correspondence with a plurality of to-be-processed power data file fragments from the plurality of initial data feature representations, wherein each preliminary feature representation cluster in the plurality of preliminary feature representation clusters comprises a preliminary feature representation corresponding to one to-be-processed power data file fragment in the plurality of to-be-processed power data file fragments in the plurality of initial data feature representations;
determining the most relevant preliminary feature representation of each preliminary feature representation cluster in the plurality of preliminary feature representation clusters (namely clustering all the preliminary feature representations in the preliminary feature representation clusters to determine a clustering center as the most relevant preliminary feature representation) so as to output a plurality of most relevant preliminary feature representations with one-to-one correspondence with a plurality of to-be-processed power data file fragments;
and marking the feature representation comprising the plurality of most relevant preliminary feature representations as the corresponding aggregate data feature representation, that is, the aggregate data feature representation may comprise the plurality of most relevant preliminary feature representations, which may, for example, be spliced to form the corresponding aggregate data feature representation.
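A minimal sketch of the cluster-center variant is given below. The embodiment does not name a clustering algorithm, so the sketch substitutes a medoid (the member of the cluster with the smallest summed Euclidean distance to the other members) as a stand-in for the clustering center; this substitution, the function names, and the data layout are all assumptions.

```python
import math


def medoid(vectors):
    """Return the member with the smallest summed Euclidean distance to the rest.

    Used here as an assumed stand-in for the "clustering center".
    """
    return min(vectors, key=lambda v: sum(math.dist(v, other) for other in vectors))


def aggregate_by_center(initial_representations):
    """Pick the most relevant vector per segment and splice the picks together.

    initial_representations[k][s] is assumed to be the preliminary feature
    vector produced by network k for file segment s (hypothetical layout).
    """
    num_segments = len(initial_representations[0])
    spliced = []
    for s in range(num_segments):
        cluster = [rep[s] for rep in initial_representations]
        spliced.extend(medoid(cluster))  # splicing = concatenation
    return spliced


if __name__ == "__main__":
    reps = [
        [[0.0, 0.0], [2.0, 2.0]],
        [[1.0, 0.0], [2.0, 3.0]],
        [[5.0, 0.0], [9.0, 9.0]],
    ]
    print(aggregate_by_center(reps))
```

Compared with averaging, a medoid always returns a representation that one of the networks actually produced, which makes it less sensitive to a single outlying network.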
Optionally, in some embodiments, the power data file to be processed includes N power data file segments to be processed, the aggregate data feature representation includes N feature representations corresponding to the N power data file segments to be processed, based on which the step of analyzing the target power anomaly characterization data corresponding to the power data file to be processed based on the aggregate data feature representation may further include the following:
performing full connection operation on the N feature representations to obtain full connection feature representations;
performing similarity calculation between the full-connection feature representation and each of a plurality of center feature representations to output a plurality of corresponding feature representation similarities;
determining one feature representation similarity (such as the largest feature representation similarity) from the feature representation similarities, and marking the feature representation similarity as a target feature representation similarity;
and marking the center feature representation corresponding to the target feature representation similarity as the target power abnormality characterization data corresponding to the power data file to be processed, wherein each center feature representation is determined (for example, by clustering) based on the feature representation corresponding to at least one typical power data file having the corresponding reference power abnormality characterization data.
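The three steps above (full connection, similarity against each center, selection of the largest similarity) can be sketched as follows. The weight matrix, the bias, the use of cosine similarity, and the string keys that identify the center feature representations are all hypothetical choices made for illustration; the embodiment does not fix any of them.

```python
import math


def fully_connect(features, weights, bias):
    """Flatten the N feature vectors and apply one dense layer (assumed form)."""
    flat = [x for vec in features for x in vec]
    return [sum(w * x for w, x in zip(row, flat)) + b for row, b in zip(weights, bias)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_center(features, weights, bias, centers):
    """Return the key of the center most similar to the full-connection output.

    centers maps a hypothetical identifier to a center feature vector.
    """
    fc = fully_connect(features, weights, bias)
    similarities = {key: cosine(fc, center) for key, center in centers.items()}
    return max(similarities, key=similarities.get)


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0]]               # N = 2 segment features
    weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    bias = [0.0, 0.0]
    centers = {"overload": [1.0, 0.1], "undervoltage": [0.0, 1.0]}
    print(pick_center(features, weights, bias, centers))
```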
Optionally, in some embodiments, before the step of performing feature mining operations on the power data file to be processed by using the plurality of power data analysis networks to output a corresponding plurality of initial data feature representations, the method for processing the power data file may further include the following:
based on the typical power data file (and the corresponding actual power abnormality characterization data), performing network updating operation on each of a plurality of power data analysis networks to be updated to form a plurality of corresponding updated power data analysis networks;
performing a network update operation on each of a plurality of associated networks based on the typical power data file to form a corresponding plurality of updated associated networks, wherein each of the plurality of associated networks includes one updated power data analysis network and one feature representation restoration network, and the feature representation restoration network is used for restoring a feature representation corresponding to the typical power data file based on the power anomaly characterization data analyzed by the updated power data analysis network (so that a corresponding error parameter can be determined based on a difference between the feature representation mined by the updated power data analysis network and the feature representation restored by the feature representation restoration network, and then performing network update processing based on the error parameter);
and determining the plurality of power data analysis networks based on the plurality of updated associated networks, for example, by constructing the power data analysis networks based on network parameters of the plurality of updated associated networks.
Optionally, in some embodiments, the step of performing a first classification process on the plurality of power data files to be stored based on the corresponding target power anomaly characterization data to form at least one first classification set may further include the following:
carrying out consistency or similarity analysis on target power abnormality characterization data corresponding to each two to-be-stored power data files in the plurality of to-be-stored power data files;
and, according to the analysis result, assigning the power data files to be stored whose corresponding target power abnormality characterization data are consistent, or whose corresponding target power abnormality characterization data belong to the same parameter interval, to the same first classification set, so as to form at least one first classification set.
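As an illustration of the parameter-interval variant, the sketch below assumes that the target power abnormality characterization data of each file can be reduced to a single numeric score and that the parameter intervals are given by a sorted list of boundaries. Both assumptions, and the names used, are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def first_classification(scores, boundaries):
    """Group files whose anomaly score falls into the same parameter interval.

    scores: mapping of file name -> numeric anomaly score (assumed scalar).
    boundaries: sorted interval boundaries, e.g. [0.3, 0.6] defines the
    intervals (-inf, 0.3), [0.3, 0.6) and [0.6, +inf).
    """
    groups = defaultdict(list)
    for name, score in scores.items():
        interval_index = bisect_right(boundaries, score)
        groups[interval_index].append(name)
    return list(groups.values())


if __name__ == "__main__":
    scores = {"a.dat": 0.10, "b.dat": 0.15, "c.dat": 0.70}
    print(first_classification(scores, [0.3, 0.6]))
```

Bucketing each file by its interval index avoids an explicit comparison of every pair of files while producing the same grouping as a pairwise "same interval" check.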
Optionally, in some embodiments, the step of forming at least one second classification set corresponding to each first classification set by performing a second classification process inside each first classification set based on the similarity between the power data files to be stored, may further include the following:
For each first classification set, performing a quantity statistics operation on the power data files to be stored, which are included in the first classification set, so as to form a corresponding file quantity statistical value, determining the first classification set as a corresponding second classification set if the file quantity statistical value is smaller than or equal to a predetermined first reference value (such as 5), and determining the first classification set as a corresponding third classification set if the file quantity statistical value is greater than the first reference value;
and respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
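The size check that decides whether a first classification set is kept as is or subdivided further can be sketched as below. The function name is hypothetical, and the default threshold of 5 simply reuses the example value mentioned above.

```python
def split_by_size(first_sets, first_reference_value=5):
    """Separate small sets (kept as second classification sets) from large ones.

    Sets with at most first_reference_value files are returned unchanged as
    second classification sets; larger sets become third classification sets
    that still need the similarity-based second classification.
    """
    second_sets, third_sets = [], []
    for file_set in first_sets:
        if len(file_set) <= first_reference_value:
            second_sets.append(file_set)
        else:
            third_sets.append(file_set)
    return second_sets, third_sets


if __name__ == "__main__":
    sets = [["a", "b"], ["c", "d", "e", "f", "g", "h"]]
    print(split_by_size(sets))
```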
Optionally, in some embodiments, the step of forming at least one second classification set corresponding to each third classification set by performing a second classification process inside each third classification set based on the similarity between the power data files to be stored, may further include the following:
performing keyword extraction operation on each power data file to be stored in the third classification set to form a keyword sequence corresponding to each power data file to be stored, wherein each keyword in the keyword sequence belongs to a reference keyword set configured for the power system field;
performing feature mining processing on each keyword sequence to form the keyword feature representation corresponding to each power data file to be stored;
and calculating the similarity between the corresponding power data files to be stored based on the keyword characteristic representation, and respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
Optionally, in some embodiments, the step of performing feature mining processing on the corresponding keyword sequences to form keyword feature representations corresponding to the to-be-stored power data file may further include the following contents:
for each keyword in the keyword sequence, performing embedding processing on the keyword to form a word embedding feature representation corresponding to the keyword; and determining, based on a target power data corpus, whether each keyword in the keyword sequence has a related keyword, wherein the co-occurrence probability of a related keyword and the corresponding keyword in the target power data corpus is greater than a preset probability;
marking each keyword in the keyword sequence that has no related keyword as a first keyword, marking each keyword in the keyword sequence that has a related keyword as a second keyword, and marking the word embedding feature representation of each first keyword as the target word embedding feature representation of that first keyword;
for each second keyword, marking the word embedding feature representations of the related keywords of the second keyword as the related word embedding feature representation corresponding to the second keyword, and performing a transposition operation on the related word embedding feature representation to form a transposed word embedding feature representation corresponding to the second keyword;
for each second keyword, performing a fusion operation on the word embedding feature representation of the second keyword, the corresponding transposed word embedding feature representation, and the corresponding related word embedding feature representation to form the target word embedding feature representation of the second keyword, wherein, for example, the word embedding feature representation may be multiplied by the transposed word embedding feature representation, the product may be scaled according to the number of dimensions of the word embedding feature representation, parameter normalization processing may then be performed, and the normalized result may be multiplied by the related word embedding feature representation to realize the fusion and obtain the corresponding target word embedding feature representation;
and performing a splicing operation on the target word embedding feature representations corresponding to the keywords in the keyword sequence to form the keyword feature representation corresponding to the power data file to be stored.
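The fusion step resembles an attention-style weighting of the related keywords, and one possible reading is sketched below. Several points are assumptions rather than statements of the embodiment: the scaling is taken to be a division by the number of dimensions, the parameter normalization is taken to be a softmax, and the related keywords are supplied through a precomputed `related_map` instead of being derived from corpus co-occurrence statistics. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax (assumed form of the parameter normalization)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embedding, related):
    """Fuse a keyword embedding with the embeddings of its related keywords.

    scores  = embedding x related^T / d   (d = embedding dimension, assumed scaling)
    weights = softmax(scores)
    target  = weights x related
    """
    d = len(embedding)
    scores = [sum(q * r for q, r in zip(embedding, rel)) / d for rel in related]
    weights = softmax(scores)
    return [sum(w * rel[i] for w, rel in zip(weights, related)) for i in range(d)]


def keyword_feature(sequence, embeddings, related_map):
    """Splice the target embeddings of all keywords in the sequence."""
    feature = []
    for keyword in sequence:
        related = [embeddings[r] for r in related_map.get(keyword, [])]
        # First keywords (no related keyword) keep their own embedding.
        target = fuse(embeddings[keyword], related) if related else embeddings[keyword]
        feature.extend(target)
    return feature


if __name__ == "__main__":
    embeddings = {"transformer": [1.0, 0.0], "overload": [0.0, 1.0], "relay": [1.0, 1.0]}
    related_map = {"transformer": ["overload"]}  # hypothetical co-occurrence result
    print(keyword_feature(["transformer", "relay"], embeddings, related_map))
```

With a single related keyword the softmax weight is 1, so the target representation equals that related keyword's embedding; with several related keywords it becomes a weighted blend of them.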
Optionally, in some embodiments, the step of calculating the similarity between the corresponding to-be-stored power data files based on the keyword feature representation, and performing a second classification process inside each third classification set based on the similarity between the to-be-stored power data files, to form at least one second classification set corresponding to each third classification set, may further include the following contents:
performing average value calculation on the keyword characteristic representation corresponding to each to-be-stored power data file in the third classification set so as to output a corresponding average value keyword characteristic representation;
for each to-be-stored power data file in the third classification set, calculating cosine similarity between the keyword feature representation corresponding to the to-be-stored power data file and the average keyword feature representation to obtain cosine similarity corresponding to the to-be-stored power data file;
and performing second classification processing on each to-be-stored power data file in the third classification set based on a plurality of pre-configured consecutive cosine similarity intervals, so as to form at least one second classification set corresponding to the third classification set, wherein the cosine similarities corresponding to the to-be-stored power data files in the same second classification set belong to the same similarity interval, and the similarity intervals may be configured in advance according to actual requirements and are not specifically limited here.
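The second classification within a third classification set can be sketched as follows. The sketch assumes that all keyword feature representations in the set have the same length and that the similarity intervals are described by a sorted list of boundaries; the names and example boundaries are hypothetical.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def second_classification(features, boundaries):
    """Bucket files by the cosine similarity of their feature to the set mean.

    features: mapping of file name -> keyword feature vector (equal lengths assumed).
    boundaries: sorted similarity interval boundaries, e.g. [0.5, 0.8].
    """
    names = list(features)
    dim = len(features[names[0]])
    # Mean keyword feature representation of the third classification set.
    mean = [sum(features[n][i] for n in names) / len(names) for i in range(dim)]
    groups = defaultdict(list)
    for name in names:
        interval_index = bisect_right(boundaries, cosine(features[name], mean))
        groups[interval_index].append(name)
    # Return the non-empty intervals in ascending order of similarity.
    return [groups[k] for k in sorted(groups)]


if __name__ == "__main__":
    features = {"f1": [1.0, 0.0], "f2": [1.0, 0.0], "f3": [0.0, 1.0]}
    print(second_classification(features, [0.5, 0.8]))
```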
With reference to FIG. 3, an embodiment of the present invention further provides a system for processing a power data file, which is applicable to the above-mentioned processing platform for the power data file. The processing system of the power data file may include the following software functional modules:
the data file marking module is used for marking each to-be-stored power data file in a plurality of to-be-stored power data files to be stored as a to-be-processed power data file;
the power abnormality analysis module is used for analyzing target power abnormality characterization data corresponding to the power data file to be processed by utilizing a plurality of power data analysis networks, and the target power abnormality characterization data is used for reflecting the abnormal state of the power system corresponding to the power data file to be processed;
The first classification processing module is used for carrying out first classification processing on the plurality of power data files to be stored based on the corresponding target power abnormality characterization data so as to form at least one first classification set, wherein each first classification set comprises at least one power data file to be stored;
the second classification processing module is used for respectively carrying out second classification processing in each first classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each first classification set, wherein each second classification set comprises at least one power data file to be stored;
and the classification storage module is used for respectively carrying out classification storage on each obtained second classification set.
Optionally, in some embodiments, the first classification processing module is specifically configured to:
carrying out consistency or similarity analysis on target power abnormality characterization data corresponding to each two to-be-stored power data files in the plurality of to-be-stored power data files;
and, according to the analysis result, assigning the power data files to be stored whose corresponding target power abnormality characterization data are consistent, or whose corresponding target power abnormality characterization data belong to the same parameter interval, to the same first classification set, so as to form at least one first classification set.
Optionally, in some embodiments, the second classification processing module is specifically configured to:
for each first classification set, performing a quantity statistics operation on the power data files to be stored, which are included in the first classification set, to form a corresponding file quantity statistics value, determining the first classification set as a corresponding second classification set if the file quantity statistics value is smaller than or equal to a predetermined first reference value, and determining the first classification set as a corresponding third classification set if the file quantity statistics value is greater than the first reference value;
and respectively carrying out second classification processing inside each third classification set based on the similarity between the power data files to be stored to form at least one second classification set corresponding to each third classification set.
In summary, according to the system and method for processing power data files provided by the present invention, each power data file to be stored is marked as a power data file to be processed; target power abnormality characterization data corresponding to the power data file to be processed are analyzed by utilizing a plurality of power data analysis networks; first classification processing is performed on the plurality of power data files to be stored based on the corresponding target power abnormality characterization data to form at least one first classification set; second classification processing is performed inside each first classification set, based on the similarity between the power data files to be stored, to form at least one second classification set corresponding to each first classification set; and each second classification set is classified and stored. In this way, two-stage classification is performed using both the target power abnormality characterization data and the similarity between the data files, so that the classification accuracy is higher and the reliability of classified storage is improved.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.