CN116467461A - Data processing method, device, equipment and medium applied to power distribution network - Google Patents


Info

Publication number
CN116467461A
Authority
CN
China
Prior art keywords
data
preset
target
processed
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310256432.8A
Other languages
Chinese (zh)
Inventor
黄裕春
张晏玉
雷才嘉
佟佳俊
贾巍
童家鹏
范杏元
黄文�
钏星
李水天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202310256432.8A
Publication of CN116467461A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a data processing method, device, equipment and medium applied to a power distribution network. The method comprises the following steps: accessing a target data source, and extracting data to be processed from the target data source according to a preset extraction rule; processing the data to be processed according to a preset data processing method to obtain data to be applied; processing the data to be applied based on a preset extraction model, and determining target data; and constructing a power knowledge graph based on the target data, and storing the power knowledge graph into a target database. This technical solution determines the required target data from the data of the power distribution network and constructs the power knowledge graph from that target data, thereby improving both the accuracy of the power knowledge graph and the efficiency of its construction.

Description

Data processing method, device, equipment and medium applied to power distribution network
Technical Field
The present invention relates to the field of power technologies, and in particular, to a data processing method, apparatus, device, and medium applied to a power distribution network.
Background
With the rapid development of power supply technology, the smart grid accumulates massive amounts of information during daily construction and operation. Workers can query the stored information as required, which helps ensure normal operation of the power supply network.
However, existing methods simply store the data of the power supply network in the power supply system, which leads to problems such as low data utilization, low management efficiency and difficult retrieval. In addition, most of the information is described in natural language, and workers differ in their wording habits, which makes the information harder for a computer to understand.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a medium applied to a power distribution network, which realize the technical effects of determining required target data according to data information of the power distribution network, constructing a power knowledge graph based on the target data, and further improving the accuracy and the construction efficiency of the power knowledge graph.
In a first aspect, an embodiment of the present invention provides a data processing method applied to a power distribution network, including:
accessing a target data source, and extracting data to be processed from the target data source according to a preset extraction rule;
processing the data to be processed according to a preset data processing method to obtain data to be applied;
processing the data to be applied based on a preset extraction model, and determining target data;
constructing a power knowledge graph based on the target data, and storing the power knowledge graph into a target database.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus applied to a power distribution network, where the apparatus includes:
the to-be-processed data acquisition module is used for accessing a target data source and extracting data to be processed from the target data source according to a preset extraction rule;
the data to be applied determining module is used for processing the data to be processed according to a preset data processing method to obtain the data to be applied;
the target data determining module is used for processing the data to be applied based on a preset extraction model and determining target data;
and the graph construction module is used for constructing a power knowledge graph based on the target data and storing the power knowledge graph into a target database.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to implement a data processing method for a power distribution network according to any one of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the program when executed by a processor implements a data processing method applied to a power distribution network according to any one of the embodiments of the present invention.
According to this technical solution, a target data source is accessed and data to be processed is extracted from it according to a preset extraction rule; the data to be processed is processed according to a preset data processing method to obtain data to be applied; the data to be applied is processed based on a preset extraction model to determine target data; and finally a power knowledge graph is constructed based on the target data and stored in a target database. The solution thus determines the required target data from the data of the power distribution network and constructs the power knowledge graph from that target data, thereby improving both the accuracy of the power knowledge graph and the efficiency of its construction.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a data processing method applied to a power distribution network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an improved CBOW model provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the BERT-BiGRU-Attention-CRF model provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the input representation of the BERT model provided by an embodiment of the present invention;
Fig. 5 is a flowchart of a data processing method applied to a power distribution network according to an embodiment of the present invention;
Fig. 6 is a block diagram of a data processing device applied to a power distribution network according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a schematic flow chart of a data processing method applied to a power distribution network, which is provided by the embodiment of the invention, and the embodiment is applicable to determining data to be applied according to data to be processed in a target data source, further determining target data based on the data to be applied, so as to construct a power knowledge graph based on the target data.
As shown in fig. 1, the method includes:
S110, accessing a target data source, and extracting data to be processed from the target data source according to a preset extraction rule.
The target data source may be understood as a database selected by the user, for example, a MySQL database. The preset extraction rule may be a rule set in advance for extracting data from the database. The data to be processed may be understood as data obtained from the target data source.
Specifically, the target data source selected by the user is accessed, and the data to be processed is extracted from it according to the preset extraction rule. For example, the user can select the target data source to be accessed from a plurality of candidate data sources, and the contents of the target data source are then matched against the preset extraction rule to obtain the data to be processed. For instance, a plurality of keywords may be preset and matched against the data in the target data source, and the successfully matched data is used as the data to be processed.
On the basis of the above technical solution, extracting the data to be processed from the target data source according to the preset extraction rule includes: extracting source text data from the target data source; and filtering the source text data according to the preset extraction rule to obtain the data to be processed that matches the preset extraction rule.
The source text data may be understood as text data stored in the target data source. The source text data includes at least one of operation text data, dispatch text data, planning text data, and operation manual text data. The operation text data may be understood as text data generated during operation of the power distribution network. The dispatch text data may be text data generated when the power distribution network is dispatched. The planning text data may be text data generated when the power distribution network is planned. The operation manual text data may be understood as operation manual data corresponding to the power distribution equipment in the power distribution network.
Specifically, source text data is extracted from the target data source and then filtered based on the preset extraction rule to obtain the data to be processed that matches the rule. For example, text data of the power source-network-load can be obtained, such as operation text, dispatch text, related planning text and operation manuals generated during operation of the power grid, and this source text data is then filtered according to preset keywords to obtain the data to be processed. For example, the acquired text data may be filtered according to preset keywords such as "power transmission/line/wire/section/and/split pattern/response/satisfaction/corona/radio/interference/audible noise" to obtain the data to be processed that matches the keywords.
S120, processing the data to be processed according to a preset data processing method to obtain data to be applied.
The preset data processing method is understood to be a preset algorithm for determining the data to be applied.
Specifically, after the data to be processed is obtained, the data to be processed is processed based on a preset data processing method to obtain data to be applied, for example, feature extraction may be performed on the data to be processed according to a preset feature extraction algorithm to obtain feature data corresponding to the data to be processed, and the feature data corresponding to the data to be processed is used as the data to be applied.
On the basis of the above technical solution, processing the data to be processed according to the preset data processing method to obtain the data to be applied includes: performing data cleaning on the data to be processed according to a preset data cleaning algorithm to obtain text data to be applied; and determining the data to be applied corresponding to the data to be processed based on the text data to be applied.
The preset data cleaning algorithm may be an algorithm for performing data cleaning on the data to be processed. The text data to be applied may be text data obtained after data cleansing.
Specifically, after the data to be processed is obtained, erroneous data in it could affect the accuracy of the knowledge graph. To avoid this, the data to be processed can be cleaned according to a preset data cleaning algorithm to obtain the text data to be applied, from which the erroneous data has been removed, and the data to be applied corresponding to the data to be processed can then be determined from the text data to be applied. For example, a big data repeated cleaning method based on a random forest algorithm may be adopted: first, a multidimensional distributed heterogeneous storage node distribution model of the power source-network-load data is constructed; a cloud fusion scheduling method is used for adaptive data scheduling and information feature sampling; anti-interference processing during data acquisition is achieved by combining fuzzy feature matching with a detection filtering method; and the random forest algorithm is then applied to the acquired data for big data filtering and adaptive cleaning.
On the basis of the technical scheme, determining the data to be applied corresponding to the data to be processed based on the text data to be applied comprises the following steps: performing word segmentation processing on the text data to be applied based on a preset word segmentation algorithm to obtain keyword data; determining word vector data corresponding to the text data to be applied according to a preset vector identification model and the text data to be applied; and constructing data to be applied based on the keyword data and the word vector data.
The preset word segmentation algorithm may be an algorithm for segmenting the text data to be applied, such as a word segmentation algorithm combining a hidden Markov model and a conditional random field (HMM-CRF). The keyword data may be keywords extracted from the text data to be applied according to the preset word segmentation algorithm. The preset vector identification model may be a model for obtaining the word vector data of the text data to be applied. Word vector data may be understood as the word vectors that represent the text data to be applied.
Specifically, word segmentation is performed on the text data to be applied according to the preset word segmentation algorithm to obtain keyword data, the text data to be applied is processed with the preset vector identification model to obtain the corresponding word vector data, and the data to be applied is then constructed from the keyword data and the word vector data. For example, the text data may be represented with a Word2vec model to obtain the corresponding word vector data. The Word2vec model converts words into vectors and includes two algorithms, skip-gram and CBOW. The embodiment of the invention adopts an improved CBOW model which, as shown in fig. 2, comprises an input layer, a hidden layer and an output layer. Synonyms can be found by calculating the cosine similarity between word vectors, and the word vectors obtained from the corpus can also be used as input to the subsequent entity recognition model. To improve the entity recognition capability, the CBOW model is improved by expanding the hidden layer into a multi-layer neuron structure.
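The cosine-similarity comparison of word vectors mentioned above can be illustrated with a short sketch. It assumes the word vectors have already been produced by some Word2vec-style model; the three-dimensional toy vectors and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word: str, vectors: Dict[str, List[float]], top_n: int = 1) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors for illustration only; real vectors would come from the trained model.
TOY_VECTORS = {
    "breaker": [0.9, 0.1, 0.0],
    "switch": [0.8, 0.2, 0.1],
    "transformer": [0.0, 0.1, 0.9],
}
nearest = most_similar("breaker", TOY_VECTORS)
```

With these toy values, "switch" ranks closest to "breaker", which is the kind of signal a synonym table could be built from.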
On the basis of the above technical scheme, the word segmentation processing is performed on the text data to be applied based on a preset word segmentation algorithm to obtain keyword data, including: performing word segmentation processing on the text data to be applied by adopting a first word segmentation algorithm to obtain a first keyword list; counting keywords in the first keyword list to obtain frequency information corresponding to each keyword; and processing the first keyword list according to a preset frequency threshold and the frequency information, and determining the keyword data.
The first word segmentation algorithm may be a preset algorithm for performing word segmentation processing on text data, for example, a jieba word segmentation algorithm. The first keyword table may be understood as a keyword table constructed based on the word segmentation result. The frequency information may be a frequency of occurrence of each keyword in the first keyword table. The preset frequency threshold may be a frequency threshold for determining whether the current keyword is a high frequency vocabulary.
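The counting and thresholding steps defined above can be sketched as follows. The sketch assumes the text has already been segmented into tokens (for example by jieba, which is not called here); the function name `high_frequency_keywords`, the toy tokens and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def high_frequency_keywords(tokens: Iterable[str], min_count: int) -> List[str]:
    """Return the tokens whose frequency reaches the preset threshold.

    Tokens are returned in order of first appearance.
    """
    counts = Counter(tokens)
    return [word for word, n in counts.items() if n >= min_count]


# Toy segmentation output, for illustration only.
TOY_TOKENS = ["transformer", "line", "transformer", "outage", "line", "transformer"]
keywords = high_frequency_keywords(TOY_TOKENS, min_count=2)
```

Here "outage" appears only once and is dropped, while "transformer" and "line" are kept as keyword data.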
Specifically, a first word segmentation algorithm is used to segment the text data to be applied and obtain a first keyword list. The frequency information of each keyword in the first keyword list is then counted, and the list is filtered against the preset frequency threshold so that the high-frequency keywords meeting the threshold are used as the keyword data. For example, after a first pass of jieba segmentation, part of the original text and its segmentation output can be sampled, the segmentation results compared with the original text, and the incorrectly segmented words extracted to build a keyword list. Word counts are then collected over the segmented text and, because power data has domain-specific characteristics, the high-frequency words are added to the keyword list. The keyword list is refined with the TF-IDF algorithm. The processed documents are then segmented again using a CRF-based segmentation model into which the high-frequency dictionary has been imported, finally yielding a high-precision segmentation result. Taking power dispatching text as an example, tag information can be set according to the type of the dispatching text; for example, tags such as sentence class, semantic component and semantic model are designed for the power dispatching text, and the tags are as follows:
S130, processing the data to be applied based on a preset extraction model, and determining target data.
The preset extraction model may be a model for extracting information of the device from data to be applied.
Specifically, after the data to be applied is obtained, it can be used as the model input: it is fed into the preset extraction model, and the target data corresponding to the data to be applied is obtained after processing by the model. For example, the data to be applied may be processed by the preset extraction model to extract the key device information it contains.
On the basis of the above technical solution, the processing the data to be applied based on the preset extraction model, and determining the target data include: inputting the data to be applied to the preset extraction model to obtain characteristic data corresponding to the data to be applied; the target data is determined based on the feature data.
The feature data may be data representing a feature of the data to be applied, for example, a feature vector or the like.
Specifically, the data to be applied may be processed with a combined BERT-BiGRU-Attention-CRF model. As shown in fig. 3, the BERT-BiGRU-Attention-CRF combined model in the embodiment of the present invention includes:
BERT layer: the BERT model uses a bidirectional Transformer structure, performs feature extraction and training through a multi-layer neural network, and converts the input text into word vectors so that the BiGRU layer can learn contextual features. The key part of the BERT model is a deep network based on the self-attention mechanism, whose core goal is to select, from a large amount of information, the information that best fits the current task. As shown in fig. 4, compared with traditional static word vector training, the dynamic word vectors trained by the BERT model contain both the meaning of each word and the features of its context words, so the implicit sentence-level features can be captured better.
BiGRU layer: automatically extracts the features of the sentence context. The input to each BiGRU unit is a sequence of dynamic word vectors, from which the unit learns to extract the local features of the sentence. The forward GRU model and the backward GRU model each output a hidden state sequence, and these hidden states are spliced in sentence order to obtain the complete hidden state sequence. The GRU model is calculated as follows:
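$$z_t = \sigma\left(W_z \cdot [c_{t-1}, x_t]\right)$$
$$r_t = \sigma\left(W_r \cdot [c_{t-1}, x_t]\right)$$
$$h_t = \tanh\left(W \cdot [r_t \cdot c_{t-1}, x_t]\right)$$
$$c_t = (1 - z_t) \cdot c_{t-1} + z_t \cdot h_t$$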
where $\sigma$ is the sigmoid function, $W_z$, $W_r$ and $W$ are weight matrices, $x_t$ is the input text data at time $t$, $c_{t-1}$ is the output of the previous time step, $r_t$ is the reset gate, $z_t$ is the update gate, $h_t$ is the candidate activation state, and $c_t$ is the output at the current time. The bidirectional GRU further defines an update formula for the left-to-right recurrent neural network layer, an update formula for the right-to-left recurrent neural network layer, and a formula for the final output of the BiGRU neural network; these three formulas are not reproduced in this text.
In those formulas, $W$, $V$ and $U$ are weight matrices, $b$ and $c$ are bias terms, $x_t$ is the input text data at time $t$, and $h_t$ is the candidate activation state.
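The four GRU equations above can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: vectors are Python lists, the weights are passed in explicitly, and the bidirectional wrapper simply concatenates the forward and backward states at each position, which is an assumption based on the "splicing" described for the BiGRU layer. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gru_step(x_t: Vector, c_prev: Vector, w_z: Matrix, w_r: Matrix, w: Matrix) -> Vector:
    """One GRU update following the equations in the text (no bias terms)."""
    concat = c_prev + x_t                                   # [c_{t-1}, x_t]
    z_t = [sigmoid(v) for v in matvec(w_z, concat)]         # update gate
    r_t = [sigmoid(v) for v in matvec(w_r, concat)]         # reset gate
    gated = [r * c for r, c in zip(r_t, c_prev)] + x_t      # [r_t * c_{t-1}, x_t]
    h_t = [math.tanh(v) for v in matvec(w, gated)]          # candidate state
    return [(1.0 - z) * c + z * h for z, c, h in zip(z_t, c_prev, h_t)]


def run_gru(xs: Sequence[Vector], c0: Vector, w_z: Matrix, w_r: Matrix, w: Matrix) -> List[Vector]:
    """Apply gru_step over a sequence and collect the output at every position."""
    states, c = [], c0
    for x in xs:
        c = gru_step(x, c, w_z, w_r, w)
        states.append(c)
    return states


def bigru(xs: Sequence[Vector], c0: Vector,
          fwd: Tuple[Matrix, Matrix, Matrix], bwd: Tuple[Matrix, Matrix, Matrix]) -> List[Vector]:
    """Assumed BiGRU output: concatenate forward and backward states per position."""
    forward = run_gru(xs, c0, *fwd)
    backward = run_gru(list(xs)[::-1], c0, *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# With all-zero weights both gates equal 0.5 and the candidate state is 0,
# so each step halves the previous output.
ZERO = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
c_next = gru_step([1.0], [0.4, -0.2], ZERO, ZERO, ZERO)
```

The zero-weight call is only a sanity check of the arithmetic; in practice the weight matrices are learned during training.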
The Attention layer: the input of the Attention layer is the hidden state output by the BiGRU layer, the hidden state is weighted and characterized by an Attention mechanism, the relation among words is mined, and finally a new state sequence is output.
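The weighting performed by the Attention layer can be illustrated with a small sketch. The patent text does not state which scoring function is used, so the dot-product self-attention below is an assumption chosen for simplicity; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states: List[Vector]) -> List[Vector]:
    """Dot-product self-attention over a sequence of hidden states.

    Each output state is a weighted sum of all input states, where the
    weights reflect how strongly the current state relates to the others.
    """
    outputs = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) for key in states]
        weights = softmax(scores)
        dim = len(query)
        outputs.append([sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)])
    return outputs


attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

The output has the same length as the input sequence, which matches the description of the layer producing a new state sequence from the BiGRU hidden states.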
CRF layer: a conditional random field (CRF) is a Markov model represented by an undirected graph and typically has two classes of feature functions, transition features and state features. The state features represent the relationship between the input sequence and the current state, and the transition features represent the relationship between the previous output state and the current output state. The local features are normalized into global features, and the label bias problem is addressed by calculating the probability distribution of the whole sequence to obtain a globally optimal solution. In addition, the CRF model can learn the implicit constraint rules of the labels from the training data, and these added constraints help ensure the validity of the final prediction result.
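How transition features let a CRF prefer a globally valid label sequence can be illustrated with Viterbi decoding, the standard way to find the highest-scoring path in a linear-chain CRF. The sketch below assumes the state (emission) scores and transition scores are already given; the B/I/O labels and all numeric scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions[t][label] is the state score at position t;
    transitions[(prev, cur)] is the score of moving from prev to cur.
    """
    best = [{lab: emissions[0][lab] for lab in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev_best = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev_best] + transitions[(prev_best, cur)] + emissions[t][cur]
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    last = max(labels, key=lambda lab: best[-1][lab])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["B", "I", "O"]
# Hypothetical scores: "O" then "I" looks best position by position,
# but the transition O -> I is heavily penalised as an invalid tag sequence.
TRANSITIONS = {(p, c): 0.0 for p in LABELS for c in LABELS}
TRANSITIONS[("O", "I")] = -100.0
TRANSITIONS[("B", "I")] = 0.5
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
]
decoded = viterbi(EMISSIONS, TRANSITIONS, LABELS)
```

Choosing the best label independently at each position would give "O" followed by "I"; scoring the whole sequence yields "B" followed by "I" instead, which is the kind of label constraint the CRF layer enforces.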
S140, constructing a power knowledge graph based on the target data, and storing the power knowledge graph into a target database.
The power knowledge graph can be a graph constructed according to the target data. The target database may be understood as a database for storing power knowledge maps.
Specifically, the power knowledge graph is built based on the target data, and the built power knowledge graph is stored in the target database. For example, the processed knowledge is stored in a graph database to build the power source-network-load knowledge graph, which can be stored in a Neo4j database. Neo4j is a high-performance NoSQL graph database: an embedded, disk-based Java persistence engine with full transactional characteristics. It stores structured data in a graph (network) rather than in tables, with the data stored as nodes of the graph and as relationships between those nodes. Different entities in Neo4j are connected through various relationships to form a complex graph.
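Storing an extracted relation as nodes and a relationship can be sketched by generating a Cypher `MERGE` statement for each (head, relation, tail) item. The sketch only builds the query text and its parameters and does not connect to a database; the node label `Entity`, the property `name` and the function name `to_cypher` are hypothetical choices, not taken from the patent.

```python
import re
from typing import Dict, Tuple


def to_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build a parameterised Cypher statement that upserts two nodes and one relationship.

    Cypher does not allow the relationship type to be passed as a parameter,
    so it is sanitised and embedded in the query text; node names stay parameters.
    """
    rel_type = re.sub(r"\W+", "_", relation.strip()).upper()
    if not rel_type or rel_type[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = to_cypher("substation", "including", "transformer")
```

Using `MERGE` rather than `CREATE` means that running the same statement twice does not duplicate nodes or relationships, which is convenient when the same entity is extracted from several documents.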
On the basis of the above technical scheme, the constructing the power knowledge graph based on the target data includes: performing relation extraction on the target data based on a relation extraction algorithm, and determining a dependency relation corresponding to each target data; processing the target data based on the dependency relationship to obtain a target data set with a specific data structure; and constructing the power knowledge graph according to the dependency relationship and the target data set.
The relationship extraction algorithm may be an algorithm for extracting the dependency relationships between the power devices in the target data. A dependency relationship can be understood as a dependency between power devices, for example, switch A belongs to substation A. The specific data structure may be a preset data structure; for example, the target data may be represented as triples.
Specifically, a method based on dependency syntax analysis is used to identify and extract the relationships between entities. A dependency is the relationship between a governing component and a governed component of a sentence and is represented by an arc in dependency syntax analysis, in which the core verb is regarded as the component that governs the sentence. The dependency relationships between sentence components are analyzed by identifying and locating syntactic relationships. First, the subject and the core predicate are extracted by semantic role recognition. Then, through dependency syntax analysis, the objects and subjects related in meaning to the core predicate are found. Finally, the dependency relationships in the power source-network-load text, and the entity relationships based on the ontology structure, are obtained through dependency syntax analysis. After the dependency relationships are obtained, the extracted entities, attributes and relationships may be stored in a database; the semantic triple representation step expresses the extracted knowledge in the form of triples. The triples take forms such as: <power plant, including, equipment set>, <power plant, owned, line>, <substation, including, transformer>, etc.
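The triple representation described above maps naturally onto a small in-memory graph structure. The sketch below is only an illustration of that data structure, assuming each triple is a (head, relation, tail) record; the class name `Triple` and the function name `build_graph` are hypothetical, and the sample triples reuse the examples given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triples by head entity into an adjacency list of (relation, tail) pairs."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for triple in triples:
        graph[triple.head].append((triple.relation, triple.tail))
    return dict(graph)


TRIPLES = [
    Triple("power plant", "owned", "line"),
    Triple("substation", "including", "transformer"),
    Triple("power plant", "including", "equipment set"),
]
graph = build_graph(TRIPLES)
```

Each key of `graph` corresponds to a node and each (relation, tail) pair to an outgoing edge, which is the same node-and-relationship shape that a graph database stores.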
According to this technical solution, a target data source is accessed and data to be processed is extracted from it according to a preset extraction rule; the data to be processed is processed according to a preset data processing method to obtain data to be applied; the data to be applied is processed based on a preset extraction model to determine target data; and finally a power knowledge graph is constructed based on the target data and stored in a target database. The solution thus determines the required target data from the data of the power distribution network and constructs the power knowledge graph from that target data, thereby improving both the accuracy of the power knowledge graph and the efficiency of its construction.
Example II
Fig. 5 is a flowchart of a data processing method applied to a power distribution network according to an embodiment of the present invention. This embodiment further optimizes the data processing method applied to the power distribution network on the basis of the foregoing embodiment; for the specific implementation, refer to the technical solution of this embodiment. Technical terms identical or corresponding to those in the foregoing embodiment are not repeated here.
As shown in fig. 5, the method includes:
Acquiring data to be processed: specifically, the text data of the power source network load to be preprocessed is obtained, including operation text, dispatch text, government-related planning text, operation manuals and the like.
Acquiring data to be applied: specifically, data preprocessing is performed. The data preprocessing step comprises data cleaning, word segmentation processing, word vector representation, keyword extraction and ontology dictionary construction. The data cleaning adopts a big-data duplicate cleaning method based on a random forest algorithm: a multidimensional distributed heterogeneous storage node distribution model of the power source network load data is constructed; adaptive data scheduling and information feature sampling are carried out using a cloud fusion scheduling method; anti-interference processing during data acquisition is realized by combining fuzzy feature matching with detection filtering; and big-data filtering and adaptive cleaning are then carried out on the acquired data using the random forest algorithm.
The word segmentation processing adopts an HMM-CRF word segmentation method, with the following steps: after jieba performs a first round of word segmentation, part of the original text and its segmentation results are sampled; the segmentation results are inspected against the original text, incorrectly segmented words are extracted, and a keyword list is constructed. Next, word frequency statistics are computed on the segmented text, and high-frequency words are added to the keyword list, because power data has domain-specific characteristics. The keyword list is then refined based on the TF-IDF algorithm. After that, the processed documents are segmented again using a CRF-based segmentation model into which the high-frequency dictionary has been imported, and a high-precision segmentation result is finally obtained. Taking power dispatching text as an example, labels such as sentence type, semantic component and semantic model are designed for it for use in subsequent research.
The word vector representation uses a Word2vec model to represent the text data. The Word2vec model converts words into vectors and includes two algorithms, skip-gram and CBOW. The present patent employs a modified CBOW model, a schematic diagram of which is shown in fig. 2, comprising an input layer, a hidden layer and an output layer. To improve entity recognition capability, the CBOW model is improved by expanding the hidden layer into a multi-layer neuron structure. Synonyms can be found by calculating the cosine similarity between word vectors, and the word vectors obtained from the corpus can also be used as input to the subsequent entity recognition model.
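The synonym lookup by cosine similarity mentioned above can be sketched as follows. This is an illustration only: the function names and the three-dimensional toy vectors are invented here, whereas real vectors would come from the trained Word2vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=1):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, invented purely for illustration.
vectors = {
    "transformer": [0.9, 0.1, 0.0],
    "main transformer": [0.8, 0.2, 0.1],
    "dispatch": [0.0, 0.1, 0.9],
}
best = most_similar("transformer", vectors)
print(best[0][0])
```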
Further, keyword extraction and ontology dictionary construction extract high-frequency keywords according to the average frequency weight and the average information entropy; irrelevant words are then removed through manual screening to construct the ontology dictionary.
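The text does not spell out how the average frequency weight and the average information entropy are combined. As one possible reading, sketched here under stated assumptions, a word is kept as a keyword candidate when its total frequency is at least the mean frequency and the entropy of its distribution across documents is at least the mean entropy. Both thresholds, the direction of the entropy comparison, and the function name `keyword_candidates` are assumptions; the manual screening step is not modelled.

```python
import math
from collections import Counter


def keyword_candidates(docs):
    """docs: non-empty list of token lists. Returns words passing both mean thresholds."""
    per_doc = [Counter(doc) for doc in docs]
    total = Counter()
    for counts in per_doc:
        total.update(counts)

    # Entropy of each word's occurrence distribution over the documents.
    entropy = {}
    for word, n in total.items():
        h = 0.0
        for counts in per_doc:
            if counts[word]:
                p = counts[word] / n
                h -= p * math.log2(p)
        entropy[word] = h

    mean_freq = sum(total.values()) / len(total)
    mean_entropy = sum(entropy.values()) / len(entropy)
    # Assumed rule: keep words at or above both averages.
    return sorted(w for w in total
                  if total[w] >= mean_freq and entropy[w] >= mean_entropy)


docs = [
    ["transformer", "trip", "transformer", "line"],
    ["transformer", "line", "overload"],
    ["line", "transformer", "dispatch"],
]
candidates = keyword_candidates(docs)
print(candidates)
```

Words that occur in a single document have zero entropy in this sketch and are filtered out, which favours terms used consistently across the corpus.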
Determining target data: specifically, entity extraction is performed. The entity extraction step adopts a combined BERT-BiGRU-Attention-CRF model to extract entities from the processed data. The BERT-BiGRU-Attention-CRF combined model comprises:
BERT layer: the BERT model uses a bidirectional Transformer structure, performs feature extraction and training through a multi-layer neural network, and converts the input text into word vectors so that the BiGRU layer can learn contextual features. The key part of the BERT model is a deep network based on the self-attention mechanism, whose core goal is to select, from a large amount of information, the information that best fits the current task. Compared with traditional static word vector training, the dynamic word vectors trained by the BERT model contain both the meaning of the word itself and the features among context words, and can better capture implicit sentence-level features.
BiGRU layer: automatically extracts the contextual features of the sentence. The input to each BiGRU unit is a sequence of dynamic word vectors, from which the BiGRU unit learns to extract the local features of the sentence. Finally, the forward GRU model and the backward GRU model each output a hidden state sequence, and the two are spliced position by position in sentence order to obtain the complete hidden state sequence.
The GRU model is calculated as follows:
$$z_t = \sigma(W_z \cdot [c_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [c_{t-1}, x_t])$$
$$h_t = \tanh(W \cdot [r_t \cdot c_{t-1}, x_t])$$
$$c_t = (1 - z_t) \cdot c_{t-1} + z_t \cdot h_t$$
where $\sigma$ is the sigmoid function, $W_z$, $W_r$ and $W$ are weight matrices, $x_t$ is the input text data at time $t$, $c_{t-1}$ is the output of the previous moment, used as input at time $t$, $r_t$ is the reset gate, $z_t$ is the update gate, $h_t$ is the state to be activated (the candidate state), and $c_t$ is the output at the current time.
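A single GRU step following the four formulas above can be sketched in plain Python. This is an illustration, not the trained model: the helper names are invented here, the gate products are taken element-wise, and, as in the formulas, no bias terms are used.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(c_prev, x_t, W_z, W_r, W):
    """Compute c_t from c_{t-1} and x_t; each weight matrix has
    len(c_prev) rows and len(c_prev) + len(x_t) columns."""
    concat = c_prev + x_t                                  # [c_{t-1}, x_t]
    z_t = [sigmoid(a) for a in matvec(W_z, concat)]        # update gate
    r_t = [sigmoid(a) for a in matvec(W_r, concat)]        # reset gate
    gated = [r * c for r, c in zip(r_t, c_prev)] + x_t     # [r_t * c_{t-1}, x_t]
    h_t = [math.tanh(a) for a in matvec(W, gated)]         # state to be activated
    return [(1.0 - z) * c + z * h for z, c, h in zip(z_t, c_prev, h_t)]


hidden, inputs = 2, 3
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
c_t = gru_step([0.5, -0.5], [1.0, 2.0, 3.0], zeros, zeros, zeros)
print(c_t)
```

With all-zero weights both gates equal 0.5 and the candidate state is 0, so the output is simply half of the previous output, which makes the interpolation in the last formula easy to check by hand.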
The update formula of the left-to-right recurrent neural network layer of the bidirectional GRU is as follows:
The update formula of the right-to-left recurrent neural network layer is as follows:
The calculation formula of the final output of the BiGRU neural network is as follows:
where $W$, $V$ and $U$ are weight matrices, $b$ and $c$ are bias terms, $x_t$ is the input text data at time $t$, and $h_t$ is the state to be activated.
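The three formulas referred to above are not reproduced in this text. As a hedged illustration only, a commonly used bidirectional recurrent form that is consistent with the listed symbols (weight matrices W, V, U and bias terms b, c) could be written as below. The activation functions f and g, the output symbol y_t, the arrows marking direction, and the primed backward-direction parameters are assumptions introduced here and are not taken from the original.

```latex
\begin{aligned}
\overrightarrow{h}_t &= f\left(W x_t + V \overrightarrow{h}_{t-1} + b\right) \\
\overleftarrow{h}_t &= f\left(W' x_t + V' \overleftarrow{h}_{t+1} + b'\right) \\
y_t &= g\left(U \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right] + c\right)
\end{aligned}
```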
Attention layer: the input of the Attention layer is the hidden states output by the BiGRU layer. The hidden states are weighted by the attention mechanism to form a weighted feature representation, the relationships among words are mined, and finally a new state sequence is output.
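The text does not specify which attention scoring function is used. The sketch below assumes scaled dot-product self-attention without learned projections, purely to show how a hidden state sequence is re-weighted into a new state sequence of the same length; the function names are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """states: list of equal-length hidden-state vectors.
    Returns one attention-weighted mixture of all states per position."""
    dim = len(states[0])
    scale = math.sqrt(dim)
    output = []
    for query in states:
        # Score every position against the current one (assumed dot-product scoring).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        weights = softmax(scores)
        mixed = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
        output.append(mixed)
    return output


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_states = self_attention(states)
print(new_states)
```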
CRF layer: a conditional random field is a Markov model represented by an undirected graph, and typically has two classes of feature functions: transition features and state features. The state features represent the relationship between the input sequence and the current state, and the transition features represent the relationship between the previous output state and the current output state. Local features are normalized into global features, and the label bias problem is solved by calculating the probability distribution over the whole sequence, yielding a globally optimal solution. Meanwhile, the CRF model learns the hidden constraint rules of the labels during training, and the added constraints help ensure the validity of the final prediction result.
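To illustrate how state scores and transition scores combine into a globally optimal label sequence, the following sketch runs Viterbi decoding over hand-written scores. The BIO label set, the score values, and the large negative penalty that stands in for a learned constraint are all assumptions for illustration; in a trained CRF the transition scores are learned from data.

```python
def viterbi(emissions, transitions, labels):
    """emissions[t][j]: state score of label j at position t.
    transitions[i][j]: transition score from label i to label j.
    Returns the highest-scoring label sequence."""
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[i] for i in path]


labels = ["O", "B", "I"]
FORBIDDEN = -1e9  # assumed penalty: an entity may not continue ("I") right after "O"
transitions = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.0],        # from B
    [0.0, 0.0, 0.0],        # from I
]
emissions = [
    [0.9, 0.2, 0.1],
    [0.1, 0.6, 0.8],  # "I" looks best locally at this position
    [0.9, 0.1, 0.1],
]
decoded = viterbi(emissions, transitions, labels)
print(decoded)
```

Although "I" has the highest local score at the second position, the transition constraint makes the sequence O, B, O the best overall, which is the kind of validity guarantee the CRF layer is described as providing.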
Determining a dependency relationship: specifically, the relationship extraction step employs a method based on dependency syntax analysis to identify the relationships between the extracted entities. A dependency is the relationship between a governing (head) component and a governed (dependent) component of a sentence; it is represented by an arc in dependency syntax analysis, and the component regarded as governing the whole sentence is its core verb. Dependency relationships between sentence components are analyzed by identifying and locating syntactic relationships. First, the subject and the core predicate are extracted by semantic role recognition. Then, through dependency syntax analysis, the objects and subjects related in meaning to the core predicate are found. Finally, the dependency relationships in the power source network load text and the entity relationships based on the ontology structure are obtained through dependency syntax analysis. Further, knowledge storage and semantic triplet representation are performed: knowledge storage stores the extracted entities, attributes and relationships into a database, and the semantic triplet representation step represents the extracted knowledge in the form of triples, for example: <power plant, includes, equipment set>, <power plant, owns, line>, <substation, includes, transformer>, etc.
Constructing a power source network load knowledge graph: specifically, the processed knowledge is stored in a graph database to construct the power source network load knowledge graph. Neo4j is a high-performance NoSQL graph database; it is an embedded, disk-based Java persistence engine with full transactional properties. It stores structured data in a network (graph) rather than in tables, storing the data as nodes in the graph and relationships between the nodes. Different entities in Neo4j are connected through various relationships to form a complex graph.
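As a sketch of how triples might be turned into graph-database writes, the example below builds parameterized Cypher `MERGE` statements so that repeated triples do not create duplicate nodes or relationships. The node label `Entity`, the property `name`, and the function names are hypothetical choices made here; the text does not describe the graph schema. Because Cypher does not accept a parameter in the relationship-type position, the relation name is sanitized and embedded in the query text.

```python
import re


def relation_type(name):
    """Turn a free-text relation into an upper-case Cypher relationship type."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").upper()
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {name!r}")
    return cleaned


def triple_to_cypher(head, relation, tail):
    """Return (query, parameters) that idempotently store one triple."""
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation_type(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


triples = [
    ("power plant", "includes", "equipment set"),
    ("substation", "includes", "transformer"),
]
statements = [triple_to_cypher(*t) for t in triples]
for query, params in statements:
    print(query, params)
```

Each (query, parameters) pair can then be handed to whatever Neo4j client is in use. Relation names written in non-ASCII characters are rejected by this sanitizer and would need a mapping table of their own.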
According to the technical solution of this embodiment, a target data source is accessed and data to be processed is extracted from it according to a preset extraction rule; the data to be processed is processed according to a preset data processing method to obtain data to be applied; the data to be applied is processed based on a preset extraction model to determine target data; and finally a power knowledge graph is constructed based on the target data and stored in a target database. In this way, the required target data is determined from the data information of the power distribution network and the power knowledge graph is built from that target data, which improves the accuracy and the construction efficiency of the power knowledge graph.
Example III
Fig. 6 is a block diagram of a data processing device applied to a power distribution network according to an embodiment of the present invention. The device comprises: a to-be-processed data acquisition module 610, a to-be-applied data determining module 620, a target data determining module 630, and a graph construction module 640.
The to-be-processed data acquisition module 610 is configured to access a target data source, and extract to-be-processed data from the target data source according to a preset extraction rule;
the to-be-applied data determining module 620 is configured to process the to-be-processed data according to a preset data processing method to obtain to-be-applied data;
the target data determining module 630 is configured to process the data to be applied based on a preset extraction model, and determine target data;
and a graph construction module 640, configured to construct a power knowledge graph based on the target data, and store the power knowledge graph in a target database.
On the basis of the above technical solution, the to-be-processed data acquisition module is configured to extract source text data from the target data source, wherein the source text data includes at least one of run text data, schedule text data, planning text data, and operation manual text data; and to filter the source text data according to the preset extraction rule to obtain the data to be processed that matches the preset extraction rule.
On the basis of the above technical solution, the to-be-applied data determining module includes a data cleaning unit, configured to, before the data to be processed is processed according to the preset data processing method to obtain the data to be applied, perform data cleaning on the data to be processed according to a preset data cleaning algorithm to obtain text data to be applied, and determine the data to be applied corresponding to the data to be processed based on the text data to be applied.
On the basis of the above technical solution, the to-be-applied data determining module is configured to perform word segmentation processing on the text data to be applied based on a preset word segmentation algorithm to obtain keyword data; determine word vector data corresponding to the text data to be applied according to a preset vector identification model and the text data to be applied; and construct the data to be applied based on the keyword data and the word vector data.
On the basis of the above technical solution, the to-be-applied data determining module is configured to perform word segmentation processing on the text data to be applied using a first word segmentation algorithm to obtain a first keyword list; count the keywords in the first keyword list to obtain frequency information corresponding to each keyword; and process the first keyword list according to a preset frequency threshold and the frequency information to determine the keyword data.
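The counting and thresholding steps described above can be sketched as follows. The keyword list is written out by hand in place of the output of the first word segmentation algorithm, and the comparison "count at least equal to the threshold" is an assumption, since the text only says the list is processed according to the threshold and the frequency information.

```python
from collections import Counter


def select_keywords(first_keyword_list, frequency_threshold):
    """Keep keywords whose occurrence count reaches the preset threshold."""
    frequency = Counter(first_keyword_list)
    # most_common() orders by descending count; ties keep first-seen order.
    return [word for word, count in frequency.most_common()
            if count >= frequency_threshold]


first_keyword_list = ["feeder", "switch", "feeder", "load", "feeder", "switch"]
keyword_data = select_keywords(first_keyword_list, frequency_threshold=2)
print(keyword_data)
```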
On the basis of the above technical solution, the target data determining module is configured to input the data to be applied into the preset extraction model to obtain feature data corresponding to the data to be applied, and to determine the target data based on the feature data.
On the basis of the above technical solution, the graph construction module is configured to perform relation extraction on the target data based on a relation extraction algorithm and determine the dependency relationship corresponding to each piece of target data; process the target data based on the dependency relationships to obtain a target data set with a specific data structure; and construct the power knowledge graph according to the dependency relationships and the target data set.
According to the technical solution of this embodiment, a target data source is accessed and data to be processed is extracted from it according to a preset extraction rule; the data to be processed is processed according to a preset data processing method to obtain data to be applied; the data to be applied is processed based on a preset extraction model to determine target data; and finally a power knowledge graph is constructed based on the target data and stored in a target database. In this way, the required target data is determined from the data information of the power distribution network and the power knowledge graph is built from that target data, which improves the accuracy and the construction efficiency of the power knowledge graph.
The data processing device applied to the power distribution network provided by the embodiment of the invention can execute the data processing method applied to the power distribution network provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present disclosure.
Example IV
Fig. 7 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, for example the data processing method applied to the power distribution network.
In some embodiments, the data processing method applied to the distribution network may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more of the steps of the data processing method described above as being applied to the distribution network may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data processing method applied to the power distribution network in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A data processing method applied to a power distribution network, comprising:
accessing a target data source, and extracting data to be processed from the target data source according to a preset extraction rule;
processing the data to be processed according to a preset data processing method to obtain data to be applied;
processing the data to be applied based on a preset extraction model, and determining target data;
and constructing a power knowledge graph based on the target data, and storing the power knowledge graph into a target database.
2. The method according to claim 1, wherein the extracting the data to be processed from the target data source according to a preset extraction rule comprises:
extracting source text data from the target data source; wherein the source text data includes at least one of run text data, schedule text data, planning text data, and operation manual text data;
and filtering the source text data according to the preset extraction rule to obtain the data to be processed that matches the preset extraction rule.
3. The method according to claim 1, wherein, before the processing the data to be processed according to a preset data processing method to obtain the data to be applied, the method comprises:
carrying out data cleaning on the data to be processed according to a preset data cleaning algorithm to obtain text data to be applied;
and determining to-be-applied data corresponding to the to-be-processed data based on the to-be-applied text data.
4. A method according to claim 3, wherein determining to-be-applied data corresponding to the to-be-processed data based on the to-be-applied text data comprises:
performing word segmentation processing on the text data to be applied based on a preset word segmentation algorithm to obtain keyword data;
determining word vector data corresponding to the text data to be applied according to a preset vector identification model and the text data to be applied;
and constructing data to be applied based on the keyword data and the word vector data.
5. The method of claim 4, wherein the word segmentation processing is performed on the text data to be applied based on a preset word segmentation algorithm to obtain keyword data, and the method comprises the following steps:
performing word segmentation processing on the text data to be applied by adopting a first word segmentation algorithm to obtain a first keyword list;
counting keywords in the first keyword list to obtain frequency information corresponding to each keyword;
and processing the first keyword list according to a preset frequency threshold and the frequency information, and determining the keyword data.
6. The method according to claim 1, wherein the processing the data to be applied based on a preset extraction model, determining target data, comprises:
inputting the data to be applied to the preset extraction model to obtain characteristic data corresponding to the data to be applied;
the target data is determined based on the feature data.
7. The method of claim 1, wherein the constructing a power knowledge-graph based on the target data comprises:
performing relation extraction on the target data based on a relation extraction algorithm, and determining a dependency relation corresponding to each target data;
processing the target data based on the dependency relationship to obtain a target data set with a specific data structure;
and constructing the power knowledge graph according to the dependency relationship and the target data set.
8. A data processing apparatus for use in a power distribution network, comprising:
a to-be-processed data acquisition module, configured to access a target data source and extract data to be processed from the target data source according to a preset extraction rule;
a to-be-applied data determining module, configured to process the data to be processed according to a preset data processing method to obtain data to be applied;
a target data determining module, configured to process the data to be applied based on a preset extraction model and determine target data;
and a graph construction module, configured to construct a power knowledge graph based on the target data and store the power knowledge graph into a target database.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method applied to the power distribution network of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement the data processing method applied to an electrical distribution network according to any one of claims 1-7 when executed.
CN202310256432.8A 2023-03-16 2023-03-16 Data processing method, device, equipment and medium applied to power distribution network Pending CN116467461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310256432.8A CN116467461A (en) 2023-03-16 2023-03-16 Data processing method, device, equipment and medium applied to power distribution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310256432.8A CN116467461A (en) 2023-03-16 2023-03-16 Data processing method, device, equipment and medium applied to power distribution network

Publications (1)

Publication Number Publication Date
CN116467461A (en) 2023-07-21

Family

ID=87179646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310256432.8A Pending CN116467461A (en) 2023-03-16 2023-03-16 Data processing method, device, equipment and medium applied to power distribution network

Country Status (1)

Country Link
CN (1) CN116467461A (en)


CN113076080B (en) Model training method and device and intention recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination