CN116310407A - Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service - Google Patents

Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service

Info

Publication number
CN116310407A
Authority
CN
China
Prior art keywords
semantic
data
text
image
power distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211095695.7A
Other languages
Chinese (zh)
Inventor
丁一
张磐
滕飞
庞超
霍现旭
吴磊
戚艳
杨挺
尚学军
陈沛
焦秋良
孙峤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority to CN202211095695.7A
Publication of CN116310407A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention relates to a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service, which comprises the following steps: step 1, preprocessing the multidimensional picture data of power distribution and utilization; step 2, extracting semantic tags from the picture data preprocessed in step 1 by using a deep learning model combined with manual correction, and constructing an image semantic tag set; step 3, performing semantic extraction on the inspection text data based on the image semantic tag set constructed in step 2, and matching corresponding semantic tags to the equipment and places in the inspection text; and step 4, based on the keywords extracted from the inspection text data in step 3 and the picture semantic tags from step 2, establishing text and picture semantic sequences, calculating the similarity between the sequences using the LCS algorithm, and performing a data matching check. The invention can improve the accuracy of data semantic extraction.

Description

Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service
Technical Field
The invention belongs to the field of deep learning algorithms applied to power big data, relates to heterogeneous data semantic extraction methods, and in particular relates to a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service.
Background
Distribution networks have become intelligent rapidly in recent years, and with the push toward paperless records, a large amount of poorly organized heterogeneous power data has emerged. Image and text data have grown sharply, including text data such as inspection records and picture data such as the operating states and environmental states of power equipment photographed by inspection robots. Such data, however, is difficult for a computer to understand and use. Meanwhile, data in the field of distribution network multidimensional services comes from a wide range of sources and takes many forms, and each source or form can be regarded as a structural form, such as pictures, numbers, and text. Current semantic extraction mostly focuses on single-structure data such as text; as large amounts of image data emerge, the need for semantic extraction from image data becomes more urgent.
Semantic understanding of data enables an intelligent agent to perceive and understand real data scenes more deeply and then to reason over the perceived information, which better supports intelligent perception applications in the power system. Given multi-source heterogeneous data, data semantic extraction aims to extract the target semantics either manually or automatically, for example with machine learning or deep learning. Manual extraction usually relies on discussion among experts or organizations and requires considerable human resources. Automatic extraction relies on rapidly developing computer technology and has made many advances in recent years, progressing from early text semantic extraction to the extraction of structurally diverse data such as pictures and text.
Effectively extracting accurate semantic information from massively growing measurement data makes it possible to better understand, search, and manage the data, to achieve multi-level, multi-dimensional semantic understanding and association of visual elements, and to lay an important foundation for content-oriented intelligent application services in the power industry.
However, existing heterogeneous data semantic extraction methods for power distribution and utilization multidimensional service have the following shortcomings: conventional semantic extraction methods are mostly designed for the general domain and are not suited to the vertical domain of electric power. For example, LSTM-based natural language processing mainly exploits the contextual relevance of data to improve semantic extraction accuracy, but it does not account for the large number of short messages with weak contextual relevance in power distribution and utilization service data, so existing semantic extraction methods have poor practicability on such data.
No prior patent document identical or similar to the present invention was found in a search.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service, which addresses the technical problem that information is difficult to extract because data structures differ among services.
The invention solves this practical problem by adopting the following technical scheme:
a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service comprises the following steps:
step 1, preprocessing the multidimensional picture data of power distribution and utilization;
step 2, extracting semantic tags from the picture data preprocessed in step 1 by using a deep learning model combined with manual correction, and constructing an image semantic tag set;
step 3, performing semantic extraction on the inspection text data based on the image semantic tag set constructed in step 2, and matching corresponding semantic tags to the equipment and places in the inspection text;
and step 4, based on the keywords extracted from the inspection text data in step 3 and the picture semantic tags from step 2, establishing text and picture semantic sequences, calculating the similarity between the sequences using the LCS algorithm, and performing a data matching check.
Moreover, step 1 specifically comprises:
(1) Unifying the size of the inspection pictures: scaling each original image to the specified image size, so that all images are 600 × 800;
(2) Cropping and padding: unifying the sizes of images from different distribution network service sources; if the original image is larger than the target image, it is cropped, and if it is smaller than the target image, the blank pixels produced when the image is stretched are filled with black pixels;
(3) Adjusting the equipment image proportion: adjusting the aspect ratio of some image files in the data set, taking the function value as 1 and the center point as the reference, and saving the adjusted image as a new image.
Moreover, step 2 specifically comprises:
(1) Constructing a deep learning model with an Encoder-Decoder structure: the picture data preprocessed in step 1 is first input to the Encoder, where, exploiting the spatial characteristics of the CNN, the feature map of a convolution layer is used to extract the features $x=(x_1,x_2,\ldots,x_n)$ at n positions of a picture, where each $x_m$ is a D-dimensional vector;
(2) At decoding stage t, i.e., when the t-th feature semantic is generated, the context vector passed into the Decoder CNN is $z_t$ and the hidden-layer state of the previous stage of the CNN is $g_{t-1}$; the context vector $z_t$ is a weighted combination of $x=(x_1,x_2,\ldots,x_n)$, and the relationship between $z_t$ and $x$ can be expressed as:
$$z_t=\sum_{m=1}^{n}\alpha_{t,m}\,x_m \qquad (1)$$
where $\alpha_{t,m}$ is the weight given to the image feature at the m-th position when the t-th feature semantic is generated; this weight is a function of the previous hidden-layer state $g_{t-1}$ and the m-th position image feature $x_m$;
(3) The obtained feature $z_t$ is taken as the input of the CNN, and the model result $y_t$ is generated and output through the hidden variables;
(4) Part of the data is calibrated by manual intervention, finally yielding the image semantic tag set.
Moreover, step 3 specifically comprises:
(1) Feature representation: the word vector X mapped from the inspection text is taken as input; the dimensions of the inspection text vector of the power distribution station room are encoded by hash coding in the h layer, and a 128-dimensional low-dimensional vector Y is then obtained by passing through the three-layer network of the DNN in sequence. The calculation process is as follows:
Figure BDA0003838486700000041
where $l$ denotes a hidden layer node, $W_i$ is the weight matrix of the i-th layer, and $f$ is the activation function of the hidden layers and the output layer; the inspection text semantic model uses tanh as this activation function:
$$f(x)=\tanh(x)=\frac{1-e^{-2x}}{1+e^{-2x}} \qquad (3)$$
(2) N+1 low-dimensional vectors of 128 dimensions represent the Query and the N Docs respectively; the semantic similarity between the Query and a Doc can be expressed by the cosine similarity between their two semantic vectors $y_Q$ and $y_D$:
$$R(Q,D)=\cos(y_Q,y_D)=\frac{y_Q^{\top}y_D}{\lVert y_Q\rVert\,\lVert y_D\rVert} \qquad (4)$$
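(3) Writing $R(Q,D)$ for the cosine similarity between the semantic vectors of Query Q and Doc D, the semantic similarity between the Query and the positive-sample Doc can be converted into a posterior probability through a softmax function:
$$P(D^{+}\mid Q)=\frac{\exp\big(\gamma R(Q,D^{+})\big)}{\sum_{D'\in D}\exp\big(\gamma R(Q,D')\big)} \qquad (5)$$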
where $\gamma$ is the smoothing factor of the softmax, $D^{+}$ is the positive sample under the Query, $D'$ is any sample under the Query, and $D$ is the entire sample space under the Query. In the training phase, a log-loss function is used:
$$L=-\ln P(D^{+}\mid Q) \qquad (6)$$
(4) The CNN-DSSM extracts the contextual information of the inspection record text within a sliding window through the convolution layer and extracts global contextual information through the pooling layer, so that semantic features are effectively retained and corresponding semantic tags are matched to the equipment, places, and the like in the inspection text.
Moreover, step 4 specifically comprises:
(1) Recursively solving the common subsequence of the picture semantic sequence and the text sequence: let the sequences be $X=(x_1,x_2,\ldots,x_n)$ and $Y=(y_1,y_2,\ldots,y_m)$, with n and m elements respectively; $X_i=(x_1,x_2,\ldots,x_i)$ and $Y_j=(y_1,y_2,\ldots,y_j)$ are sub-sequences (prefixes) of X and Y, where $i\le n$ and $j\le m$. Using the property that a subset of the longest common subsequence of two sequences is itself a common subsequence, the longest common subsequence $\mathrm{LCS}(X_n,Y_m)$ is solved recursively according to whether the corresponding element values are equal. The specific calculation formula is as follows, where $+$ appends an element and $\max$ selects the longer sequence:
$$\mathrm{LCS}(X_i,Y_j)=\begin{cases}\varnothing, & i=0\ \text{or}\ j=0\\ \mathrm{LCS}(X_{i-1},Y_{j-1})+x_i, & i,j>0,\ x_i=y_j\\ \max\{\mathrm{LCS}(X_{i-1},Y_j),\ \mathrm{LCS}(X_i,Y_{j-1})\}, & i,j>0,\ x_i\ne y_j\end{cases} \qquad (7)$$
(2) Similarity calculation: if the length of the obtained longest common subsequence is $k$, the length of text 1 is $n_1$, and the length of text 2 is $n_2$, the similarity $S$ is calculated as:
Figure BDA0003838486700000052
(3) Sorting the labels by similarity: the similarity is calculated with the longest common subsequence algorithm; the longest common subsequence between the power picture semantic sequence to be matched and the inspection text sequence is solved, all label pairs are sorted by the similarity S computed from the result, and the group of labels with the highest similarity is taken as the matching result, completing the check between the different semantic sequences.
The invention has the advantages and beneficial effects that:
1. The invention provides a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service. According to the characteristics of the multidimensional heterogeneous data of power distribution and utilization, it uses the CNN and CNN-DSSM algorithms to extract semantic information from equipment fault maintenance pictures and inspection record texts respectively, and uses LCS for operations such as semantic matching of image and text information. The technical scheme can quickly and accurately obtain useful information from unstructured data, check information between structured and unstructured data, and improve the accuracy of data semantic extraction.
2. The invention provides a data semantic extraction method for the specific field of power distribution and utilization multidimensional advanced measurement data, integrating data resources and applying construction methods such as data normalization, semantic extraction, and similarity calculation. The invention mainly targets the multidimensional advanced measurement data of power distribution and utilization. It uses intelligent learning algorithms for semantic extraction, including picture character recognition, text semantic extraction, and image-text information matching; it adopts a CNN model fused with a multi-layer perceptron architecture to further increase the depth of the model so that it extracts deeper information, achieving intelligent recognition and checking of the semantics of multidimensional maintenance data in power distribution and utilization, which benefits intelligent applications of distribution network big data.
Drawings
FIG. 1 is a schematic diagram of power inspection picture extraction according to the present invention;
FIG. 2 is a schematic diagram of the power station room inspection text information extraction model of the present invention;
FIG. 3 is a schematic diagram of the image-text information matching check of the present invention.
Detailed Description
Embodiments of the invention are described in further detail below with reference to the accompanying drawings:
a heterogeneous data semantic extraction method for power distribution and utilization multidimensional service comprises the following steps:
step 1, preprocessing the multidimensional picture data of power distribution and utilization;
step 1 specifically comprises:
(1) Unifying the size of the inspection pictures: scaling each original image to the specified image size, so that all images are 600 × 800;
(2) Cropping and padding: unifying the sizes of images from different distribution network service sources; if the original image is larger than the target image, it is cropped, and if it is smaller than the target image, the blank pixels produced when the image is stretched are filled with black pixels;
(3) Adjusting the equipment image proportion: adjusting the aspect ratio of some image files in the data set, taking the function value as 1 and the center point as the reference, and saving the adjusted image as a new image.
The working principle of step 1 is as follows:
For distribution network operation and maintenance service data, toolkits in Python such as Pandas and NumPy are used together to unify the format and the organized storage form of the data; the preprocessing operations include removal of fragmented data and resizing and normalization of the original power equipment images.
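The cropping and padding described in sub-step (2) of step 1 can be sketched with NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `crop_or_pad`, the choice of center alignment, and the reading of 600 × 800 as 600 rows by 800 columns are all assumptions.

```python
import numpy as np

# Assumption: "600 x 800" is read as 600 rows (height) by 800 columns (width).
TARGET_H, TARGET_W = 600, 800


def crop_or_pad(image: np.ndarray,
                target_h: int = TARGET_H,
                target_w: int = TARGET_W) -> np.ndarray:
    """Center-crop any axis that is too large, then pad the rest with black pixels.

    Hypothetical helper for illustration; `image` has shape (H, W) or (H, W, C).
    """
    h, w = image.shape[:2]
    # Crop around the center along each axis that exceeds the target size.
    top = max((h - target_h) // 2, 0)
    left = max((w - target_w) // 2, 0)
    cropped = image[top:top + target_h, left:left + target_w]
    ch, cw = cropped.shape[:2]
    # Start from an all-black canvas and paste the cropped image in the center.
    canvas = np.zeros((target_h, target_w) + image.shape[2:], dtype=image.dtype)
    off_y = (target_h - ch) // 2
    off_x = (target_w - cw) // 2
    canvas[off_y:off_y + ch, off_x:off_x + cw] = cropped
    return canvas


# Usage: a 700 x 500 RGB image is cropped vertically and padded horizontally.
demo = np.ones((700, 500, 3), dtype=np.uint8)
result = crop_or_pad(demo)
```

Scaling by interpolation, mentioned in sub-step (1), is left out here because it would normally be delegated to an image library.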
Step 2, extracting semantic tags from the picture data preprocessed in step 1 by using a deep learning model combined with manual correction, and constructing an image semantic tag set;
step 2 specifically comprises:
(1) Constructing a deep learning model with an Encoder-Decoder structure, as shown in FIG. 1: the picture data preprocessed in step 1 is first input to the Encoder, where, exploiting the spatial characteristics of the CNN, the feature map of a convolution layer is used to extract the features $x=(x_1,x_2,\ldots,x_n)$ at n positions of a picture, where each $x_m$ is a D-dimensional vector;
in this embodiment, the height and width of the feature map are both set to 14 and the number of channels is 256, so that n = 14 × 14 = 196 and D = 256; meanwhile, in order to extract the high-level semantics of the power equipment image, an Attention mechanism is added during decoding, which assigns different weights to the extracted image features;
(2) At decoding stage t, i.e., when the t-th feature semantic is generated, the context vector passed into the Decoder CNN is $z_t$ and the hidden-layer state of the previous stage of the CNN is $g_{t-1}$; the context vector $z_t$ is a weighted combination of $x=(x_1,x_2,\ldots,x_n)$, and the relationship between $z_t$ and $x$ can be expressed as:
$$z_t=\sum_{m=1}^{n}\alpha_{t,m}\,x_m \qquad (1)$$
where $\alpha_{t,m}$ is the weight given to the image feature at the m-th position when the t-th feature semantic is generated; this weight is a function of the previous hidden-layer state $g_{t-1}$ and the m-th position image feature $x_m$;
(3) The obtained feature $z_t$ is taken as the input of the CNN, and the model result $y_t$ is generated and output through the hidden variables;
(4) Part of the data is calibrated by manual intervention, finally yielding the image semantic tag set.
The working principle of step 2 is as follows:
Semantic tags are extracted from the semi-structured and unstructured picture data by means such as deep learning and manual labeling, and a semantic tag set is constructed.
Compared with inspection record text, equipment fault maintenance pictures contain more visual information, such as shallow-level color and texture and higher-level semantic information such as operators and power equipment. The invention therefore uses a deep learning model combined with manual correction to extract the semantic tags of power pictures.
Compared with traditional methods, the model greatly improves the accuracy of the generated semantic tags. However, because supervision data is scarce, problems such as a shifted semantic focus and imprecise description remain. To reduce the effect of limited data on tag accuracy, part of the data is calibrated by manual intervention, which improves the accuracy of the semantic tags and finally yields the image semantic tags.
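To make the attention step of step 2 concrete, the sketch below computes a context vector as a weighted sum of the n = 196 position features of dimension D = 256 given in this embodiment. The text only states that each weight is a function of the previous hidden state and the position feature; the dot-product score and the softmax normalization used here are assumptions chosen for illustration, as are the function names.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_context(features: np.ndarray, prev_hidden: np.ndarray):
    """Return (alpha, z_t) for one decoding stage.

    features:    (n, D) array, one D-dimensional feature per image position.
    prev_hidden: (D,) array, the previous hidden-layer state g_{t-1}.
    ASSUMPTION: the weight function is a dot-product score followed by softmax.
    """
    scores = features @ prev_hidden   # one score per position, shape (n,)
    alpha = softmax(scores)           # weights alpha_{t,m}, non-negative, sum to 1
    z_t = alpha @ features            # weighted sum of position features, shape (D,)
    return alpha, z_t


# Usage with the embodiment's sizes: n = 14 * 14 = 196 positions, D = 256.
rng = np.random.default_rng(0)
x = rng.normal(size=(196, 256))
g_prev = rng.normal(size=256)
alpha, z_t = attention_context(x, g_prev)
```

With a zero hidden state every position receives the same weight, so the context vector reduces to the plain mean of the position features.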
Step 3, performing semantic extraction on the inspection text data based on the image semantic tag set constructed in step 2, and matching corresponding semantic tags to the equipment and places in the inspection text;
step 3 specifically comprises:
(1) Feature representation: the word vector X mapped from the inspection text is taken as input; the dimensions of the inspection text vector of the power distribution station room are encoded by hash coding in the h layer, and a 128-dimensional low-dimensional vector Y is then obtained by passing through the three-layer network of the DNN in sequence, as shown in FIG. 2. The calculation process is as follows:
Figure BDA0003838486700000081
where $l$ denotes a hidden layer node, $W_i$ is the weight matrix of the i-th layer, and $f$ is the activation function of the hidden layers and the output layer; the inspection text semantic model uses tanh as this activation function:
$$f(x)=\tanh(x)=\frac{1-e^{-2x}}{1+e^{-2x}} \qquad (3)$$
(2) N+1 low-dimensional vectors of 128 dimensions represent the Query and the N Docs respectively; the semantic similarity between the Query and a Doc can be expressed by the cosine similarity between their two semantic vectors $y_Q$ and $y_D$:
$$R(Q,D)=\cos(y_Q,y_D)=\frac{y_Q^{\top}y_D}{\lVert y_Q\rVert\,\lVert y_D\rVert} \qquad (4)$$
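(3) Writing $R(Q,D)$ for the cosine similarity between the semantic vectors of Query Q and Doc D, the semantic similarity between the Query and the positive-sample Doc can be converted into a posterior probability through a softmax function:
$$P(D^{+}\mid Q)=\frac{\exp\big(\gamma R(Q,D^{+})\big)}{\sum_{D'\in D}\exp\big(\gamma R(Q,D')\big)} \qquad (5)$$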
where $\gamma$ is the smoothing factor of the softmax, $D^{+}$ is the positive sample under the Query, $D'$ is any sample under the Query, and $D$ is the entire sample space under the Query. In the training phase, a log-loss function is used:
$$L=-\ln P(D^{+}\mid Q) \qquad (6)$$
(4) The CNN-DSSM extracts the contextual information of the inspection record text within a sliding window through the convolution layer and extracts global contextual information through the pooling layer, so that semantic features are effectively retained and corresponding semantic tags are matched to the equipment, places, and the like in the inspection text.
The working principle of step 3 is as follows:
Keywords are extracted from the inspection text data by deep learning. Among existing semantic extraction methods, using CNN-DSSM for keyword extraction improves how well the feature vectors represent semantics, reduces the sparsity of the data dimensions, and effectively mitigates data noise.
The core idea of DSSM is to map the query and the doc into a semantic space of a common dimension and to train a latent semantic model by maximizing the cosine similarity between the query and doc semantic vectors, thereby achieving retrieval.
The input of the original DSSM model consists of two parts: (1) Query, the query part; (2) Doc, the document part, where the number of documents is dynamic and depends on the specific scenario and service. The MLP part of the model is divided into 5 layers whose dimensions decrease from bottom to top: 500k, 30k, 300, 128.
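The matching stage of step 3 (cosine similarity between semantic vectors, softmax posterior over the candidate docs, and log loss on the positive sample) can be sketched as below. The function names, the toy two-dimensional vectors, and the value of the smoothing factor are assumptions for illustration; the semantic vectors themselves would come from the DNN, which is not reproduced here.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two semantic vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def posterior_and_loss(query_vec, doc_vecs, positive_index, gamma=1.0):
    """Softmax posterior over all docs and the log loss of the positive doc.

    ASSUMPTION: gamma (the softmax smoothing factor) is a free hyperparameter;
    the value used here is arbitrary.
    """
    sims = np.array([cosine_similarity(query_vec, d) for d in doc_vecs])
    logits = gamma * sims
    logits = logits - logits.max()            # stabilise the exponentials
    probs = np.exp(logits) / np.exp(logits).sum()
    loss = -float(np.log(probs[positive_index]))
    return probs, loss


# Usage with toy vectors: doc 0 points the same way as the query, doc 1 is orthogonal.
query = np.array([1.0, 0.0])
docs = [np.array([2.0, 0.0]), np.array([0.0, 3.0])]
probs, loss = posterior_and_loss(query, docs, positive_index=0)
```

In training, this loss would be minimised over the network parameters so that the positive doc receives the highest posterior for its query.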
Step 4, based on the keywords extracted from the inspection text data in step 3 and the picture semantic tags from step 2, establishing text and picture semantic sequences, calculating the similarity between the sequences using the LCS algorithm, and performing a data matching check;
as shown in FIG. 3, step 4 specifically comprises:
(1) Recursively solving the common subsequence of the picture semantic sequence and the text sequence: let the sequences be $X=(x_1,x_2,\ldots,x_n)$ and $Y=(y_1,y_2,\ldots,y_m)$, with n and m elements respectively; $X_i=(x_1,x_2,\ldots,x_i)$ and $Y_j=(y_1,y_2,\ldots,y_j)$ are sub-sequences (prefixes) of X and Y, where $i\le n$ and $j\le m$. Using the property that a subset of the longest common subsequence of two sequences is itself a common subsequence, the longest common subsequence $\mathrm{LCS}(X_n,Y_m)$ is solved recursively according to whether the corresponding element values are equal. The specific calculation formula is as follows, where $+$ appends an element and $\max$ selects the longer sequence:
$$\mathrm{LCS}(X_i,Y_j)=\begin{cases}\varnothing, & i=0\ \text{or}\ j=0\\ \mathrm{LCS}(X_{i-1},Y_{j-1})+x_i, & i,j>0,\ x_i=y_j\\ \max\{\mathrm{LCS}(X_{i-1},Y_j),\ \mathrm{LCS}(X_i,Y_{j-1})\}, & i,j>0,\ x_i\ne y_j\end{cases} \qquad (7)$$
(2) Similarity calculation: if the length of the obtained longest common subsequence is $k$, the length of text 1 is $n_1$, and the length of text 2 is $n_2$, the similarity $S$ is calculated as:
Figure BDA0003838486700000102
(3) Sorting the labels by similarity: the similarity is calculated with the longest common subsequence algorithm; the longest common subsequence between the power picture semantic sequence to be matched and the inspection text sequence is solved quickly, all label pairs are sorted by the similarity S computed from the result, and the group of labels with the highest similarity is taken as the matching result, completing the check between the different semantic sequences.
The working principle of step 4 is as follows:
The tag similarity of the tag sets is calculated with a semantic similarity method based on the longest common subsequence, and the text with the highest confidence is selected to check the semantic match between data of different modalities.
Given two sequences X and Y, the longest common subsequence is the longest among all of their common subsequences; its length can be used to quantify the similarity of the two sequences. The longest common subsequence can be solved by exhaustive search or by dynamic programming. With exhaustive search, once the data volume becomes large the amount of computation grows exponentially with the number of elements in the sequences, whereas dynamic programming avoids repeated computation and thereby improves the efficiency with which LCS matches the two sequences.
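The dynamic-programming approach to the longest common subsequence described above can be sketched as follows. The table-filling routine is the standard LCS-length algorithm. The `similarity` function is only an assumption: it normalises the LCS length k by the two sequence lengths as 2k / (n1 + n2), which may differ from the formula used in the patent, and the tag strings in the usage example are hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of sequences x and y."""
    n, m = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def similarity(x, y):
    """ASSUMED normalisation: 2k / (n1 + n2), where k is the LCS length."""
    if not x and not y:
        return 0.0
    return 2 * lcs_length(x, y) / (len(x) + len(y))


def best_match(image_tags, text_candidates):
    """Return the candidate text tag sequence most similar to the image tags."""
    return max(text_candidates, key=lambda cand: similarity(image_tags, cand))


# Usage with hypothetical tag sequences.
image_tags = ["transformer", "switch cabinet", "operator"]
candidates = [
    ["cable", "meter"],
    ["switch cabinet", "operator"],
]
matched = best_match(image_tags, candidates)
```

Each table cell is computed once, so the running time is proportional to n × m rather than exponential in the sequence length.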
Step 5, experimental verification
To show that the graph constructed by the method is rich in modalities, accurate, and efficient to retrieve, two common metrics, AUC and average precision (AP), are selected for verification.
The same two metrics are used to show that the matched power heterogeneous data semantic extraction technique constructed by the method is accurate and efficient to retrieve, and the verification indicates that the proposed method has high accuracy. The AP metric corresponds to the area under the Precision-Recall curve and therefore better reflects the overall performance of the algorithm; it is calculated as follows:
Figure BDA0003838486700000111
where TP is the number of correctly aligned data pairs and FP is the number of data pairs that are semantically unrelated to the data tag.
The AUC metric, in turn, measures the probability that the correct data pair is aligned for a link. Suppose that a total of $f$ alignment comparisons are performed in the experiment, of which $f_1$ are comparisons in which the correctly linked data pair is selected and $f_2$ are comparisons in which the two node pairs receive equal scores; AUC can then be expressed as:
$$\mathrm{AUC}=\frac{f_1+0.5\,f_2}{f}$$
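The comparison-counting view of AUC described above can be illustrated with a short sketch. It assumes the common link-prediction definition, in which each comparison pits the score of a correct pair against the score of an incorrect pair, a win counts 1, and a tie counts 0.5. The function name, the exhaustive pairing of all scores, and the example scores are hypothetical.

```python
def link_auc(positive_scores, negative_scores):
    """AUC as (f1 + 0.5 * f2) / f over all positive-versus-negative comparisons.

    f  : total number of comparisons,
    f1 : comparisons where the correct pair scores higher,
    f2 : comparisons where the two scores are equal.
    ASSUMPTION: every correct pair is compared with every incorrect pair.
    """
    f = f1 = f2 = 0
    for p in positive_scores:
        for q in negative_scores:
            f += 1
            if p > q:
                f1 += 1
            elif p == q:
                f2 += 1
    if f == 0:
        raise ValueError("need at least one positive and one negative score")
    return (f1 + 0.5 * f2) / f


# Usage: three wins and one tie out of four comparisons.
auc = link_auc([0.9, 0.4], [0.4, 0.1])
```

A value of 0.5 corresponds to random scoring, and a value of 1.0 means every correct pair outranks every incorrect pair.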
it should be emphasized that the embodiments described herein are illustrative rather than limiting, and that this invention encompasses other embodiments which may be made by those skilled in the art based on the teachings herein and which fall within the scope of this invention.

Claims (5)

1. A heterogeneous data semantic extraction method for power distribution and utilization multidimensional service, characterized in that the method comprises the following steps:
step 1, preprocessing the multidimensional picture data of power distribution and utilization;
step 2, extracting semantic tags from the picture data preprocessed in step 1 by using a deep learning model combined with manual correction, and constructing an image semantic tag set;
step 3, performing semantic extraction on the inspection text data based on the image semantic tag set constructed in step 2, and matching corresponding semantic tags to the equipment and places in the inspection text;
and step 4, based on the keywords extracted from the inspection text data in step 3 and the picture semantic tags from step 2, establishing text and picture semantic sequences, calculating the similarity between the sequences using the LCS algorithm, and performing a data matching check.
2. The heterogeneous data semantic extraction method for power distribution and utilization multidimensional services according to claim 1, characterized in that step 1 specifically comprises:
(1) Unifying the size of the inspection photographs: scaling each original image to the specified image size, so that all images are 600×800;
(2) Cropping and padding: unifying the sizes of images from different distribution network service sources; if the original image is larger than the target image, it is cropped, and if it is smaller than the target image, the blank pixels produced when the image is stretched are filled with black pixels;
(3) Adjusting the equipment image proportion: adjusting the aspect ratio of some image files in the data set, taking the function value as 1 and the central point as the reference, and storing the adjusted image as a new image.
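The cropping and padding in item (2) can be sketched as follows. This is a minimal pure-Python illustration, not the claimed implementation: the image is represented as a list of pixel rows, the function name `center_crop_or_pad` is hypothetical, and centering the content for both cropping and padding is an assumption (the claim only states that black pixels are used for filling). Which of 600 and 800 is the height is also not stated, so both are passed as parameters.

```python
def center_crop_or_pad(image, target_h, target_w, fill=0):
    """Return a target_h x target_w image.

    Larger inputs are cropped around their center; smaller inputs are
    placed at the center of a canvas filled with `fill` (0 = black).
    """
    src_h = len(image)
    src_w = len(image[0]) if src_h else 0
    canvas = [[fill] * target_w for _ in range(target_h)]

    # Size of the region that is actually copied.
    copy_h = min(src_h, target_h)
    copy_w = min(src_w, target_w)

    # Top-left corners of the copied region in source and destination.
    src_top = (src_h - copy_h) // 2
    src_left = (src_w - copy_w) // 2
    dst_top = (target_h - copy_h) // 2
    dst_left = (target_w - copy_w) // 2

    for r in range(copy_h):
        row = image[src_top + r][src_left:src_left + copy_w]
        canvas[dst_top + r][dst_left:dst_left + copy_w] = row
    return canvas


# A 2x2 image padded to 4x4 with black pixels.
for line in center_crop_or_pad([[1, 2], [3, 4]], 4, 4):
    print(line)
```

In practice an image library would handle the scaling in item (1); the sketch only covers the crop-or-pad decision so that every image leaves preprocessing with the same dimensions.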
3. The heterogeneous data semantic extraction method for power distribution and utilization multidimensional services according to claim 1, characterized in that step 2 specifically comprises:
(1) Constructing a deep learning model with an Encoder-Decoder structure: the power distribution and utilization multidimensional picture data preprocessed in step 1 is first input to the Encoder, where the feature map of a CNN spatial convolution layer is used to extract the features x = (x_1, x_2, ..., x_n) at n positions of a picture, each x_m being a D-dimensional vector;
(2) In decoding stage t, that is, when the t-th feature semantic is generated, the context vector passed to the Decoder CNN is z_t and the hidden layer state of the previous CNN stage is g_{t-1}; the context vector z_t is a weighted sum of x = (x_1, x_2, ..., x_n), and the relationship between z_t and x can be expressed as:
$$z_t = \sum_{m=1}^{n} \alpha_{t,m}\, x_m$$
where α_{t,m} is the weight given to the image feature at the m-th position when the t-th feature semantic is generated; this weight is in fact a function of the previous hidden layer state g_{t-1} and the m-th position image feature x_m;
(3) Taking the obtained feature z_t as the input of the CNN, and generating the model output y_t through the hidden variables;
(4) Manually intervening to calibrate part of the data, finally obtaining the image semantic tag set.
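The context vector in item (2) can be illustrated with a short Python sketch. The claim only states that each weight is a function of the previous hidden state g_{t-1} and the position feature x_m; the dot-product scoring followed by a softmax used here is an assumption chosen for brevity, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(prev_hidden, features):
    """Weights alpha_{t,m} from g_{t-1} and each x_m.

    Assumption: the score is the dot product of g_{t-1} and x_m.
    """
    scores = [sum(g * x for g, x in zip(prev_hidden, feat)) for feat in features]
    return softmax(scores)


def context_vector(alphas, features):
    """z_t = sum over m of alpha_{t,m} * x_m, computed per dimension."""
    dim = len(features[0])
    return [sum(a * feat[d] for a, feat in zip(alphas, features)) for d in range(dim)]


# Two position features of dimension D = 2 and a hypothetical hidden state.
features = [[1.0, 0.0], [0.0, 1.0]]
alphas = attention_weights([2.0, 0.0], features)
print(alphas, context_vector(alphas, features))
```

Because the weights sum to 1, z_t always lies within the range spanned by the position features, with positions that score higher against g_{t-1} contributing more.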
4. The heterogeneous data semantic extraction method for power distribution and utilization multidimensional services according to claim 1, characterized in that step 3 specifically comprises:
(1) Feature representation: taking the word vector X mapped from the inspection text as input, using hash coding to encode the dimensions of the power distribution station inspection text vector into layer h, and then passing through the three-layer DNN network in turn to obtain a 128-dimensional low-dimensional vector Y, calculated as follows:
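$$l_1 = W_1 x, \qquad l_i = f(W_i\, l_{i-1} + b_i),\ i = 2, \ldots, N-1, \qquad y = f(W_N\, l_{N-1} + b_N)$$
where l_i denotes the nodes of the i-th hidden layer, W_i is the weight matrix of the i-th layer, b_i is the bias term of the i-th layer, and f is the activation function of the hidden layers and the output layer; the inspection text semantic model uses tanh as the activation function of the hidden layers and the output layer: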
$$f(x) = \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}$$
(2) N+1 low-dimensional vectors of 128 dimensions are used to represent the Query and the N Docs respectively; the semantic similarity between the Query and a Doc can be expressed by the cosine similarity between their two semantic vectors:
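$$R(Q, D) = \cos(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\lVert y_Q \rVert\, \lVert y_D \rVert}$$
where y_Q and y_D are the semantic vectors of the Query and the Doc;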
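(3) The semantic similarity between the Query and the positive-sample Doc can be converted into a posterior probability by a softmax function:
$$P(D^{+} \mid Q) = \frac{\exp\big(\gamma\, R(Q, D^{+})\big)}{\sum_{D' \in D} \exp\big(\gamma\, R(Q, D')\big)}$$
where R(Q, D) denotes the cosine similarity between the Query and a Doc from item (2), γ is the smoothing factor of the softmax, D⁺ is a positive sample under the Query, D' is any sample under the Query, and D is the entire sample space under the Query; in the training phase, a log-loss function is used:
$$L = -\ln P(D^{+} \mid Q) \qquad (6)$$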
(4) The CNN-DSSM extracts the context information of the inspection record text under the sliding window through the convolution layer, and extracts the global context information through the pooling layer, so that semantic features are effectively reserved, and corresponding semantic tags are matched for equipment, places and the like in the inspection text.
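Items (2) and (3) of this claim can be illustrated with the following Python sketch, which computes the cosine similarity between a Query vector and several Doc vectors, turns the similarities into a softmax posterior with smoothing factor γ, and evaluates the log-loss for the positive sample. It is a sketch under stated assumptions: the vectors are short hypothetical lists rather than 128-dimensional network outputs, the default γ = 1.0 is arbitrary, and the function names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def posterior(query, docs, gamma=1.0):
    """Softmax over gamma * cosine(query, doc) for every candidate doc."""
    sims = [gamma * cosine(query, d) for d in docs]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(query, docs, positive_index, gamma=1.0):
    """L = -ln P(D+ | Q) for the doc at positive_index."""
    return -math.log(posterior(query, docs, gamma)[positive_index])


# Hypothetical 2-dimensional semantic vectors; docs[0] is the positive sample.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0]]
print(posterior(query, docs), log_loss(query, docs, 0))
```

A larger γ sharpens the posterior toward the most similar Doc, which lowers the loss when the positive sample is already the closest candidate.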
5. The heterogeneous data semantic extraction method for power distribution and utilization multidimensional services according to claim 1, characterized in that step 4 specifically comprises:
(1) Recursively solving for the common subsequence of the picture semantic sequence and the text sequence: let the sequences be X = (x_1, x_2, ..., x_n) and Y = (y_1, y_2, ..., y_m), with n and m elements respectively; X_i = (x_1, x_2, ..., x_i) and Y_j = (y_1, y_2, ..., y_j) are prefixes of X and Y respectively, where i ≤ n and j ≤ m; based on the property that a sub-part of the longest common subsequence of two sequences is itself a common subsequence of them, the longest common subsequence LCS(X_n, Y_m) is calculated as follows:
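$$\mathrm{LCS}(X_i, Y_j) = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ \mathrm{LCS}(X_{i-1}, Y_{j-1}) + 1, & i, j > 0 \text{ and } x_i = y_j \\ \max\{\mathrm{LCS}(X_{i-1}, Y_j),\ \mathrm{LCS}(X_i, Y_{j-1})\}, & i, j > 0 \text{ and } x_i \neq y_j \end{cases}$$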
(2) Similarity calculation: if the length of the longest common subsequence obtained is k, the length of text 1 is n_1, and the length of text 2 is n_2, the similarity S is calculated as:
Figure FDA0003838486690000042
(3) Sorting tags by similarity: similarity is calculated by the longest common subsequence algorithm; the longest common subsequence between the power distribution and utilization picture semantic sequence and the inspection text sequence is solved, all tag pairs are sorted by the resulting similarity S between the sequences, the group of tags with the highest similarity is taken as the matching result, and the check among different semantic sequences is completed.
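The matching check in this claim can be sketched in Python as follows. The longest common subsequence length is computed with a bottom-up table, which is equivalent to the recursive formulation. The normalization S = 2k / (n_1 + n_2) is an assumption made for this illustration, since the similarity formula itself is not reproduced in the text; the function names and the example tags are likewise hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of sequences x and y."""
    n, m = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def similarity(x, y):
    """Assumed normalization: S = 2k / (n1 + n2), in the range [0, 1]."""
    if not x and not y:
        return 0.0
    return 2.0 * lcs_length(x, y) / (len(x) + len(y))


def best_match(picture_tags, candidate_texts):
    """Return the candidate text sequence with the highest similarity S."""
    return max(candidate_texts, key=lambda text: similarity(picture_tags, text))


# Hypothetical tag sequences from one picture and two inspection texts.
picture_tags = ["transformer", "insulator", "pole"]
texts = [["pole", "switch"], ["transformer", "insulator", "pole"]]
print(best_match(picture_tags, texts))
```

Sorting all candidates by `similarity` instead of taking only the maximum yields the ranked tag pairs described in item (3).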
CN202211095695.7A 2022-09-08 2022-09-08 Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service Pending CN116310407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211095695.7A CN116310407A (en) 2022-09-08 2022-09-08 Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211095695.7A CN116310407A (en) 2022-09-08 2022-09-08 Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service

Publications (1)

Publication Number Publication Date
CN116310407A true CN116310407A (en) 2023-06-23

Family

ID=86794771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211095695.7A Pending CN116310407A (en) 2022-09-08 2022-09-08 Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service

Country Status (1)

Country Link
CN (1) CN116310407A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627619A (en) * 2023-07-24 2023-08-22 山东华科信息技术有限公司 Cloud edge end cooperative information interaction method and system for multi-service heterogeneous resource scheduling
CN116627619B (en) * 2023-07-24 2023-10-10 山东华科信息技术有限公司 Cloud edge end cooperative information interaction method and system for multi-service heterogeneous resource scheduling

Similar Documents

Publication Publication Date Title
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN111198959B (en) Two-stage image retrieval method based on convolutional neural network
US8126274B2 (en) Visual language modeling for image classification
CN112733866B (en) Network construction method for improving text description correctness of controllable image
US20170262478A1 (en) Method and apparatus for image retrieval with feature learning
US9569698B2 (en) Method of classifying a multimodal object
CN108446334B (en) Image retrieval method based on content for unsupervised countermeasure training
CN111125411A (en) Large-scale image retrieval method for deep strong correlation hash learning
CN112417381B (en) Method and device for rapidly positioning infringement image applied to image copyright protection
CN108595546B (en) Semi-supervision-based cross-media feature learning retrieval method
CN110929080A (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN115357747B (en) Image retrieval method and system based on ordinal hash
CN113806580B (en) Cross-modal hash retrieval method based on hierarchical semantic structure
CN113177141A (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN112163114B (en) Image retrieval method based on feature fusion
CN115329120A (en) Weak label Hash image retrieval framework with knowledge graph embedded attention mechanism
CN116310407A (en) Heterogeneous data semantic extraction method for power distribution and utilization multidimensional service
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112269892B (en) Phrase positioning and identifying method based on multi-modal multi-level unified interaction
CN105678349A (en) Method for generating context descriptors of visual vocabulary
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN110533074B (en) Automatic image category labeling method and system based on double-depth neural network
CN105117735A (en) Image detection method in big data environment
CN110717068B (en) Video retrieval method based on deep learning
CN112084353A (en) Bag-of-words model method for rapid landmark-convolution feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination