CN114595324A - Method, device, terminal and non-transitory storage medium for power grid service data domain division - Google Patents

Method, device, terminal and non-transitory storage medium for power grid service data domain division

Info

Publication number
CN114595324A
CN114595324A
Authority
CN
China
Prior art keywords
data
feature
vector
word
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111159160.7A
Other languages
Chinese (zh)
Inventor
沈亮
欧阳红
何鑫
高士杰
朱广新
陈翔
廖小琦
张鹏宇
李杏
占震滨
陈小明
张伟
颜克礼
刘玉玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
State Grid Zhejiang Electric Power Co Ltd
Beijing China Power Information Technology Co Ltd
Original Assignee
Big Data Center Of State Grid Corp Of China
State Grid Zhejiang Electric Power Co Ltd
Beijing China Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China, State Grid Zhejiang Electric Power Co Ltd, Beijing China Power Information Technology Co Ltd filed Critical Big Data Center Of State Grid Corp Of China
Priority to CN202111159160.7A priority Critical patent/CN114595324A/en
Publication of CN114595324A publication Critical patent/CN114595324A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method, an apparatus, a terminal and a non-transitory storage medium for power grid service data domain division, including: judging, for power grid service data, the features of the power grid service data, wherein the features include descriptive features and discrete features; determining, by a first model, the category of first data whose features are descriptive features; determining, by a second model, the category of second data whose features are discrete features; and performing power grid service data domain division by using the category of the first data and the category of the second data. The power grid service data domain division method can handle the classification of different types of data.

Description

Method, device, terminal and non-transitory storage medium for power grid service data domain division
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method, a device, a terminal and a non-transitory storage medium for power grid service data domain division.
Background
In a data model of a power grid service, related features such as tables need data processing, and semantic source data must be intelligently extracted as input for intelligent analysis. Current data models are huge in number and cover extremely many professional categories; many terms are difficult to determine through grammatical rules and need to be described jointly by constructing multi-feature semantic information. Moreover, because the feature space is huge, the feature representation of a domain is unclear and the entity connection relations of the related tables are unclear, which easily causes domain name confusion.
Disclosure of Invention
In order to solve the problems in the prior art, the present disclosure provides a method, an apparatus, a terminal and a non-transitory storage medium for power grid service data domain division.
The present disclosure provides a method for power grid service data domain division, which includes:
judging, for power grid service data, the features of the power grid service data; wherein the features include descriptive features and discrete features;
determining, by a first model, a category of first data for which the feature is a descriptive feature;
determining a category of second data for which the feature is a discrete feature by a second model; and
performing power grid service data domain division by using the category of the first data and the category of the second data.
In an embodiment of the present disclosure, the determining the characteristic of the data includes:
preprocessing the data; wherein the pre-processing comprises: at least one of splitting matching, counting, word segmentation and removal of stop words; and
extracting the features of the preprocessed data through a bag of words model and/or a feature extraction model based on word vectors.
In an embodiment of the present disclosure, the determining, by the first model, the category of the first data of which the feature is the descriptive feature includes:
extracting domain name information, label information and word information from the first data;
vectorizing the domain name information and the label information to obtain vectorized data;
vectorizing the word information to obtain a word vector;
obtaining an attention value according to the word vector;
converting the word vector into a fixed-length vector; and
classifying the vectorization data and the fixed-length vector according to the attention value to obtain a classification result.
In an embodiment of the disclosure, the first model comprises: a text embedding ALE layer, an ATT layer, a TextRNN layer and a softmax perception layer; the ATT layer comprises a single layer or multiple layers;
wherein, when the ATT layer is a single layer, the method further comprises:
obtaining corresponding weights according to the word vectors;
weighting the word vectors according to the corresponding weights to obtain weighted values; and
summing the weighted word vectors to obtain the attention value;
wherein, when the ATT layer is a multilayer, the method further comprises:
converting the word vector to a sentence vector on a first layer; and
converting the sentence vector to a paragraph vector on a second layer.
In an embodiment of the present disclosure, determining, by the first model, the category of the first data for which the feature is a descriptive feature further includes:
acquiring a thesaurus of keywords;
extracting, from the thesaurus, the category label with the highest occurrence frequency for each keyword; and
re-encoding the first data such that each data group in the first data corresponds to a multi-dimensional vector; wherein each dimension of the multi-dimensional vector characterizes the occurrence statistics of one category label in the data group.
In an embodiment of the present disclosure, the determining, by the second model, the category of the second data of which the feature is a discrete feature includes:
extracting domain continuous features from the first data;
extracting discretization features from the second data;
performing second-order cross calculation on the domain continuous features and the discretization features to obtain a calculation result; and
obtaining a classification result according to the calculation result and the high-dimensional embedding of the domain continuous features on the deep side.
In embodiments of the present disclosure, the discretization feature comprises a domain discretization feature and a source system discretization feature;
wherein the method further comprises:
representing the domain discretization features with discretized one-hot vectors; and
encoding the source system discretization features with discrete value coding.
The present disclosure provides an apparatus for power grid service data domain division, including:
the judging module is used for judging the characteristics of the power grid service data;
the determining module is used for determining the category of the first data of which the characteristic is a descriptive characteristic through a first model and determining the category of the second data of which the characteristic is a discrete characteristic through a second model; and
the domain division module is used for performing domain division on the power grid service data by using the category of the first data and the category of the second data.
The present disclosure proposes a terminal, comprising:
at least one memory and at least one processor;
wherein the at least one memory is configured to store program code, and the at least one processor is configured to call the program code stored in the at least one memory to perform any of the methods described above.
The present disclosure proposes a non-transitory storage medium for storing program code which, when executed by a computer device, causes the computer device to perform the method of any of the above.
The technical scheme of the present disclosure has the following positive effects:
(1) For the problem of identifying and classifying domain-name-related semantic attributes of the power grid data model, a first model, the ALE-TextRNN model, is proposed to classify table description features according to text polarity; it captures the importance of different context information for a given class tendency, helps resolve fine class differences in multi-classification tasks, and reduces class confusion.
(2) For the field and source-system features of the table, the improved DeepFM model is used to mine the semantic dependencies among discrete, categorical features; it increases the diversity of the feature distribution without manually designed input features, the embedding layer can be supervised from the FM side, feature dimensionality and redundancy are effectively reduced, and combined calculation performance is improved.
(3) The semantics of multi-form features are mined and identified, the evaluation mode is redefined, and the side with the higher regression prediction score of the two parts determines the final result, yielding more accurate semantics and a more ideal domain division effect.
Drawings
The features and advantages of the invention, as well as the technical and industrial significance of exemplary embodiments, will be described in detail below with reference to the accompanying drawings, wherein like reference numerals indicate like elements, and wherein:
fig. 1 is a flowchart of a power grid service data domain division method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a fastText in accordance with an embodiment of the present disclosure.
FIGS. 3a-3b are schematic diagrams of a TextCNN according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of a TextRNN according to an embodiment of the present disclosure.
FIG. 5 is a frame diagram of a TextRNN + attention according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of Hierarchical Attention according to an embodiment of the present disclosure.
Fig. 7 is a schematic diagram of a Wide & Deep framework according to an embodiment of the present disclosure.
Fig. 8 is an architecture diagram of a first model according to an embodiment of the disclosure.
Fig. 9 is a schematic diagram of RNN principles of an embodiment of the present disclosure.
Fig. 10 is a schematic diagram of softmax of an embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a second model of an embodiment of the disclosure.
Fig. 12 is a schematic structural diagram of a power grid service data domain dividing device according to an embodiment of the present disclosure.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
In a data model of the power grid service, domain name features serve as the most essential boundary among data sources; they represent the common characteristics and application scope of all data under a table and are the basis for all subsequent semantic identification and knowledge mining. Therefore, the domain name of an entity table is used as an important attribute value for constructing the map: semantics related to the domain name are extracted from the description features or other features of the table, the different domains and the other attributes within them are classified according to these semantic attribute values, and the semantic attribute information is completed.
With the rapid development of electronic office work, mobile terminals and social media, a large number of short texts urgently awaiting processing have accumulated in Internet databases; the continuity and consistency of the original corpus data give them certain local regularities, so text representation and classification based on deep learning has become an important field in natural language processing. In application, text classification models mainly cover sentiment polarity analysis and subject category classification. Sentiment polarity analysis discovers user preferences by mining user feedback and can greatly help companies and manufacturers further popularize their products, while subject category classification helps mine public expectations, implement public opinion monitoring and identify sensitive topics.
Referring to fig. 1, fig. 1 is a flowchart of a power grid service data domain dividing method according to an embodiment of the present disclosure, including the following steps.
S100, judging, for power grid service data, the features of the power grid service data; wherein the features include descriptive features and discrete features.
Specifically, embodiments of the present disclosure may include preprocessing the data; wherein the pre-processing comprises: at least one of splitting matching, counting, word segmentation and removal of stop words; and extracting the features of the preprocessed data through a bag of words model and/or a feature extraction model based on word vectors.
More specifically, the embodiment of the present disclosure may first perform original sentence splitting and matching, or word statistics, on a text, followed by a series of preprocessing operations such as Chinese word segmentation and stop-word removal; text feature extraction is then performed, mainly in the following ways:
(1) Bag-of-words model
This embodiment establishes a dictionary containing all words in the training corpus and represents each word's unique identifier with a one-hot vector, whose dimension equals the number of words in the dictionary.
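A minimal sketch of this one-hot dictionary encoding; the toy corpus and tokenization are illustrative assumptions, not from the patent:

```python
corpus = [["power", "grid", "data"], ["grid", "domain", "division"]]

# Build a dictionary over all words in the training corpus.
vocab = sorted({w for doc in corpus for w in doc})
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Word vector dimension equals the dictionary size."""
    vec = [0] * len(vocab)
    vec[word_to_id[word]] = 1
    return vec

print(one_hot("grid"))
```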
(2) Text feature extraction: term frequency-inverse document frequency (TF-IDF)
This embodiment measures the importance of a word in a text with two quantities: the term frequency (TF) of the word within a document and the inverse document frequency (IDF) of the word across documents, which respectively reflect the importance of a word inside a document and between documents. The feature value is expressed as:

TF_IDF(i,j) = TF_{i,j} * IDF_i (1)

where TF_IDF(i,j) is the importance index of word i in document j, obtained by multiplying the term frequency by the inverse document frequency; TF_{i,j} denotes the normalized frequency of word i in document j,

TF_{i,j} = n_{i,j} / Σ_k n_{k,j},

with n_{i,j} the number of times word i appears in document j; and IDF_i denotes the inverse document frequency of word i,

IDF_i = log( |D| / (1 + |{j : word i appears in document j}|) ),

with |D| the total number of documents.
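A minimal sketch of formula (1) on a toy corpus; the documents below are illustrative assumptions:

```python
import math
from collections import Counter

docs = [["grid", "data", "data"], ["grid", "domain"], ["data", "model"]]

def tf(word, doc):
    counts = Counter(doc)
    return counts[word] / sum(counts.values())   # n_{i,j} / sum_k n_{k,j}

def idf(word, docs):
    df = sum(1 for d in docs if word in d)       # documents containing the word
    return math.log(len(docs) / (1 + df))        # smoothed inverse document frequency

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)       # formula (1)

print(tf_idf("data", docs[0], docs))
```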
(3) Feature extraction model based on word vectors
In this embodiment, a neural network such as a word2vec model is trained on a large text corpus so that each word is mapped into the vector space; each word is then represented by a vector learned from massive corpora, improving both efficiency and the expression of syntactic and semantic regularities.
Deep learning text classification models are used for feature expression, mainly in the following ways:
(1) fastText: as shown in fig. 2, fig. 2 is a schematic diagram of fastText according to an embodiment of the present disclosure. In this embodiment, all word vectors in a sentence are averaged and then normalized, and local order information is obtained by means of the n-gram trick.
(2) TextCNN: as shown in FIGS. 3a-3b, FIGS. 3a-3b are schematic diagrams of a TextCNN according to an embodiment of the disclosure. Building on the fastText n-gram trick, this method attends to more local order information for classification and can achieve better prediction accuracy. However, because one-dimensional convolution is introduced, several convolution kernels of different sizes must be specified to obtain receptive fields of different widths, and a dynamic pooling (k-max pooling) method must be introduced to retain the k largest responses related to the global sequence.
(3) TextRNN: as shown in FIG. 4, FIG. 4 is a schematic diagram of a TextRNN according to an embodiment of the present disclosure. In this embodiment, TextRNN can more flexibly model longer text sequences and better express the contextual information of the text (a minimal code sketch follows this list).
(4) TextRNN + attention: as shown in FIG. 5, FIG. 5 is a framework diagram of TextRNN + attention according to an embodiment of the present disclosure. This embodiment introduces an attention mechanism to identify semantics related to sentiment polarity: by appending the aspect embedding N times to the hidden layer of a basic LSTM, important information responding to a given polarity can be captured. Sentiment classification means recognizing whether the sentiment tendency contained in a corpus is positive or derogatory and extracting the attitudes and viewpoints toward the subject the text describes; because such expression is often hidden and ambiguous, it is regarded as a special text classification problem. It can be learned by building supervised, semi-supervised and unsupervised tasks: a supervised sentiment classification task labels a large number of samples with fine-grained polarity words and learns a feature space with a multi-class algorithm; semi-supervised learning alleviates insufficient corpus labeling through co-training; and unsupervised learning obtains sentiment tendency by computing the pointwise mutual information between seed sentiment words and text words.
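As a rough illustration of the TextRNN structure in item (3) above, here is a minimal sketch (a bidirectional LSTM classifier; the vocabulary size, dimensions and use of the last time step are assumptions, not the patent's exact network):

```python
import torch
import torch.nn as nn

class TextRNN(nn.Module):
    """Minimal TextRNN sketch: embedding -> bidirectional LSTM -> linear classifier."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden=64, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.fc(h[:, -1, :])               # last time step as the text vector

logits = TextRNN()(torch.randint(0, 10000, (2, 16)))
print(logits.shape)  # torch.Size([2, 10])
```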
S200, determining, through the first model, the category of the first data whose features are descriptive features.
Specifically, the embodiment of the present disclosure may include: extracting domain name information, label information and word information from the first data; vectorizing the domain name information and the label information to obtain vectorized data; vectorizing the word information to obtain word vectors; obtaining an attention value according to the word vectors; converting the word vectors into a fixed-length vector; and classifying the vectorized data and the fixed-length vector according to the attention value to obtain a classification result. The first model (ATT-ALE-TextRNN model) includes: a text-embedding ALE layer, an ATT layer, a TextRNN layer and a softmax perception layer, where the ATT layer may be a single layer or multiple layers. More specifically, when the ATT layer is a single layer, the embodiment may include: obtaining corresponding weights according to the word vectors; weighting the word vectors according to the corresponding weights to obtain weighted values; and summing the weighted word vectors to obtain the attention value. When the ATT layer is multi-layer, the embodiment may further include: converting the word vectors into a sentence vector on the first layer; and converting the sentence vectors into a paragraph vector on the second layer. In addition, the embodiment may further include: acquiring a thesaurus of keywords; extracting, from the thesaurus, the category label with the highest occurrence frequency for each keyword; and re-encoding the first data so that each data group in the first data corresponds to a multi-dimensional vector, where each dimension of the multi-dimensional vector characterizes the occurrence statistics of one category label in the data group.
In order to prevent confusion of domain names, the embodiment of the present disclosure feeds the fused features, obtained by concatenating the text information with the domain-name-related information, into a deep attention network to identify the most important information of the text under different domains. Referring to fig. 5, this embodiment learns hierarchical attention weights over words and sentences to obtain the core information in a sentence, and then performs the text classification task with the TextRNN structure. The scheme introduces a keyword weight learning model and adds a Hierarchical Attention Network to the TextRNN structure. Referring to fig. 6, fig. 6 is a schematic diagram of Hierarchical Attention according to an embodiment of the disclosure. The Hierarchical Attention Network targets longer text classification problems: the first layer represents the word vectors as a sentence vector through TextRNN + Attention, and the second layer represents the sentence vectors as a vector of the whole text passage with the same structure. A network containing only a single Attention layer, which in image learning applies different external weights to different pixels of the same feature map, can be transferred to a single-layer feature map here, weighting word by word according to the similarity between words within the corpus. See the following equations:
u_t = tanh(W_w h_t + b_w) (2)

α_t = exp(u_t^T u_w) / Σ_t exp(u_t^T u_w) (3)

s = Σ_t α_t h_t (4)

where t indexes the t-th word and h_t denotes the word vector of the t-th word; u_t is a hidden-layer representation of h_t, obtained from h_t through the single-layer neural network in formula (2); u_w is a randomly initialized weight vector trained as a parameter of the model; formula (3) applies softmax normalization to the hidden representation u_t of h_t to obtain α_t; and the sentence vector s is obtained in formula (4) by weighting each word vector h_t with its attention weight α_t.
Based on the above steps, the trained model achieves better text representation and higher classification accuracy, and expresses the importance of words and sentences for text classification in an intuitive way, improving model interpretability.
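A minimal sketch of equations (2)-(4); the hidden dimension and initialization are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """u_t = tanh(W_w h_t + b_w); alpha_t = softmax(u_t^T u_w); s = sum_t alpha_t h_t."""
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)          # W_w, b_w
        self.context = nn.Parameter(torch.randn(hidden_dim))   # u_w, randomly initialized

    def forward(self, h):                            # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))                 # equation (2)
        alpha = F.softmax(u @ self.context, dim=1)   # equation (3), shape (batch, seq_len)
        s = (alpha.unsqueeze(-1) * h).sum(dim=1)     # equation (4), sentence vector
        return s, alpha

s, alpha = WordAttention()(torch.randn(2, 12, 64))
print(s.shape, alpha.shape)  # torch.Size([2, 64]) torch.Size([2, 12])
```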
For descriptive continuous text features, semantic information related to the labels can be obtained with a text classification algorithm. The TextCNN algorithm extracts local order information of the text, while the TextRNN algorithm learns the contextual relations of the sequence without being limited by sequence length, making it better suited to corpora whose length cannot be fixed. In practical verification, TextCNN reaches an accuracy of 69%, while TextRNN, although slower to train, is slightly more accurate, reaching 73%.
For example, the feature words under two domain names may have a certain similarity, and the corpora are short, so category confusion easily occurs when representative features are missing. Therefore, in order to strengthen the semantic information relating corpora to categories, the embodiment of the present disclosure adds an attention mechanism from natural language processing (NLP) to focus on the core features of the original corpus. In addition, the word-frequency statistics of the dictionary are mapped to category label information, and the original corpus is re-encoded and added to the last hidden layer, jointly influencing the category prediction result.
Please refer to fig. 7; fig. 7 is a schematic diagram of the Wide & Deep framework according to an embodiment of the present disclosure. The embodiment of the present disclosure uses the wide & deep network architecture to generate a thesaurus of high-frequency keywords from the training corpus; the category label with the highest occurrence frequency is extracted for each keyword, and the original sentence is re-encoded so that each sentence corresponds to a 10-dimensional vector, each dimension of which counts the words of one category label in the sentence. These statistics can serve as category weights at the softmax output of the fully connected layer, strengthening the mapping from keywords to their corresponding labels and the category semantics in the original sentence.
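The re-encoding step can be illustrated with a minimal sketch; the keyword-to-label thesaurus below is a hypothetical stand-in for the one generated from the training corpus:

```python
from collections import Counter

NUM_LABELS = 10
keyword_to_label = {"voltage": 0, "meter": 3, "outage": 7}  # hypothetical thesaurus

def encode(sentence_tokens):
    """Each sentence becomes a 10-dimensional vector of per-label keyword counts."""
    counts = Counter(keyword_to_label[w] for w in sentence_tokens if w in keyword_to_label)
    return [counts.get(label, 0) for label in range(NUM_LABELS)]

print(encode(["voltage", "meter", "voltage", "reading"]))
# [2, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```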
Referring to fig. 8, fig. 8 is a schematic diagram of a first model according to an embodiment of the disclosure. The embodiment of the present disclosure concatenates the topic to be strengthened, such as the label-aspect embedding, with the text information as input, and computes the attention score by appending the topic's polarity information N times to the hidden layer; meanwhile, the long short-term memory network facilitates serialized feature extraction, so polarity information and key text information can be better mined and used. The attention mechanism dynamically changes its weights according to the input data; for a neural network without an attention mechanism, the weights are independent of the input word vectors and are not dynamically adjusted with the input. The attention mechanism assigns different weights to different input word vectors, and these weights are obtained by comparing the word vectors with one another. Specifically, a word vector is input, attention weights the input word vectors, and the weighted sum of the word vectors and the weight values gives the corresponding attention value, i.e., the attention score.
In this method, the second-level domain name information and the text (table description) information are taken as input; a bidirectional RNN is then trained with shared weight information, the obtained domain name features and text features are fused, and the fused features are finally processed by a deep attention mechanism. This effectively recognizes the polarity tendencies of different domain names in the text and improves the whole model's ability to recognize domain-name-related semantics.
The layers included in the first model (ATT-ALE-TextRNN model) will be described below, respectively: text embedding ALE layer, ATT layer, TextRNN layer, and softmax perception layer.
(1) Text embedding ALE layer
The embodiment of the present disclosure introduces an ALE module/layer (aspect-level embedding) for fine-grained polarity classification. In this embodiment, the original corpus encoding and the second-level domain name encoding are fed together at the input, which has the following advantages. First, by training the aspect-level embedding into another vector space, the aspect information can be used more fully, further strengthening the domain-name-related semantics in the original corpus. Second, it resolves the mismatch between the word vectors and the aspect-level embedding and captures the most important information responding to the given aspect. Given different second-level domain names, i.e., different aspect-levels, the model can capture the currently most important and most discriminative parts of the sentence.
(2) ATT layer
The embodiment of the present disclosure uses an attention mechanism to compute the weights between the second-level domain name and the output vectors of the original features after deep network processing, thereby measuring how much the content of the original corpus attends to domain-name-related semantics, letting the model notice different parts of the sentence and capture the latent relevance between content and domain name. For example, let H be the matrix composed of the hidden layer vectors [h_1, h_2, h_3, ..., h_N], where the hidden layer size is d and the length of the given sentence is N, and let v_la be defined as the label-aspect embedding; the attention mechanism then produces an attention weight vector α and a weighted hidden vector r that characterize the weighting of the sentence under a given classification polarity. See in particular the following formulas:

M = tanh([W_h H ; W_v (v_la ⊗ e_N)]) (5)

α = softmax(w^T M) (6)

r = H α^T (7)

where v_la ⊗ e_N denotes repeating the label-aspect embedding N times, i.e., attaching one copy of it to each hidden vector, N being the sentence length. The final sentence is characterized as follows:

h* = tanh(W_p r + W_x h_N) (8)

y = softmax(W_s h* + b_s) (9)

where h* may be considered the new feature representation of the original sentence after the given second-level domain name is added. A linear layer is then added to convert the sentence vector into a vector e whose length equals the number of categories, and finally a softmax layer converts e into a conditional probability distribution. The loss function is defined as the cross-entropy function.
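A minimal sketch of equations (5)-(9) as reconstructed above (dimensions are assumptions; H stacks the hidden vectors h_1..h_N):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AspectAttention(nn.Module):
    """Aspect-aware attention in the style the text describes: equations (5)-(9)."""
    def __init__(self, d=64, aspect_dim=64, num_classes=10):
        super().__init__()
        self.W_h = nn.Linear(d, d, bias=False)
        self.W_v = nn.Linear(aspect_dim, aspect_dim, bias=False)
        self.w = nn.Linear(d + aspect_dim, 1, bias=False)       # w^T in (6)
        self.W_p = nn.Linear(d, d, bias=False)
        self.W_x = nn.Linear(d, d, bias=False)
        self.out = nn.Linear(d, num_classes)                    # W_s, b_s in (9)

    def forward(self, H, v_la):                 # H: (N, d), v_la: (aspect_dim,)
        N = H.size(0)
        v_rep = self.W_v(v_la).expand(N, -1)    # v_la (x) e_N: repeat the aspect N times
        M = torch.tanh(torch.cat([self.W_h(H), v_rep], dim=1))  # (5)
        alpha = F.softmax(self.w(M).squeeze(-1), dim=0)         # (6)
        r = alpha @ H                                           # (7)
        h_star = torch.tanh(self.W_p(r) + self.W_x(H[-1]))      # (8)
        return F.softmax(self.out(h_star), dim=-1)              # (9)

y = AspectAttention()(torch.randn(12, 64), torch.randn(64))
print(y.shape)  # torch.Size([10])
```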
(3) TextRNN layer
A neural network model can represent variable-length text as a fixed-length vector. Such models typically consist of a projection layer that maps words, sub-word units or n-grams to vector representations (often trained with unsupervised methods), combined with different neural network architectures to model the text, such as neural bag-of-words models, convolutional neural networks, recursive neural networks, and so on. The Recurrent Neural Network (RNN), owing to its recurrent structure, is very suitable for processing variable-length text and natural language problems: at each time step, it recursively updates its internal hidden state according to the activation of the input sequence and the previous hidden state vector. Referring to fig. 9, fig. 9 is a schematic diagram of the RNN principle according to an embodiment of the disclosure. The embodiment of the present disclosure completes a sequence mapping from input vectors to a fixed-length vector through the RNN, then feeds a softmax layer to predict the probability distribution over categories for classification or other tasks; meanwhile, the network parameters are trained by minimizing the cross entropy between the predicted and true distributions.
To avoid gradient explosion or vanishing during training caused by the failure to learn long-range dependencies in a sequence, e.g., when a gradient vector grows or decays exponentially over time, the embodiment of the present disclosure further introduces an LSTM network. LSTMs with internal independent memory cells come in many variants. In this embodiment, the TextRNN model defines the LSTM units at each time step t as a set of vectors of size d, where each LSTM unit contains an input gate, a forget gate, an output gate, a hidden state and a memory cell, and d denotes the number of LSTM units. The formulas are as follows:
i_t = σ(W_i x_t + U_i h_{t-1} + V_i c_{t-1}) (10)

f_t = σ(W_f x_t + U_f h_{t-1} + V_f c_{t-1}) (11)

o_t = σ(W_o x_t + U_o h_{t-1} + V_o c_{t-1}) (12)

c~_t = tanh(W_c x_t + U_c h_{t-1}) (13)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c~_t (14)

h_t = o_t ⊙ tanh(c_t) (15)

The hidden layer of an RNN has only one state h, which is very sensitive to short-term inputs; the state c added by LSTM preserves the long-term state. In the formulas above, at time t the inputs of the LSTM include the current input value x_t of the network, the LSTM output value h_{t-1} of the previous time, and the cell state c_{t-1} of the previous time; the outputs of the LSTM include the output value h_t of the current time and the cell state c_t of the current time t. In formula (10), W_i denotes the weight matrix of the input gate; in formula (11), W_f denotes the weight matrix of the forget gate.
(4) Softmax perception layer
Referring to fig. 10, fig. 10 is a schematic diagram of softmax according to an embodiment of the present disclosure. The embodiment of the disclosure can use a softmax regression model as an output layer of a deep learning network to output the probability of a certain sample on all possible categories in the form of predicted probability values. The formulas and principles are as follows:
S_i = e^{V_i} / Σ_j e^{V_j} (16)

where V_i denotes the output value of the i-th node; through the Softmax function, the multi-class output values are converted into a probability distribution in the range [0, 1] that sums to 1.
S300, determining, through the second model, the category of the second data whose features are discrete features.
Specifically, embodiments of the present disclosure may include: extracting domain continuous features from the first data; extracting discretization features from the second data; performing second-order cross calculation on the domain continuous features and the discretization features to obtain a calculation result; and obtaining a classification result according to the calculation result and the high-dimensional embedding of the domain continuous features on the deep side. The discretization features include domain discretization features and source system discretization features. More specifically, embodiments of the present disclosure may also include representing the domain discretization features with discretized one-hot vectors, and encoding the source system discretization features with discrete value coding.
In the CTR prediction of a recommendation system, whether a commodity can be recommended is decided according to the predicted click-through rate (CTR). Whether a user clicks an advertisement on an interface determines the advertisement's conversion rate, and there are many features describing the user, the advertisement and their crosses, such as age, gender, region, mobile phone model, the position, size, industry and real-time feedback information of the advertisement, the cross of advertisement CTR with gender, and so on. Since advertisement clicks are sparse events, many combined features appear very few times in the training data set, which directly leads to insufficient weight learning for such features and thus to overfitting. Therefore, when estimating CTR and judging whether an advertisement will be clicked, features are usually combined in addition to being used individually. Algorithms such as LR and GBDT treat all cross features as mutually independent even when two cross features are related from a business perspective; as a result, the parameters are also optimized independently, the correlations between feature services cannot be fully exploited, and overfitting results. Preferably, compared with LR and GBDT, the Factorization Machine (FM) algorithm extracts features through inner products of latent vectors and performs cross combination when features are sparse, so it can learn feature combinations that occur rarely or never; this automatically handles feature crossing, i.e., it learns cross features efficiently even when the combined features do not co-occur sufficiently. For example, suppose feature a and feature b never appear together in the training data, but feature b often co-occurs with feature c, and feature a also often co-occurs with feature c; the FM model can then still infer a certain correlation between feature a and feature b. The comparison between the LR method and FM is as follows:
y_LR(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} w_{ij} x_i x_j

y_FM(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} <v_i, v_j> x_i x_j

where x is an n-dimensional vector, x_i denotes the value of its i-th dimension and x_j the value of its j-th dimension, and w_{ij} is the corresponding cross weight; <v_i, v_j> denotes the dot product of vectors v_i and v_j, where v_i denotes the i-th dimension vector of the coefficient matrix V and v_j denotes the j-th dimension vector of the coefficient matrix V.
In addition to the linear part, the embodiment of the present disclosure introduces second-order cross terms while keeping linear training complexity. When classifying the domain name of a table, multiple field features can be treated like advertisement features: the high-dimensional discrete features are given low-dimensional dense embeddings and combined, so as to mine effective feature information. In this embodiment, field (domain) information is fused into the feature processing, features of the same nature are grouped into the same field, and the improved second-order cross term is:
y(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} <v_{i,f_j}, v_{j,f_i}> x_i x_j

where sample x is an n-dimensional vector, x_i denotes the value of its i-th dimension and x_j the value of its j-th dimension, f_j denotes the field (domain) to which the j-th feature belongs, and v_{i,f_j} denotes the hidden vector of x_i corresponding to that field.
Computing second-order cross features in this way overcomes the computational complexity limitation, and the classification accuracy in practical verification reaches 62%.
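A minimal sketch of the field-aware second-order cross term above; the feature count, field assignment and latent dimension are assumptions:

```python
import numpy as np

n, num_fields, k = 6, 2, 4              # features, fields, latent dimension
rng = np.random.default_rng(0)
V = rng.normal(size=(n, num_fields, k)) # v_{i,f}: one latent vector per (feature, field)
field_of = np.array([0, 0, 0, 1, 1, 1]) # f_j: the field each feature belongs to

def second_order_cross(x):
    """sum over pairs of <v_{i,f_j}, v_{j,f_i}> * x_i * x_j."""
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            v_ij = V[i, field_of[j]]    # v_{i, f_j}
            v_ji = V[j, field_of[i]]    # v_{j, f_i}
            total += v_ij @ v_ji * x[i] * x[j]
    return total

print(second_order_cross(rng.normal(size=n)))
```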
A DNN, also called a multi-layer perceptron, can be regarded as a neural network with many hidden layers, since locally it works like a perceptron: a linear mapping and an activation function together form a non-linear relationship. Its internal network structure can be roughly divided into three parts: an input layer, hidden layers and an output layer. The layers are fully connected, i.e., any neuron in one layer is connected to every neuron in the next layer, and this fully connected structure gives the capacity for high-order feature representation.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a second model according to an embodiment of the disclosure. The second model (DeepFM model) in the embodiment of the present disclosure balances high-order feature representation with parallel computation: the domain discretization features and the source-system discretization features are respectively fed into the FM layer, and the domain continuous features are also added to the FM for second-order cross calculation with the discretization features. Combined with the high-dimensional embedding of the continuous features on the deep side, the logistic score and the softmax-normalized output are finally computed jointly, together influencing the prediction result.
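A minimal sketch of this combined FM-plus-deep scoring (not the patent's exact DeepFM variant; the feature layout and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class TinyDeepFM(nn.Module):
    """FM side over shared embeddings plus a deep side, summed before the softmax."""
    def __init__(self, num_feats=20, k=8, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(num_feats, k)    # shared embeddings, supervised by the FM side
        self.linear = nn.Embedding(num_feats, 1)   # first-order weights
        self.deep = nn.Sequential(nn.Linear(3 * k, 32), nn.ReLU(), nn.Linear(32, num_classes))
        self.fm_out = nn.Linear(1, num_classes)

    def forward(self, feat_ids):                   # (batch, 3) discrete feature ids
        e = self.embed(feat_ids)                   # (batch, 3, k)
        # Second-order term via the sum-square trick.
        fm2 = 0.5 * ((e.sum(1) ** 2) - (e ** 2).sum(1)).sum(1, keepdim=True)
        fm = self.linear(feat_ids).sum(1) + fm2    # (batch, 1)
        deep = self.deep(e.flatten(1))             # (batch, num_classes)
        return torch.softmax(self.fm_out(fm) + deep, dim=-1)

p = TinyDeepFM()(torch.randint(0, 20, (4, 3)))
print(p.shape)  # torch.Size([4, 10])
```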
S400, performing power grid service data domain division by using the category of the first data and the category of the second data.
In the embodiment of the present disclosure, if the category of the first data agrees with the category of the second data, the agreed result can be taken as the final result with high confidence; otherwise, as noted in the advantages above, the model with the higher regression prediction score determines the final result.
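The fusion rule can be sketched as follows; the labels and scores here are illustrative:

```python
def fuse(label1, score1, label2, score2):
    """Agreement gives a high-confidence result; otherwise the higher score decides."""
    if label1 == label2:
        return label1
    return label1 if score1 >= score2 else label2

print(fuse("marketing", 0.73, "marketing", 0.62))  # marketing
print(fuse("marketing", 0.58, "dispatch", 0.81))   # dispatch
```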
The present disclosure provides a power grid data model domain division method based on natural language processing technology, which integrates the descriptive features and the discrete features of a table: an ATT-ALE-TextRNN text classification model is designed based on the descriptive features, and a DeepFM classification model is designed based on the discrete features.
The present disclosure provides a data model domain division method based on natural language processing technology. In the domain name classification task for data tables, the many features of a table can be divided, according to their form, into two kinds of input: continuous text features and other discretized categorical features. The recommendation algorithm field offers model architectures that combine continuous and discrete inputs; however, because the text features and discrete features of most tables are weakly related, overlapping words are lacking, and the field features have many categories, a traditional single model trains poorly. The present disclosure therefore predicts the descriptive features and the discrete features of the table with a suitable model each, and finally integrates the two models to achieve a better training effect.
Referring to fig. 12, an embodiment of the present disclosure further provides an apparatus 10 for power grid service data domain division, including a judging module 11, a determining module 13 and a domain division module 15. The judging module 11 is configured to judge the features of the power grid service data; the determining module 13 is configured to determine, by a first model, the category of first data whose features are descriptive features and to determine, by a second model, the category of second data whose features are discrete features; and the domain division module 15 is configured to perform domain division on the power grid service data according to the category of the first data and the category of the second data.
As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the corresponding descriptions of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative; the modules described as separate modules may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, and those of ordinary skill in the art can understand and implement them without inventive effort.
By means of deep learning and natural language processing technologies, the designed and trained model achieves better text representation and higher classification accuracy; it can be applied to the power grid service data model to realize automatic semantic classification and association of the information under the domains of the power grid data model, and finally automatic mapping of the service data model.
It should be understood that the above-described specific embodiments are merely illustrative of the present invention and are not intended to limit the present invention. Obvious variations or modifications which are within the spirit of the invention are also within the scope of the invention.
In the present specification, whenever reference is made to "an exemplary embodiment", "a preferred embodiment", "one embodiment", or the like, it is intended that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with any embodiment, it is submitted that it is within the purview of one skilled in the art to effect such feature, structure, or characteristic in other ones of all the embodiments described.
The embodiments of the present invention have been described above in detail. However, aspects of the present invention are not limited to the above embodiments. Various modifications and substitutions may be made to the above-described embodiments without departing from the scope of the present invention.

Claims (10)

1. A method for power grid service data domain division comprises the following steps:
judging, for power grid service data, the features of the power grid service data; wherein the features include descriptive features and discrete features;
determining, by a first model, a category of first data for which the feature is a descriptive feature;
determining a category of second data for which the feature is a discrete feature by a second model; and
performing power grid service data domain division by using the category of the first data and the category of the second data.
2. The method of claim 1, wherein the determining the characteristic of the data comprises:
preprocessing the data; wherein the pre-processing comprises: at least one of splitting matching, counting, word segmentation and removal of stop words; and
extracting the features of the preprocessed data through a bag of words model and/or a feature extraction model based on word vectors.
3. The method of claim 1, wherein determining the category of the first data for which the feature is a descriptive feature by the first model comprises:
extracting domain name information, label information and word information from the first data;
vectorizing the domain name information and the label information to obtain vectorized data;
vectorizing the word information to obtain a word vector;
obtaining an attention value according to the word vector;
converting the word vector into a fixed-length vector; and
classifying the vectorization data and the fixed-length vector according to the attention value to obtain a classification result.
4. The method of claim 3, wherein the first model comprises: a text embedding ALE layer, an ATT layer, a TextRNN layer and a softmax perception layer; the ATT layer comprises a single layer or multiple layers;
wherein, when the ATT layer is a single layer, the method further comprises:
obtaining corresponding weights according to the word vectors;
weighting the word vectors according to the corresponding weights to obtain weighted values; and
summing the weighted word vectors to obtain the attention value;
wherein, when the ATT layer is a multilayer, the method further comprises:
converting the word vector to a sentence vector on a first layer; and
converting the sentence vector to a paragraph vector on a second layer.
5. The method of claim 1, wherein determining the category of the first data for which the feature is a descriptive feature by the first model further comprises:
acquiring a thesaurus of keywords;
extracting, from the thesaurus, the category label with the highest occurrence frequency for each keyword; and
re-encoding the first data such that each data group in the first data corresponds to a multi-dimensional vector; wherein each dimension of the multi-dimensional vector characterizes the occurrence statistics of one category label in the data group.
6. The method of claim 1, wherein determining the category of the second data for which the feature is a discrete feature by the second model comprises:
extracting domain continuous features from the first data;
extracting discretization features from the second data;
performing second-order cross calculation on the domain continuous features and the discretization features to obtain a calculation result; and
obtaining a classification result according to the calculation result and the high-dimensional embedding of the domain continuous features on the deep side.
7. The method of claim 6, wherein the discretized features comprise a domain discretized feature and a source system discretized feature;
wherein the method further comprises:
representing the domain discretization features with discretized one-hot vectors; and
encoding the source system discretization features with discrete value coding.
8. An apparatus for power grid service data domain division, comprising:
the judging module is used for judging the characteristics of the power grid service data;
the determining module is used for determining the category of the first data of which the characteristic is a descriptive characteristic through a first model and determining the category of the second data of which the characteristic is a discrete characteristic through a second model; and
the domain division module is used for performing domain division on the power grid service data by using the category of the first data and the category of the second data.
9. A terminal, comprising:
at least one memory and at least one processor;
wherein the at least one memory is configured to store program code and the at least one processor is configured to invoke the program code stored in the at least one memory to perform the method of any of claims 1 to 7.
10. A non-transitory storage medium storing program code which, when executed by a computer device, causes the computer device to perform the method of any one of claims 1 to 7.
CN202111159160.7A 2021-09-30 2021-09-30 Method, device, terminal and non-transitory storage medium for power grid service data domain division Pending CN114595324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111159160.7A CN114595324A (en) 2021-09-30 2021-09-30 Method, device, terminal and non-transitory storage medium for power grid service data domain division

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111159160.7A CN114595324A (en) 2021-09-30 2021-09-30 Method, device, terminal and non-transitory storage medium for power grid service data domain division

Publications (1)

Publication Number Publication Date
CN114595324A true CN114595324A (en) 2022-06-07

Family

ID=81813725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111159160.7A Pending CN114595324A (en) 2021-09-30 2021-09-30 Method, device, terminal and non-transitory storage medium for power grid service data domain division

Country Status (1)

Country Link
CN (1) CN114595324A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438128A (en) * 2022-09-16 2022-12-06 中国建设银行股份有限公司 Data processing method, device, equipment, storage medium and program product


Similar Documents

Publication Publication Date Title
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN111401077B (en) Language model processing method and device and computer equipment
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN107688870B (en) Text stream input-based hierarchical factor visualization analysis method and device for deep neural network
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
Kaur Incorporating sentimental analysis into development of a hybrid classification model: A comprehensive study
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN110888980A (en) Implicit discourse relation identification method based on knowledge-enhanced attention neural network
CN112667782A (en) Text classification method, device, equipment and storage medium
CN113157859A (en) Event detection method based on upper concept information
Suyanto Synonyms-based augmentation to improve fake news detection using bidirectional LSTM
CN113705238A (en) Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
Pandey et al. Natural language generation using sequential models: a survey
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
CN113516094A (en) System and method for matching document with review experts
CN114595324A (en) Method, device, terminal and non-transitory storage medium for power grid service data domain division
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN
CN116804998A (en) Medical term retrieval method and system based on medical semantic understanding
CN116956228A (en) Text mining method for technical transaction platform
CN113342964B (en) Recommendation type determination method and system based on mobile service
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
Viji et al. A hybrid approach of Poisson distribution LDA with deep Siamese Bi-LSTM and GRU model for semantic similarity prediction for text data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination