CN113673202A

CN113673202A - Double-layer matching coding mapping recommendation method based on hybrid supervision

Info

Publication number: CN113673202A
Application number: CN202110905914.2A
Authority: CN
Inventors: 傅骏伟; 王豆; 郭鼎; 张震伟; 李炳辰; 姜志锋; 吴林峰; 陆金奇
Original assignee: Zhejiang Energy Group Research Institute Co Ltd
Current assignee: Zhejiang Zheneng Digital Technology Co.,Ltd.; Zhejiang Energy Group Research Institute Co Ltd
Priority date: 2021-08-09
Filing date: 2021-08-09
Publication date: 2021-11-19

Abstract

The invention relates to a hybrid supervision-based double-layer matching coding mapping recommendation method, which comprises the following steps: collecting an original KKS code list and a new KKS code list by using acquisition equipment; carrying out manual matching to obtain a supervised matching model training data set D; and carrying out supervised training on D. The beneficial effects of the invention are as follows: an intelligent KKS code mapping task is proposed, in which the acquisition equipment collects the original KKS code list and the new KKS code list, manual matching is performed, supervised training is carried out on the supervised matching model training data set D, unsupervised matching is performed between the Chinese descriptions in the original KKS code list and the Chinese descriptions in the new KKS code list, and supervised matching is performed on the data for which unsupervised matching fails. The mapping table of the standard codes can be obtained directly, the workload of standardization is greatly reduced, the stability of system operation is improved, the underlying data is generalized, and the coding rules are unified.

Description

Double-layer matching coding mapping recommendation method based on hybrid supervision

Technical Field

The invention belongs to the technical field of power plant information, and particularly relates to a hybrid supervision-based double-layer matching coding mapping recommendation method.

Background

Power generation enterprises have complex asset structures, many asset types, and large data volumes, which easily leads to information confusion, low data quality, and isolated data silos, and seriously hinders data sharing and application. To address these problems, the KKS coding system was introduced. It identifies the systems, equipment, components, and structures in a power plant according to function, type, and installation location. It has been in use for about 50 years and is currently the most widely used power plant identification system.

However, as power generation production becomes more information-driven, information assets and virtualized activities increase day by day, and the weakness of KKS coding as a loosely standardized rule set becomes more and more obvious. The original identification coding system has therefore been improved to meet the requirements of intelligent production, and many enterprises have issued their own internal coding standards. The improved standard codes are easy to implement in newly built power generation enterprises. In old plants that have been operating for years, however, the coding rules adopted by the systems already in operation are inconsistent, and, because manpower and material resources are lacking, it is difficult to organize the effort needed to solve the coding problem for the whole plant. The invention patent CN201310367698.6, a multi-mode recognition method for Internet of Things unified identification codes, decomposes a unified identification code into single-mode code information and then parses each piece to obtain the code content it carries. The invention patent CN201310289939.X provides a KKS intelligent batch coding method for three-dimensional design of a transformer substation. Neither of these patents considers the mapping problem between different codes; they innovate only on the coding process itself. Therefore, in order to further reduce the coding workload, improve the stability of system operation, and unify the coding rules, a code matching method for the code mapping task is urgently needed.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides a hybrid supervision-based double-layer matching coding mapping recommendation method.

The hybrid supervision-based double-layer matching coding mapping recommendation method comprises the following steps:

Step 1, using acquisition equipment to collect the original KKS code list and the new KKS code list stored in a database through the interface provided by the database, and storing both lists;

wherein the original KKS code list is:

$$\{(c^1_1, d^1_1), (c^1_2, d^1_2), \ldots, (c^1_n, d^1_n)\}$$

where $c^1_1$ to $c^1_n$ are the English codes in the original KKS code list, and $d^1_1$ to $d^1_n$ are the Chinese descriptions in the original KKS code list;

wherein the new KKS code list is:

$$\{(c^2_1, d^2_1), (c^2_2, d^2_2), \ldots, (c^2_m, d^2_m)\}$$

where $c^2_1$ to $c^2_m$ are the English codes in the new KKS code list, and $d^2_1$ to $d^2_m$ are the Chinese descriptions in the new KKS code list;

Step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 to obtain a supervised matching model training data set D;

Step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain a supervised matching model Model;

Step 4, carrying out unsupervised matching between the Chinese descriptions $d^1$ in the original KKS code list and the Chinese descriptions $d^2$ in the new KKS code list obtained in step 1;

Step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;

Step 5.1, obtaining a generation result from the supervised matching model Model;

Step 5.2, performing similarity matching between the generation result and the new codes using the minimum edit distance, and storing the result in the storage device.
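The control flow of steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `(code, description)` tuple layout, the injected `unsupervised_score` and `generate_code` callables, and the sample codes are all assumptions made for the example. The threshold of 85 and the 5 recommendations are the preferred values stated later in the description.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein (minimum edit) distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def recommend(
    old_code: str,
    old_description: str,
    new_entries: Sequence[Tuple[str, str]],          # (new code, new description)
    unsupervised_score: Callable[[str, str], float],  # hypothetical 0-100 scorer
    generate_code: Callable[[str, str], str],         # hypothetical stand-in for Model
    threshold: float = 85.0,
    top_k: int = 5,
) -> List[str]:
    # Step 4: unsupervised matching on the Chinese descriptions.
    best_code, best_desc = max(
        new_entries, key=lambda e: unsupervised_score(old_description, e[1])
    )
    if unsupervised_score(old_description, best_desc) > threshold:
        return [best_code]
    # Step 5.1: fall back to the supervised model for a generated result.
    generated = generate_code(old_code, old_description)
    # Step 5.2: rank the new codes by edit distance to the generated result.
    ranked = sorted(new_entries, key=lambda e: edit_distance(generated, e[0]))
    return [code for code, _ in ranked[:top_k]]


# Toy stand-ins, for illustration only.
exact = lambda a, b: 100.0 if a == b else 0.0
fake_model = lambda code, desc: "10LAB21"
entries = [("10HAD10", "锅炉温度"), ("10LAB20", "给水泵")]

direct = recommend("OLD1", "锅炉温度", entries, exact, fake_model)
fallback = recommend("OLD2", "未知设备", entries, exact, fake_model)
```

How a score exactly equal to the threshold is handled is not stated in the source; the sketch sends it to the supervised branch.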

Preferably, the database in step 1 is a PI database; a Python script for data extraction is written on the acquisition equipment according to the access rules provided by the PI database, the code lists are obtained through the interface provided by the database, and the original KKS code list and the new KKS code list are stored in CSV format; the original KKS code list follows a design institute coding rule or a DCS manufacturer coding rule, and the coding practice differs between power plants; the new KKS codes follow new coding rules redesigned according to the conditions of each power plant; although both sets of rules encode units, systems, equipment, components, and so on, their specific coding forms and construction rules are completely different.
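As a small illustration of the CSV storage described above, the sketch below loads a stored code list into `(code, description)` pairs. The column names `code` and `description` and the sample rows are assumptions made for the example; the source does not specify the CSV layout, and the PI database access itself is not shown because the source gives no details of that interface.

```python
import csv
import io
from typing import List, Tuple


def load_code_list(csv_text: str) -> List[Tuple[str, str]]:
    """Parse a stored KKS code list into (English code, Chinese description) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["code"].strip(), row["description"].strip()) for row in reader]


# Hypothetical sample content; real lists would be read from the stored CSV files.
sample = "code,description\n10LAB10AP001,给水泵\n10HAD20CT001,锅炉温度\n"
pairs = load_code_list(sample)
```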

Preferably, the amount of data manually matched in step 2 is 5000 pieces.

Preferably, step 3 specifically comprises the following steps:

Step 3.1, segmenting the data in the supervised matching model training data set D into words using Jieba (a Python Chinese word segmentation component) to obtain word segmentation results, and then vectorizing them using N-Gram to obtain a vectorized training data set;

Step 3.2, inputting the vectorized training data set into the supervised matching model Model; first, a two-layer MLP encoding network generates sparse features, the first-layer MLP encoding network producing the first-layer sparse features and the second-layer MLP encoding network producing the second-layer sparse features; then a two-layer MLP decoding network generates reconstruction features, the first-layer MLP decoding network producing the first-layer reconstruction features and the second-layer MLP decoding network producing the second-layer reconstruction features;

Step 3.3, computing the cross entropy between the reconstruction features generated by the second-layer MLP decoding network and the label vectors to obtain the loss function Loss Function, where n is the number of reconstruction features in one batch; the loss function sums the cross entropies of the n reconstruction features and uses the sum as the loss value for that batch of training data;

Step 3.4, iterating the model according to the loss function Loss Function obtained in step 3.3; when the loss function Loss Function tends to converge, the supervised matching model Model is obtained and stored in the storage device.
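The batch loss in step 3.3 can be illustrated with a short sketch. The patent's own loss formula did not survive in this text, so the categorical form used below, $-\sum_k y_k \log p_k$ per reconstruction feature, is an assumption; only the summation over the n features of a batch is taken from the description. The function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(label: Sequence[float], recon: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Cross entropy -sum_k y_k * log(p_k) for one reconstruction feature.

    Assumed form: the source states only that cross entropy is computed
    between the reconstruction feature and the label vector.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(label, recon))


def batch_loss(labels: Sequence[Sequence[float]],
               recons: Sequence[Sequence[float]]) -> float:
    """Sum the cross entropies of the n reconstruction features in one batch."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, recons))


# Two-item toy batch.
labels = [[1.0, 0.0], [0.0, 1.0]]
recons = [[0.5, 0.5], [0.25, 0.75]]
loss = batch_loss(labels, recons)
```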

Preferably, step 3.1 also extends each word segmentation result to the specified number of word segments, padding any shortfall with blank spaces.
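A minimal sketch of this padding step is shown below. It takes an already segmented token list as input (in practice this could come from Jieba, for example `jieba.lcut`). The function name is hypothetical, and truncating inputs longer than the target length is an assumption, since the source describes only the padding of short inputs.

```python
from typing import List


def pad_tokens(tokens: List[str], size: int = 20, pad: str = " ") -> List[str]:
    """Right-pad a token list with blanks to exactly `size` entries.

    Longer inputs are truncated (assumption, not stated in the source).
    """
    return tokens[:size] + [pad] * max(0, size - len(tokens))


padded = pad_tokens(["高压", "给水", "泵"], size=5)
```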

Preferably, in step 3.1 the number of word segments for the code is 20, and the number of word segments for the Chinese description is also 20; in step 3.2, the input to the first-layer MLP encoding network of the supervised matching model Model is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code word segments and Chinese description word segments; the output of the first-layer MLP encoding network is 256 × 20 data; the output of the second-layer MLP encoding network is 256 × 10 data; the output of the first-layer MLP decoding network is 256 × 20 data; the output of the second-layer MLP decoding network is 256 × 40 data.
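The layer sizes above can be checked with a forward pass through a 40 → 20 → 10 → 20 → 40 network. The sketch below uses pure Python so that it has no dependencies. The random weights, the ReLU activation, and the helper names are assumptions; the source specifies only the input and output dimensions of each layer.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dense(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """One MLP layer: x @ w + b followed by ReLU (activation is an assumption)."""
    out = []
    for row in x:
        y = [sum(v * w[k][j] for k, v in enumerate(row)) + b[j]
             for j in range(len(b))]
        out.append([max(0.0, v) for v in y])
    return out


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Small random weights and zero biases, for shape checking only."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


rng = random.Random(0)
sizes = [40, 20, 10, 20, 40]  # encoder: 40 -> 20 -> 10, decoder: 10 -> 20 -> 40
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# One batch of 256 vectors, each with 20 code features + 20 description features.
batch = [[rng.random() for _ in range(40)] for _ in range(256)]

shapes = []
h = batch
for w, b in layers:
    h = dense(h, w, b)
    shapes.append((len(h), len(h[0])))
```

After the loop, `shapes` holds the output shape of each of the four layers in order.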

Preferably, when the loss function Loss Function in step 3.4 tends to converge, the loss value converges to 0.17, and the preset number of iterations is 10000 steps.
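One way to express "iterate until the loss tends to converge, with at most 10000 steps" is a plateau check, sketched below. The tolerance, the patience window, the `step_fn` callback, and the synthetic loss curve are all assumptions made for illustration; the source gives only the 10000-step limit and the converged value of about 0.17.

```python
from typing import Callable, Tuple


def train_until_converged(
    step_fn: Callable[[int], float],  # hypothetical: runs one iteration, returns the loss
    max_steps: int = 10000,
    tol: float = 1e-4,
    patience: int = 50,
) -> Tuple[int, float]:
    """Stop when the loss has not improved by more than `tol` for `patience` steps."""
    best, stale, loss = float("inf"), 0, float("inf")
    for step in range(1, max_steps + 1):
        loss = step_fn(step)
        if best - loss > tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return step, loss
    return max_steps, loss


# Synthetic loss curve that decays toward 0.17, standing in for real training.
steps_run, final_loss = train_until_converged(lambda s: 0.17 + 1.0 / s)
```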

Preferably, step 4 specifically comprises the following steps:

Step 4.1, obtaining the description word segments of the original KKS code and of the new KKS code respectively through word segmentation, $\{w_1, w_2, \ldots, w_i\}$, where $w$ is a word segment and $i$ is the number of word segments;

Step 4.2, computing the similarity of the description word segments of the original KKS code and the new KKS code obtained in step 4.1 using the minimum edit distance (Edit distance), to obtain the edit-distance similarity score Score between the original KKS code and the new KKS code;

Step 4.3, filtering the similarity score Score obtained in step 4.2 against the similarity threshold: if Score is below the similarity threshold, the matching fails and step 5 is entered to perform supervised matching; if Score is above the similarity threshold, the matching succeeds and the matching result is output directly.

Preferably, the similarity threshold in step 4.3 is 85.
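Steps 4.2 and 4.3 can be sketched as an edit distance over word segments followed by a threshold test. The source does not say how the edit distance is converted to a score that is comparable with 85; the 0 to 100 normalization below (one minus distance divided by the longer length, times 100) is an assumption, as are the function names and sample segments.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of word segments."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the edit distance onto a 0-100 score (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - edit_distance(a, b) / longest)


def unsupervised_match_ok(a: Sequence[str], b: Sequence[str],
                          threshold: float = 85.0) -> bool:
    """True if the pair passes step 4.3; False means step 5 is needed."""
    return similarity_score(a, b) > threshold


old = ["高压", "给水", "泵", "出口", "压力"]
near = ["高压", "给水", "泵", "入口", "压力"]
```

With this normalization, a single differing segment out of five gives a score of about 80, so the pair `old`/`near` falls below the threshold and would be passed to supervised matching.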

Preferably, in step 5.1 the generation result is obtained from the supervised matching model Model as follows: Model is served through a computing unit and the data interface of the storage device, yielding codes and Chinese descriptions that follow the corresponding rules; the computing unit is computing hardware comprising a CPU, memory, and a GPU; computing units in general include single-chip microcomputers, minicomputers, and ordinary computers, and these devices are simply abstracted here as the computing unit; in step 5.2, 5 recommendation results are obtained after similarity matching.

The beneficial effects of the invention are as follows: the invention proposes an intelligent KKS code mapping task, in which the acquisition equipment first collects the original KKS code list and the new KKS code list, manual matching is then performed, supervised training is carried out on the resulting supervised matching model training data set D, unsupervised matching is performed between the Chinese descriptions in the original KKS code list and the Chinese descriptions in the new KKS code list, and supervised matching is performed on the data for which unsupervised matching fails. The mapping table of the standard codes can be obtained directly, the workload of standardization is greatly reduced, the stability of system operation is improved, the underlying data is generalized, and the coding rules are unified.

Drawings

FIG. 1 is a flowchart of the hybrid supervision-based double-layer matching coding mapping recommendation method;

FIG. 2 is a diagram of the storage device architecture;

FIG. 3 is a diagram of the acquisition equipment;

FIG. 4 is a flowchart of the hybrid supervision-based code matching method;

FIG. 5 is a diagram of the supervised matching network.

Detailed Description

The present invention is further described below with reference to examples. The examples are set forth merely to aid in understanding the invention. It should be noted that a person skilled in the art can make several improvements and modifications to the invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

As power generation production becomes more information-driven, information assets and virtualized activities increase day by day, and the weakness of KKS coding as a loosely standardized rule set becomes more and more obvious. The original identification coding system has been improved to meet the requirements of intelligent production, and many enterprises have issued their own internal coding standards. The improved standard codes are easy to implement in newly built power generation enterprises. In old plants that have been operating for years, however, the coding rules adopted by the systems already in operation are inconsistent, and, because manpower and material resources are lacking, it is difficult to organize the effort needed to solve the coding problem for the whole plant.

Example one

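The first embodiment of the application provides a hybrid supervision-based double-layer matching coding mapping recommendation method, as shown in FIG. 1 and FIG. 4:

Step 1, using acquisition equipment to collect the original KKS code list and the new KKS code list stored in a database through the interface provided by the database, and storing both lists;

wherein the original KKS code list is:

$$\{(c^1_1, d^1_1), (c^1_2, d^1_2), \ldots, (c^1_n, d^1_n)\}$$

where $c^1_1$ to $c^1_n$ are the English codes in the original KKS code list, and $d^1_1$ to $d^1_n$ are the Chinese descriptions in the original KKS code list;

wherein the new KKS code list is:

$$\{(c^2_1, d^2_1), (c^2_2, d^2_2), \ldots, (c^2_m, d^2_m)\}$$

where $c^2_1$ to $c^2_m$ are the English codes in the new KKS code list, and $d^2_1$ to $d^2_m$ are the Chinese descriptions in the new KKS code list;

Step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 to obtain a supervised matching model training data set D;

Step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain a supervised matching model Model;

Step 4, carrying out unsupervised matching between the Chinese descriptions $d^1$ in the original KKS code list and the Chinese descriptions $d^2$ in the new KKS code list obtained in step 1;

Step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;

Step 5.1, obtaining a generation result from the supervised matching model Model;

Step 5.2, performing similarity matching between the generation result and the new codes using the minimum edit distance, and storing the result in the storage device.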

Example two

On the basis of the first embodiment, the second embodiment of the present application provides a specific implementation flow of step 3, as shown in FIG. 5:

Step 3.1, segmenting the data in the supervised matching model training data set D into words using Jieba to obtain word segmentation results, and then vectorizing them using N-Gram to obtain a vectorized training data set;

Step 3.2, inputting the vectorized training data set into the supervised matching model Model; first, a two-layer MLP encoding network generates sparse features, the first-layer MLP encoding network producing the first-layer sparse features and the second-layer MLP encoding network producing the second-layer sparse features; then a two-layer MLP decoding network generates reconstruction features, the first-layer MLP decoding network producing the first-layer reconstruction features and the second-layer MLP decoding network producing the second-layer reconstruction features;

Step 3.3, computing the cross entropy between the reconstruction features generated by the second-layer MLP decoding network and the label vectors to obtain the loss function Loss Function, where n is the number of reconstruction features in one batch; the loss function sums the cross entropies of the n reconstruction features and uses the sum as the loss value for the batch;

Step 3.4, iterating the model according to the loss function Loss Function obtained in step 3.3; when the loss function Loss Function tends to converge, the supervised matching model Model is obtained and stored in the storage device.

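Example three

On the basis of the first embodiment and the second embodiment, the third embodiment of the present application provides a specific implementation flow of step 4, as shown in FIG. 4:

Step 4.1, obtaining the description word segments of the original KKS code and of the new KKS code respectively through word segmentation, $\{w_1, w_2, \ldots, w_i\}$, where $w$ is a word segment and $i$ is the number of word segments;

Step 4.2, computing the similarity of the description word segments obtained in step 4.1 using the minimum edit distance, to obtain the edit-distance similarity score Score between the original KKS code and the new KKS code;

Step 4.3, filtering the similarity score Score obtained in step 4.2 against the similarity threshold: if Score is below the similarity threshold, the matching fails and step 5 is entered to perform supervised matching; if Score is above the similarity threshold, the matching succeeds and the matching result is output directly.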

Example four

On the basis of the first embodiment to the third embodiment, the fourth embodiment of the present application provides an application of the hybrid supervision-based double-layer matching coding mapping recommendation method to a certain power plant:

Step 1, using the acquisition equipment shown in FIG. 3, collecting and constructing the original KKS code list $(c^1, d^1)$, where $c^1$ is the English code and $d^1$ is the Chinese description, and the new KKS code list $(c^2, d^2)$, where $c^2$ is the English code and $d^2$ is the Chinese description; the existing coded data stored in the ERP system built by the power plant is transmitted to the group side over the group industrial internet and stored in CSV format;

Step 2, manually matching the KKS code lists obtained in step 1 to obtain the supervised matching model training data set D.

Step 3, carrying out supervised training on the training data set D obtained in step 2 to obtain the supervised matching model Model:

Step 3.1, segmenting the data in the training data set D into words using Jieba to obtain word segmentation results, and then vectorizing them using N-Gram to obtain the vectorized training data;

Step 3.2, inputting the vectorized training data into the supervised matching model, as shown in FIG. 5; first, a two-layer MLP encoding network generates sparse features, and then a two-layer MLP decoding network generates reconstruction features; the input of the encoding network is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code word segments and Chinese description word segments; the output of the first layer (first encoding layer) is 256 × 20, the output of the second layer (second encoding layer) is 256 × 10, the output of the third layer (first decoding layer) is 256 × 20, and the output of the fourth layer (second decoding layer) is 256 × 40;

Step 3.3, computing the cross entropy between the reconstruction features and the label vectors to obtain the loss function Loss Function;

Step 3.4, iterating the model according to the loss function Loss Function; when the loss value converges to approximately 0.1724, the supervised matching model Model is obtained; the model is stored in the storage device so that it can be reused; the device has at least a computing unit with a graphics card of more than 8 GB, and an API service is provided using the Flask framework; as shown in FIG. 2, a solid state disk and a mechanical hard disk serve as the storage units;

Step 4, carrying out unsupervised matching on the KKS code descriptions $d^1$ and $d^2$ obtained in step 1, which comprises the following specific steps:

Step 4.1, obtaining the description word segments $\{w_1, w_2, \ldots, w_i\}$ through word segmentation, where $w$ is a word segment and $i$ is the number of word segments, with a maximum value of 20;

Step 4.2, computing the similarity of the two sets of description word segments obtained in step 4.1 using the minimum edit distance (Edit distance), to obtain the edit-distance similarity score Score between the two;

Step 4.3, filtering the similarity score obtained in step 4.2 against the similarity threshold: if the score is below the threshold, the matching fails and supervised matching must be performed in step 5; if the score is above the threshold, the matching succeeds and the result can be stored directly in the storage device.

Step 5, retrieving from the storage device the supervised matching model obtained in step 3, and carrying out supervised matching on the data that failed in step 4.3, which comprises the following specific steps:

Step 5.1, obtaining codes and Chinese descriptions that follow the corresponding rules using the model obtained in step 3;

Step 5.2, performing similarity matching between the generation result and the new codes using the minimum edit distance to obtain 5 recommendation results, and storing the results on the storage device in CSV form for the system to call.
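Storing the recommendations in CSV form could look like the sketch below. The column names, the one-row-per-original-code layout, the padding of short rows with empty cells, and the sample codes are assumptions made for illustration; the source states only that 5 recommendation results are stored as CSV.

```python
import csv
import io
from typing import Dict, List


def recommendations_to_csv(recs: Dict[str, List[str]], top_k: int = 5) -> str:
    """Serialize {original code: [recommended new codes]} as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    header = ["original_code"] + [f"recommendation_{i}" for i in range(1, top_k + 1)]
    writer.writerow(header)
    for code, candidates in recs.items():
        row = candidates[:top_k]
        # Pad with empty cells so that every row has the same width.
        writer.writerow([code] + row + [""] * (top_k - len(row)))
    return buf.getvalue()


csv_text = recommendations_to_csv({"OLD001": ["10LAB20", "10LAB21"]})
```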

Through the above steps, the mapping table of the standard codes can be obtained directly, and the workload of standardization is greatly reduced. The production codes of a petroleum enterprise belonging to a certain group were standardized, and the final results are shown in Table 1 below:

Table 1 Production code standardization results for a petroleum enterprise belonging to a certain group

According to the statistics, the accuracy over 1316 coding results is about 89.9%, which meets the requirements of plant-side personnel and shortens the time needed for code standardization.

The final results of the code standardization work for a photovoltaic power plant belonging to a certain group are shown in Table 2 below:

Table 2 Code standardization results for a photovoltaic power plant belonging to a certain group

The accuracy over 580 coding results is about 91.2%, and manual inspection confirms that the coding results meet the requirements of the standardization work.

Claims

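1. A hybrid supervision-based double-layer matching coding mapping recommendation method, characterized by comprising the following steps:

Step 1, using acquisition equipment to collect the original KKS code list and the new KKS code list stored in a database through the interface provided by the database, and storing both lists;

wherein the original KKS code list is:

$$\{(c^1_1, d^1_1), (c^1_2, d^1_2), \ldots, (c^1_n, d^1_n)\}$$

where $c^1_1$ to $c^1_n$ are the English codes in the original KKS code list, and $d^1_1$ to $d^1_n$ are the Chinese descriptions in the original KKS code list;

wherein the new KKS code list is:

$$\{(c^2_1, d^2_1), (c^2_2, d^2_2), \ldots, (c^2_m, d^2_m)\}$$

where $c^2_1$ to $c^2_m$ are the English codes in the new KKS code list, and $d^2_1$ to $d^2_m$ are the Chinese descriptions in the new KKS code list;

Step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 to obtain a supervised matching model training data set D;

Step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain a supervised matching model Model;

Step 4, carrying out unsupervised matching between the Chinese descriptions $d^1$ in the original KKS code list and the Chinese descriptions $d^2$ in the new KKS code list obtained in step 1;

Step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;

Step 5.1, obtaining a generation result from the supervised matching model Model;

Step 5.2, performing similarity matching between the generation result and the new codes using the minimum edit distance, and storing the result in the storage device.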

2. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1, wherein: the database in step 1 is a PI database; a Python script for data extraction is written on the acquisition equipment according to the access rules provided by the PI database, the code lists are obtained through the interface provided by the database, and the original KKS code list and the new KKS code list are stored in CSV format; the original KKS code list follows a design institute coding rule or a DCS manufacturer coding rule; the new KKS codes follow new coding rules redesigned according to the conditions of each power plant.

3. The hybrid supervised double-layer matching based coding mapping recommendation method as recited in claim 1, wherein: the amount of data manually matched in step 2 is 5000 pieces.

4. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1, wherein step 3 specifically comprises the following steps:

Step 3.1, segmenting the data in the supervised matching model training data set D into words using Jieba to obtain word segmentation results, and then vectorizing them using N-Gram to obtain a vectorized training data set;

Step 3.2, inputting the vectorized training data set into the supervised matching model Model; first, a two-layer MLP encoding network generates sparse features, the first-layer MLP encoding network producing the first-layer sparse features and the second-layer MLP encoding network producing the second-layer sparse features; then a two-layer MLP decoding network generates reconstruction features, the first-layer MLP decoding network producing the first-layer reconstruction features and the second-layer MLP decoding network producing the second-layer reconstruction features;

Step 3.3, computing the cross entropy between the reconstruction features generated by the second-layer MLP decoding network and the label vectors to obtain the loss function Loss Function, where n is the number of reconstruction features in one batch; the loss function sums the cross entropies of the n reconstruction features and uses the sum as the loss value for the batch;

Step 3.4, iterating the model according to the loss function Loss Function obtained in step 3.3; when the loss function Loss Function tends to converge, the supervised matching model Model is obtained and stored in the storage device.

5. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 4, wherein step 3.1 further extends each word segmentation result to the specified number of word segments, padding any shortfall with blank spaces.

6. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 5, wherein: in step 3.1, the number of word segments for the code is 20, and the number of word segments for the Chinese description is also 20; in step 3.2, the input to the first-layer MLP encoding network of the supervised matching model Model is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code word segments and Chinese description word segments; the output of the first-layer MLP encoding network is 256 × 20 data; the output of the second-layer MLP encoding network is 256 × 10 data; the output of the first-layer MLP decoding network is 256 × 20 data; the output of the second-layer MLP decoding network is 256 × 40 data.

7. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 4, wherein: when the loss function Loss Function in step 3.4 tends to converge, the loss value converges to 0.17, and the preset number of iterations is 10000 steps.

8. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1 or 4, wherein step 4 specifically comprises the following steps:

Step 4.1, obtaining the description word segments of the original KKS code and of the new KKS code respectively through word segmentation, $\{w_1, w_2, \ldots, w_i\}$, where $w$ is a word segment and $i$ is the number of word segments;

Step 4.2, computing the similarity of the description word segments of the original KKS code and the new KKS code obtained in step 4.1 using the minimum edit distance, to obtain the edit-distance similarity score Score between the original KKS code and the new KKS code;

Step 4.3, filtering the similarity score Score obtained in step 4.2 against the similarity threshold: if Score is below the similarity threshold, the matching fails and step 5 is entered to perform supervised matching; if Score is above the similarity threshold, the matching succeeds and the matching result is output directly.

9. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 8, wherein: the similarity threshold in step 4.3 is 85.

10. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1, wherein in step 5.1 the generation result is obtained from the supervised matching model Model as follows: Model is served through a computing unit and the data interface of the storage device to obtain codes and Chinese descriptions; and in step 5.2, 5 recommendation results are obtained after similarity matching.