CN113673202A - Double-layer matching coding mapping recommendation method based on hybrid supervision - Google Patents

Double-layer matching coding mapping recommendation method based on hybrid supervision

Publication number
CN113673202A
Authority
CN
China
Prior art keywords
matching
coding
kks
layer
supervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110905914.2A
Other languages
Chinese (zh)
Inventor
傅骏伟
王豆
郭鼎
张震伟
李炳辰
姜志锋
吴林峰
陆金奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zheneng Digital Technology Co.,Ltd.
Zhejiang Energy Group Research Institute Co Ltd
Original Assignee
Zhejiang Energy Group Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Energy Group Research Institute Co Ltd
Priority to CN202110905914.2A
Publication of CN113673202A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/126: Character encoding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to a double-layer matching coding mapping recommendation method based on hybrid supervision, which comprises the following steps: collecting an original KKS code list and a new KKS code list by using acquisition equipment; carrying out manual matching; and carrying out supervised training on the resulting supervised matching model training data set D. The beneficial effects of the invention are as follows: an intelligent KKS code mapping task is proposed, in which the acquisition equipment collects an original KKS code list and a new KKS code list, manual matching is performed, supervised training is carried out on the supervised matching model training data set D, unsupervised matching is performed between the Chinese descriptions in the original KKS code list and the Chinese descriptions in the new KKS code list, and supervised matching is performed on the data for which unsupervised matching fails; the mapping table of the standard codes can be obtained directly, the workload of standardization work is greatly reduced, the stability of system operation is improved, the underlying data is generalized, and the coding rules are unified.

Description

Double-layer matching coding mapping recommendation method based on hybrid supervision
Technical Field
The invention belongs to the technical field of power plant information, and particularly relates to a hybrid supervision-based double-layer matching coding mapping recommendation method.
Background
Power generation enterprises have complex asset structures, many asset types, and large data volumes, which easily leads to problems such as confused information, low data quality, and isolated islands of data application, and seriously hinders data sharing and application. To address these problems, the KKS coding system was introduced. It is a coding system that identifies the systems, equipment, components, and structures in a power plant according to function, model, and installation position; it has become the most widely used power plant identification system and has been in use for about 50 years.
However, as the informatization of power generation advances, information assets and virtualized activities increase daily, and the weak standardization of the KKS coding rules becomes more and more apparent. The original identification coding system has therefore been improved to meet the needs of intelligent production, and a number of enterprise-internal coding standards have been proposed. The improved standard codes are easy to implement in newly built power generation enterprises. In old plants that have operated for years, however, the coding rules adopted by the systems in operation are inconsistent, and, for lack of manpower and material resources, it is difficult to organize the effort needed to resolve the coding problem of the whole plant. Invention patent CN201310367698.6, a multi-mode recognition method for unified identification codes of the Internet of Things, decomposes a unified identification code into single-mode code information and then performs matched parsing to resolve the code content carried by each piece of single-mode code information. Invention patent CN201310289939.X provides an intelligent batch KKS coding method for three-dimensional substation design. Neither of the above patents considers the mapping problem between different codes; they innovate only on the coding process. Therefore, in order to further reduce the coding workload, improve the stability of system operation, and unify the coding rules, a code matching method for the code mapping task is urgently needed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a hybrid supervision-based double-layer matching coding mapping recommendation method.
The hybrid supervision-based double-layer matching coding mapping recommendation method comprises the following steps:
step 1, using acquisition equipment to collect, through an interface provided by the database, the original KKS code list and the new KKS code list stored in the database, and storing both lists;
wherein the original KKS code list consists of entries that each pair an English code (denoted c_1 in this document) with its Chinese description (denoted d_1) [formula images not reproduced];
wherein the new KKS code list consists of entries that each pair an English code (denoted c_2) with its Chinese description (denoted d_2) [formula images not reproduced];
step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 [formula image not reproduced] to obtain the supervised matching model training data set D;
step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain the supervised matching model Model;
step 4, carrying out unsupervised matching between the Chinese descriptions d_1 in the original KKS code list and the Chinese descriptions d_2 in the new KKS code list obtained in step 1;
step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;
step 5.1, obtaining a generated result from the supervised matching model Model;
step 5.2, matching the generated result against the new codes by similarity using the minimum edit distance, and storing the result in a storage device.
Preferably, the database in step 1 is a PI database; for the acquisition equipment, a Python data-extraction script is written according to the access rules provided by the PI database, the code lists are obtained through the interface provided by the database, and the original KKS code list and the new KKS code list are stored in CSV format; the original KKS code list follows the coding rules of a design institute or of a DCS manufacturer, and the coding differs from one power plant to another; the new KKS codes follow new coding rules redesigned according to the situation of each power plant; although both sets of rules encode by unit, system, equipment, component, and so on, their specific code forms and construction rules are completely different.
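The CSV storage part of this step can be sketched as follows. This is a minimal illustration only: the PI database access itself is not shown because the text does not describe its interface, and the column names `code` and `description`, the helper names, and the sample KKS codes are all assumptions made for the example.

```python
import csv
import io

# Hypothetical rows: (English KKS code, Chinese description).
original_list = [("10LAB10AP001", "给水泵"), ("10HFC10AJ001", "磨煤机")]

def write_kks_csv(rows, handle):
    """Write (code, description) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["code", "description"])
    writer.writerows(rows)

def read_kks_csv(handle):
    """Read the CSV back into a list of (code, description) tuples."""
    reader = csv.DictReader(handle)
    return [(row["code"], row["description"]) for row in reader]

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO()
write_kks_csv(original_list, buffer)
buffer.seek(0)
round_trip = read_kks_csv(buffer)
```

The same two helpers would serve for both the original and the new code list, since both pair a code with a description.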
Preferably, the amount of manually matched data in step 2 is 5000 entries.
Preferably, step 3 specifically comprises the following steps:
step 3.1, segmenting the data in the supervised matching model training data set D into words with Jieba (a Python Chinese word segmentation component), and then vectorizing the segmentation results with N-Gram to obtain a vectorized training data set [symbol image not reproduced];
step 3.2, inputting the vectorized training data set into the supervised matching model Model; first, a two-layer MLP encoding network generates sparse features, and then a two-layer MLP decoding network generates reconstruction features [formula images for the four layers not reproduced]; the quantities defined in these formulas are the sparse features generated by the first-layer MLP encoding network, the sparse features generated by the second-layer MLP encoding network, the reconstruction features generated by the first-layer MLP decoding network, and the reconstruction features generated by the second-layer MLP decoding network;
step 3.3, computing the cross entropy between the reconstruction features generated by the second-layer MLP decoding network and the label vector to obtain the loss function Loss Function [formula image not reproduced]; in this formula, n is the number of reconstruction features in one batch; the formula sums the cross entropies of the n reconstruction features to give the loss function value for the batch of training data;
step 3.4, iterating the model according to the loss function Loss Function obtained in step 3.3; when the loss function converges, the supervised matching model Model is obtained and stored in a storage device.
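The N-Gram vectorization named in step 3.1 might look like the sketch below. The text does not state the value of N or the exact encoding, so token bigrams and a simple count vector are assumptions here, as are the function names and the sample descriptions. The token lists are written out by hand; in practice they would come from Jieba.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return the list of contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(token_lists, n=2):
    """Assign an index to every n-gram seen in the corpus, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for gram in ngrams(tokens, n):
            vocab.setdefault(gram, len(vocab))
    return vocab

def vectorize(tokens, vocab, n=2):
    """Count vector of n-grams; n-grams outside the vocabulary are ignored."""
    counts = Counter(ngrams(tokens, n))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec

# Hypothetical pre-segmented descriptions (assumed output of a word segmenter).
corpus = [["1号", "机组", "给水泵", "出口", "压力"],
          ["2号", "机组", "给水泵", "入口", "压力"]]
vocab = build_vocab(corpus)
vectors = [vectorize(tokens, vocab) for tokens in corpus]
```

The two descriptions share only the bigram ("机组", "给水泵"), so their vectors overlap in exactly one position.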
Preferably, step 3.1 also extends the word segmentation result to the code segmentation length, padding any shortfall with blank spaces.
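A minimal sketch of this padding is given below. The length of 20 is the preferred value the patent gives for both code tokens and description tokens; the function name, the sample tokens, and the truncation of over-long inputs are assumptions, since the text only describes padding.

```python
def pad_tokens(tokens, length=20, pad=" "):
    """Right-pad a token list with blanks to exactly `length` items.

    Truncating inputs longer than `length` is an assumption of this sketch.
    """
    clipped = list(tokens[:length])
    return clipped + [pad] * (length - len(clipped))

# Hypothetical code tokens and description tokens.
code_tokens = pad_tokens(["10", "LAB", "10", "AP", "001"])
desc_tokens = pad_tokens(["给水泵", "出口", "压力"])
sample = code_tokens + desc_tokens  # 20 + 20 = 40 positions per sample
```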
Preferably, the number of code tokens in step 3.1 is 20, and the number of Chinese-description tokens is also 20; in step 3.2, the input to the first-layer MLP encoding network of the supervised matching model Model is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code tokens and Chinese-description tokens; the output of the first-layer MLP encoding network is 256 × 20 data; the output of the second-layer MLP encoding network is 256 × 10 data; the output of the first-layer MLP decoding network is 256 × 20 data; the output of the second-layer MLP decoding network is 256 × 40 data.
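The layer sizes above (40 → 20 → 10 → 20 → 40 with a batch of 256) can be checked with a small forward pass. This sketch uses only the standard library, random untrained weights, and a ReLU activation; the activation and the initialization are assumptions, since the text does not state them, and no training is performed.

```python
import random

def make_layer(in_dim, out_dim, rng):
    """Random weights (out_dim rows of in_dim values) and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def relu(value):
    return value if value > 0.0 else 0.0

def dense(batch, weights, bias):
    """Apply one fully connected layer with ReLU to every row of the batch."""
    return [[relu(sum(x * w for x, w in zip(row, unit)) + b)
             for unit, b in zip(weights, bias)]
            for row in batch]

rng = random.Random(0)
dims = [40, 20, 10, 20, 40]  # input, encoder 1, encoder 2, decoder 1, decoder 2
layers = [make_layer(i, o, rng) for i, o in zip(dims, dims[1:])]

batch = [[rng.random() for _ in range(40)] for _ in range(256)]
shapes = [(len(batch), len(batch[0]))]
activations = batch
for weights, bias in layers:
    activations = dense(activations, weights, bias)
    shapes.append((len(activations), len(activations[0])))
```

The bottleneck of width 10 is what forces the network to compress the 40-position input before reconstructing it.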
Preferably, when the loss function Loss Function in step 3.4 converges, the loss function value converges to 0.17, and the preset number of iterations is 10000 steps.
Preferably, step 4 specifically comprises the following steps:
step 4.1, obtaining, by word segmentation, the description tokens of the original KKS codes and of the new KKS codes [formula image not reproduced], where w is a token and i is the token number;
step 4.2, computing the similarity between the description tokens of the original KKS codes and those of the new KKS codes obtained in step 4.1 using the minimum edit distance (Edit distance), to obtain the edit-distance similarity score Score between the original KKS code and the new KKS code;
step 4.3, filtering the similarity score Score obtained in step 4.2 against the similarity threshold [symbol image not reproduced]: if Score is below the threshold, the matching fails and step 5 is entered for supervised matching; if Score is above the threshold, the matching succeeds and the matching result is output directly.
Preferably, the similarity threshold in step 4.3 is 85.
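Steps 4.2 and 4.3 can be sketched as below. The text does not say how the edit distance is turned into a score, so the 0 to 100 normalization by the longer sequence length is an assumption, chosen because it is compatible with a threshold of 85. Treating a score exactly equal to the threshold as a success is also an assumption; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity_score(a, b):
    """Map edit distance to a 0-100 score, where 100 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - edit_distance(a, b) / longest)

THRESHOLD = 85  # preferred similarity threshold stated in the text

def unsupervised_match(old_tokens, new_tokens, threshold=THRESHOLD):
    """Return (score, matched); unmatched pairs would go on to step 5."""
    score = similarity_score(old_tokens, new_tokens)
    return score, score >= threshold
```

Because `edit_distance` accepts any sequence, the same function works on raw description strings or on the token lists produced in step 4.1.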
Preferably, the generated result in step 5.1 is obtained from the supervised matching model Model as follows: the supervised matching model Model is served through the computing unit and the data interface of the storage device to obtain the codes and Chinese descriptions under the corresponding rules; the computing unit is computing hardware comprising a CPU, memory, and a GPU; typical computing units include single-chip microcomputers, minicomputers, and ordinary computers, which are abstracted here simply as the computing unit; in step 5.2, 5 recommendation results are obtained after similarity matching.
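The selection of 5 recommendations in step 5.2 might be sketched as follows: the code generated by the model is compared with every new code by edit distance and the 5 closest are kept. The function names, the alphabetical tie-breaking, and the sample KKS codes are assumptions for illustration.

```python
import heapq

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def recommend(generated_code, new_codes, k=5):
    """Return the k new codes closest to the generated code.

    Ties in distance are broken alphabetically (an assumption of this sketch).
    """
    return heapq.nsmallest(
        k, new_codes, key=lambda code: (edit_distance(generated_code, code), code))

# Hypothetical generated code and hypothetical new-code list.
generated = "10LAB10AP001"
new_codes = ["10LAB10AP001", "10LAB10AP002", "10LAB20AP001", "20LAB10AP001",
             "10LAC10AP001", "30HFC10AJ001", "10LAB11AP011"]
top5 = recommend(generated, new_codes)
```

Returning several candidates rather than one leaves the final choice to the person reviewing the mapping.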
The beneficial effects of the invention are as follows: the invention proposes an intelligent KKS code mapping task, in which the acquisition equipment first collects an original KKS code list and a new KKS code list, manual matching is then carried out, supervised training is carried out on the resulting supervised matching model training data set D, unsupervised matching is carried out between the Chinese descriptions in the original KKS code list and the Chinese descriptions in the new KKS code list, and supervised matching is carried out on the data for which unsupervised matching fails; the mapping table of the standard codes can be obtained directly, the workload of standardization work is greatly reduced, the stability of system operation is improved, the underlying data is generalized, and the coding rules are unified.
Drawings
FIG. 1 is a flowchart of a hybrid supervised double-layer matching based coding mapping recommendation method;
FIG. 2 is a diagram of a memory device architecture;
FIG. 3 is a view of the acquisition equipment;
FIG. 4 is a flow chart of a hybrid supervised code based matching method;
FIG. 5 is a diagram of the supervised matching network.
Detailed Description
The present invention is further described below with reference to examples. The following examples are provided only to aid understanding of the invention. It should be noted that a person skilled in the art can make several improvements and modifications to the invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
As the informatization of power generation advances, information assets and virtualized activities increase daily, and the weak standardization of the KKS coding rules becomes more and more apparent. The original identification coding system has therefore been improved to meet the needs of intelligent production, and a number of enterprise-internal coding standards have been proposed. The improved standard codes are easy to implement in newly built power generation enterprises. In old plants that have operated for years, however, the coding rules adopted by the systems in operation are inconsistent, and, for lack of manpower and material resources, it is difficult to organize the effort needed to resolve the coding problem of the whole plant.
Example one
Embodiment one of the present application provides the hybrid supervision-based double-layer matching coding mapping recommendation method shown in FIG. 1 and FIG. 4:
step 1, using acquisition equipment to collect, through an interface provided by the database, the original KKS code list and the new KKS code list stored in the database, and storing both lists;
wherein the original KKS code list consists of entries that each pair an English code (denoted c_1 in this document) with its Chinese description (denoted d_1) [formula images not reproduced];
wherein the new KKS code list consists of entries that each pair an English code (denoted c_2) with its Chinese description (denoted d_2) [formula images not reproduced];
step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 [formula image not reproduced] to obtain the supervised matching model training data set D;
step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain the supervised matching model Model;
step 4, carrying out unsupervised matching between the Chinese descriptions d_1 in the original KKS code list and the Chinese descriptions d_2 in the new KKS code list obtained in step 1;
step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;
step 5.1, obtaining a generated result from the supervised matching model Model;
step 5.2, matching the generated result against the new codes by similarity using the minimum edit distance, and storing the result in a storage device.
Example two
On the basis of the first embodiment, the second embodiment of the present application provides a specific implementation flow of step 3 shown in fig. 5:
step 3.1, segmenting the data in the supervised matching model training data set D into words with Jieba (a Python Chinese word segmentation component), and then vectorizing the segmentation results with N-Gram to obtain a vectorized training data set [symbol image not reproduced];
step 3.2, inputting the vectorized training data set into the supervised matching model Model; first, a two-layer MLP encoding network generates sparse features, and then a two-layer MLP decoding network generates reconstruction features [formula images for the four layers not reproduced]; the quantities defined in these formulas are the sparse features generated by the first-layer MLP encoding network, the sparse features generated by the second-layer MLP encoding network, the reconstruction features generated by the first-layer MLP decoding network, and the reconstruction features generated by the second-layer MLP decoding network;
step 3.3, computing the cross entropy between the reconstruction features generated by the second-layer MLP decoding network and the label vector to obtain the loss function Loss Function [formula image not reproduced]; in this formula, n is the number of reconstruction features in one batch; the formula sums the cross entropies of the n reconstruction features to give the loss function value for the batch;
step 3.4, iterating the model according to the loss function Loss Function obtained in step 3.3; when the loss function converges, the supervised matching model Model is obtained and stored in a storage device.
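Because the formula images for step 3.2 and step 3.3 are not available in this text, the block below gives one conventional way such a network and loss are written. It is an illustrative assumption, not the patent's own notation: the symbols x, W, b, h, r, y and the activation σ are all introduced here, and the loss shown is the standard cross entropy summed over the n reconstruction features in a batch, as the prose describes.

```latex
\begin{aligned}
h_1 &= \sigma(W_1 x + b_1), & h_2 &= \sigma(W_2 h_1 + b_2), \\
r_1 &= \sigma(W_3 h_2 + b_3), & r_2 &= \sigma(W_4 r_1 + b_4), \\
\mathrm{Loss} &= -\sum_{k=1}^{n} \sum_{j} y_{k,j} \, \log r_{2,k,j}
\end{aligned}
```

Here h_1 and h_2 stand for the sparse features of the two encoding layers, r_1 and r_2 for the reconstruction features of the two decoding layers, and y_k for the label vector of the k-th sample.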
EXAMPLE III
On the basis of the first embodiment and the second embodiment, the third embodiment of the present application provides a specific implementation flow of step 4 shown in fig. 4:
step 4.1, obtaining, by word segmentation, the description tokens of the original KKS codes and of the new KKS codes [formula image not reproduced], where w is a token and i is the token number;
step 4.2, computing the similarity between the description tokens of the original KKS codes and those of the new KKS codes obtained in step 4.1 using the minimum edit distance (Edit distance), to obtain the edit-distance similarity score Score between the original KKS code and the new KKS code;
step 4.3, filtering the similarity score Score obtained in step 4.2 against the similarity threshold [symbol image not reproduced]: if Score is below the threshold, the matching fails and step 5 is entered for supervised matching; if Score is above the threshold, the matching succeeds and the matching result is output directly.
Example four
On the basis of embodiments one to three, embodiment four of the present application applies the hybrid supervision-based double-layer matching coding mapping recommendation method to a certain power plant:
step 1, using the acquisition equipment shown in FIG. 3 to collect and construct the original KKS code list [formula image not reproduced], where c_1 is the English code and d_1 is the Chinese description, and the new KKS code list [formula image not reproduced], where c_2 is the English code and d_2 is the Chinese description; the existing code data stored in the ERP system built by the power plant is transmitted to the group side over the group's industrial internet and stored in CSV format;
step 2, manually matching the KKS code lists obtained in step 1 [formula image not reproduced] to obtain the supervised matching model training data set D;
step 3, carrying out supervised training on the training data set D obtained in step 2 to obtain the supervised matching model Model;
step 3.1, segmenting the data in the training data set D into words with Jieba, and then vectorizing the segmentation results with N-Gram [symbol image not reproduced];
step 3.2, inputting the vectorized training data into the supervised matching model; as shown in FIG. 5, a two-layer MLP encoding network first generates sparse features, and a two-layer MLP decoding network then generates reconstruction features [formula images not reproduced]; the input to the encoding layers is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code tokens and Chinese-description tokens; the output of the first layer is 256 × 20, the output of the second layer is 256 × 10, the output of the third layer is 256 × 20, and the output of the fourth layer is 256 × 40;
step 3.3, computing the cross entropy between the reconstructed features and the label vector to obtain the loss function Loss Function;
step 3.4, iterating the model according to the loss function Loss Function; the loss function value loss converges to about 0.1724, giving the supervised matching model Model, which is stored in a storage device for reuse; the device has at least one computing unit with a graphics card of 8 GB or more and provides an API interface service using the Flask framework; as shown in FIG. 2, a solid-state drive and a mechanical hard disk serve as the storage units;
step 4, carrying out unsupervised matching on the KKS code descriptions d_1 and d_2 obtained in step 1, specifically:
step 4.1, obtaining the description tokens by word segmentation [formula image not reproduced], where w is a token and i is the token number, with a maximum value of 20;
step 4.2, computing the similarity between the two kinds of code description tokens obtained in step 4.1 using the minimum edit distance (Edit distance), to obtain the edit-distance similarity score Score of the two;
step 4.3, comparing the similarity score obtained in step 4.2 with the similarity threshold [symbol image not reproduced]: if the score is below the threshold, the matching fails and supervised matching must be carried out in step 5; if the score is above the threshold, the matching succeeds and the result can be stored directly in the storage device;
step 5, retrieving the supervised matching model obtained in step 3 from the storage device and carrying out supervised matching on the data that failed in step 4.3, specifically:
step 5.1, using the model obtained in step 3 to obtain the codes and Chinese descriptions under the corresponding rules;
step 5.2, matching the generated result against the new codes by similarity using the minimum edit distance to obtain 5 recommendation results, and storing the results on the storage device in CSV form for the system to call.
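The way the two layers hand off to each other in steps 4 and 5 can be sketched as one function. This is a simplified illustration under several assumptions: the trained model is replaced by a hypothetical callable that returns a code string, descriptions are compared as raw strings rather than Jieba tokens, the score is normalized to 0 to 100, and all names and sample data are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(a, b):
    """0-100 similarity derived from edit distance (assumed normalization)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1.0 - edit_distance(a, b) / longest)

def hybrid_match(old_desc, new_entries, model, threshold=85, k=5):
    """Layer 1: unsupervised description matching; layer 2: model fallback."""
    best_code, best_desc = max(new_entries,
                               key=lambda entry: similarity(old_desc, entry[1]))
    if similarity(old_desc, best_desc) >= threshold:
        return {"layer": "unsupervised", "codes": [best_code]}
    generated = model(old_desc)  # the supervised model proposes a code string
    ranked = sorted(new_entries,
                    key=lambda entry: edit_distance(generated, entry[0]))
    return {"layer": "supervised", "codes": [code for code, _ in ranked[:k]]}

# Hypothetical new-code list and a stub standing in for the trained model.
new_entries = [("A001", "给水泵出口压力"), ("A002", "凝结水泵入口温度"),
               ("B001", "磨煤机电流")]
stub_model = lambda description: "B002"

direct = hybrid_match("给水泵出口压力", new_entries, stub_model)
fallback = hybrid_match("一次风机振动", new_entries, stub_model)
```

Running the cheap edit-distance layer first means the model is consulted only for descriptions that have no close textual counterpart in the new list.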
Through the above steps, the mapping table of the standard codes can be obtained directly, greatly reducing the workload of standardization work. The production codes of a petroleum enterprise belonging to a certain group were standardized, and the final results are shown in Table 1 below:
Table 1. Production code standardization results for a petroleum enterprise belonging to a certain group
[Table image not reproduced]
According to statistics, the accuracy over 1316 coding results is about 89.9%, which meets the requirements of plant-side personnel while shortening the time needed for standardized coding.
The final results of the code standardization work for a photovoltaic power plant belonging to a certain group are shown in Table 2 below:
Table 2. Code standardization results for a photovoltaic power plant belonging to a certain group
[Table image not reproduced]
The accuracy over 580 coding results is about 91.2%, and manual inspection confirms that the coding results meet the requirements of the standardization work.

Claims (10)

1. A hybrid supervision-based double-layer matching coding mapping recommendation method, characterized by comprising the following steps:
step 1, using acquisition equipment to collect, through an interface provided by the database, the original KKS code list and the new KKS code list stored in the database, and storing both lists;
wherein the original KKS code list consists of entries that each pair an English code (denoted c_1 in this document) with its Chinese description (denoted d_1) [formula images not reproduced];
wherein the new KKS code list consists of entries that each pair an English code (denoted c_2) with its Chinese description (denoted d_2) [formula images not reproduced];
step 2, manually matching the original KKS code list and the new KKS code list obtained in step 1 [formula image not reproduced] to obtain the supervised matching model training data set D;
step 3, carrying out supervised training on the supervised matching model training data set D obtained in step 2 to obtain the supervised matching model Model;
step 4, carrying out unsupervised matching between the Chinese descriptions d_1 in the original KKS code list and the Chinese descriptions d_2 in the new KKS code list obtained in step 1;
step 5, retrieving from the storage device the supervised matching model Model obtained in step 3, and carrying out supervised matching on the data for which the unsupervised matching in step 4 failed;
step 5.1, obtaining a generated result from the supervised matching model Model;
step 5.2, matching the generated result against the new codes by similarity using the minimum edit distance, and storing the result in a storage device.
2. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1, wherein: the database in step 1 is a PI database; for the acquisition equipment, a Python data-extraction script is written according to the access rules provided by the PI database, the code lists are obtained through the interface provided by the database, and the original KKS code list and the new KKS code list are stored in CSV format; the original KKS code list follows the coding rules of a design institute or of a DCS manufacturer; the new KKS codes follow new coding rules redesigned according to the situation of each power plant.
3. The hybrid supervision-based double-layer matching coding mapping recommendation method according to claim 1, wherein: the amount of manually matched data in step 2 is 5000 entries.
4. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 1, wherein step 3 specifically comprises the following steps:
Step 3.1: segment the data in the supervised matching model training dataset D with Jieba to obtain word segmentation results, then vectorize them with N-Gram to obtain the vectorized training dataset [formula image 012];
Step 3.2: input the vectorized training dataset [formula image 012] into the supervised matching model Model; a two-layer MLP encoding network first generates sparse features, where the first-layer MLP encoding network is [formula image 013] and the second-layer MLP encoding network is [formula image 014]; a two-layer MLP decoding network then generates reconstruction features, where the first-layer MLP decoding network is [formula image 015] and the second-layer MLP decoding network is [formula image 016]
[formula image 017]
[formula image 018]
In the above formulas, [formula image 019] is the sparse feature generated by the first-layer MLP encoding network, [formula image 020] is the sparse feature generated by the second-layer MLP encoding network, [formula image 021] is the reconstruction feature generated by the first-layer MLP decoding network, and [formula image 022] is the reconstruction feature generated by the second-layer MLP decoding network;
Step 3.3: compute the cross entropy between the reconstruction feature [formula image 023] generated by the second-layer MLP decoding network and the label vector to obtain the loss function LossFunction:
[formula image 024]
In the above formula, n is the number of reconstruction features in one batch; the formula sums the cross entropies of the n reconstruction features to give the loss function value for that batch of training data;
Step 3.4: iterate the model according to the loss function LossFunction obtained in step 3.3; when LossFunction converges, the supervised matching model Model is obtained and stored in the storage device.
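The batch loss of step 3.3 can be illustrated with a short sketch. The original formula survives only as an image, so the exact form is an assumption here: the code uses the categorical cross entropy, minus the sum of label times log of reconstruction, treats each reconstruction feature as a vector of probabilities, and sums over the n items of a batch as the claim describes. The function names and the `eps` clamp are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(label: Sequence[float], recon: Sequence[float], eps: float = 1e-12) -> float:
    """Cross entropy between one label vector and one reconstruction feature.

    Assumed form: -sum_j label_j * log(recon_j); eps guards against log(0).
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(label, recon))


def batch_loss(labels: Sequence[Sequence[float]], recons: Sequence[Sequence[float]]) -> float:
    """Sum the per-item cross entropies over the n items of one batch."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, recons))
```

If the model instead uses a per-element binary cross entropy, only `cross_entropy` would change; the summation over the batch stays the same.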
5. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 4, wherein step 3.1 further pads the word segmentation result to the number of code tokens, filling any shortfall with blanks.
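The padding described in claim 5 can be sketched as below. The default length of 20 follows claim 6; using a single space as the blank token and truncating results that are longer than the target length are assumptions, since the claim only covers the case where tokens are missing.

```python
from typing import List


def pad_tokens(tokens: List[str], length: int = 20, blank: str = " ") -> List[str]:
    """Pad a word segmentation result with blank tokens up to a fixed length."""
    if len(tokens) >= length:
        # Assumption: overly long results are truncated to the fixed length.
        return tokens[:length]
    return tokens + [blank] * (length - len(tokens))
```

A fixed length is what lets every record be fed to the first MLP layer as a vector of the same width.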
6. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 5, wherein: in step 3.1, the number of code tokens is 20, and the number of Chinese description tokens is also 20; in step 3.2, the input to the first-layer MLP encoding network of the supervised matching model Model is 256 × 40 data, where 256 is the batch size and 40 is the combined number of code tokens and Chinese description tokens; the output of the first-layer MLP encoding network is 256 × 20 data; the output of the second-layer MLP encoding network is 256 × 10 data; the output of the first-layer MLP decoding network is 256 × 20 data; the output of the second-layer MLP decoding network is 256 × 40 data.
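The layer sizes in claim 6 can be checked with a small forward pass written in plain Python. This is a shape walk-through only, not the patented model: the ReLU activation, the random initialization, and the helper names are assumptions, because the layer formulas survive only as images.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """One MLP layer: relu(x @ w + b). ReLU is an assumed activation."""
    return [
        [max(0.0, sum(xi * w[i][j] for i, xi in enumerate(row)) + b[j])
         for j in range(len(b))]
        for row in x
    ]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Small random weights and zero biases for an n_in -> n_out layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def forward_shapes(batch: int = 256,
                   widths: Sequence[int] = (40, 20, 10, 20, 40),
                   seed: int = 0) -> List[Tuple[int, int]]:
    """Run a random batch through encoder (40->20->10) and decoder (10->20->40)."""
    rng = random.Random(seed)
    x = [[rng.random() for _ in range(widths[0])] for _ in range(batch)]
    shapes = [(len(x), len(x[0]))]
    for n_in, n_out in zip(widths, widths[1:]):
        w, b = init_layer(n_in, n_out, rng)
        x = dense(x, w, b)
        shapes.append((len(x), len(x[0])))
    return shapes
```

Calling `forward_shapes()` returns the input shape followed by the output shape of each of the four layers, which should reproduce the 256 × 40, 256 × 20, 256 × 10, 256 × 20, 256 × 40 sequence stated in the claim.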
7. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 4, wherein: when the loss function LossFunction in step 3.4 converges, the loss function value converges to 0.17, and the preset number of iterations is 10000 steps.
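One way to express the stopping rule implied by claim 7 is a loop that ends either at the preset 10000 steps or once the loss has stopped changing. The claim does not say how convergence is detected, so the tolerance `tol`, the `patience` counter, and the `step` callback are hypothetical choices made for this sketch.

```python
from typing import Callable, List


def train(step: Callable[[int], float],
          max_steps: int = 10000,
          tol: float = 1e-4,
          patience: int = 20) -> List[float]:
    """Call step(i) until the loss is stable or max_steps is reached.

    step(i) is assumed to run one training iteration and return its loss.
    """
    losses: List[float] = []
    stable = 0
    for i in range(max_steps):
        loss = step(i)
        # Count consecutive iterations whose loss change is below tol.
        if losses and abs(losses[-1] - loss) < tol:
            stable += 1
        else:
            stable = 0
        losses.append(loss)
        if stable >= patience:
            break
    return losses
```

With a loss curve that flattens out near 0.17, the loop stops well before the step limit; with a loss that keeps oscillating, it runs for the full `max_steps`.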
8. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 1 or 4, wherein step 4 specifically comprises the following steps:
Step 4.1: obtain the description tokens of the original KKS code and of the new KKS code by word segmentation: [formula image 025], where w is a token and i is the number of the token;
Step 4.2: compute the similarity between the description tokens of the original KKS code and those of the new KKS code obtained in step 4.1 using the minimum edit distance, obtaining the edit distance similarity score Score of the original KKS code and the new KKS code;
Step 4.3: filter the similarity score Score obtained in step 4.2 against the similarity threshold [formula image 026]: if Score is below [formula image 026], the matching fails and step 5 is entered to perform supervised matching; if Score is above [formula image 026], the matching succeeds and the matching result is output directly.
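The similarity score of step 4.2 can be sketched with a standard Levenshtein distance over token sequences. How the distance is turned into a score is not given in the claims; the 0 to 100 scaling below, chosen so that a threshold such as 85 is meaningful, is an assumption, as are the function names.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Assumed scaling: 100 * (1 - distance / length of the longer sequence)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)
```

Because the functions accept any sequence, they work both on lists of tokens produced by word segmentation and on raw strings compared character by character.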
9. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 8, wherein: the similarity threshold [formula image 026] in step 4.3 is 85.
10. The double-layer matching coding mapping recommendation method based on hybrid supervision according to claim 1, wherein in step 5.1 the generated result is obtained from the supervised matching model Model as follows: the supervised matching model Model is served through a computing unit and a data interface of the storage device to obtain the code and the Chinese description; and 5 recommendation results are obtained after the similarity matching in step 5.2.
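Selecting the 5 recommendation results of step 5.2 can be sketched as a nearest-neighbour lookup by edit distance between the generated result and the new codes. The function names, the character-level comparison, and the alphabetical tie-break are assumptions; any codes used to try it out would be made-up examples rather than real KKS codes.

```python
import heapq
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level minimum edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def recommend(generated: str, new_codes: List[str], k: int = 5) -> List[str]:
    """Return the k new codes closest to the generated result.

    Ties in distance are broken alphabetically (an assumed rule).
    """
    return heapq.nsmallest(k, new_codes, key=lambda c: (levenshtein(generated, c), c))
```

Returning several candidates rather than a single best match leaves the final choice to a human reviewer, which fits the recommendation framing of the method.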
CN202110905914.2A 2021-08-09 2021-08-09 Double-layer matching coding mapping recommendation method based on hybrid supervision Pending CN113673202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110905914.2A CN113673202A (en) 2021-08-09 2021-08-09 Double-layer matching coding mapping recommendation method based on hybrid supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110905914.2A CN113673202A (en) 2021-08-09 2021-08-09 Double-layer matching coding mapping recommendation method based on hybrid supervision

Publications (1)

Publication Number Publication Date
CN113673202A true CN113673202A (en) 2021-11-19

Family

ID=78541833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110905914.2A Pending CN113673202A (en) 2021-08-09 2021-08-09 Double-layer matching coding mapping recommendation method based on hybrid supervision

Country Status (1)

Country Link
CN (1) CN113673202A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461714A (en) * 2022-01-13 2022-05-10 湖北国际物流机场有限公司 BIM code conversion system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461714A (en) * 2022-01-13 2022-05-10 湖北国际物流机场有限公司 BIM code conversion system
CN114461714B (en) * 2022-01-13 2024-03-29 湖北国际物流机场有限公司 BIM code conversion system

Similar Documents

Publication Publication Date Title
CN113033534B (en) Method and device for establishing bill type recognition model and recognizing bill type
CN109284372B (en) User operation behavior analysis method, electronic device and computer readable storage medium
CN110597735A (en) Software defect prediction method for open-source software defect feature deep learning
CN105022790B (en) A kind of city entity geocoding integration method of object-oriented
CN110515931B (en) Capacitive type equipment defect prediction method based on random forest algorithm
CN111709244B (en) Deep learning method for identifying cause and effect relationship of contradictory dispute
CN109284371A (en) Anti- fraud method, electronic device and computer readable storage medium
CN115526236A (en) Text network graph classification method based on multi-modal comparative learning
CN113673202A (en) Double-layer matching coding mapping recommendation method based on hybrid supervision
CN108805280B (en) Image retrieval method and device
CN110489423B (en) Information extraction method and device, storage medium and electronic equipment
CN115906857A (en) Chinese medicine text named entity recognition method based on vocabulary enhancement
Khalyasmaa et al. Data mining applied to decision support systems for power transformers’ health diagnostics
CN117131449A (en) Data management-oriented anomaly identification method and system with propagation learning capability
CN115617666A (en) GPT2 model-based Chinese test case completion method
CN115345163A (en) Outfield quality analysis method and system based on fault data
CN113032372B (en) ClickHouse database-based space big data management method
CN113343643B (en) Supervised-based multi-model coding mapping recommendation method
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN115423105A (en) Pre-training language model construction method, system and device
CN113643141A (en) Method, device and equipment for generating explanatory conclusion report and storage medium
CN114168720A (en) Natural language data query method and storage device based on deep learning
CN107944045A (en) Image search method and system based on t distribution Hash
CN114611510A (en) Method and device for assisting machine reading understanding based on generative model
CN104951651B (en) It is a kind of that the non-negative view data dimension reduction method optimized with A is constrained based on Hessen canonical

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220823

Address after: Room 307, No. 32, Gaoji Street, Xihu District, Hangzhou City, Zhejiang Province, 310002

Applicant after: Zhejiang Zheneng Digital Technology Co.,Ltd.

Applicant after: ZHEJIANG ENERGY R & D INSTITUTE Co.,Ltd.

Address before: 5/F, Building 1, No. 2159-1, Yuhangtang Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Applicant before: ZHEJIANG ENERGY R & D INSTITUTE Co.,Ltd.