CN112256753A

CN112256753A - Data encryption secure transmission method

Info

Publication number: CN112256753A
Application number: CN202011087757.0A
Authority: CN
Inventors: 于士国
Original assignee: Shandong Sunsam Information Technology Co ltd
Current assignee: Shandong Sunsam Information Technology Co ltd
Priority date: 2020-10-13
Filing date: 2020-10-13
Publication date: 2021-01-22
Anticipated expiration: 2040-10-13
Also published as: CN112256753B

Abstract

A data-mining-based method for predictive processing of engineering data, comprising the steps of: initialization; establishing sub-target data clusters; preprocessing to form effective target data clusters; clustering and analysis; associating data of the same class and establishing a priority number set within the data; setting class encryption IDs and priority number set encryption IDs; class-based data transmission; data verification; and mining.

Description

Data encryption secure transmission method

Technical Field

The invention relates to the field of data analysis, processing and transmission, and in particular to an encrypted secure transmission method for engineering data.

Background

Big data, an IT industry term, refers to data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable time. It is a massive, fast-growing, and diverse information asset that requires new processing modes in order to yield stronger decision-making power, insight, and process optimization capability.

Data mining is an emerging discipline that arose in the 1980s, initially within artificial intelligence research oriented toward commercial applications. From a technical perspective, data mining is the process of obtaining implicit, previously undetected, and potentially valuable information and knowledge from large, complex, irregular, random, and ambiguous data. From a commercial perspective, data mining extracts, converts, and analyzes the contents of a huge database to uncover latent patterns and value, yielding key information and useful knowledge that support business decisions. As a nontrivial process of revealing implicit, previously unknown, and potentially valuable information from large amounts of data in a database, data mining is an active research topic in the fields of artificial intelligence and databases. In short, it is the algorithmic search for information hidden in large amounts of data. Data mining is closely tied to computer science and draws on many methods, such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.

Data mining is a decision support process based mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, and visualization technologies. It analyzes enterprise data in a highly automated manner, performs inductive reasoning, and mines potential patterns from the data, helping decision makers adjust market strategies, reduce risk, and make correct decisions. The knowledge discovery process consists of three phases: (1) data preparation; (2) data mining; and (3) expression and interpretation of results. Data mining may interact with users or with knowledge bases.

Data encryption is a long-established technology in which plaintext is converted into ciphertext by an encryption algorithm and an encryption key, and the ciphertext is restored to plaintext by a decryption algorithm and a decryption key. Its core is cryptography. Data encryption remains the most reliable way for computer systems to protect information: encrypting information with cryptographic techniques conceals it and thereby protects its security. Data transmission encryption encrypts the data stream in transit and takes two forms, link encryption and end-to-end encryption. Link encryption focuses on the link itself, regardless of information source and sink, and protects confidential information by using a different encryption key on each link. In end-to-end encryption, the information is automatically encrypted at the sending end and encapsulated into TCP/IP packets; it then crosses the Internet as unreadable, unrecognizable data and, on reaching its destination, is automatically reassembled and decrypted into readable data. Data storage encryption aims to prevent data from being compromised while stored, and can be divided into ciphertext storage and access control. The former is generally realized by methods such as encryption algorithm conversion, additional encryption codes, and encryption modules; the latter checks and restricts user qualifications and permissions to prevent illegal users from accessing data and legitimate users from accessing data beyond their authorization.

Large volumes of engineering data cannot be handled by conventional processing and storage methods. Big data techniques combined with data mining can effectively address the large data volume and low processing efficiency, but engineering data places higher demands on transmission security. Existing data mining mostly performs clustering and similar operations on the server side of the big data system, so the data is poorly targeted and the computational load is large; in addition, interaction with the client requires bidirectional transfer of large data volumes, which is inefficient and slow. As for encryption, the prior art essentially addresses only the confidentiality of client information; it does not protect the data across the whole data processing flow and, in particular, applies no targeted encryption measures during data transmission, so data security is low.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide an encrypted secure transmission method for engineering data that improves the efficiency and speed of data prediction processing and uses a targeted data encryption scheme, so that data security is high.

The invention provides an encrypted secure transmission method for engineering data, comprising the following steps carried out in sequence:

(1) initializing, namely establishing bidirectional communication connection among a data mining server, a plurality of remote engineering target terminals and a plurality of class node servers respectively;

(2) establishing a sub-target data cluster for each remote engineering target terminal; preprocessing the sub-target data cluster to form an effective target data cluster;

(3) for each remote engineering target end, clustering and analyzing the effective target data clusters, and classifying data in the effective target data clusters according to a preset data selection type;

(4) according to the classification result of the clustering process, respectively aiming at the classification result of each class, performing association processing on the data of the same class, and establishing a priority number set in the data;

(5) based on the classification result of each class, different encryption IDs with association are respectively set for different classes and corresponding priority number sets, specifically:

(5.1) randomly selecting an encryption algorithm and a key, carrying out confidential treatment on the class attributes of different classes, and setting corresponding class encryption IDs aiming at the different classes, wherein the class encryption IDs comprise algorithm index numbers and key index numbers;

(5.2) adding a priority number set index number in the class encryption ID based on the class encryption ID set for different classes to form a priority number set encryption ID corresponding to the priority number set;

(5.3) sending the class encryption ID and the priority number set encryption ID to the data mining server and to the plurality of class node servers;

(5.4) after the data mining server and the class node servers receive the class encryption ID and the priority number set encryption ID, they look up the encryption algorithm and key matching those IDs, so that decryption can be carried out;

(6) the data mining server sends a data transmission instruction to one or more of the plurality of class node servers; each class node server that receives the instruction sets transmission and storage parameters for its class according to the requested transmission class, and sends a class transmission instruction to each of the remote engineering target terminals connected to it, wherein each class node server handles the transmission of one class;

(7) after receiving the transmission instruction, the plurality of remote engineering target terminals each send, in sequence, the priority number sets in their classification results to the class node server matching the transmission instruction;

(8) the class node server verifies the received priority number set encryption ID and, once the verification condition is met, transmits the priority number set of its class to the data mining server, where data mining and prediction are carried out after processing and analysis.

Further, the preprocessing of the sub-target data clusters in step (2) consists of a preliminary screening of the sub-target data clusters followed by cleaning, which removes noise and abnormal data.

Further, the cleaning in step (2) is applied specifically to the data that passed the screening.

Further, the step (4) specifically includes the following steps:

(4.1) randomly selecting high-reliability data from the classification result of each class as first data, and classifying the first data into a priority number set;

(4.2) setting a first threshold based on the first data, sequentially carrying out error processing on other data in each class of classification results and the first data, and classifying second data and third data which fall within the first threshold and have the minimum positive error and the minimum negative error relative to the first data into a priority number set;

(4.3) associating the second and third data with the first data, respectively, while dividing other data that do not fall into the priority number set into a positive error group and a negative error group according to whether they have a positive error or a negative error with respect to the first data;

(4.4) setting a second threshold smaller than the first threshold based on the second data and the third data, sequentially carrying out error processing on the data in the positive error group and the negative error group and the second data and the third data respectively, classifying the fourth data and the fifth data which fall within the second threshold and have the minimum error relative to the second data and the third data into a priority number set, and associating the fourth data and the fifth data with the second data and the third data respectively;

(4.5) continuing the association processing in the same manner as in steps (4.3) to (4.4) until:

a. the required quantity of data in the priority number set is reached, at which point the processing ends; or

b. no data satisfies the corresponding threshold while the required quantity of data in the priority number set has not been reached, in which case high-reliability data is selected again and steps (4.1) to (4.5) are repeated until the required quantity is reached.

Further, the amount of data in the priority number set is within 20% of the amount in the classification result of each class.

Further, the amount of data in the priority number set is 15% of the amount in the classification result of each class.

Further, the method comprises a step (9): when the data mining server needs more complete data for a class, it sends an instruction for complete data transmission, together with the key corresponding to the class encryption ID, directly to the plurality of remote engineering target terminals; the class encryption ID and the key are verified at the remote engineering target terminals against the complete data of that class, and when the verification condition is met the complete data is sent directly to the data mining server for analysis and processing.

The encrypted secure transmission method for engineering data achieves the following:

1) the two-step screening process makes the data more reliable: useful target data is screened out first and only then processed, which effectively increases processing speed, improves targeting, raises the efficiency of the whole prediction processing method at the front end, and provides a sound basis for subsequent processing;

2) clustering the effective target data cluster of each remote engineering target terminal yields preliminarily classified data of different types, so that the data can be packaged by type and is more highly integrated;

3) the data is preprocessed in advance, so that data transmission is targeted and transmission efficiency is greatly improved; at the same time, the data is associated with a high degree of correlation, and chains of associated data are established in both the positive and the negative direction, so that the data has strong continuity and subsequent processing and analysis are correspondingly more accurate;

4) the clustered classification results of the remote engineering target terminals are classified a second time, so that within one transmission request period each class node server collects results of only one class, and the attribute parameters set accordingly (such as optimized transmission length and time for that class of data) make data transmission targeted and markedly more efficient;

5) the classification results and the priority number sets are transmitted to different extents under different strategies, which improves transmission efficiency and makes class-based processing effective, targeted, and efficient; the separately set class encryption IDs and priority number set encryption IDs provide a targeted data encryption scheme, improve data security, and enable data verification.

Drawings

Fig. 1 is a flowchart of an encrypted secure transmission method of engineering data.

Detailed Description

Reference will now be made in detail to the embodiments of the present invention, the following examples of which are intended to be illustrative only and are not to be construed as limiting the scope of the invention.

The invention provides an encrypted secure transmission method for engineering data, the specific flow of which is shown in Fig. 1. The method improves the efficiency and speed of data prediction processing and provides high data security, as described in detail below.

Data mining is a technology for finding rules in a large amount of data by analyzing each piece of data, and mainly comprises three steps: data preparation, rule searching, and rule representation. Data preparation selects the required data from relevant data sources and integrates it into a data set for mining; rule searching finds the rules contained in the data set by some method; rule representation presents the rules found in a way that is as understandable to the user as possible (e.g., visualization). Data mining tasks include association analysis, cluster analysis, classification analysis, anomaly analysis, specific group analysis, evolution analysis, and the like. The invention uses cluster analysis for the specific processing.

The invention provides an encrypted secure transmission method for engineering data, which specifically comprises the following steps carried out in sequence:

First, a sub-target data cluster is established for each remote engineering target terminal. The sub-target data cluster is screened: interference data is filtered out and data related to the mining target is selected. The screened data is then cleaned to remove noise and abnormal data, forming an effective target data cluster. Compared with the prior-art practice of only screening or only cleaning, this two-step process yields more reliable data: useful target data is screened out first and only then processed, which effectively increases processing speed, improves targeting, raises the efficiency of the whole prediction processing method at the front end, and provides a sound basis for subsequent processing.
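The two-step preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the names `screen`, `clean` and `effective_target_cluster`, and the median-absolute-deviation rule used to detect abnormal data are all assumptions, since the text only states that interference data is screened out and that noise and abnormal data are then removed.

```python
from statistics import median


def screen(records, is_relevant):
    """Step 1: keep only records related to the mining target."""
    return [r for r in records if is_relevant(r)]


def clean(records, value_of, max_dev=3.0):
    """Step 2: drop records whose value lies far from the median.

    The median absolute deviation (MAD) rule is an assumption; the source
    only says that noise and abnormal data are removed.
    """
    if not records:
        return []
    values = [value_of(r) for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(records)  # no spread estimate available: keep everything
    return [r for r in records if abs(value_of(r) - med) / mad <= max_dev]


def effective_target_cluster(sub_target_cluster, is_relevant, value_of):
    """Screen first, then clean only the data that passed the screening."""
    screened = screen(sub_target_cluster, is_relevant)
    return clean(screened, value_of)


# Hypothetical sub-target data cluster from one remote target terminal.
sub_target_cluster = [
    {"kind": "strain", "value": 100},
    {"kind": "strain", "value": 102},
    {"kind": "strain", "value": 99},
    {"kind": "strain", "value": 101},
    {"kind": "strain", "value": 550},   # abnormal reading
    {"kind": "debug-log", "value": 1},  # interference data
]

effective = effective_target_cluster(
    sub_target_cluster,
    is_relevant=lambda r: r["kind"] == "strain",
    value_of=lambda r: r["value"],
)
```

Running the cleaning step only on screened data keeps the outlier statistics from being distorted by records that are unrelated to the mining target.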

Second, for each remote engineering target terminal, the effective target data cluster is clustered: the data is analyzed according to its attributes (including but not limited to type, size, and time) and classified according to preset data selection types. Clustering the effective target data cluster of each remote engineering target terminal thus yields preliminarily classified data of different types, and the data is packaged by type according to its attributes.
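As a rough illustration of classifying an effective target data cluster by preset data selection types, the sketch below simply groups records by a type attribute. The text does not name a clustering algorithm, so this grouping stands in for it; `PRESET_TYPES`, the record fields and the function name `classify` are hypothetical.

```python
from collections import defaultdict

# Hypothetical preset data selection types.
PRESET_TYPES = ("temperature", "pressure")


def classify(effective_cluster, preset_types=PRESET_TYPES):
    """Group records by their type attribute, keeping only preset types."""
    classes = defaultdict(list)
    for record in effective_cluster:
        if record["type"] in preset_types:
            classes[record["type"]].append(record)
    return dict(classes)


effective_cluster = [
    {"type": "temperature", "value": 21.5, "time": 1},
    {"type": "pressure", "value": 5.1, "time": 1},
    {"type": "temperature", "value": 21.7, "time": 2},
    {"type": "humidity", "value": 40.0, "time": 2},  # not a preset type
]

classification_result = classify(effective_cluster)
```

Each entry of `classification_result` corresponds to one class whose data can then be packaged and processed together.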

Then, based on the classification results of the clustering, the data of the same class is associated within each class and a priority number set is established within the data, specifically as follows. High-reliability data is randomly selected from the classification result of each class as first data and placed in the priority number set; the high-reliability data may be the data closest to historical standard data, or data whose reliability has been established in some other way, and is chosen according to actual conditions without further limitation. A first threshold (i.e., a range relatively close to the first data) is set based on the first data, and an error (such as the absolute difference or the standard error) is computed in turn between each of the other data in the classification result and the first data. The second and third data, namely the data that fall within the first threshold and have the smallest positive error and the smallest negative error relative to the first data, are placed in the priority number set (positive and negative denote the direction of deviation from the first data) and are each associated with the first data. The other data, which did not enter the priority number set, are divided into a positive error group and a negative error group according to the sign of their error relative to the first data. Next, a second threshold smaller than the first threshold is set based on the second and third data; errors are computed in turn between the data in the positive error group and the second data, and between the data in the negative error group and the third data; the fourth and fifth data, which fall within the second threshold and have the smallest error relative to the second and third data respectively, are placed in the priority number set and associated with the second and third data respectively. The process continues in the same way and ends when the required number of data in the priority number set is reached or when no data satisfies the corresponding threshold. In the latter case the priority number set does not yet contain enough data, so high-reliability data is selected again and the above steps are repeated until the required number is reached.

Further, the number of data in the priority number set is preferably within 20%, more preferably 15%, of the number in the classification result of each class, because the selection should not be too large: when the number is too large, the advantage of setting a priority number set is lost. Data is thus preprocessed in advance at the remote engineering target terminal, so that data transmission is targeted and transmission efficiency is greatly improved; the data is associated with a high degree of correlation, and chains of associated data are established in both the positive and the negative direction, so that continuity is high and subsequent processing and analysis are correspondingly more accurate.
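The construction of the priority number set can be sketched as below for scalar numeric data. Several points are assumptions rather than part of the source: the error is taken as the signed difference, the high-reliability datum is chosen deterministically as the value closest to a historical `reference` (the text selects it randomly among high-reliability data), each successive threshold is the previous one multiplied by a hypothetical `shrink` factor, and the function name `build_priority_set` is invented for illustration.

```python
def build_priority_set(values, reference, first_threshold,
                       shrink=0.8, fraction=0.15):
    """Return (priority_set, links) for one class of numeric data.

    links holds (parent, child) pairs recording which datum each new
    member was associated with.
    """
    target = max(1, int(len(values) * fraction))  # required set size
    remaining = list(values)
    priority_set, links = [], []

    while remaining and len(priority_set) < target:
        # First data: the remaining value closest to the reference.
        seed = min(remaining, key=lambda v: abs(v - reference))
        remaining.remove(seed)
        priority_set.append(seed)
        pos_anchor = neg_anchor = seed
        limit = first_threshold

        while len(priority_set) < target:
            # Candidates within the current threshold on each side.
            pos = [v for v in remaining if 0 < v - pos_anchor <= limit]
            neg = [v for v in remaining if 0 < neg_anchor - v <= limit]
            if not pos and not neg:
                break  # threshold not met: fall back to a new seed
            if pos:
                nxt = min(pos)  # smallest positive error
                remaining.remove(nxt)
                priority_set.append(nxt)
                links.append((pos_anchor, nxt))
                pos_anchor = nxt
            if neg and len(priority_set) < target:
                nxt = max(neg)  # smallest negative error
                remaining.remove(nxt)
                priority_set.append(nxt)
                links.append((neg_anchor, nxt))
                neg_anchor = nxt
            limit *= shrink  # next threshold is smaller than the last

    return priority_set, links


class_values = [100, 101, 98, 105, 90, 120, 103, 99, 110, 80]
priority_set, links = build_priority_set(
    class_values, reference=100, first_threshold=5, fraction=0.5)
```

With the default `fraction=0.15` the set is capped at 15% of the class, matching the preferred proportion stated above; the example uses 0.5 only so that several association rounds are visible on a ten-element class.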

Next, based on the classification result of each class, different but associated encryption IDs are set for the different classes and for their corresponding priority number sets. A decoder is provided at each of the plurality of class node servers; the decoder can decrypt the encryption ID, and the collected data can be authenticated at the class node server to establish a trusted environment for the data, so that data satisfying the decoding requirement is transmitted within the expected transmission time. The class node servers are each connected to the plurality of remote engineering target terminals and to the data mining server.

The different but associated encryption IDs are set for the different classes and their corresponding priority number sets as follows. An encryption algorithm and a key are randomly selected, the class attributes of the different classes are encrypted, and a corresponding class encryption ID is set for each class, the class encryption ID comprising an algorithm index number and a key index number. Then, a priority number set index number is added to the class encryption ID of each class, forming the priority number set encryption ID of the corresponding priority number set. The class encryption ID and the priority number set encryption ID are then sent to the data mining server and to the plurality of class node servers.
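One possible way to represent the two associated IDs is sketched below. The text specifies only that a class encryption ID contains an algorithm index number and a key index number, and that the priority number set encryption ID adds a priority number set index number; the field widths, the hyphenated text encoding and all class and function names here are assumptions.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassEncryptionID:
    algorithm_index: int
    key_index: int

    def encode(self) -> str:
        # Hypothetical text form: "AA-KKKK".
        return f"{self.algorithm_index:02d}-{self.key_index:04d}"


@dataclass(frozen=True)
class PrioritySetEncryptionID:
    class_id: ClassEncryptionID
    set_index: int

    def encode(self) -> str:
        # The class encryption ID with the set index appended.
        return f"{self.class_id.encode()}-{self.set_index:04d}"


def new_class_id(n_algorithms: int, n_keys: int) -> ClassEncryptionID:
    """Randomly select an algorithm and a key by index."""
    return ClassEncryptionID(secrets.randbelow(n_algorithms),
                             secrets.randbelow(n_keys))


class_id = ClassEncryptionID(algorithm_index=1, key_index=7)
set_id = PrioritySetEncryptionID(class_id, set_index=3)
```

Because the priority number set encryption ID embeds the class encryption ID, a receiver can tell which class a priority number set belongs to from the ID alone, which is one reading of the IDs being "associated".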

After receiving the class encryption ID and the priority number set encryption ID, the data mining server and the class node servers look up the encryption algorithm and key matching those IDs, so that decryption can be carried out. It should be noted that the encryption algorithm is selected at random and that the specific selection itself serves as a means of verification; one or more encryption algorithms known in the prior art may be used.

The data mining server sends a data transmission instruction to one or more of the plurality of class node servers. Each class node server that receives the instruction sets transmission and storage parameters for its class according to the requested transmission class and sends a class transmission instruction to each of the remote engineering target terminals connected to it; at this point each class node server handles the transmission of one class. After receiving the transmission instruction, the remote engineering target terminals each send, in sequence, the priority number sets in their classification results to the class node server matching the transmission instruction. In this way the class node servers classify the clustered classification results of the remote engineering target terminals a second time, so that within one transmission request period each class node server collects results of only one class, and the attribute parameters set accordingly (such as the transmission length and time for that class of data) make data transmission targeted and markedly more efficient.
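The routing just described, in which each class node server collects the priority number sets of exactly one class from all connected target terminals, might be modelled as in the sketch below. Networking, the transmission and storage parameters, and ordering guarantees are omitted; the node names, terminal names and the function `dispatch` are hypothetical.

```python
def dispatch(requested_classes, node_for_class, targets):
    """Route priority number sets to the class node server of their class.

    requested_classes: classes named in the data transmission instruction.
    node_for_class:    {class name: class node server name}, one class each.
    targets:           {terminal name: {class name: priority number set}}.
    Returns {class node server name: [(terminal name, priority set), ...]}.
    """
    inbox = {node_for_class[c]: [] for c in requested_classes}
    for c in requested_classes:
        node = node_for_class[c]
        for terminal, sets_by_class in targets.items():
            if c in sets_by_class:
                inbox[node].append((terminal, sets_by_class[c]))
    return inbox


node_for_class = {"temperature": "node-A", "pressure": "node-B"}
targets = {
    "site-1": {"temperature": [20, 21], "pressure": [5]},
    "site-2": {"temperature": [19]},
}

received = dispatch(["temperature"], node_for_class, targets)
```

Only the class named in the instruction is transmitted, so the node for that class receives a homogeneous batch while the other class node servers stay idle.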

The class node server verifies the priority number set encryption ID of the received priority number set and, once the verification condition is met, transmits the priority number set of its class to the data mining server, where it is processed and analyzed; data mining is thereby carried out and the engineering data is predicted from the mining results. When the data mining server needs more complete data for a class, it sends an instruction for complete data transmission, together with the ID key corresponding to the class encryption ID, directly to the plurality of remote engineering target terminals; the class encryption ID and the ID key are verified at the remote engineering target terminals against the complete data of that class, and when the verification condition is met the complete data is sent directly to the data mining server for analysis and processing, so that more comprehensive data is available for mining and prediction.
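A minimal sketch of the two verification points follows: the class node server checking a priority number set encryption ID before forwarding, and a remote target terminal checking the class encryption ID and key before releasing complete data. The source does not define the verification condition, so plain equality of the expected and received values is assumed here; the comparison uses `hmac.compare_digest` from the standard library to avoid timing differences. The function names and the ID strings are hypothetical.

```python
import hmac


def _same(a: str, b: str) -> bool:
    """Constant-time equality check on two text values."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def verify_and_forward(expected_id, received_id, priority_set, forward):
    """Class node server: forward the set only if its ID verifies."""
    ok = _same(expected_id, received_id)
    if ok:
        forward(priority_set)
    return ok


def release_complete_data(stored_class_id, stored_key,
                          requested_class_id, supplied_key, complete_data):
    """Target terminal: return the complete data only if ID and key match."""
    if _same(stored_class_id, requested_class_id) and _same(stored_key, supplied_key):
        return complete_data
    return None


mining_server_inbox = []
accepted = verify_and_forward("01-0007-0003", "01-0007-0003",
                              [100, 101, 99], mining_server_inbox.append)
rejected = verify_and_forward("01-0007-0003", "01-0007-0004",
                              [1, 2, 3], mining_server_inbox.append)
```

A priority number set whose ID fails verification is never forwarded, so the data mining server only receives data that the class node server has accepted.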

Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, and substitutions in form and detail are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims, all of which are intended to fall within the scope of the claims, and that the parts of the claimed product and the steps of the claimed methods may be combined in any combination. The description of the embodiments is therefore intended to describe the present invention, not to limit its scope. Accordingly, the scope of the present invention is not limited by the above embodiments but is defined by the claims and their equivalents.

Claims

1. An encrypted secure transmission method for engineering data, characterized by comprising the following steps carried out in sequence:

(5.3) respectively sending the class encryption ID and the priority number set encryption ID to a data mining server side and a plurality of class node server sides;

2. The method of claim 1, wherein: the preprocessing of the sub-target data clusters in step (2) consists of a preliminary screening of the sub-target data clusters followed by cleaning, which removes noise and abnormal data from the sub-target data clusters.

3. The method of claim 2, wherein: the cleaning in step (2) is applied specifically to the data that passed the screening.

4. The method of claim 1, wherein: the step (4) specifically comprises the following steps:

5. The method of claim 1, wherein: the amount of data in the priority number set is within 20% of the amount in the classification result of each class.

6. The method of claim 5, wherein: the amount of data in the priority number set is 15% of the amount in the classification result of each class.

7. The method of claim 1, wherein: the method further comprises a step (9): when the data mining server needs more complete data for a class, it sends an instruction for complete data transmission, together with the key corresponding to the class encryption ID, directly to the plurality of remote engineering target terminals; the class encryption ID and the key are verified at the remote engineering target terminals against the complete data of that class, and when the verification condition is met the complete data is sent directly to the data mining server for analysis and processing.