CN116506230B - Data acquisition method and system based on RSA asymmetric encryption - Google Patents

Data acquisition method and system based on RSA asymmetric encryption

Info

Publication number
CN116506230B
Authority
CN
China
Prior art keywords
data
format
original
original data
cleaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310771767.3A
Other languages
Chinese (zh)
Other versions
CN116506230A (en
Inventor
杨小剑
陈文忠
欧志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Changying Technology Inc
Original Assignee
Guangdong Changying Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Changying Technology Inc
Priority to CN202310771767.3A
Publication of CN116506230A
Application granted
Publication of CN116506230B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/123 Applying verification of the received information received data contents, e.g. message integrity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00 Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20 Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a data acquisition method, system, electronic device and computer storage medium based on RSA asymmetric encryption. The acquisition method comprises the following steps: acquiring data of an original data source, and analyzing the data to find the problem data in the original data source; establishing the policies and rules of a data cleaning processing model, and importing the problem data in the original data source into the data cleaning processing model to obtain clean target data; performing format conversion on the target data, converting its key-value format into a set data format for output; and encrypting the target data with RSA and forwarding it to a cloud server. The invention can reduce the workload of cleaning and improve its speed and efficiency; because the target data is cleaned and RSA-encrypted before being transmitted to the cloud server, the bandwidth between the edge computing node and the cloud server can be reduced; and because the data is output only after RSA encryption, it is protected against disclosure and tampering during transmission.

Description

Data acquisition method and system based on RSA asymmetric encryption
Technical Field
The invention relates to the technical field of data acquisition, in particular to a data acquisition method and system based on RSA asymmetric encryption.
Background
Smart meters are a typical smart-city application and are widely used in residential life and industrial production. Besides their basic metering function, they can monitor and report water leakage, electricity theft, pipeline abnormalities and the like. As related technologies such as low-power wide-area networks are gradually adopted, their field of application keeps widening. Smart meters handle device information and control instructions; once the data acquisition or instruction issuing process is attacked, the accuracy of the data and the automatic control process are directly affected. Security threats therefore exist in the data acquisition and transmission process, and existing data acquisition schemes have the following defects:
(1) Industrial data involves a large amount of important industrial information and user privacy information; the data is easily intercepted and tampered with during transmission, so the transmission process is insecure;
(2) The original data is reported by the production equipment, and the information system cannot use the original data directly;
(3) The real-time requirement of industrial data acquisition is difficult to guarantee.
In view of this, a new solution is needed to solve the above technical problems.
Disclosure of Invention
The invention mainly aims to provide a data acquisition method and system based on RSA asymmetric encryption, which aims to solve the technical problems of untimely industrial data acquisition and unsafe data transmission of an intelligent meter.
In order to achieve the above purpose, the invention adopts the following technical means:
In a first aspect, the present invention provides a data acquisition method based on RSA asymmetric encryption, which includes:
S101, acquiring data of an original data source, and analyzing the data of the original data source to find out problem data in the original data source;
S201, establishing the policies and rules of a data cleaning processing model, and importing the problem data in the original data source into the data cleaning processing model to obtain clean target data;
S301, performing format conversion on the target data, converting the key-value format of the target data into a set data format for output;
S401, encrypting the target data with RSA and forwarding it to the cloud server.
Optionally, the step S201 further includes:
S601, acquiring a problem data sample set from a simulated original data source;
S602, defining the algorithm for the policies and rules of the data cleaning processing model, importing the problem data sample set into the algorithm of the data cleaning processing model for training, and searching for and determining error instances;
S603, correcting the error instances found, according to the fitting algorithm and the error allowed by the algorithm, to obtain a data format warehouse of the cleaning processing model.
Optionally, searching for and determining error instances in step S602 includes automatically detecting attribute errors and detecting duplicate records;
the algorithm for automatically detecting attribute errors comprises at least one of a statistics-based method, a clustering method and an association rule method;
the algorithm for detecting duplicate records examines two data sets, or one merged data set, to determine the duplicate records that refer to the same real-world entity; it comprises at least one of a field matching algorithm and a recursive field matching algorithm.
Optionally, correcting the error instances found in step S603 includes:
S604, training on the format of the smart meter data to obtain a regular expression model, and storing the model in a database;
S605, receiving the original data transmitted from the acquisition module, and comparing the original data with the patterns stored in the database;
S606, if the similarity exceeds 90%, directly replacing the corresponding part of the original data with the characters in the model.
Optionally, the S301 includes:
S302, extracting values from the attribute fields of the original data source; the original data source comprises at least one attribute, and each attribute field contains its own data information;
S303, identifying and correcting input and spelling errors;
S304, matching and merging the attribute fields of the records of the error instances, and converting the attribute values of the original data source into a unified format.
Optionally, the step S201 includes:
S202, importing the problem data in the original data source into the data format warehouse of the cleaning processing model to find error instances;
S203, correcting the error data of the error instances according to the data analysis of the data format warehouse, to obtain clean original data;
S204, returning the filtered clean original data to obtain the target data.
Optionally, the S204 includes:
replacing the problem data in the original data source with the clean original data to obtain the target data.
Optionally, establishing the policies and rules of the data cleaning processing model in S201 includes:
defining the data cleaning policies and rules according to the amount of problem data in the original data and the severity of the problem data in the data source, and selecting a suitable data cleaning algorithm.
Optionally, the S401 includes:
computing a public key and a private key using the extended Euclidean algorithm to generate the RSA key pair, and encrypting the target data with RSA;
establishing, by the edge device, a connection with the server, and sending the encrypted target data to the cloud server over the HTTPS protocol.
In a second aspect, the present invention further provides a data acquisition system based on RSA asymmetric encryption, configured to implement the above-mentioned acquisition method, where the acquisition system includes a smart meter and an edge device interconnected with the smart meter, and the edge device includes:
the acquisition module, used for acquiring data of an original data source, analyzing the data of the original data source and finding out the problem data in the original data source;
the data cleaning processing module, used for establishing the policies and rules of a data cleaning processing model, and importing the problem data in the original data source into the data cleaning processing model to obtain clean target data;
the format conversion module, used for performing format conversion on the target data, converting the key-value format of the target data into a set data format for output;
and the encryption module, used for encrypting the target data with RSA and forwarding it to the cloud server.
In a third aspect, the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the acquisition method described above.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the steps of the aforementioned acquisition method.
Compared with the prior art, the invention has the following technical effects:
The invention discloses a data acquisition method and system based on RSA asymmetric encryption, with the following effects. Data processing is carried out at the edge where the data is acquired, which ensures the real-time performance of the data. Compared with traditional data cleaning, the original data does not need to be analyzed and corrected item by item, which reduces the cleaning workload and improves the speed and efficiency of cleaning. The format conversion of the target data is performed at the edge, so the data received by the server can be stored directly, which reduces the pressure on the server. The target data is cleaned and RSA-encrypted before being transmitted to the cloud server, so the bandwidth between the edge computing node and the cloud server can be reduced, saving cost. Meanwhile, because the data is output only after RSA encryption, it is protected against disclosure and tampering during transmission.
The data acquisition system is used for realizing the data acquisition method and has similar technical effects as the acquisition method.
Drawings
FIG. 1 is a schematic diagram of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data acquisition system according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, an embodiment of the present invention proposes a data acquisition method based on RSA asymmetric encryption, including:
S101, acquiring data of an original data source, and analyzing the data of the original data source to find out problem data in the original data source;
Here, dirty data in the data source is defined as problem data, as opposed to clean data.
In this embodiment, the data of the original data source is acquired from the smart meter in real time or periodically. The unit that acquires the data source sends the problem data to the data cleaning processing model of the edge device in real time; alternatively, a data source sample set may be collected periodically, and the space in which the sample set is stored may be customized.
A computer program detects and analyzes the data of the original data source to obtain the problem data in it. Because the data is processed at the edge where it is acquired, the real-time performance of the data is ensured.
S201, establishing a strategy and a rule of a data cleaning processing model, and importing problem data in an original data source into the data cleaning processing model to obtain clean target data;
It will be appreciated that the policies and rules of the data cleaning process are the logical algorithms involved in cleaning, comprising an objective function of logical operations, the attribute variables of the problem data, and the error range allowed by the algorithm. Within the allowed error range, if at least one attribute variable of the problem data is erroneous, the result computed by the objective function differs. The objective function, the attribute variables of the problem data and the error range constrain one another and together form the policies and rules of the data cleaning processing model.
Before the policies and rules of the data cleaning processing model are used, the model also needs to be trained on a number of data sources or problem data sample sets that simulate smart meters, to improve the accuracy of the cleaning processing model. In use, the problem data in the original data source is imported into the data cleaning processing model; compared with traditional data cleaning, the original data does not need to be analyzed and corrected item by item, which reduces the cleaning workload and improves the speed and efficiency of cleaning.
S301, performing format conversion on target data, converting a key value format of the target data into a set data format and outputting the set data format;
In one embodiment, the key-value format of the original target data is converted into the corresponding JSON data format for output, so that the data is easy for a person to read and recognize.
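The conversion of step S301 can be illustrated with a minimal sketch. The `key=value;key=value` input layout, the field names and the function name `key_value_to_json` are assumptions made for illustration only; the source does not specify the raw key-value layout produced by the smart meter.

```python
import json


def key_value_to_json(raw: str) -> str:
    """Convert a 'k1=v1;k2=v2' record (hypothetical layout) into a JSON string."""
    record = {}
    for pair in raw.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled separator
        key, _, value = pair.partition("=")
        record[key.strip()] = value.strip()
    # sort_keys gives a stable field order, which keeps the output easy to compare
    return json.dumps(record, sort_keys=True)


sample = "meter_id=WM-001;reading=12.5;unit=m3"
print(key_value_to_json(sample))
```

Values are kept as strings here; a real converter would presumably also apply the set data format's typing rules.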
S401, encrypting the target data with RSA and forwarding it to the cloud server.
RSA is a well-known asymmetric encryption algorithm. It uses two keys: a public key and a private key. Data encrypted with the public key can be decrypted only with the corresponding private key; data encrypted with the private key can be decrypted only with the corresponding public key. An asymmetric encryption algorithm realizes the exchange of confidential information as follows: party A generates a key pair and discloses one of the keys as the public key to other parties; party B, having obtained the public key, uses it to encrypt the confidential information and sends the encrypted information to party A; party A then decrypts the encrypted information with the private key it has kept.
In one embodiment, the step of RSA data encryption comprises:
(1) Selecting two large prime numbers p and q (at present each prime is about 512 bits long, to ensure security);
(2) Calculating the product n = p × q and φ(n) = (p - 1)(q - 1), where φ(n) is the Euler function of n (the Euler function of the product of two primes equals the product of the two primes each reduced by one);
(3) Randomly selecting an integer e (1 < e < φ(n)) as the public exponent, where the greatest common divisor of e and φ(n) must be 1, i.e. the two are coprime;
(4) Calculating the private key d using the extended Euclidean algorithm, where d·e ≡ 1 (mod φ(n)), i.e. d ≡ e^(-1) (mod φ(n)); e and n form the public key, and d is the private key;
(5) Converting the received plaintext into a specific encoding. For example, let p = 43, q = 59 and e = 13, and let the plaintext of the target data be cybergeratwall; with the English letters encoded in order as a = 00, b = 01, ..., z = 25, the encoded plaintext is 0224010417060417001922001111. This code is the input to the RSA encryption of the target data, and the RSA-encrypted data is forwarded to the cloud server for subsequent processing.
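The key generation and encoding steps above can be traced with the small example values p = 43, q = 59, e = 13. The following is a sketch of textbook RSA for illustration only: it uses no padding and toy-sized primes, so it is not suitable for real use. Splitting the encoded digits into 4-digit blocks is an assumption (the source does not state a block size), chosen because every such block is smaller than n.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def modular_inverse(e: int, phi: int) -> int:
    """Solve d*e = 1 (mod phi) with the extended Euclidean algorithm."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e and phi(n) must be coprime")
    return x % phi


def encode_letters(text: str) -> str:
    """Encode lowercase letters as two digits each: a=00, b=01, ..., z=25."""
    return "".join(f"{ord(ch) - ord('a'):02d}" for ch in text)


p, q, e = 43, 59, 13
n = p * q                    # modulus
phi = (p - 1) * (q - 1)      # Euler function of n
d = modular_inverse(e, phi)  # private exponent

digits = encode_letters("cybergeratwall")
# Assumed block size: 4 digits, so that every block value is below n.
blocks = [int(digits[i:i + 4]) for i in range(0, len(digits), 4)]
cipher = [pow(m, e, n) for m in blocks]     # encrypt with public key (e, n)
recovered = [pow(c, d, n) for c in cipher]  # decrypt with private key d

print(n, phi, d)
print(recovered == blocks)
```

With these inputs, n = 2537, φ(n) = 2436 and d = 937, since 13 × 937 = 12181 = 5 × 2436 + 1.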
Because the format conversion of the target data in step S301 is performed at the edge, the data received by the server can be stored directly, which relieves the pressure on the server.
According to this embodiment, the data is RSA-encrypted before transmission, so it is protected against disclosure and tampering in transit. Meanwhile, because the target data is cleaned in step S201 and RSA-encrypted in step S401 before being forwarded to the cloud server, the bandwidth between the edge computing node and the cloud server can be reduced, saving cost.
In one embodiment, the public key and the private key are computed with the extended Euclidean algorithm to generate the RSA key pair, and the target data is RSA-encrypted; the edge device establishes a connection with the server and sends the encrypted target data to the cloud server over the HTTPS protocol, so that the bandwidth between the edge computing node and the cloud server can be reduced, saving cost.
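As a rough sketch of the forwarding step, the encrypted bytes could be wrapped in a JSON body and posted over HTTPS. The endpoint URL, the `payload` field name and the Base64 wrapping are all hypothetical; the source only states that the HTTPS protocol is used. The request is built but not sent, so the sketch runs offline.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; the source does not name a URL.
UPLOAD_URL = "https://cloud.example.com/api/meter-data"


def build_upload_request(ciphertext: bytes, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Wrap ciphertext bytes in a JSON body and prepare an HTTPS POST request."""
    body = json.dumps({"payload": base64.b64encode(ciphertext).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_upload_request(b"\x01\x02\x03")
# urllib.request.urlopen(request) would actually send it; omitted here.
print(request.get_method(), request.full_url)
```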
In one embodiment, step S201 further includes:
S601, acquiring a problem data sample set from a simulated original data source;
S602, defining the algorithm for the policies and rules of the data cleaning processing model, importing the problem data sample set into the algorithm of the data cleaning processing model for training, and searching for and determining error instances;
S603, correcting the error instances found, according to the fitting algorithm and the error allowed by the algorithm, to obtain a data format warehouse of the cleaning processing model.
Specifically, a working scenario of a smart meter is simulated in modeling software, an original data source is acquired in that scenario, and the problem data in it is extracted as the simulated sample set. According to the custom algorithm for the policies and rules of the cleaning processing model, the problem data sample set from the simulated scenario is imported into the algorithm for repeated training, which improves the training precision and sensitivity of the model; an objective function matching the policies and rules of the cleaning processing model is obtained from a limited amount of experimental data. Problem data collected in real time is then imported, and error instances are searched for and determined. The error instances found are corrected according to the fitting algorithm, yielding the data format warehouse of the cleaning processing model.
It will be appreciated that the data format warehouse is the model of the data cleaning process: after repeated training, it can quickly search the imported data, determine error instances, and correct the error instances found. Compared with traditional data cleaning, the original data does not need to be analyzed and corrected item by item, which reduces the cleaning workload and improves the speed and efficiency of cleaning. Meanwhile, the data is output only after the RSA encryption of step S401, so that it is not leaked or tampered with during transmission.
In one embodiment, searching for and determining error instances in step S602 includes an algorithm for automatically detecting attribute errors and an algorithm for detecting duplicate records;
the algorithm for automatically detecting attribute errors comprises at least one of a statistics-based method, a clustering method and an association rule method;
the algorithm for detecting duplicate records examines two data sets, or one merged data set, to determine the duplicate records that refer to the same real-world entity; it comprises at least one of a field matching algorithm and a recursive field matching algorithm.
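The two detection tasks can be sketched as follows. Both functions are simplified stand-ins rather than the patented algorithms: `attribute_outliers` is one possible statistics-based check (median absolute deviation, with an assumed factor `k`), and `field_match_degree` is a basic field matching score in which two tokens match when one is a prefix of the other and the score is the number of matches divided by the average token count. The thresholds and sample records are hypothetical.

```python
import re
import statistics


def attribute_outliers(values, k=5.0):
    """Flag values far from the median, measured in median absolute deviations."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) > k * mad]


def atomic_strings(field):
    """Split a field into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^0-9a-z]+", field.lower()) if t]


def field_match_degree(a, b):
    """Matched tokens divided by the average token count of the two fields."""
    ta, tb = atomic_strings(a), atomic_strings(b)
    if not ta or not tb:
        return 0.0
    unused = list(tb)
    matches = 0
    for tok in ta:
        for cand in unused:
            # Equal tokens or abbreviations (one is a prefix of the other) match.
            if tok.startswith(cand) or cand.startswith(tok):
                unused.remove(cand)
                matches += 1
                break
    return matches / ((len(ta) + len(tb)) / 2)


def find_duplicates(records, threshold=0.8):
    """Return index pairs of records whose match degree reaches the threshold."""
    return [
        (i, j)
        for i in range(len(records))
        for j in range(i + 1, len(records))
        if field_match_degree(records[i], records[j]) >= threshold
    ]


print(attribute_outliers([10.1, 10.3, 9.9, 10.2, 55.0, 10.0]))
print(find_duplicates(["water meter 001", "Water-Meter 001", "gas meter 777"]))
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a sketch but would need blocking or sorting on larger data sets.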
In this embodiment, two computing engines, Flink and TensorFlow, are used for model training according to the characteristics of the smart meter data format.
Correcting the error instances found in step S603 includes:
S604, training on the format of the smart meter data to obtain a regular expression model, and storing the model in a database;
S605, receiving the original data transmitted from the acquisition module, and comparing the original data with the patterns stored in the database;
S606, if the similarity exceeds 90%, directly replacing the corresponding part of the original data with the characters in the model.
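Steps S605 and S606 can be sketched as a lookup against stored patterns. The source does not define how similarity is measured, so this sketch assumes the ratio computed by Python's `difflib.SequenceMatcher`; the contents of `FORMAT_LIBRARY` and the function names are hypothetical.

```python
import difflib

# Hypothetical stored patterns; in the source these come from the trained model.
FORMAT_LIBRARY = ["cumulative_flow_m3", "battery_level", "meter_id"]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; the choice of SequenceMatcher is an assumption."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def correct_fragment(raw: str, library=FORMAT_LIBRARY, threshold: float = 0.90) -> str:
    """Replace raw text with the closest stored pattern if similarity exceeds 90%."""
    best = max(library, key=lambda pattern: similarity(raw, pattern))
    return best if similarity(raw, best) > threshold else raw


print(correct_fragment("cumulative_fl0w_m3"))  # one wrong character
print(correct_fragment("xyz"))                 # nothing similar: left unchanged
```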
In one embodiment, step S301 includes:
S302, extracting values from the attribute fields of the original data source; the original data source comprises at least one attribute, and each attribute field contains its own data information;
S303, identifying and correcting input and spelling errors;
S304, matching and merging the attribute fields of the records of the error instances, and converting the attribute values of the original data source into a unified format.
In this way, the original attributes of the original data source are converted.
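Steps S302 to S304 might look like the following sketch, which extracts attribute values, corrects misspelled attribute names and brings the values into one format. The canonical field names, the spelling cutoff of 0.8 and the per-field formatting rules are assumptions introduced for illustration.

```python
import difflib

CANONICAL_FIELDS = ["meter_id", "reading", "unit"]  # hypothetical attribute names


def correct_field_name(name: str) -> str:
    """Map a possibly misspelled attribute name to its closest canonical name."""
    key = name.strip().lower()
    close = difflib.get_close_matches(key, CANONICAL_FIELDS, n=1, cutoff=0.8)
    return close[0] if close else key


def unify_record(raw: dict) -> dict:
    """Extract each attribute value and convert it to a unified format."""
    unified = {}
    for name, value in raw.items():
        field = correct_field_name(name)
        text = str(value).strip()
        if field == "reading":
            unified[field] = float(text)   # numeric readings as floats
        elif field == "unit":
            unified[field] = text.lower()  # units in lowercase
        elif field == "meter_id":
            unified[field] = text.upper()  # identifiers in uppercase
        else:
            unified[field] = text          # unknown attributes kept as text
    return unified


print(unify_record({"Meter_ID": " wm-001 ", "readng": "12.50", "unit": "M3"}))
```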
In one embodiment, step S201 includes:
S202, importing the problem data in the original data source into the data format warehouse of the cleaning processing model to find error instances;
The above embodiments describe how the data format warehouse is created; in use, the problem data in the original data source is imported into it to find error instances.
S203, correcting the error data of the error instances according to the data analysis of the data format warehouse, to obtain clean original data;
After the original data of the smart meter is obtained, data conforming to a regular expression is extracted through that expression, where the parameters of the regular expression include:
set.t: matches all characters, including line feeds
set.H: performs locale-aware matching
Set.n: matches across multiple lines, affecting both ^ and $
Set.b: makes matching case-insensitive
set.U: interprets characters according to the Unicode character set; this flag affects \w and \b
set.X: allows a more flexible format so that the regular expression can be written in a more readable way
Example: if the set.t parameter is not used, matching is performed within each line only; if no match is found in one line, matching restarts on the next line and never crosses lines. When the set.t parameter is used, the regular expression treats the string as a whole, with "\n" included as an ordinary character, and matches across the whole string.
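The parameter descriptions above closely resemble the flags of Python's `re` module. Assuming that correspondence (set.t as `re.S`, Set.n as `re.M`, Set.b as `re.I`, set.X as `re.X`; this mapping is an inference and is not stated in the source), the line-crossing behaviour can be demonstrated as follows. The sample payload is hypothetical.

```python
import re

raw = "meter=WM-001\nreading=12.5\nMETER=WM-002\nreading=7.0"

# Without DOTALL, '.' stops at a line break, so the match stays within one line.
single_line = re.search(r"meter=.*", raw).group(0)

# With DOTALL (re.S), '.' also matches '\n', so the match runs to the end of the string.
whole_string = re.search(r"meter=.*", raw, re.S).group(0)

# MULTILINE (re.M) anchors '^' and '$' at every line; IGNORECASE (re.I) ignores case.
meter_ids = re.findall(r"^meter=(\S+)$", raw, re.M | re.I)

# VERBOSE (re.X) permits whitespace and comments inside the pattern.
reading_pattern = re.compile(
    r"""
    ^reading=      # field name at the start of a line
    (\d+\.\d+)$    # decimal value up to the end of the line
    """,
    re.X | re.M,
)
readings = reading_pattern.findall(raw)

print(single_line)
print(meter_ids, readings)
```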
S204, returning the filtered clean original data to obtain target data.
The clean original data is returned to replace the problem data in the original data source, yielding the target data.
Establishing the policies and rules of the data cleaning processing model in S201 includes:
defining the data cleaning policies and rules according to the amount of problem data in the original data and the severity of the problem data in the data source, and selecting a suitable data cleaning algorithm.
The data cleaning policies and rules comprise a discard policy for partial data, a completion policy for missing data, and a no-processing policy. After entering the data cleaning logic, the original data is matched against the data format warehouse according to the data analysis result: data whose format-rule similarity is below 60% is of low value and is discarded directly under the discard policy; data whose format-rule similarity is between 60% and 90% is handled under the completion policy, being replaced and supplemented with content from the format library; and data whose format-rule similarity exceeds 90% is returned directly as processed data under the no-processing policy.
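The three-tier rule can be written as a small decision function. This is a sketch under stated assumptions: similarity is taken as a number in [0, 1], the boundary values 60% and 90% are assigned to the completion tier, and the names `Policy`, `choose_policy` and `apply_policy` are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    DISCARD = "discard"
    COMPLETE = "complete"
    PASS_THROUGH = "no processing"


def choose_policy(similarity: float) -> Policy:
    """Select a cleaning policy from the format-rule similarity (0.0 to 1.0)."""
    if similarity < 0.60:
        return Policy.DISCARD
    if similarity <= 0.90:
        return Policy.COMPLETE
    return Policy.PASS_THROUGH


def apply_policy(raw: str, library_entry: str, similarity: float):
    """Return the cleaned value, or None when the record is discarded."""
    policy = choose_policy(similarity)
    if policy is Policy.DISCARD:
        return None
    if policy is Policy.COMPLETE:
        return library_entry  # replace and supplement from the format library
    return raw                # already conforms: returned unchanged


print(choose_policy(0.42), choose_policy(0.75), choose_policy(0.97))
```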
Referring to fig. 2, the present invention further provides a data collection system based on RSA asymmetric encryption, which is configured to implement a data collection method based on RSA asymmetric encryption, where the collection system 800 includes an edge device 700 that interconnects the smart meter 100 and the smart meter, and the edge device 700 includes:
the acquiring module 701 is configured to acquire data of an original data source, analyze the data of the original data source, and find problem data existing in the original data source;
the data cleaning processing module 702 is configured to establish a policy and a rule of a data cleaning processing model, and import the problem data in the original data source into the data cleaning processing model to obtain clean target data;
a format conversion module 703, configured to perform format conversion on the target data, and convert a key value format of the target data into a set data format for output;
and the encryption module 704, configured to perform RSA data encryption on the target data and forward the encrypted target data to the cloud server.
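As a rough illustration of what the encryption module 704 does in principle, the following sketch derives an RSA key pair with the extended Euclidean algorithm, which the claims name for step S401, and encrypts one integer. It is textbook RSA with tiny hard-coded primes and no padding, chosen only so the arithmetic can be followed by hand; the primes, the exponent, and the function names are assumptions, and a real deployment would presumably rely on a vetted cryptographic library with large keys and proper padding.

```python
def extended_gcd(a: int, b: int) -> tuple:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def make_key_pair(p: int, q: int, e: int) -> tuple:
    """Build ((e, n), (d, n)) from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = x % phi  # modular inverse of e, normalized to a positive value
    return (e, n), (d, n)


def encrypt(message: int, public_key: tuple) -> int:
    e, n = public_key
    return pow(message, e, n)


def decrypt(cipher: int, private_key: tuple) -> int:
    d, n = private_key
    return pow(cipher, d, n)


# Toy parameters for illustration only (assumed, not from the patent).
public_key, private_key = make_key_pair(61, 53, 17)
cipher = encrypt(65, public_key)
print(private_key)                  # (2753, 3233)
print(cipher)                       # 2790
print(decrypt(cipher, private_key)) # 65
```

The edge device would hold only the public key for encryption, while the private key stays with the receiving side.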
The specific implementation of the data acquisition system based on RSA asymmetric encryption is basically the same as the above embodiments of the data acquisition method based on RSA asymmetric encryption, and will not be repeated here.
The present invention also provides an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the acquisition method described above.
The specific implementation manner of the electronic device of the present invention is substantially the same as that of each embodiment of the above-mentioned acquisition method, and will not be repeated here.
The present invention also provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the steps of the aforementioned acquisition method.
For the method implemented when the computer instructions are executed, refer to the embodiments of the acquisition method of the present invention; the details are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (9)

1. A data acquisition method based on RSA asymmetric encryption, characterized by comprising the following steps:
S101, acquiring data of an original data source, and analyzing the data of the original data source to find problem data in the original data source;
S201, establishing strategies and rules of a data cleaning processing model, and importing the problem data in the original data source into the data cleaning processing model to obtain clean target data;
S301, performing format conversion on the target data, converting the key-value format of the target data into a set data format, and outputting it;
S401, performing RSA data encryption on the target data and forwarding it to a cloud server;
wherein the step S201 further includes:
S601, acquiring a problem data sample set in a simulated original data source;
S602, defining an algorithm for the strategies and rules of the data cleaning processing model, importing the problem data sample set into the algorithm of the data cleaning processing model for training, and searching for and determining error instances;
S603, correcting the found error instances according to the fitting algorithm and the error allowed by the algorithm, to obtain a data format warehouse of the cleaning processing model.
2. The method according to claim 1, wherein the searching and determining error instances in step S602 includes an automatic attribute error detection algorithm and a repeated record detection algorithm;
the automatic attribute error detection algorithm comprises at least one of a statistical-based method, a clustering method and an association rule method;
the repeated record detection algorithm examines two data sets or one merged data set to determine repeated records that represent the same real-world entity; the repeated record detection algorithm comprises at least one of a field matching algorithm and a recursive field matching algorithm.
3. The method according to claim 1, wherein correcting the found error instance in the step S603 comprises:
S604, training on the format of smart meter data to obtain a regular expression model and storing the model in a database;
S605, receiving the original data transmitted from the acquisition module, and comparing the original data with the model in the database;
S606, if the similarity exceeds 90%, directly replacing the corresponding part of the original data with the characters in the model.
4. The acquisition method according to claim 1, characterized in that said S301 comprises:
S302, extracting values from the attribute fields of the original data source, wherein the original data source comprises at least one attribute and each attribute field contains its own data information;
S303, identifying and correcting input and spelling errors;
S304, matching and merging the attribute fields of the recorded error instances, and converting the attribute values of the original data source into a unified format.
5. The acquisition method according to any one of claims 1 to 4, characterized in that the S201 comprises:
S202, importing problem data in an original data source into a data format warehouse of a cleaning processing model to find out an error instance;
S203, correcting error data of the error instance to obtain clean original data according to data analysis of the data format warehouse;
S204, returning the filtered clean original data to obtain target data.
6. The acquisition method according to claim 5, characterized in that the S204 comprises:
replacing the problem data in the original data source with the clean original data to obtain the target data;
the policies and rules for creating the data cleansing processing model in S201 include:
defining a data cleaning strategy and a rule according to the number of problem data in the original data and the degree of the problem data in the data source, and selecting a proper data cleaning algorithm;
the S401 includes:
calculating a public key and a private key using the extended Euclidean algorithm to form the key pair used for RSA data encryption;
the edge device establishes a connection with the server, and the encrypted target data is sent to the cloud server through the HTTPS protocol.
7. A data acquisition system based on RSA asymmetric encryption, which is used for implementing the acquisition method of any one of claims 1 to 6, the acquisition system comprising an intelligent meter and an edge device interconnected and intercommunicated with the intelligent meter, wherein the edge device comprises:
the acquisition module is used for acquiring data of an original data source, analyzing the data of the original data source and finding out problem data in the original data source;
the data cleaning and processing module is used for establishing a strategy and a rule of a data cleaning and processing model, and importing problem data in an original data source into the data cleaning and processing model to obtain clean target data;
the format conversion module is used for carrying out format conversion on the target data, converting the key value format of the target data into a set data format and outputting the set data format;
and the encryption module is used for performing RSA data encryption on the target data and forwarding the encrypted target data to the cloud server.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the acquisition method of any one of the preceding claims 1-6.
9. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the steps of the acquisition method of any one of the preceding claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771767.3A CN116506230B (en) 2023-06-28 2023-06-28 Data acquisition method and system based on RSA asymmetric encryption

Publications (2)

Publication Number Publication Date
CN116506230A CN116506230A (en) 2023-07-28
CN116506230B true CN116506230B (en) 2023-10-03

Family

ID=87318755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310771767.3A Active CN116506230B (en) 2023-06-28 2023-06-28 Data acquisition method and system based on RSA asymmetric encryption

Country Status (1)

Country Link
CN (1) CN116506230B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117014233B (en) * 2023-10-07 2024-02-09 中国电子科技集团公司第十五研究所 Tamper-resistant contract data acquisition and generation method and tamper-resistant contract data acquisition and generation device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111181587A (en) * 2019-12-19 2020-05-19 胡友彬 Assembled ocean hydrology information wireless transmission system
CN111542083A (en) * 2020-03-24 2020-08-14 浙江中烟工业有限责任公司 Method for collecting and analyzing through industrial wireless network air interface
CN113010506A (en) * 2021-03-11 2021-06-22 江苏省生态环境监控中心(江苏省环境信息中心) Multi-source heterogeneous water environment big data management system
CN113138970A (en) * 2021-04-23 2021-07-20 上海中通吉网络技术有限公司 Real-time statistical analysis system and method for database error logs
US11221778B1 (en) * 2019-04-02 2022-01-11 Pure Storage, Inc. Preparing data for deduplication
CN114240667A (en) * 2022-02-15 2022-03-25 湖南和信安华区块链科技有限公司 Data asset transaction system based on block chain
CN116186013A (en) * 2023-02-24 2023-05-30 江苏建筑职业技术学院 Carbon peak prediction platform system based on Internet of things

Also Published As

Publication number Publication date
CN116506230A (en) 2023-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant