CN115048682B - Safe storage method for land circulation information - Google Patents
- Publication number
- CN115048682B (application CN202210971299.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- land
- segment
- sensitive
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/70—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
- G06F21/78—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention relates to the technical field of data storage, and in particular to a safe storage method for land circulation information. The method divides a plurality of pieces of land circulation information into sensitive data and insensitive data according to their data characteristics; adaptively obtains the sensitive information distribution association degree of the sensitive data from the distribution characteristics of the sensitive data and the insensitive data; and uses that association degree to offset the sensitive data so that the sensitive data is hidden among the insensitive data. Keys are then obtained for the offset land circulation information and the data are encoded and compressed, so that safe storage of land circulation information is simple and efficient, confidentiality is higher, and the security of data storage is improved.
Description
Technical Field
The invention relates to the technical field of data storage, in particular to a safe storage method of land circulation information.
Background
Land circulation is one of the important solutions to rural land problems. Under the traditional management mode, a written land circulation contract is signed, examined and certified, and then stored in the form of data files. This mode has difficulty meeting the requirements of modern land circulation management: its storage security is seriously lacking, so important data are easily lost, the degree of integration is low, and information storage is not sufficiently standardized. An efficient method for the safe storage of land circulation information is therefore needed.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a safe storage method of land circulation information, and the adopted technical scheme is as follows:
collecting N pieces of land circulation information, wherein N is a positive integer, constructing a data matrix according to various information attributes in each piece of land circulation information, each row in the data matrix represents one piece of land circulation information, and each column represents one information attribute; setting a digital semantic label for text data of each piece of land circulation information under the information attribute in the data matrix; determining a data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute, wherein the data type comprises sensitive data and insensitive data;
constructing a rectangular coordinate system under the current land category by taking the land circulation area as a horizontal coordinate and the land transaction price as a vertical coordinate, and acquiring the self-adaptive neighborhood range of each sensitive data point under the current land category according to the distribution of the data points in the rectangular coordinate system; calculating the position difference between the data point corresponding to the current sensitive data and other data points in the self-adaptive neighborhood range of the current sensitive data to obtain the local sensitive information distribution degree of the data point corresponding to the current sensitive data, and obtaining the sensitive information distribution association degree of the current sensitive data according to the local sensitive information distribution degree of each piece of sensitive data in the self-adaptive neighborhood range of the data point corresponding to the current sensitive data;
acquiring the sensitive information distribution association degree of all sensitive data; respectively calculating data offsets corresponding to the land circulation area and the land transaction price according to the sensitive information distribution association degree of each piece of sensitive data, and obtaining the offset land circulation area and the offset land transaction price according to the data offsets; setting the data offsets of the land circulation area and the land transaction price of the insensitive data to 0;
obtaining a key matrix of N pieces of land circulation information according to the data offset of the sensitive data and the insensitive data; and coding and compressing the digital semantic label corresponding to each piece of land circulation information, the land circulation area after deviation and the land transaction price after deviation, and respectively storing the compressed land circulation information and the key matrix.
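The relationship between data offsets, the key matrix and the stored values can be illustrated with a minimal sketch. It assumes that offsets are applied additively and that the key matrix is simply the N x 2 table of offsets (area offset, price offset); neither detail is confirmed by this text, and the function names `apply_offsets` and `restore` are hypothetical. The offset values themselves are taken as given, because the formulas that produce them are not reproduced here.

```python
def apply_offsets(records, offsets):
    """Shift each (area, price) record by its (d_area, d_price) offset.

    ASSUMPTION: offsets are additive. Insensitive rows carry (0.0, 0.0),
    as stated in the method, so they are stored unchanged.
    Returns the shifted records and the key matrix (one row per record).
    """
    shifted = [(a + da, p + dp) for (a, p), (da, dp) in zip(records, offsets)]
    key_matrix = [list(o) for o in offsets]  # stored separately from the data
    return shifted, key_matrix


def restore(shifted, key_matrix):
    """Invert apply_offsets using the separately stored key matrix."""
    return [(a - da, p - dp) for (a, p), (da, dp) in zip(shifted, key_matrix)]


# Row 0 is insensitive (zero offset); row 1 is sensitive and is pulled
# toward the insensitive cluster by an illustrative offset.
records = [(20.0, 8000.0), (100.0, 40000.0)]
offsets = [(0.0, 0.0), (-75.0, -30000.0)]
shifted, key_matrix = apply_offsets(records, offsets)
recovered = restore(shifted, key_matrix)
```

Storing `shifted` and `key_matrix` separately mirrors the final step of the method: the offset records alone no longer reveal which rows were sensitive.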
Further, the method for determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute comprises the following steps:
respectively calculating the average land transaction price and the average land circulation area of a unit area under each land category based on the historically stored land circulation information;
calculating, according to the land category, the land circulation area and the land transaction price of the i-th piece of land circulation information, a first data sensitivity degree of the land circulation area and a second data sensitivity degree of the land transaction price, respectively, wherein the calculation expressions of the first data sensitivity degree and the second data sensitivity degree are as follows:
wherein the expressions (not reproduced in this text) involve: the first data sensitivity degree of the i-th piece of land circulation information; the second data sensitivity degree of the i-th piece of land circulation information; the land circulation area of the i-th piece of land circulation information, whose land category is k; the land transaction price of the i-th piece of land circulation information, whose land category is k; the average land circulation area under land category k; the average land transaction price per unit area under land category k; and the hyperbolic tangent function;
respectively setting a first data sensitivity threshold and a second data sensitivity threshold; when the first data sensitivity degree is greater than or equal to the first data sensitivity threshold, or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, confirming that the i-th piece of land circulation information belongs to sensitive data; when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, confirming that the i-th piece of land circulation information belongs to insensitive data.
Further, the method for obtaining the self-adaptive neighborhood range of the data point corresponding to each piece of sensitive data in the current land category according to the distribution of the data points in the rectangular coordinate system includes:
performing trend line fitting on the cluster formed by the insensitive data in the rectangular coordinate system to obtain a trend line, and dividing the trend line equally so that the cluster is initially divided into 10 data segments, to obtain the interval length of the j-th data segment and the total number of data points in the j-th data segment;
selecting any data point in the j-th data segment as a target data point, taking the target data point as a circle center, obtaining a circle corresponding to the target data point by using a set radius, respectively calculating the data similarity between the target data point and the other data points in the circle, marking the data points whose data similarity is greater than a data similarity threshold, marking the target data point, and counting the total number of marked data points; calculating the ratio between the total number of data points in the j-th data segment and the total number of marked data points, and taking the ratio as the distribution probability of the target data point;
continuing to select one data point from the unmarked data points in the j-th data segment as a target data point, to obtain the distribution probabilities of a plurality of target data points; and obtaining the adaptive interval length of the j-th data segment according to the distribution probabilities of the target data points in the j-th data segment and the interval length;
acquiring the self-adaptive interval length of each segment of data segment to re-divide the clusters to obtain new data segments; and obtaining the self-adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of the data points in the new data segment.
Further, the calculation formula of the data similarity is as follows:
wherein the formula (not reproduced in this text) involves: the data similarity between data point a and data point b; the L2 norm; the coordinates of data point a; the coordinates of data point b; and the natural constant e.
Further, the method for obtaining the adaptive interval length of the j-th data segment according to the distribution probabilities of the target data points in the j-th data segment and the interval length comprises:
calculating the data point distribution characteristic index of the j-th data segment according to the distribution probabilities of the target data points in the j-th data segment, wherein the calculation formula of the data point distribution characteristic index is as follows:
wherein the formula (not reproduced in this text) involves: the data point distribution characteristic index of the j-th data segment; the number of target data points in the j-th data segment; and the distribution probability of each target data point;
acquiring the data point distribution characteristic index of each data segment, and calculating the adaptive interval length of the j-th data segment according to the interval length of the j-th data segment and the data point distribution characteristic indexes of the data segments, wherein the calculation formula of the adaptive interval length of the j-th data segment is as follows:
wherein the formula (not reproduced in this text) involves: the adaptive interval length of the j-th data segment; the interval length of the j-th data segment; the sign function of the j-th data segment; a hyper-parameter; the natural constant e; and the data point distribution characteristic indexes of the data segments.
Further, the method for obtaining the adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of the data points in the new data segment includes:
acquiring the mass center of each new data segment, taking the mass center as a central data point, respectively calculating the Euclidean distance between the data point of the current sensitive data and each central data point, and taking the new data segment corresponding to the shortest Euclidean distance as a target data segment of the current sensitive data;
calculating the adaptive neighborhood range of the data point of the current sensitive data according to the number of data points of the target data segment and the shortest Euclidean distance, wherein the calculation formula (not reproduced in this text) involves: the adaptive neighborhood range of the data point of the current sensitive data; the number of data points of the target data segment J; the shortest of the Euclidean distances between the data point of the current sensitive data and each central data point; and a rounding function.
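Selecting the target data segment for a sensitive data point can be sketched as a nearest-centroid search. The sketch below covers only that selection step; the final formula that turns the point count and shortest distance into a neighborhood range is not reproduced in this text, so it is deliberately left out. The function names and the tuple returned are illustrative assumptions, and every segment is assumed to be non-empty.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of (area, price) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def target_segment(sensitive_point, segments):
    """Return (segment index, shortest distance, point count) for the
    segment whose centroid is closest to the sensitive data point."""
    best = min(
        range(len(segments)),
        key=lambda j: math.dist(sensitive_point, centroid(segments[j])),
    )
    shortest = math.dist(sensitive_point, centroid(segments[best]))
    return best, shortest, len(segments[best])


segments = [
    [(0.0, 0.0), (2.0, 0.0)],                    # centroid (1, 0)
    [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],  # centroid (11, 11)
]
result = target_segment((9.0, 11.0), segments)
```

The two values returned alongside the index (shortest distance and point count) are exactly the inputs the method names for the neighborhood-range formula.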
Further, a calculation formula of the local sensitive information distribution degree of the data point corresponding to the current sensitive data is as follows:
wherein the formula (not reproduced in this text) involves: the local sensitive information distribution degree of the data point corresponding to the i-th piece of sensitive data; the number of data points in the adaptive neighborhood range of the data point corresponding to the i-th piece of sensitive data; the coordinates of a data point r within the adaptive neighborhood range; the coordinates of the data point corresponding to the i-th piece of sensitive data; and the L2 norm.
Further, a calculation formula of the sensitive information distribution association degree of the current sensitive data is as follows:
wherein the formula (not reproduced in this text) involves: the sensitive information distribution association degree of the i-th piece of sensitive data; the average local sensitive information distribution degree of all sensitive data within the adaptive neighborhood range of the i-th piece of sensitive data; the number of pieces of sensitive data within the adaptive neighborhood range of the i-th piece of sensitive data; and the local sensitive information distribution degrees of the data points corresponding to the pieces of sensitive data.
Further, the calculation formula of the land circulation area after the deviation is as follows:
wherein the formula (not reproduced in this text) involves: the offset land circulation area corresponding to the i-th piece of sensitive data; the land circulation area of the i-th piece of sensitive data, whose land category is k; the mean land circulation area of the insensitive data under land category k; the data offset of the land circulation area of the i-th piece of sensitive data; the first data sensitivity degree of the i-th piece of sensitive data; and the first data sensitivity threshold.
Further, the calculation formula of the offset land transaction price is as follows:
wherein the formula (not reproduced in this text) involves: the offset land transaction price corresponding to the i-th piece of sensitive data; the land transaction price of the i-th piece of sensitive data, whose land category is k; the mean land transaction price of the insensitive data under land category k; the data offset of the land transaction price of the i-th piece of sensitive data; the second data sensitivity degree of the i-th piece of sensitive data; and the second data sensitivity threshold.
The embodiment of the invention has at least the following beneficial effects: sensitive data and insensitive data are divided according to the data characteristics of the land circulation information; the sensitive information distribution association degree of the sensitive data is obtained adaptively from the distribution characteristics of the sensitive data and the insensitive data; and the sensitive data are offset using that association degree so that they are hidden among the insensitive data. Keys are then obtained for the offset land circulation information and the data are encoded and compressed, so that safe storage of the land circulation information is simple and efficient, the degree of confidentiality is high, and the security of data storage is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating steps of a method for securely storing land circulation information according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description, the detailed structure, the features and the effects of the method for securely storing land circulation information according to the present invention are provided with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" refers to not necessarily the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the method for securely storing land circulation information provided by the present invention in detail with reference to the accompanying drawings.
The embodiment of the invention aims at the following specific scenes: in the process of safely storing the land circulation information, in order to better store sensitive data, special processing needs to be carried out on the sensitive data. The collected land circulation information is subjected to characteristic analysis, and the data offset of the sensitive data is acquired in a self-adaptive mode to generate a corresponding data key, so that the sensitive data can be safely stored.
Referring to fig. 1, a flowchart of steps of a method for securely storing land circulation information according to an embodiment of the present invention is shown, where the method includes the following steps:
S001, collecting N pieces of land circulation information, wherein N is a positive integer, constructing a data matrix according to various information attributes in each piece of land circulation information, wherein each row in the data matrix represents one piece of land circulation information, and each column represents one information attribute; setting a digital semantic label for text data of each piece of land circulation information under the information attribute in the data matrix; and determining a data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute, wherein the data type comprises sensitive data and insensitive data.
Specifically, N pieces of land circulation information are collected, where N is a positive integer, and the land circulation information is preprocessed to standardize it. The preprocessing is as follows: land circulation information comprises information attributes such as the outflow party, the inflow party, the land category, the circulation mode, the land area and the land transaction price, so a data matrix is constructed from the information attributes of each piece of land circulation information, in which each row represents one piece of land circulation information and each column represents one information attribute.
Further, in the data matrix, the data under the information attributes of outflow party, inflow party, land category and circulation mode are text data, which occupy a large amount of space when encoded and stored. These text data all have obvious semantic features, for example: the outflow party and the inflow party can be divided into semantic labels such as individuals, groups and governments; the land category can be divided into semantic labels such as forest land, cultivated land and home base; and the circulation mode can be divided into semantic labels such as land interchange, land rent, land stock, home-based housing and share cooperation. A digital semantic label can therefore be set for the text data in the land circulation information.
Taking the land circulation mode as an example, a DNN semantic network is used to obtain the digital semantic label of the land circulation mode in each piece of land circulation information. The DNN semantic network is trained as follows: the input data of the DNN semantic network is the land circulation information; the land circulation mode in the land circulation information is labeled, with land interchange set as digital semantic label 0, land rent as digital semantic label 1, land stock as digital semantic label 2, home-based housing as digital semantic label 3 and share cooperation as digital semantic label 4; the task of the DNN semantic network is classification, so a cross-entropy loss function is used.
Similarly, for other text data in the land circulation information, the digital semantic tags are obtained by utilizing respective DNN semantic networks, and then the digital semantic tags of various text data in each piece of land circulation information in the data matrix can be obtained.
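As a small illustration of what the digital semantic labels look like once assigned, the circulation-mode codes listed above can be written as a lookup table. This lookup is only a stand-in for the trained DNN semantic network, which is not reproduced here; the names `CIRCULATION_MODE_LABELS` and `encode_circulation_mode`, and the choice to reject unknown modes, are illustrative assumptions.

```python
# Circulation-mode codes 0..4 as listed in the description.
CIRCULATION_MODE_LABELS = {
    "land interchange": 0,
    "land rent": 1,
    "land stock": 2,
    "home-based housing": 3,
    "share cooperation": 4,
}


def encode_circulation_mode(text: str) -> int:
    """Map a circulation-mode string to its digital semantic label.

    ASSUMPTION: input is normalized by trimming and lower-casing; the
    patent obtains the label from a classifier rather than a lookup.
    """
    key = text.strip().lower()
    if key not in CIRCULATION_MODE_LABELS:
        raise ValueError(f"unknown circulation mode: {text!r}")
    return CIRCULATION_MODE_LABELS[key]


labels = [encode_circulation_mode(m) for m in ["Land rent", "share cooperation"]]
```

Replacing each text attribute with a small integer of this kind is what lets the later encoding and compression step operate on purely numeric rows.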
Text data with information attributes of an out-flowing party, an in-flowing party, land types, a circulation mode and the like are removed from one piece of land circulation information, and data under the residual information attributes are digital data, such as land circulation area and land transaction price. Because the digital semantic tags in the land circulation information do not have specific numeric size meanings, the data type of each piece of land circulation information is mainly analyzed according to the land circulation area and the land transaction price in the digital data, the data type comprises sensitive data and insensitive data, and the specific steps are as follows:
calculating the average land transaction price of the unit area under the same land category based on the historically stored land circulation information, and then respectively obtaining the average land transaction prices of the unit area under all the land categories; and similarly, calculating the average land circulation area under the same land type, and then respectively obtaining the average land circulation areas under all the land types.
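The per-category averages can be sketched as a simple grouping pass over the historical records. The sketch assumes that the unit-area price of a record is its transaction price divided by its circulation area and that the category average is the mean of those per-record values; the text does not say whether it instead means total price divided by total area. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def category_averages(records):
    """records: iterable of (land_category, area, price) tuples.

    Returns {category: (mean_area, mean_unit_price)}.
    """
    areas = defaultdict(list)
    unit_prices = defaultdict(list)
    for category, area, price in records:
        areas[category].append(area)
        # ASSUMPTION: unit-area price of one record = price / area
        unit_prices[category].append(price / area)
    return {
        c: (sum(areas[c]) / len(areas[c]),
            sum(unit_prices[c]) / len(unit_prices[c]))
        for c in areas
    }


history = [
    ("cultivated", 10.0, 5000.0),
    ("cultivated", 30.0, 9000.0),
    ("forest", 20.0, 4000.0),
]
averages = category_averages(history)
```

Grouping by land category first matters because the later sensitivity degrees compare each record only against averages from its own category.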
Consider the i-th of the N pieces of land circulation information, and let its land category be k. From its land circulation area and land transaction price, together with the average land transaction price per unit area and the average land circulation area under land category k, the first data sensitivity degree of the land circulation area and the second data sensitivity degree of the land transaction price in the i-th piece of land circulation information are calculated. The calculation expressions of the first data sensitivity degree and the second data sensitivity degree are:
wherein the expressions (not reproduced in this text) involve: the average land circulation area under land category k; the average land transaction price per unit area under land category k; and the hyperbolic tangent function.
A first data sensitivity threshold and a second data sensitivity threshold are set respectively. When the first data sensitivity degree is greater than or equal to the first data sensitivity threshold, or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, the i-th piece of land circulation information is confirmed to belong to sensitive data; otherwise, when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, the i-th piece of land circulation information is confirmed to belong to insensitive data.
Preferably, in the embodiment of the present invention, the first data sensitivity threshold and the second data sensitivity threshold take empirical values (the specific values are not reproduced in this text); an implementer may set them according to the specific implementation.
And calculating a first data sensitivity degree and a second data sensitivity degree of each piece of land circulation information in the N pieces of land circulation information, and confirming the data type of each piece of land circulation information according to the first data sensitivity degree and the second data sensitivity degree.
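The classification rule can be sketched as follows. The OR/AND threshold logic is taken from the description, but the sensitivity expression itself is an assumption: the sketch uses the hyperbolic tangent of the relative deviation from the category average, which is only one plausible reading of the listed ingredients. The threshold values of 0.5 are placeholders, since the empirical values are not given in this text, and all function names are hypothetical.

```python
import math


def sensitivity_degree(value, category_mean):
    # ASSUMPTION: tanh of the relative deviation from the category mean.
    # The patent's actual expression is not reproduced in this text.
    return math.tanh(abs(value - category_mean) / category_mean)


def is_sensitive(area, price, mean_area, mean_unit_price, t1=0.5, t2=0.5):
    """Apply the rule from the description: a record is sensitive when
    either degree reaches its threshold, insensitive only when both
    degrees stay below their thresholds. t1 and t2 are placeholders."""
    first = sensitivity_degree(area, mean_area)
    second = sensitivity_degree(price / area, mean_unit_price)
    return first >= t1 or second >= t2


typical = is_sensitive(20.0, 8000.0, mean_area=20.0, mean_unit_price=400.0)
large_area = is_sensitive(100.0, 40000.0, mean_area=20.0, mean_unit_price=400.0)
high_price = is_sensitive(20.0, 20000.0, mean_area=20.0, mean_unit_price=400.0)
```

Because tanh maps any non-negative deviation into the interval [0, 1), a single pair of thresholds can be shared across land categories with very different price scales, which may be why the method uses it.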
S002, constructing a rectangular coordinate system under the current land category by taking the land circulation area as a horizontal coordinate and the land transaction price as a vertical coordinate, and acquiring the self-adaptive neighborhood range of each sensitive data point under the current land category according to the distribution of the data points in the rectangular coordinate system; and calculating the position difference between the data point corresponding to the current sensitive data and other data points in the self-adaptive neighborhood range of the current sensitive data to obtain the local sensitive information distribution degree of the data point corresponding to the current sensitive data, and obtaining the sensitive information distribution association degree of the current sensitive data according to the local sensitive information distribution degree of each piece of sensitive data in the self-adaptive neighborhood range of the data point corresponding to the current sensitive data.
Specifically, because the land circulation area and the land transaction price are closely related, a rectangular coordinate system is constructed for each land category by taking the land circulation area as the horizontal coordinate and the land transaction price as the vertical coordinate. The land circulation area and the land transaction price in each piece of land circulation information form a data point whose coordinates are (land circulation area, land transaction price). The rectangular coordinate system of each land category contains a corresponding cluster: the insensitive data under the same land category gather into a cluster, while the sensitive data are distributed separately in the rectangular coordinate system. The distribution characteristic of the sensitive data in the rectangular coordinate system is therefore represented by calculating the sensitive information distribution association degree of the sensitive data. Taking land category k as an example, the distribution characteristic of each piece of sensitive data under that land category is analyzed as follows:
(1) Constructing the rectangular coordinate system of land category k, and obtaining, according to the distribution of data points in the rectangular coordinate system, the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under land category k.
Specifically, trend line fitting is performed on the cluster formed by the insensitive data in the rectangular coordinate system to obtain a trend line, and the trend line is divided equally so that the cluster is initially divided into 10 data segments. The interval length of each data segment is obtained and the total number of data points in each data segment is counted, where j denotes the sequence number of the data segment.
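The initial division can be sketched with an ordinary least-squares line and an equal split along its direction. The text does not specify the fitting method or how points are assigned to segments, so both are assumptions here: a straight-line fit is used, and each point is assigned by its projection onto the line direction. Function names are hypothetical, and the input is assumed to contain at least two distinct horizontal coordinates.

```python
import math


def fit_trend_line(points):
    """Least-squares line y = slope * x + intercept (ASSUMED fit type)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def split_into_segments(points, n_segments=10):
    """Divide the cluster into equal-length segments along the trend line.

    Returns (interval_length, segments), where segments[j] lists the
    points whose projection falls in the j-th interval.
    """
    slope, _ = fit_trend_line(points)
    norm = math.hypot(1.0, slope)
    # Scalar position of each point along the unit direction (1, slope)/norm.
    t = [(x + slope * y) / norm for x, y in points]
    lo, hi = min(t), max(t)
    length = (hi - lo) / n_segments
    segments = [[] for _ in range(n_segments)]
    for p, ti in zip(points, t):
        idx = min(int((ti - lo) / length), n_segments - 1)  # clamp the far end
        segments[idx].append(p)
    return length, segments


cluster = [(float(i), 2.0 * i + 1.0) for i in range(20)]
slope, intercept = fit_trend_line(cluster)
length, segments = split_into_segments(cluster)
counts = [len(s) for s in segments]
```

The per-segment counts in `counts` correspond to the totals the method records for each of the 10 initial data segments before the interval lengths are adapted.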
Because the equal division into 10 segments does not take the distribution characteristics of the data into account, the interval length is adjusted according to the data points in each segment. Because data points that are closer together are more strongly correlated, the data similarity between each data point in a segment and its surrounding neighborhood data points is calculated; the calculation formula of the data similarity is as follows:
wherein the formula (not reproduced in this text) involves: the data similarity between data point a and data point b; the L2 norm; the coordinates of data point a; and the coordinates of data point b.
Any data point in the j-th data segment is selected as a target data point. Taking the target data point as the circle center, a circle corresponding to the target data point is obtained using a set radius, and the data similarity between the target data point and each of the other data points in the circle is calculated. A data similarity threshold is set, the data points whose data similarity is greater than the data similarity threshold are marked, the target data point is marked, and the total number of marked data points is counted. The ratio between the total number of data points in the j-th data segment and the total number of marked data points is calculated, and the ratio is taken as the distribution probability of the target data point. One of the unmarked data points is then selected as the next target data point, and the operation is repeated until all data points in the j-th data segment are marked.
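The marking procedure can be sketched as a loop over the unmarked points of one data segment. Several details are assumptions, because the corresponding formulas are not reproduced in this text: the similarity is taken as the exponential of the negative Euclidean distance, the distribution probability is taken as the number of points marked in a round divided by the segment total, and only still-unmarked points are considered in each round. The function names are hypothetical.

```python
import math


def similarity(p, q):
    # ASSUMPTION: exp(-L2 distance), so closer points are more similar.
    return math.exp(-math.dist(p, q))


def distribution_probabilities(segment, radius, sim_threshold):
    """Repeatedly pick an unmarked target point, mark it together with the
    similar points inside its circle, and record one probability per target.

    ASSUMPTION: probability = points marked in this round / segment total.
    """
    unmarked = list(range(len(segment)))
    probabilities = []
    while unmarked:
        target = unmarked[0]
        marked = {target}
        for j in unmarked[1:]:
            inside = math.dist(segment[target], segment[j]) <= radius
            if inside and similarity(segment[target], segment[j]) > sim_threshold:
                marked.add(j)
        probabilities.append(len(marked) / len(segment))
        unmarked = [j for j in unmarked if j not in marked]
    return probabilities


segment = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = distribution_probabilities(segment, radius=1.0, sim_threshold=0.5)
```

Under these assumptions every point is marked exactly once, so the probabilities of one segment sum to 1; a tightly packed segment yields a few large values, while a scattered one yields many small values.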
From the distribution probabilities of the target data points in the data segment, the data point distribution characteristic index of that data segment is calculated by the following formula:
wherein the quantities are, in order: the data point distribution characteristic index of the data segment; the number of target data points in the data segment; and the distribution probability of each target data point.
Similarly, the data point distribution characteristic index of every data segment is obtained by the same method. The denser the data points in a data segment, the smaller the interval length should be; conversely, the sparser the data points, the larger the interval length. The interval length of each data segment is therefore adjusted according to its data point distribution characteristic index to obtain the adaptive interval length of each segment. The calculation formula of the adaptive interval length of a data segment is as follows:
wherein the quantities are, in order: the adaptive interval length of the data segment; the interval length of the data segment; the data point distribution characteristic indexes of the data segments; a hyper-parameter used to adjust the value, which takes an empirical reference value; and the sign function of the data segment, which is defined with respect to a set data distribution characteristic threshold.
The cluster is re-divided according to the adaptive interval length of each data segment to obtain new data segments, and the number of data points in each new data segment is counted.
The adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the land category is then calculated as follows. The centroid of each new data segment is obtained and taken as a central data point. The Euclidean distance between the data point of the current sensitive data and each central data point is calculated, and the new data segment with the shortest Euclidean distance is taken as the target data segment of the current sensitive data. The adaptive neighborhood range of the data point of the current sensitive data is calculated from the number of data points of the target data segment and the shortest Euclidean distance, wherein the quantities in the calculation formula are, in order: the adaptive neighborhood range of the data point of the current sensitive data; the number of data points of the target data segment J; the shortest of the Euclidean distances between the data point of the current sensitive data and the central data points; and a rounding function.
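The selection of the target data segment described above can be sketched as follows. This is an illustrative Python sketch; the function names `centroid` and `target_segment` are hypothetical, and the final neighborhood-range formula is deliberately left out because it is not reproduced in this text. The sketch returns only the two inputs that formula needs: the point count of the target segment and the shortest distance.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (area, price) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def target_segment(sensitive_point, segments):
    """Pick the new data segment whose centroid is nearest to the sensitive point.

    Returns (segment index, number of points in that segment, shortest distance).
    """
    best = None
    for index, segment in enumerate(segments):
        distance = math.dist(sensitive_point, centroid(segment))
        if best is None or distance < best[2]:
            best = (index, len(segment), distance)
    return best


segments = [
    [(0.0, 0.0), (2.0, 0.0)],                  # centroid (1, 0)
    [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],  # centroid (11, 11)
]
index, count, shortest = target_segment((9.0, 9.0), segments)
```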
(2) The position difference between the data point corresponding to each piece of sensitive data and the other data points within its adaptive neighborhood range is calculated to obtain the local sensitive information distribution degree of the data point corresponding to each piece of sensitive data.
Specifically, step (1) determines the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the land category. From the distribution of the data points within the adaptive neighborhood range of the data point corresponding to a piece of sensitive data, the local sensitive information distribution degree of that piece of sensitive data is calculated; it characterizes the neighborhood distribution of the corresponding data point. The calculation expression of the local sensitive information distribution degree of the data point corresponding to the piece of sensitive data is:
wherein the quantities are, in order: the number of data points within the adaptive neighborhood range of the data point corresponding to the piece of sensitive data; the coordinates of a data point r within the adaptive neighborhood range; the coordinates of the data point corresponding to the piece of sensitive data; and the L2 norm.
(3) The sensitive information distribution association degree of each piece of sensitive data is obtained from the local sensitive information distribution degrees of the pieces of sensitive data within the adaptive neighborhood range of its corresponding data point.
Specifically, step (2) yields the local sensitive information distribution degree of each piece of sensitive data. The sensitive information distribution association degree of the current sensitive data is then obtained from the local sensitive information distribution degrees of the other sensitive data within its adaptive neighborhood range. The calculation expression of the sensitive information distribution association degree is as follows:
wherein the quantities are, in order: the sensitive information distribution association degree of the piece of sensitive data; the average local sensitive information distribution degree of all the sensitive data within the adaptive neighborhood range of the piece of sensitive data; the number of all the sensitive data within the adaptive neighborhood range of the piece of sensitive data; and the local sensitive information distribution degree of the data point corresponding to the piece of sensitive data.
It should be noted that a sensitive information distribution association degree greater than 1 indicates that the local sensitive information distribution degree of the data point corresponding to the sensitive data is smaller than that of the other data points within the adaptive neighborhood range; a sensitive information distribution association degree less than 1 indicates that the local sensitive information distribution degree of the data point corresponding to the sensitive data is greater than that of the other data points within the adaptive neighborhood range.
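The behavior described above is consistent with a ratio of the neighborhood average to the point's own value, which can be sketched as follows. This is an assumed reading inferred from the term definitions and the note on values above and below 1, since the original expression is not reproduced in this text; the function name `association_degree` is hypothetical, and whether the current point is itself counted among its neighbors is left to the caller.

```python
def association_degree(own_degree, neighbor_degrees):
    """Assumed form: mean local degree of neighboring sensitive data / own local degree.

    own_degree: local sensitive information distribution degree of the current point.
    neighbor_degrees: local degrees of the sensitive data in its adaptive neighborhood.
    """
    if not neighbor_degrees:
        raise ValueError("at least one neighboring sensitive data point is required")
    if own_degree == 0:
        raise ValueError("own local degree must be non-zero")
    mean_neighbor_degree = sum(neighbor_degrees) / len(neighbor_degrees)
    return mean_neighbor_degree / own_degree


# Own degree smaller than the neighborhood average: result is greater than 1.
sparse_case = association_degree(2.0, [4.0, 6.0])
# Own degree larger than the neighborhood average: result is less than 1.
dense_case = association_degree(5.0, [2.0, 3.0])
```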
Step S003, acquiring the sensitive information distribution association degrees of all the sensitive data; respectively calculating the data offsets corresponding to the land circulation area and the land transaction price according to the sensitive information distribution association degree of each piece of sensitive data, and obtaining the offset land circulation area and the offset land transaction price according to the data offsets; and setting the data offsets of the land circulation area and the land transaction price of the insensitive data to 0.
Specifically, in order to store the sensitive data safely, the method of step S002 is used to obtain the sensitive information distribution association degrees of all the sensitive data in the N pieces of land circulation information, and data offsets are calculated for the sensitive data so that the sensitive data is hidden among the insensitive data. The larger the sensitive information distribution association degree of a piece of sensitive data, the smaller the local sensitive information distribution degree of its data point relative to that of the other data points within its adaptive neighborhood range, the sparser the sensitive data, and the larger the data offset that needs to be applied; the smaller the sensitive information distribution association degree, the greater the local sensitive information distribution degree of its data point relative to that of the other data points within the adaptive neighborhood range, the denser the sensitive data, and the smaller the data offset that needs to be applied.
The data offsets corresponding to the land circulation area and the land transaction price are calculated according to the sensitive information distribution association degree of each piece of sensitive data, and the offset land circulation area and the offset land transaction price are obtained according to the data offsets. Taking one piece of sensitive data as an example, the calculation formula of the offset land circulation area is as follows:
wherein the quantities are, in order: the offset land circulation area corresponding to the piece of sensitive data; the land circulation area of the piece of sensitive data under its land category; the mean land circulation area of the insensitive data under the land category; the hyperbolic tangent function; the data offset of the land circulation area of the piece of sensitive data; and the first data sensitivity degree of the piece of sensitive data.
The first condition indicates that the piece of sensitive data meets the sensitivity requirement for the land circulation area and that its difference from the average land circulation area is negative, so the offset needs to be increased; the second condition indicates that the piece of sensitive data meets the sensitivity requirement for the land circulation area and that its difference from the average land circulation area is positive, so the offset needs to be reduced.
The calculation formula of the land transaction price after deviation is as follows:
wherein the quantities are, in order: the offset land transaction price corresponding to the piece of sensitive data; the land transaction price of the piece of sensitive data under its land category; the mean land transaction price of the insensitive data under the land category; the hyperbolic tangent function; the data offset of the land transaction price of the piece of sensitive data; and the second data sensitivity degree of the piece of sensitive data.
The first condition indicates that the piece of sensitive data meets the sensitivity requirement for the land transaction price and that its difference from the average land transaction price is negative, so the offset needs to be increased; the second condition indicates that the piece of sensitive data meets the sensitivity requirement for the land transaction price and that its difference from the average land transaction price is positive, so the offset needs to be reduced.
From the calculation formulas of the offset land circulation area and the offset land transaction price, the data offset of the land transaction price and the data offset of the land circulation area of the piece of sensitive data are obtained, and in the same way the data offsets of the land transaction price and the land circulation area of every piece of sensitive data can be obtained. Meanwhile, the data offsets of the land circulation area and the land transaction price of the insensitive data are set to 0.
Step S004, obtaining a key matrix of the N pieces of land circulation information according to the data offsets of the sensitive data and the insensitive data; and encoding and compressing the digital semantic label, the offset land circulation area and the offset land transaction price corresponding to each piece of land circulation information, and storing the compressed land circulation information and the key matrix separately.
Specifically, for each piece of sensitive data, the data offset of the land transaction price and the data offset of the land circulation area are each binary coded, and a key matrix is generated from the binary codes. The size of the key matrix is 2 × c, where c is the larger number of binary digits of the two data offsets; if a binary code has fewer digits, zeros are padded at its highest bit. In this way each piece of sensitive data corresponds to one key matrix.
Similarly, the data offsets of each piece of insensitive data are also binary coded; because the data offsets of the land circulation area and the land transaction price of the insensitive data are both 0, all digits of their binary codes are 0. The key matrices of the sensitive data and the insensitive data are combined and spliced to form an overall key matrix of size (2 × N) × C, where C is the maximum number of binary digits among the sensitive data and the insensitive data; if a binary code has fewer digits, zeros are padded at its highest bit.
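The construction of the overall key matrix can be illustrated with the short Python sketch below. It assumes, for illustration only, that the data offsets have already been mapped to non-negative integers (the text does not state how fractional or signed offsets are encoded) and that the land circulation area row precedes the land transaction price row; the function name `key_matrix` is hypothetical.

```python
def key_matrix(offsets):
    """Build the overall (2 x N) x C key matrix as a list of bit rows.

    offsets: list of (area_offset, price_offset) pairs, one pair per piece of
    land circulation information; insensitive data uses (0, 0).
    Assumption: every offset is a non-negative integer.
    """
    for pair in offsets:
        for value in pair:
            if not isinstance(value, int) or value < 0:
                raise ValueError("offsets must be non-negative integers in this sketch")
    # C is the largest number of binary digits over all offsets (at least 1).
    width = max([1] + [value.bit_length() for pair in offsets for value in pair])
    rows = []
    for area_offset, price_offset in offsets:
        # format(..., "0{width}b") pads zeros at the highest bits.
        rows.append([int(bit) for bit in format(area_offset, f"0{width}b")])
        rows.append([int(bit) for bit in format(price_offset, f"0{width}b")])
    return rows


# One sensitive record with offsets (5, 2) and one insensitive record (0, 0).
matrix = key_matrix([(5, 2), (0, 0)])
# matrix == [[1, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

The rows of an insensitive record are all zeros, so the matrix alone reveals which records were offset; this is one reason for keeping it in a separate database from the compressed land circulation information.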
Further, the digital semantic labels, the offset land circulation areas and the offset land transaction prices corresponding to the N pieces of land circulation information are encoded and compressed, wherein the offset land circulation area and the offset land transaction price of the insensitive data among the N pieces of land circulation information are the original data. The compressed land circulation information and the overall key matrix are stored in two separate databases, that is, one database stores the compressed land circulation information and the other stores the overall key matrix, which completes the storage of the land circulation information.
In summary, the embodiment of the present invention provides a method for safely storing land circulation information. The method divides the land circulation information into sensitive data and insensitive data according to its data characteristics, adaptively obtains the sensitive information distribution association degree of the sensitive data according to the distribution characteristics of the sensitive data and the insensitive data, and offsets the sensitive data using the sensitive information distribution association degree so that the sensitive data is hidden among the insensitive data. The key is then acquired and the offset land circulation information is encoded and compressed, thereby achieving simple and efficient safe storage of the land circulation information with a higher degree of confidentiality and improved security of data storage.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit of the present invention are intended to be included therein.
Claims (10)
1. A safe storage method of land circulation information is characterized by comprising the following steps:
collecting N pieces of land circulation information, wherein N is a positive integer, constructing a data matrix according to various information attributes in each piece of land circulation information, wherein each row in the data matrix represents one piece of land circulation information, and each column represents one information attribute; setting a digital semantic label for text data of each piece of land circulation information under the information attribute in the data matrix; determining a data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute, wherein the data type comprises sensitive data and insensitive data;
constructing a rectangular coordinate system under the current land category by taking the land circulation area as the horizontal coordinate and the land transaction price as the vertical coordinate, and acquiring the adaptive neighborhood range of each sensitive data point under the current land category according to the distribution of the data points in the rectangular coordinate system; calculating the position difference between the data point corresponding to the current sensitive data and the other data points within the adaptive neighborhood range of the current sensitive data to obtain the local sensitive information distribution degree of the data point corresponding to the current sensitive data, and obtaining the sensitive information distribution association degree of the current sensitive data according to the local sensitive information distribution degree of each piece of sensitive data within the adaptive neighborhood range of the data point corresponding to the current sensitive data; acquiring the sensitive information distribution association degrees of all the sensitive data; respectively calculating the data offsets corresponding to the land circulation area and the land transaction price according to the sensitive information distribution association degree of each piece of sensitive data, and obtaining the offset land circulation area and the offset land transaction price according to the data offsets; setting the data offsets of the land circulation area and the land transaction price of the insensitive data to 0;
obtaining a key matrix of N pieces of land circulation information according to the data offset of the sensitive data and the insensitive data; and coding and compressing the digital semantic label corresponding to each piece of land circulation information, the land circulation area after the deviation and the land transaction price after the deviation, and respectively storing the compressed land circulation information and the key matrix.
2. The method for safely storing land circulation information as claimed in claim 1, wherein the method for determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute comprises the following steps:
respectively calculating the average land transaction price and the average land circulation area of a unit area under each land category based on the historically stored land circulation information;
calculating, according to the land category, the land circulation area and the land transaction price of a piece of land circulation information, a first data sensitivity degree of the land circulation area and a second data sensitivity degree of the land transaction price, respectively, wherein the calculation expressions of the first data sensitivity degree and the second data sensitivity degree are as follows:
wherein the quantities are, in order: the first data sensitivity degree of the piece of land circulation information; the second data sensitivity degree of the piece of land circulation information; the land circulation area of the piece of land circulation information under its land category; the land transaction price of the piece of land circulation information under its land category; the average land circulation area under the land category; the average land transaction price per unit area under the land category; and the hyperbolic tangent function;
setting a first data sensitivity threshold and a second data sensitivity threshold respectively; when the first data sensitivity degree is greater than or equal to the first data sensitivity threshold or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, confirming that the piece of land circulation information belongs to sensitive data; and when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, confirming that the piece of land circulation information belongs to insensitive data.
3. The method for safely storing land circulation information as claimed in claim 1, wherein the method for obtaining the adaptive neighborhood range of the data point corresponding to each sensitive data under the current land category according to the distribution of the data points in the rectangular coordinate system comprises:
performing trend line fitting on the cluster formed by the insensitive data in the rectangular coordinate system to obtain a trend line, and dividing the trend line equally to initially divide the cluster into 10 data segments, to obtain the interval length of each data segment and the total number of data points in each data segment;
selecting any data point in a data segment as a target data point, taking the target data point as the center of a circle, obtaining the circle corresponding to the target data point using a set radius, calculating the data similarity between the target data point and each of the other data points in the circle, marking the data points whose data similarity is greater than a data similarity threshold, also marking the target data point, counting the total number of marked data points, calculating the ratio between the total number of data points in the data segment and the total number of marked data points, and taking the ratio as the distribution probability of the target data point;
continuing to select one data point from the unmarked data points in the data segment as a target data point, to obtain the distribution probabilities of a plurality of target data points; and obtaining the adaptive interval length of the data segment according to the distribution probabilities of the target data points in the data segment and the interval length;
acquiring the self-adaptive interval length of each segment of data segment to re-divide the clusters to obtain new data segments; and obtaining the self-adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of the data points in the new data segment.
4. The safe storage method of land circulation information as claimed in claim 3, wherein the calculation formula of the data similarity is:
5. The method for safely storing land circulation information according to claim 3, wherein the method for obtaining the adaptive interval length of the data segment according to the distribution probabilities of the target data points in the data segment and the interval length comprises:
calculating the data point distribution characteristic index of the data segment according to the distribution probabilities of the target data points in the data segment, wherein the calculation formula of the data point distribution characteristic index is as follows:
wherein the quantities are, in order: the data point distribution characteristic index of the data segment; the number of target data points in the data segment; and the distribution probability of each target data point;
acquiring the data point distribution characteristic index of each data segment, and calculating the adaptive interval length of the data segment according to the interval length of the data segment and the data point distribution characteristic index of each data segment, wherein the calculation formula of the adaptive interval length of the data segment is as follows:
wherein the quantities are, in order: the adaptive interval length of the data segment; the interval length of the data segment; the sign function of the data segment; a hyper-parameter; a natural constant; and the data point distribution characteristic indexes of the data segments.
6. A method for securely storing land circulation information according to claim 3, wherein the method for obtaining the adaptive neighborhood range of the data point corresponding to each sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of data points in the new data segment comprises:
acquiring the centroid of each new data segment, taking the centroid as a central data point, respectively calculating the Euclidean distance between the data point of the current sensitive data and each central data point, and taking the new data segment corresponding to the shortest Euclidean distance as a target data segment of the current sensitive data;
calculating the adaptive neighborhood range of the data point of the current sensitive data according to the number of data points of the target data segment and the shortest Euclidean distance, wherein the quantities in the calculation formula are, in order: the adaptive neighborhood range of the data point of the current sensitive data; the number of data points of the target data segment J; the shortest of the Euclidean distances between the data point of the current sensitive data and the central data points; and a rounding function.
7. The safe storage method of land circulation information, as claimed in claim 1, wherein the calculation formula of the local sensitive information distribution degree of the data points corresponding to the current sensitive data is:
wherein the quantities are, in order: the local sensitive information distribution degree of the data point corresponding to the piece of sensitive data; the number of data points within the adaptive neighborhood range of the data point corresponding to the piece of sensitive data; the coordinates of a data point r within the adaptive neighborhood range; the coordinates of the data point corresponding to the piece of sensitive data; and the L2 norm.
8. The method for safely storing land circulation information as claimed in claim 1, wherein the calculation formula of the distribution relevancy of the sensitive information of the current sensitive data is as follows:
wherein the quantities are, in order: the sensitive information distribution association degree of the piece of sensitive data; the average local sensitive information distribution degree of all the sensitive data within the adaptive neighborhood range of the piece of sensitive data; the number of all the sensitive data within the adaptive neighborhood range of the piece of sensitive data; the local sensitive information distribution degree of the data point corresponding to a piece of sensitive data within that range; and the local sensitive information distribution degree of the data point corresponding to the current piece of sensitive data.
9. The method for safely storing land circulation information as claimed in claim 2, wherein the calculation formula of the land circulation area after the deviation is as follows:
wherein the quantities are, in order: the offset land circulation area corresponding to the piece of sensitive data; the land circulation area of the piece of sensitive data under its land category; the mean land circulation area of the insensitive data under the land category; the data offset of the land circulation area of the piece of sensitive data; the first data sensitivity degree of the piece of sensitive data; and the first data sensitivity threshold.
10. The method for securely storing land circulation information according to claim 2, wherein the calculation formula of the biased land transaction price is as follows:
wherein the quantities are, in order: the offset land transaction price corresponding to the piece of sensitive data; the land transaction price of the piece of sensitive data under its land category; the mean land transaction price of the insensitive data under the land category; the data offset of the land transaction price of the piece of sensitive data; the second data sensitivity degree of the piece of sensitive data; and the second data sensitivity threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210971299.XA CN115048682B (en) | 2022-08-15 | 2022-08-15 | Safe storage method for land circulation information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210971299.XA CN115048682B (en) | 2022-08-15 | 2022-08-15 | Safe storage method for land circulation information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115048682A CN115048682A (en) | 2022-09-13 |
CN115048682B true CN115048682B (en) | 2022-11-01 |
Family
ID=83166479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210971299.XA Active CN115048682B (en) | 2022-08-15 | 2022-08-15 | Safe storage method for land circulation information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115048682B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115442155B (en) * | 2022-10-27 | 2023-01-31 | 深圳市光联世纪信息科技有限公司 | Data encryption method and system for SD-WAN |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104881611A (en) * | 2014-02-28 | 2015-09-02 | 国际商业机器公司 | Method and apparatus for protecting sensitive data in software product |
CN110502602A (en) * | 2019-08-14 | 2019-11-26 | 平安科技(深圳)有限公司 | Date storage method, device, equipment and computer storage medium |
CN112579523A (en) * | 2020-12-15 | 2021-03-30 | 广东后海控股股份有限公司 | Rural land circulation management system based on block chain technology |
CN114328640A (en) * | 2021-02-07 | 2022-04-12 | 湖南科技学院 | Differential privacy protection and data mining method and system based on mobile user dynamic sensitive data |
CN114626097A (en) * | 2022-03-22 | 2022-06-14 | 中国平安人寿保险股份有限公司 | Desensitization method, desensitization device, electronic apparatus, and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2417499A1 (en) * | 2000-07-27 | 2002-02-07 | Activated Content Corporation | Stegotext encoder and decoder |
US20220222368A1 (en) * | 2019-05-14 | 2022-07-14 | Equifax Inc. | Data protection via attributes-based aggregation |
Non-Patent Citations (2)
Title |
---|
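Risk identification and evaluation of farmland transfer from the perspectives of different stakeholders: based on a survey of the agricultural suburbs of Shanghai; Niu Xing et al.; 《中国农业资源与区划》 (Chinese Journal of Agricultural Resources and Regional Planning); 20180525 (No. 05); full text * |
Big data storage security technology based on data sensitivity; Hu Zhida; 《移动通信》 (Mobile Communications); 20200815 (No. 08); full text * |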
Also Published As
Publication number | Publication date |
---|---|
CN115048682A (en) | 2022-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108920720B (en) | Large-scale image retrieval method based on depth hash and GPU acceleration | |
CN104008174B (en) | A kind of secret protection index generation method of massive image retrieval | |
CN107085607B (en) | Image feature point matching method | |
CN104036012B (en) | Dictionary learning, vision bag of words feature extracting method and searching system | |
Caruso et al. | Deprivation and the dimensionality of welfare: a variable‐selection cluster‐analysis approach | |
Pan et al. | Product quantization with dual codebooks for approximate nearest neighbor search | |
CN107784110A (en) | A kind of index establishing method and device | |
CN113869052B (en) | AI-based house address matching method, storage medium and equipment | |
CN111191051B (en) | Method and system for constructing emergency knowledge map based on Chinese word segmentation technology | |
CN115048682B (en) | Safe storage method for land circulation information | |
Erpolat Taşabat | A Novel Multicriteria Decision‐Making Method Based on Distance, Similarity, and Correlation: DSC TOPSIS | |
WO2023024408A1 (en) | Method for determining feature vector of user, and related device and medium | |
CN102693258A (en) | High-accuracy similarity search system | |
De Stefano et al. | An adaptive weighted majority vote rule for combining multiple classifiers | |
CN114943285B (en) | Intelligent auditing system for internet news content data | |
CN108256058B (en) | Real-time response big media neighbor retrieval method based on micro-computing platform | |
CN113742495B (en) | Rating feature weight determining method and device based on prediction model and electronic equipment | |
CN115186138A (en) | Comparison method and terminal for power distribution network data | |
CN110147497B (en) | Individual content recommendation method for teenager group | |
CN113220936A (en) | Intelligent video recommendation method and device based on random matrix coding and simplified convolutional network and storage medium | |
CN110796546A (en) | Distributed clustering algorithm based on block chain | |
CN113627598B (en) | Twin self-encoder neural network algorithm and system for accelerating recommendation | |
CN117725102B (en) | Digital ticket management method and system based on artificial intelligence | |
CN114297250B (en) | Bidder ring group naming method based on frequency | |
PEŁKA et al. | Symbolic Ensemble Clustering And Linear Ordering Of European Countries According To Their Economic Freedom |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||