CN115048682B - Safe storage method for land circulation information - Google Patents

Publication number
CN115048682B
Authority
CN
China
Prior art keywords
data
land
segment
sensitive
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210971299.XA
Other languages
Chinese (zh)
Other versions
CN115048682A (en
Inventor
蔡海燕
侯亮
耿记申
谢华峰
杨振立
吴云凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Information And Economic Research Institute Hebei Academy Of Agriculture And Forestry Sciences
Original Assignee
Agricultural Information And Economic Research Institute Hebei Academy Of Agriculture And Forestry Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Information And Economic Research Institute Hebei Academy Of Agriculture And Forestry Sciences
Priority to CN202210971299.XA
Publication of CN115048682A
Application granted
Publication of CN115048682B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/78 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Storage Device Security (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of data storage, and in particular to a safe storage method for land circulation information. The method classifies a plurality of pieces of land circulation information into sensitive data and insensitive data according to their data characteristics; adaptively obtains the sensitive information distribution association degree of each piece of sensitive data from the distribution characteristics of the sensitive and insensitive data; and uses that association degree to offset the sensitive data so that it is hidden among the insensitive data. A key is then obtained from the offsets, and the offset land circulation information is encoded and compressed. Safe storage of land circulation information thereby becomes simple and efficient, with higher confidentiality and improved data storage security.

Description

Safe storage method of land circulation information
Technical Field
The invention relates to the technical field of data storage, in particular to a safe storage method of land circulation information.
Background
Land circulation is one of the important solutions to rural land problems. Traditionally, land circulation information is managed by signing a written land circulation contract, having the contract examined and certified, and storing it as a data file. This traditional management mode struggles to meet the requirements of modern land circulation management: its storage security is seriously lacking, so important data is easily lost, the degree of integration is low, and information storage is insufficiently standardized. An efficient method for safely storing land circulation information is therefore needed.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a safe storage method of land circulation information, and the adopted technical scheme is as follows:
collecting N pieces of land circulation information, wherein N is a positive integer, constructing a data matrix according to various information attributes in each piece of land circulation information, each row in the data matrix represents one piece of land circulation information, and each column represents one information attribute; setting a digital semantic label for text data of each piece of land circulation information under the information attribute in the data matrix; determining a data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute, wherein the data type comprises sensitive data and insensitive data;
constructing a rectangular coordinate system under the current land category by taking the land circulation area as the horizontal coordinate and the land transaction price as the vertical coordinate, and acquiring the self-adaptive neighborhood range of each sensitive data point under the current land category according to the distribution of the data points in the rectangular coordinate system; calculating the position difference between the data point corresponding to the current sensitive data and the other data points in its self-adaptive neighborhood range to obtain the local sensitive information distribution degree of the data point corresponding to the current sensitive data, and obtaining the sensitive information distribution association degree of the current sensitive data according to the local sensitive information distribution degree of each piece of sensitive data in the self-adaptive neighborhood range of the data point corresponding to the current sensitive data;
acquiring the sensitive information distribution association degree of all sensitive data; respectively calculating the data offsets corresponding to the land circulation area and the land transaction price according to the sensitive information distribution association degree of each piece of sensitive data, and obtaining the offset land circulation area and the offset land transaction price according to the data offsets; setting the data offsets of the land circulation area and the land transaction price of the insensitive data to 0;
obtaining a key matrix of N pieces of land circulation information according to the data offset of the sensitive data and the insensitive data; and coding and compressing the digital semantic label corresponding to each piece of land circulation information, the land circulation area after deviation and the land transaction price after deviation, and respectively storing the compressed land circulation information and the key matrix.
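The relationship between the offsets and the key matrix can be illustrated with a minimal sketch. It assumes the offsets are additive, which this text does not confirm because the offset formulas are not reproduced; the function names `apply_offsets` and `restore` are hypothetical, and the encoding and compression step is omitted.

```python
def apply_offsets(records, offsets):
    """Shift each (area, price) pair by its (d_area, d_price) offset.

    Assumption: offsets are additive; the actual offset formulas are
    not reproduced in this text.
    """
    return [(a + da, p + dp) for (a, p), (da, dp) in zip(records, offsets)]


def restore(shifted, key_matrix):
    """Invert apply_offsets using the separately stored key matrix."""
    return [(a - da, p - dp) for (a, p), (da, dp) in zip(shifted, key_matrix)]


# Two records: the first is treated as sensitive (non-zero offset), the
# second as insensitive (offset fixed at 0, as the method specifies).
records = [(120.0, 9000.0), (10.0, 800.0)]
key_matrix = [(-100.0, -8000.0), (0.0, 0.0)]  # illustrative values only

shifted = apply_offsets(records, key_matrix)
recovered = restore(shifted, key_matrix)
```

Because the key matrix holds one offset pair per record, storing it apart from the compressed data means the stored values alone do not reveal which records were shifted or by how much.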
Further, the method for determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attribute comprises the following steps:
respectively calculating the average land transaction price and the average land circulation area of a unit area under each land category based on the historically stored land circulation information;
calculating, for the current piece of land circulation information, a first data sensitivity degree of the land circulation area and a second data sensitivity degree of the land transaction price according to its land category, land circulation area, and land transaction price, wherein the calculation expressions of the first data sensitivity degree and the second data sensitivity degree are as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the first data sensitivity degree of the current piece of land circulation information; the second data sensitivity degree of the current piece of land circulation information; the land circulation area of the current piece of land circulation information under its land category; the land transaction price of the current piece of land circulation information under its land category; the average land circulation area under that land category; the average land transaction price per unit area under that land category; and the hyperbolic tangent function;
respectively setting a first data sensitivity threshold and a second data sensitivity threshold; when the first data sensitivity degree is greater than or equal to the first data sensitivity threshold, or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, confirming that the current piece of land circulation information belongs to sensitive data; when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, confirming that the current piece of land circulation information belongs to insensitive data.
Further, the method for obtaining the self-adaptive neighborhood range of the data point corresponding to each piece of sensitive data in the current land category according to the distribution of the data points in the rectangular coordinate system includes:
performing trend line fitting on the cluster formed by the insensitive data in the rectangular coordinate system to obtain a trend line, and dividing the trend line equally so that the cluster is initially divided into 10 data segments, to obtain the interval length of the current data segment and the total number of data points in the current data segment;
selecting any data point in the current data segment as a target data point; taking the target data point as the center and a set radius, obtaining the circle corresponding to the target data point; respectively calculating the data similarity between the target data point and each other data point in the circle; marking the data points whose data similarity is greater than a data similarity threshold, marking the target data point itself, and counting the total number of marked data points; calculating the ratio between the total number of data points in the current data segment and the total number of marked data points (the ratio expression is an image not reproduced in this text), and taking the ratio as the distribution probability of the target data point;
continuing to select one of the unmarked data points in the current data segment as a target data point, so as to obtain the distribution probabilities of a plurality of target data points; obtaining the adaptive interval length of the current data segment according to the distribution probabilities of the target data points in the current data segment and the interval length of the segment;
acquiring the adaptive interval length of each data segment and re-dividing the cluster accordingly to obtain new data segments; and obtaining the self-adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position of that data point relative to the new data segments and the number of data points in the new data segments.
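The marking procedure for one data segment can be sketched as follows. Several details are assumptions, since the corresponding formula images are not reproduced: the distribution probability is taken as marked points divided by all points in the segment, only points newly marked in a round are counted for that round's target, and the similarity function passed in (`assumed_similarity`) is a stand-in. All function names are hypothetical.

```python
import math


def distribution_probabilities(points, radius, sim_threshold, similarity):
    """Return one distribution probability per target point chosen in a segment.

    Assumptions: probability = newly marked points / total points in the
    segment; targets are taken in list order from the unmarked points.
    """
    marked = [False] * len(points)
    probabilities = []
    for t, target in enumerate(points):
        if marked[t]:
            continue  # only unmarked points may become new targets
        marked[t] = True
        newly_marked = 1  # the target itself is marked
        for j, other in enumerate(points):
            if marked[j]:
                continue
            inside = math.dist(target, other) <= radius
            if inside and similarity(target, other) > sim_threshold:
                marked[j] = True
                newly_marked += 1
        probabilities.append(newly_marked / len(points))
    return probabilities


def assumed_similarity(a, b):
    """Stand-in similarity (an assumption): e ** (-Euclidean distance)."""
    return math.exp(-math.dist(a, b))


segment = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
probs = distribution_probabilities(segment, 1.0, 0.5, assumed_similarity)
```

Under these assumptions every point is marked exactly once, so the probabilities of one segment sum to 1. That property comes from the sketch, not from the source.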
Further, the calculation formula of the data similarity is as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the data similarity between the two data points being compared; the L2 norm; the coordinates of the first data point; the coordinates of the second data point; and the natural constant.
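Since the formula itself is not reproduced here, the following is only one plausible form that matches the listed ingredients (an L2 norm of the coordinate difference and the natural constant): an exponential decay of the Euclidean distance. The function name `data_similarity` and the exact form are assumptions.

```python
import math


def data_similarity(a, b):
    """Assumed form: e ** (-||a - b||_2).

    Lies in (0, 1], equals 1 for identical points, and decreases as the
    points move apart. The patent's exact expression may differ.
    """
    return math.exp(-math.dist(a, b))


same = data_similarity((1.0, 2.0), (1.0, 2.0))
far = data_similarity((0.0, 0.0), (3.0, 4.0))  # Euclidean distance is 5
```

With any distance-based form of this kind, the two coordinates (area and price) are measured in different units, so rescaling them to comparable ranges before computing the distance may be worth considering.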
Further, the method for obtaining the adaptive interval length of the current data segment according to the distribution probabilities of the target data points in the current data segment and the interval length of the segment comprises:
calculating the data point distribution characteristic index of the current data segment according to the distribution probabilities of the target data points in the current data segment, by the following formula:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the data point distribution characteristic index of the current data segment; the number of target data points in the current data segment; and the distribution probability of each target data point;
acquiring the data point distribution characteristic index of each data segment, and calculating the adaptive interval length of the current data segment according to the interval length of the current data segment and the data point distribution characteristic indexes of the data segments, by the following formula:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the adaptive interval length of the current data segment; the interval length of the current data segment; a sign function of the current data segment; a hyper-parameter; the natural constant; the data point distribution characteristic index of the current data segment; and the data point distribution characteristic index of a data segment identified by a different index, whose symbol is not reproduced in this text.
Further, the method for obtaining the adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of the data points in the new data segment includes:
acquiring the centroid of each new data segment and taking the centroid as a central data point; respectively calculating the Euclidean distance between the data point of the current sensitive data and each central data point; and taking the new data segment corresponding to the shortest Euclidean distance as the target data segment of the current sensitive data;
calculating the self-adaptive neighborhood range of the data point of the current sensitive data according to the number of data points in the target data segment and the shortest Euclidean distance, by the following formula:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the adaptive neighborhood range of the data point of the current sensitive data; the number of data points in the target data segment J; the shortest of the Euclidean distances between the data point of the current sensitive data and each central data point; and a rounding function.
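The target-segment selection described here (centroid per segment, then the nearest centroid by Euclidean distance) can be sketched as below. The function names are hypothetical, and the sketch stops before the neighborhood-range formula, which is not reproduced in this text.

```python
import math


def centroid(segment):
    """Arithmetic mean of the (area, price) points in one data segment."""
    xs, ys = zip(*segment)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def target_segment(sensitive_point, segments):
    """Return (index, shortest distance) of the segment with the nearest centroid."""
    distances = [math.dist(sensitive_point, centroid(s)) for s in segments]
    j = min(range(len(segments)), key=distances.__getitem__)
    return j, distances[j]


segments = [
    [(0.0, 0.0), (2.0, 0.0)],      # centroid (1, 0)
    [(10.0, 10.0), (12.0, 10.0)],  # centroid (11, 10)
]
j, shortest = target_segment((9.0, 10.0), segments)
```

The two returned values, together with `len(segments[j])`, are the inputs the text names for the neighborhood-range calculation.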
Further, a calculation formula of the local sensitive information distribution degree of the data point corresponding to the current sensitive data is as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the local sensitive information distribution degree of the data point corresponding to the current sensitive data; the number of data points in the adaptive neighborhood range of the data point corresponding to the current sensitive data; the coordinates of a data point r within the adaptive neighborhood range; the coordinates of the data point corresponding to the current sensitive data; and the L2 norm.
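One reading consistent with the listed ingredients (a neighbor count, neighbor coordinates, the point's own coordinates, and an L2 norm) is the mean Euclidean distance from the sensitive data point to the points in its neighborhood. This is an assumption, because the formula image is not reproduced; the function name is hypothetical.

```python
import math


def local_distribution_degree(point, neighbors):
    """Assumed form: mean L2 distance from `point` to its neighborhood points.

    Returns 0.0 for an empty neighborhood to avoid division by zero; the
    source does not say how that case is handled.
    """
    if not neighbors:
        return 0.0
    return sum(math.dist(point, r) for r in neighbors) / len(neighbors)


degree = local_distribution_degree((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0)])
```

Under this reading, a larger value means the sensitive point sits farther from the surrounding data, which matches the text's description of the quantity as a position difference.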
Further, a calculation formula of the sensitive information distribution association degree of the current sensitive data is as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the sensitive information distribution association degree of the current sensitive data; the average local sensitive information distribution degree of all sensitive data within the adaptive neighborhood range of the current sensitive data; the number of all sensitive data within that adaptive neighborhood range; the local sensitive information distribution degree of the data point corresponding to another piece of sensitive data within that range; and the local sensitive information distribution degree of the data point corresponding to the current sensitive data.
Further, the calculation formula of the offset land circulation area is as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the offset land circulation area corresponding to the current sensitive data; the land circulation area of the current sensitive data under its land category; the mean land circulation area of the insensitive data under that land category; the data offset of the land circulation area of the current sensitive data; the first data sensitivity degree of the current sensitive data; and the first data sensitivity threshold.
Further, the calculation formula of the offset land transaction price is as follows:
[Formula image not reproduced in this text.]
wherein the symbols of the formula (images not reproduced) denote, in the order defined: the offset land transaction price corresponding to the current sensitive data; the land transaction price of the current sensitive data under its land category; the mean land transaction price of the insensitive data under that land category; the data offset of the land transaction price of the current sensitive data; the second data sensitivity degree of the current sensitive data; and the second data sensitivity threshold.
The embodiment of the invention has at least the following beneficial effects: sensitive data and insensitive data are distinguished according to the data characteristics of the land circulation information; the sensitive information distribution association degree of the sensitive data is obtained adaptively from the distribution characteristics of the sensitive and insensitive data; the sensitive data is offset using that association degree so that it is hidden among the insensitive data; and a key is obtained and the offset land circulation information is encoded and compressed. Safe storage of land circulation information is thereby simple and efficient, the degree of confidentiality is high, and the security of data storage is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating steps of a method for securely storing land circulation information according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve its intended objects, and their effects, the method for securely storing land circulation information according to the present invention, together with its specific implementation, structure, features, and effects, is described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the method for securely storing land circulation information provided by the present invention in detail with reference to the accompanying drawings.
The embodiment of the invention aims at the following specific scenes: in the process of safely storing the land circulation information, in order to better store sensitive data, special processing needs to be carried out on the sensitive data. The collected land circulation information is subjected to characteristic analysis, and the data offset of the sensitive data is acquired in a self-adaptive mode to generate a corresponding data key, so that the sensitive data can be safely stored.
Referring to fig. 1, a flowchart of steps of a method for securely storing land circulation information according to an embodiment of the present invention is shown, where the method includes the following steps:
Step S001: collecting N pieces of land circulation information, wherein N is a positive integer, and constructing a data matrix according to the various information attributes in each piece of land circulation information, wherein each row in the data matrix represents one piece of land circulation information, and each column represents one information attribute; setting a digital semantic label for the text data of each piece of land circulation information under the information attributes in the data matrix; and determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attributes, wherein the data type comprises sensitive data and insensitive data.
Specifically, N pieces of land circulation information are collected, N being a positive integer, and the land circulation information is preprocessed to standardize it. The preprocessing is as follows: because the land circulation information includes information attributes such as the outflow party, the inflow party, the land category, the circulation mode, the land area, and the land transaction price, a data matrix is constructed according to the various information attributes in each piece of land circulation information, where each row in the data matrix represents one piece of land circulation information and each column represents one information attribute.
Further, in the data matrix, the data under the information attributes of outflow party, inflow party, land category, and circulation mode are text data, which occupy a large amount of space when encoded and stored. However, these text data all have obvious semantic features. For example, the outflow party and the inflow party can be divided into semantic categories such as individuals, groups, and governments; the land category can be divided into semantic categories such as forest land, cultivated land, and home base; and the circulation mode can be divided into semantic categories such as land interchange, land rent, land stock, home-based housing, and share cooperation. A digital semantic label can therefore be set for the text data in the land circulation information.
Taking the land circulation mode as an example, a DNN semantic network is used to acquire the digital semantic label of the land circulation mode in each piece of land circulation information. The specific training process of the DNN semantic network is as follows: the input data of the DNN semantic network is the land circulation information; the land circulation mode in the land circulation information is labeled, with the land interchange mode set as digital semantic label 0, the land rent mode as digital semantic label 1, the land stock mode as digital semantic label 2, the home-based housing mode as digital semantic label 3, and the share cooperation mode as digital semantic label 4; the task of the DNN semantic network is classification, so a cross-entropy loss function is used.
Similarly, for other text data in the land circulation information, the digital semantic tags are obtained by utilizing respective DNN semantic networks, and then the digital semantic tags of various text data in each piece of land circulation information in the data matrix can be obtained.
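The effect of the labeling step on one text column of the data matrix can be illustrated with a minimal sketch. Note the substitution: a plain lookup table stands in for the trained DNN semantic network, which the text uses to assign labels. The label values 0 to 4 come from the description above; the names `CIRCULATION_MODE_LABELS` and `label_column` are hypothetical.

```python
# Label values as given in the description for the circulation mode.
CIRCULATION_MODE_LABELS = {
    "land interchange": 0,
    "land rent": 1,
    "land stock": 2,
    "home-based housing": 3,
    "share cooperation": 4,
}


def label_column(values, label_map):
    """Replace each text value in a column with its digital semantic label.

    A lookup table is used here in place of the DNN classifier; unknown
    values raise KeyError rather than being guessed.
    """
    return [label_map[v] for v in values]


modes = ["land rent", "share cooperation", "land interchange"]
labels = label_column(modes, CIRCULATION_MODE_LABELS)
```

Storing a small integer per cell instead of a text string is what reduces the space these columns occupy before encoding and compression.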
Once the text data under information attributes such as the outflow party, the inflow party, the land category, and the circulation mode are set aside, the data under the remaining information attributes of a piece of land circulation information are numeric data, such as the land circulation area and the land transaction price. Because the digital semantic labels in the land circulation information carry no meaning as numeric magnitudes, the data type of each piece of land circulation information is analyzed mainly according to the land circulation area and the land transaction price among the numeric data. The data type comprises sensitive data and insensitive data, and the specific steps are as follows:
calculating the average land transaction price of the unit area under the same land category based on the historically stored land circulation information, and then respectively obtaining the average land transaction prices of the unit area under all the land categories; and similarly, calculating the average land circulation area under the same land type, and then respectively obtaining the average land circulation areas under all the land types.
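The per-category averages can be computed as in the following sketch. It assumes each historical record carries a total transaction price, so the unit price is price divided by area, and that the category average is the mean of those unit prices; the source does not state either detail. The function name `category_means` is hypothetical.

```python
from collections import defaultdict


def category_means(history):
    """Map each land category to (mean area, mean price per unit area).

    history: iterable of (category, area, total_price) tuples.
    Assumption: unit price = total_price / area, averaged per record.
    """
    areas = defaultdict(list)
    unit_prices = defaultdict(list)
    for category, area, total_price in history:
        areas[category].append(area)
        unit_prices[category].append(total_price / area)
    return {
        c: (sum(areas[c]) / len(areas[c]), sum(unit_prices[c]) / len(unit_prices[c]))
        for c in areas
    }


history = [
    ("cultivated land", 10.0, 1000.0),
    ("cultivated land", 30.0, 6000.0),
    ("forest land", 50.0, 2500.0),
]
means = category_means(history)
```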
Among the N pieces of land circulation information, take the current piece and denote its land category, its land circulation area, and its land transaction price by the symbols used in the formula below (the symbol images are not reproduced in this text). According to the average land transaction price and the average land circulation area under that land category, the first data sensitivity degree of the land circulation area and the second data sensitivity degree of the land transaction price of the current piece of land circulation information are calculated; the calculation expressions of the first data sensitivity degree and the second data sensitivity degree are:
[Formula image not reproduced in this text.]
wherein the remaining symbols of the formula (images not reproduced) denote, in the order defined: the average land circulation area under the land category; the average land transaction price per unit area under the land category; and the hyperbolic tangent function.
Set a first data sensitivity threshold and a second data sensitivity threshold. When the first data sensitivity degree is greater than or equal to the first data sensitivity threshold, or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, the piece of land circulation information is confirmed to be sensitive data; otherwise, when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, the piece of land circulation information is confirmed to be insensitive data.
Preferably, in the embodiment of the present invention, the first data sensitivity threshold and the second data sensitivity threshold take empirical values (the specific values appear only as equation images in the source); an implementer may set them according to the specific implementation.
Calculate the first data sensitivity degree and the second data sensitivity degree of each of the N pieces of land circulation information, and confirm the data type of each piece according to these two degrees.
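The threshold rule above can be sketched in a few lines. The sketch takes the two sensitivity degrees as already computed, because the expression that produces them is not recoverable from the text. The function name `classify`, the threshold values and the sample degrees are hypothetical.

```python
def classify(first_degree, second_degree, first_threshold, second_threshold):
    """Return 'sensitive' if either degree reaches its threshold."""
    if first_degree >= first_threshold or second_degree >= second_threshold:
        return "sensitive"
    return "insensitive"

# Hypothetical thresholds and degrees, for illustration only.
T1, T2 = 0.6, 0.6
degrees = [(0.10, 0.20), (0.75, 0.30), (0.40, 0.60), (0.59, 0.59)]
labels = [classify(s1, s2, T1, T2) for s1, s2 in degrees]
```

Note that a degree exactly equal to its threshold counts as sensitive, since the text uses "greater than or equal to" for the sensitive case.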
Step S002: construct a rectangular coordinate system for the current land category with the land circulation area as the abscissa and the land transaction price as the ordinate, and obtain the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the current land category according to the distribution of data points in the rectangular coordinate system; calculate the position differences between the data point corresponding to the current sensitive data and the other data points within its adaptive neighborhood range to obtain the local sensitive information distribution degree of that data point; and obtain the sensitive information distribution association degree of the current sensitive data from the local sensitive information distribution degrees of the pieces of sensitive data within the adaptive neighborhood range of its data point.
Specifically, because the land circulation area and the land transaction price are closely related, a rectangular coordinate system is constructed for each land category with the land circulation area as the abscissa and the land transaction price as the ordinate. The land circulation area and the land transaction price in each piece of land circulation information form one data point whose coordinates are (land circulation area, land transaction price). In the rectangular coordinate system of each land category, the insensitive data of that category gather into a cluster, while the sensitive data are scattered individually. The distribution characteristic of the sensitive data in the rectangular coordinate system is therefore characterized by calculating the sensitive information distribution association degree of the sensitive data. Taking one land category as an example, the distribution characteristic of each piece of sensitive data under that land category is analyzed as follows:
(1) Construct the rectangular coordinate system of the land category, and obtain the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the land category according to the distribution of data points in the rectangular coordinate system.
Specifically, trend line fitting is performed on the cluster formed by the insensitive data in the rectangular coordinate system to obtain a trend line. The trend line is divided equally so that the cluster is initially divided into 10 data segments, each with its own interval length, and the total number of data points in each data segment is counted, the data segments being indexed by sequence number.
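The initial segmentation can be sketched as below. The sketch assumes a straight trend line fitted by ordinary least squares and measures the equal intervals along the horizontal axis. The source does not specify the fitting model or how the interval length is measured, so both are assumptions, as are the function names and the synthetic cluster.

```python
def fit_trend_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def split_into_segments(points, n_segments=10):
    """Assign each point to one of n_segments equal-width intervals along x."""
    xs = [x for x, _ in points]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / n_segments
    segments = [[] for _ in range(n_segments)]
    for x, y in points:
        # The right-most point falls on the upper edge, so clamp its index.
        index = min(int((x - x_min) / width), n_segments - 1)
        segments[index].append((x, y))
    return width, segments

# Synthetic cluster lying exactly on y = 2x + 1 (illustrative data only).
cluster = [(float(x), 2.0 * x + 1.0) for x in range(0, 21)]
slope, intercept = fit_trend_line(cluster)
width, segments = split_into_segments(cluster)
counts = [len(s) for s in segments]
```

The per-segment counts are what the later steps consume; the equal interval length is only a starting point that is adjusted afterwards.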
Because the equal division into 10 segments does not take the distribution characteristics of the data into account, the interval length is adjusted according to the data points in each segment. Because data points that are closer to each other are more strongly correlated, the data similarity between each data point in a segment and its surrounding neighborhood data points is calculated. The calculation formula of the data similarity is:
[Equation image: data similarity between two data points]
where the formula gives the data similarity between two data points in terms of the L2 norm applied to the coordinates of the two data points.
Select any data point in the current data segment as the target data point. With the target data point as the center, obtain the circle corresponding to the target data point using a set radius, and calculate the data similarity between the target data point and each of the other data points in the circle. Set a data similarity threshold, mark the data points whose data similarity is greater than the data similarity threshold, mark the target data point as well, and count the total number of marked data points. Calculate the ratio between the total number of data points in the current data segment and the total number of marked data points, and take this ratio as the distribution probability of the target data point. Then select one of the unmarked data points as the next target data point and repeat the operation until all data points in the current data segment are marked.
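The marking procedure above can be sketched as follows, under several labeled assumptions. The similarity function `exp(-distance)` is a stand-in, since the actual similarity formula is not recoverable here. The distribution probability is assumed to be the number of points marked in one round divided by the segment size. Each round is assumed to count only the points it newly marks. The function names and sample points are hypothetical.

```python
import math

def similarity(p, q):
    """Assumed similarity: decays from 1 towards 0 as the L2 distance grows."""
    return math.exp(-math.dist(p, q))

def distribution_probabilities(segment, radius, threshold):
    """Repeat the marking round until every point in the segment is marked."""
    unmarked = list(range(len(segment)))
    probabilities = []
    while unmarked:
        target = unmarked[0]  # any unmarked point may serve as the target
        marked_now = [target]
        for j in unmarked[1:]:
            inside = math.dist(segment[target], segment[j]) <= radius
            if inside and similarity(segment[target], segment[j]) > threshold:
                marked_now.append(j)
        unmarked = [j for j in unmarked if j not in marked_now]
        # Assumed ratio: points marked in this round / points in the segment.
        probabilities.append(len(marked_now) / len(segment))
    return probabilities

segment = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = distribution_probabilities(segment, radius=1.0, threshold=0.5)
```

Under these assumptions the probabilities of one segment sum to 1, so they can feed an index defined over a probability distribution; whether the source intends exactly this normalization is not confirmed.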
According to the distribution probabilities of the target data points in the current data segment, calculate the data point distribution characteristic index of the current data segment by the following formula:
[Equation image: data point distribution characteristic index of a data segment]
where the formula gives the data point distribution characteristic index of the current data segment in terms of the number of target data points in that data segment and the distribution probability of each target data point.
Similarly, the data point distribution characteristic index of every data segment can be obtained by this method. The denser the data points in a data segment, the smaller the interval length should be set; conversely, the sparser the data points, the larger the interval length should be set. The interval length of each data segment is therefore adjusted according to its data point distribution characteristic index to obtain the adaptive interval length of each data segment. The calculation formula of the adaptive interval length of the current data segment is:
[Equation image: adaptive interval length of a data segment]
where the formula gives the adaptive interval length of the current data segment in terms of the interval length of the current data segment, the data point distribution characteristic index of the current data segment, the data point distribution characteristic indexes of the data segments indexed separately in the formula, a hyper-parameter that adjusts the value of one term of the formula and takes an empirical reference value, and a sign function of the current data segment that is defined with respect to a set data distribution characteristic threshold.
Adjust each data segment according to its adaptive interval length to obtain new data segments, and count the number of data points in each new data segment.
Calculate the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the land category as follows. Obtain the centroid of each new data segment and take it as a central data point. Calculate the Euclidean distance between the data point of the current sensitive data and each central data point, and take the new data segment with the shortest Euclidean distance as the target data segment of the current sensitive data. Then calculate the adaptive neighborhood range of the data point of the current sensitive data from the number of data points in the target data segment and the shortest Euclidean distance. The calculation formula is:
[Equation image: adaptive neighborhood range of the data point of the current sensitive data]
where the formula gives the adaptive neighborhood range of the data point of the current sensitive data in terms of the number of data points of the target data segment J, the shortest of the Euclidean distances between the data point of the current sensitive data and the central data points, and a rounding function.
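The selection of the target data segment can be sketched as below. The sketch stops at the two inputs of the neighborhood formula (the point count of the target segment and the shortest distance), because the formula itself is not recoverable from the text. The function names and sample segments are hypothetical, and every segment is assumed to be non-empty.

```python
import math

def centroid(points):
    """Arithmetic mean of the point coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def target_segment(sensitive_point, segments):
    """Return (index, point_count, shortest_distance) of the nearest segment."""
    best = None
    for index, points in enumerate(segments):
        distance = math.dist(sensitive_point, centroid(points))
        if best is None or distance < best[2]:
            best = (index, len(points), distance)
    return best

# Two hypothetical new data segments and one sensitive data point.
new_segments = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],
]
index, count, distance = target_segment((11.0, 16.0), new_segments)
```

Here the second segment is selected because its centroid at (11, 11) is closer to the sensitive point than the centroid of the first segment.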
(2) Calculate the position differences between the data point corresponding to each piece of sensitive data and the other data points within its adaptive neighborhood range to obtain the local sensitive information distribution degree of the data point corresponding to each piece of sensitive data.
Specifically, step (1) determines the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the land category. According to the distribution of data points within the adaptive neighborhood range of the data point corresponding to the current piece of sensitive data, the local sensitive information distribution degree of that piece of sensitive data is calculated; it characterizes the neighborhood distribution of the corresponding data point. The calculation expression of the local sensitive information distribution degree of the data point corresponding to the current piece of sensitive data is:
[Equation image: local sensitive information distribution degree]
where the expression uses the number of data points within the adaptive neighborhood range of the data point corresponding to the current piece of sensitive data, the coordinates of a data point r within the adaptive neighborhood range, the coordinates of the data point corresponding to the current piece of sensitive data, and the L2 norm.
(3) Obtain the sensitive information distribution association degree of each piece of sensitive data according to the local sensitive information distribution degree of each piece of sensitive data within the adaptive neighborhood range of its corresponding data point.
Specifically, step (2) yields the local sensitive information distribution degree of each piece of sensitive data. The sensitive information distribution association degree of the current sensitive data is obtained from the local sensitive information distribution degrees of the other sensitive data within its adaptive neighborhood range. The calculation expression of the sensitive information distribution association degree is:
[Equation image: sensitive information distribution association degree]
where the expression gives the sensitive information distribution association degree of the current piece of sensitive data in terms of the average local sensitive information distribution degree of all sensitive data within its adaptive neighborhood range, the number of all sensitive data within its adaptive neighborhood range, and the local sensitive information distribution degree of the data point corresponding to each of those pieces of sensitive data.
It should be noted that a sensitive information distribution association degree greater than 1 indicates that the local sensitive information distribution degree of the data point corresponding to the sensitive data is smaller than that of the other data points within its adaptive neighborhood range, whereas a sensitive information distribution association degree less than 1 indicates that it is greater than that of the other data points within the adaptive neighborhood range.
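The behaviour around the value 1 is consistent with a ratio of the neighborhood average to the point's own local degree. The sketch below assumes exactly that form. It is an inference from the note above, not a formula confirmed by the source, and it also assumes that the own degree is non-zero and that the average is taken over the other sensitive data only. The function name and the sample values are hypothetical.

```python
def association_degree(own_degree, neighbor_degrees):
    """Assumed form: mean local degree of the neighbors / own local degree."""
    average = sum(neighbor_degrees) / len(neighbor_degrees)
    return average / own_degree

# Own local degree below the neighborhood average gives a value above 1.
above_one = association_degree(2.0, [4.0, 6.0])
# Own local degree above the neighborhood average gives a value below 1.
below_one = association_degree(8.0, [4.0, 4.0])
```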
Step S003: obtain the sensitive information distribution association degrees of all sensitive data; calculate, according to the sensitive information distribution association degree of each piece of sensitive data, the data offsets corresponding to the land circulation area and the land transaction price, and obtain the offset land circulation area and the offset land transaction price from the data offsets; and set the data offsets of the land circulation area and the land transaction price of the insensitive data to 0.
Specifically, to store the sensitive data safely, the method of step S002 is used to obtain the sensitive information distribution association degrees of all sensitive data among the N pieces of land circulation information, and a data offset is calculated for the sensitive data so that the sensitive data are hidden among the insensitive data. The larger the sensitive information distribution association degree of a piece of sensitive data, the smaller the local sensitive information distribution degree of its data point relative to the other data points within its adaptive neighborhood range, the sparser the sensitive data, and the larger the data offset to be applied. The smaller the sensitive information distribution association degree, the greater the local sensitive information distribution degree of its data point relative to the other data points within its adaptive neighborhood range, the denser the sensitive data, and the smaller the data offset to be applied.
Calculate the data offsets corresponding to the land circulation area and the land transaction price according to the sensitive information distribution association degree of each piece of sensitive data, and obtain the offset land circulation area and the offset land transaction price from the data offsets. Taking one piece of sensitive data as an example, the calculation formula of the offset land circulation area is:
[Equation image: offset land circulation area]
where the formula gives the offset land circulation area of this piece of sensitive data in terms of the land circulation area of this piece of sensitive data under its land category, the mean land circulation area of the insensitive data under that land category, the hyperbolic tangent function, the data offset of the land circulation area of this piece of sensitive data, and the first data sensitivity degree of this piece of sensitive data.
In the formula, the first condition indicates that the piece of sensitive data meets the sensitivity requirement of the land circulation area and that its difference from the average land circulation area is negative, so the offset needs to be increased; the second condition indicates that the piece of sensitive data meets the sensitivity requirement of the land circulation area and that its difference from the average land circulation area is positive, so the offset needs to be reduced.
The calculation formula of the offset land transaction price is:
[Equation image: offset land transaction price]
where the formula gives the offset land transaction price of this piece of sensitive data in terms of the land transaction price of this piece of sensitive data under its land category, the mean land transaction price of the insensitive data under that land category, the hyperbolic tangent function, the data offset of the land transaction price of this piece of sensitive data, and the second data sensitivity degree of this piece of sensitive data.
In the formula, the first condition indicates that the piece of sensitive data meets the sensitivity requirement of the land transaction price and that its difference from the average land transaction price is negative, so the offset needs to be increased; the second condition indicates that the piece of sensitive data meets the sensitivity requirement of the land transaction price and that its difference from the average land transaction price is positive, so the offset needs to be reduced.
From the calculation formulas of the offset land circulation area and the offset land transaction price, the data offset of the land transaction price and the data offset of the land circulation area of this piece of sensitive data are obtained, and in the same way the data offsets of the land transaction price and the land circulation area of every piece of sensitive data can be obtained. Meanwhile, the data offsets of the land circulation area and the land transaction price of the insensitive data are set to 0.
Step S004: obtain the key matrix of the N pieces of land circulation information according to the data offsets of the sensitive data and the insensitive data; encode and compress the digital semantic labels, the offset land circulation area and the offset land transaction price corresponding to each piece of land circulation information, and store the compressed land circulation information and the key matrix separately.
Specifically, for each piece of sensitive data, the data offset of the land transaction price and the data offset of the land circulation area are each binary coded, and a key matrix is generated from the two binary codes. The size of the key matrix is 2 × c, where c is the larger of the numbers of binary digits of the two data offsets; if a binary code has fewer than c digits, zeros are padded at its most significant end. In this way each piece of sensitive data corresponds to one key matrix.
Similarly, the data offsets of each piece of insensitive data are binary coded; because the data offsets of the land circulation area and the land transaction price of insensitive data are both 0, every digit of their binary codes is 0. The key matrices of the sensitive data and the insensitive data are concatenated into an overall key matrix of size (2 × N) × C, where C is the maximum number of binary digits over all the sensitive and insensitive data; any binary code with fewer than C digits is padded with zeros at its most significant end.
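The construction of the overall key matrix can be sketched as follows. The sketch assumes that the data offsets have already been mapped to non-negative integers, because the source does not say how fractional or signed offsets are binary coded. The function names, the row order (price offset first, then area offset) and the sample offsets are hypothetical.

```python
def key_rows(price_offset, area_offset, width):
    """Two rows of bits, each left-padded with zeros to the given width."""
    return [
        [int(bit) for bit in format(price_offset, "b").zfill(width)],
        [int(bit) for bit in format(area_offset, "b").zfill(width)],
    ]

def overall_key_matrix(offsets):
    """Stack the 2 x C blocks of all N records into a (2 * N) x C matrix."""
    # C is the largest bit length over all offsets; use 1 if all are zero.
    width = max(max(p, a).bit_length() for p, a in offsets) or 1
    matrix = []
    for price_offset, area_offset in offsets:
        matrix.extend(key_rows(price_offset, area_offset, width))
    return matrix

# (price offset, area offset) per record; (0, 0) marks insensitive data.
offsets = [(5, 2), (0, 0), (12, 1)]
matrix = overall_key_matrix(offsets)
```

With three records the result has 6 rows and 4 columns, and the two rows of the insensitive record are all zeros, as the text describes.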
Further, the digital semantic labels, the offset land circulation area and the offset land transaction price corresponding to the N pieces of land circulation information are encoded and compressed; for the insensitive data among the N pieces, the offset land circulation area and the offset land transaction price are the original data. The compressed land circulation information and the overall key matrix are stored in two separate databases, that is, one database stores the compressed land circulation information and the other stores the overall key matrix, which completes the storage of the land circulation information.
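The separate storage can be sketched as below. The source does not name a coding or compression scheme, so JSON serialization with zlib compression is used here purely as a stand-in, and two in-memory SQLite databases stand in for the two databases. The table names, function names and sample rows are hypothetical.

```python
import json
import sqlite3
import zlib

def store_separately(records, key_matrix):
    """Write compressed records and the key matrix to two separate databases."""
    data_db = sqlite3.connect(":memory:")
    key_db = sqlite3.connect(":memory:")
    data_db.execute("CREATE TABLE land (payload BLOB)")
    key_db.execute("CREATE TABLE key_matrix (payload BLOB)")
    compressed = zlib.compress(json.dumps(records).encode("utf-8"))
    key_bytes = json.dumps(key_matrix).encode("utf-8")
    data_db.execute("INSERT INTO land VALUES (?)", (compressed,))
    key_db.execute("INSERT INTO key_matrix VALUES (?)", (key_bytes,))
    return data_db, key_db

def load_records(data_db):
    """Read back and decompress the stored records."""
    (payload,) = data_db.execute("SELECT payload FROM land").fetchone()
    return json.loads(zlib.decompress(payload).decode("utf-8"))

# Hypothetical rows: semantic labels followed by offset area and offset price.
records = [[1, 2, 3, 18.5, 9200.0], [2, 1, 3, 20.0, 10000.0]]
key_matrix = [[0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
data_db, key_db = store_separately(records, key_matrix)
restored = load_records(data_db)
```

Keeping the key matrix in its own database means that the stored records alone do not reveal which values were offset or by how much.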
In summary, the embodiment of the present invention provides a method for safely storing land circulation information. The method divides the data into sensitive data and insensitive data according to the data characteristics of the land circulation information, adaptively obtains the sensitive information distribution association degree of the sensitive data according to the distribution characteristics of the sensitive and insensitive data, and offsets the sensitive data according to that association degree so that the sensitive data are hidden among the insensitive data. The key is then obtained and the data are encoded and compressed for the land circulation information after the offset, which provides simple and efficient safe storage of the land circulation information with a higher degree of confidentiality and improved security of data storage.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments have been described above. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner; for the parts that are the same or similar among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for safely storing land circulation information, characterized by comprising the following steps:
collecting N pieces of land circulation information, wherein N is a positive integer; constructing a data matrix according to the information attributes in each piece of land circulation information, wherein each row of the data matrix represents one piece of land circulation information and each column represents one information attribute; setting a digital semantic label for the text data under the information attributes of each piece of land circulation information in the data matrix; determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attributes, wherein the data type comprises sensitive data and insensitive data;
constructing a rectangular coordinate system under the current land category with the land circulation area as the abscissa and the land transaction price as the ordinate, and obtaining the adaptive neighborhood range of the data point corresponding to each piece of sensitive data under the current land category according to the distribution of data points in the rectangular coordinate system; calculating the position differences between the data point corresponding to the current sensitive data and the other data points within its adaptive neighborhood range to obtain the local sensitive information distribution degree of the data point corresponding to the current sensitive data, and obtaining the sensitive information distribution association degree of the current sensitive data according to the local sensitive information distribution degree of each piece of sensitive data within the adaptive neighborhood range of the data point corresponding to the current sensitive data; obtaining the sensitive information distribution association degrees of all sensitive data; calculating, according to the sensitive information distribution association degree of each piece of sensitive data, the data offsets corresponding to the land circulation area and the land transaction price, and obtaining the offset land circulation area and the offset land transaction price according to the data offsets; setting the data offsets of the land circulation area and the land transaction price of the insensitive data to 0;
obtaining a key matrix of the N pieces of land circulation information according to the data offsets of the sensitive data and the insensitive data; and encoding and compressing the digital semantic labels, the offset land circulation area and the offset land transaction price corresponding to each piece of land circulation information, and storing the compressed land circulation information and the key matrix separately.
2. The method for safely storing land circulation information according to claim 1, wherein the method for determining the data type corresponding to each piece of land circulation information according to the land circulation area and the land transaction price under the information attributes comprises:
calculating, based on the historically stored land circulation information, the average land transaction price per unit area and the average land circulation area under each land category;
calculating, according to the land category, the land circulation area and the land transaction price of a given piece of land circulation information, a first data sensitivity degree of the land circulation area and a second data sensitivity degree of the land transaction price, wherein the calculation expression of the first data sensitivity degree and the second data sensitivity degree is:
[Equation image: first data sensitivity degree and second data sensitivity degree]
wherein the expression uses the first data sensitivity degree and the second data sensitivity degree of the piece of land circulation information, the land circulation area and the land transaction price of the piece of land circulation information under its land category, the average land circulation area under the land category, the average land transaction price per unit area under the land category, and the hyperbolic tangent function;
setting a first data sensitivity threshold and a second data sensitivity threshold respectively; when the first data sensitivity degree is greater than or equal to the first data sensitivity threshold or the second data sensitivity degree is greater than or equal to the second data sensitivity threshold, confirming that the piece of land circulation information belongs to sensitive data; and when the first data sensitivity degree is less than the first data sensitivity threshold and the second data sensitivity degree is less than the second data sensitivity threshold, confirming that the piece of land circulation information belongs to insensitive data.
3. The method for safely storing land circulation information as claimed in claim 1, wherein the method for obtaining the adaptive neighborhood range of the data point corresponding to each sensitive data under the current land category according to the distribution of the data points in the rectangular coordinate system comprises:
fitting a trend line to the cluster formed by the insensitive data in the rectangular coordinate system, and dividing the trend line into equal parts so that the cluster is initially divided into 10 data segments, thereby obtaining the interval length of the
Figure 324773DEST_PATH_IMAGE001
-th data segment and the total number of data points in the
Figure 166958DEST_PATH_IMAGE004
-th data segment;
selecting any data point in the
Figure 797660DEST_PATH_IMAGE001
-th data segment as a target data point; taking the target data point as the center of a circle and obtaining the circle corresponding to the target data point using a set radius; calculating the data similarity between the target data point and each of the other data points within the circle; marking the data points whose data similarity is greater than a data similarity threshold, and also marking the target data point; counting the total number of marked data points
Figure 794303DEST_PATH_IMAGE012
; calculating, for the
Figure 286596DEST_PATH_IMAGE004
-th data segment, the ratio between the total number of data points in the data segment
Figure 232555DEST_PATH_IMAGE013
and the total number of marked data points
Figure 221109DEST_PATH_IMAGE012
, the ratio being
Figure 268830DEST_PATH_IMAGE014
, and taking the ratio as the distribution probability of the target data point;
continuing to select, one at a time, a target data point from the unmarked data points in the
Figure 282923DEST_PATH_IMAGE001
-th data segment, thereby obtaining the distribution probabilities of a plurality of target data points; according to the distribution probabilities of the target data points in the
Figure 795638DEST_PATH_IMAGE004
-th data segment and the interval length of that data segment, obtaining the adaptive interval length of the
Figure 518875DEST_PATH_IMAGE004
-th data segment;
obtaining the adaptive interval length of every data segment and re-dividing the cluster accordingly to obtain new data segments; and obtaining the adaptive neighborhood range of the data point corresponding to each piece of sensitive data according to the position of that data point relative to the new data segments and the number of data points in the new data segments.
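The marking procedure of claim 3 can be sketched as follows for a single data segment. This is only an illustration under stated assumptions: the similarity formula is an image that is not reproduced in this text, so `stand_in_similarity` (exp of the negative L2 distance) is a hypothetical placeholder passed in as a parameter; the ratio is assumed to be marked points divided by total points; and targets are visited in list order rather than chosen arbitrarily. All function and variable names are hypothetical.

```python
import math


def stand_in_similarity(a, b):
    # Assumption: placeholder only. The claimed formula is not reproduced in the text;
    # this stand-in merely combines the L2 norm with the natural constant.
    return math.exp(-math.dist(a, b))


def distribution_probabilities(points, radius, sim_threshold,
                               similarity=stand_in_similarity):
    """Return one distribution probability per target data point of a data segment."""
    marked = [False] * len(points)
    probabilities = []
    for t, target in enumerate(points):
        if marked[t]:
            continue  # targets are drawn only from unmarked data points
        marked[t] = True  # the target data point itself is also marked
        count = 1
        for k, other in enumerate(points):
            if k == t or math.dist(target, other) > radius:
                continue  # skip the target itself and points outside the circle
            if similarity(target, other) > sim_threshold:
                marked[k] = True
                count += 1
        # Assumption: ratio taken as (marked points) / (total points in the segment).
        probabilities.append(count / len(points))
    return probabilities


segment = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]  # hypothetical data points
probs = distribution_probabilities(segment, radius=1.0, sim_threshold=0.5)
print(probs)
```

In this toy segment the first point marks its close neighbour, so only two target data points are ever selected and two distribution probabilities are produced.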
4. The safe storage method of land circulation information as claimed in claim 3, wherein the calculation formula of the data similarity is:
Figure 240843DEST_PATH_IMAGE015
wherein,
Figure 307894DEST_PATH_IMAGE016
is the data similarity between data point
Figure 979178DEST_PATH_IMAGE017
and data point
Figure 060266DEST_PATH_IMAGE018
;
Figure 315536DEST_PATH_IMAGE019
represents the L2 norm;
Figure 484480DEST_PATH_IMAGE020
is the coordinates of data point
Figure 508806DEST_PATH_IMAGE017
;
Figure 495216DEST_PATH_IMAGE021
is the coordinates of data point
Figure 004826DEST_PATH_IMAGE018
;
Figure 367674DEST_PATH_IMAGE022
is the natural constant.
5. The method for securely storing land circulation information according to claim 3, wherein obtaining, according to the distribution probabilities of the target data points in the
Figure 783524DEST_PATH_IMAGE001
-th data segment and the interval length of that data segment, the adaptive interval length of the
Figure 488306DEST_PATH_IMAGE001
-th data segment comprises:
calculating, according to the distribution probabilities of the target data points in the
Figure 672163DEST_PATH_IMAGE004
-th data segment, the data point distribution characteristic index of the
Figure 150286DEST_PATH_IMAGE004
-th data segment, the calculation formula of the data point distribution characteristic index being:
Figure 588352DEST_PATH_IMAGE023
wherein,
Figure 244461DEST_PATH_IMAGE024
is the data point distribution characteristic index of the
Figure 696040DEST_PATH_IMAGE004
-th data segment;
Figure DEST_PATH_IMAGE025
is the number of target data points in the
Figure 276057DEST_PATH_IMAGE004
-th data segment;
Figure 332744DEST_PATH_IMAGE026
is the distribution probability of the
Figure 644908DEST_PATH_IMAGE027
-th target data point;
obtaining the data point distribution characteristic index of each data segment; according to the interval length of the
Figure 600094DEST_PATH_IMAGE004
-th data segment and the data point distribution characteristic index of each data segment, calculating the adaptive interval length of the
Figure 375282DEST_PATH_IMAGE004
-th data segment, the calculation formula of the adaptive interval length of the
Figure 850256DEST_PATH_IMAGE004
-th data segment being:
Figure 785851DEST_PATH_IMAGE028
wherein,
Figure 477602DEST_PATH_IMAGE029
is the adaptive interval length of the
Figure 805946DEST_PATH_IMAGE004
-th data segment;
Figure 384695DEST_PATH_IMAGE030
is the interval length of the
Figure 333934DEST_PATH_IMAGE004
-th data segment;
Figure DEST_PATH_IMAGE031
is the sign function of the
Figure 794871DEST_PATH_IMAGE004
-th data segment;
Figure 441753DEST_PATH_IMAGE032
represents a hyperparameter;
Figure 891320DEST_PATH_IMAGE022
is the natural constant;
Figure 220582DEST_PATH_IMAGE024
is the data point distribution characteristic index of the
Figure 309761DEST_PATH_IMAGE004
-th data segment;
Figure 776646DEST_PATH_IMAGE033
is the data point distribution characteristic index of the
Figure 516937DEST_PATH_IMAGE034
-th data segment.
6. A method for securely storing land circulation information according to claim 3, wherein the method for obtaining the adaptive neighborhood range of the data point corresponding to each sensitive data according to the position between the new data segment and the data point corresponding to the sensitive data and the number of data points in the new data segment comprises:
acquiring the centroid of each new data segment, taking the centroid as a central data point, respectively calculating the Euclidean distance between the data point of the current sensitive data and each central data point, and taking the new data segment corresponding to the shortest Euclidean distance as a target data segment of the current sensitive data;
calculating the adaptive neighborhood range of the data point of the current sensitive data according to the number of data points in the target data segment and the shortest Euclidean distance, the calculation formula of the adaptive neighborhood range being:
Figure 761974DEST_PATH_IMAGE035
wherein,
Figure 417077DEST_PATH_IMAGE036
is the adaptive neighborhood range of the data point of the current sensitive data;
Figure 655030DEST_PATH_IMAGE037
is the number of data points in the target data segment J;
Figure 548030DEST_PATH_IMAGE038
is the shortest of the Euclidean distances between the data point of the current sensitive data and each central data point;
Figure 229547DEST_PATH_IMAGE039
represents the rounding function.
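The target-segment selection step of claim 6 can be sketched as below. The sketch covers only the centroid and shortest-distance step; the adaptive neighborhood range formula itself is an image that is not reproduced in this text, so it is deliberately left out. The function names and the sample coordinates are hypothetical.

```python
import math


def centroid(points):
    """Centroid (mean x, mean y) of a non-empty list of 2-D data points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def target_segment(sensitive_point, segments):
    """Pick the new data segment whose centroid is nearest to the sensitive data point.

    Returns (segment index, shortest Euclidean distance, number of points in that segment),
    i.e. the two quantities the claimed neighborhood-range formula is said to depend on.
    """
    distances = [math.dist(sensitive_point, centroid(seg)) for seg in segments]
    j = min(range(len(segments)), key=distances.__getitem__)
    return j, distances[j], len(segments[j])


# Hypothetical new data segments of insensitive data points.
segments = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],
]
index, shortest, size = target_segment((1.0, 1.0), segments)
print(index, shortest, size)
```

The returned distance and point count would then be fed into the neighborhood-range formula of the claim, which this sketch does not attempt to reconstruct.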
7. The safe storage method of land circulation information, as claimed in claim 1, wherein the calculation formula of the local sensitive information distribution degree of the data points corresponding to the current sensitive data is:
Figure 339324DEST_PATH_IMAGE040
wherein,
Figure 882432DEST_PATH_IMAGE041
is the local sensitive information distribution degree of the data point corresponding to the
Figure 122615DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 975033DEST_PATH_IMAGE043
is the number of data points within the adaptive neighborhood range of the data point corresponding to the
Figure 073570DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 856587DEST_PATH_IMAGE044
represents the coordinates of data point r within the adaptive neighborhood range;
Figure 973448DEST_PATH_IMAGE045
is the coordinates of the data point corresponding to the
Figure 747500DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 566289DEST_PATH_IMAGE019
represents the L2 norm.
8. The method for safely storing land circulation information as claimed in claim 1, wherein the calculation formula of the distribution relevancy of the sensitive information of the current sensitive data is as follows:
Figure 716779DEST_PATH_IMAGE046
wherein,
Figure 688146DEST_PATH_IMAGE047
is the sensitive information distribution relevancy of the
Figure 866055DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 939184DEST_PATH_IMAGE048
is, for the
Figure 132180DEST_PATH_IMAGE042
-th piece of sensitive data with adaptive neighborhood range
Figure 426896DEST_PATH_IMAGE049
, the average local sensitive information distribution degree of all sensitive data within that range;
Figure 277171DEST_PATH_IMAGE050
is, for the
Figure 336132DEST_PATH_IMAGE042
-th piece of sensitive data with adaptive neighborhood range
Figure DEST_PATH_IMAGE051
, the number of all sensitive data within that range;
Figure 156320DEST_PATH_IMAGE052
is the local sensitive information distribution degree of the data point corresponding to the
Figure 351547DEST_PATH_IMAGE053
-th piece of sensitive data;
Figure 638303DEST_PATH_IMAGE041
is the local sensitive information distribution degree of the data point corresponding to the
Figure 669713DEST_PATH_IMAGE042
-th piece of sensitive data.
9. The method for safely storing land circulation information as claimed in claim 2, wherein the calculation formula of the offset land circulation area is as follows:
Figure 729810DEST_PATH_IMAGE054
wherein,
Figure 749850DEST_PATH_IMAGE055
is the offset land circulation area corresponding to the
Figure 067279DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 789248DEST_PATH_IMAGE056
is, for land category
Figure 357763DEST_PATH_IMAGE007
, the land circulation area of the
Figure 527582DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 156141DEST_PATH_IMAGE057
is, for land category
Figure 162143DEST_PATH_IMAGE007
, the mean land circulation area of the insensitive data;
Figure 767306DEST_PATH_IMAGE058
is the data offset of the land circulation area of the
Figure 558675DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 856670DEST_PATH_IMAGE059
is the first data sensitivity degree of the
Figure 084389DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 994708DEST_PATH_IMAGE060
is the first data sensitivity threshold.
10. The method for securely storing land circulation information according to claim 2, wherein the calculation formula of the offset land transaction price is as follows:
Figure 879399DEST_PATH_IMAGE061
wherein,
Figure 849760DEST_PATH_IMAGE062
is the offset land transaction price corresponding to the
Figure 830355DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 777320DEST_PATH_IMAGE063
is, for land category
Figure 277702DEST_PATH_IMAGE007
, the land transaction price of the
Figure 183079DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 119811DEST_PATH_IMAGE064
is, for land category
Figure 434249DEST_PATH_IMAGE007
, the mean land transaction price of the insensitive data;
Figure 287673DEST_PATH_IMAGE065
is the data offset of the land transaction price of the
Figure 599837DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 532852DEST_PATH_IMAGE066
is the second data sensitivity degree of the
Figure 572353DEST_PATH_IMAGE042
-th piece of sensitive data;
Figure 047327DEST_PATH_IMAGE067
is the second data sensitivity threshold.
CN202210971299.XA 2022-08-15 2022-08-15 Safe storage method for land circulation information Active CN115048682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210971299.XA CN115048682B (en) 2022-08-15 2022-08-15 Safe storage method for land circulation information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210971299.XA CN115048682B (en) 2022-08-15 2022-08-15 Safe storage method for land circulation information

Publications (2)

Publication Number Publication Date
CN115048682A CN115048682A (en) 2022-09-13
CN115048682B true CN115048682B (en) 2022-11-01

Family

ID=83166479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210971299.XA Active CN115048682B (en) 2022-08-15 2022-08-15 Safe storage method for land circulation information

Country Status (1)

Country Link
CN (1) CN115048682B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442155B (en) * 2022-10-27 2023-01-31 深圳市光联世纪信息科技有限公司 Data encryption method and system for SD-WAN

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881611A (en) * 2014-02-28 2015-09-02 国际商业机器公司 Method and apparatus for protecting sensitive data in software product
CN110502602A (en) * 2019-08-14 2019-11-26 平安科技(深圳)有限公司 Date storage method, device, equipment and computer storage medium
CN112579523A (en) * 2020-12-15 2021-03-30 广东后海控股股份有限公司 Rural land circulation management system based on block chain technology
CN114328640A (en) * 2021-02-07 2022-04-12 湖南科技学院 Differential privacy protection and data mining method and system based on mobile user dynamic sensitive data
CN114626097A (en) * 2022-03-22 2022-06-14 中国平安人寿保险股份有限公司 Desensitization method, desensitization device, electronic apparatus, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2417499A1 (en) * 2000-07-27 2002-02-07 Activated Content Corporation Stegotext encoder and decoder
US20220222368A1 (en) * 2019-05-14 2022-07-14 Equifax Inc. Data protection via attributes-based aggregation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
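Risk identification and evaluation of farmland transfer from the perspectives of different actors: a survey of the agriculture-related suburbs of Shanghai; Niu Xing et al.; 《中国农业资源与区划》 (Chinese Journal of Agricultural Resources and Regional Planning); 20180525 (No. 05); full text *
Big data storage security technology based on data sensitivity; Hu Zhida; 《移动通信》 (Mobile Communications); 20200815 (No. 08); full text *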

Also Published As

Publication number Publication date
CN115048682A (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant