CN114329154A - Safety search method based on data stored by cloud server - Google Patents
Safety search method based on data stored by cloud server
- Publication number
- CN114329154A (application number CN202111650058.7A)
- Authority
- CN
- China
- Prior art keywords
- search
- data
- user
- cloud server
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the technical field of search methods, and in particular to a secure search method for data stored on a cloud server. With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode of approximate search with privacy protection for big data, together with quantitative evaluation criteria. It realizes a data retrieval scheme that coordinates the three major dimensions of big data search, solves the problem of repeated searching caused by isomorphic searches and data version updates, improves the efficiency of general retrieval, and provides a complete solution for general retrieval covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.
Description
Technical Field
The invention relates to the technical field of search methods, in particular to a safe search method based on data stored by a cloud server.
Background
Data retrieval is the most common data service and is the basis for complex statistical operations such as further analysis and mining of data. For example, a scan over 1 PB of data takes tens of hours even on a high-performance computer, let alone a complex analysis operation. From the perspective of user requirements, as the pace of life and work accelerates, users' expectations of search latency have changed greatly: rather than waiting tediously for an exact answer, people expect a fast search experience that meets their precision requirement. From the perspective of big data characteristics, big data is often aggregated from multiple sources and usually contains noise and redundant information, and it is mostly open only to a limited extent (that is, because it may contain sensitive or private information, the data owner needs to anonymize it before release), so expecting exact statistical analysis of big data is neither practical nor possible. From the perspective of search technology, traditional search engine technology mainly targets static Web 1.0 pages and performs keyword-based "existence scan search"; it cannot support Web 2.0/3.0 big data applications with the 4V characteristics, nor meet users' requirements for fast, high-precision search. These problems have prompted the exploration of new big data search techniques that are fast, high-precision and privacy-preserving.
In addition, because big data is huge in volume, rich in implicit knowledge and only partially open, a user's requirement for big data search is not a single-dimensional existence answer; it is evaluated along multiple dimensions such as the precision and availability of the search results and the timeliness of the search. Traditional search evaluation metrics, which address a single dimension, are therefore unsuitable for evaluating big data search. On the one hand, the user cares about both the precision and the timeliness of big data search results, and these two requirements are difficult to satisfy simultaneously in practical scenarios with limited computing resources; on the other hand, the data owner sets differentiated privacy protection strengths for data search requests from users with different roles. At present there is no common yardstick for measuring privacy protection strength against search precision, which leaves the two parties at an impasse.
Disclosure of Invention
The invention provides a safe searching method based on data stored by a cloud server, aiming at the problems in the prior art.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention provides a secure search method based on data stored on a cloud server, which comprises the following steps:
step one, a user puts forward a data search requirement and submits it to a search engine Hermes, wherein the search request is a quadruple Q = (Op, D, p, T), in which Op represents a search operation on a target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user;
step two, on receiving the user's search request, the search engine Hermes first collects state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user;
step three, if the estimated result falls short of the search requirement set by the user, the search request is rejected and a "negotiation" is carried out with the user to modify the search parameters;
step four, if the estimated result is consistent with the search requirement set by the user, a search task J is generated;
step five, the data platform executes the search task J according to the configuration file of the search task J, and if a search task isomorphic with this task is found in the cache, the search task can be accelerated through a result reuse mechanism;
and step six, the data platform returns the task result to the search engine Hermes, which presents it to the user.
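The control flow of steps one to four can be illustrated with a minimal Python sketch. This is not the patent's implementation: the class names `SearchRequest` and `Estimate`, the function `handle`, and the idea that the feasibility estimate is expressed as an achievable precision and an expected running time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class SearchRequest:
    """The quadruple Q = (Op, D, p, T) from step one."""
    op: str            # search operation Op, e.g. "mean"
    dataset: str       # identifier of the target data set D
    precision: float   # lower limit p on search precision
    time_limit: float  # acceptable time limit T, in seconds


@dataclass(frozen=True)
class Estimate:
    """Hypothetical form of the feasibility estimate produced in step two."""
    achievable_precision: float
    expected_seconds: float


def is_feasible(q: SearchRequest, est: Estimate) -> bool:
    # Q is feasible when the platform can reach the precision within the limit.
    return (est.achievable_precision >= q.precision
            and est.expected_seconds <= q.time_limit)


def handle(q: SearchRequest, est: Estimate,
           run_task: Callable[[SearchRequest], float]
           ) -> Tuple[str, Union[float, Estimate]]:
    if not is_feasible(q, est):
        # Step three: reject and hand the estimate back so the user can
        # negotiate new search parameters.
        return "negotiate", est
    # Steps four to six: generate and run the task, then return the result.
    return "result", run_task(q)
```

A caller would pass the platform's own task runner as `run_task`; returning the estimate on rejection is one possible way to give the user something concrete to negotiate against.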
Preferably, the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.
Preferably, the search evaluation module bridges the user and the data platform: it waits for the user's search requests and analyzes the resources they require, intermittently collects state information of the data platform, and finally forms a feasible search plan.
Preferably, the approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer is composed of a plurality of basic operation components: for a given operation Op, if the corresponding statistic computed under a specific data sampling algorithm is an unbiased estimate of Op, then Op can be included in the operation layer in the form of a component. The acceleration layer provides a quick response mechanism, recording information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later. The sampling layer provides a plurality of sampling techniques.
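The acceleration layer's lookup table can be sketched as follows. The patent does not define "isomorphic" precisely; this sketch assumes that two searches are isomorphic when they apply the same operation to the same version of the same data set, and that a stored result may be reused whenever it was computed at a precision at least as high as the one now requested. The class name `FastTable` and its methods are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed isomorphism key: (operation, data set, data version).
Key = Tuple[str, str, int]


class FastTable:
    """Records historical search results so isomorphic searches can reuse them."""

    def __init__(self) -> None:
        # key -> (precision the result was computed at, result value)
        self._entries: Dict[Key, Tuple[float, float]] = {}

    def lookup(self, key: Key, precision: float) -> Optional[float]:
        entry = self._entries.get(key)
        # Reuse only if the historical result is at least as precise as required.
        if entry is not None and entry[0] >= precision:
            return entry[1]
        return None

    def record(self, key: Key, precision: float, value: float) -> None:
        current = self._entries.get(key)
        # Keep the most precise result seen so far for each key.
        if current is None or precision > current[0]:
            self._entries[key] = (precision, value)
```

Including the data version in the key means a new data release never silently reuses a stale result; whether an older result is still acceptable is a separate decision.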
Preferably, the plurality of sampling techniques include Bernoulli sampling, the bootstrap method and the jackknife method.
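As a rough illustration of these three techniques, the Python sketch below draws a Bernoulli sample, forms an unbiased estimate of a population sum from it, and attaches a standard error by bootstrap or jackknife resampling. The function names and the choice of SUM as the example operation are assumptions; the patent does not specify how the techniques are combined.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bernoulli_sample(data: Sequence[float], q: float,
                     rng: random.Random) -> List[float]:
    # Each record is kept independently with probability q.
    return [x for x in data if rng.random() < q]


def estimate_sum(sample: Sequence[float], q: float) -> float:
    # Scaling by 1/q makes the sample sum an unbiased estimate of the
    # population sum, because every record is included with probability q.
    return sum(sample) / q


def bootstrap_se(sample: List[float], stat: Callable[[List[float]], float],
                 rounds: int, rng: random.Random) -> float:
    # Resample with replacement and measure the spread of the statistic.
    reps = [stat(rng.choices(sample, k=len(sample))) for _ in range(rounds)]
    return statistics.stdev(reps)


def jackknife_se(sample: List[float],
                 stat: Callable[[List[float]], float]) -> float:
    # Recompute the statistic with each observation left out in turn.
    n = len(sample)
    loo = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = sum(loo) / n
    return ((n - 1) / n * sum((v - centre) ** 2 for v in loo)) ** 0.5
```

For the sample mean, the jackknife standard error coincides with the familiar s/sqrt(n), which makes it a convenient sanity check for an implementation.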
Preferably, the search maintenance module introduces an incremental sampling strategy: for isomorphic searches whose precision requirement varies, time overhead is greatly reduced by effectively reusing historical results; and, in view of the incremental release of big data, the search maintenance module determines, from the perspectives of privacy protection granularity and the stability of historical results, whether historical results are applicable to isomorphic searches over a new version of the data.
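One way to read the incremental sampling strategy is that a later, more demanding isomorphic search only draws the additional records it needs and merges them with the sample already taken. The Python sketch below shows this for a mean computed from draws with replacement; the names `MeanSummary` and `refine`, and the use of a target sample size as a stand-in for the precision requirement, are assumptions rather than details from the patent.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class MeanSummary:
    """Running summary of all draws made so far for one isomorphic search."""
    count: int = 0
    total: float = 0.0

    def extend(self, values: Iterable[float]) -> None:
        for v in values:
            self.count += 1
            self.total += v

    @property
    def mean(self) -> float:
        # Only meaningful once at least one value has been drawn.
        return self.total / self.count


def refine(summary: MeanSummary, data: Sequence[float],
           target_count: int, rng: random.Random) -> int:
    """Draw only the records still missing to reach target_count.

    Returns the number of new draws. Draws are made with replacement, so old
    and new draws together still form one simple random sample.
    """
    missing = max(0, target_count - summary.count)
    summary.extend(rng.choice(data) for _ in range(missing))
    return missing
```

Raising the target from 100 to 400 draws costs 300 new draws rather than 400, and a later request with a lower target costs nothing.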
Preferably, the data privacy protection granularity is the data owner's control over the granularity at which information is published, that is, information of different granularities is provided according to the differing requirements and permissions of the end users of the information.
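A simple way to picture role-dependent granularity is value generalization: the same numeric attribute is published as an exact value to one role and as a coarser range to another. The Python sketch below assumes a hypothetical role-to-bucket-width table; the roles, widths and function names are illustrative and do not come from the patent.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from user role to bucket width; a larger width means
# coarser published information and therefore stronger privacy protection.
GRANULARITY_BY_ROLE: Dict[str, int] = {"auditor": 1, "analyst": 10, "public": 50}


def generalize(value: int, width: int) -> Tuple[int, int]:
    # Replace an exact value with the inclusive range of its bucket.
    low = (value // width) * width
    return (low, low + width - 1)


def publish(values: Sequence[int], role: str) -> List[Tuple[int, int]]:
    # The data owner controls granularity by choosing the width per role.
    width = GRANULARITY_BY_ROLE[role]
    return [generalize(v, width) for v in values]
```

With these assumed widths, the value 37 is published as the range 30 to 39 for an analyst and as 0 to 49 for the public.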
Preferably, in step three, the search engine Hermes performs "negotiation" with the user to modify the search parameters, that is, the user is required to modify the keywords of the search requirement.
The invention has the beneficial effects that:
the invention provides a secure search method based on data stored on a cloud server, comprising steps one to six as set out above: the user submits a search request Q = (Op, D, p, T) to the search engine Hermes; Hermes estimates the feasibility of Q; an infeasible request is rejected and its parameters are negotiated with the user; a feasible request becomes a search task J; the data platform executes J, accelerating it through result reuse when an isomorphic task is found in the cache; and the result is returned to the user through Hermes. With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode of approximate search with privacy protection for big data, together with quantitative evaluation criteria. It realizes a data retrieval scheme that coordinates the three major dimensions of big data search, solves the problem of repeated searching caused by isomorphic searches and data version updates, improves the efficiency of general retrieval, and provides a complete solution for general retrieval covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.
Drawings
FIG. 1 is a block diagram of the framework of the present invention.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention. The present invention is described in detail below with reference to the attached drawings.
As shown in FIG. 1, the secure search method based on data stored on a cloud server provided by the present invention includes the following steps:
step one, a user puts forward a data search requirement and submits it to a search engine Hermes, wherein the search request is a quadruple Q = (Op, D, p, T), in which Op represents a search operation on a target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user;
step two, on receiving the user's search request, the search engine Hermes first collects state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user;
step three, if the estimated result falls short of the search requirement set by the user, the search request is rejected and a "negotiation" is carried out with the user to modify the search parameters;
step four, if the estimated result is consistent with the search requirement set by the user, a search task J is generated;
step five, the data platform executes the search task J according to the configuration file of the search task J, and if a search task isomorphic with this task is found in the cache, the search task can be accelerated through a result reuse mechanism;
and step six, the data platform returns the task result to the search engine Hermes, which presents it to the user.
With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode of approximate search with privacy protection for big data, together with quantitative evaluation criteria. It realizes a data retrieval scheme that coordinates the three major dimensions of big data search, solves the problem of repeated searching caused by isomorphic searches and data version updates, improves the efficiency of general retrieval, and provides a complete solution for general retrieval covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.
In this embodiment, the search engine Hermes includes a search evaluation module, an approximate search module and a search maintenance module. The search evaluation module bridges the user and the data platform: it waits for the user's search requests and analyzes the resources they require, intermittently collects state information of the data platform, and finally forms a feasible search plan. The approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer consists of a plurality of basic operation components: for a given operation Op, if the corresponding statistic computed under a specific data sampling algorithm is an unbiased estimate of Op, then Op can be brought into the operation layer in the form of a component. The acceleration layer provides a quick response mechanism, recording information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later. The sampling layer provides various sampling techniques, including Bernoulli sampling, the bootstrap method and the jackknife method. The search maintenance module introduces an incremental sampling strategy: for isomorphic searches whose precision requirement varies, time overhead is greatly reduced by effectively reusing historical results; and, in view of the incremental release of big data, the search maintenance module determines, from the perspectives of privacy protection granularity and the stability of historical results, whether historical results are applicable to isomorphic searches over a new version of the data.
In this embodiment, the data privacy protection granularity is the data owner's control over the granularity at which information is published, that is, information of different granularities is provided according to the differing requirements and permissions of the end users of the information. The data owner can set privacy protection granularities with different parameters according to the importance of the data, which helps prevent attackers from stealing data held in cloud storage.
In this embodiment, in step three, the search engine Hermes and the user perform "negotiation" for modifying search parameters, that is, the user is required to modify the keywords of the search requirement.
Although the present invention has been described with reference to the above preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A safe searching method based on data stored by a cloud server is characterized by comprising the following steps:
step one, a user puts forward a data search requirement and submits it to a search engine Hermes, wherein the search request is a quadruple Q = (Op, D, p, T), in which Op represents a search operation on a target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user;
step two, on receiving the user's search request, the search engine Hermes first collects state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user;
step three, if the estimated result deviates from the search requirement set by the user, rejecting the search request and carrying out 'negotiation' for modifying the search parameter with the user;
step four, if the estimated result is consistent with the search requirement set by the user, generating a search task J;
step five, the data platform implements the search task J according to the configuration file of the search task J, and if the search task isomorphic with the task is found in the cache, the search task can be accelerated through a result multiplexing mechanism;
and step six, the data platform returns the task result to the search engine Hermes and presents the task result to the user.
2. The secure search method based on the cloud server storage data of claim 1, wherein: the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.
3. The secure search method based on the cloud server storage data of claim 2, wherein: the search evaluation module is responsible for bridging the user and the data platform, waiting for the search request of the user and analyzing the required resources; and intermittently collecting the state information of the data platform, and finally forming a feasible search plan.
4. The secure search method based on the cloud server storage data of claim 2, wherein: the approximate search module comprises a sampling layer, an acceleration layer and an operation layer, wherein the operation layer is composed of a plurality of basic operation components, and for a given operation Op, if the corresponding statistic computed under a specific data sampling algorithm is an unbiased estimate of Op, the Op can be brought into the operation layer in the form of a component; the acceleration layer provides a quick response mechanism, and records information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later; the sampling layer provides a plurality of sampling techniques.
5. The secure search method based on the cloud server storage data of claim 4, wherein: the plurality of sampling techniques include Bernoulli sampling, the bootstrap method and the jackknife method.
6. The secure search method based on the cloud server storage data of claim 2, wherein: the search maintenance module introduces an incremental sampling strategy, for isomorphic searches whose precision requirement varies, time overhead is greatly reduced by effectively reusing historical results, and in view of the incremental release of big data, the search maintenance module determines, from the perspectives of privacy protection granularity and the stability of historical results, whether historical results are applicable to isomorphic searches over a new version of the data.
7. The secure search method based on the cloud server storage data of claim 1, wherein: the data privacy protection granularity is the control of the data owner on the information publishing granularity, namely, information with different granularities is provided according to the difference of the requirements and the authority of terminal information users.
8. The secure search method based on the cloud server storage data of claim 1, wherein: in step three, the search engine Hermes and the user perform "negotiation" for modifying search parameters, that is, the user is required to modify the keywords of the search requirement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111650058.7A CN114329154A (en) | 2021-12-30 | 2021-12-30 | Safety search method based on data stored by cloud server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114329154A (en) | 2022-04-12 |
Family
ID=81018973
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202111650058.7A | Pending | CN114329154A (en) | 2021-12-30 | 2021-12-30 | Safety search method based on data stored by cloud server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114329154A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100146299A1 (en) * | 2008-10-29 | 2010-06-10 | Ashwin Swaminathan | System and method for confidentiality-preserving rank-ordered search |
CN110866275A (en) * | 2019-11-13 | 2020-03-06 | 哈尔滨工业大学 | Approximate retrieval method of big data with privacy protection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20220412 |