CN114329154A - Safety search method based on data stored by cloud server - Google Patents


Info

Publication number
CN114329154A
Authority
CN
China
Prior art keywords
search
data
user
cloud server
task
Legal status
Pending
Application number
CN202111650058.7A
Other languages
Chinese (zh)
Inventor
张宏莉
周志刚
叶麟
李东
余翔湛
于海宁
方滨兴
Current Assignee
Guangdong Electronic Information Engineering Research Institute of UESTC
Original Assignee
Guangdong Electronic Information Engineering Research Institute of UESTC
Priority date
2021-12-30
Filing date
2021-12-30
Publication date
2022-04-12
Application filed by Guangdong Electronic Information Engineering Research Institute of UESTC
Priority to CN202111650058.7A
Publication of CN114329154A

Abstract

The invention relates to the technical field of search methods, and in particular to a secure search method for data stored on a cloud server. Approaching big-data search from the standpoints of search precision, timeliness, privacy-protection granularity and data availability, the invention studies a new approach to approximate big-data search with privacy protection, together with quantitative evaluation criteria. It realizes a data retrieval scheme that coordinates the three major dimensions of big-data search, solves the problem of repeated search caused by isomorphic searches and data-version updates, improves the retrieval efficiency of general search, and provides a complete solution for general search covering multi-dimensional unified quantitative metrics, search modes, architecture, algorithms and so on.

Description

Safety search method based on data stored by cloud server
Technical Field
The invention relates to the technical field of search methods, and in particular to a secure search method based on data stored on a cloud server.
Background
Data retrieval is the most common data service and the foundation for more complex statistical operations on data, such as further analysis and mining. For example, a single scan of 1 PB of data takes several tens of hours even on a high-performance computer, before any complex analysis is carried out. From the perspective of user search requirements, as the pace of life and work accelerates, users' expectations of search latency have changed greatly: rather than waiting tediously for an exact answer, people expect a fast search experience that meets their precision requirement. From the perspective of big-data characteristics, big data is usually formed by aggregating multiple sources and therefore contains noise and redundant information, and it is mostly released only to a limited extent (because it may contain sensitive or private information, the data owner must anonymize it before release); expecting exact statistical analysis of big data is therefore neither practical nor possible. From the perspective of search technology, traditional search engines mainly target static Web 1.0 pages and perform keyword-based 'existence scanning search'; they cannot support big-data applications with the 4V characteristics of Web 2.0/3.0 or meet users' demands for fast, high-precision search. These problems have prompted research into new fast, high-precision big-data search techniques with privacy protection.
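As a rough check on the scan-time figure above, the following sketch assumes a sustained aggregate read throughput of 10 GB/s (an illustrative assumption; the patent gives no hardware figures) and computes how long one sequential pass over 1 PB would take.

```python
# Back-of-envelope estimate of the time to scan 1 PB sequentially.
# The 10 GB/s aggregate throughput is an illustrative assumption, not a figure from the patent.
PETABYTE = 10**15  # bytes (decimal SI unit)


def scan_hours(data_bytes: int, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to read data_bytes at a constant throughput."""
    return data_bytes / throughput_bytes_per_s / 3600


hours = scan_hours(PETABYTE, 10 * 10**9)
print(f"{hours:.1f} h")  # 27.8 h
```

Even at this fairly generous rate the scan alone lands in the "several tens of hours" range, which is why the method favours sampling-based approximate answers over full scans.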
In addition, because big data is huge in size, rich in embedded knowledge and only partially open, users' requirements for big-data search are not a single-dimensional existence answer; results are evaluated along multiple dimensions, such as the precision and usability of the results and the timeliness of the search. Traditional search evaluation metrics aimed at a single dimension are therefore unsuitable for evaluating big-data search. On the one hand, users care about both the precision of big-data search results and the time efficiency of the search, and these two requirements are difficult to satisfy simultaneously in practical settings with limited computing resources; on the other hand, the data owner sets differentiated privacy-protection strengths for search requests from users in different roles. At present there is no common yardstick for measuring privacy-protection strength against search precision, so the two parties reach an impasse.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a secure search method based on data stored on a cloud server.
To solve the above technical problems, the invention adopts the following technical solution:
the invention provides a secure search method based on data stored on a cloud server, comprising the following steps:
step one: a user formulates a data search requirement and submits it to the search engine Hermes as a search request in the form of a quadruple Q = (Op, D, p, T), where Op denotes a search operation on a target data set D, p is the lower bound on search precision set by the user, and T is the search time limit acceptable to the user;
step two: on receiving the user's search request, Hermes first collects state information of the current data platform and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy-protection granularity that the data platform has set for the user;
step three: if the estimated result deviates from the search requirement set by the user, the search request is rejected and a 'negotiation' to modify the search parameters is carried out with the user;
step four: if the estimated result is consistent with the search requirement set by the user, a search task J is generated;
step five: the data platform executes the search task J according to its configuration file, and if a search task isomorphic to J is found in the cache, J can be accelerated through a result-reuse mechanism;
step six: the data platform returns the task result to Hermes, which presents it to the user.
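The six steps above can be sketched as a small request-handling routine. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Query`, `DatasetStats`, `required_sample` and `Hermes.submit` are hypothetical; p is interpreted as allowing a relative error of at most 1 - p for an averaging query at roughly 95% confidence; feasibility uses the normal-approximation sample size n = (z·cv/ε)²; the privacy-protection granularity is modelled as a per-role cap on precision; and the negotiation step simply proposes relaxed values of p and T.

```python
import math
from dataclasses import dataclass

Z_95 = 1.96  # two-sided ~95% normal quantile (assumed confidence level)


@dataclass(frozen=True)
class Query:
    """Search request Q = (Op, D, p, T)."""
    op: str       # search operation Op, e.g. "AVG"
    dataset: str  # name of the target data set D
    p: float      # precision lower bound; assumed to mean relative error <= 1 - p
    t: float      # acceptable time limit T, in seconds


@dataclass
class DatasetStats:
    """Hypothetical statistics Hermes keeps about a data object D."""
    rows: int
    cv: float          # coefficient of variation (std / mean) of the queried column
    rows_per_s: float  # current platform scan rate, from collected state information


def required_sample(p: float, s: DatasetStats) -> int:
    """Normal-approximation sample size for relative error <= 1 - p at ~95% confidence."""
    eps = 1.0 - p
    return min(math.ceil((Z_95 * s.cv / eps) ** 2), s.rows)


class Hermes:
    def __init__(self, stats, max_precision):
        self.stats = stats                  # dataset name -> DatasetStats
        self.max_precision = max_precision  # role -> precision cap (privacy granularity)
        self.cache = {}                     # (op, dataset) -> [(precision, result), ...]

    def submit(self, q: Query, role: str, execute):
        # Step two: estimate feasibility from Op, D's statistics and the role's granularity.
        s = self.stats[q.dataset]
        cap = self.max_precision[role]
        est_time = required_sample(q.p, s) / s.rows_per_s
        # Step three: reject and propose relaxed parameters ("negotiation").
        if q.p > cap or est_time > q.t:
            p_new = min(q.p, cap)
            return {"status": "negotiate", "suggested_p": p_new,
                    "suggested_t": required_sample(p_new, s) / s.rows_per_s}
        # Step five (reuse): serve an isomorphic cached result only if it is precise
        # enough for the request but no more precise than the role may see.
        for prec, res in self.cache.get((q.op, q.dataset), []):
            if q.p <= prec <= cap:
                return {"status": "ok", "result": res, "reused": True}
        # Step four: generate task J; step five: the data platform executes it.
        job = {"op": q.op, "dataset": q.dataset, "sample_size": required_sample(q.p, s)}
        result = execute(job)
        self.cache.setdefault((q.op, q.dataset), []).append((q.p, result))
        # Step six: hand the result back to the user.
        return {"status": "ok", "result": result, "reused": False}
```

The `execute` callable stands in for the data platform. Checking the role's cap before reusing a cached result is a deliberate choice in this sketch: result reuse should not hand a low-privilege user an answer more precise than their granularity permits.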
Preferably, the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.
Preferably, the search evaluation module bridges the user and the data platform: it waits for the user's search request and analyzes the required resources, intermittently collects state information of the data platform, and finally forms a feasible search plan.
Preferably, the approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer consists of a number of basic operation components: a given operation Op can be included in the operation layer as a component if there exists a statistic, formed based on a specific data sampling algorithm, that is an unbiased estimator of Op. The acceleration layer provides a fast-response mechanism that records information on historical searches in a quick table so as to accelerate received isomorphic searches. The sampling layer provides a variety of sampling techniques.
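The operation-layer admission rule (an operation is included only if a sampling-based statistic is an unbiased estimator of it) can be illustrated with Bernoulli sampling. In this sketch, which assumes Horvitz-Thompson scaling and a hypothetical `QuickTable` keyed by (Op, D, sampling rate), SUM and COUNT qualify because scaling the sample total or sample size by 1/q gives an unbiased estimate of the full-data value. The component names and table layout are illustrative, not taken from the patent.

```python
import random


def bernoulli_sample(data, q, rng):
    """Sampling layer: keep each element independently with probability q."""
    return [x for x in data if rng.random() < q]


# Operation layer: each component turns a Bernoulli sample drawn at rate q into an
# unbiased estimate of Op over the full data (Horvitz-Thompson scaling by 1/q).
OPERATION_LAYER = {
    "SUM": lambda sample, q: sum(sample) / q,
    "COUNT": lambda sample, q: len(sample) / q,
}


class QuickTable:
    """Hypothetical acceleration-layer table: (Op, D, q) -> recorded estimate."""

    def __init__(self):
        self._entries = {}

    def lookup(self, op, dataset, q):
        return self._entries.get((op, dataset, q))

    def record(self, op, dataset, q, estimate):
        self._entries[(op, dataset, q)] = estimate


def approximate(op, dataset, data, q, table, rng):
    """Answer Op approximately, reusing the quick table for isomorphic searches."""
    cached = table.lookup(op, dataset, q)
    if cached is not None:
        return cached  # isomorphic search: answered without touching the data
    estimate = OPERATION_LAYER[op](bernoulli_sample(data, q, rng), q)
    table.record(op, dataset, q, estimate)
    return estimate
```

A repeated call with the same (Op, D, q) returns the recorded estimate immediately, which is the behaviour the acceleration layer is meant to provide.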
Preferably, the plurality of sampling techniques include Bernoulli sampling, the bootstrap and the jackknife.
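The resampling techniques named above are typically used to attach an error estimate to an approximate answer. The sketch below (function names are hypothetical, standard library only) estimates the standard error of a statistic with the bootstrap and with the jackknife; for the sample mean, the jackknife estimate reduces exactly to the classical s/√n, which makes a convenient check.

```python
import random
import statistics


def bootstrap_se(sample, stat, b, rng):
    """Bootstrap: resample with replacement b times; SE = std. dev. of the replicates."""
    reps = [stat(rng.choices(sample, k=len(sample))) for _ in range(b)]
    return statistics.stdev(reps)


def jackknife_se(sample, stat):
    """Jackknife: recompute stat with each observation left out in turn."""
    n = len(sample)
    reps = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    return ((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps)) ** 0.5
```

The bootstrap works for almost any statistic but costs b evaluations and is randomized; the jackknife is deterministic and costs n evaluations, which suits small samples.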
Preferably, the search maintenance module introduces an incremental sampling strategy: for isomorphic searches with varying precision, it greatly reduces time overhead by effectively reusing historical results; and, given that big data is released incrementally, it determines whether historical results remain applicable to isomorphic searches on new versions of the data, based on the privacy-protection granularity and the stability of the historical results.
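The patent does not specify the incremental sampling algorithm. One way to realize it, assumed here for illustration, is to grow an existing Bernoulli sample instead of drawing a new one: when a later isomorphic search needs a higher rate q_new, each row not yet sampled is added with probability (q_new - q_old)/(1 - q_old), which keeps every row's overall inclusion probability at exactly q_new. For a new data version that only appends rows (also an assumption), only the appended rows need sampling. The privacy-granularity and stability checks described above are not modelled.

```python
import random


def top_up_bernoulli(n_rows, kept, q_old, q_new, rng):
    """Raise a Bernoulli sample of row indices from rate q_old to q_new.

    Every row not yet kept is added with probability (q_new - q_old) / (1 - q_old),
    so each row's overall inclusion probability is
    q_old + (1 - q_old) * (q_new - q_old) / (1 - q_old) = q_new.
    """
    if not 0.0 <= q_old <= q_new <= 1.0:
        raise ValueError("need 0 <= q_old <= q_new <= 1")
    kept = set(kept)
    if q_new == q_old:
        return kept
    extra = (q_new - q_old) / (1.0 - q_old)
    return kept | {i for i in range(n_rows) if i not in kept and rng.random() < extra}


def extend_to_new_version(kept, old_rows, new_rows, q, rng):
    """Append-only new version (assumed): sample only rows old_rows..new_rows-1 at rate q."""
    return set(kept) | {i for i in range(old_rows, new_rows) if rng.random() < q}
```

Because the old sample is a subset of the new one, any per-row work already done for the earlier search can be reused rather than repeated.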
Preferably, the data privacy-protection granularity is the data owner's control over the granularity at which information is published, i.e., information of different granularities is provided according to the differing requirements and permissions of end users of the information.
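A minimal sketch of role-dependent publishing granularity, assuming a hypothetical policy table in which the data owner assigns each role a bucket width for numeric values: a role with width 1 sees exact values, coarser roles see only a range. The role names and widths are illustrative; unknown roles raise `KeyError`, so access is denied by default.

```python
# Hypothetical policy set by the data owner: bucket width released to each role.
GRANULARITY = {"owner": 1, "analyst": 10, "public": 100}


def release(value: int, role: str) -> str:
    """Generalize an integer value to the role's granularity.

    Raises KeyError for unknown roles, i.e. access is denied by default.
    """
    width = GRANULARITY[role]
    if width == 1:
        return str(value)
    low = (value // width) * width
    return f"[{low}, {low + width})"
```

For example, `release(1234, "analyst")` yields `"[1230, 1240)"`, while `release(1234, "public")` yields `"[1200, 1300)"`.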
Preferably, in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters consists of requiring the user to modify the keywords of the search requirement.
The beneficial effects of the invention are as follows:
the invention provides a secure search method based on data stored on a cloud server, comprising steps one to six described above. From the standpoints of improving search precision, timeliness, privacy-protection granularity and data availability, the invention studies a new approach to approximate big-data search with privacy protection, together with quantitative evaluation criteria; it realizes a data retrieval scheme that coordinates the three major dimensions of big-data search, solves the problem of repeated search caused by isomorphic searches and data-version updates, improves the retrieval efficiency of general search, and provides a complete solution for general retrieval covering multi-dimensional unified quantitative metrics, search modes, architecture, algorithms and so on.
Drawings
FIG. 1 is a block diagram of the framework of the present invention.
Detailed Description
To help those skilled in the art understand the invention, it is described in further detail below with reference to an embodiment and the accompanying drawing; these are not intended to limit the invention.
As shown in FIG. 1, the secure search method based on data stored on a cloud server provided by the invention comprises the following steps. Step one: a user submits a data search requirement to the search engine Hermes as a search request in the form of a quadruple Q = (Op, D, p, T), where Op denotes a search operation on a target data set D, p is the lower bound on search precision set by the user, and T is the search time limit acceptable to the user. Step two: on receiving the request, Hermes first collects state information of the current data platform and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy-protection granularity that the data platform has set for the user. Step three: if the estimated result deviates from the user's search requirement, the request is rejected and a 'negotiation' to modify the search parameters is carried out with the user. Step four: if the estimated result is consistent with the user's search requirement, a search task J is generated. Step five: the data platform executes J according to its configuration file; if a search task isomorphic to J is found in the cache, J can be accelerated through a result-reuse mechanism. Step six: the data platform returns the task result to Hermes, which presents it to the user. In this way the invention realizes a data retrieval scheme that coordinates the three major dimensions of big-data search, solves the problem of repeated search caused by isomorphic searches and data-version updates, and improves the retrieval efficiency of general search.
In this embodiment, the search engine Hermes includes a search evaluation module, an approximate search module and a search maintenance module. The search evaluation module bridges the user and the data platform: it waits for the user's search request, analyzes the resources it requires, and intermittently collects state information of the data platform to finally form a feasible search plan. The approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer consists of a number of basic operation components; a given operation Op can be included in the operation layer as a component if there exists a statistic, formed based on a specific data sampling algorithm, that is an unbiased estimator of Op. The acceleration layer provides a fast-response mechanism that records information on historical searches in a quick table so as to accelerate received isomorphic searches. The sampling layer provides a variety of sampling techniques, including Bernoulli sampling, the bootstrap and the jackknife. The search maintenance module introduces an incremental sampling strategy: for isomorphic searches with varying precision, it greatly reduces time overhead by effectively reusing historical results; and, given that big data is released incrementally, it determines whether historical results remain applicable to isomorphic searches on new versions of the data, based on the privacy-protection granularity and the stability of the historical results.
In this embodiment, the data privacy-protection granularity is the data owner's control over the granularity at which information is published, i.e., information of different granularities is provided according to the differing requirements and permissions of end users of the information. The data owner can set privacy-protection granularities with different parameters according to the importance of the data, which helps prevent attackers from stealing data stored in the cloud.
In this embodiment, in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters consists of requiring the user to modify the keywords of the search requirement.
Although the present invention has been described with reference to the above preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A secure search method based on data stored on a cloud server, characterized by comprising the following steps:
step one: a user formulates a data search requirement and submits it to the search engine Hermes as a search request in the form of a quadruple Q = (Op, D, p, T), where Op denotes a search operation on a target data set D, p is the lower bound on search precision set by the user, and T is the search time limit acceptable to the user;
step two: on receiving the user's search request, Hermes first collects state information of the current data platform and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy-protection granularity that the data platform has set for the user;
step three: if the estimated result deviates from the search requirement set by the user, the search request is rejected and a 'negotiation' to modify the search parameters is carried out with the user;
step four: if the estimated result is consistent with the search requirement set by the user, a search task J is generated;
step five: the data platform executes the search task J according to its configuration file, and if a search task isomorphic to J is found in the cache, J can be accelerated through a result-reuse mechanism;
step six: the data platform returns the task result to Hermes, which presents it to the user.
2. The secure search method based on data stored on a cloud server according to claim 1, wherein the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.
3. The secure search method based on data stored on a cloud server according to claim 2, wherein the search evaluation module bridges the user and the data platform, waits for the user's search request and analyzes the required resources, intermittently collects state information of the data platform, and finally forms a feasible search plan.
4. The secure search method based on data stored on a cloud server according to claim 2, wherein the approximate search module comprises a sampling layer, an acceleration layer and an operation layer; the operation layer consists of a plurality of basic operation components, and a given operation Op can be included in the operation layer as a component if there exists a statistic, formed based on a specific data sampling algorithm, that is an unbiased estimator of Op; the acceleration layer provides a fast-response mechanism that records information on historical searches in a quick table so as to accelerate received isomorphic searches; and the sampling layer provides a plurality of sampling techniques.
5. The secure search method based on data stored on a cloud server according to claim 4, wherein the plurality of sampling techniques include Bernoulli sampling, the bootstrap and the jackknife.
6. The secure search method based on data stored on a cloud server according to claim 2, wherein the search maintenance module introduces an incremental sampling strategy: for isomorphic searches with varying precision, it greatly reduces time overhead by effectively reusing historical results, and, given that big data is released incrementally, it determines the applicability of historical results to isomorphic searches on new versions of the data based on the privacy-protection granularity and the stability of the historical results.
7. The secure search method based on data stored on a cloud server according to claim 1, wherein the data privacy-protection granularity is the data owner's control over the granularity at which information is published, i.e., information of different granularities is provided according to the differing requirements and permissions of end users of the information.
8. The secure search method based on data stored on a cloud server according to claim 1, wherein, in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters consists of requiring the user to modify the keywords of the search requirement.
CN202111650058.7A 2021-12-30 2021-12-30 Safety search method based on data stored by cloud server Pending CN114329154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111650058.7A CN114329154A (en) 2021-12-30 2021-12-30 Safety search method based on data stored by cloud server

Publications (1)

Publication Number Publication Date
CN114329154A (en) 2022-04-12

Family

ID=81018973

Country Status (1)

Country Link
CN (1) CN114329154A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100146299A1 (en) * 2008-10-29 2010-06-10 Ashwin Swaminathan System and method for confidentiality-preserving rank-ordered search
CN110866275A (en) * 2019-11-13 2020-03-06 Harbin Institute of Technology (哈尔滨工业大学) Approximate retrieval method of big data with privacy protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220412