CN114329154A - Secure search method for data stored on a cloud server

Info

Publication number: CN114329154A
Application number: CN202111650058.7A
Authority: CN
Inventors: 张宏莉; 周志刚; 叶麟; 李东; 余翔湛; 于海宁; 方滨兴
Original assignee: Guangdong Electronic Information Engineering Research Institute of UESTC
Current assignee: Guangdong Electronic Information Engineering Research Institute of UESTC
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-12

Abstract

The invention relates to the technical field of search methods, and in particular to a secure search method for data stored on a cloud server. With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode and quantitative evaluation criteria for approximate search and privacy protection of big data. It realizes a data retrieval scheme in which the three major dimensions of big data search are coordinated, solves the problem of repeated searching caused by isomorphic searches and data version updates, and improves the retrieval efficiency of general search. It thereby provides a complete solution for general search, covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.

Description

Secure search method for data stored on a cloud server

Technical Field

The invention relates to the technical field of search methods, and in particular to a secure search method for data stored on a cloud server.

Background

Data retrieval is the most common data service and is the basis for more complex statistical operations such as further analysis and mining of data. For example, a scan of 1 PB of data takes several tens of hours even on a high-performance computer, let alone a complex analysis operation on that data. From the perspective of user requirements, as the pace of life and work accelerates, users' expectations of search have changed greatly: rather than waiting tediously for an exact answer, people expect a fast search experience that meets their precision requirement. From the perspective of the characteristics of big data, because big data is often aggregated from multiple sources, it usually contains noise and redundant information, and it is mostly open only to a limited extent (that is, because big data may contain sensitive or private information, the data owner needs to anonymize the data before publishing it); expecting to carry out exact statistical analysis on big data is therefore neither practical nor possible. From the perspective of search technology, traditional search engine technology mainly targets the static web pages of Web 1.0 and performs keyword-based "existence scanning search"; it cannot support Web 2.0/3.0 big data applications with the 4V characteristics, nor can it meet users' requirements for fast, high-precision search. These problems have prompted research into new fast, high-precision, privacy-preserving big data search techniques.

In addition, because big data is huge in volume, rich in implicit knowledge and only partly open, what users require of big data search is not a one-dimensional existence answer; instead, search is evaluated along multiple dimensions such as the precision and availability of the search results and the timeliness of the search. Traditional single-dimension search evaluation metrics are therefore unsuitable for evaluating big data search. On the one hand, users care about both the precision of the results and the timeliness of the search, and these two requirements are difficult to satisfy simultaneously in practical scenarios with limited computing resources. On the other hand, the data owner sets differentiated privacy protection strengths for the search requests of users in different roles. At present there is no common yardstick for measuring privacy protection strength against search precision, which leaves the two sides at an impasse.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a secure search method for data stored on a cloud server.

In order to solve the above technical problems, the invention adopts the following technical solution:

The invention provides a secure search method for data stored on a cloud server, comprising the following steps:

Step one: a user raises a data search requirement and submits it to the search engine Hermes as a search request, where the search request is a quadruple Q = (Op, D, p, T), in which Op denotes the search operation on the target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user;

Step two: on receiving the user's search request, the search engine Hermes first collects the state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user;

Step three: if the estimate deviates from the search requirement set by the user, the search request is rejected and a "negotiation" is carried out with the user to modify the search parameters;

Step four: if the estimate is consistent with the search requirement set by the user, a search task J is generated;

Step five: the data platform executes the search task J according to the configuration file of J; if a search task isomorphic to J is found in the cache, the search task can be accelerated through a result reuse mechanism;

Step six: the data platform returns the task result to the search engine Hermes, which presents it to the user.
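The six steps above can be sketched in code. The following is a minimal illustration only: the class names `SearchRequest`, `Estimate` and `Hermes`, the injected `estimator` and `executor` callables, and the choice of (Op, D) as the key that identifies isomorphic tasks are all assumptions made for this sketch and are not specified by the method itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SearchRequest:
    """The quadruple Q = (Op, D, p, T) from step one."""
    op: str            # search operation Op, e.g. "mean"
    dataset: str       # identifier of the target data set D
    precision: float   # lower limit p on search precision
    time_limit: float  # acceptable time limit T, in seconds


@dataclass(frozen=True)
class Estimate:
    """Hypothetical output of the feasibility estimate in step two."""
    achievable_precision: float
    expected_seconds: float


class Hermes:
    """Illustrative front end; the real engine's interfaces are not given."""

    def __init__(self,
                 estimator: Callable[[SearchRequest], Estimate],
                 executor: Callable[[SearchRequest], float]) -> None:
        self._estimate = estimator
        self._execute = executor
        # Cache of earlier tasks: (Op, D) -> (precision achieved, result).
        self._cache: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def submit(self, q: SearchRequest):
        est = self._estimate(q)                       # step two
        if (est.achievable_precision < q.precision
                or est.expected_seconds > q.time_limit):
            return ("negotiate", est)                 # step three: reject
        key = (q.op, q.dataset)                       # step four: task J
        cached = self._cache.get(key)
        if cached is not None and cached[0] >= q.precision:
            return ("result", cached[1])              # step five: reuse
        value = self._execute(q)                      # step five: run J
        self._cache[key] = (q.precision, value)
        return ("result", value)                      # step six
```

In this sketch a rejected request returns the estimate itself, so that the user can see which parameter to revise before resubmitting.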

Preferably, the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.

Preferably, the search evaluation module is responsible for bridging the user and the data platform: it waits for the user's search request and analyzes the resources the request requires, intermittently collects the state information of the data platform, and finally forms a feasible search plan.
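One way such a resource analysis could work is sketched below. The text does not say how feasibility is estimated, so everything here is an assumption: the precision requirement is expressed as a tolerated absolute error `epsilon` on a mean of bounded values, the sample size comes from Hoeffding's inequality, and the platform state is reduced to a single hypothetical throughput figure `rows_per_second`. Privacy protection granularity is left out of this sketch.

```python
import math


def required_sample_size(epsilon: float, delta: float = 0.05,
                         value_range: float = 1.0) -> int:
    """Samples needed so that a sample mean is within epsilon of the true
    mean with probability at least 1 - delta, for values confined to an
    interval of width value_range (Hoeffding's inequality):

        n >= value_range^2 * ln(2 / delta) / (2 * epsilon^2)
    """
    n = value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


def is_feasible(epsilon: float, time_limit_s: float,
                rows_per_second: float, delta: float = 0.05) -> bool:
    """Compare the time needed to read the required sample with T."""
    n = required_sample_size(epsilon, delta)
    return n / rows_per_second <= time_limit_s
```

Note that the required sample size depends on the error tolerance and confidence level but not on the size of the data set, which is why sampling can meet a time limit that a full scan cannot.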

Preferably, the approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer is composed of a number of basic operation components: for a given operation Op, if a corresponding statistic constructed from a specific data sampling algorithm is an unbiased estimate of Op, then Op can be included in the operation layer in the form of a component. The acceleration layer provides a fast response mechanism, recording information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later. The sampling layer provides a variety of sampling techniques.
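The admission rule for the operation layer can be illustrated with two operations whose estimators are known to be unbiased under Bernoulli sampling: dividing the sample count or the sample sum by the sampling rate gives an unbiased estimate of the full count or sum. The registry `OPERATION_LAYER`, the `register` decorator and the function names are hypothetical and serve only to show components being plugged in.

```python
import random
from typing import Callable, Dict, List, Sequence

# An estimator maps (sample, sampling rate) to an estimate of Op on D.
Estimator = Callable[[Sequence[float], float], float]

# Hypothetical registry standing in for the operation layer.
OPERATION_LAYER: Dict[str, Estimator] = {}


def register(name: str) -> Callable[[Estimator], Estimator]:
    """Add an estimator to the operation layer under the given Op name."""
    def decorator(fn: Estimator) -> Estimator:
        OPERATION_LAYER[name] = fn
        return fn
    return decorator


@register("count")
def estimate_count(sample: Sequence[float], rate: float) -> float:
    # Each row is kept with probability `rate`, so E[len(sample)] = rate * N.
    return len(sample) / rate


@register("sum")
def estimate_sum(sample: Sequence[float], rate: float) -> float:
    # E[sum(sample)] = rate * sum(D), so dividing by rate is unbiased.
    return sum(sample) / rate


def bernoulli_sample(data: Sequence[float], rate: float,
                     rng: random.Random) -> List[float]:
    """Keep each row independently with probability `rate`."""
    return [x for x in data if rng.random() < rate]


def approximate(op: str, data: Sequence[float], rate: float,
                rng: random.Random) -> float:
    """Sampling layer followed by the registered operation component."""
    return OPERATION_LAYER[op](bernoulli_sample(data, rate, rng), rate)
```

A plain sample mean is deliberately not registered here: under Bernoulli sampling it is a ratio of two random quantities, so its unbiasedness would need a separate argument before it qualified as a component.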

Preferably, the sampling techniques include Bernoulli sampling, the bootstrap method and the jackknife method.
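For reference, the three named techniques can be written compactly using only the Python standard library. This is a generic textbook sketch rather than the sampling layer's actual implementation; the function names and the use of the bootstrap and jackknife to obtain a standard error for a statistic are illustrative choices.

```python
import random
import statistics
from typing import Callable, List, Sequence

Statistic = Callable[[Sequence[float]], float]


def bernoulli_sample(data: Sequence[float], rate: float,
                     rng: random.Random) -> List[float]:
    """Bernoulli sampling: keep each item independently with prob. rate."""
    return [x for x in data if rng.random() < rate]


def bootstrap_std_error(sample: List[float], statistic: Statistic,
                        n_resamples: int, rng: random.Random) -> float:
    """Bootstrap: resample with replacement, recompute the statistic, and
    take the spread of the replicates as its standard error."""
    n = len(sample)
    replicates = [
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return statistics.stdev(replicates)


def jackknife_std_error(sample: List[float], statistic: Statistic) -> float:
    """Jackknife: recompute the statistic with each item left out once."""
    n = len(sample)
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)
    return variance ** 0.5
```

For the sample mean, the jackknife standard error coincides with the familiar s / sqrt(n), which makes it a convenient sanity check.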

Preferably, the search maintenance module introduces an incremental sampling strategy: for isomorphic searches with varying precision, time overhead is greatly reduced by effectively reusing historical results; and, in view of the incremental release of big data, the search maintenance module determines the applicability of historical results to isomorphic searches over new-version data on the basis of the privacy protection granularity and the stability of the historical results.
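The reuse idea for isomorphic searches with varying precision can be sketched as follows: if an earlier search already drew n samples and a later, stricter search needs a larger sample, only the difference has to be drawn. The `MeanState` class, the `refine` function and the `draw` callback are hypothetical names, and the sketch assumes the additional samples are drawn independently from the same version of the data as the earlier ones.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MeanState:
    """Sufficient statistics kept from a historical approximate mean."""
    n: int = 0
    total: float = 0.0

    def extend(self, values: Iterable[float]) -> None:
        """Fold newly drawn samples into the stored statistics."""
        for v in values:
            self.n += 1
            self.total += v

    @property
    def mean(self) -> float:
        return self.total / self.n


def refine(state: MeanState, target_n: int,
           draw: Callable[[int], List[float]]) -> int:
    """Raise the sample size to target_n, drawing only what is missing.

    Returns the number of new samples drawn; zero means the historical
    result already satisfies the new request.
    """
    missing = max(0, target_n - state.n)
    state.extend(draw(missing))
    return missing
```

A request at lower or equal precision draws nothing, so the cost of a repeated isomorphic search depends on the increment rather than on the full sample.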

Preferably, the data privacy protection granularity is the data owner's control over the granularity at which information is published, that is, information of different granularities is provided according to the differing requirements and permissions of the end users of the information.
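As a simple illustration of publishing the same data at different granularities, the sketch below generalizes a numeric attribute into ranges whose width depends on the requester's role. The role names, the widths in `GRANULARITY_BY_ROLE` and the use of range generalization are assumptions chosen for the example; the text does not prescribe a particular anonymization technique.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from user role to bucket width; a width of 1 means
# the exact value is released.
GRANULARITY_BY_ROLE: Dict[str, int] = {
    "owner": 1,
    "analyst": 10,
    "public": 50,
}


def generalize(value: int, bucket_width: int) -> str:
    """Replace a value by the range of width bucket_width that contains it."""
    if bucket_width <= 1:
        return str(value)
    low = (value // bucket_width) * bucket_width
    return f"{low}-{low + bucket_width - 1}"


def publish(values: Sequence[int], role: str) -> List[str]:
    """Release values at the granularity configured for the given role."""
    width = GRANULARITY_BY_ROLE[role]
    return [generalize(v, width) for v in values]
```

For example, the value 37 would be released as `37` to the owner, as `30-39` to an analyst and as `0-49` to the public.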

Preferably, in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters means that the user is required to modify the keywords of the search requirement.

The beneficial effects of the invention are as follows:

The invention provides a secure search method for data stored on a cloud server, comprising the following steps. Step one: a user raises a data search requirement and submits it to the search engine Hermes as a search request, where the search request is a quadruple Q = (Op, D, p, T), in which Op denotes the search operation on the target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user. Step two: on receiving the user's search request, the search engine Hermes first collects the state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user. Step three: if the estimate deviates from the search requirement set by the user, the search request is rejected and a "negotiation" is carried out with the user to modify the search parameters. Step four: if the estimate is consistent with the search requirement set by the user, a search task J is generated. Step five: the data platform executes the search task J according to the configuration file of J; if a search task isomorphic to J is found in the cache, the search task can be accelerated through a result reuse mechanism. With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode and quantitative evaluation criteria for approximate search and privacy protection of big data; realizes a data retrieval scheme in which the three major dimensions of big data search are coordinated; solves the problem of repeated searching caused by isomorphic searches and data version updates; improves the retrieval efficiency of general search; and provides a complete solution for general retrieval, covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.

Drawings

FIG. 1 is a block diagram of the framework of the present invention.

Detailed Description

To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the embodiments and the drawing; the content of the embodiments does not limit the present invention. The present invention is described in detail below with reference to the drawing.

As shown in FIG. 1, the secure search method for data stored on a cloud server provided by the present invention comprises the following steps. Step one: a user raises a data search requirement and submits it to the search engine Hermes as a search request, where the search request is a quadruple Q = (Op, D, p, T), in which Op denotes the search operation on the target data set D, p is the lower limit of search precision set by the user, and T is the search time limit acceptable to the user. Step two: on receiving the user's search request, the search engine Hermes first collects the state information of the current data platform, and estimates the feasibility of Q by analyzing the operation Op and the statistical information of the data object D, in combination with the data privacy protection granularity that the data platform has set for the user. Step three: if the estimate deviates from the search requirement set by the user, the search request is rejected and a "negotiation" is carried out with the user to modify the search parameters. Step four: if the estimate is consistent with the search requirement set by the user, a search task J is generated. Step five: the data platform executes the search task J according to the configuration file of J; if a search task isomorphic to J is found in the cache, the search task can be accelerated through a result reuse mechanism. With the aim of improving search precision, timeliness, privacy protection granularity and data availability, the invention studies a new mode and quantitative evaluation criteria for approximate search and privacy protection of big data; realizes a data retrieval scheme in which the three major dimensions of big data search are coordinated; solves the problem of repeated searching caused by isomorphic searches and data version updates; improves the retrieval efficiency of general search; and provides a complete solution for general retrieval, covering unified multi-dimensional quantitative metrics, search modes, architectures and algorithms.

In this embodiment, the search engine Hermes includes a search evaluation module, an approximate search module and a search maintenance module. The search evaluation module is responsible for bridging the user and the data platform: it waits for the user's search request and analyzes the resources the request requires, intermittently collects the state information of the data platform, and finally forms a feasible search plan. The approximate search module comprises a sampling layer, an acceleration layer and an operation layer. The operation layer consists of a number of basic operation components: for a given operation Op, if a corresponding statistic constructed from a specific data sampling algorithm is an unbiased estimate of Op, then Op can be included in the operation layer in the form of a component. The acceleration layer provides a fast response mechanism, recording information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later. The sampling layer provides a variety of sampling techniques, including Bernoulli sampling, the bootstrap method and the jackknife method. The search maintenance module introduces an incremental sampling strategy: for isomorphic searches with varying precision, time overhead is greatly reduced by effectively reusing historical results; and, in view of the incremental release of big data, the search maintenance module determines the applicability of historical results to isomorphic searches over new-version data on the basis of the privacy protection granularity and the stability of the historical results.

In this embodiment, the data privacy protection granularity is the data owner's control over the granularity at which information is published, that is, information of different granularities is provided according to the differing requirements and permissions of the end users of the information. The data owner can set privacy protection granularities with different parameters according to the importance of the data, which can prevent attackers from stealing data held in cloud storage.

In this embodiment, in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters means that the user is required to modify the keywords of the search requirement.

Although the present invention has been described with reference to the above preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A secure search method for data stored on a cloud server, characterized by comprising the following steps:

2. The secure search method for data stored on a cloud server according to claim 1, wherein: the search engine Hermes comprises a search evaluation module, an approximate search module and a search maintenance module.

3. The secure search method for data stored on a cloud server according to claim 2, wherein: the search evaluation module is responsible for bridging the user and the data platform, waiting for the user's search request and analyzing the resources the request requires; and intermittently collecting the state information of the data platform, and finally forming a feasible search plan.

4. The secure search method for data stored on a cloud server according to claim 2, wherein: the approximate search module comprises a sampling layer, an acceleration layer and an operation layer, wherein the operation layer is composed of a number of basic operation components, and for a given operation Op, if a corresponding statistic constructed from a specific data sampling algorithm is an unbiased estimate of Op, then Op can be included in the operation layer in the form of a component; the acceleration layer provides a fast response mechanism, and records information about historical searches in a fast lookup table so as to accelerate isomorphic searches received later; the sampling layer provides a variety of sampling techniques.

5. The secure search method for data stored on a cloud server according to claim 4, wherein: the sampling techniques include Bernoulli sampling, the bootstrap method and the jackknife method.

6. The secure search method for data stored on a cloud server according to claim 2, wherein: the search maintenance module introduces an incremental sampling strategy; for isomorphic searches with varying precision, time overhead is greatly reduced by effectively reusing historical results; and, in view of the incremental release of big data, the search maintenance module determines the applicability of historical results to isomorphic searches over new-version data on the basis of the privacy protection granularity and the stability of the historical results.

7. The secure search method for data stored on a cloud server according to claim 1, wherein: the data privacy protection granularity is the data owner's control over the granularity at which information is published, that is, information of different granularities is provided according to the differing requirements and permissions of the end users of the information.

8. The secure search method for data stored on a cloud server according to claim 1, wherein: in step three, the "negotiation" between the search engine Hermes and the user to modify the search parameters means that the user is required to modify the keywords of the search requirement.